Building a Production-Ready Refund Agent That Won’t Break Your Business

AI agents can automate business processes. But most demos ignore a critical question: What happens when something goes wrong mid-workflow?

Consider a customer refund process:

Process refund via payment gateway → succeeds
Send confirmation email → fails

Your customer now has money refunded but no notification. Your support team has no record of what happened. Your compliance team can’t audit the decision.

This is the production reality that demos skip over.

Today I’m showing you how to build a customer refund agent that handles these failure modes correctly using AgentHelm, an open-source framework I built for production-ready agent orchestration.

What Makes a Refund Agent “Production-Ready”?

A toy demo refund agent calls a few tools and returns a result. A production refund agent needs:

Transactional safety: If step 3 fails, steps 1 and 2 are automatically undone
Human approval: High-value refunds require manager sign-off
Audit trails: Every decision is logged for compliance
Error recovery: Failures don’t leave the system in an inconsistent state

Most agent frameworks (LangChain, AutoGPT) handle the basic part: calling tools. They leave the rest of this list, the part that makes an agent safe for production, up to you.

That’s what AgentHelm solves.

The Refund Workflow

Here’s what our agent needs to do:

1. Verify order is eligible for refund
2. Check customer account status
3. Validate refund amount (requires approval if >$100)
4. Process refund via payment gateway
5. Send confirmation email
6. Log audit record

The critical part: If step 5 (email) fails, we need to automatically undo step 4 (refund). We can’t leave a customer refunded without notification.
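The rollback requirement above is an instance of the compensating-transaction (saga) pattern. As a rough, framework-independent sketch (this is not AgentHelm's implementation; `run_with_compensation` and the step names are hypothetical), the idea looks something like this:

```python
from typing import Callable, List, Optional, Tuple

# Each step pairs an action with an optional compensating action.
Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # Walk back through the steps that already succeeded.
            for done_name, _, compensate in reversed(completed):
                if compensate is not None:
                    compensate()
                    log.append(f"compensated: {done_name}")
            return log
        log.append(f"ok: {name}")
        completed.append(step)
    return log


def fail_email() -> None:
    raise RuntimeError("email service unavailable")


events = run_with_compensation([
    ("process_refund", lambda: None, lambda: None),  # stand-in action and undo
    ("send_confirmation", fail_email, None),
])
print(events)
```

This sketch only compensates steps that completed before the failure. A real orchestrator also has to decide what to do when a compensating action itself fails.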

Building the Agent: Tool by Tool

Step 1: Define the Refund Tool with Rollback

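import json
import logging
from datetime import datetime

from agenthelm.orchestrator.core.tool import tool

logger = logging.getLogger(__name__)

# order_db, customer_db, refund_db, payment_processor and email_service are the
# example's service objects; they are assumed to be defined elsewhere.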

@tool(requires_approval=True, compensating_tool="reverse_refund_transaction")
def process_refund(order_id: str, customer_id: str, refund_amount: float, reason: str) -> dict:
    """Process a refund for an order.
    This is the main action that processes the payment refund.
    Requires approval if the refund amount exceeds $100.
    """
    logger.info(f"Processing refund of ${refund_amount:.2f} for order {order_id}")

    # Get order and customer details
    order = order_db.get_order(order_id)
    customer = customer_db.get_customer(customer_id)

    if not order or not customer:
        return {
            "success": False,
            "error": "Invalid order or customer"
        }

    # Process the refund through the payment processor
    payment_result = payment_processor.process_refund(
        order_id=order_id,
        amount=refund_amount,
        payment_method=order["payment_method"]
    )

    if not payment_result["success"]:
        return {
            "success": False,
            "error": f"Payment processing failed: {payment_result.get('error', 'Unknown error')}"
        }

    # Create refund record
    refund_data = {
        "order_id": order_id,
        "customer_id": customer_id,
        "amount": refund_amount,
        "reason": reason,
        "payment_transaction": payment_result,
        "status": "completed"
    }

    refund_id = refund_db.create_refund(refund_data)

    if not refund_id:
        # If refund record creation fails, we should reverse the payment transaction
        payment_processor.reverse_transaction(payment_result["transaction_id"])
        return {
            "success": False,
            "error": "Failed to create refund record"
        }

    # Update customer's refund history
    customer["refund_history"].append({
        "order_id": order_id,
        "amount": refund_amount,
        "date": datetime.now().isoformat(),
        "reason": reason,
        "refund_id": refund_id
    })

    customer_db.update_customer(customer_id, customer)

    # Update order status
    order["refund_status"] = "refunded"
    order["refund_amount"] = refund_amount
    order["refund_date"] = datetime.now().isoformat()

    order_db.update_order(order_id, order)

    return {
        "success": True,
        "refund_id": refund_id,
        "transaction_id": payment_result["transaction_id"],
        "customer_email": customer["email"]
    }

@tool()
def reverse_refund_transaction(transaction_id: str) -> dict:
    """
    Compensating action for process_refund.
    Reverses a refund transaction if something goes wrong after the payment processing.
    """
    logger.info(f"Reversing refund transaction {transaction_id}")

    result = payment_processor.reverse_transaction(transaction_id)

    if not result:
        return {
            "success": False,
            "error": "Failed to reverse transaction"
        }

    return {
        "success": True,
        "transaction_reversed": True,
        "transaction_id": transaction_id
    }

Key features:

requires_approval=True → Agent pauses and asks for human confirmation
compensating_tool="reverse_refund_transaction" → Automatic rollback if later steps fail
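To make the idea concrete, here is a minimal sketch of how a decorator could attach this kind of safety metadata to a function. It is an illustration only, not AgentHelm's actual implementation; `sketch_tool`, `TOOL_REGISTRY`, `charge`, and `undo_charge` are hypothetical names.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: tool name -> function plus its safety metadata.
TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {}


def sketch_tool(requires_approval: bool = False,
                compensating_tool: Optional[str] = None) -> Callable:
    """Register a function along with its approval and rollback settings."""
    def decorator(func: Callable) -> Callable:
        TOOL_REGISTRY[func.__name__] = {
            "func": func,
            "requires_approval": requires_approval,
            "compensating_tool": compensating_tool,
        }
        return func  # The function itself is left unchanged.
    return decorator


@sketch_tool(requires_approval=True, compensating_tool="undo_charge")
def charge(amount: float) -> dict:
    return {"success": True, "amount": amount}


@sketch_tool()
def undo_charge(amount: float) -> dict:
    return {"success": True, "reversed": amount}


meta = TOOL_REGISTRY["charge"]
print(meta["requires_approval"], meta["compensating_tool"])
```

Referring to the compensating tool by name rather than by function object means an orchestrator can look it up at rollback time, even if it is defined later in the file.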

Step 2: Add the Notification Tool (Also with Rollback)

@tool(retries=2, compensating_tool="send_correction_email")
def send_refund_confirmation(customer_email: str, order_id: str, refund_amount: float, refund_id: str) -> dict:
    """Send a confirmation email to the customer about their refund."""
    logger.info(f"Sending refund confirmation email to {customer_email}")

    subject = f"Your Refund for Order {order_id} Has Been Processed"

    body = f"""
Dear Customer,

We're writing to confirm that your refund for Order {order_id} has been processed.

Refund Details:
- Refund ID: {refund_id}
- Amount: ${refund_amount:.2f}
- Date: {datetime.now().strftime('%Y-%m-%d')}

The refund has been issued to your original payment method. Please allow 3-5 business days for the funds to appear in your account.

If you have any questions about this refund, please contact our customer service team and reference your Refund ID.

Thank you for your business.

Best regards,
The Customer Service Team
"""

    email_result = email_service.send_email(customer_email, subject, body)

    if not email_result:
        return {
            "success": False,
            "error": "Failed to send confirmation email"
        }

    return {
        "success": True,
        "email_sent": True,
        "recipient": customer_email
    }

@tool()
def send_correction_email(customer_email: str, order_id: str) -> dict:
    """
    Compensating action for send_refund_confirmation.
    Sends a correction email if the original confirmation had issues.
    """
    logger.info(f"Sending correction email to {customer_email}")

    subject = f"Important Update About Your Refund for Order {order_id}"

    body = f"""
Dear Customer,

We're writing to inform you about an important update regarding your recent refund for Order {order_id}.

There was a technical issue with our refund processing system. Our team is working to resolve this issue as quickly as possible.

Please disregard any previous communication about this refund. We will send you a new confirmation once the refund has been properly processed.

We apologize for any inconvenience this may cause.

If you have any questions, please contact our customer service team.

Best regards,
The Customer Service Team
"""

    email_result = email_service.send_email(customer_email, subject, body)

    if not email_result:
        return {
            "success": False,
            "error": "Failed to send correction email"
        }

    return {
        "success": True,
        "email_sent": True,
        "recipient": customer_email
    }

Key features:

retries=2 → Automatically retry if email service has transient failure
Compensator sends a “we had an issue” email if rollback happens
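For intuition, retry behavior of this kind can be sketched in a few lines. This is a generic illustration, not AgentHelm's code; `call_with_retries` and `flaky_send` are hypothetical, and it assumes `retries=2` means two extra attempts after the first one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], retries: int = 2, delay: float = 0.0) -> T:
    """Call func once, then up to `retries` more times if it raises."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # Out of attempts: let the caller trigger rollback.
            time.sleep(delay)
    # Only reachable when retries is negative, so the loop never ran.
    raise RuntimeError("retries must be >= 0")


attempts = {"count": 0}


def flaky_send() -> str:
    """Fail twice with a transient error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "sent"


result = call_with_retries(flaky_send, retries=2)
print(result, attempts["count"])
```

Only once the retries are exhausted does the failure propagate, which is the point at which a rollback would be triggered.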

Step 3: Add Supporting Tools

@tool()
def verify_order_status(order_id: str) -> dict:
    """Verify that an order exists and is eligible for refund."""
    order = order_db.get_order(order_id)

    if not order:
        return {"success": False, "error": f"Order {order_id} not found"}

    if order["status"] not in ["delivered", "shipped"]:
        return {
            "success": False, 
            "error": f"Order {order_id} has status '{order['status']}' which is not eligible for refund"
        }

    return {"success": True, "order_details": order}

@tool()
def verify_customer_eligibility(customer_id: str) -> dict:
    """Verify that a customer is eligible for refunds."""
    customer = customer_db.get_customer(customer_id)

    if not customer:
        return {"success": False, "error": f"Customer {customer_id} not found"}

    if customer["account_status"] != "active":
        return {
            "success": False,
            "error": f"Customer {customer_id} has account status '{customer['account_status']}' which is not eligible for refund"
        }

    return {"success": True, "customer_details": customer}

@tool()
def validate_refund_amount(order_id: str, refund_amount: float) -> dict:
    """Validate that the refund amount is valid for the given order."""
    order = order_db.get_order(order_id)

    if not order:
        return {
            "success": False,
            "error": f"Order {order_id} not found"
        }

    if refund_amount <= 0:
        return {"success": False, "error": "Refund amount must be greater than zero"}

    if refund_amount > order["total_amount"]:
        return {
            "success": False,
            "error": f"Refund amount ${refund_amount:.2f} exceeds order total ${order['total_amount']:.2f}"
        }

    return {
        "success": True,
        "valid_amount": True,
        "requires_approval": refund_amount > 100,
        "order_total": order["total_amount"]
    }

@tool()
def log_audit_record(action: str, details: dict) -> dict:
    """Log action for compliance."""
    audit_record = {
        "timestamp": datetime.now().isoformat(),
        "action": action,
        "details": details
    }

    # For simulation, we read the list, append, and write back.
    try:
        with open("audit_log.json", 'r') as f:
            log = json.load(f)
    except FileNotFoundError:
        log = []

    log.append(audit_record)

    with open("audit_log.json", 'w') as f:
        json.dump(log, f, indent=2)

    return {"logged": True, "action": action}

Step 4: Create the Agent

import os

from agenthelm.orchestrator.core.storage import FileStorage
from agenthelm.orchestrator.core.tracer import ExecutionTracer
from agenthelm.orchestrator.agent import Agent
from agenthelm.orchestrator.llm.mistral_client import MistralClient
from agenthelm.orchestrator.core.handlers import ApprovalHandler

# Setup storage and tracer
storage = FileStorage('refund_agent_trace.json')
tracer = ExecutionTracer(storage)

# Get API key from environment variable
api_key = os.environ.get("MISTRAL_API_KEY")
if not api_key:
    raise ValueError("MISTRAL_API_KEY environment variable not set.")

# Initialize LLM client
client = MistralClient(model_name="mistral-small-latest", api_key=api_key)

# Define the list of tools for the agent
agent_tools = [
    verify_order_status,
    verify_customer_eligibility,
    validate_refund_amount,
    process_refund,
    send_refund_confirmation,
    reverse_refund_transaction,
    send_correction_email,
    log_audit_record,
]

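# Set up the approval handler for tools that require approval.
# ApprovalHandler is the class imported above; swap in a concrete subclass if your setup needs one.
approval_handler = ApprovalHandler()
tracer.approval_handler = approval_handler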

# Instantiate the Agent
agent = Agent(tools=agent_tools, tracer=tracer, client=client)

Step 5: Run It

result = agent.run(
    "Process a $450 refund for order ORD-1001, customer CUST-001. "
    "Reason: product not as described. Customer email: customer@example.com"
)

How the Agent Thinks: The ReAct Framework

Running the agent is simple, but how does it decide what to do? It uses a reasoning process called ReAct (short for Reasoning and Acting).

At each step, the agent doesn’t just blindly pick a tool. Instead, it follows a think-act loop:

Reason: The LLM first thinks about the overall goal, what it has done so far, and what the next logical step should be. It generates a short internal monologue explaining its reasoning.
Act: Based on its reasoning, it chooses a tool and executes it.

This reasoning is not hidden. AgentHelm captures it in the trace file, giving you an incredible tool for debugging and understanding the agent’s behavior.

For example, here is the agent’s first thought when given the refund task:

Agent’s Thought: To process a refund, we first need to verify the customer’s eligibility. This is a prerequisite step before processing the refund.

Based on this thought, it correctly chooses the verify_customer_eligibility tool. After that tool succeeds, the agent thinks again:

Agent’s Thought: The customer eligibility has been verified. The next step is to verify the order status to ensure that the order is eligible for a refund.

And so it calls verify_order_status. This step-by-step reasoning makes the agent’s behavior predictable and auditable.
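The think-act loop can be sketched without any LLM at all. In the illustration below, a scripted `policy` function stands in for the model; `react_loop` and `scripted_policy` are hypothetical names, and this is a simplification rather than AgentHelm's actual agent loop.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy stands in for the LLM: given the tools called so far,
# it returns (thought, next tool name) or (thought, None) when done.
Policy = Callable[[List[str]], Tuple[str, Optional[str]]]


def react_loop(policy: Policy, tools: Dict[str, Callable[[], str]],
               max_steps: int = 10) -> List[dict]:
    """Alternate between reasoning and acting until the policy picks no tool."""
    history: List[str] = []
    trace: List[dict] = []
    for _ in range(max_steps):
        thought, tool_name = policy(history)        # Reason
        if tool_name is None:
            trace.append({"thought": thought, "tool": None})
            break
        observation = tools[tool_name]()            # Act
        history.append(tool_name)
        trace.append({"thought": thought, "tool": tool_name,
                      "observation": observation})
    return trace


def scripted_policy(history: List[str]) -> Tuple[str, Optional[str]]:
    plan = ["verify_customer_eligibility", "verify_order_status"]
    if len(history) < len(plan):
        next_tool = plan[len(history)]
        return (f"Next I should run {next_tool}.", next_tool)
    return ("All checks passed.", None)


tools = {
    "verify_customer_eligibility": lambda: "customer ok",
    "verify_order_status": lambda: "order ok",
}
trace = react_loop(scripted_policy, tools)
for entry in trace:
    print(entry["thought"], "->", entry["tool"])
```

Recording the thought next to each tool call is what makes the resulting trace useful for debugging.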

What Happens: Three Scenarios

Scenario 1: Happy Path (Everything Works)

Verifying order ORD-1001 status... ✓
Checking customer eligibility... ✓
Validating refund amount $450... ✓

Approval Required for Tool: process_refund
   Order ID: ORD-1001
   Amount: $450.00
   Reason: product not as described

   Do you approve this action? [y/N]: y

Processing $450 refund for order ORD-1001... ✓
Sending confirmation to customer@example.com... ✓
AUDIT: refund_completed

Refund processed successfully

Key point: The agent paused and asked for approval because the amount was >$100.

Scenario 2: Email Fails → Automatic Rollback

Verifying order ORD-1001 status... ✓
Checking customer eligibility... ✓
Validating refund amount $450... ✓

Approval Required for Tool: process_refund
Do you approve this action? [y/N]: y

Processing $450 refund for order ORD-1001... ✓
Transaction ID: TXN-12345

Sending confirmation to customer@example.com... ✗ FAILED
Error: Email service unavailable

Workflow failed at step: send_refund_confirmation
Triggering compensating actions...

Calling reverse_refund_transaction(transaction_id='TXN-12345')
Transaction TXN-12345 reversed

Calling send_correction_email(customer_email='customer@example.com', order_id='ORD-1001')
Correction notice sent

Refund reversed due to email delivery failure

This is the killer feature. The refund was processed, then automatically reversed when the email failed, so the workflow ends in a consistent state instead of leaving an unannounced refund behind.

Scenario 3: Approval Denied

Approval Required for Tool: process_refund
Order ID: ORD-1001
Amount: $450.00

Do you approve this action? [y/N]: n

Approval denied by user
Workflow terminated

The refund never happened. Human oversight prevented the transaction.
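An approval gate like this can be approximated with a small wrapper. The sketch below is generic and not AgentHelm's handler API; `run_with_approval` and `refund` are hypothetical. The prompt function is injectable so the gate can be tested without a terminal.

```python
from typing import Callable


def run_with_approval(action: Callable[[], dict], description: str,
                      ask: Callable[[str], str] = input) -> dict:
    """Run `action` only if the reviewer answers 'y'; anything else denies."""
    answer = ask(f"{description}\nDo you approve this action? [y/N]: ")
    if answer.strip().lower() != "y":
        return {"success": False, "error": "Approval denied by user"}
    return action()


def refund() -> dict:
    return {"success": True, "refund_id": "REF-DEMO"}


# Inject canned answers instead of reading from stdin.
denied = run_with_approval(refund, "process_refund: $450.00", ask=lambda _: "n")
approved = run_with_approval(refund, "process_refund: $450.00", ask=lambda _: "y")
print(denied)
print(approved)
```

Defaulting to denial (the capital N in `[y/N]`) means an empty or mistyped answer never moves money.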

The Audit Trail

Every action is automatically logged. Here’s what the trace file looks like:

[
    {
      "tool_name": "verify_order_status",
      "timestamp": "2025-10-28T10:15:30Z",
      "inputs": {"order_id": "ORD-1001"},
      "outputs": {"result": {"success": true, "order_details": {}}},
      "execution_time": 0.045,
      "error_state": null
    },
    {
      "tool_name": "process_refund",
      "timestamp": "2025-10-28T10:16:12Z",
      "inputs": {
        "order_id": "ORD-1001", 
        "refund_amount": 450.0,
        "reason": "product not as described"
      },
      "outputs": {
        "result": {
            "refund_id": "REF-1001",
            "transaction_id": "TXN-12345"
        }
      },
      "execution_time": 0.230,
      "error_state": null
    },
    {
      "tool_name": "reverse_refund_transaction",
      "timestamp": "2025-10-28T10:16:45Z",
      "inputs": {"transaction_id": "TXN-12345"},
      "outputs": {"result": {"status": "reversed"}},
      "execution_time": 0.180,
      "error_state": "Compensating action for failed step: send_refund_confirmation"
    }
]

For compliance teams, this is gold. You can prove:

Who approved what
When each action occurred
Why rollbacks happened
Exact inputs/outputs for every step
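Because the trace is plain JSON, simple audit queries need nothing beyond the standard library. Here is a small sketch that pulls out every rollback; it assumes the record shape shown in the example above, and `find_rollbacks` is a hypothetical helper, not part of AgentHelm.

```python
import json
from typing import List

# A shortened trace in the same shape as the example above.
TRACE_JSON = """
[
  {"tool_name": "process_refund", "timestamp": "2025-10-28T10:16:12Z",
   "inputs": {"order_id": "ORD-1001"}, "error_state": null},
  {"tool_name": "reverse_refund_transaction", "timestamp": "2025-10-28T10:16:45Z",
   "inputs": {"transaction_id": "TXN-12345"},
   "error_state": "Compensating action for failed step: send_refund_confirmation"}
]
"""


def find_rollbacks(trace: List[dict]) -> List[dict]:
    """Return trace entries whose error_state marks a compensating action."""
    return [
        entry for entry in trace
        if (entry.get("error_state") or "").startswith("Compensating action")
    ]


trace = json.loads(TRACE_JSON)
rollbacks = find_rollbacks(trace)
for entry in rollbacks:
    print(entry["timestamp"], entry["tool_name"], "-", entry["error_state"])
```

In practice you would load the records from the trace file (refund_agent_trace.json in this example) instead of an inline string.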

Lessons Learned Building This

1. Test Rollbacks in Development

Don’t wait until production to find out your compensating actions don’t work.

Solution: Add a flag to simulate failures:

@tool()
def send_refund_confirmation(customer_email: str, simulate_failure: bool = False) -> dict:
    # Test-only switch: force a failure so the compensating actions run
    if simulate_failure:
        raise RuntimeError("Simulated failure for testing")
    # Normal logic
    ...

Run your workflows with simulate_failure=True during development.

2. Separate Validation from Action

My first process_refund tool both validated the amount and processed it. This made approvals tricky.

Solution: A dedicated validate_refund_amount tool checks the business rules (e.g., the amount must not exceed the order total, and amounts over $100 require approval). The process_refund tool, which requires approval, can then focus only on the transaction. The agent uses the output of the validation to know when to proceed.

3. Atomic Tools Are Easier to Debug

My first version had a single process_refund_workflow tool that did everything. When it failed, I couldn’t tell which step broke.

Solution: Each tool does ONE thing. Easier to test, debug, and reuse.

4. Retries Save You From Flaky APIs

Email services, payment gateways, and external APIs fail intermittently. Setting retries=2 saved me countless debugging sessions.

5. The Trace File is Your Best Friend

When something goes wrong in production, the trace file tells you EXACTLY what happened. Invest time making it readable and queryable.

Why This Matters

Many companies hesitate to deploy AI agents because they can’t trust them.

They’ve seen demos. They know agents CAN work. But they don’t know:

What happens when the agent makes a mistake
How to audit agent decisions for compliance
How to prevent catastrophic failures

AgentHelm solves these problems by bringing distributed systems reliability patterns to AI agents:

Transactional semantics (like databases)
Structured observability (like OpenTelemetry)
Policy enforcement (like API gateways)

This isn’t new technology. It’s applying 20 years of production systems engineering to agents.

Try It Yourself

The full code for this refund agent is on GitHub:

Repo: https://github.com/hadywalied/agenthelm
Example: https://github.com/hadywalied/agenthelm/blob/main/examples/customer_refund_agent/refund_agent.py
Docs: https://hadywalied.github.io/agenthelm/

Install it:

pip install agenthelm

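Run the example (it lives in the repo, so clone it first):

git clone https://github.com/hadywalied/agenthelm
cd agenthelm/examples/customer_refund_agent
export MISTRAL_API_KEY='your_key_here'
python refund_agent.py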

If you’re deploying agents in production, I’d love your feedback. What’s the biggest blocker you’re facing: observability, safety, or reliability?

Open an issue on GitHub or comment below. Let’s build reliable AI together.