# Writing Scenarios

## Basic Scenario Structure

Every scenario test follows this basic pattern:
```python
result = await scenario.run(
    name="descriptive test name",
    description="detailed scenario context",
    agents=[
        YourAgent(),
        scenario.UserSimulatorAgent(),
        scenario.JudgeAgent(criteria=["success criteria"]),
    ],
    script=[],  # Optional
)
```
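In a test, you'll typically wrap this call with pytest and assert on the result. Here is a minimal sketch, reusing the `YourAgent` placeholder from above:

```python
import pytest
import scenario

@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_weather_query():
    result = await scenario.run(
        name="weather query with location clarification",
        description="User asks about the weather without giving a location.",
        agents=[
            YourAgent(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(criteria=["Agent asks for the user's location"]),
        ],
    )
    # result.success reflects the judge's verdict against the criteria
    assert result.success
```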
## Name and Description

The `name` should be concise and descriptive:
```python
# Good names
name="weather query with location clarification"
name="booking cancellation with refund request"
name="technical support escalation"

# Avoid generic names
name="test 1"
name="agent test"
name="basic scenario"
```
The `description` provides context that guides the user simulator:

```python
description="""
User is planning a weekend trip and needs weather information.
They initially ask about general weather but then want specific
details about outdoor activities. They might be concerned about rain.
"""
```
## Grouping Your Sets and Batches

While optional, we strongly recommend setting stable identifiers for your scenarios, sets, and batches for better organization and tracking in LangWatch.

- `set_id`: Groups related scenarios into a test suite. This corresponds to the "Simulation Set" in the UI. The `set_id` is included in all events emitted during scenario execution, making it easy to filter and analyze related test results in monitoring systems.
- `batch_run_id`: Groups all scenarios that were run together in a single execution (e.g., a single CI job). This is automatically generated but can be overridden.
```python
result = await scenario.run(
    name="my first scenario",
    description="A simple test to see if the agent responds.",
    set_id="my-test-suite",
    agents=[
        scenario.Agent(my_agent),
        scenario.UserSimulatorAgent(),
    ],
)
```
You can also set the `batch_run_id` using environment variables for CI/CD integration:
```python
import os

# Set batch ID for CI/CD integration
os.environ["SCENARIO_BATCH_RUN_ID"] = os.environ.get("GITHUB_RUN_ID", "local-run")

result = await scenario.run(
    name="my first scenario",
    description="A simple test to see if the agent responds.",
    set_id="my-test-suite",
    agents=[
        scenario.Agent(my_agent),
        scenario.UserSimulatorAgent(),
    ],
)
```
The `batch_run_id` is automatically generated for each test run, but you can also set it globally using the `SCENARIO_BATCH_RUN_ID` environment variable.
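If you'd rather set it once for the whole test session instead of in each test file, one option is to export the variable from `conftest.py`. This is a minimal sketch, assuming you run your scenarios under pytest:

```python
# conftest.py
import os

# Use the CI run ID when available, otherwise fall back to a local identifier
os.environ.setdefault(
    "SCENARIO_BATCH_RUN_ID",
    os.environ.get("GITHUB_RUN_ID", "local-run"),
)
```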
## Connect Your Agent

To start testing your agent, you first need to connect it to the scenario. This is done through the `AgentAdapter` interface:
```python
import scenario

class MyAgent(scenario.AgentAdapter):
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Get the user's message
        user_message = input.last_new_user_message_str()

        # Call your existing agent
        response = await my_existing_agent.process(user_message)

        # Return the response (can be a string, an OpenAI message, or a list of messages)
        return response
```
The adapter pattern lets you connect any existing agent, whether it's a simple function or a custom framework, without modifying your agent's code.
For detailed integration patterns, including simple string returns, OpenAI message formats, or framework-specific integrations, see Agent Integration.
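For instance, here is a minimal sketch of an adapter wrapping a plain function; `answer_question` is a hypothetical helper standing in for your own code, and the return value is an OpenAI-style message dict, one of the accepted return types:

```python
import scenario

# Hypothetical stand-in for your existing code: any async function works
async def answer_question(question: str) -> str:
    return f"You asked: {question}"

class FunctionAgent(scenario.AgentAdapter):
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        answer = await answer_question(input.last_new_user_message_str())
        # An OpenAI-style message dict is also a valid return type
        return {"role": "assistant", "content": answer}
```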
## Example: Complete Scenario

Here's a complete example showing all the concepts together:
```python
import pytest
import scenario

@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_customer_service_billing():
    class CustomerServiceAgent(scenario.AgentAdapter):
        async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
            return await customer_service_bot.process(
                messages=input.messages,
                context={"department": "billing"},
            )

    result = await scenario.run(
        name="billing dispute resolution",
        description="""
            Customer received a bill that seems higher than expected.
            They're not angry but are confused and want an explanation.
            They have their account information ready and are generally
            cooperative but need clear explanations.
        """,
        agents=[
            CustomerServiceAgent(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(criteria=[
                "Agent asks for account information to look up the bill",
                "Agent reviews the bill details with the customer",
                "Agent explains any charges that seem unusual or high",
                "Agent offers options if there was an error",
                "Agent maintains a professional and helpful tone",
                "Agent ensures customer understands before ending",
            ]),
        ],
        max_turns=8,  # Reasonable limit for this type of interaction
    )

    assert result.success

    # Additional assertions
    assert len(result.messages) >= 4  # Should have a substantial conversation
    assert "account" in str(result.messages).lower()  # Should discuss the account
```
## Writing Tips

### 1. Start with User Intent

Begin with what the user wants to accomplish:
```python
# Good: Clear user intent
description="User wants to change their password but forgot their current one"

# Better: Add context and constraints
description="""
User is locked out of their account after multiple failed login attempts.
They need to change their password but don't remember the current one.
They have access to their email but not their phone for 2FA.
"""
```
### 2. Include Personality and Context

Make scenarios realistic by adding human elements:
```python
description="""
User is a small business owner who is stressed about the tax deadline.
They need help categorizing expenses but aren't familiar with
accounting terms. They appreciate patient explanations and examples.
"""
```
### 3. Test Edge Cases

Use scenarios to explore edge cases:
```python
description="""
User initially asks about product A but then changes their mind
and asks about product B, then asks to compare both products.
They're indecisive and might change requirements multiple times.
"""
```
### 4. Layer Complexity Gradually

Start simple, then add complexity:
```python
# Basic scenario
description="User wants to book a flight to Paris"

# More complex
description="""
User wants to book a flight to Paris but has specific requirements:
direct flight, departure after 2 PM, willing to be flexible on dates
for better prices. They're traveling for business and need receipts.
"""
```
## Common Patterns

### Information Gathering

Test how well your agent collects necessary information:
```python
description="""
User needs technical support but doesn't know technical details.
"""

agents=[
    CustomerServiceAgent(),
    scenario.UserSimulatorAgent(),
    scenario.JudgeAgent(criteria=[
        "Agent should ask for the user's account number or email",
        "Agent should ask what model the user's router is",
    ])
]
```
### Clarification and Confirmation

Test how agents handle ambiguous requests:
```python
description="""
User asks to "cancel my thing" without specifying what they want to cancel.
"""

agents=[
    CustomerServiceAgent(),
    scenario.UserSimulatorAgent(),
    scenario.JudgeAgent(criteria=[
        "Agent should ask clarifying questions to identify the correct item and confirm before taking action.",
    ])
]
```
### Error Recovery

Test how agents handle mistakes. You can force the agent to have made a mistake by using a `script`, then let the rest of the conversation play out by itself:
```python
description="""
Agent initially misunderstands user's request and offers wrong
solution. User corrects them. Agent should acknowledge the
mistake and provide the right help.
"""

script=[
    scenario.user("change my subscription"),
    scenario.agent("Sure, I'm going to upgrade you to the Pro plan... done!"),
    scenario.user("what!? no I don't want to upgrade, I want to cancel"),
    scenario.proceed(),
]
```
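Putting the pieces together, here is a sketch of the full `scenario.run` call for this pattern; the judge criteria are illustrative, and `CustomerServiceAgent` is the adapter defined earlier:

```python
result = await scenario.run(
    name="error recovery after wrong action",
    description="""
        Agent initially misunderstands user's request and offers wrong
        solution. User corrects them. Agent should acknowledge the
        mistake and provide the right help.
    """,
    agents=[
        CustomerServiceAgent(),
        scenario.UserSimulatorAgent(),
        scenario.JudgeAgent(criteria=[
            "Agent acknowledges the mistaken upgrade and apologizes",
            "Agent offers to revert the upgrade",
            "Agent proceeds with the cancellation the user asked for",
        ]),
    ],
    script=[
        scenario.user("change my subscription"),
        scenario.agent("Sure, I'm going to upgrade you to the Pro plan... done!"),
        scenario.user("what!? no I don't want to upgrade, I want to cancel"),
        scenario.proceed(),
    ],
)
assert result.success
```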
## Next Steps

Now that you understand how to write scenarios, learn about more advanced techniques:
- Scripted Simulations - Take control of conversation flow
- Cache - Make tests deterministic and faster
- Debug Mode - Debug scenarios interactively