# Writing Scenarios

## Basic Scenario Structure

Every scenario test follows this basic pattern:
```python
result = await scenario.run(
    name="descriptive test name",
    description="detailed scenario context",
    agents=[
        YourAgent(),
        scenario.UserSimulatorAgent(),
        scenario.JudgeAgent(criteria=["success criteria"]),
    ],
    script=[],  # Optional
)
```
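In a test, you'll typically wrap this call with pytest and assert on the result. Here is a minimal sketch, reusing the `YourAgent` placeholder from above:

```python
import pytest
import scenario

@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_weather_query():
    result = await scenario.run(
        name="weather query with location clarification",
        description="User asks about the weather without giving a location.",
        agents=[
            YourAgent(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(criteria=["Agent asks for the user's location"]),
        ],
    )
    # result.success reflects the judge's verdict against the criteria
    assert result.success
```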
## Name and Description

The `name` should be concise and descriptive:
```python
# Good names
name="weather query with location clarification"
name="booking cancellation with refund request"
name="technical support escalation"

# Avoid generic names
name="test 1"
name="agent test"
name="basic scenario"
```
The `description` provides context that guides the user simulator:

```python
description="""
User is planning a weekend trip and needs weather information.
They initially ask about general weather but then want specific
details about outdoor activities. They might be concerned about rain.
"""
```
## Grouping Your Sets and Batches

While optional, we strongly recommend setting stable identifiers for your scenarios, sets, and batches for better organization and tracking in LangWatch.

- `set_id`: Groups related scenarios into a test suite. This corresponds to the "Simulation Set" in the UI. The `set_id` is included in all events emitted during scenario execution, making it easy to filter and analyze related test results in monitoring systems.
- `batch_run_id`: Groups all scenarios that were run together in a single execution (e.g., a single CI job). This is automatically generated but can be overridden.
```python
result = await scenario.run(
    name="my first scenario",
    description="A simple test to see if the agent responds.",
    set_id="my-test-suite",
    agents=[
        scenario.Agent(my_agent),
        scenario.UserSimulatorAgent(),
    ],
)
```
You can also set the `batch_run_id` using environment variables for CI/CD integration:
```python
import os

# Set batch ID for CI/CD integration
os.environ["SCENARIO_BATCH_RUN_ID"] = os.environ.get("GITHUB_RUN_ID", "local-run")

result = await scenario.run(
    name="my first scenario",
    description="A simple test to see if the agent responds.",
    set_id="my-test-suite",
    agents=[
        scenario.Agent(my_agent),
        scenario.UserSimulatorAgent(),
    ],
)
```
The `batch_run_id` is automatically generated for each test run, but you can also set it globally using the `SCENARIO_BATCH_RUN_ID` environment variable.
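If you'd rather set it once for the whole test session instead of in each test file, one option is to export the variable from `conftest.py`. This is a minimal sketch, assuming you run your scenarios under pytest:

```python
# conftest.py
import os

# Use the CI run ID when available, otherwise fall back to a local identifier
os.environ.setdefault(
    "SCENARIO_BATCH_RUN_ID",
    os.environ.get("GITHUB_RUN_ID", "local-run"),
)
```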
## Connect Your Agent

To start testing your agent, you first need to connect it to the scenario. This is done through the `AgentAdapter` interface:
```python
import scenario

class MyAgent(scenario.AgentAdapter):
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Get the user's message
        user_message = input.last_new_user_message_str()

        # Call your existing agent
        response = await my_existing_agent.process(user_message)

        # Return the response (can be a string, an OpenAI message, or a list of messages)
        return response
```
The adapter pattern lets you connect any existing agent, whether it's a simple function or a custom framework, without modifying your agent's code.
For detailed integration patterns, including simple string returns, OpenAI message formats, or framework-specific integrations, see Agent Integration.
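For instance, here is a minimal sketch of an adapter wrapping a plain function; `answer_question` is a hypothetical helper standing in for your own code, and the return value is an OpenAI-style message dict, one of the accepted return types:

```python
import scenario

# Hypothetical stand-in for your existing code: any async function works
async def answer_question(question: str) -> str:
    return f"You asked: {question}"

class FunctionAgent(scenario.AgentAdapter):
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        answer = await answer_question(input.last_new_user_message_str())
        # An OpenAI-style message dict is also a valid return type
        return {"role": "assistant", "content": answer}
```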
## Example: Complete Scenario

Here's a complete example showing all the concepts together:
```python
import pytest
import scenario

@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_customer_service_billing():
    class CustomerServiceAgent(scenario.AgentAdapter):
        async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
            return await customer_service_bot.process(
                messages=input.messages,
                context={"department": "billing"},
            )

    result = await scenario.run(
        name="billing dispute resolution",
        description="""
            Customer received a bill that seems higher than expected.
            They're not angry but are confused and want an explanation.
            They have their account information ready and are generally
            cooperative but need clear explanations.
        """,
        agents=[
            CustomerServiceAgent(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(criteria=[
                "Agent asks for account information to look up the bill",
                "Agent reviews the bill details with the customer",
                "Agent explains any charges that seem unusual or high",
                "Agent offers options if there was an error",
                "Agent maintains a professional and helpful tone",
                "Agent ensures customer understands before ending",
            ]),
        ],
        max_turns=8,  # Reasonable limit for this type of interaction
    )

    assert result.success

    # Additional assertions
    assert len(result.messages) >= 4  # Should have a substantial conversation
    assert "account" in str(result.messages).lower()  # Should discuss the account
```
## Writing Tips

### 1. Start with User Intent

Begin with what the user wants to accomplish:
```python
# Good: Clear user intent
description="User wants to change their password but forgot their current one"

# Better: Add context and constraints
description="""
User is locked out of their account after multiple failed login attempts.
They need to change their password but don't remember the current one.
They have access to their email but not their phone for 2FA.
"""
```
### 2. Include Personality and Context

Make scenarios realistic by adding human elements:
```python
description="""
User is a small business owner who is stressed about the tax deadline.
They need help categorizing expenses but aren't familiar with
accounting terms. They appreciate patient explanations and examples.
"""
```
### 3. Test Edge Cases

Use scenarios to explore edge cases:
```python
description="""
User initially asks about product A but then changes their mind
and asks about product B, then asks to compare both products.
They're indecisive and might change requirements multiple times.
"""
```
### 4. Layer Complexity Gradually

Start simple, then add complexity:
```python
# Basic scenario
description="User wants to book a flight to Paris"

# More complex
description="""
User wants to book a flight to Paris but has specific requirements:
direct flight, departure after 2 PM, willing to be flexible on dates
for better prices. They're traveling for business and need receipts.
"""
```
## Common Patterns

### Information Gathering

Test how well your agent collects necessary information:
```python
description="""
User needs technical support but doesn't know technical details.
"""

agents=[
    CustomerServiceAgent(),
    scenario.UserSimulatorAgent(),
    scenario.JudgeAgent(criteria=[
        "Agent should ask for the user's account number or email",
        "Agent should ask what model the user's router is",
    ])
]
```
### Clarification and Confirmation

Test how agents handle ambiguous requests:
```python
description="""
User asks to "cancel my thing" without specifying what they want to cancel.
"""

agents=[
    CustomerServiceAgent(),
    scenario.UserSimulatorAgent(),
    scenario.JudgeAgent(criteria=[
        "Agent should ask clarifying questions to identify the correct item and confirm before taking action.",
    ])
]
```
### Error Recovery

Test how agents handle mistakes. You can force the agent to have made a mistake by using a `script`, then let the rest of the conversation play out by itself:
```python
description="""
Agent initially misunderstands user's request and offers wrong
solution. User corrects them. Agent should acknowledge the
mistake and provide the right help.
"""

script=[
    scenario.user("change my subscription"),
    scenario.agent("Sure, I'm going to upgrade you to the Pro plan... done!"),
    scenario.user("what!? no I don't want to upgrade, I want to cancel"),
    scenario.proceed(),
]
```
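Putting the pieces together, here is a sketch of the full `scenario.run` call for this pattern; the judge criteria are illustrative, and `CustomerServiceAgent` is the adapter defined earlier:

```python
result = await scenario.run(
    name="error recovery after wrong action",
    description="""
        Agent initially misunderstands user's request and offers wrong
        solution. User corrects them. Agent should acknowledge the
        mistake and provide the right help.
    """,
    agents=[
        CustomerServiceAgent(),
        scenario.UserSimulatorAgent(),
        scenario.JudgeAgent(criteria=[
            "Agent acknowledges the mistaken upgrade and apologizes",
            "Agent offers to revert the upgrade",
            "Agent proceeds with the cancellation the user asked for",
        ]),
    ],
    script=[
        scenario.user("change my subscription"),
        scenario.agent("Sure, I'm going to upgrade you to the Pro plan... done!"),
        scenario.user("what!? no I don't want to upgrade, I want to cancel"),
        scenario.proceed(),
    ],
)
assert result.success
```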
## Next Steps

Now that you understand how to write scenarios, learn about more advanced techniques:
- Scripted Simulations - Take control of conversation flow
- Cache - Make tests deterministic and faster
- Debug Mode - Debug scenarios interactively