A powerful TypeScript library for testing AI agents in realistic, scripted scenarios.
Scenario provides a declarative DSL for defining test cases, allowing you to control conversation flow, simulate user behavior, and evaluate agent performance against predefined criteria.
- Use the built-in `userSimulatorAgent` to generate natural user interactions.
- Use the built-in `judgeAgent` to automatically assess conversations against success criteria.
- Connect any agent by implementing the `AgentAdapter` interface.

Install the package:

pnpm add @langwatch/scenario
# or
npm install @langwatch/scenario
# or
yarn add @langwatch/scenario
Create your first scenario test in under a minute.
// echo.test.ts
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";

// 1. Create an adapter for your agent
const echoAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    // This agent simply echoes back the last message content
    const lastMessage = input.messages[input.messages.length - 1];
    return `You said: ${lastMessage.content}`;
  },
};

// 2. Define and run your scenario
async function testEchoAgent() {
  const result = await scenario.run({
    name: "Echo Agent Test",
    description: "The agent should echo back the user's message.",
    agents: [echoAgent],
    script: [
      scenario.user("Hello world!"),
      scenario.agent("You said: Hello world!"), // You can assert the agent's response directly
      scenario.succeed("Agent correctly echoed the message."),
    ],
  });

  if (result.success) {
    console.log("✅ Scenario passed!");
  } else {
    console.error(`❌ Scenario failed: ${result.reasoning}`);
  }
}

testEchoAgent();
Scenario integrates seamlessly with test runners like Vitest or Jest. Here's a more advanced example testing an AI-powered weather agent.
// weather.test.ts
import { describe, it, expect } from "vitest";
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { z } from "zod";

describe("Weather Agent", () => {
  it("should get the weather for a city", async () => {
    // 1. Define the tools your agent can use
    const getCurrentWeather = tool({
      description: "Get the current weather in a given city.",
      parameters: z.object({
        city: z.string().describe("The city to get the weather for."),
      }),
      execute: async ({ city }) =>
        `The weather in ${city} is cloudy with a temperature of 24°C.`,
    });

    // 2. Create an adapter for your agent
    const weatherAgent: AgentAdapter = {
      role: AgentRole.AGENT,
      call: async (input) => {
        const response = await generateText({
          model: openai("gpt-4.1"),
          system: `You are a helpful assistant that may help the user with weather information.`,
          messages: input.messages,
          tools: { get_current_weather: getCurrentWeather },
        });
        if (response.toolCalls?.length) {
          // For simplicity, we'll just return the arguments of the first tool call
          const { toolName, args } = response.toolCalls[0];
          return {
            role: "tool",
            content: [{ type: "tool-result", toolName, result: args }],
          };
        }
        return response.text;
      },
    };

    // 3. Define and run your scenario
    const result = await scenario.run({
      name: "Checking the weather",
      description:
        "The user asks for the weather in a specific city, and the agent should use the weather tool to find it.",
      agents: [
        weatherAgent,
        scenario.userSimulatorAgent({ model: openai("gpt-4.1") }),
      ],
      script: [
        scenario.user("What's the weather like in Barcelona?"),
        scenario.agent(),
        // You can use inline assertions within your script
        (state) => {
          expect(state.hasToolCall("get_current_weather")).toBe(true);
        },
        scenario.succeed("Agent correctly used the weather tool."),
      ],
    });

    // 4. Assert the final result
    expect(result.success).toBe(true);
  });
});
`run(config)`

The main function to execute a scenario. It takes a configuration object and returns a promise that resolves with the final `ScenarioResult`.

`ScenarioConfig`

The configuration object for a scenario. Its fields are listed here, with a combined sketch after the list:

- `name: string`: A human-readable name for the scenario.
- `description: string`: A detailed description of what the scenario tests.
- `agents: AgentAdapter[]`: A list of agents participating in the scenario.
- `script?: ScriptStep[]`: An optional array of steps to control the scenario flow. If not provided, the scenario proceeds automatically.
- `maxTurns?: number`: The maximum number of conversation turns before a timeout. Defaults to 10.
- `verbose?: boolean`: Enables detailed logging during execution.
- `setId?: string`: (Optional) Groups related scenarios into a test suite ("Simulation Set"). Useful for organizing and tracking scenarios in the UI and reporting. If not provided, the scenario is not grouped into a set.
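Putting these fields together, a fully specified configuration might look like the following sketch (the agent under test is a placeholder, and the concrete names and values are illustrative):

```typescript
import scenario, { type AgentAdapter } from "@langwatch/scenario";

// Placeholder: any AgentAdapter implementation for the agent under test.
declare const myAgent: AgentAdapter;

const result = await scenario.run({
  name: "Refund request",
  description: "The user asks for a refund and the agent should explain the refund policy.",
  agents: [myAgent, scenario.userSimulatorAgent()],
  maxTurns: 6, // stop after 6 turns instead of the default 10
  verbose: true, // log details while the scenario runs
  setId: "support-flows", // group this scenario with related ones
  // script is omitted, so the scenario proceeds automatically
});

console.log(result.success);
```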
Agents are the participants in a scenario. They are defined by the `AgentAdapter` interface:
export interface AgentAdapter {
  role: AgentRole; // USER, AGENT, or JUDGE
  call: (input: AgentInput) => Promise<AgentReturnTypes>;
}
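In practice, an adapter usually just forwards the accumulated conversation to whatever produces your agent's replies. A minimal sketch, assuming a hypothetical `respondToMessages` helper that stands in for your own agent call:

```typescript
import { type AgentAdapter, AgentRole } from "@langwatch/scenario";

// Hypothetical stand-in for however your agent produces a reply
// (an SDK call, an HTTP request to your backend, etc.).
declare function respondToMessages(
  messages: Array<{ role: string; content: unknown }>
): Promise<string>;

const myAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    // Forward the conversation so far and return the reply as a plain string.
    return respondToMessages(input.messages);
  },
};
```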
Scenario provides built-in agents for common testing needs:
- `userSimulatorAgent(config)`: Simulates a human user, generating realistic messages based on the scenario description.
- `judgeAgent(config)`: Evaluates the conversation against a set of criteria and determines if the scenario succeeds or fails.
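For example, a scenario driven entirely by the built-in agents might look like the sketch below; the exact `judgeAgent` options (the `criteria` array here) are an assumption, so check the LangWatch docs for the precise option names:

```typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter } from "@langwatch/scenario";

// Placeholder for the agent under test (any AgentAdapter implementation).
declare const myAgent: AgentAdapter;

// Without a script, the simulated user and the judge drive the conversation
// until a verdict is reached or maxTurns is hit.
const result = await scenario.run({
  name: "Vegetarian recipe request",
  description: "The user asks for a quick vegetarian dinner recipe.",
  agents: [
    myAgent,
    scenario.userSimulatorAgent({ model: openai("gpt-4.1") }),
    scenario.judgeAgent({
      model: openai("gpt-4.1"),
      criteria: ["The agent suggests a vegetarian recipe"], // option name assumed
    }),
  ],
});

console.log(result.success ? "passed" : `failed: ${result.reasoning}`);
```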
Scripts provide fine-grained control over the scenario's execution. A script is an array of `ScriptStep` functions.
A `ScriptStep` is a function that receives the current `ScenarioExecutionState` and the `ScenarioExecutionLike` context.
Built-in Script Steps:
- `user(content?)`: A user turn. If `content` is provided, it's used as the message. Otherwise, the `userSimulatorAgent` generates one.
- `agent(content?)`: An agent turn. If `content` is provided, it's used as the message. Otherwise, the agent under test generates a response.
- `judge(content?)`: Forces the `judgeAgent` to make a decision.
- `message(message)`: Adds a specific `CoreMessage` to the conversation.
- `proceed(turns?, onTurn?, onStep?)`: Lets the scenario run automatically.
- `succeed(reasoning?)`: Ends the scenario with a success verdict.
- `fail(reasoning?)`: Ends the scenario with a failure verdict.

You can also provide your own functions as script steps for making assertions:
import { expect } from "vitest";
const script = [
user("Hello"),
agent(),
(state) => {
// Make assertions on the state
expect(state.lastAssistantMessage?.content).toContain("Hi there");
},
succeed(),
];
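For reference, a script can also mix scripted messages, automatic turns, and an explicit judge call. The following sketch combines several of the built-in steps listed above; the conversation content is illustrative:

```typescript
import scenario from "@langwatch/scenario";

// A sketch of a script mixing scripted messages, automatic turns, and a judge call.
const script = [
  scenario.message({ role: "system", content: "You are a support agent for Acme." }),
  scenario.user("I was charged twice this month."),
  scenario.agent(), // let the agent under test respond
  scenario.proceed(2), // let the simulator and agent exchange two more turns
  scenario.judge(), // force the judge to reach a verdict now
];
```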
You can configure project-wide defaults by creating a `scenario.config.js` or `scenario.config.mjs` file in your project root.
// scenario.config.mjs
import { defineConfig } from "@langwatch/scenario/config";
import { openai } from "@ai-sdk/openai";

export default defineConfig({
  // Set a default model provider for all agents (e.g., userSimulatorAgent, judgeAgent)
  defaultModel: {
    model: openai("gpt-4o-mini"),
    temperature: 0.1,
  },
});
The library will automatically load this configuration.
The following configuration options are all optional. You can specify any combination of them in your `scenario.config.js` file.
- `defaultModel` (Optional): An object to configure the default AI model for all agents. See the sketch after this list.
  - `model`: (Required if `defaultModel` is set) An instance of a language model from a provider like `@ai-sdk/openai`.
  - `temperature` (Optional): The default temperature for the model (e.g., `0.1`).
  - `maxTokens` (Optional): The default maximum number of tokens for the model to generate.
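Extending the earlier example, a config that sets all three `defaultModel` options might look like this sketch (the values are illustrative):

```typescript
// scenario.config.mjs
import { defineConfig } from "@langwatch/scenario/config";
import { openai } from "@ai-sdk/openai";

export default defineConfig({
  defaultModel: {
    model: openai("gpt-4o-mini"), // required when defaultModel is set
    temperature: 0.1, // keep the simulator and judge fairly deterministic
    maxTokens: 1024, // cap how much the default model may generate
  },
});
```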
You can control the library's behavior with the following environment variables:

- `LOG_LEVEL`: Sets the verbosity of the internal logger. Can be `error`, `warn`, `info`, or `debug`. By default, logging is silent.
- `SCENARIO_DISABLE_SIMULATION_REPORT_INFO`: Set to `true` to disable the "Scenario Simulation Reporting" banner that is printed to the console when a test run starts.
- `LANGWATCH_API_KEY`: Your LangWatch API key. This is used as a fallback if `langwatchApiKey` is not set in your config file.
- `LANGWATCH_ENDPOINT`: The LangWatch reporting endpoint. This is used as a fallback if `langwatchEndpoint` is not set in your config file.
You can group related scenarios into a set ("Simulation Set") by providing the `setId` option. This is useful for organizing your scenarios in the UI and for reporting in LangWatch.
const result = await scenario.run({
  name: "my first scenario",
  description: "A simple test to see if the agent responds.",
  setId: "my-test-suite", // Group this scenario into a set
  agents: [myAgent, scenario.userSimulatorAgent()],
});
This will group all scenarios with the same `setId` together in the LangWatch UI and reporting tools.
The Vitest integration provides two pieces:

- The `setupFiles` entry enables Scenario's event logging for each test.
- The `VitestReporter` provides detailed scenario test reports in your test output.

Scenario provides a convenient helper function to enhance your Vitest configuration with all the necessary setup files.
// vitest.config.ts
import { defineConfig } from "vitest/config";
import { withScenario } from "@langwatch/scenario/integrations/vitest/config";
import VitestReporter from "@langwatch/scenario/integrations/vitest/reporter";

export default withScenario(
  defineConfig({
    test: {
      testTimeout: 180000, // 3 minutes, or however long you want to wait for the scenario to run
      // Your existing setup files will be preserved and run after Scenario's setup
      setupFiles: ["./my-custom-setup.ts"],
      // Your existing global setup files will be preserved and run after Scenario's global setup
      globalSetup: ["./my-global-setup.ts"],
      // Optional: Add the Scenario reporter for detailed test reports
      reporters: ["default", new VitestReporter()],
    },
  })
);
The `withScenario` helper automatically registers Scenario's setup and global setup files while preserving any you have already configured.
If you prefer to configure Vitest manually, you can add the Scenario setup files directly:
// vitest.config.ts
import { defineConfig } from "vitest/config";
import VitestReporter from "@langwatch/scenario/integrations/vitest/reporter";

export default defineConfig({
  test: {
    testTimeout: 180000, // 3 minutes, or however long you want to wait for the scenario to run
    setupFiles: ["@langwatch/scenario/integrations/vitest/setup"],
    // Optional: Add the Scenario reporter for detailed test reports
    reporters: ["default", new VitestReporter()],
  },
});
This configuration:
- The `setupFiles` entry enables Scenario's event logging for each test.
- The `VitestReporter` provides detailed scenario test reports in your test output (optional).

This project uses `pnpm` for package management.
# Install dependencies
pnpm install
# Build the project
pnpm run build
# Run tests
pnpm test
License: MIT
When running scenario tests, you can set the `SCENARIO_BATCH_RUN_ID` environment variable to uniquely identify a batch of test runs. This is especially useful for grouping results in reporting tools and CI pipelines.
Example:
SCENARIO_BATCH_RUN_ID=my-ci-run-123 pnpm test
If you use the provided test script, a unique batch run ID is generated automatically for each run.
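In CI, a natural choice is to derive the batch run ID from an identifier your CI system already provides; for example, on GitHub Actions, where the runner sets `GITHUB_RUN_ID` automatically:

```bash
SCENARIO_BATCH_RUN_ID="ci-${GITHUB_RUN_ID}" pnpm test
```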