Class ScenarioExecution

Manages the execution of a single scenario test.

This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.

Execution Flow Overview

The execution follows a turn-based system where agents take turns responding. The key concepts are:

- Script Steps: Functions in the scenario script like user(), agent(), proceed(), etc.
- Agent Interactions: Individual agent responses that occur when an agent takes their turn
- Turns: Groups of agent interactions that happen in sequence

Message Broadcasting System

The class implements a message broadcasting system that ensures all agents can "hear" each other's messages:

1. Message Creation: When an agent sends a message, it's added to the conversation history
2. Broadcasting: The message is immediately broadcast to all other agents via broadcastMessage()
3. Queue Management: Each agent has a pending message queue (pendingMessages) that stores messages from other agents
4. Agent Input: When an agent is called, it receives both the full conversation history and any new pending messages that have been broadcast to it
5. Queue Clearing: After an agent processes its pending messages, its queue is cleared

This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.

Example Message Flow

Turn 1:
1. User Agent sends: "Hello"
   - Added to conversation history
   - Broadcast to Agent and Judge (pendingMessages[1] = ["Hello"], pendingMessages[2] = ["Hello"])

2. Agent is called:
   - Receives: full conversation + pendingMessages[1] = ["Hello"]
   - Sends: "Hi there! How can I help you?"
   - Added to conversation history
   - Broadcast to User and Judge (pendingMessages[0] = ["Hi there!..."], pendingMessages[2] = ["Hello", "Hi there!..."])
   - pendingMessages[1] is cleared

3. Judge is called:
   - Receives: full conversation + pendingMessages[2] = ["Hello", "Hi there!..."]
   - Evaluates and decides to continue
   - pendingMessages[2] is cleared
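
The queue behavior in this flow can be sketched in a few lines. This is a minimal stand-in written for illustration, not the library's implementation: the `PendingQueues` class and its `broadcast`, `drain`, and `peek` methods are hypothetical names, and the sketch clears an agent's queue when the agent reads it rather than after it replies, which leaves the same end state.

```typescript
// Hypothetical stand-in for the broadcasting mechanism described above.
// None of these names come from the library; only the behavior is mirrored.
type Msg = { role: "user" | "assistant"; content: string };

class PendingQueues {
  readonly history: Msg[] = [];
  private pending: Msg[][] = [];

  constructor(agentCount: number) {
    for (let i = 0; i < agentCount; i++) this.pending.push([]);
  }

  // Record a message and queue it for every agent except the sender.
  broadcast(fromIdx: number, msg: Msg): void {
    this.history.push(msg);
    this.pending.forEach((queue, idx) => {
      if (idx !== fromIdx) queue.push(msg);
    });
  }

  // Hand an agent its new messages, then clear its queue.
  drain(agentIdx: number): Msg[] {
    const queued = this.pending[agentIdx];
    this.pending[agentIdx] = [];
    return queued;
  }

  // Inspect a queue without clearing it.
  peek(agentIdx: number): readonly Msg[] {
    return this.pending[agentIdx];
  }
}

// Replay Turn 1: index 0 = user simulator, 1 = agent under test, 2 = judge.
const queues = new PendingQueues(3);
queues.broadcast(0, { role: "user", content: "Hello" });
const agentInput = queues.drain(1); // the agent's new messages: "Hello"
queues.broadcast(1, { role: "assistant", content: "Hi there! How can I help you?" });
const judgeInput = queues.drain(2); // the judge's new messages: both of the above
```

After this replay, the user simulator's queue still holds the agent's reply, since the user has not been called again yet.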

Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.

Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.

Example

import scenario from "@langwatch/scenario";

// Typical usage: `scenario.run` handles the ScenarioExecution for you.
const result = await scenario.run({
  name: "My First Scenario",
  description: "A simple test of the agent's greeting.",
  agents: [
    scenario.userSimulatorAgent(),
    scenario.judgeAgent({
      criteria: ["Agent should respond with a greeting"],
    }),
  ],
  script: [
    scenario.user("Hello"),     // Script step 1: triggers 1 agent interaction
    scenario.agent(),           // Script step 2: triggers 1 agent interaction
    scenario.proceed(3),        // Script step 3: triggers multiple agent interactions
    scenario.judge(),           // Script step 4: triggers 1 agent interaction
  ]
});

console.log("Scenario result:", result.success);

Implements

ScenarioExecutionLike

Index

Constructors

constructor

new ScenarioExecution(
config: ScenarioConfig,
script: ScriptStep[],
): ScenarioExecution
Creates a new ScenarioExecution instance.
Parameters
- config: ScenarioConfig
  The scenario configuration containing agents, settings, and metadata
- script: ScriptStep[]
  The ordered sequence of script steps that define the test flow
Returns ScenarioExecution
- Defined in src/execution/scenario-execution.ts:182

Properties

`Readonly` events$

events$: Observable<
    | {
        batchRunId: string;
        metadata: { description?: string; name?: string };
        rawEvent?: any;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        timestamp: number;
        type: RUN_STARTED;
    }
    | {
        batchRunId: string;
        rawEvent?: any;
        results?: | {
            error?: string;
            metCriteria: string[];
            reasoning?: string;
            unmetCriteria: string[];
            verdict: Verdict;
        }
        | null;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        status: ScenarioRunStatus;
        timestamp: number;
        type: RUN_FINISHED;
    }
    | {
        batchRunId: string;
        messages: (
            | { content: string; id: string; name?: string; role: "developer" }
            | { content: string; id: string; name?: string; role: "system" }
            | {
                content?: string;
                id: string;
                name?: string;
                role: "assistant";
                toolCalls?: {
                    function: { arguments: string; name: string };
                    id: string;
                    type: "function";
                }[];
            }
            | { content: string; id: string; name?: string; role: "user" }
            | { content: string; id: string; role: "tool"; toolCallId: string }
        )[];
        rawEvent?: any;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        timestamp: number;
        type: MESSAGE_SNAPSHOT;
    },
> = ...

An observable stream of events that occur during the scenario execution. Subscribe to this to monitor the progress of the scenario in real-time.

Events include:

- RUN_STARTED: When scenario execution begins
- MESSAGE_SNAPSHOT: After each message is added to the conversation
- RUN_FINISHED: When scenario execution completes (success/failure/error)
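
Because the three event shapes share a `type` field, a handler can branch on it. The sketch below uses simplified local copies of the event types and assumes the discriminants are the string literals shown; the library may represent them as enum members instead. `ScenarioEventSketch` and `describeEvent` are hypothetical names.

```typescript
// Simplified local copies of the event shapes listed above. The string
// discriminants are an assumption; the library may use enum members.
type RunStarted = { type: "RUN_STARTED"; scenarioRunId: string; timestamp: number };
type MessageSnapshot = {
  type: "MESSAGE_SNAPSHOT";
  scenarioRunId: string;
  timestamp: number;
  messages: { id: string; role: string; content?: string }[];
};
type RunFinished = {
  type: "RUN_FINISHED";
  scenarioRunId: string;
  timestamp: number;
  status: string;
};
type ScenarioEventSketch = RunStarted | MessageSnapshot | RunFinished;

// Turn one event into a log line; the switch narrows the union on `type`.
function describeEvent(event: ScenarioEventSketch): string {
  switch (event.type) {
    case "RUN_STARTED":
      return `run ${event.scenarioRunId} started`;
    case "MESSAGE_SNAPSHOT":
      return `run ${event.scenarioRunId}: ${event.messages.length} message(s) so far`;
    case "RUN_FINISHED":
      return `run ${event.scenarioRunId} finished with status ${event.status}`;
  }
}
```

With a real execution, a handler along these lines would typically be passed to `execution.events$.subscribe(...)`, adapted to the library's actual event types.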

Accessors

messages

get messages(): ModelMessage[]
Gets the complete conversation history as an array of messages.

Returns ModelMessage[]
Array of ModelMessage objects representing the full conversation
Implementation of ScenarioExecutionLike.messages
- Defined in src/execution/scenario-execution.ts:205

result

get result(): ScenarioResult | undefined
Gets the result of the scenario execution if it has been set.

Returns ScenarioResult | undefined
The scenario result or undefined if not yet set
- Defined in src/execution/scenario-execution.ts:224

threadId

get threadId(): string
Gets the unique identifier for the conversation thread. This ID is used to maintain conversation context across multiple runs.

Returns string
The thread identifier string
Implementation of ScenarioExecutionLike.threadId
- Defined in src/execution/scenario-execution.ts:215

Methods

addAgentTime

addAgentTime(agentIdx: number, time: number): void
Adds execution time for a specific agent to the performance tracking.

This method is used internally to track how long each agent takes to respond. The accumulated time per agent feeds the total agent response times reported in the final scenario result.
Parameters
- agentIdx: number
  The index of the agent in the agents array
- time: number
  The execution time in milliseconds to add to the agent's total
Returns void
Example
```
// This is typically called internally by the execution engine
execution.addAgentTime(0, 1500); // Agent at index 0 took 1.5 seconds
```
- Defined in src/execution/scenario-execution.ts:786
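
As a rough illustration of the accumulation described here, the following sketch keeps a running total per agent index. `AgentTimeTracker`, `timeFor`, and `totalTime` are hypothetical names; how the real class stores and reports these totals is not shown on this page.

```typescript
// Hypothetical tracker; the real class keeps this state internally.
class AgentTimeTracker {
  private readonly totals = new Map<number, number>();

  // Add `time` milliseconds to the running total for the agent at `agentIdx`.
  addAgentTime(agentIdx: number, time: number): void {
    this.totals.set(agentIdx, (this.totals.get(agentIdx) ?? 0) + time);
  }

  // Accumulated milliseconds for one agent (0 if it never responded).
  timeFor(agentIdx: number): number {
    return this.totals.get(agentIdx) ?? 0;
  }

  // Sum across all agents.
  totalTime(): number {
    let sum = 0;
    this.totals.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }
}
```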

agent

agent(content?: string | ModelMessage): Promise<void>
Executes an agent turn in the conversation.

If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional content: string | ModelMessage
  Optional content for the agent's response. Can be a string or ModelMessage. If not provided, the agent under test will generate the response.
Returns Promise<void>
Example
```
// Let agent generate response
await execution.agent();

// Use provided content
await execution.agent("The weather is sunny today!");

// Use a ModelMessage object
await execution.agent({
  role: "assistant",
  content: "I'm here to help you with weather information."
});
```
Implementation of ScenarioExecutionLike.agent
- Defined in src/execution/scenario-execution.ts:598
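
The three call forms above (no content, a string, a message object) suggest a small normalization step. The sketch below is one way to express it, under the assumption that a bare string takes the role of the turn being scripted; `SketchMessage` is a simplified local stand-in for ModelMessage and `resolveScripted` is a hypothetical helper.

```typescript
// Simplified local stand-in for ModelMessage (assumption, not the real type).
type SketchMessage = { role: "user" | "assistant"; content: string };

// Returns the scripted message, or null to signal that the relevant agent
// should be called to generate a response instead.
function resolveScripted(
  content: string | SketchMessage | undefined,
  role: SketchMessage["role"],
): SketchMessage | null {
  if (content === undefined) return null; // nothing scripted
  if (typeof content === "string") return { role, content }; // wrap bare text
  return content; // already a full message
}
```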

execute

execute(): Promise<ScenarioResult>
Executes the entire scenario from start to finish.

This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:
- user() and agent() steps typically trigger one agent interaction each
- proceed() steps can trigger multiple agent interactions across multiple turns
- judge() steps trigger the judge agent to evaluate the conversation
- succeed() and fail() steps immediately end the scenario
The execution will stop early if:
- A script step returns a ScenarioResult
- The maximum number of turns is reached
- An error occurs during execution
Returns Promise<ScenarioResult>
A promise that resolves with the final result of the scenario
Throws
Error if an unhandled exception occurs during execution
Example
```
const execution = new ScenarioExecution(config, script);
const result = await execution.execute();
console.log(`Scenario ${result.success ? 'passed' : 'failed'}`);
```
- Defined in src/execution/scenario-execution.ts:290
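
The early-stop rules can be pictured as a plain loop over steps. This is a simplified, synchronous sketch with hypothetical names (`SketchResult`, `SketchStep`, `runScript`); the real method is asynchronous. What the real class returns when the script runs out without a verdict is not stated on this page, so the failure returned here is a placeholder assumption. Errors thrown by a step simply propagate to the caller.

```typescript
type SketchResult = { success: boolean; reasoning?: string };
type SketchState = { turn: number };

// A step either ends the scenario (returns a result) or returns null to continue.
type SketchStep = (state: SketchState) => SketchResult | null;

function runScript(steps: SketchStep[], maxTurns: number): SketchResult {
  const state: SketchState = { turn: 0 };
  for (const step of steps) {
    // Stop early once the turn budget is used up.
    if (state.turn >= maxTurns) {
      return { success: false, reasoning: "Reached maximum turns" };
    }
    // Stop early as soon as any step produces a result.
    const result = step(state);
    if (result) return result;
  }
  // Placeholder: behavior at the end of a verdict-less script is an assumption.
  return { success: false, reasoning: "Script ended without a verdict" };
}
```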

fail

fail(reasoning?: string): Promise<ScenarioResult>
Immediately ends the scenario with a failure verdict.

This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional reasoning: string
  Optional explanation for why the scenario is being marked as failed
Returns Promise<ScenarioResult>
A promise that resolves with the final failed scenario result
Example
```
// Mark failure with default reasoning
const result = await execution.fail();

// Mark failure with custom reasoning
const failedWithReason = await execution.fail(
  "Agent failed to provide accurate weather information"
);
```
Implementation of ScenarioExecutionLike.fail
- Defined in src/execution/scenario-execution.ts:759

judge

judge(content?: string | ModelMessage): Promise<ScenarioResult | null>
Invokes the judge agent to evaluate the current state of the conversation.

The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:
- A final scenario result (success/failure) if the judge makes a decision
- null if the judge needs more of the conversation before it can decide
This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional content: string | ModelMessage
  Optional message to pass to the judge agent for additional context
Returns Promise<ScenarioResult | null>
A promise that resolves with:
- ScenarioResult if the judge makes a final decision, or
- null if the conversation should continue
Example
```
// Let judge evaluate current state
const result = await execution.judge();
if (result) {
  console.log(`Judge decided: ${result.success ? 'pass' : 'fail'}`);
}

// Provide additional context to judge
const resultWithContext = await execution.judge("Please consider the user's satisfaction level");
```
Implementation of ScenarioExecutionLike.judge
- Defined in src/execution/scenario-execution.ts:629

message

message(message: ModelMessage): Promise<void>
Adds a message to the conversation history.

This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:
- "user" messages are routed to USER role agents
- "assistant" messages are routed to AGENT role agents
- Other message types are added directly to the conversation
Parameters
- message: ModelMessage
  The ModelMessage to add to the conversation
Returns Promise<void>
Example
```
await execution.message({
  role: "user",
  content: "Hello, how are you?"
});
```
Implementation of ScenarioExecutionLike.message
- Defined in src/execution/scenario-execution.ts:529
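
The role-based routing rule amounts to a three-way dispatch. The sketch below names the destinations with hypothetical labels (`USER_AGENTS`, `AGENT_AGENTS`, `HISTORY_ONLY`); these are illustrative and not identifiers from the library.

```typescript
type SketchRole = "user" | "assistant" | "system" | "tool";
type SketchRoute = "USER_AGENTS" | "AGENT_AGENTS" | "HISTORY_ONLY";

// Decide where a message goes based on its role, following the list above.
function routeFor(role: SketchRole): SketchRoute {
  switch (role) {
    case "user":
      return "USER_AGENTS"; // handled by agents with the USER role
    case "assistant":
      return "AGENT_AGENTS"; // handled by agents with the AGENT role
    default:
      return "HISTORY_ONLY"; // added directly to the conversation
  }
}
```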

proceed

proceed(
    turns?: number,
    onTurn?: (state: ScenarioExecutionStateLike) => void | Promise<void>,
    onStep?: (state: ScenarioExecutionStateLike) => void | Promise<void>,
): Promise<ScenarioResult | null>
Lets the scenario proceed automatically for a specified number of turns.

This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.

Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.

The method will continue until:
- The specified number of turns is reached
- A final scenario result is determined
- The maximum turns limit is reached
Parameters
- Optional turns: number
  The number of turns to proceed. If undefined, runs until a conclusion or max turns is reached
- Optional onTurn: (state: ScenarioExecutionStateLike) => void | Promise<void>
  Optional callback executed at the end of each turn. Receives the current execution state
- Optional onStep: (state: ScenarioExecutionStateLike) => void | Promise<void>
  Optional callback executed after each agent interaction. Receives the current execution state
Returns Promise<ScenarioResult | null>
A promise that resolves with:
- ScenarioResult if a conclusion is reached during the proceeding, or
- null if the specified turns complete without a conclusion
Example
```
// Proceed for 5 turns
const result = await execution.proceed(5);

// Proceed until conclusion with callbacks
const resultWithCallbacks = await execution.proceed(
  undefined,
  (state) => console.log(`Turn ${state.currentTurn} completed`),
  (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
);
```
Implementation of ScenarioExecutionLike.proceed
- Defined in src/execution/scenario-execution.ts:673
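
To make the relationship between turns, agent interactions, and the two callbacks concrete, here is a simplified synchronous sketch. All names (`proceedSketch`, `ProceedState`, `actors`) are hypothetical, the real method is asynchronous, and returning null when the turn budget runs out is an assumption made for the sketch.

```typescript
type ProceedState = { currentTurn: number; interactions: number };
type ProceedVerdict = { success: boolean } | null;
type Actor = (state: ProceedState) => ProceedVerdict;

// `actors` are called in order once per turn; each may return a verdict.
function proceedSketch(
  actors: Actor[],
  turns: number | undefined,
  maxTurns: number,
  onTurn?: (state: ProceedState) => void,
  onStep?: (state: ProceedState) => void,
): { result: ProceedVerdict; state: ProceedState } {
  const state: ProceedState = { currentTurn: 0, interactions: 0 };
  // With no explicit turn count, only the overall maximum applies.
  const limit = turns === undefined ? maxTurns : Math.min(turns, maxTurns);
  while (state.currentTurn < limit) {
    for (const act of actors) {
      const verdict = act(state);
      state.interactions += 1;
      if (onStep) onStep(state); // after each agent interaction
      if (verdict) return { result: verdict, state }; // conclusion reached
    }
    state.currentTurn += 1;
    if (onTurn) onTurn(state); // at the end of each turn
  }
  return { result: null, state };
}
```

With two actors per turn (for example a user simulator and the agent under test), proceeding for 5 turns yields 10 interactions in this sketch.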

step

step(): Promise<void>
Executes a single agent interaction in the scenario.

This method is for manual step-by-step execution of the scenario, where each call represents one agent taking their turn. This is different from script steps (like user(), agent(), proceed(), etc.) which are functions in the scenario script.

Each call to this method will:
- Progress to the next turn if needed
- Find the next agent that should act
- Execute that agent's response
- Set the result if the scenario concludes
Note: This method is primarily for debugging or custom execution flows. Most users will use execute() to run the entire scenario automatically.

After calling this method, check the result accessor to see if the scenario has concluded.
Returns Promise<void>
Example
```
const execution = new ScenarioExecution(config, script);

// Execute one agent interaction at a time
await execution.step();
if (execution.result) {
  console.log('Scenario finished:', execution.result.success);
}
```
- Defined in src/execution/scenario-execution.ts:392
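
For a custom execution flow, the single call above is usually wrapped in a loop. The helper below is a sketch written against a structural interface that mirrors only the `step()` and `result` members documented on this page; `Steppable`, `stepUntilDone`, and the `maxSteps` safety cap are hypothetical additions.

```typescript
// Structural view of the two members used here, taken from the signatures above.
interface Steppable {
  step(): Promise<void>;
  readonly result: { success: boolean } | undefined;
}

// Call step() until a result appears or the safety cap is hit.
// Returns the number of step() calls made.
async function stepUntilDone(execution: Steppable, maxSteps: number): Promise<number> {
  let calls = 0;
  while (execution.result === undefined && calls < maxSteps) {
    await execution.step();
    calls += 1;
  }
  return calls;
}
```

The cap guards against a loop that never concludes; when it is hit, `execution.result` is still undefined and the caller can decide how to report that.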

succeed

succeed(reasoning?: string): Promise<ScenarioResult>
Immediately ends the scenario with a success verdict.

This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional reasoning: string
  Optional explanation for why the scenario is being marked as successful
Returns Promise<ScenarioResult>
A promise that resolves with the final successful scenario result
Example
```
// Mark success with default reasoning
const result = await execution.succeed();

// Mark success with custom reasoning
const succeededWithReason = await execution.succeed(
  "User successfully completed the onboarding flow"
);
```
Implementation of ScenarioExecutionLike.succeed
- Defined in src/execution/scenario-execution.ts:725

user

user(content?: string | ModelMessage): Promise<void>
Executes a user turn in the conversation.

If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional content: string | ModelMessage
  Optional content for the user's message. Can be a string or ModelMessage. If not provided, the user simulator agent will generate the content.
Returns Promise<void>
Example
```
// Use provided content
await execution.user("What's the weather like?");

// Let user simulator generate content
await execution.user();

// Use a ModelMessage object
await execution.user({
  role: "user",
  content: "Tell me a joke"
});
```
Implementation of ScenarioExecutionLike.user
- Defined in src/execution/scenario-execution.ts:567
