Creates a new ScenarioExecution instance.
The scenario configuration containing agents, settings, and metadata
The ordered sequence of script steps that define the test flow
Readonly events$: An observable stream of events that occur during the scenario execution. Subscribe to it to monitor the scenario's progress in real time.
Events include:
Gets the complete conversation history as an array of messages.
Array of ModelMessage objects representing the full conversation
Gets the result of the scenario execution if it has been set.
The scenario result or undefined if not yet set
Gets the unique identifier for the conversation thread. This ID is used to maintain conversation context across multiple runs.
The thread identifier string
Adds execution time for a specific agent to the performance tracking.
This method is used internally to track how long each agent takes to respond. The accumulated time per agent is included in the final scenario result, where it is used to report total agent response times for performance analysis.
The index of the agent in the agents array
The execution time in milliseconds to add to the agent's total
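The per-agent accumulation described above can be sketched as a map from agent index to total milliseconds. The AgentTimeTracker class and its timeFor() and total() methods are hypothetical. They model only the documented behavior (time added per agent index, then summed for the result) and do not reflect the library's internals.

```typescript
// Hypothetical sketch: per-agent response-time accumulation keyed by agent index.
class AgentTimeTracker {
  // Maps an agent's index in the agents array to its accumulated time in ms.
  private readonly times = new Map<number, number>();

  // Adds `ms` to the running total for the agent at `agentIdx`.
  addAgentTime(agentIdx: number, ms: number): void {
    this.times.set(agentIdx, (this.times.get(agentIdx) ?? 0) + ms);
  }

  // Total time for one agent (0 if it never responded).
  timeFor(agentIdx: number): number {
    return this.times.get(agentIdx) ?? 0;
  }

  // Sum over all agents, e.g. for a "total agent time" field in a result.
  total(): number {
    let sum = 0;
    this.times.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }
}

const tracker = new AgentTimeTracker();
tracker.addAgentTime(1, 120); // agent at index 1 responds in 120 ms
tracker.addAgentTime(1, 80); // the same agent responds again in 80 ms
tracker.addAgentTime(0, 50);
console.log(tracker.timeFor(1), tracker.total()); // 200 250
```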
Executes an agent turn in the conversation.
If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.
This method is part of the ScenarioExecutionLike interface used by script steps.
content (optional): string | ModelMessage. Content for the agent's response, as a string or a ModelMessage. If not provided, the agent under test generates the response.
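The content handling of an agent turn can be sketched as follows: supplied content is used directly, and otherwise the agent is called with the conversation so far. This is a minimal sketch under assumptions. The ModelMessage type is a simplified stand-in for the real one, the "assistant" role for agent turns is assumed, and agentTurn and CallAgent are hypothetical names. The same branching applies symmetrically to user turns.

```typescript
// Simplified stand-in for ModelMessage (assumption: role plus string content only).
type ModelMessage = { role: "user" | "assistant"; content: string };

// Hypothetical agent signature: produces a reply from the conversation so far.
type CallAgent = (history: ModelMessage[]) => Promise<ModelMessage>;

// Sketch of an agent turn: explicit content wins; otherwise the agent is called.
async function agentTurn(
  history: ModelMessage[],
  callAgent: CallAgent,
  content?: string | ModelMessage,
): Promise<ModelMessage> {
  let message: ModelMessage;
  if (content === undefined) {
    message = await callAgent(history); // agent under test generates the reply
  } else if (typeof content === "string") {
    message = { role: "assistant", content }; // wrap a plain string
  } else {
    message = content; // already a message; use it as-is
  }
  history.push(message); // record the turn in the conversation
  return message;
}
```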
Executes the entire scenario from start to finish.
This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:
- user() and agent() steps typically trigger one agent interaction each
- proceed() steps can trigger multiple agent interactions across multiple turns
- judge() steps trigger the judge agent to evaluate the conversation
- succeed() and fail() steps immediately end the scenario

The execution will stop early if:
A promise that resolves with the final result of the scenario
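The sequential loop described above can be sketched like this. Result, ScriptStep, and runScript are hypothetical names. The sketch assumes that each step is an async function that may return a final result, and that the first such result ends the run. Two behaviors are assumptions and are labeled as such in the code: a thrown step becomes an error result, and running out of steps counts as a failure. The real execute() also handles early-stop conditions that this sketch does not model.

```typescript
// Hypothetical result shape covering the three outcomes named in the text.
type Result = { outcome: "success" | "failure" | "error"; reasoning?: string };

// Hypothetical step shape: does some work, optionally returning a final result.
type ScriptStep = () => Promise<Result | null>;

// Runs steps in order and stops at the first one that yields a final result.
async function runScript(steps: ScriptStep[]): Promise<Result> {
  for (const step of steps) {
    try {
      const result = await step();
      if (result !== null) return result; // e.g. succeed(), fail(), or a judge verdict
    } catch (err) {
      // Assumption: an exception inside a step becomes an "error" result.
      return { outcome: "error", reasoning: String(err) };
    }
  }
  // Assumption: finishing the script without a verdict is treated as a failure.
  return { outcome: "failure", reasoning: "Script ended without a verdict" };
}
```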
Immediately ends the scenario with a failure verdict.
This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
reasoning (optional): string. An explanation of why the scenario is being marked as failed.
A promise that resolves with the final failed scenario result
Invokes the judge agent to evaluate the current state of the conversation.
The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:
This method is part of the ScenarioExecutionLike interface used by script steps.
content (optional): string | ModelMessage. A message to pass to the judge agent for additional context.
A promise that resolves with:
Adds a message to the conversation history.
This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:
The ModelMessage to add to the conversation
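Role-based routing can be sketched with a small lookup that uses an assumed mapping: user messages are attributed to the user simulator, assistant messages to the agent under test, and any other role is recorded without being attributed to an agent. This mapping, the simplified ModelMessage type, and the routeMessage function are all assumptions for illustration, not the library's code.

```typescript
// Simplified message shape (assumption; the real ModelMessage type is richer).
type ModelMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Hypothetical agent roles used by this sketch.
type AgentRole = "USER" | "AGENT";

// Assumed routing: which agent a message is attributed to, or null if none.
function routeMessage(message: ModelMessage): AgentRole | null {
  switch (message.role) {
    case "user":
      return "USER"; // treated as the user simulator's turn
    case "assistant":
      return "AGENT"; // treated as the agent under test's turn
    default:
      return null; // e.g. system or tool messages: only added to the history
  }
}
```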
Lets the scenario proceed automatically for a specified number of turns.
This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.
Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.
The method will continue until:
turns (optional): number. The number of turns to proceed. If undefined, runs until a conclusion or the maximum number of turns is reached.
onTurn (optional): (state: ScenarioExecutionStateLike) => void | Promise<void>. Callback executed at the end of each turn; receives the current execution state.
onStep (optional): (state: ScenarioExecutionStateLike) => void | Promise<void>. Callback executed after each agent interaction; receives the current execution state.
A promise that resolves with:
```typescript
// Proceed for 5 turns
const fiveTurns = await execution.proceed(5);

// Proceed until conclusion, with callbacks
const untilDone = await execution.proceed(
  undefined,
  (state) => console.log(`Turn ${state.currentTurn} completed`),
  (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
);
```
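The relationship between turns, agent interactions, and the onTurn/onStep callbacks can be sketched with a simplified loop. The sketch assumes that each turn is one user-simulator interaction followed by one agent-under-test interaction and that no verdict ends the run early. proceedSketch and the State shape are hypothetical. Under these assumptions, five turns produce ten interactions, so onStep fires ten times and onTurn fires five times. Unlike the real method, this sketch requires a turn count.

```typescript
// Hypothetical state shape passed to the callbacks (not the library's type).
type State = { currentTurn: number; messages: string[] };
type Callback = (state: State) => void | Promise<void>;

// Assumption: every turn is one user-simulator interaction followed by one
// agent-under-test interaction, and no verdict ends the run early.
async function proceedSketch(
  turns: number,
  onTurn?: Callback,
  onStep?: Callback,
): Promise<State> {
  const state: State = { currentTurn: 0, messages: [] };
  const participants = ["user simulator", "agent under test"];
  for (let turn = 0; turn < turns; turn++) {
    state.currentTurn = turn;
    for (const name of participants) {
      state.messages.push(`${name}: message in turn ${turn}`);
      await onStep?.(state); // after each agent interaction
    }
    await onTurn?.(state); // at the end of each turn
  }
  return state;
}
```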
Executes a single agent interaction in the scenario.
This method is for manual step-by-step execution of the scenario, where each call
represents one agent taking their turn. This is different from script steps (like
user(), agent(), proceed(), etc.) which are functions in the scenario script.
Each call to this method will:
Note: This method is primarily for debugging or custom execution flows. Most users
will use execute() to run the entire scenario automatically.
After calling this method, check the result property to see whether the scenario has concluded.
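A manual stepping loop can be sketched against a minimal interface that has only the two members described here: step() and result. Steppable, stepUntilDone, and FakeExecution are hypothetical. The fake concludes after three interactions so that the loop can be exercised without real agents. The maxSteps guard is an added safeguard for debugging loops against scenarios that never conclude.

```typescript
// Hypothetical minimal interface with the two members used here.
interface Steppable {
  result: { success: boolean } | undefined;
  step(): Promise<unknown>;
}

// Calls step() until a result is set, with a guard against runaway loops.
async function stepUntilDone(execution: Steppable, maxSteps = 50): Promise<number> {
  let steps = 0;
  while (execution.result === undefined && steps < maxSteps) {
    await execution.step(); // one agent interaction
    steps++;
  }
  return steps;
}

// Hypothetical fake that concludes successfully after three interactions.
class FakeExecution implements Steppable {
  result: { success: boolean } | undefined = undefined;
  private count = 0;

  async step(): Promise<void> {
    this.count++;
    if (this.count === 3) this.result = { success: true };
  }
}
```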
Immediately ends the scenario with a success verdict.
This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
reasoning (optional): string. An explanation of why the scenario is being marked as successful.
A promise that resolves with the final successful scenario result
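Because succeed() and fail() are available to script steps, a custom step can choose the verdict based on an external condition, which is the use case described above. In this sketch, only succeed(reasoning?) and fail(reasoning?) mirror the documented methods. The ExecutionLike interface, the simplified ScenarioResult, checkOrderStep, and the order-ID condition are hypothetical.

```typescript
// Simplified result and a minimal interface with the two documented methods.
type ScenarioResult = { success: boolean; reasoning?: string };
interface ExecutionLike {
  succeed(reasoning?: string): Promise<ScenarioResult>;
  fail(reasoning?: string): Promise<ScenarioResult>;
}

// Hypothetical custom step: the verdict depends on an external check
// (here, whether a test double recorded any order IDs).
function checkOrderStep(orderIds: string[]) {
  return (execution: ExecutionLike): Promise<ScenarioResult> =>
    orderIds.length > 0
      ? execution.succeed(`Order ${orderIds[0]} was created`)
      : execution.fail("No order was created");
}

// Hypothetical stand-in implementing the interface for a quick check.
const fakeExecution: ExecutionLike = {
  succeed: async (reasoning) => ({ success: true, reasoning }),
  fail: async (reasoning) => ({ success: false, reasoning }),
};
```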
Executes a user turn in the conversation.
If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.
This method is part of the ScenarioExecutionLike interface used by script steps.
content (optional): string | ModelMessage. Content for the user's message, as a string or a ModelMessage. If not provided, the user simulator agent generates the content.
Manages the execution of a single scenario test.
This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.
Execution Flow Overview
The execution follows a turn-based system where agents take turns responding. The key concepts are:
- Script steps: user(), agent(), proceed(), etc.

Message Broadcasting System
The class implements a sophisticated message broadcasting system that ensures all agents can "hear" each other's messages:
- broadcastMessage(): sends each new message to all other agents
- pendingMessages: a per-agent queue that stores messages from other agents

This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.
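The broadcast-and-queue idea can be sketched as one pending queue per agent. Only the names broadcastMessage and pendingMessages come from the text. The Broadcaster class, the Message shape, and takePending() are hypothetical. Modeling pendingMessages as an array per agent index, and clearing a queue once its agent consumes it on its turn, are assumptions.

```typescript
// Hypothetical message shape: sender index plus text.
type Message = { from: number; content: string };

class Broadcaster {
  // One queue per agent index: messages that agent has not yet consumed.
  private pendingMessages: Message[][] = [];

  constructor(agentCount: number) {
    for (let i = 0; i < agentCount; i++) this.pendingMessages.push([]);
  }

  // Queues a message for every agent except its sender.
  broadcastMessage(message: Message): void {
    this.pendingMessages.forEach((queue, idx) => {
      if (idx !== message.from) queue.push(message);
    });
  }

  // On an agent's turn, returns its queued messages and clears the queue (assumption).
  takePending(agentIdx: number): Message[] {
    const queued = this.pendingMessages[agentIdx];
    this.pendingMessages[agentIdx] = [];
    return queued;
  }
}
```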
Example Message Flow
Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.

Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.

Example