Creates a new ScenarioExecution instance.
The scenario configuration containing agents, settings, and metadata
The ordered sequence of script steps that define the test flow
Readonly
events$
An observable stream of events that occur during the scenario execution. Subscribe to this to monitor the progress of the scenario in real time.
Events include:
Gets the complete conversation history as an array of messages.
Array of ModelMessage objects representing the full conversation
Gets the unique identifier for the conversation thread. This ID is used to maintain conversation context across multiple runs.
The thread identifier string
Adds execution time for a specific agent to the performance tracking.
This method is used internally to track how long each agent takes to respond. The accumulated time per agent is used to calculate the total agent response times reported in the final scenario result for performance analysis.
The index of the agent in the agents array
The execution time in milliseconds to add to the agent's total
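The accumulation described above can be pictured with a small stand-alone sketch. This is not the library's implementation: the class name `AgentTimeTracker`, the parameter names `agentIdx` and `time`, and the helper methods `totalFor` and `totalAll` are assumptions made for illustration only.

```typescript
// Hypothetical tracker, for illustration; not the library's actual code.
class AgentTimeTracker {
  // Maps agent index -> accumulated execution time in milliseconds.
  private readonly totals = new Map<number, number>();

  // Adds `time` milliseconds to the running total of the agent at `agentIdx`.
  addAgentTime(agentIdx: number, time: number): void {
    const current = this.totals.get(agentIdx) ?? 0;
    this.totals.set(agentIdx, current + time);
  }

  // Returns the accumulated time for one agent (0 if nothing was recorded).
  totalFor(agentIdx: number): number {
    return this.totals.get(agentIdx) ?? 0;
  }

  // Sums the accumulated time across all agents.
  totalAll(): number {
    let sum = 0;
    this.totals.forEach((value) => {
      sum += value;
    });
    return sum;
  }
}

const tracker = new AgentTimeTracker();
tracker.addAgentTime(0, 120); // first response of agent 0
tracker.addAgentTime(1, 300); // first response of agent 1
tracker.addAgentTime(0, 80);  // second response of agent 0
```

Keying the totals by agent index mirrors the description of the first parameter as "the index of the agent in the agents array".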
Executes an agent turn in the conversation.
If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional
content: string | ModelMessage
Optional content for the agent's response. Can be a string or ModelMessage. If not provided, the agent under test will generate the response.
Executes the entire scenario from start to finish.
This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:
user() and agent() steps typically trigger one agent interaction each
proceed() steps can trigger multiple agent interactions across multiple turns
judge() steps trigger the judge agent to evaluate the conversation
succeed() and fail() steps immediately end the scenario
The execution will stop early if:
A promise that resolves with the final result of the scenario
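The sequential behaviour described above can be sketched as a loop that awaits each step and stops at the first one that yields a result. Everything here is a simplified assumption: `SketchResult`, `SketchStep`, `runScript`, and the helper steps are hypothetical names, and the fallback used when the script ends without a verdict is a guess made for the sketch, not documented behaviour.

```typescript
// Hypothetical, simplified shapes; the real result type carries more fields.
interface SketchResult {
  success: boolean;
  reasoning?: string;
}

// A step resolves with a final result, or with undefined to continue.
type SketchStep = () => Promise<SketchResult | undefined>;

// Runs the steps in order and stops at the first one that yields a result.
async function runScript(steps: SketchStep[]): Promise<SketchResult> {
  for (const step of steps) {
    const result = await step();
    if (result !== undefined) return result;
  }
  // Assumption for this sketch only: no verdict is reported as a failure.
  return { success: false, reasoning: "Script ended without a verdict" };
}

// Records which steps actually ran.
const executedSteps: string[] = [];

// Builds a step that only logs its name and lets the script continue.
const note = (name: string): SketchStep => async () => {
  executedSteps.push(name);
  return undefined;
};

// A step that ends the scenario immediately, in the spirit of succeed().
const succeedStep: SketchStep = async () => {
  executedSteps.push("succeed");
  return { success: true, reasoning: "Ended explicitly" };
};

const scriptOutcome = runScript([
  note("user"),
  note("agent"),
  succeedStep,
  note("never reached"),
]);
```

In this sketch the fourth step never runs, because the third step already produced a final result.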
Immediately ends the scenario with a failure verdict.
This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional
reasoning: string
Optional explanation for why the scenario is being marked as failed
A promise that resolves with the final failed scenario result
Checks if a partial result has been set for the scenario.
This method is used internally to determine if a scenario has already reached a conclusion (success or failure) but hasn't been finalized yet. Partial results are typically set by agents that make final decisions (like judge agents) and are later finalized with the complete message history.
True if a partial result exists, false otherwise
Invokes the judge agent to evaluate the current state of the conversation.
The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional
content: string | ModelMessage
Optional message to pass to the judge agent for additional context
A promise that resolves with:
Adds a message to the conversation history.
This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:
The ModelMessage to add to the conversation
Lets the scenario proceed automatically for a specified number of turns.
This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.
Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.
The method will continue until:
Optional
turns: number
The number of turns to proceed. If undefined, runs until a conclusion or max turns is reached
Optional
onTurn: (state: ScenarioExecutionStateLike) => void | Promise<void>
Optional callback executed at the end of each turn. Receives the current execution state
Optional
onStep: (state: ScenarioExecutionStateLike) => void | Promise<void>
Optional callback executed after each agent interaction. Receives the current execution state
A promise that resolves with:
// Proceed for 5 turns
const result = await execution.proceed(5);

// Proceed until conclusion with callbacks
const finalResult = await execution.proceed(
  undefined,
  (state) => console.log(`Turn ${state.currentTurn} completed`),
  (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
);
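To make the relationship between turns, agent interactions, and the two callbacks concrete, here is a toy model of the loop. It is a sketch under stated assumptions, not the library's code: `proceedSketch`, `TurnState`, and `TurnCallback` are hypothetical names, every agent is assumed to respond exactly once per turn, and early termination on a conclusion is left out for brevity.

```typescript
// Hypothetical, simplified execution state for this sketch.
interface TurnState {
  currentTurn: number;
  messages: string[];
}

type TurnCallback = (state: TurnState) => void | Promise<void>;

// Toy model: each agent responds once per turn until the limit is reached.
async function proceedSketch(
  agents: Array<(state: TurnState) => string>,
  turns: number | undefined,
  maxTurns: number,
  onTurn?: TurnCallback,
  onStep?: TurnCallback
): Promise<TurnState> {
  const state: TurnState = { currentTurn: 0, messages: [] };
  // With `turns` undefined, only the max-turns limit applies.
  const limit = turns === undefined ? maxTurns : Math.min(turns, maxTurns);
  while (state.currentTurn < limit) {
    for (const agent of agents) {
      state.messages.push(agent(state));
      // onStep fires after every single agent interaction.
      if (onStep) await onStep(state);
    }
    state.currentTurn += 1;
    // onTurn fires once, at the end of each turn.
    if (onTurn) await onTurn(state);
  }
  return state;
}

let stepCalls = 0;
let turnCalls = 0;
const proceedOutcome = proceedSketch(
  [() => "user message", () => "agent reply"],
  5,
  20,
  () => {
    turnCalls += 1;
  },
  () => {
    stepCalls += 1;
  }
);
```

With two agents and 5 turns, this toy loop performs 10 agent interactions, so `onStep` runs 10 times and `onTurn` runs 5 times.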
Sets a partial result for the scenario.
This method is used internally to store intermediate results that may be finalized later with the complete message history. Partial results are typically created by agents that make final decisions (like judge agents) and contain the success/failure status, reasoning, and criteria evaluation, but not the complete message history.
The partial result without the messages field. Should include success status, reasoning, and criteria evaluation.
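One way to picture a "result without the messages field" is TypeScript's built-in `Omit` utility type. The sketch below is illustrative only: `SketchScenarioResult`, its `metCriteria` and `unmetCriteria` fields, the `ResultHolder` class, and its `finalize` method are assumptions, not the library's actual types or methods.

```typescript
// Hypothetical message and result shapes, for illustration only.
interface SketchMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface SketchScenarioResult {
  success: boolean;
  reasoning?: string;
  metCriteria: string[];   // assumed field name
  unmetCriteria: string[]; // assumed field name
  messages: SketchMessage[];
}

// The partial result is the full result minus the message history.
type PartialSketchResult = Omit<SketchScenarioResult, "messages">;

class ResultHolder {
  private partial: PartialSketchResult | null = null;

  // Stores a verdict before the message history is attached.
  setResult(result: PartialSketchResult): void {
    this.partial = result;
  }

  // True once a verdict has been stored.
  hasResult(): boolean {
    return this.partial !== null;
  }

  // Attaches a copy of the history to produce the final result.
  finalize(messages: SketchMessage[]): SketchScenarioResult | null {
    if (this.partial === null) return null;
    return { ...this.partial, messages: [...messages] };
  }
}

const holder = new ResultHolder();
const hadResultBefore = holder.hasResult();
holder.setResult({
  success: true,
  reasoning: "All criteria met",
  metCriteria: ["greets the user"],
  unmetCriteria: [],
});
const finalized = holder.finalize([{ role: "user", content: "Hello" }]);
```

Separating the verdict from the history lets a deciding agent, such as a judge, report its decision without needing to own the full conversation.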
Executes a single agent interaction in the scenario.
This method is for manual step-by-step execution of the scenario, where each call represents one agent taking their turn. This is different from script steps (like user(), agent(), proceed(), etc.), which are functions in the scenario script.
Each call to this method will:
Note: This method is primarily for debugging or custom execution flows. Most users will use execute() to run the entire scenario automatically.
A promise that resolves with either:
Immediately ends the scenario with a success verdict.
This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional
reasoning: string
Optional explanation for why the scenario is being marked as successful
A promise that resolves with the final successful scenario result
Executes a user turn in the conversation.
If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional
content: string | ModelMessage
Optional content for the user's message. Can be a string or ModelMessage. If not provided, the user simulator agent will generate the content.
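The "use the content if given, otherwise generate it" rule can be sketched as a small helper. This is a simplified, synchronous illustration: `SketchChatMessage`, `resolveUserMessage`, and the `generate` callback standing in for the user simulator agent are hypothetical, and the real method is asynchronous.

```typescript
// Hypothetical, simplified message shape for this sketch.
interface SketchChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Resolves the user's message from optional content or a generator fallback.
function resolveUserMessage(
  content: string | SketchChatMessage | undefined,
  generate: () => SketchChatMessage
): SketchChatMessage {
  // No content: defer to the (stand-in) user simulator.
  if (content === undefined) return generate();
  // Plain string: wrap it as a user message.
  if (typeof content === "string") return { role: "user", content };
  // Already a message object: use it as-is.
  return content;
}

// Stand-in for the user simulator agent.
const simulateUser = (): SketchChatMessage => ({
  role: "user",
  content: "(simulated) I need help with my order",
});

const fromString = resolveUserMessage("Hi there", simulateUser);
const fromSimulator = resolveUserMessage(undefined, simulateUser);
```

The same pattern would apply to the agent turn, with the agent under test taking the place of the simulator.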
Manages the execution of a single scenario test.
This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.
Execution Flow Overview
The execution follows a turn-based system where agents take turns responding. The key concepts are:
Script steps: functions in the scenario script, such as user(), agent(), proceed(), etc.
Message Broadcasting System
The class implements a sophisticated message broadcasting system that ensures all agents can "hear" each other's messages:
broadcastMessage()
(pendingMessages) that stores messages from other agents
This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.
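The broadcasting idea can be illustrated with per-agent pending queues. The sketch below borrows the names `broadcastMessage` and `pendingMessages` from the text above, but the signatures, the `BroadcastSketch` class, the `takePending` helper, and the choice to clear a queue once it is read are assumptions made for illustration.

```typescript
// Hypothetical, simplified message shape for this sketch.
interface QueuedMessage {
  role: string;
  content: string;
}

class BroadcastSketch {
  // Full conversation history, shared by all agents.
  readonly history: QueuedMessage[] = [];
  // One pending queue per agent index, holding messages from other agents.
  private readonly pendingMessages = new Map<number, QueuedMessage[]>();

  constructor(agentCount: number) {
    for (let i = 0; i < agentCount; i++) {
      this.pendingMessages.set(i, []);
    }
  }

  // Records the message and queues it for every agent except the sender.
  broadcastMessage(message: QueuedMessage, fromAgentIdx: number): void {
    this.history.push(message);
    this.pendingMessages.forEach((queue, idx) => {
      if (idx !== fromAgentIdx) queue.push(message);
    });
  }

  // Returns the agent's unread messages and clears its queue (assumed behaviour).
  takePending(agentIdx: number): QueuedMessage[] {
    const queue = this.pendingMessages.get(agentIdx) ?? [];
    this.pendingMessages.set(agentIdx, []);
    return queue;
  }
}

// Agent 0: user simulator, agent 1: agent under test, agent 2: judge.
const bus = new BroadcastSketch(3);
bus.broadcastMessage({ role: "user", content: "Hi" }, 0);
bus.broadcastMessage({ role: "assistant", content: "Hello" }, 1);

const judgeInbox = bus.takePending(2);       // both messages
const userInbox = bus.takePending(0);        // only the assistant's reply
const judgeInboxAgain = bus.takePending(2);  // empty after the first read
```

Each agent only receives messages it did not send, while the shared history keeps the complete conversation.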
Example Message Flow
Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.
Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.
Example