@langwatch/scenario

    Class ScenarioExecution

    Manages the execution of a single scenario test.

    This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.

    The execution follows a turn-based system where agents take turns responding. The key concepts are:

    • Script Steps: Functions in the scenario script like user(), agent(), proceed(), etc.
    • Agent Interactions: Individual agent responses that occur when an agent takes its turn
    • Turns: Groups of agent interactions that happen in sequence

    The class broadcasts every message so that all agents can "hear" each other:

    1. Message Creation: When an agent sends a message, it's added to the conversation history
    2. Broadcasting: The message is immediately broadcast to all other agents via broadcastMessage()
    3. Queue Management: Each agent has a pending message queue (pendingMessages) that stores messages from other agents
    4. Agent Input: When an agent is called, it receives both the full conversation history and any new pending messages that have been broadcast to it
    5. Queue Clearing: After an agent processes its pending messages, its queue is cleared

    This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.

    Turn 1:
    1. User Agent sends: "Hello"
    - Added to conversation history
    - Broadcast to Agent and Judge (pendingMessages[1] = ["Hello"], pendingMessages[2] = ["Hello"])

    2. Agent is called:
    - Receives: full conversation + pendingMessages[1] = ["Hello"]
    - Sends: "Hi there! How can I help you?"
    - Added to conversation history
    - Broadcast to User and Judge (pendingMessages[0] = ["Hi there!..."], pendingMessages[2] = ["Hello", "Hi there!..."])
    - pendingMessages[1] is cleared

    3. Judge is called:
    - Receives: full conversation + pendingMessages[2] = ["Hello", "Hi there!..."]
    - Evaluates and decides to continue
    - pendingMessages[2] is cleared
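
The Turn 1 walkthrough above can be replayed with a small self-contained model. This is only a sketch under stated assumptions: `BroadcastSketch`, `SketchMessage`, `send`, and `receive` are hypothetical names invented for illustration and do not mirror the class's real internals. Only the queue behaviour (broadcast to every other agent, clear after reading) is taken from the description.

```typescript
// Hypothetical, self-contained model of the broadcast flow described above.
// It is not the library's implementation; all names are illustrative only.
type SketchMessage = { from: number; content: string };

class BroadcastSketch {
  readonly history: SketchMessage[] = [];
  // One pending queue per agent index, mirroring pendingMessages in the walkthrough.
  readonly pendingMessages = new Map<number, SketchMessage[]>();
  private readonly agentCount: number;

  constructor(agentCount: number) {
    this.agentCount = agentCount;
    for (let idx = 0; idx < agentCount; idx++) this.pendingMessages.set(idx, []);
  }

  // Record the message, then queue it for every agent except the sender.
  send(from: number, content: string): void {
    const message: SketchMessage = { from, content };
    this.history.push(message);
    for (let idx = 0; idx < this.agentCount; idx++) {
      if (idx !== from) this.pendingMessages.get(idx)!.push(message);
    }
  }

  // Hand an agent the full history plus its pending messages, then clear its queue.
  receive(idx: number): { history: SketchMessage[]; pending: SketchMessage[] } {
    const pending = this.pendingMessages.get(idx) ?? [];
    this.pendingMessages.set(idx, []);
    return { history: [...this.history], pending };
  }
}

// Replay Turn 1 with agent indices 0 = user, 1 = agent, 2 = judge.
const broadcast = new BroadcastSketch(3);
broadcast.send(0, "Hello");
const agentInput = broadcast.receive(1); // pending holds "Hello"
broadcast.send(1, "Hi there! How can I help you?");
const judgeInput = broadcast.receive(2); // pending holds both messages
```

After this replay the user's queue still holds the agent's greeting, while the agent's and judge's queues are empty, which matches the walkthrough.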

    Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.

    Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.

    import scenario from "@langwatch/scenario";

    // `scenario.run` creates and drives a ScenarioExecution internally.
    const result = await scenario.run({
      name: "My First Scenario",
      description: "A simple test of the agent's greeting.",
      agents: [
        // Your agent under test would also be listed here (omitted in this example).
        scenario.userSimulatorAgent(),
        scenario.judgeAgent({
          criteria: ["Agent should respond with a greeting"],
        }),
      ],
      script: [
        scenario.user("Hello"), // Script step 1: triggers 1 agent interaction
        scenario.agent(),       // Script step 2: triggers 1 agent interaction
        scenario.proceed(3),    // Script step 3: triggers multiple agent interactions
        scenario.judge(),       // Script step 4: triggers 1 agent interaction
      ],
    });

    console.log("Scenario result:", result.success);

    Properties

    events$: Observable<
        | {
            batchRunId: string;
            metadata: { description?: string; name?: string };
            rawEvent?: any;
            scenarioId: string;
            scenarioRunId: string;
            scenarioSetId: string;
            timestamp: number;
            type: RUN_STARTED;
        }
        | {
            batchRunId: string;
            rawEvent?: any;
            results?: null | {
                error?: string;
                metCriteria: string[];
                reasoning?: string;
                unmetCriteria: string[];
                verdict: Verdict;
            };
            scenarioId: string;
            scenarioRunId: string;
            scenarioSetId: string;
            status: ScenarioRunStatus;
            timestamp: number;
            type: RUN_FINISHED;
        }
        | {
            batchRunId: string;
            messages: (
                | { content: string; id: string; name?: string; role: "developer" }
                | { content: string; id: string; name?: string; role: "system" }
                | {
                    content?: string;
                    id: string;
                    name?: string;
                    role: "assistant";
                    toolCalls?: {
                        function: { arguments: string; name: string };
                        id: string;
                        type: "function";
                    }[];
                }
                | { content: string; id: string; name?: string; role: "user" }
                | { content: string; id: string; role: "tool"; toolCallId: string }
            )[];
            rawEvent?: any;
            scenarioId: string;
            scenarioRunId: string;
            scenarioSetId: string;
            timestamp: number;
            type: MESSAGE_SNAPSHOT;
        },
    > = ...

    An observable stream of events that occur during the scenario execution. Subscribe to this to monitor the progress of the scenario in real-time.

    Events include:

    • RUN_STARTED: When scenario execution begins
    • MESSAGE_SNAPSHOT: After each message is added to the conversation
    • RUN_FINISHED: When scenario execution completes (success/failure/error)
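
Because every event carries a `type` field, a handler can narrow on it before reading event-specific fields. The sketch below is illustrative only: the `*Sketch` types are trimmed, hypothetical stand-ins for the shapes listed above, and plain string literals replace the library's `RUN_STARTED`, `MESSAGE_SNAPSHOT`, and `RUN_FINISHED` constants, whose actual values are not shown in this page.

```typescript
// Hypothetical, trimmed stand-ins for the event shapes documented above.
// The string literals are assumptions; the real `type` constants come from the library.
type RunStartedSketch = {
  type: "RUN_STARTED";
  scenarioRunId: string;
  metadata: { name?: string; description?: string };
};
type MessageSnapshotSketch = {
  type: "MESSAGE_SNAPSHOT";
  scenarioRunId: string;
  messages: { id: string; role: string; content?: string }[];
};
type RunFinishedSketch = {
  type: "RUN_FINISHED";
  scenarioRunId: string;
  status: string;
  results?: null | { metCriteria: string[]; unmetCriteria: string[]; reasoning?: string };
};
type ScenarioEventSketch = RunStartedSketch | MessageSnapshotSketch | RunFinishedSketch;

// Narrow on `type` so each branch only touches fields that exist on that event.
function describeEvent(event: ScenarioEventSketch): string {
  switch (event.type) {
    case "RUN_STARTED":
      return `started: ${event.metadata.name ?? "(unnamed)"}`;
    case "MESSAGE_SNAPSHOT":
      return `snapshot: ${event.messages.length} message(s)`;
    case "RUN_FINISHED":
      return `finished: ${event.status}, unmet criteria: ${event.results?.unmetCriteria.length ?? 0}`;
  }
}

const startedLine = describeEvent({
  type: "RUN_STARTED",
  scenarioRunId: "run-1",
  metadata: { name: "My First Scenario" },
});
const finishedLine = describeEvent({
  type: "RUN_FINISHED",
  scenarioRunId: "run-1",
  status: "SUCCESS",
  results: { metCriteria: ["greets"], unmetCriteria: [] },
});
```

Assuming `events$` behaves like a standard observable, a handler of this kind would be passed to its `subscribe` method.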

    Methods

    • addAgentTime(agentIdx: number, time: number): void

      Adds execution time for a specific agent to the performance tracking.

      This method is used internally to track how long each agent takes to respond. The time accumulated per agent is used to calculate the total agent response times reported in the final scenario result.

      Parameters

      • agentIdx: number

        The index of the agent in the agents array

      • time: number

        The execution time in milliseconds to add to the agent's total

      Returns void

      // This is typically called internally by the execution engine
      execution.addAgentTime(0, 1500); // Agent at index 0 took 1.5 seconds
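
The accumulation described above amounts to a running total per agent index. A minimal sketch, assuming a simple map-based store: `AgentTimeTracker`, `timeFor`, and `totalTime` are hypothetical names, not part of the library.

```typescript
// Hypothetical sketch of per-agent time accumulation; not the library's implementation.
class AgentTimeTracker {
  private readonly totals = new Map<number, number>();

  // Add `time` milliseconds to the running total for the agent at `agentIdx`.
  addAgentTime(agentIdx: number, time: number): void {
    this.totals.set(agentIdx, (this.totals.get(agentIdx) ?? 0) + time);
  }

  // Accumulated milliseconds for one agent (0 if it never ran).
  timeFor(agentIdx: number): number {
    return this.totals.get(agentIdx) ?? 0;
  }

  // Sum across all agents, e.g. for a total agent response time.
  totalTime(): number {
    let sum = 0;
    this.totals.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }
}

const tracker = new AgentTimeTracker();
tracker.addAgentTime(0, 1500); // agent 0 took 1.5 seconds
tracker.addAgentTime(0, 500);  // agent 0 took another 0.5 seconds
tracker.addAgentTime(1, 250);  // agent 1 took 0.25 seconds
```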

    • agent(content?: string | ModelMessage): Promise<void>

      Executes an agent turn in the conversation.

      If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.

      This method is part of the ScenarioExecutionLike interface used by script steps.

      Parameters

      • content (optional): string | ModelMessage

        Optional content for the agent's response. Can be a string or ModelMessage. If not provided, the agent under test will generate the response.

      Returns Promise<void>

      // Let agent generate response
      await execution.agent();

      // Use provided content
      await execution.agent("The weather is sunny today!");

      // Use a ModelMessage object
      await execution.agent({
        role: "assistant",
        content: "I'm here to help you with weather information."
      });

    • execute(): Promise<ScenarioResult>

      Executes the entire scenario from start to finish.

      This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:

      • user() and agent() steps typically trigger one agent interaction each
      • proceed() steps can trigger multiple agent interactions across multiple turns
      • judge() steps trigger the judge agent to evaluate the conversation
      • succeed() and fail() steps immediately end the scenario

      The execution will stop early if:

      • A script step returns a ScenarioResult
      • The maximum number of turns is reached
      • An error occurs during execution

      Returns Promise<ScenarioResult>

      A promise that resolves with the final result of the scenario

      Throws an Error if an unhandled exception occurs during execution.

      const execution = new ScenarioExecution(config, script);
      const result = await execution.execute();
      console.log(`Scenario ${result.success ? 'passed' : 'failed'}`);
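
The early-stop rules listed above can be pictured as a plain loop over the script. The sketch below is a simplified model, not the library's code: `executeSketch`, `StepSketch`, and `RunState` are hypothetical names, the steps are synchronous for brevity (real script steps are asynchronous), and treating a script that ends without a verdict as a failure is an assumption made only for this example.

```typescript
// Hypothetical sketch of the sequential control flow; not the library's code.
type ResultSketch = { success: boolean; reasoning: string };
type RunState = { currentTurn: number };
// A step returns a final result, or null to let the script continue.
type StepSketch = (state: RunState) => ResultSketch | null;

function executeSketch(script: StepSketch[], maxTurns: number): ResultSketch {
  const state: RunState = { currentTurn: 0 };
  for (const step of script) {
    // Stop early once the turn budget is used up.
    if (state.currentTurn >= maxTurns) {
      return { success: false, reasoning: "Reached maximum turns" };
    }
    // An exception thrown by a step propagates to the caller.
    const result = step(state);
    // Stop early as soon as a step produces a final result.
    if (result !== null) return result;
  }
  // Assumption: a script that ends without a verdict counts as a failure here.
  return { success: false, reasoning: "Script ended without a result" };
}

const calls: string[] = [];
const takeTurn: StepSketch = (state) => {
  calls.push("turn");
  state.currentTurn += 1;
  return null;
};
const succeedNow: StepSketch = () => {
  calls.push("succeed");
  return { success: true, reasoning: "Forced success" };
};

// The third step never runs because the second one returns a result.
const early = executeSketch([takeTurn, succeedNow, takeTurn], 10);
// The third step never runs because the turn budget of 2 is exhausted.
const capped = executeSketch([takeTurn, takeTurn, takeTurn], 2);
```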

    • fail(reasoning?: string): Promise<ScenarioResult>

      Immediately ends the scenario with a failure verdict.

      This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.

      This method is part of the ScenarioExecutionLike interface used by script steps.

      Parameters

      • reasoning (optional): string

        Optional explanation for why the scenario is being marked as failed

      Returns Promise<ScenarioResult>

      A promise that resolves with the final failed scenario result

      // Mark failure with default reasoning
      const defaultFailure = await execution.fail();

      // Alternatively, mark failure with custom reasoning
      const customFailure = await execution.fail(
        "Agent failed to provide accurate weather information"
      );

    • hasResult(): boolean

      Checks if a partial result has been set for the scenario.

      This method is used internally to determine if a scenario has already reached a conclusion (success or failure) but hasn't been finalized yet. Partial results are typically set by agents that make final decisions (like judge agents) and are later finalized with the complete message history.

      Returns boolean

      True if a partial result exists, false otherwise

      // This is typically used internally by the execution engine
      if (execution.hasResult()) {
        console.log('Scenario has reached a conclusion');
      }

    • judge(content?: string | ModelMessage): Promise<null | ScenarioResult>

      Invokes the judge agent to evaluate the current state of the conversation.

      The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:

      • A final scenario result (success/failure) if the judge makes a decision
      • null if the judge needs more conversation before it can decide

      This method is part of the ScenarioExecutionLike interface used by script steps.

      Parameters

      • content (optional): string | ModelMessage

        Optional message to pass to the judge agent for additional context

      Returns Promise<null | ScenarioResult>

      A promise that resolves with:

      • ScenarioResult if the judge makes a final decision, or
      • null if the conversation should continue

      // Let judge evaluate current state
      const verdict = await execution.judge();
      if (verdict) {
        console.log(`Judge decided: ${verdict.success ? 'pass' : 'fail'}`);
      }

      // Alternatively, provide additional context to judge
      const verdictWithContext = await execution.judge("Please consider the user's satisfaction level");

    • message(message: ModelMessage): Promise<void>

      Adds a message to the conversation history.

      This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:

      • "user" messages are routed to USER role agents
      • "assistant" messages are routed to AGENT role agents
      • Other message types are added directly to the conversation

      Parameters

      • message: ModelMessage

        The ModelMessage to add to the conversation

      Returns Promise<void>

      await execution.message({
        role: "user",
        content: "Hello, how are you?"
      });
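
The routing rule described for this method can be written as a small lookup on the message role. This is a sketch only: `routeFor`, `MessageRole`, and `RouteTarget` are hypothetical names, and "CONVERSATION" is a label invented here for the "added directly to the conversation" case.

```typescript
// Hypothetical sketch of role-based routing; the function and type names are illustrative.
type MessageRole = "user" | "assistant" | "system" | "tool";
type RouteTarget = "USER" | "AGENT" | "CONVERSATION";

function routeFor(role: MessageRole): RouteTarget {
  switch (role) {
    case "user":
      return "USER"; // handled as a user turn
    case "assistant":
      return "AGENT"; // handled as an agent turn
    default:
      return "CONVERSATION"; // added directly to the conversation
  }
}

const roles: MessageRole[] = ["user", "assistant", "system", "tool"];
const routes = roles.map(routeFor);
```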
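
    • proceed(turns?: number, onTurn?: (state: ScenarioExecutionStateLike) => void | Promise<void>, onStep?: (state: ScenarioExecutionStateLike) => void | Promise<void>): Promise<null | ScenarioResult>

      Lets the scenario proceed automatically for a specified number of turns.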

      This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.

      Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.

      The method will continue until:

      • The specified number of turns is reached
      • A final scenario result is determined
      • The maximum turns limit is reached

      Parameters

      • turns (optional): number

        The number of turns to proceed. If undefined, runs until a conclusion or max turns is reached

      • onTurn (optional): (state: ScenarioExecutionStateLike) => void | Promise<void>

        Optional callback executed at the end of each turn. Receives the current execution state

      • onStep (optional): (state: ScenarioExecutionStateLike) => void | Promise<void>

        Optional callback executed after each agent interaction. Receives the current execution state

      Returns Promise<null | ScenarioResult>

      A promise that resolves with:

      • ScenarioResult if a conclusion is reached while proceeding, or
      • null if the specified turns complete without a conclusion

      // Proceed for 5 turns
      const afterFiveTurns = await execution.proceed(5);

      // Alternatively, proceed until a conclusion, with callbacks
      const finalResult = await execution.proceed(
        undefined,
        (state) => console.log(`Turn ${state.currentTurn} completed`),
        (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
      );
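
The stopping conditions and the two callbacks can be modelled with a nested loop: one iteration per turn, and one inner iteration per agent interaction. The sketch below is a simplified, synchronous model under stated assumptions: `proceedSketch`, `ProceedEnv`, and `ProceedState` are hypothetical names, a fixed number of interactions per turn is assumed, and returning null when the maximum-turns limit is hit is a simplification for this example rather than documented behaviour.

```typescript
// Hypothetical, synchronous model of the proceed loop; not the library's code.
type ProceedState = { currentTurn: number; interactions: number };
type ProceedOutcome = { success: boolean } | null;
type ProceedEnv = {
  interact: (state: ProceedState) => ProceedOutcome; // one agent interaction
  interactionsPerTurn: number; // assumption: fixed, e.g. user + agent = 2
  maxTurns: number;
};

function proceedSketch(
  env: ProceedEnv,
  turns?: number,
  onTurn?: (state: ProceedState) => void,
  onStep?: (state: ProceedState) => void,
): ProceedOutcome {
  const state: ProceedState = { currentTurn: 0, interactions: 0 };
  // Continue until the requested turns are done or the max-turns limit is hit.
  while (state.currentTurn < env.maxTurns && (turns === undefined || state.currentTurn < turns)) {
    for (let i = 0; i < env.interactionsPerTurn; i++) {
      const outcome = env.interact(state);
      state.interactions += 1;
      onStep?.(state); // after each agent interaction
      if (outcome !== null) return outcome; // a final result ends proceeding
    }
    state.currentTurn += 1;
    onTurn?.(state); // at the end of each turn
  }
  return null;
}

let stepCount = 0;
let turnCount = 0;
// Five turns of two interactions each: ten agent interactions in total.
const fiveTurns = proceedSketch(
  { interact: () => null, interactionsPerTurn: 2, maxTurns: 100 },
  5,
  () => { turnCount += 1; },
  () => { stepCount += 1; },
);

// The third interaction concludes the scenario, so proceeding stops mid-turn.
const concluded = proceedSketch({
  interact: (state) => (state.interactions === 2 ? { success: true } : null),
  interactionsPerTurn: 2,
  maxTurns: 100,
});
```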

    • setResult(result: Omit<ScenarioResult, "messages">): void

      Sets a partial result for the scenario.

      This method is used internally to store intermediate results that may be finalized later with the complete message history. Partial results are typically created by agents that make final decisions (like judge agents) and contain the success/failure status, reasoning, and criteria evaluation, but not the complete message history.

      Parameters

      • result: Omit<ScenarioResult, "messages">

        The partial result without the messages field. Should include success status, reasoning, and criteria evaluation.

      Returns void

      // This is typically called internally by agents that make final decisions
      execution.setResult({
        success: true,
        reasoning: "Agent provided accurate weather information",
        metCriteria: ["Provides accurate weather data"],
        unmetCriteria: []
      });
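
The relationship between a partial result and the finalized one can be sketched with `Omit`. This is illustrative only: `ResultHolderSketch`, `FullResultSketch`, and `finalize` are hypothetical names, and the result fields are limited to those that appear in the example above plus `messages` (simplified here to strings).

```typescript
// Hypothetical sketch of storing a partial result and finalizing it later.
type FullResultSketch = {
  success: boolean;
  reasoning?: string;
  metCriteria: string[];
  unmetCriteria: string[];
  messages: string[]; // simplified stand-in for the message history
};
type PartialResultSketch = Omit<FullResultSketch, "messages">;

class ResultHolderSketch {
  private partial: PartialResultSketch | null = null;

  // Store the verdict without the message history.
  setResult(result: PartialResultSketch): void {
    this.partial = result;
  }

  // True once a partial result has been stored.
  hasResult(): boolean {
    return this.partial !== null;
  }

  // Attach the complete message history to produce the final result.
  finalize(messages: string[]): FullResultSketch | null {
    return this.partial ? { ...this.partial, messages: [...messages] } : null;
  }
}

const holder = new ResultHolderSketch();
const beforeSet = holder.hasResult();
holder.setResult({
  success: true,
  reasoning: "Agent provided accurate weather information",
  metCriteria: ["Provides accurate weather data"],
  unmetCriteria: [],
});
const finalized = holder.finalize(["What's the weather like?", "It is sunny."]);
```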

    • step(): Promise<ModelMessage[] | ScenarioResult>

      Executes a single agent interaction in the scenario.

      This method is for manual step-by-step execution of the scenario, where each call represents one agent taking their turn. This is different from script steps (like user(), agent(), proceed(), etc.) which are functions in the scenario script.

      Each call to this method will:

      • Progress to the next turn if needed
      • Find the next agent that should act
      • Execute that agent's response
      • Return either new messages or a final scenario result

      Note: This method is primarily for debugging or custom execution flows. Most users will use execute() to run the entire scenario automatically.

      Returns Promise<ModelMessage[] | ScenarioResult>

      A promise that resolves with either:

      • Array of new messages added during the agent interaction, or
      • A final ScenarioResult if the interaction concludes the scenario

      Throws an Error if no result is returned from the step.

      const execution = new ScenarioExecution(config, script);

      // Execute one agent interaction at a time
      const outcome = await execution.step();
      if (Array.isArray(outcome)) {
        console.log('New messages:', outcome);
      } else {
        console.log('Scenario finished:', outcome.success);
      }
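
Since each call returns either an array of messages or a final result, a manual driver usually loops until the non-array case appears. The sketch below shows that pattern against a fake, synchronous stepper: `driveSteps`, `isFinalResult`, `fakeStep`, and the `Step*` types are hypothetical names for illustration, and the real method returns a promise.

```typescript
// Hypothetical sketch of driving step-by-step execution against a fake stepper.
type StepMessage = { role: string; content: string };
type StepResult = { success: boolean; messages: StepMessage[] };
type StepOutcome = StepMessage[] | StepResult;

// Type guard: anything that is not an array is treated as the final result.
function isFinalResult(outcome: StepOutcome): outcome is StepResult {
  return !Array.isArray(outcome);
}

// Call `step` until it yields a final result or the step budget runs out.
function driveSteps(
  step: () => StepOutcome,
  maxSteps: number,
): { seen: StepMessage[]; result: StepResult | null } {
  const seen: StepMessage[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const outcome = step();
    if (isFinalResult(outcome)) return { seen, result: outcome };
    seen.push(...outcome);
  }
  return { seen, result: null };
}

// Fake stepper: two batches of messages, then a final result.
const scripted: StepOutcome[] = [
  [{ role: "user", content: "Hello" }],
  [{ role: "assistant", content: "Hi there!" }],
  { success: true, messages: [] },
];
const fakeStep = (): StepOutcome => scripted.shift() ?? { success: false, messages: [] };

const driven = driveSteps(fakeStep, 10);
```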

    • succeed(reasoning?: string): Promise<ScenarioResult>

      Immediately ends the scenario with a success verdict.

      This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.

      This method is part of the ScenarioExecutionLike interface used by script steps.

      Parameters

      • reasoning (optional): string

        Optional explanation for why the scenario is being marked as successful

      Returns Promise<ScenarioResult>

      A promise that resolves with the final successful scenario result

      // Mark success with default reasoning
      const defaultSuccess = await execution.succeed();

      // Alternatively, mark success with custom reasoning
      const customSuccess = await execution.succeed(
        "User successfully completed the onboarding flow"
      );

    • user(content?: string | ModelMessage): Promise<void>

      Executes a user turn in the conversation.

      If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.

      This method is part of the ScenarioExecutionLike interface used by script steps.

      Parameters

      • content (optional): string | ModelMessage

        Optional content for the user's message. Can be a string or ModelMessage. If not provided, the user simulator agent will generate the content.

      Returns Promise<void>

      // Use provided content
      await execution.user("What's the weather like?");

      // Let user simulator generate content
      await execution.user();

      // Use a ModelMessage object
      await execution.user({
        role: "user",
        content: "Tell me a joke"
      });