Mocks

Simulating external dependencies for deterministic testing

A mock is a simulated implementation of an external dependency—API, database, tool, or service—that your scenario test uses to ensure deterministic, offline-friendly execution.

Understanding Mock Levels

When testing agents, you can mock at different levels of your system architecture. Each level serves different testing purposes:

Level 1: Tool Function Mocking

  • Purpose: Test that your agent calls the right tools with correct parameters
  • Use case: Verify agent reasoning, tool selection, and parameter passing
  • What you're testing: Agent logic and tool orchestration
  • Trade-off: Fast and simple, but doesn't test tool implementation

Level 2: API/Service Mocking

  • Purpose: Test tool implementation without external dependencies
  • Use case: Test HTTP calls, database queries, and external integrations within tools
  • What you're testing: Tool implementation and external service interfaces
  • Trade-off: More realistic but requires mocking at the right boundaries

Level 3: Dependency Injection

  • Purpose: Design your system for testability from the ground up
  • Use case: Production systems where you control the architecture
  • What you're testing: Full system behavior with swappable dependencies
  • Trade-off: Most flexible but requires architectural planning (see the sketch below)
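
A minimal Level 3 sketch: the tool is built by a factory that receives its data dependency through an interface, so tests can inject an in-memory fake while production injects the real client. The names UserRepository, InMemoryUserRepository, and makeFindUserTool below are hypothetical, used only for illustration; they are not part of the Scenario API.

typescript
import { tool } from "ai";
import { z } from "zod";

// Hypothetical repository interface the tool depends on
interface UserRepository {
  findById(id: string): Promise<{ id: string; name: string } | null>;
}

// In-memory fake injected during tests; production code would inject a
// real database-backed implementation instead
class InMemoryUserRepository implements UserRepository {
  constructor(private users: Record<string, { id: string; name: string }>) {}
  async findById(id: string) {
    return this.users[id] ?? null;
  }
}

// The tool only depends on the interface, so the implementation is swappable
const makeFindUserTool = (repo: UserRepository) =>
  tool({
    description: "Find a user by ID",
    parameters: z.object({
      userId: z.string().describe("The user ID to look up"),
    }),
    execute: async ({ userId }) => repo.findById(userId),
  });

// In a test, inject the fake with whatever data the scenario needs
const findUserTool = makeFindUserTool(
  new InMemoryUserRepository({ "123": { id: "123", name: "Alice" } })
);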

Mocking Patterns

The following examples show different levels of mocking in action, ordered by how commonly they're used in agent testing:

1. Tool Function Mocking (Level 1)

Mock tool execution functions to test your agent's tool usage behavior. This is the most common mocking pattern for modern agents.

API Data Fetching Tools

typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { describe, it, expect, vi } from "vitest";
import { z } from "zod";
 
// Mock the tool function
const fetchUserDataMock = vi.fn();
 
// Define a tool that uses the mock
const fetchUserDataTool = tool({
  description: "Fetch user data from external API",
  parameters: z.object({
    userId: z.string().describe("The user ID to fetch data for"),
  }),
  execute: fetchUserDataMock,
});
 
const userDataAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    const response = await generateText({
      model: openai("gpt-4o"),
      messages: input.messages,
      tools: { fetch_user_data: fetchUserDataTool },
      toolChoice: "auto",
    });
    return response.text;
  },
};
 
describe("Tool Call Mocking", () => {
  it("should mock tool execution", async () => {
    // Setup mock return value
    fetchUserDataMock.mockResolvedValue({
      name: "Alice",
      points: 150,
      email: "[email protected]",
    });
 
    const result = await scenario.run({
      name: "user data tool test",
      description: "Test agent's ability to fetch user data via tool",
      agents: [userDataAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Show me user data for ID 123"),
        scenario.agent(),
        (state) => {
          // Verify the mock was called with correct parameters
          expect(fetchUserDataMock).toHaveBeenCalledWith({ userId: "123" });
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
});

Database Operations Tools

typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { describe, it, expect, vi } from "vitest";
import { z } from "zod";
 
// Mock the database tool functions
const saveUserMock = vi.fn();
const findUserMock = vi.fn();
 
// Define database tools
const saveUserTool = tool({
  description: "Save a user to the database",
  parameters: z.object({
    name: z.string().describe("The user's name"),
    email: z.string().describe("The user's email"),
  }),
  execute: saveUserMock,
});
 
const findUserTool = tool({
  description: "Find users by name",
  parameters: z.object({
    name: z.string().describe("The name to search for"),
  }),
  execute: findUserMock,
});
 
const databaseAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    const response = await generateText({
      model: openai("gpt-4o"),
      messages: input.messages,
      tools: {
        save_user: saveUserTool,
        find_user: findUserTool,
      },
      toolChoice: "auto",
    });
    return response.text;
  },
};
 
describe("Database Tool Mocking", () => {
  it("should mock save user tool", async () => {
    saveUserMock.mockResolvedValue({
      id: 123,
      name: "John",
      email: "[email protected]",
    });
 
    const result = await scenario.run({
      name: "database save test",
      description: "Test agent's ability to save user data via tool",
      agents: [databaseAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Save a new user named John with email [email protected]"),
        scenario.agent(),
        (state) => {
          expect(saveUserMock).toHaveBeenCalledWith({
            name: "John",
            email: "[email protected]",
          });
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
});

2. API/Service Mocking (Level 2)

Mock HTTP calls, database connections, and external services within your tools to test the interface between your agent system and external dependencies.

typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { describe, it, expect, vi } from "vitest";
import { z } from "zod";
 
// Mock the fetch function that tools will use
const mockFetch = vi.fn();
global.fetch = mockFetch;
 
// Real tool implementation that makes HTTP calls
const fetchUserDataTool = tool({
  description: "Fetch user data from external API",
  parameters: z.object({
    userId: z.string().describe("The user ID to fetch data for"),
  }),
  execute: async ({ userId }) => {
    const response = await fetch(`https://api.example.com/users/${userId}`);
    const data = await response.json();
    return data;
  },
});
 
const userDataAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    const response = await generateText({
      model: openai("gpt-4o"),
      messages: input.messages,
      tools: { fetch_user_data: fetchUserDataTool },
      toolChoice: "auto",
    });
    return response.text;
  },
};
 
describe("API Service Mocking", () => {
  it("should mock HTTP calls within tools", async () => {
    // Mock the actual HTTP call
    mockFetch.mockResolvedValue({
      ok: true,
      status: 200,
      json: () =>
        Promise.resolve({
          id: "123",
          name: "Alice",
          email: "[email protected]",
        }),
    });
 
    const result = await scenario.run({
      name: "api service test",
      description: "Test tool's HTTP integration",
      agents: [userDataAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Get user data for ID 123"),
        scenario.agent(),
        (state) => {
          // Verify the HTTP call was made correctly
          expect(mockFetch).toHaveBeenCalledWith(
            "https://api.example.com/users/123"
          );
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
});

3. Tool Failure Simulation (Level 1 & 2)

Mock tool failures to test how your agent handles errors, timeouts, and edge cases. You can simulate failures at both the tool level and the service level; a service-level sketch follows the example below.

typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { describe, it, expect, vi } from "vitest";
import { z } from "zod";
 
// Mock the external service tool
const callExternalServiceMock = vi.fn();
 
// Define a tool that can fail
const callExternalServiceTool = tool({
  description: "Call an external service",
  parameters: z.object({
    endpoint: z.string().describe("The service endpoint to call"),
  }),
  execute: callExternalServiceMock,
});
 
const resilientAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    try {
      const response = await generateText({
        model: openai("gpt-4o"),
        messages: input.messages,
        tools: { call_external_service: callExternalServiceTool },
        toolChoice: "auto",
      });
      return response.text;
    } catch (error) {
      return `I encountered an error: ${
        error instanceof Error ? error.message : "Unknown error"
      }. Let me try a different approach.`;
    }
  },
};
 
describe("Tool Failure Simulation", () => {
  it("should handle tool timeout errors", async () => {
    // Simulate tool timeout
    callExternalServiceMock.mockRejectedValue(new Error("Request timeout"));
 
    const result = await scenario.run({
      name: "tool timeout test",
      description: "Test agent's ability to handle tool timeouts",
      agents: [resilientAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Call the external service"),
        scenario.agent(),
        (state) => {
          expect(callExternalServiceMock).toHaveBeenCalled();
          // Agent should handle the error gracefully
          const response = state.lastAgentMessage().content;
          expect(response).toContain("error");
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
 
  it("should handle tool rate limit errors", async () => {
    // Simulate rate limit error
    callExternalServiceMock.mockRejectedValue(new Error("Rate limit exceeded"));
 
    const result = await scenario.run({
      name: "tool rate limit test",
      description: "Test agent's ability to handle rate limits",
      agents: [resilientAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Call the external service"),
        scenario.agent(),
        (state) => {
          expect(callExternalServiceMock).toHaveBeenCalled();
          const response = state.lastAgentMessage().content;
          expect(response).toContain("error");
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
 
  it("should handle successful tool calls", async () => {
    // Simulate successful tool call
    callExternalServiceMock.mockResolvedValue("Service call successful");
 
    const result = await scenario.run({
      name: "tool success test",
      description: "Test agent's ability to handle successful tool calls",
      agents: [resilientAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Call the external service"),
        scenario.agent(),
        (state) => {
          expect(callExternalServiceMock).toHaveBeenCalled();
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
});
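
The same failures can also be simulated one level down, at the service boundary. Reusing the mocked fetch from the API/Service Mocking example above, a sketch might reject the call outright or return an error status that the tool implementation is assumed to translate into a readable error:

typescript
// Simulate a network-level failure for a tool that calls fetch internally
mockFetch.mockRejectedValue(new Error("Network unreachable"));

// Or simulate an HTTP error status the tool has to handle
mockFetch.mockResolvedValue({
  ok: false,
  status: 503,
  json: () => Promise.resolve({ error: "Service unavailable" }),
});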

4. LLM Provider Mocking (Level 2)

For testing agent flow without actual LLM calls, you can mock the model provider APIs. However, Scenario's caching system is often a better solution for deterministic, cost-effective testing.

typescript
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText } from "ai";
import { describe, it, expect, vi } from "vitest";
 
// Mock the generateText function (vi.hoisted makes the mock available
// inside the hoisted vi.mock factory)
const mockGenerateText = vi.hoisted(() => vi.fn());
vi.mock("ai", () => ({
  generateText: mockGenerateText,
}));
 
const chatAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    const response = await generateText({
      model: openai("gpt-4o"),
      messages: input.messages,
    });
    return response.text;
  },
};
 
describe("LLM Provider Mocking", () => {
  it("should mock LLM responses", async () => {
    // Mock the LLM response
    mockGenerateText.mockResolvedValue({
      text: "I can help you with that request.",
    });
 
    const result = await scenario.run({
      name: "llm mock test",
      description: "Test with mocked LLM responses",
      agents: [chatAgent, scenario.userSimulatorAgent()],
      script: [
        scenario.user("Hello"),
        scenario.agent(),
        (state) => {
          expect(mockGenerateText).toHaveBeenCalled();
          expect(state.lastAgentMessage().content).toBe(
            "I can help you with that request."
          );
        },
        scenario.succeed(),
      ],
    });
 
    expect(result.success).toBe(true);
  });
});

Best Practices

1. Mock Tools, Not Agent Logic

Mock external dependencies through their tool interfaces, not your agent's core reasoning logic.

typescript
// ✅ Good: Mock the tool function
const mockWeatherTool = vi.fn().mockResolvedValue("Sunny, 75°F");
 
const weatherTool = tool({
  description: "Get weather data",
  parameters: z.object({ city: z.string() }),
  execute: mockWeatherTool,
});
 
// ❌ Bad: Mock the agent's reasoning
const mockAgent = vi.fn().mockResolvedValue("It's sunny");

2. Use Realistic Tool Responses

Make your tool mocks return data that matches real-world API responses and error conditions.

typescript
// ✅ Good: Realistic API response structure
mockApiTool.mockResolvedValue({
  data: { temperature: 75, condition: "sunny", humidity: 65 },
  status: "success",
  timestamp: "2024-01-15T10:30:00Z",
});
 
// ❌ Bad: Oversimplified response
mockApiTool.mockResolvedValue("sunny");

3. Test Tool Failure Scenarios

Agents must handle tool failures gracefully: test timeouts, rate limits, and error responses.

typescript
// Test multiple failure scenarios
it("should handle tool timeout", async () => {
  mockTool.mockRejectedValue(new Error("Request timeout"));
  // Test agent's timeout handling
});
 
it("should handle rate limits", async () => {
  mockTool.mockRejectedValue(new Error("Rate limit exceeded"));
  // Test agent's rate limit handling
});

General Testing Best Practices

Broader testing practices such as descriptive naming, test isolation, and mock cleanup apply to scenario tests as well.
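
One pattern worth showing here is resetting mock state between tests, so that calls recorded in one scenario do not leak into the next. A minimal Vitest sketch:

typescript
import { beforeEach, vi } from "vitest";

beforeEach(() => {
  // Clear recorded calls on every mock before each test; use
  // vi.resetAllMocks() if you also want to drop mock implementations
  vi.clearAllMocks();
});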

Related Concepts

  • Fixtures – Static test assets for deterministic scenarios
  • Tool Calling – Testing agent tool usage and responses
  • Cache – Caching LLM calls for faster, deterministic runs

Next up: learn how to cache LLM calls for even faster, deterministic runs.