Mocks

Simulating external dependencies for deterministic testing

A mock is a simulated implementation of an external dependency—API, database, tool, or service—that your scenario test uses to ensure deterministic, offline-friendly execution.

Understanding Mock Levels

When testing agents, you can mock at different levels of your system architecture. Each level serves a different testing purpose:

Level 1: Tool Function Mocking

  • Purpose: Test that your agent calls the right tools with correct parameters
  • Use case: Verify agent reasoning, tool selection, and parameter passing
  • What you're testing: Agent logic and tool orchestration
  • Trade-off: Fast and simple, but doesn't test tool implementation

Level 2: API/Service Mocking

  • Purpose: Test tool implementation without external dependencies
  • Use case: Test HTTP calls, database queries, and external integrations within tools
  • What you're testing: Tool implementation and external service interfaces
  • Trade-off: More realistic but requires mocking at the right boundaries

Level 3: Dependency Injection

  • Purpose: Design your system for testability from the ground up
  • Use case: Production systems where you control the architecture
  • What you're testing: Full system behavior with swappable dependencies
  • Trade-off: Most flexible but requires architectural planning
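
Dependency injection makes Level 3 mocking a matter of passing arguments rather than patching import paths: the agent receives its external clients, so tests hand it a mock while production code hands it the real implementation. The sketch below illustrates the idea; the PaymentGateway protocol and CheckoutAgent class are hypothetical examples, not part of the Scenario API.

python
from typing import Protocol
from unittest.mock import Mock


class PaymentGateway(Protocol):
    """Interface for an external payment service (hypothetical)."""

    def charge(self, amount_cents: int, customer_id: str) -> dict: ...


class CheckoutAgent:
    """Agent that receives its external dependency instead of constructing it."""

    def __init__(self, payments: PaymentGateway):
        self.payments = payments  # real client in production, mock in tests

    def checkout(self, customer_id: str, amount_cents: int) -> str:
        receipt = self.payments.charge(amount_cents, customer_id)
        return f"Charged {amount_cents} cents (receipt {receipt['id']})"


# In tests, no patching is required: swap the dependency at construction time
mock_payments = Mock()
mock_payments.charge.return_value = {"id": "rcpt_1", "status": "succeeded"}

agent = CheckoutAgent(payments=mock_payments)
assert "rcpt_1" in agent.checkout("cus_42", 1999)
mock_payments.charge.assert_called_once_with(1999, "cus_42")

The LLM provider mocking example later on this page applies the same technique, injecting a mock LLM client into the agent.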

Mocking Patterns

The following examples show different levels of mocking in action, ordered by how commonly they're used in agent testing:

1. Tool Function Mocking (Level 1)

Mock tool execution functions to test your agent's tool usage behavior. This is the most common mocking pattern for modern agents.

API Data Fetching Tools

python
# Source: https://github.com/langwatch/scenario/blob/main/python/examples/test_simple_tool_mocking.py
"""
Example test demonstrating Level 1: Tool Function Mocking with real LLM tool calling.
 
This example shows how to mock tool functions while using actual LLM tool calling
mechanisms. This is the most common mocking pattern for modern agents.
 
What we're testing:
- Agent reasoning and tool selection logic
- LLM's ability to extract parameters from natural language
- Tool orchestration and response handling
 
What we're NOT testing:
- The actual tool implementation (that's mocked out)
- External API calls or database connections
"""
 
import pytest
import scenario
from unittest.mock import patch
import litellm
import json
 
 
def fetch_user_data(user_id: str) -> dict:
    """Fetch user data from external API."""
    # This would normally make an API call
    raise NotImplementedError("This should be mocked in tests")
 
 
class UserDataAgent(scenario.AgentAdapter):
    """Agent that uses actual LLM tool calling, not hardcoded logic."""
 
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Define tool schema for LLM
        tool_schemas = [
            {
                "type": "function",
                "function": {
                    "name": "fetch_user_data",
                    "description": "Fetch user data from external API",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "user_id": {
                                "type": "string",
                                "description": "The user ID to fetch data for",
                            }
                        },
                        "required": ["user_id"],
                    },
                },
            }
        ]
 
        # Let LLM decide when and how to call tools
        response = litellm.completion(
            model="openai/gpt-4o-mini",
            messages=input.messages,
            tools=tool_schemas,
            tool_choice="auto",
        )
 
        message = response.choices[0].message  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Handle tool calls if LLM made any
        if message.tool_calls:
            tool_responses = []
 
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                args = json.loads(tool_call.function.arguments)
 
                # Find and execute the tool function
                if tool_name == "fetch_user_data":
                    try:
                        tool_result = fetch_user_data(**args)
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": json.dumps(tool_result),
                            }
                        )
                    except Exception as e:
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": f"Error: {str(e)}",
                            }
                        )
 
            # Continue conversation with tool results
            if tool_responses:
                follow_up_response = litellm.completion(
                    model="openai/gpt-4o-mini",
                    messages=input.messages + [message] + tool_responses,
                )
                return follow_up_response.choices[0].message.content or ""  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        return message.content or ""
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_simple_tool_mocking():
    """Test mocking tools while using real LLM tool calling."""
 
    # Level 1: Mock the tool function itself, not any internal dependencies
    with patch("test_simple_tool_mocking.fetch_user_data") as mock_fetch:
        # Setup mock return value - what the tool should return when called
        mock_fetch.return_value = {
            "name": "Alice",
            "points": 150,
            "email": "[email protected]",
        }
 
        result = await scenario.run(
            name="user data tool test",
            description="Test agent's actual tool calling with mocked tool implementation",
            agents=[
                UserDataAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                scenario.user("Show me user data for ID 123"),
                scenario.agent(),
                # Verify the mock was called - proves the LLM correctly:
                # 1. Decided to use the fetch_user_data tool
                # 2. Extracted "123" as the user_id parameter from natural language
                lambda state: mock_fetch.assert_called_once_with(user_id="123"),
                scenario.succeed(),
            ],
        )
 
        assert result.success

Database Operations Tools

python
# Source: https://github.com/langwatch/scenario/blob/main/python/examples/test_database_tool_mocking.py
"""
Example test demonstrating database service mocking with real LLM tool calling.
 
This example shows how to mock database connections/operations within tools
while using actual LLM tool calling mechanisms to test realistic agent behavior.
"""
 
import pytest
import scenario
from unittest.mock import patch, Mock
import litellm
import json
 
 
# Mock database connection - this would normally be a real database client
class DatabaseClient:
    def save_user(self, name: str, email: str) -> dict:
        """Save a user to the database."""
        # This would normally execute SQL or call a database API
        raise NotImplementedError("This should be mocked in tests")
 
 
# Real tool implementation that uses database client
def save_user(name: str, email: str) -> dict:
    """Save a user to the database using database client."""
    db_client = DatabaseClient()
    return db_client.save_user(name, email)
 
 
class DatabaseAgent(scenario.AgentAdapter):
    """Agent that uses real LLM tool calling to save user data to database."""
 
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Define the database tool schema for the LLM
        tool_schemas = [
            {
                "type": "function",
                "function": {
                    "name": "save_user",
                    "description": "Save a user to the database with their name and email",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "name": {
                                "type": "string",
                                "description": "The user's full name",
                            },
                            "email": {
                                "type": "string",
                                "description": "The user's email address",
                            },
                        },
                        "required": ["name", "email"],
                    },
                },
            }
        ]
 
        # Let the LLM decide when and how to call the database tool
        # The LLM will extract name and email from the user's natural language request
        response = litellm.completion(
            model="openai/gpt-4o-mini",
            messages=input.messages,
            tools=tool_schemas,
            tool_choice="auto",  # LLM decides when to use tools
        )
 
        message = response.choices[0].message  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Handle any tool calls the LLM decided to make
        if message.tool_calls:
            tool_responses = []
 
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                # LLM provides the arguments (name, email) extracted from user input
                args = json.loads(tool_call.function.arguments)
 
                # Execute the appropriate tool function
                if tool_name == "save_user":
                    try:
                        # Call the actual tool function with LLM-extracted parameters
                        tool_result = save_user(**args)
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": json.dumps(tool_result),
                            }
                        )
                    except Exception as e:
                        # Handle tool execution errors gracefully
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": f"Error: {str(e)}",
                            }
                        )
 
            # If tools were called, get the LLM's final response based on tool results
            if tool_responses:
                follow_up_response = litellm.completion(
                    model="openai/gpt-4o-mini",
                    messages=input.messages + [message] + tool_responses,
                )
                return follow_up_response.choices[0].message.content or ""  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Return the LLM's direct response if no tools were called
        return message.content or ""
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_database_service_mocking():
    """Test mocking database connections within tools while using real LLM tool calling."""
 
    # Mock the database client at the service level, not the tool level
    with patch("test_database_tool_mocking.DatabaseClient") as mock_db_class:
        # Setup the mock database client and response
        mock_db_client = Mock()
        mock_db_class.return_value = mock_db_client
 
        # Mock what the database would return
        mock_db_client.save_user.return_value = {
            "id": 123,
            "name": "John",
            "email": "[email protected]",
        }
 
        result = await scenario.run(
            name="database service test",
            description="Test tool's database integration with mocked database client",
            agents=[
                DatabaseAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                # User makes a natural language request
                scenario.user("Save a new user named John with email [email protected]"),
                # Agent uses LLM to understand request and call appropriate tools
                scenario.agent(),
                # Verify the database mock was called with specific parameters extracted by the LLM
                # This proves the LLM correctly:
                # 1. Decided to use the save_user tool
                # 2. Extracted "John" as the name parameter from natural language
                # 3. Extracted "[email protected]" as the email parameter
                lambda state: mock_db_client.save_user.assert_called_once_with(
                    "John", "[email protected]"
                ),
                scenario.succeed(),
            ],
        )
 
        assert result.success

2. API/Service Mocking (Level 2)

Mock HTTP calls, database connections, and external services within your tools to test the interface between your agent system and external dependencies.

python
# Source: https://github.com/langwatch/scenario/blob/main/python/examples/test_api_service_mocking.py
"""
Example test demonstrating API/service mocking with real LLM tool calling.
 
This example shows how to mock HTTP calls within tools while using actual
LLM tool calling mechanisms to test realistic agent behavior.
"""
 
import pytest
import scenario
from unittest.mock import patch, Mock, AsyncMock
import litellm
import json
import httpx
 
 
async def fetch_user_data(user_id: str) -> dict:
    """Fetch user data from external API."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")
        return response.json()
 
 
class UserDataAgent(scenario.AgentAdapter):
    """Agent that uses real LLM tool calling to fetch data from external APIs."""
 
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Define the API tool schema for the LLM
        tool_schemas = [
            {
                "type": "function",
                "function": {
                    "name": "fetch_user_data",
                    "description": "Fetch user data from external API by user ID",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "user_id": {
                                "type": "string",
                                "description": "The unique identifier for the user",
                            }
                        },
                        "required": ["user_id"],
                    },
                },
            }
        ]
 
        # Let the LLM decide when and how to call the API tool
        # The LLM will extract user_id from the user's natural language request
        response = litellm.completion(
            model="openai/gpt-4o-mini",
            messages=input.messages,
            tools=tool_schemas,
            tool_choice="auto",  # LLM decides when to use tools
        )
 
        message = response.choices[0].message  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Handle any tool calls the LLM decided to make
        if message.tool_calls:
            tool_responses = []
 
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                # LLM provides the arguments (user_id) extracted from user input
                args = json.loads(tool_call.function.arguments)
 
                # Execute the appropriate tool function
                if tool_name == "fetch_user_data":
                    try:
                        # Call the actual API tool with LLM-extracted parameters
                        # This is where our HTTP mocking takes effect
                        tool_result = await fetch_user_data(**args)
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": json.dumps(tool_result),
                            }
                        )
                    except Exception as e:
                        # Handle API call errors gracefully
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": f"Error: {str(e)}",
                            }
                        )
 
            # If tools were called, get the LLM's final response based on API results
            if tool_responses:
                follow_up_response = litellm.completion(
                    model="openai/gpt-4o-mini",
                    messages=input.messages + [message] + tool_responses,
                )
                return follow_up_response.choices[0].message.content or ""  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Return the LLM's direct response if no tools were called
        return message.content or ""
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_api_service_mocking():
    """Test mocking HTTP calls within tools while using real LLM tool calling."""
 
    # Mock response data that the "API" will return
    mock_response_data = {"id": "123", "name": "Alice", "email": "[email protected]"}
 
    # Mock the HTTP client at the service level, not the agent level
    with patch("httpx.AsyncClient") as mock_client_class:
        # Setup the mock client and response
        mock_client = AsyncMock()
        mock_client_class.return_value.__aenter__.return_value = mock_client
 
        # httpx.Response.json() is synchronous, so use a plain Mock for the response object
        mock_response = Mock()
        mock_response.json.return_value = mock_response_data
        mock_client.get.return_value = mock_response
 
        result = await scenario.run(
            name="api service test",
            description="Test tool's HTTP integration with mocked API calls",
            agents=[
                UserDataAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                # User makes a natural language request
                scenario.user("Get user data for ID 123"),
                # Agent uses LLM to understand request and call appropriate tools
                scenario.agent(),
                # Verify the HTTP mock was called with specific URL extracted by the LLM
                # This proves the LLM correctly:
                # 1. Decided to use the fetch_user_data tool
                # 2. Extracted "123" as the user_id parameter from natural language
                # 3. Tool constructed the correct API URL with that parameter
                lambda state: mock_client.get.assert_called_once_with(
                    "https://api.example.com/users/123"
                ),
                scenario.succeed(),
            ],
        )
 
        assert result.success

3. Tool Failure Simulation (Level 1 & 2)

Mock tool failures to test how your agent handles errors, timeouts, and edge cases. You can simulate failures at both the tool level and the service level.

python
# Source: https://github.com/langwatch/scenario/blob/main/python/examples/test_tool_failure_simulation.py
"""
Example test demonstrating tool failure simulation with real LLM tool calling.
 
This example shows how to test agent resilience by simulating tool failures,
timeouts, and other error conditions while using actual LLM tool calling.
"""
 
import pytest
import scenario
from unittest.mock import patch
import litellm
import json
 
 
def call_external_service(endpoint: str) -> str:
    """Call an external service."""
    # This would normally make an external API call
    raise NotImplementedError("This should be mocked in tests")
 
 
class ResilientAgent(scenario.AgentAdapter):
    """Agent that uses real LLM tool calling and handles external service failures gracefully."""
 
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Define the external service tool schema for the LLM
        tool_schemas = [
            {
                "type": "function",
                "function": {
                    "name": "call_external_service",
                    "description": "Call an external service API endpoint",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "endpoint": {
                                "type": "string",
                                "description": "The API endpoint to call",
                            }
                        },
                        "required": ["endpoint"],
                    },
                },
            }
        ]
 
        # Let the LLM decide when and how to call the external service tool
        response = litellm.completion(
            model="openai/gpt-4o-mini",
            messages=input.messages,
            tools=tool_schemas,
            tool_choice="auto",
        )
 
        message = response.choices[0].message  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Handle any tool calls the LLM decided to make
        if message.tool_calls:
            tool_responses = []
 
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                # LLM provides the arguments (endpoint) extracted from user input
                args = json.loads(tool_call.function.arguments)
 
                # Execute the appropriate tool function
                if tool_name == "call_external_service":
                    try:
                        # Call the actual external service tool with LLM-extracted parameters
                        # This is where our failure simulation takes effect
                        tool_result = call_external_service(**args)
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": str(tool_result),
                            }
                        )
                    except Exception as e:
                        # Handle service call errors gracefully - this is what we're testing
                        tool_responses.append(
                            {
                                "role": "tool",
                                "tool_call_id": tool_call.id,
                                "content": f"Error: {str(e)}",
                            }
                        )
 
            # If tools were called, get the LLM's final response based on service results
            if tool_responses:
                follow_up_response = litellm.completion(
                    model="openai/gpt-4o-mini",
                    messages=input.messages + [message] + tool_responses,
                )
                return follow_up_response.choices[0].message.content or ""  # type: ignore[attr-defined]  # litellm response has dynamic attributes
 
        # Return the LLM's direct response if no tools were called
        return message.content or ""
 
 
def check_error_in_message(state: scenario.ScenarioState) -> None:
    """Check that the agent's message contains error or timeout information."""
    last_msg = state.last_message()
    if last_msg["role"] == "assistant":
        content = last_msg.get("content", "")
        # Check for various error indicators the LLM might use
        error_indicators = ["error", "timeout", "timed out", "failed", "issue"]
        content_str = content if isinstance(content, str) else str(content)
        assert any(indicator in content_str.lower() for indicator in error_indicators)
 
 
def check_rate_limit_in_message(state: scenario.ScenarioState) -> None:
    """Check that the agent's message contains rate limit error information."""
    last_msg = state.last_message()
    if last_msg["role"] == "assistant":
        content = last_msg.get("content", "")
        # Check for various rate limit indicators the LLM might use
        rate_limit_indicators = [
            "rate limit",
            "exceeded",
            "limit exceeded",
            "too many requests",
        ]
        content_str = content if isinstance(content, str) else str(content)
        assert any(
            indicator in content_str.lower() for indicator in rate_limit_indicators
        )
 
 
def check_success_in_message(state: scenario.ScenarioState) -> None:
    """Check that the agent's message contains success information."""
    last_msg = state.last_message()
    if last_msg["role"] == "assistant":
        content = last_msg.get("content", "")
        # Check for various success indicators the LLM might use
        success_indicators = [
            "successful",
            "success",
            "completed",
            "call was successful",
        ]
        content_str = content if isinstance(content, str) else str(content)
        assert any(indicator in content_str.lower() for indicator in success_indicators)
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_tool_timeout_simulation():
    """Test agent's ability to handle tool timeouts."""
 
    with patch("test_tool_failure_simulation.call_external_service") as mock_service:
        # Simulate timeout error
        mock_service.side_effect = Exception("Request timeout")
 
        result = await scenario.run(
            name="tool timeout test",
            description="Test agent's ability to handle tool timeouts",
            agents=[
                ResilientAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                scenario.user("Call the external service at endpoint /api/data"),
                scenario.agent(),
                # Verify the mock was called with specific endpoint extracted by the LLM
                # This proves the LLM correctly extracted "/api/data" from the user message
                lambda state: mock_service.assert_called_once_with(
                    endpoint="/api/data"
                ),
                check_error_in_message,
                scenario.succeed(),
            ],
        )
 
        assert result.success
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_tool_rate_limit_simulation():
    """Test agent's ability to handle rate limits."""
 
    with patch("test_tool_failure_simulation.call_external_service") as mock_service:
        # Simulate rate limit error
        mock_service.side_effect = Exception("Rate limit exceeded")
 
        result = await scenario.run(
            name="tool rate limit test",
            description="Test agent's ability to handle rate limits",
            agents=[
                ResilientAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                scenario.user("Call the external service at endpoint /api/data"),
                scenario.agent(),
                # Verify the mock was called with specific endpoint extracted by the LLM
                # This proves the LLM correctly extracted "/api/data" from the user message
                lambda state: mock_service.assert_called_once_with(
                    endpoint="/api/data"
                ),
                check_rate_limit_in_message,
                scenario.succeed(),
            ],
        )
 
        assert result.success
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_tool_success_simulation():
    """Test agent's ability to handle successful tool calls."""
 
    with patch("test_tool_failure_simulation.call_external_service") as mock_service:
        # Simulate successful service call
        mock_service.return_value = "Service call successful"
 
        result = await scenario.run(
            name="tool success test",
            description="Test agent's ability to handle successful tool calls",
            agents=[
                ResilientAgent(),
                scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
            ],
            script=[
                scenario.user("Call the external service at endpoint /api/data"),
                scenario.agent(),
                # Verify the mock was called with specific endpoint extracted by the LLM
                # This proves the LLM correctly extracted "/api/data" from the user message
                lambda state: mock_service.assert_called_once_with(
                    endpoint="/api/data"
                ),
                check_success_in_message,
                scenario.succeed(),
            ],
        )
 
        assert result.success

4. LLM Provider Mocking (Level 2)

For testing agent flow without actual LLM calls, you can mock the model provider APIs. However, Scenario's caching system is often a better solution for deterministic, cost-effective testing.

python
# Source: https://github.com/langwatch/scenario/blob/main/python/examples/test_llm_provider_mocking.py
"""
Example test demonstrating LLM provider mocking using dependency injection.
 
This example shows how to mock LLM responses by injecting a mock LLM client
into the agent, avoiding global mocking that affects the entire framework.
"""
 
import pytest
import scenario
from unittest.mock import Mock
 
 
class MockLLM:
    """Mock LLM client that returns deterministic responses."""
 
    def __init__(self):
        self.call_count = 0
        self.last_messages = None
        self.last_model = None
 
    def completion(self, model: str, messages: list) -> Mock:
        """Mock completion method that returns deterministic responses."""
        self.call_count += 1
        self.last_messages = messages
        self.last_model = model
 
        # Create mock response structure
        mock_response = Mock()
        mock_message = Mock()
        mock_message.content = "I can help you with that request."
        mock_choice = Mock()
        mock_choice.message = mock_message
        mock_response.choices = [mock_choice]
 
        return mock_response
 
 
class ChatAgent(scenario.AgentAdapter):
    """Chat agent that accepts an LLM client (real or mock) via dependency injection."""
 
    def __init__(self, llm_client=None):
        self.llm_client = llm_client
 
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Use the injected LLM client (could be real litellm or our mock)
        assert self.llm_client is not None, "LLM client must be provided"
        response = self.llm_client.completion(
            model="openai/gpt-4o-mini",
            messages=input.messages,
        )
 
        return response.choices[0].message.content or ""
 
 
def check_specific_response(state: scenario.ScenarioState) -> None:
    """Check that the agent responded with expected mocked content."""
    last_msg = state.last_message()
    if last_msg["role"] == "assistant":
        content = last_msg.get("content", "")
        assert content == "I can help you with that request."
 
 
def check_mock_was_called_correctly(mock_llm: MockLLM) -> None:
    """Check that the mock LLM was called with expected parameters."""
    assert mock_llm.last_messages is not None, "Mock was not called"
    assert mock_llm.call_count == 1
    assert mock_llm.last_model == "openai/gpt-4o-mini"
    assert len(mock_llm.last_messages) == 2
    assert mock_llm.last_messages[0]["role"] == "user"
    assert "Hello there!" in mock_llm.last_messages[0]["content"]
 
 
@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_llm_provider_mocking():
    """Test agent behavior using a mock LLM client."""
 
    # Create our mock LLM client
    mock_llm = MockLLM()
 
    result = await scenario.run(
        name="llm mock test",
        description="Test agent behavior with mock LLM client",
        agents=[
            ChatAgent(llm_client=mock_llm),
            scenario.UserSimulatorAgent(model="openai/gpt-4o-mini"),
        ],
        script=[
            scenario.user("Hello there!"),
            scenario.agent(),
            # Verify the mock LLM was called with expected parameters
            lambda state: check_mock_was_called_correctly(mock_llm),
            # Verify we got the expected mocked response
            check_specific_response,
            scenario.succeed(),
        ],
    )
 
    assert result.success
    # Additional verification outside the scenario
    assert mock_llm.call_count == 1
    assert mock_llm.last_model == "openai/gpt-4o-mini"

Best Practices

1. Mock Tools, Not Agent Logic

Mock external dependencies through their tool interfaces, not your agent's core reasoning logic.

typescript
// ✅ Good: Mock the tool function
const mockWeatherTool = vi.fn().mockResolvedValue("Sunny, 75°F");
 
const weatherTool = tool({
  description: "Get weather data",
  parameters: z.object({ city: z.string() }),
  execute: mockWeatherTool,
});
 
// ❌ Bad: Mock the agent's reasoning
const mockAgent = vi.fn().mockResolvedValue("It's sunny");

2. Use Realistic Tool Responses

Make your tool mocks return data that matches real-world API responses and error conditions.

typescript
// ✅ Good: Realistic API response structure
mockApiTool.mockResolvedValue({
  data: { temperature: 75, condition: "sunny", humidity: 65 },
  status: "success",
  timestamp: "2024-01-15T10:30:00Z",
});
 
// ❌ Bad: Oversimplified response
mockApiTool.mockResolvedValue("sunny");

3. Test Tool Failure Scenarios

Agents must handle tool failures gracefully, so test timeouts, rate limits, and error responses.

typescript
// Test multiple failure scenarios
it("should handle tool timeout", async () => {
  mockTool.mockRejectedValue(new Error("Request timeout"));
  // Test agent's timeout handling
});
 
it("should handle rate limits", async () => {
  mockTool.mockRejectedValue(new Error("Rate limit exceeded"));
  // Test agent's rate limit handling
});

General Testing Best Practices

For broader testing practices, such as descriptive naming, test isolation, and mock cleanup, refer to the general testing best practices documentation.

Related Concepts

  • Fixtures – Static test assets for deterministic scenarios
  • Tool Calling – Testing agent tool usage and responses
  • Cache – Caching LLM calls for faster, deterministic runs

Next up: learn how to cache LLM calls for even faster, deterministic runs.