Skip to content

Testing Voice Agents

Scenario lets you write end-to-end tests for agents that listen to audio, think, and respond with either text or audio.

Video Demo

This video shows a complete example of a black box test for a voice-to-voice conversation between an agent and a user simulator.

Testing Approaches

Choose the approach that matches your voice agent's architecture:

Use-case comparison

ScenarioInputExpected OutputTypical Judge Model
Audio → Textfile part (audio) + optional promptTextgpt-4o-audio-preview or any GPT-4-level text model
Audio → Audiofile part (audio) + optional promptAudio (voice response)gpt-4o-audio-preview (handles audio)
Voice-to-Voice ConversationMultiple turns, both sides send/receive audioAudio dialogueSame as above; judge runs after conversation

General Troubleshooting

Judge ignores assistant audio
Use wrapper utilities (wrap_judge_for_audio in Python or wrapJudgeForAudio in TypeScript) to automatically transcribe audio for judge evaluation.

Tests time-out or hang
Voice models are slower. For TypeScript: --timeout 60000 or VITEST_MAX_WORKERS=1. For Python: @pytest.mark.timeout(120).

"Unsupported media type" errors
Helpers support WAV and MP3. Ensure audio files use these formats.

CI machines without audio hardware
All examples work headlessly—no speakers or microphone required.

💸 Cost tip: Each voice request bills as a standard GPT-4 audio-preview call. Keep fixture lengths and turn counts reasonable.

Related Resources