Multimodal Use Cases – Overview
Many modern agents must process more than just text. Scenario supports tests where your agent receives images, files, audio, and other modalities – individually or combined.
This section shows how to structure such tests, common pitfalls, and best-practices for reliable evaluation.
Available Guides
- Voice Agents – testing agents that listen to audio, think, and respond with either text or audio.
- Images – testing agents that process multiple images along with user messages.
- Files – coming soon (PDF, CSV, etc.).
Next Steps
Pick a modality from the list above and follow the dedicated guide.