Skip to content

Multimodal Use Cases – Overview

Many modern agents must process more than just text. Scenario supports tests where your agent receives images, files, audio, and other modalities – individually or combined.

This section shows how to structure such tests, common pitfalls, and best-practices for reliable evaluation.

Available Guides

  • Voice Agents – testing agents that listen to audio, think, and respond with either text or audio.
  • Images – testing agents that process multiple images along with user messages.
  • Filescoming soon (PDF, CSV, etc.).

Next Steps

Pick a modality from the list above and follow the dedicated guide.