Realtime transcription endpoint #706

@AlonKellner-RedHat

Description

Problem Statement

vLLM supports realtime transcription over WebSockets:

https://developers.openai.com/api/docs/guides/speech-to-text#streaming-the-transcription-of-an-ongoing-audio-recording
https://developers.openai.com/api/docs/guides/realtime-transcription
https://developers.openai.com/api/docs/guides/realtime?use-case=transcription#connect-with-websockets
https://docs.vllm.ai/en/latest/serving/openai_compatible_server/#realtime-api
https://docs.vllm.ai/en/latest/models/supported_models/#realtime-transcription

The current audio benchmarking in guidellm is insufficient for this new kind of realtime audio model. As voice agents gain popularity, this kind of realtime communication is becoming the standard, and guidellm is being left behind.

Proposed Solution

Can we implement realtime endpoint support for audio models?
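
To make the request concrete, here is a minimal sketch of the client-side message stream a realtime request would need to produce. It assumes the event names from the OpenAI Realtime API (`input_audio_buffer.append` with base64-encoded audio, followed by `input_audio_buffer.commit`), 16 kHz 16-bit mono PCM input, and a 100 ms chunk size. The helper names are hypothetical and are not part of guidellm or vLLM. A real client would also need to open the WebSocket, send whatever session setup messages the server expects, and read the transcription events back, none of which is shown here.

```python
import base64
import json
from typing import Iterator

# Assumptions for illustration only; the server may expect a different format.
SAMPLE_RATE_HZ = 16_000   # assumed input sample rate
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
CHUNK_MS = 100            # hypothetical chunk duration


def chunk_pcm16(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split raw PCM16 audio into fixed-duration chunks (the last may be shorter)."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def build_realtime_events(pcm: bytes) -> list[str]:
    """Return the JSON messages to send, in order, for one audio sample."""
    events = [
        json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
        for chunk in chunk_pcm16(pcm)
    ]
    # Signal that no more audio follows for this buffer.
    events.append(json.dumps({"type": "input_audio_buffer.commit"}))
    return events


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    events = build_realtime_events(one_second_of_silence)
    print(len(events))  # 10 append events + 1 commit = 11
```

For benchmarking, the interesting part is that one dataset sample becomes many timed messages rather than a single HTTP request, so per-chunk send times and per-delta receive times would have to be recorded separately.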

Alternatives Considered

Audio models come in two flavors: realtime and synchronous. This means we can't just use the synchronous mode for realtime models; supporting realtime models requires supporting the realtime endpoint.

Usage Examples

python3 -m vllm.entrypoints.openai.api_server \
  --model mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode":"PIECEWISE"}' \
  --tensor-parallel-size 1 \
  --max-model-len 45000 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 16 \
  --gpu-memory-utilization 0.90 \
  --host 0.0.0.0 --port 8000


guidellm benchmark \
  --target http://localhost:8000/v1 \
  --request-type audio_transcriptions_realtime \
  --data /workspace/custom-audio-dataset/hf_dataset \
  --profile synchronous \
  --max-requests 10 \
  --output-dir /workspace/repo/runs/2026-04-23T13-41-41 \
  --outputs json,html,csv

Additional Context

No response
