Problem Statement
vLLM supports realtime transcription using WebSockets:
https://developers.openai.com/api/docs/guides/speech-to-text#streaming-the-transcription-of-an-ongoing-audio-recording
https://developers.openai.com/api/docs/guides/realtime-transcription
https://developers.openai.com/api/docs/guides/realtime?use-case=transcription#connect-with-websockets
https://docs.vllm.ai/en/latest/serving/openai_compatible_server/#realtime-api
https://docs.vllm.ai/en/latest/models/supported_models/#realtime-transcription
The current audio benchmarking support is insufficient for these new realtime audio models. As voice agents gain popularity, this kind of realtime communication is becoming the standard, and guidellm risks being left behind.
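For reference, a realtime transcription session streams audio over a WebSocket and receives transcript deltas back as events. Below is a minimal client sketch in Python, assuming an OpenAI-Realtime-style endpoint at /v1/realtime; the exact path, event names, and session fields mirror the OpenAI guides linked above and may differ in vLLM.

# Minimal sketch of a realtime transcription client over WebSockets.
# Endpoint path, event names, and session fields are assumptions based on
# the OpenAI Realtime transcription guide; vLLM's API may differ.
import asyncio
import base64
import json

import websockets


async def stream_transcription(pcm_chunks, url="ws://localhost:8000/v1/realtime"):
    async with websockets.connect(url) as ws:
        # Configure a transcription session (model name is a placeholder;
        # a vLLM deployment would likely expect the served model name).
        await ws.send(json.dumps({
            "type": "transcription_session.update",
            "session": {"input_audio_transcription": {"model": "whisper-1"}},
        }))
        # Stream raw audio chunks as they are "recorded".
        for chunk in pcm_chunks:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(chunk).decode(),
            }))
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
        # Print transcript deltas as the server emits them.
        async for message in ws:
            event = json.loads(message)
            if event.get("type", "").endswith("transcription.delta"):
                print(event.get("delta", ""), end="", flush=True)
            elif event.get("type", "").endswith("transcription.completed"):
                break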
Proposed Solution
Can we implement realtime (WebSocket) endpoint support for audio models in guidellm?
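The main new work is collecting metrics per WebSocket session rather than per HTTP request. Purely as an illustration (none of these names are existing guidellm APIs), the per-request stats a realtime backend would need to track look roughly like:

# Hypothetical sketch of per-session metrics for a realtime benchmark;
# field and method names are illustrative, not existing guidellm code.
import time
from dataclasses import dataclass, field


@dataclass
class RealtimeRequestStats:
    start: float = field(default_factory=time.perf_counter)
    first_delta: float | None = None  # analogous to time-to-first-token
    deltas: int = 0
    end: float | None = None

    def on_delta(self):
        now = time.perf_counter()
        if self.first_delta is None:
            self.first_delta = now
        self.deltas += 1

    def on_complete(self):
        self.end = time.perf_counter()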
Alternatives Considered
Audio models come in two flavors: realtime and synchronous. This means we can't simply benchmark realtime models through the existing synchronous mode; supporting realtime models requires supporting the realtime endpoint.
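For contrast, the existing synchronous flow is a single HTTP POST carrying the whole audio file, answered with one full transcript, e.g. against vLLM's OpenAI-compatible /v1/audio/transcriptions route (sketch below; file name and model are placeholders):

# Synchronous transcription for comparison: one request, one response.
import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": f},
        data={"model": "mistralai/Voxtral-Mini-4B-Realtime-2602"},
    )
print(resp.json()["text"])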
Usage Examples
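Start the vLLM OpenAI-compatible server with a realtime-capable model, here Voxtral Mini Realtime: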
python3 -m vllm.entrypoints.openai.api_server \
--model mistralai/Voxtral-Mini-4B-Realtime-2602 \
--tokenizer-mode mistral \
--config-format mistral \
--load-format mistral \
--trust-remote-code \
--compilation-config '{"cudagraph_mode":"PIECEWISE"}' \
--tensor-parallel-size 1 \
--max-model-len 45000 \
--max-num-batched-tokens 8192 \
--max-num-seqs 16 \
--gpu-memory-utilization 0.90 \
--host 0.0.0.0 --port 8000
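Then run the proposed benchmark; --request-type audio_transcriptions_realtime is the new request type this issue asks for: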
guidellm benchmark \
--target http://localhost:8000/v1 \
--request-type audio_transcriptions_realtime \
--data /workspace/custom-audio-dataset/hf_dataset \
--profile synchronous \
--max-requests 10 \
--output-dir /workspace/repo/runs/2026-04-23T13-41-41 \
--outputs json,html,csv
Additional Context
No response