perf: halve avatar fps and skip TTS resample on event loop by aliev · Pull Request #30 · GetStream/crashout-buddy

aliev · 2026-05-15T07:45:28Z

Two further per-call CPU wins, found by walking the `AnamAvatarPublisher` send path against the running venv.

## 1. Avatar VP8 encode rate: 30fps → 15fps

`AnamAvatarPublisher` wraps an `AVSynchronizer` whose `video_track` inherits from `aiortc`'s `VideoStreamTrack`. The track's `_framerate` attribute drives `next_timestamp()`, which paces `recv()` and so determines how often the next frame is VP8-encoded.

At 30fps × 7 concurrent sessions on one pod, that is ~210 encodes/sec in the executor thread pool. py-spy dumps during the previous load test showed 5 worker threads simultaneously inside `encode` (`aiortc/codecs/vpx.py:240`). At 15fps the encode count drops to ~105/sec, so about half of that work goes away.
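The arithmetic behind those numbers, as a small sketch. The function name `encodes_per_second` is made up for illustration, and it assumes one encode per frame per session:

```python
def encodes_per_second(fps: int, sessions: int) -> int:
    """Total VP8 encodes per second on one pod (one encode per frame per session)."""
    return fps * sessions


before = encodes_per_second(fps=30, sessions=7)  # 210
after = encodes_per_second(fps=15, sessions=7)   # 105
```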

Patching `_framerate` directly on the instance is brittle (private attribute, no public setter in 0.5.8), but it's a one-line override and the upstream "subclass or wrap" contract would require copying significantly more state to achieve the same thing. 15fps is imperceptible for a talking-head avatar.
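A minimal sketch of a guarded override, assuming only what is stated above: the track keeps its rate in a private `_framerate` attribute. `StandInVideoTrack`, `next_pts`, and `override_framerate` are hypothetical names for illustration, not the real publisher or track classes, and this is not the code in the PR. The 90 kHz value is the standard RTP clock rate for video. The guard makes the patch fail loudly if an upstream release renames the attribute, instead of silently setting a new attribute that nothing reads.

```python
VIDEO_CLOCK_RATE = 90_000  # standard RTP clock rate for video, in Hz


class StandInVideoTrack:
    """Hypothetical stand-in for the avatar's video track (not the real class)."""

    def __init__(self) -> None:
        self._framerate = 30  # private attribute, mirroring the description above
        self._pts = None

    def next_pts(self) -> int:
        """Advance the presentation timestamp by one frame interval."""
        step = VIDEO_CLOCK_RATE // self._framerate
        self._pts = 0 if self._pts is None else self._pts + step
        return self._pts


def override_framerate(track, fps: int) -> None:
    """Patch the private frame rate, refusing to guess if the attribute is gone."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not hasattr(track, "_framerate"):
        raise AttributeError("track has no _framerate; upstream layout changed?")
    track._framerate = fps


track = StandInVideoTrack()
override_framerate(track, 15)
timestamps = [track.next_pts() for _ in range(3)]  # 6000 clock ticks apart at 15fps
```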

## 2. Inworld TTS native 24kHz output

`anam.AnamAvatarPublisher._send_audio` calls `pcm.resample(target_sample_rate=24000, target_channels=1)` for every TTS chunk before pushing it to Anam's audio input stream. The `resample` call is **synchronous** (no thread offload) and runs on the asyncio event loop. With the default Inworld TTS output of 16kHz, every chunk triggers numpy linear interpolation inline.

`getstream`'s `PcmData.resample` does an early return when source sample rate and channels already match the target. Setting `inworld.TTS(..., sample_rate=24000)` makes the TTS itself emit at 24kHz, so the resample call is now a no-op and the event loop is spared the work.
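To illustrate the early-return behaviour described above, here is a minimal stand-in. `StandInPcm` is hypothetical and written in pure Python so it runs without dependencies. It is not `getstream`'s `PcmData`, it handles mono only, and its interpolation is a simplified version of what a numpy-based implementation would do. The point is only the shape of the fast path.

```python
from dataclasses import dataclass


@dataclass
class StandInPcm:
    """Hypothetical mono PCM buffer; not getstream's PcmData."""

    samples: list
    sample_rate: int

    def resample(self, target_sample_rate: int) -> "StandInPcm":
        # Fast path: rates already match, so return self without touching samples.
        if self.sample_rate == target_sample_rate:
            return self
        # Slow path: linear interpolation, O(n) work on the calling thread.
        n_in = len(self.samples)
        n_out = n_in * target_sample_rate // self.sample_rate
        if n_in < 2 or n_out < 2:
            return StandInPcm(list(self.samples), target_sample_rate)
        out = []
        for i in range(n_out):
            pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the input
            lo = int(pos)
            hi = min(lo + 1, n_in - 1)
            frac = pos - lo
            out.append(self.samples[lo] * (1 - frac) + self.samples[hi] * frac)
        return StandInPcm(out, target_sample_rate)


chunk_16k = StandInPcm([0.0, 1.0, 2.0, 3.0], sample_rate=16000)
chunk_24k = StandInPcm([0.0, 1.0, 2.0, 3.0], sample_rate=24000)

converted = chunk_16k.resample(24000)  # does the interpolation work
untouched = chunk_24k.resample(24000)  # early return, same object back
```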

This was the exact shape we had been looking for: synchronous numpy work on the loop that scales with the number of concurrent TTS streams. It is invisible to profilers as a Python-level hot spot because numpy releases the GIL during the call, yet the call still blocks the event loop until it returns.
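The effect can be reproduced with the standard library alone. In this sketch `time.sleep` stands in for the synchronous resample, and `ticks_during` is a hypothetical helper that counts how often another coroutine gets to run while the work is in progress. `asyncio.to_thread` is shown only for contrast; this PR removes the work rather than offloading it.

```python
import asyncio
import time


async def ticks_during(work) -> int:
    """Count how many times a background ticker runs while `work` executes."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # let the ticker start before measuring
    start = ticks
    await work()
    seen = ticks - start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return seen


async def on_loop() -> None:
    time.sleep(0.2)  # synchronous: the loop cannot schedule anything else


async def off_loop() -> None:
    await asyncio.to_thread(time.sleep, 0.2)  # the loop stays free meanwhile


blocked = asyncio.run(ticks_during(on_loop))
free = asyncio.run(ticks_during(off_loop))
```

`blocked` comes out as 0 because the ticker never gets a turn while the synchronous call runs, whereas `free` is positive because the loop keeps scheduling it.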

## What this PR does not address

Per-pod single-event-loop saturation when one pod ends up with 6+ sessions due to load-balancer skew. The likely next step is more replicas, either with smaller per-pod requests or on a bigger node.



aliev marked this pull request as ready for review May 15, 2026 07:45

aliev merged commit 011847a into main May 15, 2026
1 of 3 checks passed

aliev deleted the perf/lower-avatar-fps-and-tts-native-rate branch May 15, 2026 07:48
