
perf: halve avatar fps and skip TTS resample on event loop #30

Merged: aliev merged 1 commit into main from perf/lower-avatar-fps-and-tts-native-rate on May 15, 2026
@aliev aliev commented May 15, 2026

Two further per-call CPU wins found by walking the AnamAvatarPublisher send path against the running venv.

1. Avatar VP8 encode rate: 30fps → 15fps

AnamAvatarPublisher wraps an AVSynchronizer whose video_track inherits from aiortc's VideoStreamTrack. The track's _framerate attribute drives next_timestamp(), which is what wakes up recv() and triggers VP8 encode of the next frame.

At 30fps × 7 concurrent sessions on one pod, that's ~210 encodes/sec in the executor thread pool. py-spy dumps during the previous load test showed 5 worker threads simultaneously inside encode (aiortc/codecs/vpx.py:240). At 15fps the same 7 sessions produce ~105 encodes/sec, so about half of that encode work goes away.

Patching _framerate directly on the instance is brittle (private attribute, no public setter in 0.5.8), but it's a one-line override and the upstream "subclass or wrap" contract would require copying significantly more state to achieve the same thing. 15fps is imperceptible for a talking-head avatar.
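
A minimal sketch of the instance-level override and the load arithmetic above. `PacedVideoTrack` is a hypothetical stand-in, not the real track class: it only assumes, as described above, that pacing is derived from a private `_framerate` attribute. The real `next_timestamp()` is more involved.

```python
VIDEO_CLOCK_RATE = 90_000  # 90 kHz RTP clock conventionally used for video


class PacedVideoTrack:
    """Hypothetical stand-in for a track whose pacing reads `_framerate`."""

    def __init__(self) -> None:
        self._framerate = 30  # private default, no public setter
        self._pts = -1

    def next_timestamp(self) -> tuple[int, float]:
        """Return (pts, seconds until the next frame is due)."""
        step = VIDEO_CLOCK_RATE // self._framerate
        self._pts = 0 if self._pts < 0 else self._pts + step
        return self._pts, 1 / self._framerate


def encodes_per_second(fps: int, sessions: int) -> int:
    """One encode per frame per session."""
    return fps * sessions


track = PacedVideoTrack()
track._framerate = 15  # instance-level override of the private attribute
```

Because the override touches only the instance, other tracks keep their default rate. The cost is that it breaks silently if the attribute is ever renamed upstream.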

2. Inworld TTS native 24kHz output

anam.AnamAvatarPublisher._send_audio calls pcm.resample(target_sample_rate=24000, target_channels=1) for every TTS chunk before pushing it to Anam's audio input stream. The resample call is synchronous (no thread offload) and runs on the asyncio event loop. With the default Inworld TTS output of 16kHz, every chunk triggers numpy linear interpolation inline.

getstream's PcmData.resample does an early return when source sample rate and channels already match the target. Setting inworld.TTS(..., sample_rate=24000) makes the TTS itself emit at 24kHz, so the resample call is now a no-op and the event loop is spared the work.
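
To illustrate why matching rates make the call free, here is a rough sketch of a resampler with the same early-return shape. `resample_mono` is a hypothetical helper, not getstream's `PcmData.resample`, and it uses plain-Python linear interpolation in place of numpy so that it runs without dependencies.

```python
def resample_mono(samples: list, src_rate: int, dst_rate: int) -> list:
    """Linear-interpolation resample of a mono chunk (illustrative only)."""
    if src_rate == dst_rate:
        return samples  # early return: nothing to compute, same object back

    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional index into the source
        lo = min(int(pos), last)
        hi = min(lo + 1, last)          # clamp at the end of the chunk
        frac = pos - lo
        out.append(samples[lo] + (samples[hi] - samples[lo]) * frac)
    return out


chunk_16k = list(range(160))                            # 10 ms at 16 kHz
upsampled = resample_mono(chunk_16k, 16_000, 24_000)    # real work per chunk
unchanged = resample_mono(chunk_16k, 24_000, 24_000)    # no-op path
```

With a 16kHz source, every chunk walks the interpolation loop. With a 24kHz source, the function returns immediately.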

This was the exact shape we'd been looking for: synchronous numpy work on the loop that scales with the number of concurrent TTS streams. It is invisible to profilers as a Python-level hot spot because numpy releases the GIL during the call, yet the call still blocks the coroutine running on the loop.
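
The effect can be reproduced in isolation. The sketch below assumes `time.sleep` is a fair stand-in for a GIL-releasing native call. It measures the longest pause seen by a 10 ms heartbeat while that call runs inline on the loop, then again when offloaded with `asyncio.to_thread`. All function names are illustrative.

```python
import asyncio
import time


def blocking_work(seconds: float = 0.2) -> None:
    time.sleep(seconds)  # releases the GIL, but still blocks its caller


async def max_loop_gap(run_work) -> float:
    """Longest pause observed by a 10 ms heartbeat while `run_work` runs."""
    gaps = []

    async def heartbeat() -> None:
        last = time.perf_counter()
        while True:
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.03)   # let the heartbeat tick a few times
    await run_work()
    await asyncio.sleep(0.03)   # let it observe the aftermath
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(gaps)


async def inline() -> None:
    blocking_work()                           # runs on the event loop


async def offloaded() -> None:
    await asyncio.to_thread(blocking_work)    # runs in a worker thread


def measure() -> tuple[float, float]:
    return (asyncio.run(max_loop_gap(inline)),
            asyncio.run(max_loop_gap(offloaded)))


if __name__ == "__main__":
    inline_gap, offloaded_gap = measure()
    print(f"inline: {inline_gap:.3f}s  offloaded: {offloaded_gap:.3f}s")
```

Inline, the heartbeat stalls for the full duration of the call. Offloaded, it keeps ticking. This PR avoids the work entirely rather than offloading it, which is cheaper still.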

What this PR does not address

This PR does not address per-pod saturation of the single event loop when one pod ends up with 6+ sessions due to load-balancer skew. The likely next step is more replicas, either with smaller per-pod requests or on a bigger node.


@aliev aliev marked this pull request as ready for review May 15, 2026 07:45
@aliev aliev merged commit 011847a into main May 15, 2026
1 of 3 checks passed
@aliev aliev deleted the perf/lower-avatar-fps-and-tts-native-rate branch May 15, 2026 07:48