perf: disable getstream NetworkMonitor and lower face-detector fps #29

Merged
aliev merged 1 commit into main from perf/network-monitor-and-lower-mediapipe-fps on May 15, 2026
@aliev aliev commented May 15, 2026

Two independent per-call CPU savings found during the avatar-throttling investigation.

1. Disable getstream's NetworkMonitor ping-loop

getstream's NetworkMonitor pings 8.8.8.8 / 1.1.1.1 / 208.67.222.222 every second per ConnectionManager via asyncio's loop.run_in_executor -> ping3.ping(...). On a GKE pod (no CAP_NET_RAW) ICMP raw sockets aren't permitted, so every ping fails immediately with PermissionError, but the executor still pays the thread-pool round-trip for each one. At 9 concurrent sessions (3 targets × 9 sessions, each pinged once per second) that's ~27 no-op executor crossings per second.
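The ~27 figure follows directly from the numbers in the paragraph above. A minimal sketch of the arithmetic, assuming one ConnectionManager per session (the function and constant names are illustrative, not taken from getstream):

```python
# Ping targets and interval as described in this PR.
PING_TARGETS = ("8.8.8.8", "1.1.1.1", "208.67.222.222")
PING_INTERVAL_S = 1.0


def executor_crossings_per_second(sessions, managers_per_session=1):
    """Estimate thread-pool round-trips per second caused by the ping loop.

    Assumes (hypothetically) one ConnectionManager per session and one
    executor call per target per interval.
    """
    managers = sessions * managers_per_session
    return managers * len(PING_TARGETS) / PING_INTERVAL_S


crossings = executor_crossings_per_second(9)
print(crossings)
```

The estimate scales linearly with session count, so the overhead grows with pod load even though every ping fails.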

py-spy profiling on the previous c2-standard-4 setup showed the _ping_loop + ping3.receive_one_ping chain consuming ~30% of total samples. With the WS auto-reconnect path now fixed in getstream>=3.3.3, the active ICMP probing is redundant for a server-side agent: connection failures surface through the WebRTC stack itself.

Replace NetworkMonitor.start_monitoring with a no-op coroutine at import time. The monitor object is still constructed (so anything that references it doesn't crash); it just never starts the polling task.

2. MediaPipeFaceProcessor fps 8.0 → 2.0

The face landmarker was running at 8 fps. Profile dumps showed one executor thread holding the GIL inside MediaPipe detect whenever a frame was processed, and the cumulative ~2% of total samples in dispatch_and_free adds up across pods.

Dropping to 2 fps cuts face-detector CPU and GIL-hold time by ~4x. The downstream effects (gaze/engagement state, proactive-nudge timing, re-engagement cues) operate on a multi-second window already, so the slower update cadence is well within tolerance for this use case.
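To illustrate where the ~4x comes from, here is a minimal sketch of a timestamp-based frame throttle. `FrameThrottle` and `count_processed` are hypothetical names, not the `MediaPipeFaceProcessor` implementation, and the 25 ms input frame spacing is an assumption chosen for clean arithmetic.

```python
class FrameThrottle:
    """Let a frame through only if enough time has passed since the last one."""

    def __init__(self, fps):
        self.min_interval_ms = 1000.0 / fps
        self._last_ms = None

    def should_process(self, now_ms):
        if self._last_ms is None or now_ms - self._last_ms >= self.min_interval_ms:
            self._last_ms = now_ms
            return True
        return False


def count_processed(fps, duration_ms=10_000, frame_step_ms=25):
    """Count frames that reach the detector over a synthetic stream."""
    throttle = FrameThrottle(fps)
    return sum(
        throttle.should_process(t) for t in range(0, duration_ms, frame_step_ms)
    )


at_8_fps = count_processed(8.0)
at_2_fps = count_processed(2.0)
print(at_8_fps, at_2_fps, at_8_fps / at_2_fps)
```

If per-frame detection cost is roughly constant, the number of detector invocations (and therefore GIL-hold time) drops in proportion to the frame rate.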

Expected effect

  • Per-call CPU should drop visibly when measured on the pod (kubectl top pod).
  • The ping-loop stack frame should disappear from py-spy captures.
  • No functional regression on gaze-aware proactive behaviours.

Risks / things to verify after deploy

  • Confirm the NetworkMonitor no-op import path runs before any ConnectionManager is constructed (it's at module level in main.py, so it should).
  • Check by eye that the proactive nudge / re-engagement cue still feels natural with face state updating at 2 fps.

@aliev aliev marked this pull request as ready for review May 15, 2026 07:06
@aliev aliev merged commit dad4ea8 into main May 15, 2026
2 of 3 checks passed
@aliev aliev deleted the perf/network-monitor-and-lower-mediapipe-fps branch May 15, 2026 07:07