perf: disable getstream NetworkMonitor and lower face-detector fps #29
Merged
Two independent per-call CPU savings found during the avatar-throttling investigation.

NetworkMonitor monkey-patch
---------------------------

`getstream`'s `NetworkMonitor` pings 8.8.8.8 / 1.1.1.1 / 208.67.222.222 every second per `ConnectionManager` via asyncio's `run_in_executor` -> `ping3.ping(...)`. On a GKE pod (no `CAP_NET_RAW`) ICMP raw sockets aren't permitted, so every ping fails immediately with `PermissionError`, but the executor still pays the thread-pool round-trip for each one. At 9 concurrent sessions that's ~27 no-op executor crossings per second (3 targets × 9 sessions).

py-spy profiling on the previous c2-standard-4 setup showed the `_ping_loop` + `ping3.receive_one_ping` chain consuming ~30% of total samples. With the WS auto-reconnect path now fixed in `getstream>=3.3.3`, the active ICMP probing is redundant for a server-side agent: connection failures surface through the WebRTC stack itself.

Replace `NetworkMonitor.start_monitoring` with a no-op coroutine at import time. The monitor object is still constructed (so anything that references it doesn't crash); it just never starts the polling task.

MediaPipe fps 8.0 -> 2.0
------------------------

`MediaPipeFaceProcessor` was running the face landmarker model at 8 fps. Profile dumps showed one executor thread holding the GIL inside MediaPipe `detect` whenever a frame was processed, and the cumulative ~2% of total samples in `dispatch_and_free` adds up across pods.

Dropping to 2 fps cuts face-detector CPU and GIL-hold time by ~4x. The downstream effects (gaze/engagement state, proactive-nudge timing, re-engagement cues) operate on a multi-second window already, so the slower update cadence is well within tolerance for our use case.
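To illustrate the ~4x figure for the 8 -> 2 fps change, here is a generic timestamp-based frame throttle. It is only a sketch under assumptions: `FrameThrottle` and `count_processed` are hypothetical names, the 24 fps source rate is made up for the example, and this description does not say how `MediaPipeFaceProcessor` enforces its fps internally.

```python
from typing import Optional


class FrameThrottle:
    """Admit at most `fps` frames per second, based on frame timestamps."""

    def __init__(self, fps: float) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.min_interval = 1.0 / fps
        self._last: Optional[float] = None

    def should_process(self, timestamp: float) -> bool:
        # Skip the frame if the previous admitted frame is too recent.
        if self._last is not None and timestamp - self._last < self.min_interval:
            return False
        self._last = timestamp
        return True


def count_processed(fps: float, source_fps: int = 24, seconds: int = 10) -> int:
    """Count how many frames of a simulated stream reach the detector."""
    throttle = FrameThrottle(fps)
    total_frames = source_fps * seconds
    return sum(throttle.should_process(i / source_fps) for i in range(total_frames))


frames_at_8 = count_processed(8.0)
frames_at_2 = count_processed(2.0)
```

Over the same simulated stream, the 2 fps throttle admits a quarter as many frames as the 8 fps one, which is where the proportional drop in detector invocations comes from.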
Expected effect
---------------

… `kubectl top pod`).

Risks / things to verify after deploy
-------------------------------------

The `NetworkMonitor` no-op import path runs before any `ConnectionManager` is constructed (it's at module level in `main.py`, so it should).
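One way to check the ordering concern above at startup is to tag the replacement coroutine and assert on the tag before the first `ConnectionManager` is built. The sketch below assumes a hypothetical stand-in `NetworkMonitor` class and a made-up `is_noop_patch` marker attribute; neither is part of `getstream`.

```python
class NetworkMonitor:
    """Hypothetical stand-in; the real class lives in getstream."""

    async def start_monitoring(self) -> None:
        raise RuntimeError("would start the ping loop")


async def _noop_start_monitoring(self, *args, **kwargs) -> None:
    return None


# Tag the replacement so a later check can recognise it.
_noop_start_monitoring.is_noop_patch = True


def monitor_is_patched(cls: type) -> bool:
    """Return True if cls.start_monitoring is the tagged no-op."""
    method = getattr(cls, "start_monitoring", None)
    return bool(getattr(method, "is_noop_patch", False))


before = monitor_is_patched(NetworkMonitor)
NetworkMonitor.start_monitoring = _noop_start_monitoring
after = monitor_is_patched(NetworkMonitor)
```

Calling `assert monitor_is_patched(NetworkMonitor)` just before the first connection is created would fail fast if an import reorder ever moved the patch too late.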