
fix: re-trigger deferred context frame push on UserStoppedSpeakingFrame #4367

Open

elliottventures wants to merge 2 commits into pipecat-ai:main from elliottventures:fix/llm-assistant-aggregator-user-stopped-rerun

Conversation

@elliottventures

What

When LLMAssistantAggregator._handle_function_call_result runs while self._user_speaking is True, the outer conditional currently drops the context push silently. The bot-speaking branch, by contrast, sets _push_context_on_bot_stopped_speaking = True and re-triggers on BotStoppedSpeakingFrame. This PR mirrors that pattern on the user side.

Why

A short user trigger utterance whose transcription-driven turn-start races the function-call result hits this window:

  1. [Transcription:user] [Next.] arrives
  2. _run_function_call fires (driven by the same transcription)
  3. _on_user_turn_started sets self._user_speaking = True
  4. FunctionCallResultFrame arrives at the aggregator
  5. Outer if run_llm and not self._user_speaking: evaluates False → push silently dropped
  6. _on_user_turn_stopped fires a few ms later, unsetting _user_speaking; nothing re-evaluates the dropped push

Result: the tool response never reaches the LLM service. For Gemini Live specifically, this manifests as the model sitting silent waiting for a response to its function call, until the user speaks again and the interruption path kicks in. TTFB on affected turns can reach minutes.

The bot-speaking branch has had this covered via _push_context_on_bot_stopped_speaking + re-trigger in BotStoppedSpeakingFrame for a while. The user-side was missing the equivalent.
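To make the window concrete, here is a minimal standalone sketch of the race (simplified stand-in class and method names, not the actual pipecat implementation):

```python
# Standalone sketch: a simplified aggregator showing how the guard silently
# drops the context push when the user-speaking flag is set mid-race.
# Class and method names here are illustrative, not pipecat's real API.

class SketchAggregator:
    def __init__(self):
        self._user_speaking = False
        self.pushed = False  # stands in for the context frame reaching the LLM

    def on_user_turn_started(self):
        self._user_speaking = True

    def on_user_turn_stopped(self):
        # Nothing here re-checks whether a push was dropped earlier.
        self._user_speaking = False

    def handle_function_call_result(self, run_llm=True):
        # The guard as described above: user speaking -> push dropped silently.
        if run_llm and not self._user_speaking:
            self.pushed = True


agg = SketchAggregator()
agg.on_user_turn_started()         # step 3: transcription marks user speaking
agg.handle_function_call_result()  # step 5: guard evaluates False, push dropped
agg.on_user_turn_stopped()         # step 6: flag cleared, push never re-run
print(agg.pushed)                  # False: the LLM never sees the tool result
```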

The fix

Four small edits to LLMAssistantAggregator, all mirroring the existing bot-side pattern:

  1. New flag _push_context_on_user_stopped_speaking in __init__
  2. Reset it in reset() and push_context_frame() alongside the bot flag
  3. Outer condition at the run_llm check changed from if run_llm and not self._user_speaking: to if run_llm:, with a new elif self._user_speaking: branch inside that sets the flag (before the existing elif self._bot_speaking:)
  4. process_frame's UserStoppedSpeakingFrame handler flushes the deferred push if the flag is set and the bot isn't speaking

Total +18 / -1 lines.
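The four edits above can be sketched end to end with the same simplified stand-in class (illustrative only, not the actual pipecat code):

```python
# Standalone sketch of the deferred-push pattern described above.
# Simplified stand-in class; names mirror the PR's flags but the real
# implementation lives in LLMAssistantAggregator.

class SketchAggregator:
    def __init__(self):
        self._user_speaking = False
        self._bot_speaking = False
        self._push_context_on_user_stopped_speaking = False  # edit 1: new flag
        self.pushed = False

    def push_context_frame(self):
        self._push_context_on_user_stopped_speaking = False  # edit 2: reset flag
        self.pushed = True

    def handle_function_call_result(self, run_llm=True):
        if run_llm:                                          # edit 3: outer check
            if self._user_speaking:
                # Defer instead of dropping silently.
                self._push_context_on_user_stopped_speaking = True
            elif self._bot_speaking:
                pass  # existing bot-side deferral (unchanged)
            else:
                self.push_context_frame()

    def on_user_stopped_speaking(self):
        self._user_speaking = False
        # edit 4: flush the deferred push once the user stops speaking,
        # as long as the bot isn't mid-utterance.
        if self._push_context_on_user_stopped_speaking and not self._bot_speaking:
            self.push_context_frame()


agg = SketchAggregator()
agg._user_speaking = True
agg.handle_function_call_result()  # deferred, not dropped
agg.on_user_stopped_speaking()     # re-triggered here
print(agg.pushed)                  # True: the tool result reaches the LLM
```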

Testing

  • Existing tests/test_context_aggregators_universal.py — 37/37 pass pre- and post-change.
  • Verified against the same production voice agent referenced in #4366 (fix: send realtime input after tool_result so Gemini 3.x runs inference). Before this fix (applied locally via a subclass monkeypatch): every compound user utterance (e.g., "Freezer is on top. Next.") stalled silently. After this fix: clean turn completion across 20+ test turns.

Happy to add a targeted unit test for the race if a maintainer has a preferred shape; it needs a bit of plumbing to simulate the specific UserStartedSpeakingFrame → FunctionCallResultFrame → UserStoppedSpeakingFrame ordering.

Related

Companion to #4366. Both surfaced during the same Gemini Live function-calling diagnosis.

@elliottventures
Author

@markbackman Wanted to make sure you guys saw this one as well

