Fix/aws realtime tooluse barge in #3704

kachenjr · 2025-10-22T20:09:49Z

Fix Nova Sonic audio routing, tool use, and barge-in handling

This commit fixes three critical bugs in the Nova Sonic realtime plugin that
prevent it from working correctly in production scenarios.

Issues Fixed

1. Audio Routing After Tool Calls

Problem: Audio frames not playing after tool execution
Root Cause: Audio routed to wrong generation after tool calls complete
Solution: Create new generation per ASSISTANT SPECULATIVE text event
Impact: Audio now plays correctly for each assistant response

Nova Sonic sends ASSISTANT SPECULATIVE text events to signal new assistant
turns, including after tool calls. Each turn needs its own generation to
ensure audio frames route to the correct audio channel.
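
The routing rule described above can be illustrated with a minimal sketch. It is not the plugin's code: `Generation`, `TurnRouter`, and the `role`/`stage` parameters are hypothetical names chosen for the example, and the only behavior taken from this description is that an ASSISTANT SPECULATIVE text event opens a new generation that receives subsequent audio.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Generation:
    """Hypothetical stand-in for the plugin's per-turn generation object."""

    audio_frames: list = field(default_factory=list)
    closed: bool = False


class TurnRouter:
    """Routes audio to the generation opened by the latest assistant turn."""

    def __init__(self) -> None:
        self.current: Optional[Generation] = None
        self.history: list = []

    def on_text_event(self, role: str, stage: str) -> None:
        # An ASSISTANT SPECULATIVE text event marks a new assistant turn,
        # so close the previous generation (if any) and open a fresh one.
        if role == "ASSISTANT" and stage == "SPECULATIVE":
            if self.current is not None:
                self.current.closed = True
            self.current = Generation()
            self.history.append(self.current)

    def on_audio(self, frame: bytes) -> bool:
        # Audio with no open generation is dropped rather than misrouted.
        if self.current is None or self.current.closed:
            return False
        self.current.audio_frames.append(frame)
        return True
```

With this shape, audio that follows a tool call lands in the generation opened by the post-tool SPECULATIVE event instead of the one that emitted the tool call.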

2. Tool Use Across Multiple Turns

Problem: Tool calls fail or behave incorrectly across multiple turns
Root Cause: Generation not closed after tool call, preventing framework
from delivering tool results via update_chat_ctx()
Solution: Close generation immediately after emitting tool call
Impact: Tool use now works reliably across multiple conversation turns

The LiveKit framework expects the generation to close so it can call
update_chat_ctx() with tool results. A new generation is created when
Nova Sonic sends the next ASSISTANT SPECULATIVE event with the response.
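
A rough sketch of that lifecycle follows, assuming simplified stand-ins for the real objects. `Session`, `Generation`, `on_tool_use`, and `on_assistant_speculative` are hypothetical names; `update_chat_ctx` here only imitates the framework callback named above and does not reproduce its real signature.

```python
from typing import Optional


class Generation:
    """Hypothetical stand-in for one assistant turn."""

    def __init__(self) -> None:
        self.function_calls: list = []
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Session:
    def __init__(self) -> None:
        self.current: Optional[Generation] = None
        self.chat_ctx: list = []

    def on_tool_use(self, name: str, arguments: str) -> Optional[Generation]:
        gen = self.current
        if gen is None:  # e.g. a barge-in already cleared the turn
            return None
        gen.function_calls.append((name, arguments))
        # Close right away so the framework can run the tool and report back.
        gen.close()
        self.current = None
        return gen

    def update_chat_ctx(self, tool_result: str) -> None:
        # Stand-in for the framework callback that delivers tool results.
        self.chat_ctx.append(tool_result)

    def on_assistant_speculative(self) -> Generation:
        # The spoken answer to the tool result gets a fresh generation.
        self.current = Generation()
        return self.current
```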

3. Crashes on User Interruption (Barge-In)

Problem: Session crashes when user interrupts assistant mid-response
Root Cause: Race conditions in future initialization, plus attribute access
on None after barge-in sets _current_generation to None
Solution:

- Initialize futures as None, create lazily in initialize_streams()
- Add defensive None checks throughout event handlers

Impact: Interruptions handled gracefully without crashes

Creating futures in __init__ causes race conditions during session restart.
Lazy initialization ensures the event loop exists before future creation.
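
The lazy-initialization pattern can be sketched as below. Only `initialize_streams()` is named in this description; the `StreamState` class, the `_is_ready` attribute, and the `is_ready()` helper are assumptions made for the example.

```python
import asyncio
from typing import Optional


class StreamState:
    def __init__(self) -> None:
        # No future is created here: __init__ may run before an event loop
        # exists, or on a different loop than the one used after a restart.
        self._is_ready: Optional[asyncio.Future] = None

    async def initialize_streams(self) -> None:
        # A running loop is guaranteed inside a coroutine, so the future is
        # always bound to the loop that will actually resolve it.
        loop = asyncio.get_running_loop()
        if self._is_ready is None or self._is_ready.done():
            self._is_ready = loop.create_future()
        # ... open the bidirectional stream here ...
        self._is_ready.set_result(True)

    def is_ready(self) -> bool:
        # Defensive None check: safe to call before initialization.
        fut = self._is_ready
        return fut is not None and fut.done() and not fut.cancelled()
```

Replacing a completed future on each call means a session restart gets a fresh future on the current loop rather than reusing a stale one.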

Additional Improvements

Simplified Architecture

- Message tracking: Single content_id_map dict instead of 4 separate dicts
  (messages, user_messages, speculative_messages, tool_messages)
- Restart tracking: Per-turn _restart_attempts instead of session-level
  tracking for better barge-in metrics
- Timestamps: Float (time.time()) instead of ISO-8601 strings for easier
  duration calculations

The single dict approach is simpler, easier to debug, and sufficient for
tracking content IDs and their types. Both approaches have identical memory
characteristics (no leaks) since dicts live inside _ResponseGeneration
instances that are created and destroyed per turn.
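
As a hedged illustration of the simplified per-turn state, the sketch below keeps one dict keyed by content ID, a float creation timestamp, and a per-turn restart counter. The class name echoes `_ResponseGeneration`, but the fields other than `content_id_map`, the helper methods, and the type labels are assumptions, not the plugin's actual definitions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResponseGeneration:
    """Hypothetical per-turn state; created and discarded with each turn."""

    # content ID -> content type label (labels used here are illustrative)
    content_id_map: dict = field(default_factory=dict)
    # Float timestamp, so durations are plain subtraction.
    created_at: float = field(default_factory=time.time)
    # Counted per turn rather than per session.
    restart_attempts: int = 0

    def register(self, content_id: str, content_type: str) -> None:
        self.content_id_map[content_id] = content_type

    def type_of(self, content_id: str) -> Optional[str]:
        # Unknown IDs return None instead of raising.
        return self.content_id_map.get(content_id)

    def elapsed(self, now: Optional[float] = None) -> float:
        return (time.time() if now is None else now) - self.created_at
```

Because the dict lives on the per-turn object, it is released with the turn, which matches the no-leak argument above.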

Adopted from Origin

- Added ModelStreamErrorException to recoverable errors (from origin/main
  commit c674705, Oct 16, 2025)
- Removed child-safety line from DEFAULT_SYSTEM_PROMPT to match origin
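
For the ModelStreamErrorException item, a recoverable-error check might look roughly like the following. The exception class is redefined locally so the example runs on its own (the real one comes from the AWS SDK), and `RECOVERABLE_ERRORS`, `should_restart`, and `max_attempts` are hypothetical names, not the plugin's.

```python
class ModelStreamErrorException(Exception):
    """Local stand-in; the real class is provided by the AWS SDK."""


# Hypothetical tuple; the plugin's actual list of recoverable errors may differ.
RECOVERABLE_ERRORS = (ModelStreamErrorException, TimeoutError)


def should_restart(exc: BaseException, attempts: int, max_attempts: int = 3) -> bool:
    """Restart the session only for known-recoverable errors, up to a cap."""
    return isinstance(exc, RECOVERABLE_ERRORS) and attempts < max_attempts
```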

Testing

All features verified working:

Audio playback ✓
Tool use (multiple turns) ✓
Barge-in/interruptions ✓
Multi-turn conversations ✓

Tested against origin/main and confirmed tool use does not work without
these fixes.

Breaking Changes

None. Public API unchanged. Metrics format unchanged. Only internal
implementation differs.

AI Assistance

Portions of this code were developed with assistance from AI tools for
debugging, testing, and implementation of the fixes described above.

Co-authored-by: Amazon Q Developer

CLAassistant · 2025-10-22T20:20:50Z

All committers have signed the CLA.

theomonnom · 2025-10-27T05:10:09Z

Thanks!

JarrettAWS added 3 commits October 22, 2025 15:50

Ensure jokes are age-appropriate in realtime_joke_teller example

fa8047b

Add safe_mode=True to jokeapi call to filter out inappropriate content.

Default to Pun

525a602

JarrettAWS and others added 3 commits October 22, 2025 16:33

Fix mypy type checking issues

2e634ca

- Add None check before accessing content_id_map in tool handler
- Add type cast for audio_bytes to satisfy mypy Buffer type requirement

Fix remaining mypy type checking issue

a89ac3c

Add type cast for tool task result to fix indexing error on line 1261

Merge branch 'livekit:main' into fix/aws-realtime-tooluse-barge-in

0c50b7c

kachenjr mentioned this pull request Oct 24, 2025

Nova Sonic Realtime Model: LLM goes silent after saying it will run a tool (tool triggers only after user input) #3664

Open

BumaldaOverTheWater94 approved these changes Oct 24, 2025

bcherry requested a review from a team October 24, 2025 17:58

theomonnom approved these changes Oct 27, 2025

theomonnom merged commit e6b2309 into livekit:main Oct 27, 2025
9 checks passed

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 3, 2025

Fix/aws realtime tooluse barge in (livekit#3704)

daba1fb

Co-authored-by: Jarrett Kachenmeister <[email protected]>

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 10, 2025

Fix/aws realtime tooluse barge in (livekit#3704)

e516bc6

Co-authored-by: Jarrett Kachenmeister <[email protected]>
