pipecat version
0.0.108
Python version
3.13
Operating System
Ubuntu 24.04
Issue description
Summary
We are seeing a reproducible issue with InworldTTSService + post-TTS assistant aggregation in pipecat-ai==0.0.108.
When Inworld returns word timestamps, Pipecat emits word-level TTSTextFrames without punctuation. Because the assistant context is built downstream from TTS, those punctuation-less tokens become the canonical assistant message stored in LLMContext.
That flattened assistant text is then reused in later LLM prompts, and the model starts imitating the punctuation-less style on subsequent turns.
This also shows up in frontend transcript streams: interim bot transcript buffers are built word by word without punctuation.
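The flattening described above can be illustrated with a plain-Python sketch (no pipecat imports; the timing values are hypothetical). TTS word-timestamp alignments typically carry bare words, so rejoining them drops all punctuation:

```python
# Illustrative sketch (not pipecat code): rebuilding assistant text from
# word-level timestamp tokens loses the original punctuation.

original = "Hey welcome back! It's good to have you again."

# Word alignments from a TTS vendor carry bare words plus timings
# (hypothetical values shown here for illustration).
word_timestamps = [
    ("Hey", 0.00), ("welcome", 0.35), ("back", 0.70),
    ("It's", 1.10), ("good", 1.40), ("to", 1.60),
    ("have", 1.75), ("you", 1.95), ("again", 2.15),
]

# Joining the bare words flattens the text into a run-on string.
flattened = " ".join(word for word, _ in word_timestamps)
print(flattened)  # Hey welcome back It's good to have you again
```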
Environment
pipecat-ai==0.0.108
pipecat-ai-flows>=0.0.22
InworldTTSService
- WebRTC call flow
Pipeline shape:
transport.input() -> stt -> context_aggregator.user() -> llm -> tts -> transport.output() -> context_aggregator.assistant()
Reproduction steps
- Use InworldTTSService with assistant aggregation after TTS.
- Let the LLM produce a punctuated response with multiple clauses/sentences.
- Let Inworld return word timestamps.
- Observe that assistant history stored in context is punctuation-less.
- Trigger the next LLM turn.
- Observe that the next prompt already contains punctuation-less assistant messages and that the model starts imitating that style.
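The steps above come down to frame ordering: because the assistant aggregator sits downstream of TTS, it only ever sees the word-level TTS text frames, never the punctuated LLM output. A toy stand-in (plain Python, no pipecat imports; the class and frame lists are illustrative, not pipecat APIs) shows the effect:

```python
# Toy simulation of the pipeline ordering: the aggregator after
# transport.output() receives only word-level TTS text, so the stored
# assistant message is the flattened join of those words.

llm_text = "Are you ready to begin?"                      # punctuated LLM output
tts_word_frames = ["Are", "you", "ready", "to", "begin"]  # from word timestamps

class AssistantAggregator:
    """Toy stand-in for the downstream assistant context aggregator."""
    def __init__(self):
        self.parts: list[str] = []

    def process(self, frame_text: str) -> None:
        self.parts.append(frame_text)

    def finalize(self) -> dict:
        return {"role": "assistant", "content": " ".join(self.parts)}

agg = AssistantAggregator()
for word in tts_word_frames:   # downstream of TTS: word frames only
    agg.process(word)

message = agg.finalize()
print(message["content"])  # Are you ready to begin
```

The stored message differs from `llm_text` only in its missing punctuation, which matches the degraded history shown in the logs below.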
Expected behavior
- Assistant memory stored in LLMContext should preserve the punctuation of the original assistant text.
- Future LLM prompts should not be degraded by punctuation-less TTS alignment text.
- Frontend transcript consumers should not receive a final transcript that is effectively a flattened run-on sentence with punctuation separated into its own final event.
Actual behavior
When Inworld timestamps are present:
- The spoken assistant text is reconstructed from word timestamps.
- Those timestamps contain bare words without punctuation.
- LLMAssistantAggregator stores that punctuation-less text in assistant context.
- The next LLM request includes assistant history like:
Hey welcome back It’s good to have you again We’ll just pick up where we left off and continue with the screening interview Are you ready to get started with the next set of questions
instead of the original punctuated text.
- The LLM then starts replying in the same flattened style.
Logs
From our local logs, `OpenAILLMService` receives assistant history like this:
{'role': 'assistant', 'content': 'Hey welcome back It’s good to have you again We’ll just pick up where we left off and continue with the screening interview Are you ready to get started with the next set of questions'}
{'role': 'assistant', 'content': 'Hey are you still there Just wanted to check in real quick'}
{'role': 'assistant', 'content': 'Hey just checking in one more time are you ready to continue If I don’t hear back I’ll have to go ahead and end the call on my side'}
{'role': 'assistant', 'content': 'Hey uh this is virtual assistant actually thanks for jumping back in Are you ready to continue with the screening questions'}
{'role': 'assistant', 'content': 'Great thanks So just so I understand your current situation are you working right now or are you between roles'}
Those turns were originally generated as natural punctuated speech, but the history fed back into the LLM is flattened.
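As a possible interim mitigation (a hedged sketch, not a pipecat API; `restore_punctuation` and `words_only` are hypothetical helpers), an application that captures the punctuated LLM output before TTS could swap it back into the assistant history whenever the flattened text says the same words:

```python
# Hedged workaround sketch: prefer the punctuated pre-TTS text when it
# matches the flattened aggregator text word-for-word.
import re

def words_only(text: str) -> list[str]:
    # Compare on bare words, ignoring punctuation; keep apostrophes
    # so contractions like "It's" survive as one token.
    return re.findall(r"[\w']+", text)

def restore_punctuation(flattened: str, original: str) -> str:
    # Fall back to the flattened text if the word sequences diverge
    # (e.g. the turn was interrupted mid-sentence).
    if words_only(flattened) == words_only(original):
        return original
    return flattened

original = "Hey, welcome back! Are you ready to continue?"
flattened = "Hey welcome back Are you ready to continue"
print(restore_punctuation(flattened, original))
# Hey, welcome back! Are you ready to continue?
```

The word-for-word guard matters because on interrupted turns the spoken text is a prefix of the LLM output, and restoring the full punctuated turn there would overstate what was actually said.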