Problem Statement
Pipecat has `GeminiTTSService`, but it currently targets Google Cloud Text-to-Speech streaming models such as `gemini-2.5-flash-tts` / `gemini-2.5-pro-tts` via `texttospeech_v1.StreamingSynthesize`.
Google's Gemini speech generation docs now show Gemini TTS through `google-genai`'s `generate_content` with `response_modalities=["AUDIO"]`, including `gemini-3.1-flash-tts-preview`. The docs also state that TTS does not support streaming, so this looks like a different integration shape from the existing streaming service.
Proposed Solution
Would maintainers be open to a focused non-streaming Gemini TTS service for this path?
Possible shape:
- Use `google-genai`'s `client.models.generate_content(...)`
- Support Gemini API speech generation models such as `gemini-3.1-flash-tts-preview`
- Emit the returned audio as `TTSAudioRawFrame` frames after the response completes
- Document that this is not intended as a low-latency, real-time TTS path
- Include a minimal mocked test and an example
Alternative Solutions
This may be related to #4351, but that PR is Vertex/`aiplatform`-focused and uses `stream_generate_content`. This issue is specifically about the non-streaming Gemini API TTS flow documented here:
https://ai.google.dev/gemini-api/docs/speech-generation
Additional Context
Happy to help implement this if the direction sounds reasonable.
Would you be willing to help implement this feature?