Support non-streaming Gemini 3.1 Flash TTS via google-genai #4444

@Anrahya

Problem Statement

Pipecat has GeminiTTSService, but it currently targets Google Cloud Text-to-Speech streaming models such as gemini-2.5-flash-tts / gemini-2.5-pro-tts via texttospeech_v1.StreamingSynthesize.

Google's Gemini speech generation docs now show Gemini TTS through the google-genai generate_content call with response_modalities=["AUDIO"], including the gemini-3.1-flash-tts-preview model. The docs also state that TTS does not support streaming, so this looks like a different integration shape from the existing streaming service.

Proposed Solution

Would maintainers be open to a focused non-streaming Gemini TTS service for this path?

Possible shape:

  • Use google-genai client.models.generate_content(...)
  • Support Gemini API speech generation models such as gemini-3.1-flash-tts-preview
  • Emit returned audio as TTSAudioRawFrame after the response completes
  • Document that this is not intended as a low-latency realtime TTS path
  • Include a minimal mocked test and example
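
To make the shape above concrete, here is a rough sketch of the request/response flow. It is not a proposed final implementation. Assumptions: `AudioChunk` is a hypothetical stand-in for Pipecat's TTSAudioRawFrame so the snippet runs without Pipecat installed, `synthesize` and `build_tts_config` are illustrative names, the voice name "Kore" is taken from Google's examples, and the audio is assumed to come back as 16-bit mono PCM at 24 kHz. The client is duck-typed, so a `google.genai.Client` or a mock can be passed in.

```python
from dataclasses import dataclass
from typing import Any, Iterator

# Assumption: Gemini TTS returns raw 16-bit mono PCM at 24 kHz.
SAMPLE_RATE = 24000
NUM_CHANNELS = 1


@dataclass
class AudioChunk:
    """Hypothetical stand-in for Pipecat's TTSAudioRawFrame."""

    audio: bytes
    sample_rate: int
    num_channels: int


def build_tts_config(voice_name: str = "Kore") -> dict:
    """Build the generate_content config as a plain dict.

    google-genai accepts either a types.GenerateContentConfig or an
    equivalent dict, so a dict keeps this sketch dependency-free.
    """
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": voice_name}
            }
        },
    }


def synthesize(
    client: Any,
    text: str,
    model: str = "gemini-3.1-flash-tts-preview",
    voice_name: str = "Kore",
) -> Iterator[AudioChunk]:
    """Make one blocking request and yield audio after it completes."""
    response = client.models.generate_content(
        model=model,
        contents=text,
        config=build_tts_config(voice_name),
    )
    candidates = response.candidates or []
    if not candidates or candidates[0].content is None:
        return  # Nothing to emit; a real service would report an error.
    for part in candidates[0].content.parts or []:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            yield AudioChunk(inline.data, SAMPLE_RATE, NUM_CHANNELS)
```

Inside an actual Pipecat service this would presumably live in an async run_tts method and use the SDK's async client (`client.aio.models.generate_content`) so the event loop is not blocked. The sketch stays synchronous only to keep the mocked test simple.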

Alternative Solutions

This may be related to #4351, but that PR is Vertex/aiplatform-focused and uses stream_generate_content. This issue is specifically about the non-streaming Gemini API TTS flow documented here:

https://ai.google.dev/gemini-api/docs/speech-generation

Additional Context

Happy to help implement this if the direction sounds reasonable.

Would you be willing to help implement this feature?

  • Yes, I'd like to contribute
