Added support for gpt4o-realtime models for Speech to Speech interactions by sharananurag998 · Pull Request #659 · openai/openai-agents-python

sharananurag998 · 2025-05-07T07:50:15Z

This PR introduces real-time voice pipeline support for OpenAI’s gpt-4o-realtime-preview model, enabling seamless, low-latency speech-to-speech interactions in the Speect framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling—while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes

RealtimeVoicePipeline:
- New pipeline for direct, continuous audio-to-audio conversations with OpenAI’s real-time models.
- Handles streaming microphone input and speaker output at 24kHz, as required by the API.
- Supports push-to-talk and half-duplex operation to prevent echo/feedback.
Integrated Tool Calls:
- Tools are registered with the pipeline and executed automatically when the model requests a function call.
- Tool results are sent back to the model using the correct OpenAI Realtime API protocol.
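The tool-call round trip described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `TOOLS`, `register_tool`, `get_weather`, and `handle_function_call` are hypothetical names, and the shape of the incoming `call` dict (`name`, `call_id`, `arguments` as a JSON string) is an assumption based on how the Realtime API reports a completed function call. The reply uses a `conversation.item.create` event carrying a `function_call_output` item, followed by `response.create` so the model continues the turn.

```python
import json
from typing import Any, Callable

# Hypothetical registry; the PR's actual registration mechanism is not shown here.
TOOLS: dict[str, Callable[..., Any]] = {}


def register_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@register_tool
def get_weather(city: str) -> dict:
    # Stand-in tool with a canned answer, for illustration only.
    return {"city": city, "forecast": "sunny"}


def handle_function_call(call: dict) -> list[dict]:
    """Run the requested tool and build the client events to send back.

    `call` is assumed to carry `name`, `call_id`, and `arguments` (a JSON string).
    """
    try:
        args = json.loads(call.get("arguments") or "{}")
        result = TOOLS[call["name"]](**args)
        output = json.dumps(result)
    except Exception as exc:  # report failures to the model instead of crashing
        output = json.dumps({"error": str(exc)})
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": output,
            },
        },
        # Ask the model to produce a response that uses the tool result.
        {"type": "response.create"},
    ]
```

Returning the events instead of sending them keeps the sketch independent of any particular WebSocket client.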
Event Handling & Debugging:
- Full support for all major OpenAI Realtime API events, including:
  - Audio and text deltas
  - Tool call arguments (streamed and completed)
  - Transcription events (conversation.item.input_audio_transcription.delta and .completed)
  - Session and rate limit updates
- Example logs all transcription events for easy debugging of what the model “hears.”
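As a rough sketch of the transcription logging mentioned above, the handler below accumulates `conversation.item.input_audio_transcription.delta` events and finalizes them on `.completed`. `TranscriptLogger` is a hypothetical name, and the field names `item_id`, `delta`, and `transcript` are assumptions about the event payloads; the PR's example script may structure this differently.

```python
from collections import defaultdict


class TranscriptLogger:
    """Collects input-audio transcription events, keyed by conversation item."""

    DELTA = "conversation.item.input_audio_transcription.delta"
    COMPLETED = "conversation.item.input_audio_transcription.completed"

    def __init__(self) -> None:
        self.partial: dict[str, str] = defaultdict(str)
        self.completed: dict[str, str] = {}

    def handle(self, event: dict) -> None:
        etype = event.get("type", "")
        item_id = event.get("item_id", "")
        if etype == self.DELTA:
            # Append the streamed fragment to the running transcript.
            self.partial[item_id] += event.get("delta", "")
        elif etype == self.COMPLETED:
            # Prefer the final transcript; fall back to the accumulated deltas.
            text = event.get("transcript")
            if text is None:
                text = self.partial[item_id]
            self.completed[item_id] = text
            self.partial.pop(item_id, None)
        # Any other event type is ignored by this logger.
```

Events of other types (audio deltas, session updates, rate limits) pass through untouched, so the logger can sit alongside other handlers.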
Echo & Feedback Mitigation:
- Implements a buffer window after assistant audio playback to prevent microphone echo from triggering new turns.
- Optionally enables server-side noise/echo reduction via input_audio_noise_reduction in the session config.
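One way to picture the two mitigations above is a small gate that keeps the microphone closed during playback and for a short window afterwards, plus a session update that turns on server-side noise reduction. `EchoGate`, `noise_reduction_update`, and the 0.5 s default are hypothetical choices for this sketch, and the `near_field` value is an assumption about the accepted `input_audio_noise_reduction` types; the PR's actual buffer length is not stated.

```python
import time
from typing import Callable


class EchoGate:
    """Drops microphone audio while the assistant speaks and shortly afterwards."""

    def __init__(self, cooldown_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self._clock = clock          # injectable so the gate can be tested
        self._playing = False
        self._resume_at = 0.0

    def playback_started(self) -> None:
        self._playing = True

    def playback_finished(self) -> None:
        # Start the buffer window once the last audio chunk has been played.
        self._playing = False
        self._resume_at = self._clock() + self.cooldown_s

    def mic_open(self) -> bool:
        """True when captured audio may be forwarded to the model."""
        return not self._playing and self._clock() >= self._resume_at


def noise_reduction_update(kind: str = "near_field") -> dict:
    """Build a session.update event enabling server-side noise reduction.

    The `kind` value is an assumption; check the Realtime API docs for valid types.
    """
    return {
        "type": "session.update",
        "session": {"input_audio_noise_reduction": {"type": kind}},
    }
```

A capture loop would call `mic_open()` before forwarding each chunk and discard the chunk when it returns `False`.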
Sample Rate Fixes:
- Ensures both input and output audio are always 24kHz PCM, as required by the OpenAI API (fixes “slow motion” audio bug).
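The "slow motion" symptom follows from simple arithmetic: audio produced at 24 kHz but played at 16 kHz lasts 1.5 times as long. The sketch below shows that calculation and a naive linear-interpolation resampler for mono 16-bit PCM. It is an illustration under stated assumptions (mono audio, little-endian host, hypothetical function names), not the resampling approach used in the PR; a production pipeline would more likely use a proper resampling library.

```python
from array import array

API_RATE = 24_000  # sample rate the PR description says the API requires


def duration_s(pcm: bytes, rate: int) -> float:
    """Playback duration of mono 16-bit PCM at the given sample rate."""
    return len(pcm) / 2 / rate


def resample_linear(pcm: bytes, src_rate: int, dst_rate: int = API_RATE) -> bytes:
    """Resample mono 16-bit PCM by linear interpolation.

    Assumes a little-endian host, since array('h') uses native byte order.
    """
    src = array("h")
    src.frombytes(pcm)
    if src_rate == dst_rate or len(src) == 0:
        return pcm
    n_out = len(src) * dst_rate // src_rate
    out = array("h")
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = src[j]
        b = src[min(j + 1, len(src) - 1)]  # clamp at the final sample
        out.append(int(round(a + (b - a) * frac)))
    return out.tobytes()
```

For example, one second of 16 kHz microphone audio becomes 24,000 samples after resampling, which plays for one second at 24 kHz but 1.5 seconds if the output device is mistakenly opened at 16 kHz.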
Backwards Compatibility:
- All changes are fully compatible with the existing STT/TTS pipeline and configuration.
- Legacy examples and workflows continue to work without modification.
Documentation & Examples:
- Updated docs/voice/pipeline.md with new real-time usage, configuration, and troubleshooting sections.
- New example: continuous_realtime_assistant.py demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use

Realtime Pipeline:
See the new example and documentation for how to use RealtimeVoicePipeline with your OpenAI API key and tools.
Classic Pipeline:
No changes required—existing STT/TTS flows are unaffected.

…ions - Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction. - Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.

dkundel-openai · 2025-05-14T18:24:59Z

Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience

sharananurag998 · 2025-05-15T04:24:02Z

@dkundel-openai @rm-openai

I haven't found a way for native speech-to-speech integration with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works!

The agent-as-tool approach provides better latency than the STT-TTS-based VoicePipeline.

Also, this branch has Juspay-specific MCP tool-handling changes, since we're using the fork as a Python dependency. I'll move them to a separate branch so that main can be merged.

@dkundel-openai, you can review the new pipeline and let me know of any changes needed; I'll be happy to work on them.

EmanueleTribi · 2025-05-29T15:21:35Z

Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agents, and I'm wondering whether to write my own code or wait for it to be integrated directly. Thanks!
@dkundel-openai @sharananurag998

github-actions · 2025-06-09T02:12:36Z

This PR is stale because it has been open for 10 days with no activity.

lobes · 2025-06-13T03:41:39Z

> Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agents, and I'm wondering whether to write my own code or wait for it to be integrated directly. Thanks! @dkundel-openai @sharananurag998

I second this! Would love to have realtime STS support in this SDK.

github-actions · 2025-06-24T02:11:51Z

This PR is stale because it has been open for 10 days with no activity.

EmanueleTribi · 2025-06-24T11:11:01Z

> > Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agents, and I'm wondering whether to write my own code or wait for it to be integrated directly. Thanks! @dkundel-openai @sharananurag998
>
> I second this! Would love to have realtime STS support in this SDK.

No news, but on top of that the new Voice Live API has been released (both are still in beta), so I don't think they will integrate this until these APIs have a stable release.

sibblegp · 2025-06-24T16:50:23Z

Darn, I was waiting on exactly this. Does it also do handoffs or do you have to use agent.as_tool()?

sibblegp · 2025-07-04T19:36:09Z

We should get this merged and support agent handoffs.

EmanueleTribi · 2025-07-04T20:05:10Z

> Darn, I was waiting on exactly this. Does it also do handoffs or do you have to use agent.as_tool()?

I did a custom implementation using SDK agents as tools for Voice Live, so it is definitely possible.

EmanueleTribi · 2025-07-04T20:15:17Z

> We should get this merged and support agent handoffs.

I think they are already working on this, since the TypeScript version already has it.

sibblegp · 2025-07-04T21:31:09Z

I think the reason this hasn't been merged is that it doesn't follow much of the Agents SDK workflow. You don't define a realtime agent, for example, and there are no handoffs for a collection of agents. It's a realtime implementation of tools but not agents. Also, I think context should be passed into the run command. The standard agent model also doesn't require you to name your context wrapper "context". I call mine "wrapper", for example.

This is great work though and something the main SDK desperately needs. The Typescript library already has it.
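To illustrate the point about context-wrapper naming, here is one way a tool dispatcher could detect the context parameter by its type annotation instead of by its name. `ContextWrapper` and `call_tool` are hypothetical stand-ins written for this sketch; they are not the SDK's classes, and this is not necessarily how the SDK or the PR resolves the parameter.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar, get_origin, get_type_hints

T = TypeVar("T")


@dataclass
class ContextWrapper(Generic[T]):
    """Hypothetical stand-in for a run-context wrapper."""
    context: T


def call_tool(fn: Callable[..., Any], ctx: ContextWrapper, **args: Any) -> Any:
    """Call `fn`, passing `ctx` only if its first parameter is typed as a wrapper."""
    params = list(inspect.signature(fn).parameters)
    if params:
        hint = get_type_hints(fn).get(params[0])
        # Accept both the bare class and a parametrized form like ContextWrapper[dict].
        if hint is ContextWrapper or get_origin(hint) is ContextWrapper:
            return fn(ctx, **args)
    return fn(**args)


def greet(wrapper: ContextWrapper[dict], name: str) -> str:
    # The parameter is called "wrapper", not "context"; the annotation is what matters.
    return f"{wrapper.context['greeting']}, {name}"


def add(a: int, b: int) -> int:
    # A tool that takes no context at all.
    return a + b
```

With this approach, `call_tool(greet, ContextWrapper({"greeting": "Hi"}), name="Ada")` passes the wrapper positionally, while `add` is called with its keyword arguments only.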

sibblegp · 2025-07-04T21:38:48Z

Sorry I didn't see your replies till just now. Been reviewing your code for hours! I sincerely hope they are working on it. I have an issue about it. I wish they would confirm it's coming. It's very important to my product.

sibblegp · 2025-07-05T12:08:54Z

@sharananurag998 Could you pull the latest code from the agents SDK into your repo so I can use it? I'd be interested in hiring you as a consultant if you could help add some features. Please contact me. I'm not hard to find.

sibblegp · 2025-07-05T17:57:24Z

Got this working with Twilio, but it truncates the final part of the audio, and there is no way to cut off a response if someone starts speaking.
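For the interruption case, a client can in principle cancel the in-flight response and truncate the assistant's audio item to what was actually played. The sketch below builds those two client events. It assumes 24 kHz 16-bit mono output as stated in the PR description (a Twilio stream would need its own format conversion), and `played_ms` and `interruption_events` are hypothetical helpers, not part of the PR.

```python
SAMPLE_RATE = 24_000     # Hz, per the PR description
BYTES_PER_SAMPLE = 2     # 16-bit PCM, mono (assumption)


def played_ms(bytes_played: int) -> int:
    """Milliseconds of audio represented by the bytes already played."""
    return bytes_played * 1000 // (SAMPLE_RATE * BYTES_PER_SAMPLE)


def interruption_events(item_id: str, bytes_played: int) -> list[dict]:
    """Client events to send when the user starts speaking over the assistant."""
    return [
        # Stop the model from generating the rest of the response.
        {"type": "response.cancel"},
        # Tell the server how much of the audio item the user actually heard.
        {
            "type": "conversation.item.truncate",
            "item_id": item_id,
            "content_index": 0,
            "audio_end_ms": played_ms(bytes_played),
        },
    ]
```

A caller would typically trigger this on a speech-start signal from voice activity detection and also flush any locally queued playback audio.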

seratch · 2025-08-12T07:54:36Z

This project now provides a Realtime Agent module, which uses the Realtime API in a manner consistent with the Agents SDK for TypeScript. Please try the module out: https://openai.github.io/openai-agents-python/realtime/quickstart/

Thank you so much again for your time and efforts on this PR!

sharananurag998 force-pushed the main branch 3 times, most recently from 8bcb389 to b8899f7 Compare May 7, 2025 11:06

sharananurag998 marked this pull request as draft May 7, 2025 14:29

feat: Context handling in realtime

692f4fd

sharananurag998 force-pushed the main branch from b8899f7 to 692f4fd Compare May 9, 2025 05:38

added context to tool input

acebd8e

rm-openai requested a review from dkundel-openai May 14, 2025 16:35

github-actions Bot added the stale label Jun 9, 2025

github-actions Bot removed the stale label Jun 14, 2025

github-actions Bot added the stale label Jun 24, 2025

github-actions Bot removed the stale label Jun 25, 2025

seratch added enhancement New feature or request feature:core feature:voice labels Jun 25, 2025

ALavallee mentioned this pull request Jul 4, 2025

When will realtime capabilities be added like they are in the Typescript SDK? #894

Closed

seratch closed this Aug 12, 2025


6 participants
