Added support for gpt-4o-realtime models for Speech-to-Speech interactions #659
sharananurag998 wants to merge 3 commits into openai:main from
…ions
- Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction.
- Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.
Force-pushed from 8bcb389 to b8899f7.
Thank you so much for the PR, @sharananurag998! I'll try to look at it later this week. Thank you for your patience.
I haven't found a way to integrate an agent natively with speech-to-speech, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works! The agent-as-tool approach has lower latency than the STT/TTS-based VoicePipeline. Also, this branch contains Juspay-specific MCP tool-handling changes, since we're using the fork as a Python dependency; I'll move them to a separate branch so that main can be merged. @dkundel-openai, you can review the new pipeline and let me know of any changes; I'll be happy to work on them.
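The agent-as-tool pattern can be sketched against the Realtime API's function-calling events: when the model finishes streaming a call's arguments (`response.function_call_arguments.done`), the client runs the tool, here a wrapped sub-agent, and replies with a `function_call_output` item followed by `response.create`. This is a minimal sketch, not the PR's code; event names assume the beta Realtime API, and `run_sub_agent` is a hypothetical stand-in for running a text agent (in the SDK's standard runner, `Agent.as_tool()` plays a similar role).

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict

# Hypothetical stand-in for running a text agent, e.g. via the Agents SDK's Runner.
async def run_sub_agent(request: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real model call
    return f"booking agent handled: {request}"

ToolHandler = Callable[[dict], Awaitable[str]]

async def booking_agent_tool(args: dict) -> str:
    # The sub-agent is exposed to the realtime model as an ordinary function tool.
    return await run_sub_agent(args["request"])

# Tool name -> async handler taking the parsed JSON arguments.
TOOLS: Dict[str, ToolHandler] = {"booking_agent": booking_agent_tool}

async def handle_server_event(event: dict) -> list:
    """Return the client events to send in reply to one server event."""
    if event.get("type") != "response.function_call_arguments.done":
        return []
    handler = TOOLS.get(event["name"])
    if handler is None:
        output = json.dumps({"error": f"unknown tool: {event['name']}"})
    else:
        output = await handler(json.loads(event["arguments"]))
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": output,
            },
        },
        # Ask the model to respond now that the tool result is in the conversation.
        {"type": "response.create"},
    ]
```

In a live session, each returned dict would be serialized with `json.dumps` and sent over the Realtime WebSocket.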
Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with SDK agents, and I'm wondering whether to write my own code or wait for it to be integrated directly. Thanks!
This PR is stale because it has been open for 10 days with no activity. |
I second this! Would love to have realtime STS support in this SDK. |
This PR is stale because it has been open for 10 days with no activity. |
No news, but on top of that, the new Voice Live API has been released (both are still in beta), so I don't think they will integrate it until these APIs are released as stable versions.
Darn, I was waiting on exactly this. Does it also do handoffs, or do you have to use agent.as_tool()?
We should get this merged and support agent handoffs. |
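One way to approximate handoffs on a single realtime connection, sketched under assumptions: advertise each target agent as a `transfer_to_<name>` function tool, and when the model calls it, send a `session.update` that swaps in the target agent's instructions and tools. `RealtimeAgentSpec`, the helpers, and the `transfer_to_` naming are all hypothetical; this is neither the PR's nor the SDK's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RealtimeAgentSpec:
    """Hypothetical description of one agent that can drive a realtime session."""
    name: str
    instructions: str
    tools: List[dict] = field(default_factory=list)
    handoffs: List["RealtimeAgentSpec"] = field(default_factory=list)

def transfer_tool(target: RealtimeAgentSpec) -> dict:
    # Each handoff target is advertised to the model as a no-argument function tool.
    return {
        "type": "function",
        "name": f"transfer_to_{target.name}",
        "description": f"Hand the conversation off to {target.name}.",
        "parameters": {"type": "object", "properties": {}},
    }

def session_update_for(agent: RealtimeAgentSpec) -> dict:
    # Swap the active agent's instructions and tools on the existing connection.
    return {
        "type": "session.update",
        "session": {
            "instructions": agent.instructions,
            "tools": agent.tools + [transfer_tool(t) for t in agent.handoffs],
        },
    }

def resolve_handoff(current: RealtimeAgentSpec, tool_name: str) -> Optional[RealtimeAgentSpec]:
    # Return the target agent if tool_name is one of the current agent's transfer tools.
    for target in current.handoffs:
        if tool_name == f"transfer_to_{target.name}":
            return target
    return None
```

When the model calls, say, `transfer_to_billing`, the client would send `session_update_for(billing)`, reply to the call with a `function_call_output` item, and request a new response. A `session.update` changes session configuration only, so conversation items already on the session remain.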
I did a custom implementation using SDK agents as tools for Voice Live, so it is definitely possible.
I think they are already working on this, since the TypeScript version already has it.
I think the reason this hasn't been merged is that it doesn't follow much of the Agents SDK workflow. For example, you don't define a realtime agent, and there are no handoffs between a collection of agents. It's a realtime implementation of tools, but not of agents. Also, I think context should be passed into the run command. The standard agent model also doesn't require you to name your RunContextWrapper parameter "context"; I call mine "wrapper", for example. This is great work, though, and something the main SDK desperately needs. The TypeScript library already has it.
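The point about parameter names can be sketched: a tool dispatcher can find the context parameter by its type annotation rather than by the name `context`, so `wrapper: RunContextWrapper` works just as well. `RunContextWrapper` here is a local stand-in class, not the SDK's, and `find_context_param` / `call_tool` are hypothetical helpers; the SDK's own function-tool machinery may work differently.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional, get_type_hints

@dataclass
class RunContextWrapper:
    """Local stand-in for the SDK's context wrapper (illustration only)."""
    context: Any

def find_context_param(func: Callable) -> Optional[str]:
    # Match on the annotation, not on the parameter's name.
    hints = get_type_hints(func)
    for name in inspect.signature(func).parameters:
        if hints.get(name) is RunContextWrapper:
            return name
    return None

def call_tool(func: Callable, ctx: RunContextWrapper, args: dict) -> Any:
    # Inject the context under whatever name the tool author chose.
    ctx_name = find_context_param(func)
    if ctx_name is not None:
        args = {**args, ctx_name: ctx}
    return func(**args)

# Example tools: one takes the context as `wrapper`, one takes no context at all.
def get_order_status(wrapper: RunContextWrapper, order_id: str) -> str:
    return f"{wrapper.context['user']}: order {order_id} shipped"

def add(a: int, b: int) -> int:
    return a + b
```

For instance, `call_tool(get_order_status, RunContextWrapper({"user": "ana"}), {"order_id": "42"})` returns `"ana: order 42 shipped"`, while `add` is called without any context injected.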
Sorry, I didn't see your replies until just now. I've been reviewing your code for hours! I sincerely hope they are working on it; I have an issue open about it. I wish they would confirm it's coming, since it's very important to my product.
@sharananurag998, could you pull the latest code from the Agents SDK into your repo so I can use it? I'd be interested in hiring you as a consultant if you could help add some features. Please contact me. I'm not hard to find.
I got this working with Twilio, but it truncates the final part of the audio, and there is no way to cut off a response if someone starts speaking.
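A common way to handle barge-in with the Realtime API, sketched under assumptions: when server VAD reports `input_audio_buffer.speech_started`, cancel the in-flight response (`response.cancel`), truncate the assistant's audio item to roughly what the caller heard (`conversation.item.truncate` with `audio_end_ms`), and tell Twilio to drop its buffered audio with a Media Streams `clear` message. `PlaybackState` and `barge_in_events` are hypothetical helpers, not part of the PR or the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PlaybackState:
    """Tracks how much of the current assistant audio item was forwarded to the caller."""
    bytes_per_ms: int = 48          # pcm16 at 24 kHz mono; g711_ulaw at 8 kHz would be 8
    item_id: Optional[str] = None   # assistant item currently being played
    bytes_sent: int = 0             # audio bytes forwarded so far for item_id

    def on_audio_delta(self, item_id: str, chunk: bytes) -> None:
        # Call for each response.audio.delta after base64-decoding its payload.
        if item_id != self.item_id:
            self.item_id, self.bytes_sent = item_id, 0
        self.bytes_sent += len(chunk)

    def played_ms(self) -> int:
        # Bytes forwarded is an upper bound on what the caller actually heard.
        return self.bytes_sent // self.bytes_per_ms

def barge_in_events(state: PlaybackState, stream_sid: str) -> Tuple[List[dict], List[dict]]:
    """On input_audio_buffer.speech_started, return (realtime_events, twilio_messages)."""
    if state.item_id is None:
        return [], []
    realtime_events = [
        {"type": "response.cancel"},
        {
            # Drop the unheard tail so the model's context matches what was played.
            "type": "conversation.item.truncate",
            "item_id": state.item_id,
            "content_index": 0,
            "audio_end_ms": state.played_ms(),
        },
    ]
    # Twilio Media Streams "clear" flushes audio already buffered on Twilio's side.
    twilio_messages = [{"event": "clear", "streamSid": stream_sid}]
    state.item_id, state.bytes_sent = None, 0
    return realtime_events, twilio_messages
```

Counting forwarded bytes overestimates playback when Twilio buffers audio; Twilio `mark` messages could give a tighter estimate, at the cost of more bookkeeping.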
This project now provides a Realtime Agent module, which uses the Realtime API in a manner consistent with the TypeScript Agents SDK. Please try the module out: https://openai.github.io/openai-agents-python/realtime/quickstart/ Thank you so much again for your time and effort on this PR!
This PR introduces real-time voice pipeline support for OpenAI’s `gpt-4o-realtime-preview` model, enabling low-latency speech-to-speech interactions in the Agents SDK. The update adds a streaming audio interface, integrated tool execution, and event handling, while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes
- RealtimeVoicePipeline:
- Integrated Tool Calls:
- Event Handling & Debugging:
  - … `conversation.item.input_audio_transcription.delta` and `.completed`
- Echo & Feedback Mitigation:
  - `input_audio_noise_reduction` in the session config.
- Sample Rate Fixes:
- Backwards Compatibility:
- Documentation & Examples:
  - `docs/voice/pipeline.md` with new real-time usage, configuration, and troubleshooting sections.
  - `continuous_realtime_assistant.py` demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use
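The feature list names `input_audio_noise_reduction` and the input-audio transcription events without showing them. A minimal sketch of both, assuming the beta Realtime API session schema: a `session.update` payload that enables server VAD, transcription, and noise reduction, plus a collector that joins transcription deltas per `item_id` until the matching `completed` event arrives. `build_session_update` and `TranscriptCollector` are hypothetical names, not the PR's API.

```python
from typing import Dict, Optional

def build_session_update(instructions: str) -> dict:
    """session.update payload; field names assume the beta Realtime API schema."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Emits conversation.item.input_audio_transcription.* events for user audio.
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            # Filters input audio before VAD and the model; near_field suits headsets,
            # far_field suits laptop or room microphones.
            "input_audio_noise_reduction": {"type": "near_field"},
            "turn_detection": {"type": "server_vad"},
        },
    }

class TranscriptCollector:
    """Joins user-transcription deltas per item_id until the completed event."""

    def __init__(self) -> None:
        self._partial: Dict[str, str] = {}
        self.final: Dict[str, str] = {}

    def partial(self, item_id: str) -> str:
        return self._partial.get(item_id, "")

    def handle(self, event: dict) -> Optional[str]:
        """Feed one server event; return the final transcript when an item completes."""
        etype = event.get("type")
        if etype == "conversation.item.input_audio_transcription.delta":
            item_id = event["item_id"]
            self._partial[item_id] = self.partial(item_id) + event["delta"]
        elif etype == "conversation.item.input_audio_transcription.completed":
            item_id = event["item_id"]
            self._partial.pop(item_id, None)
            self.final[item_id] = event["transcript"]
            return event["transcript"]
        return None
```

Keeping partial and final transcripts apart lets a UI show live captions from `partial()` while logging only the text returned on completion.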
See the new example and documentation for how to use `RealtimeVoicePipeline` with your OpenAI API key and tools.

No changes are required: existing STT/TTS flows are unaffected.
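The PR's sample-rate fixes are listed without details. One plausible shape, sketched under assumptions: resample captured audio to 24 kHz mono little-endian PCM16, the rate used by the Realtime API's `pcm16` format, and stream it as base64 chunks in `input_audio_buffer.append` events. `resample_pcm16` and `append_events` are hypothetical helpers, and linear interpolation is a simple stand-in with no anti-aliasing filter; a real pipeline would more likely use a dedicated resampling library.

```python
import array
import base64
import sys
from typing import List

TARGET_RATE = 24_000  # sample rate of the Realtime API's pcm16 format

def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int = TARGET_RATE) -> bytes:
    """Linear-interpolation resampling of mono little-endian 16-bit PCM."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # wire format is little-endian
    if src_rate == dst_rate or len(samples) < 2:
        out = samples
    else:
        n_out = len(samples) * dst_rate // src_rate
        step = src_rate / dst_rate  # input samples advanced per output sample
        out = array.array("h")
        for i in range(n_out):
            pos = i * step
            j = int(pos)
            frac = pos - j
            nxt = samples[min(j + 1, len(samples) - 1)]
            out.append(int(round(samples[j] + (nxt - samples[j]) * frac)))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()

def append_events(pcm24k: bytes, chunk_ms: int = 100) -> List[dict]:
    """Split 24 kHz PCM16 into input_audio_buffer.append events of chunk_ms each."""
    chunk_bytes = TARGET_RATE * 2 * chunk_ms // 1000  # 2 bytes per mono sample
    return [
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm24k[i:i + chunk_bytes]).decode("ascii"),
        }
        for i in range(0, len(pcm24k), chunk_bytes)
    ]
```

With server VAD enabled, the client can keep sending these events continuously; turn boundaries are detected and committed on the server.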