Added support for gpt4o-realtime models for Speech to Speech interactions #659
This PR introduces real-time voice pipeline support for OpenAI’s gpt-4o-realtime-preview model, enabling seamless, low-latency speech-to-speech interactions in the Speect framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling, while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes
RealtimeVoicePipeline:
Integrated Tool Calls:
Event Handling & Debugging: (conversation.item.input_audio_transcription.delta and .completed)
Echo & Feedback Mitigation: input_audio_noise_reduction in the session config.
Sample Rate Fixes:
Backwards Compatibility:
Documentation & Examples: docs/voice/pipeline.md with new real-time usage, configuration, and troubleshooting sections. continuous_realtime_assistant.py demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use
See the new example and documentation for how to use RealtimeVoicePipeline with your OpenAI API key and tools.
No changes required: existing STT/TTS flows are unaffected.
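For readers who want a feel for the wire-level pieces named above, here is a minimal sketch of the two that the description mentions by name: an input_audio_noise_reduction entry in the session config, and handling of the conversation.item.input_audio_transcription.delta and .completed events. It does not use RealtimeVoicePipeline and is not the PR's implementation. The names `build_session_update` and `TranscriptCollector` are hypothetical, and the payload field names (`item_id`, `delta`, `transcript`, and the `near_field` / `far_field` values) are assumptions based on the OpenAI Realtime API as I understand it, so check them against the current API reference.

```python
import json
from collections import defaultdict


def build_session_update(noise_reduction: str = "near_field") -> str:
    """Serialize a session.update client event that enables noise reduction.

    Assumption: the API accepts "near_field" or "far_field" as the type.
    """
    event = {
        "type": "session.update",
        "session": {
            "input_audio_noise_reduction": {"type": noise_reduction},
        },
    }
    return json.dumps(event)


class TranscriptCollector:
    """Accumulates input-audio transcription events per conversation item."""

    PREFIX = "conversation.item.input_audio_transcription."

    def __init__(self) -> None:
        self.partial = defaultdict(str)  # item_id -> text streamed so far
        self.final = {}                  # item_id -> completed transcript

    def handle(self, event: dict) -> bool:
        """Process one server event; return True if it was a transcription event."""
        kind = event.get("type", "")
        if not kind.startswith(self.PREFIX):
            return False  # not ours; let other handlers deal with it
        item_id = event.get("item_id", "")
        if kind.endswith(".delta"):
            # Streaming fragment: append to the running text for this item.
            self.partial[item_id] += event.get("delta", "")
        elif kind.endswith(".completed"):
            # Final text: prefer the server's transcript, fall back to the deltas.
            self.final[item_id] = event.get("transcript", self.partial[item_id])
            self.partial.pop(item_id, None)
        return True
```

In a real client, the string from `build_session_update()` would be sent over the realtime connection once the session opens, and every decoded server event would be passed to `handle()`. Keeping partial and final text in separate maps makes it easy to show live captions while debugging without confusing them with finished transcripts.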