@dpmishler

Add Flux Conversational Speech Recognition Support

This PR adds support for Deepgram's Flux model (flux-general-en), the first conversational speech recognition model built specifically for voice agents.

Features Added

  • New flux_request() and flux_request_with_options() methods for Flux streaming
  • Support for turn-based conversation detection with FluxResponse types
  • Configurable end-of-turn detection parameters:
    • eot_threshold - Confidence required for EndOfTurn events
    • eager_eot_threshold - Confidence for early EagerEndOfTurn events
    • eot_timeout_ms - Maximum silence before forcing turn end
  • New TurnEvent enum with variants: StartOfTurn, EndOfTurn, EagerEndOfTurn, TurnResumed, Update
  • Uses the /v2/listen endpoint for the Flux API
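
To illustrate how a caller might consume the turn events listed above, here is a minimal standalone sketch. It does not use the SDK types from this PR: the `TurnEvent` enum below only mirrors the variant names, the wire-format strings are assumed to match those names exactly, and `AgentAction` and `react` are hypothetical helpers showing one plausible voice-agent policy.

```rust
use std::str::FromStr;

/// Standalone mirror of the variants named in this PR (not the SDK type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnEvent {
    StartOfTurn,
    EndOfTurn,
    EagerEndOfTurn,
    TurnResumed,
    Update,
}

impl FromStr for TurnEvent {
    type Err = String;

    /// Assumption: the event name on the wire equals the variant name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "StartOfTurn" => Ok(Self::StartOfTurn),
            "EndOfTurn" => Ok(Self::EndOfTurn),
            "EagerEndOfTurn" => Ok(Self::EagerEndOfTurn),
            "TurnResumed" => Ok(Self::TurnResumed),
            "Update" => Ok(Self::Update),
            other => Err(format!("unknown turn event: {other}")),
        }
    }
}

/// Hypothetical agent-side actions, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentAction {
    StopSpeaking,
    DraftReply,
    DiscardDraft,
    SendReply,
    Wait,
}

/// One possible policy: start drafting on the early signal, throw the
/// draft away if the user keeps talking, and commit on the final signal.
pub fn react(event: TurnEvent) -> AgentAction {
    match event {
        TurnEvent::StartOfTurn => AgentAction::StopSpeaking,
        TurnEvent::EagerEndOfTurn => AgentAction::DraftReply,
        TurnEvent::TurnResumed => AgentAction::DiscardDraft,
        TurnEvent::EndOfTurn => AgentAction::SendReply,
        TurnEvent::Update => AgentAction::Wait,
    }
}
```

Matching exhaustively on the enum means the compiler flags every call site if a variant is added later.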

Examples

  • simple_flux - File streaming example
  • microphone_flux - Real-time microphone streaming example

Implementation Details

  • Follows existing websocket.rs patterns for consistency
  • Comprehensive error handling and edge case coverage
  • Partial frame handling for fragmented JSON messages
  • Proper connection state tracking and cleanup
  • Tests for URL construction and query parameter encoding
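
The partial frame handling mentioned above can be pictured with the following sketch. It is not the code in this PR: `FrameBuffer` is a hypothetical accumulator that buffers text fragments until they form one complete top-level JSON object, tracking brace depth while ignoring braces inside string literals. The JSON payload in the usage note is made up for illustration.

```rust
/// Hypothetical accumulator for JSON messages split across frames.
#[derive(Default)]
pub struct FrameBuffer {
    buf: String,
}

impl FrameBuffer {
    /// Appends a fragment. Returns the whole message (and clears the
    /// buffer) once the buffered text is a complete JSON object.
    pub fn push(&mut self, fragment: &str) -> Option<String> {
        self.buf.push_str(fragment);
        if is_complete_object(&self.buf) {
            Some(std::mem::take(&mut self.buf))
        } else {
            None
        }
    }
}

/// True when `s` contains at least one `{` and every brace outside a
/// string literal is balanced. This is a structural check only, not a
/// full JSON validation.
pub fn is_complete_object(s: &str) -> bool {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    let mut seen_open = false;
    for c in s.chars() {
        if in_string {
            if escaped {
                escaped = false; // character after a backslash is literal
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => {
                depth += 1;
                seen_open = true;
            }
            '}' => {
                if depth == 0 {
                    return false; // stray close brace: malformed input
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    seen_open && depth == 0 && !in_string
}
```

For example, pushing `{"event":"Upd` returns `None`, and a following push of `ate","transcript":"a } b"}` returns the joined message. A real implementation would more likely hand the buffered text to a JSON parser and retry on an end-of-input error, and would also cap the buffer size.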

Documentation

  • Updated CHANGELOG.md with Flux feature details
  • Updated examples/README.md with Flux examples
  • Inline documentation with links to Deepgram Flux API Reference

Testing

  • All existing tests pass (131 tests)
  • New tests for Flux URL construction and query encoding
  • Verified with real API using both file and microphone examples
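
As a rough picture of what the URL construction and query encoding tests exercise, the sketch below builds a `/v2/listen` URL from the three end-of-turn parameters. `FluxOptions`, `encode`, and `build_flux_url` are hypothetical names, not the SDK's API. The base host is passed in by the caller, and the threshold values in the usage note are illustrative only.

```rust
/// Hypothetical options struct; field names mirror the query parameters
/// listed in this PR.
#[derive(Default)]
pub struct FluxOptions {
    pub eot_threshold: Option<f64>,
    pub eager_eot_threshold: Option<f64>,
    pub eot_timeout_ms: Option<u32>,
}

/// Percent-encodes every byte except RFC 3986 unreserved characters.
pub fn encode(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Builds `<base>/v2/listen?model=...` and appends only the options
/// that are set, in a fixed order so the result is easy to assert on.
pub fn build_flux_url(base: &str, model: &str, opts: &FluxOptions) -> String {
    let mut pairs: Vec<(&str, String)> = vec![("model", model.to_string())];
    if let Some(v) = opts.eot_threshold {
        pairs.push(("eot_threshold", v.to_string()));
    }
    if let Some(v) = opts.eager_eot_threshold {
        pairs.push(("eager_eot_threshold", v.to_string()));
    }
    if let Some(v) = opts.eot_timeout_ms {
        pairs.push(("eot_timeout_ms", v.to_string()));
    }
    let query: Vec<String> = pairs
        .iter()
        .map(|(k, v)| format!("{}={}", k, encode(v)))
        .collect();
    format!("{}/v2/listen?{}", base, query.join("&"))
}
```

Calling `build_flux_url("wss://api.deepgram.com", "flux-general-en", &opts)` with `eot_threshold: Some(0.7)` and `eot_timeout_ms: Some(5000)` yields a URL ending in `/v2/listen?model=flux-general-en&eot_threshold=0.7&eot_timeout_ms=5000`. Unset options are omitted so the server's defaults apply.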

- Replace the serde_json::json! macro with a ControlMessage enum serialized via serde_json::to_string
- This matches the websocket.rs pattern and stays compatible with minimal dependency versions
- Fix a clippy warning in the text_to_speech_to_stream example