feat(conductor): add opt-in voice STT to Telegram bridge #309
Abeansits wants to merge 3 commits into asheshgoplani:main
Replace Groq Whisper API with local parakeet-mlx (parakeet-tdt-0.6b-v3) for voice message transcription. Add TTS voice replies using macOS say + ffmpeg (OGG/Opus output), toggled via BRIDGE_TTS_ENABLED env var.
- stt_worker.py: standalone subprocess worker that normalizes audio to mono 16kHz WAV and runs parakeet-mlx inference, crash-isolated from the bot event loop
- bridge.py: transcribe_voice() calls stt_worker via async subprocess (60s timeout), generate_voice_reply() chains say + ffmpeg via async subprocesses with per-step timeouts and proper kill/cleanup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
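The normalization step this commit describes (mono, 16 kHz WAV) could be expressed as an ffmpeg invocation roughly like the one below. This is a minimal sketch, not the code from stt_worker.py: the function name `build_normalize_cmd` and its parameters are hypothetical, and only standard ffmpeg options (`-y`, `-i`, `-ac`, `-ar`, `-c:a`) are used.

```python
from typing import List


def build_normalize_cmd(ffmpeg: str, src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list that converts src to mono 16 kHz WAV.

    Hypothetical helper for illustration; the PR's worker may differ.
    """
    return [
        ffmpeg,
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file (e.g. a downloaded voice note)
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit PCM, the usual WAV payload
        dst,
    ]
```

Returning an argument list rather than a shell string keeps the command safe to pass to a subprocess call without shell quoting.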
Switch from importing parakeet-mlx as a Python library to invoking the parakeet-mlx CLI binary. This avoids import/dependency issues and is cleaner for subprocess-based isolation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
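The subprocess-based isolation mentioned in these commits (an async child process with a 60s timeout and kill/cleanup) might look roughly like this. It is a sketch under assumptions: `run_worker` is a hypothetical name, the command is passed in by the caller, and returning None on any failure is one possible policy rather than what bridge.py necessarily does.

```python
import asyncio
import sys
from typing import Optional, Sequence


async def run_worker(cmd: Sequence[str], timeout: float = 60.0) -> Optional[str]:
    """Run cmd as a child process and return its stdout text, or None on failure."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # The child exceeded the time budget: kill it and reap it so no
        # zombie process is left behind.
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the child exited on its own in the meantime
        await proc.wait()
        return None
    if proc.returncode != 0:
        return None  # a crash in the child never propagates to the event loop
    text = stdout.decode("utf-8", errors="replace").strip()
    return text or None


if __name__ == "__main__":
    # Stand-in child process; a real caller would pass the worker command.
    print(asyncio.run(run_worker([sys.executable, "-c", "print('hello')"])))
```

Because the worker runs in its own process, a crash or hang in the inference step surfaces only as a None result and the bot keeps serving other messages.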
Strip generate_voice_reply(), BRIDGE_TTS_ENABLED/BRIDGE_TTS_VOICE config, bot.send_voice() TTS response block, say+ffmpeg pipeline, and FSInputFile import. Voice-to-text (STT) remains intact.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
asheshgoplani left a comment
Interesting voice transcription feature! A few issues to address:
- Hardcoded ffmpeg path: The code uses a hardcoded path for ffmpeg instead of using shutil.which('ffmpeg') as described in the PR description. Please use dynamic path resolution so it works across different systems and installation methods (e.g., Homebrew on macOS puts it in /opt/homebrew/bin/, Linux distros vary).
- Error handling: The voice transcription path needs proper error handling for common failure cases: ffmpeg not installed, microphone not available, transcription API timeout, invalid audio format. Users should get clear error messages rather than cryptic tracebacks.
- Needs rebase: This PR is based on the stale v0.27.0 base. Main was rolled back to v0.26.4 after the Go 1.25 incident, so this needs a rebase onto current main.

Please fix the hardcoded path issue and add error handling, then rebase onto current main.
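One way to address the first two review points is sketched below: resolve ffmpeg with shutil.which() and translate common failures into short user-facing messages. The names `VoiceError`, `find_ffmpeg`, and `user_message` are hypothetical, and the message wording is an assumption except for the generic fallback, which reuses the string from the test plan.

```python
import shutil
import subprocess


class VoiceError(Exception):
    """Hypothetical error type whose text is safe to show to the user."""


def find_ffmpeg() -> str:
    """Locate ffmpeg on PATH instead of relying on a hardcoded location."""
    path = shutil.which("ffmpeg")
    if path is None:
        raise VoiceError("ffmpeg was not found on PATH; install it to enable voice transcription.")
    return path


def user_message(exc: Exception) -> str:
    """Map a failure to a clear message rather than exposing a traceback."""
    if isinstance(exc, VoiceError):
        return str(exc)
    if isinstance(exc, subprocess.TimeoutExpired):
        return "Transcription timed out; please try a shorter message."
    if isinstance(exc, subprocess.CalledProcessError):
        return "The audio could not be decoded; the format may be unsupported."
    # Generic fallback for anything unexpected.
    return "[Could not transcribe voice message.]"
```

A caller would wrap the transcription path in a single try/except and send `user_message(exc)` back to the chat, logging the full traceback separately.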
Summary
- [...] stt_worker.py), crash-isolated from the bot event loop. No cloud API needed.
- [...] BRIDGE_STT_ENABLED=true env var is set. Without it, voice messages are silently ignored.
- stt_worker.py uses shutil.which('parakeet-mlx') to find the CLI on PATH, with PARAKEET_CLI_PATH env var as an explicit override.
- [...] ~/.agent-deck/conductor/bridge.log in addition to stdout.
Changes
- conductor/bridge.py: transcribe_voice() downloads voice files and calls stt_worker via async subprocess (60s timeout). handle_message() now handles message.voice when BRIDGE_STT_ENABLED=true.
- conductor/stt_worker.py: New standalone STT worker that finds and invokes the parakeet-mlx CLI, reads output text files, and prints transcription to stdout.
Configuration
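The two environment variables named in the summary could be read along these lines. This is a minimal sketch: the helper names `stt_enabled` and `find_parakeet_cli` are hypothetical, and treating the flag case-insensitively is an assumption (the PR only states the value true).

```python
import os
import shutil
from typing import Mapping, Optional


def stt_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Voice STT is opt-in: on only when BRIDGE_STT_ENABLED is 'true'."""
    # Assumption: whitespace and letter case are ignored.
    return env.get("BRIDGE_STT_ENABLED", "").strip().lower() == "true"


def find_parakeet_cli(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the parakeet-mlx CLI path, or None if it cannot be found."""
    override = env.get("PARAKEET_CLI_PATH")
    if override:
        return override  # an explicit override wins over the PATH lookup
    return shutil.which("parakeet-mlx")
```

Passing the environment as a parameter keeps both helpers easy to exercise in tests without mutating os.environ.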
Test plan
- [...] BRIDGE_STT_ENABLED=true, send a voice message via Telegram and verify transcription
- [...] Transcribing... status then the transcribed text is forwarded to the conductor
- [...] [Could not transcribe voice message.]