GH#14044: tighten voice-ai-models.md (147→126 lines) #14092
…ion guidance Remove 4 Pick: lines and Selection by Priority table, both fully redundant with the Decision Flow tree. Move Decision Flow to top (primacy effect). Compress Riva table (remove redundant Role column, shorten headers). Preserve Bark expressiveness note in table. Zero knowledge loss: all 28 models, all specs, all cross-references, GPU planning, and decision paths retained.
Walkthrough
Documentation restructuring of the Voice AI Models agent guide. A new Decision Flow section was added to route users to specific models based on use case (TTS, STT, or conversational S2S), while prior recommendation guidance was consolidated and the NVIDIA Riva Composable Pipelines section was simplified.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
SonarCloud: 0 bugs, 0 vulnerabilities, 1 code smells
Mon Mar 30 10:37:12 UTC 2026: Code review monitoring started
Generated on: Mon Mar 30 10:37:15 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request reorganizes the voice-ai-models.md documentation by moving the 'Decision Flow' section to the top for better visibility and removing several 'Pick' summary lines to reduce redundancy. While the reorganization improves the flow, feedback highlights that removing the summary for Local STT results in the loss of specific technical details—such as language support for Parakeet models and OS dependencies for Apple Speech—that are not currently reflected in the remaining tables.
| NVIDIA Parakeet V3 | 0.6B | 9.6 | Fastest | 2GB |
| Apple Speech | Built-in | 9.0 | Fast | On-device |

Pick: Large v3 Turbo → best balance. Parakeet V3 → multilingual speed (25 langs). Parakeet V2 → English-only. Apple Speech → zero-setup macOS 26+.
Removing this line causes a loss of important information about language support for Parakeet V2 (English-only) and Parakeet V3 (multilingual), as well as the OS dependency for Apple Speech. This information is not present in the local STT table above or in the new Decision Flow. To adhere to the goal of 'Zero knowledge loss' and the project's practice of maintaining detailed explanations for key technical components, please consider adding this information to the local STT table before removing this summary.
References
- Restore detailed explanations for key concepts and technical details to ensure clarity and prevent knowledge loss.
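The concern above can be checked mechanically before a summary line is deleted. The following is a minimal sketch, not part of the repository: `missing_facts` is a hypothetical helper, and the sample text is just the two table rows quoted in this review. It reports which phrases from the removed "Pick:" line do not appear anywhere in the remaining table.

```python
# Hypothetical helper for illustration; it is not part of the aidevops tooling.
def missing_facts(doc_text: str, required: list[str]) -> list[str]:
    """Return the required phrases that do not appear in doc_text (case-insensitive)."""
    haystack = doc_text.lower()
    return [fact for fact in required if fact.lower() not in haystack]


# The two local STT rows exactly as quoted in the review comment.
table = """\
| NVIDIA Parakeet V3 | 0.6B | 9.6 | Fastest | 2GB |
| Apple Speech | Built-in | 9.0 | Fast | On-device |
"""

# Facts that were carried only by the removed "Pick:" line.
required = ["25 langs", "English-only", "macOS 26+"]

print(missing_facts(table, required))
```

A simple substring check like this only catches wording that disappears entirely; it would not notice a fact that is reworded, so it complements a manual read rather than replacing it.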
Actionable comments posted: 1
🧹 Nitpick comments (1)
.agents/tools/voice/voice-ai-models.md (1)
Line 63: Replace ambiguous “stale” with a dated status note.
`stale` without a date/source is hard to operationalize. Prefer a dated qualifier (e.g., “VRAM estimate, last verified 2026-03”) or move it to a short “Notes” phrase.
Suggested doc tweak
-| Bark (Suno) | 1.0B | MIT | 13+ | Yes (prompt) | 6GB (stale, expressive: laughter/music) |
+| Bark (Suno) | 1.0B | MIT | 13+ | Yes (prompt) | ~6GB (estimate; last verified 2026-03, expressive: laughter/music) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/tools/voice/voice-ai-models.md at line 63, update the table row for "Bark (Suno)" to remove the ambiguous "stale" qualifier and replace it with a dated status note or move it to a Notes column; for example change "6GB (stale, expressive: laughter/music)" to "6GB (VRAM estimate, last verified 2026-03; expressive: laughter/music)" or relocate "VRAM estimate, last verified 2026-03" into a Notes field. Edit the line containing "Bark (Suno)" in voice-ai-models.md accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5a518407-6aec-4260-9e47-e0f3fcab1d23
📒 Files selected for processing (1)
.agents/tools/voice/voice-ai-models.md
| Component | Model | Languages | NIM |
|-----------|-------|-----------|-----|
| ASR | Parakeet TDT 0.6B v2 | English | HF (research) |
| ASR | Parakeet CTC 1.1B | English | Yes |
| ASR | Parakeet RNNT 1.1B | 25 | Yes |
| TTS | Magpie Multilingual | 17+ | Yes |
| TTS | Magpie Zero-Shot | English+ | API |
| Enhancement | StudioVoice | Any | Yes |
| Translation | Riva Translate | 36 | Yes |

Pipeline: `Audio -> [Parakeet ASR] -> [Any LLM] -> [Magpie TTS] -> Audio`. See `cloud-voice-agents.md`. Cascaded S2S (VAD+STT+LLM+TTS): see `speech-to-speech.md`.
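For readers unfamiliar with the "composable" framing, the pipeline line quoted above describes three stages chained so that each stage's output feeds the next. The sketch below illustrates only that chaining pattern. The stage functions `asr`, `llm`, and `tts` are placeholders invented for this example and do not call Riva, Parakeet, Magpie, or any real model; audio is stood in for by UTF-8 bytes.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: the output of one becomes the input of the next."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


# Placeholder stages (assumptions for illustration, not real model calls).
def asr(audio: bytes) -> str:
    # Stand-in for speech recognition: pretend the audio bytes are the transcript.
    return audio.decode("utf-8")


def llm(text: str) -> str:
    # Stand-in for any LLM: echo the transcript back.
    return f"echo: {text}"


def tts(text: str) -> bytes:
    # Stand-in for speech synthesis: pretend the encoded text is audio.
    return text.encode("utf-8")


# Audio -> [ASR] -> [LLM] -> [TTS] -> Audio
pipeline = compose(asr, llm, tts)
print(pipeline(b"hello"))  # b'echo: hello'
```

Because each stage is just a callable, swapping one component (for example a different ASR model) means replacing one argument to `compose`, which is the property the table of interchangeable components relies on.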
Line 110–120: This conflicts with the reference-corpus strategy from the linked issue.
The section is compressed in-place, but the issue objective for reference corpora asks for extraction into chapter files plus a slim index rather than content compression. Please align this section (and likely the doc structure) to that strategy before merge.
Proposed structural direction
-## NVIDIA Riva Composable Pipelines
-| Component | Model | Languages | NIM |
-...
-Pipeline: `Audio -> [Parakeet ASR] -> [Any LLM] -> [Magpie TTS] -> Audio`. See `cloud-voice-agents.md`. Cascaded S2S (VAD+STT+LLM+TTS): see `speech-to-speech.md`.
+## NVIDIA Riva Composable Pipelines
+High-level index only. Detailed matrix moved to `tools/voice/voice-ai-models-riva.md`.
+Pipeline overview: `Audio -> [Parakeet ASR] -> [Any LLM] -> [Magpie TTS] -> Audio`.
+See:
+- `tools/voice/voice-ai-models-riva.md` (full component matrix)
+- `tools/voice/cloud-voice-agents.md`
+- `tools/voice/speech-to-speech.md`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/tools/voice/voice-ai-models.md around lines 110 - 120, The Pipeline
table and the inline "Pipeline: `Audio -> [Parakeet ASR] -> [Any LLM] -> [Magpie
TTS] -> Audio`" / cascaded S2S note should not be compressed in-place; instead
extract this section into separate chapter files for the reference-corpus
strategy and replace it here with a slim index entry linking to those new files.
Concretely, create new chapter docs (e.g., voice-models-parakeet.md,
voice-pipeline-s2s.md) containing the full table and pipeline details, update
this file's block (the table and the Pipeline/Cascaded S2S lines) to a short
index summary pointing to those chapters, and ensure filenames/classes
referenced in nav or TOC reflect the new chapter names so links resolve.
…T table Address Gemini review: the removed Pick line had details (Parakeet V3 25 langs, Parakeet V2 English-only, Apple Speech macOS 26+) not present in the table. Add these as parenthetical notes in the VRAM column.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
SonarCloud: 0 bugs, 0 vulnerabilities, 1 code smells
Mon Mar 30 10:41:53 UTC 2026: Code review monitoring started
Generated on: Mon Mar 30 10:41:56 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Nitpick about 'stale' wording: refers to unmaintained project status, not VRAM estimate staleness. Chapter file split: file is 126 lines, well under 300-line split threshold. Gemini's valid concern about lost Parakeet/Apple Speech details addressed in follow-up commit.
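The tighten-versus-split reasoning in the reply above reduces to a line-count comparison. As a rough sketch only: the 300-line figure comes from this PR, while the function name, the constant name, and the choice to treat exactly 300 lines as "split" are assumptions made for illustration.

```python
# 300 is the split threshold quoted in this PR; the names here are hypothetical.
SPLIT_THRESHOLD = 300


def restructure_action(line_count: int, threshold: int = SPLIT_THRESHOLD) -> str:
    """Suggest 'split' at or above the threshold, otherwise 'tighten'.

    Treating the boundary value itself as 'split' is an assumption; the PR only
    says that 126 lines is well under the threshold.
    """
    return "split" if line_count >= threshold else "tighten"


print(restructure_action(147))  # tighten (line count before this PR)
print(restructure_action(126))  # tighten (line count after this PR)
```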



Summary
Details
Classification: Reference corpus (model selection reference). At 126 lines, well under the 300-line split threshold — tightening, not splitting.
What was removed (and why safe):
Zero knowledge loss: All 28 models, all specs, all cross-references, GPU planning, decision paths, and the Bark expressiveness note retained.
Runtime Testing
self-assessed
Closes #14044
aidevops.sh v3.5.455 plugin for OpenCode v1.3.7 with claude-opus-4-6 spent 2m and 7,249 tokens on this as a headless worker.