fix: gemini transcription test cases #1031
base: graphite-base/1031
📝 Walkthrough
Tests and utilities were updated to compute provider-specific audio response formats (forcing Gemini to WAV) and to apply them to the TTS and transcription tests. The Gemini test config now enables transcription, the Gemini transcription code pre-initializes ExtraParams, and the Gemini timeout in the test account config was increased.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Pre-merge checks: ❌ Failed checks (2 warnings, 1 inconclusive), ✅ Passed checks (2 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
core/internal/testutil/transcription_stream.go (3)
67-78: Format mismatch between TTS generation and transcription request.
The TTS request correctly uses responseFormat (line 78) computed via GetProviderResponseFormat, which returns "wav" for Gemini. However, the subsequent transcription request at line 140 uses tc.format (which is "mp3" per test cases) instead of the actual format that was generated.
Additionally, the audio filename at line 118 uses tc.format for the file extension, which would create a .mp3 extension for Gemini audio that's actually in WAV format.
Apply this approach to fix the mismatch:
voice := GetProviderVoice(testConfig.Provider, tc.voiceType)
responseFormat := GetProviderResponseFormat(testConfig.Provider, tc.format)
+ // For Gemini, update tc.format to match actual generated format
+ actualFormat := responseFormat
ttsRequest := &schemas.BifrostSpeechRequest{
Then use actualFormat consistently for the filename (line 118) and transcription Format parameter (line 140).
118-118: File extension doesn't match actual audio format.
The filename uses tc.format, which could be "mp3", but for Gemini the actual audio format is "wav" (from GetProviderResponseFormat). This creates files with incorrect extensions.
Consider using the computed responseFormat variable instead:
- audioFileName := filepath.Join(tempDir, "stream_roundtrip_"+tc.name+"."+tc.format)
+ audioFileName := filepath.Join(tempDir, "stream_roundtrip_"+tc.name+"."+responseFormat)
140-140: Transcription Format parameter should match actual audio format.
The Format parameter is set to tc.format (which is "mp3" per test cases at lines 40, 47, 54), but the actual audio was generated in the provider-specific format ("wav" for Gemini). The transcription service is therefore told a format that differs from the audio it receives.
Use the actual generated format:
Params: &schemas.TranscriptionParameters{
Language: bifrost.Ptr("en"),
- Format: bifrost.Ptr(tc.format),
+ Format: bifrost.Ptr(responseFormat),
ResponseFormat: tc.responseFormat,
},
core/internal/testutil/transcription.go (2)
64-76: Format mismatch: TTS generates provider-specific format but transcription expects "mp3".
Similar to the streaming tests, the TTS correctly generates audio in the provider-specific format (WAV for Gemini) using GetProviderResponseFormat at line 64. However, the transcription request at line 137 hardcodes Format: bifrost.Ptr("mp3"), creating a mismatch.
Additionally, the filename at line 117 uses tc.format, which creates incorrect file extensions for Gemini.
Store the actual format and use it consistently:
voice := GetProviderVoice(testConfig.Provider, tc.voiceType)
responseFormat := GetProviderResponseFormat(testConfig.Provider, tc.format)
+ actualFormat := responseFormat
ttsRequest := &schemas.BifrostSpeechRequest{
Then update line 117:
- audioFileName := filepath.Join(tempDir, "roundtrip_"+tc.name+"."+tc.format)
+ audioFileName := filepath.Join(tempDir, "roundtrip_"+tc.name+"."+actualFormat)
And line 137:
Params: &schemas.TranscriptionParameters{
Language: bifrost.Ptr("en"),
- Format: bifrost.Ptr("mp3"),
+ Format: bifrost.Ptr(actualFormat),
ResponseFormat: tc.responseFormat,
},
137-137: Hardcoded "mp3" format doesn't match actual audio format.
The Format parameter is hardcoded to "mp3", but the actual audio was generated in the provider-specific format (WAV for Gemini via line 64's GetProviderResponseFormat). This tells the transcription service the wrong input format.
Use the actual generated format instead of hardcoding:
Params: &schemas.TranscriptionParameters{
Language: bifrost.Ptr("en"),
- Format: bifrost.Ptr("mp3"),
+ Format: bifrost.Ptr(responseFormat),
ResponseFormat: tc.responseFormat,
},
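The outside-diff comments above all come down to one idea: resolve the audio format once, then reuse that value for the file extension and for the format declared to the transcription service. A minimal sketch of that pattern follows. The transcriptionInput type and buildTranscriptionInput function are hypothetical stand-ins for illustration, not names from the Bifrost codebase.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// transcriptionInput is a hypothetical stand-in for the parts of a
// transcription request that depend on the audio format.
type transcriptionInput struct {
	FilePath string // where the generated audio is written
	Format   string // format declared to the transcription service
}

// buildTranscriptionInput takes the format that TTS actually produced and
// reuses it for both the file extension and the declared format, so the
// two cannot drift apart.
func buildTranscriptionInput(tempDir, testName, generatedFormat string) transcriptionInput {
	return transcriptionInput{
		FilePath: filepath.Join(tempDir, "roundtrip_"+testName+"."+generatedFormat),
		Format:   generatedFormat,
	}
}

func main() {
	// The test case requests "mp3"; assume the provider helper resolved it to "wav".
	requested, generated := "mp3", "wav"
	in := buildTranscriptionInput("tmp", "RoundTrip_Basic", generated)
	fmt.Printf("requested=%s declared=%s file=%s\n", requested, in.Format, in.FilePath)
}
```

Passing only the resolved format into the helper means the requested format never reaches the filename or the transcription parameters.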
🧹 Nitpick comments (2)
core/providers/gemini/transcription.go (1)
47-52: Consider removing redundant initialization checks.
Since Params.ExtraParams is now pre-initialized at line 17, the nil checks at lines 47-51 are redundant. While defensive coding is good practice, these checks will never trigger.
You could simplify to:
if part.FileData != nil && strings.HasPrefix(strings.ToLower(part.FileData.MIMEType), "audio/") {
- if bifrostReq.Params == nil {
- bifrostReq.Params = &schemas.TranscriptionParameters{}
- }
- if bifrostReq.Params.ExtraParams == nil {
- bifrostReq.Params.ExtraParams = make(map[string]interface{})
- }
bifrostReq.Params.ExtraParams["file_uri"] = part.FileData.FileURI
core/internal/testutil/utils.go (1)
88-99: Add documentation explaining Gemini's WAV-only constraint.
While the comment mentions that Gemini only supports WAV format, it would be helpful to document why this limitation exists (API constraint, format compatibility, etc.) to help future maintainers understand the reasoning.
Enhance the documentation:
// GetProviderResponseFormat returns the appropriate response format for speech synthesis based on the provider
- // For Gemini, only "wav" format is supported, so we always return "wav" regardless of the requested format
+ // For Gemini, the API only supports "wav" format for speech synthesis transcription round-trips,
+ // so we always return "wav" regardless of the requested format. Other providers support their requested formats.
func GetProviderResponseFormat(provider schemas.ModelProvider, requestedFormat string) string {
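For readers without the repository open, the behaviour described for GetProviderResponseFormat can be sketched roughly as below. This is an illustration based only on the review text. The ModelProvider type, the provider string values, and the lowercase function name are assumptions, and the real helper in core/internal/testutil/utils.go may differ.

```go
package main

import "fmt"

// ModelProvider is a local stand-in for schemas.ModelProvider (assumption).
type ModelProvider string

// The string values below are illustrative, not taken from the codebase.
const (
	Gemini ModelProvider = "gemini"
	OpenAI ModelProvider = "openai"
)

// providerResponseFormat mirrors the behaviour described in the review:
// Gemini always gets "wav"; every other provider keeps the requested format.
func providerResponseFormat(provider ModelProvider, requestedFormat string) string {
	if provider == Gemini {
		return "wav"
	}
	return requestedFormat
}

func main() {
	for _, p := range []ModelProvider{Gemini, OpenAI} {
		fmt.Printf("%s: mp3 -> %s\n", p, providerResponseFormat(p, "mp3"))
	}
	// Prints:
	// gemini: mp3 -> wav
	// openai: mp3 -> mp3
}
```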
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/internal/testutil/account.go (1 hunks)
core/internal/testutil/transcription.go (3 hunks)
core/internal/testutil/transcription_stream.go (2 hunks)
core/internal/testutil/utils.go (3 hunks)
core/providers/gemini/gemini_test.go (1 hunks)
core/providers/gemini/transcription.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/internal/testutil/utils.go
core/internal/testutil/transcription.go
core/internal/testutil/transcription_stream.go
core/internal/testutil/account.go
core/providers/gemini/gemini_test.go
core/providers/gemini/transcription.go
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/internal/testutil/utils.go
core/internal/testutil/transcription.go
core/internal/testutil/transcription_stream.go
core/internal/testutil/account.go
core/providers/gemini/gemini_test.go
core/providers/gemini/transcription.go
🧬 Code graph analysis (5)
core/internal/testutil/utils.go (1)
core/schemas/bifrost.go (1)
Gemini(48-48)
core/internal/testutil/transcription.go (1)
core/internal/testutil/utils.go (1)
GetProviderResponseFormat(90-99)
core/internal/testutil/transcription_stream.go (1)
core/internal/testutil/utils.go (1)
GetProviderResponseFormat(90-99)
core/internal/testutil/account.go (1)
core/schemas/provider.go (1)
DefaultRequestTimeoutInSeconds(15-15)
core/providers/gemini/transcription.go (1)
core/schemas/transcriptions.go (1)
TranscriptionParameters(32-45)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (4)
core/internal/testutil/account.go (1)
481-481: LGTM: Timeout increase appropriate for transcription workloads.
Increasing the Gemini timeout from 120 to 300 seconds aligns with enabling transcription test scenarios, which may require longer processing times for audio analysis.
core/providers/gemini/gemini_test.go (1)
50-51: LGTM: Transcription test scenarios enabled for Gemini.
Enabling Transcription and TranscriptionStream scenarios correctly activates the transcription test paths for the Gemini provider.
core/providers/gemini/transcription.go (1)
16-18: LGTM: Params initialization ensures ExtraParams is always available.
Pre-initializing Params with an empty ExtraParams map simplifies downstream code that relies on this field.
core/internal/testutil/utils.go (1)
555-567: LGTM: Provider-specific format correctly applied to TTS generation.
The use of GetProviderResponseFormat ensures that Gemini uses WAV format while other providers can use their requested formats.
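As background for the ExtraParams comments on core/providers/gemini/transcription.go: in Go, assigning to an entry of a nil map panics at runtime, which is why the map must be created before the first write. The sketch below shows the pre-initialization pattern in isolation. The params type and both helper functions are hypothetical stand-ins, not the actual schemas.TranscriptionParameters.

```go
package main

import "fmt"

// params is a hypothetical stand-in for schemas.TranscriptionParameters.
type params struct {
	ExtraParams map[string]interface{}
}

// newParams pre-initializes the map so callers can write to it directly.
func newParams() *params {
	return &params{ExtraParams: make(map[string]interface{})}
}

// setFileURI writes without a nil check. This is safe only because
// newParams guarantees a non-nil map; writing to a nil map would panic.
func setFileURI(p *params, uri string) {
	p.ExtraParams["file_uri"] = uri
}

func main() {
	p := newParams()
	setFileURI(p, "files/example-audio")
	fmt.Println(p.ExtraParams["file_uri"]) // prints "files/example-audio"
}
```

Once construction guarantees the map exists, per-write nil checks like the ones flagged in the nitpick become dead code.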
name: "RoundTrip_Basic_MP3",
text: TTSTestTextBasic,
voiceType: "primary",
- format: "mp3",
+ format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Medium_MP3",
text: TTSTestTextMedium,
voiceType: "secondary",
- format: "mp3",
+ format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Technical_MP3",
text: TTSTestTextTechnical,
voiceType: "tertiary",
- format: "mp3",
+ format: "wav",
responseFormat: bifrost.Ptr("json"),
},
Test names misleading: contain "_MP3" but format is "wav".
The test case names reference "MP3" (lines 34, 41, 48) but the format field is now "wav" (lines 37, 44, 51). This creates confusion about what format is actually being tested.
Update the test names to reflect the actual format:
{
- name: "RoundTrip_Basic_MP3",
+ name: "RoundTrip_Basic_WAV",
text: TTSTestTextBasic,
voiceType: "primary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
- name: "RoundTrip_Medium_MP3",
+ name: "RoundTrip_Medium_WAV",
text: TTSTestTextMedium,
voiceType: "secondary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
- name: "RoundTrip_Technical_MP3",
+ name: "RoundTrip_Technical_WAV",
text: TTSTestTextTechnical,
voiceType: "tertiary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{
name: "RoundTrip_Basic_WAV",
text: TTSTestTextBasic,
voiceType: "primary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Medium_WAV",
text: TTSTestTextMedium,
voiceType: "secondary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Technical_WAV",
text: TTSTestTextTechnical,
voiceType: "tertiary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
🤖 Prompt for AI Agents
In core/internal/testutil/transcription.go around lines 34 to 53, the test case
names incorrectly include "_MP3" while the format field is "wav"; update the
name fields for each case to use "_WAV" (e.g., "RoundTrip_Basic_WAV",
"RoundTrip_Medium_WAV", "RoundTrip_Technical_WAV") so the test names accurately
reflect the format being tested and keep naming consistent with the format
value.
