Skip to content

Conversation

@TejasGhatte
Copy link
Collaborator

Summary

Briefly explain the purpose of this PR and the problem it solves.

Changes

  • What was changed and why
  • Any notable design decisions or trade-offs

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Describe the steps to validate this change. Include commands and expected outcomes.

# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build

If adding new configs or environment variables, document them here.

Screenshots/Recordings

If UI changes, add before/after screenshots or short clips.

Breaking changes

  • Yes
  • No

If yes, describe impact and migration instructions.

Related issues

Link related issues and discussions. Example: Closes #123

Security considerations

Note any security implications (auth, secrets, PII, sandboxing, etc.).

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 9, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Increased Gemini request timeout from 120s to 300s for improved stability.
  • New Features

    • Enabled transcription and streaming audio paths for the Gemini provider.
  • Chores

    • Standardized audio format handling across transcription flows and tests.
    • Simplified transcription initialization to reduce redundant setup.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Tests and utilities were updated to compute provider-specific audio response formats (forcing Gemini to WAV) and apply that to TTS/transcription tests; Gemini test config now enables transcription. Gemini transcription code pre-initializes ExtraParams and a timeout increased for Gemini in test account config.

Changes

Cohort / File(s) Summary
Test utilities — response format & timeout
core/internal/testutil/account.go, core/internal/testutil/utils.go
Added GetProviderResponseFormat(provider, requestedFormat) which returns "wav" for Gemini and otherwise returns the requested format; GenerateTTSAudioForTest now uses this computed response format. Increased Gemini default request timeout from 120s to 300s in test account config.
Transcription tests — use provider-specific format
core/internal/testutil/transcription.go, core/internal/testutil/transcription_stream.go
Switched test audio format usage to compute and pass provider-specific responseFormat into TTS requests and into transcription expectation construction (tests moved from MP3→WAV in these cases).
Provider test configuration
core/providers/gemini/gemini_test.go
Enabled Transcription and TranscriptionStream flags in the Gemini ComprehensiveTestConfig to exercise transcription code paths.
Gemini provider implementation — params init
core/providers/gemini/transcription.go
Pre-initialized Params with an empty ExtraParams map when building BifrostTranscriptionRequest; removed redundant downstream initializations while preserving prompt, file_uri, and other behavior.
Module manifest
go.mod
(Touched/updated as part of the diff set.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Mixed changes across tests and provider code require reading test flows and provider request construction.
  • Areas to inspect closely:
    • GetProviderResponseFormat() behavior — confirm forcing "wav" for Gemini is intended and covers all test cases.
    • Test assertions and expected transcription formats after switching MP3→WAV.
    • gemini/transcription.go initialization — ensure pre-initialized ExtraParams does not mask nil checks or alter downstream serialization.
    • account.go timeout increase — ensure tests relying on shorter timeouts are not affected.

Poem

🐰 I hopped through code, nibbling bits of test,
I nudged Gemini toward WAV for its best.
Timeouts stretched so the runs won't race,
Params prepped tidy in their place.
Hooray — the builds hum with a rabbit's grace! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is entirely empty placeholders with no actual content filled in—Summary, Changes, type of change, affected areas, and all other sections lack any substantive information. Fill in all required sections with specific details: Summary explaining the problem, detailed Changes list, selection of Type of change and Affected areas, and any relevant related issues or security considerations.
Linked Issues check ⚠️ Warning The linked issue #123 concerns Files API support for providers; the PR changes focus on Gemini transcription test fixes, which are unrelated to Files API functionality. Verify the correct linked issues are referenced. The PR should link to Gemini transcription-related issues, not the Files API issue #123.
Out of Scope Changes check ❓ Inconclusive The PR includes changes to Gemini provider timeout configuration and transcription logic alongside test fixes, which may be beyond the scope of fixing test cases alone. Clarify whether the timeout configuration change (120→300 seconds) and transcription logic modifications are intentional fixes or incidental changes that should be in a separate PR.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: gemini transcription test cases' is specific and directly relates to the main changes in the PR which fix Gemini transcription test configuration.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 12-09-fix_gemini_transcription_test_cases

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91a7f05 and a21bf88.

📒 Files selected for processing (6)
  • core/internal/testutil/account.go (1 hunks)
  • core/internal/testutil/transcription.go (3 hunks)
  • core/internal/testutil/transcription_stream.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/gemini/gemini_test.go (1 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • core/providers/gemini/gemini_test.go
  • core/internal/testutil/transcription_stream.go
  • core/internal/testutil/transcription.go
  • core/providers/gemini/transcription.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/internal/testutil/utils.go
  • core/internal/testutil/account.go
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/utils.go
  • core/internal/testutil/account.go
🧬 Code graph analysis (2)
core/internal/testutil/utils.go (1)
core/schemas/bifrost.go (1)
  • Gemini (48-48)
core/internal/testutil/account.go (1)
core/schemas/provider.go (1)
  • DefaultRequestTimeoutInSeconds (15-15)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (2)
core/internal/testutil/account.go (1)

480-491: Longer Gemini test timeout is reasonable but may stretch failing tests

Bumping DefaultRequestTimeoutInSeconds to 300s for Gemini in testutil makes sense given the heavier/variable TTS + transcription flows added in this stack, but it also means a single failing call can block for up to 5 minutes (before retries/backoff). As this is test-only config, that trade-off is likely fine—just keep an eye on end-to-end test duration for Gemini-heavy suites, and dial it back if runs become too slow.

core/internal/testutil/utils.go (1)

88-99: Provider-specific response format helper looks good

GetProviderResponseFormat cleanly centralizes the Gemini “force WAV” behavior while keeping other providers on the requested format, and it fits well with the rest of the testutil helpers and Gemini changes in this stack. No changes needed here.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@TejasGhatte TejasGhatte force-pushed the 12-08-feat_added_support_for_gemini_native_converterswq branch from 08fb63c to 2cdf8f4 Compare December 9, 2025 15:30
@TejasGhatte TejasGhatte force-pushed the 12-09-fix_gemini_transcription_test_cases branch from 366d759 to 315d945 Compare December 9, 2025 15:30
@TejasGhatte TejasGhatte changed the base branch from 12-08-feat_added_support_for_gemini_native_converterswq to graphite-base/1031 December 10, 2025 05:36
@TejasGhatte TejasGhatte force-pushed the 12-09-fix_gemini_transcription_test_cases branch from 315d945 to e07c0c1 Compare December 10, 2025 09:59
@TejasGhatte TejasGhatte changed the base branch from graphite-base/1031 to 12-08-feat_added_support_for_gemini_native_converterswq December 10, 2025 09:59
@TejasGhatte TejasGhatte changed the base branch from 12-08-feat_added_support_for_gemini_native_converterswq to graphite-base/1031 December 10, 2025 10:43
@TejasGhatte TejasGhatte force-pushed the 12-09-fix_gemini_transcription_test_cases branch from e07c0c1 to 91a7f05 Compare December 10, 2025 11:30
@TejasGhatte TejasGhatte changed the base branch from graphite-base/1031 to 12-08-feat_added_support_for_gemini_native_converterswq December 10, 2025 11:30
@TejasGhatte TejasGhatte marked this pull request as ready for review December 10, 2025 11:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
core/internal/testutil/transcription_stream.go (3)

67-78: Format mismatch between TTS generation and transcription request.

The TTS request correctly uses responseFormat (line 78) computed via GetProviderResponseFormat, which returns "wav" for Gemini. However, the subsequent transcription request at line 140 uses tc.format (which is "mp3" per test cases) instead of the actual format that was generated.

Additionally, the audio filename at line 118 uses tc.format for the file extension, which would create a .mp3 extension for Gemini audio that's actually in WAV format.

Apply this approach to fix the mismatch:

 				voice := GetProviderVoice(testConfig.Provider, tc.voiceType)
 				responseFormat := GetProviderResponseFormat(testConfig.Provider, tc.format)
+				// For Gemini, update tc.format to match actual generated format
+				actualFormat := responseFormat
 				ttsRequest := &schemas.BifrostSpeechRequest{

Then use actualFormat consistently for the filename (line 118) and transcription Format parameter (line 140).


118-118: File extension doesn't match actual audio format.

The filename uses tc.format which could be "mp3", but for Gemini the actual audio format is "wav" (from GetProviderResponseFormat). This creates files with incorrect extensions.

Consider using the computed responseFormat variable instead:

-				audioFileName := filepath.Join(tempDir, "stream_roundtrip_"+tc.name+"."+tc.format)
+				audioFileName := filepath.Join(tempDir, "stream_roundtrip_"+tc.name+"."+responseFormat)

140-140: Transcription Format parameter should match actual audio format.

The Format parameter is set to tc.format (which is "mp3" per test cases at lines 40, 47, 54), but the actual audio was generated in the provider-specific format ("wav" for Gemini). This mismatches the actual audio format with what the transcription service is told.

Use the actual generated format:

 					Params: &schemas.TranscriptionParameters{
 						Language:       bifrost.Ptr("en"),
-						Format:         bifrost.Ptr(tc.format),
+						Format:         bifrost.Ptr(responseFormat),
 						ResponseFormat: tc.responseFormat,
 					},
core/internal/testutil/transcription.go (2)

64-76: Format mismatch: TTS generates provider-specific format but transcription expects "mp3".

Similar to the streaming tests, the TTS correctly generates audio in the provider-specific format (WAV for Gemini) using GetProviderResponseFormat at line 64. However, the transcription request at line 137 hardcodes Format: bifrost.Ptr("mp3"), creating a mismatch.

Additionally, the filename at line 117 uses tc.format which creates incorrect file extensions for Gemini.

Store the actual format and use it consistently:

 				voice := GetProviderVoice(testConfig.Provider, tc.voiceType)
 				responseFormat := GetProviderResponseFormat(testConfig.Provider, tc.format)
+				actualFormat := responseFormat
 
 				ttsRequest := &schemas.BifrostSpeechRequest{

Then update line 117:

-				audioFileName := filepath.Join(tempDir, "roundtrip_"+tc.name+"."+tc.format)
+				audioFileName := filepath.Join(tempDir, "roundtrip_"+tc.name+"."+actualFormat)

And line 137:

 					Params: &schemas.TranscriptionParameters{
 						Language:       bifrost.Ptr("en"),
-						Format:         bifrost.Ptr("mp3"),
+						Format:         bifrost.Ptr(actualFormat),
 						ResponseFormat: tc.responseFormat,
 					},

137-137: Hardcoded "mp3" format doesn't match actual audio format.

The Format parameter is hardcoded to "mp3", but the actual audio was generated in the provider-specific format (WAV for Gemini via line 64's GetProviderResponseFormat). This tells the transcription service the wrong input format.

Use the actual generated format instead of hardcoding:

 					Params: &schemas.TranscriptionParameters{
 						Language:       bifrost.Ptr("en"),
-						Format:         bifrost.Ptr("mp3"),
+						Format:         bifrost.Ptr(responseFormat),
 						ResponseFormat: tc.responseFormat,
 					},
🧹 Nitpick comments (2)
core/providers/gemini/transcription.go (1)

47-52: Consider removing redundant initialization checks.

Since Params.ExtraParams is now pre-initialized at line 17, the nil checks at lines 47-51 are redundant. While defensive coding is good practice, these checks will never trigger.

You could simplify to:

 			if part.FileData != nil && strings.HasPrefix(strings.ToLower(part.FileData.MIMEType), "audio/") {
-				if bifrostReq.Params == nil {
-					bifrostReq.Params = &schemas.TranscriptionParameters{}
-				}
-				if bifrostReq.Params.ExtraParams == nil {
-					bifrostReq.Params.ExtraParams = make(map[string]interface{})
-				}
 				bifrostReq.Params.ExtraParams["file_uri"] = part.FileData.FileURI
core/internal/testutil/utils.go (1)

88-99: Add documentation explaining Gemini's WAV-only constraint.

While the comment mentions that Gemini only supports WAV format, it would be helpful to document why this limitation exists (API constraint, format compatibility, etc.) to help future maintainers understand the reasoning.

Enhance the documentation:

 // GetProviderResponseFormat returns the appropriate response format for speech synthesis based on the provider
-// For Gemini, only "wav" format is supported, so we always return "wav" regardless of the requested format
+// For Gemini, the API only supports "wav" format for speech synthesis transcription round-trips,
+// so we always return "wav" regardless of the requested format. Other providers support their requested formats.
 func GetProviderResponseFormat(provider schemas.ModelProvider, requestedFormat string) string {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e8801a and 91a7f05.

📒 Files selected for processing (6)
  • core/internal/testutil/account.go (1 hunks)
  • core/internal/testutil/transcription.go (3 hunks)
  • core/internal/testutil/transcription_stream.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/gemini/gemini_test.go (1 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/internal/testutil/utils.go
  • core/internal/testutil/transcription.go
  • core/internal/testutil/transcription_stream.go
  • core/internal/testutil/account.go
  • core/providers/gemini/gemini_test.go
  • core/providers/gemini/transcription.go
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/utils.go
  • core/internal/testutil/transcription.go
  • core/internal/testutil/transcription_stream.go
  • core/internal/testutil/account.go
  • core/providers/gemini/gemini_test.go
  • core/providers/gemini/transcription.go
🧬 Code graph analysis (5)
core/internal/testutil/utils.go (1)
core/schemas/bifrost.go (1)
  • Gemini (48-48)
core/internal/testutil/transcription.go (1)
core/internal/testutil/utils.go (1)
  • GetProviderResponseFormat (90-99)
core/internal/testutil/transcription_stream.go (1)
core/internal/testutil/utils.go (1)
  • GetProviderResponseFormat (90-99)
core/internal/testutil/account.go (1)
core/schemas/provider.go (1)
  • DefaultRequestTimeoutInSeconds (15-15)
core/providers/gemini/transcription.go (1)
core/schemas/transcriptions.go (1)
  • TranscriptionParameters (32-45)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (4)
core/internal/testutil/account.go (1)

481-481: LGTM: Timeout increase appropriate for transcription workloads.

Increasing the Gemini timeout from 120 to 300 seconds aligns with enabling transcription test scenarios, which may require longer processing times for audio analysis.

core/providers/gemini/gemini_test.go (1)

50-51: LGTM: Transcription test scenarios enabled for Gemini.

Enabling Transcription and TranscriptionStream scenarios correctly activates the transcription test paths for the Gemini provider.

core/providers/gemini/transcription.go (1)

16-18: LGTM: Params initialization ensures ExtraParams is always available.

Pre-initializing Params with an empty ExtraParams map simplifies downstream code that relies on this field.

core/internal/testutil/utils.go (1)

555-567: LGTM: Provider-specific format correctly applied to TTS generation.

The use of GetProviderResponseFormat ensures that Gemini uses WAV format while other providers can use their requested formats.

Comment on lines 34 to 53
name: "RoundTrip_Basic_MP3",
text: TTSTestTextBasic,
voiceType: "primary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Medium_MP3",
text: TTSTestTextMedium,
voiceType: "secondary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Technical_MP3",
text: TTSTestTextTechnical,
voiceType: "tertiary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test names misleading: contain "_MP3" but format is "wav".

The test case names reference "MP3" (lines 34, 41, 48) but the format field is now "wav" (lines 37, 44, 51). This creates confusion about what format is actually being tested.

Update the test names to reflect the actual format:

 		{
-			name:           "RoundTrip_Basic_MP3",
+			name:           "RoundTrip_Basic_WAV",
 			text:           TTSTestTextBasic,
 			voiceType:      "primary",
 			format:         "wav",
 			responseFormat: bifrost.Ptr("json"),
 		},
 		{
-			name:           "RoundTrip_Medium_MP3",
+			name:           "RoundTrip_Medium_WAV",
 			text:           TTSTestTextMedium,
 			voiceType:      "secondary",
 			format:         "wav",
 			responseFormat: bifrost.Ptr("json"),
 		},
 		{
-			name:           "RoundTrip_Technical_MP3",
+			name:           "RoundTrip_Technical_WAV",
 			text:           TTSTestTextTechnical,
 			voiceType:      "tertiary",
 			format:         "wav",
 			responseFormat: bifrost.Ptr("json"),
 		},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: "RoundTrip_Basic_MP3",
text: TTSTestTextBasic,
voiceType: "primary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Medium_MP3",
text: TTSTestTextMedium,
voiceType: "secondary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Technical_MP3",
text: TTSTestTextTechnical,
voiceType: "tertiary",
format: "mp3",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Basic_WAV",
text: TTSTestTextBasic,
voiceType: "primary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Medium_WAV",
text: TTSTestTextMedium,
voiceType: "secondary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
{
name: "RoundTrip_Technical_WAV",
text: TTSTestTextTechnical,
voiceType: "tertiary",
format: "wav",
responseFormat: bifrost.Ptr("json"),
},
🤖 Prompt for AI Agents
In core/internal/testutil/transcription.go around lines 34 to 53, the test case
names incorrectly include "_MP3" while the format field is "wav"; update the
name fields for each case to use "_WAV" (e.g., "RoundTrip_Basic_WAV",
"RoundTrip_Medium_WAV", "RoundTrip_Technical_WAV") so the test names accurately
reflect the format being tested and keep naming consistent with the format
value.

@TejasGhatte TejasGhatte changed the base branch from 12-08-feat_added_support_for_gemini_native_converterswq to graphite-base/1031 December 11, 2025 20:01
@TejasGhatte TejasGhatte mentioned this pull request Dec 11, 2025
16 tasks
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-09-fix_gemini_transcription_test_cases branch from 91a7f05 to a21bf88 Compare December 11, 2025 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants