-
Notifications
You must be signed in to change notification settings - Fork 189
feat: added audio support to chat completions #1114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added audio support to chat completions #1114
Conversation
|
Warning Rate limit exceeded@Pratham-Mishra04 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 14 minutes and 5 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (55)
📝 WalkthroughWalkthroughAdds chat audio support across schema, streaming assembly, tests, provider configs, UI playback, pricing, and version bumps; introduces audio types/fields, streaming audio accumulation, test helpers for audio scenarios, and UI components for rendering audio and transcripts. Changes
Sequence DiagramsequenceDiagram
autonumber
participant Client
participant API
participant Streamer
participant Assembler
participant UI
Client->>API: Send chat completion request (includes audio input)
API->>Streamer: Start streaming response (audio modality)
Streamer->>Assembler: Emit chunk (delta with Audio data/transcript)
Assembler->>Assembler: If first audio chunk -> init Audio struct (ID, Data, Transcript)
Streamer->>Assembler: Emit subsequent audio chunks (Data, Transcript, metadata)
Assembler->>Assembler: Accumulate Data & Transcript, update ID/ExpiresAt when present
Streamer-->>API: Streaming complete (final usage)
API->>Client: Final assembled message (text + audio metadata + audio data)
Client->>UI: Render message
UI->>UI: Display transcript, audio player (data), ID & expiry
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Comment |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
bfbb622 to
49b39f2
Compare
c3f7d77 to
39a34e8
Compare
49b39f2 to
aff923a
Compare
39a34e8 to
1460f5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
core/internal/testutil/chat_audio.go (2)
68-72: Consider validating audio format values.The audio format (
"wav") and voice ("alloy") are hardcoded without validation. Consider:
- Documenting supported format values (wav, pcm16, mp3, etc.) in
ChatAudioParameters- Adding validation or enums for supported formats to catch configuration errors early
- Verifying these formats are actually supported by the providers being tested
For reference, the streaming test uses
"pcm16"format (line 199), suggesting multiple formats are supported but not documented.
287-300: Consider adding format validation for audio data.The test validates that audio data is present but doesn't verify its format or integrity. Consider optionally:
- Validating that accumulated audio data is valid base64
- Checking data size is within reasonable bounds (audio files can be large)
- Verifying the format matches the requested output format ("wav" or "pcm16")
This is a nice-to-have enhancement for test robustness but not critical since providers are responsible for format correctness.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
core/internal/testutil/account.go(6 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/utils.go(3 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)framework/streaming/chat.go(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(2 hunks)ui/lib/types/logs.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/providers/azure/azure_test.goui/app/workspace/logs/views/logChatMessageView.tsxui/lib/types/logs.tsframework/streaming/chat.gocore/schemas/responses.gocore/internal/testutil/account.gocore/internal/testutil/chat_audio.gocore/internal/testutil/tests.gocore/schemas/mux.gocore/internal/testutil/utils.gocore/schemas/chatcompletions.gocore/providers/openai/openai_test.go
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/azure/azure_test.goframework/streaming/chat.gocore/schemas/responses.gocore/internal/testutil/account.gocore/internal/testutil/chat_audio.gocore/internal/testutil/tests.gocore/schemas/mux.gocore/internal/testutil/utils.gocore/schemas/chatcompletions.gocore/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/azure/azure_test.gocore/providers/openai/openai_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.
Applied to files:
core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.
Applied to files:
core/providers/openai/openai_test.go
🧬 Code graph analysis (7)
core/providers/azure/azure_test.go (2)
core/schemas/provider.go (1)
Provider(314-361)core/schemas/bifrost.go (1)
Azure(36-36)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
ChatAudioMessageAudio(710-715)
framework/streaming/chat.go (2)
core/schemas/chatcompletions.go (2)
ChatAssistantMessage(634-641)ChatAudioMessageAudio(710-715)ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
core/internal/testutil/chat_audio.go (2)
core/internal/testutil/utils.go (2)
GetSampleAudioBase64(220-233)CreateAudioChatMessage(312-329)core/schemas/chatcompletions.go (6)
ChatMessage(477-486)BifrostChatResponse(27-42)BifrostChatRequest(12-19)ChatParameters(156-186)ChatAudioParameters(225-228)BifrostLLMUsage(830-837)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
RunChatAudioTest(14-149)RunChatAudioStreamTest(152-334)
core/schemas/mux.go (3)
ui/lib/types/logs.ts (2)
CompletionTokensDetails(198-203)ResponsesResponseOutputTokens(556-558)core/schemas/responses.go (1)
ResponsesResponseOutputTokens(277-288)core/schemas/chatcompletions.go (1)
ChatCompletionTokensDetails(849-861)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (19)
core/providers/azure/azure_test.go (1)
27-30: LGTM! Audio test configuration properly integrated.The Azure test configuration correctly adds the ChatAudioModel field with the preview model name and enables the ChatAudio scenario flag, aligning with the broader test framework changes for audio support.
Also applies to: 54-54
core/internal/testutil/tests.go (2)
67-68: LGTM! Audio test scenarios properly integrated.The new audio test scenarios are correctly added to the comprehensive test suite and will execute for providers that support them.
121-122: LGTM! Summary logic correctly gates audio tests.The summary output correctly checks both the ChatAudio flag and the presence of a ChatAudioModel, consistent with other conditional test scenarios like Reasoning and Embedding.
ui/app/workspace/logs/views/logChatMessageView.tsx (1)
183-207: LGTM! Audio output rendering properly implemented.The audio output section correctly:
- Conditionally renders based on message.audio existence
- Displays transcript and audio player when data is present
- Converts Unix timestamp to JavaScript Date for expiry display
- Follows consistent styling patterns with other message blocks
core/providers/openai/openai_test.go (1)
41-41: LGTM! OpenAI audio test configuration properly set.The ChatAudioModel field and ChatAudio scenario flag are correctly configured, consistent with the Azure test setup.
Also applies to: 75-75
framework/streaming/chat.go (1)
106-135: LGTM! Audio streaming accumulation properly implemented.The audio chunk accumulation logic correctly:
- Initializes audio structure on first chunk with all metadata fields
- Accumulates audio data and transcript across subsequent chunks
- Updates ID and ExpiresAt metadata when provided (treating them as final values per the comment)
- Follows the same accumulation pattern used for content, refusal, and reasoning
The approach is consistent with the existing streaming framework patterns.
core/internal/testutil/account.go (2)
55-55: LGTM! Test configuration fields properly added.The ChatAudio flag and ChatAudioModel field are correctly added to the test configuration structs, enabling audio test scenarios for supported providers.
Also applies to: 69-69
586-623: LGTM! Provider configurations correctly enable audio support.The OpenAI and Azure provider configurations properly set ChatAudioModel to "gpt-4o-mini-audio-preview" and enable the ChatAudio scenario flag, which aligns with the providers that support chat audio functionality.
Also applies to: 729-762
ui/lib/types/logs.ts (1)
117-125: LGTM! Audio types properly defined.The ChatAudioMessageAudio interface and the optional audio field on ChatMessage are correctly structured and align with the backend Go types. Field names follow the existing snake_case convention for JSON serialization.
core/internal/testutil/utils.go (2)
219-233: LGTM! Audio sample loading utility properly implemented.The GetSampleAudioBase64() function follows the established pattern of GetLionBase64Image() and correctly:
- Locates the sample audio file using runtime.Caller
- Reads and base64-encodes the audio data
- Provides appropriate error handling
312-329: LGTM! Audio message constructor properly implemented.The CreateAudioChatMessage() function correctly constructs a chat message with both text and audio input blocks, following the same pattern as CreateImageChatMessage(). The audio block structure properly includes data and format fields.
core/schemas/responses.go (2)
268-274: LGTM - Comprehensive input token breakdown.The addition of
TextTokensandImageTokensprovides granular tracking of different input modalities alongside the existingAudioTokens. MakingCachedTokensoptional (viaomitempty) is a good change for providers that don't support prompt caching.
278-278: LGTM - Completes output token breakdown.Adding
TextTokensto the output tokens aligns with the input token changes and provides consistent token tracking across both directions.core/schemas/mux.go (2)
654-662: LGTM - Bidirectional token field mapping.The conversion logic properly maps the new
TextTokensandImageTokensfields from Chat API to Responses API format in both input and output token details. The straightforward field mapping maintains data integrity.
690-698: LGTM - Completes bidirectional conversion.The reverse mapping from Responses API to Chat API format properly handles the new token fields, ensuring data preservation in round-trip conversions.
core/internal/testutil/chat_audio.go (1)
218-270: LGTM - Comprehensive streaming audio accumulation.The stream chunk accumulation logic properly handles:
- Text content deltas
- Audio data and transcript accumulation
- Audio metadata (ID, ExpiresAt)
- Final usage statistics
The logic correctly uses a
strings.Builderfor efficient string concatenation.core/schemas/chatcompletions.go (3)
709-715: LGTM - Audio message type aligns with UI contract.The
ChatAudioMessageAudiotype properly defines the audio data structure with all required fields. The structure matches the UI type definition inui/lib/types/logs.ts, ensuring consistency across the stack.Note: The
Datafield will contain base64-encoded audio which can be large. Ensure downstream systems (UI, logs, storage) can handle the payload size appropriately.
157-163: LGTM - Audio parameters properly integrated.The
Audiofield is correctly added toChatParametersas an optional pointer withomitempty. The presence of theModalitiesfield (line 163) allows requests to specify["text", "audio"]to enable audio output, which aligns with the test usage.
840-850: LGTM - Token breakdown extended for multiple modalities.The addition of
TextTokensandImageTokensto bothChatPromptTokensDetailsandChatCompletionTokensDetailscompletes the token tracking for different input/output modalities (text, audio, image). This aligns with the changes inresponses.goandmux.gofor bidirectional API conversion.
62ab3e9 to
3e89fe3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
plugins/mocker/version (1)
1-1: Verify version consistency across plugins.This plugin is also at 1.3.49, matching
plugins/telemetry/version, but differs fromplugins/governance/versionat 1.3.50. Please ensure all plugin versions are intentionally staggered or should be synchronized to the same version for coordinated release across the stack.core/schemas/chatcompletions.go (1)
224-228: Acknowledged: Schema mirrors OpenAI's API specification.Based on learnings, the
ChatAudioParametersstructure intentionally mirrors OpenAI's API specifications, and the valid values forFormatandVoiceare defined by OpenAI's documentation.
🧹 Nitpick comments (4)
core/internal/testutil/chat_audio.go (4)
62-90: Consider handling the nil response case before creating a BifrostError.The operation function returns a custom
BifrostErrorwhenresponseis nil (lines 84-89), but this path is reached only after checkingerr != nil. Ifclient.ChatCompletionRequestreturns(nil, nil), this creates an error, which is correct. However, the error message "No chat response returned" could be more descriptive for debugging.if response != nil { return response, nil } return nil, &schemas.BifrostError{ IsBifrostError: true, Error: &schemas.ErrorField{ - Message: "No chat response returned", + Message: "ChatCompletionRequest returned nil response without error", }, }
130-140: Inconsistent test failure severity for audio validation.Lines 130-134 use
t.Errorfor missingaudio.Data, while lines 125-127 uset.Fatalfor missingaudioobject. Ifaudio.Databeing empty is equally critical (as the expected behavior inretryContextsuggests), consider usingt.Fatalfor consistency, or document why the test should continue despite missing data.
191-203: Streaming test lacks MaxCompletionTokens parameter unlike non-streaming.The non-streaming test (line 73) sets
MaxCompletionTokens: bifrost.Ptr(200), but the streaming test omits this. This inconsistency may lead to different behavior or longer-running tests. Consider aligning the parameters for consistency.chatReq := &schemas.BifrostChatRequest{ Provider: testConfig.Provider, Model: testConfig.ChatAudioModel, Input: chatMessages, Params: &schemas.ChatParameters{ - Modalities: []string{"text", "audio"}, + Modalities: []string{"text", "audio"}, + MaxCompletionTokens: bifrost.Ptr(200), Audio: &schemas.ChatAudioParameters{ Voice: "alloy", Format: "pcm16", // output format }, }, Fallbacks: testConfig.Fallbacks, }
286-300: Consider using t.Fatal for missing accumulated audio data to fail fast.Similar to the non-streaming test, using
t.Error(lines 289, 297) allows the test to continue even when critical audio data is missing. If accumulated audio data and transcript are required outcomes,t.Fatalwould provide clearer failure semantics.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (36)
core/internal/testutil/account.go(9 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/utils.go(3 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)core/version(1 hunks)framework/changelog.md(1 hunks)framework/modelcatalog/pricing.go(3 hunks)framework/streaming/chat.go(1 hunks)framework/version(1 hunks)plugins/governance/changelog.md(1 hunks)plugins/governance/version(1 hunks)plugins/jsonparser/changelog.md(1 hunks)plugins/jsonparser/version(1 hunks)plugins/logging/changelog.md(1 hunks)plugins/logging/version(1 hunks)plugins/maxim/changelog.md(1 hunks)plugins/maxim/version(1 hunks)plugins/mocker/changelog.md(1 hunks)plugins/mocker/version(1 hunks)plugins/otel/changelog.md(1 hunks)plugins/otel/version(1 hunks)plugins/semanticcache/changelog.md(1 hunks)plugins/semanticcache/version(1 hunks)plugins/telemetry/changelog.md(1 hunks)plugins/telemetry/version(1 hunks)transports/changelog.md(1 hunks)transports/version(1 hunks)ui/app/workspace/logs/views/filters.tsx(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(2 hunks)ui/lib/constants/icons.tsx(1 hunks)ui/lib/types/logs.ts(1 hunks)
✅ Files skipped from review due to trivial changes (17)
- plugins/maxim/version
- plugins/semanticcache/changelog.md
- plugins/otel/version
- plugins/logging/version
- plugins/logging/changelog.md
- plugins/mocker/changelog.md
- plugins/maxim/changelog.md
- transports/version
- framework/changelog.md
- transports/changelog.md
- plugins/telemetry/changelog.md
- core/version
- plugins/governance/changelog.md
- framework/version
- plugins/otel/changelog.md
- plugins/jsonparser/changelog.md
- plugins/jsonparser/version
🚧 Files skipped from review as they are similar to previous changes (6)
- framework/streaming/chat.go
- core/providers/openai/openai_test.go
- core/internal/testutil/tests.go
- core/schemas/mux.go
- ui/app/workspace/logs/views/logChatMessageView.tsx
- core/providers/azure/azure_test.go
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/internal/testutil/utils.goui/app/workspace/logs/views/filters.tsxui/lib/constants/icons.tsxplugins/semanticcache/versioncore/internal/testutil/chat_audio.goplugins/telemetry/versioncore/schemas/responses.goui/lib/types/logs.tsplugins/mocker/versionplugins/governance/versioncore/schemas/chatcompletions.goframework/modelcatalog/pricing.gocore/internal/testutil/account.go
🧠 Learnings (2)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/internal/testutil/utils.gocore/internal/testutil/chat_audio.gocore/schemas/responses.gocore/schemas/chatcompletions.goframework/modelcatalog/pricing.gocore/internal/testutil/account.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.
Applied to files:
core/schemas/chatcompletions.go
🧬 Code graph analysis (5)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (6)
ChatMessage(477-486)ChatMessageContent(532-535)ChatContentBlock(598-605)ChatContentBlockTypeText(590-590)ChatContentBlockTypeInputAudio(592-592)ChatInputAudio(616-619)core/schemas/utils.go (1)
Ptr(16-18)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
ChatAudioMessageAudio(710-715)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
Usage(69-78)core/schemas/chatcompletions.go (1)
BifrostLLMUsage(830-837)
core/internal/testutil/account.go (5)
core/schemas/account.go (1)
BedrockKeyConfig(56-64)ui/lib/types/config.ts (1)
BedrockKeyConfig(63-71)core/utils.go (1)
Ptr(56-58)core/schemas/utils.go (1)
Ptr(16-18)core/schemas/provider.go (1)
Provider(314-361)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (25)
plugins/semanticcache/version (1)
1-1: Verify if version bump is intentional for this plugin.The semantic cache plugin version is bumped from 1.3.48 to 1.3.49, but no code changes were made to any files in the
plugins/semanticcache/directory. Since this PR focuses on audio support for chat completions (unrelated to semantic cache), please confirm whether this version bump is intentional or if it was accidentally included.plugins/governance/version (1)
1-1: Version bump looks correct, but verify cross-plugin consistency.The version bump to 1.3.50 appears appropriate. However, this is ahead of
plugins/telemetry/versionandplugins/mocker/version(both at 1.3.49 in this PR). Given that this PR is part of a stacked set, please verify that the version differences across plugins are intentional and aligned with the release strategy for the entire stack.ui/app/workspace/logs/views/filters.tsx (1)
327-330: LGTM!Minor UI styling improvements that enhance the visual consistency of the "More actions" popover.
ui/lib/types/logs.ts (1)
117-125: LGTM!The TypeScript interface correctly mirrors the Go schema structure from
core/schemas/chatcompletions.go, maintaining type safety across the stack.framework/modelcatalog/pricing.go (2)
43-51: LGTM!Correctly gates usage extraction on the presence of
Usagedata, preventing potential nil pointer issues.
76-78: LGTM!Properly extracts audio duration for pricing calculations when available.
core/schemas/responses.go (1)
268-278: LGTM!Token detail extensions correctly add audio-related fields and properly use
omitemptytags to avoid serializing zero values. The changes align with the broader audio token tracking enhancements across the stack.core/internal/testutil/account.go (5)
55-69: LGTM!The addition of
ChatAudioandChatAudioModelfields properly extends the test configuration to support audio chat scenarios across the test suite.
165-183: LGTM!The second Bedrock key configuration with
UseForBatchAPI: bifrost.Ptr(true)correctly enables batch API testing for Bedrock. The separation allows independent testing of standard and batch API flows.
213-220: LGTM!Azure deployment mappings correctly include the audio-related models:
gpt-4o-mini-audio-preview,whisper-1, andgpt-4o-mini-tts, aligning with Azure's audio capabilities.
594-626: LGTM!OpenAI provider configuration properly enables chat audio testing with the
gpt-4o-mini-audio-previewmodel.
732-767: LGTM!Azure provider configuration comprehensively enables audio capabilities including chat audio, speech synthesis, and transcription with appropriate model mappings.
core/internal/testutil/utils.go (3)
5-5: LGTM!The
encoding/base64import is required for the new audio encoding functionality.
219-233: LGTM!The
GetSampleAudioBase64()function follows the established pattern ofGetLionBase64Image()and correctly loads and encodes the sample audio file.
312-329: LGTM!The
CreateAudioChatMessage()helper correctly constructs a chat message with both text and audio input blocks, following the schema patterns established incore/schemas/chatcompletions.go.core/internal/testutil/chat_audio.go (3)
14-18: LGTM: Guard clause properly gates audio test execution.The early return with informative logging when audio is not supported follows the established pattern for conditional test scenarios.
152-156: LGTM: Streaming test properly gates on audio support.Mirrors the non-streaming test's guard clause pattern appropriately.
217-270: LGTM: Stream chunk accumulation logic is well-structured.The loop correctly accumulates text content, audio data, transcript, ID, and expiry metadata from streamed deltas. Error handling within the loop properly fails fast on chunk errors.
core/schemas/chatcompletions.go (7)
157-181: LGTM: ChatParameters extended with Audio field.The
Audio *ChatAudioParametersfield is properly placed alphabetically with other parameters and uses a pointer for optional semantics, consistent with the existing pattern.
522-526: LGTM: UnmarshalJSON correctly includes Audio in assistant message detection.The condition now properly checks for
assistantMsg.Audio != nilto ensure audio-containing messages are correctly identified as assistant messages during unmarshalling.
634-641: LGTM: ChatAssistantMessage extended with Audio field.The
Audio *ChatAudioMessageAudiofield is properly added with appropriate JSON tag and pointer semantics for optional data.
777-777: LGTM: ChatStreamResponseChoiceDelta extended with Audio field.The
Audio *ChatAudioMessageAudiofield enables streaming audio data accumulation, aligning with the test utility implementation inchat_audio.go.
839-847: LGTM: ChatPromptTokensDetails extended with TextTokens and ImageTokens.These additions align with the broader token tracking enhancements for audio support, maintaining consistency with the existing
AudioTokensfield.
849-861: LGTM: ChatCompletionTokensDetails extended with TextTokens.Consistent with the prompt tokens details extension, enabling comprehensive token tracking across modalities.
709-715: The review comment's concern does not apply—the current struct design is correct.The OpenAI Chat Completions API always returns all audio fields (
id,data,expires_at, andtranscript) as populated values. The non-pointer, non-omitempty struct fields are the appropriate choice since these fields are always present in responses. Using pointers would be unnecessary and inconsistent with the codebase pattern of reserving pointers for truly optional fields.
3e89fe3 to
cc9bdd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/telemetry/version (1)
1-1: Version inconsistency already flagged in prior review.The concern about version alignment (1.3.49 here vs 1.3.50 in other plugins) was already raised in a previous review comment. Please verify if this divergence is intentional or if all plugin versions should be synchronized.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
core/internal/testutil/scenarios/media/sample.mp3is excluded by!**/*.mp3
📒 Files selected for processing (36)
core/internal/testutil/account.go(9 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/utils.go(3 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)core/version(1 hunks)framework/changelog.md(1 hunks)framework/modelcatalog/pricing.go(3 hunks)framework/streaming/chat.go(1 hunks)framework/version(1 hunks)plugins/governance/changelog.md(1 hunks)plugins/governance/version(1 hunks)plugins/jsonparser/changelog.md(1 hunks)plugins/jsonparser/version(1 hunks)plugins/logging/changelog.md(1 hunks)plugins/logging/version(1 hunks)plugins/maxim/changelog.md(1 hunks)plugins/maxim/version(1 hunks)plugins/mocker/changelog.md(1 hunks)plugins/mocker/version(1 hunks)plugins/otel/changelog.md(1 hunks)plugins/otel/version(1 hunks)plugins/semanticcache/changelog.md(1 hunks)plugins/semanticcache/version(1 hunks)plugins/telemetry/changelog.md(1 hunks)plugins/telemetry/version(1 hunks)transports/changelog.md(1 hunks)transports/version(1 hunks)ui/app/workspace/logs/views/filters.tsx(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(2 hunks)ui/lib/constants/icons.tsx(1 hunks)ui/lib/types/logs.ts(1 hunks)
✅ Files skipped from review due to trivial changes (9)
- plugins/otel/changelog.md
- transports/version
- framework/changelog.md
- plugins/governance/changelog.md
- core/version
- plugins/maxim/version
- plugins/semanticcache/changelog.md
- framework/version
- plugins/semanticcache/version
🚧 Files skipped from review as they are similar to previous changes (12)
- plugins/jsonparser/changelog.md
- core/internal/testutil/tests.go
- ui/lib/types/logs.ts
- transports/changelog.md
- framework/streaming/chat.go
- core/schemas/responses.go
- plugins/maxim/changelog.md
- plugins/jsonparser/version
- plugins/logging/version
- ui/app/workspace/logs/views/filters.tsx
- core/schemas/mux.go
- plugins/telemetry/changelog.md
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/providers/openai/openai_test.goplugins/mocker/changelog.mdframework/modelcatalog/pricing.gocore/internal/testutil/utils.goui/app/workspace/logs/views/logChatMessageView.tsxplugins/logging/changelog.mdcore/internal/testutil/account.goui/lib/constants/icons.tsxplugins/governance/versioncore/schemas/chatcompletions.gocore/internal/testutil/chat_audio.goplugins/telemetry/versionplugins/mocker/versioncore/providers/azure/azure_test.goplugins/otel/version
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/openai/openai_test.goframework/modelcatalog/pricing.gocore/internal/testutil/utils.gocore/internal/testutil/account.gocore/schemas/chatcompletions.gocore/internal/testutil/chat_audio.gocore/providers/azure/azure_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.
Applied to files:
core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.
Applied to files:
core/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/openai/openai_test.gocore/providers/azure/azure_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.
Applied to files:
core/schemas/chatcompletions.go
🧬 Code graph analysis (4)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
Usage(69-78)core/schemas/chatcompletions.go (1)
BifrostLLMUsage(830-837)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (7)
ChatMessage(477-486)ChatMessageRoleUser(470-470)ChatMessageContent(532-535)ChatContentBlock(598-605)ChatContentBlockTypeText(590-590)ChatContentBlockTypeInputAudio(592-592)ChatInputAudio(616-619)core/schemas/utils.go (1)
Ptr(16-18)
core/internal/testutil/account.go (4)
core/schemas/account.go (2)
BedrockKeyConfig(56-64)AzureKeyConfig(23-27)ui/lib/types/config.ts (2)
BedrockKeyConfig(63-71)AzureKeyConfig(23-27)core/utils.go (1)
Ptr(56-58)core/schemas/utils.go (1)
Ptr(16-18)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (30)
plugins/otel/version (1)
1-1: LGTM! Version bump is appropriate.The patch-level increment from 1.0.48 to 1.0.49 is consistent with the audio support feature addition and aligns with the version bumps across other components in this stacked PR.
plugins/governance/version (1)
1-1: LGTM! Version bump is clean.The patch version increment is straightforward and consistent with the feature addition in this PR stack.
plugins/mocker/version (1)
1-1: No version bump verification needed—this is the initial repository commit with all files newly added. The mocker plugin version "1.3.49" is being set for the first time, not bumped from a prior version. Additionally, the mocker plugin itself contains no audio-related code, so the versioning decision is independent of the PR's audio feature additions.plugins/logging/changelog.md (1)
1-1: No changes needed. Plugin changelog follows established repository convention.All plugin changelogs in this repository (governance, jsonparser, logging, maxim, mocker, otel, semanticcache, telemetry) consistently document version updates only. Audio support features are documented in
core/changelog.mdwith entries such as "feat: added transcription and speech support for azure" and "feat: add audio token pricing support for models." The code confirms audio handling is implemented inplugins/logging/operations.gowith theAudioOutputfield.plugins/mocker/changelog.md (1)
1-1: LGTM!Changelog entry correctly documents the core and framework version updates associated with this release.
ui/app/workspace/logs/views/logChatMessageView.tsx (1)
183-207: Audio output section follows established patterns.The implementation correctly mirrors the structure of existing content sections (reasoning, refusal, annotations) with appropriate conditional rendering for transcript, audio data, and metadata.
core/providers/openai/openai_test.go (2)
41-41: LGTM!ChatAudioModel correctly configured for OpenAI's audio-preview model.
75-75: ChatAudio scenario properly enabled.The test scenario flag aligns with the new ChatAudioModel configuration, ensuring audio tests will execute for OpenAI.
core/internal/testutil/utils.go (2)
312-329: Audio message construction aligns with schema definitions.The function correctly uses
ChatContentBlockTypeInputAudioand populatesChatInputAudiowithDataandFormatfields matching the schema structure inchatcompletions.go.
219-233: LGTM—function correctly implements audio base64 encoding for test utilities.The implementation properly follows the established pattern from
GetLionBase64Image(), usingruntime.Caller()to locate and load the sample audio file. The function correctly returns raw base64 without a data URI prefix, which is the expected format for theChatInputAudio.Datafield as documented in the schema.core/providers/azure/azure_test.go (3)
30-30: ChatAudioModel and scenario correctly configured for Azure.The
gpt-4o-mini-audio-previewmodel and enabledChatAudioscenario align with the OpenAI test configuration, ensuring consistent audio test coverage across providers.Also applies to: 56-56
57-60: Audio-related scenarios appropriately configured.
- Transcription enabled, TranscriptionStream disabled with documented reason
- SpeechSynthesis and SpeechSynthesisStream both enabled
The configuration reflects Azure's current capability support.
36-36: No changes needed—Azure ReasoningModel configuration is correct.Azure is the only cloud providing access to both Claude and GPT frontier models to customers on one platform through Microsoft Foundry, where Opus 4.5 is now available in public preview. The codebase shows this is an intentional multi-cloud partnership pattern: Bedrock uses
claude-4.5-sonnet, Vertex usesclaude-4.5-haiku, and both Azure and Anthropic useclaude-opus-4-5for reasoning. Each provider tests with models native to their platform.Likely an incorrect or invalid review comment.
core/internal/testutil/account.go (5)
55-56: LGTM! Audio scenario flag added consistently.The
ChatAudiofield follows the established pattern for capability flags and integrates cleanly with the existing test scenario structure.
69-70: LGTM! Model configuration field added consistently.The
ChatAudioModelfield aligns with the existing pattern for model-specific configurations.
165-183: LGTM! Secondary Bedrock key configuration added for additional test coverage.This second Bedrock key configuration uses a different ARN (
AWS_BEDROCK_ARNvsAWS_ARN), includes additional model deployments (notablyclaude-3.5-sonnet), and is specifically enabled for batch API testing viaUseForBatchAPI.
599-640: LGTM! Provider configurations correctly updated with audio capabilities.The OpenAI and Azure test configurations properly set:
ChatAudioModelfields pointing to appropriate audio-capable modelsChatAudioscenario flags enabled- Additional audio-related models and scenarios for Azure (transcription, speech synthesis)
The comment on Line 767 noting Azure's lack of proper streaming transcription support is helpful context.
Also applies to: 744-781
225-237: No action required — The Azure API version 2025-01-01-preview is the correct and stable choice for audio feature testing. The "preview" designation in the API version reflects the API surface, not the model lifecycle. The audio models (whisper-1, gpt-4o-mini-tts) are production-ready and suitable for CI/CD environments.core/internal/testutil/chat_audio.go (2)
14-149: LGTM! Non-streaming audio test is well-structured and comprehensive.The test function properly:
- Guards execution based on scenario configuration
- Loads and encodes sample audio
- Constructs requests with appropriate audio parameters (input: mp3, output: wav)
- Uses the retry framework for resilience
- Validates response structure, audio data presence, and transcript content
- Provides clear logging for debugging
The different input (mp3) and output (wav) formats appear intentional and align with typical audio processing workflows.
152-334: LGTM! Streaming audio test correctly handles chunk accumulation and validation.The streaming test function:
- Uses
pcm16format appropriate for streaming audio (vswavfor non-streaming)- Properly accumulates audio data, transcripts, and metadata across chunks
- Validates presence of audio in the stream
- Checks optional audio token usage details when available
- Handles both text and audio modalities correctly
The accumulation logic correctly builds up
audioDataandaudioTranscriptfrom delta chunks while capturingaudioIDandaudioExpiresAtmetadata.core/schemas/chatcompletions.go (7)
157-186: LGTM! Audio parameters field added to chat parameters.The
Audiofield is correctly typed as*ChatAudioParametersand follows the existing optional parameter pattern with theomitemptyJSON tag.
224-228: LGTM! Audio parameters struct mirrors OpenAI's specification.The
ChatAudioParametersstructure correctly definesFormatandVoicefields as strings with optional JSON encoding, matching OpenAI's chat audio API specification.Based on learnings, the team intentionally mirrors OpenAI's API structure exactly, so supported values are defined by OpenAI's documentation.
522-526: LGTM! UnmarshalJSON correctly includes audio in assistant message detection.The condition properly checks for
assistantMsg.Audio != nilalongside other optional fields to determine whether to populateChatAssistantMessage.
636-641: LGTM! Audio field added to assistant message structure.The
Audiofield is correctly typed and positioned withinChatAssistantMessage, following the established pattern for optional message content.
709-715: LGTM! Audio message structure correctly defined.The
ChatAudioMessageAudiostruct properly defines all required audio fields and maintains type consistency with the corresponding TypeScript interface inui/lib/types/logs.ts.
773-781: LGTM! Audio field added to streaming response delta.The
Audiofield inChatStreamResponseChoiceDeltacorrectly enables audio data to be streamed in chunks, consistent with other delta fields likeContentandReasoning.
839-843: LGTM! Token details extended for multimodal usage tracking.The addition of
TextTokensandImageTokensfields to token detail structures enables proper breakdown of usage across modalities (text, audio, image), supporting accurate pricing and usage analytics for multimodal chat completions.Also applies to: 849-858
framework/modelcatalog/pricing.go (3)
42-51: LGTM! Defensive handling of speech response usage.The updated logic properly checks for
Usagepresence before constructingBifrostLLMUsage, preventing potential nil pointer issues and gracefully handling cases where usage data might not be available.
76-78: LGTM! Audio duration extraction for transcription pricing.The code correctly extracts
audioSecondsfromTranscriptionResponse.Usage.Secondswhen available, enabling duration-based pricing calculations for audio transcription operations.
97-99: LGTM! Audio duration extraction fixed for streaming transcription.The code now correctly references
result.TranscriptionStreamResponse.Usage.Seconds(not the non-streaming variant), resolving the issue flagged in the previous review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
framework/modelcatalog/pricing.go (1)
97-99: LGTM! Previous issue has been resolved.The code now correctly references
result.TranscriptionStreamResponse.Usage.Secondsinstead of the incorrectresult.TranscriptionResponse.Usage.Secondsthat was flagged in a previous review.
🧹 Nitpick comments (2)
plugins/otel/changelog.md (1)
1-1: Include the otel plugin's own version bump in its changelog entry.The changelog entry documents updates to core and framework versions, but according to the PR context, the otel plugin version itself is being bumped from 1.0.48 to 1.0.49. Since this is the otel plugin's own changelog, it should explicitly document this plugin's version change.
Apply this diff to include the otel plugin version bump:
-- chore: update core version to 1.2.39 and framework version to 1.1.49 +- chore: bump otel plugin to 1.0.49; update core version to 1.2.39 and framework version to 1.1.49ui/app/workspace/logs/views/logChatMessageView.tsx (1)
182-207: Audio output rendering looks good overall.The implementation follows the established pattern for other content blocks. A few observations:
- Line 202: If
message.audio.expires_atis0or undefined, the date display could show an invalid date ("Invalid Date" or epoch time "Jan 1, 1970"). Consider adding a guard:{message.audio.id && ( <div className="text-xs text-muted-foreground"> - ID: {message.audio.id} | Expires: {new Date(message.audio.expires_at * 1000).toLocaleString()} + ID: {message.audio.id}{message.audio.expires_at ? ` | Expires: ${new Date(message.audio.expires_at * 1000).toLocaleString()}` : ''} </div> )}
- The conditional rendering structure is consistent with other sections in the component.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
core/internal/testutil/scenarios/media/sample.mp3is excluded by!**/*.mp3
📒 Files selected for processing (36)
core/internal/testutil/account.go(9 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/utils.go(3 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)core/version(1 hunks)framework/changelog.md(1 hunks)framework/modelcatalog/pricing.go(3 hunks)framework/streaming/chat.go(1 hunks)framework/version(1 hunks)plugins/governance/changelog.md(1 hunks)plugins/governance/version(1 hunks)plugins/jsonparser/changelog.md(1 hunks)plugins/jsonparser/version(1 hunks)plugins/logging/changelog.md(1 hunks)plugins/logging/version(1 hunks)plugins/maxim/changelog.md(1 hunks)plugins/maxim/version(1 hunks)plugins/mocker/changelog.md(1 hunks)plugins/mocker/version(1 hunks)plugins/otel/changelog.md(1 hunks)plugins/otel/version(1 hunks)plugins/semanticcache/changelog.md(1 hunks)plugins/semanticcache/version(1 hunks)plugins/telemetry/changelog.md(1 hunks)plugins/telemetry/version(1 hunks)transports/changelog.md(1 hunks)transports/version(1 hunks)ui/app/workspace/logs/views/filters.tsx(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(2 hunks)ui/lib/constants/icons.tsx(1 hunks)ui/lib/types/logs.ts(1 hunks)
✅ Files skipped from review due to trivial changes (15)
- plugins/semanticcache/changelog.md
- plugins/maxim/changelog.md
- plugins/logging/version
- core/version
- plugins/mocker/changelog.md
- plugins/otel/version
- framework/changelog.md
- plugins/telemetry/changelog.md
- plugins/telemetry/version
- plugins/governance/changelog.md
- plugins/logging/changelog.md
- plugins/semanticcache/version
- plugins/jsonparser/changelog.md
- plugins/mocker/version
- transports/version
🚧 Files skipped from review as they are similar to previous changes (3)
- core/schemas/mux.go
- framework/streaming/chat.go
- core/internal/testutil/utils.go
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
plugins/jsonparser/versioncore/internal/testutil/tests.goui/lib/constants/icons.tsxcore/providers/openai/openai_test.goui/app/workspace/logs/views/filters.tsxframework/modelcatalog/pricing.goplugins/governance/versioncore/internal/testutil/account.goframework/versionplugins/otel/changelog.mdplugins/maxim/versiontransports/changelog.mdui/app/workspace/logs/views/logChatMessageView.tsxcore/internal/testutil/chat_audio.goui/lib/types/logs.tscore/schemas/chatcompletions.gocore/providers/azure/azure_test.gocore/schemas/responses.go
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/internal/testutil/tests.gocore/providers/openai/openai_test.goframework/modelcatalog/pricing.gocore/internal/testutil/account.gocore/internal/testutil/chat_audio.gocore/schemas/chatcompletions.gocore/providers/azure/azure_test.gocore/schemas/responses.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.
Applied to files:
core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.
Applied to files:
core/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/openai/openai_test.gocore/providers/azure/azure_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.
Applied to files:
core/schemas/chatcompletions.go
🧬 Code graph analysis (6)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
RunChatAudioTest(14-149)RunChatAudioStreamTest(152-334)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
Usage(69-78)core/schemas/chatcompletions.go (1)
BifrostLLMUsage(830-837)
core/internal/testutil/account.go (3)
core/schemas/account.go (2)
BedrockKeyConfig(56-64)AzureKeyConfig(23-27)core/utils.go (1)
Ptr(56-58)core/schemas/utils.go (1)
Ptr(16-18)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
ChatAudioMessageAudio(710-715)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
core/providers/azure/azure_test.go (3)
core/schemas/provider.go (1)
Provider(314-361)core/schemas/bifrost.go (2)
Azure(36-36)Fallback(145-148)core/schemas/models.go (1)
Model(109-129)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (24)
ui/app/workspace/logs/views/filters.tsx (1)
327-330: LGTM! UI theming enhancements align well with the audio feature additions.The accent background and hover effects improve visual consistency and user feedback in the filters popover.
core/internal/testutil/account.go (5)
55-55: LGTM! ChatAudio flag follows established testing patterns.The new field enables proper gating of audio test scenarios across providers.
69-69: LGTM! ChatAudioModel field is consistent with existing model configuration patterns.Properly enables provider-specific audio model selection for testing.
165-183: Verify the intent behind the duplicated Bedrock key configuration.The second Bedrock key (lines 165-183) appears to duplicate most deployments from the first key (lines 148-164), with the main differences being:
- Uses
AWS_BEDROCK_ARNinstead ofAWS_ARN- Sets
UseForBatchAPI: true- Adds
claude-3.5-sonnetdeploymentIf this separation is intentional (batch-enabled vs. non-batch keys), consider:
- Adding a comment explaining the purpose of each key configuration
- Ensuring the first key explicitly sets
UseForBatchAPI: falsefor clarityBased on the broader PR stack context for audio support, please confirm whether this Bedrock key duplication is:
- Part of the audio feature requirements (e.g., audio needs batch API support)
- A pre-existing pattern being maintained
- An unintended side effect that should be cleaned up
607-640: LGTM! OpenAI audio configuration is correctly set up.The
gpt-4o-mini-audio-previewmodel andChatAudio: trueflag properly enable audio testing for OpenAI.
745-780: LGTM! Azure provider configuration comprehensively supports audio features.The model assignments and scenario flags correctly enable:
- Chat audio via
gpt-4o-mini-audio-preview- Transcription via
whisper-1- Speech synthesis via
gpt-4o-mini-ttsThe scenarios accurately reflect Azure's audio capabilities.
transports/changelog.md (1)
8-9: LGTM! Changelog correctly documents the coordinated version bump.The chore entry properly records the core and framework version updates associated with this PR.
core/internal/testutil/tests.go (2)
67-68: LGTM! Audio test scenarios are properly integrated into the comprehensive test suite.The
RunChatAudioTestandRunChatAudioStreamTestfunctions correctly follow theTestScenarioFuncsignature and are appropriately added to the test execution flow.
121-122: LGTM! Test summary entries correctly gate audio scenarios on both flag and model presence.The gating logic
testConfig.Scenarios.ChatAudio && testConfig.ChatAudioModel != ""follows the established pattern for model-specific features and ensures tests only run when properly configured.framework/modelcatalog/pricing.go (2)
42-51: LGTM! Good defensive programming.The null check for
SpeechResponse.Usagebefore accessing its fields prevents potential nil pointer dereferences. The early return with0is appropriate when no usage data is available.
76-78: LGTM!Correctly extracts
audioSecondsfromTranscriptionResponse.Usage.Secondsfor duration-based pricing calculations.core/providers/openai/openai_test.go (2)
75-75: LGTM!Enabling the
ChatAudiotest scenario aligns with the PR's objective to add audio support to chat completions.
41-41: The ChatAudioModel name is correct. The GPT-4o-mini-audio-preview model is a smaller, lower-cost model that provides audio capabilities at a fraction of the cost of GPT-4o-audio-preview. The test configuration is accurate.ui/lib/types/logs.ts (1)
117-125: LGTM! Type definitions align with backend schema.The
ChatAudioMessageAudiointerface correctly mirrors the Go struct fromcore/schemas/chatcompletions.go. The field types are compatible:
id: string↔ID stringdata: string↔Data stringexpires_at: number↔ExpiresAt inttranscript: string↔Transcript stringConsider whether any fields should be optional (e.g.,
transcript?) if the API can return partial audio data, but this depends on backend behavior.core/schemas/responses.go (2)
267-275: LGTM! Extended token detail fields for multimodal support.The addition of
TextTokens,AudioTokens, andImageTokensfields withomitemptytags appropriately extends the token tracking capabilities for multimodal responses. Theomitemptytags ensure backward compatibility by omitting zero values in JSON serialization.
277-288: LGTM!The new
TextTokensfield inResponsesResponseOutputTokenscomplements the input token details and maintains consistency with the input token structure.core/internal/testutil/chat_audio.go (2)
14-149: LGTM!The non-streaming audio test implementation is thorough and follows established testing patterns. The test properly gates execution, loads audio samples, configures retry logic, validates responses including audio data and transcripts, and provides clear logging. The format conversion testing (mp3 input → wav output) adds good coverage.
152-334: LGTM!The streaming audio test is well-implemented with proper chunk accumulation, comprehensive validation of audio data/transcript/metadata, and token usage tracking. The use of pcm16 format for streaming provides good test coverage complementary to the wav format in non-streaming tests.
core/providers/azure/azure_test.go (1)
26-62: LGTM!The Azure test configuration properly incorporates audio-related models and scenarios. The explicit disabling of TranscriptionStream with a clear comment is good practice, and the model selections are appropriate for Azure OpenAI service capabilities.
core/schemas/chatcompletions.go (5)
157-157: LGTM!The Audio field is properly added to ChatParameters as an optional pointer with appropriate JSON tagging.
224-228: LGTM!The ChatAudioParameters structure correctly mirrors OpenAI's API specification. Based on learnings, the valid values for Format and Voice are defined by OpenAI's documentation, and the project intentionally maintains direct compatibility without adding inline documentation or constants.
524-524: LGTM!The Audio field is properly added to ChatAssistantMessage, and the UnmarshalJSON method is correctly updated to include the Audio check when determining whether to populate the ChatAssistantMessage pointer. These changes work together cohesively.
Also applies to: 636-636
709-715: LGTM!The ChatAudioMessageAudio type is well-defined with appropriate fields for audio data, transcript, expiration, and ID. The structure aligns with the UI TypeScript interface shown in the relevant code snippets (ui/lib/types/logs.ts).
777-777: LGTM!The additions enable comprehensive token usage tracking across modalities:
- Audio field properly added to streaming delta responses (line 777)
- TextTokens and ImageTokens fields complement existing AudioTokens for granular multi-modal usage accounting
These changes support accurate cost calculation and usage reporting for audio-enabled chat completions.
Also applies to: 840-842, 850-850
d9c725e to
2e56871
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
plugins/jsonparser/changelog.md (1)
1-1: Consider adding feature details to the changelog entry.The changelog documents a version bump but lacks description of the audio features added in this PR. Given the substantial scope of changes (audio schema, streaming, UI rendering, provider integrations), consider expanding this entry with brief feature highlights for users tracking changes.
core/providers/mistral/transcription_test.go (1)
992-1014: LGTM! Comprehensive streaming test case.The addition of
Stream: schemas.Ptr(true)correctly tests streaming behavior with all optional fields. The test appropriately verifies that the stream flag is serialized alongside other parameters.Optional enhancement: Consider adding a test case for
Stream: schemas.Ptr(false)to verify behavior when streaming is explicitly disabled. This would provide more complete coverage of the Stream field's behavior.{ name: "non-streaming request with stream false", request: &MistralTranscriptionRequest{ Model: "voxtral-mini-latest", File: []byte{0x01, 0x02, 0x03}, Language: schemas.Ptr("en"), Stream: schemas.Ptr(false), }, expectedFields: map[string]string{ "stream": "false", "model": "voxtral-mini-latest", "language": "en", }, },framework/streaming/chat.go (1)
106-135: Audio chunk accumulation logic looks sound; consider propagating any extra audio metadataThe first vs subsequent‑chunk handling for
Audio(ID/Data/ExpiresAt/Transcript) is correct and consistent with how other streaming fields are accumulated. Ifschemas.ChatAudioMessageAudio(andChatStreamResponseChoiceDelta.Audio) expose additional metadata used downstream (e.g., an audioFormator similar), consider also initializing/updating those fields here so non‑streaming and streaming chat‑audio messages carry a consistent audio shape into the UI and logs.ui/app/workspace/logs/views/logDetailsSheet.tsx (2)
37-39: Type assertion withas anyis used for audio format extraction.The repeated
as anycasts suggest theLogEntrytype may need updating to include theaudiofield inparams. Consider extending the type definition to avoid runtime type assertions.
419-428: MissingaudioFormatprop for embedding output.Other
LogChatMessageViewusages passaudioFormat, but this embedding output case does not. For consistency, consider adding the prop here as well.<LogChatMessageView message={{ role: "assistant", content: JSON.stringify( log.embedding_output.map((embedding) => embedding.embedding), null, 2, ), }} + audioFormat={audioFormat} />core/internal/testutil/transcription.go (1)
62-70: Duplicated provider/model override logic.This same pattern appears 5 times in this file and similarly in
transcription_stream.go. Consider extracting a helper function to reduce duplication.func (tc *ComprehensiveTestConfig) GetSpeechSynthesisProviderAndModel() (schemas.ModelProvider, string) { provider := tc.Provider if tc.ExternalTTSProvider != "" { provider = tc.ExternalTTSProvider } model := tc.SpeechSynthesisModel if tc.ExternalTTSModel != "" { model = tc.ExternalTTSModel } return provider, model }core/internal/testutil/transcription_stream.go (1)
65-73: Same duplicated override pattern as intranscription.go.This pattern appears 4 times in this file. The helper function suggested for
transcription.gowould also benefit this file.ui/app/workspace/logs/views/audioPlayer.tsx (1)
58-92: Variable naming could be clearer.The variable
pcmDataon line 61 holds the decoded binary data regardless of format. Consider renaming tobinaryDataoraudioDatafor clarity, since it's not always PCM data.- const binaryString = atob(base64Data); - const pcmData = Uint8Array.from(binaryString, (c) => c.charCodeAt(0)); + const binaryString = atob(base64Data); + const audioData = Uint8Array.from(binaryString, (c) => c.charCodeAt(0)); // Handle PCM16 format - convert to WAV if (audioFormat === "pcm16" || audioFormat === "pcm_s16le_16") { - const wavData = convertPCM16ToWAV(pcmData); + const wavData = convertPCM16ToWAV(audioData);core/internal/testutil/chat_audio.go (2)
32-34: Input audio format is hardcoded to "mp3".Both test functions hardcode the input audio format to "mp3" when calling
CreateAudioChatMessage. This is acceptable for test utilities, but consider whether testing multiple input formats would provide additional coverage.Also applies to: 169-172
276-278: Validation accepts either audio data OR transcript.Line 276 validates that at least one of
accumulatedAudioDataoraccumulatedTranscriptis present, using OR logic. This is lenient—providers might stream only transcript without audio data, or vice versa. Consider whether both fields should be required for a complete audio response, depending on your API contract expectations.core/internal/testutil/account.go (1)
77-78: ExternalTTSProvider and ExternalTTSModel fields are declared but unused.Lines 77-78 add
ExternalTTSProviderandExternalTTSModelfields toComprehensiveTestConfig, but these fields aren't populated in any provider configuration in this file. They may be used elsewhere in the stack or reserved for future functionality.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
core/internal/testutil/scenarios/media/sample.mp3is excluded by!**/*.mp3
📒 Files selected for processing (45)
core/internal/testutil/account.go(9 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/transcription.go(6 hunks)core/internal/testutil/transcription_stream.go(5 hunks)core/internal/testutil/utils.go(4 hunks)core/internal/testutil/validation_presets.go(2 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/mistral/mistral.go(3 hunks)core/providers/mistral/mistral_test.go(2 hunks)core/providers/mistral/transcription_test.go(2 hunks)core/providers/openai/openai.go(1 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)core/version(1 hunks)framework/changelog.md(1 hunks)framework/modelcatalog/pricing.go(3 hunks)framework/streaming/chat.go(1 hunks)framework/version(1 hunks)plugins/governance/changelog.md(1 hunks)plugins/governance/version(1 hunks)plugins/jsonparser/changelog.md(1 hunks)plugins/jsonparser/version(1 hunks)plugins/logging/changelog.md(1 hunks)plugins/logging/version(1 hunks)plugins/maxim/changelog.md(1 hunks)plugins/maxim/version(1 hunks)plugins/mocker/changelog.md(1 hunks)plugins/mocker/version(1 hunks)plugins/otel/changelog.md(1 hunks)plugins/otel/version(1 hunks)plugins/semanticcache/changelog.md(1 hunks)plugins/semanticcache/version(1 hunks)plugins/telemetry/changelog.md(1 hunks)plugins/telemetry/version(1 hunks)transports/changelog.md(1 hunks)transports/version(1 hunks)ui/app/workspace/logs/views/audioPlayer.tsx(3 hunks)ui/app/workspace/logs/views/filters.tsx(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(3 hunks)ui/app/workspace/logs/views/logDetailsSheet.tsx(5 hunks)ui/lib/constants/icons.tsx(1 hunks)ui/lib/types/logs.ts(1 hunks)
✅ Files skipped from review due to trivial changes (4)
- plugins/jsonparser/version
- plugins/telemetry/changelog.md
- plugins/maxim/changelog.md
- plugins/mocker/version
🚧 Files skipped from review as they are similar to previous changes (21)
- ui/app/workspace/logs/views/filters.tsx
- plugins/semanticcache/version
- core/providers/openai/openai_test.go
- plugins/semanticcache/changelog.md
- plugins/maxim/version
- framework/modelcatalog/pricing.go
- plugins/mocker/changelog.md
- core/version
- plugins/otel/version
- ui/app/workspace/logs/views/logChatMessageView.tsx
- core/schemas/chatcompletions.go
- plugins/logging/version
- plugins/governance/changelog.md
- plugins/governance/version
- transports/version
- plugins/logging/changelog.md
- core/internal/testutil/utils.go
- ui/lib/types/logs.ts
- framework/version
- plugins/telemetry/version
- framework/changelog.md
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/internal/testutil/validation_presets.gocore/internal/testutil/transcription_stream.gocore/providers/mistral/mistral.goui/lib/constants/icons.tsxcore/internal/testutil/tests.gocore/internal/testutil/chat_audio.goplugins/jsonparser/changelog.mdcore/providers/azure/azure_test.gocore/schemas/mux.gocore/internal/testutil/transcription.gocore/providers/mistral/mistral_test.gocore/internal/testutil/account.goui/app/workspace/logs/views/audioPlayer.tsxcore/providers/openai/openai.goui/app/workspace/logs/views/logDetailsSheet.tsxcore/schemas/responses.gocore/providers/mistral/transcription_test.goframework/streaming/chat.goplugins/otel/changelog.mdtransports/changelog.md
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/internal/testutil/validation_presets.gocore/internal/testutil/transcription_stream.gocore/providers/mistral/mistral.gocore/internal/testutil/tests.gocore/internal/testutil/chat_audio.gocore/providers/azure/azure_test.gocore/schemas/mux.gocore/internal/testutil/transcription.gocore/providers/mistral/mistral_test.gocore/internal/testutil/account.gocore/providers/openai/openai.gocore/schemas/responses.gocore/providers/mistral/transcription_test.goframework/streaming/chat.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/azure/azure_test.gocore/providers/mistral/mistral_test.gocore/providers/mistral/transcription_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.
Applied to files:
core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.
Applied to files:
core/providers/openai/openai.go
🧬 Code graph analysis (10)
core/internal/testutil/validation_presets.go (1)
core/internal/testutil/response_validation.go (1)
ResponseExpectations(18-43)
core/internal/testutil/transcription_stream.go (3)
core/internal/testutil/utils.go (2)
GetProviderVoice(39-87)GenerateTTSAudioForTest(568-640)core/schemas/speech.go (1)
BifrostSpeechRequest(9-16)core/schemas/models.go (1)
Model(109-129)
core/providers/mistral/mistral.go (3)
core/providers/mistral/types.go (2)
MistralTranscriptionStreamEventType(88-88)MistralTranscriptionStreamEventDone(98-98)core/schemas/bifrost.go (1)
BifrostContextKeyStreamEndIndicator(126-126)core/schemas/transcriptions.go (1)
TranscriptionStreamResponseTypeDone(115-115)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
RunChatAudioTest(14-149)RunChatAudioStreamTest(152-318)
core/providers/azure/azure_test.go (3)
core/schemas/provider.go (1)
Provider(314-361)core/schemas/bifrost.go (2)
Azure(36-36)Fallback(145-148)core/schemas/models.go (1)
Model(109-129)
core/schemas/mux.go (2)
core/schemas/responses.go (1)
ResponsesResponseOutputTokens(277-288)core/schemas/chatcompletions.go (1)
ChatCompletionTokensDetails(849-861)
core/internal/testutil/transcription.go (2)
core/internal/testutil/utils.go (4)
GetProviderVoice(39-87)GenerateTTSAudioForTest(568-640)TTSTestTextBasic(20-20)TTSTestTextMedium(23-23)core/schemas/speech.go (1)
BifrostSpeechRequest(9-16)
core/providers/mistral/mistral_test.go (1)
core/schemas/bifrost.go (1)
OpenAI(35-35)
core/providers/openai/openai.go (1)
core/schemas/chatcompletions.go (1)
ChatStreamResponseChoice(768-770)
ui/app/workspace/logs/views/logDetailsSheet.tsx (2)
ui/app/workspace/logs/views/logEntryDetailsView.tsx (1)
LogEntryDetailsView(15-49)ui/app/workspace/logs/views/logChatMessageView.tsx (1)
LogChatMessageView(67-214)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (23)
plugins/otel/changelog.md (1)
1-1: Version bump entry is appropriate and correctly formatted.This changelog entry correctly documents the version bumps for dependencies affected by the PR stack. Since plugins/otel is not directly involved in the audio support feature, a simple version bump entry is the right approach.
transports/changelog.md (2)
9-9: Version bump entry is appropriate.The chore entry correctly documents dependency version updates. Since transports is not directly affected by the audio support feature, only a version bump entry is needed, consistent with plugins/otel.
8-8: Clarify the relationship of the docs link update to this PR.The docs link update on line 8 appears to be unrelated to the audio support feature (which doesn't affect transports per the PR objectives). Please confirm whether this was intended to be included in this PR or should be part of a separate commit/PR.
core/providers/mistral/transcription_test.go (1)
978-990: LGTM! Stream flag correctly added to test case.The addition of
Stream: schemas.Ptr(true)is appropriate for this "basic streaming request" test case. The expected form field correctly includes"stream": "true", ensuring the flag is properly serialized into the multipart body.core/schemas/mux.go (1)
652-671: Token usage conversions now correctly include text/image token breakdownsThe added
TextTokens/ImageTokenswiring betweenBifrostLLMUsageandResponsesResponseUsageis symmetric in both directions and keeps existing audio/cached/other token fields intact, so downstream usage accounting should see richer modality breakdowns without regressions.Also applies to: 688-707
core/schemas/responses.go (1)
267-288: Responses token structs extended in line with new usage mappingsAdding
TextTokens/ImageTokenshere (and markingaudio_tokens/cached_tokensasomitempty) matches the new conversions inmux.goand lets callers consume a more granular token breakdown while keeping JSON output compact when counts are zero.core/internal/testutil/validation_presets.go (1)
202-215: ChatAudio expectations and scenario wiring are consistent with existing presetsThe
ChatAudioExpectationspreset (no text content, single choice, usage/timestamps/model/latency,response_type: "chat_audio") and the new"ChatAudio"branch inGetExpectationsForScenarioline up with the chat‑audio test helper semantics and the broader expectations framework.Also applies to: 298-300
core/providers/openai/openai.go (1)
1047-1053: Streaming now correctly emits chunks for audio‑only deltasIncluding
Delta.Audio != nilin the emission predicate (under the existing non‑nil choice/delta guards) lets pure audio chunks flow through the streaming pipeline, which is required to drive the new audio accumulator and chat‑audio stream tests. The parentheses keep the boolean logic unambiguous.core/internal/testutil/tests.go (1)
67-69: ChatAudio and ChatAudioStream are cleanly integrated into the comprehensive test suiteAdding
RunChatAudioTest/RunChatAudioStreamTestto the scenario list and mirroring that with"ChatAudio"/"ChatAudioStream"entries in the summary (gated onScenarios.ChatAudioandChatAudioModel) keeps execution and reporting aligned while still auto‑skipping when the provider/model doesn’t support chat‑audio.Also applies to: 121-123
core/providers/mistral/mistral.go (2)
573-578: Stream completion handling looks correct.The done event handling properly sets latency, marks the stream end indicator in context, and ensures the response type is explicitly set to
Done. This aligns with the broader streaming pattern in the codebase.
470-474: Dual end-of-stream event check is intentional.The code checks for both
MistralTranscriptionStreamEventDone("transcription.done") and"transcript.text.done"(a local schema constant). This pattern is intentional and documented in the comment at line 573. The check for"transcript.text.done"as a raw event handles edge cases, test scenarios, or potential future API changes, even though Mistral's API natively emits "transcription.done".core/providers/azure/azure_test.go (1)
26-61: Test configuration properly enables audio scenarios.The new fields (
ChatAudioModel,SpeechSynthesisModel,TranscriptionModel) and scenario flags are correctly configured. The explicitTranscriptionStream: falsewith the explanatory comment is good documentation practice.core/providers/mistral/mistral_test.go (2)
31-35: External TTS configuration for Mistral transcription tests.Using OpenAI as the external TTS provider for Mistral's transcription round-trip tests is appropriate since Mistral supports transcription but not speech synthesis. This enables comprehensive testing of Mistral's transcription capabilities.
52-53: Enabling TranscriptionStream and ListModels tests.Good additions to expand test coverage for Mistral.
ui/app/workspace/logs/views/audioPlayer.tsx (2)
128-138: Extension determination logic is correct.The file extension mapping covers the expected formats and correctly maps PCM16 to WAV since the conversion produces WAV output.
15-56: No action needed. ThesampleRate = 24000default is the correct specification for OpenAI's audio APIs and is already configurable via function parameters if other sample rates become necessary.core/internal/testutil/chat_audio.go (3)
14-18: LGTM! Proper gating for provider support.The early return pattern when chat audio is not supported by the provider is clean and prevents unnecessary test execution.
Also applies to: 151-156
62-90: LGTM! Well-structured operation function with proper error handling.The chat operation function correctly handles nil responses by creating a custom BifrostError, which provides clear error messages during test failures.
204-262: LGTM! Comprehensive streaming accumulation.The streaming test properly accumulates all audio-related fields (data, transcript, ID, expiresAt) and usage metadata across chunks. The loop structure and error handling are sound.
core/internal/testutil/account.go (4)
55-55: LGTM! New fields support chat audio functionality.The additions to
TestScenarios(ChatAudio) andComprehensiveTestConfig(ChatAudioModel, ExternalTTSProvider, ExternalTTSModel) are well-placed and consistent with the PR's objectives to add audio support.Also applies to: 69-69, 77-78
167-185: Second Bedrock key configuration for multi-ARN testing.This second Bedrock key uses a different ARN environment variable (
AWS_BEDROCK_ARNat line 175 vs default at line 158) and includes batch API support (line 184). This appears intentional for testing multiple Bedrock deployment scenarios.
224-237: Second Azure key configuration now includes audio model.The configuration now includes
gpt-4o-mini-audio-previewdeployment (line 234) with API version2025-01-01-preview(line 230), addressing the previous review concern about model-to-API-version consistency.
607-607: LGTM! OpenAI and Azure configurations properly enable chat audio.Both provider configurations correctly:
- Specify chat audio models (gpt-4o-mini-audio-preview)
- Enable the ChatAudio scenario flag
- Azure additionally configures SpeechSynthesisModel and TranscriptionModel for audio-related tests
Also applies to: 639-639, 745-750, 764-767, 780-780
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
core/providers/mistral/mistral.go (1)
573-578: Consider extracting the string literal to a constant.The string
"transcript.text.done"is used as a fallback event type check. For maintainability, consider defining this as a constant alongsideMistralTranscriptionStreamEventDoneintypes.goto keep all event type definitions co-located.// Check for done event (handle both "transcription.done" and "transcript.text.done") -if MistralTranscriptionStreamEventType(eventType) == MistralTranscriptionStreamEventDone || eventType == "transcript.text.done" { +if MistralTranscriptionStreamEventType(eventType) == MistralTranscriptionStreamEventDone || eventType == string(MistralTranscriptionStreamEventTextDone) {This assumes adding a new constant in
types.go:MistralTranscriptionStreamEventTextDone MistralTranscriptionStreamEventType = "transcript.text.done"core/internal/testutil/transcription.go (1)
62-70: Consider extracting the TTS provider/model derivation to a helper function.This pattern is repeated 5 times throughout the file. A helper would reduce duplication and centralize the override logic.
// In testutil package, add helper: func GetSpeechSynthesisConfig(testConfig ComprehensiveTestConfig) (provider schemas.ModelProvider, model string) { provider = testConfig.Provider if testConfig.ExternalTTSProvider != "" { provider = testConfig.ExternalTTSProvider } model = testConfig.SpeechSynthesisModel if testConfig.ExternalTTSModel != "" { model = testConfig.ExternalTTSModel } return }Then replace the repeated blocks with:
speechSynthesisProvider, speechSynthesisModel := GetSpeechSynthesisConfig(testConfig)core/internal/testutil/transcription_stream.go (1)
65-73: Same helper extraction opportunity as in transcription.go.This TTS provider/model derivation pattern is repeated 4 times in this file (and 5 times in
transcription.go). Extracting to a shared helper would improve maintainability across both files.core/internal/testutil/account.go (1)
167-185: Confirm Bedrock dual‑key configuration and intended split between ARN env vars.The second Bedrock key uses
AWS_BEDROCK_ARN(vsAWS_ARNin the first key), addsclaude-3.5-sonnet, and is markedUseForBatchAPI, while both keys share overlapping Claude deployments. That’s a reasonable pattern if you’re deliberately testing separate Bedrock roles/accounts (e.g., one for real‑time, one for batch), but if both ARNs point at the same account this may be redundant and could be simplified to a single key.Please confirm the two ARNs are intentionally distinct; if not, consider collapsing to one key and keeping the
claude-3.5-sonnetdeployment andUseForBatchAPIthere.core/internal/testutil/validation_presets.go (1)
202-215: ChatAudio expectations are sensible; consider adding explicit “has audio” hints if you need stricter checks.
ChatAudioExpectationsand the"ChatAudio"branch inGetExpectationsForScenarioline up with how chat‑audio responses are modeled (choice present, non‑text content, usage/meta required). If you later tighten validation, you could mirror the speech/transcription presets by adding ProviderSpecific flags likeshould_have_audioor a minimal byte length to distinguish “audio present but empty” cases, but it’s fine to defer that until you have concrete needs.Also applies to: 298-300
ui/app/workspace/logs/views/logDetailsSheet.tsx (1)
37-40: Audio request metadata is surfaced cleanly and threaded through to message rendering.Extracting
audioFormatfromlog.params(with theextra_params.audiofallback), rendering Audio Format/Voice explicitly, and excluding theaudiokey from the generic params grid avoids duplication and keeps audio details discoverable. PassingaudioFormatinto allLogChatMessageViewusages that can display audio output keeps the UI and playback logic aligned with the original request.If you find yourself extending audio params further, you might consider hoisting
const params = log.params as any;once near the top and reusing it instead of repeating the cast in multiple spots, but that’s purely cosmetic.Also applies to: 122-133, 137-139, 379-380, 388-389, 407-408
ui/app/workspace/logs/views/audioPlayer.tsx (1)
5-8: AudioPlayer’s format‑aware blob handling looks solid; a couple of edge cases are worth keeping in mind.The format‑aware
createAudioBlob(including PCM16→WAV conversion) is a good fit for the new chat‑audio flows, and wiring the same helper into both play and download keeps behavior consistent with theaudioFormatpassed from logs.Two small, non‑blocking nits you may want to consider later:
- If
srcever comes through as a full data URL (data:audio/...;base64,xxx) rather than raw base64,atobwill throw; trimming off anydata:*;base64,prefix before decoding would make this more robust.- When users pause audio and never let it finish, the current logic doesn’t revoke the object URL created for that play; the leak is minor in typical log‑inspection usage but could be eliminated by tracking and revoking the last URL on each new play.
Neither of these needs to block this PR, but they’re easy hardening wins if you touch this component again.
Also applies to: 10-11, 16-56, 58-86, 94-104, 120-143
core/internal/testutil/chat_audio.go (1)
189-206: Streaming test uses different audio format than non-streaming.Non-streaming test (line 71) uses
Format: "wav"while streaming test (line 198) usesFormat: "pcm16". This appears intentional to test different output formats, but consider adding a comment explaining this difference for clarity.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
core/internal/testutil/scenarios/media/sample.mp3is excluded by!**/*.mp3
📒 Files selected for processing (45)
core/internal/testutil/account.go(9 hunks)core/internal/testutil/chat_audio.go(1 hunks)core/internal/testutil/tests.go(2 hunks)core/internal/testutil/transcription.go(6 hunks)core/internal/testutil/transcription_stream.go(5 hunks)core/internal/testutil/utils.go(4 hunks)core/internal/testutil/validation_presets.go(2 hunks)core/providers/azure/azure_test.go(2 hunks)core/providers/mistral/mistral.go(3 hunks)core/providers/mistral/mistral_test.go(2 hunks)core/providers/mistral/transcription_test.go(2 hunks)core/providers/openai/openai.go(1 hunks)core/providers/openai/openai_test.go(2 hunks)core/schemas/chatcompletions.go(7 hunks)core/schemas/mux.go(2 hunks)core/schemas/responses.go(1 hunks)core/version(1 hunks)framework/changelog.md(1 hunks)framework/modelcatalog/pricing.go(3 hunks)framework/streaming/chat.go(1 hunks)framework/version(1 hunks)plugins/governance/changelog.md(1 hunks)plugins/governance/version(1 hunks)plugins/jsonparser/changelog.md(1 hunks)plugins/jsonparser/version(1 hunks)plugins/logging/changelog.md(1 hunks)plugins/logging/version(1 hunks)plugins/maxim/changelog.md(1 hunks)plugins/maxim/version(1 hunks)plugins/mocker/changelog.md(1 hunks)plugins/mocker/version(1 hunks)plugins/otel/changelog.md(1 hunks)plugins/otel/version(1 hunks)plugins/semanticcache/changelog.md(1 hunks)plugins/semanticcache/version(1 hunks)plugins/telemetry/changelog.md(1 hunks)plugins/telemetry/version(1 hunks)transports/changelog.md(1 hunks)transports/version(1 hunks)ui/app/workspace/logs/views/audioPlayer.tsx(3 hunks)ui/app/workspace/logs/views/filters.tsx(1 hunks)ui/app/workspace/logs/views/logChatMessageView.tsx(3 hunks)ui/app/workspace/logs/views/logDetailsSheet.tsx(5 hunks)ui/lib/constants/icons.tsx(1 hunks)ui/lib/types/logs.ts(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- plugins/mocker/changelog.md
- plugins/telemetry/changelog.md
- plugins/jsonparser/changelog.md
🚧 Files skipped from review as they are similar to previous changes (21)
- plugins/logging/changelog.md
- framework/version
- plugins/telemetry/version
- framework/streaming/chat.go
- plugins/otel/version
- ui/app/workspace/logs/views/filters.tsx
- plugins/semanticcache/version
- plugins/governance/changelog.md
- plugins/mocker/version
- core/internal/testutil/tests.go
- plugins/otel/changelog.md
- framework/changelog.md
- plugins/governance/version
- core/providers/azure/azure_test.go
- core/providers/openai/openai_test.go
- plugins/semanticcache/changelog.md
- core/version
- ui/lib/types/logs.ts
- plugins/maxim/version
- core/schemas/mux.go
- plugins/logging/version
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
plugins/jsonparser/versioncore/providers/openai/openai.gocore/internal/testutil/validation_presets.goui/lib/constants/icons.tsxcore/providers/mistral/mistral.gocore/internal/testutil/chat_audio.gocore/internal/testutil/transcription.gocore/providers/mistral/mistral_test.gocore/internal/testutil/utils.goframework/modelcatalog/pricing.gocore/providers/mistral/transcription_test.goui/app/workspace/logs/views/logChatMessageView.tsxcore/internal/testutil/account.gocore/schemas/responses.gocore/schemas/chatcompletions.gotransports/changelog.mdcore/internal/testutil/transcription_stream.goui/app/workspace/logs/views/audioPlayer.tsxui/app/workspace/logs/views/logDetailsSheet.tsxplugins/maxim/changelog.mdtransports/version
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/openai/openai.gocore/internal/testutil/validation_presets.gocore/providers/mistral/mistral.gocore/internal/testutil/chat_audio.gocore/internal/testutil/transcription.gocore/providers/mistral/mistral_test.gocore/internal/testutil/utils.goframework/modelcatalog/pricing.gocore/providers/mistral/transcription_test.gocore/internal/testutil/account.gocore/schemas/responses.gocore/schemas/chatcompletions.gocore/internal/testutil/transcription_stream.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.
Applied to files:
core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.
Applied to files:
core/providers/openai/openai.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/mistral/mistral_test.gocore/providers/mistral/transcription_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.
Applied to files:
core/schemas/chatcompletions.go
🧬 Code graph analysis (11)
core/providers/openai/openai.go (1)
core/schemas/chatcompletions.go (1)
ChatStreamResponseChoice(768-770)
core/internal/testutil/validation_presets.go (1)
core/internal/testutil/response_validation.go (1)
ResponseExpectations(18-43)
core/providers/mistral/mistral.go (3)
core/providers/mistral/types.go (2)
MistralTranscriptionStreamEventType(88-88)MistralTranscriptionStreamEventDone(98-98)core/schemas/bifrost.go (1)
BifrostContextKeyStreamEndIndicator(126-126)core/schemas/transcriptions.go (1)
TranscriptionStreamResponseTypeDone(115-115)
core/internal/testutil/chat_audio.go (4)
core/internal/testutil/utils.go (3)
GetSampleAudioBase64(220-233)CreateAudioChatMessage(312-329)GetErrorMessage(642-675)core/schemas/chatcompletions.go (8)
ChatMessage(477-486)BifrostChatResponse(27-42)BifrostChatRequest(12-19)ChatParameters(156-186)ChatAudioParameters(225-228)ChatAssistantMessage(634-641)BifrostLLMUsage(830-837)ChatStreamResponseChoice(768-770)core/internal/testutil/test_retry_framework.go (6)
GetTestRetryConfigForScenario(1116-1148)TestRetryContext(168-173)ChatRetryConfig(186-193)ChatRetryCondition(126-129)StreamingRetryConfig(819-839)WithStreamRetry(580-688)core/schemas/bifrost.go (3)
ChatCompletionRequest(91-91)BifrostStream(426-433)ChatCompletionStreamRequest(92-92)
core/internal/testutil/transcription.go (2)
core/internal/testutil/utils.go (2)
GetProviderVoice(39-87)GenerateTTSAudioForTest(568-640)core/schemas/speech.go (1)
BifrostSpeechRequest(9-16)
core/providers/mistral/mistral_test.go (1)
core/schemas/bifrost.go (1)
OpenAI(35-35)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (7)
ChatMessage(477-486)ChatMessageRoleUser(470-470)ChatMessageContent(532-535)ChatContentBlock(598-605)ChatContentBlockTypeText(590-590)ChatContentBlockTypeInputAudio(592-592)ChatInputAudio(616-619)core/schemas/utils.go (1)
Ptr(16-18)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
Usage(69-78)core/schemas/chatcompletions.go (1)
BifrostLLMUsage(830-837)
ui/app/workspace/logs/views/logChatMessageView.tsx (2)
core/schemas/chatcompletions.go (1)
ChatMessage(477-486)ui/lib/types/logs.ts (1)
ChatMessage(108-118)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
ChatAudioMessageAudio(120-125)
ui/app/workspace/logs/views/logDetailsSheet.tsx (2)
ui/app/workspace/logs/views/logEntryDetailsView.tsx (1)
LogEntryDetailsView(15-49)ui/app/workspace/logs/views/logChatMessageView.tsx (1)
LogChatMessageView(67-214)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (32)
transports/changelog.md (1)
8-9: Verify version bump documentation aligns with PR scope and stack coordination.Line 9 documents version bumps for core and framework, but the main PR objective is adding audio support for chat completions. Given that this is a stacked PR (#1114) with dependent PRs in the stack, please verify:
- Audio feature documentation: Should the audio support feature have an explicit changelog entry in this file, or is it documented in a changelog from another PR in the stack?
- Version bump scope: Confirm that documenting core and framework version bumps in
transports/changelog.mdis intentional and not duplicative of entries incore/changelog.mdorframework/changelog.md.- Version accuracy: Verify that the version numbers (core 1.2.39, framework 1.1.49) match the actual version files updated by this PR and are consistent across the stack.
transports/version (1)
1-1: Verify version bump appropriateness for the stacked changes.The version increment is syntactically correct. However, given that this PR stack introduces significant new functionality (audio support for chat completions, new schema fields, streaming capabilities, and UI components), please confirm that a patch version bump (1.3.49 → 1.3.50) aligns with your project's versioning strategy. If following semantic versioning, new features typically warrant a minor version bump (e.g., 1.4.0).
Based on coding guidelines, reviewing in context of the entire PR stack.
framework/modelcatalog/pricing.go (3)
42-51: LGTM—enables token-based pricing for speech responses.The nil check safely handles cases where usage data is unavailable and correctly constructs
BifrostLLMUsagewhen present, enabling accurate cost calculation for speech operations.
97-99: LGTM—audioSeconds extraction is correct.The extraction correctly references
TranscriptionStreamResponse.Usage.Seconds(consistent with the streaming response context), and the past issue about incorrect field reference has been properly addressed.
76-78: Correctly extracts audio duration for transcription pricing.This change enables duration-based pricing for transcription operations when
Secondsis reported in usage. The extraction is appropriate—onlyTranscriptionResponseandTranscriptionStreamResponseprovide duration data via theSecondsfield in theirUsagestructs, whileSpeechUsagecontains only token counts and intentionally remains token-based for pricing.plugins/jsonparser/version (1)
1-1: LGTM!Version bump aligns with the broader audio support feature additions in this PR stack.
plugins/maxim/changelog.md (1)
1-1: LGTM!Changelog entry properly documents the version updates for core and framework.
core/providers/mistral/mistral_test.go (2)
31-35: LGTM!The external TTS provider configuration correctly enables Mistral transcription tests by using OpenAI for TTS generation, since Mistral doesn't natively support speech synthesis but does support transcription via
voxtral-mini-latest.
52-53: LGTM!Enabling
TranscriptionStreamandListModelsscenarios properly expands test coverage for Mistral's audio capabilities.core/providers/mistral/mistral.go (1)
470-474: LGTM!Breaking early after detecting a done event prevents unnecessary waiting and properly terminates the stream processing loop. The dual-check for both
MistralTranscriptionStreamEventDoneand"transcript.text.done"ensures compatibility with different event formats from the Mistral API.core/internal/testutil/transcription.go (1)
73-76: LGTM!The derived
speechSynthesisProvideris correctly used for voice selection and TTS request construction, ensuring providers without native TTS can leverage external TTS services for transcription round-trip testing.core/providers/mistral/transcription_test.go (3)
983-984: LGTM!The
Streamfield is correctly set totruefor streaming request tests, and the expected form fields properly validate that"stream": "true"is included in the multipart body.
1276-1334: Good context cancellation test coverage.This test properly validates that the streaming transcription respects context cancellation by using a short timeout (500ms) against a slow server (5s delay), ensuring the client doesn't hang indefinitely.
1467-1475: LGTM!The
testLoggeris a minimal no-op implementation suitable for unit tests where logging output isn't needed.core/internal/testutil/transcription_stream.go (2)
76-79: LGTM!The derived provider and model are correctly applied to voice selection and TTS request construction for streaming transcription tests.
276-328: Solid round-trip validation logic.The 50% word-match threshold with detailed word-by-word comparison logging provides good debugging capability while allowing for transcription variations across providers.
core/internal/testutil/account.go (1)
55-56: ChatAudio flags and models are wired consistently across OpenAI and Azure test configs.The new
ChatAudioflag plusChatAudioModelfields are threaded cleanly intoTestScenariosandComprehensiveTestConfig, and the OpenAI/Azure entries inAllProviderConfigsuse them in a way that should integrate smoothly with the new chat‑audio test helpers. Other providers safely default these fields to zero values, so there’s no regression risk in existing scenarios.Also applies to: 69-79, 597-608, 639-640, 745-751, 780-781
core/providers/openai/openai.go (1)
1047-1052: IncludingDelta.Audioin the streaming emit condition is the right extension for chat audio.Treating non‑nil
Delta.Audiothe same as content/tool‑call deltas ensures audio‑only chunks advancechunkIndexand get emitted, without impacting existing text/tool streaming behavior.core/internal/testutil/utils.go (1)
219-233: Audio test helpers are consistent with existing patterns and schema types.
GetSampleAudioBase64mirrors the existing image‑fixture loader and returns a clean base64 payload suitable forChatInputAudio.Data, andCreateAudioChatMessagecorrectly composes a user message with both text andinput_audioblocks using the provided format hint. This should plug into the new chat‑audio tests without surprising existing helpers.Also applies to: 312-329
ui/app/workspace/logs/views/logChatMessageView.tsx (1)
4-9: Audio input/output rendering is wired correctly into the chat message view.The new
audioFormatprop,input_audioblock rendering, and “Audio Output” section (transcript +AudioPlayer+ guarded expiry display) align with the extendedChatMessageshape and avoid earlierInvalid Datepitfalls. This should give clear visibility into both the audio payload and its metadata in log views without impacting existing text/tool/annotation handling.Also applies to: 50-62, 67-68, 184-207
core/schemas/responses.go (1)
267-278: LGTM! Token detail fields extended consistently.The additions of
TextTokensandImageTokensto input tokens andTextTokensto output tokens align with the broader audio support feature and match the corresponding changes incore/schemas/chatcompletions.go. Theomitemptytags are appropriate for these optional detail fields.core/internal/testutil/chat_audio.go (4)
62-90: Non-streaming audio request setup looks correct.The request correctly sets:
Modalities: []string{"text", "audio"}for audio outputAudioparameters with voice and output formatMaxCompletionTokensto limit response sizeThe error handling properly wraps nil responses in a
BifrostError.
216-262: Audio stream accumulation logic is well-structured.The code correctly:
- Accumulates audio data and transcript across chunks using
strings.Builder- Captures metadata (ID, ExpiresAt) when present
- Handles the delta structure from
ChatStreamResponseChoiceDelta.Audio- Tracks final usage for validation
275-290: Validation allows either audio data or transcript.The validation at line 276-278 requires at least one of audio data or transcript to be present. This is pragmatic since some providers may stream transcript before/without audio data, but consider whether this meets the test's intent of validating full audio response capability.
36-49: ChatAudio scenario falls through to default retry config.
GetTestRetryConfigForScenario("ChatAudio", testConfig)has no explicit case in the switch statement and falls through todefault, returningDefaultTestRetryConfig(). Given that other audio-related scenarios likeSpeechSynthesis,SpeechSynthesisStream, andTranscriptionhave dedicated retry configs, verify whetherChatAudioshould also have a custom retry config to account for audio processing latency, or if using the default config is intentional.core/schemas/chatcompletions.go (7)
156-181: ChatParameters extended with audio support.The
Audio *ChatAudioParametersfield is correctly added to enable audio configuration in chat requests. The field ordering change (alphabetical with Audio first) maintains consistency.
224-228: ChatAudioParameters mirrors OpenAI's API spec.Based on learnings, this struct intentionally mirrors OpenAI's API specifications without additional inline documentation or constants, maintaining direct compatibility.
522-526: Unmarshalling correctly includes Audio field check.The condition now includes
assistantMsg.Audio != nilto properly detect when an assistant message contains audio data, ensuring theChatAssistantMessageis populated correctly during JSON unmarshalling.
633-641: ChatAssistantMessage extended with Audio field.The
Audio *ChatAudioMessageAudiofield is properly added to support audio responses in assistant messages.
709-715: ChatAudioMessageAudio struct is consistent with UI types.The struct fields match the TypeScript interface in
ui/lib/types/logs.ts:119-124:
id→IDdata→Dataexpires_at→ExpiresAttranscript→TranscriptThis ensures frontend-backend type consistency.
773-781: Streaming delta includes Audio field.The
Audio *ChatAudioMessageAudiofield inChatStreamResponseChoiceDeltaenables streaming audio responses, which is handled by the accumulation logic in the test utilities and the streaming framework.
839-850: Token detail structs extended with TextTokens and ImageTokens.The additions align with the corresponding changes in
core/schemas/responses.go, providing consistent token tracking across chat completions and responses APIs.
1460f5d to
bcc2067
Compare
2e56871 to
1358dd1
Compare
1358dd1 to
5c049c8
Compare
bcc2067 to
4490ef2
Compare
Merge activity
|

Summary
Add support for chat audio input/output functionality, enabling models to process audio inputs and generate audio responses.
Changes
ChatAudioflag to test scenarios to indicate audio input/output supportChatAudioModelfield to test configurations for providers that support audioType of change
Affected areas
How to test
Test the chat audio functionality with providers that support it:
Screenshots/Recordings
N/A
Breaking changes
Related issues
N/A
Security considerations
Audio data is handled similarly to other sensitive content in the system.
Checklist