Skip to content

Conversation

@Pratham-Mishra04
Copy link
Collaborator

Summary

Add support for chat audio input/output functionality, enabling models to process audio inputs and generate audio responses.

Changes

  • Added ChatAudio flag to test scenarios to indicate audio input/output support
  • Added ChatAudioModel field to test configurations for providers that support audio
  • Implemented new test cases for chat audio functionality (both streaming and non-streaming)
  • Extended schemas to support audio input and output in chat messages
  • Added audio parameters to chat request parameters
  • Updated streaming accumulator to handle audio data chunks
  • Added UI components to display audio outputs in logs
  • Added support for tracking audio token usage

Type of change

  • Feature
  • Bug fix
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test the chat audio functionality with providers that support it:

# Test OpenAI audio support
go test ./core/providers/openai -run TestOpenAI/ChatAudio

# Test Azure audio support
go test ./core/providers/azure -run TestAzure/ChatAudio

# Run all comprehensive tests
go test ./core/internal/testutil -run TestComprehensive

Screenshots/Recordings

N/A

Breaking changes

  • Yes
  • No

Related issues

N/A

Security considerations

Audio data is handled similarly to other sensitive content in the system.

Checklist

  • I added/updated tests where appropriate
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 17, 2025

Warning

Rate limit exceeded

@Pratham-Mishra04 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 14 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 2e56871 and 5c049c8.

⛔ Files ignored due to path filters (1)
  • core/internal/testutil/scenarios/media/sample.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (55)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • .github/workflows/snyk.yml (4 hunks)
  • Makefile (2 hunks)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/internal/testutil/transcription_stream.go (5 hunks)
  • core/internal/testutil/utils.go (4 hunks)
  • core/internal/testutil/validation_presets.go (2 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/mistral/mistral.go (3 hunks)
  • core/providers/mistral/mistral_test.go (2 hunks)
  • core/providers/mistral/transcription_test.go (2 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • docs/contributing/setting-up-repo.mdx (1 hunks)
  • docs/plugins/building-dynamic-binary.mdx (9 hunks)
  • docs/plugins/writing-plugin.mdx (3 hunks)
  • examples/plugins/hello-world/Makefile (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • helm-charts/bifrost/Chart.yaml (1 hunks)
  • helm-charts/index.yaml (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/Dockerfile (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/audioPlayer.tsx (3 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (3 hunks)
  • ui/app/workspace/logs/views/logDetailsSheet.tsx (5 hunks)
  • ui/lib/types/logs.ts (1 hunks)
📝 Walkthrough

Walkthrough

Adds chat audio support across schema, streaming assembly, tests, provider configs, UI playback, pricing, and version bumps; introduces audio types/fields, streaming audio accumulation, test helpers for audio scenarios, and UI components for rendering audio and transcripts.

Changes

Cohort / File(s) Summary
Schema & Token Mappings
core/schemas/chatcompletions.go, core/schemas/responses.go, core/schemas/mux.go
New audio types (ChatAudioParameters, ChatAudioMessageAudio); added Audio fields to chat parameter/assistant/delta structs; added TextTokens/ImageTokens to token detail structs and mapped them in conversions.
Streaming Assembly
framework/streaming/chat.go, core/providers/openai/openai.go
Accumulate and merge audio chunks (Data, Transcript), update audio metadata (ID, ExpiresAt) during stream assembly; treat Delta.Audio as chunk content in streaming logic.
Test Infrastructure & Utilities
core/internal/testutil/account.go, core/internal/testutil/chat_audio.go, core/internal/testutil/tests.go, core/internal/testutil/utils.go, core/internal/testutil/validation_presets.go, core/internal/testutil/transcription.go, core/internal/testutil/transcription_stream.go
Added ChatAudio flag and audio model/provider fields; new helpers RunChatAudioTest and RunChatAudioStreamTest; sample audio loading and audio message constructors; validation presets for ChatAudio; support ExternalTTSProvider/ExternalTTSModel overrides wired into TTS/transcription test flows.
Provider Test Configs & Mistral changes
core/providers/azure/azure_test.go, core/providers/openai/openai_test.go, core/providers/mistral/mistral_test.go, core/providers/mistral/mistral.go, core/providers/mistral/transcription_test.go
Populated audio/transcription/speech model fields and scenario flags; added ExternalTTSProvider/ExternalTTSModel in tests; Mistral streaming: recognize additional done events and allow early break; added Stream *bool to Mistral transcription request for multipart handling.
UI Types & Views
ui/lib/types/logs.ts, ui/app/workspace/logs/views/logChatMessageView.tsx, ui/app/workspace/logs/views/logDetailsSheet.tsx, ui/app/workspace/logs/views/audioPlayer.tsx
New ChatAudioMessageAudio type and optional audio on ChatMessage; Log view accepts audioFormat and renders transcript, player, ID, expiry; AudioPlayer supports multiple formats including PCM16→WAV conversion and download handling; log details display audio request params.
Pricing & Costing
framework/modelcatalog/pricing.go
Use SpeechResponse.Usage for token-based costing when present; extract audioSeconds from transcription usage for duration-based pricing.
Icons, Styling & Versions
ui/lib/constants/icons.tsx, ui/app/workspace/logs/views/filters.tsx, core/version, framework/version, various plugin changelogs/versions, transports/*
Replaced nebius SVG with static image; minor popover styling changes; bumped core/framework/plugin/transports versions and updated changelogs.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Client
    participant API
    participant Streamer
    participant Assembler
    participant UI

    Client->>API: Send chat completion request (includes audio input)
    API->>Streamer: Start streaming response (audio modality)
    Streamer->>Assembler: Emit chunk (delta with Audio data/transcript)
    Assembler->>Assembler: If first audio chunk -> init Audio struct (ID, Data, Transcript)
    Streamer->>Assembler: Emit subsequent audio chunks (Data, Transcript, metadata)
    Assembler->>Assembler: Accumulate Data & Transcript, update ID/ExpiresAt when present
    Streamer-->>API: Streaming complete (final usage)
    API->>Client: Final assembled message (text + audio metadata + audio data)
    Client->>UI: Render message
    UI->>UI: Display transcript, audio player (data), ID & expiry
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review schema/token mappings in core/schemas/* for compatibility and backward-compatibility with existing consumers.
  • Inspect streaming assembly (framework/streaming/chat.go) and openai streaming condition changes for correct accumulation and concurrency safety.
  • Validate test helpers and ExternalTTS overrides in core/internal/testutil/* for consistent provider/model usage across test paths.
  • Confirm UI audio playback and PCM16→WAV conversion in audioPlayer.tsx handles malformed/large payloads and format differences.
  • Verify pricing logic in framework/modelcatalog/pricing.go correctly uses usage fields and durations.

Suggested reviewers

  • akshaydeo
  • danpiths

Poem

🐰 I hop through schema, stream, and song,
Chunks stitch up audio all day long,
Tests hum true and players play,
Tokens counted on the way,
A rabbit cheers: "Audio's strong!" 🎧

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.54% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: added audio support to chat completions' clearly summarizes the main change—adding audio input/output functionality to chat completions—and directly reflects the primary objective of the PR.
Description check ✅ Passed The PR description includes all key template sections: a clear summary of the purpose, comprehensive changes list, feature classification, affected areas, testing instructions with specific commands, breaking changes confirmation, security considerations, and a completed checklist confirming appropriate actions were taken.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator Author

Pratham-Mishra04 commented Dec 17, 2025

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch from bfbb622 to 49b39f2 Compare December 17, 2025 07:59
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from c3f7d77 to 39a34e8 Compare December 17, 2025 07:59
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch from 49b39f2 to aff923a Compare December 17, 2025 08:12
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 39a34e8 to 1460f5d Compare December 17, 2025 08:12
@Pratham-Mishra04 Pratham-Mishra04 marked this pull request as ready for review December 17, 2025 08:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
core/internal/testutil/chat_audio.go (2)

68-72: Consider validating audio format values.

The audio format ("wav") and voice ("alloy") are hardcoded without validation. Consider:

  1. Documenting supported format values (wav, pcm16, mp3, etc.) in ChatAudioParameters
  2. Adding validation or enums for supported formats to catch configuration errors early
  3. Verifying these formats are actually supported by the providers being tested

For reference, the streaming test uses "pcm16" format (line 199), suggesting multiple formats are supported but not documented.


287-300: Consider adding format validation for audio data.

The test validates that audio data is present but doesn't verify its format or integrity. Consider optionally:

  1. Validating that accumulated audio data is valid base64
  2. Checking data size is within reasonable bounds (audio files can be large)
  3. Verifying the format matches the requested output format ("wav" or "pcm16")

This is a nice-to-have enhancement for test robustness but not critical since providers are responsible for format correctness.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1460f5d and aff923a.

📒 Files selected for processing (12)
  • core/internal/testutil/account.go (6 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • framework/streaming/chat.go (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (2 hunks)
  • ui/lib/types/logs.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/azure/azure_test.go
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • ui/lib/types/logs.ts
  • framework/streaming/chat.go
  • core/schemas/responses.go
  • core/internal/testutil/account.go
  • core/internal/testutil/chat_audio.go
  • core/internal/testutil/tests.go
  • core/schemas/mux.go
  • core/internal/testutil/utils.go
  • core/schemas/chatcompletions.go
  • core/providers/openai/openai_test.go
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/azure/azure_test.go
  • framework/streaming/chat.go
  • core/schemas/responses.go
  • core/internal/testutil/account.go
  • core/internal/testutil/chat_audio.go
  • core/internal/testutil/tests.go
  • core/schemas/mux.go
  • core/internal/testutil/utils.go
  • core/schemas/chatcompletions.go
  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/azure/azure_test.go
  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai_test.go
🧬 Code graph analysis (7)
core/providers/azure/azure_test.go (2)
core/schemas/provider.go (1)
  • Provider (314-361)
core/schemas/bifrost.go (1)
  • Azure (36-36)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
  • ChatAudioMessageAudio (710-715)
framework/streaming/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatAssistantMessage (634-641)
  • ChatAudioMessageAudio (710-715)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
core/internal/testutil/chat_audio.go (2)
core/internal/testutil/utils.go (2)
  • GetSampleAudioBase64 (220-233)
  • CreateAudioChatMessage (312-329)
core/schemas/chatcompletions.go (6)
  • ChatMessage (477-486)
  • BifrostChatResponse (27-42)
  • BifrostChatRequest (12-19)
  • ChatParameters (156-186)
  • ChatAudioParameters (225-228)
  • BifrostLLMUsage (830-837)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
  • RunChatAudioTest (14-149)
  • RunChatAudioStreamTest (152-334)
core/schemas/mux.go (3)
ui/lib/types/logs.ts (2)
  • CompletionTokensDetails (198-203)
  • ResponsesResponseOutputTokens (556-558)
core/schemas/responses.go (1)
  • ResponsesResponseOutputTokens (277-288)
core/schemas/chatcompletions.go (1)
  • ChatCompletionTokensDetails (849-861)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (19)
core/providers/azure/azure_test.go (1)

27-30: LGTM! Audio test configuration properly integrated.

The Azure test configuration correctly adds the ChatAudioModel field with the preview model name and enables the ChatAudio scenario flag, aligning with the broader test framework changes for audio support.

Also applies to: 54-54

core/internal/testutil/tests.go (2)

67-68: LGTM! Audio test scenarios properly integrated.

The new audio test scenarios are correctly added to the comprehensive test suite and will execute for providers that support them.


121-122: LGTM! Summary logic correctly gates audio tests.

The summary output correctly checks both the ChatAudio flag and the presence of a ChatAudioModel, consistent with other conditional test scenarios like Reasoning and Embedding.

ui/app/workspace/logs/views/logChatMessageView.tsx (1)

183-207: LGTM! Audio output rendering properly implemented.

The audio output section correctly:

  • Conditionally renders based on message.audio existence
  • Displays transcript and audio player when data is present
  • Converts Unix timestamp to JavaScript Date for expiry display
  • Follows consistent styling patterns with other message blocks
core/providers/openai/openai_test.go (1)

41-41: LGTM! OpenAI audio test configuration properly set.

The ChatAudioModel field and ChatAudio scenario flag are correctly configured, consistent with the Azure test setup.

Also applies to: 75-75

framework/streaming/chat.go (1)

106-135: LGTM! Audio streaming accumulation properly implemented.

The audio chunk accumulation logic correctly:

  • Initializes audio structure on first chunk with all metadata fields
  • Accumulates audio data and transcript across subsequent chunks
  • Updates ID and ExpiresAt metadata when provided (treating them as final values per the comment)
  • Follows the same accumulation pattern used for content, refusal, and reasoning

The approach is consistent with the existing streaming framework patterns.

core/internal/testutil/account.go (2)

55-55: LGTM! Test configuration fields properly added.

The ChatAudio flag and ChatAudioModel field are correctly added to the test configuration structs, enabling audio test scenarios for supported providers.

Also applies to: 69-69


586-623: LGTM! Provider configurations correctly enable audio support.

The OpenAI and Azure provider configurations properly set ChatAudioModel to "gpt-4o-mini-audio-preview" and enable the ChatAudio scenario flag, which aligns with the providers that support chat audio functionality.

Also applies to: 729-762

ui/lib/types/logs.ts (1)

117-125: LGTM! Audio types properly defined.

The ChatAudioMessageAudio interface and the optional audio field on ChatMessage are correctly structured and align with the backend Go types. Field names follow the existing snake_case convention for JSON serialization.

core/internal/testutil/utils.go (2)

219-233: LGTM! Audio sample loading utility properly implemented.

The GetSampleAudioBase64() function follows the established pattern of GetLionBase64Image() and correctly:

  • Locates the sample audio file using runtime.Caller
  • Reads and base64-encodes the audio data
  • Provides appropriate error handling

312-329: LGTM! Audio message constructor properly implemented.

The CreateAudioChatMessage() function correctly constructs a chat message with both text and audio input blocks, following the same pattern as CreateImageChatMessage(). The audio block structure properly includes data and format fields.

core/schemas/responses.go (2)

268-274: LGTM - Comprehensive input token breakdown.

The addition of TextTokens and ImageTokens provides granular tracking of different input modalities alongside the existing AudioTokens. Making CachedTokens optional (via omitempty) is a good change for providers that don't support prompt caching.


278-278: LGTM - Completes output token breakdown.

Adding TextTokens to the output tokens aligns with the input token changes and provides consistent token tracking across both directions.

core/schemas/mux.go (2)

654-662: LGTM - Bidirectional token field mapping.

The conversion logic properly maps the new TextTokens and ImageTokens fields from Chat API to Responses API format in both input and output token details. The straightforward field mapping maintains data integrity.


690-698: LGTM - Completes bidirectional conversion.

The reverse mapping from Responses API to Chat API format properly handles the new token fields, ensuring data preservation in round-trip conversions.

core/internal/testutil/chat_audio.go (1)

218-270: LGTM - Comprehensive streaming audio accumulation.

The stream chunk accumulation logic properly handles:

  • Text content deltas
  • Audio data and transcript accumulation
  • Audio metadata (ID, ExpiresAt)
  • Final usage statistics

The logic correctly uses a strings.Builder for efficient string concatenation.

core/schemas/chatcompletions.go (3)

709-715: LGTM - Audio message type aligns with UI contract.

The ChatAudioMessageAudio type properly defines the audio data structure with all required fields. The structure matches the UI type definition in ui/lib/types/logs.ts, ensuring consistency across the stack.

Note: The Data field will contain base64-encoded audio which can be large. Ensure downstream systems (UI, logs, storage) can handle the payload size appropriately.


157-163: LGTM - Audio parameters properly integrated.

The Audio field is correctly added to ChatParameters as an optional pointer with omitempty. The presence of the Modalities field (line 163) allows requests to specify ["text", "audio"] to enable audio output, which aligns with the test usage.


840-850: LGTM - Token breakdown extended for multiple modalities.

The addition of TextTokens and ImageTokens to both ChatPromptTokensDetails and ChatCompletionTokensDetails completes the token tracking for different input/output modalities (text, audio, image). This aligns with the changes in responses.go and mux.go for bidirectional API conversion.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch 3 times, most recently from 62ab3e9 to 3e89fe3 Compare December 17, 2025 08:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
plugins/mocker/version (1)

1-1: Verify version consistency across plugins.

This plugin is also at 1.3.49, matching plugins/telemetry/version, but differs from plugins/governance/version at 1.3.50. Please ensure all plugin versions are intentionally staggered or should be synchronized to the same version for coordinated release across the stack.

core/schemas/chatcompletions.go (1)

224-228: Acknowledged: Schema mirrors OpenAI's API specification.

Based on learnings, the ChatAudioParameters structure intentionally mirrors OpenAI's API specifications, and the valid values for Format and Voice are defined by OpenAI's documentation.

🧹 Nitpick comments (4)
core/internal/testutil/chat_audio.go (4)

62-90: Consider handling the nil response case before creating a BifrostError.

The operation function returns a custom BifrostError when response is nil (lines 84-89), but this path is reached only after checking err != nil. If client.ChatCompletionRequest returns (nil, nil), this creates an error, which is correct. However, the error message "No chat response returned" could be more descriptive for debugging.

 		if response != nil {
 			return response, nil
 		}
 		return nil, &schemas.BifrostError{
 			IsBifrostError: true,
 			Error: &schemas.ErrorField{
-				Message: "No chat response returned",
+				Message: "ChatCompletionRequest returned nil response without error",
 			},
 		}

130-140: Inconsistent test failure severity for audio validation.

Lines 130-134 use t.Error for missing audio.Data, while lines 125-127 use t.Fatal for missing audio object. If audio.Data being empty is equally critical (as the expected behavior in retryContext suggests), consider using t.Fatal for consistency, or document why the test should continue despite missing data.


191-203: Streaming test lacks MaxCompletionTokens parameter unlike non-streaming.

The non-streaming test (line 73) sets MaxCompletionTokens: bifrost.Ptr(200), but the streaming test omits this. This inconsistency may lead to different behavior or longer-running tests. Consider aligning the parameters for consistency.

 		chatReq := &schemas.BifrostChatRequest{
 			Provider: testConfig.Provider,
 			Model:    testConfig.ChatAudioModel,
 			Input:    chatMessages,
 			Params: &schemas.ChatParameters{
-				Modalities: []string{"text", "audio"},
+				Modalities:          []string{"text", "audio"},
+				MaxCompletionTokens: bifrost.Ptr(200),
 				Audio: &schemas.ChatAudioParameters{
 					Voice:  "alloy",
 					Format: "pcm16", // output format
 				},
 			},
 			Fallbacks: testConfig.Fallbacks,
 		}

286-300: Consider using t.Fatal for missing accumulated audio data to fail fast.

Similar to the non-streaming test, using t.Error (lines 289, 297) allows the test to continue even when critical audio data is missing. If accumulated audio data and transcript are required outcomes, t.Fatal would provide clearer failure semantics.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aff923a and 3e89fe3.

📒 Files selected for processing (36)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/types/logs.ts (1 hunks)
✅ Files skipped from review due to trivial changes (17)
  • plugins/maxim/version
  • plugins/semanticcache/changelog.md
  • plugins/otel/version
  • plugins/logging/version
  • plugins/logging/changelog.md
  • plugins/mocker/changelog.md
  • plugins/maxim/changelog.md
  • transports/version
  • framework/changelog.md
  • transports/changelog.md
  • plugins/telemetry/changelog.md
  • core/version
  • plugins/governance/changelog.md
  • framework/version
  • plugins/otel/changelog.md
  • plugins/jsonparser/changelog.md
  • plugins/jsonparser/version
🚧 Files skipped from review as they are similar to previous changes (6)
  • framework/streaming/chat.go
  • core/providers/openai/openai_test.go
  • core/internal/testutil/tests.go
  • core/schemas/mux.go
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • core/providers/azure/azure_test.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/internal/testutil/utils.go
  • ui/app/workspace/logs/views/filters.tsx
  • ui/lib/constants/icons.tsx
  • plugins/semanticcache/version
  • core/internal/testutil/chat_audio.go
  • plugins/telemetry/version
  • core/schemas/responses.go
  • ui/lib/types/logs.ts
  • plugins/mocker/version
  • plugins/governance/version
  • core/schemas/chatcompletions.go
  • framework/modelcatalog/pricing.go
  • core/internal/testutil/account.go
🧠 Learnings (2)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/utils.go
  • core/internal/testutil/chat_audio.go
  • core/schemas/responses.go
  • core/schemas/chatcompletions.go
  • framework/modelcatalog/pricing.go
  • core/internal/testutil/account.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.

Applied to files:

  • core/schemas/chatcompletions.go
🧬 Code graph analysis (5)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (6)
  • ChatMessage (477-486)
  • ChatMessageContent (532-535)
  • ChatContentBlock (598-605)
  • ChatContentBlockTypeText (590-590)
  • ChatContentBlockTypeInputAudio (592-592)
  • ChatInputAudio (616-619)
core/schemas/utils.go (1)
  • Ptr (16-18)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
  • ChatAudioMessageAudio (710-715)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
  • Usage (69-78)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (830-837)
core/internal/testutil/account.go (5)
core/schemas/account.go (1)
  • BedrockKeyConfig (56-64)
ui/lib/types/config.ts (1)
  • BedrockKeyConfig (63-71)
core/utils.go (1)
  • Ptr (56-58)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/schemas/provider.go (1)
  • Provider (314-361)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (25)
plugins/semanticcache/version (1)

1-1: Verify if version bump is intentional for this plugin.

The semantic cache plugin version is bumped from 1.3.48 to 1.3.49, but no code changes were made to any files in the plugins/semanticcache/ directory. Since this PR focuses on audio support for chat completions (unrelated to semantic cache), please confirm whether this version bump is intentional or if it was accidentally included.

plugins/governance/version (1)

1-1: Version bump looks correct, but verify cross-plugin consistency.

The version bump to 1.3.50 appears appropriate. However, this is ahead of plugins/telemetry/version and plugins/mocker/version (both at 1.3.49 in this PR). Given that this PR is part of a stacked set, please verify that the version differences across plugins are intentional and aligned with the release strategy for the entire stack.

ui/app/workspace/logs/views/filters.tsx (1)

327-330: LGTM!

Minor UI styling improvements that enhance the visual consistency of the "More actions" popover.

ui/lib/types/logs.ts (1)

117-125: LGTM!

The TypeScript interface correctly mirrors the Go schema structure from core/schemas/chatcompletions.go, maintaining type safety across the stack.

framework/modelcatalog/pricing.go (2)

43-51: LGTM!

Correctly gates usage extraction on the presence of Usage data, preventing potential nil pointer issues.


76-78: LGTM!

Properly extracts audio duration for pricing calculations when available.

core/schemas/responses.go (1)

268-278: LGTM!

Token detail extensions correctly add audio-related fields and properly use omitempty tags to avoid serializing zero values. The changes align with the broader audio token tracking enhancements across the stack.

core/internal/testutil/account.go (5)

55-69: LGTM!

The addition of ChatAudio and ChatAudioModel fields properly extends the test configuration to support audio chat scenarios across the test suite.


165-183: LGTM!

The second Bedrock key configuration with UseForBatchAPI: bifrost.Ptr(true) correctly enables batch API testing for Bedrock. The separation allows independent testing of standard and batch API flows.


213-220: LGTM!

Azure deployment mappings correctly include the audio-related models: gpt-4o-mini-audio-preview, whisper-1, and gpt-4o-mini-tts, aligning with Azure's audio capabilities.


594-626: LGTM!

OpenAI provider configuration properly enables chat audio testing with the gpt-4o-mini-audio-preview model.


732-767: LGTM!

Azure provider configuration comprehensively enables audio capabilities including chat audio, speech synthesis, and transcription with appropriate model mappings.

core/internal/testutil/utils.go (3)

5-5: LGTM!

The encoding/base64 import is required for the new audio encoding functionality.


219-233: LGTM!

The GetSampleAudioBase64() function follows the established pattern of GetLionBase64Image() and correctly loads and encodes the sample audio file.


312-329: LGTM!

The CreateAudioChatMessage() helper correctly constructs a chat message with both text and audio input blocks, following the schema patterns established in core/schemas/chatcompletions.go.

core/internal/testutil/chat_audio.go (3)

14-18: LGTM: Guard clause properly gates audio test execution.

The early return with informative logging when audio is not supported follows the established pattern for conditional test scenarios.


152-156: LGTM: Streaming test properly gates on audio support.

Mirrors the non-streaming test's guard clause pattern appropriately.


217-270: LGTM: Stream chunk accumulation logic is well-structured.

The loop correctly accumulates text content, audio data, transcript, ID, and expiry metadata from streamed deltas. Error handling within the loop properly fails fast on chunk errors.

core/schemas/chatcompletions.go (7)

157-181: LGTM: ChatParameters extended with Audio field.

The Audio *ChatAudioParameters field is properly placed alphabetically with other parameters and uses a pointer for optional semantics, consistent with the existing pattern.


522-526: LGTM: UnmarshalJSON correctly includes Audio in assistant message detection.

The condition now properly checks for assistantMsg.Audio != nil to ensure audio-containing messages are correctly identified as assistant messages during unmarshalling.


634-641: LGTM: ChatAssistantMessage extended with Audio field.

The Audio *ChatAudioMessageAudio field is properly added with appropriate JSON tag and pointer semantics for optional data.


777-777: LGTM: ChatStreamResponseChoiceDelta extended with Audio field.

The Audio *ChatAudioMessageAudio field enables streaming audio data accumulation, aligning with the test utility implementation in chat_audio.go.


839-847: LGTM: ChatPromptTokensDetails extended with TextTokens and ImageTokens.

These additions align with the broader token tracking enhancements for audio support, maintaining consistency with the existing AudioTokens field.


849-861: LGTM: ChatCompletionTokensDetails extended with TextTokens.

Consistent with the prompt tokens details extension, enabling comprehensive token tracking across modalities.


709-715: The review comment's concern does not apply—the current struct design is correct.

The OpenAI Chat Completions API always returns all audio fields (id, data, expires_at, and transcript) as populated values. The non-pointer, non-omitempty struct fields are the appropriate choice since these fields are always present in responses. Using pointers would be unnecessary and inconsistent with the codebase pattern of reserving pointers for truly optional fields.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch from 3e89fe3 to cc9bdd3 Compare December 17, 2025 09:20
@coderabbitai coderabbitai bot requested a review from roroghost17 December 17, 2025 09:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/telemetry/version (1)

1-1: Version inconsistency already flagged in prior review.

The concern about version alignment (1.3.49 here vs 1.3.50 in other plugins) was already raised in a previous review comment. Please verify if this divergence is intentional or if all plugin versions should be synchronized.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e89fe3 and cc9bdd3.

⛔ Files ignored due to path filters (1)
  • core/internal/testutil/scenarios/media/sample.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (36)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/types/logs.ts (1 hunks)
✅ Files skipped from review due to trivial changes (9)
  • plugins/otel/changelog.md
  • transports/version
  • framework/changelog.md
  • plugins/governance/changelog.md
  • core/version
  • plugins/maxim/version
  • plugins/semanticcache/changelog.md
  • framework/version
  • plugins/semanticcache/version
🚧 Files skipped from review as they are similar to previous changes (12)
  • plugins/jsonparser/changelog.md
  • core/internal/testutil/tests.go
  • ui/lib/types/logs.ts
  • transports/changelog.md
  • framework/streaming/chat.go
  • core/schemas/responses.go
  • plugins/maxim/changelog.md
  • plugins/jsonparser/version
  • plugins/logging/version
  • ui/app/workspace/logs/views/filters.tsx
  • core/schemas/mux.go
  • plugins/telemetry/changelog.md
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/openai/openai_test.go
  • plugins/mocker/changelog.md
  • framework/modelcatalog/pricing.go
  • core/internal/testutil/utils.go
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • plugins/logging/changelog.md
  • core/internal/testutil/account.go
  • ui/lib/constants/icons.tsx
  • plugins/governance/version
  • core/schemas/chatcompletions.go
  • core/internal/testutil/chat_audio.go
  • plugins/telemetry/version
  • plugins/mocker/version
  • core/providers/azure/azure_test.go
  • plugins/otel/version
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/openai/openai_test.go
  • framework/modelcatalog/pricing.go
  • core/internal/testutil/utils.go
  • core/internal/testutil/account.go
  • core/schemas/chatcompletions.go
  • core/internal/testutil/chat_audio.go
  • core/providers/azure/azure_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/openai/openai_test.go
  • core/providers/azure/azure_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.

Applied to files:

  • core/schemas/chatcompletions.go
🧬 Code graph analysis (4)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
  • Usage (69-78)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (830-837)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (7)
  • ChatMessage (477-486)
  • ChatMessageRoleUser (470-470)
  • ChatMessageContent (532-535)
  • ChatContentBlock (598-605)
  • ChatContentBlockTypeText (590-590)
  • ChatContentBlockTypeInputAudio (592-592)
  • ChatInputAudio (616-619)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/internal/testutil/account.go (4)
core/schemas/account.go (2)
  • BedrockKeyConfig (56-64)
  • AzureKeyConfig (23-27)
ui/lib/types/config.ts (2)
  • BedrockKeyConfig (63-71)
  • AzureKeyConfig (23-27)
core/utils.go (1)
  • Ptr (56-58)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (30)
plugins/otel/version (1)

1-1: LGTM! Version bump is appropriate.

The patch-level increment from 1.0.48 to 1.0.49 is consistent with the audio support feature addition and aligns with the version bumps across other components in this stacked PR.

plugins/governance/version (1)

1-1: LGTM! Version bump is clean.

The patch version increment is straightforward and consistent with the feature addition in this PR stack.

plugins/mocker/version (1)

1-1: No version bump verification needed—this is the initial repository commit with all files newly added. The mocker plugin version "1.3.49" is being set for the first time, not bumped from a prior version. Additionally, the mocker plugin itself contains no audio-related code, so the versioning decision is independent of the PR's audio feature additions.

plugins/logging/changelog.md (1)

1-1: No changes needed. Plugin changelog follows established repository convention.

All plugin changelogs in this repository (governance, jsonparser, logging, maxim, mocker, otel, semanticcache, telemetry) consistently document version updates only. Audio support features are documented in core/changelog.md with entries such as "feat: added transcription and speech support for azure" and "feat: add audio token pricing support for models." The code confirms audio handling is implemented in plugins/logging/operations.go with the AudioOutput field.

plugins/mocker/changelog.md (1)

1-1: LGTM!

Changelog entry correctly documents the core and framework version updates associated with this release.

ui/app/workspace/logs/views/logChatMessageView.tsx (1)

183-207: Audio output section follows established patterns.

The implementation correctly mirrors the structure of existing content sections (reasoning, refusal, annotations) with appropriate conditional rendering for transcript, audio data, and metadata.

core/providers/openai/openai_test.go (2)

41-41: LGTM!

ChatAudioModel correctly configured for OpenAI's audio-preview model.


75-75: ChatAudio scenario properly enabled.

The test scenario flag aligns with the new ChatAudioModel configuration, ensuring audio tests will execute for OpenAI.

core/internal/testutil/utils.go (2)

312-329: Audio message construction aligns with schema definitions.

The function correctly uses ChatContentBlockTypeInputAudio and populates ChatInputAudio with Data and Format fields matching the schema structure in chatcompletions.go.


219-233: LGTM—function correctly implements audio base64 encoding for test utilities.

The implementation properly follows the established pattern from GetLionBase64Image(), using runtime.Caller() to locate and load the sample audio file. The function correctly returns raw base64 without a data URI prefix, which is the expected format for the ChatInputAudio.Data field as documented in the schema.

core/providers/azure/azure_test.go (3)

30-30: ChatAudioModel and scenario correctly configured for Azure.

The gpt-4o-mini-audio-preview model and enabled ChatAudio scenario align with the OpenAI test configuration, ensuring consistent audio test coverage across providers.

Also applies to: 56-56


57-60: Audio-related scenarios appropriately configured.

  • Transcription enabled, TranscriptionStream disabled with documented reason
  • SpeechSynthesis and SpeechSynthesisStream both enabled

The configuration reflects Azure's current capability support.


36-36: No changes needed—Azure ReasoningModel configuration is correct.

Azure is the only cloud providing access to both Claude and GPT frontier models to customers on one platform through Microsoft Foundry, where Opus 4.5 is now available in public preview. The codebase shows this is an intentional multi-cloud partnership pattern: Bedrock uses claude-4.5-sonnet, Vertex uses claude-4.5-haiku, and both Azure and Anthropic use claude-opus-4-5 for reasoning. Each provider tests with models native to their platform.

Likely an incorrect or invalid review comment.

core/internal/testutil/account.go (5)

55-56: LGTM! Audio scenario flag added consistently.

The ChatAudio field follows the established pattern for capability flags and integrates cleanly with the existing test scenario structure.


69-70: LGTM! Model configuration field added consistently.

The ChatAudioModel field aligns with the existing pattern for model-specific configurations.


165-183: LGTM! Secondary Bedrock key configuration added for additional test coverage.

This second Bedrock key configuration uses a different ARN (AWS_BEDROCK_ARN vs AWS_ARN), includes additional model deployments (notably claude-3.5-sonnet), and is specifically enabled for batch API testing via UseForBatchAPI.


599-640: LGTM! Provider configurations correctly updated with audio capabilities.

The OpenAI and Azure test configurations properly set:

  • ChatAudioModel fields pointing to appropriate audio-capable models
  • ChatAudio scenario flags enabled
  • Additional audio-related models and scenarios for Azure (transcription, speech synthesis)

The comment on Line 767 noting Azure's lack of proper streaming transcription support is helpful context.

Also applies to: 744-781


225-237: No action required — The Azure API version 2025-01-01-preview is the correct and stable choice for audio feature testing. The "preview" designation in the API version reflects the API surface, not the model lifecycle. The audio models (whisper-1, gpt-4o-mini-tts) are production-ready and suitable for CI/CD environments.

core/internal/testutil/chat_audio.go (2)

14-149: LGTM! Non-streaming audio test is well-structured and comprehensive.

The test function properly:

  • Guards execution based on scenario configuration
  • Loads and encodes sample audio
  • Constructs requests with appropriate audio parameters (input: mp3, output: wav)
  • Uses the retry framework for resilience
  • Validates response structure, audio data presence, and transcript content
  • Provides clear logging for debugging

The different input (mp3) and output (wav) formats appear intentional and align with typical audio processing workflows.


152-334: LGTM! Streaming audio test correctly handles chunk accumulation and validation.

The streaming test function:

  • Uses pcm16 format appropriate for streaming audio (vs wav for non-streaming)
  • Properly accumulates audio data, transcripts, and metadata across chunks
  • Validates presence of audio in the stream
  • Checks optional audio token usage details when available
  • Handles both text and audio modalities correctly

The accumulation logic correctly builds up audioData and audioTranscript from delta chunks while capturing audioID and audioExpiresAt metadata.

core/schemas/chatcompletions.go (7)

157-186: LGTM! Audio parameters field added to chat parameters.

The Audio field is correctly typed as *ChatAudioParameters and follows the existing optional parameter pattern with the omitempty JSON tag.


224-228: LGTM! Audio parameters struct mirrors OpenAI's specification.

The ChatAudioParameters structure correctly defines Format and Voice fields as strings with optional JSON encoding, matching OpenAI's chat audio API specification.

Based on learnings, the team intentionally mirrors OpenAI's API structure exactly, so supported values are defined by OpenAI's documentation.


522-526: LGTM! UnmarshalJSON correctly includes audio in assistant message detection.

The condition properly checks for assistantMsg.Audio != nil alongside other optional fields to determine whether to populate ChatAssistantMessage.


636-641: LGTM! Audio field added to assistant message structure.

The Audio field is correctly typed and positioned within ChatAssistantMessage, following the established pattern for optional message content.


709-715: LGTM! Audio message structure correctly defined.

The ChatAudioMessageAudio struct properly defines all required audio fields and maintains type consistency with the corresponding TypeScript interface in ui/lib/types/logs.ts.


773-781: LGTM! Audio field added to streaming response delta.

The Audio field in ChatStreamResponseChoiceDelta correctly enables audio data to be streamed in chunks, consistent with other delta fields like Content and Reasoning.


839-843: LGTM! Token details extended for multimodal usage tracking.

The addition of TextTokens and ImageTokens fields to token detail structures enables proper breakdown of usage across modalities (text, audio, image), supporting accurate pricing and usage analytics for multimodal chat completions.

Also applies to: 849-858

framework/modelcatalog/pricing.go (3)

42-51: LGTM! Defensive handling of speech response usage.

The updated logic properly checks for Usage presence before constructing BifrostLLMUsage, preventing potential nil pointer issues and gracefully handling cases where usage data might not be available.


76-78: LGTM! Audio duration extraction for transcription pricing.

The code correctly extracts audioSeconds from TranscriptionResponse.Usage.Seconds when available, enabling duration-based pricing calculations for audio transcription operations.


97-99: LGTM! Audio duration extraction fixed for streaming transcription.

The code now correctly references result.TranscriptionStreamResponse.Usage.Seconds (not the non-streaming variant), resolving the issue flagged in the previous review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
framework/modelcatalog/pricing.go (1)

97-99: LGTM! Previous issue has been resolved.

The code now correctly references result.TranscriptionStreamResponse.Usage.Seconds instead of the incorrect result.TranscriptionResponse.Usage.Seconds that was flagged in a previous review.

🧹 Nitpick comments (2)
plugins/otel/changelog.md (1)

1-1: Include the otel plugin's own version bump in its changelog entry.

The changelog entry documents updates to core and framework versions, but according to the PR context, the otel plugin version itself is being bumped from 1.0.48 to 1.0.49. Since this is the otel plugin's own changelog, it should explicitly document this plugin's version change.

Apply this diff to include the otel plugin version bump:

-- chore: update core version to 1.2.39 and framework version to 1.1.49
+- chore: bump otel plugin to 1.0.49; update core version to 1.2.39 and framework version to 1.1.49
ui/app/workspace/logs/views/logChatMessageView.tsx (1)

182-207: Audio output rendering looks good overall.

The implementation follows the established pattern for other content blocks. A few observations:

  1. Line 202: If message.audio.expires_at is 0 or undefined, the date display could show an invalid date ("Invalid Date" or epoch time "Jan 1, 1970"). Consider adding a guard:
 {message.audio.id && (
   <div className="text-xs text-muted-foreground">
-    ID: {message.audio.id} | Expires: {new Date(message.audio.expires_at * 1000).toLocaleString()}
+    ID: {message.audio.id}{message.audio.expires_at ? ` | Expires: ${new Date(message.audio.expires_at * 1000).toLocaleString()}` : ''}
   </div>
 )}
  1. The conditional rendering structure is consistent with other sections in the component.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aff923a and cc9bdd3.

⛔ Files ignored due to path filters (1)
  • core/internal/testutil/scenarios/media/sample.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (36)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/utils.go (3 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/types/logs.ts (1 hunks)
✅ Files skipped from review due to trivial changes (15)
  • plugins/semanticcache/changelog.md
  • plugins/maxim/changelog.md
  • plugins/logging/version
  • core/version
  • plugins/mocker/changelog.md
  • plugins/otel/version
  • framework/changelog.md
  • plugins/telemetry/changelog.md
  • plugins/telemetry/version
  • plugins/governance/changelog.md
  • plugins/logging/changelog.md
  • plugins/semanticcache/version
  • plugins/jsonparser/changelog.md
  • plugins/mocker/version
  • transports/version
🚧 Files skipped from review as they are similar to previous changes (3)
  • core/schemas/mux.go
  • framework/streaming/chat.go
  • core/internal/testutil/utils.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • plugins/jsonparser/version
  • core/internal/testutil/tests.go
  • ui/lib/constants/icons.tsx
  • core/providers/openai/openai_test.go
  • ui/app/workspace/logs/views/filters.tsx
  • framework/modelcatalog/pricing.go
  • plugins/governance/version
  • core/internal/testutil/account.go
  • framework/version
  • plugins/otel/changelog.md
  • plugins/maxim/version
  • transports/changelog.md
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • core/internal/testutil/chat_audio.go
  • ui/lib/types/logs.ts
  • core/schemas/chatcompletions.go
  • core/providers/azure/azure_test.go
  • core/schemas/responses.go
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/tests.go
  • core/providers/openai/openai_test.go
  • framework/modelcatalog/pricing.go
  • core/internal/testutil/account.go
  • core/internal/testutil/chat_audio.go
  • core/schemas/chatcompletions.go
  • core/providers/azure/azure_test.go
  • core/schemas/responses.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai_test.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/openai/openai_test.go
  • core/providers/azure/azure_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.

Applied to files:

  • core/schemas/chatcompletions.go
🧬 Code graph analysis (6)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
  • RunChatAudioTest (14-149)
  • RunChatAudioStreamTest (152-334)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
  • Usage (69-78)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (830-837)
core/internal/testutil/account.go (3)
core/schemas/account.go (2)
  • BedrockKeyConfig (56-64)
  • AzureKeyConfig (23-27)
core/utils.go (1)
  • Ptr (56-58)
core/schemas/utils.go (1)
  • Ptr (16-18)
ui/lib/types/logs.ts (1)
core/schemas/chatcompletions.go (1)
  • ChatAudioMessageAudio (710-715)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
core/providers/azure/azure_test.go (3)
core/schemas/provider.go (1)
  • Provider (314-361)
core/schemas/bifrost.go (2)
  • Azure (36-36)
  • Fallback (145-148)
core/schemas/models.go (1)
  • Model (109-129)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (24)
ui/app/workspace/logs/views/filters.tsx (1)

327-330: LGTM! UI theming enhancements align well with the audio feature additions.

The accent background and hover effects improve visual consistency and user feedback in the filters popover.

core/internal/testutil/account.go (5)

55-55: LGTM! ChatAudio flag follows established testing patterns.

The new field enables proper gating of audio test scenarios across providers.


69-69: LGTM! ChatAudioModel field is consistent with existing model configuration patterns.

Properly enables provider-specific audio model selection for testing.


165-183: Verify the intent behind the duplicated Bedrock key configuration.

The second Bedrock key (lines 165-183) appears to duplicate most deployments from the first key (lines 148-164), with the main differences being:

  • Uses AWS_BEDROCK_ARN instead of AWS_ARN
  • Sets UseForBatchAPI: true
  • Adds claude-3.5-sonnet deployment

If this separation is intentional (batch-enabled vs. non-batch keys), consider:

  1. Adding a comment explaining the purpose of each key configuration
  2. Ensuring the first key explicitly sets UseForBatchAPI: false for clarity

Based on the broader PR stack context for audio support, please confirm whether this Bedrock key duplication is:

  • Part of the audio feature requirements (e.g., audio needs batch API support)
  • A pre-existing pattern being maintained
  • An unintended side effect that should be cleaned up

607-640: LGTM! OpenAI audio configuration is correctly set up.

The gpt-4o-mini-audio-preview model and ChatAudio: true flag properly enable audio testing for OpenAI.


745-780: LGTM! Azure provider configuration comprehensively supports audio features.

The model assignments and scenario flags correctly enable:

  • Chat audio via gpt-4o-mini-audio-preview
  • Transcription via whisper-1
  • Speech synthesis via gpt-4o-mini-tts

The scenarios accurately reflect Azure's audio capabilities.

transports/changelog.md (1)

8-9: LGTM! Changelog correctly documents the coordinated version bump.

The chore entry properly records the core and framework version updates associated with this PR.

core/internal/testutil/tests.go (2)

67-68: LGTM! Audio test scenarios are properly integrated into the comprehensive test suite.

The RunChatAudioTest and RunChatAudioStreamTest functions correctly follow the TestScenarioFunc signature and are appropriately added to the test execution flow.


121-122: LGTM! Test summary entries correctly gate audio scenarios on both flag and model presence.

The gating logic testConfig.Scenarios.ChatAudio && testConfig.ChatAudioModel != "" follows the established pattern for model-specific features and ensures tests only run when properly configured.

framework/modelcatalog/pricing.go (2)

42-51: LGTM! Good defensive programming.

The null check for SpeechResponse.Usage before accessing its fields prevents potential nil pointer dereferences. The early return with 0 is appropriate when no usage data is available.


76-78: LGTM!

Correctly extracts audioSeconds from TranscriptionResponse.Usage.Seconds for duration-based pricing calculations.

core/providers/openai/openai_test.go (2)

75-75: LGTM!

Enabling the ChatAudio test scenario aligns with the PR's objective to add audio support to chat completions.


41-41: The ChatAudioModel name is correct. The GPT-4o-mini-audio-preview model is a smaller, lower-cost model that provides audio capabilities at a fraction of the cost of GPT-4o-audio-preview. The test configuration is accurate.

ui/lib/types/logs.ts (1)

117-125: LGTM! Type definitions align with backend schema.

The ChatAudioMessageAudio interface correctly mirrors the Go struct from core/schemas/chatcompletions.go. The field types are compatible:

  • id: stringID string
  • data: stringData string
  • expires_at: numberExpiresAt int
  • transcript: stringTranscript string

Consider whether any fields should be optional (e.g., transcript?) if the API can return partial audio data, but this depends on backend behavior.

core/schemas/responses.go (2)

267-275: LGTM! Extended token detail fields for multimodal support.

The addition of TextTokens, AudioTokens, and ImageTokens fields with omitempty tags appropriately extends the token tracking capabilities for multimodal responses. The omitempty tags ensure backward compatibility by omitting zero values in JSON serialization.


277-288: LGTM!

The new TextTokens field in ResponsesResponseOutputTokens complements the input token details and maintains consistency with the input token structure.

core/internal/testutil/chat_audio.go (2)

14-149: LGTM!

The non-streaming audio test implementation is thorough and follows established testing patterns. The test properly gates execution, loads audio samples, configures retry logic, validates responses including audio data and transcripts, and provides clear logging. The format conversion testing (mp3 input → wav output) adds good coverage.


152-334: LGTM!

The streaming audio test is well-implemented with proper chunk accumulation, comprehensive validation of audio data/transcript/metadata, and token usage tracking. The use of pcm16 format for streaming provides good test coverage complementary to the wav format in non-streaming tests.

core/providers/azure/azure_test.go (1)

26-62: LGTM!

The Azure test configuration properly incorporates audio-related models and scenarios. The explicit disabling of TranscriptionStream with a clear comment is good practice, and the model selections are appropriate for Azure OpenAI service capabilities.

core/schemas/chatcompletions.go (5)

157-157: LGTM!

The Audio field is properly added to ChatParameters as an optional pointer with appropriate JSON tagging.


224-228: LGTM!

The ChatAudioParameters structure correctly mirrors OpenAI's API specification. Based on learnings, the valid values for Format and Voice are defined by OpenAI's documentation, and the project intentionally maintains direct compatibility without adding inline documentation or constants.


524-524: LGTM!

The Audio field is properly added to ChatAssistantMessage, and the UnmarshalJSON method is correctly updated to include the Audio check when determining whether to populate the ChatAssistantMessage pointer. These changes work together cohesively.

Also applies to: 636-636


709-715: LGTM!

The ChatAudioMessageAudio type is well-defined with appropriate fields for audio data, transcript, expiration, and ID. The structure aligns with the UI TypeScript interface shown in the relevant code snippets (ui/lib/types/logs.ts).


777-777: LGTM!

The additions enable comprehensive token usage tracking across modalities:

  • Audio field properly added to streaming delta responses (line 777)
  • TextTokens and ImageTokens fields complement existing AudioTokens for granular multi-modal usage accounting

These changes support accurate cost calculation and usage reporting for audio-enabled chat completions.

Also applies to: 840-842, 850-850

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch 2 times, most recently from d9c725e to 2e56871 Compare December 17, 2025 11:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (11)
plugins/jsonparser/changelog.md (1)

1-1: Consider adding feature details to the changelog entry.

The changelog documents a version bump but lacks description of the audio features added in this PR. Given the substantial scope of changes (audio schema, streaming, UI rendering, provider integrations), consider expanding this entry with brief feature highlights for users tracking changes.

core/providers/mistral/transcription_test.go (1)

992-1014: LGTM! Comprehensive streaming test case.

The addition of Stream: schemas.Ptr(true) correctly tests streaming behavior with all optional fields. The test appropriately verifies that the stream flag is serialized alongside other parameters.

Optional enhancement: Consider adding a test case for Stream: schemas.Ptr(false) to verify behavior when streaming is explicitly disabled. This would provide more complete coverage of the Stream field's behavior.

{
    name: "non-streaming request with stream false",
    request: &MistralTranscriptionRequest{
        Model:    "voxtral-mini-latest",
        File:     []byte{0x01, 0x02, 0x03},
        Language: schemas.Ptr("en"),
        Stream:   schemas.Ptr(false),
    },
    expectedFields: map[string]string{
        "stream":   "false",
        "model":    "voxtral-mini-latest",
        "language": "en",
    },
},
framework/streaming/chat.go (1)

106-135: Audio chunk accumulation logic looks sound; consider propagating any extra audio metadata

The first vs subsequent‑chunk handling for Audio (ID/Data/ExpiresAt/Transcript) is correct and consistent with how other streaming fields are accumulated. If schemas.ChatAudioMessageAudio (and ChatStreamResponseChoiceDelta.Audio) expose additional metadata used downstream (e.g., an audio Format or similar), consider also initializing/updating those fields here so non‑streaming and streaming chat‑audio messages carry a consistent audio shape into the UI and logs.

ui/app/workspace/logs/views/logDetailsSheet.tsx (2)

37-39: Type assertion with as any is used for audio format extraction.

The repeated as any casts suggest the LogEntry type may need updating to include the audio field in params. Consider extending the type definition to avoid runtime type assertions.


419-428: Missing audioFormat prop for embedding output.

Other LogChatMessageView usages pass audioFormat, but this embedding output case does not. For consistency, consider adding the prop here as well.

 								<LogChatMessageView
 									message={{
 										role: "assistant",
 										content: JSON.stringify(
 											log.embedding_output.map((embedding) => embedding.embedding),
 											null,
 											2,
 										),
 									}}
+									audioFormat={audioFormat}
 								/>
core/internal/testutil/transcription.go (1)

62-70: Duplicated provider/model override logic.

This same pattern appears 5 times in this file and similarly in transcription_stream.go. Consider extracting a helper function to reduce duplication.

func (tc *ComprehensiveTestConfig) GetSpeechSynthesisProviderAndModel() (schemas.ModelProvider, string) {
    provider := tc.Provider
    if tc.ExternalTTSProvider != "" {
        provider = tc.ExternalTTSProvider
    }
    model := tc.SpeechSynthesisModel
    if tc.ExternalTTSModel != "" {
        model = tc.ExternalTTSModel
    }
    return provider, model
}
core/internal/testutil/transcription_stream.go (1)

65-73: Same duplicated override pattern as in transcription.go.

This pattern appears 4 times in this file. The helper function suggested for transcription.go would also benefit this file.

ui/app/workspace/logs/views/audioPlayer.tsx (1)

58-92: Variable naming could be clearer.

The variable pcmData on line 61 holds the decoded binary data regardless of format. Consider renaming to binaryData or audioData for clarity, since it's not always PCM data.

-			const binaryString = atob(base64Data);
-			const pcmData = Uint8Array.from(binaryString, (c) => c.charCodeAt(0));
+			const binaryString = atob(base64Data);
+			const audioData = Uint8Array.from(binaryString, (c) => c.charCodeAt(0));

 			// Handle PCM16 format - convert to WAV
 			if (audioFormat === "pcm16" || audioFormat === "pcm_s16le_16") {
-				const wavData = convertPCM16ToWAV(pcmData);
+				const wavData = convertPCM16ToWAV(audioData);
core/internal/testutil/chat_audio.go (2)

32-34: Input audio format is hardcoded to "mp3".

Both test functions hardcode the input audio format to "mp3" when calling CreateAudioChatMessage. This is acceptable for test utilities, but consider whether testing multiple input formats would provide additional coverage.

Also applies to: 169-172


276-278: Validation accepts either audio data OR transcript.

Line 276 validates that at least one of accumulatedAudioData or accumulatedTranscript is present, using OR logic. This is lenient—providers might stream only transcript without audio data, or vice versa. Consider whether both fields should be required for a complete audio response, depending on your API contract expectations.

core/internal/testutil/account.go (1)

77-78: ExternalTTSProvider and ExternalTTSModel fields are declared but unused.

Lines 77-78 add ExternalTTSProvider and ExternalTTSModel fields to ComprehensiveTestConfig, but these fields aren't populated in any provider configuration in this file. They may be used elsewhere in the stack or reserved for future functionality.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc9bdd3 and 2e56871.

⛔ Files ignored due to path filters (1)
  • core/internal/testutil/scenarios/media/sample.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (45)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/internal/testutil/transcription_stream.go (5 hunks)
  • core/internal/testutil/utils.go (4 hunks)
  • core/internal/testutil/validation_presets.go (2 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/mistral/mistral.go (3 hunks)
  • core/providers/mistral/mistral_test.go (2 hunks)
  • core/providers/mistral/transcription_test.go (2 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/audioPlayer.tsx (3 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (3 hunks)
  • ui/app/workspace/logs/views/logDetailsSheet.tsx (5 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/types/logs.ts (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • plugins/jsonparser/version
  • plugins/telemetry/changelog.md
  • plugins/maxim/changelog.md
  • plugins/mocker/version
🚧 Files skipped from review as they are similar to previous changes (21)
  • ui/app/workspace/logs/views/filters.tsx
  • plugins/semanticcache/version
  • core/providers/openai/openai_test.go
  • plugins/semanticcache/changelog.md
  • plugins/maxim/version
  • framework/modelcatalog/pricing.go
  • plugins/mocker/changelog.md
  • core/version
  • plugins/otel/version
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • core/schemas/chatcompletions.go
  • plugins/logging/version
  • plugins/governance/changelog.md
  • plugins/governance/version
  • transports/version
  • plugins/logging/changelog.md
  • core/internal/testutil/utils.go
  • ui/lib/types/logs.ts
  • framework/version
  • plugins/telemetry/version
  • framework/changelog.md
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/internal/testutil/validation_presets.go
  • core/internal/testutil/transcription_stream.go
  • core/providers/mistral/mistral.go
  • ui/lib/constants/icons.tsx
  • core/internal/testutil/tests.go
  • core/internal/testutil/chat_audio.go
  • plugins/jsonparser/changelog.md
  • core/providers/azure/azure_test.go
  • core/schemas/mux.go
  • core/internal/testutil/transcription.go
  • core/providers/mistral/mistral_test.go
  • core/internal/testutil/account.go
  • ui/app/workspace/logs/views/audioPlayer.tsx
  • core/providers/openai/openai.go
  • ui/app/workspace/logs/views/logDetailsSheet.tsx
  • core/schemas/responses.go
  • core/providers/mistral/transcription_test.go
  • framework/streaming/chat.go
  • plugins/otel/changelog.md
  • transports/changelog.md
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/validation_presets.go
  • core/internal/testutil/transcription_stream.go
  • core/providers/mistral/mistral.go
  • core/internal/testutil/tests.go
  • core/internal/testutil/chat_audio.go
  • core/providers/azure/azure_test.go
  • core/schemas/mux.go
  • core/internal/testutil/transcription.go
  • core/providers/mistral/mistral_test.go
  • core/internal/testutil/account.go
  • core/providers/openai/openai.go
  • core/schemas/responses.go
  • core/providers/mistral/transcription_test.go
  • framework/streaming/chat.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/azure/azure_test.go
  • core/providers/mistral/mistral_test.go
  • core/providers/mistral/transcription_test.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai.go
🧬 Code graph analysis (10)
core/internal/testutil/validation_presets.go (1)
core/internal/testutil/response_validation.go (1)
  • ResponseExpectations (18-43)
core/internal/testutil/transcription_stream.go (3)
core/internal/testutil/utils.go (2)
  • GetProviderVoice (39-87)
  • GenerateTTSAudioForTest (568-640)
core/schemas/speech.go (1)
  • BifrostSpeechRequest (9-16)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/mistral/mistral.go (3)
core/providers/mistral/types.go (2)
  • MistralTranscriptionStreamEventType (88-88)
  • MistralTranscriptionStreamEventDone (98-98)
core/schemas/bifrost.go (1)
  • BifrostContextKeyStreamEndIndicator (126-126)
core/schemas/transcriptions.go (1)
  • TranscriptionStreamResponseTypeDone (115-115)
core/internal/testutil/tests.go (1)
core/internal/testutil/chat_audio.go (2)
  • RunChatAudioTest (14-149)
  • RunChatAudioStreamTest (152-318)
core/providers/azure/azure_test.go (3)
core/schemas/provider.go (1)
  • Provider (314-361)
core/schemas/bifrost.go (2)
  • Azure (36-36)
  • Fallback (145-148)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/mux.go (2)
core/schemas/responses.go (1)
  • ResponsesResponseOutputTokens (277-288)
core/schemas/chatcompletions.go (1)
  • ChatCompletionTokensDetails (849-861)
core/internal/testutil/transcription.go (2)
core/internal/testutil/utils.go (4)
  • GetProviderVoice (39-87)
  • GenerateTTSAudioForTest (568-640)
  • TTSTestTextBasic (20-20)
  • TTSTestTextMedium (23-23)
core/schemas/speech.go (1)
  • BifrostSpeechRequest (9-16)
core/providers/mistral/mistral_test.go (1)
core/schemas/bifrost.go (1)
  • OpenAI (35-35)
core/providers/openai/openai.go (1)
core/schemas/chatcompletions.go (1)
  • ChatStreamResponseChoice (768-770)
ui/app/workspace/logs/views/logDetailsSheet.tsx (2)
ui/app/workspace/logs/views/logEntryDetailsView.tsx (1)
  • LogEntryDetailsView (15-49)
ui/app/workspace/logs/views/logChatMessageView.tsx (1)
  • LogChatMessageView (67-214)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (23)
plugins/otel/changelog.md (1)

1-1: Version bump entry is appropriate and correctly formatted.

This changelog entry correctly documents the version bumps for dependencies affected by the PR stack. Since plugins/otel is not directly involved in the audio support feature, a simple version bump entry is the right approach.

transports/changelog.md (2)

9-9: Version bump entry is appropriate.

The chore entry correctly documents dependency version updates. Since transports is not directly affected by the audio support feature, only a version bump entry is needed, consistent with plugins/otel.


8-8: Clarify the relationship of the docs link update to this PR.

The docs link update on line 8 appears to be unrelated to the audio support feature (which doesn't affect transports per the PR objectives). Please confirm whether this was intended to be included in this PR or should be part of a separate commit/PR.

core/providers/mistral/transcription_test.go (1)

978-990: LGTM! Stream flag correctly added to test case.

The addition of Stream: schemas.Ptr(true) is appropriate for this "basic streaming request" test case. The expected form field correctly includes "stream": "true", ensuring the flag is properly serialized into the multipart body.

core/schemas/mux.go (1)

652-671: Token usage conversions now correctly include text/image token breakdowns

The added TextTokens/ImageTokens wiring between BifrostLLMUsage and ResponsesResponseUsage is symmetric in both directions and keeps existing audio/cached/other token fields intact, so downstream usage accounting should see richer modality breakdowns without regressions.

Also applies to: 688-707

core/schemas/responses.go (1)

267-288: Responses token structs extended in line with new usage mappings

Adding TextTokens/ImageTokens here (and marking audio_tokens/cached_tokens as omitempty) matches the new conversions in mux.go and lets callers consume a more granular token breakdown while keeping JSON output compact when counts are zero.

core/internal/testutil/validation_presets.go (1)

202-215: ChatAudio expectations and scenario wiring are consistent with existing presets

The ChatAudioExpectations preset (no text content, single choice, usage/timestamps/model/latency, response_type: "chat_audio") and the new "ChatAudio" branch in GetExpectationsForScenario line up with the chat‑audio test helper semantics and the broader expectations framework.

Also applies to: 298-300

core/providers/openai/openai.go (1)

1047-1053: Streaming now correctly emits chunks for audio‑only deltas

Including Delta.Audio != nil in the emission predicate (under the existing non‑nil choice/delta guards) lets pure audio chunks flow through the streaming pipeline, which is required to drive the new audio accumulator and chat‑audio stream tests. The parentheses keep the boolean logic unambiguous.

core/internal/testutil/tests.go (1)

67-69: ChatAudio and ChatAudioStream are cleanly integrated into the comprehensive test suite

Adding RunChatAudioTest / RunChatAudioStreamTest to the scenario list and mirroring that with "ChatAudio" / "ChatAudioStream" entries in the summary (gated on Scenarios.ChatAudio and ChatAudioModel) keeps execution and reporting aligned while still auto‑skipping when the provider/model doesn’t support chat‑audio.

Also applies to: 121-123

core/providers/mistral/mistral.go (2)

573-578: Stream completion handling looks correct.

The done event handling properly sets latency, marks the stream end indicator in context, and ensures the response type is explicitly set to Done. This aligns with the broader streaming pattern in the codebase.


470-474: Dual end-of-stream event check is intentional.

The code checks for both MistralTranscriptionStreamEventDone ("transcription.done") and "transcript.text.done" (a local schema constant). This pattern is intentional and documented in the comment at line 573. The check for "transcript.text.done" as a raw event handles edge cases, test scenarios, or potential future API changes, even though Mistral's API natively emits "transcription.done".

core/providers/azure/azure_test.go (1)

26-61: Test configuration properly enables audio scenarios.

The new fields (ChatAudioModel, SpeechSynthesisModel, TranscriptionModel) and scenario flags are correctly configured. The explicit TranscriptionStream: false with the explanatory comment is good documentation practice.

core/providers/mistral/mistral_test.go (2)

31-35: External TTS configuration for Mistral transcription tests.

Using OpenAI as the external TTS provider for Mistral's transcription round-trip tests is appropriate since Mistral supports transcription but not speech synthesis. This enables comprehensive testing of Mistral's transcription capabilities.


52-53: Enabling TranscriptionStream and ListModels tests.

Good additions to expand test coverage for Mistral.

ui/app/workspace/logs/views/audioPlayer.tsx (2)

128-138: Extension determination logic is correct.

The file extension mapping covers the expected formats and correctly maps PCM16 to WAV since the conversion produces WAV output.


15-56: No action needed. The sampleRate = 24000 default is the correct specification for OpenAI's audio APIs and is already configurable via function parameters if other sample rates become necessary.

core/internal/testutil/chat_audio.go (3)

14-18: LGTM! Proper gating for provider support.

The early return pattern when chat audio is not supported by the provider is clean and prevents unnecessary test execution.

Also applies to: 151-156


62-90: LGTM! Well-structured operation function with proper error handling.

The chat operation function correctly handles nil responses by creating a custom BifrostError, which provides clear error messages during test failures.


204-262: LGTM! Comprehensive streaming accumulation.

The streaming test properly accumulates all audio-related fields (data, transcript, ID, expiresAt) and usage metadata across chunks. The loop structure and error handling are sound.

core/internal/testutil/account.go (4)

55-55: LGTM! New fields support chat audio functionality.

The additions to TestScenarios (ChatAudio) and ComprehensiveTestConfig (ChatAudioModel, ExternalTTSProvider, ExternalTTSModel) are well-placed and consistent with the PR's objectives to add audio support.

Also applies to: 69-69, 77-78


167-185: Second Bedrock key configuration for multi-ARN testing.

This second Bedrock key uses a different ARN environment variable (AWS_BEDROCK_ARN at line 175 vs default at line 158) and includes batch API support (line 184). This appears intentional for testing multiple Bedrock deployment scenarios.


224-237: Second Azure key configuration now includes audio model.

The configuration now includes gpt-4o-mini-audio-preview deployment (line 234) with API version 2025-01-01-preview (line 230), addressing the previous review concern about model-to-API-version consistency.


607-607: LGTM! OpenAI and Azure configurations properly enable chat audio.

Both provider configurations correctly:

  • Specify chat audio models (gpt-4o-mini-audio-preview)
  • Enable the ChatAudio scenario flag
  • Azure additionally configures SpeechSynthesisModel and TranscriptionModel for audio-related tests

Also applies to: 639-639, 745-750, 764-767, 780-780

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
core/providers/mistral/mistral.go (1)

573-578: Consider extracting the string literal to a constant.

The string "transcript.text.done" is used as a fallback event type check. For maintainability, consider defining this as a constant alongside MistralTranscriptionStreamEventDone in types.go to keep all event type definitions co-located.

 // Check for done event (handle both "transcription.done" and "transcript.text.done")
-if MistralTranscriptionStreamEventType(eventType) == MistralTranscriptionStreamEventDone || eventType == "transcript.text.done" {
+if MistralTranscriptionStreamEventType(eventType) == MistralTranscriptionStreamEventDone || eventType == string(MistralTranscriptionStreamEventTextDone) {

This assumes adding a new constant in types.go:

MistralTranscriptionStreamEventTextDone MistralTranscriptionStreamEventType = "transcript.text.done"
core/internal/testutil/transcription.go (1)

62-70: Consider extracting the TTS provider/model derivation to a helper function.

This pattern is repeated 5 times throughout the file. A helper would reduce duplication and centralize the override logic.

// In testutil package, add helper:
func GetSpeechSynthesisConfig(testConfig ComprehensiveTestConfig) (provider schemas.ModelProvider, model string) {
    provider = testConfig.Provider
    if testConfig.ExternalTTSProvider != "" {
        provider = testConfig.ExternalTTSProvider
    }
    model = testConfig.SpeechSynthesisModel
    if testConfig.ExternalTTSModel != "" {
        model = testConfig.ExternalTTSModel
    }
    return
}

Then replace the repeated blocks with:

speechSynthesisProvider, speechSynthesisModel := GetSpeechSynthesisConfig(testConfig)
core/internal/testutil/transcription_stream.go (1)

65-73: Same helper extraction opportunity as in transcription.go.

This TTS provider/model derivation pattern is repeated 4 times in this file (and 5 times in transcription.go). Extracting to a shared helper would improve maintainability across both files.

core/internal/testutil/account.go (1)

167-185: Confirm Bedrock dual‑key configuration and intended split between ARN env vars.

The second Bedrock key uses AWS_BEDROCK_ARN (vs AWS_ARN in the first key), adds claude-3.5-sonnet, and is marked UseForBatchAPI, while both keys share overlapping Claude deployments. That’s a reasonable pattern if you’re deliberately testing separate Bedrock roles/accounts (e.g., one for real‑time, one for batch), but if both ARNs point at the same account this may be redundant and could be simplified to a single key.

Please confirm the two ARNs are intentionally distinct; if not, consider collapsing to one key and keeping the claude-3.5-sonnet deployment and UseForBatchAPI there.

core/internal/testutil/validation_presets.go (1)

202-215: ChatAudio expectations are sensible; consider adding explicit “has audio” hints if you need stricter checks.

ChatAudioExpectations and the "ChatAudio" branch in GetExpectationsForScenario line up with how chat‑audio responses are modeled (choice present, non‑text content, usage/meta required). If you later tighten validation, you could mirror the speech/transcription presets by adding ProviderSpecific flags like should_have_audio or a minimal byte length to distinguish “audio present but empty” cases, but it’s fine to defer that until you have concrete needs.

Also applies to: 298-300

ui/app/workspace/logs/views/logDetailsSheet.tsx (1)

37-40: Audio request metadata is surfaced cleanly and threaded through to message rendering.

Extracting audioFormat from log.params (with the extra_params.audio fallback), rendering Audio Format/Voice explicitly, and excluding the audio key from the generic params grid avoids duplication and keeps audio details discoverable. Passing audioFormat into all LogChatMessageView usages that can display audio output keeps the UI and playback logic aligned with the original request.

If you find yourself extending audio params further, you might consider hoisting const params = log.params as any; once near the top and reusing it instead of repeating the cast in multiple spots, but that’s purely cosmetic.

Also applies to: 122-133, 137-139, 379-380, 388-389, 407-408

ui/app/workspace/logs/views/audioPlayer.tsx (1)

5-8: AudioPlayer’s format‑aware blob handling looks solid; a couple of edge cases are worth keeping in mind.

The format‑aware createAudioBlob (including PCM16→WAV conversion) is a good fit for the new chat‑audio flows, and wiring the same helper into both play and download keeps behavior consistent with the audioFormat passed from logs.

Two small, non‑blocking nits you may want to consider later:

  • If src ever comes through as a full data URL (data:audio/...;base64,xxx) rather than raw base64, atob will throw; trimming off any data:*;base64, prefix before decoding would make this more robust.
  • When users pause audio and never let it finish, the current logic doesn’t revoke the object URL created for that play; the leak is minor in typical log‑inspection usage but could be eliminated by tracking and revoking the last URL on each new play.

Neither of these needs to block this PR, but they’re easy hardening wins if you touch this component again.

Also applies to: 10-11, 16-56, 58-86, 94-104, 120-143

core/internal/testutil/chat_audio.go (1)

189-206: Streaming test uses different audio format than non-streaming.

Non-streaming test (line 71) uses Format: "wav" while streaming test (line 198) uses Format: "pcm16". This appears intentional to test different output formats, but consider adding a comment explaining this difference for clarity.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc9bdd3 and 2e56871.

⛔ Files ignored due to path filters (1)
  • core/internal/testutil/scenarios/media/sample.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (45)
  • core/internal/testutil/account.go (9 hunks)
  • core/internal/testutil/chat_audio.go (1 hunks)
  • core/internal/testutil/tests.go (2 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/internal/testutil/transcription_stream.go (5 hunks)
  • core/internal/testutil/utils.go (4 hunks)
  • core/internal/testutil/validation_presets.go (2 hunks)
  • core/providers/azure/azure_test.go (2 hunks)
  • core/providers/mistral/mistral.go (3 hunks)
  • core/providers/mistral/mistral_test.go (2 hunks)
  • core/providers/mistral/transcription_test.go (2 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/openai/openai_test.go (2 hunks)
  • core/schemas/chatcompletions.go (7 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/responses.go (1 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/modelcatalog/pricing.go (3 hunks)
  • framework/streaming/chat.go (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • transports/changelog.md (1 hunks)
  • transports/version (1 hunks)
  • ui/app/workspace/logs/views/audioPlayer.tsx (3 hunks)
  • ui/app/workspace/logs/views/filters.tsx (1 hunks)
  • ui/app/workspace/logs/views/logChatMessageView.tsx (3 hunks)
  • ui/app/workspace/logs/views/logDetailsSheet.tsx (5 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/types/logs.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • plugins/mocker/changelog.md
  • plugins/telemetry/changelog.md
  • plugins/jsonparser/changelog.md
🚧 Files skipped from review as they are similar to previous changes (21)
  • plugins/logging/changelog.md
  • framework/version
  • plugins/telemetry/version
  • framework/streaming/chat.go
  • plugins/otel/version
  • ui/app/workspace/logs/views/filters.tsx
  • plugins/semanticcache/version
  • plugins/governance/changelog.md
  • plugins/mocker/version
  • core/internal/testutil/tests.go
  • plugins/otel/changelog.md
  • framework/changelog.md
  • plugins/governance/version
  • core/providers/azure/azure_test.go
  • core/providers/openai/openai_test.go
  • plugins/semanticcache/changelog.md
  • core/version
  • ui/lib/types/logs.ts
  • plugins/maxim/version
  • core/schemas/mux.go
  • plugins/logging/version
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • plugins/jsonparser/version
  • core/providers/openai/openai.go
  • core/internal/testutil/validation_presets.go
  • ui/lib/constants/icons.tsx
  • core/providers/mistral/mistral.go
  • core/internal/testutil/chat_audio.go
  • core/internal/testutil/transcription.go
  • core/providers/mistral/mistral_test.go
  • core/internal/testutil/utils.go
  • framework/modelcatalog/pricing.go
  • core/providers/mistral/transcription_test.go
  • ui/app/workspace/logs/views/logChatMessageView.tsx
  • core/internal/testutil/account.go
  • core/schemas/responses.go
  • core/schemas/chatcompletions.go
  • transports/changelog.md
  • core/internal/testutil/transcription_stream.go
  • ui/app/workspace/logs/views/audioPlayer.tsx
  • ui/app/workspace/logs/views/logDetailsSheet.tsx
  • plugins/maxim/changelog.md
  • transports/version
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/openai/openai.go
  • core/internal/testutil/validation_presets.go
  • core/providers/mistral/mistral.go
  • core/internal/testutil/chat_audio.go
  • core/internal/testutil/transcription.go
  • core/providers/mistral/mistral_test.go
  • core/internal/testutil/utils.go
  • framework/modelcatalog/pricing.go
  • core/providers/mistral/transcription_test.go
  • core/internal/testutil/account.go
  • core/schemas/responses.go
  • core/schemas/chatcompletions.go
  • core/internal/testutil/transcription_stream.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/mistral/mistral_test.go
  • core/providers/mistral/transcription_test.go
📚 Learning: 2025-12-17T08:44:08.788Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1114
File: core/schemas/chatcompletions.go:224-228
Timestamp: 2025-12-17T08:44:08.788Z
Learning: In core/schemas/chatcompletions.go, ensure the schema structures mirror OpenAI's API specifications exactly. Use the valid values for fields like ChatAudioParameters.Format and ChatAudioParameters.Voice as defined by OpenAI's documentation, and avoid adding additional inline documentation or constants to maintain direct compatibility with OpenAI's API.

Applied to files:

  • core/schemas/chatcompletions.go
🧬 Code graph analysis (11)
core/providers/openai/openai.go (1)
core/schemas/chatcompletions.go (1)
  • ChatStreamResponseChoice (768-770)
core/internal/testutil/validation_presets.go (1)
core/internal/testutil/response_validation.go (1)
  • ResponseExpectations (18-43)
core/providers/mistral/mistral.go (3)
core/providers/mistral/types.go (2)
  • MistralTranscriptionStreamEventType (88-88)
  • MistralTranscriptionStreamEventDone (98-98)
core/schemas/bifrost.go (1)
  • BifrostContextKeyStreamEndIndicator (126-126)
core/schemas/transcriptions.go (1)
  • TranscriptionStreamResponseTypeDone (115-115)
core/internal/testutil/chat_audio.go (4)
core/internal/testutil/utils.go (3)
  • GetSampleAudioBase64 (220-233)
  • CreateAudioChatMessage (312-329)
  • GetErrorMessage (642-675)
core/schemas/chatcompletions.go (8)
  • ChatMessage (477-486)
  • BifrostChatResponse (27-42)
  • BifrostChatRequest (12-19)
  • ChatParameters (156-186)
  • ChatAudioParameters (225-228)
  • ChatAssistantMessage (634-641)
  • BifrostLLMUsage (830-837)
  • ChatStreamResponseChoice (768-770)
core/internal/testutil/test_retry_framework.go (6)
  • GetTestRetryConfigForScenario (1116-1148)
  • TestRetryContext (168-173)
  • ChatRetryConfig (186-193)
  • ChatRetryCondition (126-129)
  • StreamingRetryConfig (819-839)
  • WithStreamRetry (580-688)
core/schemas/bifrost.go (3)
  • ChatCompletionRequest (91-91)
  • BifrostStream (426-433)
  • ChatCompletionStreamRequest (92-92)
core/internal/testutil/transcription.go (2)
core/internal/testutil/utils.go (2)
  • GetProviderVoice (39-87)
  • GenerateTTSAudioForTest (568-640)
core/schemas/speech.go (1)
  • BifrostSpeechRequest (9-16)
core/providers/mistral/mistral_test.go (1)
core/schemas/bifrost.go (1)
  • OpenAI (35-35)
core/internal/testutil/utils.go (2)
core/schemas/chatcompletions.go (7)
  • ChatMessage (477-486)
  • ChatMessageRoleUser (470-470)
  • ChatMessageContent (532-535)
  • ChatContentBlock (598-605)
  • ChatContentBlockTypeText (590-590)
  • ChatContentBlockTypeInputAudio (592-592)
  • ChatInputAudio (616-619)
core/schemas/utils.go (1)
  • Ptr (16-18)
framework/modelcatalog/pricing.go (2)
core/providers/perplexity/types.go (1)
  • Usage (69-78)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (830-837)
ui/app/workspace/logs/views/logChatMessageView.tsx (2)
core/schemas/chatcompletions.go (1)
  • ChatMessage (477-486)
ui/lib/types/logs.ts (1)
  • ChatMessage (108-118)
core/schemas/chatcompletions.go (1)
ui/lib/types/logs.ts (1)
  • ChatAudioMessageAudio (120-125)
ui/app/workspace/logs/views/logDetailsSheet.tsx (2)
ui/app/workspace/logs/views/logEntryDetailsView.tsx (1)
  • LogEntryDetailsView (15-49)
ui/app/workspace/logs/views/logChatMessageView.tsx (1)
  • LogChatMessageView (67-214)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (32)
transports/changelog.md (1)

8-9: Verify version bump documentation aligns with PR scope and stack coordination.

Line 9 documents version bumps for core and framework, but the main PR objective is adding audio support for chat completions. Given that this is a stacked PR (#1114) with dependent PRs in the stack, please verify:

  1. Audio feature documentation: Should the audio support feature have an explicit changelog entry in this file, or is it documented in a changelog from another PR in the stack?
  2. Version bump scope: Confirm that documenting core and framework version bumps in transports/changelog.md is intentional and not duplicative of entries in core/changelog.md or framework/changelog.md.
  3. Version accuracy: Verify that the version numbers (core 1.2.39, framework 1.1.49) match the actual version files updated by this PR and are consistent across the stack.
transports/version (1)

1-1: Verify version bump appropriateness for the stacked changes.

The version increment is syntactically correct. However, given that this PR stack introduces significant new functionality (audio support for chat completions, new schema fields, streaming capabilities, and UI components), please confirm that a patch version bump (1.3.49 → 1.3.50) aligns with your project's versioning strategy. If following semantic versioning, new features typically warrant a minor version bump (e.g., 1.4.0).

Based on coding guidelines, reviewing in context of the entire PR stack.

framework/modelcatalog/pricing.go (3)

42-51: LGTM—enables token-based pricing for speech responses.

The nil check safely handles cases where usage data is unavailable and correctly constructs BifrostLLMUsage when present, enabling accurate cost calculation for speech operations.


97-99: LGTM—audioSeconds extraction is correct.

The extraction correctly references TranscriptionStreamResponse.Usage.Seconds (consistent with the streaming response context), and the past issue about incorrect field reference has been properly addressed.


76-78: Correctly extracts audio duration for transcription pricing.

This change enables duration-based pricing for transcription operations when Seconds is reported in usage. The extraction is appropriate—only TranscriptionResponse and TranscriptionStreamResponse provide duration data via the Seconds field in their Usage structs, while SpeechUsage contains only token counts and intentionally remains token-based for pricing.

plugins/jsonparser/version (1)

1-1: LGTM!

Version bump aligns with the broader audio support feature additions in this PR stack.

plugins/maxim/changelog.md (1)

1-1: LGTM!

Changelog entry properly documents the version updates for core and framework.

core/providers/mistral/mistral_test.go (2)

31-35: LGTM!

The external TTS provider configuration correctly enables Mistral transcription tests by using OpenAI for TTS generation, since Mistral doesn't natively support speech synthesis but does support transcription via voxtral-mini-latest.


52-53: LGTM!

Enabling TranscriptionStream and ListModels scenarios properly expands test coverage for Mistral's audio capabilities.

core/providers/mistral/mistral.go (1)

470-474: LGTM!

Breaking early after detecting a done event prevents unnecessary waiting and properly terminates the stream processing loop. The dual-check for both MistralTranscriptionStreamEventDone and "transcript.text.done" ensures compatibility with different event formats from the Mistral API.

core/internal/testutil/transcription.go (1)

73-76: LGTM!

The derived speechSynthesisProvider is correctly used for voice selection and TTS request construction, ensuring providers without native TTS can leverage external TTS services for transcription round-trip testing.

core/providers/mistral/transcription_test.go (3)

983-984: LGTM!

The Stream field is correctly set to true for streaming request tests, and the expected form fields properly validate that "stream": "true" is included in the multipart body.


1276-1334: Good context cancellation test coverage.

This test properly validates that the streaming transcription respects context cancellation by using a short timeout (500ms) against a slow server (5s delay), ensuring the client doesn't hang indefinitely.


1467-1475: LGTM!

The testLogger is a minimal no-op implementation suitable for unit tests where logging output isn't needed.

core/internal/testutil/transcription_stream.go (2)

76-79: LGTM!

The derived provider and model are correctly applied to voice selection and TTS request construction for streaming transcription tests.


276-328: Solid round-trip validation logic.

The 50% word-match threshold with detailed word-by-word comparison logging provides good debugging capability while allowing for transcription variations across providers.

core/internal/testutil/account.go (1)

55-56: ChatAudio flags and models are wired consistently across OpenAI and Azure test configs.

The new ChatAudio flag plus ChatAudioModel fields are threaded cleanly into TestScenarios and ComprehensiveTestConfig, and the OpenAI/Azure entries in AllProviderConfigs use them in a way that should integrate smoothly with the new chat‑audio test helpers. Other providers safely default these fields to zero values, so there’s no regression risk in existing scenarios.

Also applies to: 69-79, 597-608, 639-640, 745-751, 780-781

core/providers/openai/openai.go (1)

1047-1052: Including Delta.Audio in the streaming emit condition is the right extension for chat audio.

Treating non‑nil Delta.Audio the same as content/tool‑call deltas ensures audio‑only chunks advance chunkIndex and get emitted, without impacting existing text/tool streaming behavior.

core/internal/testutil/utils.go (1)

219-233: Audio test helpers are consistent with existing patterns and schema types.

GetSampleAudioBase64 mirrors the existing image‑fixture loader and returns a clean base64 payload suitable for ChatInputAudio.Data, and CreateAudioChatMessage correctly composes a user message with both text and input_audio blocks using the provided format hint. This should plug into the new chat‑audio tests without surprising existing helpers.

Also applies to: 312-329

ui/app/workspace/logs/views/logChatMessageView.tsx (1)

4-9: Audio input/output rendering is wired correctly into the chat message view.

The new audioFormat prop, input_audio block rendering, and “Audio Output” section (transcript + AudioPlayer + guarded expiry display) align with the extended ChatMessage shape and avoid earlier Invalid Date pitfalls. This should give clear visibility into both the audio payload and its metadata in log views without impacting existing text/tool/annotation handling.

Also applies to: 50-62, 67-68, 184-207

core/schemas/responses.go (1)

267-278: LGTM! Token detail fields extended consistently.

The additions of TextTokens and ImageTokens to input tokens and TextTokens to output tokens align with the broader audio support feature and match the corresponding changes in core/schemas/chatcompletions.go. The omitempty tags are appropriate for these optional detail fields.

core/internal/testutil/chat_audio.go (4)

62-90: Non-streaming audio request setup looks correct.

The request correctly sets:

  • Modalities: []string{"text", "audio"} for audio output
  • Audio parameters with voice and output format
  • MaxCompletionTokens to limit response size

The error handling properly wraps nil responses in a BifrostError.


216-262: Audio stream accumulation logic is well-structured.

The code correctly:

  • Accumulates audio data and transcript across chunks using strings.Builder
  • Captures metadata (ID, ExpiresAt) when present
  • Handles the delta structure from ChatStreamResponseChoiceDelta.Audio
  • Tracks final usage for validation

275-290: Validation allows either audio data or transcript.

The validation at line 276-278 requires at least one of audio data or transcript to be present. This is pragmatic since some providers may stream transcript before/without audio data, but consider whether this meets the test's intent of validating full audio response capability.


36-49: ChatAudio scenario falls through to default retry config.

GetTestRetryConfigForScenario("ChatAudio", testConfig) has no explicit case in the switch statement and falls through to default, returning DefaultTestRetryConfig(). Given that other audio-related scenarios like SpeechSynthesis, SpeechSynthesisStream, and Transcription have dedicated retry configs, verify whether ChatAudio should also have a custom retry config to account for audio processing latency, or if using the default config is intentional.

core/schemas/chatcompletions.go (7)

156-181: ChatParameters extended with audio support.

The Audio *ChatAudioParameters field is correctly added to enable audio configuration in chat requests. The field ordering change (alphabetical with Audio first) maintains consistency.


224-228: ChatAudioParameters mirrors OpenAI's API spec.

Based on learnings, this struct intentionally mirrors OpenAI's API specifications without additional inline documentation or constants, maintaining direct compatibility.


522-526: Unmarshalling correctly includes Audio field check.

The condition now includes assistantMsg.Audio != nil to properly detect when an assistant message contains audio data, ensuring the ChatAssistantMessage is populated correctly during JSON unmarshalling.


633-641: ChatAssistantMessage extended with Audio field.

The Audio *ChatAudioMessageAudio field is properly added to support audio responses in assistant messages.


709-715: ChatAudioMessageAudio struct is consistent with UI types.

The struct fields match the TypeScript interface in ui/lib/types/logs.ts:119-124:

  • idID
  • dataData
  • expires_atExpiresAt
  • transcriptTranscript

This ensures frontend-backend type consistency.


773-781: Streaming delta includes Audio field.

The Audio *ChatAudioMessageAudio field in ChatStreamResponseChoiceDelta enables streaming audio responses, which is handled by the accumulation logic in the test utilities and the streaming framework.


839-850: Token detail structs extended with TextTokens and ImageTokens.

The additions align with the corresponding changes in core/schemas/responses.go, providing consistent token tracking across chat completions and responses APIs.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 1460f5d to bcc2067 Compare December 17, 2025 15:42
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch from 2e56871 to 1358dd1 Compare December 17, 2025 15:42
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch from 1358dd1 to 5c049c8 Compare December 17, 2025 15:48
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from bcc2067 to 4490ef2 Compare December 17, 2025 15:48
Copy link
Contributor

akshaydeo commented Dec 17, 2025

Merge activity

  • Dec 17, 3:51 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Dec 17, 3:55 PM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo changed the base branch from 12-16-feat_added_audio_support_in_azure to graphite-base/1114 December 17, 2025 15:54
@akshaydeo akshaydeo changed the base branch from graphite-base/1114 to main December 17, 2025 15:54
@akshaydeo akshaydeo merged commit 0de1856 into main Dec 17, 2025
3 checks passed
@akshaydeo akshaydeo deleted the 12-16-feat_audio_gen_support_added_to_openai_chat_completions branch December 17, 2025 15:55
This was referenced Dec 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants