@Pratham-Mishra04
Collaborator

Summary

Added support for Azure transcription and speech capabilities, along with improved handling of non-JSON content types in transport interceptors.

Changes

  • Implemented speech and transcription functionality for Azure provider
  • Added transcription support for Mistral provider
  • Fixed transport interceptor middleware to properly handle multipart/form-data requests by only processing JSON content types
  • Updated changelog entries to reflect new capabilities

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test the new Azure speech and transcription capabilities:

# Core/Transports
go version
go test ./...

# Test Azure speech endpoint
curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure-tts-model",
    "input": "Hello world",
    "voice": "alloy"
  }'

# Test Azure transcription endpoint
curl -X POST "http://localhost:8000/v1/audio/transcriptions" \
  -F "file=@./audio-sample.mp3" \
  -F "model=azure-whisper"
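
For callers testing from Go rather than curl, the transcription call above can be sketched as follows. This is a minimal illustration, assuming the same local base URL and model name as the curl command; `newTranscriptionRequest` is a hypothetical helper, not part of this PR. It only builds the request, so it runs without a server, and it shows that the body is multipart/form-data rather than JSON.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newTranscriptionRequest builds a multipart/form-data POST equivalent to the
// curl command above. Hypothetical helper for illustration only.
func newTranscriptionRequest(baseURL, model, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// "file" and "model" are the same form fields the curl example sends.
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	if err := w.WriteField("model", model); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/audio/transcriptions", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Placeholder bytes stand in for real audio data.
	req, err := newTranscriptionRequest("http://localhost:8000", "azure-whisper", "audio-sample.mp3", []byte("fake audio"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Content-Type"))
}
```

Sending the request is then a matter of passing it to `http.DefaultClient.Do` against a running instance.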

Breaking changes

  • No

Related issues

Adds support for Azure speech and transcription capabilities, completing the audio feature set across providers.

Security considerations

No new security implications. Uses existing authentication mechanisms.

Checklist

  • I added/updated tests where appropriate
  • I verified builds succeed (Go and UI)

@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added transcription and speech support for Azure
    • Added transcription support for Mistral
    • Added prompt caching support for Anthropic and Bedrock
    • Added reasoning support for Bedrock Nova 2 models
    • Added provider key configuration option for batch API selection
    • Added cost recalculation for logs
  • Documentation

    • Updated provider feature support matrix for Azure
    • Updated integration documentation for key management
    • Updated guides for reasoning and embeddings
  • Improvements

    • Enhanced budget evaluation with provider-scoped context
    • Optimized request middleware for JSON content handling
  • Style

    • Converted Nebius icon to inline SVG format
  • Chores

    • Increased provider timeout limit to 48 hours


Walkthrough

Implements Azure OpenAI-compatible speech/transcription (including streaming), centralizes Azure request handling, restricts transport plugin interception to JSON requests, updates changelogs and docs, fixes JSX SVG attribute names, and replaces the Nebius static image with an inline SVG.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Changelog & Docs: core/changelog.md, transports/changelog.md, docs/features/unified-interface.mdx | Updated changelog wording and feature entries; updated Azure capability matrix in docs to reflect new TTS/STT support. |
| Azure Speech & Transcription: core/providers/azure/azure.go | Replaced stubs with full Speech, SpeechStream (SSE-style streaming), and Transcription implementations; added completeRequest helper for request construction, auth, latency, error parsing, and response decoding; propagate deployment/model metadata. |
| Transport Middleware: transports/bifrost-http/handlers/middlewares.go | Refactored TransportInterceptorMiddleware to only parse/intercept JSON requests, allow plugin modifications to body/headers, capture per-plugin context values, and bypass non-JSON requests. |
| UI SVG Fixes: ui/components/ui/icons.tsx | Converted SVG gradient attributes from stop-color/stop-opacity to React-friendly stopColor/stopOpacity. |
| Nebius Icon Inline SVG: ui/lib/constants/icons.tsx | Replaced nebius <img> usage with an inline, accessible SVG component (detailed paths, fills, viewBox) using existing size resolution. |
| Governance: Provider-aware Budgets: plugins/governance/resolver.go, plugins/governance/store.go, plugins/governance/tracker.go | Threaded provider context through budget checks and updates: updated signatures to accept provider, filtered budgets by provider when collecting/updating, and passed provider to store-level CheckBudget/UpdateBudget calls. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AzureProvider
    participant RequestHandler as completeRequest
    participant OpenAIHandler
    participant AzureService as Azure OpenAI
    participant StreamConsumer as ClientStream

    Client->>AzureProvider: SpeechStream(ctx, input)
    AzureProvider->>AzureProvider: Validate key & resolve deployment
    AzureProvider->>RequestHandler: Build request (endpoint, headers, body)
    RequestHandler->>AzureService: fasthttp.Do() (bearer / api-key)
    AzureService-->>RequestHandler: SSE stream (events / chunks)
    RequestHandler->>RequestHandler: Parse SSE, detect errors
    RequestHandler->>StreamConsumer: Emit audio chunks / metadata
    RequestHandler->>StreamConsumer: Emit final done signal
    StreamConsumer-->>Client: Streaming audio + metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay close attention to: Azure SpeechStream SSE parsing and error handling; correctness of completeRequest auth/latency/error semantics; provider-aware budget filtering in governance store and signature changes; middleware JSON handling and safe fallback on invalid JSON; Nebius SVG sizing/accessibility.

Possibly related PRs

Suggested reviewers

  • akshaydeo
  • danpiths

Poem

🐇 I hopped to add a streaming stream,

Azure sings and icons gleam,
JSON hops through plugin gates,
Budgets now know provider states,
Nebius shines in vector dream ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add transcription and speech support for Azure' clearly and specifically summarizes the main change of implementing audio capabilities for the Azure provider. |
| Description check | ✅ Passed | The PR description includes all key sections: summary, changes, type of change, affected areas, testing instructions, breaking changes, and checklist items completed. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1460f5d and 4490ef2.

📒 Files selected for processing (10)
  • core/changelog.md (1 hunks)
  • core/providers/azure/azure.go (2 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • plugins/governance/resolver.go (2 hunks)
  • plugins/governance/store.go (5 hunks)
  • plugins/governance/tracker.go (1 hunks)
  • transports/bifrost-http/handlers/middlewares.go (1 hunks)
  • transports/changelog.md (1 hunks)
  • ui/components/ui/icons.tsx (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)


Collaborator Author

Pratham-Mishra04 commented Dec 17, 2025

Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from c3f7d77 to 39a34e8 on December 17, 2025 07:59
Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from c6d091a to f7fd3bf on December 17, 2025 08:12
Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 39a34e8 to 1460f5d on December 17, 2025 08:12
Pratham-Mishra04 marked this pull request as ready for review on December 17, 2025 08:13
Contributor

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
transports/bifrost-http/handlers/middlewares.go (1)

82-140: JSON-only interception logic is sound; consider case-insensitive Content-Type

The interceptor now:

  • Skips all plugins when there are none or governance isn’t loaded.
  • Parses headers once and tracks original header names so plugins can remove headers.
  • Only runs for requests whose Content-Type starts with application/json, avoiding multipart/form-data and other streaming bodies.
  • Safely skips interception on invalid JSON (logs a warning and calls next(ctx)).
  • Applies per-plugin header/body mutations and propagates plugin context user values back into ctx.

This is a solid fix for the multipart/body-consumption issue. One small robustness improvement: make the JSON check case-insensitive (the prefix match already tolerates a trailing charset parameter):

- contentType := string(ctx.Request.Header.Peek("Content-Type"))
- isJSONRequest := strings.HasPrefix(contentType, "application/json")
+ contentType := strings.ToLower(string(ctx.Request.Header.Peek("Content-Type")))
+ isJSONRequest := strings.HasPrefix(contentType, "application/json")

This avoids missing JSON bodies when clients send Application/JSON or similar variants.
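
An alternative to lowercasing and prefix matching is to parse the header with the standard library. The sketch below is not the PR's code; `isJSONContentType` is a hypothetical helper that relies on `mime.ParseMediaType`, which lowercases the media type and separates parameters such as charset.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONContentType reports whether a Content-Type header value names
// application/json, ignoring case and parameters such as charset.
// Hypothetical helper for illustration; not part of the PR.
func isJSONContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		// Empty or malformed headers are treated as non-JSON.
		return false
	}
	return mediaType == "application/json"
}

func main() {
	for _, h := range []string{
		"application/json",
		"Application/JSON; charset=utf-8",
		"multipart/form-data; boundary=abc",
		"",
	} {
		fmt.Printf("%q -> %v\n", h, isJSONContentType(h))
	}
}
```

Note that this exact match is stricter than a prefix check: a value such as application/json-patch+json passes `strings.HasPrefix` but is rejected here, so which behavior is wanted is a design decision for the middleware.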

core/providers/azure/azure.go (2)

783-821: Update comment to reflect that Azure Speech is now supported

The implementation now fully supports non-streaming speech for Azure via openai.HandleOpenAISpeechRequest and sets ModelRequested/ModelDeployment in ExtraFields, but the leading comment still says “Speech is not supported by the Azure provider.”

Please update the comment to reflect the current behavior, e.g.:

-// Speech is not supported by the Azure provider.
+// Speech performs a text-to-speech request against the Azure OpenAI-compatible audio/speech endpoint.

1075-1113: Transcription support implemented; comment should be updated

Transcription now:

  • Validates the Azure key,
  • Resolves the deployment and api-version,
  • Hits /openai/deployments/{deployment}/audio/transcriptions,
  • Delegates to openai.HandleOpenAITranscriptionRequest, and
  • Sets ModelRequested and ModelDeployment in ExtraFields.

However, the preceding comment still says “Transcription is not supported by the Azure provider.” Please update the comment to describe the actual behavior and keep the public surface accurate.
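
The endpoint shape listed above can be sketched with the standard library as follows. This is an illustration under stated assumptions: `transcriptionURL`, the example host, the deployment name, and the api-version value are all placeholders, and the PR's own URL construction may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// transcriptionURL builds a deployment-scoped transcription URL of the form
// {endpoint}/openai/deployments/{deployment}/audio/transcriptions?api-version=...
// Hypothetical helper; all parameter values used in main are placeholders.
func transcriptionURL(endpoint, deployment, apiVersion string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	// path.Join collapses duplicate slashes, so a trailing "/" on the
	// endpoint does not produce "//openai".
	u.Path = path.Join("/", u.Path, "openai", "deployments", deployment, "audio", "transcriptions")
	u.RawPath = ""

	q := u.Query()
	q.Set("api-version", apiVersion)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := transcriptionURL("https://example.openai.azure.com/", "my-whisper", "2024-06-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```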

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7fd3bf and 1460f5d.

📒 Files selected for processing (7)
  • core/changelog.md (1 hunks)
  • core/providers/azure/azure.go (2 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/bifrost-http/handlers/middlewares.go (1 hunks)
  • transports/changelog.md (1 hunks)
  • ui/components/ui/icons.tsx (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • transports/bifrost-http/handlers/middlewares.go
  • ui/lib/constants/icons.tsx
  • core/changelog.md
  • ui/components/ui/icons.tsx
  • docs/features/unified-interface.mdx
  • core/providers/azure/azure.go
  • transports/changelog.md
🧠 Learnings (2)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • transports/bifrost-http/handlers/middlewares.go
  • core/providers/azure/azure.go
📚 Learning: 2025-12-12T08:25:02.629Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: transports/bifrost-http/integrations/router.go:709-712
Timestamp: 2025-12-12T08:25:02.629Z
Learning: In transports/bifrost-http/**/*.go, update streaming response handling to align with OpenAI Responses API: use typed SSE events such as response.created, response.output_text.delta, response.done, etc., and do not rely on the legacy data: [DONE] termination marker. Note that data: [DONE] is only used by the older Chat Completions and Text Completions streaming APIs. Ensure parsers, writers, and tests distinguish SSE events from the [DONE] sentinel and handle each event type accordingly for correct stream termination and progress updates.

Applied to files:

  • transports/bifrost-http/handlers/middlewares.go
🧬 Code graph analysis (1)
transports/bifrost-http/handlers/middlewares.go (3)
core/schemas/context.go (1)
  • NewBifrostContextWithTimeout (69-72)
examples/plugins/hello-world/main.go (2)
  • TransportInterceptor (18-22)
  • GetName (14-16)
transports/bifrost-http/handlers/utils.go (1)
  • SendError (35-44)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Graphite / mergeability_check (listed 11 times)
🔇 Additional comments (8)
ui/components/ui/icons.tsx (2)

630-641: stopOpacity JSX attribute fix looks correct

Switching stop-opacity to stopOpacity on the gradient stops is the right move for React SVG props and avoids DOM property warnings; no issues spotted.


1759-1773: MCPIcon component unchanged functionally

Only structural/trailing newline effects here; the MCPIcon SVG remains valid and self-contained.

transports/changelog.md (1)

1-8: Changelog entries correctly reflect new transport behavior

The tense fix and added entries for Azure speech/transcription, Mistral transcription, Go version bump, and docs updates align with the implementation in this and related files.

core/providers/azure/azure.go (2)

71-140: completeRequest helper nicely centralizes Azure/OpenAI-style calls

The new completeRequest function cleanly:

  • Builds the correct URL with api-version handling, including a special case for openai/v1/responses using preview.
  • Handles Azure auth (Bearer via AzureAuthorizationTokenKey vs api-key) and Anthropic-over-Azure via x-api-key + anthropic-version.
  • Reuses SetExtraHeaders, MakeRequestWithContext, and CheckAndDecodeBody, then returns a copied body to avoid fasthttp buffer pitfalls.
  • Centralizes error parsing via openai.ParseOpenAIError and wraps decode failures in a Bifrost error.

The refactors in TextCompletion, ChatCompletion, Responses, and Embedding that route through this helper significantly reduce duplication and make the behavior consistent across request types.


823-1073: Azure TTS streaming implementation looks correct; watch SSE framing and cleanup semantics

The new SpeechStream implementation:

  • Validates key/config and resolves the deployment.
  • Handles Azure auth (Bearer or api-key) and builds the /audio/speech?api-version=... URL.
  • Sets StreamBody = true and configures appropriate SSE headers (Accept: text/event-stream, Cache-Control: no-cache, Accept-Encoding: identity).
  • Uses CheckContextAndGetRequestBody with an OpenAI-compatible speech request, enabling stream_format = "sse" and swapping model for deployment.
  • Maps network/request errors to structured BifrostError values, including cancellation and timeout.
  • Parses the SSE stream manually using a rolling buffer, detecting [DONE], attempting to interpret JSON frames as Bifrost errors, and otherwise treating frames as raw audio bytes.
  • Emits BifrostSpeechStreamResponse deltas with Audio, ChunkIndex, per-chunk latency, and final Type: Done with end-to-end latency and optional raw request.

This is a good fit for Azure's binary-audio SSE behavior and aligns with the speech feature flags exposed in docs and changelog. Ensure via tests against the real Azure endpoint that:

  • Events are indeed delimited by \n\n (no \r\n\r\n mismatch), and
  • Non-error SSE frames never come back as JSON payloads that would be misinterpreted as BifrostError.

If those assumptions hold, this streaming path should be robust.
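
One way to remove the dependence on the first assumption is to accept either delimiter when splitting the rolling buffer. The sketch below is not the PR's parser; `nextFrame` is a hypothetical helper that returns the earliest complete event terminated by either \n\n or \r\n\r\n and leaves partial data in the buffer for the next read.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextFrame extracts the first complete SSE event from buf, accepting either
// "\n\n" or "\r\n\r\n" as the event delimiter. It returns ok=false when buf
// holds no complete event yet. Hypothetical helper for illustration.
func nextFrame(buf []byte) (frame, rest []byte, ok bool) {
	lf := bytes.Index(buf, []byte("\n\n"))
	crlf := bytes.Index(buf, []byte("\r\n\r\n"))
	switch {
	case lf < 0 && crlf < 0:
		return nil, buf, false
	case crlf >= 0 && (lf < 0 || crlf < lf):
		return buf[:crlf], buf[crlf+4:], true
	default:
		return buf[:lf], buf[lf+2:], true
	}
}

func main() {
	// Two simulated network reads; the first ends in the middle of an event.
	reads := [][]byte{
		[]byte("data: one\r\n\r\ndata: tw"),
		[]byte("o\n\ndata: [DONE]\n\n"),
	}
	var buf []byte
	for _, r := range reads {
		buf = append(buf, r...)
		for {
			frame, rest, ok := nextFrame(buf)
			if !ok {
				break
			}
			fmt.Printf("%q\n", frame)
			buf = rest
		}
	}
}
```

This only addresses framing. Distinguishing JSON error frames from audio payloads, the second assumption, still needs to be verified against the real endpoint.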

docs/features/unified-interface.mdx (1)

88-106: Azure capability row matches new backend support

Marking Azure as:

  • ✅ for TTS and TTS (stream) aligns with Speech and SpeechStream now being implemented.
  • ✅ for STT and ❌ for STT (stream) aligns with Transcription being implemented while TranscriptionStream still returns unsupported.

The rest of the matrix remains consistent with the existing provider implementations.

core/changelog.md (1)

1-6: Core changelog entries are consistent with provider features

The tense correction and new entries for:

  • handling HTML/empty provider responses,
  • Mistral transcription support, and
  • Azure transcription + speech support

line up with the corresponding code in the providers and transport layers, and with the unified-interface documentation updates.

transports/bifrost-http/handlers/middlewares.go (1)

98-118: No type mismatch exists; *fasthttp.RequestCtx implements context.Context

The fasthttp test code explicitly declares var _ context.Context = &RequestCtx{}, which means *fasthttp.RequestCtx does implement the context.Context interface. The code at line 100 will compile and run without errors.

However, note that fasthttp doesn't support cancellation, so Deadline, Done, and Err are no-ops. The context passed here will work for value storage and basic context operations, but timeout/cancellation semantics may be incomplete.

Likely an incorrect or invalid review comment.

Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 1460f5d to bcc2067 on December 17, 2025 15:42
Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from f7fd3bf to 464bc24 on December 17, 2025 15:42
Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from 464bc24 to e8b50ea on December 17, 2025 15:48
Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from bcc2067 to 4490ef2 on December 17, 2025 15:48
Contributor

akshaydeo commented Dec 17, 2025

Merge activity

  • Dec 17, 3:51 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Dec 17, 3:54 PM UTC: @akshaydeo merged this pull request with Graphite.

akshaydeo changed the base branch from 12-16-refactor_openai_and_mistral_audio_refactor to graphite-base/1113 on December 17, 2025 15:53
akshaydeo changed the base branch from graphite-base/1113 to main on December 17, 2025 15:53
akshaydeo merged commit 3e16ec6 into main on Dec 17, 2025
2 checks passed
akshaydeo deleted the 12-16-feat_added_audio_support_in_azure branch on December 17, 2025 15:54
3 participants