feat: add transcription and speech support for Azure #1113

Pratham-Mishra04 · 2025-12-17T05:01:35Z

Summary

Added support for Azure transcription and speech capabilities, along with improved handling of non-JSON content types in transport interceptors.

Changes

Implemented speech and transcription functionality for Azure provider
Added transcription support for Mistral provider
Fixed transport interceptor middleware to properly handle multipart/form-data requests by only processing JSON content types
Updated changelog entries to reflect new capabilities

Type of change

Affected areas

How to test

Test the new Azure speech and transcription capabilities:

# Core/Transports
go version
go test ./...

# Test Azure speech endpoint
curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure-tts-model",
    "input": "Hello world",
    "voice": "alloy"
  }'

# Test Azure transcription endpoint
curl -X POST "http://localhost:8000/v1/audio/transcriptions" \
  -F "file=@./audio-sample.mp3" \
  -F "model=azure-whisper"
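
The transcription call above sends multipart/form-data rather than JSON, which is the request shape the middleware fix in this PR is meant to leave untouched. As a rough Go equivalent of that curl command, the sketch below builds the same two form fields with the standard library. The function name `buildTranscriptionForm` and the sample bytes are illustrative assumptions, not code from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"strings"
)

// buildTranscriptionForm is a hypothetical helper. It assembles a
// multipart/form-data body with the same fields as the curl command:
// "file" and "model".
func buildTranscriptionForm(filename string, audio []byte, model string) (*bytes.Buffer, string, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, "", err
	}
	if err := w.WriteField("model", model); err != nil {
		return nil, "", err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	// The returned Content-Type carries the boundary parameter.
	return &body, w.FormDataContentType(), nil
}

func main() {
	body, contentType, err := buildTranscriptionForm("audio-sample.mp3", []byte("placeholder audio bytes"), "azure-whisper")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.HasPrefix(contentType, "multipart/form-data"), body.Len() > 0)
}
```

The Content-Type returned here never starts with application/json, so a JSON-only interceptor would pass such a request through without reading the body.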

Breaking changes

No

Related issues

Adds support for Azure speech and transcription capabilities, completing the audio feature set across providers.

Security considerations

No new security implications. Uses existing authentication mechanisms.

Checklist

I added/updated tests where appropriate
I verified builds succeed (Go and UI)

coderabbitai · 2025-12-17T05:01:40Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added transcription and speech support for Azure
- Added transcription support for Mistral
- Added prompt caching support for Anthropic and Bedrock
- Added reasoning support for Bedrock Nova 2 models
- Added provider key configuration option for batch API selection
- Added cost recalculation for logs
Documentation
- Updated provider feature support matrix for Azure
- Updated integration documentation for key management
- Updated guides for reasoning and embeddings
Improvements
- Enhanced budget evaluation with provider-scoped context
- Optimized request middleware for JSON content handling
Style
- Converted Nebius icon to inline SVG format
Chores
- Increased provider timeout limit to 48 hours


Walkthrough

Implements Azure OpenAI-compatible speech/transcription (including streaming), centralizes Azure request handling, restricts transport plugin interception to JSON requests, updates changelogs and docs, fixes JSX SVG attribute names, and replaces the Nebius static image with an inline SVG.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Changelog & Docs `core/changelog.md`, `transports/changelog.md`, `docs/features/unified-interface.mdx` | Updated changelog wording and feature entries; updated Azure capability matrix in docs to reflect new TTS/STT support. |
| Azure Speech & Transcription `core/providers/azure/azure.go` | Replaced stubs with full Speech, SpeechStream (SSE-style streaming), and Transcription implementations; added `completeRequest` helper for request construction, auth, latency, error parsing, and response decoding; propagate deployment/model metadata. |
| Transport Middleware `transports/bifrost-http/handlers/middlewares.go` | Refactored TransportInterceptorMiddleware to only parse/intercept JSON requests, allow plugin modifications to body/headers, capture per-plugin context values, and bypass non-JSON requests. |
| UI SVG Fixes `ui/components/ui/icons.tsx` | Converted SVG gradient attributes from `stop-color`/`stop-opacity` to React-friendly `stopColor`/`stopOpacity`. |
| Nebius Icon Inline SVG `ui/lib/constants/icons.tsx` | Replaced `nebius` `<img>` usage with an inline, accessible SVG component (detailed paths, fills, viewBox) using existing size resolution. |
| Governance: Provider-aware Budgets `plugins/governance/resolver.go`, `plugins/governance/store.go`, `plugins/governance/tracker.go` | Threaded provider context through budget checks and updates: updated signatures to accept provider, filtered budgets by provider when collecting/updating, and passed provider to store-level CheckBudget/UpdateBudget calls. |
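
To make the provider-aware budget change in the last row more concrete, here is a minimal sketch of filtering budgets by provider before checking them. The `Budget` type, its fields, and the rule that an empty provider applies to every provider are all assumptions made for illustration; the actual governance store types and signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Budget is a hypothetical record, not the governance plugin's real type.
type Budget struct {
	ID       string
	Provider string // assumption: empty means the budget applies to all providers
	Limit    float64
	Used     float64
}

// ErrBudgetExceeded is returned when an applicable budget is exhausted.
var ErrBudgetExceeded = errors.New("budget exceeded")

// budgetsForProvider keeps only budgets that apply to the given provider.
func budgetsForProvider(budgets []Budget, provider string) []Budget {
	var out []Budget
	for _, b := range budgets {
		if b.Provider == "" || b.Provider == provider {
			out = append(out, b)
		}
	}
	return out
}

// checkBudget fails if any budget applicable to the provider is used up.
func checkBudget(budgets []Budget, provider string) error {
	for _, b := range budgetsForProvider(budgets, provider) {
		if b.Used >= b.Limit {
			return fmt.Errorf("%w: %s", ErrBudgetExceeded, b.ID)
		}
	}
	return nil
}

func main() {
	budgets := []Budget{
		{ID: "global", Limit: 100, Used: 10},
		{ID: "azure-only", Provider: "azure", Limit: 5, Used: 5},
	}
	fmt.Println(checkBudget(budgets, "azure"))
	fmt.Println(checkBudget(budgets, "mistral"))
}
```

With this shape, an exhausted Azure-scoped budget blocks only Azure requests, which is the kind of behavior the signature changes appear to enable.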

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AzureProvider
    participant RequestHandler as completeRequest
    participant OpenAIHandler
    participant AzureService as Azure OpenAI
    participant StreamConsumer as ClientStream

    Client->>AzureProvider: SpeechStream(ctx, input)
    AzureProvider->>AzureProvider: Validate key & resolve deployment
    AzureProvider->>RequestHandler: Build request (endpoint, headers, body)
    RequestHandler->>AzureService: fasthttp.Do() (bearer / api-key)
    AzureService-->>RequestHandler: SSE stream (events / chunks)
    RequestHandler->>RequestHandler: Parse SSE, detect errors
    RequestHandler->>StreamConsumer: Emit audio chunks / metadata
    RequestHandler->>StreamConsumer: Emit final done signal
    StreamConsumer-->>Client: Streaming audio + metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pay close attention to: Azure SpeechStream SSE parsing and error handling; correctness of completeRequest auth/latency/error semantics; provider-aware budget filtering in governance store and signature changes; middleware JSON handling and safe fallback on invalid JSON; Nebius SVG sizing/accessibility.

Possibly related PRs

feat: add transcription and speech support for Azure #1113 — Appears to implement the same Azure speech/transcription codepaths and related middleware/changelog changes.

Suggested reviewers

akshaydeo
danpiths

Poem

🐇 I hopped to add a streaming stream,

Azure sings and icons gleam,
JSON hops through plugin gates,
Budgets now know provider states,
Nebius shines in vector dream ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add transcription and speech support for Azure' clearly and specifically summarizes the main change of implementing audio capabilities for the Azure provider. |
| Description check | ✅ Passed | The PR description includes all key sections: summary, changes, type of change, affected areas, testing instructions, breaking changes, and checklist items completed. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1460f5d and 4490ef2.

📒 Files selected for processing (10)

core/changelog.md (1 hunks)
core/providers/azure/azure.go (2 hunks)
docs/features/unified-interface.mdx (1 hunks)
plugins/governance/resolver.go (2 hunks)
plugins/governance/store.go (5 hunks)
plugins/governance/tracker.go (1 hunks)
transports/bifrost-http/handlers/middlewares.go (1 hunks)
transports/changelog.md (1 hunks)
ui/components/ui/icons.tsx (2 hunks)
ui/lib/constants/icons.tsx (1 hunks)


Pratham-Mishra04 · 2025-12-17T05:01:55Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

transports/bifrost-http/handlers/middlewares.go (1)
82-140: JSON-only interception logic is sound; consider case-insensitive Content-Type

The interceptor now:

Skips all plugins when there are none or governance isn’t loaded.

Parses headers once and tracks original header names so plugins can remove headers.

Only runs for requests whose Content-Type starts with application/json, avoiding multipart/form-data and other streaming bodies.

Safely skips interception on invalid JSON (logs a warning and calls next(ctx)).

Applies per-plugin header/body mutations and propagates plugin context user values back into ctx.

This is a solid fix for the multipart/body-consumption issue. One small robustness improvement: make the JSON check case-insensitive. The prefix match already tolerates parameters such as charset, so only the case handling needs to change:
- contentType := string(ctx.Request.Header.Peek("Content-Type"))
+ contentType := strings.ToLower(string(ctx.Request.Header.Peek("Content-Type")))
  isJSONRequest := strings.HasPrefix(contentType, "application/json")
This avoids missing JSON bodies when clients send Application/JSON or similar variants.
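
As an alternative to lowercasing and prefix matching, the check could parse the header with the standard library's `mime.ParseMediaType`, which lowercases the media type and separates parameters such as charset. This is only a sketch of that idea, not what the PR merged; the helper name `isJSONContentType` is an assumption.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONContentType is a hypothetical helper. It reports whether a
// Content-Type header value names exactly application/json, ignoring case
// and any parameters.
func isJSONContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		// Empty or malformed headers are treated as non-JSON.
		return false
	}
	return mediaType == "application/json"
}

func main() {
	for _, h := range []string{
		"application/json",
		"Application/JSON; charset=utf-8",
		"multipart/form-data; boundary=abc",
		"",
	} {
		fmt.Printf("%q -> %v\n", h, isJSONContentType(h))
	}
}
```

One design difference worth noting: an exact media-type match is stricter than a prefix match, so a type such as application/json-patch+json would pass the prefix check but not this one.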
core/providers/azure/azure.go (2)
783-821: Update comment to reflect that Azure Speech is now supported

The implementation now fully supports non-streaming speech for Azure via openai.HandleOpenAISpeechRequest and sets ModelRequested/ModelDeployment in ExtraFields, but the leading comment still says “Speech is not supported by the Azure provider.”

Please update the comment to reflect the current behavior, e.g.:
-// Speech is not supported by the Azure provider.
+// Speech performs a text-to-speech request against the Azure OpenAI-compatible audio/speech endpoint.
1075-1113: Transcription support implemented; comment should be updated

Transcription now:

Validates the Azure key,

Resolves the deployment and api-version,

Hits /openai/deployments/{deployment}/audio/transcriptions,

Delegates to openai.HandleOpenAITranscriptionRequest, and

Sets ModelRequested and ModelDeployment in ExtraFields.

However, the preceding comment still says “Transcription is not supported by the Azure provider.” Please update the comment to describe the actual behavior and keep the public surface accurate.
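
The endpoint shape described above can be sketched with `net/url`. The path segment `/openai/deployments/{deployment}/audio/transcriptions` and the `api-version` query parameter come from the review text; the helper name `transcriptionURL`, the example resource host, and the api-version value are placeholders, and the real provider code may build the URL differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// transcriptionURL is a hypothetical helper that joins an Azure endpoint,
// a deployment name, and an api-version into a transcription URL.
func transcriptionURL(endpoint, deployment, apiVersion string) (string, error) {
	if deployment == "" {
		return "", errors.New("deployment is required")
	}
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	// Trim any trailing slash so the joined path has no double slash.
	u.Path = strings.TrimRight(u.Path, "/") +
		"/openai/deployments/" + deployment + "/audio/transcriptions"

	q := u.Query()
	q.Set("api-version", apiVersion)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder endpoint, deployment, and api-version values.
	s, err := transcriptionURL("https://example-resource.openai.azure.com/", "whisper-deploy", "2024-10-21")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```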

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7fd3bf and 1460f5d.

📒 Files selected for processing (7)

core/changelog.md (1 hunks)
core/providers/azure/azure.go (2 hunks)
docs/features/unified-interface.mdx (1 hunks)
transports/bifrost-http/handlers/middlewares.go (1 hunks)
transports/changelog.md (1 hunks)
ui/components/ui/icons.tsx (2 hunks)
ui/lib/constants/icons.tsx (1 hunks)

👮 Files not reviewed due to content moderation or server errors (1)

ui/lib/constants/icons.tsx

🧰 Additional context used

📓 Path-based instructions (1)

**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

transports/bifrost-http/handlers/middlewares.go
ui/lib/constants/icons.tsx
core/changelog.md
ui/components/ui/icons.tsx
docs/features/unified-interface.mdx
core/providers/azure/azure.go
transports/changelog.md

🧠 Learnings (2)

📚 Learning: 2025-12-09T17:07:42.007Z

Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

transports/bifrost-http/handlers/middlewares.go
core/providers/azure/azure.go

📚 Learning: 2025-12-12T08:25:02.629Z

Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: transports/bifrost-http/integrations/router.go:709-712
Timestamp: 2025-12-12T08:25:02.629Z
Learning: In transports/bifrost-http/**/*.go, update streaming response handling to align with OpenAI Responses API: use typed SSE events such as response.created, response.output_text.delta, response.done, etc., and do not rely on the legacy data: [DONE] termination marker. Note that data: [DONE] is only used by the older Chat Completions and Text Completions streaming APIs. Ensure parsers, writers, and tests distinguish SSE events from the [DONE] sentinel and handle each event type accordingly for correct stream termination and progress updates.

Applied to files:

transports/bifrost-http/handlers/middlewares.go

🧬 Code graph analysis (1)

transports/bifrost-http/handlers/middlewares.go (3)

core/schemas/context.go (1)

NewBifrostContextWithTimeout (69-72)

examples/plugins/hello-world/main.go (2)

TransportInterceptor (18-22)

GetName (14-16)

transports/bifrost-http/handlers/utils.go (1)

SendError (35-44)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: Graphite / mergeability_check

🔇 Additional comments (8)

ui/components/ui/icons.tsx (2)

630-641: stopOpacity JSX attribute fix looks correct

Switching stop-opacity to stopOpacity on the gradient stops is the right move for React SVG props and avoids DOM property warnings; no issues spotted.

1759-1773: MCPIcon component unchanged functionally

Only structural/trailing newline effects here; the MCPIcon SVG remains valid and self-contained.

transports/changelog.md (1)

1-8: Changelog entries correctly reflect new transport behavior

The tense fix and added entries for Azure speech/transcription, Mistral transcription, Go version bump, and docs updates align with the implementation in this and related files.

core/providers/azure/azure.go (2)

71-140: completeRequest helper nicely centralizes Azure/OpenAI-style calls

The new completeRequest function cleanly:

Builds the correct URL with api-version handling, including a special case for openai/v1/responses using preview.

Handles Azure auth (Bearer via AzureAuthorizationTokenKey vs api-key) and Anthropic-over-Azure via x-api-key + anthropic-version.

Reuses SetExtraHeaders, MakeRequestWithContext, and CheckAndDecodeBody, then returns a copied body to avoid fasthttp buffer pitfalls.

Centralizes error parsing via openai.ParseOpenAIError and wraps decode failures in a Bifrost error.

The refactors in TextCompletion, ChatCompletion, Responses, and Embedding that route through this helper significantly reduce duplication and make the behavior consistent across request types.
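
The auth branch described above (a Bearer token when one is present, otherwise the api-key header) can be illustrated with a small helper. This is a sketch under assumptions: the function name `azureAuthHeaders` is invented, the precedence of the token over the key is assumed, and the real `completeRequest` works on fasthttp requests rather than a plain map.

```go
package main

import (
	"errors"
	"fmt"
)

// azureAuthHeaders is a hypothetical helper that picks the auth header for
// an Azure OpenAI-style request. Assumption: a bearer token, when present,
// takes precedence over the API key.
func azureAuthHeaders(bearerToken, apiKey string) (map[string]string, error) {
	switch {
	case bearerToken != "":
		return map[string]string{"Authorization": "Bearer " + bearerToken}, nil
	case apiKey != "":
		return map[string]string{"api-key": apiKey}, nil
	default:
		return nil, errors.New("no credentials provided")
	}
}

func main() {
	h, err := azureAuthHeaders("", "example-key")
	if err != nil {
		panic(err)
	}
	fmt.Println(h)
}
```

Keeping this choice in one place is what lets every request type share identical auth behavior, which is the benefit the review attributes to the helper.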

823-1073: Azure TTS streaming implementation looks correct; watch SSE framing and cleanup semantics

The new SpeechStream implementation:

Validates key/config and resolves the deployment.

Handles Azure auth (Bearer or api-key) and builds the /audio/speech?api-version=... URL.

Sets StreamBody = true and configures appropriate SSE headers (Accept: text/event-stream, Cache-Control: no-cache, Accept-Encoding: identity).

Uses CheckContextAndGetRequestBody with an OpenAI-compatible speech request, enabling stream_format = "sse" and swapping model for deployment.

Maps network/request errors to structured BifrostError values, including cancellation and timeout.

Parses the SSE stream manually using a rolling buffer, detecting [DONE], attempting to interpret JSON frames as Bifrost errors, and otherwise treating frames as raw audio bytes.

Emits BifrostSpeechStreamResponse deltas with Audio, ChunkIndex, per-chunk latency, and final Type: Done with end-to-end latency and optional raw request.

This is a good fit for Azure's binary-audio SSE behavior and aligns with the speech feature flags exposed in docs and changelog. Ensure via tests against the real Azure endpoint that:

Events are indeed delimited by \n\n (no \r\n\r\n mismatch), and

Non-error SSE frames never come back as JSON payloads that would be misinterpreted as BifrostError.

If those assumptions hold, this streaming path should be robust.
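
The delimiter concern raised above can be sketched as a rolling-buffer splitter that accepts either `\n\n` or `\r\n\r\n` and stops at `[DONE]`. This is a simplified illustration, not the PR's parser: the names `nextFrame` and `drain` are assumptions, and the sketch treats payloads as text and trims whitespace, which would not be safe for raw audio bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	lfDelim   = []byte("\n\n")
	crlfDelim = []byte("\r\n\r\n")
)

// nextFrame returns the first complete frame in buf, the bytes after its
// delimiter, and whether a complete frame was found. Either delimiter works.
func nextFrame(buf []byte) (frame, rest []byte, ok bool) {
	lf := bytes.Index(buf, lfDelim)
	crlf := bytes.Index(buf, crlfDelim)
	switch {
	case lf < 0 && crlf < 0:
		return nil, buf, false
	case crlf >= 0 && (lf < 0 || crlf < lf):
		return buf[:crlf], buf[crlf+len(crlfDelim):], true
	default:
		return buf[:lf], buf[lf+len(lfDelim):], true
	}
}

// drain extracts every complete frame from buf. It returns the data payloads
// seen before the [DONE] sentinel, whether the sentinel was reached, and the
// unconsumed tail to prepend to the next network read.
func drain(buf []byte) (payloads []string, done bool, rest []byte) {
	rest = buf
	for {
		frame, tail, ok := nextFrame(rest)
		if !ok {
			return payloads, false, rest
		}
		rest = tail
		data := bytes.TrimSpace(bytes.TrimPrefix(bytes.TrimSpace(frame), []byte("data:")))
		if bytes.Equal(data, []byte("[DONE]")) {
			return payloads, true, rest
		}
		if len(data) > 0 {
			payloads = append(payloads, string(data))
		}
	}
}

func main() {
	payloads, done, rest := drain([]byte("data: a\n\ndata: b\r\n\r\ndata: par"))
	fmt.Println(payloads, done, string(rest))
}
```

A caller would append each network read to the returned tail and call `drain` again, so a frame split across two reads is only emitted once its delimiter arrives.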

docs/features/unified-interface.mdx (1)

88-106: Azure capability row matches new backend support

Marking Azure as:

✅ for TTS and TTS (stream) aligns with Speech and SpeechStream now being implemented.

✅ for STT and ❌ for STT (stream) aligns with Transcription being implemented while TranscriptionStream still returns unsupported.

The rest of the matrix remains consistent with the existing provider implementations.

core/changelog.md (1)

1-6: Core changelog entries are consistent with provider features

The tense correction and new entries for:

handling HTML/empty provider responses,

Mistral transcription support, and

Azure transcription + speech support

line up with the corresponding code in the providers and transport layers, and with the unified-interface documentation updates.

transports/bifrost-http/handlers/middlewares.go (1)

98-118: No type mismatch exists; *fasthttp.RequestCtx implements context.Context

The fasthttp test code explicitly declares var _ context.Context = &RequestCtx{}, which means *fasthttp.RequestCtx does implement the context.Context interface. The code at line 100 will compile and run without errors.

However, note that fasthttp does not support per-request cancellation: Deadline is a no-op, and Done and Err only signal server shutdown. The context passed here will work for value storage and basic context operations, but timeout/cancellation semantics may be incomplete.

Likely an incorrect or invalid review comment.

akshaydeo · 2025-12-17T15:51:12Z

Merge activity

Dec 17, 3:51 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Dec 17, 3:54 PM UTC: @akshaydeo merged this pull request with Graphite.

Pratham-Mishra04 requested review from akshaydeo and danpiths December 17, 2025 05:01

Pratham-Mishra04 mentioned this pull request Dec 17, 2025

feat: add handling for HTML and empty responses from providers #1105

Merged

8 tasks

This was referenced Dec 17, 2025

refactor: openai and mistral audio refactored #1112

Merged

feat: added audio support to chat completions #1114

Merged

Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from c3f7d77 to 39a34e8 Compare December 17, 2025 07:59

Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from c6d091a to f7fd3bf Compare December 17, 2025 08:12

Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 39a34e8 to 1460f5d Compare December 17, 2025 08:12

Pratham-Mishra04 marked this pull request as ready for review December 17, 2025 08:13

coderabbitai bot reviewed Dec 17, 2025


coderabbitai bot approved these changes Dec 17, 2025


Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from 1460f5d to bcc2067 Compare December 17, 2025 15:42

Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from f7fd3bf to 464bc24 Compare December 17, 2025 15:42

feat: added audio support in azure

4490ef2

Pratham-Mishra04 force-pushed the 12-16-refactor_openai_and_mistral_audio_refactor branch from 464bc24 to e8b50ea Compare December 17, 2025 15:48

Pratham-Mishra04 force-pushed the 12-16-feat_added_audio_support_in_azure branch from bcc2067 to 4490ef2 Compare December 17, 2025 15:48

akshaydeo changed the base branch from 12-16-refactor_openai_and_mistral_audio_refactor to graphite-base/1113 December 17, 2025 15:53

akshaydeo changed the base branch from graphite-base/1113 to main December 17, 2025 15:53

akshaydeo merged commit 3e16ec6 into main Dec 17, 2025
2 checks passed

akshaydeo deleted the 12-16-feat_added_audio_support_in_azure branch December 17, 2025 15:54
