feat: add transcription and speech support for Azure #1113
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Implements Azure OpenAI-compatible speech/transcription (including streaming), centralizes Azure request handling, restricts transport plugin interception to JSON requests, updates changelogs and docs, fixes JSX SVG attribute names, and replaces the Nebius static image with an inline SVG.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant AzureProvider
participant RequestHandler as completeRequest
participant OpenAIHandler
participant AzureService as Azure OpenAI
participant StreamConsumer as ClientStream
Client->>AzureProvider: SpeechStream(ctx, input)
AzureProvider->>AzureProvider: Validate key & resolve deployment
AzureProvider->>RequestHandler: Build request (endpoint, headers, body)
RequestHandler->>AzureService: fasthttp.Do() (bearer / api-key)
AzureService-->>RequestHandler: SSE stream (events / chunks)
RequestHandler->>RequestHandler: Parse SSE, detect errors
RequestHandler->>StreamConsumer: Emit audio chunks / metadata
RequestHandler->>StreamConsumer: Emit final done signal
StreamConsumer-->>Client: Streaming audio + metadata
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
transports/bifrost-http/handlers/middlewares.go (1)
82-140: JSON-only interception logic is sound; consider case-insensitive Content-Type
The interceptor now:
- Skips all plugins when there are none or governance isn’t loaded.
- Parses headers once and tracks original header names so plugins can remove headers.
- Only runs for requests whose `Content-Type` starts with `application/json`, avoiding multipart/form-data and other streaming bodies.
- Safely skips interception on invalid JSON (logs a warning and calls `next(ctx)`).
- Applies per-plugin header/body mutations and propagates plugin context user values back into `ctx`.
This is a solid fix for the multipart/body-consumption issue. One small robustness improvement: make the JSON check case-insensitive and tolerant of charset variations:
    - contentType := string(ctx.Request.Header.Peek("Content-Type"))
    - isJSONRequest := strings.HasPrefix(contentType, "application/json")
    + contentType := strings.ToLower(string(ctx.Request.Header.Peek("Content-Type")))
    + isJSONRequest := strings.HasPrefix(contentType, "application/json")
This avoids missing JSON bodies when clients send `Application/JSON` or similar variants.
core/providers/azure/azure.go (2)
783-821: Update comment to reflect that Azure Speech is now supported
The implementation now fully supports non-streaming speech for Azure via `openai.HandleOpenAISpeechRequest` and sets `ModelRequested`/`ModelDeployment` in `ExtraFields`, but the leading comment still says “Speech is not supported by the Azure provider.”
Please update the comment to reflect the current behavior, e.g.:
    -// Speech is not supported by the Azure provider.
    +// Speech performs a text-to-speech request against the Azure OpenAI-compatible audio/speech endpoint.
1075-1113: Transcription support implemented; comment should be updated
`Transcription` now:
- Validates the Azure key,
- Resolves the deployment and `api-version`,
- Hits `/openai/deployments/{deployment}/audio/transcriptions`,
- Delegates to `openai.HandleOpenAITranscriptionRequest`, and
- Sets `ModelRequested` and `ModelDeployment` in `ExtraFields`.
However, the preceding comment still says “Transcription is not supported by the Azure provider.” Please update the comment to describe the actual behavior and keep the public surface accurate.
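The endpoint shape listed above can be sketched in a few lines of Go. This is only an illustration of how a deployment-scoped audio URL with an `api-version` query parameter might be assembled; `buildAzureAudioURL`, the example resource host, the deployment name, and the version string are all hypothetical and are not taken from the Bifrost code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildAzureAudioURL assembles
// {endpoint}/openai/deployments/{deployment}/audio/{operation}?api-version=...
// The name and signature are illustrative assumptions, not Bifrost's API.
func buildAzureAudioURL(endpoint, deployment, operation, apiVersion string) string {
	// Trim trailing slashes so the joined path never contains "//".
	base := strings.TrimRight(endpoint, "/")

	// Escape the deployment name in case it contains reserved characters.
	path := "/openai/deployments/" + url.PathEscape(deployment) + "/audio/" + operation

	query := url.Values{}
	query.Set("api-version", apiVersion)

	return base + path + "?" + query.Encode()
}

func main() {
	// All argument values below are placeholders.
	fmt.Println(buildAzureAudioURL(
		"https://example-resource.openai.azure.com/",
		"my-whisper-deployment",
		"transcriptions",
		"2024-06-01",
	))
}
```

Passing `"speech"` instead of `"transcriptions"` as the operation yields the corresponding text-to-speech path under the same assumptions.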
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- core/changelog.md (1 hunks)
- core/providers/azure/azure.go (2 hunks)
- docs/features/unified-interface.mdx (1 hunks)
- transports/bifrost-http/handlers/middlewares.go (1 hunks)
- transports/changelog.md (1 hunks)
- ui/components/ui/icons.tsx (2 hunks)
- ui/lib/constants/icons.tsx (1 hunks)
👮 Files not reviewed due to content moderation or server errors (1)
- ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- transports/bifrost-http/handlers/middlewares.go
- ui/lib/constants/icons.tsx
- core/changelog.md
- ui/components/ui/icons.tsx
- docs/features/unified-interface.mdx
- core/providers/azure/azure.go
- transports/changelog.md
🧠 Learnings (2)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- transports/bifrost-http/handlers/middlewares.go
- core/providers/azure/azure.go
📚 Learning: 2025-12-12T08:25:02.629Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: transports/bifrost-http/integrations/router.go:709-712
Timestamp: 2025-12-12T08:25:02.629Z
Learning: In transports/bifrost-http/**/*.go, update streaming response handling to align with OpenAI Responses API: use typed SSE events such as response.created, response.output_text.delta, response.done, etc., and do not rely on the legacy data: [DONE] termination marker. Note that data: [DONE] is only used by the older Chat Completions and Text Completions streaming APIs. Ensure parsers, writers, and tests distinguish SSE events from the [DONE] sentinel and handle each event type accordingly for correct stream termination and progress updates.
Applied to files:
transports/bifrost-http/handlers/middlewares.go
🧬 Code graph analysis (1)
transports/bifrost-http/handlers/middlewares.go (3)
- core/schemas/context.go (1): `NewBifrostContextWithTimeout` (69-72)
- examples/plugins/hello-world/main.go (2): `TransportInterceptor` (18-22), `GetName` (14-16)
- transports/bifrost-http/handlers/utils.go (1): `SendError` (35-44)
🔇 Additional comments (8)
ui/components/ui/icons.tsx (2)
630-641: stopOpacity JSX attribute fix looks correct
Switching `stop-opacity` to `stopOpacity` on the gradient stops is the right move for React SVG props and avoids DOM property warnings; no issues spotted.
1759-1773: MCPIcon component unchanged functionally
Only structural/trailing newline effects here; the MCPIcon SVG remains valid and self-contained.
transports/changelog.md (1)
1-8: Changelog entries correctly reflect new transport behavior
The tense fix and added entries for Azure speech/transcription, Mistral transcription, Go version bump, and docs updates align with the implementation in this and related files.
core/providers/azure/azure.go (2)
71-140: `completeRequest` helper nicely centralizes Azure/OpenAI-style calls
The new `completeRequest` function cleanly:
- Builds the correct URL with `api-version` handling, including a special case for `openai/v1/responses` using `preview`.
- Handles Azure auth (Bearer via `AzureAuthorizationTokenKey` vs `api-key`) and Anthropic-over-Azure via `x-api-key` + `anthropic-version`.
- Reuses `SetExtraHeaders`, `MakeRequestWithContext`, and `CheckAndDecodeBody`, then returns a copied body to avoid fasthttp buffer pitfalls.
- Centralizes error parsing via `openai.ParseOpenAIError` and wraps decode failures in a Bifrost error.
The refactors in `TextCompletion`, `ChatCompletion`, `Responses`, and `Embedding` that route through this helper significantly reduce duplication and make the behavior consistent across request types.
823-1073: Azure TTS streaming implementation looks correct; watch SSE framing and cleanup semantics
The new `SpeechStream` implementation:
- Validates key/config and resolves the deployment.
- Handles Azure auth (Bearer or `api-key`) and builds the `/audio/speech?api-version=...` URL.
- Sets `StreamBody = true` and configures appropriate SSE headers (`Accept: text/event-stream`, `Cache-Control: no-cache`, `Accept-Encoding: identity`).
- Uses `CheckContextAndGetRequestBody` with an OpenAI-compatible speech request, enabling `stream_format = "sse"` and swapping model for deployment.
- Maps network/request errors to structured `BifrostError` values, including cancellation and timeout.
- Parses the SSE stream manually using a rolling buffer, detecting `[DONE]`, attempting to interpret JSON frames as Bifrost errors, and otherwise treating frames as raw audio bytes.
- Emits `BifrostSpeechStreamResponse` deltas with `Audio`, `ChunkIndex`, per-chunk latency, and final `Type: Done` with end-to-end latency and optional raw request.
This is a good fit for Azure's binary-audio SSE behavior and aligns with the speech feature flags exposed in docs and changelog. Ensure via tests against the real Azure endpoint that:
- Events are indeed delimited by `\n\n` (no `\r\n\r\n` mismatch), and
- Non-error SSE frames never come back as JSON payloads that would be misinterpreted as `BifrostError`.
If those assumptions hold, this streaming path should be robust.
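The framing concern raised above can be made concrete with a minimal rolling-buffer sketch. It assumes single-line `data: ` events and accepts both `\n\n` and `\r\n\r\n` as terminators; `nextEvent` and `drain` are hypothetical names, and the sketch does not attempt the JSON error detection or audio decoding that `SpeechStream` performs.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextEvent returns the first complete SSE event in buf plus the unread
// remainder. Both "\n\n" and "\r\n\r\n" terminate an event; whichever
// appears first is used. ok is false when no full event is buffered yet.
func nextEvent(buf []byte) (event, rest []byte, ok bool) {
	lf := bytes.Index(buf, []byte("\n\n"))
	crlf := bytes.Index(buf, []byte("\r\n\r\n"))
	switch {
	case lf < 0 && crlf < 0:
		return nil, buf, false
	case crlf >= 0 && (lf < 0 || crlf < lf):
		return buf[:crlf], buf[crlf+4:], true
	default:
		return buf[:lf], buf[lf+2:], true
	}
}

// drain appends network chunks to a rolling buffer, collects the payload
// of each complete event, and stops at the [DONE] sentinel.
func drain(chunks [][]byte) (payloads [][]byte, done bool) {
	var buf []byte
	for _, chunk := range chunks {
		buf = append(buf, chunk...)
		for {
			event, rest, ok := nextEvent(buf)
			if !ok {
				break // wait for more bytes
			}
			buf = rest
			data := bytes.TrimPrefix(event, []byte("data: "))
			if bytes.Equal(data, []byte("[DONE]")) {
				return payloads, true
			}
			// Copy so later appends to buf cannot alias the payload.
			payloads = append(payloads, append([]byte(nil), data...))
		}
	}
	return payloads, false
}

func main() {
	// Events deliberately split across chunk boundaries and mixed terminators.
	chunks := [][]byte{
		[]byte("data: ab"),
		[]byte("c\n\ndata: de\r\n\r\ndata: [DO"),
		[]byte("NE]\n\n"),
	}
	payloads, done := drain(chunks)
	for i, p := range payloads {
		fmt.Printf("chunk %d: %q\n", i, p)
	}
	fmt.Println("done:", done)
}
```

Note that raw binary payloads could themselves contain a blank-line byte sequence, which is one reason the review asks for verification against the real endpoint.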
docs/features/unified-interface.mdx (1)
88-106: Azure capability row matches new backend support
Marking Azure as:
- ✅ for TTS and TTS (stream) aligns with `Speech` and `SpeechStream` now being implemented.
- ✅ for STT and ❌ for STT (stream) aligns with `Transcription` being implemented while `TranscriptionStream` still returns unsupported.
The rest of the matrix remains consistent with the existing provider implementations.
core/changelog.md (1)
1-6: Core changelog entries are consistent with provider features
The tense correction and new entries for:
- handling HTML/empty provider responses,
- Mistral transcription support, and
- Azure transcription + speech support
line up with the corresponding code in the providers and transport layers, and with the unified-interface documentation updates.
transports/bifrost-http/handlers/middlewares.go (1)
98-118: No type mismatch exists; `*fasthttp.RequestCtx` implements `context.Context`
The fasthttp test code explicitly declares `var _ context.Context = &RequestCtx{}`, which means `*fasthttp.RequestCtx` does implement the `context.Context` interface. The code at line 100 will compile and run without errors.
However, note that fasthttp doesn't support cancellation, so `Deadline`, `Done`, and `Err` are no-ops. The context passed here will work for value storage and basic context operations, but timeout/cancellation semantics may be incomplete.
Likely an incorrect or invalid review comment.
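For the JSON-only interception check in this file, one possible alternative to lowercasing and prefix-matching the header is to parse it with the standard library's `mime.ParseMediaType`, which lowercases the media type and separates parameters such as `charset`. The sketch below is an assumption about how such a check could look, not the code in `middlewares.go`; `isJSONContentType` is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isJSONContentType reports whether a Content-Type header value denotes
// application/json, ignoring case and parameters such as charset.
// Hypothetical helper; not part of the Bifrost transport code.
func isJSONContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		// Malformed or empty header: fall back to a lenient prefix check.
		trimmed := strings.ToLower(strings.TrimSpace(header))
		return strings.HasPrefix(trimmed, "application/json")
	}
	// ParseMediaType already returns the media type in lowercase.
	return mediaType == "application/json"
}

func main() {
	for _, value := range []string{
		"application/json",
		"Application/JSON; charset=UTF-8",
		"multipart/form-data; boundary=example",
		"",
	} {
		fmt.Printf("%q -> %v\n", value, isJSONContentType(value))
	}
}
```

Unlike a plain prefix match, an exact comparison on the parsed media type would not treat a type that merely begins with `application/json` as JSON; whether that stricter behavior is wanted is a judgment call for the maintainers.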
Summary
Added support for Azure transcription and speech capabilities, along with improved handling of non-JSON content types in transport interceptors.
Changes
Type of change
Affected areas
How to test
Test the new Azure speech and transcription capabilities:
Breaking changes
Related issues
Adds support for Azure speech and transcription capabilities, completing the audio feature set across providers.
Security considerations
No new security implications. Uses existing authentication mechanisms.
Checklist