
Conversation

@qwerty-dvorak qwerty-dvorak commented Dec 5, 2025

Summary

Adds the Hugging Face inference provider.

Changes

  • Added a huggingface folder under core/providers, with huggingface_test coverage as needed
  • Updated the Makefile to look up local copies of tool binaries instead of searching the global PATH
  • Updated the documentation for adding a provider

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

# Providers
export HUGGING_FACE_API_KEY="" && make test-core PROVIDER=huggingface

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build

Added a new environment variable for Hugging Face:
HUGGING_FACE_API_KEY=""

Screenshots/Recordings

image.png

Breaking changes

  • Yes
  • No

Related issues

Closes #430

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

coderabbitai bot commented Dec 5, 2025

Warning

Rate limit exceeded

@qwerty-dvorak has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 4 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 3e8d6d7 and 60e18dc.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/supported-providers.mdx (3 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
📝 Walkthrough

Walkthrough

Adds HuggingFace as a native provider to Bifrost, implementing chat completions, embeddings, text generation, speech synthesis, and transcription through the HuggingFace Inference API with multi-backend routing, model caching, and full OpenAI-compatible request/response translation.

Changes

  • HuggingFace Provider Core (core/providers/huggingface/huggingface.go, types.go, utils.go, chat.go, embedding.go, speech.go, transcription.go, responses.go, models.go, huggingface_test.go): Full provider implementation including request/response converters, type definitions for HuggingFace API schemas, model listing with provider mapping cache and retry logic, support for chat completions, embeddings, speech synthesis, transcription, and responses API with streaming variants.
  • Schema & Configuration Updates (core/schemas/bifrost.go, account.go, transcriptions.go, core/bifrost.go): Added HuggingFace ModelProvider constant, registered provider in factory, extended Key struct with HuggingFaceKeyConfig, added transcription parameters (MaxLength, MinLength, MaxNewTokens, MinNewTokens).
  • Test Infrastructure (core/internal/testutil/account.go, responses_stream.go, transcription.go): Extended test utilities with HuggingFace provider configuration, test account setup with model aliases and scenarios, fixture-based audio loading for transcription tests, increased streaming response loop threshold from 100 to 300.
  • UI Integration (ui/lib/constants/config.ts, icons.tsx, logs.ts, ui/README.md): Added HuggingFace to provider placeholders, API key requirements, provider icons (SVG logo), provider labels, and updated README with Redux Toolkit/RTK Query architecture documentation.
  • Documentation & Configuration (docs/features/providers/huggingface.mdx, supported-providers.mdx, docs/contributing/adding-a-provider.mdx, docs/docs.json, docs/apis/openapi.json, transports/config.schema.json): New HuggingFace provider guide documenting inference routing, model aliasing, and constraints; updated provider documentation structure; added comprehensive contributor guide for adding providers; updated OpenAPI schema and transport config.
  • Workflow & CI/CD (.github/workflows/pr-tests.yml, .github/workflows/release-pipeline.yml): Added HUGGING_FACE_API_KEY environment variable to PR test and release pipeline jobs.
  • Utilities & Shared Updates (core/providers/utils/audio.go, utils.go, core/providers/gemini/speech.go, gemini/transcription.go, gemini/utils.go, core/providers/openai/openai.go, core/providers/utils/utils.go, core/schemas/mux.go): Extracted audio MIME type detection to shared utility, updated Gemini provider to use shared audio detection, extended OpenAI streaming to handle content/reasoning, added HuggingFace to done-marker provider list, updated streaming logic for reasoning-only chunks.
  • Changelog (core/changelog.md, transports/changelog.md): Added HuggingFace provider feature entry documenting chat, responses, TTS, and speech synthesis support.

Sequence Diagram

sequenceDiagram
    participant Client
    participant HFProvider as HuggingFace<br/>Provider
    participant ModelCache as Model Provider<br/>Mapping Cache
    participant HFInference as HuggingFace<br/>Inference API
    participant HTTPClient as FastHTTP<br/>Client

    Client->>HFProvider: ChatCompletion(modelID, request)
    HFProvider->>ModelCache: Get model mapping for modelID
    alt Cache Hit
        ModelCache-->>HFProvider: {provider, model}
    else Cache Miss
        HFProvider->>HFInference: List models (inference providers)
        HFInference-->>HFProvider: Model list
        HFProvider->>ModelCache: Store mapping
        ModelCache-->>HFProvider: {provider, model}
    end
    
    HFProvider->>HFProvider: buildRequestURL(modelID, provider)
    HFProvider->>HFProvider: ToHuggingFaceChatCompletionRequest(bifrostReq)
    
    HFProvider->>HTTPClient: Execute POST request
    HTTPClient->>HFInference: Send to inference provider
    HFInference-->>HTTPClient: Chat completion response
    HTTPClient-->>HFProvider: Response body + latency
    
    alt Success
        HFProvider->>HFProvider: Parse response
        HFProvider->>HFProvider: Enrich with metadata<br/>(provider, model, latency)
        HFProvider-->>Client: BifrostChatResponse
    else Not Found (404)
        HFProvider->>ModelCache: Clear stale mapping
        HFProvider->>HFInference: Retry List models
        HFInference-->>HFProvider: Updated model list
        loop Retry with new mapping
            HFProvider->>HTTPClient: Execute POST request (retry)
            HTTPClient->>HFInference: Send to new provider
            HFInference-->>HTTPClient: Response
            HTTPClient-->>HFProvider: Response body
        end
        HFProvider-->>Client: BifrostChatResponse
    else Error
        HFProvider->>HFProvider: Decode HuggingFaceError
        HFProvider-->>Client: BifrostError
    end
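
For readers skimming the diagram, the following is a minimal, self-contained Go sketch of the cache-then-refresh pattern it describes. All names here (mapping, mappingCache, resolveModel, refresh) are hypothetical illustrations, not the provider's actual identifiers, and the 404 handling is shown only as a comment.

package main

import (
    "errors"
    "fmt"
    "sync"
)

// mapping pairs an inference provider with its provider-specific model ID.
type mapping struct {
    Provider string
    Model    string
}

// mappingCache is a concurrency-safe modelID -> mapping cache.
type mappingCache struct {
    mu sync.RWMutex
    m  map[string]mapping
}

func (c *mappingCache) get(modelID string) (mapping, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    v, ok := c.m[modelID]
    return v, ok
}

func (c *mappingCache) set(modelID string, v mapping) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[modelID] = v
}

func (c *mappingCache) invalidate(modelID string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    delete(c.m, modelID)
}

// resolveModel mirrors the Cache Hit / Cache Miss branches: return a cached
// mapping if present, otherwise refresh (e.g. list models from the Hub),
// store the result, and return it.
func resolveModel(c *mappingCache, modelID string, refresh func(string) (mapping, error)) (mapping, error) {
    if v, ok := c.get(modelID); ok {
        return v, nil
    }
    v, err := refresh(modelID)
    if err != nil {
        return mapping{}, err
    }
    c.set(modelID, v)
    return v, nil
}

func main() {
    cache := &mappingCache{m: map[string]mapping{}}
    refresh := func(id string) (mapping, error) {
        if id == "" {
            return mapping{}, errors.New("empty model ID")
        }
        return mapping{Provider: "some-inference-provider", Model: id}, nil
    }
    m, err := resolveModel(cache, "some-org/some-model", refresh)
    if err != nil {
        panic(err)
    }
    fmt.Println(m.Provider, m.Model)
    // On a 404 from the upstream provider, the caller would invalidate the
    // stale entry and resolve again, which forces a fresh model listing.
    cache.invalidate("some-org/some-model")
}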

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Model provider mapping cache logic (core/providers/huggingface/utils.go, huggingface.go): Complex caching strategy with 404-triggered refreshes and provider normalization; verify correctness of concurrent access and cache invalidation patterns.
  • Type definitions and JSON marshaling (types.go): Extensive custom JSON marshaling for flexible input shapes (strings, arrays, objects) and enum-like fields (ToolChoice, EarlyStopping); review correctness of unmarshal/marshal round-trips (a sketch of this pattern follows this list).
  • Request/response converters (chat.go, embedding.go, speech.go, transcription.go): Dense parameter mapping from Bifrost schemas to HuggingFace-specific formats; ensure all conditional logic and nil-handling paths are correct.
  • Streaming and async flows (huggingface.go streaming methods, core/schemas/mux.go reasoning/content gating): New gating logic for reasoning-only chunks and content-delta emission; verify state machine correctness.
  • Test infrastructure changes (testutil/transcription.go): Fixture-based audio loading with runtime path discovery; ensure fixture paths are correct and error handling is robust.
  • Gemini provider audio detection refactoring (gemini/speech.go, gemini/transcription.go, gemini/utils.go): Extraction of audio detection logic requires verification that removed local function behavior matches new shared utility.
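
As a concrete illustration of the flexible-input point above, here is a small, self-contained sketch of the string-or-array unmarshal/marshal round-trip pattern. The type flexibleInput and its fields are made up for this example and are not the types defined in types.go.

package main

import (
    "encoding/json"
    "fmt"
)

// flexibleInput accepts either a single JSON string or an array of strings.
type flexibleInput struct {
    Str  *string
    List []string
}

func (f *flexibleInput) UnmarshalJSON(data []byte) error {
    var s string
    if err := json.Unmarshal(data, &s); err == nil {
        f.Str = &s
        return nil
    }
    var list []string
    if err := json.Unmarshal(data, &list); err == nil {
        f.List = list
        return nil
    }
    return fmt.Errorf("input must be a string or an array of strings, got: %s", string(data))
}

func (f flexibleInput) MarshalJSON() ([]byte, error) {
    if f.Str != nil {
        return json.Marshal(*f.Str) // marshals back to a plain string
    }
    return json.Marshal(f.List) // marshals back to an array
}

func main() {
    var a, b flexibleInput
    _ = json.Unmarshal([]byte(`"hello"`), &a)
    _ = json.Unmarshal([]byte(`["a","b"]`), &b)
    outA, _ := json.Marshal(a)
    outB, _ := json.Marshal(b)
    fmt.Println(string(outA), string(outB)) // round-trip: "hello" ["a","b"]
}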

Poem

🐰 Whiskers twitching with delight,
HuggingFace joins Bifrost's flight,
Models abundant, from the Hub they came,
Cache and retry played the game,
Chat and embeddings, speech too—
Twenty providers, all on cue! 🤗

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 53.85%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
  • Out of Scope Changes check (❓ Inconclusive): Multiple changes appear partially out-of-scope: Makefile updates (not mentioned in issue #430), UI updates (partially covered), Gemini provider modifications (unrelated to HuggingFace), and OpenAI streaming changes (separate enhancement). Resolution: clarify whether the Makefile/Gemini updates and OpenAI streaming modifications are intentional refactorings or scope creep, and ensure all collateral changes are justified and tested.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The PR title 'feat: huggingface provider added' clearly and concisely describes the main change: adding HuggingFace as a new provider to the system.
  • Description check (✅ Passed): The PR description follows the template structure with Summary, Changes, Type of change, Affected areas, How to test, Related issues, and Checklist sections mostly complete.
  • Linked Issues check (✅ Passed): The PR implements all core requirements from issue #430: HuggingFace provider registration, chat/text/embedding support, model listing, proper authentication, error handling, configuration, and comprehensive testing.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.


github-actions bot commented Dec 5, 2025

🧪 Test Suite Available

This PR can be tested by a repository admin.

Run tests for PR #1006

@qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from 135b9b5 to 5baaee2, on December 5, 2025 at 13:52
@qwerty-dvorak marked this pull request as ready for review on December 5, 2025 at 14:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/mux.go (1)

1146-1229: Thought content is emitted twice in the stream when delta contains both Content and Thought.

At lines 1207-1213, the code aggregates both delta.Content and delta.Thought into a single contentDelta string and emits it as OutputTextDelta at line 1218. Then at lines 1343-1377, if hasThought is true, the code separately emits ReasoningSummaryTextDelta containing the thought content again.

Since these are sequential conditions (not mutually exclusive), when a delta contains both fields, thought appears in:

  1. The aggregated OutputTextDelta event (lines 1207-1213, 1218)
  2. The separate ReasoningSummaryTextDelta event (lines 1365-1377)

Clarify the intended behavior: should thought only appear in the reasoning delta event, or is it correct for it to appear in both? If thought should be separate, remove delta.Thought from the aggregation at lines 1207-1213.

🧹 Nitpick comments (6)
docs/features/unified-interface.mdx (1)

88-106: Verify Hugging Face capability matrix matches actual implementation

The Hugging Face row advertises ✅ for Text/Text (stream)/Chat/Chat (stream)/Embeddings/TTS/STT and ❌ for both Responses modes. Please double‑check this against the actual HuggingFace provider implementation (especially Responses and any streaming audio paths) so the matrix doesn’t drift from reality.

core/providers/huggingface/huggingface_test.go (1)

59-62: Move client.Shutdown() inside the subtest or use t.Cleanup.

client.Shutdown() is called unconditionally after t.Run, but if the subtest is run in parallel or skipped, this could cause issues. Consider using t.Cleanup for proper resource cleanup.

+	t.Cleanup(func() {
+		client.Shutdown()
+	})
+
 	t.Run("HuggingFaceTests", func(t *testing.T) {
 		testutil.RunAllComprehensiveTests(t, client, ctx, testConfig)
 	})
-	client.Shutdown()
core/providers/huggingface/speech.go (1)

27-28: Consider avoiding empty Parameters struct allocation.

hfRequest.Parameters is allocated on line 28 even if no parameters are actually mapped. Consider only allocating when there are parameters to set.

 	// Map parameters if present
 	if request.Params != nil {
-		hfRequest.Parameters = &HuggingFaceSpeechParameters{}
-
 		// Map generation parameters from ExtraParams if available
 		if request.Params.ExtraParams != nil {
 			genParams := &HuggingFaceTranscriptionGenerationParameters{}
 
 			// ... parameter mapping ...
 
-			hfRequest.Parameters.GenerationParameters = genParams
+			hfRequest.Parameters = &HuggingFaceSpeechParameters{
+				GenerationParameters: genParams,
+			}
 		}
 	}
core/providers/huggingface/chat.go (1)

36-38: Errors from sonic.Marshal are silently ignored.

Multiple calls to sonic.Marshal discard the error using _. While marshalling simple structs rarely fails, silently ignoring errors could mask issues in edge cases (e.g., cyclic references, unusual types).

Consider logging or returning an error when marshalling fails for better debuggability:

contentJSON, err := sonic.Marshal(*msg.Content.ContentStr)
if err != nil {
    // At minimum, log the error for debugging
    if debug {
        fmt.Printf("[huggingface debug] Failed to marshal content: %v\n", err)
    }
    continue
}

Also applies to: 61-62, 189-194

core/providers/huggingface/huggingface.go (1)

51-85: Typo: "aquire" should be "acquire"

The function names use "aquire" instead of the correct spelling "acquire". While this doesn't affect functionality, it's a code quality issue that should be fixed for consistency and readability.

-func aquireHuggingFaceChatResponse() *HuggingFaceChatResponse {
+func acquireHuggingFaceChatResponse() *HuggingFaceChatResponse {
-func aquireHuggingFaceTranscriptionResponse() *HuggingFaceTranscriptionResponse {
+func acquireHuggingFaceTranscriptionResponse() *HuggingFaceTranscriptionResponse {
-func aquireHuggingFaceSpeechResponse() *HuggingFaceSpeechResponse {
+func acquireHuggingFaceSpeechResponse() *HuggingFaceSpeechResponse {

Don't forget to update the call sites at lines 348, 790, and 861.

core/providers/huggingface/types.go (1)

379-396: Silent error swallowing in UnmarshalJSON

The UnmarshalJSON method silently returns nil when both boolean and string unmarshaling fail, leaving the struct in an uninitialized state. This could mask malformed JSON input.

Consider returning an error when the input doesn't match expected types:

 func (e *HuggingFaceTranscriptionEarlyStopping) UnmarshalJSON(data []byte) error {
+	// Handle null explicitly
+	if string(data) == "null" {
+		return nil
+	}
+
 	// Try boolean first
 	var boolVal bool
 	if err := json.Unmarshal(data, &boolVal); err == nil {
 		e.BoolValue = &boolVal
 		return nil
 	}

 	// Try string
 	var stringVal string
 	if err := json.Unmarshal(data, &stringVal); err == nil {
 		e.StringValue = &stringVal
 		return nil
 	}

-	return nil
+	return fmt.Errorf("early_stopping must be a boolean or string, got: %s", string(data))
 }

This would require adding "fmt" to the imports.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 633a01d and 5baaee2.

📒 Files selected for processing (28)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/chat_completion_stream.go (1 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types copy.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • transports/config.schema.json
  • core/internal/testutil/responses_stream.go
  • core/schemas/bifrost.go
  • core/providers/huggingface/embedding.go
  • docs/apis/openapi.json
  • core/schemas/mux.go
  • docs/features/unified-interface.mdx
  • core/internal/testutil/chat_completion_stream.go
  • ui/lib/constants/config.ts
  • core/providers/huggingface/types copy.go
  • core/providers/huggingface/models.go
  • ui/README.md
  • core/schemas/account.go
  • core/bifrost.go
  • core/providers/huggingface/transcription.go
  • ui/lib/constants/icons.tsx
  • ui/lib/constants/logs.ts
  • Makefile
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/speech.go
  • core/internal/testutil/account.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (10)
core/schemas/bifrost.go (1)
ui/lib/types/config.ts (1)
  • ModelProvider (171-174)
core/providers/huggingface/embedding.go (2)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (278-288)
  • HuggingFaceEmbeddingResponse (299-299)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1422-1460)
  • ResponsesStreamResponseTypeOutputTextDelta (1370-1370)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/internal/testutil/chat_completion_stream.go (1)
core/internal/testutil/utils.go (1)
  • CreateBasicChatMessage (247-254)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (21-23)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/account.go (1)
ui/lib/types/config.ts (3)
  • AzureKeyConfig (23-27)
  • VertexKeyConfig (36-42)
  • BedrockKeyConfig (53-60)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (88-120)
core/providers/huggingface/speech.go (3)
core/schemas/speech.go (2)
  • BifrostSpeechRequest (9-16)
  • BifrostSpeechResponse (22-29)
core/providers/huggingface/types.go (5)
  • HuggingFaceSpeechRequest (304-310)
  • HuggingFaceSpeechParameters (313-316)
  • HuggingFaceTranscriptionGenerationParameters (342-359)
  • HuggingFaceTranscriptionEarlyStopping (363-366)
  • HuggingFaceSpeechResponse (319-323)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/utils.go (5)
core/providers/huggingface/huggingface.go (1)
  • HuggingFaceProvider (25-31)
core/schemas/models.go (1)
  • BifrostListModelsRequest (23-34)
core/providers/utils/utils.go (5)
  • GetRequestPath (219-239)
  • MakeRequestWithContext (39-93)
  • HandleProviderAPIError (317-337)
  • CheckAndDecodeBody (423-431)
  • NewBifrostOperationError (449-460)
core/providers/huggingface/types.go (3)
  • HuggingFaceInferenceProviderMappingResponse (25-30)
  • HuggingFaceInferenceProviderMapping (39-42)
  • HuggingFaceHubError (263-266)
core/schemas/provider.go (1)
  • ErrProviderResponseDecode (29-29)
core/providers/huggingface/chat.go (2)
core/schemas/chatcompletions.go (17)
  • BifrostChatRequest (12-19)
  • ChatContentBlockTypeText (497-497)
  • ChatContentBlockTypeImage (498-498)
  • ChatAssistantMessage (541-545)
  • ChatToolMessage (536-538)
  • ChatToolChoiceStruct (339-344)
  • BifrostResponseChoice (582-590)
  • ChatMessageRole (415-415)
  • ChatAssistantMessageToolCall (564-570)
  • ChatAssistantMessageToolCallFunction (573-576)
  • ChatNonStreamResponseChoice (605-608)
  • BifrostLogProbs (593-598)
  • ContentLogProb (632-637)
  • LogProb (625-629)
  • BifrostLLMUsage (640-647)
  • ChatStreamResponseChoice (611-613)
  • ChatStreamResponseChoiceDelta (616-622)
core/providers/huggingface/types.go (12)
  • HuggingFaceChatRequest (47-67)
  • HuggingFaceChatMessage (69-77)
  • HuggingFaceContentItem (80-84)
  • HuggingFaceImageRef (86-88)
  • HuggingFaceToolCall (90-94)
  • HuggingFaceFunction (96-100)
  • HuggingFaceResponseFormat (102-105)
  • HuggingFaceStreamOptions (114-116)
  • HuggingFaceTool (118-121)
  • HuggingFaceToolFunction (123-127)
  • HuggingFaceChatResponse (129-136)
  • HuggingFaceChatStreamResponse (190-199)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1469-~1469: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1779-~1779: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (29)
.github/workflows/release-pipeline.yml (1)

118-118: LGTM! Consistent HuggingFace API key propagation across release jobs.

The HUGGING_FACE_API_KEY has been properly added to all four release jobs (core, framework, plugins, bifrost-http) that require provider API keys for testing. The implementation is consistent with existing API key patterns and correctly sources from GitHub secrets.

Also applies to: 191-191, 268-268, 357-357

core/schemas/account.go (1)

54-56: LGTM! HuggingFaceKeyConfig follows established patterns.

The new HuggingFaceKeyConfig type and its integration into the Key struct follow the same pattern as existing provider configurations (Azure, Vertex, Bedrock). The implementation is consistent with:

  • Similar Deployments map structure used by other providers
  • Proper JSON tags with omitempty
  • Go naming conventions
  • Optional field design

Also applies to: 17-17
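
For illustration only, a sketch of the general shape such a per-key config takes; the field name inside Key and the map's value type are assumptions made for this example, not a copy of the actual schema.

package schemas

// HuggingFaceKeyConfig sketches the pattern described above: an optional
// deployments map carried on a key, serialized with omitempty. The string
// value type is an assumption for illustration.
type HuggingFaceKeyConfig struct {
    Deployments map[string]string `json:"deployments,omitempty"`
}

// Key shows how the optional provider config hangs off a key entry
// (other fields elided; field names here are illustrative).
type Key struct {
    Value             string                `json:"value"`
    HuggingFaceConfig *HuggingFaceKeyConfig `json:"huggingface_key_config,omitempty"`
}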

core/internal/testutil/account.go (1)

96-96: LGTM! HuggingFace test configuration added with conservative settings.

The HuggingFace provider has been properly integrated into the test account setup. Notable configuration choices:

  • MaxRetries: 1 - Much lower than other providers (8-10), suggesting less reliability or desire to fail fast
  • Timeout: 300 seconds - Higher than most providers (120s), indicating potentially longer response times
  • Retry backoff: 2s-30s - Conservative settings for retry attempts

These settings appear intentional for the HuggingFace Inference API characteristics. Ensure these align with production use cases and adjust if needed based on actual performance data.

Also applies to: 259-266, 512-524

.github/workflows/pr-tests.yml (1)

118-118: LGTM! HuggingFace API key added to test environment.

The HUGGING_FACE_API_KEY has been properly added to the PR test workflow, consistent with the release pipeline changes and other provider API key patterns.

ui/README.md (1)

84-84: LGTM! Documentation updated with HuggingFace provider.

The README has been updated to include HuggingFace in the list of supported providers, keeping the documentation in sync with the code changes.

core/internal/testutil/responses_stream.go (1)

693-693: Verify the lifecycle streaming safety threshold of 300 is appropriate.

The response count safety check at line 693 uses a threshold of 300 chunks. However, the file shows different thresholds for different streaming scenarios: tool streaming (100), reasoning streaming (150), lifecycle streaming (300), and basic streaming (500). Before merging, clarify:

  1. Why lifecycle streaming allows 3x more chunks than tool streaming—is this intentional differentiation or oversight?
  2. Is 300 chunks a reasonable upper bound based on actual HuggingFace lifecycle streaming behavior?
  3. Consider adding logging when approaching thresholds to help diagnose unexpected verbosity.
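
To make the threshold discussion concrete, here is a hypothetical sketch of the kind of chunk-count guard in question; the constant, channel, and test body are invented for this example and do not reproduce the actual testutil code.

package testutilsketch

import "testing"

// maxLifecycleChunks mirrors the 300-chunk safety threshold discussed above.
const maxLifecycleChunks = 300

// TestStreamGuardSketch drains a fake stream and fails the test if the chunk
// count exceeds the safety threshold, which is the guard's only job.
func TestStreamGuardSketch(t *testing.T) {
    streamCh := make(chan string, 8)
    go func() {
        defer close(streamCh)
        for i := 0; i < 5; i++ {
            streamCh <- "delta"
        }
    }()

    count := 0
    for range streamCh {
        count++
        if count > maxLifecycleChunks {
            t.Fatalf("stream exceeded %d chunks; possible runaway stream", maxLifecycleChunks)
        }
    }
}
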
transports/config.schema.json (1)

135-140: HuggingFace correctly wired into config schema

providers.huggingface reuses the generic provider schema and the semanticcache provider enum now includes "huggingface", both consistent with other providers and with the new ModelProvider constant. No issues spotted.

Also applies to: 764-784

core/schemas/bifrost.go (1)

35-83: ModelProvider and provider lists updated consistently for HuggingFace

HuggingFace is added as a ModelProvider and included in both SupportedBaseProviders and StandardProviders with the correct "huggingface" identifier. This aligns with the rest of the provider plumbing.

ui/lib/constants/config.ts (1)

5-24: UI config: HuggingFace placeholder and key requirement look correct

The huggingface model placeholder and isKeyRequiredByProvider entry are consistent with other providers and with the expected HF auth model. No issues from a UI/config standpoint.

Also applies to: 26-44

ui/lib/constants/logs.ts (1)

2-20: Logging constants updated to recognize HuggingFace

Adding "huggingface" to KnownProvidersNames and ProviderLabels keeps the provider type and display labels consistent across the UI logging layer. Looks good.

Also applies to: 43-61

docs/apis/openapi.json (1)

3239-3259: OpenAPI ModelProvider enum now includes HuggingFace (and Cerebras)

Extending the ModelProvider enum to include "huggingface" (and "cerebras") brings the public API spec in line with the backend/provider constants and config schema. The change is additive and backward‑compatible.

core/bifrost.go (1)

1327-1328: LGTM!

The HuggingFace provider is correctly integrated into the provider factory, following the established pattern used by other providers that return only a pointer (like OpenAI, Anthropic, Mistral, Gemini).

core/providers/huggingface/embedding.go (2)

22-31: LGTM with a note on unsupported input types.

The conversion correctly handles Text and Texts input types. The comment on lines 29-30 appropriately documents that embedding/embeddings (int arrays) are not supported by HuggingFace feature extraction.


57-92: LGTM!

The response conversion correctly:

  • Validates nil input
  • Pre-allocates the slice with proper capacity
  • Maps each embedding to the appropriate Bifrost structure
  • Documents that HuggingFace doesn't return usage information
core/providers/huggingface/models.go (2)

46-104: LGTM!

The deriveSupportedMethods function correctly:

  • Normalizes the pipeline string for case-insensitive matching
  • Uses a map to deduplicate methods
  • Handles both pipeline tags and model tags for flexibility
  • Returns a sorted, deduplicated list of supported methods
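
As a rough illustration of the approach summarized above (not the actual deriveSupportedMethods code), the sketch below maps a normalized pipeline tag and model tags to a deduplicated, sorted method list; the tag-to-method table is an invented subset.

package main

import (
    "fmt"
    "sort"
    "strings"
)

// deriveMethodsSketch maps a pipeline tag and model tags to supported request
// types, using a set for deduplication and returning a sorted slice. The
// mapping table below is illustrative, not the provider's actual table.
func deriveMethodsSketch(pipelineTag string, tags []string) []string {
    set := map[string]struct{}{}

    switch strings.ToLower(strings.TrimSpace(pipelineTag)) {
    case "text-generation":
        set["chat/completions"] = struct{}{}
        set["chat/completions.stream"] = struct{}{}
    case "feature-extraction":
        set["embeddings"] = struct{}{}
    case "automatic-speech-recognition":
        set["audio/transcriptions"] = struct{}{}
    case "text-to-speech":
        set["audio/speech"] = struct{}{}
    }

    // Model tags can also signal capabilities (illustrative check).
    for _, tag := range tags {
        if strings.EqualFold(tag, "conversational") {
            set["chat/completions"] = struct{}{}
        }
    }

    methods := make([]string, 0, len(set))
    for m := range set {
        methods = append(methods, m)
    }
    sort.Strings(methods)
    return methods
}

func main() {
    fmt.Println(deriveMethodsSketch("Text-Generation", []string{"conversational"}))
}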

34-39: Code correctly distinguishes between model.ID and model.ModelID fields.

The distinction is valid: model.ModelID (maps to "modelId" in the HuggingFace API response) represents the model's user-facing identifier, while model.ID (maps to "_id" in the API response) represents the internal HuggingFace identifier. The code appropriately uses ModelID for the composite ID and Name, and ID for the HuggingFaceID field.

core/providers/huggingface/speech.go (1)

94-116: LGTM!

The response conversion correctly maps the HuggingFace response to Bifrost format, with appropriate nil checks and documentation about missing usage/alignment data.

Makefile (2)

14-21: LGTM! Improved tool binary discovery for portable environments.

The logic for detecting Go binary paths (GOBIN, GOPATH, DEFAULT_GOBIN) and constructing tool paths (AIR_BIN, GOTESTSUM_BIN) is well-structured. This enables robust tool invocation without relying on global PATH configuration, which benefits environments like Nix.


65-69: Good safety check for preventing root execution in local development.

The guard against running as root on developer machines while allowing CI environments is appropriate. This prevents global npm install failures on systems like Nix.

core/providers/huggingface/transcription.go (1)

100-139: LGTM! Response conversion is well-implemented.

The ToBifrostTranscriptionResponse method properly validates inputs, handles optional timestamp chunks, and correctly maps them to TranscriptionSegment structures.

core/providers/huggingface/chat.go (2)

201-315: LGTM! Non-streaming response conversion is comprehensive.

The ToBifrostChatResponse method properly handles:

  • Nil checks and model validation
  • Choice conversion with message, role, content, and tool calls
  • Logprobs conversion including nested top logprobs
  • Usage information mapping

317-413: LGTM! Streaming response conversion is well-implemented.

The ToBifrostChatStreamResponse method correctly converts streaming delta responses, handling roles, content, reasoning (as thought), tool calls, and logprobs appropriately.

core/providers/huggingface/utils.go (1)

83-128: LGTM! URL building with proper encoding and ExtraParams handling.

The buildModelHubURL function correctly:

  • Applies default and maximum limits
  • URL-encodes all query parameters
  • Handles various types in ExtraParams with appropriate conversions
  • Constructs a well-formed URL with the inference provider filter
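
The following is a minimal sketch of the same pattern: clamp the limit, then let net/url handle query encoding. The /api/models path is the public Hub listing endpoint, but the parameter names and defaults here are illustrative, not the provider's exact values.

package main

import (
    "fmt"
    "net/url"
    "strconv"
)

// buildHubURLSketch shows the pattern: apply default and maximum limits,
// then let url.Values handle query encoding. Parameter names and defaults
// are illustrative.
func buildHubURLSketch(baseURL, inferenceProvider string, limit int, extra map[string]string) string {
    const (
        defaultLimit = 50
        maxLimit     = 500
    )
    if limit <= 0 {
        limit = defaultLimit
    }
    if limit > maxLimit {
        limit = maxLimit
    }

    q := url.Values{}
    q.Set("limit", strconv.Itoa(limit))
    if inferenceProvider != "" {
        q.Set("inference_provider", inferenceProvider)
    }
    for k, v := range extra {
        q.Set(k, v) // extra params are encoded like any other query value
    }

    return baseURL + "/api/models?" + q.Encode()
}

func main() {
    fmt.Println(buildHubURLSketch("https://huggingface.co", "fal-ai", 0, map[string]string{"search": "whisper"}))
}
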
core/providers/huggingface/huggingface.go (3)

477-650: LGTM: Stream processing goroutine

The goroutine properly handles:

  • Context cancellation checks before processing each chunk
  • Deferred cleanup of resources (channel close, response release)
  • Scanner buffer sizing for large responses
  • Error handling with proper error propagation to the response channel
  • Both regular chat streaming and ResponsesStream fallback modes

132-205: LGTM: Request handling with proper resource management

The completeRequest function correctly:

  • Uses defer for cleanup of fasthttp resources
  • Makes a copy of the response body before releasing to avoid use-after-free
  • Handles error responses with proper error type extraction
  • Supports debug logging controlled by environment variable

87-120: LGTM: Provider initialization

Provider initialization correctly handles configuration defaults, client setup with timeouts, pool pre-warming, proxy configuration, and base URL normalization.

core/providers/huggingface/types.go (3)

1-43: LGTM: Model and inference provider types

The model metadata types and inference provider mapping structures are well-defined with appropriate JSON tags for API compatibility.


44-260: LGTM: Chat completion types

Comprehensive type definitions for chat requests/responses with:

  • Support for both streaming and non-streaming responses
  • Flexible content handling via json.RawMessage
  • Tool/function calling support
  • Logprobs support
  • Time info for streaming diagnostics

274-411: LGTM: Embedding, Speech, and Transcription types

Well-structured types for:

  • Embedding with flexible input types and encoding format options
  • Speech synthesis with generation parameters
  • Transcription with timestamp support and generation configuration
  • Type aliases for backward compatibility

@qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 5baaee2 to 5a72875 on December 5, 2025 at 19:17

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (12)
docs/contributing/adding-a-provider.mdx (2)

500-527: Variable name inconsistency in documentation example (duplicate comment).

The example declares hfReq on line 500 but then references providerReq in the parameter mapping section. This inconsistency was flagged in a past review and remains unresolved. Replace all providerReq references with hfReq for consistency.

Apply this diff to fix the variable names:

     // Build the request
     hfReq := &HuggingFaceChatRequest{
         Model:    bifrostReq.Model,
         Messages: hfMessages,
     }
 
     // Map parameters
     if bifrostReq.Params != nil {
         params := bifrostReq.Params
         
         // Map standard parameters
         if params.Temperature != nil {
-            providerReq.Temperature = params.Temperature
+            hfReq.Temperature = params.Temperature
         }
         if params.MaxTokens != nil {
-            providerReq.MaxTokens = params.MaxTokens
+            hfReq.MaxTokens = params.MaxTokens
         }
         // ... other standard parameters
         
         // Handle provider-specific ExtraParams
         if params.ExtraParams != nil {
             if customParam, ok := params.ExtraParams["custom_param"].(string); ok {
-                providerReq.CustomParam = &customParam
+                hfReq.CustomParam = &customParam
             }
         }
     }
 
-    return providerReq
+    return hfReq

1405-1427: Incomplete/truncated code example (duplicate comment).

The code example is cut off mid-function at the tool/function calling section, which may confuse contributors trying to implement converters. This was flagged in a past review. Either complete the example with the tool calling logic and final return, or explicitly mark it as abbreviated with a comment like:

           // Tool/Function calling
           if params.Tools != nil && len(params.Tools) > 0 {
               // Convert tools...
           }
       }

       return hfReq
   }

Consider adding clarity about why the example is truncated and pointing contributors to full working examples in core/providers/huggingface/ or core/providers/anthropic/.

core/providers/huggingface/huggingface_test.go (1)

29-33: TranscriptionModel and SpeechSynthesisModel are swapped.

Based on model capabilities:

  • Kokoro-82M is a text-to-speech (TTS) model → should be SpeechSynthesisModel
  • whisper-large-v3 is a speech-to-text (transcription) model → should be TranscriptionModel

Apply this diff to fix the model assignments:

-		TranscriptionModel:   "fal-ai/hexgrad/Kokoro-82M",
-		SpeechSynthesisModel: "fal-ai/openai/whisper-large-v3",
+		TranscriptionModel:   "fal-ai/openai/whisper-large-v3",
+		SpeechSynthesisModel: "fal-ai/hexgrad/Kokoro-82M",
 		SpeechSynthesisFallbacks: []schemas.Fallback{
-			{Provider: schemas.HuggingFace, Model: "fal-ai/openai/whisper-large-v3"},
+			{Provider: schemas.HuggingFace, Model: "fal-ai/hexgrad/Kokoro-82M"},
 		},
core/providers/huggingface/speech.go (1)

37-63: Type assertions for int will fail when values come from JSON unmarshaling.

When ExtraParams is populated from JSON (e.g., from HTTP request bodies), numeric values are unmarshaled as float64, not int. These type assertions will silently fail, and the parameters won't be set.

Handle both int and float64 types for all integer parameters. Example for max_new_tokens:

-			if val, ok := request.Params.ExtraParams["max_new_tokens"].(int); ok {
-				genParams.MaxNewTokens = &val
+			if val, ok := request.Params.ExtraParams["max_new_tokens"].(float64); ok {
+				intVal := int(val)
+				genParams.MaxNewTokens = &intVal
+			} else if val, ok := request.Params.ExtraParams["max_new_tokens"].(int); ok {
+				genParams.MaxNewTokens = &val
 			}

Apply the same pattern to: max_length, min_length, min_new_tokens, num_beams, num_beam_groups, and top_k.

Consider extracting a helper function to reduce code duplication:

func getIntParam(params map[string]any, key string) *int {
    if val, ok := params[key].(float64); ok {
        intVal := int(val)
        return &intVal
    }
    if val, ok := params[key].(int); ok {
        return &val
    }
    return nil
}
core/providers/huggingface/transcription.go (2)

14-16: Incorrect error message references "speech" instead of "transcription".

The error message says "speech request input cannot be nil" but this is a transcription request converter.

 	if request.Input == nil {
-		return nil, fmt.Errorf("speech request input cannot be nil")
+		return nil, fmt.Errorf("transcription request input cannot be nil")
 	}

38-63: Type assertions for int will fail when values come from JSON unmarshaling.

Same issue as in speech.go - when ExtraParams comes from JSON, numeric values are float64, not int. These assertions will silently fail.

Handle both int and float64 types. Consider creating a shared helper function in the package to avoid code duplication across speech.go and transcription.go:

// In a shared utils file
func getIntFromExtra(params map[string]any, key string) *int {
    if val, ok := params[key].(float64); ok {
        intVal := int(val)
        return &intVal
    }
    if val, ok := params[key].(int); ok {
        return &val
    }
    return nil
}

Then use it as:

genParams.MaxNewTokens = getIntFromExtra(request.Params.ExtraParams, "max_new_tokens")
core/providers/huggingface/chat.go (1)

69-81: Critical: Nil pointer dereference remains unaddressed.

Line 74 still dereferences tc.Function.Name without checking for nil, which can cause a panic.

Apply the previously suggested fix:

 for _, tc := range msg.ChatAssistantMessage.ToolCalls {
+	if tc.Function.Name == nil {
+		continue // Skip tool calls without a function name
+	}
 	hfToolCall := HuggingFaceToolCall{
 		ID:   tc.ID,
 		Type: tc.Type,
 		Function: HuggingFaceFunction{
 			Name:      *tc.Function.Name,
 			Arguments: tc.Function.Arguments,
 		},
 	}
 	hfToolCalls = append(hfToolCalls, hfToolCall)
 }
core/providers/huggingface/huggingface.go (4)

277-293: Critical: Wrong provider constant remains unfixed.

Line 279 still uses schemas.Gemini instead of schemas.HuggingFace in the operation check, causing incorrect permission validation.

Apply the previously suggested fix:

-if err := providerUtils.CheckOperationAllowed(schemas.Gemini, provider.customProviderConfig, schemas.ListModelsRequest); err != nil {
+if err := providerUtils.CheckOperationAllowed(schemas.HuggingFace, provider.customProviderConfig, schemas.ListModelsRequest); err != nil {
 	return nil, err
 }

700-707: Minor: Wrong request type in Embedding error messages remains unfixed.

Lines 701 and 706 use schemas.SpeechRequest instead of schemas.EmbeddingRequest in error messages, which will confuse debugging.

Apply the previously suggested fix:

 if providerMapping == nil {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
 }

 mapping, ok := providerMapping[inferenceProvider]
 if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "feature-extraction" {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
 }

773-776: Major: Wrong task type check in Speech remains unfixed.

Line 774 checks for "automatic-speech-recognition" (transcription task) instead of "text-to-speech" (speech generation task). This will cause Speech operations to incorrectly validate against the wrong task type.

Apply the previously suggested fix:

 mapping, ok := providerMapping[inferenceProvider]
-if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "automatic-speech-recognition" {
+if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "text-to-speech" {
 	return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
 }

831-838: Major: Wrong task type and request type in Transcription remain unfixed.

Two issues here:

  1. Lines 832 and 837 use schemas.SpeechRequest instead of schemas.TranscriptionRequest in error messages
  2. Line 836 checks for "text-to-speech" task instead of "automatic-speech-recognition" (the tasks are swapped with the Speech function)

Apply the previously suggested fix:

 if providerMapping == nil {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 }

 mapping, ok := providerMapping[inferenceProvider]
-if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "text-to-speech" {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "automatic-speech-recognition" {
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 }
core/providers/huggingface/types.go (1)

21-23: Critical: Struct definition may not match API response format.

The HuggingFace /api/models endpoint returns a JSON array directly, but HuggingFaceListModelsResponse expects an object with a models field. This mismatch will cause unmarshaling to fail.

Run this script to verify the API response format:

#!/bin/bash
# Check the actual response format from HuggingFace API
curl -s "https://huggingface.co/api/models?limit=1" | jq -c 'if type == "array" then "ARRAY" else "OBJECT with keys: " + (keys | join(", ")) end'

If the API returns "ARRAY", change the response type to a slice or add custom UnmarshalJSON:

-type HuggingFaceListModelsResponse struct {
-	Models []HuggingFaceModel `json:"models"`
-}
+type HuggingFaceListModelsResponse []HuggingFaceModel

And update usage in listModelsByKey (line ~256 in huggingface.go):

-response := huggingfaceAPIResponse.ToBifrostListModelsResponse(providerName)
+response := ToBifrostListModelsResponse(huggingfaceAPIResponse, providerName)
🧹 Nitpick comments (8)
docs/contributing/adding-a-provider.mdx (1)

43-43: Hyphenate compound adjectives: "OpenAI-compatible" throughout documentation.

For consistency and grammatical correctness, compound adjectives should be hyphenated when they precede a noun. Update all instances of "OpenAI Compatible" to "OpenAI-compatible":

- #### Non-OpenAI Compatible Providers
+ #### Non-OpenAI-compatible Providers

- #### OpenAI Compatible Providers
+ #### OpenAI-compatible Providers

- ### OpenAI Compatible Providers
+ ### OpenAI-compatible Providers

- ### For OpenAI Compatible Providers
+ ### For OpenAI-compatible Providers

Apply these changes throughout the document to improve consistency and readability.

Also applies to: 71-71, 629-629, 1469-1469

Makefile (3)

14-21: Binary path variables introduce deterministic tool discovery—good for portability.

The approach of chaining GOBIN → GOPATH/bin → default Go locations is sound and addresses real pain points on systems like Nix. However, Line 21's fallback to which may not handle absolute paths reliably:

GOTESTSUM_BIN := $(if $(strip $(DEFAULT_GOBIN)),$(DEFAULT_GOBIN)/gotestsum,$(shell which gotestsum 2>/dev/null || echo gotestsum))

which typically expects command names (searched in PATH), not absolute paths. If which is called with a full path, it may fail unexpectedly on some systems. Consider using command -v or simplifying to just fallback to the bare name:

- GOTESTSUM_BIN := $(if $(strip $(DEFAULT_GOBIN)),$(DEFAULT_GOBIN)/gotestsum,$(shell which gotestsum 2>/dev/null || echo gotestsum))
+ GOTESTSUM_BIN := $(if $(strip $(DEFAULT_GOBIN)),$(DEFAULT_GOBIN)/gotestsum,gotestsum)

The || echo gotestsum ensures the variable is never empty, but the bare name should be sufficient; downstream checks like Line 103 already verify existence.


89-100: install-air logic is sound but the binary availability check is convoluted (Line 97).

The condition:

if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then

…will work (warns only if the full path doesn't exist), but it's unnecessarily complex. Since INSTALLED is a full path, which on an absolute path may fail unexpectedly on some shells. Simplify to:

- if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
+ if [ ! -x "$$INSTALLED" ]; then \

102-113: install-gotestsum has the same which issue and convoluted logic (Lines 103, 110).

Line 103:

if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then

If GOTESTSUM_BIN is a full path like /home/user/go/bin/gotestsum, which on it will fail. Simplify to just check the file:

- if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
+ if [ -x "$(GOTESTSUM_BIN)" ]; then \

Line 110 has the same convoluted conditional as Line 97—apply the same simplification:

- if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
+ if [ ! -x "$$INSTALLED" ]; then \
core/schemas/mux.go (1)

1146-1221: Confirm intent of mixing delta.Thought into output_text.delta and potential duplication

The new logic now (a) enters the text path when either hasContent or hasThought is true and (b) builds contentDelta by concatenating delta.Content and delta.Thought, then uses that for ResponsesStreamResponseTypeOutputTextDelta.Delta, while still emitting a separate ResponsesStreamResponseTypeReasoningSummaryTextDelta based on delta.Thought.

This is a behavioral change from “text delta only reflects Content” to “text delta reflects Content + Thought”, and also means the same Thought tokens appear in both the output-text and reasoning-summary streams. If Thought is intended to remain non-user-visible reasoning, this could leak it into the primary text channel and may surprise existing consumers that only look at output_text.delta.

Please double‑check that:

  • Existing UIs/clients that consume output_text.delta are meant to see Thought text, and
  • They can tolerate Thought appearing both in output_text.delta and reasoning_summary_text.delta.

If the goal was only to advance lifecycle (item creation / closing) when thought‑only chunks arrive, a narrower change that gates on hasThought but keeps Delta based solely on Content might be safer. Happy to sketch that refactor if you confirm the desired semantics.
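
In the spirit of that narrower change, here is a stand-alone sketch (with simplified local types, not the real mux.go ones) of gating the lifecycle on either field while keeping the user-visible text delta based on Content only, so Thought stays confined to the reasoning stream.

package main

import "fmt"

// streamDelta is a simplified stand-in for the real delta type.
type streamDelta struct {
    Content *string
    Thought *string
}

// emitDeltas gates on either field (so thought-only chunks still advance the
// stream lifecycle) but never mixes Thought into the output-text delta.
func emitDeltas(d streamDelta) {
    hasContent := d.Content != nil && *d.Content != ""
    hasThought := d.Thought != nil && *d.Thought != ""

    if hasContent || hasThought {
        // Item creation / closing (lifecycle) would be handled here for both cases.
        if hasContent {
            fmt.Println("output_text.delta:", *d.Content) // Content only
        }
        if hasThought {
            fmt.Println("reasoning_summary_text.delta:", *d.Thought) // Thought only
        }
    }
}

func main() {
    c, th := "Hello", "considering the question..."
    emitDeltas(streamDelta{Content: &c, Thought: &th})
    emitDeltas(streamDelta{Thought: &th}) // thought-only chunk still flows through
}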

core/providers/huggingface/huggingface_test.go (1)

59-63: Consider using defer for client.Shutdown() to ensure cleanup on panic.

If RunAllComprehensiveTests panics, client.Shutdown() won't be called, potentially leaving resources unreleased.

+	defer client.Shutdown()
+
 	t.Run("HuggingFaceTests", func(t *testing.T) {
 		testutil.RunAllComprehensiveTests(t, client, ctx, testConfig)
 	})
-	client.Shutdown()
core/providers/huggingface/chat.go (2)

34-64: Consider handling marshaling errors for content conversion.

Lines 37 and 61 discard marshaling errors when converting content. While the API may reject invalid payloads, explicitly handling these errors would make debugging easier.

Consider logging or returning errors:

 if msg.Content.ContentStr != nil {
-	contentJSON, _ := sonic.Marshal(*msg.Content.ContentStr)
+	contentJSON, err := sonic.Marshal(*msg.Content.ContentStr)
+	if err != nil {
+		// Log warning or return error
+		continue
+	}
 	hfMsg.Content = json.RawMessage(contentJSON)
 }

Apply similar handling at line 61 for ContentBlocks.


136-195: Consider handling errors in ResponseFormat and ToolChoice conversions.

Lines 138-144 and 189-193 silently discard marshaling errors when converting ResponseFormat and ToolChoice. Since these are optional but important parameters, logging failures would aid debugging.

Example for ResponseFormat:

 if params.ResponseFormat != nil {
 	responseFormatJSON, err := sonic.Marshal(params.ResponseFormat)
-	if err == nil {
+	if err != nil {
+		// Log warning: failed to marshal ResponseFormat
+	} else {
 		var hfResponseFormat HuggingFaceResponseFormat
 		if err := sonic.Unmarshal(responseFormatJSON, &hfResponseFormat); err == nil {
 			hfReq.ResponseFormat = &hfResponseFormat
+		} else {
+			// Log warning: failed to unmarshal ResponseFormat
 		}
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5baaee2 and 5a72875.

📒 Files selected for processing (28)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/chat_completion_stream.go (1 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types copy.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • core/providers/huggingface/utils.go
🚧 Files skipped from review as they are similar to previous changes (14)
  • core/providers/huggingface/types copy.go
  • ui/README.md
  • docs/features/unified-interface.mdx
  • .github/workflows/pr-tests.yml
  • docs/apis/openapi.json
  • ui/lib/constants/logs.ts
  • core/providers/huggingface/embedding.go
  • transports/config.schema.json
  • ui/lib/constants/config.ts
  • .github/workflows/release-pipeline.yml
  • core/schemas/bifrost.go
  • core/internal/testutil/chat_completion_stream.go
  • core/internal/testutil/account.go
  • core/internal/testutil/responses_stream.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/speech.go
  • core/schemas/mux.go
  • Makefile
  • core/providers/huggingface/transcription.go
  • core/bifrost.go
  • ui/lib/constants/icons.tsx
  • core/schemas/account.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/types.go
  • core/providers/huggingface/huggingface.go
🧬 Code graph analysis (6)
core/providers/huggingface/huggingface_test.go (5)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (47-64)
  • TestScenarios (22-44)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (131-134)
core/schemas/models.go (1)
  • Model (109-129)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-62)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (21-23)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (2)
  • BifrostListModelsResponse (36-45)
  • Model (109-129)
core/providers/huggingface/speech.go (3)
core/schemas/speech.go (2)
  • BifrostSpeechRequest (9-16)
  • BifrostSpeechResponse (22-29)
core/providers/huggingface/types.go (5)
  • HuggingFaceSpeechRequest (304-310)
  • HuggingFaceSpeechParameters (313-316)
  • HuggingFaceTranscriptionGenerationParameters (342-359)
  • HuggingFaceTranscriptionEarlyStopping (363-366)
  • HuggingFaceSpeechResponse (319-323)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1422-1460)
  • ResponsesStreamResponseTypeOutputTextDelta (1370-1370)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (88-120)
core/schemas/account.go (1)
ui/lib/types/config.ts (3)
  • AzureKeyConfig (23-27)
  • VertexKeyConfig (36-42)
  • BedrockKeyConfig (53-60)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1469-~1469: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1779-~1779: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (9)
Makefile (2)

155-155: Variable substitutions are correct and enable deterministic test execution.

The use of $(AIR_BIN) and $(GOTESTSUM_BIN) throughout the dev and test targets is consistent and properly formatted. This decoupling from PATH-based tool discovery is essential for reliable test execution, especially for the new HuggingFace provider tests.

Also applies to: 163-163, 349-349, 401-401, 425-425, 454-454, 548-548


24-29: Color vars, root check, and Next.js installation fallback are sound.

  • Color refactor (lines 24–29): Using printf centralizes ANSI code definitions—good for maintainability.
  • Root user guard (lines 67–70): Defensive check prevents npm permission issues on dev machines—appropriate.
  • Next.js installation fallback (lines 75–85): Thoughtful multi-level strategy (local, then npx, then global) handles Nix-like environments well.

Also applies to: 66-70, 75-85

core/schemas/account.go (1)

54-56: LGTM! Consistent with existing provider key config patterns.

The HuggingFaceKeyConfig type follows the established pattern used by Azure, Vertex, and Bedrock configurations, with a Deployments map for model-to-deployment mapping.

core/bifrost.go (1)

1327-1328: LGTM! Provider integration follows established patterns.

The HuggingFace provider case follows the same pattern as other providers like OpenAI, Anthropic, and Gemini that return a provider instance without an error.

core/providers/huggingface/speech.go (1)

94-116: LGTM! Response conversion is straightforward and correct.

The ToBifrostSpeechResponse properly validates inputs and maps the response fields. The comment about missing usage/alignment data is helpful for future maintainers.

core/providers/huggingface/models.go (3)

16-44: LGTM! Model list conversion handles edge cases properly.

The method correctly:

  • Skips models with empty ModelID
  • Skips models without derivable supported methods
  • Pre-allocates the result slice with appropriate capacity
  • Creates a composite ID with the provider prefix

46-104: LGTM! Comprehensive method derivation from pipeline and tags.

The function properly:

  • Normalizes the pipeline tag
  • Uses a set to deduplicate methods
  • Handles both primary pipeline tags and secondary model tags
  • Returns a sorted slice for consistent output

11-14: These constants are used in core/providers/huggingface/utils.go (lines 90, 92–93), so no action is needed.

core/providers/huggingface/transcription.go (1)

100-139: LGTM! Response conversion with proper segment mapping.

The method correctly:

  • Validates non-nil receiver and non-empty model name
  • Maps chunks to segments with proper ID assignment
  • Safely handles timestamp arrays with bounds checking before access

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/mux.go (1)

1146-1220: Remove duplicate delta handling for Thought content.

The code processes delta.Thought twice with conflicting approaches:

  1. Lines 1207-1213: Concatenates Thought with Content into a single contentDelta string emitted as ResponsesStreamResponseTypeOutputTextDelta
  2. Lines 1369-1377: Emits the same delta.Thought separately as ResponsesStreamResponseTypeReasoningSummaryTextDelta

This means consumers receive the thought content both mixed into regular text deltas and as a separate reasoning delta event. Keep the separate reasoning delta emission (lines 1369-1377) and remove the concatenation of Thought into contentDelta (lines 1211-1212).

♻️ Duplicate comments (15)
core/providers/huggingface/types.go (1)

21-23: Struct definition doesn't match HuggingFace API response format.

This issue was previously flagged: The HuggingFace /api/models endpoint returns a JSON array directly, but HuggingFaceListModelsResponse expects an object with a models field. This will cause unmarshaling failures. Either change the response type to []HuggingFaceModel or add custom UnmarshalJSON logic.
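A minimal sketch of the custom-UnmarshalJSON option, with illustrative field names for the element type (the real HuggingFaceModel lives in types.go):

package huggingface

import "encoding/json"

// Placeholder for the real model struct in types.go; fields are illustrative.
type HuggingFaceModel struct {
	ID          string   `json:"id"`
	PipelineTag string   `json:"pipeline_tag"`
	Tags        []string `json:"tags"`
}

// HuggingFaceListModelsResponse wraps the bare JSON array that /api/models returns.
type HuggingFaceListModelsResponse struct {
	Models []HuggingFaceModel `json:"-"`
}

// UnmarshalJSON decodes the top-level array directly instead of expecting
// an object with a "models" key.
func (r *HuggingFaceListModelsResponse) UnmarshalJSON(data []byte) error {
	var models []HuggingFaceModel
	if err := json.Unmarshal(data, &models); err != nil {
		return err
	}
	r.Models = models
	return nil
}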

docs/contributing/adding-a-provider.mdx (2)

510-527: Variable name inconsistency in documentation example.

The example code references providerReq (e.g., providerReq.Temperature), but the variable was named hfReq on line 500. This inconsistency could confuse contributors.


1406-1427: Code example is truncated mid-function.

The code block is incomplete, cutting off inside the parameter mapping section. This could confuse contributors trying to follow the pattern.

core/providers/huggingface/transcription.go (2)

14-16: Incorrect error message references "speech" instead of "transcription".

The error message says "speech request input cannot be nil" but this is a transcription request converter.


38-79: Type assertions for int will fail when values come from JSON unmarshalling.

When ExtraParams is populated from JSON (e.g., from request bodies), numeric values are unmarshalled as float64, not int. The type assertions like .(int) on lines 38, 41, 44, 47, 50, 53, and 62 will silently fail, causing these parameters to be ignored.

Consider handling both int and float64 types for all integer parameters (max_new_tokens, max_length, min_length, min_new_tokens, num_beams, num_beam_groups, top_k).
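One way to cover both shapes is a small coercion helper; this is a sketch with a hypothetical asInt name, not code taken from the PR:

package huggingface

// asInt coerces an ExtraParams value that may arrive as an int (set directly
// in Go) or as a float64 (decoded from JSON) into an *int; other types are rejected.
func asInt(v any) (*int, bool) {
	switch n := v.(type) {
	case int:
		return &n, true
	case float64:
		i := int(n)
		return &i, true
	default:
		return nil, false
	}
}

Each integer parameter could then be mapped the same way, e.g. if v, ok := asInt(request.Params.ExtraParams["max_new_tokens"]); ok { genParams.MaxNewTokens = v }, assuming the existing genParams fields.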

core/providers/huggingface/speech.go (1)

37-77: Type assertions for int will fail when values come from JSON unmarshalling.

When ExtraParams is populated from JSON, numeric values are unmarshalled as float64, not int. The type assertions like .(int) on lines 37, 40, 43, 46, 49, 52, and 61 will silently fail, causing these parameters to be ignored.

Consider handling both int and float64 types for all integer parameters.

core/providers/huggingface/chat.go (1)

69-78: Potential nil pointer dereference when accessing tc.Function.Name.

On line 74, tc.Function.Name is dereferenced without checking if it's nil. If the tool call has a nil Name field, this will cause a panic.

Add a nil check before dereferencing or skip tool calls without function names.

core/providers/huggingface/utils.go (3)

136-162: Empty provider and model names when input has no slashes.

When splitIntoModelProvider receives a model name with no slashes (t == 0), both prov and model remain empty strings. Downstream code at lines 313 and 398 in huggingface.go formats this as "" : "", resulting in malformed model identifiers.

Handle the t == 0 case by setting a default provider (e.g., "hf-inference") and using the input as the model name.


164-197: Incomplete provider routing in getInferenceProviderRouteURL — 13 of 19 defined providers will error.

The function only handles 6 providers (fal-ai, hf-inference, nebius, replicate, sambanova, scaleway) while INFERENCE_PROVIDERS defines 19 total. Providers like cerebras, cohere, groq, featherless-ai, fireworks-ai, hyperbolic, novita, nscale, ovhcloud, publicai, together, wavespeed, and zai-org will hit the default error case.

This affects embedding, speech, and transcription operations. Either expand routing logic for all providers or remove unsupported ones from INFERENCE_PROVIDERS.


180-191: Copy-paste error: Wrong provider names in error messages.

Lines 184 and 190 incorrectly reference "nebius provider" when the actual providers are "sambanova" (line 184) and "scaleway" (line 190).

core/providers/huggingface/huggingface.go (5)

277-281: Bug: Wrong provider constant used in ListModels.

CheckOperationAllowed is called with schemas.Gemini instead of schemas.HuggingFace. This is a copy-paste error and will cause incorrect operation permission checks.


295-301: Bug: Wrong request types in unsupported operation errors.

Both TextCompletion and TextCompletionStream return errors with schemas.EmbeddingRequest instead of the correct request types (schemas.TextCompletionRequest and schemas.TextCompletionStreamRequest).


700-707: Bug: Wrong request type in Embedding error messages.

The error messages reference schemas.SpeechRequest instead of schemas.EmbeddingRequest when the embedding operation is unsupported (lines 701 and 706).


773-776: Bug: Wrong task type check in Speech.

The Speech function checks for "automatic-speech-recognition" task, but Speech (text-to-speech) should check for "text-to-speech". The task checks appear to be swapped between Speech and Transcription functions.


831-838: Bug: Wrong task type check and error messages in Transcription.

Two issues:

  1. The error messages use schemas.SpeechRequest instead of schemas.TranscriptionRequest (lines 832, 837)
  2. Line 836 checks for "text-to-speech" task, but Transcription should check for "automatic-speech-recognition"
🧹 Nitpick comments (4)
Makefile (2)

88-100: Extract repeated DEFAULT_GOBIN logic into a shared variable for DRY.

Lines 95 and 108 both replicate the DEFAULT_GOBIN fallback logic:

$(if $(strip $(GOBIN)),$(GOBIN)/gotestsum,$(if $(strip $(GOPATH)),$(GOPATH)/bin/gotestsum,...))

This can be defined once and reused, improving maintainability and consistency across targets.

Consider extracting into a helper pattern (or simply reusing DEFAULT_GOBIN):

INSTALLED_AIR_PATH := $(DEFAULT_GOBIN)/air
INSTALLED_GOTESTSUM_PATH := $(DEFAULT_GOBIN)/gotestsum

Then reference these in the informational messages instead of computing the logic twice.

Also applies to: 102-113


103-103: Minor: Quote GOTESTSUM_BIN in which invocation for robustness.

Line 103 uses which $(GOTESTSUM_BIN) without quotes. While binary names rarely contain spaces, shell best practice is to quote variable expansions:

which "$(GOTESTSUM_BIN)" > /dev/null 2>&1
core/providers/huggingface/types.go (2)

379-396: Consider returning an error for invalid early_stopping values.

The UnmarshalJSON method silently ignores invalid values by returning nil at line 395. If the JSON value is neither a boolean nor a string, this will leave the field in an indeterminate state. Consider returning an error to surface invalid API responses:

 func (e *HuggingFaceTranscriptionEarlyStopping) UnmarshalJSON(data []byte) error {
 	// Try boolean first
 	var boolVal bool
 	if err := json.Unmarshal(data, &boolVal); err == nil {
 		e.BoolValue = &boolVal
 		return nil
 	}
 
 	// Try string
 	var stringVal string
 	if err := json.Unmarshal(data, &stringVal); err == nil {
 		e.StringValue = &stringVal
 		return nil
 	}
 
-	return nil
+	return fmt.Errorf("early_stopping must be a boolean or string, got: %s", string(data))
 }

66-66: Extra field won't capture unknown fields with json:"-" tag.

The Extra field at line 66 has the json:"-" tag, which excludes it from JSON marshaling/unmarshaling. This means it won't capture unknown additional fields as the comment suggests. If you want to capture unknown fields, remove the json:"-" tag or use a different approach. If this field is for internal use only (not from/to JSON), the current implementation is correct but the comment should be clarified.
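If capturing unknowns is the intended behavior, one common pattern is a second unmarshal into a map with the known keys removed; a sketch on a trimmed-down struct (the real message type has many more fields):

package huggingface

import "encoding/json"

// exampleMessage is a reduced stand-in for the real struct.
type exampleMessage struct {
	Role    string         `json:"role"`
	Content string         `json:"content"`
	Extra   map[string]any `json:"-"` // populated manually below
}

func (m *exampleMessage) UnmarshalJSON(data []byte) error {
	type alias exampleMessage // alias avoids recursing into this method
	if err := json.Unmarshal(data, (*alias)(m)); err != nil {
		return err
	}
	var raw map[string]any
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	delete(raw, "role")
	delete(raw, "content")
	if len(raw) > 0 {
		m.Extra = raw
	}
	return nil
}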

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5baaee2 and 5a72875.

📒 Files selected for processing (28)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/chat_completion_stream.go (1 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types copy.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • core/providers/huggingface/types copy.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • core/internal/testutil/chat_completion_stream.go
  • core/providers/huggingface/huggingface_test.go
  • ui/lib/constants/config.ts
  • core/providers/huggingface/models.go
  • ui/lib/constants/logs.ts
  • core/schemas/account.go
  • core/schemas/bifrost.go
  • transports/config.schema.json
  • docs/apis/openapi.json
  • ui/README.md
  • .github/workflows/pr-tests.yml
  • core/internal/testutil/responses_stream.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/schemas/mux.go
  • docs/features/unified-interface.mdx
  • core/internal/testutil/account.go
  • core/providers/huggingface/embedding.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/speech.go
  • Makefile
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/utils.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/chat.go
  • core/bifrost.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (6)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1422-1460)
  • ResponsesStreamResponseTypeOutputTextDelta (1370-1370)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/internal/testutil/account.go (4)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/schemas/account.go (1)
  • Key (8-18)
core/schemas/provider.go (4)
  • ProviderConfig (234-242)
  • NetworkConfig (45-53)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (128-131)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (278-288)
  • HuggingFaceEmbeddingResponse (299-299)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (640-647)
core/providers/huggingface/transcription.go (3)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (328-333)
  • HuggingFaceTranscriptionRequestParameters (336-339)
  • HuggingFaceTranscriptionGenerationParameters (342-359)
  • HuggingFaceTranscriptionEarlyStopping (363-366)
  • HuggingFaceTranscriptionResponse (399-402)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/chat.go (3)
core/schemas/chatcompletions.go (4)
  • BifrostChatRequest (12-19)
  • BifrostChatResponse (26-41)
  • BifrostResponseChoice (582-590)
  • BifrostLLMUsage (640-647)
core/providers/huggingface/types.go (11)
  • HuggingFaceChatRequest (47-67)
  • HuggingFaceChatMessage (69-77)
  • HuggingFaceContentItem (80-84)
  • HuggingFaceImageRef (86-88)
  • HuggingFaceToolCall (90-94)
  • HuggingFaceFunction (96-100)
  • HuggingFaceResponseFormat (102-105)
  • HuggingFaceStreamOptions (114-116)
  • HuggingFaceTool (118-121)
  • HuggingFaceToolFunction (123-127)
  • HuggingFaceChatResponse (129-136)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (88-120)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1469-~1469: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1779-~1779: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (21)
Makefile (3)

15-21: Tool-path discovery logic is sound; minor consistency opportunity.

The conditional expansion for DEFAULT_GOBIN and fallback chains for AIR_BIN and GOTESTSUM_BIN are well-structured. However, the logic could be slightly clearer:

  • Lines 17–18: GOBIN ?= and GOPATH ?= query the environment each time; consider memoizing to $(shell go env GOBIN) and $(shell go env GOPATH) without the conditional assignment if these are truly static per invocation.
  • Line 21: The fallback to which gotestsum is good, but ensure it's tested on systems where which may not be available (rare, but worth verifying).

Overall approach is solid for improving portability and reducing hard-coded tool assumptions.


155-155: Binary invocation refactoring is correct.

The substitution of direct air with $(AIR_BIN) is clean and allows the Makefile to find air without relying on PATH. Well done.

Also applies to: 163-163


349-349: Test target refactoring to use $(GOTESTSUM_BIN) looks good.

All test invocations now consistently use the computed GOTESTSUM_BIN variable, ensuring tests can run even if gotestsum is not on PATH. Proper fallback logic in variable definition (line 21) makes this robust.

Verify that tests pass when GOTESTSUM_BIN resolves to a full path outside PATH (e.g., via GOBIN). Run a quick smoke test locally to confirm: make test-core PROVIDER=<any_provider>.

Also applies to: 401-401, 425-425, 454-454, 548-548

core/internal/testutil/account.go (2)

96-96: LGTM! HuggingFace provider added to configured providers.

The addition follows the established pattern and correctly references the HuggingFace constant defined in core/schemas/bifrost.go.


259-266: LGTM! Key retrieval properly configured.

The HuggingFace key configuration follows the same pattern as other providers, retrieving the API key from the HUGGING_FACE_API_KEY environment variable.

core/providers/huggingface/types.go (1)

129-408: Comprehensive type definitions for HuggingFace integration.

The type definitions provide good coverage for chat, embeddings, speech, and transcription APIs. The use of pointer fields, json.RawMessage, and map[string]any provides appropriate flexibility for varying API responses while maintaining type safety where possible.

docs/features/unified-interface.mdx (1)

98-98: Documentation correctly reflects HuggingFace provider capabilities.

The provider support matrix entry accurately documents HuggingFace's supported operations and follows the consistent format of other provider entries.

core/bifrost.go (2)

26-26: Import correctly added for HuggingFace provider.

The import follows the consistent pattern used for other provider packages.


1327-1328: HuggingFace provider properly integrated into factory.

The provider instantiation follows the same pattern as other providers, correctly calling NewHuggingFaceProvider with config and logger parameters. The integration is consistent with the existing provider architecture.

docs/contributing/adding-a-provider.mdx (1)

7-2065: LGTM! Comprehensive provider implementation guide.

The documentation provides excellent coverage of both OpenAI-compatible and custom provider patterns, with clear examples, conventions, and checklists. The phase-based workflow and file organization guidelines will help contributors implement providers consistently.

core/providers/huggingface/embedding.go (2)

10-55: LGTM! Solid embedding request converter.

The function properly handles nil inputs, correctly splits model/provider, and safely extracts ExtraParams with type assertions. The mapping logic is clear and follows the converter pattern.


58-93: LGTM! Well-implemented response converter.

The method correctly handles nil responses with error returns, pre-allocates slices for performance, and properly documents that HuggingFace doesn't return usage information (setting it to zero). The conversion logic is clean and efficient.

core/providers/huggingface/transcription.go (1)

100-139: LGTM! Solid transcription response converter.

The method properly validates inputs, safely handles timestamp array access, and cleanly converts HuggingFace chunks to Bifrost segments. The implementation is defensive and correct.

core/providers/huggingface/speech.go (1)

94-116: LGTM! Clean speech response converter.

The method properly validates inputs and clearly documents that HuggingFace TTS API doesn't return usage or alignment data. The implementation is straightforward and correct.

core/providers/huggingface/chat.go (2)

318-413: LGTM! Well-structured streaming response converter.

The method properly handles nil responses and cleanly converts all streaming delta fields (role, content, reasoning, tool calls, logprobs) to Bifrost format. The implementation is thorough and correct.


86-89: No action needed. The debug variable is properly accessible.

The debug variable is defined at package level in huggingface.go (line 22) and is correctly accessible in chat.go. In Go, all files within the same package share a common namespace—no import or additional qualification is required for package-level variables. The code will compile without errors.

Likely an incorrect or invalid review comment.

core/providers/huggingface/utils.go (2)

83-128: LGTM! Well-implemented URL builder.

The function properly constructs the model hub URL with pagination, sorting, and provider filtering. The type switch for handling ExtraParams is robust and covers common types.


130-134: LGTM! Solid utility functions.

The buildModelInferenceProviderURL, convertToInferenceProviderMappings, and getModelInferenceProviderMapping functions are well-implemented with proper error handling and clean logic.

Also applies to: 199-251

core/providers/huggingface/huggingface.go (3)

20-126: LGTM! Well-structured provider implementation.

The provider struct follows standard patterns with proper field ordering. The use of sync.Pool for response objects is a good performance optimization, and the acquire/release helpers are correctly implemented. The constructor properly initializes all fields and pre-warms the pools.


303-653: LGTM! Robust chat completion implementation.

Both sync and streaming chat completion methods are well-implemented. The streaming logic properly handles SSE parsing, error cases, and the complex ResponsesStream fallback scenario. The workaround for combined usage/choices chunks (lines 597-626) is clever and well-documented.


655-677: LGTM! Clean delegation to chat completion methods.

The Responses and ResponsesStream methods correctly delegate to the corresponding chat completion methods with appropriate context flags and request type overrides.

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 5a72875 to 00ecc06 Compare December 5, 2025 20:15
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/mux.go (1)

1146-1221: Thought is now streamed as both user-visible text and as a separate reasoning delta, creating double-streaming and changed semantics for all providers

The code confirms the concerns raised:

  • Lines 1146–1148 compute hasContent and hasThought; line 1149 branches if either is true.
  • Lines 1207–1214 concatenate both *delta.Content and *delta.Thought into contentDelta, which becomes the OutputTextDelta.Delta.
  • Line 1228 sets state.TextItemHasContent = true for either content or thought.
  • Lines 1369–1380 emit delta.Thought separately as ResponsesStreamResponseTypeReasoningSummaryTextDelta.

This creates three key problems:

  1. Double-streaming: Chunks with only Thought now emit both an output_text.delta and a reasoning_summary_text.delta, causing reasoning content to appear in the visible text channel.

  2. Semantic shift: Chunks with both Content and Thought now concatenate them into a single visible delta, mixing chain-of-thought into user-facing text instead of keeping reasoning separate.

  3. Unguarded change: This logic applies uniformly to all providers (no provider-specific conditionals), making this a behavioral change across the entire stack, not isolated to HuggingFace.

If Thought is intended as a distinct reasoning stream (per OpenAI's reasoning semantics), this mixes concerns. If it's a fallback for providers that emit primary text as Thought, this needs explicit provider-level detection to avoid leaking reasoning into normal output for other providers.

Recommend verifying this is intentional across all providers and that clients consuming only output_text.delta are prepared to receive reasoning content.
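A simplified sketch of the suggested separation, using placeholder types rather than the real schemas structs — Content feeds the visible text channel, Thought feeds only the reasoning channel:

package sketch

// delta and streamEvent are placeholders for the real chat-delta and
// Responses stream types in core/schemas.
type delta struct {
	Content *string
	Thought *string
}

type streamEvent struct {
	Type string
	Text string
}

// emitDeltas emits Content as an output_text delta and Thought only as a
// reasoning summary delta, instead of concatenating them into one string.
func emitDeltas(d delta) []streamEvent {
	var events []streamEvent
	if d.Content != nil && *d.Content != "" {
		events = append(events, streamEvent{Type: "response.output_text.delta", Text: *d.Content})
	}
	if d.Thought != nil && *d.Thought != "" {
		events = append(events, streamEvent{Type: "response.reasoning_summary_text.delta", Text: *d.Thought})
	}
	return events
}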

♻️ Duplicate comments (17)
docs/contributing/adding-a-provider.mdx (1)

510-527: Variable name inconsistency persists from previous review.

The code example declares hfReq at line 500 but references providerReq throughout lines 511–527. This identical issue was flagged in a previous review. While you clarified that examples are for reference, this particular snippet is labeled "Real Example from core/providers/huggingface/chat.go" and should be corrected for accuracy.

Apply this diff to align variable names:

     // Map parameters
     if bifrostReq.Params != nil {
         params := bifrostReq.Params
         
         // Map standard parameters
         if params.Temperature != nil {
-            providerReq.Temperature = params.Temperature
+            hfReq.Temperature = params.Temperature
         }
         if params.MaxTokens != nil {
-            providerReq.MaxTokens = params.MaxTokens
+            hfReq.MaxTokens = params.MaxTokens
         }
         // ... other standard parameters
         
         // Handle provider-specific ExtraParams
         if params.ExtraParams != nil {
             if customParam, ok := params.ExtraParams["custom_param"].(string); ok {
-                providerReq.CustomParam = &customParam
+                hfReq.CustomParam = &customParam
             }
         }
     }
 
-    return providerReq
+    return hfReq
core/internal/testutil/account.go (1)

512-524: Align HuggingFace retry policy with other providers

MaxRetries is set to 1 for HuggingFace while essentially all other remote providers use 8–10 retries, and the comment “HuggingFace can be variable” argues for more retries, not fewer. This will make tests unnecessarily flaky on transient errors.

Recommend matching the standard 10‑retry policy and updating/removing the comment:

	case schemas.HuggingFace:
		return &schemas.ProviderConfig{
			NetworkConfig: schemas.NetworkConfig{
				DefaultRequestTimeoutInSeconds: 300,
-				MaxRetries:                     1, // HuggingFace can be variable
+				MaxRetries:                     10, // Align with other variable cloud providers
				RetryBackoffInitial:            2 * time.Second,
				RetryBackoffMax:                30 * time.Second,
			},
core/providers/huggingface/huggingface_test.go (1)

29-33: SpeechSynthesisModel and TranscriptionModel appear to be swapped.

Kokoro-82M is a text-to-speech model and should be assigned to SpeechSynthesisModel, while whisper-large-v3 is a speech-to-text model and should be assigned to TranscriptionModel. The SpeechSynthesisFallbacks should also reference the TTS model.

-		TranscriptionModel:   "fal-ai/hexgrad/Kokoro-82M",
-		SpeechSynthesisModel: "fal-ai/openai/whisper-large-v3",
+		TranscriptionModel:   "fal-ai/openai/whisper-large-v3",
+		SpeechSynthesisModel: "fal-ai/hexgrad/Kokoro-82M",
 		SpeechSynthesisFallbacks: []schemas.Fallback{
-			{Provider: schemas.HuggingFace, Model: "fal-ai/openai/whisper-large-v3"},
+			{Provider: schemas.HuggingFace, Model: "fal-ai/hexgrad/Kokoro-82M"},
 		},
core/providers/huggingface/transcription.go (2)

14-16: Incorrect error message references "speech" instead of "transcription".

The error message says "speech request input cannot be nil" but this is a transcription request converter.

 	if request.Input == nil {
-		return nil, fmt.Errorf("speech request input cannot be nil")
+		return nil, fmt.Errorf("transcription request input cannot be nil")
 	}

38-63: Type assertions for int will fail when values come from JSON unmarshalling.

When ExtraParams is populated from JSON, numeric values are unmarshalled as float64, not int. These type assertions will silently fail, causing parameters to be ignored.

Consider handling both types. Example fix for one parameter:

-			if val, ok := request.Params.ExtraParams["max_new_tokens"].(int); ok {
-				genParams.MaxNewTokens = &val
+			if val, ok := request.Params.ExtraParams["max_new_tokens"].(int); ok {
+				genParams.MaxNewTokens = &val
+			} else if val, ok := request.Params.ExtraParams["max_new_tokens"].(float64); ok {
+				intVal := int(val)
+				genParams.MaxNewTokens = &intVal
 			}

Apply the same pattern to max_length, min_length, min_new_tokens, num_beams, num_beam_groups, and top_k.

core/providers/huggingface/speech.go (1)

37-62: Type assertions for int may fail when values come from JSON.

Same issue as in transcription.go - when ExtraParams is populated from JSON unmarshaling, numeric values are typically float64, not int. These type assertions will silently fail.

Apply the same fix pattern as suggested for transcription.go - handle both int and float64 types for all integer parameters.

core/providers/huggingface/chat.go (2)

69-81: Potential nil pointer dereference when accessing tc.Function.Name.

On line 74, tc.Function.Name is dereferenced without checking if it's nil. If a tool call has a nil Name field, this will cause a panic.

 			for _, tc := range msg.ChatAssistantMessage.ToolCalls {
+				if tc.Function.Name == nil {
+					continue // Skip tool calls without a function name
+				}
 				hfToolCall := HuggingFaceToolCall{
 					ID:   tc.ID,
 					Type: tc.Type,

86-89: Undefined variable debug will cause compilation error.

The debug variable is referenced at lines 86 and 250 but does not appear to be defined in this file. This will cause a compilation failure.

#!/bin/bash
# Check if debug variable is defined in the huggingface package
rg -n "^var debug\b|^const debug\b|debug\s*:=\s*(true|false)" core/providers/huggingface/

Also applies to: 250-256

core/providers/huggingface/utils.go (3)

136-162: Empty provider and model names when input has no slashes.

When splitIntoModelProvider receives a model name with no slashes (t == 0), both prov and model remain empty strings. This results in malformed model identifiers downstream. A user passing "llama-7b" (without organization prefix) would trigger this issue.

Consider defaulting to hf-inference and using the original name as the model:

+	} else {
+		// No slashes - default to hf-inference with the full name as model
+		prov = hfInference
+		model = bifrostModelName
+		if debug {
+			fmt.Printf("[huggingface debug] splitIntoModelProvider (t==0): prov=%s, model=%s\n", prov, model)
+		}
 	}

180-191: Copy-paste error: Wrong provider names in error messages.

Lines 184 and 190 incorrectly reference "nebius provider" when the actual providers are "sambanova" and "scaleway" respectively.

 	case "sambanova":
 		if requestType == schemas.EmbeddingRequest {
 			defaultPath = "/sambanova/v1/embeddings"
 		} else {
-			return "", fmt.Errorf("nebius provider only supports embedding requests")
+			return "", fmt.Errorf("sambanova provider only supports embedding requests")
 		}
 	case "scaleway":
 		if requestType == schemas.EmbeddingRequest {
 			defaultPath = "/scaleway/v1/embeddings"
 		} else {
-			return "", fmt.Errorf("nebius provider only supports embedding requests")
+			return "", fmt.Errorf("scaleway provider only supports embedding requests")
 		}

164-197: Incomplete provider routing — 13 of 19 defined providers will error.

The getInferenceProviderRouteURL function only handles 6 providers (fal-ai, hf-inference, nebius, replicate, sambanova, scaleway) while INFERENCE_PROVIDERS defines 19 total. Providers like cerebras, cohere, groq, fireworks-ai, together, etc. will hit the default error case.

This affects embedding, speech, and transcription operations for the majority of defined providers.

Either expand the routing logic to handle all providers or document which providers are actually supported for which request types.
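One way to keep the routing maintainable as coverage grows is a data-driven table instead of a widening switch; a sketch in which only the paths already shown in this review (sambanova, scaleway) are assumed and everything else is illustrative:

package huggingface

import "fmt"

// requestType stands in for schemas.RequestType.
type requestType string

const embeddingRequest requestType = "embedding"

// routeTable maps inference provider -> request type -> route path.
// Only providers with verified paths should be listed here.
var routeTable = map[string]map[requestType]string{
	"sambanova": {embeddingRequest: "/sambanova/v1/embeddings"},
	"scaleway":  {embeddingRequest: "/scaleway/v1/embeddings"},
}

func routePath(provider string, rt requestType) (string, error) {
	paths, ok := routeTable[provider]
	if !ok {
		return "", fmt.Errorf("inference provider %q is not supported", provider)
	}
	path, ok := paths[rt]
	if !ok {
		return "", fmt.Errorf("inference provider %q does not support %s requests", provider, rt)
	}
	return path, nil
}

Providers would then be declared supported only when they have an entry, which also keeps INFERENCE_PROVIDERS and the routing logic from drifting apart.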

core/providers/huggingface/huggingface.go (6)

51-85: Typo: "aquire" should be "acquire".

The pool helper functions are misspelled as aquireHuggingFaceChatResponse, aquireHuggingFaceTranscriptionResponse, and aquireHuggingFaceSpeechResponse. The correct spelling is "acquire". Update the function names and their call sites at lines 348, 790, and 861.


277-281: Bug: Wrong provider constant in ListModels.

Line 279 uses schemas.Gemini instead of schemas.HuggingFace in the CheckOperationAllowed call. This causes incorrect operation permission checks for the HuggingFace provider.

Apply this diff:

-	if err := providerUtils.CheckOperationAllowed(schemas.Gemini, provider.customProviderConfig, schemas.ListModelsRequest); err != nil {
+	if err := providerUtils.CheckOperationAllowed(schemas.HuggingFace, provider.customProviderConfig, schemas.ListModelsRequest); err != nil {
 		return nil, err
 	}

295-301: Bug: Wrong request types in unsupported operation errors.

Both TextCompletion (line 296) and TextCompletionStream (line 300) return errors using schemas.EmbeddingRequest instead of the correct request types. This was previously flagged and marked as addressed, but the issue persists in the current code.

Apply this diff:

 func (provider *HuggingFaceProvider) TextCompletion(ctx context.Context, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (*schemas.BifrostTextCompletionResponse, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionRequest, provider.GetProviderKey())
 }

 func (provider *HuggingFaceProvider) TextCompletionStream(ctx context.Context, postHookRunner schemas.PostHookRunner, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (chan *schemas.BifrostStream, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionStreamRequest, provider.GetProviderKey())
 }

700-707: Bug: Wrong request type in Embedding error messages.

Lines 701 and 706 use schemas.SpeechRequest instead of schemas.EmbeddingRequest when returning unsupported operation errors in the Embedding method.

Apply this diff:

 	if providerMapping == nil {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
 	}

 	mapping, ok := providerMapping[inferenceProvider]
 	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "feature-extraction" {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
 	}

773-776: Bug: Wrong task type check in Speech.

Line 774 checks for "automatic-speech-recognition" task, but Speech (text-to-speech) should check for "text-to-speech". The task checks appear to be swapped between Speech and Transcription functions.

Apply this diff:

 	mapping, ok := providerMapping[inferenceProvider]
-	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "automatic-speech-recognition" {
+	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "text-to-speech" {
 		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
 	}

831-838: Bug: Wrong task type check and error messages in Transcription.

Two issues in the Transcription function:

  1. Lines 832 and 837 use schemas.SpeechRequest instead of schemas.TranscriptionRequest in error messages
  2. Line 836 checks for "text-to-speech" task, but Transcription should check for "automatic-speech-recognition" (which is what Speech incorrectly checks at line 774, suggesting these are swapped)

Apply this diff:

 	if providerMapping == nil {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 	}

 	mapping, ok := providerMapping[inferenceProvider]
-	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "text-to-speech" {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "automatic-speech-recognition" {
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 	}
🧹 Nitpick comments (1)
core/internal/testutil/account.go (1)

530-876: Consider adding a ComprehensiveTestConfig entry for HuggingFace

GetConfiguredProviders and GetConfigForProvider now support schemas.HuggingFace, but AllProviderConfigs doesn’t define a ComprehensiveTestConfig for it. That means HuggingFace won’t participate in these cross‑provider end‑to‑end scenarios.

Recommend adding a HuggingFace entry here (with a representative chat/embedding model) so the new provider is exercised alongside the others.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a72875 and 00ecc06.

📒 Files selected for processing (28)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/chat_completion_stream.go (1 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types copy.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • ui/README.md
  • core/schemas/bifrost.go
  • core/internal/testutil/chat_completion_stream.go
  • core/bifrost.go
  • core/schemas/account.go
  • docs/features/unified-interface.mdx
  • ui/lib/constants/config.ts
  • .github/workflows/pr-tests.yml
  • .github/workflows/release-pipeline.yml
  • core/providers/huggingface/types copy.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/types.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • transports/config.schema.json
  • core/providers/huggingface/huggingface_test.go
  • core/schemas/mux.go
  • core/internal/testutil/responses_stream.go
  • core/providers/huggingface/transcription.go
  • ui/lib/constants/logs.ts
  • core/providers/huggingface/models.go
  • core/internal/testutil/account.go
  • docs/apis/openapi.json
  • Makefile
  • core/providers/huggingface/speech.go
  • ui/lib/constants/icons.tsx
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface.go
🧬 Code graph analysis (6)
core/providers/huggingface/huggingface_test.go (4)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (47-64)
  • TestScenarios (22-44)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (131-134)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-62)
core/providers/huggingface/transcription.go (4)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (328-333)
  • HuggingFaceTranscriptionRequestParameters (336-339)
  • HuggingFaceTranscriptionGenerationParameters (342-359)
  • HuggingFaceTranscriptionEarlyStopping (363-366)
  • HuggingFaceTranscriptionResponse (399-402)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (21-23)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (1)
  • Model (109-129)
core/internal/testutil/account.go (4)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/schemas/account.go (1)
  • Key (8-18)
core/schemas/provider.go (4)
  • ProviderConfig (234-242)
  • NetworkConfig (45-53)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (128-131)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/speech.go (3)
core/schemas/speech.go (2)
  • BifrostSpeechRequest (9-16)
  • BifrostSpeechResponse (22-29)
core/providers/huggingface/types.go (5)
  • HuggingFaceSpeechRequest (304-310)
  • HuggingFaceSpeechParameters (313-316)
  • HuggingFaceTranscriptionGenerationParameters (342-359)
  • HuggingFaceTranscriptionEarlyStopping (363-366)
  • HuggingFaceSpeechResponse (319-323)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (7)
core/providers/huggingface/types.go (5)
  • HuggingFaceChatResponse (129-136)
  • HuggingFaceTranscriptionResponse (399-402)
  • HuggingFaceSpeechResponse (319-323)
  • HuggingFaceResponseError (268-272)
  • HuggingFaceHubError (263-266)
core/providers/utils/utils.go (7)
  • GetRequestPath (219-239)
  • SetExtraHeaders (178-208)
  • HandleProviderAPIError (317-337)
  • CheckAndDecodeBody (423-431)
  • NewBifrostOperationError (449-460)
  • HandleProviderResponse (343-387)
  • ShouldSendBackRawResponse (482-487)
core/schemas/bifrost.go (4)
  • HuggingFace (51-51)
  • EmbeddingRequest (96-96)
  • BifrostStream (321-328)
  • ChatCompletionRequest (92-92)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (12-199)
core/schemas/mux.go (1)
  • ChatToResponsesStreamState (966-982)
core/schemas/responses.go (1)
  • BifrostResponsesResponse (45-83)
core/providers/huggingface/embedding.go (1)
  • ToHuggingFaceEmbeddingRequest (10-55)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1469-~1469: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1779-~1779: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (18)
core/internal/testutil/responses_stream.go (1)

692-695: Lifecycle stream safety cap increase looks appropriate

Bumping the lifecycle loop guard to responseCount > 300 keeps the infinite‑loop protection but avoids prematurely truncating providers that emit many small lifecycle events (e.g., more granular Hugging Face streaming), especially given the existing 200s context timeout. This looks like a safe, non‑breaking adjustment in the broader streaming test harness.

Makefile (3)

24-29: Color initialization refactor is cleaner and more portable.

Replacing hard-coded ANSI escape sequences with $(shell printf ...) improves readability and portability across different shells. This is a good improvement.


15-21: Binary path resolution logic is sound but relies on Go toolchain consistency.

The GOBIN/GOPATH detection and fallback logic correctly identifies where go install places binaries. The pattern of preferring explicit DEFAULT_GOBIN then falling back to which is robust for most environments. However, ensure downstream targets (e.g., dev, test) tolerate scenarios where binaries are missing or not in PATH—these targets already call install-air and install-gotestsum, which is good.

Verify that all targets invoking $(AIR_BIN) or $(GOTESTSUM_BIN) are guarded by corresponding install-* targets or handle missing binaries gracefully.


65-86: install-ui improvements handle root check and Next.js discovery well.

The root check (lines 67-70) prevents permission issues on developer machines. The multi-stage Next.js discovery (lines 76-85) gracefully handles systems without global npm directories (e.g., Nix). The fallback chain (local → npx → global) is well-reasoned for environment diversity.

transports/config.schema.json (2)

135-140: HuggingFace provider wired into transport config schema correctly

Using the generic #/$defs/provider ref for "huggingface" is consistent with other HTTP providers and looks good.


764-784: Semantic cache embedding provider enum correctly extended for HuggingFace

Including "huggingface" in the semanticcache plugin provider enum keeps the plugin in sync with the newly supported embedding provider.

docs/apis/openapi.json (1)

3239-3258: ModelProvider enum now exposes HuggingFace in OpenAPI

Adding "huggingface" to the ModelProvider enum keeps the public OpenAPI schema aligned with the backend provider set; change looks correct.

core/internal/testutil/account.go (1)

77-99: HuggingFace correctly added to configured providers

Including schemas.HuggingFace in GetConfiguredProviders keeps the comprehensive test account aligned with the new provider set.

ui/lib/constants/logs.ts (2)

2-20: KnownProvidersNames correctly extended with HuggingFace

Adding "huggingface" here ensures log views and filters recognize the new provider at the type level.


43-61: ProviderLabels updated for HuggingFace display name

The huggingface: "HuggingFace" label keeps the logs UI consistent with the rest of the product’s provider naming.

core/providers/huggingface/huggingface_test.go (1)

12-63: Test structure looks good overall.

The test follows the established pattern: parallel execution, environment variable check for API key, proper setup/teardown with defer cancel(), and client shutdown. The comprehensive test config enables a reasonable subset of scenarios for the HuggingFace provider.

core/providers/huggingface/models.go (2)

16-44: Model ID conversion logic looks correct.

The conversion properly handles nil input, pre-allocates the slice, skips models without IDs or supported methods, and constructs composite IDs with the provider prefix. The use of model.ModelID for the display name and model.ID for the HuggingFace reference appears intentional based on the type definitions.


46-104: Method derivation logic is well-structured.

The function correctly uses a set for deduplication, handles both pipeline tags and model tags with appropriate fallbacks, and returns a sorted slice for deterministic output. The broad tag matching (e.g., strings.Contains(tagLower, "embedding")) is reasonable for discovery purposes.
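For reference, the dedup-and-sort shape being described, in isolation (the pipeline-tag cases and method names here are illustrative, not the provider's actual constants):

package huggingface

import (
	"sort"
	"strings"
)

// deriveMethodsSketch collects request types implied by the pipeline tag and
// model tags into a set, then returns them sorted for deterministic output.
func deriveMethodsSketch(pipelineTag string, tags []string) []string {
	set := map[string]struct{}{}

	switch strings.ToLower(strings.TrimSpace(pipelineTag)) {
	case "text-generation", "conversational":
		set["chat.completions"] = struct{}{}
	case "feature-extraction", "sentence-similarity":
		set["embeddings"] = struct{}{}
	}

	for _, tag := range tags {
		if strings.Contains(strings.ToLower(tag), "embedding") {
			set["embeddings"] = struct{}{}
		}
	}

	methods := make([]string, 0, len(set))
	for m := range set {
		methods = append(methods, m)
	}
	sort.Strings(methods)
	return methods
}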

core/providers/huggingface/transcription.go (1)

100-139: Response conversion implementation is correct.

The ToBifrostTranscriptionResponse method properly validates inputs, maps the text response, and safely converts chunks to segments with appropriate bounds checking for timestamps.

core/providers/huggingface/speech.go (1)

9-24: Request and response conversion logic is well-implemented.

The ToHuggingFaceSpeechRequest properly validates input, extracts provider/model, and maps the text input. The ToBifrostSpeechResponse correctly handles nil cases and maps the audio data with appropriate extra fields.

Also applies to: 94-116

core/providers/huggingface/chat.go (1)

201-315: Response conversion implementations are comprehensive.

Both ToBifrostChatResponse and ToBifrostChatStreamResponse properly handle nil checks, convert choices with messages/deltas, map tool calls, logprobs, and usage information. The streaming response correctly maps the Reasoning field to Thought.

Also applies to: 317-413

core/providers/huggingface/utils.go (2)

83-128: URL construction for model hub API is well-implemented.

The buildModelHubURL function properly handles pagination limits with bounds checking, sets appropriate query parameters, and handles various ExtraParams types with type switching.
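A compact sketch of that shape (clamped page size, query parameters, optional provider filter) using net/url directly; the parameter names and the 100/1000 bounds are assumptions, not values taken from the PR:

package huggingface

import (
	"net/url"
	"strconv"
)

// buildHubListURLSketch clamps the page size and attaches sort/filter
// query parameters to the Hub list endpoint.
func buildHubListURLSketch(base string, limit int, inferenceProvider string) string {
	if limit <= 0 || limit > 1000 {
		limit = 100 // assumed default/ceiling
	}
	q := url.Values{}
	q.Set("limit", strconv.Itoa(limit))
	q.Set("sort", "downloads")
	if inferenceProvider != "" {
		q.Set("inference_provider", inferenceProvider)
	}
	return base + "/api/models?" + q.Encode()
}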


217-251: Model inference provider mapping retrieval is correctly implemented.

Proper use of fasthttp with deferred resource release, appropriate error handling for API responses, and clean conversion to internal mapping structure.

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from f279893 to 4663a80 Compare December 5, 2025 21:20
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (10)
core/providers/huggingface/utils.go (1)

168-201: Only a subset of INFERENCE_PROVIDERS are routable in getInferenceProviderRouteURL.

The switch handles fal-ai, hf-inference, nebius, replicate, sambanova, and scaleway, but other entries in INFERENCE_PROVIDERS (e.g., cerebras, cohere, groq, featherless-ai, fireworks-ai, hyperbolic, novita, nscale, ovhcloud, publicai, together, wavespeed, zai-org) fall through to the default error. That means models mapped to those providers (and surfaced via list-models) will currently be unusable.

Either (a) add routing cases for the remaining providers with correct default paths or (b) remove any providers you don’t intend to support yet from INFERENCE_PROVIDERS so they don’t appear as “supported” in filters and mappings.

core/internal/testutil/account.go (1)

259-266: Fix HuggingFace test API key env var name and optionally support HF_TOKEN.

This branch reads HUGGING_FACE_API_KEY, but the PR/docs use HUGGINGFACE_API_KEY and the provider is supposed to support HF_TOKEN as well. As written, CI that exports HUGGINGFACE_API_KEY/HF_TOKEN will see empty keys in tests.

Consider standardizing on HUGGINGFACE_API_KEY with an HF_TOKEN fallback:

-	case schemas.HuggingFace:
-		return []schemas.Key{
-			{
-				Value:  os.Getenv("HUGGING_FACE_API_KEY"),
-				Models: []string{},
-				Weight: 1.0,
-			},
-		}, nil
+	case schemas.HuggingFace:
+		key := os.Getenv("HUGGINGFACE_API_KEY")
+		if key == "" {
+			key = os.Getenv("HF_TOKEN")
+		}
+		return []schemas.Key{
+			{
+				Value:  key,
+				Models: []string{},
+				Weight: 1.0,
+			},
+		}, nil
Makefile (3)

21-21: Simplify GOTESTSUM_BIN fallback for consistency with AIR_BIN.

The shell which gotestsum 2>/dev/null || echo gotestsum fallback adds complexity without benefit. For consistency with AIR_BIN (line 20), use a simple fallback.


103-104: Replace which with command -v for proper full-path handling.

The which $(GOTESTSUM_BIN) fallback doesn't work correctly with full paths. Use command -v which handles both cases, or align with the AIR_BIN pattern using only [ -x ].


97-98: Incorrect use of which with full path variables.

At lines 97-98 (and 110-111), which $$INSTALLED is used where $$INSTALLED is an absolute path. The which command searches PATH by name, not by checking if a full path exists. Replace with a direct existence check: [ ! -x "$$INSTALLED" ].

core/providers/huggingface/chat.go (2)

93-96: Undefined variable debug will cause compilation error.

The variable debug is referenced but never defined in this file. This will prevent the code from compiling.

Either define the debug variable or remove the debug logging:

+// Package-level debug flag (set via build tags or environment)
+var debug = false
+
 func ToHuggingFaceChatCompletionRequest(bifrostReq *schemas.BifrostChatRequest) *HuggingFaceChatRequest {

Or remove the debug blocks entirely if not needed:

-		if debug {
-			fmt.Printf("[huggingface debug] Added tool_call_id=%s to tool message\n", *msg.ChatToolMessage.ToolCallID)
-		}

257-263: Same undefined debug variable issue.

This is the same compilation error as in the request converter - debug is undefined.

core/providers/huggingface/huggingface.go (3)

295-301: Bug: Wrong request type in unsupported operation errors.

Both TextCompletion and TextCompletionStream use schemas.EmbeddingRequest instead of their respective request types. This causes incorrect error messages.

Apply this diff:

 func (provider *HuggingFaceProvider) TextCompletion(ctx context.Context, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (*schemas.BifrostTextCompletionResponse, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionRequest, provider.GetProviderKey())
 }

 func (provider *HuggingFaceProvider) TextCompletionStream(ctx context.Context, postHookRunner schemas.PostHookRunner, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (chan *schemas.BifrostStream, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionStreamRequest, provider.GetProviderKey())
 }

713-716: Bug: Wrong request type in Embedding error message.

Line 715 uses schemas.SpeechRequest instead of schemas.EmbeddingRequest when the mapping check fails.

Apply this diff:

 	mapping, ok := providerMapping[inferenceProvider]
 	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "feature-extraction" {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
 	}

847-854: Bug: Wrong request type in Transcription error messages.

Lines 848 and 853 use schemas.SpeechRequest instead of schemas.TranscriptionRequest.

Apply this diff:

 	if providerMapping == nil {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 	}

 	mapping, ok := providerMapping[inferenceProvider]
 	if !ok || mapping.ProviderModelMapping == "" || mapping.ProviderTask != "automatic-speech-recognition" {
-		return nil, providerUtils.NewUnsupportedOperationError(schemas.SpeechRequest, provider.GetProviderKey())
+		return nil, providerUtils.NewUnsupportedOperationError(schemas.TranscriptionRequest, provider.GetProviderKey())
 	}
🧹 Nitpick comments (9)
docs/contributing/adding-a-provider.mdx (1)

41-75: Fix hyphenation for compound adjectives.

According to English grammar rules, compound adjectives should be hyphenated when they precede a noun. Update all instances of "OpenAI Compatible" to "OpenAI-compatible" when used as adjectives:

  • Line 43: "not OpenAI compatible" → "not OpenAI-compatible"
  • Line 71: "#### OpenAI Compatible Providers" → "#### OpenAI-Compatible Providers"
  • Line 629: "### OpenAI Compatible Providers" → "### OpenAI-Compatible Providers"
  • Line 1475: "### For OpenAI Compatible Providers" → "### For OpenAI-Compatible Providers"

Also update any related section titles and cross-references to maintain consistency throughout the document.

Also applies to: 629-632, 1475-1477

core/providers/huggingface/utils.go (2)

27-82: inferenceProvider constants and provider lists look consistent with HF docs.

INFERENCE_PROVIDERS and PROVIDERS_OR_POLICIES are well-structured; if you want slightly tighter typing, you could append the auto constant instead of the raw "auto" literal in PROVIDERS_OR_POLICIES, but the current form is functionally fine.


257-296: Int extraction helper is flexible; be aware of truncation semantics.

extractIntFromInterface handles most numeric variants (including json.Number) and falls back cleanly; just note that all float cases (and json.Number via Float64) are truncated via int(...), which is fine if all upstream numeric fields are expected to be integral.
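A condensed, standalone version of that helper (not the exact implementation) shows the truncation behavior:

package main

import (
	"encoding/json"
	"fmt"
)

// extractInt accepts the numeric shapes that JSON decoding commonly produces
// and truncates floats via int(...).
func extractInt(v interface{}) (int, bool) {
	switch n := v.(type) {
	case int:
		return n, true
	case int64:
		return int(n), true
	case float64:
		return int(n), true // truncation, as noted above
	case json.Number:
		if f, err := n.Float64(); err == nil {
			return int(f), true
		}
	}
	return 0, false
}

func main() {
	fmt.Println(extractInt(float64(42.9))) // 42 true
	fmt.Println(extractInt("nope"))        // 0 false
}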

core/providers/huggingface/huggingface_test.go (1)

24-57: HuggingFace scenarios omit embedding/audio tests despite models being configured.

You’ve set EmbeddingModel, TranscriptionModel, and SpeechSynthesisModel, but all corresponding scenario flags (Embedding, Transcription, TranscriptionStream, SpeechSynthesis, SpeechSynthesisStream) are false, so RunAllComprehensiveTests won’t actually exercise those paths. Once you’re confident in those flows, consider flipping these booleans to true so chat, embeddings, and audio are all covered end-to-end.

core/providers/huggingface/models.go (1)

46-104: deriveSupportedMethods’ pipeline/tag heuristics look solid and conservative.

Normalizing the pipeline tag, aggregating methods via a set, and falling back to tags for embeddings, chat/text, TTS, and ASR is a good balance between coverage and precision, and sorting the final method list keeps responses deterministic.

core/providers/huggingface/embedding.go (1)

10-13: Inconsistent nil handling - returning (nil, nil) may cause silent failures.

Returning (nil, nil) when bifrostReq is nil differs from other converters in this PR (e.g., ToBifrostEmbeddingResponse returns an error for nil input). This could lead to silent failures where the caller doesn't know if the request was successfully converted or if the input was nil.

Consider returning an explicit error for consistency:

 func ToHuggingFaceEmbeddingRequest(bifrostReq *schemas.BifrostEmbeddingRequest) (*HuggingFaceEmbeddingRequest, error) {
 	if bifrostReq == nil {
-		return nil, nil
+		return nil, fmt.Errorf("bifrost embedding request is nil")
 	}

Alternatively, if nil-in-nil-out is intentional, document this behavior with a comment.

core/providers/huggingface/speech.go (1)

9-12: Inconsistent nil handling pattern.

Similar to ToHuggingFaceEmbeddingRequest, returning (nil, nil) for nil input may cause silent failures. Consider returning an error or documenting the nil-in-nil-out behavior for consistency across the provider.

core/providers/huggingface/transcription.go (1)

124-129: Potential silent data loss when Timestamp has exactly one element.

The condition len(chunk.Timestamp) >= 2 handles empty and full timestamps, but if a chunk has exactly one timestamp element, both start and end will be zero. Consider logging or handling this edge case explicitly if it indicates malformed data.

core/providers/huggingface/chat.go (1)

36-38: Silently discarding marshalling errors could hide bugs.

Multiple locations discard sonic.Marshal errors with _. While this may be acceptable for optional fields, consider logging these errors at debug level to aid troubleshooting.

-			contentJSON, _ := sonic.Marshal(*msg.Content.ContentStr)
+			contentJSON, err := sonic.Marshal(*msg.Content.ContentStr)
+			if err != nil {
+				// Log error but continue - content will be empty
+				continue
+			}
 			hfMsg.Content = json.RawMessage(contentJSON)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00ecc06 and f279893.

📒 Files selected for processing (26)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • core/schemas/bifrost.go
  • core/bifrost.go
  • ui/lib/constants/logs.ts
  • transports/config.schema.json
  • .github/workflows/pr-tests.yml
  • core/schemas/account.go
  • core/schemas/mux.go
  • .github/workflows/release-pipeline.yml
  • ui/lib/constants/config.ts
  • core/internal/testutil/responses_stream.go
  • docs/features/unified-interface.mdx
  • ui/README.md
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • docs/apis/openapi.json
  • core/providers/huggingface/embedding.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/models.go
  • Makefile
  • core/providers/huggingface/speech.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/transcription.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (5)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (303-313)
  • HuggingFaceEmbeddingResponse (324-324)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (640-647)
core/internal/testutil/account.go (4)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/schemas/account.go (1)
  • Key (8-18)
core/schemas/provider.go (4)
  • ProviderConfig (234-242)
  • NetworkConfig (45-53)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (128-131)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/speech.go (4)
core/schemas/speech.go (2)
  • BifrostSpeechRequest (9-16)
  • BifrostSpeechResponse (22-29)
core/providers/huggingface/types.go (5)
  • HuggingFaceSpeechRequest (329-335)
  • HuggingFaceSpeechParameters (338-341)
  • HuggingFaceTranscriptionGenerationParameters (367-384)
  • HuggingFaceTranscriptionEarlyStopping (388-391)
  • HuggingFaceSpeechResponse (344-348)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (4)
core/providers/utils/utils.go (11)
  • GetProviderName (734-741)
  • GetRequestPath (219-239)
  • SetExtraHeaders (178-208)
  • MakeRequestWithContext (39-93)
  • HandleProviderAPIError (317-337)
  • NewBifrostOperationError (449-460)
  • HandleProviderResponse (343-387)
  • ShouldSendBackRawResponse (482-487)
  • NewUnsupportedOperationError (391-402)
  • ProcessAndSendBifrostError (571-601)
  • GetBifrostResponseForStreamResponse (777-805)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (12-206)
core/providers/huggingface/embedding.go (1)
  • ToHuggingFaceEmbeddingRequest (10-58)
core/providers/huggingface/transcription.go (1)
  • ToHuggingFaceTranscriptionRequest (9-101)
core/providers/huggingface/types.go (1)
core/schemas/models.go (1)
  • Model (109-129)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1475-~1475: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (31)
docs/apis/openapi.json (1)

3238-3258: ModelProvider enum now correctly exposes HuggingFace (and Cerebras) in the public API.

The added "cerebras" and "huggingface" enum values align with the new ModelProvider constants and make these providers addressable via the OpenAPI schema; no issues from a schema/compat perspective.

core/providers/huggingface/utils.go (3)

83-128: Model hub URL builder is robust and respects pagination and overrides.

The URL construction correctly clamps limit, sets sensible defaults (full, sort, direction, inference_provider), and then lets ExtraParams override as needed; this should work well for listing inference-provider-backed models.


203-255: Provider-mapping fetch and conversion are straightforward and aligned with fasthttp/providerUtils patterns.

The mapping conversion guards against nil, preserves tasks and model mappings, and the HTTP helper correctly acquires/releases fasthttp objects, handles non-200 responses via HandleProviderAPIError, and decodes JSON into the mapping type; this looks solid.


137-166: All call sites of splitIntoModelProvider properly handle the error return value. Every instance checks nameErr != nil and either propagates the error or wraps it in an appropriate error type (UnsupportedOperationError). The stricter validation of model name format is safely enforced across the codebase with no gaps in error handling.

core/internal/testutil/account.go (2)

77-99: Including HuggingFace in the configured providers set is correct.

Adding schemas.HuggingFace here keeps the test harness aligned with the new provider and ensures it participates in cross-provider setups.


512-524: HuggingFace ProviderConfig defaults look reasonable.

A 300s timeout, 10 retries, and moderate backoff (2s–30s) with standard concurrency/buffer mirror how other “variable” cloud providers are configured; this should be fine as a starting point.

core/providers/huggingface/models.go (1)

16-44: Model listing transformation correctly scopes IDs and filters unsupported models.

ToBifrostListModelsResponse sensibly skips models without IDs or derived methods, prefixes IDs with the Bifrost provider key, and stores the raw Hugging Face ID separately; this gives a clean, provider-scoped surface for /v1/models.

core/providers/huggingface/embedding.go (1)

60-96: LGTM!

The ToBifrostEmbeddingResponse method correctly converts HuggingFace embeddings to Bifrost format, properly handles nil input with an error, and documents that usage information is unavailable from the HuggingFace API.

core/providers/huggingface/speech.go (1)

98-119: LGTM!

The response converter properly validates the model name and correctly notes that HuggingFace TTS doesn't return usage or alignment data.

Makefile (1)

66-70: Good addition of root-user guard for local development.

Preventing make install-ui from running as root on developer machines avoids common permission issues with npm global installs.

core/providers/huggingface/transcription.go (1)

38-82: LGTM!

The integer parameter extraction correctly uses extractIntFromInterface to handle both int and float64 types from JSON unmarshalling, addressing the concern from previous reviews.

core/providers/huggingface/chat.go (2)

69-76: LGTM - nil pointer dereference fix is correctly implemented.

The code now safely handles a nil tc.Function.Name by using a default empty string, preventing potential panics.


324-420: LGTM!

The streaming response converter correctly handles delta fields, tool calls, logprobs, and usage conversion. The nil handling returning plain nil (without error) is appropriate for streaming contexts.

core/providers/huggingface/huggingface.go (10)

1-31: LGTM: Package setup and provider struct are well-structured.

The debug toggle via environment variable and the provider struct with proper configuration fields follow established patterns from other providers.


33-85: LGTM: Object pooling implementation is correct.

The acquire/release pattern with struct reset ensures clean state reuse. The nil checks in release functions prevent panics.


87-120: LGTM: Provider constructor follows established patterns.

Pre-warming response pools and proper configuration handling align with other provider implementations.


122-130: LGTM: Helper methods correctly delegate to utility functions.


132-205: LGTM: HTTP request handling is robust.

The response body copy at line 192 correctly prevents use-after-free when fasthttp releases its internal buffer. Error response parsing properly extracts HuggingFace-specific error details.
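The underlying hazard and fix, in isolation (a sketch, not the provider's completeRequest):

package main

import (
	"fmt"

	"github.com/valyala/fasthttp"
)

// doRequest copies the body before releasing the response: resp.Body() aliases
// fasthttp's pooled buffer, which is reused after ReleaseResponse.
func doRequest(client *fasthttp.Client, url string) ([]byte, error) {
	req := fasthttp.AcquireRequest()
	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseRequest(req)

	req.SetRequestURI(url)
	req.Header.SetMethod(fasthttp.MethodPost)

	if err := client.Do(req, resp); err != nil {
		fasthttp.ReleaseResponse(resp)
		return nil, err
	}

	body := append([]byte(nil), resp.Body()...) // detach from the pooled buffer
	fasthttp.ReleaseResponse(resp)              // safe: body no longer references it
	return body, nil
}

func main() {
	body, err := doRequest(&fasthttp.Client{}, "https://example.com")
	fmt.Println(len(body), err)
}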


207-293: LGTM: Model listing implementation handles both keyed and keyless modes correctly.

The operation check at line 279 correctly uses schemas.HuggingFace (the copy-paste issue from Gemini was fixed).


303-375: LGTM: ChatCompletion implementation is well-structured.

The model name splitting and reconstruction pattern correctly handles HuggingFace's modelName:inferenceProvider format. Response conversion and extra fields population follow established patterns.


377-659: LGTM: Streaming implementation is comprehensive.

The SSE parsing, context cancellation handling, and the workaround for combined usage+content chunks (lines 604-645) are well-documented and correctly implemented. Resource cleanup via defers ensures proper release of fasthttp resources.
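The core SSE loop, reduced to a standalone sketch over an in-memory stream; the buffer size and prefix handling are assumptions about typical SSE parsing, not the exact provider code:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanSSE grows the scanner buffer, skips comments and blank lines, stops on
// [DONE], and hands each "data:" payload to the handler.
func scanSSE(r *strings.Reader, handle func(payload string)) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // large chunks need a bigger buffer

	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, ":") {
			continue // SSE comment or keep-alive
		}
		payload, ok := strings.CutPrefix(line, "data:")
		if !ok {
			continue
		}
		payload = strings.TrimSpace(payload)
		if payload == "[DONE]" {
			return nil
		}
		handle(payload)
	}
	return scanner.Err()
}

func main() {
	stream := "data: {\"choices\":[]}\n\ndata: [DONE]\n"
	_ = scanSSE(strings.NewReader(stream), func(p string) { fmt.Println("chunk:", p) })
}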


661-683: LGTM: Responses API correctly adapts ChatCompletion endpoints.

The fallback pattern via context flag enables code reuse while maintaining proper response type conversion.


756-830: LGTM: Speech implementation correctly validates task type.

The task check at line 787 correctly validates "text-to-speech" for Speech operations (the swapped task check from past review was fixed).

core/providers/huggingface/types.go (8)

1-48: LGTM: Model response unmarshaling correctly handles both API formats.

The custom UnmarshalJSON at lines 30-48 properly handles both the top-level array format (current API) and the object format with a models field (backward compatibility). This addresses the past review concern.
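The dual-format handling reduces to something like this sketch (field and type names are illustrative):

package main

import (
	"encoding/json"
	"fmt"
)

type hubModel struct {
	ID string `json:"id"`
}

// listModels accepts either a bare array or an object wrapping a "models" field.
type listModels struct {
	Models []hubModel
}

func (l *listModels) UnmarshalJSON(data []byte) error {
	// Try the top-level array format first.
	var arr []hubModel
	if err := json.Unmarshal(data, &arr); err == nil {
		l.Models = arr
		return nil
	}
	// Fall back to the object format.
	var obj struct {
		Models []hubModel `json:"models"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return err
	}
	l.Models = obj.Models
	return nil
}

func main() {
	var a, b listModels
	_ = json.Unmarshal([]byte(`[{"id":"org/model"}]`), &a)
	_ = json.Unmarshal([]byte(`{"models":[{"id":"org/model"}]}`), &b)
	fmt.Println(a.Models, b.Models)
}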


50-68: LGTM: Inference provider mapping types are well-structured.

The separation between API response types (HuggingFaceInferenceProviderInfo) and internal types (HuggingFaceInferenceProviderMapping) provides clean abstraction.


69-213: LGTM: Chat types comprehensively cover OpenAI-compatible format.

The flexible use of json.RawMessage for content and tool_choice allows handling various input formats. Response types include full logprobs and usage support.


214-285: LGTM: Streaming types correctly model delta structures.

The streaming response types properly handle incremental content delivery with optional usage and timing information.


286-298: LGTM: Error types distinguish Hub and inference API responses.


299-324: LGTM: Embedding types support flexible input formats.

The interface{} type for Inputs correctly handles both single string and string array inputs required by the feature extraction API.


326-349: LGTM: Speech types appropriately model text-to-speech API.


350-436: LGTM: Transcription types handle complex union types correctly.

The custom MarshalJSON/UnmarshalJSON for HuggingFaceTranscriptionEarlyStopping properly handles the boolean or string ("never") union type. The type aliases at lines 435-436 provide convenient access.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
Makefile (3)

97-98: Simplify binary availability check—which does not work with full paths.

The condition if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then contains a redundant and confusing which call. Since $$INSTALLED is an absolute path (e.g., /home/user/go/bin/air), which will always fail—making the && logic unintuitive. Simply check if the file is executable.

-		if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
-			echo "$(YELLOW)Note: the installed air binary may not be on your PATH...$(NC)"; \
+		if [ ! -x "$$INSTALLED" ]; then \
+			echo "$(YELLOW)Note: the installed air binary may not be on your PATH...$(NC)"; \

110-111: Simplify binary availability check—same issue as install-air (line 97).

Apply the same simplification to install-gotestsum: remove the redundant which call and rely solely on the [ -x ] check.

-		if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
-			echo "$(YELLOW)Note: the installed gotestsum binary may not be on your PATH...$(NC)"; \
+		if [ ! -x "$$INSTALLED" ]; then \
+			echo "$(YELLOW)Note: the installed gotestsum binary may not be on your PATH...$(NC)"; \

103-103: Replace which with a more portable check or simplify to match AIR_BIN pattern.

Line 103 uses which $(GOTESTSUM_BIN) as a fallback after the [ -x ] check. Since GOTESTSUM_BIN may be a full path, which is unreliable. Either replace with command -v (which works for both names and full paths) or simplify to just [ -x ] for consistency with the AIR_BIN pattern on line 90.

Option 1 (recommended): Simplify to match the AIR_BIN pattern:

- @if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
+ @if [ -x "$(GOTESTSUM_BIN)" ]; then \

Option 2: Use command -v for portability:

- @if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
+ @if [ -x "$(GOTESTSUM_BIN)" ] || command -v $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
core/providers/huggingface/huggingface.go (1)

295-301: Fix request type in unsupported TextCompletion operations.

Both TextCompletion and TextCompletionStream return an unsupported-operation error but incorrectly tag it as schemas.EmbeddingRequest instead of the appropriate text-completion request types. That makes error classification and telemetry misleading.

Consider updating to:

 func (provider *HuggingFaceProvider) TextCompletion(ctx context.Context, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (*schemas.BifrostTextCompletionResponse, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionRequest, provider.GetProviderKey())
 }

 func (provider *HuggingFaceProvider) TextCompletionStream(ctx context.Context, postHookRunner schemas.PostHookRunner, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (chan *schemas.BifrostStream, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionStreamRequest, provider.GetProviderKey())
 }
🧹 Nitpick comments (2)
core/providers/huggingface/types.go (1)

405-421: Consider returning an error for invalid EarlyStopping values.

The UnmarshalJSON method returns nil (line 420) when the value is neither a boolean nor a string. This silently ignores invalid values. Consider returning an error for non-null values that don't match expected types to catch malformed API responses.

Apply this diff to add error handling:

 func (e *HuggingFaceTranscriptionEarlyStopping) UnmarshalJSON(data []byte) error {
+	// Handle null explicitly
+	if string(data) == "null" {
+		return nil
+	}
+
 	// Try boolean first
 	var boolVal bool
 	if err := json.Unmarshal(data, &boolVal); err == nil {
 		e.BoolValue = &boolVal
 		return nil
 	}
 
 	// Try string
 	var stringVal string
 	if err := json.Unmarshal(data, &stringVal); err == nil {
 		e.StringValue = &stringVal
 		return nil
 	}
 
-	return nil
+	return fmt.Errorf("early_stopping must be a boolean or string, got: %s", string(data))
 }
core/providers/huggingface/huggingface.go (1)

685-754: Embedding implementation matches the expected routing and extra-field semantics.

The Embedding path:

  • Checks operation permissions for HuggingFace/Embedding.
  • Builds the HF request via ToHuggingFaceEmbeddingRequest.
  • Splits the model into inferenceProvider and modelName, then resolves a getModelInferenceProviderMapping entry and validates ProviderTask == "feature-extraction".
  • Uses the provider-specific model id and getInferenceProviderRouteURL to derive the target URL, then executes completeRequest.
  • Converts HuggingFaceEmbeddingResponse into a BifrostEmbeddingResponse and fills ExtraFields (provider, model requested, request type, latency, raw response).

This is correct and consistent with how embeddings are handled for other providers.

As a non-blocking improvement, the repeated pattern of splitting the model, looking up getModelInferenceProviderMapping, validating ProviderTask, and deriving the route URL (here and in Speech/Transcription) could be factored into a small helper to avoid drift if the mapping rules ever change.
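A rough shape such a helper could take, with hypothetical types and a placeholder splitter (see the parsing sketch later in this review):

package main

import "fmt"

// mapping is a stand-in for the provider's inference-provider mapping entry.
type mapping struct {
	ProviderModelMapping string
	ProviderTask         string
}

// resolveRoute bundles the repeated steps: split the model, look up the mapping,
// validate the task, and derive the route URL in one place.
func resolveRoute(model, wantTask string, mappings map[string]mapping,
	route func(provider, model string) (string, error)) (string, string, error) {

	provider, name, err := splitModel(model)
	if err != nil {
		return "", "", err
	}
	m, ok := mappings[provider]
	if !ok || m.ProviderModelMapping == "" || m.ProviderTask != wantTask {
		return "", "", fmt.Errorf("model %q is not mapped for task %q on %q", name, wantTask, provider)
	}
	url, err := route(provider, m.ProviderModelMapping)
	return url, m.ProviderModelMapping, err
}

func splitModel(model string) (provider, name string, err error) {
	// Placeholder for splitIntoModelProvider.
	return "hf-inference", model, nil
}

func main() {
	url, mapped, err := resolveRoute("org/model", "feature-extraction",
		map[string]mapping{"hf-inference": {ProviderModelMapping: "org/model", ProviderTask: "feature-extraction"}},
		func(p, m string) (string, error) { return "https://router.huggingface.co/" + p + "/models/" + m, nil })
	fmt.Println(url, mapped, err)
}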

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00ecc06 and 4663a80.

📒 Files selected for processing (26)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • ui/README.md
  • core/schemas/mux.go
  • .github/workflows/release-pipeline.yml
  • .github/workflows/pr-tests.yml
  • docs/apis/openapi.json
  • core/schemas/account.go
  • transports/config.schema.json
  • ui/lib/constants/logs.ts
  • docs/features/unified-interface.mdx
  • core/internal/testutil/responses_stream.go
  • core/schemas/bifrost.go
  • core/internal/testutil/account.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/bifrost.go
  • ui/lib/constants/config.ts
  • core/providers/huggingface/embedding.go
  • docs/contributing/adding-a-provider.mdx
  • Makefile
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/transcription.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/speech.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (9)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (88-120)
core/providers/huggingface/embedding.go (2)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (303-313)
  • HuggingFaceEmbeddingResponse (324-324)
core/providers/huggingface/chat.go (2)
core/schemas/chatcompletions.go (2)
  • BifrostChatRequest (12-19)
  • LogProb (625-629)
core/providers/huggingface/types.go (12)
  • HuggingFaceChatRequest (72-92)
  • HuggingFaceChatMessage (94-102)
  • HuggingFaceContentItem (105-109)
  • HuggingFaceImageRef (111-113)
  • HuggingFaceToolCall (115-119)
  • HuggingFaceFunction (121-125)
  • HuggingFaceResponseFormat (127-130)
  • HuggingFaceStreamOptions (139-141)
  • HuggingFaceTool (143-146)
  • HuggingFaceToolFunction (148-152)
  • HuggingFaceChatResponse (154-161)
  • HuggingFaceChatStreamResponse (215-224)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (24-26)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/huggingface/utils.go (2)
core/providers/huggingface/huggingface.go (1)
  • HuggingFaceProvider (25-31)
core/providers/utils/utils.go (5)
  • GetRequestPath (219-239)
  • MakeRequestWithContext (39-93)
  • HandleProviderAPIError (317-337)
  • CheckAndDecodeBody (423-431)
  • NewBifrostOperationError (449-460)
core/providers/huggingface/transcription.go (3)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
core/providers/huggingface/types.go (4)
  • HuggingFaceTranscriptionRequest (353-358)
  • HuggingFaceTranscriptionRequestParameters (361-364)
  • HuggingFaceTranscriptionGenerationParameters (367-384)
  • HuggingFaceTranscriptionEarlyStopping (388-391)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (5)
core/providers/huggingface/types.go (4)
  • HuggingFaceChatResponse (154-161)
  • HuggingFaceResponseError (293-297)
  • HuggingFaceHubError (288-291)
  • HuggingFaceEmbeddingResponse (324-324)
core/providers/utils/utils.go (5)
  • SetExtraHeaders (178-208)
  • MakeRequestWithContext (39-93)
  • HandleProviderAPIError (317-337)
  • CheckAndDecodeBody (423-431)
  • NewBifrostOperationError (449-460)
core/schemas/bifrost.go (15)
  • ModelProvider (32-32)
  • HuggingFace (51-51)
  • RequestType (86-86)
  • BifrostError (356-365)
  • ErrorField (374-381)
  • ListModelsRequest (89-89)
  • EmbeddingRequest (96-96)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • ResponsesStreamRequest (95-95)
  • ResponsesRequest (94-94)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (12-207)
core/providers/huggingface/embedding.go (1)
  • ToHuggingFaceEmbeddingRequest (10-58)
core/providers/huggingface/huggingface_test.go (5)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (47-64)
  • TestScenarios (22-44)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (131-134)
core/schemas/models.go (1)
  • Model (109-129)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-62)
core/providers/huggingface/speech.go (3)
core/schemas/speech.go (2)
  • BifrostSpeechRequest (9-16)
  • BifrostSpeechResponse (22-29)
core/providers/huggingface/types.go (4)
  • HuggingFaceSpeechRequest (329-335)
  • HuggingFaceSpeechParameters (338-341)
  • HuggingFaceTranscriptionGenerationParameters (367-384)
  • HuggingFaceTranscriptionEarlyStopping (388-391)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1475-~1475: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (34)
Makefile (5)

15-21: ✓ Well-structured tool path resolution with sensible fallbacks.

The new DEFAULT_GOBIN variable elegantly prioritizes GOBIN → GOPATH/bin → empty, and AIR_BIN and GOTESTSUM_BIN consistently prefer local paths. This avoids hardcoding tool expectations and gracefully handles systems with custom Go installations (e.g., Nix).


24-29: ✓ Dynamic ANSI color codes improve portability and maintainability.

Using shell printf instead of literal escape sequences is more portable and easier to read. This pattern is cleaner than hardcoded escape codes.


65-70: ✓ Root check and multi-fallback Next.js detection enhance developer experience.

Lines 67-70 prevent root-user npm installs on local dev machines, which avoids permission issues on NixOS and similar systems. Lines 76-85 provide three tiers of fallback (local node_modules, npx, global), making the install flow robust across different environments.

Also applies to: 75-85


155-155: ✓ Consistent use of $(AIR_BIN) variable throughout dev target.

Both debug and non-debug paths now use the centralized $(AIR_BIN) variable, ensuring the tool path is resolved consistently.

Also applies to: 163-163


349-349: ✓ $(GOTESTSUM_BIN) consistently used across all test targets.

All test invocations now reference $(GOTESTSUM_BIN) instead of the bare gotestsum command, ensuring they use the resolved tool path. This improves reliability on systems where gotestsum is installed in non-standard locations.

Also applies to: 401-401, 425-425, 454-454, 548-548

core/bifrost.go (2)

26-26: LGTM!

The import statement is correctly placed in alphabetical order and follows Go conventions.


1327-1328: LGTM!

The HuggingFace provider is correctly wired into the factory switch with the proper constructor call and return pattern, consistent with other providers.

core/providers/huggingface/types.go (6)

11-48: LGTM!

The custom UnmarshalJSON implementation properly handles both the array and object response formats from the HuggingFace API, addressing the earlier review concern.


90-92: Clarify the purpose of the Extra field.

The Extra field is tagged with json:"-", which means it won't be marshaled or unmarshaled. If the intent is to capture unknown additional fields from the API response, you'd need custom UnmarshalJSON logic. If it's only for application-level metadata (not populated from the API), the current approach is fine.
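If capturing unknown fields is the goal, one common approach is a two-pass unmarshal into a known struct plus a raw map; the sketch below uses a stand-in type, not the provider's message struct:

package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical type with an Extra bag for unrecognized fields.
type message struct {
	Role    string                     `json:"role"`
	Content string                     `json:"content"`
	Extra   map[string]json.RawMessage `json:"-"`
}

func (m *message) UnmarshalJSON(data []byte) error {
	type alias message // avoid recursing into this UnmarshalJSON
	var known alias
	if err := json.Unmarshal(data, &known); err != nil {
		return err
	}
	var all map[string]json.RawMessage
	if err := json.Unmarshal(data, &all); err != nil {
		return err
	}
	delete(all, "role")
	delete(all, "content")
	*m = message(known)
	m.Extra = all
	return nil
}

func main() {
	var m message
	_ = json.Unmarshal([]byte(`{"role":"user","content":"hi","x_custom":1}`), &m)
	fmt.Println(m.Role, string(m.Extra["x_custom"]))
}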


288-297: LGTM!

The error types properly represent different HuggingFace API error response formats.


303-324: LGTM!

The embedding types properly handle flexible input formats and the response structure matches HuggingFace's feature extraction API output.


329-348: LGTM!

The speech types follow a consistent pattern with the chat types. Note that the Extra fields with json:"-" tags won't capture unknown API fields (same consideration as the chat request Extra field).


435-436: LGTM!

The type aliases provide convenient alternative names for shared generation parameters, improving code readability.

ui/lib/constants/config.ts (2)

14-14: LGTM!

The HuggingFace model placeholder examples demonstrate both chat and embedding models with proper HuggingFace Hub naming conventions.


34-34: LGTM!

Correctly marks HuggingFace as requiring an API key, consistent with other cloud-based inference providers.

core/providers/huggingface/huggingface_test.go (1)

12-63: LGTM! Test configuration is well-structured.

The test setup correctly:

  • Checks for API key before running
  • Uses parallel execution
  • Configures comprehensive scenarios
  • Enables appropriate features (chat, streaming, tool calls, vision)
  • Disables unsupported features (text completion, embedding, speech, transcription)

The model assignments appear correct and the disabled scenarios align with the provider's current capabilities.

core/providers/huggingface/models.go (1)

16-104: LGTM! Model listing conversion is well-implemented.

The code demonstrates good practices:

  • Proper nil handling and input validation
  • Pre-allocated slices for performance
  • Skip logic for invalid models (empty ID or no supported methods)
  • Comprehensive method derivation from both pipeline tags and model tags
  • Deduplication using a map-based approach
  • Sorted output for consistency

The logic correctly maps HuggingFace model metadata to Bifrost's schema.

core/providers/huggingface/embedding.go (1)

10-96: LGTM! Embedding conversion logic is solid.

The implementation correctly handles:

  • Nil input validation
  • Model/provider extraction via splitIntoModelProvider
  • Mapping single text vs. array of texts
  • Provider-specific parameters from ExtraParams
  • Embedding array construction with proper indexing
  • Missing usage information (documented that HF doesn't provide it)

The converters follow the established pattern and handle both request and response transformations cleanly.

core/providers/huggingface/transcription.go (1)

9-142: LGTM! Transcription converters handle complex parameter mapping well.

The implementation demonstrates:

  • Proper input validation with clear error messages
  • Correct use of extractIntFromInterface to handle JSON numeric types (addressing past review concerns)
  • Comprehensive generation parameter mapping (do_sample, max_new_tokens, temperature, etc.)
  • Proper handling of the polymorphic early_stopping field (bool or string)
  • Segment conversion with timestamp preservation
  • Clean separation of concerns

Previous issues with error messages and integer type assertions have been addressed.

docs/contributing/adding-a-provider.mdx (1)

1-2070: Documentation is comprehensive and well-structured.

The guide provides excellent coverage of:

  • Clear distinction between OpenAI-compatible and custom API providers
  • Step-by-step implementation phases with proper ordering
  • File structure and naming conventions
  • Type definitions and validation requirements
  • Converter patterns with real examples
  • Test setup and CI/CD integration
  • UI and configuration updates

The reference code examples serve their purpose of illustrating patterns without needing to be complete implementations (as clarified by the author in past reviews).

core/providers/huggingface/speech.go (1)

9-120: LGTM! Speech synthesis converters are well-implemented.

The code follows the same robust patterns as transcription:

  • Input validation with clear error messages
  • Proper use of extractIntFromInterface for JSON numeric handling
  • Comprehensive generation parameter mapping
  • Handling of polymorphic early_stopping field
  • Clear documentation that HF TTS doesn't provide usage/alignment data
  • Clean response construction with proper metadata

Previous type assertion issues have been resolved.

core/providers/huggingface/chat.go (1)

12-421: LGTM! Chat converters handle complex transformations correctly.

The implementation demonstrates excellent handling of:

  • Nil pointer protection for tool calls (fixed from past review)
  • Message content conversion (string vs. structured blocks)
  • Image URL handling for vision models
  • Tool call conversion with proper nil checks
  • Response format and stream options
  • Debug logging (variable properly defined in package scope)
  • Logprobs and top_logprobs conversion
  • Streaming delta transformation

The code properly handles both non-streaming and streaming responses with appropriate type conversions.

core/providers/huggingface/utils.go (3)

137-166: Good fix for model name validation.

The splitIntoModelProvider function now properly handles invalid input (no slashes) by returning an error instead of producing empty strings. The debug logging helps trace the parsing logic, and the distinction between single-slash (org/model) and multi-slash (provider/org/model) formats is clear.
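For clarity, the parsing rule reduces to roughly the following (an illustrative sketch, not the exact function):

package main

import (
	"fmt"
	"strings"
)

// splitModelProvider handles the two formats described above: "org/model"
// (provider defaults to auto) and "provider/org/model".
func splitModelProvider(model string) (provider, name string, err error) {
	parts := strings.Split(model, "/")
	switch {
	case len(parts) == 2: // org/model
		return "auto", model, nil
	case len(parts) >= 3: // provider/org/model...
		return parts[0], strings.Join(parts[1:], "/"), nil
	default: // no slash at all: reject instead of returning empty strings
		return "", "", fmt.Errorf("invalid model name %q: expected org/model or provider/org/model", model)
	}
}

func main() {
	fmt.Println(splitModelProvider("hf-inference/meta-llama/Llama-3.1-8B-Instruct"))
	fmt.Println(splitModelProvider("gpt2"))
}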


168-201: Provider routing correctly limited to supported operations.

As clarified in past reviews, only 6 providers (fal-ai, hf-inference, nebius, replicate, sambanova, scaleway) support embedding/speech/transcription operations. The other 13 providers in INFERENCE_PROVIDERS are used for chat/text-generation, which follows a different routing pattern. The error messages have been corrected to reference the appropriate provider names.


257-296: Excellent utility function for JSON numeric handling.

The extractIntFromInterface helper comprehensively handles all numeric types that can result from JSON unmarshaling:

  • All signed and unsigned integer types
  • Float types (with conversion)
  • json.Number with fallback parsing

This is used throughout the provider to safely extract integer parameters from ExtraParams, addressing the type assertion issues flagged in earlier reviews.

core/providers/huggingface/huggingface.go (9)

33-85: Response pooling helpers look correct and safe to reuse.

The sync.Pool setup and acquire/release helpers for HuggingFaceChatResponse, HuggingFaceTranscriptionResponse, and HuggingFaceSpeechResponse correctly reset structs on acquire and only reuse them after the call site is done, which avoids stale state and minimizes allocations. No changes needed here.


87-131: Provider initialization and URL handling are consistent with other providers.

NewHuggingFaceProvider, GetProviderKey, and buildRequestURL follow the existing provider patterns: they apply defaults, configure the fasthttp client (including proxy), normalize BaseURL, honor CustomProviderConfig, and respect per-request path overrides. This looks good and matches the broader design.


132-205: Core HTTP request path is robust and correctly decouples the response body.

completeRequest cleanly centralizes request construction, extra-header injection, auth, context-aware execution, non-200 handling via HandleProviderAPIError, gzip-aware decoding via CheckAndDecodeBody, and copies the body before releasing the fasthttp response to avoid use-after-free. The debug logging is also appropriately gated. No functional issues from this implementation.


207-293: Model listing logic is aligned with Bifrost patterns.

listModelsByKey and ListModels correctly apply operation-allowed checks, build the model hub URL, attach auth and extra headers, handle HTTP and provider-level errors, decode via HuggingFaceListModelsResponse, and then delegate fan-out across keys via HandleMultipleListModelsRequests, including latency and optional raw-response propagation. This looks consistent and complete.


303-375: ChatCompletion request/response wiring looks solid.

The ChatCompletion path correctly:

  • Checks operation permissions.
  • Normalizes the model via splitIntoModelProvider before building the HF request.
  • Uses CheckContextAndGetRequestBody with ToHuggingFaceChatCompletionRequest and explicitly disables streaming.
  • Builds the URL with buildRequestURL, calls completeRequest, and uses the pooled HuggingFaceChatResponse for JSON decoding.
  • Converts to BifrostChatResponse and fills ExtraFields (provider, requested model, request type, latency, raw response when enabled).

This end-to-end flow is cohesive and matches the intended Bifrost provider contract.


377-659: Streaming chat implementation handles SSE and Responses fallback correctly.

The ChatCompletionStream implementation is careful and comprehensive: it validates operation permissions, handles the Responses→Chat fallback via a stream state object, builds a streaming POST with Accept: text/event-stream, and uses bufio.Scanner on BodyStream() with an increased buffer. The loop:

  • Filters comments/empty lines and [DONE].
  • Parses SSE data: lines, detects error payloads via HuggingFaceResponseError and sends a structured BifrostError.
  • Decodes normal chunks into HuggingFaceChatStreamResponse, converts to Bifrost stream responses, populates per-chunk metadata (including latency and chunk index), and optionally attaches raw JSON.
  • Handles the combined usage+content case for Responses fallback by splitting content/usage into separate events.
  • Propagates scanner errors via ProcessAndSendError and ensures resources/channels are released.

This is a solid streaming implementation; no blocking issues identified.


661-683: Responses and ResponsesStream wrappers are straightforward and correct.

These methods reuse the ChatCompletion/ChatCompletionStream implementations and only adapt to the Responses API shape, setting ExtraFields.RequestType, Provider, and ModelRequested appropriately. This is a good reuse of existing logic and behaves as expected.


756-830: Speech and SpeechStream behaviors are coherent with provider capabilities.

Speech correctly reuses the model-mapping pattern (including ProviderTask == "text-to-speech"), builds the HF-specific speech request via ToHuggingFaceSpeechRequest, executes completeRequest, and converts the pooled HuggingFaceSpeechResponse into a BifrostSpeechResponse with fully populated ExtraFields (including latency and optional raw response). SpeechStream explicitly returns an unsupported-operation error tagged with schemas.SpeechStreamRequest, which is appropriate if streaming speech isn’t supported yet.


832-906: Transcription and TranscriptionStream correctly target ASR and error types.

Transcription:

  • Checks operation permissions for schemas.TranscriptionRequest.
  • Splits model/provider and resolves a mapping where ProviderTask == "automatic-speech-recognition".
  • Builds the body via ToHuggingFaceTranscriptionRequest, derives the inference route URL with getInferenceProviderRouteURL, and calls completeRequest.
  • Uses the pooled HuggingFaceTranscriptionResponse to decode and convert to BifrostTranscriptionResponse, filling provider/model/request-type/latency/RawResponse metadata.

TranscriptionStream cleanly returns an unsupported-operation error with the correct schemas.TranscriptionStreamRequest tag. This aligns with the intended ASR behavior and prior fixes to the task and error types.

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 4663a80 to 53cb91d on December 5, 2025 at 23:10
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
Makefile (3)

97-98: DUPLICATE COMMENT: which with full path variable—still present despite prior review.

This pattern was flagged in a previous review (commit f279893) but remains in the code. While the AND logic with [ ! -x "$$INSTALLED" ] does provide fallback protection, the condition ! which $$INSTALLED will always be true for full paths (since which cannot resolve absolute paths). Simplify by removing the redundant which check:

- if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
+ if [ ! -x "$$INSTALLED" ]; then \

103-104: DUPLICATE COMMENT: which fallback for full-path variable—unclear intent and inconsistent with AIR_BIN pattern.

This pattern was flagged in a previous review (commit f279893). Although the [ -x ] check comes first and will handle full paths correctly (short-circuiting the which fallback), the fallback itself is ineffective for full paths and inconsistent with the simpler [ -x "$(AIR_BIN)" ] check used for AIR_BIN on line 90. Consider aligning both patterns:

- @if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
+ @if [ -x "$(GOTESTSUM_BIN)" ]; then \

110-111: DUPLICATE COMMENT: which with full path variable—same issue as lines 97–98.

This pattern was flagged in a previous review (commit f279893) but persists here. The AND logic provides practical protection, but the which check is redundant for full paths. Apply the same simplification as line 97:

- if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
+ if [ ! -x "$$INSTALLED" ]; then \
core/providers/huggingface/huggingface.go (1)

381-387: Bug: Incorrect request type in unsupported operation errors.

Both TextCompletion and TextCompletionStream return errors with schemas.EmbeddingRequest instead of the correct request types.

Apply this diff to fix the error types:

 func (provider *HuggingFaceProvider) TextCompletion(ctx context.Context, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (*schemas.BifrostTextCompletionResponse, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionRequest, provider.GetProviderKey())
 }

 func (provider *HuggingFaceProvider) TextCompletionStream(ctx context.Context, postHookRunner schemas.PostHookRunner, key schemas.Key, request *schemas.BifrostTextCompletionRequest) (chan *schemas.BifrostStream, *schemas.BifrostError) {
-	return nil, providerUtils.NewUnsupportedOperationError(schemas.EmbeddingRequest, provider.GetProviderKey())
+	return nil, providerUtils.NewUnsupportedOperationError(schemas.TextCompletionStreamRequest, provider.GetProviderKey())
 }
🧹 Nitpick comments (3)
Makefile (1)

75-85: Next.js fallback chain is reasonable but could be simplified.

The logic prefers local ./node_modules/.bin/next, then npx, then global install. However, the condition structure is a bit convoluted:

  • Line 77: checks if local next exists
  • Line 79: only runs npm install if npx is found (but the intent seems to be to install next if not found)
  • Line 83: fallback comment mentions "may fail on Nix"

The logic works, but consider clarifying the intent—should the npm install on line 81 run unconditionally if local next is missing, or should it depend on npx availability? Currently, it only runs npm install if npx exists, which may not match the intent.

core/schemas/account.go (1)

54-57: Consider adding an Endpoint field for self-hosted deployments.

The current implementation only includes Deployments, which covers model-to-deployment mappings. Per the linked issue, self-hosted endpoints are an optional enhancement. For future extensibility, consider whether an Endpoint field (similar to AzureKeyConfig) would be beneficial for users deploying models on custom infrastructure or using dedicated inference endpoints.

This is non-blocking for the initial implementation.

 type HuggingFaceKeyConfig struct {
+	Endpoint    string            `json:"endpoint,omitempty"`    // Custom HuggingFace inference endpoint URL
 	Deployments map[string]string `json:"deployments,omitempty"` // Mapping of model identifiers to deployment names
 }
core/providers/huggingface/embedding.go (1)

60-104: Consider simplifying zero-usage handling.

The conversion logic is correct. However, lines 94-100 explicitly create a zero-valued BifrostLLMUsage when the HuggingFace response doesn't include usage. You could simplify by leaving Usage as nil when not provided, since the Bifrost schema allows it to be omitted.

If you prefer explicit zero values, the current implementation is fine. Otherwise, you can remove lines 94-100:

 	// Map usage information if available
 	if response.Usage != nil {
 		bifrostResponse.Usage = &schemas.BifrostLLMUsage{
 			PromptTokens:     response.Usage.PromptTokens,
 			CompletionTokens: response.Usage.CompletionTokens,
 			TotalTokens:      response.Usage.TotalTokens,
 		}
-	} else {
-		// Set empty usage if not provided
-		bifrostResponse.Usage = &schemas.BifrostLLMUsage{
-			PromptTokens:     0,
-			CompletionTokens: 0,
-			TotalTokens:      0,
-		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4663a80 and 53cb91d.

📒 Files selected for processing (26)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (1 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • ui/README.md
  • core/schemas/mux.go
  • core/internal/testutil/account.go
  • transports/config.schema.json
  • core/providers/huggingface/utils.go
  • .github/workflows/pr-tests.yml
  • docs/apis/openapi.json
  • core/providers/huggingface/speech.go
  • docs/features/unified-interface.mdx
  • ui/lib/constants/config.ts
  • core/bifrost.go
  • core/internal/testutil/responses_stream.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/schemas/bifrost.go
  • core/schemas/account.go
  • core/providers/huggingface/transcription.go
  • docs/contributing/adding-a-provider.mdx
  • Makefile
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/chat.go
  • ui/lib/constants/logs.ts
  • core/providers/huggingface/models.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (6)
core/schemas/bifrost.go (1)
ui/lib/types/config.ts (1)
  • ModelProvider (171-174)
core/providers/huggingface/transcription.go (6)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (376-381)
  • HuggingFaceTranscriptionRequestParameters (384-387)
  • HuggingFaceTranscriptionGenerationParameters (390-407)
  • HuggingFaceTranscriptionEarlyStopping (411-414)
  • HuggingFaceTranscriptionResponse (447-450)
ui/components/ui/input.tsx (1)
  • Input (15-69)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/elevenlabs/transcription.go (1)
  • ToBifrostTranscriptionResponse (100-150)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (287-296)
  • HuggingFace (51-51)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (303-313)
  • HuggingFaceEmbeddingResponse (324-328)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (640-647)
core/providers/huggingface/chat.go (2)
core/schemas/chatcompletions.go (12)
  • BifrostChatRequest (12-19)
  • ChatContentBlockTypeText (497-497)
  • ChatContentBlockTypeImage (498-498)
  • ChatAssistantMessage (541-545)
  • ChatToolMessage (536-538)
  • BifrostResponseChoice (582-590)
  • ChatAssistantMessageToolCall (564-570)
  • ChatNonStreamResponseChoice (605-608)
  • BifrostLogProbs (593-598)
  • LogProb (625-629)
  • ChatStreamResponseChoice (611-613)
  • ChatStreamResponseChoiceDelta (616-622)
core/providers/huggingface/types.go (12)
  • HuggingFaceChatRequest (72-92)
  • HuggingFaceChatMessage (94-102)
  • HuggingFaceContentItem (105-109)
  • HuggingFaceImageRef (111-113)
  • HuggingFaceToolCall (115-119)
  • HuggingFaceFunction (121-125)
  • HuggingFaceResponseFormat (127-130)
  • HuggingFaceStreamOptions (139-141)
  • HuggingFaceTool (143-146)
  • HuggingFaceToolFunction (148-152)
  • HuggingFaceChatResponse (154-161)
  • HuggingFaceChatStreamResponse (215-224)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (24-26)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (2)
  • BifrostListModelsResponse (36-45)
  • Model (109-129)
core/providers/huggingface/types.go (1)
core/schemas/models.go (1)
  • Model (109-129)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1475-~1475: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (25)
.github/workflows/release-pipeline.yml (1)

118-118: Environment variable additions are correct and necessary.

The HUGGING_FACE_API_KEY environment variable is properly added to all four release jobs with correct GitHub Secrets reference syntax. The environment variables are inherited by the release scripts' child processes (specifically go test), where HuggingFace tests access the key via os.Getenv("HUGGING_FACE_API_KEY"). This is the correct pattern for passing secrets to test suites during the release workflow.

The documentation already uses consistent naming (HUGGING_FACE_API_KEY with underscore) as shown in docs/contributing/adding-a-provider.mdx, so no documentation updates are needed.

Likely an incorrect or invalid review comment.

Makefile (5)

15-21: Binary path resolution and variable initialization look sound.

The new variables (GOBIN, GOPATH, DEFAULT_GOBIN, AIR_BIN, GOTESTSUM_BIN) properly detect and prioritize local Go binary paths, with sensible fallbacks. Line 21's GOTESTSUM_BIN simplification (removal of the shell which fallback) is an improvement. This approach supports the HuggingFace provider tests by ensuring test tooling is available in predictable locations.


24-29: Color variables using printf are a good improvement.

Replacing hardcoded ANSI escape sequences with shell printf calls is more portable and avoids raw escape codes in the Makefile. This is a solid enhancement.


65-70: Root check in install-ui is appropriate for local development.

The guard against running make as root on developer machines (except CI environments) prevents npm permission issues on systems like Nix. This is a pragmatic improvement.


155-155: Using AIR_BIN variable in dev target is correct.

The change from direct air invocation to $(AIR_BIN) ensures the tool is invoked from the detected/installed path, improving reliability. Good alignment with the binary path resolution strategy.

Also applies to: 163-163


349-349: Test targets correctly use GOTESTSUM_BIN variable.

All test targets (test, test-core, test-plugins) now use $(GOTESTSUM_BIN) instead of direct gotestsum invocations. This ensures consistency with the binary path resolution and supports the HuggingFace provider testing workflow. The change is well-applied across all affected targets.

Also applies to: 401-401, 425-425, 454-454, 548-548

core/schemas/bifrost.go (3)

35-52: LGTM!

The HuggingFace provider constant is correctly added with the lowercase value "huggingface", consistent with the naming convention used by other providers.


55-62: LGTM!

Adding HuggingFace to SupportedBaseProviders is appropriate, allowing it to serve as a base provider for custom provider configurations.


65-83: LGTM!

HuggingFace is correctly added to StandardProviders, completing the registration as a built-in provider.

ui/lib/constants/logs.ts (2)

2-20: LGTM!

The "huggingface" entry is correctly placed in alphabetical order within the KnownProvidersNames array, and the ProviderName type will automatically include it through type derivation.


43-61: LGTM!

The ProviderLabels entry correctly uses the brand-appropriate capitalization "HuggingFace" for user-facing display.

core/schemas/account.go (1)

8-18: LGTM!

The HuggingFaceKeyConfig field is correctly added to the Key struct following the established pattern for provider-specific configurations.

core/providers/huggingface/models.go (1)

16-104: LGTM! Well-structured model listing and method derivation.

The implementation correctly:

  • Handles nil inputs and skips invalid models
  • Derives supported methods from pipeline tags and model tags with comprehensive coverage
  • Pre-allocates slices for performance
  • Returns sorted method lists for consistency
core/providers/huggingface/types.go (2)

28-48: LGTM! Flexible JSON unmarshaling.

The custom UnmarshalJSON correctly handles both the array form [...] and object form {"models": [...]} returned by different HuggingFace API versions, with proper error handling when neither format matches.
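
For reference, a minimal standalone sketch of this array-or-object pattern; the type and field names below are illustrative, not the repo's actual HuggingFaceListModelsResponse definition:

package main

import (
	"encoding/json"
	"fmt"
)

// modelList accepts either a bare JSON array or an object wrapping the array.
type modelList struct {
	Models []string
}

func (m *modelList) UnmarshalJSON(data []byte) error {
	// Try the bare-array form first: ["a", "b"]
	var arr []string
	if err := json.Unmarshal(data, &arr); err == nil {
		m.Models = arr
		return nil
	}
	// Fall back to the wrapped form: {"models": ["a", "b"]}
	var obj struct {
		Models []string `json:"models"`
	}
	if err := json.Unmarshal(data, &obj); err == nil && obj.Models != nil {
		m.Models = obj.Models
		return nil
	}
	return fmt.Errorf("model list: expected a JSON array or an object with a models field")
}

func main() {
	var a, b modelList
	_ = json.Unmarshal([]byte(`["x","y"]`), &a)
	_ = json.Unmarshal([]byte(`{"models":["x","y"]}`), &b)
	fmt.Println(a.Models, b.Models) // [x y] [x y]
}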


416-444: LGTM! Correct handling of union types.

The custom JSON marshaling/unmarshaling for HuggingFaceTranscriptionEarlyStopping properly handles both boolean and string ("never") forms, which aligns with HuggingFace API's flexible parameter schema.
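
A compact standalone sketch of the bool-or-"never" union idea; the type below is illustrative rather than the actual HuggingFaceTranscriptionEarlyStopping definition:

package main

import (
	"encoding/json"
	"fmt"
)

// earlyStopping models a field that may be JSON true/false or the string "never".
type earlyStopping struct {
	Bool  *bool
	Never bool
}

func (e earlyStopping) MarshalJSON() ([]byte, error) {
	if e.Never {
		return json.Marshal("never")
	}
	if e.Bool != nil {
		return json.Marshal(*e.Bool)
	}
	return []byte("null"), nil
}

func (e *earlyStopping) UnmarshalJSON(data []byte) error {
	var b bool
	if err := json.Unmarshal(data, &b); err == nil {
		e.Bool, e.Never = &b, false
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err == nil && s == "never" {
		e.Bool, e.Never = nil, true
		return nil
	}
	return fmt.Errorf("early_stopping: expected bool or the string \"never\"")
}

func main() {
	var a, b earlyStopping
	_ = json.Unmarshal([]byte(`true`), &a)
	_ = json.Unmarshal([]byte(`"never"`), &b)
	out, _ := json.Marshal(b)
	fmt.Println(*a.Bool, string(out)) // true "never"
}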

core/providers/huggingface/huggingface_test.go (1)

12-63: LGTM! Comprehensive test configuration.

The test correctly:

  • Gates on HUGGING_FACE_API_KEY environment variable (see the sketch after this list)
  • Uses appropriate models for each feature (transcription, speech synthesis, embeddings, chat, vision)
  • Configures comprehensive test scenarios covering chat, streaming, tool calls, images, embeddings, and more
  • Follows testutil patterns with proper setup and cleanup
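
For context, the env-gating bullet above boils down to a pattern like this standalone sketch; the package name and test body are illustrative, while the real test wires up testutil.SetupTest and the comprehensive config:

package huggingface_test

import (
	"os"
	"testing"
)

// TestHuggingFaceProviderGated shows the usual gating pattern: skip the whole
// suite unless the provider API key is present in the environment.
func TestHuggingFaceProviderGated(t *testing.T) {
	if os.Getenv("HUGGING_FACE_API_KEY") == "" {
		t.Skip("HUGGING_FACE_API_KEY not set; skipping HuggingFace provider tests")
	}
	// ... build the test client, run the comprehensive scenarios, then call
	// client.Shutdown() at the end, per the repo's provider-test convention ...
}
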
core/providers/huggingface/transcription.go (2)

9-101: LGTM! Robust parameter handling.

The conversion correctly:

  • Validates inputs with appropriate error messages
  • Uses extractIntFromInterface to handle numeric parameters from JSON (which may be float64 or int)
  • Handles the early_stopping union type (bool or string)
  • Maps all generation parameters comprehensively

103-142: LGTM! Clean response conversion.

The conversion properly:

  • Validates non-nil response and model name
  • Maps transcription chunks to Bifrost segments with timestamps
  • Sets appropriate provider metadata in ExtraFields
core/providers/huggingface/chat.go (2)

12-207: LGTM! Comprehensive chat request conversion.

The conversion correctly:

  • Handles messages, roles, names, and content (string and structured blocks)
  • Safely processes tool calls with nil checks for Function.Name
  • Maps all chat parameters including response format, stream options, and tools
  • Uses json.RawMessage for flexible fields like ToolChoice

209-323: LGTM! Thorough response conversion.

The conversion properly:

  • Validates inputs and constructs base response fields
  • Converts choices, messages, tool calls, and logprobs to Bifrost format
  • Maps usage information when available
  • Sets appropriate ExtraFields metadata
core/providers/huggingface/huggingface.go (5)

389-461: LGTM! Solid chat completion implementation.

The implementation correctly:

  • Checks operation permissions
  • Splits model identifiers into provider and model components
  • Converts requests using the chat converter
  • Handles errors and raw responses appropriately
  • Sets all required ExtraFields metadata

463-745: LGTM! Robust streaming implementation.

The streaming logic properly:

  • Handles SSE parsing with proper line-by-line processing
  • Checks for context cancellation between chunks
  • Parses error responses in the stream
  • Converts HuggingFace stream responses to Bifrost format
  • Supports fallback to ResponsesStream when needed with proper state management
  • Handles combined usage and content chunks by splitting them into separate events

771-840: LGTM! Clean embedding implementation.

The implementation correctly:

  • Validates operation permissions
  • Retrieves provider mapping to get the correct provider-specific model ID
  • Validates the task type matches "feature-extraction"
  • Converts responses with proper error handling

842-918: LGTM! Complete speech synthesis flow.

The implementation correctly:

  • Validates permissions and request body
  • Retrieves and validates provider mapping for "text-to-speech" task
  • Downloads audio from the returned URL
  • Converts response with proper metadata

924-993: LGTM! Correct transcription implementation.

The implementation correctly:

  • Validates permissions and model mapping
  • Checks for "automatic-speech-recognition" task
  • Converts requests and responses with proper error handling
  • Sets appropriate ExtraFields metadata

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from cdb45be to a1ee290 Compare December 8, 2025 10:36
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
Makefile (2)

97-98: Redundant which check with full path will always fail.

Although a past review comment flagged this (marked "✅ Addressed in commit f279893"), the pattern remains at line 97: if ! which $$INSTALLED && [ ! -x "$$INSTALLED" ]. Since $$INSTALLED is an absolute path (e.g., /home/user/go/bin/air), the which command will always fail (it only searches executables in PATH by name, not by path). The [ ! -x "$$INSTALLED" ] check is what matters.

Simplify to rely solely on the path check:

- if ! which $$INSTALLED >/dev/null 2>&1 && [ ! -x "$$INSTALLED" ]; then \
+ if [ ! -x "$$INSTALLED" ]; then \

This aligns with how AIR_BIN is checked earlier (line 90).


103-104: Inconsistent binary availability check; which fallback doesn't handle full paths.

Line 103 uses [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) while AIR_BIN (line 90) uses just [ -x "$(AIR_BIN)" ]. A past review flagged this inconsistency (marked "✅ Addressed"), but the pattern remains. If GOTESTSUM_BIN resolves to a full path like /home/user/go/bin/gotestsum, the which fallback fails (it only searches PATH by name).

For consistency and clarity, simplify to match the AIR_BIN pattern:

- @if [ -x "$(GOTESTSUM_BIN)" ] || which $(GOTESTSUM_BIN) > /dev/null 2>&1; then \
+ @if [ -x "$(GOTESTSUM_BIN)" ]; then \

Also apply the same simplification to line 110 in install-gotestsum to remove the redundant which check.

docs/contributing/adding-a-provider.mdx (1)

500-527: Variable name inconsistency persists in documentation example.

The example code declares hfReq on line 500, but lines 510-527 reference providerReq (e.g., providerReq.Temperature). This could confuse contributors.

Apply this diff to fix the variable names:

         // Map parameters
         if bifrostReq.Params != nil {
             params := bifrostReq.Params
             
             // Map standard parameters
             if params.Temperature != nil {
-                providerReq.Temperature = params.Temperature
+                hfReq.Temperature = params.Temperature
             }
             if params.MaxTokens != nil {
-                providerReq.MaxTokens = params.MaxTokens
+                hfReq.MaxTokens = params.MaxTokens
             }
             // ... other standard parameters
             
             // Handle provider-specific ExtraParams
             if params.ExtraParams != nil {
                 if customParam, ok := params.ExtraParams["custom_param"].(string); ok {
-                    providerReq.CustomParam = &customParam
+                    hfReq.CustomParam = &customParam
                 }
             }
         }
 
-        return providerReq
+        return hfReq
     }
🧹 Nitpick comments (9)
ui/README.md (1)

12-12: Consider removing the specific provider count to further reduce maintenance burden.

Line 12 mentions "15+ AI providers" which will require updates whenever the count changes. Since the objective is to redirect users to external documentation rather than maintain this kind of reference, consider removing the number entirely and simply linking to the provider configuration docs.

-- **Provider Management** - Configure [15+ AI providers](https://docs.getbifrost.ai/quickstart/gateway/provider-configuration)
+- **Provider Management** - Configure [AI providers](https://docs.getbifrost.ai/quickstart/gateway/provider-configuration)
core/schemas/mux.go (1)

1146-1221: delta.Thought now flows into both output text deltas and reasoning deltas — verify this is intentional

The new logic:

  • Treats hasContent := delta.Content != nil && *delta.Content != "" and hasThought := delta.Thought != nil && *delta.Thought != "".
  • Enters the text path when hasContent || hasThought.
  • Builds contentDelta by concatenating delta.Content and delta.Thought (when present) and emits it as ResponsesStreamResponseTypeOutputTextDelta via Delta: &contentDelta.

Further down, the existing block at Lines 1369–1380 still emits a separate ResponsesStreamResponseTypeReasoningSummaryTextDelta for non‑empty delta.Thought.

Net effect:

  • For chunks with only delta.Thought (no delta.Content), we now:
    • Create a text item and mark TextItemHasContent = true.
    • Emit response.output_text.delta with the thought text.
    • Also emit reasoning_summary_text.delta with the same text.
  • For chunks where both Content and Thought are set, the main output_text.delta stream carries content + thought while reasoning deltas still carry thought alone.

If delta.Thought is meant as reasoning/chain‑of‑thought that should not appear in the primary user‑visible text stream, this is a behavior change and may cause reasoning to be shown twice (or in the wrong place) in consumers that use both response.output_text.delta and reasoning_summary_text.delta.

If the intent is to surface thought text in the main output stream for specific providers (e.g., Hugging Face) while still emitting reasoning events, it might be worth:

  • Documenting this clearly, and/or
  • Considering a guard such as:
    • Only including delta.Thought in contentDelta when the upstream provider flags it as user‑visible, or
    • Skipping the separate ReasoningSummaryTextDelta emission when you’ve already folded Thought into the main Delta.

Can you confirm the intended semantics here and whether clients are expected to consume both event types simultaneously?

core/providers/huggingface/utils.go (1)

291-319: Consider adding context support for request cancellation.

The downloadAudioFromURL function doesn't accept or use a context.Context, which means:

  1. No timeout control beyond the client's default
  2. No cancellation support if the caller's context is cancelled

Consider accepting context and using DoTimeout or checking context cancellation:

-func (provider *HuggingFaceProvider) downloadAudioFromURL(audioURL string) ([]byte, error) {
+func (provider *HuggingFaceProvider) downloadAudioFromURL(ctx context.Context, audioURL string) ([]byte, error) {
 	req := fasthttp.AcquireRequest()
 	resp := fasthttp.AcquireResponse()
 	defer fasthttp.ReleaseRequest(req)
 	defer fasthttp.ReleaseResponse(resp)
 
 	req.SetRequestURI(audioURL)
 	req.Header.SetMethod(http.MethodGet)
 
-	err := provider.client.Do(req, resp)
+	_, err := providerUtils.MakeRequestWithContext(ctx, provider.client, req, resp)
 	if err != nil {
 		return nil, fmt.Errorf("failed to download audio: %w", err)
 	}
docs/contributing/adding-a-provider.mdx (1)

43-43: Optional: Consider hyphenating "OpenAI-Compatible" for consistency.

Static analysis suggests using a hyphen to join "OpenAI" and "Compatible" when used as a compound adjective (e.g., "OpenAI-Compatible Providers"). This is a minor grammatical nitpick and optional to address.

Also applies to: 71-71, 629-629, 1475-1475

core/providers/huggingface/speech.go (1)

10-12: Consider returning explicit error for nil inputs.

Returning (nil, nil) for nil input can make error handling ambiguous for callers - they need to check both return values. Consider returning an error instead, or document this behavior clearly.

 func ToHuggingFaceSpeechRequest(request *schemas.BifrostSpeechRequest) (*HuggingFaceSpeechRequest, error) {
 	if request == nil {
-		return nil, nil
+		return nil, fmt.Errorf("speech request cannot be nil")
 	}

Alternatively, if nil input is a valid case that should be handled silently, add a comment explaining this design choice.

Also applies to: 99-101

core/providers/huggingface/huggingface_test.go (1)

12-63: Comprehensive HuggingFace test configuration looks solid

Env gating, SetupTest usage, and the ComprehensiveTestConfig (models and enabled scenarios) are consistent with the provider’s implemented surfaces and should give good end‑to‑end coverage. As a minor polish, you could defer client.Shutdown() immediately after successful setup so it still runs if additional subtests are added or the test body grows.

core/providers/huggingface/embedding.go (1)

10-65: Embedding request/response converters are correct; consider mapping more params later

The request converter correctly handles hf-inference vs other providers (using Inputs vs Input), and the response converter builds EmbeddingData and BifrostLLMUsage in line with Bifrost’s schema. As a future enhancement, you could also plumb through typed fields like EncodingFormat and Dimensions (plus any non-text embedding inputs you decide to support) if/when Bifrost starts exposing them for HuggingFace embeddings.

Also applies to: 68-110

core/providers/huggingface/models.go (1)

16-44: Model listing and capability derivation are reasonable and safe

Transforming Hub models into schemas.Model with IDs of the form provider/inferenceProvider/modelId and deriving SupportedMethods from pipeline_tag plus tags gives a sensible, conservative view over the catalog. The heuristics for chat/text, embeddings, TTS, and ASR look balanced; if you find important HF tags that don’t get mapped yet, they can be incrementally added to deriveSupportedMethods.

Also applies to: 46-104
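
As a rough illustration of this tag-to-capability mapping, a standalone sketch follows; the pipeline tags come from the Hub, while the method labels and the exact mapping are placeholders rather than the repo's deriveSupportedMethods:

package main

import (
	"fmt"
	"sort"
)

// deriveMethods is a simplified sketch: map a Hub pipeline_tag to the request
// kinds a model can serve. The labels are placeholders; the real mapping also
// inspects model tags and returns Bifrost request types.
func deriveMethods(pipelineTag string) []string {
	var methods []string
	switch pipelineTag {
	case "text-generation", "image-text-to-text":
		methods = append(methods, "chat.completions", "chat.completions.stream")
	case "feature-extraction", "sentence-similarity":
		methods = append(methods, "embeddings")
	case "text-to-speech":
		methods = append(methods, "speech")
	case "automatic-speech-recognition":
		methods = append(methods, "transcription")
	}
	sort.Strings(methods) // keep output deterministic, as the provider does
	return methods
}

func main() {
	fmt.Println(deriveMethods("text-generation"))
	fmt.Println(deriveMethods("automatic-speech-recognition"))
}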

core/providers/huggingface/transcription.go (1)

11-119: Transcription converters look correct; you can later expose more params

The request converter cleanly handles hf-inference (raw audio) vs other providers and the fal‑ai data‑URL requirement, including an explicit guard against unsupported wav input. Generation parameters are mapped in a type‑safe way via SafeExtractIntPointer and direct float/bool assertions, and the response converter sensibly turns chunks.timestamp into TranscriptionSegment ranges.

When you want richer control, you can also project Language, Prompt, and ResponseFormat from BifrostTranscriptionParameters (or ExtraParams) into the HuggingFace request; doing so won’t disturb the current happy path.

Also applies to: 121-159
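
For reference, a minimal standalone sketch of the data-URL wrapping step used for providers like fal-ai; the MIME handling is simplified (MP3 assumed) and the WAV guard is approximated with a raw header check instead of the shared detector:

package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// audioDataURL wraps raw audio bytes as a data: URL, rejecting WAV input the
// same way the converter does for fal-ai.
func audioDataURL(audio []byte) (string, error) {
	if bytes.HasPrefix(audio, []byte("RIFF")) && len(audio) >= 12 && bytes.Equal(audio[8:12], []byte("WAVE")) {
		return "", fmt.Errorf("fal-ai does not support audio/wav; use mp3 instead")
	}
	mimeType := "audio/mpeg" // assume MP3 for this sketch
	encoded := base64.StdEncoding.EncodeToString(audio)
	return fmt.Sprintf("data:%s;base64,%s", mimeType, encoded), nil
}

func main() {
	url, err := audioDataURL([]byte{0xFF, 0xFB, 0x90, 0x00}) // fake MP3 frame header bytes
	fmt.Println(err, url)
}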

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53cb91d and a1ee290.

⛔ Files ignored due to path filters (5)
  • core/internal/testutil/scenarios/audio/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/audio/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/audio/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/audio/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/audio/Technical_Terms.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (32)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • Makefile (8 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (3 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (2 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (2 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/features/unified-interface.mdx (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/internal/testutil/responses_stream.go
  • docs/apis/openapi.json
  • core/providers/huggingface/chat.go
  • transports/config.schema.json
  • .github/workflows/pr-tests.yml
  • core/internal/testutil/account.go
  • docs/features/unified-interface.mdx
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/gemini/transcription.go
  • core/schemas/transcriptions.go
  • ui/lib/constants/logs.ts
  • core/providers/huggingface/models.go
  • core/providers/gemini/speech.go
  • core/internal/testutil/transcription.go
  • core/schemas/account.go
  • ui/lib/constants/config.ts
  • core/providers/huggingface/huggingface_test.go
  • ui/README.md
  • core/schemas/mux.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/transcription.go
  • core/bifrost.go
  • core/providers/huggingface/speech.go
  • core/schemas/bifrost.go
  • core/providers/utils/audio.go
  • ui/lib/constants/icons.tsx
  • Makefile
  • core/providers/huggingface/utils.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (1)
📚 Learning: 2025-09-01T15:29:17.076Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 372
File: core/providers/utils.go:552-598
Timestamp: 2025-09-01T15:29:17.076Z
Learning: The detectAudioMimeType function in core/providers/utils.go is specifically designed for Gemini's audio format support, which only includes: WAV (audio/wav), MP3 (audio/mp3), AIFF (audio/aiff), AAC (audio/aac), OGG Vorbis (audio/ogg), and FLAC (audio/flac). The implementation should remain focused on these specific formats rather than being over-engineered for general-purpose audio detection.

Applied to files:

  • core/providers/utils/audio.go
🧬 Code graph analysis (11)
core/providers/gemini/transcription.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (24-26)
core/schemas/bifrost.go (13)
  • ModelProvider (32-32)
  • RequestType (86-86)
  • ChatCompletionRequest (92-92)
  • ChatCompletionStreamRequest (93-93)
  • TextCompletionRequest (90-90)
  • TextCompletionStreamRequest (91-91)
  • ResponsesRequest (94-94)
  • ResponsesStreamRequest (95-95)
  • EmbeddingRequest (96-96)
  • SpeechRequest (97-97)
  • SpeechStreamRequest (98-98)
  • TranscriptionRequest (99-99)
  • TranscriptionStreamRequest (100-100)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/gemini/speech.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/internal/testutil/transcription.go (4)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/internal/testutil/utils.go (1)
  • GetProviderVoice (38-86)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (3)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • WithSpeechTestRetry (1326-1476)
core/schemas/account.go (1)
ui/lib/types/config.ts (3)
  • AzureKeyConfig (23-27)
  • VertexKeyConfig (36-42)
  • BedrockKeyConfig (53-60)
core/providers/huggingface/huggingface_test.go (5)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (47-64)
  • TestScenarios (22-44)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (131-134)
core/schemas/models.go (1)
  • Model (109-129)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-62)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1422-1460)
  • ResponsesStreamResponseTypeOutputTextDelta (1370-1370)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (303-314)
  • HuggingFaceEmbeddingResponse (325-329)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (640-647)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (84-117)
core/schemas/bifrost.go (1)
ui/lib/types/config.ts (1)
  • ModelProvider (171-174)
ui/lib/constants/icons.tsx (1)
ui/scripts/fix-paths.js (1)
  • path (4-4)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... OpenAI API compatible: #### Non-OpenAI Compatible Providers If the provider ha...

(QB_NEW_EN_HYPHEN)


[grammar] ~71-~71: Use a hyphen to join words.
Context: ...T - Add comprehensive tests #### OpenAI Compatible Providers If the provider is...

(QB_NEW_EN_HYPHEN)


[grammar] ~629-~629: Use a hyphen to join words.
Context: ...ls (for response converters) ### OpenAI Compatible Providers If you are impleme...

(QB_NEW_EN_HYPHEN)


[grammar] ~1475-~1475: Use a hyphen to join words.
Context: ...r complete details. --- ### For OpenAI Compatible Providers For OpenAI-compati...

(QB_NEW_EN_HYPHEN)


[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)

🔇 Additional comments (27)
ui/README.md (2)

1-242: Excellent update addressing past feedback—README now emphasizes external docs and less maintenance burden.

The README has been significantly improved to align with the prior feedback to redirect to documentation rather than maintain hardcoded lists. The shift from "dashboard" to "web interface," addition of external documentation links, updated architecture section (Redux Toolkit + RTK Query), and clearer Quick Start flow are all valuable enhancements.


209-218: Verify that all external documentation links are current and accessible.

The README now extensively links to docs.getbifrost.ai for detailed information. Ensure these endpoints are live, correct, and reflect the current state of the documentation (especially those that reference new HuggingFace provider features and recent architectural changes like the move to RTK Query).

.github/workflows/release-pipeline.yml (2)

118-118: Environment variable additions are consistent and correctly placed.

The four additions of HUGGING_FACE_API_KEY across the release jobs use the correct naming convention (consistent with the resolution documented in commit 53cb91d) and follow the established pattern of other provider API keys. The placement in the environment sections is logical and maintains alphabetical/categorical grouping with other secrets.

Also applies to: 191-191, 268-268, 357-357


118-118: HUGGING_FACE_API_KEY is defined but not used by any release script.

The environment variable is added consistently across all four release jobs with correct syntax, but examination of the release scripts reveals it is never referenced. None of release-core.sh, release-framework.sh, release-all-plugins.sh, release-single-plugin.sh, or release-bifrost-http.sh checks for or uses this variable; they only handle CODECOV_TOKEN and GH_TOKEN/GITHUB_TOKEN.

Verify that:

  1. This variable is intended for a future feature not yet implemented, or
  2. The scripts that should consume it need to be updated to use HUGGING_FACE_API_KEY

Likely an incorrect or invalid review comment.

Makefile (5)

14-21: Binary path management logic is well-structured.

The introduction of GOBIN, GOPATH, and DEFAULT_GOBIN to compute AIR_BIN and GOTESTSUM_BIN provides a consistent way to locate or fall back to binaries, avoiding hard-coded path assumptions. This improves portability across systems with different Go installation layouts.


24-29: Color definitions via printf improve portability and maintainability.

Using $(shell printf) instead of embedded ANSI codes is clearer and easier to maintain. This approach also enables future conditional color handling if needed.


75-85: Clarify npm install working directory in the next.js installation fallback.

At line 81, the code runs npm --prefix . install next while already in the ui/ directory (line 76: @cd ui &&). The --prefix . appears redundant since you're already in that directory. Either remove the prefix or adjust the path if this is intentional.

Verify that this installs next in the correct location (ui/node_modules). If intended, a comment would help clarify the reasoning.


155-155: Consistent use of $(AIR_BIN) in dev target.

Both debug and normal mode branches correctly invoke $(AIR_BIN), ensuring the target works when air is in a non-standard location (e.g., managed by DEFAULT_GOBIN).

Also applies to: 163-163


349-349: Test targets consistently use $(GOTESTSUM_BIN) variable.

All test invocations now use $(GOTESTSUM_BIN) instead of hard-coded "gotestsum", ensuring tests run even when the binary is in a non-standard location. The GOWORK=off flags are correctly positioned.

Also applies to: 401-401, 425-425, 454-454, 548-548

core/providers/gemini/transcription.go (1)

6-8: Centralizing audio MIME detection via utils is consistent and safe

Using utils.DetectAudioMimeType in ToGeminiTranscriptionRequest keeps Gemini transcription in sync with the shared audio detector and avoids duplicated logic; for any non‑empty Input.File you now get a deterministic MIME type within Gemini’s supported set, with a safe MP3 fallback. This looks correct and consistent with the speech path.

Also applies to: 159-165

core/providers/utils/audio.go (1)

64-119: Shared DetectAudioMimeType correctly mirrors Gemini’s supported formats

The new DetectAudioMimeType covers exactly the expected Gemini formats (WAV, MP3, AIFF/AIFC, AAC, OGG, FLAC) using robust header checks and keeps a conservative MP3 fallback for short/unknown data. The ordering of checks (e.g., AAC before MP3 frame sync) is sensible and matches the earlier Gemini‑specific behavior, making this a solid centralization of the audio detection logic.

Based on learnings, this stays scoped to Gemini’s supported formats without over-engineering.
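
To make the header-check idea concrete, a simplified standalone sketch of magic-byte sniffing for a few of these formats follows; the shared DetectAudioMimeType covers the full Gemini set (including the AAC-before-MP3 ordering) and more edge cases:

package main

import (
	"bytes"
	"fmt"
)

// detectAudioMIME is a simplified sniffing sketch based on well-known magic bytes.
func detectAudioMIME(data []byte) string {
	switch {
	case len(data) >= 12 && bytes.HasPrefix(data, []byte("RIFF")) && bytes.Equal(data[8:12], []byte("WAVE")):
		return "audio/wav"
	case bytes.HasPrefix(data, []byte("fLaC")):
		return "audio/flac"
	case bytes.HasPrefix(data, []byte("OggS")):
		return "audio/ogg"
	case len(data) >= 12 && bytes.HasPrefix(data, []byte("FORM")) && bytes.Equal(data[8:12], []byte("AIFF")):
		return "audio/aiff"
	case bytes.HasPrefix(data, []byte("ID3")):
		return "audio/mp3" // MP3 with an ID3v2 tag
	case len(data) >= 2 && data[0] == 0xFF && data[1]&0xE0 == 0xE0:
		return "audio/mp3" // bare MPEG frame sync (AAC detection omitted in this sketch)
	default:
		return "audio/mp3" // conservative fallback, mirroring the shared helper
	}
}

func main() {
	fmt.Println(detectAudioMIME([]byte("RIFFxxxxWAVEfmt ")))
	fmt.Println(detectAudioMIME([]byte("ID3\x04\x00...")))
}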

core/providers/gemini/speech.go (2)

151-161: No functional change in WAV conversion branch

The minor edit around the else branch in ToBifrostSpeechResponse doesn’t alter behavior: PCM is still converted to WAV when requested, otherwise raw PCM is passed through. No issues here.


176-183: Using utils.DetectAudioMimeType for speech responses keeps MIME typing consistent

Switching MIMEType in ToGeminiSpeechResponse to utils.DetectAudioMimeType(bifrostResp.Audio) reuses the shared detector and ensures outbound audio responses are labeled with a MIME type from the same, well‑defined set as other Gemini audio flows. This is a clean and correct consolidation.

ui/lib/constants/logs.ts (1)

10-10: LGTM!

The HuggingFace provider addition follows the established pattern and is correctly integrated into both the provider names array and the labels mapping.

Also applies to: 60-60

core/schemas/account.go (2)

54-56: LGTM!

The HuggingFaceKeyConfig type definition follows the established pattern from other provider configurations (Azure, Vertex, Bedrock) and correctly includes the Deployments field for model-to-deployment mapping.


17-17: Fix typo in JSON tag: "hugggingface" → "huggingface".

The JSON tag contains three 'g's (hugggingface_key_config) instead of two. This typo will break API compatibility and serialization.

Apply this diff:

-	HuggingFaceKeyConfig *HuggingFaceKeyConfig `json:"hugggingface_key_config,omitempty"` // Hugging Face-specific key configuration
+	HuggingFaceKeyConfig *HuggingFaceKeyConfig `json:"huggingface_key_config,omitempty"` // Hugging Face-specific key configuration

Likely an incorrect or invalid review comment.

core/schemas/bifrost.go (1)

51-51: LGTM! HuggingFace provider correctly registered in schemas.

The provider constant follows the established naming conventions:

  • PascalCase for Go constant (HuggingFace)
  • Lowercase string value ("huggingface")
  • Appropriately added to both SupportedBaseProviders (enabling custom providers to use HuggingFace as a base) and StandardProviders (registering it as a built-in provider)

Also applies to: 61-61, 82-82

core/providers/huggingface/utils.go (2)

178-189: Confirmed: Copy-paste error fixed.

The error messages now correctly reference sambanova (line 182) and scaleway (line 188) instead of the previous erroneous "nebius" references noted in past reviews.


132-149: LGTM! Model name parsing now handles edge cases.

The function now properly returns an error when the model name has no slashes (t == 0), addressing the previously flagged issue. The logic correctly:

  • Returns error for invalid format (no slashes)
  • Defaults to hf-inference provider for single-slash models (e.g., org/model)
  • Extracts explicit provider for multi-slash models (e.g., provider/org/model); a standalone sketch of this parsing follows below
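
A standalone sketch of that parsing behavior, assuming the semantics described above; the real splitIntoModelProvider returns the repo's own provider types:

package main

import (
	"fmt"
	"strings"
)

// splitModelProvider mirrors the described behavior:
//   "model"              -> error (no slash)
//   "org/model"          -> default "hf-inference" provider
//   "provider/org/model" -> explicit provider prefix
func splitModelProvider(name string) (provider, model string, err error) {
	parts := strings.SplitN(name, "/", 3)
	switch len(parts) {
	case 1:
		return "", "", fmt.Errorf("invalid model %q: expected org/model or provider/org/model", name)
	case 2:
		return "hf-inference", name, nil
	default:
		return parts[0], parts[1] + "/" + parts[2], nil
	}
}

func main() {
	fmt.Println(splitModelProvider("google/gemma-2-2b-it"))
	fmt.Println(splitModelProvider("nebius/Qwen/Qwen3-Embedding-8B"))
}
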
docs/contributing/adding-a-provider.mdx (1)

7-13: Good addition of quick reference note.

The note directing contributors to reference implementations (cerebras/ for OpenAI-compatible, huggingface/ or anthropic/ for custom APIs) is helpful for understanding real-world patterns beyond the simplified examples.

core/providers/huggingface/speech.go (1)

41-67: LGTM! Integer extraction now handles JSON float64 values.

The use of schemas.SafeExtractIntPointer for integer parameters (max_new_tokens, max_length, min_length, etc.) properly handles the case where JSON unmarshaling produces float64 instead of int. This addresses the concern raised in past reviews.
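
As background, encoding/json decodes numbers into interface{} values as float64, so a safe extractor along these lines is needed; this is a generic sketch, not the exact SafeExtractIntPointer signature:

package main

import (
	"encoding/json"
	"fmt"
)

// safeIntPointer converts an ExtraParams-style value to *int, accepting the
// float64 that encoding/json produces as well as plain ints.
func safeIntPointer(v any) *int {
	switch n := v.(type) {
	case float64:
		i := int(n)
		return &i
	case int:
		return &n
	default:
		return nil
	}
}

func main() {
	var params map[string]any
	_ = json.Unmarshal([]byte(`{"max_new_tokens": 128}`), &params)
	fmt.Println(*safeIntPointer(params["max_new_tokens"])) // 128
}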

ui/lib/constants/config.ts (1)

40-40: LGTM! HuggingFace correctly added to UI constants.

The model placeholder provides helpful examples (google/gemma-2-2b-it, nebius/Qwen/Qwen3-Embedding-8B) that demonstrate the HuggingFace model naming convention with provider prefixes. The key requirement is correctly set to true.

Likely an incorrect or invalid review comment.

core/bifrost.go (1)

26-37: HuggingFace provider wiring into Bifrost factory looks correct

The new import and createBaseProvider branch for schemas.HuggingFace follow the existing provider pattern and cleanly integrate the new provider without altering existing behavior.

Also applies to: 1327-1328

core/providers/huggingface/huggingface.go (3)

83-117: Provider construction, request execution, and model listing follow existing patterns

NewHuggingFaceProvider’s client setup, pool pre‑warming, and base‑URL handling match other providers; completeRequestWithRetry cleanly centralizes model‑ID validation and 404 cache invalidation, and listModelsByKey’s fan‑out/aggregation logic is careful about errors, latency, and optional raw responses. The wiring into ListModels with CheckOperationAllowed and the keyless path is also consistent with the rest of Bifrost.

Also applies to: 129-206, 268-401
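
The fan-out/aggregate shape described here looks roughly like the following generic sketch (one goroutine per inference provider with a mutex-guarded merge); it is not the repo's exact listModelsByKey code:

package main

import (
	"fmt"
	"sync"
)

// listAll fans out one listing call per inference provider and merges the
// results, collecting per-provider errors instead of failing the whole call.
func listAll(providers []string, list func(p string) ([]string, error)) ([]string, map[string]error) {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		models []string
		errs   = make(map[string]error)
	)
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			got, err := list(p)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[p] = err
				return
			}
			models = append(models, got...)
		}(p)
	}
	wg.Wait()
	return models, errs
}

func main() {
	models, errs := listAll([]string{"hf-inference", "fal-ai"}, func(p string) ([]string, error) {
		return []string{p + "/demo-model"}, nil
	})
	fmt.Println(models, errs)
}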


430-751: Chat completion (sync/stream) integration looks robust

The chat path correctly normalizes the model string via splitIntoModelProvider, uses CheckContextAndGetRequestBody+ToHuggingFaceChatCompletionRequest, and maps back into Bifrost responses with proper ExtraFields (provider, requested model, request type, latency, raw response). The streaming implementation uses bufio.Scanner with an enlarged buffer, handles SSE framing, distinguishes error envelopes vs normal chunks, and supports the Responses→Chat fallback by re‑emitting events as ResponsesStream where needed. Overall this matches Bifrost’s streaming conventions and should behave well under load.
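
For readers unfamiliar with the SSE pattern, a minimal standalone sketch of the scanner loop (enlarged buffer, data: framing, [DONE] sentinel) follows; the provider's real loop additionally checks context cancellation, error envelopes, and the Responses fallback:

package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSSE scans an event stream line by line, passing each "data:" payload to
// handle and stopping at the [DONE] sentinel.
func readSSE(r io.Reader, handle func(payload string)) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // enlarged buffer for large chunks
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank keep-alive lines and comments
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			return nil
		}
		handle(payload)
	}
	return scanner.Err()
}

func main() {
	stream := "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\ndata: [DONE]\n"
	_ = readSSE(strings.NewReader(stream), func(p string) { fmt.Println("chunk:", p) })
}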


777-837: Embedding, Speech, and Transcription paths are consistent and guarded

The Embedding/Speech/Transcription methods all:

  • Check CheckOperationAllowed for the relevant request type,
  • Derive inferenceProvider/modelName via splitIntoModelProvider,
  • Use the appropriate ToHuggingFace* converter plus completeRequestWithRetry for routing and retry,
  • Convert back into Bifrost responses and set ExtraFields (provider, requested model, request type, latency, optional raw response).

The hf‑inference vs non‑hf‑inference split for transcription audio, and the fal‑ai audio‑URL handling combined with the new test fixtures, align well with the new converter logic.

Also applies to: 839-902, 908-975

core/providers/huggingface/types.go (1)

10-68: HuggingFace model, embedding, and transcription types align well with the APIs

The model types (including HuggingFaceModel, HuggingFaceListModelsResponse, and the inference‑provider mapping structs) give a clean projection of the Hub’s model/info endpoints, and the custom UnmarshalJSON for list‑models makes the code resilient to both array and object forms. Similarly, the embedding response’s multi‑shape unmarshal logic and the transcription generation/early‑stopping union cover the main formats exposed by HuggingFace while still mapping cleanly into Bifrost’s abstractions.

Also applies to: 301-397, 431-513

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 1249a29 to 55ca6ee Compare December 18, 2025 09:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
core/providers/huggingface/types.go (1)

271-272: Remove unused Extra field or document its intended purpose.

The Extra map[string]any field with json:"-" tag is defined but never used anywhere in the codebase. This was flagged in a previous review but appears unaddressed. Either remove it or add documentation explaining its intended future use.

 type HuggingFaceSpeechRequest struct {
     Text       string                       `json:"text"`
     Provider   string                       `json:"provider" validate:"required"`
     Model      string                       `json:"model" validate:"required"`
     Parameters *HuggingFaceSpeechParameters `json:"parameters,omitempty"`
-    Extra      map[string]any               `json:"-"`
 }
docs/features/providers/huggingface.mdx (1)

47-52: Clarify fal‑ai audio format wording to match the current enforcement.

The bullets say “Only MP3 … WAV and other formats are explicitly rejected”, but the snippet’s error suggests “mp3 or ogg” and the guard only special‑cases audio/wav. To avoid confusion for maintainers, consider tightening either:

  • the prose (e.g., “WAV is explicitly rejected; MP3 is supported and other formats like OGG are best‑effort”), or
  • the code/example to truly enforce MP3‑only if that’s the intended contract.

This keeps docs and behavior aligned while still documenting the HuggingFace‑specific fal‑ai limitation (WAV rejected, MP3 required). Based on learnings, this is specific to fal‑ai when routed via HuggingFace.

Also applies to: 107-118

core/providers/huggingface/transcription.go (1)

24-47: falAI transcription branch still omits Model and Provider fields.

Non‑falAI requests correctly set both Model and Provider, but the falAI branch only sets AudioURL. For multi‑provider Hugging Face usage, fal‑ai still needs model and provider identifiers; without them you risk provider‑side 4xx or ambiguous behavior.

You’ve already added the empty‑file guard; the remaining fix is to also populate Model and Provider in the falAI branch.

Suggested patch for falAI branch
-		encoded := base64.StdEncoding.EncodeToString(request.Input.File)
-		mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
-		if mimeType == "audio/wav" {
-			return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
-		}
-		encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
-		hfRequest = &HuggingFaceTranscriptionRequest{
-			AudioURL: encoded,
-		}
+		encoded := base64.StdEncoding.EncodeToString(request.Input.File)
+		mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
+		if mimeType == "audio/wav" {
+			return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
+		}
+		encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
+		hfRequest = &HuggingFaceTranscriptionRequest{
+			AudioURL: encoded,
+			Model:    schemas.Ptr(modelName),
+			Provider: schemas.Ptr(string(inferenceProvider)),
+		}
🧹 Nitpick comments (7)
ui/README.md (1)

11-17: Address prior feedback on manual feature maintenance in README.

A past review comment suggested avoiding manually maintained feature lists in favor of redirecting entirely to docs (as done in the main README). While these changes improve the README with external links, the Key Features section still maintains an inline list of 7 features. Consider consolidating this to a single link to the full feature documentation, keeping only the most critical items or removing the list entirely in favor of a pointer to the complete feature guide.

This aligns with the spirit of avoiding lists that can become outdated and keeping the README concise.

For reference, a more consolidated approach might be:

### Key Features

Bifrost UI provides comprehensive AI infrastructure management. [View all features →](https://docs.getbifrost.ai/features)
core/schemas/mux.go (1)

1214-1241: Simplify condition by removing redundant check.

The condition at line 1216 contains a redundant hasContent check inside the second disjunct. Since the outer hasContent already short-circuits when true, the inner || hasContent in (hasReasoning || hasContent) is unnecessary.

🔎 Apply this diff to simplify the condition:
-		// Emit text delta - at least one is required for lifecycle validation
-		// Even for reasoning-only responses, we emit an empty delta on the first chunk
-		if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) {
+		// Emit text delta - at least one is required for lifecycle validation
+		// Even for reasoning-only responses, we emit an empty delta on the first chunk
+		if hasContent || (!state.TextItemHasContent && hasReasoning) {
docs/contributing/adding-a-provider.mdx (2)

1713-1714: Minor formatting issue: Missing newline before code block.

Line 1714 ends with } but there's no blank line before the next section. This could cause rendering issues in some Markdown processors.

             return nil, fmt.Errorf("unsupported provider: %s", targetProviderKey)
 }
+

1999-2002: Minor: Hyphenate "Tool-calling" for grammatical consistency.

Per LanguageTool and consistency with "OpenAI-compatible" elsewhere in the doc:

-**Tool calling tests fail**:
+**Tool-calling tests fail**:
core/internal/testutil/transcription.go (1)

73-97: Consider extracting fixture loading into a helper function.

The fixture loading pattern is duplicated across 5 locations with nearly identical code:

_, filename, _, _ := runtime.Caller(0)
dir := filepath.Dir(filename)
filePath := filepath.Join(dir, "scenarios", "media", fmt.Sprintf("%s.mp3", tc.name))
fileContent, err := os.ReadFile(filePath)
if err != nil {
    t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
}
🔎 Consider extracting to a helper:
// loadAudioFixture loads a pre-generated audio fixture for testing
func loadAudioFixture(t *testing.T, fixtureName string) []byte {
    t.Helper()
    _, filename, _, _ := runtime.Caller(1)
    dir := filepath.Dir(filename)
    filePath := filepath.Join(dir, "scenarios", "media", fmt.Sprintf("%s.mp3", fixtureName))
    content, err := os.ReadFile(filePath)
    if err != nil {
        t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
    }
    return content
}

Also applies to: 261-278, 369-386, 463-480, 561-578

core/providers/huggingface/huggingface.go (2)

369-417: Consider labeling aggregated raw list‑models responses by inference provider instead of index.

Right now combined raw responses are stored under keys like "provider_0", "provider_1". If you ever inspect this blob, mapping back to which entry came from which inference provider is indirect.

If you decide to iterate on observability later, consider storing them keyed by the concrete inference provider identifier (e.g., "hf-inference", "fal-ai") instead of just the ordinal index.


49-59: Chat response pooling is only partially used; you may simplify or expand it.

acquireHuggingFaceChatResponse pulls from a pool that you pre‑warm in NewHuggingFaceProvider, but ChatCompletion never returns these objects to the pool (only Responses does via defer releaseHuggingFaceChatResponse). That means the pre‑warmed chat pool doesn’t meaningfully reduce allocations for the main chat path.

Not urgent, but for clarity/perf you could either:

  • Drop pooling for chat responses and allocate directly in ChatCompletion, or
  • Ensure returned chat responses are eventually recycled back into this pool if you add a release point in the higher layers.

Also applies to: 97-103, 494-529

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1249a29 and 55ca6ee.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (36)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (17)
  • core/providers/gemini/transcription.go
  • docs/features/providers/providers-unified-interface.mdx
  • core/providers/huggingface/models.go
  • docs/apis/openapi.json
  • core/schemas/transcriptions.go
  • core/schemas/account.go
  • core/providers/huggingface/chat.go
  • .github/workflows/pr-tests.yml
  • core/providers/huggingface/speech.go
  • ui/lib/constants/config.ts
  • core/providers/utils/audio.go
  • ui/lib/constants/logs.ts
  • core/internal/testutil/responses_stream.go
  • transports/config.schema.json
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/responses.go
  • docs/docs.json
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/gemini/speech.go
  • core/providers/huggingface/embedding.go
  • core/internal/testutil/transcription.go
  • ui/lib/constants/icons.tsx
  • core/providers/openai/openai.go
  • core/schemas/bifrost.go
  • core/schemas/mux.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/utils/utils.go
  • core/bifrost.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/huggingface.go
  • ui/README.md
  • docs/features/providers/huggingface.mdx
  • core/providers/huggingface/types.go
🧠 Learnings (8)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/gemini/speech.go
  • core/providers/huggingface/embedding.go
  • core/internal/testutil/transcription.go
  • core/providers/openai/openai.go
  • core/schemas/bifrost.go
  • core/schemas/mux.go
  • core/providers/utils/utils.go
  • core/bifrost.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/huggingface/huggingface_test.go
📚 Learning: 2025-12-15T10:06:05.395Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: docs/features/providers/huggingface.mdx:39-61
Timestamp: 2025-12-15T10:06:05.395Z
Learning: For fal-ai transcription requests routed through HuggingFace in Bifrost, WAV (audio/wav) is not supported and should be rejected. Only MP3 format is supported. Update the documentation and any related examples to reflect MP3 as the required input format for HuggingFace-based transcription, and note WAV should not be used. This applies specifically to the HuggingFace provider integration in this repository.

Applied to files:

  • docs/features/providers/huggingface.mdx
📚 Learning: 2025-12-09T17:08:21.123Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: docs/features/providers/huggingface.mdx:171-195
Timestamp: 2025-12-09T17:08:21.123Z
Learning: In docs/features/providers/huggingface.mdx, use the official Hugging Face naming conventions for provider identifiers in the capabilities table (e.g., ovhcloud-ai-endpoints, z-ai). Do not map to SDK identifiers like ovhcloud or zai-org; this aligns with Hugging Face's public docs and improves consistency for readers.

Applied to files:

  • docs/features/providers/huggingface.mdx
🧬 Code graph analysis (9)
core/providers/gemini/speech.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (2)
  • HuggingFaceEmbeddingRequest (161-172)
  • InputsCustomType (211-214)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (845-852)
core/internal/testutil/transcription.go (4)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/internal/testutil/utils.go (1)
  • GetProviderVoice (39-87)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (5)
  • GetTestRetryConfigForScenario (1118-1150)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • SpeechRetryCondition (144-147)
  • WithSpeechTestRetry (1328-1478)
core/providers/openai/openai.go (2)
core/schemas/chatcompletions.go (1)
  • ChatStreamResponseChoice (783-785)
ui/lib/types/logs.ts (1)
  • ReasoningDetails (127-134)
core/providers/utils/utils.go (1)
core/schemas/bifrost.go (3)
  • Cerebras (47-47)
  • Perplexity (46-46)
  • HuggingFace (51-51)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (86-120)
core/internal/testutil/account.go (4)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/schemas/account.go (1)
  • Key (8-20)
core/schemas/provider.go (5)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/huggingface_test.go (5)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (60-82)
  • TestScenarios (22-57)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (149-152)
core/schemas/models.go (1)
  • Model (109-129)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-78)
core/providers/huggingface/types.go (1)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml` - ✅ Environment variable added to `.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (29)
core/providers/gemini/speech.go (2)

8-8: LGTM: Import added to support shared utility.

The import of the utils package correctly supports the refactored MIME type detection on line 175.


175-175: LGTM: Good refactoring to consolidate MIME type detection.

The change to use utils.DetectAudioMimeType consolidates audio MIME type detection logic into a shared utility, reducing code duplication across providers. The utility function (in core/providers/utils/audio.go) provides comprehensive format detection for WAV, MP3, AAC, AIFF, FLAC, and OGG with a sensible fallback.
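
For readers following along, here is a minimal sketch of the magic-byte approach; the function name and format set are illustrative assumptions, not a copy of core/providers/utils/audio.go.

package main

import (
	"bytes"
	"fmt"
)

// detectAudioMIME is a hypothetical, simplified sketch of prefix-based
// audio format sniffing: check well-known magic bytes and fall back to
// "audio/mp3" when nothing matches.
func detectAudioMIME(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte("RIFF")) && len(data) >= 12 && bytes.Equal(data[8:12], []byte("WAVE")):
		return "audio/wav"
	case bytes.HasPrefix(data, []byte("fLaC")):
		return "audio/flac"
	case bytes.HasPrefix(data, []byte("OggS")):
		return "audio/ogg"
	case bytes.HasPrefix(data, []byte("FORM")) && len(data) >= 12 && bytes.Equal(data[8:12], []byte("AIFF")):
		return "audio/aiff"
	case bytes.HasPrefix(data, []byte("ID3")): // ID3v2 tag commonly precedes MP3 frames
		return "audio/mp3"
	default:
		return "audio/mp3" // sensible fallback, as noted in the review
	}
}

func main() {
	fmt.Println(detectAudioMIME([]byte("OggS....")))          // audio/ogg
	fmt.Println(detectAudioMIME([]byte{0x49, 0x44, 0x33, 4})) // audio/mp3 (ID3 tag)
}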

ui/README.md (4)

12-12: Verify the provider count is accurate.

Line 12 references "15+ AI providers" in the link text. Please confirm this count reflects the addition of the HuggingFace provider in this PR and remains current.


53-53: Architecture documentation aligns well with provider changes.

The addition of "Redux Toolkit with RTK Query" in the Technology Stack is well-timed with the expanded provider system (HuggingFace) and provides clear context for developers integrating with the backend.


166-169: "Adding New Features" section properly updated for RTK Query.

The steps now correctly reflect RTK Query-based API state management and React hooks for local state, aligning with the backend provider architecture changes in this PR.


133-155: RTK Query example is clear and practical.

The code example demonstrates useGetLogsQuery and useCreateProviderMutation with proper error handling via getErrorMessage. This is a helpful reference for developers and aligns with the new provider integration patterns introduced in this PR.

core/schemas/mux.go (2)

1155-1160: LGTM: Reasoning-only response support implemented correctly.

The introduction of hasContent and hasReasoning guards effectively handles models that emit reasoning without visible text. The outer condition hasContent || (hasReasoning && !state.TextItemAdded) ensures a text item is created on the first chunk for reasoning-only responses, while subsequent reasoning chunks correctly skip text item creation and are handled separately (lines 1382-1393).
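
For illustration only, the gate described above can be reduced to a small predicate; the state field name is an assumption rather than the actual mux.go code.

package main

import "fmt"

// streamState is a hypothetical, pared-down version of the converter state;
// only the field needed for the gate is shown.
type streamState struct {
	TextItemAdded bool
}

// shouldCreateTextItem mirrors the described condition: create a text item
// when there is visible content, or when this is a reasoning-only chunk and
// no text item has been added yet.
func shouldCreateTextItem(hasContent, hasReasoning bool, s *streamState) bool {
	return hasContent || (hasReasoning && !s.TextItemAdded)
}

func main() {
	s := &streamState{}
	fmt.Println(shouldCreateTextItem(false, true, s)) // true: first reasoning-only chunk
	s.TextItemAdded = true
	fmt.Println(shouldCreateTextItem(false, true, s)) // false: item already exists
}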


1409-1457: LGTM: Text item closure correctly handles reasoning-only responses.

The modification to close text items "regardless of whether it has content" (line 1410) is essential for reasoning-only responses. Previously, items with no visible content deltas would remain unclosed, violating lifecycle expectations. This ensures proper cleanup and response completion even when only reasoning/thought content is emitted.

docs/contributing/adding-a-provider.mdx (1)

7-13: LGTM! Clear quick reference for contributors.

The note providing quick references to existing implementations (OpenAI-compatible: cerebras/groq, Custom API: huggingface/anthropic) is helpful for new contributors to understand patterns quickly.

core/internal/testutil/account.go (4)

114-114: LGTM! HuggingFace correctly added to configured providers.

The provider is appropriately placed in the list alongside other cloud providers.


327-334: LGTM! HuggingFace key configuration is consistent.

The API key is correctly sourced from the HUGGING_FACE_API_KEY environment variable. The absence of UseForBatchAPI is appropriate since HuggingFace doesn't support the batch API.


589-601: LGTM! Network configuration is well-tuned for HuggingFace.

The 300-second timeout appropriately handles cold starts for serverless inference endpoints, and the retry configuration (10 retries, 2s-30s backoff) matches other cloud providers for consistent resilience.


1020-1053: LGTM! Comprehensive test scenario configuration for HuggingFace.

The test configuration appropriately:

  • Uses inference provider routing format for models (e.g., groq/openai/gpt-oss-120b)
  • Enables supported features (chat, streaming, tool calls, vision, embeddings, transcription, speech)
  • Disables unsupported features (text completion, multiple tool calls, streaming transcription/speech)
  • Includes OpenAI fallback for resilience

The MultipleToolCalls: false while ToolCalls: true suggests HuggingFace supports single tool calls but not parallel tool calling, which is a reasonable limitation.

core/internal/testutil/transcription.go (2)

73-97: LGTM! Fixture-based approach for Fal-AI format incompatibility is sound.

The comment clearly explains the technical limitation: Fal-AI speech models return WAV format, but transcription requires MP3. Using pre-generated fixtures is a pragmatic workaround. Error handling with t.Fatalf ensures tests fail fast if fixtures are missing.


277-277: Blank identifier for second return value is correct.

Based on learnings, GenerateTTSAudioForTest returns ([]byte, string) and handles errors internally via t.Fatalf(), so the blank identifier is appropriate here.

core/providers/huggingface/types.go (5)

32-52: LGTM! Flexible UnmarshalJSON handles API response variations.

The implementation correctly handles both the array format [...] (current API) and the object format {"models": [...]} (potential legacy/alternate format), providing backwards compatibility.
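
A minimal sketch of this array-or-object decoding pattern follows, assuming a simplified model type and using encoding/json instead of sonic.

package main

import (
	"encoding/json"
	"fmt"
)

type model struct {
	ID string `json:"id"`
}

// modelList is a hypothetical wrapper that accepts either a bare JSON array
// ([...]) or an object form ({"models": [...]}), similar in spirit to the
// flexible decoding described above.
type modelList struct {
	Models []model
}

func (m *modelList) UnmarshalJSON(data []byte) error {
	// Try the bare-array form first.
	var arr []model
	if err := json.Unmarshal(data, &arr); err == nil {
		m.Models = arr
		return nil
	}
	// Fall back to the object form.
	var obj struct {
		Models []model `json:"models"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return err
	}
	m.Models = obj.Models
	return nil
}

func main() {
	var a, b modelList
	_ = json.Unmarshal([]byte(`[{"id":"m1"}]`), &a)
	_ = json.Unmarshal([]byte(`{"models":[{"id":"m2"}]}`), &b)
	fmt.Println(a.Models[0].ID, b.Models[0].ID) // m1 m2
}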


99-130: LGTM! HuggingFaceToolChoice correctly implements flexible enum/object pattern.

The implementation properly handles the tool_choice field which can be either an enum string ("auto", "none", "required") or a function object. Reusing schemas.ChatToolChoiceFunction for the function sub-object maintains consistency with the core schemas.


211-254: LGTM! InputsCustomType handles HuggingFace's flexible input formats.

The implementation correctly supports:

  • String input (single text)
  • Array input (multiple texts)
  • Object input (fallback to struct fields)

Both MarshalJSON and UnmarshalJSON are symmetric and handle all cases appropriately.


347-364: LGTM! UnmarshalJSON now properly returns error on invalid input.

The implementation correctly returns a descriptive error when the input is neither a boolean nor a string, addressing the previous review comment about failing fast on invalid data.


174-209: LGTM! MarshalJSON correctly dereferences pointer fields.

The implementation properly uses *r.Provider, *r.Model, etc. instead of wrapping pointers again, addressing the previous double-pointer issue.

.github/workflows/release-pipeline.yml (1)

90-121: HUGGING_FACE_API_KEY wiring into release jobs looks consistent.

All four release jobs now receive HUGGING_FACE_API_KEY from secrets.HUGGING_FACE_API_KEY, matching the rest of the repo’s naming and usage. No issues from a workflow or secret‑handling perspective.

Also applies to: 165-195, 242-271, 327-360

core/schemas/bifrost.go (1)

35-52: HuggingFace provider registration is coherent with existing enums and lists.

Adding HuggingFace to ModelProvider, SupportedBaseProviders, and StandardProviders is consistent with how other providers are declared and exposed; nothing else appears missing here.

Also applies to: 55-63, 66-85

core/providers/utils/utils.go (1)

1045-1056: Extending non‑[DONE] behavior to HuggingFace is reasonable.

Treating schemas.HuggingFace as a provider that ends streams on finish_reason instead of [DONE] aligns with Cerebras/Perplexity handling and integrates cleanly with existing streaming logic.
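
As a rough sketch of how the two termination styles can be unified (provider names taken from the comment above; the helper itself is hypothetical):

package main

import "fmt"

// providersWithoutDoneMarker lists providers that, per the note above, end
// SSE streams via a non-empty finish_reason rather than a "[DONE]" sentinel.
var providersWithoutDoneMarker = map[string]bool{
	"cerebras":    true,
	"perplexity":  true,
	"huggingface": true,
}

// isStreamComplete shows how the two termination styles could be combined:
// either the sentinel arrives, or the provider is known to signal completion
// through finish_reason.
func isStreamComplete(provider, line, finishReason string) bool {
	if line == "[DONE]" {
		return true
	}
	return providersWithoutDoneMarker[provider] && finishReason != ""
}

func main() {
	fmt.Println(isStreamComplete("openai", "[DONE]", ""))    // true
	fmt.Println(isStreamComplete("huggingface", "", "stop")) // true
	fmt.Println(isStreamComplete("huggingface", "", ""))     // false: keep reading
}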

core/providers/openai/openai.go (1)

1047-1054: Streaming now correctly emits reasoning‑only chat deltas.

Including Delta.Reasoning and Delta.ReasoningDetails in the emission predicate ensures reasoning‑only chunks are no longer silently skipped while preserving existing content/audio/tool‑call behavior. Consider adding/expanding a ChatCompletionStream test that feeds a reasoning‑only SSE delta to lock this in.
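
A minimal sketch of the kind of regression test suggested here; the delta struct and shouldEmit helper are stand-ins for the real streaming predicate, not the actual openai.go code.

package main

import "testing"

// chatDelta is a stand-in for the streaming delta shape referenced above;
// only the fields relevant to the emission predicate are modeled.
type chatDelta struct {
	Content   *string
	Reasoning *string
	ToolCalls []string
}

// shouldEmit mirrors the described behavior: emit when the delta carries
// content, reasoning, or tool-call data.
func shouldEmit(d chatDelta) bool {
	return d.Content != nil || d.Reasoning != nil || len(d.ToolCalls) > 0
}

// TestReasoningOnlyDeltaIsEmitted locks in the reasoning-only case so a
// future refactor cannot silently skip these chunks again.
func TestReasoningOnlyDeltaIsEmitted(t *testing.T) {
	r := "thinking..."
	if !shouldEmit(chatDelta{Reasoning: &r}) {
		t.Fatal("reasoning-only delta should be emitted")
	}
	if shouldEmit(chatDelta{}) {
		t.Fatal("empty delta should be skipped")
	}
}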

core/providers/huggingface/huggingface_test.go (1)

12-75: Comprehensive HuggingFace test configuration is well‑scoped.

Env‑gated test setup, use of testutil.ComprehensiveTestConfig with realistic models, and selectively enabled scenarios all align with existing provider tests and should give good coverage without over‑reaching unsupported flows.

core/bifrost.go (1)

17-39: HuggingFace provider wiring into the factory looks consistent.

Import and createBaseProvider case for schemas.HuggingFace match the existing provider pattern (construct and return the concrete provider, no extra side effects). No issues from the core wiring side.

Also applies to: 1854-1893

core/providers/huggingface/embedding.go (1)

10-69: Embedding request/response conversions look robust and align with HuggingFace shapes.

The request converter correctly distinguishes hf-inference vs other providers (using Inputs vs Input) and surfaces HF‑specific params from ExtraParams. The response unmarshal path defensively handles object, 2D, and 1D array formats and normalizes everything into BifrostEmbeddingResponse with sane defaults. No issues from this implementation.

Also applies to: 71-159

core/providers/huggingface/huggingface.go (2)

132-211: Core HTTP / retry plumbing for HuggingFace provider looks solid.

completeRequestWithRetry + completeRequest cleanly separate 404/model‑alias retry from generic HTTP/error handling, and the decoded body is safely copied before response release. Combined with listModelsByKey’s fan‑out aggregation and latency tracking, this gives you a good base for all operations. No changes needed here.

Also applies to: 236-271, 369-419
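
For context, the body-copy-before-release pattern mentioned above looks roughly like this in generic fasthttp code; the helper is a hypothetical sketch, not the provider's completeRequest.

package main

import (
	"fmt"

	"github.com/valyala/fasthttp"
)

// fetchBody illustrates the pattern: copy the response body into freshly
// allocated memory before releasing the pooled fasthttp objects, so the
// caller never reads buffers that may be reused.
func fetchBody(client *fasthttp.Client, url string) ([]byte, int, error) {
	req := fasthttp.AcquireRequest()
	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseRequest(req)
	defer fasthttp.ReleaseResponse(resp)

	req.SetRequestURI(url)
	if err := client.Do(req, resp); err != nil {
		return nil, 0, err
	}

	// resp.Body() points into pooled memory; copy it before the deferred
	// ReleaseResponse returns the buffer to the pool.
	body := append([]byte(nil), resp.Body()...)
	return body, resp.StatusCode(), nil
}

func main() {
	body, status, err := fetchBody(&fasthttp.Client{}, "https://example.com")
	fmt.Println(status, len(body), err)
}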


916-999: Embedding, Speech, and Transcription entrypoints are consistent with the shared helpers.

These methods correctly:

  • Guard operations via CheckOperationAllowed.
  • Use splitIntoModelProvider once per call and feed completeRequestWithRetry with the appropriate requiredTask.
  • Convert to/from HuggingFace types via the dedicated converters and wrap errors with NewBifrostOperationError.
  • Attach provider/model/request‑type/latency and optional raw request/response into ExtraFields.

Aside from the separate falAI transcription concern called out in transcription.go, this surface looks coherent and aligned with the rest of Bifrost’s provider implementations.

Also applies to: 1001-1081, 1087-1173

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from 4c33aab to bb7d1a9 on December 18, 2025 at 13:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
core/providers/huggingface/transcription.go (1)

37-47: fal-ai transcription request missing required Model and Provider fields.

The fal-ai branch (lines 44-46) only sets AudioURL but doesn't include Model and Provider fields, which the fal-ai API requires according to past review analysis. The non-fal-ai branch correctly sets these fields (lines 32-36).

🔎 Apply this diff to add the missing fields:
 		hfRequest = &HuggingFaceTranscriptionRequest{
 			AudioURL: encoded,
+			Model:    schemas.Ptr(modelName),
+			Provider: schemas.Ptr(string(inferenceProvider)),
 		}
🧹 Nitpick comments (7)
core/internal/testutil/responses_stream.go (1)

693-694: LGTM! Threshold increase accommodates more verbose streaming responses.

Increasing the safety guard from 100 to 300 is appropriate for the lifecycle test, which validates the complete sequence of streaming events and may legitimately produce more chunks, especially with the new HuggingFace provider.

Optional observation: Other streaming tests have lower thresholds (line 394: 100 chunks, line 527: 150 chunks). If you observe similar threshold issues with those tests for verbose providers, consider adjusting them as well. However, the current values may be intentionally tuned to each test's expected complexity.

core/schemas/mux.go (1)

1214-1241: Simplify the emission condition and verify empty delta handling.

The delta emission logic correctly handles reasoning-only responses by emitting an empty delta on the first chunk. However, the condition at line 1216 contains a redundant term that can be simplified for clarity.

🔎 Simplify the condition at line 1216

The condition hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) can be simplified because hasContent appears in both the outer OR and the inner expression:

-		if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) {
+		if hasContent || (!state.TextItemHasContent && hasReasoning) {

This simplification makes the intent clearer: emit a delta if there's content, OR if this is the first delta and there's reasoning (for reasoning-only responses).

Additionally, verify that downstream consumers (clients, proxies, other providers) correctly handle empty content deltas for reasoning-only responses, as this is a relatively uncommon edge case.

#!/bin/bash
# Description: Search for code that processes output_text.delta events to verify empty delta handling

# Search for delta processing logic
rg -n -C5 'output_text\.delta|OutputTextDelta|response\.Delta' --type go -g '!*_test.go' -g '!core/schemas/'
core/providers/utils/audio.go (1)

76-98: Consider adding an inline comment explaining the 0xF6 mask.

The implementation is correct. Based on learnings, the 0xF6 mask at line 96 is intentionally stricter than the standard 0xF0 to check both the sync word (top 4 bits = 0xF) and the Layer field bits (bits 2-1 = 00), preventing MP3 Layer III (which has Layer bits = 11) from being misidentified as AAC.

🔎 Consider adding a comment to document this design decision:
 	// AAC: ADIF or ADTS (0xFFF sync) - check before MP3 frame sync to avoid misclassification
 	if bytes.HasPrefix(audioData, adif) {
 		return "audio/aac"
 	}
+	// ADTS: 0xFF followed by top 4 bits = 0xF and Layer bits (2-1) = 00
+	// Mask 0xF6 prevents MP3 Layer III (Layer bits = 11) from being misidentified as AAC
 	if len(audioData) >= 2 && audioData[0] == 0xFF && (audioData[1]&0xF6) == 0xF0 {
 		return "audio/aac"
 	}
core/schemas/transcriptions.go (1)

37-40: LGTM! HuggingFace transcription parameters added correctly.

The four new optional fields (MaxLength, MinLength, MaxNewTokens, MinNewTokens) are properly typed as integer pointers with appropriate JSON tags. The inline comments indicating "used by HuggingFace" address the previous review feedback and make the provider association clear.

Optional: Consider enhancing comments with brief functional descriptions

While the current comments clearly indicate these parameters are HuggingFace-specific, adding brief functional descriptions could improve developer understanding. For example:

-	MaxLength      *int    `json:"max_length,omitempty"`      // Maximum length of the transcription used by HuggingFace
+	MaxLength      *int    `json:"max_length,omitempty"`      // Maximum length of the generated transcription (HuggingFace)
-	MinLength      *int    `json:"min_length,omitempty"`      // Minimum length of the transcription used by HuggingFace
+	MinLength      *int    `json:"min_length,omitempty"`      // Minimum length of the generated transcription (HuggingFace)
-	MaxNewTokens   *int    `json:"max_new_tokens,omitempty"`  // Maximum new tokens to generate used by HuggingFace
+	MaxNewTokens   *int    `json:"max_new_tokens,omitempty"`  // Maximum new tokens to generate in transcription (HuggingFace)
-	MinNewTokens   *int    `json:"min_new_tokens,omitempty"`  // Minimum new tokens to generate used by HuggingFace
+	MinNewTokens   *int    `json:"min_new_tokens,omitempty"`  // Minimum new tokens to generate in transcription (HuggingFace)

This is purely for clarity and not required.

core/providers/huggingface/utils.go (1)

75-81: Minor maintainability tweaks for provider list and mapping cache.

Two small improvements you may want to consider (non-blocking):

  1. In PROVIDERS_OR_POLICIES (lines 75–81), append the auto constant instead of the string literal "auto" to keep things self‑documenting and avoid accidental divergence:

    out = append(out, auto)
  2. modelProviderMappingCache currently never expires once populated in getModelInferenceProviderMapping. If Hugging Face changes mappings at runtime, a long‑lived process will keep using stale routes. If that’s a concern for you, wrapping this in a small TTL cache (or invalidating on non‑fatal provider errors) would make the behavior more robust, while preserving the existing fast‑path (see the sketch after this comment).

Also applies to: 213-267
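
A minimal sketch of the TTL wrapper suggested in point 2 above, assuming a string-to-string mapping and lazy eviction; names are not taken from utils.go.

package main

import (
	"sync"
	"time"
)

// ttlCache is a hypothetical TTL wrapper for the model→provider mapping
// cache; the eviction policy here is lazy (stale entries are dropped on read).
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]ttlEntry
}

type ttlEntry struct {
	value   string
	expires time.Time
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]ttlEntry)}
}

func (c *ttlCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Now().After(e.expires) {
		delete(c.entries, key) // lazily evict stale mappings
		return "", false
	}
	return e.value, true
}

func (c *ttlCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = ttlEntry{value: value, expires: time.Now().Add(c.ttl)}
}

func main() {
	cache := newTTLCache(30 * time.Minute)
	cache.Set("meta-llama/Llama-3.1-8B", "groq")
	if provider, ok := cache.Get("meta-llama/Llama-3.1-8B"); ok {
		_ = provider // route via the cached inference provider
	}
}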

core/internal/testutil/account.go (1)

95-118: HuggingFace test wiring looks good; consider exercising Responses/Reasoning when stable.

The HuggingFace additions to GetConfiguredProviders, GetKeysForProvider, and GetConfigForProvider look consistent with other cloud providers, and the comprehensive test config covers chat, streaming, vision, embeddings, transcription, and speech.

Given the provider also implements the Responses API, you might eventually want to:

  • Set a ReasoningModel for HuggingFace, and
  • Flip Reasoning to true in the HuggingFace TestScenarios

so Responses/Reasoning flows get the same coverage as other providers once you’re confident in that path.

Also applies to: 327-335, 589-602, 1020-1053

core/providers/huggingface/types.go (1)

234-234: Optional: Consider removing the unused Extra field.

The Extra field in HuggingFaceSpeechRequest has the json:"-" tag, meaning it's not serialized. If this field isn't being used elsewhere in the codebase, removing it would improve code clarity. If it's reserved for future use, adding a brief comment explaining its purpose would be helpful.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55ca6ee and bb7d1a9.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (36)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (15)
  • core/providers/gemini/speech.go
  • core/schemas/bifrost.go
  • core/bifrost.go
  • docs/apis/openapi.json
  • docs/docs.json
  • ui/lib/constants/logs.ts
  • core/providers/gemini/transcription.go
  • docs/features/providers/huggingface.mdx
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/chat.go
  • core/schemas/account.go
  • ui/lib/constants/config.ts
  • core/providers/openai/openai.go
  • core/providers/huggingface/responses.go
  • core/providers/huggingface/speech.go
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/utils/utils.go
  • core/providers/utils/audio.go
  • core/schemas/transcriptions.go
  • transports/config.schema.json
  • ui/lib/constants/icons.tsx
  • docs/features/providers/providers-unified-interface.mdx
  • core/internal/testutil/responses_stream.go
  • core/internal/testutil/transcription.go
  • core/schemas/mux.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/transcription.go
  • ui/README.md
  • core/providers/huggingface/embedding.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/utils.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (5)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/utils/utils.go
  • core/providers/utils/audio.go
  • core/schemas/transcriptions.go
  • core/internal/testutil/responses_stream.go
  • core/internal/testutil/transcription.go
  • core/schemas/mux.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/utils.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-09-01T15:29:17.076Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 372
File: core/providers/utils.go:552-598
Timestamp: 2025-09-01T15:29:17.076Z
Learning: The detectAudioMimeType function in core/providers/utils.go is specifically designed for Gemini's audio format support, which only includes: WAV (audio/wav), MP3 (audio/mp3), AIFF (audio/aiff), AAC (audio/aac), OGG Vorbis (audio/ogg), and FLAC (audio/flac). The implementation should remain focused on these specific formats rather than being over-engineered for general-purpose audio detection.

Applied to files:

  • core/providers/utils/audio.go
📚 Learning: 2025-12-10T15:15:14.041Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/audio.go:92-98
Timestamp: 2025-12-10T15:15:14.041Z
Learning: In core/providers/utils/audio.go, within DetectAudioMimeType, use a mask of 0xF6 for ADTS sync detection instead of the standard 0xF0. This stricter check validates that the top nibble is 0xF and the Layer field bits (bits 2-1) are 00, preventing MP3 Layer III (Layer bits 11) from being misidentified as AAC. Ensure unit tests cover this behavior and document the rationale in code comments.

Applied to files:

  • core/providers/utils/audio.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/models.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧬 Code graph analysis (7)
core/providers/utils/utils.go (1)
core/schemas/bifrost.go (3)
  • Cerebras (47-47)
  • Perplexity (46-46)
  • HuggingFace (51-51)
core/internal/testutil/transcription.go (5)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/internal/testutil/utils.go (2)
  • GetProviderVoice (39-87)
  • GetErrorMessage (642-675)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (5)
  • GetTestRetryConfigForScenario (1118-1150)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • SpeechRetryCondition (144-147)
  • WithSpeechTestRetry (1328-1478)
core/internal/testutil/validation_presets.go (1)
  • SpeechExpectations (146-162)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1440-1479)
  • ResponsesStreamResponseTypeOutputTextDelta (1388-1388)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (28-30)
core/schemas/bifrost.go (9)
  • ModelProvider (32-32)
  • RequestType (88-88)
  • ChatCompletionRequest (94-94)
  • ChatCompletionStreamRequest (95-95)
  • ResponsesRequest (96-96)
  • ResponsesStreamRequest (97-97)
  • EmbeddingRequest (98-98)
  • SpeechRequest (99-99)
  • TranscriptionRequest (101-101)
core/schemas/models.go (2)
  • BifrostListModelsResponse (36-45)
  • Model (109-129)
core/providers/huggingface/transcription.go (5)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
  • TranscriptionSegment (87-98)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (258-264)
  • HuggingFaceTranscriptionRequestParameters (267-270)
  • HuggingFaceTranscriptionGenerationParameters (273-290)
  • HuggingFaceTranscriptionEarlyStopping (294-297)
  • HuggingFaceTranscriptionResponse (330-333)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/schemas/utils.go (1)
  • SafeExtractIntPointer (486-494)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (394-405)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (4)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (11-106)
core/providers/huggingface/embedding.go (2)
  • ToHuggingFaceEmbeddingRequest (11-78)
  • UnmarshalHuggingFaceEmbeddingResponse (82-168)
core/providers/huggingface/speech.go (1)
  • ToHuggingFaceSpeechRequest (9-96)
core/providers/huggingface/types.go (1)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml` - ✅ Environment variable added to `.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (31)
core/schemas/mux.go (3)

1155-1159: LGTM! Correct gating logic for reasoning-only responses.

The introduction of hasContent and hasReasoning flags with the conditional hasContent || (hasReasoning && !state.TextItemAdded) correctly handles both content-only and reasoning-only streaming responses. The text item is created on the first chunk when either content or reasoning is present.


1410-1411: LGTM! Correct lifecycle handling for reasoning-only responses.

The change to close the text item regardless of whether it has content is necessary to support reasoning-only responses. This ensures the proper lifecycle sequence (added → deltas → done) is completed even when only empty deltas are emitted.


1155-1241: Well-structured solution for reasoning-only streaming responses.

The changes cohesively address the requirement to support reasoning-only models in the Chat-to-Responses streaming conversion:

  1. Gating flags (hasContent, hasReasoning) cleanly separate concerns
  2. First-chunk semantics correctly create text items for reasoning-only responses
  3. Empty delta emission satisfies lifecycle requirements (at least one delta per item)
  4. Item closure updated to work regardless of content presence

The state tracking (TextItemAdded, TextItemHasContent, TextItemClosed) properly coordinates the lifecycle events.

Also applies to: 1410-1411

.github/workflows/pr-tests.yml (1)

118-118: LGTM!

The HUGGING_FACE_API_KEY environment variable is correctly added following the established pattern for other provider API keys. This enables CI tests to run the new HuggingFace provider test suite.

.github/workflows/release-pipeline.yml (4)

118-118: LGTM!

The HUGGING_FACE_API_KEY is correctly propagated to the core-release job, following the established pattern for other provider API keys.


193-193: LGTM!

Consistent propagation of HUGGING_FACE_API_KEY to the framework-release job.


270-270: LGTM!

Consistent propagation of HUGGING_FACE_API_KEY to the plugins-release job.


359-359: LGTM!

Consistent propagation of HUGGING_FACE_API_KEY to the bifrost-http-release job.

core/providers/utils/audio.go (2)

64-74: LGTM!

The header constants are well-organized and use idiomatic Go byte slices for efficient prefix matching.


99-119: LGTM!

The remaining format detection logic is correct:

  • AIFF/AIFC detection properly handles both variants
  • FLAC and OGG detection use correct magic bytes
  • MP3 frame sync detection covers common MPEG audio variants (0xFB, 0xF3, 0xF2, 0xFA)
  • The ordering (AAC before MP3 frame sync) prevents misclassification

The fallback to audio/mp3 is a reasonable default for unrecognized formats.

core/providers/utils/utils.go (1)

1048-1057: Verify HuggingFace Inference API stream termination behavior.

Adding schemas.HuggingFace to providers that don't send [DONE] markers follows an established pattern for non-standard streaming APIs. The comment correctly lists all three providers.

However, verification is needed: confirm that HuggingFace's Inference API actually terminates streams via finish_reason rather than [DONE] markers, as this behavior is not explicitly documented in public HuggingFace API references. Consider adding test coverage or inline documentation referencing the specific API behavior that justifies this classification.

docs/contributing/adding-a-provider.mdx (1)

424-527: HuggingFace chat converter example is now consistent and accurate.

The ToHuggingFaceChatCompletionRequest snippet uses hfReq consistently and mirrors the actual converter pattern well (messages, content blocks, tool calls, params). It’s a solid reference for new providers.

ui/README.md (1)

3-18: UI README accurately reflects the current architecture and usage.

The updated description (Next.js + RTK Query/Redux, websocket logging, provider/MCP/plugin docs links) aligns with the rest of the PR and centralizes details in docs instead of duplicating them. Looks good.

Also applies to: 46-57, 129-155

transports/config.schema.json (1)

96-151: Schema wiring for HuggingFace provider and semantic cache looks consistent.

Adding "huggingface" under providers with the base provider schema and including it in the semantic cache provider enum cleanly aligns config with the new core provider. No issues from a schema/interop perspective.

Also applies to: 813-833

core/internal/testutil/transcription.go (2)

73-97: LGTM on the Fal-AI/HuggingFace fixture handling path.

The conditional handling correctly addresses the format incompatibility between Fal-AI models (which only return WAV) and the test requirements (which need MP3). The error handling with t.Fatalf ensures tests fail fast when fixtures are missing.


98-178: TTS generation and transcription request construction looks good.

The retry framework integration, temp file management with cleanup, and transcription request construction follow established patterns in the codebase.

core/providers/huggingface/models.go (2)

16-44: Model conversion logic is well-structured.

The function correctly:

  • Handles nil response
  • Filters models without IDs or supported methods
  • Constructs consistent composite model IDs
  • Pre-allocates slice capacity for efficiency

11-14: Constants are actively used in utils.go for enforcing model fetch limits—no action needed.

The constants defaultModelFetchLimit and maxModelFetchLimit are referenced in core/providers/huggingface/utils.go (lines 90, 92–93) to enforce minimum and maximum bounds on the model fetch limit parameter.

core/providers/huggingface/embedding.go (2)

11-78: Request conversion logic is well-implemented.

The function correctly:

  • Handles nil input gracefully
  • Differentiates between hfInference (using Inputs) and other providers (using Input)
  • Maps standard parameters and provider-specific ExtraParams
  • Uses InputsCustomType wrapper for flexible input handling

80-168: Response unmarshalling handles multiple HuggingFace response formats gracefully.

The function correctly handles the three known response shapes:

  1. Standard object with data, model, usage fields
  2. 2D array [[float64...], ...] for batch embeddings
  3. 1D array [float64...] for single embeddings

The float64→float32 conversion is appropriate for embedding storage efficiency.
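
A simplified sketch of normalizing the two array shapes into float32 vectors (the object form and sonic usage are omitted here; names are assumptions):

package main

import (
	"encoding/json"
	"fmt"
)

// parseEmbeddings normalizes either a batch ([[...], ...]) or a single
// vector ([...]) response into [][]float32.
func parseEmbeddings(data []byte) ([][]float32, error) {
	// Try the batch form first: [[...], [...]].
	var batch [][]float64
	if err := json.Unmarshal(data, &batch); err == nil {
		return toFloat32(batch), nil
	}
	// Fall back to the single-vector form: [...].
	var single []float64
	if err := json.Unmarshal(data, &single); err != nil {
		return nil, fmt.Errorf("unrecognized embedding response shape: %w", err)
	}
	return toFloat32([][]float64{single}), nil
}

func toFloat32(in [][]float64) [][]float32 {
	out := make([][]float32, len(in))
	for i, row := range in {
		out[i] = make([]float32, len(row))
		for j, v := range row {
			out[i][j] = float32(v)
		}
	}
	return out
}

func main() {
	vecs, _ := parseEmbeddings([]byte(`[0.1, 0.2, 0.3]`))
	fmt.Println(len(vecs), len(vecs[0])) // 1 3
}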

core/providers/huggingface/transcription.go (1)

121-160: Response conversion is well-structured.

The function correctly:

  • Validates non-nil response and non-empty model
  • Maps chunks to transcription segments with proper timestamp handling
  • Sets appropriate ExtraFields for provider tracking
core/providers/huggingface/huggingface.go (9)

19-118: Provider struct and initialization are well-designed.

Good practices observed:

  • Sync pools with pre-warming for reduced GC pressure
  • Proper timeout configuration from network config
  • Thread-safe model mapping cache using sync.Map
  • Proxy configuration support

131-207: Cache-first retry pattern is appropriate.

The design correctly prioritizes cache hits (most common case) over immediate validation, only paying the cost of re-validation on 404 (cache miss). This is an intentional optimization per prior discussion.
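
A rough sketch of the cache-first, re-resolve-on-404 flow described above, with hypothetical helper names standing in for the provider's alias cache and HTTP call:

package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound stands in for an HTTP 404 from the upstream API.
var errNotFound = errors.New("404 model alias not found")

// aliasCache is a hypothetical stand-in for the provider's model-alias cache.
var aliasCache sync.Map

// callWithAliasRetry uses the cached alias optimistically and only
// re-resolves (and retries once) when the upstream reports the alias is gone.
func callWithAliasRetry(model string, resolve func(string) (string, error), do func(alias string) error) error {
	alias := model
	if v, ok := aliasCache.Load(model); ok {
		alias = v.(string)
	}
	err := do(alias)
	if !errors.Is(err, errNotFound) {
		return err // success or a non-404 failure: no re-resolution needed
	}
	fresh, rerr := resolve(model) // cache-miss path: pay the lookup cost now
	if rerr != nil {
		return rerr
	}
	aliasCache.Store(model, fresh)
	return do(fresh)
}

func main() {
	resolve := func(m string) (string, error) { return m + "@groq", nil }
	calls := 0
	do := func(alias string) error {
		calls++
		if calls == 1 {
			return errNotFound // first attempt misses, triggering re-resolution
		}
		return nil
	}
	fmt.Println(callWithAliasRetry("gpt-oss-120b", resolve, do), calls) // <nil> 2
}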


209-269: Request execution with proper resource management.

The function correctly:

  • Sets appropriate Content-Type based on request type
  • Handles error responses with guarded message overwrites
  • Copies response body before releasing fasthttp resources to prevent use-after-free

271-436: Concurrent model listing implementation is robust.

Good patterns:

  • Parallel fetching with proper synchronization (WaitGroup + channel); a sketch of this fan-out follows after this list
  • Graceful handling of partial failures (returns first error only if all fail)
  • Average latency calculation across successful requests
  • Correct provider constant usage (past issue addressed)
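
A minimal sketch of this WaitGroup-plus-channel fan-out with keep-partial-results semantics; function and type names are illustrative, not taken from huggingface.go.

package main

import (
	"fmt"
	"sync"
)

// fetchAll fetches model lists from several inference providers in parallel,
// keeps partial results, and surfaces an error only if every fetch failed.
func fetchAll(providers []string, fetch func(string) ([]string, error)) ([]string, error) {
	type result struct {
		models []string
		err    error
	}
	results := make(chan result, len(providers))
	var wg sync.WaitGroup
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			models, err := fetch(p)
			results <- result{models: models, err: err}
		}(p)
	}
	wg.Wait()
	close(results)

	var all []string
	var firstErr error
	for r := range results {
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		all = append(all, r.models...)
	}
	if len(all) == 0 && firstErr != nil {
		return nil, firstErr // every provider failed
	}
	return all, nil
}

func main() {
	models, err := fetchAll([]string{"groq", "fal-ai"}, func(p string) ([]string, error) {
		return []string{p + "/demo-model"}, nil
	})
	fmt.Println(models, err)
}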

446-583: Chat completion implementation follows established patterns.

Both sync and streaming paths correctly:

  • Handle model parsing errors with proper error responses
  • Transform model names to HuggingFace format
  • Populate ExtraFields for diagnostics
  • Leverage OpenAI-compatible streaming for efficiency

585-615: Responses API correctly delegates to ChatCompletion.

The implementation properly converts Responses requests to ChatCompletion format and updates the response's RequestType to maintain accurate metadata.


617-700: Embedding implementation is well-structured.

The function correctly:

  • Uses the model alias cache for retry on 404
  • Handles custom response unmarshalling for HuggingFace's varied response formats
  • Tracks raw request/response when enabled

702-875: Speech and Transcription implementations are complete.

Both functions correctly:

  • Handle provider-specific request formats (hf-inference uses raw audio)
  • Include proper input validation (nil checks addressed)
  • Use model alias cache for retry logic
  • Download audio from URLs when needed (Speech)

877-930: Unsupported operation stubs are consistent.

All batch and file operations correctly return NewUnsupportedOperationError with appropriate request type identifiers.

core/providers/huggingface/types.go (1)

1-342: Excellent implementation of comprehensive HuggingFace provider types.

The file is well-structured with clear separation of concerns across models, chat, embeddings, speech, and transcription types. The custom JSON marshaling/unmarshaling logic correctly handles multiple input formats and edge cases. Previous feedback has been properly addressed:

  • HuggingFaceListModelsResponse.UnmarshalJSON (lines 34-52) now handles both array and object responses
  • HuggingFaceTranscriptionEarlyStopping.UnmarshalJSON (line 326) returns proper errors for invalid input
  • Field naming improved (ProviderModelID instead of ProviderModelMapping)

The use of sonic for JSON operations and the flexible type handling (e.g., InputsCustomType, HuggingFaceToolChoice) demonstrate solid design for API integration.

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from b5d2c41 to 338f1a0 on December 19, 2025 at 06:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
core/providers/huggingface/transcription.go (1)

37-47: fal-ai branch missing required Model and Provider fields.

The fal-ai branch constructs the request with only AudioURL but omits Model and Provider fields. According to the HuggingFace inference providers documentation, these fields are required for fal-ai API calls.

🔎 Suggested fix
 	} else {
 		encoded := base64.StdEncoding.EncodeToString(request.Input.File)
 		mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
 		if mimeType == "audio/wav" {
 			return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
 		}
 		encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
 		hfRequest = &HuggingFaceTranscriptionRequest{
 			AudioURL: encoded,
+			Model:    schemas.Ptr(modelName),
+			Provider: schemas.Ptr(string(inferenceProvider)),
 		}
 	}
core/providers/huggingface/types.go (1)

229-235: The Extra field in HuggingFaceSpeechRequest remains unaddressed.

Line 234: The Extra field is still present with type map[string]any and json:"-" tag. A previous review comment recommended either removing this field if unused, or changing it to json.RawMessage for consistency with similar patterns in the codebase. While the Extra fields were removed from HuggingFaceSpeechParameters and HuggingFaceSpeechResponse, this one remains.

🧹 Nitpick comments (5)
docs/contributing/adding-a-provider.mdx (1)

2000-2002: Minor: Use hyphen in compound adjective "Tool-calling".

For grammatical consistency with other compound adjectives in the document (e.g., "OpenAI-compatible"), consider hyphenating "Tool-calling" when used as a compound adjective modifying "tests".

Suggested fix
-**Tool calling tests fail**:
+**Tool-calling tests fail**:
core/schemas/mux.go (1)

1155-1241: LGTM! Well-designed support for reasoning-only responses.

The introduction of hasContent and hasReasoning flags with the updated gating logic correctly handles reasoning-only model responses by:

  • Creating text items even without content when reasoning is present
  • Emitting an empty delta on the first chunk to satisfy lifecycle validation requirements
  • Prioritizing content over empty deltas when both are present

The implementation ensures proper event sequencing for the Responses API streaming format.

💡 Optional simplification of condition on line 1216

The condition on line 1216 contains a redundant hasContent check:

if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent))

Since hasContent appears in the outer OR, when evaluating the second part, hasContent must be false, making the inner || hasContent redundant. The condition simplifies to:

-if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) {
+if hasContent || (!state.TextItemHasContent && hasReasoning) {

This is purely for readability and does not affect correctness.

core/internal/testutil/transcription.go (1)

261-278: Consider extracting a helper for repeated fixture loading pattern.

The Fal-AI fixture loading pattern is duplicated across multiple test functions (here, lines 369-386, 463-480, 561-578). Consider extracting a helper:

func loadAudioFixture(t *testing.T, fixtureName string) []byte {
    _, filename, _, _ := runtime.Caller(1)
    dir := filepath.Dir(filename)
    filePath := filepath.Join(dir, "scenarios", "media", fixtureName+".mp3")
    data, err := os.ReadFile(filePath)
    if err != nil {
        t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
    }
    return data
}

This is optional since the current implementation works correctly and is test code.

core/providers/huggingface/embedding.go (1)

117-142: Consider edge case handling for empty array responses.

If the HuggingFace API returns an empty array [], the 2D array unmarshal at line 119 would succeed with len(arr2D) == 0, resulting in a response with zero embeddings. This may be intentional, but ensure callers handle empty Data arrays gracefully.

core/providers/huggingface/responses.go (1)

66-84: Consider returning an error when ToBifrostResponsesResponse() returns nil.

If resp.ToBifrostResponsesResponse() at line 76 returns nil (conversion failure), the function silently returns nil, nil, which could mask errors. Consider returning an explicit error to help diagnose conversion issues.

🔎 Suggested improvement
 	responsesResp := resp.ToBifrostResponsesResponse()
-	if responsesResp != nil {
-		responsesResp.ExtraFields.Provider = schemas.HuggingFace
-		responsesResp.ExtraFields.ModelRequested = requestedModel
-		responsesResp.ExtraFields.RequestType = schemas.ResponsesRequest
+	if responsesResp == nil {
+		return nil, fmt.Errorf("failed to convert chat response to responses response")
 	}
+	responsesResp.ExtraFields.Provider = schemas.HuggingFace
+	responsesResp.ExtraFields.ModelRequested = requestedModel
+	responsesResp.ExtraFields.RequestType = schemas.ResponsesRequest
 
 	return responsesResp, nil
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb7d1a9 and 338f1a0.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
✅ Files skipped from review due to trivial changes (3)
  • transports/changelog.md
  • core/providers/huggingface/utils.go
  • core/changelog.md
🚧 Files skipped from review as they are similar to previous changes (18)
  • docs/features/providers/providers-unified-interface.mdx
  • .github/workflows/release-pipeline.yml
  • core/schemas/transcriptions.go
  • transports/config.schema.json
  • core/providers/utils/audio.go
  • core/internal/testutil/responses_stream.go
  • ui/lib/constants/logs.ts
  • core/providers/openai/openai.go
  • docs/features/providers/huggingface.mdx
  • core/providers/huggingface/speech.go
  • docs/apis/openapi.json
  • core/providers/utils/utils.go
  • core/bifrost.go
  • ui/README.md
  • .github/workflows/pr-tests.yml
  • core/providers/gemini/transcription.go
  • core/providers/huggingface/huggingface_test.go
  • docs/docs.json
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/schemas/account.go
  • core/providers/gemini/speech.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/responses.go
  • core/internal/testutil/transcription.go
  • core/schemas/bifrost.go
  • ui/lib/constants/config.ts
  • core/providers/huggingface/transcription.go
  • core/schemas/mux.go
  • core/providers/huggingface/models.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/embedding.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/schemas/account.go
  • core/providers/gemini/speech.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/responses.go
  • core/internal/testutil/transcription.go
  • core/schemas/bifrost.go
  • core/providers/huggingface/transcription.go
  • core/schemas/mux.go
  • core/providers/huggingface/models.go
  • core/internal/testutil/account.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/chat.go
  • core/providers/huggingface/responses.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
🧬 Code graph analysis (10)
core/schemas/account.go (1)
ui/lib/types/config.ts (3)
  • AzureKeyConfig (23-27)
  • VertexKeyConfig (36-42)
  • BedrockKeyConfig (63-71)
core/providers/gemini/speech.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/chat.go (2)
core/schemas/chatcompletions.go (5)
  • BifrostChatRequest (12-19)
  • ChatStreamOptions (237-240)
  • ChatToolChoiceStruct (390-395)
  • ChatToolChoiceTypeFunction (382-382)
  • ChatToolChoiceFunction (444-446)
core/providers/huggingface/types.go (6)
  • HuggingFaceChatRequest (76-94)
  • HuggingFaceResponseFormat (132-135)
  • HuggingFaceToolChoice (99-104)
  • EnumStringTypeAuto (109-109)
  • EnumStringTypeNone (110-110)
  • EnumStringTypeRequired (111-111)
core/internal/testutil/transcription.go (5)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/schemas/bifrost.go (3)
  • HuggingFace (51-51)
  • BifrostError (465-474)
  • SpeechRequest (99-99)
core/internal/testutil/utils.go (4)
  • GetProviderVoice (39-87)
  • GenerateTTSAudioForTest (568-640)
  • TTSTestTextBasic (20-20)
  • TTSTestTextMedium (23-23)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (5)
  • GetTestRetryConfigForScenario (1118-1150)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • SpeechRetryCondition (144-147)
  • WithSpeechTestRetry (1328-1478)
core/schemas/bifrost.go (1)
ui/lib/types/config.ts (1)
  • ModelProvider (197-200)
core/providers/huggingface/transcription.go (6)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
  • TranscriptionSegment (87-98)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (258-264)
  • HuggingFaceTranscriptionRequestParameters (267-270)
  • HuggingFaceTranscriptionGenerationParameters (273-290)
  • HuggingFaceTranscriptionEarlyStopping (294-297)
  • HuggingFaceTranscriptionResponse (330-333)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/schemas/utils.go (1)
  • SafeExtractIntPointer (486-494)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (394-405)
  • HuggingFace (51-51)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1440-1479)
  • ResponsesStreamResponseTypeOutputTextDelta (1388-1388)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/internal/testutil/account.go (3)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (149-152)
core/schemas/provider.go (4)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/embedding.go (4)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (3)
  • HuggingFaceEmbeddingRequest (161-172)
  • InputsCustomType (174-177)
  • EncodingType (219-219)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (845-852)
core/providers/huggingface/types.go (1)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ...yml` - ✅ Environment variable added to `.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (46)
core/providers/gemini/speech.go (1)

175-175: LGTM! Clean refactor to shared audio MIME detection utility.

Moving to utils.DetectAudioMimeType consolidates audio MIME type detection logic, reducing code duplication across providers and improving maintainability. The shared utility handles multiple audio formats with proper fallback logic.

docs/contributing/adding-a-provider.mdx (4)

7-13: Well-structured quick reference for contributors.

The introduction provides clear guidance on where to find production-ready implementations for both OpenAI-compatible and custom API providers. This helps contributors quickly locate relevant examples based on their integration needs.


64-69: Clear file creation order guidance.

Marking the file creation order as "CRITICAL" with explicit numbered steps helps prevent common mistakes where contributors might create files in the wrong order, leading to circular dependencies or incomplete implementations.


1770-1786: Comprehensive UI integration checklist.

The checklist covers all necessary files and locations for UI integration, making it easy for contributors to verify they haven't missed any required updates. This will reduce incomplete PRs and review cycles.


2033-2069: Thorough pre-submission checklist.

The final checklist consolidates all requirements across provider implementation, tests, schema, UI, CI/CD, and documentation. This comprehensive list helps ensure complete and high-quality provider contributions.

core/schemas/mux.go (1)

1410-1411: LGTM! Correct lifecycle management for reasoning-only responses.

Removing the state.TextItemHasContent check from the text item closure condition is the right approach. This ensures that text items are properly closed even for reasoning-only responses where the text item is created but may only contain an empty content delta.

This change maintains correct event sequencing and lifecycle transitions in the Responses API streaming format.

ui/lib/constants/config.ts (2)

61-61: LGTM!

Correctly marks HuggingFace as requiring an API key, consistent with the PR objectives that introduce HUGGING_FACE_API_KEY. The alphabetical ordering is also correct.


94-124: Add HuggingFace to BaseProvider type and PROVIDER_SUPPORTED_REQUESTS mapping.

HuggingFace is defined as a ProviderName in the constants but is missing from the BaseProvider type definition, which prevents it from being included in the PROVIDER_SUPPORTED_REQUESTS mapping. Since HuggingFace appears configured in the codebase (with sample models and icons), it should be added to both:

  1. BaseProvider type in ui/lib/types/config.ts (currently: "openai" | "anthropic" | "cohere" | "gemini" | "bedrock")
  2. PROVIDER_SUPPORTED_REQUESTS in ui/lib/constants/config.ts (with supported request types: list_models, chat_completion, chat_completion_stream, text_completion, text_completion_stream, responses, responses_stream, embedding)
core/schemas/bifrost.go (3)

35-53: LGTM! HuggingFace provider constant is properly defined.

The HuggingFace constant follows the established naming convention and string value pattern used by other providers.


56-63: HuggingFace correctly added to SupportedBaseProviders.

This enables custom providers to use HuggingFace as their base provider type.


66-85: HuggingFace correctly added to StandardProviders.

The provider is now included in the complete list of built-in providers, enabling it for standard provider operations.

core/schemas/account.go (2)

9-20: LGTM! HuggingFaceKeyConfig field properly integrated into Key struct.

The field follows the established pattern of other provider-specific configurations (Azure, Vertex, Bedrock) with appropriate optional semantics via pointer type and omitempty JSON tag.


70-73: HuggingFaceKeyConfig struct properly defined.

The Deployments map follows the same pattern as other provider key configurations, enabling model-to-deployment name mappings for future HuggingFace inference endpoint integration. Based on learnings, this is correctly reserved for future use.

core/internal/testutil/account.go (4)

114-114: HuggingFace properly added to configured providers list.

Consistent with other provider entries in the list.


327-334: HuggingFace key configuration looks good.

The environment variable HUGGING_FACE_API_KEY is consistently used. Note that UseForBatchAPI is intentionally omitted since HuggingFace doesn't support batch API operations (as confirmed by the AllProviderConfigs scenarios).


589-601: HuggingFace provider configuration is well-tuned.

The 300-second timeout accommodates HuggingFace model cold starts, and the retry configuration (10 retries with 2s-30s backoff) aligns with other variable/cloud providers.


1020-1053: Comprehensive HuggingFace test configuration.

The test config properly defines:

  • Provider-specific model identifiers (using HuggingFace's routing format)
  • Accurate scenario flags reflecting HuggingFace capabilities (e.g., MultipleToolCalls: false, streaming transcription/speech disabled)
  • Appropriate fallbacks to OpenAI

The configuration enables thorough testing of the HuggingFace provider integration.

core/providers/huggingface/chat.go (5)

11-14: Appropriate nil guard for request conversion.

Returning (nil, nil) for nil or empty input is a reasonable convention that allows callers to handle this case gracefully.


17-52: Parameter mapping looks correct.

The mappings from Bifrost parameters to HuggingFace parameters are straightforward and preserve optional semantics via pointer checks.


54-66: ResponseFormat conversion now properly handles errors.

The JSON marshal/unmarshal approach for format conversion returns meaningful errors, addressing the previous review concern about silent error swallowing.


68-73: Verify if IncludeObfuscation should also be forwarded.

Only IncludeUsage is copied from params.StreamOptions, but schemas.ChatStreamOptions also has an IncludeObfuscation field. If the HuggingFace API supports this option, consider forwarding it as well:

 if params.StreamOptions != nil {
     hfReq.StreamOptions = &schemas.ChatStreamOptions{
-        IncludeUsage: params.StreamOptions.IncludeUsage,
+        IncludeUsage:       params.StreamOptions.IncludeUsage,
+        IncludeObfuscation: params.StreamOptions.IncludeObfuscation,
     }
 }

If HuggingFace doesn't support obfuscation, the current behavior is correct.


77-102: ToolChoice handling covers primary use cases.

The implementation correctly handles:

  • String enum values (auto, none, required)
  • Function-based tool choice via ChatToolChoiceTypeFunction
  • Guard to prevent setting invalid/empty ToolChoice

Note: ChatToolChoiceStruct also supports Custom and AllowedTools types which aren't handled, but these may not be applicable to HuggingFace's API.

core/internal/testutil/transcription.go (5)

73-97: Fixture-based testing for Fal-AI/HuggingFace properly implemented.

The code correctly:

  • Detects Fal-AI models via prefix check
  • Uses runtime.Caller to locate fixture files relative to source
  • Handles file read errors with t.Fatalf (addressing previous review concern)
  • Constructs proper transcription request with mp3 format

98-178: TTS generation path for non-Fal-AI providers looks good.

The else branch maintains the existing TTS generation flow with proper retry configuration and error handling.


369-386: Fixture loading for AllResponseFormats test is consistent.

Same pattern as other Fal-AI branches, correctly using RoundTrip_Basic_MP3.mp3 fixture.


463-480: Fixture loading for WithCustomParameters test is consistent.

Uses RoundTrip_Medium_MP3.mp3 fixture appropriately for the medium-length text test.


561-578: Fixture loading for MultipleLanguages test is consistent.

Correctly reuses RoundTrip_Basic_MP3.mp3 fixture for language testing.

core/providers/huggingface/models.go (3)

16-44: LGTM!

The model conversion logic is well-structured with proper nil checks, empty model filtering, and correct field mapping. The HuggingFaceID correctly uses model.ID (the original identifier) while Name uses model.ModelID.


46-102: LGTM!

The method derivation logic correctly maps HuggingFace pipeline types and tags to Bifrost request types. The deduplication via map and deterministic sorting ensures consistent output.


11-14: No action needed. The maxModelFetchLimit constant is used in core/providers/huggingface/utils.go at lines 92-93 for limit validation and should be retained.

Likely an incorrect or invalid review comment.

core/providers/huggingface/embedding.go (1)

11-78: LGTM!

The request conversion correctly handles provider-specific field mapping (Inputs for hf-inference, Input for others), properly extracts model/provider info, and maps both standard and HuggingFace-specific parameters from ExtraParams.

core/providers/huggingface/responses.go (2)

13-32: LGTM!

The conversion chain (ResponsesRequest → ChatRequest → HuggingFaceChatRequest) includes proper nil checks at each step, addressing the previous concern about nil pointer dereference.


34-62: LGTM!

The function correctly delegates to CheckContextAndGetRequestBody, includes the defensive nil check for hfReq, and properly configures streaming options when isStreaming is true.

core/providers/huggingface/transcription.go (2)

49-116: LGTM!

The parameter mapping correctly addresses the previous gating issue by mapping typed parameters (MaxNewTokens, MaxLength, etc.) outside the ExtraParams block. Generation parameters are properly initialized and populated from both schema-level fields and ExtraParams.


121-160: LGTM!

The response conversion correctly handles nil responses, validates required model name, maps text content, and safely extracts timestamp data from chunks with proper bounds checking.

core/providers/huggingface/huggingface.go (8)

29-81: LGTM!

Response pool management is well-implemented with proper struct reset on acquire and nil-safe release functions. Pre-warming the pools based on concurrency configuration is a good optimization.
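For readers unfamiliar with the pattern, here is a minimal sketch of the acquire/reset/release flow described above. The pool and struct names are illustrative assumptions, not the provider's actual code.

package huggingface

import "sync"

// pooledResponse stands in for the provider's pooled response structs; the
// fields are placeholders for illustration only.
type pooledResponse struct {
	Text   string
	Chunks []string
}

var responsePool = sync.Pool{
	New: func() any { return &pooledResponse{} },
}

// acquireResponse resets the struct on the way out so stale data never leaks
// between requests that reuse a pooled object.
func acquireResponse() *pooledResponse {
	resp := responsePool.Get().(*pooledResponse)
	*resp = pooledResponse{}
	return resp
}

// releaseResponse is nil-safe so callers can defer it unconditionally.
func releaseResponse(resp *pooledResponse) {
	if resp == nil {
		return
	}
	responsePool.Put(resp)
}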


209-269: LGTM!

The completeRequest method correctly handles audio content-type detection, error response parsing with proper message preservation (addressing past feedback), body decoding, and safe buffer copying to avoid use-after-free issues.


271-417: LGTM!

The concurrent model fetching implementation is well-structured with proper goroutine management, WaitGroup synchronization, and result aggregation. Error handling correctly returns the first error when all requests fail, and latency averaging is safely guarded.


585-615: LGTM!

The Responses implementation correctly delegates to ChatCompletion/ChatCompletionStream while properly overriding ExtraFields to reflect the actual request type. The streaming variant correctly sets the fallback context flag.


617-700: LGTM!

The embedding implementation correctly uses the model alias cache for request execution, handles multiple response formats via UnmarshalHuggingFaceEmbeddingResponse, and properly populates all ExtraFields including raw request/response when enabled.


702-782: LGTM!

The Speech implementation correctly manages the response pool with defer releaseHuggingFaceSpeechResponse, properly passes context to downloadAudioFromURL, and handles audio data conversion. ExtraFields are comprehensively populated.


788-875: LGTM!

The Transcription implementation correctly handles the hf-inference special case with proper input validation, manages the response pool with deferred release, and populates ExtraFields appropriately. The nil check for request.Input addresses the previous nil dereference concern.


882-930: LGTM!

All unsupported operations correctly return NewUnsupportedOperationError with the appropriate request type, maintaining consistency across the provider interface.

core/providers/huggingface/types.go (3)

73-143: Chat types correctly structured for HuggingFace API.

The chat-related types appropriately reuse Bifrost schemas where possible (ChatMessage, ChatStreamOptions, ChatTool, ChatToolChoiceFunction) while defining HuggingFace-specific variants like HuggingFaceToolChoice where the API contract differs. The custom MarshalJSON for HuggingFaceToolChoice correctly handles the union type (enum string or function object).
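As a reference for the union described here, a minimal sketch of how a string-or-object tool_choice can be marshaled; the field layout is an assumption and does not mirror the actual HuggingFaceToolChoice definition.

package huggingface

import "encoding/json"

// toolChoiceSketch is either a bare enum string ("auto", "none", "required")
// or a {"type":"function","function":{"name":...}} object on the wire.
type toolChoiceSketch struct {
	Enum         string // set for the enum form
	FunctionName string // set for the function form
}

func (t toolChoiceSketch) MarshalJSON() ([]byte, error) {
	if t.FunctionName != "" {
		return json.Marshal(map[string]any{
			"type":     "function",
			"function": map[string]string{"name": t.FunctionName},
		})
	}
	return json.Marshal(t.Enum)
}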


157-224: Embedding types correctly handle flexible input formats.

The InputsCustomType with custom JSON marshaling properly supports the HuggingFace API's flexible input field that accepts either a single string, an array of strings, or an object. The unmarshal logic tries formats in a sensible order (string → array → object) and the marshal logic appropriately prioritizes array over single text.


256-342: Transcription types are well-structured and error handling is correct.

The transcription-related types properly handle the HuggingFace ASR API, including:

  • Flexible audio input (raw bytes or URL)
  • Comprehensive generation parameters
  • Union type for EarlyStopping (bool or string "never") with symmetric marshaling
  • Proper error handling in UnmarshalJSON (line 326 correctly returns an error for invalid input)

The type aliases at the end provide convenient access without code duplication.
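For the early_stopping union mentioned above, a small sketch of symmetric (un)marshaling for a bool-or-"never" value; the field names are assumptions rather than the actual HuggingFaceTranscriptionEarlyStopping shape.

package huggingface

import (
	"encoding/json"
	"fmt"
)

// earlyStoppingSketch accepts a JSON bool or the string "never".
type earlyStoppingSketch struct {
	Bool  *bool
	Never bool
}

func (e *earlyStoppingSketch) UnmarshalJSON(data []byte) error {
	var b bool
	if err := json.Unmarshal(data, &b); err == nil {
		e.Bool = &b
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err == nil && s == "never" {
		e.Never = true
		return nil
	}
	return fmt.Errorf("early_stopping must be a bool or \"never\", got %s", string(data))
}

func (e earlyStoppingSketch) MarshalJSON() ([]byte, error) {
	if e.Never {
		return json.Marshal("never")
	}
	if e.Bool != nil {
		return json.Marshal(*e.Bool)
	}
	return json.Marshal(false)
}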

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 3 times, most recently from 70fd6d0 to 3ea5fc8 on December 19, 2025 07:27
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
ui/lib/constants/config.ts (1)

40-40: Fix the HuggingFace model identifier in the placeholder example.

The model identifier nebius/Qwen/Qwen3-Embedding-8B uses an incorrect format. The "nebius/" prefix indicates a Nebius inference service identifier, not a standard HuggingFace model format. Standard HuggingFace model identifiers follow the pattern organization/model-name.

🔎 Proposed fix
-	huggingface: "e.g. sambanova/meta-llama/Llama-3.1-8B-Instruct, nebius/Qwen/Qwen3-Embedding-8B",
+	huggingface: "e.g. meta-llama/Llama-3.1-8B-Instruct, Qwen/Qwen2.5-72B-Instruct",
🧹 Nitpick comments (4)
core/providers/openai/openai.go (1)

1047-1054: LGTM! Enhanced streaming emission for reasoning fields.

The broadened condition correctly triggers chunk emission when Delta.Reasoning or Delta.ReasoningDetails are present, following existing patterns (!= nil for pointers like Content, len() > 0 for slices like ToolCalls). The updated comment clearly documents the expanded behavior.

Optional: Verify test coverage

Consider verifying that test coverage includes scenarios where:

  • Delta.Reasoning is present (non-nil, including empty string case if valid)
  • Delta.ReasoningDetails has entries
  • Both fields are present simultaneously
  • These fields are present alongside other delta fields (Content, Audio, ToolCalls)

This ensures the new emission paths are exercised and behave as expected.

core/internal/testutil/transcription.go (2)

73-178: Fal‑AI-on-HuggingFace fixture path is correct; consider extracting a small helper

The conditional path for schemas.HuggingFace + fal-ai/ models that loads pre-generated mp3 fixtures and wires them into BifrostTranscriptionRequest neatly avoids wav/mp3 mismatches and fails fast on missing files. You might optionally pull the runtime.Caller + scenarios/media/<name>.mp3 resolution into a small helper to reuse in other Fal‑AI branches below and keep the test code DRY.


261-278: Repeated Fal‑AI fixture loading could be centralized for clarity

The Additional/Advanced/Language transcription tests repeat the same Fal‑AI-on-HuggingFace mp3 fixture lookup pattern (via runtime.Caller and os.ReadFile with t.Fatalf on failure). Extracting a shared loadFalAIAudioFixture(t, name string) []byte would remove repetition and make it easier to adjust fixture paths or error messages in one place. The continued use of audioData, _ = GenerateTTSAudioForTest(...) in non-Fal‑AI branches is correct given its ([]byte, string) signature and internal t.Fatalf on error. Based on learnings, this usage is intentional.

Also applies to: 369-386, 463-480, 561-578

core/providers/huggingface/transcription.go (1)

11-118: Transcription request conversion covers validations and provider-specific nuances well

Nil/empty-input checks, provider/model splitting, fal‑ai’s audio_url data‑URL handling with a WAV guard, and the detailed mapping of typed + ExtraParams into GenerationParameters all look correct and consistent with the rest of the provider. One micro‑nit (optional): you always attach an empty generation_parameters object even when no fields are set; if you care about minimizing payloads, you could gate the assignment behind a “has any field” check.
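To make the fal-ai path concrete, a minimal sketch of a data-URL builder with the WAV guard described above. The MIME detector is passed in as a parameter because the exact utils.DetectAudioMimeType signature is assumed here, and the error wording is illustrative.

package huggingface

import (
	"encoding/base64"
	"fmt"
)

// buildFalAudioURLSketch detects the audio MIME type, rejects WAV (which the
// fal-ai route does not accept), and wraps the remaining formats as a base64
// data URL suitable for the audio_url field.
func buildFalAudioURLSketch(audio []byte, detectMime func([]byte) string) (string, error) {
	mime := detectMime(audio)
	if mime == "audio/wav" {
		return "", fmt.Errorf("fal-ai transcription does not accept WAV input; use MP3 instead")
	}
	return fmt.Sprintf("data:%s;base64,%s", mime, base64.StdEncoding.EncodeToString(audio)), nil
}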

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 338f1a0 and 3ea5fc8.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (16)
  • .github/workflows/pr-tests.yml
  • core/providers/utils/audio.go
  • transports/config.schema.json
  • core/providers/gemini/transcription.go
  • ui/lib/constants/logs.ts
  • docs/features/providers/providers-unified-interface.mdx
  • core/providers/huggingface/embedding.go
  • core/providers/gemini/speech.go
  • core/providers/huggingface/huggingface_test.go
  • docs/apis/openapi.json
  • core/providers/utils/utils.go
  • core/changelog.md
  • core/providers/huggingface/models.go
  • core/providers/huggingface/speech.go
  • .github/workflows/release-pipeline.yml
  • docs/features/providers/huggingface.mdx
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/internal/testutil/transcription.go
  • core/providers/huggingface/responses.go
  • core/schemas/bifrost.go
  • core/bifrost.go
  • transports/changelog.md
  • core/schemas/account.go
  • core/providers/openai/openai.go
  • core/internal/testutil/account.go
  • ui/lib/constants/icons.tsx
  • core/schemas/mux.go
  • ui/README.md
  • core/schemas/transcriptions.go
  • core/providers/huggingface/chat.go
  • docs/docs.json
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/transcription.go
  • docs/contributing/adding-a-provider.mdx
  • ui/lib/constants/config.ts
  • core/internal/testutil/responses_stream.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (5)
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/internal/testutil/transcription.go
  • core/providers/huggingface/responses.go
  • core/schemas/bifrost.go
  • core/bifrost.go
  • core/schemas/account.go
  • core/providers/openai/openai.go
  • core/internal/testutil/account.go
  • core/schemas/mux.go
  • core/schemas/transcriptions.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/transcription.go
  • core/internal/testutil/responses_stream.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/responses.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai.go
🧬 Code graph analysis (8)
core/providers/huggingface/responses.go (5)
core/schemas/responses.go (2)
  • BifrostResponsesRequest (32-39)
  • BifrostResponsesResponse (45-85)
core/providers/huggingface/types.go (1)
  • HuggingFaceChatRequest (76-94)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (11-106)
core/schemas/chatcompletions.go (1)
  • BifrostChatResponse (27-42)
core/schemas/bifrost.go (3)
  • HuggingFace (51-51)
  • RequestType (88-88)
  • ResponsesRequest (96-96)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (66-99)
core/providers/openai/openai.go (3)
core/schemas/chatcompletions.go (1)
  • ChatStreamResponseChoice (783-785)
core/providers/gemini/types.go (1)
  • Content (977-985)
ui/lib/types/logs.ts (1)
  • ReasoningDetails (127-134)
core/internal/testutil/account.go (4)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • OpenAI (35-35)
core/schemas/account.go (1)
  • Key (8-20)
core/schemas/provider.go (5)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1440-1479)
  • ResponsesStreamResponseTypeOutputTextDelta (1388-1388)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/huggingface/chat.go (3)
core/schemas/chatcompletions.go (4)
  • BifrostChatRequest (12-19)
  • ChatStreamOptions (237-240)
  • ChatToolChoiceStruct (390-395)
  • ChatToolChoiceFunction (444-446)
core/providers/huggingface/types.go (2)
  • HuggingFaceChatRequest (76-94)
  • HuggingFaceToolChoice (99-104)
ui/lib/types/logs.ts (1)
  • Function (152-157)
core/providers/huggingface/transcription.go (5)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • BifrostTranscriptionResponse (16-26)
  • TranscriptionSegment (87-98)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionRequest (258-264)
  • HuggingFaceTranscriptionRequestParameters (267-270)
  • HuggingFaceTranscriptionGenerationParameters (273-290)
  • HuggingFaceTranscriptionEarlyStopping (294-297)
  • HuggingFaceTranscriptionResponse (330-333)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/schemas/utils.go (1)
  • SafeExtractIntPointer (486-494)
core/schemas/bifrost.go (2)
  • BifrostResponseExtraFields (394-405)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (4)
core/providers/huggingface/types.go (2)
  • HuggingFaceTranscriptionResponse (330-333)
  • HuggingFaceSpeechResponse (243-245)
core/providers/utils/utils.go (2)
  • MakeRequestWithContext (41-95)
  • NewBifrostOperationError (733-744)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/embedding.go (2)
  • ToHuggingFaceEmbeddingRequest (11-78)
  • UnmarshalHuggingFaceEmbeddingResponse (82-168)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ...yml` - ✅ Environment variable added to `.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (36)
transports/changelog.md (1)

1-1: LGTM! Changelog entry is clear and concise.

The changelog entry accurately summarizes the HuggingFace provider addition with UI integration.

docs/contributing/adding-a-provider.mdx (1)

1-2070: LGTM! Comprehensive contributor guide for adding providers.

This is an excellent, thorough guide that provides clear step-by-step instructions for adding both OpenAI-compatible and custom API providers. The structure is logical, examples are helpful, and the guide covers all necessary aspects from research to CI/CD integration.

docs/docs.json (1)

158-166: LGTM! Documentation navigation structure updated appropriately.

The new Providers group consolidates provider-related documentation logically, and the HuggingFace page entry fits well within this structure.

core/schemas/mux.go (2)

1155-1241: LGTM! Proper handling of reasoning-only streaming responses.

The new gating logic correctly handles edge cases where responses contain only reasoning content without text. The empty delta emission on the first chunk ensures proper lifecycle validation while maintaining backward compatibility with content-only and mixed content/reasoning responses.


1410-1456: LGTM! Text item closure logic improved.

Closing the text item regardless of whether it has content properly supports reasoning-only responses while maintaining correct lifecycle sequencing. This is a good improvement over the previous content-dependent closure logic.

ui/lib/constants/config.ts (1)

61-61: LGTM! HuggingFace key requirement correctly configured.

Setting huggingface: true appropriately indicates that an API key is required for the HuggingFace provider.

core/internal/testutil/responses_stream.go (1)

693-693: LGTM! Increased safety threshold for complex streaming scenarios.

Raising the threshold from 100 to 300 is reasonable to accommodate providers that emit richer lifecycle events (such as reasoning, tool calls, and multiple content types) while still protecting against infinite loops.

core/schemas/account.go (2)

9-20: LGTM! HuggingFaceKeyConfig field added for future use.

The new field follows the existing pattern for provider-specific key configurations and is properly structured with the omitempty tag. Based on learnings, this is reserved for future Hugging Face inference endpoint deployments.

Based on learnings, this field is currently unused and should not be flagged for missing OpenAPI documentation until the feature is actively implemented.


70-72: LGTM! HuggingFaceKeyConfig type definition follows established patterns.

The new type mirrors the structure of other provider-specific configs (AzureKeyConfig, BedrockKeyConfig) with a Deployments map for model-to-deployment mappings.

core/bifrost.go (2)

26-26: Import correctly placed.

The HuggingFace provider import follows the alphabetical ordering convention used by other provider imports.


1889-1890: HuggingFace case correctly integrated into provider factory.

The switch case follows the established pattern for providers that don't return initialization errors. The placement before schemas.Nebius maintains logical grouping.
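For orientation, a self-contained sketch of the no-error-constructor switch pattern being referenced; the types and constructor below are stand-ins, not the actual core/bifrost.go factory.

package main

import "fmt"

type modelProvider string

const providerHuggingFace modelProvider = "huggingface"

type provider interface{ Name() modelProvider }

type huggingFaceProvider struct{}

func (huggingFaceProvider) Name() modelProvider { return providerHuggingFace }

// newProvider mirrors the factory shape: providers whose constructors cannot
// fail are returned directly with a nil error.
func newProvider(key modelProvider) (provider, error) {
	switch key {
	case providerHuggingFace:
		return huggingFaceProvider{}, nil
	default:
		return nil, fmt.Errorf("unsupported provider: %s", key)
	}
}

func main() {
	p, err := newProvider(providerHuggingFace)
	fmt.Println(p.Name(), err)
}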

core/schemas/bifrost.go (2)

51-51: HuggingFace constant correctly defined.

The constant follows the established naming convention with lowercase provider identifier string.


62-62: Provider registration is complete.

HuggingFace is correctly added to both SupportedBaseProviders (enabling custom provider configurations based on HuggingFace) and StandardProviders (registering as a built-in provider).

Also applies to: 83-83

core/internal/testutil/account.go (4)

114-114: Provider correctly added to configured providers list.

HuggingFace is appropriately placed in the provider list, maintaining consistency with other provider entries.


327-334: HuggingFace key configuration is appropriate.

The key retrieval uses the standardized HUGGING_FACE_API_KEY environment variable. The omission of UseForBatchAPI is intentional since the HuggingFace test configuration shows batch operations are not enabled for this provider.


589-601: HuggingFace provider configuration is well-tuned.

The 300-second timeout appropriately accounts for HuggingFace model cold starts. The retry settings (10 retries, 2s-30s backoff) align with other cloud providers, and the concurrency/buffer settings match the standard pattern.


1020-1053: Comprehensive test configuration for HuggingFace is well-structured.

The test configuration appropriately reflects HuggingFace capabilities:

  • Models use the correct HuggingFace Inference API naming convention (provider/org/model)
  • MultipleToolCalls: false aligns with the learning that HuggingFace streaming tool call data arrives as single delta chunks
  • Stream variants for transcription and speech are correctly disabled
  • Fallback to OpenAI is properly configured
core/schemas/transcriptions.go (1)

37-40: HuggingFace transcription parameters are well-documented.

The new generation parameters (MaxLength, MinLength, MaxNewTokens, MinNewTokens) are correctly typed as optional pointers and include clear inline comments indicating their HuggingFace-specific purpose. This follows the pattern established by the Elevenlabs-specific fields below.

core/providers/huggingface/responses.go (2)

11-30: Request conversion function has proper nil guards.

The function correctly implements defensive nil checks at each conversion step. The guard at lines 25-27 addresses the previous review concern about dereferencing a nil request.


34-52: Response conversion correctly enriches metadata.

The function properly handles nil input and ensures model propagation. The ExtraFields enrichment with Provider, ModelRequested, and RequestType follows the pattern used by other providers.

Note: The function mutates the input resp.Model (line 41) as a side effect. This appears intentional to ensure model information is preserved in the response chain, but be aware of this behavior if the caller expects the original response to remain unmodified.

core/providers/huggingface/chat.go (1)

11-105: Chat request conversion aligns well with Bifrost schema and HF expectations

The field mapping (parameters, tools, tool_choice, response_format, stream options) is consistent and defensive, and error surfacing for ResponseFormat conversion is appropriate. I don’t see gaps relative to the surrounding provider implementation.

core/providers/huggingface/utils.go (2)

52-147: Inference provider list, model hub URL, and model/provider parsing look coherent

The provider enum/set and buildModelHubURL correctly mirror Hugging Face’s hub filters, and splitIntoModelProvider cleanly separates provider vs. model while explicitly rejecting ambiguous no-slash model strings. This matches the rest of the provider’s routing logic.
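For reference, a small sketch of the kind of split being described, where the first path segment selects the inference provider and the remainder is the hub model ID; the real splitIntoModelProvider may differ in details, so treat this as an assumption-based illustration.

package huggingface

import (
	"fmt"
	"strings"
)

// splitModelProviderSketch parses "<provider>/<org>/<model>" and rejects
// ambiguous strings that carry no slash at all.
func splitModelProviderSketch(model string) (provider, hubModelID string, err error) {
	provider, hubModelID, found := strings.Cut(model, "/")
	if !found || provider == "" || hubModelID == "" {
		return "", "", fmt.Errorf("model %q must be of the form <provider>/<org>/<model>", model)
	}
	return provider, hubModelID, nil
}

// Example: "sambanova/meta-llama/Llama-3.1-8B-Instruct" yields provider
// "sambanova" and hub model "meta-llama/Llama-3.1-8B-Instruct".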


195-322: Provider mapping, validation, and audio download helpers are robust

getModelInferenceProviderMapping/getValidatedProviderModelID combine caching, precise task checks, and safe error propagation; downloadAudioFromURL and getMimeTypeForAudioType correctly use context-aware requests, status checks, and body copying to avoid use-after-free. Overall this utility layer looks solid.

ui/README.md (1)

3-239: UI README refresh is aligned with current architecture and docs strategy

The README now accurately describes the UI’s role, tech stack, and project structure while delegating detailed feature/provider lists to the central docs site. This reduces drift and looks good for long‑term maintainability.

core/providers/huggingface/transcription.go (1)

121-159: Transcription response→Bifrost mapping is straightforward and safe

The response adapter correctly sets provider/model metadata, copies Text, and turns Chunks into TranscriptionSegments with guarded timestamp handling (defaulting start/end to 0 when missing). This is a clean, minimal translation.

core/providers/huggingface/huggingface.go (8)

18-99: Provider construction, client config, and pooling are well-structured

The provider struct fields, fasthttp client configuration (timeouts, connection limits), proxy wiring, base URL normalization, and pre-warming of speech/transcription response pools are all coherent and align with how other providers in this codebase are set up.


111-190: Model-alias cache helper cleanly encapsulates routing + retry behavior

completeRequestWithModelAliasCache cleanly separates URL construction, provider‑task validation, optional embedding model rewrite, and a single 404‑driven retry with cache invalidation. The control flow is clear and reuses completeRequest without duplicating HTTP/error logic.
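A minimal sketch of the single 404-driven retry with cache invalidation described here; the cache interface, error sentinel, and callback signature are assumptions introduced for illustration, not the real completeRequestWithModelAliasCache.

package huggingface

import (
	"context"
	"errors"
	"fmt"
)

// aliasCache is a hypothetical stand-in for the model alias cache.
type aliasCache interface {
	Resolve(model string) (string, error)
	Invalidate(model string)
}

var errNotFound = errors.New("provider returned 404")

func requestWithAliasCacheSketch(
	ctx context.Context,
	cache aliasCache,
	model string,
	do func(ctx context.Context, resolvedModel string) ([]byte, error),
) ([]byte, error) {
	resolved, err := cache.Resolve(model)
	if err != nil {
		return nil, fmt.Errorf("resolving model alias: %w", err)
	}
	body, err := do(ctx, resolved)
	if errors.Is(err, errNotFound) {
		// Stale alias: drop it, re-resolve, and retry exactly once.
		cache.Invalidate(model)
		resolved, err = cache.Resolve(model)
		if err != nil {
			return nil, fmt.Errorf("re-resolving model alias after 404: %w", err)
		}
		return do(ctx, resolved)
	}
	return body, err
}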


190-250: Core HTTP request helper handles headers, audio content types, and errors correctly

completeRequest uses context-aware requests, sets JSON vs audio content types appropriately (via DetectAudioMimeType + getMimeTypeForAudioType), and preserves useful error information by layering HuggingFaceResponseError over HandleProviderAPIError. Copying the body before releasing the response avoids fasthttp buffer pitfalls.


252-398: Model listing fan-out and aggregation logic is solid and tolerant to partial failures

listModelsByKey’s per-provider goroutines, shared result channel, and aggregation of data/latency/raw responses are implemented carefully: errors per provider don’t poison the whole result unless everything fails, and average latency plus combined raw responses are exposed only when requested.
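The fan-out shape described here can be illustrated with a short sketch; the fetch callback and flat result slices are placeholders, not the real listModelsByKey aggregation, which also tracks latency and raw responses.

package huggingface

import (
	"context"
	"sync"
)

// fanOutListModelsSketch queries each inference provider concurrently and
// aggregates results, keeping partial failures from poisoning the whole list.
func fanOutListModelsSketch(
	ctx context.Context,
	providers []string,
	fetch func(ctx context.Context, provider string) ([]string, error),
) (models []string, errs []error) {
	type result struct {
		models []string
		err    error
	}

	results := make(chan result, len(providers))
	var wg sync.WaitGroup
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			m, err := fetch(ctx, p)
			results <- result{models: m, err: err}
		}(p)
	}
	wg.Wait()
	close(results)

	for r := range results {
		if r.err != nil {
			errs = append(errs, r.err)
			continue
		}
		models = append(models, r.models...)
	}
	return models, errs
}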


427-596: Chat/Responses paths reuse OpenAI-compatible surface sensibly

Chat completion (sync + stream) correctly validates models via splitIntoModelProvider, normalizes the model as <hub-id>:<provider>, uses the shared chat converter, and leverages the OpenAI streaming helper with a custom request converter. The Responses/ResponsesStream fallback through chat is consistent with how other providers are currently wired and preserves extra fields.


598-681: Embedding path integrates request conversion, alias routing, and flexible response parsing cleanly

The Embedding method checks operation permissions, validates the model string, uses the dedicated converter, then routes via completeRequestWithModelAliasCache with the correct task, and finally normalizes the variety of possible HF embedding response shapes via UnmarshalHuggingFaceEmbeddingResponse. ExtraFields and optional raw request/response are populated consistently.


683-763: Speech flow correctly composes provider mapping, response pooling, and audio download

Text-to-speech requests go through provider/alias validation, use the speech converter, and then populate a pooled HuggingFaceSpeechResponse. Downloading the final audio via downloadAudioFromURL with context and then converting to BifrostSpeechResponse keeps the HTTP/audio concerns separated and maintains consistent ExtraFields and optional raw payloads.


769-861: Transcription flow handles hf-inference vs provider JSON paths and mapping consistently

The Transcription method distinguishes raw-audio hf-inference requests from JSON-based ones, validates presence of input audio, routes via completeRequestWithModelAliasCache with the correct ASR task, and uses pooled HF responses plus the dedicated converter to build BifrostTranscriptionResponse. ExtraFields and raw payload flags follow the same conventions as other operations.

core/providers/huggingface/types.go (3)

14-72: Model and inference-provider mapping types match Hub API semantics with flexible JSON handling

HuggingFaceModel and HuggingFaceListModelsResponse.UnmarshalJSON correctly accommodate both top-level array and legacy {"models":[...]} forms, and the inference provider mapping structs (HuggingFaceInferenceProviderInfo/Mapping) align with providerId+task usage seen in the routing utilities.


75-217: Chat/tool-choice and embedding input types are well-shaped and JSON-friendly

HuggingFaceChatRequest plus HuggingFaceToolChoice.MarshalJSON model the dual string/object tool_choice contract cleanly, and InputsCustomType handles string/array/object inputs symmetrically via custom (un)marshaling. This matches how the chat and embedding converters populate these fields.


228-342: Speech and transcription parameter/response types, including early_stopping union, are solid

Speech reuses the shared generation-parameter struct, and the transcription request/parameter types plus HuggingFaceTranscriptionEarlyStopping’s custom (un)marshal correctly model the “bool or string” union with proper validation. The transcription response/chunk types fit the conversion logic used in ToBifrostTranscriptionResponse.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
core/internal/testutil/transcription.go (1)

73-177: Fal‑AI/HuggingFace mp3 fixture handling looks correct; consider a small helper to DRY it up.

The conditional branches for schemas.HuggingFace + fal-ai/ models correctly:

  • Bypass TTS and load pre-generated mp3 fixtures.
  • Fail fast on os.ReadFile errors with clear messages.
  • Keep non-HF/fal paths on the existing TTS + retry flow.
  • Preserve correct transcription params (Language, Format: "mp3", ResponseFormat, fallbacks).

Also, the audioData, _ = GenerateTTSAudioForTest(...) usage is correct and intentional given that it returns ([]byte, string) and handles errors via t.Fatalf() internally. Based on learnings, there’s no need to add extra error handling around it.

You might optionally extract a small helper like loadFalAudioFixture(t, testName string) []byte (wrapping the runtime.Caller + filepath.Join + os.ReadFile pattern) to avoid repeating the same 7–8 lines across all five branches, but that’s purely for readability/maintainability.

Also applies to: 261-278, 369-386, 463-480, 561-578
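A sketch of the helper suggested above, assuming the fixtures live under scenarios/media next to the test source; names and paths are illustrative rather than a committed implementation.

package testutil

import (
	"os"
	"path/filepath"
	"runtime"
	"testing"
)

// loadFalAudioFixture resolves a pre-generated mp3 fixture relative to this
// source file and fails the test immediately if it cannot be read.
func loadFalAudioFixture(t *testing.T, name string) []byte {
	t.Helper()
	_, thisFile, _, ok := runtime.Caller(0)
	if !ok {
		t.Fatalf("could not resolve caller to locate audio fixtures")
	}
	path := filepath.Join(filepath.Dir(thisFile), "scenarios", "media", name+".mp3")
	data, err := os.ReadFile(path)
	if err != nil {
		t.Fatalf("failed to read audio fixture %s: %v", path, err)
	}
	return data
}

Call sites would then reduce to a single line such as audioData := loadFalAudioFixture(t, "RoundTrip_Basic_MP3") in each Fal‑AI branch.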

core/schemas/mux.go (1)

1155-1160: Reasoning-only streaming handling is correct; the delta condition could be slightly simplified (optional).

The new hasContent / hasReasoning gating plus:

  • Creating the text item when either content is present or on the first reasoning-only chunk, and
  • Always closing the text item on finish_reason,

gives a consistent lifecycle for both contentful and reasoning-only responses and fixes the prior gap where reasoning-only streams might never get an output_text item.

The OutputTextDelta guard:

if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) {
    ...
}

is logically sound but a bit redundant; it could be simplified to:

if hasContent || (!state.TextItemHasContent && hasReasoning) {
    ...
}

for readability, without changing behavior. This is optional and non-blocking.

Also applies to: 1214-1241, 1410-1457

core/changelog.md (1)

1-1: Optional wording polish for readability.

Purely stylistic, but you might consider:

-feat: added HuggingFace provider using Inference Provider API, support for chat(with stream also), response(with stream also), TTS and speech synthesis
+feat: added HuggingFace provider using Inference Provider API, with support for chat (including streaming), responses (including streaming), TTS, and speech synthesis

to improve grammar and clarity. No functional impact either way.

core/providers/utils/audio.go (1)

64-119: DetectAudioMimeType logic matches the intended, limited format set; just be sure tests cover the edge cases.

The header constants and detection flow (WAV → ID3/MP3 → AAC via ADIF/ADTS → AIFF/AIFC → FLAC → OGG → MP3 frame sync → mp3 fallback) are aligned with the documented supported formats and the stricter ADTS mask is intentional to avoid misclassifying MP3 as AAC, per prior discussion. Based on learnings, the 0xF6 mask here is correct.

If not already in place, it’s worth having unit tests that:

  • Positively detect each of WAV/MP3/AAC/AIFF/OGG/FLAC.
  • Specifically cover an MP3 frame with bits that would trip a naive ADTS check, to lock in the stricter 0xF6 behavior.

Otherwise this helper looks good.
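As a starting point for that coverage, a table-driven sketch over a few well-known magic numbers plus the MP3-frame-vs-ADTS edge case. It assumes the signature DetectAudioMimeType(data []byte) string and the MIME strings in the want column; both are assumptions and may need adjusting to the helper's actual contract.

package utils

import "testing"

func TestDetectAudioMimeTypeSketch(t *testing.T) {
	// Canonical WAV header: "RIFF" + 4 size bytes + "WAVE".
	wavHeader := append([]byte("RIFF"), 0, 0, 0, 0)
	wavHeader = append(wavHeader, []byte("WAVE")...)

	cases := []struct {
		name string
		data []byte
		want string // assumed return values
	}{
		{"wav", wavHeader, "audio/wav"},
		{"mp3 with ID3 tag", []byte("ID3\x03\x00\x00\x00\x00\x00\x00"), "audio/mpeg"},
		{"flac", []byte("fLaC\x00\x00\x00\x22"), "audio/flac"},
		{"ogg", []byte("OggS\x00\x02"), "audio/ogg"},
		// A bare MP3 frame header (0xFF 0xFB) must not be misread as ADTS AAC.
		{"mp3 frame sync not AAC", []byte{0xFF, 0xFB, 0x90, 0x00}, "audio/mpeg"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := DetectAudioMimeType(tc.data); got != tc.want {
				t.Fatalf("DetectAudioMimeType(%s) = %q, want %q", tc.name, got, tc.want)
			}
		})
	}
}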

core/providers/huggingface/chat.go (1)

11-14: Consider returning an error for nil input instead of (nil, nil).

When bifrostReq or bifrostReq.Input is nil, the function returns (nil, nil). This silent nil return could mask bugs at call sites since the caller receives no error but also no usable request. Consider returning an explicit error to signal the invalid input.

🔎 Suggested improvement
 func ToHuggingFaceChatCompletionRequest(bifrostReq *schemas.BifrostChatRequest) (*HuggingFaceChatRequest, error) {
 	if bifrostReq == nil || bifrostReq.Input == nil {
-		return nil, nil
+		return nil, fmt.Errorf("bifrost chat request or input is nil")
 	}
docs/contributing/adding-a-provider.mdx (1)

1999-2002: Minor: Hyphenate "Tool-calling" for consistency.

Static analysis flagged that "Tool calling" should be hyphenated as a compound modifier.

🔎 Suggested fix
-**Tool calling tests fail**:
+**Tool-calling tests fail**:
core/providers/huggingface/types.go (1)

234-234: Remove or document the unused Extra field.

The Extra field is tagged with json:"-" which excludes it from JSON marshaling. Based on previous review discussions, this field is unused throughout the codebase. Either remove it entirely, or if it's reserved for future provider-specific metadata, document its intended purpose with a clear comment.

🔎 Proposed fix

If the field is truly unused, remove it:

 type HuggingFaceSpeechRequest struct {
 	Text       string                       `json:"text"`
 	Provider   string                       `json:"provider" validate:"required"`
 	Model      string                       `json:"model" validate:"required"`
 	Parameters *HuggingFaceSpeechParameters `json:"parameters,omitempty"`
-	Extra      map[string]any               `json:"-"`
 }

Or, if it's reserved for future use, document it clearly:

 type HuggingFaceSpeechRequest struct {
 	Text       string                       `json:"text"`
 	Provider   string                       `json:"provider" validate:"required"`
 	Model      string                       `json:"model" validate:"required"`
 	Parameters *HuggingFaceSpeechParameters `json:"parameters,omitempty"`
+	// Extra holds provider-specific opaque data for future extensions (not serialized)
 	Extra      map[string]any               `json:"-"`
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 338f1a0 and 3ea5fc8.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (17)
  • docs/apis/openapi.json
  • core/providers/huggingface/embedding.go
  • docs/features/providers/providers-unified-interface.mdx
  • core/providers/huggingface/huggingface_test.go
  • ui/lib/constants/logs.ts
  • core/schemas/bifrost.go
  • ui/README.md
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/responses.go
  • docs/docs.json
  • transports/config.schema.json
  • .github/workflows/pr-tests.yml
  • core/schemas/account.go
  • core/providers/gemini/transcription.go
  • core/providers/huggingface/speech.go
  • ui/lib/constants/config.ts
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/utils/utils.go
  • core/providers/huggingface/chat.go
  • core/bifrost.go
  • docs/features/providers/huggingface.mdx
  • core/internal/testutil/responses_stream.go
  • core/providers/openai/openai.go
  • core/schemas/mux.go
  • core/providers/gemini/speech.go
  • core/providers/utils/audio.go
  • core/schemas/transcriptions.go
  • docs/contributing/adding-a-provider.mdx
  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
  • core/providers/huggingface/utils.go
  • transports/changelog.md
  • ui/lib/constants/icons.tsx
  • core/changelog.md
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (9)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/utils/utils.go
  • core/providers/huggingface/chat.go
  • core/bifrost.go
  • core/internal/testutil/responses_stream.go
  • core/providers/openai/openai.go
  • core/schemas/mux.go
  • core/providers/gemini/speech.go
  • core/providers/utils/audio.go
  • core/schemas/transcriptions.go
  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:06:05.395Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: docs/features/providers/huggingface.mdx:39-61
Timestamp: 2025-12-15T10:06:05.395Z
Learning: For fal-ai transcription requests routed through HuggingFace in Bifrost, WAV (audio/wav) is not supported and should be rejected. Only MP3 format is supported. Update the documentation and any related examples to reflect MP3 as the required input format for HuggingFace-based transcription, and note WAV should not be used. This applies specifically to the HuggingFace provider integration in this repository.

Applied to files:

  • docs/features/providers/huggingface.mdx
📚 Learning: 2025-12-09T17:08:21.123Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: docs/features/providers/huggingface.mdx:171-195
Timestamp: 2025-12-09T17:08:21.123Z
Learning: In docs/features/providers/huggingface.mdx, use the official Hugging Face naming conventions for provider identifiers in the capabilities table (e.g., ovhcloud-ai-endpoints, z-ai). Do not map to SDK identifiers like ovhcloud or zai-org; this aligns with Hugging Face's public docs and improves consistency for readers.

Applied to files:

  • docs/features/providers/huggingface.mdx
📚 Learning: 2025-12-11T11:58:25.307Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1000
File: core/providers/openai/responses.go:42-84
Timestamp: 2025-12-11T11:58:25.307Z
Learning: In core/providers/openai/responses.go (and related OpenAI response handling), document and enforce the API format constraint: if ResponsesReasoning != nil and the response contains content blocks, all content blocks should be treated as reasoning blocks by default. Implement type guards or parsing logic accordingly, and add unit tests to verify that when ResponsesReasoning is non-nil, content blocks are labeled as reasoning blocks. Include clear comments in the code explaining the rationale and ensure downstream consumers rely on this behavior.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-12-14T14:43:30.902Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 980
File: core/providers/openai/images.go:10-22
Timestamp: 2025-12-14T14:43:30.902Z
Learning: Enforce the OpenAI image generation SSE event type values across the OpenAI image flow in the repository: use "image_generation.partial_image" for partial chunks, "image_generation.completed" for the final result, and "error" for errors. Apply this consistently in schemas, constants, tests, accumulator routing, and UI code within core/providers/openai (and related Go files) to ensure uniform event typing and avoid mismatches.

Applied to files:

  • core/providers/openai/openai.go
📚 Learning: 2025-09-01T15:29:17.076Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 372
File: core/providers/utils.go:552-598
Timestamp: 2025-09-01T15:29:17.076Z
Learning: The detectAudioMimeType function in core/providers/utils.go is specifically designed for Gemini's audio format support, which only includes: WAV (audio/wav), MP3 (audio/mp3), AIFF (audio/aiff), AAC (audio/aac), OGG Vorbis (audio/ogg), and FLAC (audio/flac). The implementation should remain focused on these specific formats rather than being over-engineered for general-purpose audio detection.

Applied to files:

  • core/providers/utils/audio.go
📚 Learning: 2025-12-10T15:15:14.041Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/audio.go:92-98
Timestamp: 2025-12-10T15:15:14.041Z
Learning: In core/providers/utils/audio.go, within DetectAudioMimeType, use a mask of 0xF6 for ADTS sync detection instead of the standard 0xF0. This stricter check validates that the top nibble is 0xF and the Layer field bits (bits 2-1) are 00, preventing MP3 Layer III (Layer bits 11) from being misidentified as AAC. Ensure unit tests cover this behavior and document the rationale in code comments.

Applied to files:

  • core/providers/utils/audio.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
🧬 Code graph analysis (9)
core/providers/utils/utils.go (1)
core/schemas/bifrost.go (3)
  • Cerebras (47-47)
  • Perplexity (46-46)
  • HuggingFace (51-51)
core/bifrost.go (1)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/openai/openai.go (3)
core/schemas/chatcompletions.go (1)
  • ChatStreamResponseChoice (783-785)
core/providers/gemini/types.go (1)
  • Content (977-985)
ui/lib/types/logs.ts (1)
  • ReasoningDetails (127-134)
core/schemas/mux.go (2)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1440-1479)
  • ResponsesStreamResponseTypeOutputTextDelta (1388-1388)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/gemini/speech.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/internal/testutil/account.go (3)
core/schemas/bifrost.go (3)
  • HuggingFace (51-51)
  • Fallback (149-152)
  • OpenAI (35-35)
core/schemas/provider.go (5)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/internal/testutil/transcription.go (4)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/internal/testutil/utils.go (1)
  • GetProviderVoice (39-87)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (5)
  • GetTestRetryConfigForScenario (1118-1150)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • SpeechRetryCondition (144-147)
  • WithSpeechTestRetry (1328-1478)
core/providers/huggingface/huggingface.go (7)
core/providers/huggingface/types.go (5)
  • HuggingFaceTranscriptionResponse (330-333)
  • HuggingFaceSpeechResponse (243-245)
  • HuggingFaceResponseError (151-155)
  • HuggingFaceListModelsResponse (28-30)
  • HuggingFaceHubError (146-149)
core/providers/utils/utils.go (2)
  • ConfigureProxy (100-148)
  • GetRequestPath (221-241)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/chat.go (1)
  • ToHuggingFaceChatCompletionRequest (11-106)
core/providers/huggingface/embedding.go (2)
  • ToHuggingFaceEmbeddingRequest (11-78)
  • UnmarshalHuggingFaceEmbeddingResponse (82-168)
core/providers/huggingface/speech.go (1)
  • ToHuggingFaceSpeechRequest (9-96)
core/providers/huggingface/transcription.go (1)
  • ToHuggingFaceTranscriptionRequest (11-119)
core/providers/huggingface/types.go (1)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (27)
.github/workflows/release-pipeline.yml (1)

118-118: Environment variable naming needs verification against actual usage.

The changes correctly add HUGGING_FACE_API_KEY to all four release jobs (lines 118, 193, 270, 359) following the same underscore pattern as other provider keys. However, verification reveals:

  • Standard Hugging Face documentation specifies HUGGINGFACE_API_KEY (without underscores) for authentication
  • The codebase uses HUGGING_FACE_API_KEY with underscores (consistent with other providers like ELEVENLABS_API_KEY)
  • No direct code references to HUGGING_FACE_API_KEY were found in tests or integrations
  • Test configuration (conftest.py) doesn't include this variable in its API keys fixture

Confirm that the chosen naming (HUGGING_FACE_API_KEY) aligns with how Hugging Face libraries or test code actually consume the variable in this project, and verify the GitHub secret is configured in repository settings.

core/internal/testutil/responses_stream.go (1)

693-693: LGTM: Appropriate threshold increase for comprehensive streaming tests.

The increase from 100 to 300 chunks is reasonable given the enhanced streaming capabilities and new provider support (HuggingFace). This safety guard still prevents infinite loops while accommodating more comprehensive lifecycle event sequences.

core/providers/openai/openai.go (1)

1047-1054: LGTM: Proper reasoning content handling in streaming.

The broadened emission condition now correctly handles reasoning-related delta fields (Delta.Reasoning and Delta.ReasoningDetails) in addition to regular content, ensuring reasoning model responses stream properly. This aligns with the OpenAI reasoning API patterns.

Based on learnings, this supports the ResponsesReasoning API behavior where reasoning blocks are properly identified and streamed.

core/internal/testutil/account.go (4)

114-114: LGTM: HuggingFace correctly added to configured providers.

The addition properly integrates HuggingFace into the test infrastructure's provider list.


327-334: LGTM: HuggingFace key configuration is correct.

The API key sourcing from HUGGING_FACE_API_KEY is consistent with the documented environment variable. The absence of UseForBatchAPI aligns with the HuggingFace provider not supporting batch operations in the current implementation.


589-601: LGTM: Well-tuned configuration for HuggingFace Inference API.

The configuration appropriately accounts for HuggingFace's characteristics:

  • 300s timeout: Accommodates model cold starts on Inference API
  • 10 retries with 2s-30s backoff: Provides resilience matching other cloud providers
  • Concurrency (4) and buffer (10): Consistent with other provider configurations

1020-1053: LGTM: Comprehensive HuggingFace test configuration.

The configuration properly defines HuggingFace's capabilities:

  • Model selection: Uses appropriate HuggingFace Inference API models across chat, vision, embedding, transcription, and speech synthesis
  • Scenarios: Correctly enables supported features (chat, streaming, tools, vision, audio) while appropriately disabling unsupported ones (reasoning, batch/file operations)
  • Fallbacks: Includes OpenAI gpt-4o-mini for test resilience

The reasoning scenario is correctly set to false as HuggingFace doesn't provide a native reasoning API equivalent to OpenAI's o1 models.

core/schemas/transcriptions.go (1)

37-40: LGTM: Well-documented HuggingFace generation parameters.

The four new fields (MaxLength, MinLength, MaxNewTokens, MinNewTokens) are properly implemented:

  • Clear documentation: Each field's comment explicitly indicates HuggingFace usage
  • Correct types: Optional pointers (*int) with omitempty allow provider-specific flexibility
  • API alignment: Parameters match HuggingFace Inference API's automatic-speech-recognition generation controls

The per-field documentation approach is clearer and more maintainable than a grouped comment.

transports/changelog.md (1)

1-1: Changelog entry is fine as-is.

The new line succinctly documents the HuggingFace+UI addition; no changes needed.

core/providers/gemini/speech.go (1)

169-176: Using utils.DetectAudioMimeType here is a good consolidation.

Swapping the inline MIME sniffing for utils.DetectAudioMimeType(bifrostResp.Audio) keeps Gemini’s synthetic response in sync with the shared audio detection logic used elsewhere, and reduces duplication. No issues spotted.

core/bifrost.go (1)

26-26: HuggingFace wiring into createBaseProvider is consistent with other providers.

Adding the HuggingFace import and case schemas.HuggingFace: return huggingface.NewHuggingFaceProvider(config, bifrost.logger), nil follows the same pattern as OpenAI/Mistral/Elevenlabs/OpenRouter, and integrates cleanly with the existing targetProviderKey / custom-provider logic. No issues spotted here.

Also applies to: 1889-1890

core/providers/utils/utils.go (1)

1049-1052: Verify HuggingFace Inference API streaming behavior against current documentation.

HuggingFace's TGI exposes a Messages API that is compatible with the OpenAI Chat Completion API, so OpenAI client libraries (or any third-party library expecting the OpenAI schema) can talk to it directly. However, the available documentation does not explicitly confirm whether HuggingFace sends a [DONE] sentinel at stream termination or simply closes the connection. OpenAI signals the end of a stream with a final "[DONE]" chunk, and if HuggingFace's API is truly OpenAI-compatible it should follow the same pattern. Confirm that this detail still holds for the version of the Inference API or Inference Endpoints you're targeting.
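
For reference, a minimal sketch of the client-side handling this implies, assuming the stream follows the OpenAI SSE convention (the helper and its name are illustrative, not code from this PR):

package ssestream

import (
	"bufio"
	"io"
	"strings"
)

// readSSE consumes an OpenAI-style SSE body, invoking onChunk for each JSON
// payload, and stops on the "[DONE]" sentinel or when the stream is closed.
func readSSE(body io.Reader, onChunk func(payload string)) error {
	scanner := bufio.NewScanner(body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank keep-alive lines and comments
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			return nil // explicit terminator, as OpenAI sends it
		}
		onChunk(payload)
	}
	return scanner.Err() // nil if the server simply closed the connection
}

Either termination path (sentinel or plain EOF) should leave the consumer in a clean state, so the open question mainly affects how strictly stream completion can be validated.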

docs/features/providers/huggingface.mdx (1)

1-254: Documentation looks comprehensive and well-structured.

The documentation thoroughly covers HuggingFace provider implementation details including model aliasing, request format differences across inference providers, and the capability matrix. The past review feedback has been addressed.

One minor observation: The note at line 197 clarifying the checkmark convention is helpful for readers.

core/providers/huggingface/chat.go (1)

78-101: ToolChoice handling is well-implemented.

The logic correctly handles both string-based tool choices (auto, none, required) and structured function-based tool choices. The guard at line 99 ensures hfReq.ToolChoice is only set when valid.

core/providers/huggingface/utils.go (3)

130-147: Model parsing logic is correct and handles edge cases.

The splitIntoModelProvider function properly handles:

  • t == 0 (no slashes): returns error
  • t == 1 (one slash): sets provider to auto and uses full string as model
  • t > 1 (multiple slashes): splits on first slash for provider, rest for model

This correctly handles model formats like hf-inference/meta-llama/Llama-3-8B-Instruct.
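
As an illustration only (not the provider's actual function), the split-on-first-slash rule described above looks like this:

package hfmodel

import (
	"fmt"
	"strings"
)

// splitModel: zero slashes is an error, one slash means the "auto" provider
// with the full string as the model, and two or more slashes means the text
// before the first slash names the inference provider.
func splitModel(s string) (provider, model string, err error) {
	switch strings.Count(s, "/") {
	case 0:
		return "", "", fmt.Errorf("invalid model %q: expected org/model", s)
	case 1:
		return "auto", s, nil
	default:
		i := strings.Index(s, "/")
		return s[:i], s[i+1:], nil
	}
}

// splitModel("hf-inference/meta-llama/Llama-3-8B-Instruct") -> ("hf-inference", "meta-llama/Llama-3-8B-Instruct", nil)
// splitModel("meta-llama/Llama-3-8B-Instruct")              -> ("auto", "meta-llama/Llama-3-8B-Instruct", nil)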


213-267: Model mapping cache implementation looks solid.

The caching logic properly:

  1. Checks cache first before making HTTP requests
  2. Uses type assertion to validate cached data
  3. Stores mappings in cache only when non-nil
  4. Returns properly structured errors

The error handling at lines 243-246 now correctly guards against empty messages, preserving fallback messages.
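
A minimal sketch of the check-cache-then-fetch pattern summarized above, using sync.Map; the mapping type and the fetch callback are placeholders rather than the provider's real signatures:

package hfcache

import "sync"

type mapping struct{ ProviderModelID string } // placeholder shape

var mappingCache sync.Map // HF model name -> *mapping

// lookupMapping returns a cached mapping if present and well-typed,
// otherwise fetches it and stores non-nil results for later callers.
func lookupMapping(model string, fetch func(string) (*mapping, error)) (*mapping, error) {
	if v, ok := mappingCache.Load(model); ok {
		if m, ok := v.(*mapping); ok { // type assertion guards against bad entries
			return m, nil
		}
	}
	m, err := fetch(model)
	if err != nil {
		return nil, err
	}
	if m != nil {
		mappingCache.Store(model, m) // only cache non-nil mappings
	}
	return m, nil
}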


318-322: Defensive copy of audio data prevents use-after-free.

The explicit copy at line 319 correctly prevents potential use-after-free issues since fasthttp response bodies reference internal buffers that get recycled.
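
Illustratively, with fasthttp the copy boils down to this (simplified, not the exact code under review):

package hfaudio

import "github.com/valyala/fasthttp"

// fetchBody returns a copy of the response body. fasthttp recycles the
// response's internal buffer after ReleaseResponse, so the bytes must be
// copied before the response goes back to the pool.
func fetchBody(url string) ([]byte, error) {
	req := fasthttp.AcquireRequest()
	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseRequest(req)
	defer fasthttp.ReleaseResponse(resp)

	req.SetRequestURI(url)
	if err := fasthttp.Do(req, resp); err != nil {
		return nil, err
	}
	return append([]byte(nil), resp.Body()...), nil // defensive copy
}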

docs/contributing/adding-a-provider.mdx (2)

7-13: Excellent quick reference section.

The note at the beginning pointing to specific provider implementations for reference is very helpful for contributors.


64-69: Clear file creation order guidance.

The explicit ordering (types.go → utils.go → feature files → provider.go → tests) is critical for maintainability and is well-documented.

core/providers/huggingface/huggingface.go (8)

29-63: Response pooling implementation is correct.

The sync.Pool pattern for HuggingFaceTranscriptionResponse and HuggingFaceSpeechResponse is properly implemented:

  • Acquire functions reset the struct before returning
  • Release functions check for nil before putting back
  • Both Speech (lines 727-728) and Transcription (lines 825-826) properly acquire and defer release

The pool pre-warming in the constructor (lines 78-81) helps reduce allocation pressure during startup.
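
In generic form, the acquire/reset/release pattern looks like this (the pooled type here is a placeholder, not the actual response struct):

package hfpool

import "sync"

type speechResponse struct{ Audio []byte } // stand-in for the pooled type

var speechPool = sync.Pool{New: func() any { return &speechResponse{} }}

func acquireSpeechResponse() *speechResponse {
	r := speechPool.Get().(*speechResponse)
	*r = speechResponse{} // reset before handing out
	return r
}

func releaseSpeechResponse(r *speechResponse) {
	if r == nil {
		return // nil-safe, matching the behavior described above
	}
	speechPool.Put(r)
}

// Typical call-site shape: resp := acquireSpeechResponse(); defer releaseSpeechResponse(resp)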


111-188: Cache-first retry logic is well-designed.

The completeRequestWithModelAliasCache function implements an efficient cache-first pattern:

  1. Uses cached model mapping for initial request
  2. On 404, clears cache and re-fetches mapping
  3. Retries with updated model ID

This minimizes API calls (1 call on cache hit, 3 on miss) as clarified by the author in past comments. The embedding model field update logic at lines 136-144 and 161-168 correctly handles the retry scenario.
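
In sketch form, the flow reads roughly like this (hypothetical helper signatures; the real function is shaped differently):

package hfretry

import "errors"

var errNotFound = errors.New("404 from inference provider")

// doWithAliasCache tries the cached model ID first; only on a 404 does it
// invalidate the cache, re-resolve the alias, and retry once with a fresh ID.
func doWithAliasCache(
	model string,
	resolve func(model string, skipCache bool) (string, error),
	invalidate func(model string),
	call func(resolvedID string) error,
) error {
	id, err := resolve(model, false) // cache hit: single upstream call
	if err != nil {
		return err
	}
	if err := call(id); !errors.Is(err, errNotFound) {
		return err // success, or a non-404 error that should not trigger a retry
	}
	invalidate(model)
	if id, err = resolve(model, true); err != nil { // cache miss: re-fetch mapping
		return err
	}
	return call(id)
}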


220-238: Error handling preserves fallback messages correctly.

The guarded assignments at lines 226-228 and 233-234 ensure that:

  • bifrostErr.Type is only set if errorResp.Type is non-empty
  • bifrostErr.Error.Message is only overwritten if errorResp.Message is non-empty

This preserves any fallback message already set by HandleProviderAPIError when JSON parsing fails.
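
In sketch form (field names illustrative):

package hferrors

type bifrostError struct{ Type, Message string }
type providerError struct{ Type, Message string }

// merge overwrites the fallback error only with non-empty values, so a failed
// or partial JSON parse never wipes out the message set by the generic handler.
func merge(dst *bifrostError, src providerError) {
	if src.Type != "" {
		dst.Type = src.Type
	}
	if src.Message != "" {
		dst.Message = src.Message
	}
}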


252-398: Parallel ListModels aggregation is robust.

The implementation:

  1. Spawns concurrent goroutines for each inference provider
  2. Properly uses channels and WaitGroup for coordination
  3. Aggregates successful results and tracks first error
  4. Calculates average latency across successful responses
  5. Combines raw responses when enabled

The channel closure in the background goroutine (lines 343-346) ensures the range loop terminates correctly.
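
The fan-out/fan-in shape described above, reduced to a generic sketch (not the provider's code):

package hflist

import "sync"

type result struct {
	models []string
	err    error
}

// listAll queries every provider concurrently, aggregates successes, and keeps
// only the first error. Closing the channel after wg.Wait() in a background
// goroutine lets the range loop terminate cleanly.
func listAll(providers []string, query func(p string) ([]string, error)) ([]string, error) {
	results := make(chan result, len(providers))
	var wg sync.WaitGroup
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			models, err := query(p)
			results <- result{models: models, err: err}
		}(p)
	}
	go func() {
		wg.Wait()
		close(results)
	}()

	var all []string
	var firstErr error
	for r := range results {
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		all = append(all, r.models...)
	}
	if len(all) == 0 && firstErr != nil {
		return nil, firstErr
	}
	return all, nil
}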


546-563: Streaming delegates to shared OpenAI handler.

The ChatCompletionStream correctly uses openai.HandleOpenAIChatCompletionStreaming with a custom request converter. This reuses battle-tested streaming logic while allowing HuggingFace-specific request transformation.


584-596: ResponsesStream context flag for conversion.

The context value BifrostContextKeyIsResponsesToChatCompletionFallback at line 589 properly signals downstream handlers that this is a Responses-to-ChatCompletion fallback flow.


794-798: Input validation for hf-inference transcription is correct.

The nil/empty check at lines 795-797 properly guards against the panic that was identified in past review comments. The error message clearly indicates the requirement.


863-911: Unsupported operations return consistent errors.

All unsupported operations (Batch*, File*) consistently use providerUtils.NewUnsupportedOperationError with the correct request type. This provides clear feedback to callers.

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 3ea5fc8 to 39d5ca5 on December 19, 2025 08:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
core/providers/utils/audio.go (1)

92-98: Document the rationale for the 0xF6 mask.

The 0xF6 mask is correct and intentional (distinguishing AAC from MP3 Layer III), but the inline comment doesn't explain why 0xF6 is used instead of the standard 0xF0. Based on learnings, this stricter mask checks both the sync word (top 4 bits = 0xF) and the Layer field bits (bits 2-1 = 00), preventing MP3 Layer III frames (whose Layer field is nonzero) from being misidentified as AAC.

🔎 Suggested documentation enhancement
 	// AAC: ADIF or ADTS (0xFFF sync) - check before MP3 frame sync to avoid misclassification
 	if bytes.HasPrefix(audioData, adif) {
 		return "audio/aac"
 	}
+	// ADTS sync: 0xFF followed by top 4 bits = 0xF and Layer field = 00
+	// Mask 0xF6 checks sync (top 4 bits) AND Layer bits (bits 2-1) = 00 to distinguish from MP3 Layer III (nonzero Layer bits)
 	if len(audioData) >= 2 && audioData[0] == 0xFF && (audioData[1]&0xF6) == 0xF0 {
 		return "audio/aac"
 	}

Based on learnings, the mask 0xF6 is intentionally stricter than 0xF0 to prevent MP3 misidentification.
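
To see the mask in action, a quick check on representative second header bytes (0xF1/0xF9 are common for ADTS AAC, 0xFA/0xFB for MPEG-1 Layer III MP3):

package main

import "fmt"

func main() {
	for _, b := range []byte{0xF1, 0xF9} {
		fmt.Printf("AAC  0x%02X & 0xF6 = 0x%02X\n", b, b&0xF6) // 0xF0 -> accepted as ADTS
	}
	for _, b := range []byte{0xFA, 0xFB} {
		fmt.Printf("MP3  0x%02X & 0xF6 = 0x%02X\n", b, b&0xF6) // 0xF2 -> rejected
	}
}

With the looser 0xF0 mask, all four values would pass the check, which is exactly the misclassification the stricter mask prevents.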

ui/README.md (1)

11-17: Verify provider count accuracy and consider reducing manual list maintenance.

Line 12 references "15+ AI providers"—with HuggingFace now added, please verify this count is still accurate. Additionally, per the previous reviewer's feedback (Pratham-Mishra04), consider moving the entire Key Features section to be purely link-based with descriptions pulled from external docs rather than maintaining a hardcoded list here.

core/internal/testutil/transcription.go (2)

73-97: Consider extracting fixture-reading logic to reduce duplication.

The Fal-AI/HuggingFace fixture-reading block appears 5 times with nearly identical structure (runtime.Caller → filepath construction → os.ReadFile → error check). Extracting to a helper function would improve maintainability and reduce the ~55 lines of duplicated code.

🔎 Suggested helper function approach

Add a helper function to this file:

// getAudioForTranscriptionTest returns audio data for transcription tests.
// For Fal-AI models on HuggingFace, reads from mp3 fixture; otherwise generates TTS audio.
func getAudioForTranscriptionTest(
	ctx context.Context,
	t *testing.T,
	client *bifrost.Bifrost,
	testConfig ComprehensiveTestConfig,
	speechSynthesisProvider schemas.ModelProvider,
	speechSynthesisModel string,
	text string,
	voiceType string,
	fixtureName string,
) []byte {
	if testConfig.Provider == schemas.HuggingFace && strings.HasPrefix(testConfig.TranscriptionModel, "fal-ai/") {
		_, filename, _, _ := runtime.Caller(1)
		dir := filepath.Dir(filename)
		filePath := filepath.Join(dir, "scenarios", "media", fmt.Sprintf("%s.mp3", fixtureName))
		audioData, err := os.ReadFile(filePath)
		if err != nil {
			t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
		}
		return audioData
	}
	audioData, _ := GenerateTTSAudioForTest(ctx, t, client, speechSynthesisProvider, speechSynthesisModel, text, voiceType, "mp3")
	return audioData
}

Then replace each block with a single call, e.g.:

-				var transcriptionRequest *schemas.BifrostTranscriptionRequest
-				if testConfig.Provider == schemas.HuggingFace && strings.HasPrefix(testConfig.TranscriptionModel, "fal-ai/") {
-					// For Fal-AI models on HuggingFace, we have to use mp3 but fal-ai speech models only return wav
-					// So we read from a pre-generated mp3 file to avoid format issues
-					_, filename, _, _ := runtime.Caller(0)
-					dir := filepath.Dir(filename)
-					filePath := filepath.Join(dir, "scenarios", "media", fmt.Sprintf("%s.mp3", tc.name))
-					fileContent, err := os.ReadFile(filePath)
-					if err != nil {
-						t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
-					}
+				audioData := getAudioForTranscriptionTest(ctx, t, client, testConfig, speechSynthesisProvider, speechSynthesisModel, tc.text, tc.voiceType, tc.name)
+				var transcriptionRequest *schemas.BifrostTranscriptionRequest
+				if testConfig.Provider == schemas.HuggingFace && strings.HasPrefix(testConfig.TranscriptionModel, "fal-ai/") {
 					transcriptionRequest = &schemas.BifrostTranscriptionRequest{
 						Provider: testConfig.Provider,
 						Model:    testConfig.TranscriptionModel,
 						Input: &schemas.TranscriptionInput{
-							File: fileContent,
+							File: audioData,
 						},

(Note: Request construction still needs conditional logic for parameters, but audio fetching is centralized.)

Also applies to: 261-278, 369-386, 463-480, 561-578


377-377: Hardcoded fixture names may be fragile.

Lines 377, 471, and 569 hardcode fixture names ("RoundTrip_Basic_MP3.mp3" and "RoundTrip_Medium_MP3.mp3") instead of deriving them from test context. If the RoundTrip tests change or fixtures are reorganized, these references may break with no obvious link back to the tests that use them.

Consider either:

  • Creating test-specific fixtures matching the test names (e.g., "Format_json.mp3", "WithCustomParameters.mp3", "Language_en.mp3"), or
  • Defining fixture name constants if reuse is intentional (e.g., const DefaultTranscriptionFixture = "RoundTrip_Basic_MP3.mp3").

Also applies to: 471-471, 569-569

core/schemas/mux.go (1)

1216-1216: Consider simplifying the condition.

The condition hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) contains a redundant || hasContent in the inner expression. This simplifies to:

if hasContent || (!state.TextItemHasContent && hasReasoning) {

This doesn't affect correctness but improves clarity.

🔎 Suggested simplification
-if hasContent || (!state.TextItemHasContent && (hasReasoning || hasContent)) {
+if hasContent || (!state.TextItemHasContent && hasReasoning) {
core/providers/huggingface/embedding.go (1)

11-78: Consider explicitly rejecting non‑text embedding inputs for HuggingFace

ToHuggingFaceEmbeddingRequest only inspects Input.Text / Input.Texts and silently produces a request without input/inputs when callers supply only Embedding/Embeddings. For HF, vector‑to‑vector embeddings aren’t supported, so it may be clearer to fail fast instead of sending an effectively empty request body.

You could, for example, detect this case and return an error (or wrap it into a provider‑specific “unsupported input type” BifrostError) when Embedding/Embeddings are set but no text input is present.
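
A sketch of such a guard, with hypothetical types standing in for the real request schema (only the field names mentioned above are modeled):

package hfembed

import "errors"

type embeddingInput struct {
	Text       *string
	Texts      []string
	Embedding  []float64
	Embeddings [][]float64
}

var errUnsupportedInput = errors.New("huggingface embeddings require text input")

// validateEmbeddingInput rejects vector-only inputs up front instead of
// sending an effectively empty request body to the provider.
func validateEmbeddingInput(in embeddingInput) error {
	hasText := in.Text != nil || len(in.Texts) > 0
	hasVectors := len(in.Embedding) > 0 || len(in.Embeddings) > 0
	if !hasText && hasVectors {
		return errUnsupportedInput
	}
	return nil
}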

core/providers/huggingface/types.go (1)

226-253: Consider removing unused Extra field.

Line 234 defines an Extra field with json:"-" that's never populated or used. If it's reserved for future use, add a comment explaining its purpose; otherwise, remove it to reduce maintenance burden.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ea5fc8 and 39d5ca5.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/providers-unified-interface.mdx (2 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (20)
  • core/providers/huggingface/responses.go
  • core/schemas/bifrost.go
  • core/providers/utils/utils.go
  • .github/workflows/pr-tests.yml
  • transports/changelog.md
  • transports/config.schema.json
  • core/providers/gemini/transcription.go
  • core/changelog.md
  • ui/lib/constants/logs.ts
  • ui/lib/constants/config.ts
  • docs/features/providers/providers-unified-interface.mdx
  • core/providers/openai/openai.go
  • core/providers/huggingface/speech.go
  • docs/features/providers/huggingface.mdx
  • core/providers/huggingface/models.go
  • core/schemas/transcriptions.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/huggingface_test.go
  • core/internal/testutil/responses_stream.go
  • docs/docs.json
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/huggingface/embedding.go
  • core/internal/testutil/account.go
  • ui/lib/constants/icons.tsx
  • core/bifrost.go
  • core/providers/gemini/speech.go
  • core/schemas/mux.go
  • docs/apis/openapi.json
  • core/internal/testutil/transcription.go
  • docs/contributing/adding-a-provider.mdx
  • core/providers/huggingface/chat.go
  • core/providers/utils/audio.go
  • core/providers/huggingface/utils.go
  • core/schemas/account.go
  • ui/README.md
  • core/providers/huggingface/types.go
  • core/providers/huggingface/huggingface.go
🧠 Learnings (6)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/huggingface/embedding.go
  • core/internal/testutil/account.go
  • core/bifrost.go
  • core/providers/gemini/speech.go
  • core/schemas/mux.go
  • core/internal/testutil/transcription.go
  • core/providers/huggingface/chat.go
  • core/providers/utils/audio.go
  • core/providers/huggingface/utils.go
  • core/schemas/account.go
  • core/providers/huggingface/types.go
  • core/providers/huggingface/huggingface.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/types.go
  • core/providers/huggingface/huggingface.go
📚 Learning: 2025-12-19T08:29:20.286Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: core/internal/testutil/count_tokens.go:30-67
Timestamp: 2025-12-19T08:29:20.286Z
Learning: In core/internal/testutil test files, enforce using GetTestRetryConfigForScenario() to obtain a generic retry config, then construct a typed retry config (e.g., CountTokensRetryConfig, EmbeddingRetryConfig, TranscriptionRetryConfig) with an empty Conditions slice. Copy only MaxAttempts, BaseDelay, MaxDelay, OnRetry, and OnFinalFail from the generic config. This convention should be consistently applied across all test files in this directory.

Applied to files:

  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
📚 Learning: 2025-09-01T15:29:17.076Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 372
File: core/providers/utils.go:552-598
Timestamp: 2025-09-01T15:29:17.076Z
Learning: The detectAudioMimeType function in core/providers/utils.go is specifically designed for Gemini's audio format support, which only includes: WAV (audio/wav), MP3 (audio/mp3), AIFF (audio/aiff), AAC (audio/aac), OGG Vorbis (audio/ogg), and FLAC (audio/flac). The implementation should remain focused on these specific formats rather than being over-engineered for general-purpose audio detection.

Applied to files:

  • core/providers/utils/audio.go
📚 Learning: 2025-12-10T15:15:14.041Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/audio.go:92-98
Timestamp: 2025-12-10T15:15:14.041Z
Learning: In core/providers/utils/audio.go, within DetectAudioMimeType, use a mask of 0xF6 for ADTS sync detection instead of the standard 0xF0. This stricter check validates that the top nibble is 0xF and the Layer field bits (bits 2-1) are 00, preventing MP3 Layer III (Layer bits 11) from being misidentified as AAC. Ensure unit tests cover this behavior and document the rationale in code comments.

Applied to files:

  • core/providers/utils/audio.go
🧬 Code graph analysis (9)
core/providers/huggingface/embedding.go (3)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (3)
  • HuggingFaceEmbeddingRequest (161-172)
  • InputsCustomType (174-177)
  • EncodingType (219-219)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (845-852)
core/internal/testutil/account.go (3)
core/schemas/bifrost.go (2)
  • HuggingFace (51-51)
  • Fallback (149-152)
core/schemas/provider.go (4)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
  • HuggingFace (51-51)
core/providers/huggingface/huggingface.go (1)
  • NewHuggingFaceProvider (66-99)
core/providers/gemini/speech.go (1)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/schemas/mux.go (3)
core/providers/gemini/types.go (2)
  • Content (977-985)
  • Type (782-782)
core/schemas/responses.go (2)
  • BifrostResponsesStreamResponse (1440-1479)
  • ResponsesStreamResponseTypeOutputTextDelta (1388-1388)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/internal/testutil/transcription.go (4)
core/schemas/transcriptions.go (3)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionInput (28-30)
  • TranscriptionParameters (32-49)
core/internal/testutil/utils.go (3)
  • GetProviderVoice (39-87)
  • GetErrorMessage (642-675)
  • GenerateTTSAudioForTest (568-640)
core/schemas/speech.go (4)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/test_retry_framework.go (5)
  • GetTestRetryConfigForScenario (1118-1150)
  • TestRetryContext (168-173)
  • SpeechRetryConfig (216-223)
  • SpeechRetryCondition (144-147)
  • WithSpeechTestRetry (1328-1478)
core/providers/huggingface/utils.go (4)
core/schemas/models.go (1)
  • BifrostListModelsRequest (23-34)
core/providers/utils/utils.go (3)
  • GetRequestPath (221-241)
  • MakeRequestWithContext (41-95)
  • NewBifrostOperationError (733-744)
core/providers/huggingface/types.go (2)
  • HuggingFaceInferenceProviderMapping (68-71)
  • HuggingFaceHubError (146-149)
core/schemas/provider.go (1)
  • ErrProviderResponseDecode (29-29)
core/providers/huggingface/types.go (2)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
core/providers/huggingface/huggingface.go (4)
core/providers/utils/audio.go (1)
  • DetectAudioMimeType (78-119)
core/providers/huggingface/embedding.go (2)
  • ToHuggingFaceEmbeddingRequest (11-78)
  • UnmarshalHuggingFaceEmbeddingResponse (82-168)
core/providers/huggingface/speech.go (1)
  • ToHuggingFaceSpeechRequest (9-96)
core/providers/huggingface/transcription.go (1)
  • ToHuggingFaceTranscriptionRequest (11-119)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ....yml- ✅ Environment variable added to.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (44)
core/providers/gemini/speech.go (1)

175-175: LGTM! Good refactoring to use the shared utility.

The change correctly uses the centralized utils.DetectAudioMimeType function instead of a local implementation, improving code maintainability and consistency across providers.

ui/README.md (3)

46-72: Architecture section is well-structured and accurate.

The updated architecture description correctly reflects the technology stack changes (Redux Toolkit with RTK Query) and maintains clarity on the integration model between UI and backend.


129-155: Verify RTK Query imports and exports match actual codebase structure.

The code example imports useGetLogsQuery, useCreateProviderMutation, and getErrorMessage from @/lib/store. Please confirm that this module path and these specific exports exist and are publicly available, as the actual implementation may differ from this documentation.


172-236: External documentation links are current and accessible.

The referenced documentation pages at docs.getbifrost.ai (provider configuration, governance features, observability, and MCP integration) are published and actively maintained. The benchmarking documentation at docs.getbifrost.ai/benchmarking/getting-started is accessible with current performance data. The main GitHub repository at github.com/maximhq/bifrost is active, and the documentation structure effectively reduces manual maintenance by redirecting users to external resources rather than maintaining duplicate content locally.

core/schemas/mux.go (1)

1155-1242: LGTM: Reasoning-only response support is well-implemented.

The gating logic correctly handles reasoning-only models by:

  • Creating text items when reasoning is present but no content has been emitted yet (line 1159)
  • Emitting an empty delta on the first reasoning-only chunk to satisfy lifecycle validation (lines 1219-1225)
  • Tracking whether the text item has received any delta via TextItemHasContent (line 1240)
  • Closing text items regardless of content presence to support reasoning-only responses (line 1411)

The logic properly handles content-only, reasoning-only, and mixed content+reasoning scenarios.

Also applies to: 1410-1411

docs/contributing/adding-a-provider.mdx (7)

7-38: Documentation structure and introduction are well-organized.

The quick reference to existing providers and clear explanation of the gateway pattern provide excellent context for contributors. The explicit instructions for registering in core/schemas/bifrost.go are actionable and specific.


41-85: Directory structure guidance is comprehensive and practical.

The clear distinction between OpenAI-compatible and custom API providers, along with the emphasized file creation order, helps prevent common implementation mistakes. The references to actual provider implementations (core/providers/huggingface/, core/providers/cerebras/) are valuable for contributors.


86-627: File conventions are thorough and enforce excellent separation of concerns.

The strict rules (marked with "CRITICAL RULE") combined with real examples (especially the HuggingFace chat converter on lines 412-528) provide clear patterns for contributors. The emphasis on naming conventions, nil checks, and pure transformation functions promotes maintainable code.


629-971: OpenAI-compatible provider guidance is clear and complete.

The section effectively explains when and how to leverage the OpenAI handler functions, with complete examples from the Cerebras provider. The distinction between constructor return types (with/without error) and the advantages of this approach are well-articulated.


972-1487: Implementation steps are well-structured and comprehensive.

The phase-by-phase approach with explicit ordering, checklists, and the planning document template (lines 989-1021) provides a clear roadmap. The note on line 1405 about omitted code being for brevity (with reference to complete examples) appropriately addresses the documentation's reference nature.


1489-1788: UI integration steps are comprehensive and actionable.

The 8-step process with specific file paths, code examples, and a complete checklist (lines 1770-1786) ensures contributors don't miss any integration points. Each step includes context about where to make changes and why.


1791-2071: Testing guidelines are thorough and practical.

The complete test file example, scenario configuration table (lines 1884-1896), and troubleshooting section (lines 1978-2003) provide contributors with everything needed to write comprehensive tests. The final pre-submission checklist organized by category ensures nothing is missed.

docs/apis/openapi.json (1)

4413-4437: ModelProvider enum update for Hugging Face looks consistent

Adding "huggingface" here matches the new provider wiring and keeps the public API enum in sync with backend support; no further schema changes needed in this file.

.github/workflows/release-pipeline.yml (1)

90-121: HUGGING_FACE_API_KEY propagation across release jobs is correct

Wiring HUGGING_FACE_API_KEY into all four release steps is consistent with other provider keys and should unblock Hugging Face–backed tests/releases in these pipelines. Please just confirm the corresponding GitHub secret is defined in the target environment so these jobs don’t start failing at runtime.

Also applies to: 165-195, 242-271, 327-360

core/schemas/account.go (1)

9-20: HuggingFace key config wiring is structurally sound and backward‑compatible

The new HuggingFaceKeyConfig type and Key.HuggingFaceKeyConfig field follow the same pattern as Azure/Vertex/Bedrock configs (pointer + omitempty, deployments map), so this won’t break existing payloads and cleanly reserves space for future Hugging Face deployment mappings. Based on learnings, it’s fine that this is not yet surfaced/used elsewhere in the API surface.

Also applies to: 70-72

core/bifrost.go (2)

26-26: Import wiring for HuggingFace provider looks correct

Importing core/providers/huggingface alongside other providers is consistent and required for the new switch case below. No issues here.


1889-1890: HuggingFace registered in createBaseProvider consistently

The new schemas.HuggingFace case mirrors other providers that construct a concrete provider and return it with a nil error, so HF will participate correctly in init, updates, and fallbacks.

core/internal/testutil/account.go (4)

95-118: HuggingFace added to comprehensive test providers

Including schemas.HuggingFace in GetConfiguredProviders keeps it aligned with the concrete configs and scenario table below, so HF will be exercised in comprehensive tests.


327-334: Test key configuration for HuggingFace is consistent

The HuggingFace test key is sourced from HUGGING_FACE_API_KEY with empty Models and weight 1.0, matching the pattern used for other single-key providers. Omitting UseForBatchAPI is fine given HF scenarios currently don’t enable batch/file APIs.


589-601: ProviderConfig defaults for HuggingFace are reasonable

A 300s default timeout plus 10 retries and moderate backoff windows are in line with other “variable” cloud providers and give HF some resilience to cold starts without being extreme. Concurrency and buffer reuse the shared Concurrency constant, which keeps tests consistent.


1020-1053: HuggingFace test scenario entry is well‑shaped

The HuggingFace ComprehensiveTestConfig wires chat, vision, embedding, transcription, and TTS models plus scenarios and an OpenAI fallback, mirroring how other providers are modeled. As long as these booleans match real HF capabilities, this should integrate cleanly into the cross‑provider test matrix.

core/providers/huggingface/chat.go (1)

11-105: Chat request conversion to HuggingFace format looks correct

The helper cleanly maps all standard chat parameters, response format, streaming options, tools, and tool choice into the HF request struct, and now surfaces ResponseFormat conversion failures instead of silently dropping them. The nil‑guard at the top is safe given upstream validation.

core/providers/huggingface/embedding.go (1)

80-168: Embedding response unmarshal covers expected HF shapes

UnmarshalHuggingFaceEmbeddingResponse sensibly tries the structured object, then 2D array, then 1D array, always normalizing into BifrostEmbeddingResponse with consistent object="list" and a non‑nil Usage. This should handle the common HF embedding response variants without surprising callers.

core/providers/huggingface/utils.go (5)

17-81: Inference provider enums and registry are coherent

The inferenceProvider constants, INFERENCE_PROVIDERS, and PROVIDERS_OR_POLICIES give a clear, type‑safe catalog of supported HF providers plus the auto policy, matching how the rest of the provider code expects to route requests. Using a precomputed slice keeps call‑site code simple.


83-147: Model hub and provider URL helpers, plus model parsing, are well‑structured

buildModelHubURL and buildModelInferenceProviderURL correctly assemble the Hub API URLs with pagination, sorting, and an inference_provider filter, while splitIntoModelProvider cleanly distinguishes explicit provider prefixes (>=2 slashes) from the auto case (org/model) and rejects obviously invalid names (no slash). This aligns with the model IDs used in the test configs.


149-193: Routing only embedding/speech/transcription through supported providers is appropriate

getInferenceProviderRouteURL restricts routing to the subset of HF providers that actually support embeddings, text‑to‑speech, or transcription and returns a clear error otherwise, which is consistent with its use only in those code paths. The hf‑inference pipeline selection by RequestType also looks correct.


195-267: Provider‑model mapping cache and validation are robust

convertToInferenceProviderMappings, getModelInferenceProviderMapping, and getValidatedProviderModelID combine to fetch, cache, and validate provider‑specific model IDs with good error handling (HTTP status, decode failures, unsupported operations). Using a sync.Map keyed by HF model name avoids redundant Hub calls without adding contention.


294-344: Audio download and MIME normalization utilities are safe and context‑aware

downloadAudioFromURL now uses MakeRequestWithContext, checks status codes, decodes the body safely, and returns a copied byte slice to avoid use‑after‑free issues. getMimeTypeForAudioType provides a sensible default (audio/mpeg) and normalizes audio/mp3 while passing other audio/* types through unchanged.

core/providers/huggingface/huggingface.go (10)

18-27: LGTM!

The provider struct is well-designed with appropriate fields for HTTP client configuration, caching, and custom provider support. The modelProviderMappingCache using sync.Map is a good choice for concurrent access patterns.


65-99: LGTM!

The constructor properly configures the HTTP client with reasonable defaults (5000 max connections per host, 60s idle timeout, 10s wait timeout), pre-warms response pools based on concurrency settings, and handles proxy configuration and base URL defaulting correctly.


111-188: Cache-first optimization working as intended.

The retry logic efficiently handles model alias resolution by attempting the cached mapping first, then clearing the cache and re-validating only on 404. This minimizes API calls for high cache hit rates.

Based on learnings, this cache-first pattern was confirmed as the intended design.


190-250: LGTM!

The request execution properly handles:

  • Audio vs. JSON content types with MIME detection
  • Authorization headers
  • Error response parsing with fallback message preservation (addressed from past review)
  • Body copying to prevent use-after-free with fasthttp's internal buffer

252-398: LGTM!

The model listing implementation efficiently queries multiple inference providers in parallel using goroutines, properly aggregates results, calculates average latency, and handles partial failures gracefully. Resource cleanup with defer statements is correct.


400-425: LGTM!

ListModels properly checks operation permissions and delegates to the multi-key handler. TextCompletion and TextCompletionStream correctly return unsupported operation errors with the proper request types (addressed from past reviews).


427-564: LGTM!

Chat completion methods properly:

  • Parse and validate model names with descriptive errors (addressed from past review)
  • Use direct struct allocation instead of pooling to avoid leaks (addressed from past review)
  • Delegate streaming to OpenAI-compatible helper with custom request converter
  • Set all required ExtraFields for observability

566-681: LGTM!

Responses/ResponsesStream properly fallback to chat completion with context tracking. Embedding implementation:

  • Validates model names with descriptive errors
  • Uses cache-aware retry logic for model alias resolution
  • Handles multiple response formats via custom unmarshaling
  • Properly tracks raw request/response data when enabled

683-861: LGTM!

Speech and Transcription methods properly:

  • Acquire and release pooled responses with defer (preventing leaks)
  • Validate task types correctly (text-to-speech vs automatic-speech-recognition, addressed from past reviews)
  • Handle hf-inference raw audio special case with nil checks (addressed from past review)
  • Pass context to downloadAudioFromURL (addressed from past review)
  • Use proper error types for unsupported streaming operations

863-911: LGTM!

Batch and file operations correctly return unsupported operation errors with appropriate request types, as these features are not supported by the HuggingFace provider.

core/providers/huggingface/types.go (6)

12-52: LGTM!

Model types are well-defined, and the custom UnmarshalJSON for HuggingFaceListModelsResponse correctly handles both the current array format [...] and legacy object format {"models": [...]}, with a descriptive error for unexpected formats (addressed from past review).


54-71: LGTM!

Inference provider mapping types are correct, with ProviderModelID properly reflecting the provider-specific model ID from the HuggingFace API (addressed from past review).


73-143: LGTM!

Chat types properly represent HuggingFace request structures:

  • HuggingFaceToolChoice correctly handles both enum string ("auto", "none", "required") and object forms with custom MarshalJSON (see the sketch after this list)
  • Reuses common schema types (schemas.ChatMessage, schemas.ChatTool) for consistency
  • Response format and JSON schema types support structured outputs
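
A generic sketch of that string-or-object marshaling pattern (types simplified; not the provider's exact definitions):

package hftypes

import "encoding/json"

type toolChoiceFunction struct {
	Name string `json:"name"`
}

// toolChoice serializes either as a bare enum string ("auto", "none",
// "required") or as {"type":"function","function":{"name":...}}.
type toolChoice struct {
	Mode     string              // used when Function is nil
	Function *toolChoiceFunction // used when a specific tool is forced
}

func (t toolChoice) MarshalJSON() ([]byte, error) {
	if t.Function == nil {
		return json.Marshal(t.Mode)
	}
	return json.Marshal(struct {
		Type     string             `json:"type"`
		Function toolChoiceFunction `json:"function"`
	}{Type: "function", Function: *t.Function})
}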

145-155: LGTM!

Error types appropriately distinguish between Hub API errors and Inference API errors, with proper fields for error parsing and reporting.


157-224: LGTM!

Embedding types properly handle flexible input formats:

  • InputsCustomType custom marshaling/unmarshaling supports string, array, and object representations
  • HuggingFaceEmbeddingRequest separates Input (for most providers) from Inputs (for hf-inference)
  • EncodingType enum supports both float and base64 encoding formats

255-342: LGTM!

Transcription types properly represent HuggingFace ASR API:

  • HuggingFaceTranscriptionEarlyStopping correctly handles both boolean and string "never" values with proper error handling (addressed from past review; see the sketch after this list)
  • Generation parameters provide comprehensive control over text generation
  • Response types support both full text and timestamped chunks
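
And the bool-or-"never" decoding noted in the first bullet, in sketch form (a simplified stand-in, not the actual type):

package hftypes

import (
	"encoding/json"
	"fmt"
)

// earlyStopping accepts JSON booleans (true/false) or the string "never".
type earlyStopping struct {
	Bool  *bool
	Never bool
}

func (e *earlyStopping) UnmarshalJSON(data []byte) error {
	var b bool
	if err := json.Unmarshal(data, &b); err == nil {
		e.Bool = &b
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err == nil && s == "never" {
		e.Never = true
		return nil
	}
	return fmt.Errorf("early_stopping: expected bool or \"never\", got %s", string(data))
}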

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch 2 times, most recently from ef5f796 to 3e8d6d7 on December 19, 2025 08:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
ui/README.md (1)

1-243: Significantly improved documentation structure with good external links, but consider how much inline documentation remains.

The comprehensive updates (terminology, architecture refresh, new Quick Links section) meaningfully address documentation clarity. The shift toward docs.getbifrost.ai links is helpful. However, this partially addresses the past reviewer's suggestion to "redirect everything to the docs" — the README still contains substantial inline content (architecture diagrams, feature descriptions, code examples) that mirrors docs content. Consider whether sections like Architecture (lines 46–72), Feature descriptions (lines 76–104), and Configuration guidance (lines 172–190) could be condensed further with a note to "see documentation for details" to avoid drift between README and canonical docs.

Also note: The HuggingFace provider integration (the core of this PR) is not explicitly mentioned in the README. The generic "15+ providers" link (line 12) will include it, but no specific acknowledgment of the new provider is present.

ui/lib/constants/config.ts (1)

40-40: Both model identifier examples use non-standard 3-part format.

The previous review correctly identified that nebius/Qwen/Qwen3-Embedding-8B is not a standard HuggingFace model ID. Additionally, sambanova/meta-llama/Llama-3.1-8B-Instruct has the same issue—both use 3-part paths that appear to be deployment-specific routing identifiers rather than standard HuggingFace Hub model IDs.

Standard HuggingFace model identifiers follow the 2-part organization/model-name format. Update to use examples from the PR objectives:

🔎 Suggested fix
-	huggingface: "e.g. sambanova/meta-llama/Llama-3.1-8B-Instruct, nebius/Qwen/Qwen3-Embedding-8B",
+	huggingface: "e.g. meta-llama/Llama-3.1-8B-Instruct, google/gemma-2-9b-it",
core/providers/huggingface/transcription.go (1)

11-47: fal-ai branch missing Model and Provider fields in the request struct.

The past review comment noted that fal-ai's API requires Model and Provider fields. The non-fal-ai branch at lines 32-36 correctly sets these fields, but the fal-ai branch at lines 44-46 only sets AudioURL. This may cause API failures.

🔎 Suggested fix
 		hfRequest = &HuggingFaceTranscriptionRequest{
 			AudioURL: encoded,
+			Model:    schemas.Ptr(modelName),
+			Provider: schemas.Ptr(string(inferenceProvider)),
 		}
🧹 Nitpick comments (2)
core/internal/testutil/transcription.go (1)

261-278: Consider extracting repeated fixture path discovery logic.

The pattern of using runtime.Caller(0) to locate and read fixture files is duplicated across 6 locations. Consider extracting this into a helper function for maintainability.

🔎 Suggested helper function
// getTestFixturePath returns the path to a test fixture file relative to the test source.
func getTestFixturePath(fixtureName string) string {
    _, filename, _, _ := runtime.Caller(1) // Caller of this function
    dir := filepath.Dir(filename)
    return filepath.Join(dir, "scenarios", "media", fmt.Sprintf("%s.mp3", fixtureName))
}

// readTestFixture reads a test fixture file, failing the test if not found.
func readTestFixture(t *testing.T, fixtureName string) []byte {
    filePath := getTestFixturePath(fixtureName)
    data, err := os.ReadFile(filePath)
    if err != nil {
        t.Fatalf("failed to read audio fixture %s: %v", filePath, err)
    }
    return data
}

Also applies to: 369-386, 463-480, 561-578

core/providers/huggingface/types.go (1)

226-253: Consider removing unused Extra field from HuggingFaceSpeechRequest.

The Extra map[string]any field at line 234 has json:"-" tag and is never populated or used in the codebase. Per the past review discussion, consider removing it or documenting its intended purpose.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39d5ca5 and 3e8d6d7.

⛔ Files ignored due to path filters (6)
  • core/internal/testutil/scenarios/media/Numbers_And_Punctuation.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Basic_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Medium_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/RoundTrip_Technical_MP3.mp3 is excluded by !**/*.mp3
  • core/internal/testutil/scenarios/media/Technical_Terms.mp3 is excluded by !**/*.mp3
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/release-pipeline.yml (4 hunks)
  • core/bifrost.go (2 hunks)
  • core/changelog.md (1 hunks)
  • core/internal/testutil/account.go (4 hunks)
  • core/internal/testutil/responses_stream.go (1 hunks)
  • core/internal/testutil/transcription.go (6 hunks)
  • core/providers/gemini/speech.go (1 hunks)
  • core/providers/gemini/transcription.go (2 hunks)
  • core/providers/gemini/utils.go (0 hunks)
  • core/providers/huggingface/chat.go (1 hunks)
  • core/providers/huggingface/embedding.go (1 hunks)
  • core/providers/huggingface/huggingface.go (1 hunks)
  • core/providers/huggingface/huggingface_test.go (1 hunks)
  • core/providers/huggingface/models.go (1 hunks)
  • core/providers/huggingface/responses.go (1 hunks)
  • core/providers/huggingface/speech.go (1 hunks)
  • core/providers/huggingface/transcription.go (1 hunks)
  • core/providers/huggingface/types.go (1 hunks)
  • core/providers/huggingface/utils.go (1 hunks)
  • core/providers/openai/openai.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/utils/utils.go (1 hunks)
  • core/schemas/account.go (2 hunks)
  • core/schemas/bifrost.go (3 hunks)
  • core/schemas/mux.go (3 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • docs/apis/openapi.json (1 hunks)
  • docs/contributing/adding-a-provider.mdx (1 hunks)
  • docs/docs.json (1 hunks)
  • docs/features/providers/huggingface.mdx (1 hunks)
  • docs/features/providers/supported-providers.mdx (3 hunks)
  • transports/changelog.md (1 hunks)
  • transports/config.schema.json (2 hunks)
  • ui/README.md (6 hunks)
  • ui/lib/constants/config.ts (2 hunks)
  • ui/lib/constants/icons.tsx (1 hunks)
  • ui/lib/constants/logs.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/utils.go
🚧 Files skipped from review as they are similar to previous changes (19)
  • core/schemas/bifrost.go
  • .github/workflows/pr-tests.yml
  • core/providers/gemini/transcription.go
  • .github/workflows/release-pipeline.yml
  • core/internal/testutil/responses_stream.go
  • transports/changelog.md
  • core/providers/openai/openai.go
  • transports/config.schema.json
  • core/providers/huggingface/responses.go
  • core/providers/gemini/speech.go
  • core/bifrost.go
  • core/providers/huggingface/huggingface_test.go
  • core/providers/huggingface/speech.go
  • docs/features/providers/huggingface.mdx
  • core/changelog.md
  • core/schemas/transcriptions.go
  • docs/apis/openapi.json
  • core/providers/utils/audio.go
  • ui/lib/constants/logs.ts
👮 Files not reviewed due to content moderation or server errors (1)
  • ui/lib/constants/icons.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/schemas/account.go
  • ui/lib/constants/config.ts
  • docs/features/providers/supported-providers.mdx
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/chat.go
  • core/schemas/mux.go
  • ui/lib/constants/icons.tsx
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/models.go
  • ui/README.md
  • core/providers/utils/utils.go
  • core/providers/huggingface/utils.go
  • docs/contributing/adding-a-provider.mdx
  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
  • docs/docs.json
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/schemas/account.go
  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/chat.go
  • core/schemas/mux.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/models.go
  • core/providers/utils/utils.go
  • core/providers/huggingface/utils.go
  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-15T10:34:13.855Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/chat.go:370-389
Timestamp: 2025-12-15T10:34:13.855Z
Learning: In the HuggingFace provider (core/providers/huggingface/), streaming tool call data arrives as a single delta chunk with all fields (id, type, function.name, function.arguments) present at once. Do not implement accumulation logic across deltas for streaming tool calls in this codebase; rely on single-chunk data as the guaranteed behavior. If there is existing accumulation logic in this directory, remove it or guard it behind this assumption.

Applied to files:

  • core/providers/huggingface/embedding.go
  • core/providers/huggingface/chat.go
  • core/providers/huggingface/transcription.go
  • core/providers/huggingface/models.go
  • core/providers/huggingface/utils.go
  • core/providers/huggingface/huggingface.go
  • core/providers/huggingface/types.go
📚 Learning: 2025-12-19T08:29:20.286Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: core/internal/testutil/count_tokens.go:30-67
Timestamp: 2025-12-19T08:29:20.286Z
Learning: In core/internal/testutil test files, enforce using GetTestRetryConfigForScenario() to obtain a generic retry config, then construct a typed retry config (e.g., CountTokensRetryConfig, EmbeddingRetryConfig, TranscriptionRetryConfig) with an empty Conditions slice. Copy only MaxAttempts, BaseDelay, MaxDelay, OnRetry, and OnFinalFail from the generic config. This convention should be consistently applied across all test files in this directory.

Applied to files:

  • core/internal/testutil/account.go
  • core/internal/testutil/transcription.go
📚 Learning: 2025-12-15T10:19:32.071Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/internal/testutil/transcription.go:241-258
Timestamp: 2025-12-15T10:19:32.071Z
Learning: In core/internal/testutil package, the function GenerateTTSAudioForTest returns ([]byte, string) not ([]byte, error), and it already handles errors internally by calling t.Fatalf(). Therefore, the blank identifier for the second return value is correct and no external error checking is needed.

Applied to files:

  • core/internal/testutil/transcription.go
🧬 Code graph analysis (6)
core/schemas/account.go (1)
ui/lib/types/config.ts (3)
  • AzureKeyConfig (23-27)
  • VertexKeyConfig (36-42)
  • BedrockKeyConfig (63-71)
core/providers/huggingface/embedding.go (4)
core/schemas/embedding.go (4)
  • BifrostEmbeddingRequest (9-16)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
  • EmbeddingStruct (124-128)
core/providers/huggingface/types.go (3)
  • HuggingFaceEmbeddingRequest (161-172)
  • InputsCustomType (174-177)
  • EncodingType (219-219)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/chatcompletions.go (1)
  • BifrostLLMUsage (845-852)
core/providers/huggingface/models.go (3)
core/providers/huggingface/types.go (1)
  • HuggingFaceListModelsResponse (28-30)
core/schemas/bifrost.go (9)
  • ModelProvider (32-32)
  • RequestType (88-88)
  • ChatCompletionRequest (94-94)
  • ChatCompletionStreamRequest (95-95)
  • ResponsesRequest (96-96)
  • ResponsesStreamRequest (97-97)
  • EmbeddingRequest (98-98)
  • SpeechRequest (99-99)
  • TranscriptionRequest (101-101)
core/schemas/models.go (2)
  • BifrostListModelsResponse (36-45)
  • Model (109-129)
core/providers/utils/utils.go (1)
core/schemas/bifrost.go (3)
  • Cerebras (47-47)
  • Perplexity (46-46)
  • HuggingFace (51-51)
core/internal/testutil/account.go (3)
core/schemas/bifrost.go (3)
  • HuggingFace (51-51)
  • Fallback (149-152)
  • OpenAI (35-35)
core/schemas/provider.go (5)
  • ProviderConfig (267-276)
  • NetworkConfig (48-56)
  • DefaultRequestTimeoutInSeconds (15-15)
  • ConcurrencyAndBufferSize (131-134)
  • Provider (314-361)
core/internal/testutil/cross_provider_scenarios.go (1)
  • ProviderConfig (45-53)
core/providers/huggingface/types.go (1)
core/schemas/chatcompletions.go (3)
  • ChatStreamOptions (237-240)
  • ChatTool (252-257)
  • ChatToolChoiceFunction (444-446)
🪛 LanguageTool
docs/contributing/adding-a-provider.mdx

[uncategorized] ~1785-~1785: The official name of this software platform is spelled with a capital “H”.
Context: ...yml - ✅ Environment variable added to `.github/workflows/release-pipeline.yml` (4 jobs...

(GITHUB)


[grammar] ~2000-~2000: Use a hyphen to join words.
Context: ...ror or incorrect stream handling Tool calling tests fail: - Solution: Ve...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (45)
ui/lib/constants/config.ts (1)

61-61: LGTM!

Correctly marks HuggingFace as requiring an API key, which aligns with the authentication requirements specified in the PR objectives.

docs/contributing/adding-a-provider.mdx (1)

1-2071: Excellent comprehensive guide for provider contributors!

This documentation represents a significant improvement to the contributor experience. The guide is thorough, well-structured, and provides clear patterns for both OpenAI-compatible and custom API providers.

Strengths:

  • Clear separation of concerns with strict file conventions
  • Progressive examples from simple to complex
  • Multiple verification checklists throughout
  • Real working examples from existing providers (HuggingFace, Cerebras)
  • Comprehensive testing guidance with scenario configuration
  • Complete UI integration steps with all affected files
  • Helpful troubleshooting sections

Structure:

  • ✅ Phase-based implementation workflow
  • ✅ CRITICAL markers for important sequences (e.g., file creation order)
  • ✅ Separate sections for OpenAI-compatible vs. custom providers
  • ✅ Code examples follow stated conventions
  • ✅ All past review concerns have been addressed

The guide successfully balances comprehensiveness with usability, providing both high-level patterns and detailed reference implementations.

core/schemas/account.go (2)

9-20: LGTM! HuggingFaceKeyConfig follows established patterns.

The addition of HuggingFaceKeyConfig to the Key struct is consistent with other provider-specific configurations (Azure, Vertex, Bedrock). The structure and formatting align with existing conventions.

Based on learnings, this field is reserved for future Hugging Face inference endpoint deployments and is intentionally unused in the current implementation.


70-72: LGTM! HuggingFaceKeyConfig struct is properly defined.

The struct definition follows the same pattern as other provider configs with a Deployments map for model-to-deployment mappings.

docs/features/providers/supported-providers.mdx (3)

1-14: LGTM! Documentation restructure improves clarity.

The changes from "Unified Interface" to "Supported Providers" with reorganized sections (Overview, Response Format) make the documentation more focused and easier to navigate. The updated description clearly highlights Bifrost's multi-provider support and OpenAI-compatible formats.


96-96: LGTM! HuggingFace provider capabilities accurately documented.

The provider support matrix correctly lists HuggingFace capabilities:

  • ✅ Models, Chat, Chat streaming, Responses, Responses streaming, Embeddings, TTS, STT
  • ❌ Text completions, streaming variants, Batch, Files

This aligns with the implementation in the PR.


127-148: LGTM! Custom Providers and metadata sections add valuable context.

The new sections clearly explain:

  • Custom provider configurations and use cases
  • Provider metadata in the extra_fields response section
  • Configuration options with links to Go SDK and Gateway docs

These additions improve the documentation's completeness.

docs/docs.json (1)

158-166: LGTM! Provider documentation properly grouped.

The new "Providers" group logically organizes provider-related documentation (supported-providers, custom-providers, huggingface) under a unified navigation section with appropriate icon. This improves documentation discoverability and structure.

core/schemas/mux.go (2)

1155-1241: LGTM! Streaming conversion properly handles reasoning-only responses.

The updated logic correctly:

  • Gates content emission with hasContent and hasReasoning checks
  • Creates text items when content OR reasoning is present (line 1159)
  • Emits an empty delta for reasoning-only first chunks to satisfy lifecycle requirements (lines 1220-1225)
  • Tracks content state with TextItemHasContent flag

This implementation supports models that output reasoning without text content, maintaining proper OpenAI-compatible streaming event sequencing.
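
A highly simplified sketch of that gating, not the actual mux.go code: the struct fields, event labels, and function shape are stand-ins chosen for illustration, and only the content/reasoning checks and the empty-delta emission are modeled:

package main

import "fmt"

type delta struct {
	Content   *string
	Reasoning *string
}

type streamState struct {
	TextItemOpen       bool
	TextItemHasContent bool
}

// handleDelta opens a text item when content OR reasoning arrives, and a
// reasoning-only first chunk still emits an empty text delta so the
// OpenAI-compatible event lifecycle stays valid.
func handleDelta(st *streamState, d delta, emit func(event, text string)) {
	hasContent := d.Content != nil && *d.Content != ""
	hasReasoning := d.Reasoning != nil && *d.Reasoning != ""
	if !hasContent && !hasReasoning {
		return
	}
	if !st.TextItemOpen {
		st.TextItemOpen = true
		emit("output_item.added", "")
		if !hasContent {
			emit("output_text.delta", "") // reasoning-only first chunk
		}
	}
	if hasContent {
		st.TextItemHasContent = true
		emit("output_text.delta", *d.Content)
	}
}

func main() {
	st := &streamState{}
	r, c := "thinking...", "Hello"
	handleDelta(st, delta{Reasoning: &r}, func(ev, text string) { fmt.Printf("%s %q\n", ev, text) })
	handleDelta(st, delta{Content: &c}, func(ev, text string) { fmt.Printf("%s %q\n", ev, text) })
}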


1410-1411: LGTM! Text item closure supports reasoning-only responses.

Removing the dependency on TextItemHasContent for closure (line 1411) ensures that text items are properly closed even for reasoning-only responses where no actual text content was emitted. This completes the lifecycle correctly for all response types.

core/providers/huggingface/chat.go (1)

11-106: LGTM! Comprehensive chat request conversion with proper error handling.

The ToHuggingFaceChatCompletionRequest function correctly:

  • Validates input with nil checks (line 12)
  • Maps all standard parameters (frequency penalty, temperature, top_p, etc.)
  • Handles ResponseFormat conversion with proper error propagation (lines 57-64)
  • Converts StreamOptions and Tools arrays
  • Supports both string-based (auto/none/required) and structured ToolChoice formats

All past review concerns have been addressed, including error handling for ResponseFormat conversion.

core/internal/testutil/account.go (3)

114-114: LGTM! HuggingFace properly integrated into test configuration.

HuggingFace is correctly:

  • Added to configured providers list (line 114)
  • Configured with API key retrieval from HUGGING_FACE_API_KEY environment variable (line 330)
  • Following the same pattern as other providers

Also applies to: 327-334


589-601: LGTM! HuggingFace provider config has appropriate settings for cold starts.

The configuration is well-tuned:

  • 300s timeout accommodates model cold starts on serverless inference
  • 10 retries aligns with other cloud providers for resilience
  • 2s initial → 30s max backoff provides appropriate retry spacing
  • Concurrency (4) and buffer size (10) match other providers
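
To make the retry spacing concrete, a small sketch of what a 2s-to-30s capped exponential schedule over 10 attempts looks like (the actual backoff and jitter policy in Bifrost may differ):

package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the delays for `retries` attempts, starting at
// `base` and doubling until `max` caps further growth.
func backoffSchedule(base, max time.Duration, retries int) []time.Duration {
	delays := make([]time.Duration, 0, retries)
	d := base
	for i := 0; i < retries; i++ {
		delays = append(delays, d)
		d *= 2
		if d > max {
			d = max
		}
	}
	return delays
}

func main() {
	// 2s, 4s, 8s, 16s, then 30s for the remaining attempts: roughly 210s of
	// retry budget, which pairs sensibly with the 300s cold-start timeout.
	fmt.Println(backoffSchedule(2*time.Second, 30*time.Second, 10))
}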

1020-1053: LGTM! Comprehensive test configuration for HuggingFace provider.

The test config properly defines:

  • Models for chat, vision, embedding, transcription, and speech synthesis
  • Comprehensive scenario coverage (chat, streaming, tool calls, embedding, audio)
  • Appropriate capability flags (e.g., Reasoning: false, Batch: false)
  • Fallback to OpenAI gpt-4o-mini

This enables thorough testing of HuggingFace provider integration.

core/providers/huggingface/embedding.go (2)

11-78: LGTM! Embedding request conversion handles provider differences correctly.

The ToHuggingFaceEmbeddingRequest function properly:

  • Splits model into inference provider and model name with error handling (lines 16-19)
  • Initializes request with provider-specific fields (Model/Provider for non-hf-inference, empty for hf-inference)
  • Uses correct input field based on provider: Inputs for hf-inference, Input for others (lines 40-44)
  • Maps standard parameters (EncodingFormat, Dimensions)
  • Extracts HuggingFace-specific parameters from ExtraParams (normalize, prompt_name, truncate, truncation_direction)

All past review concerns about input field selection have been addressed.


80-168: LGTM! Response unmarshaling supports multiple HuggingFace formats.

The UnmarshalHuggingFaceEmbeddingResponse function robustly handles three response formats:

  1. Standard object with Data/Model/Usage fields (lines 94-114)
  2. 2D array of embeddings for batch inputs (lines 119-141)
  3. 1D array for single embedding (lines 146-164)

The fallback logic with ordered attempts and default Usage provision (lines 107-111, 136-140, 159-163) ensures compatibility across different HuggingFace inference providers.
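
A condensed sketch of that ordered fallback, with simplified stand-in types instead of Bifrost's embedding and usage structs:

package huggingface

import (
	"encoding/json"
	"fmt"
)

type embeddingResult struct {
	Vectors [][]float32
	Model   string
}

// unmarshalEmbeddingResponse tries the object form first, then the 2D batch
// array, then a single 1D embedding, mirroring the ordering described above.
func unmarshalEmbeddingResponse(body []byte) (*embeddingResult, error) {
	var obj struct {
		Data []struct {
			Embedding []float32 `json:"embedding"`
		} `json:"data"`
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &obj); err == nil && len(obj.Data) > 0 {
		out := &embeddingResult{Model: obj.Model}
		for _, d := range obj.Data {
			out.Vectors = append(out.Vectors, d.Embedding)
		}
		return out, nil
	}
	var batch [][]float32
	if err := json.Unmarshal(body, &batch); err == nil && len(batch) > 0 {
		return &embeddingResult{Vectors: batch}, nil
	}
	var single []float32
	if err := json.Unmarshal(body, &single); err == nil && len(single) > 0 {
		return &embeddingResult{Vectors: [][]float32{single}}, nil
	}
	return nil, fmt.Errorf("unrecognized embedding response format")
}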

core/providers/huggingface/models.go (2)

16-44: LGTM! Model list conversion with proper filtering.

The ToBifrostListModelsResponse function correctly:

  • Filters out models with empty ModelID (lines 25-27)
  • Filters models without supported methods (lines 29-32)
  • Constructs Model entries with proper ID format: provider/inferenceProvider/modelID (line 35)
  • Populates Name, SupportedMethods, and HuggingFaceID fields appropriately

This ensures only actionable, properly identified models are exposed.


46-102: LGTM! Comprehensive method derivation from pipeline and tags.

The deriveSupportedMethods function effectively:

  • Maps pipeline types to core request types (conversational→chat, feature-extraction→embedding, etc.)
  • Augments methods from tag patterns covering embedding, chat/completion, TTS, and STT
  • Deduplicates via map-based set (lines 49-54)
  • Returns deterministically sorted results (lines 95-101)

This approach ensures models are correctly advertised with their actual capabilities based on HuggingFace metadata. All past concerns about unsupported methods have been addressed.
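
A rough sketch of that set-plus-sort derivation; the pipeline names come from Hub metadata, while the method strings here are stand-ins for the schemas.RequestType constants the real code returns:

package huggingface

import "sort"

func deriveSupportedMethods(pipelineTag string, tags []string) []string {
	set := map[string]struct{}{}
	add := func(methods ...string) {
		for _, m := range methods {
			set[m] = struct{}{}
		}
	}
	// Pipeline type drives the primary capability.
	switch pipelineTag {
	case "conversational", "text-generation":
		add("chat", "chat_stream")
	case "feature-extraction", "sentence-similarity":
		add("embedding")
	case "text-to-speech":
		add("speech")
	case "automatic-speech-recognition":
		add("transcription")
	}
	// Tags can augment what the pipeline type alone reveals.
	for _, t := range tags {
		if t == "sentence-transformers" {
			add("embedding")
		}
	}
	out := make([]string, 0, len(set))
	for m := range set {
		out = append(out, m)
	}
	sort.Strings(out) // deterministic ordering for stable listings
	return out
}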

core/internal/testutil/transcription.go (2)

73-98: LGTM - Fixture-based audio loading for fal-ai/HuggingFace transcription tests.

The implementation correctly:

  • Uses runtime.Caller(0) to locate fixtures relative to the test file
  • Properly handles file read errors with t.Fatalf
  • Constructs the transcription request with appropriate parameters for the fal-ai path

98-178: LGTM - TTS generation path for non-fal-ai providers.

The else branch properly implements the TTS-based audio generation flow with:

  • Correct retry configuration using GetTestRetryConfigForScenario
  • Proper cleanup of temporary audio files
  • Consistent request construction

Based on learnings, GenerateTTSAudioForTest returns ([]byte, string) and handles errors internally via t.Fatalf(), so the blank identifier usage at line 277 is correct.

core/providers/huggingface/utils.go (7)

1-51: LGTM - Well-structured inference provider constants and types.

The inference provider type system is cleanly defined with:

  • A dedicated inferenceProvider string type for type safety
  • Comprehensive list of 19 providers matching HuggingFace documentation
  • Special auto policy for automatic provider selection

52-81: LGTM - Provider lists are correctly structured.

The INFERENCE_PROVIDERS slice and PROVIDERS_OR_POLICIES (which adds "auto") are properly initialized. The IIFE pattern for PROVIDERS_OR_POLICIES ensures the slice is created once at init time.


83-121: LGTM - Model hub URL builder handles edge cases well.

The function properly:

  • Enforces pagination limits with defaultModelFetchLimit and maxModelFetchLimit
  • Handles various ExtraParams types via type switch
  • Uses proper URL encoding via url.Values

130-147: LGTM - Model/provider parsing with proper error handling.

The splitIntoModelProvider function now correctly:

  • Returns an error for model names without slashes (t==0)
  • Handles single-slash models (org/model) with auto provider
  • Handles multi-slash models (provider/org/model) correctly

This addresses the past review concern about empty provider/model names.
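
A minimal sketch of those parsing rules; the real function also validates the provider against the known list and returns typed values:

package huggingface

import (
	"fmt"
	"strings"
)

// splitIntoModelProvider: no slash is an error, "org/model" routes to the
// auto policy, and "provider/org/model" pins an explicit inference provider.
func splitIntoModelProvider(model string) (provider, modelName string, err error) {
	switch strings.Count(model, "/") {
	case 0:
		return "", "", fmt.Errorf("invalid model %q: expected org/model or provider/org/model", model)
	case 1:
		return "auto", model, nil
	default:
		parts := strings.SplitN(model, "/", 2)
		return parts[0], parts[1], nil
	}
}

For example, "meta-llama/Llama-3.1-8B-Instruct" resolves to the auto policy, while "sambanova/meta-llama/Llama-3.1-8B-Instruct" pins sambanova explicitly.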


149-193: LGTM - Provider routing correctly scoped to supported operations.

Per the past discussion, this function intentionally only handles 6 providers (fal-ai, hf-inference, nebius, replicate, sambanova, scaleway) because these are the only providers that support embedding, speech, and transcription operations per HuggingFace documentation. The other providers in INFERENCE_PROVIDERS are used for chat/text-generation which follows a different routing pattern.


213-267: LGTM - Model inference provider mapping with proper caching.

The implementation correctly:

  • Checks cache first before making HTTP requests
  • Uses sync.Map for concurrent access safety
  • Handles error responses with proper message preservation
  • Stores results in cache after successful fetch

294-322: LGTM - Audio download with context support.

The downloadAudioFromURL function correctly:

  • Accepts context for cancellation/timeout support (addressing past review)
  • Uses providerUtils.MakeRequestWithContext for context-aware requests
  • Copies the body to avoid use-after-free issues with fasthttp's internal buffer
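
The body-copy point is worth illustrating, since fasthttp reuses response buffers after release; a rough sketch without the repo's context-aware request helper, and with status-code handling omitted:

package huggingface

import "github.com/valyala/fasthttp"

// downloadAudio copies the bytes out of fasthttp's pooled buffer before the
// deferred release runs, avoiding a use-after-free on the returned slice.
func downloadAudio(client *fasthttp.Client, url string) ([]byte, error) {
	req := fasthttp.AcquireRequest()
	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseRequest(req)
	defer fasthttp.ReleaseResponse(resp)

	req.SetRequestURI(url)
	if err := client.Do(req, resp); err != nil {
		return nil, err
	}
	return append([]byte(nil), resp.Body()...), nil
}
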
core/providers/huggingface/transcription.go (2)

48-116: LGTM - Parameter mapping correctly handles both typed and ExtraParams.

The implementation properly:

  • Always initializes genParams regardless of ExtraParams presence (fixing past gating bug)
  • Maps typed fields from request.Params first
  • Overlays ExtraParams values when present
  • Handles early_stopping union type (bool or string)
  • Uses schemas.SafeExtractIntPointer for safe numeric extraction

121-160: LGTM - Response conversion handles segments correctly.

The ToBifrostTranscriptionResponse method:

  • Validates non-empty requestedModel
  • Maps chunks to TranscriptionSegment with proper timestamp extraction
  • Safely handles variable-length timestamp arrays
core/providers/huggingface/types.go (5)

12-52: LGTM - Model types with flexible JSON unmarshaling.

The HuggingFaceListModelsResponse.UnmarshalJSON correctly handles both:

  • Top-level JSON array (most common for /api/models)
  • Object with models field (fallback)

This addresses the past review concern about struct/API format mismatch.


54-71: LGTM - Inference provider mapping types correctly structured.

The HuggingFaceInferenceProviderInfo now uses ProviderModelID (addressing the past rename suggestion) and the internal HuggingFaceInferenceProviderMapping struct correctly separates task and model ID.


73-143: LGTM - Chat request types with flexible tool choice handling.

The HuggingFaceToolChoice type correctly:

  • Supports enum values ("auto", "none", "required") via EnumValue
  • Supports function object via Function field
  • Has proper MarshalJSON that emits the correct format for each case

157-224: LGTM - Embedding types with flexible input handling.

The InputsCustomType with custom UnmarshalJSON/MarshalJSON:

  • Handles string, array of strings, and object forms
  • Correctly returns error for unexpected formats
  • Uses sonic for efficient JSON operations

292-327: LGTM - EarlyStopping type with proper error handling.

The HuggingFaceTranscriptionEarlyStopping.UnmarshalJSON now correctly returns an error with the invalid data when neither boolean nor string parsing succeeds, addressing the past review concern about silently accepting invalid input.
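
A small sketch of that bool-or-"never" union decoder, again with encoding/json instead of sonic and with assumed field names on a stand-in type:

package huggingface

import (
	"encoding/json"
	"fmt"
)

// Stand-in for HuggingFaceTranscriptionEarlyStopping; field names are assumptions.
type earlyStopping struct {
	Bool  *bool
	Never bool
}

func (e *earlyStopping) UnmarshalJSON(data []byte) error {
	var b bool
	if err := json.Unmarshal(data, &b); err == nil {
		e.Bool = &b
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err == nil && s == "never" {
		e.Never = true
		return nil
	}
	return fmt.Errorf("early_stopping must be a boolean or \"never\", got %s", data)
}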

core/providers/huggingface/huggingface.go (11)

29-63: LGTM - Response pooling with proper acquire/release pattern.

The transcription and speech response pools are correctly implemented:

  • Pool initialization with sync.Pool
  • Acquire functions reset structs before returning
  • Release functions handle nil checks
  • Pre-warming in constructor based on concurrency config
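
The acquire/reset/release pattern in question, sketched with a placeholder response type rather than the provider's actual transcription and speech structs:

package huggingface

import "sync"

type pooledResponse struct {
	Text string
}

var responsePool = sync.Pool{
	New: func() any { return &pooledResponse{} },
}

func acquireResponse() *pooledResponse {
	r := responsePool.Get().(*pooledResponse)
	*r = pooledResponse{} // reset before handing out
	return r
}

func releaseResponse(r *pooledResponse) {
	if r == nil {
		return
	}
	responsePool.Put(r)
}

// Pre-warming at construction time, sized by the concurrency config.
func prewarm(n int) {
	for i := 0; i < n; i++ {
		responsePool.Put(&pooledResponse{})
	}
}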

65-99: LGTM - Provider constructor with sensible defaults.

The NewHuggingFaceProvider correctly:

  • Sets read/write timeouts from config
  • Configures proxy if present
  • Defaults base URL to defaultInferenceBaseURL
  • Pre-warms response pools

111-188: LGTM - Model alias caching with 404 retry optimization.

Per the past discussion, this is an intentional cache-first optimization where:

  • Most requests complete with one API call due to high cache hit rate
  • Only on 404 (cache miss) is there a re-validation and retry
  • This saves API calls compared to always validating first
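
A compact sketch of that cache-first flow; the function shape and names are invented for illustration and do not match the provider's actual signatures:

package huggingface

import "sync"

var modelAliasCache sync.Map // model -> provider-specific model ID

// completeWithAliasCache resolves the alias from cache, makes the call, and
// only on a 404 re-fetches the mapping from the Hub and retries once.
func completeWithAliasCache(model string, call func(resolved string) (status int, err error), resolve func(model string) (string, error)) error {
	resolved := model
	if v, ok := modelAliasCache.Load(model); ok {
		resolved = v.(string)
	}
	status, err := call(resolved)
	if status != 404 {
		return err
	}
	fresh, rerr := resolve(model)
	if rerr != nil {
		return rerr
	}
	modelAliasCache.Store(model, fresh)
	_, err = call(fresh)
	return err
}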

190-250: LGTM - HTTP request handling with proper error message preservation.

The completeRequest function correctly:

  • Sets appropriate content types for audio vs JSON
  • Handles error responses with guarded message overwrites (addressing past review)
  • Copies response body to avoid use-after-free with fasthttp

252-398: LGTM - Parallel model listing across inference providers.

The listModelsByKey function:

  • Spawns goroutines for each inference provider
  • Uses channels and WaitGroup for coordination
  • Aggregates results with proper error handling
  • Calculates average latency across successful responses
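
The fan-out shape, sketched without the latency averaging and error aggregation the real function performs:

package huggingface

import "sync"

type providerModels struct {
	Provider string
	Models   []string
	Err      error
}

// listAcrossProviders runs one goroutine per inference provider, funnels
// results through a buffered channel, and merges them after the WaitGroup
// completes.
func listAcrossProviders(providers []string, list func(provider string) ([]string, error)) []providerModels {
	results := make(chan providerModels, len(providers))
	var wg sync.WaitGroup
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			models, err := list(p)
			results <- providerModels{Provider: p, Models: models, Err: err}
		}(p)
	}
	wg.Wait()
	close(results)

	var all []providerModels
	for r := range results {
		all = append(all, r)
	}
	return all
}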

427-508: LGTM - ChatCompletion with proper model name formatting.

The implementation correctly:

  • Parses provider/model from request
  • Reformats model as modelName:inferenceProvider for downstream processing
  • Converts to HuggingFace request format
  • Enriches response with provider metadata

510-564: LGTM - ChatCompletionStream delegates to OpenAI-compatible handler.

The streaming implementation correctly:

  • Uses the shared openai.HandleOpenAIChatCompletionStreaming for SSE handling
  • Passes custom request converter for HuggingFace-specific format
  • Sets stream=true in the request body

566-596: LGTM - Responses API implemented via ChatCompletion fallback.

The Responses and ResponsesStream methods correctly:

  • Convert to chat request format
  • Delegate to ChatCompletion/ChatCompletionStream
  • Convert response back to Responses format
  • Set appropriate context for stream fallback detection

598-681: LGTM - Embedding with model alias caching.

The Embedding function correctly:

  • Validates and parses model name
  • Uses completeRequestWithModelAliasCache for model resolution
  • Handles raw request/response tracking
  • Uses custom UnmarshalHuggingFaceEmbeddingResponse for flexible format handling

683-763: LGTM - Speech with audio download post-processing.

The Speech function correctly:

  • Uses model alias caching for text-to-speech task
  • Acquires/releases pooled response
  • Downloads audio from returned URL using context-aware method
  • Converts to Bifrost response format

769-856: LGTM - Transcription with hf-inference audio handling.

The Transcription function correctly:

  • Validates input for hf-inference provider (addressing past nil deref concern)
  • Sends raw audio bytes for hf-inference vs JSON for other providers
  • Uses model alias caching for automatic-speech-recognition task
  • Properly acquires/releases pooled response

@qwerty-dvorak qwerty-dvorak force-pushed the 12-05-feat_huggingface_provider_added branch from 3e8d6d7 to 60e18dc on December 19, 2025 09:20
Collaborator

Pratham-Mishra04 commented Dec 19, 2025

Merge activity

  • Dec 19, 11:13 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Dec 19, 11:13 AM UTC: @Pratham-Mishra04 merged this pull request with Graphite.

@Pratham-Mishra04 Pratham-Mishra04 merged commit ed34784 into main Dec 19, 2025
8 of 9 checks passed
@Pratham-Mishra04 Pratham-Mishra04 deleted the 12-05-feat_huggingface_provider_added branch December 19, 2025 11:13
@coderabbitai coderabbitai bot mentioned this pull request Dec 22, 2025


Development

Successfully merging this pull request may close these issues.

[Feature] Add huggingface as a provider

3 participants