
Conversation

@Davshiv20

Summary

Switches the Ollama provider from OpenAI-compatible APIs to Ollama's native APIs, enabling API key support for Ollama Cloud deployments. This allows users to configure API keys for authenticated Ollama instances while maintaining backward compatibility with local Ollama setups.

Changes

New Files:

  • types.go - Native Ollama request/response types for chat, embedding, and list models
  • chat.go - Bidirectional converters between Bifrost and Ollama chat formats
  • embedding.go - Converters for embedding requests/responses
  • models.go - Converters for list models responses
Modified Files:

  • ollama.go - Complete rewrite to use native API endpoints (/api/chat, /api/embed, /api/tags) with API key support
  • ollama_test.go - Updated tests for native API with environment variable configuration
  • config.json - Updated Ollama provider configuration

Key Design Decisions:

  • Uses newline-delimited JSON streaming instead of SSE (native Ollama format)
  • Maps Bifrost parameters to Ollama's options object (max_tokens → num_predict, etc.; see the sketch after this list)
  • Defaults to http://localhost:11434 for local Ollama when no base URL is provided
  • Supports Ollama-specific parameters via ExtraParams (num_ctx, mirostat, keep_alive, etc.)
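
As a rough illustration of the parameter mapping above, a minimal Go sketch might look like this; the struct and helper names are hypothetical, while the option names (num_predict, num_ctx) come from Ollama's API:

package main

// ollamaOptions mirrors a small subset of Ollama's "options" object.
// Field and helper names below are illustrative, not the PR's actual code.
type ollamaOptions struct {
	NumPredict  *int     `json:"num_predict,omitempty"` // mapped from Bifrost's max_tokens
	Temperature *float64 `json:"temperature,omitempty"`
	TopP        *float64 `json:"top_p,omitempty"`
	NumCtx      *int     `json:"num_ctx,omitempty"` // Ollama-specific, passed via ExtraParams
}

// toOllamaOptions maps standard Bifrost parameters plus Ollama-specific extras.
func toOllamaOptions(maxTokens *int, temperature, topP *float64, extra map[string]any) ollamaOptions {
	opts := ollamaOptions{NumPredict: maxTokens, Temperature: temperature, TopP: topP}
	if v, ok := extra["num_ctx"].(int); ok {
		opts.NumCtx = &v
	}
	return opts
}

func main() {
	maxTokens := 256
	_ = toOllamaOptions(&maxTokens, nil, nil, map[string]any{"num_ctx": 4096})
}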

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • [x] Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • [ ] UI (Next.js)
  • Docs

How to test

ollama serve

Pull a test model

ollama pull llama3.2:latest
ollama pull nomic-embed-text:latest

Run tests (no environment variables needed!)

cd core
go test -v ./providers/ollama/... -timeout 5m

Optional: Override default model

export OLLAMA_MODEL="llama3.2:latest"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text:latest"
go test -v ./providers/ollama/... -timeout 5m

# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build

If adding new configs or environment variables, document them here.

Screenshots/Recordings

If UI changes, add before/after screenshots or short clips.

Breaking changes

  • Yes
  • No

If yes, describe impact and migration instructions.

Related issues

Closes #1011

Security considerations

Note any security implications (auth, secrets, PII, sandboxing, etc.).

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Contributor

coderabbitai bot commented Dec 12, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Native Ollama API integration, Ollama Cloud support, embeddings, tool calling, and real-time chat streaming
    • Model browsing/listing and richer model metadata
  • Changes

    • Text completion removed — use chat completions
    • Default local base URL configured; improved error reporting and streaming resilience
  • Tests

    • Expanded Ollama and utility test suites covering embeddings, tools, images, and streaming


Walkthrough

Adds native Ollama provider integration: types, bidirectional converters, message/tool/image utilities, request/response pools, streaming (newline-delimited JSON), completeRequest error handling, embedding and chat endpoints (/api/chat, /api/embed, /api/tags), and tests for local and cloud scenarios.
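
A self-contained sketch of consuming that newline-delimited JSON stream is shown below; the chunk struct is a trimmed assumption, not the PR's actual stream type:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk is a trimmed stand-in for Ollama's per-line stream object.
type chunk struct {
	Message struct {
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// readStream decodes one JSON object per line until done=true.
func readStream(r io.Reader) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // generous line limit for large chunks
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return err
		}
		fmt.Print(c.Message.Content)
		if c.Done {
			break
		}
	}
	return sc.Err()
}

func main() {
	stream := "{\"message\":{\"content\":\"Hel\"},\"done\":false}\n{\"message\":{\"content\":\"lo\"},\"done\":true}\n"
	_ = readStream(strings.NewReader(stream))
}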

Changes

  • Type definitions & model helpers (core/providers/ollama/types.go, core/providers/ollama/models.go): Adds exhaustive Ollama types (requests, responses, embeddings, models, errors, stream types) and helpers to convert to/from Bifrost models and to expose model info.
  • Chat request/response converters (core/providers/ollama/chat.go): Adds ToOllamaChatRequest and OllamaChatRequest.ToBifrostChatRequest with thorough mapping of messages, options, extra params (top_k, num_ctx, repeat_penalty, mirostat, keep_alive, think, etc.), response format handling, tools mapping, and nil-safety.
  • Embedding converters (core/providers/ollama/embedding.go): Adds ToOllamaEmbeddingRequest and OllamaEmbeddingRequest.ToBifrostEmbeddingRequest; maps model, input (text/texts), and extra params (truncate, keep_alive, num_ctx) in both directions with nil guards.
  • Provider implementation & networking (core/providers/ollama/ollama.go): Reworks the provider to call Ollama's native API endpoints; adds memory pools with acquire/release helpers, a central completeRequest helper, parseOllamaError, streaming support for newline-delimited JSON, ListModels via /api/tags, embedding via /api/embed, a default base URL, and updated control flow.
  • Utilities & conversions (core/providers/ollama/utils.go): Implements unexported conversion utilities: convertMessagesToOllama/convertMessagesFromOllama, role mapping, content/image/base64 helpers, tool converters, and thinking/tool-call handling logic.
  • Unit tests for utilities (core/providers/ollama/utils_test.go): Adds extensive tests for base64 extraction/validation, min, and message/tool round-trip edge cases.
  • Integration tests (core/providers/ollama/ollama_test.go): Adds TestOllama and TestOllamaCloud with env-driven configuration, expanded scenarios (images, embeddings, tool calls, end-to-end flows), and gating for missing env vars.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OllamaProvider
    participant OllamaAPI
    participant ResponsePool

    Client->>OllamaProvider: ChatCompletion(bifrostReq)
    OllamaProvider->>OllamaProvider: ToOllamaChatRequest(convert request)
    OllamaProvider->>ResponsePool: Acquire response object
    OllamaProvider->>OllamaAPI: POST /api/chat (JSON)
    OllamaAPI-->>OllamaProvider: OllamaChatResponse (or NDJSON stream)
    OllamaProvider->>OllamaProvider: ToBifrostChatResponse(convert & enrich)
    OllamaProvider->>ResponsePool: Release response object
    OllamaProvider-->>Client: BifrostChatResponse
sequenceDiagram
    participant Client
    participant OllamaProvider
    participant OllamaAPI
    participant StreamParser

    Client->>OllamaProvider: ChatCompletionStream(bifrostReq)
    OllamaProvider->>OllamaProvider: ToOllamaChatRequest(convert request)
    OllamaProvider->>OllamaAPI: POST /api/chat (streaming)
    OllamaAPI-->>StreamParser: Newline-delimited JSON stream
    loop per JSON line
        StreamParser->>OllamaProvider: OllamaStreamResponse
        OllamaProvider->>OllamaProvider: ToBifrostStreamResponse(convert chunk)
        OllamaProvider->>Client: BifrostStream (delta + metadata)
    end
sequenceDiagram
    participant Client
    participant OllamaProvider
    participant OllamaAPI

    Client->>OllamaProvider: Embedding(bifrostReq)
    OllamaProvider->>OllamaProvider: ToOllamaEmbeddingRequest(convert request)
    OllamaProvider->>OllamaAPI: POST /api/embed
    OllamaAPI-->>OllamaProvider: OllamaEmbeddingResponse
    OllamaProvider->>OllamaProvider: ToBifrostEmbeddingResponse(convert & enrich)
    OllamaProvider-->>Client: BifrostEmbeddingResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

  • Areas needing extra attention:
    • Tool-call mapping, tool_name fallback, and correlation of calls/responses in utils.go
    • Streaming NDJSON parsing, chunk indexing, and error propagation in ChatCompletionStream
    • Memory pool acquire/release correctness and potential leaks in ollama.go (see the sketch after this list)
    • Full mapping of extra parameters (mirostat, repeat_penalty, top_k, num_ctx, keep_alive, think) and response-format handling in chat converters
    • Base64 image extraction/validation edge cases in utils_test.go
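
For reference, the acquire/release pattern under review generally looks like the following sketch; the type and function names here are stand-ins, not the PR's actual code:

package main

import "sync"

// chatResponse is a stand-in for the provider's pooled response type.
type chatResponse struct {
	Model   string
	Content string
}

var chatResponsePool = sync.Pool{
	New: func() any { return new(chatResponse) },
}

// acquireChatResponse returns a zeroed response from the pool.
func acquireChatResponse() *chatResponse {
	resp := chatResponsePool.Get().(*chatResponse)
	*resp = chatResponse{} // reset so no data leaks between requests
	return resp
}

// releaseChatResponse is nil-safe, matching the pattern the review describes.
func releaseChatResponse(resp *chatResponse) {
	if resp == nil {
		return
	}
	chatResponsePool.Put(resp)
}

func main() {
	r := acquireChatResponse()
	r.Content = "hello"
	releaseChatResponse(r)
}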

Poem

🐰 I hopped into Ollama's bright API lane,
Converting messages, tools, images — no pain.
Streams, pools, and embeddings all sing,
From Bifrost to Ollama, I tug every string.
Hooray — a fluffy bridge, wired end-to-end! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 76.92%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately and concisely describes the main change: migrating the Ollama provider to use native APIs instead of OpenAI-compatible endpoints.
  • Description check ✅ Passed: The description covers all major required sections: summary, changes overview, type of change, affected areas, testing instructions, and related issues. All critical information is present and well-organized.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d62378 and 4537ab1.

📒 Files selected for processing (8)
  • core/providers/ollama/chat.go (1 hunks)
  • core/providers/ollama/embedding.go (1 hunks)
  • core/providers/ollama/models.go (1 hunks)
  • core/providers/ollama/ollama.go (6 hunks)
  • core/providers/ollama/ollama_test.go (3 hunks)
  • core/providers/ollama/types.go (1 hunks)
  • core/providers/ollama/utils.go (1 hunks)
  • core/providers/ollama/utils_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/providers/ollama/embedding.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/ollama/models.go
  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/chat.go
  • core/providers/ollama/types.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/utils.go
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/ollama/models.go
  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/chat.go
  • core/providers/ollama/types.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.

Applied to files:

  • core/providers/ollama/models.go
  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/chat.go
  • core/providers/ollama/types.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/utils.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/utils_test.go
🧬 Code graph analysis (4)
core/providers/ollama/models.go (2)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/ollama/types.go (1)
  • OllamaModel (143-150)
core/providers/ollama/ollama_test.go (4)
core/internal/testutil/account.go (2)
  • ComprehensiveTestConfig (60-82)
  • TestScenarios (22-57)
core/schemas/bifrost.go (1)
  • Ollama (42-42)
core/internal/testutil/tests.go (1)
  • RunAllComprehensiveTests (15-78)
core/internal/testutil/setup.go (1)
  • SetupTest (51-60)
core/providers/ollama/chat.go (3)
core/schemas/chatcompletions.go (2)
  • BifrostChatRequest (12-19)
  • ChatParameters (156-186)
core/providers/ollama/types.go (2)
  • OllamaChatRequest (16-25)
  • OllamaOptions (63-95)
core/schemas/utils.go (5)
  • SafeExtractIntPointer (486-494)
  • SafeExtractFloat64Pointer (497-505)
  • SafeExtractStringPointer (468-483)
  • SafeExtractBoolPointer (508-516)
  • ParseModelString (23-36)
core/providers/ollama/types.go (2)
core/schemas/chatcompletions.go (9)
  • BifrostChatResponse (27-42)
  • ChatAssistantMessageToolCall (710-716)
  • ChatAssistantMessageToolCallFunction (719-722)
  • ChatAssistantMessage (649-656)
  • BifrostResponseChoice (736-744)
  • ChatNonStreamResponseChoice (777-780)
  • BifrostLLMUsage (845-852)
  • ChatStreamResponseChoiceDelta (788-796)
  • ChatStreamResponseChoice (783-785)
core/schemas/bifrost.go (4)
  • BifrostResponseExtraFields (394-405)
  • ChatCompletionRequest (94-94)
  • ChatCompletionStreamRequest (95-95)
  • ListModelsRequest (91-91)
🔇 Additional comments (32)
core/providers/ollama/ollama_test.go (2)

12-75: Comprehensive test configuration for local Ollama.

The test setup is well-structured with appropriate scenario coverage for the native Ollama API. The environment-driven configuration with sensible defaults (llama3.2:latest, nomic-embed-text:latest) and clear comments explaining Ollama-specific behavior (base64 images, no text completion endpoint) makes the test easy to understand and run.


84-127: Verify cloud model feature support against actual Ollama Cloud capabilities.

The test disables several features (MultipleToolCalls, AutomaticFunctionCall, Embedding) with comments suggesting they "may not be supported in cloud," but Ollama documentation states that cloud models offer the same capabilities as local models. Either enable these features for cloud testing if currently supported, or document specific reasons why they remain disabled based on actual Ollama Cloud limitations rather than assumptions.

core/providers/ollama/models.go (3)

9-21: LGTM!

The conversion is straightforward with appropriate nil handling. The comment correctly notes that /api/tags is GET-only, clarifying why this function is primarily for completeness.


23-37: LGTM!

The conversion correctly maps Ollama model data to Bifrost format, using ModifiedAt for the Created timestamp and setting "ollama" as the owner. The nil guard ensures safe operation.


39-67: LGTM!

The display-ready map appropriately includes core fields and conditionally adds detail fields when present. This is useful for debugging and user-facing model information.

core/providers/ollama/utils.go (8)

14-24: Excellent documentation of Ollama's limitations.

The detailed comments explaining Ollama's tool call semantics (no IDs, correlation by function name) and the explicit warning against generating synthetic IDs demonstrate careful consideration of the API's constraints and set proper expectations for maintainers.


52-86: LGTM - Thoughtful handling of thinking content.

The logic correctly extracts and preserves thinking content from tool calls' ExtraContent, filters out placeholder tool calls (_thinking_placeholder), and sets the thinking field on the Ollama message. This aligns well with Ollama's native format while maintaining compatibility with Bifrost's structure.


88-101: LGTM - Appropriate tool message validation.

The validation correctly ensures tool messages have the required ToolName field and skips invalid messages that would be silently ignored by Ollama. The warning log for missing Name provides debugging context without failing the entire request.


154-190: LGTM - Consistent thinking preservation.

The reverse conversion mirrors the forward conversion's approach to preserving thinking content, using the same ExtraContent structure and _thinking_placeholder mechanism. This ensures round-trip compatibility for thinking-capable models.


240-255: LGTM - Clear role mapping.

The role mapping correctly handles all Bifrost roles, explicitly documents the developer→system fallback (since Ollama doesn't support developer role), and returns empty string for unsupported roles to enable filtering.


316-353: LGTM - Robust base64 extraction.

The function handles multiple image URL formats comprehensively (data URLs, raw base64, HTTP URLs), validates base64 content, and provides helpful warnings for unsupported formats. The use of min() to limit log output prevents excessive logging.


358-375: LGTM - Thorough base64 validation.

The validation correctly tries both standard and raw base64 encoding, checks minimum length, and validates that decoded data is non-empty. This handles various base64 formats while ensuring data integrity.


387-420: LGTM - Error handling addressed.

The tool call conversion properly handles JSON unmarshal errors by logging the error and preserving raw arguments under _raw_arguments, allowing downstream logic to inspect the original payload. This addresses the concern from the previous review about silently dropping errors.
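
A minimal sketch of that fallback pattern, assuming a hypothetical parseToolArguments helper:

package main

import (
	"encoding/json"
	"fmt"
)

// parseToolArguments keeps the original payload under "_raw_arguments"
// when it is not valid JSON, as the review comment describes.
func parseToolArguments(raw string) map[string]any {
	args := map[string]any{}
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return map[string]any{"_raw_arguments": raw}
	}
	return args
}

func main() {
	fmt.Println(parseToolArguments(`{"city":"Paris"}`)) // parses normally
	fmt.Println(parseToolArguments(`not-json`))         // preserved raw
}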

core/providers/ollama/chat.go (2)

9-155: LGTM - Comprehensive parameter mapping.

The conversion correctly maps all standard Bifrost parameters to Ollama format and extracts Ollama-specific parameters from ExtraParams using type-safe helpers. The response format handling supports both simple JSON mode and structured schemas, and all mappings align with Ollama's native API documentation.


157-223: LGTM - Appropriate reverse conversion.

The reverse conversion correctly reconstructs a BifrostChatRequest from Ollama format, mapping core parameters and preserving key Ollama-specific settings in ExtraParams. The selective mapping of parameters is appropriate for passthrough scenarios where full round-trip fidelity isn't required.

core/providers/ollama/utils_test.go (3)

9-153: LGTM - Thorough utility test coverage.

The tests comprehensively cover base64 extraction and validation edge cases, including data URLs (JPEG/PNG), raw base64, HTTP URLs (unsupported), malformed input, and invalid base64. The test cases align well with the implementation's handling of various formats.


155-398: LGTM - Critical tool call semantics validated.

The tests correctly validate Ollama's tool call correlation by function name (not ID), with the critical assertion at line 262 ensuring tool_name is set to the function name ("getWeather") rather than the Bifrost tool_call_id. The edge case coverage (missing prior messages, multiple tools, invalid roles) ensures robust handling.


400-481: LGTM - Reverse conversion validated.

The tests correctly validate conversion from Ollama format back to Bifrost, ensuring tool calls are reconstructed with proper structure and that tool response messages have both ToolCallID and Name fields populated. This complements the forward conversion tests well.

core/providers/ollama/ollama.go (8)

42-82: LGTM - Efficient response pooling.

The response pools are well-implemented with proper reset on acquire and nil-safe release. Pre-warming the pools during provider initialization (lines 99-102) ensures they're ready for use, optimizing memory allocation during request handling.


128-172: LGTM - Centralized request handling.

The completeRequest helper properly consolidates request/response handling with correct context support, body decoding via CheckAndDecodeBody, and error parsing. The past review comments have been addressed (Authorization: Bearer documented correctly, no x-api-key claim).


187-253: LGTM - ListModels correctly implemented.

The implementation properly uses GetPathFromContext for path overrides and CheckAndDecodeBody for response handling, addressing previous review concerns. The function correctly handles both authenticated and local Ollama instances.


267-321: LGTM - ChatCompletion properly implemented.

The non-streaming chat completion correctly uses the conversion helpers, completeRequest for unified handling, response pools for efficiency, and properly sets all metadata fields. The flow is clean and follows the established patterns.


372-434: LGTM - Context cancellation properly handled.

The implementation addresses the previous critical review by adding a select statement (lines 416-434) to check for context cancellation before each scan, and documents fasthttp's streaming limitations (lines 373-375). While not perfect due to fasthttp constraints, this mitigation prevents goroutine leaks when contexts are cancelled.


456-514: LGTM - Streaming chunks properly handled.

The chunk processing correctly uses ShouldSendBackRawResponse (lines 465, 487) to respect context settings, addressing the previous review concern. The per-chunk latency tracking and Responses fallback conversion are implemented correctly.


557-605: LGTM - Embedding follows established pattern.

The embedding implementation correctly mirrors the ChatCompletion pattern, using the conversion helper, completeRequest for unified handling, response pools, and proper metadata population. The consistency across endpoints is good for maintainability.


529-555: LGTM - Clean Responses fallback pattern.

The Responses and ResponsesStream implementations correctly leverage the chat completion endpoints with appropriate conversions and context flags. This approach maintains compatibility while avoiding code duplication.

core/providers/ollama/types.go (6)

14-95: LGTM - Comprehensive request type definitions.

The request types are well-defined with clear documentation and align with Ollama's native API. The OllamaOptions struct is particularly thorough, covering all documented parameters (generation, penalty, context, advanced, low-level) to support the full range of Ollama's capabilities.


99-185: LGTM - Complete response type definitions.

The response types accurately represent Ollama's API responses, including timing information in nanoseconds (native format) and comprehensive model details. The reuse of the same structure for streaming and non-streaming responses aligns with Ollama's NDJSON streaming format.


189-293: LGTM - Correct chat response conversion.

The conversion properly handles nil cases, parses timestamps, and converts tool calls (using function name as ID since Ollama doesn't provide IDs). The thinking preservation using the _thinking_placeholder pattern is consistent with the message conversion utilities.


331-437: LGTM - Streaming conversion properly implemented.

The streaming response conversion correctly builds delta chunks, maps thinking to the Reasoning field (lines 383-387), sets finish reasons only on the final chunk, and includes usage information only when done=true. The isDone return value enables proper stream termination.


439-480: LGTM - Embedding conversion correct.

The embedding response conversion properly handles the float64 to float32 conversion (Ollama uses float64, Bifrost uses float32) and correctly structures the response data with proper indexing and usage information.
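
The narrowing itself is a one-liner per element; a minimal sketch, assuming a hypothetical toFloat32 helper:

package main

// toFloat32 narrows Ollama's float64 embedding values to Bifrost's float32,
// accepting the small precision loss the review notes.
func toFloat32(in []float64) []float32 {
	out := make([]float32, len(in))
	for i, v := range in {
		out[i] = float32(v)
	}
	return out
}

func main() {
	_ = toFloat32([]float64{0.123456789, -0.5})
}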


482-520: LGTM - Model filtering implemented.

The list models conversion now properly uses the configuredSet to filter models (line 503), addressing the previous review concern about unused code. The filtering is appropriately conditional (only when configuredModels is non-empty) and each model is correctly converted to Bifrost format.



@Davshiv20 force-pushed the feat/ollama-native-api branch from 2789db7 to 5608aa4 on December 12, 2025 at 13:07
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
ui/lib/schemas/providerForm.ts (1)

201-243: Schema incorrectly mandates Ollama API keys, contradicting configuration intent.

The config at ui/lib/constants/config.ts (line 64) explicitly sets isKeyRequiredByProvider.ollama = false, indicating keys should be optional for Ollama. However, the schema logic at line 222 makes keysRequired = true for Ollama (since it's not "custom" and not in the ["sgl"] exempt list), forcing at least one non-empty key via lines 224–240. This prevents local Ollama usage and contradicts the stated design.

Additionally, utils.ts normalizes provider names with .toLowerCase(), but providerForm.ts does not, creating a case-sensitivity mismatch that could cause validation bugs.

Suggested fix:

-		const keysRequired = data.selectedProvider === "custom" || !["sgl"].includes(data.selectedProvider);
+		const provider = data.selectedProvider.trim().toLowerCase();
+		const keysRequired = provider === "custom" || !["sgl", "ollama"].includes(provider);

This aligns schema validation with configuration and normalizes casing for consistency.

core/providers/ollama/ollama_test.go (2)

12-75: Defer client.Shutdown() to ensure cleanup on t.Fatal / FailNow.

 	client, ctx, cancel, err := testutil.SetupTest()
 	if err != nil {
 		t.Fatalf("Error initializing test setup: %v", err)
 	}
 	defer cancel()
+	defer client.Shutdown()
@@
-	client.Shutdown()
 }

77-133: Same cleanup issue in TestOllamaCloud: make client.Shutdown() a defer.

 	client, ctx, cancel, err := testutil.SetupTest()
 	if err != nil {
 		t.Fatalf("Error initializing test setup: %v", err)
 	}
 	defer cancel()
+	defer client.Shutdown()
@@
-	client.Shutdown()
 }
🧹 Nitpick comments (6)
core/providers/ollama/chat.go (2)

237-265: Base64 detection: consider supporting unpadded base64 and avoiding obvious false positives.
Using StdEncoding.DecodeString rejects unpadded base64 (common) and can accept some “random” strings that happen to decode.

 func isBase64(s string) bool {
 	if len(s) < 4 {
 		return false
 	}
-	_, err := base64.StdEncoding.DecodeString(s)
-	return err == nil
+	// Try padded and raw (unpadded) forms.
+	if _, err := base64.StdEncoding.DecodeString(s); err == nil {
+		return true
+	}
+	if _, err := base64.RawStdEncoding.DecodeString(s); err == nil {
+		return true
+	}
+	return false
 }

296-362: Reverse conversion isn’t symmetric (drops keep_alive, format, and many options).
If this method is intended for true “passthrough/reverse conversion”, it should round-trip the fields you set in ToOllamaChatRequest (or the comment should narrow the promise).

core/providers/ollama/types.go (4)

166-182: Consider consolidating duplicate types.

OllamaStreamResponse is identical to OllamaChatResponse. Consider using a type alias to reduce duplication:

// OllamaStreamResponse is the same structure as OllamaChatResponse used during streaming.
type OllamaStreamResponse = OllamaChatResponse

Alternatively, if distinct types are intentional for clarity, the current approach is acceptable.


214-225: Silently ignored error on JSON marshal.

The error from json.Marshal is discarded. While marshaling a map[string]interface{} rarely fails, if it does, the Arguments field will contain "null" instead of valid JSON, which could cause downstream parsing issues.

Consider logging or handling the error:

-				args, _ := json.Marshal(tc.Function.Arguments)
+				args, err := json.Marshal(tc.Function.Arguments)
+				if err != nil {
+					args = []byte("{}")
+				}

320-331: Silently ignored error on JSON marshal (same issue as line 215).

Same concern as the non-streaming conversion - consider handling the error consistently.


357-370: Inconsistent finish reason mapping compared to mapFinishReason.

This inline switch doesn't handle "load" and "unload" cases like mapFinishReason does (lines 271-272). Consider extracting a shared helper to ensure consistent behavior:

-		if r.Done {
-			if r.DoneReason != nil {
-				switch *r.DoneReason {
-				case "stop":
-					choice.FinishReason = schemas.Ptr("stop")
-				case "length":
-					choice.FinishReason = schemas.Ptr("length")
-				default:
-					choice.FinishReason = schemas.Ptr("stop")
-				}
-			} else {
-				choice.FinishReason = schemas.Ptr("stop")
-			}
-		}
+		if r.Done {
+			choice.FinishReason = mapDoneReasonToFinishReason(r.DoneReason, r.Done)
+		}

Then create a shared helper function that both methods can use.
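
One possible shape for that shared helper is sketched below; mapping "load"/"unload" to "stop" is an assumption that should be aligned with the PR's mapFinishReason, and ptr stands in for schemas.Ptr:

package main

// mapDoneReasonToFinishReason is a sketch of the shared helper the comment
// suggests; how "load"/"unload" should map is an assumption here and must
// match the PR's mapFinishReason.
func mapDoneReasonToFinishReason(doneReason *string, done bool) *string {
	if !done {
		return nil
	}
	if doneReason == nil {
		return ptr("stop")
	}
	switch *doneReason {
	case "length":
		return ptr("length")
	default: // "stop", "load", "unload", and anything else default to stop
		return ptr("stop")
	}
}

func ptr(s string) *string { return &s }

func main() {
	reason := "length"
	_ = mapDoneReasonToFinishReason(&reason, true)
}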

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 675561f and 2789db7.

📒 Files selected for processing (8)
  • core/providers/ollama/chat.go (1 hunks)
  • core/providers/ollama/embedding.go (1 hunks)
  • core/providers/ollama/models.go (1 hunks)
  • core/providers/ollama/ollama.go (5 hunks)
  • core/providers/ollama/ollama_test.go (3 hunks)
  • core/providers/ollama/types.go (1 hunks)
  • ui/app/workspace/providers/views/utils.ts (1 hunks)
  • ui/lib/schemas/providerForm.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/ollama/chat.go
  • core/providers/ollama/embedding.go
  • ui/app/workspace/providers/views/utils.ts
  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/types.go
  • core/providers/ollama/models.go
  • ui/lib/schemas/providerForm.ts
  • core/providers/ollama/ollama.go
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/ollama/chat.go
  • core/providers/ollama/embedding.go
  • core/providers/ollama/ollama_test.go
  • core/providers/ollama/types.go
  • core/providers/ollama/models.go
  • core/providers/ollama/ollama.go
🧬 Code graph analysis (3)
core/providers/ollama/types.go (4)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/chatcompletions.go (6)
  • BifrostChatResponse (26-41)
  • ChatAssistantMessageToolCall (564-570)
  • ChatAssistantMessageToolCallFunction (573-576)
  • ChatAssistantMessage (541-545)
  • BifrostResponseChoice (582-590)
  • BifrostLLMUsage (640-647)
core/schemas/bifrost.go (1)
  • BifrostResponseExtraFields (285-294)
core/schemas/embedding.go (2)
  • BifrostEmbeddingResponse (22-28)
  • EmbeddingData (118-122)
core/providers/ollama/models.go (2)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/ollama/types.go (1)
  • OllamaModel (140-147)
ui/lib/schemas/providerForm.ts (1)
ui/app/workspace/providers/views/utils.ts (1)
  • keysRequired (1-1)
🔇 Additional comments (16)
ui/app/workspace/providers/views/utils.ts (1)

1-1: Backend and frontend schemas confirm only sgl should be keyless—ollama key requirement is correct.

The code aligns with the backend validation schema in ui/lib/schemas/providerForm.ts, which consistently exempts only sgl from the keys requirement. Both the backend and frontend use identical logic: keysRequired = custom || !["sgl"].includes(provider). The commit message ("Refactor: update UI schema to enforce sgLang key requirements") indicates this was intentional for sgl, with no evidence that ollama was ever intended to be keyless.

Minor suggestion (non-blocking): normalize provider casing once to avoid double toLowerCase().

core/providers/ollama/embedding.go (1)

9-55: Forward embedding conversion looks solid (incl. nil-safety and conditional Options).

core/providers/ollama/models.go (3)

9-21: ToOllamaModel is fine (and the “future use” note is helpful).


39-67: GetModelInfo formatting is clean and nicely conditional.


23-37: Keep m.Name for schemas.Model.ID—it is the canonical field in Ollama's official /api/tags API.

The official Ollama documentation specifies models[].name as the model identifier field in /api/tags responses. The model field is a non-standard duplicate that appears in some third-party docs and builds, not in the canonical specification. Using m.Name ensures stability across Ollama versions and aligns with the official API contract. The current implementation is correct.

Likely an incorrect or invalid review comment.

core/providers/ollama/chat.go (1)

14-154: Request mapping is comprehensive; response-format + tool mapping are good.

core/providers/ollama/ollama.go (2)

42-83: Pooling helpers are straightforward and safe (reset + nil-guard).


98-120: FYI: per guidelines—if PR #1075 is part of a PR stack, double-check ordering/dependencies.
[coding_guidelines] says to consider the whole stack “if there is one”; I can’t see that context here.

core/providers/ollama/types.go (8)

1-10: LGTM!

Package declaration and imports are appropriate for the type definitions and conversion utilities.


12-92: LGTM!

Request types are well-structured with comprehensive documentation and appropriate JSON tags. The OllamaOptions struct covers a thorough set of Ollama-specific parameters.


94-109: LGTM!

Response type accurately represents Ollama's chat completion response structure with appropriate optional fields.


111-130: LGTM!

Embedding types correctly represent Ollama's embedding API, with Input as interface{} to accommodate both single strings and string arrays.


132-157: LGTM!

List models types accurately represent Ollama's /api/tags endpoint response structure.


257-276: LGTM!

The finish reason mapping handles Ollama-specific values appropriately, with sensible defaults for edge cases.


278-291: LGTM!

Usage conversion is straightforward. Note that this always returns a non-nil BifrostLLMUsage even when token counts are unavailable (defaulting to zeros), which is a reasonable design choice.


391-432: LGTM!

Embedding response conversion correctly handles the float64 to float32 conversion and appropriately sets usage for embedding requests (prompt tokens only).

@Davshiv20 force-pushed the feat/ollama-native-api branch from 5608aa4 to 9055732 on December 12, 2025 at 13:10
@Davshiv20 changed the title from "Feat/ollama native api" to "Feat/ollama native api integration" on Dec 12, 2025
@Davshiv20 changed the title from "Feat/ollama native api integration" to "feat: migrate ollama to native api support" on Dec 12, 2025
@Davshiv20 changed the title from "feat: migrate ollama to native api support" to "feat(ollama): migrate ollama to native api" on Dec 12, 2025
@Pratham-Mishra04 linked an issue on Dec 12, 2025 that may be closed by this pull request
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/providers/ollama/utils.go (1)

215-224: Hardcoded JPEG MIME type may be incorrect for PNG images.

When converting images from Ollama back to Bifrost, the code always uses data:image/jpeg;base64, prefix regardless of the actual image format. If the original image was PNG, this produces an incorrect data URL.

🔎 Suggested improvement

Consider detecting the image type from the base64 magic bytes or preserving the original format. For now, this is a minor issue since the base64 data itself is correct:

 			// Add images
 			for _, img := range msg.Images {
-				dataURL := "data:image/jpeg;base64," + img
+				// Detect image type from base64 magic bytes
+				mimeType := "image/jpeg" // Default fallback
+				if len(img) >= 4 {
+					// PNG starts with iVBORw (decoded: 0x89 0x50 0x4E 0x47)
+					if strings.HasPrefix(img, "iVBORw") {
+						mimeType = "image/png"
+					}
+				}
+				dataURL := "data:" + mimeType + ";base64," + img
core/providers/ollama/ollama.go (1)

514-521: Scanner error handling could include more context.

When scanner.Err() returns an error, it's logged and processed, but it might be helpful to know if it was due to buffer overflow (which could happen with very large responses despite the 10MB limit).

🔎 Suggested improvement
 		if err := scanner.Err(); err != nil {
-			provider.logger.Warn(fmt.Sprintf("Error reading Ollama stream: %v", err))
+			provider.logger.Warn(fmt.Sprintf("Error reading Ollama stream (buffer limit: 10MB): %v", err))
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80c4267 and 7d62378.

📒 Files selected for processing (6)
  • core/providers/ollama/chat.go (1 hunks)
  • core/providers/ollama/embedding.go (1 hunks)
  • core/providers/ollama/ollama.go (5 hunks)
  • core/providers/ollama/types.go (1 hunks)
  • core/providers/ollama/utils.go (1 hunks)
  • core/providers/ollama/utils_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/providers/ollama/chat.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/ollama/embedding.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/utils.go
  • core/providers/ollama/types.go
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/ollama/embedding.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/utils.go
  • core/providers/ollama/types.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.

Applied to files:

  • core/providers/ollama/embedding.go
  • core/providers/ollama/utils_test.go
  • core/providers/ollama/ollama.go
  • core/providers/ollama/utils.go
  • core/providers/ollama/types.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.

Applied to files:

  • core/providers/ollama/utils_test.go
🧬 Code graph analysis (1)
core/providers/ollama/utils_test.go (2)
core/schemas/chatcompletions.go (6)
  • ChatMessageRoleAssistant (419-419)
  • ChatAssistantMessage (541-545)
  • ChatAssistantMessageToolCall (564-570)
  • ChatAssistantMessageToolCallFunction (573-576)
  • ChatMessageRoleTool (422-422)
  • ChatMessageRoleUser (420-420)
core/providers/ollama/types.go (3)
  • OllamaMessage (28-35)
  • OllamaToolCall (38-40)
  • OllamaToolCallFunction (43-46)
🔇 Additional comments (23)
core/providers/ollama/embedding.go (3)

9-54: LGTM! Clean Bifrost to Ollama embedding request conversion.

The function correctly handles nil checks, maps the model, and extracts text/texts from the Bifrost EmbeddingInput. Extra parameters (truncate, keep_alive, num_ctx) are properly transferred to Ollama options.


81-94: The []interface{} case is now properly handled.

The implementation correctly iterates through the slice, validates each element is a string, and breaks early if any element fails type assertion. This addresses the previous review concern about JSON-roundtripped shapes.


101-115: Extra params mapping back to Bifrost looks correct.

The condition on line 102 properly checks all three fields before creating the params struct, avoiding unnecessary allocations.

core/providers/ollama/utils_test.go (4)

9-80: Comprehensive test coverage for extractBase64Image.

Good coverage of data URLs (JPEG, PNG), raw base64, HTTP/HTTPS URLs, empty strings, and malformed data URLs. The test cases align well with the expected behavior documented in the implementation.


155-269: Excellent test coverage for tool call conversion semantics.

The tests correctly verify critical Ollama-specific behavior:

  • Tool calls only appear on assistant messages
  • Tool responses use tool_name for correlation (not tool_call_id)
  • The mapping from ToolCallID to function name is validated

This ensures the conversion correctly handles Ollama's native tool call semantics.


271-300: Good edge case coverage for tool response without prior assistant message.

This test validates the fallback behavior where Name field is used when no prior tool call exists to map from. This is important for robustness.


400-481: Round-trip conversion tests look correct.

The convertMessagesFromOllama tests properly verify:

  • Assistant messages with tool calls are reconstructed correctly
  • Tool response messages map tool_name to both Name and ToolCallID fields
core/providers/ollama/utils.go (4)

25-106: Well-documented tool call conversion with Ollama-specific semantics.

The extensive comments explaining Ollama's tool call correlation by function name (not IDs) and the thinking placeholder pattern are helpful. The logic correctly:

  • Filters out thinking placeholders before conversion
  • Extracts thinking content from ExtraContent
  • Skips invalid tool messages without ToolName

171-189: Thinking content preservation via placeholder tool call is a reasonable workaround.

Given that ChatAssistantMessage doesn't have an ExtraContent field, using a dummy _thinking_placeholder tool call to preserve thinking content for passthrough scenarios is a pragmatic solution. The pattern is consistently applied in both directions.


358-375: Base64 validation with padding fallback is robust.

Trying both StdEncoding and RawStdEncoding handles cases where padding may be missing. The minimum length check of 4 is appropriate.
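
A minimal sketch of that padded/unpadded fallback, with an illustrative isLikelyBase64 name:

package main

import (
	"encoding/base64"
	"fmt"
)

// isLikelyBase64 tries padded then unpadded decoding, mirroring the
// StdEncoding/RawStdEncoding fallback described above.
func isLikelyBase64(s string) bool {
	if len(s) < 4 {
		return false
	}
	if _, err := base64.StdEncoding.DecodeString(s); err == nil {
		return true
	}
	_, err := base64.RawStdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	fmt.Println(isLikelyBase64("aGVsbG8=")) // padded: true
	fmt.Println(isLikelyBase64("aGVsbG8"))  // unpadded: true
	fmt.Println(isLikelyBase64("!!"))       // too short: false
}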


389-420: Tool call argument parsing with fallback is well-handled.

When JSON unmarshaling fails, the raw arguments are preserved under _raw_arguments key, which allows debugging while preventing data loss.

core/providers/ollama/ollama.go (7)

41-81: Response pooling implementation is correct.

The pools are properly initialized with sync.Pool, and the acquire/release functions correctly reset structs before returning them to prevent data leakage between requests.


97-101: Pool pre-warming during initialization is a good optimization.

Pre-allocating pool entries based on concurrency helps reduce allocation pressure during initial requests.


126-170: Centralized request handling is clean and well-structured.

The completeRequest helper properly:

  • Sets extra headers from config
  • Handles Bearer token authentication (see the sketch after this list)
  • Uses MakeRequestWithContext for timeout support
  • Decodes and copies the response body before releasing
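
The Bearer-token step might look like the sketch below; note the PR uses fasthttp, so net/http here is purely illustrative:

package main

import "net/http"

// setOllamaAuth applies the optional Bearer header; the header name comes
// from the review ("Authorization: Bearer"), everything else is a stand-in.
func setOllamaAuth(req *http.Request, apiKey string) {
	if apiKey == "" {
		return // local Ollama instances need no key
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
}

func main() {
	req, _ := http.NewRequest(http.MethodPost, "http://localhost:11434/api/chat", nil)
	setOllamaAuth(req, "example-key")
}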

370-378: Proper documentation of fasthttp streaming context limitations.

The comment at lines 371-373 correctly documents that fasthttp doesn't natively support context cancellation for streaming. The implementation works around this with the select check before scanner.Scan().


414-432: Context cancellation check before scanner.Scan() is correct.

This addresses the previous review concern about fasthttp's lack of native context cancellation for streaming. The select with default case allows the loop to proceed if context is still active.
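
The workaround generally takes the following shape; scanWithCancel and handleLine are illustrative names, not the PR's actual code:

package main

import (
	"bufio"
	"context"
	"fmt"
	"strings"
)

// scanWithCancel checks ctx before every Scan, approximating the workaround
// described above for fasthttp's lack of native stream cancellation.
func scanWithCancel(ctx context.Context, sc *bufio.Scanner, handleLine func([]byte)) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // stop promptly once the caller cancels
		default:
		}
		if !sc.Scan() {
			return sc.Err()
		}
		handleLine(sc.Bytes())
	}
}

func main() {
	sc := bufio.NewScanner(strings.NewReader("chunk-1\nchunk-2\n"))
	_ = scanWithCancel(context.Background(), sc, func(b []byte) { fmt.Println(string(b)) })
}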


460-468: Latency bookkeeping is now correct.

The chunk latency is computed (time.Since(lastChunkTime)) before lastChunkTime is updated. This ensures each chunk's latency reflects the time since the previous chunk was received, addressing the previous review concern.


490-498: Verify latency assignment for Responses fallback path.

The chunkLatencyMs computed at line 460 is reused for Responses chunks at line 497. However, when isDone && Type == ResponsesStreamResponseTypeCompleted (line 490), the latency is correctly set to total time since start. This looks correct.

core/providers/ollama/types.go (5)

14-95: Comprehensive Ollama API type definitions.

The request types accurately model Ollama's native API with proper JSON tags and optional field handling. The OllamaOptions struct covers both common and advanced model parameters.


189-293: ToBifrostChatResponse conversion is thorough.

The conversion correctly handles:

  • Timestamp parsing with RFC3339Nano format
  • Tool call mapping with function name as ID (Ollama limitation documented)
  • Thinking content preservation via placeholder pattern
  • Finish reason mapping

331-437: ToBifrostStreamResponse handles streaming chunks correctly.

The streaming conversion properly:

  • Sets chat.completion.chunk object type
  • Builds delta content only when there's actual data (hasDelta check)
  • Maps finish reason on final chunk
  • Adds usage statistics only on the final chunk (r.Done)

454-469: Embedding float64 to float32 conversion is acceptable.

While there is some precision loss, float32 is standard for embeddings in most systems and the trade-off for memory efficiency is reasonable.


495-505: The configuredSet is now properly used for model filtering.

The previous review concern about unused configuredSet is addressed. The code now correctly filters models when configuredModels is non-empty (line 503).
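
A minimal sketch of that set-based filtering, with assumed names and a plain []string signature:

package main

import "fmt"

// filterModels keeps only configured models, as the resolved comment
// describes; the name and signature are assumptions.
func filterModels(all, configured []string) []string {
	if len(configured) == 0 {
		return all // no filtering when nothing is configured
	}
	set := make(map[string]struct{}, len(configured))
	for _, m := range configured {
		set[m] = struct{}{}
	}
	var out []string
	for _, m := range all {
		if _, ok := set[m]; ok {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	fmt.Println(filterModels([]string{"llama3.2:latest", "mistral"}, []string{"llama3.2:latest"}))
}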

@Davshiv20 force-pushed the feat/ollama-native-api branch from 7d62378 to 4537ab1 on December 20, 2025 at 12:55
@Davshiv20
Author

@Pratham-Mishra04 The Ollama Cloud provider requires an API key, while local Ollama doesn't. The current provider implements both, but in the UI we set either a base URL or an API key. Should I build a separate UI for this, or a separate provider implementation for Ollama Cloud?


Development

Successfully merging this pull request may close these issues.

[Feature] Switch to Ollama native apis
