fix: correct conversion of thinking level to thinking budget and vice versa in Gemini #1245
Conversation
📝 Walkthrough

Summary by CodeRabbit

Walkthrough

This PR implements model version-aware reasoning configuration handling for Gemini. It adds detection for Gemini 3.0+ and refactors conversion logic between Bifrost reasoning parameters (effort/max_tokens) and Gemini's thinking configurations (thinkingLevel/thinkingBudget), with version-specific behavior and updated documentation.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ChatReq as Chat Request Handler
    participant ModelCheck as Model Capability<br/>(isGemini3Plus)
    participant ConfigConv as Config Converter
    participant GeminiAPI as Gemini API
    Client->>ChatReq: Bifrost request with<br/>reasoning.effort or<br/>reasoning.max_tokens
    activate ChatReq
    ChatReq->>ModelCheck: Check model version
    activate ModelCheck
    ModelCheck-->>ChatReq: Gemini 3.0+? / Version
    deactivate ModelCheck
    alt Has max_tokens (budget)
        ChatReq->>ConfigConv: Convert to thinkingBudget<br/>(both versions)
    else Only effort provided
        alt Gemini 3.0+
            ChatReq->>ConfigConv: Convert to thinkingLevel
        else Gemini 2.5 or earlier
            ChatReq->>ConfigConv: Convert to thinkingBudget<br/>(via effort lookup)
        end
    end
    activate ConfigConv
    ConfigConv->>ConfigConv: Set ThinkingConfig with<br/>appropriate field
    ConfigConv-->>ChatReq: GenerationConfig<br/>(reasoning fields set)
    deactivate ConfigConv
    ChatReq->>GeminiAPI: Gemini request with<br/>thinkingLevel or<br/>thinkingBudget
    deactivate ChatReq
    GeminiAPI-->>Client: Gemini response with<br/>reasoning details
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed.
Actionable comments posted: 1
In @docs/providers/reasoning.mdx:
- Around line 840-851: The numeric example budgets in the table (3482, 2330) are
inconsistent with the documented default and the implementation: update either
the example calculations or the explanatory note so they match
GetBudgetTokensFromReasoningEffort and DefaultCompletionMaxTokens; specifically
either recalc the shown budgets using DefaultCompletionMaxTokens = 8192 (and
replace 3482/2330 with those 8192-based values) or change the footnote/text to
state the examples assume max_completion_tokens = 4096 so the current numbers
remain correct.
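To make the flagged inconsistency concrete, the effort-to-budget conversion can be sketched as below. The ratios here are illustrative assumptions only; the real mapping lives in `GetBudgetTokensFromReasoningEffort` in `core/providers/utils/utils.go`, and the point is simply that the documented example numbers change depending on which `max_completion_tokens` default they assume:

```go
package main

import "fmt"

// budgetFromEffort is a hypothetical sketch: scale the completion-token
// ceiling by an effort-dependent ratio. The actual ratios used by
// GetBudgetTokensFromReasoningEffort may differ.
func budgetFromEffort(effort string, maxCompletionTokens int) int {
	ratios := map[string]float64{"low": 0.2, "medium": 0.5, "high": 0.8}
	r, ok := ratios[effort]
	if !ok {
		return 0
	}
	return int(float64(maxCompletionTokens) * r)
}

func main() {
	// The same effort yields different budgets under 4096 vs 8192 ceilings,
	// which is exactly why the doc examples and the default must agree.
	fmt.Println(budgetFromEffort("medium", 4096)) // 2048
	fmt.Println(budgetFromEffort("medium", 8192)) // 4096
}
```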
🧹 Nitpick comments (2)
core/providers/gemini/responses.go (1)
10-13: Responses-side reasoning config logic stays consistent with chat path

The updated `convertParamsToGenerationConfigResponses` mirrors the chat path correctly:
- Uses `DefaultCompletionMaxTokens` / `DefaultReasoningMinBudget` as fallbacks.
- Enforces the same priority rule: when both `Reasoning.MaxTokens` and `Reasoning.Effort` are set, only `thinkingBudget` is sent (effort ignored).
- Handles special cases (`effort == "none"`, `max_tokens == 0`, `max_tokens == -1`) and routes effort-only requests to:
  - `thinkingLevel` on Gemini 3.0+ (`isGemini3Plus(r.Model)` + `effortToThinkingLevel`), or
  - a computed `thinkingBudget` via `GetBudgetTokensFromReasoningEffort` on older models.

The behavior aligns with the new docs and the chat conversion logic; any error from the estimator is safely ignored as a soft fallback, which is reasonable here.

Also applies to: 2014-2079
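The priority rule described above can be sketched as follows. This is a simplified stand-in, not the provider's actual code: the real logic lives in `convertParamsToGenerationConfig` / `convertParamsToGenerationConfigResponses`, and the placeholder budget here is an assumption:

```go
package main

import "fmt"

type thinkingConfig struct {
	ThinkingBudget *int
	ThinkingLevel  *string
}

// resolveThinking sketches the priority rule: an explicit max_tokens always
// wins; effort-only maps to a level on Gemini 3.0+ and to a computed budget
// (via GetBudgetTokensFromReasoningEffort in the real code) on older models.
func resolveThinking(maxTokens *int, effort *string, gemini3Plus bool) thinkingConfig {
	var cfg thinkingConfig
	switch {
	case maxTokens != nil:
		// Budget wins: only thinkingBudget is sent, effort is ignored.
		cfg.ThinkingBudget = maxTokens
	case effort != nil && gemini3Plus:
		// Effort-only on Gemini 3.0+ becomes a thinkingLevel.
		cfg.ThinkingLevel = effort
	case effort != nil:
		// Effort-only on Gemini 2.5 and earlier becomes a numeric budget.
		b := 1024 // placeholder; the real value is effort-dependent
		cfg.ThinkingBudget = &b
	}
	return cfg
}

func main() {
	mt, eff := 2048, "high"
	got := resolveThinking(&mt, &eff, true)
	// Both set, so only the budget survives:
	fmt.Println(*got.ThinkingBudget, got.ThinkingLevel == nil) // 2048 true
}
```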
core/providers/gemini/utils.go (1)
8-11: Harden Gemini 3+ detection to avoid misclassifying non-numeric model names

The new reasoning logic and helpers are generally solid:
- Request/response conversion symmetry looks good.
- Priority rules, special values (`0`, `-1`, `"none"`), and Pro-only level constraints all line up with the docs and with `GetReasoningEffortFromBudgetTokens` / `GetBudgetTokensFromReasoningEffort`.

The one fragile spot is `isGemini3Plus`:

```go
idx := strings.Index(model, "gemini-")
if idx == -1 {
	return false
}
afterPrefix := model[idx+7:]
if len(afterPrefix) == 0 {
	return false
}
firstChar := afterPrefix[0]
return firstChar >= '3'
```

This will treat any model whose first character after `gemini-` is non-numeric (e.g. `"gemini-pro"`, `"gemini-exp"`) as "3+" because `'p'` / `'e'` are ≥ `'3'` in ASCII. That can flip older or experimental models into the 3.0+ code path and cause us to send `thinkingLevel` to models that only support `thinkingBudget`.

Consider tightening the check to only treat models with an explicit numeric major version ≥ 3 as 3+; for example:

Proposed improvement for `isGemini3Plus`:

```diff
-func isGemini3Plus(model string) bool {
-	// Convert to lowercase for case-insensitive comparison
-	model = strings.ToLower(model)
-
-	// Find "gemini-" prefix
-	idx := strings.Index(model, "gemini-")
-	if idx == -1 {
-		return false
-	}
-
-	// Get the part after "gemini-"
-	afterPrefix := model[idx+7:] // len("gemini-") = 7
-	if len(afterPrefix) == 0 {
-		return false
-	}
-
-	// Check first character - if it's '3' or higher, it's 3.0+
-	firstChar := afterPrefix[0]
-	return firstChar >= '3'
-}
+func isGemini3Plus(model string) bool {
+	model = strings.ToLower(model)
+
+	idx := strings.Index(model, "gemini-")
+	if idx == -1 {
+		return false
+	}
+
+	afterPrefix := model[idx+7:]
+	if len(afterPrefix) == 0 {
+		return false
+	}
+
+	// Extract leading numeric major version, e.g. "3" from "3.0-flash"
+	i := 0
+	for i < len(afterPrefix) && afterPrefix[i] >= '0' && afterPrefix[i] <= '9' {
+		i++
+	}
+	if i == 0 {
+		// No numeric prefix → treat as pre-3.0 for safety
+		return false
+	}
+
+	major := int(afterPrefix[0] - '0')
+	return major >= 3
+}
```

This keeps the function cheap but avoids accidentally routing non-versioned model names into the 3.0+ reasoning path.

Also applies to: 13-34, 36-65, 448-537
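A quick sanity check of the hardened detection makes the difference visible. This is a self-contained copy of the proposed check, shown here only to illustrate how non-numeric names are classified:

```go
package main

import (
	"fmt"
	"strings"
)

// isGemini3Plus (proposed variant): only an explicit numeric major version
// >= 3 immediately after "gemini-" counts as Gemini 3.0+.
func isGemini3Plus(model string) bool {
	model = strings.ToLower(model)
	idx := strings.Index(model, "gemini-")
	if idx == -1 {
		return false
	}
	afterPrefix := model[idx+len("gemini-"):]
	i := 0
	for i < len(afterPrefix) && afterPrefix[i] >= '0' && afterPrefix[i] <= '9' {
		i++
	}
	if i == 0 {
		return false // non-numeric names like "gemini-pro" stay pre-3.0
	}
	return afterPrefix[0]-'0' >= 3
}

func main() {
	for _, m := range []string{"gemini-3.0-flash", "gemini-2.5-pro", "gemini-pro", "gemini-exp"} {
		fmt.Println(m, isGemini3Plus(m))
	}
	// → only "gemini-3.0-flash" reports true
}
```

With the original byte comparison, the last two names would have been misrouted to the `thinkingLevel` path.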
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- core/changelog.md
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/types.go
- core/providers/gemini/utils.go
- docs/providers/reasoning.mdx
- transports/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
`**`
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/chat.go
- core/changelog.md
- docs/providers/reasoning.mdx
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- transports/changelog.md
- core/providers/gemini/types.go
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-30T05:37:48.365Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1180
File: docs/features/mcp/connecting-to-servers.mdx:452-458
Timestamp: 2025-12-30T05:37:48.365Z
Learning: When reviewing documentation PRs in a Graphite-managed stack, first check related or previous PRs in the stack for feature implementations before flagging documentation as incorrect or unsupported. Documentation MDX files often reference features implemented in earlier stack PRs; verify that the documented behavior exists in earlier changes and that the docs accurately reflect the implemented state before requesting edits.
Applied to files:
docs/providers/reasoning.mdx
🧬 Code graph analysis (3)
core/providers/gemini/chat.go (1)
- core/providers/gemini/types.go (1)
  - `GenerationConfig` (641-707)

core/providers/gemini/responses.go (3)
- core/providers/gemini/types.go (3)
  - `DefaultCompletionMaxTokens` (18-18), `DefaultReasoningMinBudget` (19-19), `DynamicReasoningBudget` (20-20)
- core/schemas/utils.go (1)
  - `Ptr` (16-18)
- core/providers/utils/utils.go (1)
  - `GetBudgetTokensFromReasoningEffort` (1350-1387)

core/providers/gemini/utils.go (2)
- core/providers/gemini/types.go (3)
  - `DefaultCompletionMaxTokens` (18-18), `DefaultReasoningMinBudget` (19-19), `DynamicReasoningBudget` (20-20)
- core/providers/utils/utils.go (2)
  - `GetReasoningEffortFromBudgetTokens` (1308-1345), `GetBudgetTokensFromReasoningEffort` (1350-1387)
🔇 Additional comments (5)
transports/changelog.md (1)
4-5: Changelog entries align with PR scope

The new bullets accurately describe the XAI docs update and the Gemini reasoning conversion fix; nothing else to tweak here.
core/changelog.md (1)
2-3: Core changelog accurately reflects Gemini fix

The added Gemini reasoning conversion entry is consistent with the implementation changes in the provider.
docs/providers/reasoning.mdx (1)
24-25: Gemini reasoning docs match implementation behavior

The updated Gemini sections correctly capture:
- 2.5+ as budget-only and 3.0+ as budget+level.
- Priority rules where `reasoning.max_tokens` wins over `effort`, and only `thinkingBudget` is sent when both are present.
- Special handling for `0`, `-1`, and `"none"`, as well as Pro-model level restrictions and version detection.

This lines up with the new model-aware logic in `convertParamsToGenerationConfig`, `convertParamsToGenerationConfigResponses`, and `effortToThinkingLevel`.

Also applies to: 538-607, 613-637, 659-687, 706-717, 840-848, 1464-1483, 1497-1510
core/providers/gemini/chat.go (1)
22-25: Model-aware generation config wiring looks correct

Passing `bifrostReq.Model` into `convertParamsToGenerationConfig` is exactly what the new Gemini 2.5 vs 3.0+ reasoning logic needs; nothing else to change here.

core/providers/gemini/types.go (1)
17-21: Reasoning constants and ThinkingLevel pointer semantics look correct

- The new constants (`MinReasoningMaxTokens`, `DefaultCompletionMaxTokens`, `DefaultReasoningMinBudget`, `DynamicReasoningBudget`) line up with how Gemini reasoning defaults and special values are used in the converters.
- Switching `GenerationConfigThinkingConfig.ThinkingLevel` to `*string`, plus the updated `UnmarshalJSON` that handles both camelCase and snake_case fields via `sonic.Unmarshal`, gives you the flexibility needed for model-aware logic without breaking JSON interop.

No issues here.

Also applies to: 913-923, 925-963
Summary
Fix the conversion between reasoning effort levels and thinking budgets in Gemini, ensuring proper handling across different Gemini model versions (2.5 and 3.0+).
Changes
Type of change
Affected areas
How to test
Test with different Gemini models and reasoning parameters:
Breaking changes
Related issues
Fixes inconsistent reasoning behavior between Gemini 2.5 and 3.0+ models.
Security considerations
No security implications.
Checklist