fix: correct conversion of thinking level to thinking budget and vice versa in Gemini #1245
Conversation
📝 Walkthrough

Summary by CodeRabbit

Walkthrough

This PR implements model version-aware reasoning configuration handling for Gemini. It adds detection for Gemini 3.0+ and refactors conversion logic between Bifrost reasoning parameters (effort/max_tokens) and Gemini's thinking configurations (thinkingLevel/thinkingBudget), with version-specific behavior and updated documentation.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ChatReq as Chat Request Handler
    participant ModelCheck as Model Capability<br/>(isGemini3Plus)
    participant ConfigConv as Config Converter
    participant GeminiAPI as Gemini API
    Client->>ChatReq: Bifrost request with<br/>reasoning.effort or<br/>reasoning.max_tokens
    activate ChatReq
    ChatReq->>ModelCheck: Check model version
    activate ModelCheck
    ModelCheck-->>ChatReq: Gemini 3.0+? / Version
    deactivate ModelCheck
    alt Has max_tokens (budget)
        ChatReq->>ConfigConv: Convert to thinkingBudget<br/>(both versions)
    else Only effort provided
        alt Gemini 3.0+
            ChatReq->>ConfigConv: Convert to thinkingLevel
        else Gemini 2.5 or earlier
            ChatReq->>ConfigConv: Convert to thinkingBudget<br/>(via effort lookup)
        end
    end
    activate ConfigConv
    ConfigConv->>ConfigConv: Set ThinkingConfig with<br/>appropriate field
    ConfigConv-->>ChatReq: GenerationConfig<br/>(reasoning fields set)
    deactivate ConfigConv
    ChatReq->>GeminiAPI: Gemini request with<br/>thinkingLevel or<br/>thinkingBudget
    deactivate ChatReq
    GeminiAPI-->>Client: Gemini response with<br/>reasoning details
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed.
Actionable comments posted: 1
In @docs/providers/reasoning.mdx:
- Around line 840-851: The numeric example budgets in the table (3482, 2330) are
inconsistent with the documented default and the implementation: update either
the example calculations or the explanatory note so they match
GetBudgetTokensFromReasoningEffort and DefaultCompletionMaxTokens; specifically
either recalc the shown budgets using DefaultCompletionMaxTokens = 8192 (and
replace 3482/2330 with those 8192-based values) or change the footnote/text to
state the examples assume max_completion_tokens = 4096 so the current numbers
remain correct.
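To make the flagged inconsistency concrete, the effort-to-budget conversion can be sketched as below. The ratios here are illustrative assumptions only; the real mapping lives in `GetBudgetTokensFromReasoningEffort` in `core/providers/utils/utils.go`, and the point is simply that the documented example numbers change depending on which `max_completion_tokens` default they assume:

```go
package main

import "fmt"

// budgetFromEffort is a hypothetical sketch: scale the completion-token
// ceiling by an effort-dependent ratio. The actual ratios used by
// GetBudgetTokensFromReasoningEffort may differ.
func budgetFromEffort(effort string, maxCompletionTokens int) int {
	ratios := map[string]float64{"low": 0.2, "medium": 0.5, "high": 0.8}
	r, ok := ratios[effort]
	if !ok {
		return 0
	}
	return int(float64(maxCompletionTokens) * r)
}

func main() {
	// The same effort yields different budgets under 4096 vs 8192 ceilings,
	// which is exactly why the doc examples and the default must agree.
	fmt.Println(budgetFromEffort("medium", 4096)) // 2048
	fmt.Println(budgetFromEffort("medium", 8192)) // 4096
}
```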
🧹 Nitpick comments (2)
core/providers/gemini/responses.go (1)
10-13: Responses-side reasoning config logic stays consistent with chat path

The updated `convertParamsToGenerationConfigResponses` mirrors the chat path correctly:
- Uses `DefaultCompletionMaxTokens` / `DefaultReasoningMinBudget` as fallbacks.
- Enforces the same priority rule: when both `Reasoning.MaxTokens` and `Reasoning.Effort` are set, only `thinkingBudget` is sent (effort ignored).
- Handles special cases (`effort == "none"`, `max_tokens == 0`, `max_tokens == -1`) and routes effort-only requests to:
  - `thinkingLevel` on Gemini 3.0+ (`isGemini3Plus(r.Model)` + `effortToThinkingLevel`), or
  - a computed `thinkingBudget` via `GetBudgetTokensFromReasoningEffort` on older models.

The behavior aligns with the new docs and the chat conversion logic; any error from the estimator is safely ignored as a soft fallback, which is reasonable here.

Also applies to: 2014-2079
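The priority rule described above can be sketched as follows. This is a simplified stand-in, not the provider's actual code: the real logic lives in `convertParamsToGenerationConfig` / `convertParamsToGenerationConfigResponses`, and the placeholder budget here is an assumption:

```go
package main

import "fmt"

type thinkingConfig struct {
	ThinkingBudget *int
	ThinkingLevel  *string
}

// resolveThinking sketches the priority rule: an explicit max_tokens always
// wins; effort-only maps to a level on Gemini 3.0+ and to a computed budget
// (via GetBudgetTokensFromReasoningEffort in the real code) on older models.
func resolveThinking(maxTokens *int, effort *string, gemini3Plus bool) thinkingConfig {
	var cfg thinkingConfig
	switch {
	case maxTokens != nil:
		// Budget wins: only thinkingBudget is sent, effort is ignored.
		cfg.ThinkingBudget = maxTokens
	case effort != nil && gemini3Plus:
		// Effort-only on Gemini 3.0+ becomes a thinkingLevel.
		cfg.ThinkingLevel = effort
	case effort != nil:
		// Effort-only on Gemini 2.5 and earlier becomes a numeric budget.
		b := 1024 // placeholder; the real value is effort-dependent
		cfg.ThinkingBudget = &b
	}
	return cfg
}

func main() {
	mt, eff := 2048, "high"
	got := resolveThinking(&mt, &eff, true)
	// Both set, so only the budget survives:
	fmt.Println(*got.ThinkingBudget, got.ThinkingLevel == nil) // 2048 true
}
```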
core/providers/gemini/utils.go (1)
8-11: Harden Gemini 3+ detection to avoid misclassifying non-numeric model names

The new reasoning logic and helpers are generally solid:
- Request/response conversion symmetry looks good.
- Priority rules, special values (`0`, `-1`, `"none"`), and Pro-only level constraints all line up with the docs and with `GetReasoningEffortFromBudgetTokens` / `GetBudgetTokensFromReasoningEffort`.

The one fragile spot is `isGemini3Plus`:

```go
idx := strings.Index(model, "gemini-")
if idx == -1 {
	return false
}
afterPrefix := model[idx+7:]
if len(afterPrefix) == 0 {
	return false
}
firstChar := afterPrefix[0]
return firstChar >= '3'
```

This will treat any model whose first character after `gemini-` is non-numeric (e.g. `"gemini-pro"`, `"gemini-exp"`) as "3+" because `'p'` / `'e'` are ≥ `'3'` in ASCII. That can flip older or experimental models into the 3.0+ code path and cause us to send `thinkingLevel` to models that only support `thinkingBudget`.

Consider tightening the check to only treat models with an explicit numeric major version ≥ 3 as 3+; for example:

Proposed improvement for `isGemini3Plus`:

```diff
-func isGemini3Plus(model string) bool {
-	// Convert to lowercase for case-insensitive comparison
-	model = strings.ToLower(model)
-
-	// Find "gemini-" prefix
-	idx := strings.Index(model, "gemini-")
-	if idx == -1 {
-		return false
-	}
-
-	// Get the part after "gemini-"
-	afterPrefix := model[idx+7:] // len("gemini-") = 7
-	if len(afterPrefix) == 0 {
-		return false
-	}
-
-	// Check first character - if it's '3' or higher, it's 3.0+
-	firstChar := afterPrefix[0]
-	return firstChar >= '3'
-}
+func isGemini3Plus(model string) bool {
+	model = strings.ToLower(model)
+
+	idx := strings.Index(model, "gemini-")
+	if idx == -1 {
+		return false
+	}
+
+	afterPrefix := model[idx+7:]
+	if len(afterPrefix) == 0 {
+		return false
+	}
+
+	// Extract leading numeric major version, e.g. "3" from "3.0-flash"
+	i := 0
+	for i < len(afterPrefix) && afterPrefix[i] >= '0' && afterPrefix[i] <= '9' {
+		i++
+	}
+	if i == 0 {
+		// No numeric prefix → treat as pre-3.0 for safety
+		return false
+	}
+
+	major := int(afterPrefix[0] - '0')
+	return major >= 3
+}
```

This keeps the function cheap but avoids accidentally routing non-versioned model names into the 3.0+ reasoning path.

Also applies to: 13-34, 36-65, 448-537
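A quick sanity check of the hardened detection makes the difference visible. This is a self-contained copy of the proposed check, shown here only to illustrate how non-numeric names are classified:

```go
package main

import (
	"fmt"
	"strings"
)

// isGemini3Plus (proposed variant): only an explicit numeric major version
// >= 3 immediately after "gemini-" counts as Gemini 3.0+.
func isGemini3Plus(model string) bool {
	model = strings.ToLower(model)
	idx := strings.Index(model, "gemini-")
	if idx == -1 {
		return false
	}
	afterPrefix := model[idx+len("gemini-"):]
	i := 0
	for i < len(afterPrefix) && afterPrefix[i] >= '0' && afterPrefix[i] <= '9' {
		i++
	}
	if i == 0 {
		return false // non-numeric names like "gemini-pro" stay pre-3.0
	}
	return afterPrefix[0]-'0' >= 3
}

func main() {
	for _, m := range []string{"gemini-3.0-flash", "gemini-2.5-pro", "gemini-pro", "gemini-exp"} {
		fmt.Println(m, isGemini3Plus(m))
	}
	// → only "gemini-3.0-flash" reports true
}
```

With the original byte comparison, the last two names would have been misrouted to the `thinkingLevel` path.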
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- core/changelog.md
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/types.go
- core/providers/gemini/utils.go
- docs/providers/reasoning.mdx
- transports/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
`**`
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/chat.go
- core/changelog.md
- docs/providers/reasoning.mdx
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- transports/changelog.md
- core/providers/gemini/types.go
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
- core/providers/gemini/chat.go
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
📚 Learning: 2025-12-30T05:37:48.365Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1180
File: docs/features/mcp/connecting-to-servers.mdx:452-458
Timestamp: 2025-12-30T05:37:48.365Z
Learning: When reviewing documentation PRs in a Graphite-managed stack, first check related or previous PRs in the stack for feature implementations before flagging documentation as incorrect or unsupported. Documentation MDX files often reference features implemented in earlier stack PRs; verify that the documented behavior exists in earlier changes and that the docs accurately reflect the implemented state before requesting edits.
Applied to files:
docs/providers/reasoning.mdx
🧬 Code graph analysis (3)
core/providers/gemini/chat.go (1)
- core/providers/gemini/types.go (1)
  - `GenerationConfig` (641-707)

core/providers/gemini/responses.go (3)
- core/providers/gemini/types.go (3)
  - `DefaultCompletionMaxTokens` (18-18), `DefaultReasoningMinBudget` (19-19), `DynamicReasoningBudget` (20-20)
- core/schemas/utils.go (1)
  - `Ptr` (16-18)
- core/providers/utils/utils.go (1)
  - `GetBudgetTokensFromReasoningEffort` (1350-1387)

core/providers/gemini/utils.go (2)
- core/providers/gemini/types.go (3)
  - `DefaultCompletionMaxTokens` (18-18), `DefaultReasoningMinBudget` (19-19), `DynamicReasoningBudget` (20-20)
- core/providers/utils/utils.go (2)
  - `GetReasoningEffortFromBudgetTokens` (1308-1345), `GetBudgetTokensFromReasoningEffort` (1350-1387)
🔇 Additional comments (5)
transports/changelog.md (1)
4-5: Changelog entries align with PR scope

The new bullets accurately describe the XAI docs update and the Gemini reasoning conversion fix; nothing else to tweak here.
core/changelog.md (1)
2-3: Core changelog accurately reflects Gemini fix

The added Gemini reasoning conversion entry is consistent with the implementation changes in the provider.
docs/providers/reasoning.mdx (1)
24-25: Gemini reasoning docs match implementation behavior

The updated Gemini sections correctly capture:
- 2.5+ as budget-only and 3.0+ as budget+level.
- Priority rules where `reasoning.max_tokens` wins over `effort`, and only `thinkingBudget` is sent when both are present.
- Special handling for `0`, `-1`, and `"none"`, as well as Pro-model level restrictions and version detection.

This lines up with the new model-aware logic in `convertParamsToGenerationConfig`, `convertParamsToGenerationConfigResponses`, and `effortToThinkingLevel`.

Also applies to: 538-607, 613-637, 659-687, 706-717, 840-848, 1464-1483, 1497-1510
core/providers/gemini/chat.go (1)
22-25: Model-aware generation config wiring looks correct

Passing `bifrostReq.Model` into `convertParamsToGenerationConfig` is exactly what the new Gemini 2.5 vs 3.0+ reasoning logic needs; nothing else to change here.

core/providers/gemini/types.go (1)
17-21: Reasoning constants and ThinkingLevel pointer semantics look correct

- The new constants (`MinReasoningMaxTokens`, `DefaultCompletionMaxTokens`, `DefaultReasoningMinBudget`, `DynamicReasoningBudget`) line up with how Gemini reasoning defaults and special values are used in the converters.
- Switching `GenerationConfigThinkingConfig.ThinkingLevel` to `*string`, plus the updated `UnmarshalJSON` that handles both camelCase and snake_case fields via `sonic.Unmarshal`, gives you the flexibility needed for model-aware logic without breaking JSON interop.

No issues here.

Also applies to: 913-923, 925-963
Summary
Fix the conversion between reasoning effort levels and thinking budgets in Gemini, ensuring proper handling across different Gemini model versions (2.5 and 3.0+).
Changes
Type of change
Affected areas
How to test
Test with different Gemini models and reasoning parameters:
Breaking changes
Related issues
Fixes inconsistent reasoning behavior between Gemini 2.5 and 3.0+ models.
Security considerations
No security implications.
Checklist