Skip to content

Conversation

@TejasGhatte
Copy link
Collaborator

@TejasGhatte TejasGhatte commented Dec 22, 2025

Handle Dynamic Thinking Budget Across Providers

This PR implements support for dynamic thinking budget (-1) in Gemini and adapts the behavior for other providers that don't natively support this feature.

Changes

  • Added support for dynamic thinking budget (-1) in Gemini by properly preserving the value
  • Implemented provider-specific handling for dynamic thinking budget:
    • Anthropic: Convert to maximum tokens value
    • Bedrock: Convert to default max tokens
    • Cohere: Remove token budget field entirely
  • Updated documentation to explain how dynamic thinking budget is handled across providers

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test dynamic thinking budget with different providers:

# Test with Gemini
curl -X POST "http://localhost:8080/v1/models/gemini-1.5-pro/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Complex reasoning task"}],
    "reasoning": {"max_tokens": -1}
  }'

# Test with Anthropic
curl -X POST "http://localhost:8080/v1/models/claude-3-opus-20240229/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Complex reasoning task"}],
    "reasoning": {"max_tokens": -1}
  }'

# Test with Cohere
curl -X POST "http://localhost:8080/v1/models/command-r-plus/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Complex reasoning task"}],
    "reasoning": {"max_tokens": -1}
  }'

Breaking changes

  • No

Related issues

Fixes issue with dynamic thinking budget handling across providers

Security considerations

No security implications.

Checklist

  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 22, 2025

Caution

Review failed

The head commit changed during the review from 0d6a716 to 53c4db8.

📝 Walkthrough

Walkthrough

Handles dynamic thinking budget sentinel (-1) across multiple provider integrations (Anthropic, Bedrock, Cohere, Gemini): normalizes or explicitly propagates -1 per provider rules, enforces minimum budget where required, and updates docs and changelogs.

Changes

Cohort / File(s) Summary
Changelogs
core/changelog.md, transports/changelog.md
Updated changelog entries to document the fix and note dynamic thinking budget (-1) handling.
Anthropic provider
core/providers/anthropic/chat.go, core/providers/anthropic/responses.go
Normalize reasoning.max_tokens: treat -1 as minimum/default, enforce MinimumReasoningMaxTokens, compute a local budgetTokens and pass a pointer to it instead of using the raw input.
Bedrock provider
core/providers/bedrock/responses.go, core/providers/bedrock/utils.go
Introduce tokenBudget derived from MaxTokens (map -1 → anthropic.MinimumReasoningMaxTokens), use tokenBudget across Anthropic and Nova paths, update reasoning_config and GetReasoningEffortFromBudgetTokens usage accordingly.
Cohere provider
core/providers/cohere/chat.go, core/providers/cohere/responses.go
Build a Thinking object when Reasoning.MaxTokens present; set TokenBudget to nil if MaxTokens == -1 (explicit no fixed cap), otherwise use provided value.
Gemini provider
core/providers/gemini/responses.go, core/providers/gemini/utils.go
Change dynamic thinking-budget handling to explicitly propagate -1: set ThinkingBudget / MaxTokens to -1 (pointer) rather than leaving nil, altering generation config path for dynamic budgeting.
Documentation
docs/integrations/genai-sdk/overview.mdx
Added "Dynamic Thinking Budget" section with provider-specific behaviors and a Python example.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas needing extra attention:
    • core/providers/bedrock/responses.go — multiple branching paths (Anthropic vs Nova) and budget propagation logic.
    • core/providers/bedrock/utils.go — tokenBudget derivation and its use in reasoning effort calculations.
    • core/providers/anthropic/* — enforcement of MinimumReasoningMaxTokens and pointer handling for BudgetTokens.
    • core/providers/gemini/* — ensure explicit -1 propagation does not break callers expecting nil.
    • Cross-provider consistency — confirm sentinel semantics (-1 vs nil vs normalized minimum) are applied as intended.

Poem

🐰 I nibble on budgets, one minus-one in sight,
A wink, a hop—dynamic thinking takes flight.
Gemini, Cohere, Anthropic, Bedrock too,
The rabbit hums softly: budgets made true. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: handling dynamic thinking budget for Gemini, which is the core feature across multiple providers in this PR.
Description check ✅ Passed The PR description provides a clear summary, detailed changes section, proper checkboxes completion, and comprehensive testing instructions with curl examples for multiple providers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@TejasGhatte TejasGhatte marked this pull request as ready for review December 22, 2025 14:45
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/bedrock/responses.go (1)

1313-1328: Guard request.InferenceConfig before dereferencing in reasoning_config parsing

Here you do:

effort := providerUtils.GetReasoningEffortFromBudgetTokens(
    maxTokens, minBudgetTokens, *request.InferenceConfig.MaxTokens)

without checking that request.InferenceConfig (and MaxTokens) are non-nil. If Bedrock ever sends reasoning_config.budget_tokens without a populated InferenceConfig, this will panic.

Consider defensively:

  • Early‑returning or skipping effort derivation if request.InferenceConfig == nil or request.InferenceConfig.MaxTokens == nil, or
  • Supplying a safe default maxTokens when it’s missing.
🧹 Nitpick comments (4)
core/providers/cohere/responses.go (1)

940-950: Dynamic budget handling looks correct for Cohere.

The logic correctly implements Cohere's requirement to omit the token budget field when -1 is provided. Setting TokenBudget to nil will omit it from the JSON request due to the omitempty tag, which aligns with the PR objectives.

Consider adding input validation.

The code doesn't validate that MaxTokens is either -1 or >= 1 (per the type definition showing TokenBudget should be >= 1). Invalid values like 0, -2, or other negative numbers would be passed through without validation, potentially causing errors downstream.

🔎 Proposed validation
 if bifrostReq.Params.Reasoning.MaxTokens != nil {
 	thinking := &CohereThinking{
 		Type: ThinkingTypeEnabled,
 	}
 	if *bifrostReq.Params.Reasoning.MaxTokens == -1 {
 		// cohere does not support dynamic reasoning budget like gemini
 		// setting it to nil
 		thinking.TokenBudget = nil
 	} else {
+		// Validate that MaxTokens is >= 1
+		if *bifrostReq.Params.Reasoning.MaxTokens < 1 {
+			return nil, fmt.Errorf("reasoning max_tokens must be >= 1 or -1 for dynamic budget, got %d", *bifrostReq.Params.Reasoning.MaxTokens)
+		}
 		thinking.TokenBudget = bifrostReq.Params.Reasoning.MaxTokens
 	}
 	cohereReq.Thinking = thinking

Verify consistency across the provider stack.

Since this PR is part of a Graphite stack with related changes to Gemini, Anthropic, and Bedrock, ensure that the -1 handling pattern is consistent and correctly documented across all providers.

Based on coding guidelines, stack context is important for this review.

core/providers/gemini/utils.go (2)

46-55: Consider clarifying the redundant assignment for dynamic thinking budget.

Line 47 already assigns MaxTokens from ThinkingBudget, so line 53's reassignment of -1 is redundant. While the behavior is correct, this could be simplified for clarity:

💡 Optional refactor to reduce redundancy

If the explicit reassignment isn't needed for future-proofing, consider:

 if config.ThinkingConfig.ThinkingBudget != nil {
     params.Reasoning.MaxTokens = schemas.Ptr(int(*config.ThinkingConfig.ThinkingBudget))
     switch *config.ThinkingConfig.ThinkingBudget {
     case 0:
         params.Reasoning.Effort = schemas.Ptr("none")
-    case -1:
-        // dynamic thinking budget
-        params.Reasoning.MaxTokens = schemas.Ptr(-1)
     }
 }

However, retaining the explicit case may be intentional for:

  • Documentation clarity (making -1 handling visible)
  • Consistency with the case 0 pattern
  • Future-proofing if line 47's behavior changes

46-55: Consider validating negative thinking budget values.

The current implementation handles 0, -1, and positive values, but doesn't validate against other negative values (e.g., -2, -3). Since only -1 has a documented meaning (dynamic thinking budget), consider adding validation to reject other negative values.

🔎 Suggested validation
 if config.ThinkingConfig.ThinkingBudget != nil {
+    if *config.ThinkingConfig.ThinkingBudget < -1 {
+        // Invalid negative value - could log warning or treat as 0
+        params.Reasoning.MaxTokens = schemas.Ptr(0)
+        params.Reasoning.Effort = schemas.Ptr("none")
+    } else {
         params.Reasoning.MaxTokens = schemas.Ptr(int(*config.ThinkingConfig.ThinkingBudget))
         switch *config.ThinkingConfig.ThinkingBudget {
         case 0:
             params.Reasoning.Effort = schemas.Ptr("none")
         case -1:
             // dynamic thinking budget
             params.Reasoning.MaxTokens = schemas.Ptr(-1)
         }
+    }
 }

This ensures only valid values (-1, 0, or positive) are processed.

core/providers/bedrock/responses.go (1)

1467-1514: Bedrock dynamic reasoning budget normalization is coherent across Anthropic & Nova

This block:

  • Uses tokenBudget as the canonical scalar, remapping reasoning.max_tokens == -1 to DefaultCompletionMaxTokens so Bedrock never receives -1.
  • Enforces Anthropic’s minimum reasoning budget before populating "reasoning_config": {"type":"enabled","budget_tokens": tokenBudget}.
  • For Nova, translates tokenBudget into maxReasoningEffort via GetReasoningEffortFromBudgetTokens, and adjusts inferenceConfig for "high" effort by clearing MaxTokens, Temperature, and TopP, which aligns with the existing effort-based path below.

Behavioral nuance to keep in mind: for Anthropic models, tokenBudget is always derived from the Bifrost reasoning budget (or default) and is not clamped against inferenceConfig.MaxTokens, so callers can request a larger reasoning budget than the completion max. If Bedrock/Anthropic ever enforce budget_tokens <= max_tokens, that might need revisiting, but the current logic is internally consistent with your parsing side.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 401e9bd and 6c8c03c.

📒 Files selected for processing (10)
  • core/changelog.md
  • core/providers/anthropic/chat.go
  • core/providers/anthropic/responses.go
  • core/providers/bedrock/responses.go
  • core/providers/cohere/chat.go
  • core/providers/cohere/responses.go
  • core/providers/gemini/responses.go
  • core/providers/gemini/utils.go
  • docs/integrations/genai-sdk/overview.mdx
  • transports/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/gemini/responses.go
  • core/providers/gemini/utils.go
  • core/providers/anthropic/responses.go
  • core/providers/anthropic/chat.go
  • core/providers/bedrock/responses.go
  • docs/integrations/genai-sdk/overview.mdx
  • core/providers/cohere/responses.go
  • transports/changelog.md
  • core/providers/cohere/chat.go
  • core/changelog.md
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/gemini/responses.go
  • core/providers/gemini/utils.go
  • core/providers/anthropic/responses.go
  • core/providers/anthropic/chat.go
  • core/providers/bedrock/responses.go
  • core/providers/cohere/responses.go
  • core/providers/cohere/chat.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.

Applied to files:

  • core/providers/gemini/responses.go
  • core/providers/gemini/utils.go
  • core/providers/anthropic/responses.go
  • core/providers/anthropic/chat.go
  • core/providers/bedrock/responses.go
  • core/providers/cohere/responses.go
  • core/providers/cohere/chat.go
📚 Learning: 2025-12-12T10:28:54.988Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 978
File: core/providers/anthropic/chat.go:207-216
Timestamp: 2025-12-12T10:28:54.988Z
Learning: In core/providers/anthropic/chat.go, within ToAnthropicChatRequest, ensure the mapping from ReasoningDetails to Anthropic thinking blocks preserves all fields from ReasoningDetails, including when Text is nil or empty. Do not drop or coerce missing Text; propagate nil/empty values to maintain complete bidirectional mapping of Anthropic events in Bifrost. Add tests to verify that ReasoningDetails with Text nil or "" round-trip without loss, and update code to handle nullable Text without dropping data. If there are multiple blocks, preserve order and all fields during mapping.

Applied to files:

  • core/providers/anthropic/chat.go
🧬 Code graph analysis (7)
core/providers/gemini/responses.go (1)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/gemini/utils.go (2)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/utils.go (1)
  • Ptr (56-58)
core/providers/anthropic/responses.go (2)
core/providers/anthropic/types.go (2)
  • MinimumReasoningMaxTokens (15-15)
  • AnthropicThinking (71-74)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/anthropic/chat.go (5)
core/providers/anthropic/types.go (2)
  • MinimumReasoningMaxTokens (15-15)
  • AnthropicThinking (71-74)
core/providers/bedrock/types.go (1)
  • MinimumReasoningMaxTokens (12-12)
core/providers/cohere/types.go (1)
  • MinimumReasoningMaxTokens (11-11)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/utils.go (1)
  • Ptr (56-58)
core/providers/bedrock/responses.go (4)
core/providers/bedrock/types.go (2)
  • DefaultCompletionMaxTokens (13-13)
  • MinimumReasoningMaxTokens (12-12)
core/schemas/utils.go (1)
  • IsAnthropicModel (1048-1050)
core/providers/anthropic/types.go (1)
  • MinimumReasoningMaxTokens (15-15)
core/providers/utils/utils.go (1)
  • GetReasoningEffortFromBudgetTokens (1308-1345)
core/providers/cohere/responses.go (1)
core/providers/cohere/types.go (2)
  • CohereThinking (174-177)
  • ThinkingTypeEnabled (183-183)
core/providers/cohere/chat.go (1)
core/providers/cohere/types.go (2)
  • CohereThinking (174-177)
  • ThinkingTypeEnabled (183-183)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (7)
core/changelog.md (1)

4-5: LGTM!

The changelog entries accurately describe the changes introduced in this PR. The distinction between the refactor (line 4) and the fix for dynamic thinking budget handling (line 5) is clear.

transports/changelog.md (1)

5-6: LGTM!

The changelog entries are consistent with the core changelog and accurately reflect the transport-layer changes for dynamic thinking budget handling.

core/providers/gemini/responses.go (1)

2024-2034: LGTM! Correct implementation of dynamic thinking budget for Gemini.

The logic correctly handles the three cases for reasoning configuration:

  • MaxTokens = 0: Disables thinking
  • MaxTokens = -1: Preserves dynamic thinking budget (Gemini native support)
  • Other values: Sets explicit thinking budget

This implementation aligns with the documentation stating that Gemini preserves -1 for native dynamic thinking support.

docs/integrations/genai-sdk/overview.mdx (1)

283-305: Documentation for dynamic thinking budget handling is accurate and implementations are verified.

All four provider-specific implementations for thinkingBudget: -1 have been confirmed in the codebase:

  • Gemini (core/providers/gemini/utils.go:51-53): Preserves -1 for native dynamic thinking
  • Anthropic (core/providers/anthropic/responses.go:1423-1426): Converts to maximum budget (anthropicReq.MaxTokens)
  • Bedrock (core/providers/bedrock/responses.go:1472-1476): Converts to maximum budget (DefaultCompletionMaxTokens)
  • Cohere (core/providers/cohere/responses.go:943-949): Drops budget field (sets TokenBudget to nil)

The documentation accurately reflects the implementation behavior. No action required.

core/providers/anthropic/chat.go (1)

104-120: Reasoning max_tokens = -1 normalization for Anthropic looks correct

The new budgetTokens path cleanly:

  • Normalizes reasoning.max_tokens == -1 to anthropicReq.MaxTokens (so Anthropic never sees -1).
  • Enforces MinimumReasoningMaxTokens on the resolved value.
  • Writes Thinking.BudgetTokens from the computed scalar instead of aliasing the original pointer.

This matches the intended “dynamic budget” behavior while staying within Anthropic’s static-budget API constraints, and keeps the Effort-based fallback path unchanged.

core/providers/anthropic/responses.go (1)

1420-1435: Consistent max_tokens = -1 handling for Anthropic Responses path

This mirrors the chat-side logic: reasoning.max_tokens == -1 is resolved to anthropicReq.MaxTokens, validated against MinimumReasoningMaxTokens, and written via schemas.Ptr(budgetTokens). That keeps Responses‑based Anthropic requests aligned with the chat path and avoids ever sending a -1 budget upstream while preserving existing Effort handling.

core/providers/cohere/chat.go (1)

107-117: Review verified—implementation is correct across all providers.

The -1 handling for dynamic reasoning budgets is correctly implemented:

  • Cohere (lines 110-117): Correctly sets TokenBudget to nil, omitting the field from the API request
  • Anthropic: Converts -1 to the request's max_tokens value (as verified in the code at lines 108-120)
  • Gemini: Preserves -1 for native dynamic thinking support (as verified in utils.go line 324+)

The implementation aligns with each provider's API specifications and the explicit struct construction improves clarity. The ExtraParams code path (lines 147-159) operates as a fallback and does not contradict the -1 handling logic for the primary Reasoning.MaxTokens path.

@TejasGhatte TejasGhatte force-pushed the 12-22-fix_dynamic_thinking_budget_handling_for_gemini branch from 6c8c03c to 6c01240 Compare December 22, 2025 15:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/bedrock/responses.go (1)

1471-1493: Model-agnostic default for -1 may be incorrect for Nova models.

When MaxTokens is -1 (line 1472), the code sets tokenBudget to anthropic.MinimumReasoningMaxTokens (1024) for all Bedrock models. However, this Anthropic-specific constant may not be appropriate for Nova models:

  • Anthropic models: MinimumReasoningMaxTokens = 1024 ✓
  • Nova models: MinimumReasoningMaxTokens = 1

For Nova models, using 1024 as the default when -1 is specified could lead to unintended effort level calculations at line 1493, since GetReasoningEffortFromBudgetTokens(1024, 1, defaultMaxTokens) may produce a different effort level than intended.

Consider model-specific defaults:

🔎 Suggested fix: Use model-specific defaults
  tokenBudget := *bifrostReq.Params.Reasoning.MaxTokens
  if *bifrostReq.Params.Reasoning.MaxTokens == -1 {
      // bedrock does not support dynamic reasoning budget like gemini
-     // setting it to default max tokens
-     tokenBudget = anthropic.MinimumReasoningMaxTokens
+     // setting it to model-specific default max tokens
+     if schemas.IsAnthropicModel(bifrostReq.Model) {
+         tokenBudget = anthropic.MinimumReasoningMaxTokens
+     } else if schemas.IsNovaModel(bifrostReq.Model) {
+         // Use a sensible default for Nova, e.g., DefaultCompletionMaxTokens or a Nova-specific constant
+         tokenBudget = DefaultCompletionMaxTokens
+         if inferenceConfig.MaxTokens != nil {
+             tokenBudget = *inferenceConfig.MaxTokens
+         }
+     } else {
+         tokenBudget = MinimumReasoningMaxTokens
+     }
  }
🧹 Nitpick comments (3)
core/providers/anthropic/responses.go (1)

1421-1434: Anthropic responses reasoning budget normalization is consistent but duplicated

Normalizing reasoning.max_tokens == -1 to MinimumReasoningMaxTokens and enforcing >= MinimumReasoningMaxTokens is sensible and matches the chat path, but the logic is now duplicated between ToAnthropicResponsesRequest and ToAnthropicChatRequest. Consider extracting a small helper (e.g., resolveAnthropicBudgetTokens(maxTokens *int) (int, error)) to keep the behavior in one place.

core/providers/anthropic/chat.go (1)

104-119: Shared Anthropic reasoning budget logic should be centralized

This block correctly maps Reasoning.MaxTokens (including -1) to a validated budgetTokens and enables thinking, but it’s effectively identical to the logic in ToAnthropicResponsesRequest. Extracting a common helper would reduce divergence risk if Anthropic’s minimum or mapping rules change later.

core/providers/bedrock/utils.go (1)

39-95: Clarify -1 reasoning budget mapping for Bedrock Anthropic vs Nova and other models

The new tokenBudget flow makes Bedrock’s behavior around reasoning.max_tokens much clearer, but a couple of details are worth double‑checking:

  • For all Bedrock models, including Nova, MaxTokens == -1 is mapped to anthropic.MinimumReasoningMaxTokens. For Anthropic-on-Bedrock that’s intuitive, but for Nova and other Bedrock models it couples behavior to Anthropic’s minimum rather than this package’s MinimumReasoningMaxTokens or the per-model default/max tokens.
  • Only the Anthropic branch validates tokenBudget >= anthropic.MinimumReasoningMaxTokens; Nova and other models will accept very small positive tokenBudget values, while dynamic -1 requests get a comparatively large default.

If this cross-model coupling is intentional, consider adding a short comment explaining why Anthropic’s minimum is the shared default. Otherwise, it may be cleaner to:

  • Use the local MinimumReasoningMaxTokens (or defaultMaxTokens) for Nova/other models, and
  • Optionally normalize/clamp tokenBudget there as well for consistency.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c8c03c and 6c01240.

📒 Files selected for processing (11)
  • core/changelog.md
  • core/providers/anthropic/chat.go
  • core/providers/anthropic/responses.go
  • core/providers/bedrock/responses.go
  • core/providers/bedrock/utils.go
  • core/providers/cohere/chat.go
  • core/providers/cohere/responses.go
  • core/providers/gemini/responses.go
  • core/providers/gemini/utils.go
  • docs/integrations/genai-sdk/overview.mdx
  • transports/changelog.md
✅ Files skipped from review due to trivial changes (1)
  • transports/changelog.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • core/providers/cohere/responses.go
  • docs/integrations/genai-sdk/overview.mdx
  • core/providers/gemini/responses.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/bedrock/responses.go
  • core/providers/anthropic/responses.go
  • core/providers/cohere/chat.go
  • core/providers/anthropic/chat.go
  • core/providers/gemini/utils.go
  • core/providers/bedrock/utils.go
  • core/changelog.md
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/bedrock/responses.go
  • core/providers/anthropic/responses.go
  • core/providers/cohere/chat.go
  • core/providers/anthropic/chat.go
  • core/providers/gemini/utils.go
  • core/providers/bedrock/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.

Applied to files:

  • core/providers/bedrock/responses.go
  • core/providers/anthropic/responses.go
  • core/providers/cohere/chat.go
  • core/providers/anthropic/chat.go
  • core/providers/gemini/utils.go
  • core/providers/bedrock/utils.go
📚 Learning: 2025-12-12T10:28:54.988Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 978
File: core/providers/anthropic/chat.go:207-216
Timestamp: 2025-12-12T10:28:54.988Z
Learning: In core/providers/anthropic/chat.go, within ToAnthropicChatRequest, ensure the mapping from ReasoningDetails to Anthropic thinking blocks preserves all fields from ReasoningDetails, including when Text is nil or empty. Do not drop or coerce missing Text; propagate nil/empty values to maintain complete bidirectional mapping of Anthropic events in Bifrost. Add tests to verify that ReasoningDetails with Text nil or "" round-trip without loss, and update code to handle nullable Text without dropping data. If there are multiple blocks, preserve order and all fields during mapping.

Applied to files:

  • core/providers/anthropic/chat.go
🧬 Code graph analysis (6)
core/providers/bedrock/responses.go (5)
core/providers/anthropic/types.go (1)
  • MinimumReasoningMaxTokens (15-15)
core/providers/bedrock/types.go (1)
  • MinimumReasoningMaxTokens (12-12)
core/providers/cohere/types.go (1)
  • MinimumReasoningMaxTokens (11-11)
core/schemas/utils.go (1)
  • IsAnthropicModel (1048-1050)
core/providers/utils/utils.go (1)
  • GetReasoningEffortFromBudgetTokens (1308-1345)
core/providers/anthropic/responses.go (2)
core/providers/anthropic/types.go (2)
  • MinimumReasoningMaxTokens (15-15)
  • AnthropicThinking (71-74)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/providers/cohere/chat.go (1)
core/providers/cohere/types.go (2)
  • CohereThinking (174-177)
  • ThinkingTypeEnabled (183-183)
core/providers/anthropic/chat.go (5)
core/providers/anthropic/types.go (2)
  • MinimumReasoningMaxTokens (15-15)
  • AnthropicThinking (71-74)
core/providers/bedrock/types.go (1)
  • MinimumReasoningMaxTokens (12-12)
core/providers/cohere/types.go (1)
  • MinimumReasoningMaxTokens (11-11)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/utils.go (1)
  • Ptr (56-58)
core/providers/gemini/utils.go (2)
core/schemas/utils.go (1)
  • Ptr (16-18)
core/utils.go (1)
  • Ptr (56-58)
core/providers/bedrock/utils.go (4)
core/providers/anthropic/types.go (1)
  • MinimumReasoningMaxTokens (15-15)
core/providers/bedrock/types.go (1)
  • MinimumReasoningMaxTokens (12-12)
core/schemas/utils.go (1)
  • IsAnthropicModel (1048-1050)
core/providers/utils/utils.go (1)
  • GetReasoningEffortFromBudgetTokens (1308-1345)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (3)
core/providers/gemini/utils.go (1)

46-55: Dynamic reasoning budget sentinel now preserved for Gemini

Explicitly setting params.Reasoning.MaxTokens to Ptr(-1) for ThinkingBudget == -1 matches the dynamic-budget sentinel semantics and keeps Gemini aligned with other providers’ handling.

core/changelog.md (1)

4-5: Changelog entry accurately captures provider reasoning-budget changes

The new entries clearly describe the Vertex/Gemini endpoint refactor and cross-provider dynamic thinking budget handling.

core/providers/cohere/chat.go (1)

104-118: Cohere dynamic budget handling matches intended semantics

Treating Reasoning.MaxTokens == -1 as TokenBudget = nil (while enabling thinking) correctly encodes “no explicit budget” for Cohere, and keeps explicit positive budgets unchanged.

@TejasGhatte TejasGhatte force-pushed the 12-22-fix_dynamic_thinking_budget_handling_for_gemini branch 2 times, most recently from 84057b8 to 0d6a716 Compare December 22, 2025 15:55
Copy link
Collaborator

Pratham-Mishra04 commented Dec 22, 2025

Merge activity

  • Dec 22, 4:04 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Dec 22, 4:18 PM UTC: Graphite rebased this pull request as part of a merge.
  • Dec 22, 4:19 PM UTC: @Pratham-Mishra04 merged this pull request with Graphite.

@Pratham-Mishra04 Pratham-Mishra04 changed the base branch from 12-22-fix_append_bedrock_and_cohere_routes_in_langchain_and_litellm_and_docs_fixes to graphite-base/1158 December 22, 2025 16:13
@Pratham-Mishra04 Pratham-Mishra04 changed the base branch from graphite-base/1158 to main December 22, 2025 16:16
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-22-fix_dynamic_thinking_budget_handling_for_gemini branch from 0d6a716 to 53c4db8 Compare December 22, 2025 16:17
@Pratham-Mishra04 Pratham-Mishra04 merged commit 60d7206 into main Dec 22, 2025
8 checks passed
@Pratham-Mishra04 Pratham-Mishra04 deleted the 12-22-fix_dynamic_thinking_budget_handling_for_gemini branch December 22, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants