Conversation

@TejasGhatte
Collaborator

@TejasGhatte TejasGhatte commented Dec 11, 2025

Summary

Enhances Gemini provider support for reasoning/thinking capabilities, improves thought signature handling, and fixes response processing order for better compatibility with Gemini 3 Pro.

Changes

  • Added support for both camelCase and snake_case in Gemini thinking configuration
  • Fixed thought signature handling by properly encoding to base64
  • Improved response processing order to prioritize thoughts before regular text
  • Added support for capturing unknown fields in Gemini generation requests
  • Updated integration tests to support thinking/reasoning across providers
  • Fixed role mapping in Bifrost message conversion to Gemini format
  • Added reasoning summary parameter support

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test the Gemini provider with reasoning/thinking capabilities:

# Core/Transports
go test ./core/providers/gemini/...

# Integration tests
cd tests/integrations
python -m pytest tests/test_google.py::TestGoogleIntegration::test_26_extended_thinking -v
python -m pytest tests/test_google.py::TestGoogleIntegration::test_27_extended_thinking_streaming -v

Breaking changes

  • Yes
  • No

Related issues

Improves compatibility with Gemini 3 Pro and its reasoning capabilities.

Security considerations

No security implications as this is enhancing existing provider functionality.

Checklist

  • I added/updated tests where appropriate
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Contributor

coderabbitai bot commented Dec 11, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added native Gemini API support and optional raw-request inclusion in streaming/non-streaming responses.
  • Improvements

    • Expanded cross-provider "thinking"/reasoning support and normalized role handling across providers.
    • Improved streaming reliability: better final-chunk detection and more consistent metadata for streamed outputs.
  • Chores

    • Bumped core/framework/plugins versions and updated integration test configurations for cross-provider thinking.


Walkthrough

Stop propagating ThoughtSignature in Gemini streaming; refactor Gemini responses state with indexed item IDs, text buffering, sonic/json changes, and base64 handling; add GenerationConfigThinkingConfig UnmarshalJSON; thread optional raw-request capture through Gemini/Vertex streaming; broaden cross-provider thinking tests and bump versions/deps.

Changes

  • Gemini: streaming tool-call change (core/providers/gemini/chat.go): Stop preserving/attaching ThoughtSignature into ToolCall.ExtraContent for streaming responses.
  • Gemini: responses processing & state (core/providers/gemini/responses.go): Add TextOutputIndex, TextBuffer, and helpers (closeTextItemIfOpen, nextOutputIndex, generateItemID); centralize index/ID generation; accumulate text deltas; switch JSON marshalling to sonic.Marshal; base64 encode/decode reasoning signatures and binary payloads; update content block conversions.
  • Gemini: types & utils (core/providers/gemini/types.go, core/providers/gemini/utils.go): Add UnmarshalJSON for GenerationConfigThinkingConfig to accept camelCase and snake_case; adjust ThinkingConfig/budget mapping and role normalization (user → "user", others → "model"); set a reasoning summary default for certain models.
  • Gemini: raw-request propagation & streaming loops (core/providers/gemini/gemini.go): Thread a sendBackRawRequest bool into streaming handlers; attach the parsed RawRequest to streaming extras on the final chunk when enabled; add ctx.Done checks in streaming loops; keep raw-response gating consistent.
  • Vertex: authQuery & raw flags forwarding (core/providers/vertex/vertex.go): Support passing the Gemini API key via query param (authQuery) for Gemini deployments; append authQuery when present and skip the token fetch; forward raw-request/raw-response flags through providerUtils helpers; set ModelDeployment when model ≠ deployment.
  • Cohere: tool message mapping (core/providers/cohere/chat.go): Populate ChatToolMessage.ToolCallID when the Cohere message role is "tool".
  • Tests: configs & deps (tests/integrations/config.yml, tests/integrations/pyproject.toml): Add thinking entries for providers (openai, gemini, bedrock/anthropic replacement); bump google-genai to >=1.50.0.
  • Tests: cross-provider thinking & budgets (tests/integrations/tests/test_anthropic.py, test_google.py, test_openai.py): Parameterize extended-thinking tests across providers/models, adjust test signatures, lower thinking budgets, increase output token limits, gate provider-specific thought/usage assertions (Gemini), and add a helper for extracting Google function calls.
  • Tests: Bedrock and test utils (tests/integrations/tests/test_bedrock.py, tests/integrations/tests/utils/common.py): Increase Bedrock test maxTokens and add a skip; add a get_content_string(content: Any) -> str helper and use it for OpenAI content assertions.
  • Streaming telemetry & loops (core/providers/gemini/gemini.go, core/providers/vertex/vertex.go, other streaming handlers): Add context cancellation checks in streaming loops; keep raw-request/raw-response telemetry gating consistent between streaming and non-streaming flows.
  • Versions & changelogs (core/version, framework/version, plugins/*/version, */changelog.md): Bump core/framework/plugin versions; add/update changelog entries (e.g., "switched to gemini native api", "fallback to supported request type for custom models").
  • Misc: payload conversions & encoding (core/providers/gemini/responses.go, core/providers/gemini/utils.go): Convert images/audio/files to data URLs/base64 where appropriate; replace json.Marshal with sonic.Marshal for function args and payloads.
  • Minor: logging / error handling (transports/bifrost-http/integrations/anthropic.go): Remove the log import and the marshaling-error logging path in streaming response handling (JSON marshal errors are now skipped silently).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Bifrost as Bifrost/Server
    participant GeminiProv as Gemini Provider
    participant GeminiAPI as Gemini Native API
    Note over Client,Bifrost: Streaming request initiated
    Client->>Bifrost: Start stream (chat/responses)
    Bifrost->>GeminiProv: Forward stream request (with sendBackRawRequest flag)
    GeminiProv->>GeminiAPI: Stream to Gemini native API
    GeminiAPI-->>GeminiProv: Streamed chunks (intermediate)
    GeminiProv-->>Bifrost: Emit streamed chunks (no RawRequest attached)
    alt Final chunk received
        GeminiAPI-->>GeminiProv: Final chunk (FinishReason + UsageMetadata)
        GeminiProv->>GeminiProv: Detect last chunk using FinishReason + UsageMetadata
        GeminiProv->>Bifrost: Emit final chunk (attach parsed RawRequest if sendBackRawRequest=true)
        Bifrost-->>Client: Final chunk with extras (maybe RawRequest)
    end
```
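The final-chunk heuristic in the diagram (a chunk is last when it carries both a FinishReason and UsageMetadata) can be sketched as a standalone predicate. The types below are deliberately minimal stand-ins, not Bifrost's or the genai SDK's actual structs.

```go
package main

import "fmt"

// Minimal illustrative shapes for a streamed Gemini chunk.
type UsageMetadata struct{ TotalTokenCount int }
type Candidate struct{ FinishReason string }
type Chunk struct {
	Candidates    []Candidate
	UsageMetadata *UsageMetadata
}

// isFinalChunk mirrors the described detection: the chunk is final only
// when usage metadata is present AND some candidate has a finish reason.
func isFinalChunk(c Chunk) bool {
	if c.UsageMetadata == nil {
		return false
	}
	for _, cand := range c.Candidates {
		if cand.FinishReason != "" {
			return true
		}
	}
	return false
}

func main() {
	mid := Chunk{Candidates: []Candidate{{}}}
	last := Chunk{
		Candidates:    []Candidate{{FinishReason: "STOP"}},
		UsageMetadata: &UsageMetadata{TotalTokenCount: 42},
	}
	fmt.Println(isFinalChunk(mid), isFinalChunk(last)) // false true
}
```

Requiring both signals avoids misclassifying intermediate chunks that happen to carry one of them, which is what makes attaching the RawRequest only on the true final chunk reliable.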

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • core/providers/gemini/responses.go: new state helpers, index/ID generation, TextBuffer semantics, sonic vs json marshaling, base64 handling.
    • core/providers/gemini/chat.go: removal of ThoughtSignature propagation and downstream impacts on consumers.
    • core/providers/gemini/types.go & utils.go: UnmarshalJSON correctness and ThinkingConfig precedence.
    • core/providers/gemini/gemini.go & core/providers/vertex/vertex.go: raw-request/raw-response plumbing, authQuery behavior, streaming loop cancellation.
    • Tests: cross-provider parametrization, changed token budgets, and updated helpers.

Possibly related PRs

Suggested reviewers

  • akshaydeo
  • danpiths

Poem

🐰 I nibble code where signatures hid,
Streams hop lighter, no nested lid.
IDs line up, base64 hums along,
Raw requests whispered at the stream's last song.
Tests stretch their legs—rabbit dance, so strong.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 65.91%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive: The title refers to "integration test cases fixes", which is only partially addressed; the main changes are Gemini provider enhancements for reasoning/thinking capabilities, not just integration test fixes. Consider a more specific title like "feat: add Gemini reasoning support and improve provider compatibility" to better reflect the primary changes.

✅ Passed checks (1 passed)

  • Description check ✅ Passed: The PR description is comprehensive and follows the template structure, with all major sections completed (summary, changes, type of change, affected areas, testing instructions, and security considerations).

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59dee28 and 3f22dc6.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
  • core/changelog.md (1 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/gemini/chat.go (1 hunks)
  • core/providers/gemini/gemini.go (11 hunks)
  • core/providers/gemini/responses.go (27 hunks)
  • core/providers/gemini/types.go (2 hunks)
  • core/providers/gemini/utils.go (3 hunks)
  • core/providers/vertex/vertex.go (8 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (9 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
  • transports/bifrost-http/integrations/anthropic.go (0 hunks)
  • transports/changelog.md (1 hunks)


@TejasGhatte TejasGhatte marked this pull request as ready for review December 11, 2025 20:04
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/gemini/utils.go (1)

670-680: Role mapping may incorrectly map tool messages.

The current logic maps all non-user roles to "model", but ChatMessageRoleTool (tool responses carrying function call results back from the user side) should map to "user" in Gemini's format, not "model". Also, lines 673-674 are now dead code, since the role they assign is always overwritten.

Consider this correction:

 		if len(parts) > 0 {
 			content := Content{
 				Parts: parts,
-				Role:  string(message.Role),
 			}
-			if message.Role == schemas.ChatMessageRoleUser {
+			switch message.Role {
+			case schemas.ChatMessageRoleUser, schemas.ChatMessageRoleTool:
 				content.Role = "user"
-			} else {
+			default:
 				content.Role = "model"
 			}
 			contents = append(contents, content)
 		}
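The suggested mapping can be exercised as a standalone sketch. Role values are plain strings here for illustration; the real code uses the schemas.ChatMessageRole constants.

```go
package main

import "fmt"

// geminiRole mirrors the suggested correction: user and tool roles map to
// "user" (tool results are presented from the user side in Gemini's format),
// while assistant/system and anything else map to "model".
func geminiRole(role string) string {
	switch role {
	case "user", "tool":
		return "user"
	default:
		return "model"
	}
}

func main() {
	for _, r := range []string{"user", "tool", "assistant", "system"} {
		fmt.Println(r, "->", geminiRole(r))
	}
}
```

Collapsing to a switch also removes the dead initial assignment the review points out, since the role is computed exactly once.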
🧹 Nitpick comments (5)
tests/integrations/tests/test_bedrock.py (2)

164-167: Unconditional skip makes test body effectively dead

pytest.skip("Skipping text completion invoke test") at the top means test_01_text_completion_invoke never executes, even when Bedrock keys are present. If this is a long‑term decision, consider either marking the test itself as @pytest.mark.skip(reason=...) and removing the unreachable body, or documenting the reason (flaky, deprecated, etc.) so future maintainers know whether/when to re‑enable it.


210-216: Higher Bedrock maxTokens values look appropriate

Bumping inferenceConfig={"maxTokens": 500} for tool-calling, image analysis, and end‑to‑end tool tests aligns with extended reasoning and reduces truncation risk, and 500 is still a conservative budget for Bedrock. If you find yourself tweaking these in more places, it may be worth centralizing typical test budgets (e.g., via test_settings or a small helper) instead of scattering literals.

Also applies to: 249-255, 296-301, 685-691, 726-731

tests/integrations/tests/test_openai.py (1)

1385-1391: Cross-provider reasoning test is robust; address unused test_config

The refactored test_38_responses_reasoning nicely parameterizes over get_cross_provider_params_for_scenario("thinking") and degrades gracefully by retrying without reasoning when a backend doesn’t support those parameters. Two small nits:

  • test_config is unused here and Ruff flags it (ARG002); either drop it from the signature or explicitly mark this test with a # noqa: ARG002 if you want consistent fixture signatures.
  • The docstring still says “(gpt-5 model)” even though the test now runs across multiple providers; consider updating it to reflect the broader scope.

Also applies to: 1411-1523

tests/integrations/tests/test_anthropic.py (1)

673-691: Extended thinking tests are well-structured; tidy up unused test_config

The non‑streaming and streaming extended‑thinking tests now parameterize over the thinking scenario and use format_provider_model(provider, model) with reduced max_tokens/budget_tokens, plus extra_body={"reasoning_summary": "detailed"}, which aligns with the new reasoning support. The assertions on thinking blocks and reasoning keywords are strong. The only minor issue is that test_config isn’t used in either test, triggering Ruff ARG002; you can safely drop it from these two signatures or add a # noqa: ARG002 if you prefer to keep a uniform fixture pattern.

Also applies to: 699-745, 752-769, 771-852

tests/integrations/tests/test_google.py (1)

889-909: Extended thinking tests fit the new Gemini semantics; fix small lint issues

The non‑streaming (test_26_extended_thinking) and streaming (test_27_extended_thinking_streaming) tests correctly:

  • Parameterize over the thinking scenario using format_provider_model(provider, model).
  • Configure ThinkingConfig(include_thoughts=True, thinking_budget=2000) with larger max_output_tokens to exercise reasoning.
  • Scope strict “thoughts token count” checks to provider == "gemini", which avoids penalizing other providers.

Two minor cleanups:

  • test_config is unused in both tests (Ruff ARG002); consider removing it from the signatures or adding # noqa: ARG002 on each definition to keep the fixture pattern without tripping lint.
  • In test_27, has_thoughts is assigned but never used (F841). If you want to assert on Gemini thoughts as you do in the non‑streaming test, add an assert has_thoughts under if provider == "gemini"; otherwise, you can drop has_thoughts entirely and just log thoughts_token_count to satisfy the linter. For example:
-        if provider == "gemini":
-            has_thoughts = False
-            thoughts_token_count = 0
-            
-            if hasattr(final_usage, "thoughts_token_count"):
-                thoughts_token_count = final_usage.thoughts_token_count
-                has_thoughts = thoughts_token_count > 0
-                print(f"Found thoughts with {thoughts_token_count} tokens")
+        if provider == "gemini" and hasattr(final_usage, "thoughts_token_count"):
+            thoughts_token_count = final_usage.thoughts_token_count
+            print(f"Found thoughts with {thoughts_token_count} tokens")

Also applies to: 920-937, 971-985, 1018-1027

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5edcf23 and 7f5bb40.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • core/providers/gemini/chat.go (0 hunks)
  • core/providers/gemini/responses.go (4 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (8 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/chat.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • tests/integrations/config.yml
  • core/providers/gemini/utils.go
  • tests/integrations/tests/utils/common.py
  • core/providers/gemini/types.go
  • tests/integrations/tests/test_openai.py
  • core/providers/gemini/responses.go
  • tests/integrations/tests/test_bedrock.py
  • tests/integrations/pyproject.toml
  • tests/integrations/tests/test_google.py
  • tests/integrations/tests/test_anthropic.py
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/gemini/utils.go
  • core/providers/gemini/types.go
  • core/providers/gemini/responses.go
🧬 Code graph analysis (3)
core/providers/gemini/utils.go (3)
core/schemas/utils.go (1)
  • SafeExtractStringPointer (468-483)
core/providers/gemini/types.go (1)
  • Role (15-15)
core/schemas/chatcompletions.go (1)
  • ChatMessageRoleUser (462-462)
core/providers/gemini/types.go (3)
core/providers/anthropic/types.go (1)
  • Alias (103-103)
core/providers/bedrock/types.go (1)
  • Alias (101-101)
core/providers/openai/types.go (2)
  • Alias (79-79)
  • Alias (190-190)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py

1387-1387: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_google.py

891-891: Unused method argument: test_config

(ARG002)


967-967: Unused method argument: test_config

(ARG002)


1025-1025: Local variable has_thoughts is assigned to but never used

Remove assignment to unused variable has_thoughts

(F841)

tests/integrations/tests/test_anthropic.py

675-675: Unused method argument: test_config

(ARG002)


754-754: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (13)
core/providers/gemini/types.go (1)

981-1019: LGTM - Well-structured dual-case JSON unmarshalling.

The implementation correctly uses pointer types to detect field presence and properly falls back from camelCase to snake_case. This ensures compatibility with both the Gemini API (camelCase) and the genai SDK (snake_case).

core/providers/gemini/utils.go (1)

35-37: LGTM - Proper extraction of reasoning_summary.

The code correctly uses the SafeExtractStringPointer utility to safely extract the reasoning_summary from ExtraParams and populate params.Reasoning.Summary.

core/providers/gemini/responses.go (4)

730-740: LGTM - Reordering of part processing for thought prioritization.

The switch case reordering ensures thought/reasoning content is processed before regular text, and function calls are handled before thought signatures. This aligns with the PR objective of prioritizing thought responses before regular text.


1679-1693: LGTM - Proper handling of regular text parts.

The new case correctly handles non-thought text content, creating a properly structured ResponsesMessage with the appropriate type and content block structure.


1723-1736: LGTM - Consistent base64 encoding of thought signatures.

The change to use base64.StdEncoding.EncodeToString for thought signatures is consistent with the streaming path (line 928) and the standalone ThoughtSignature case (line 1868), ensuring uniform encoding across all code paths.


1898-1914: Mutual exclusivity of ThinkingBudget and ThinkingLevel is correctly enforced.

The condition that sets ThinkingLevel only when MaxTokens is nil is correct. Gemini API documentation confirms that thinkingLevel and thinkingBudget must not be sent together in the same request, as this can cause an error. The code properly prioritizes explicit token budgets (MaxTokensThinkingBudget) over effort-based reasoning levels (EffortThinkingLevel), preventing conflicting configurations.
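The precedence being verified can be sketched as a small builder. Field and parameter names are illustrative: the point is that an explicit token budget wins, and a level is set only when no budget is given, so the two are never sent together.

```go
package main

import "fmt"

// thinkingConfig is a hypothetical stand-in for the outgoing Gemini config.
// The API rejects requests that carry both a budget and a level.
type thinkingConfig struct {
	ThinkingBudget *int
	ThinkingLevel  *string
}

// buildThinkingConfig applies the described precedence: an explicit max-token
// budget takes priority; an effort-based level is used only as a fallback.
func buildThinkingConfig(maxTokens *int, effort *string) thinkingConfig {
	cfg := thinkingConfig{}
	if maxTokens != nil {
		cfg.ThinkingBudget = maxTokens
	} else if effort != nil {
		cfg.ThinkingLevel = effort
	}
	return cfg
}

func main() {
	budget, effort := 2000, "high"

	// Both supplied: only the budget is forwarded.
	both := buildThinkingConfig(&budget, &effort)
	fmt.Println(both.ThinkingBudget != nil, both.ThinkingLevel == nil) // true true

	// Only effort supplied: the level is forwarded instead.
	onlyEffort := buildThinkingConfig(nil, &effort)
	fmt.Println(onlyEffort.ThinkingLevel != nil) // true
}
```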

tests/integrations/config.yml (1)

36-45: Thinking-capable model wiring is consistent with new tests

Defining thinking models for OpenAI, Anthropic, Gemini, and Bedrock and enabling the thinking scenario for those providers lines up with the new get_cross_provider_params_for_scenario("thinking") usages in the test suite. Please just double‑check that these model IDs exist and are configured in Bifrost (and any staging/prod envs) so the new cross‑provider reasoning tests don’t fail due to missing models or routing.

Also applies to: 51-57, 63-68, 78-84, 108-126, 159-176, 184-201

tests/integrations/tests/utils/common.py (1)

545-568: Content normalization via get_content_string is a solid improvement

Routing OpenAI content through get_content_string in both assert_valid_chat_response and assert_valid_image_response correctly handles the newer “list of content parts” shape and prevents .lower() calls on non‑strings. The helper’s handling of dict/text parts is defensive and should make the assertions more robust across SDK versions and providers.

Also applies to: 600-620, 1878-1891

tests/integrations/pyproject.toml (1)

24-37: google-genai>=1.50.0 matches new ThinkingConfig usage

Raising the minimum google-genai version to >=1.50.0 is consistent with the use of types.ThinkingConfig, richer speech APIs, and streaming semantics in test_google.py. Just ensure your CI and any developer environments running these integration tests are actually upgraded to at least this version.

tests/integrations/tests/test_openai.py (1)

1126-1135: Larger max_output_tokens for Responses tests are reasonable

Increasing max_output_tokens to 1000 for the various Responses API tests (simple text, system message, image, and streaming) makes sense for richer cross‑provider outputs and should help avoid truncation, especially with reasoning‑heavy models. Keep an eye on any provider‑specific hard limits, but 1000 is generally a safe upper bound for these integration tests.

Also applies to: 1162-1170, 1194-1202, 1290-1298

tests/integrations/tests/test_anthropic.py (1)

265-279: Increased Anthropic max_tokens match more complex flows

Raising max_tokens to 500 for the end‑to‑end tool‑calling flow and to 1000 for streaming (both plain and with tools) is appropriate for multi‑turn, tool‑heavy, or verbose outputs. These values are still conservative for modern models and should reduce flakiness from truncated responses.

Also applies to: 593-608, 618-627

tests/integrations/tests/test_google.py (2)

475-479: Higher max_output_tokens for Google integration-specific tests are sensible

Using max_output_tokens=1000 for the creative story and max_output_tokens=500 for the system-instruction test gives the GenAI models enough room to respond naturally without being overly constrained. This is in line with the expanded reasoning/longer outputs tested elsewhere.

Also applies to: 500-507


1049-1071: Helper for extracting Google function calls is defensive and ready for reuse

extract_google_function_calls correctly guards against missing function_calls and safely converts args to a dict, which should make future tests around tool-calling easier to write and less repetitive. Once you start asserting on function calls in more places, consider using this helper instead of open‑coding the extraction.

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 7f5bb40 to bdc7fee Compare December 11, 2025 20:32
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-09-fix_reroute_requests_incoming_from_integrations_for_full_compatibility_in_custom_providers branch from 5edcf23 to e72dca6 Compare December 11, 2025 20:33
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-10-fix_integration_test_cases_fixes branch from bdc7fee to 3a39ed0 Compare December 11, 2025 20:33
@TejasGhatte TejasGhatte force-pushed the 12-09-fix_reroute_requests_incoming_from_integrations_for_full_compatibility_in_custom_providers branch from e72dca6 to dc29eae Compare December 11, 2025 21:21
@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch 3 times, most recently from 7460482 to 0a143cb Compare December 11, 2025 21:47
@TejasGhatte TejasGhatte force-pushed the 12-09-fix_reroute_requests_incoming_from_integrations_for_full_compatibility_in_custom_providers branch from dc29eae to 509b38e Compare December 11, 2025 21:47
@TejasGhatte TejasGhatte changed the base branch from 12-09-fix_reroute_requests_incoming_from_integrations_for_full_compatibility_in_custom_providers to graphite-base/1066 December 12, 2025 06:37
@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 0a143cb to 811dc18 Compare December 12, 2025 06:37
@TejasGhatte TejasGhatte changed the base branch from graphite-base/1066 to 12-08-feat_added_support_for_gemini_native_converterswq December 12, 2025 06:38
@coderabbitai coderabbitai bot requested review from akshaydeo and danpiths December 12, 2025 06:39
@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 811dc18 to 9fe9555 Compare December 12, 2025 06:40
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/integrations/tests/test_openai.py (1)

1385-1524: Test 38 needs the _no_providers_/_no_model_ sentinel skip (otherwise it will fail in empty-provider setups).
Other parametrized tests guard this; this one currently doesn't. Also, Ruff flags test_config as unused; use it (e.g., a lightweight assert) to keep the fixture wiring intact.

 @skip_if_no_api_key("openai")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
 def test_38_responses_reasoning(self, openai_client, test_config, provider, model):
     """Test Case 38: Responses API with reasoning (gpt-5 model)"""
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
+    assert test_config is not None
     # Use gpt-5 reasoning model
     model_to_use = format_provider_model(provider, model)
core/providers/vertex/vertex.go (1)

1018-1038: Remove duplicated ModelDeployment assignment (merge artifact).
ModelDeployment is set twice (once at Lines 1024-1026 and again at Lines 1032-1034). Keep one.

@@
-        if request.Model != deployment {
-            response.ExtraFields.ModelDeployment = deployment
-        }
-
-        // Set raw response if enabled
+        // Set raw response if enabled
         if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
             response.ExtraFields.RawResponse = rawResponse
         }
-        if request.Model != deployment {
-            response.ExtraFields.ModelDeployment = deployment
-        }
 
         if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) {
             response.ExtraFields.RawRequest = rawRequest
         }
core/providers/cohere/chat.go (1)

686-697: Guard ChatToolMessage creation on non-nil ToolCallID and use the role constant.

When cm.ToolCallID is nil, creating ChatToolMessage produces invalid tool messages that downstream providers reject or skip. Bedrock explicitly validates this with: "tool message missing tool_call_id". Additionally, use schemas.ChatMessageRoleTool instead of the magic string "tool" for consistency with the rest of the codebase.

- if cm.Role == "tool" {
+ if cm.Role == string(schemas.ChatMessageRoleTool) && cm.ToolCallID != nil {
    bifrostMessage.ChatToolMessage = &schemas.ChatToolMessage{
      ToolCallID: cm.ToolCallID,
    }
  }
🧹 Nitpick comments (4)
tests/integrations/tests/utils/common.py (1)

616-620: Good: image assertions now normalize OpenAI content via get_content_string(...).lower().
Only suggestion: consider adding/adjusting a small unit test for get_content_string() covering the expected OpenAI “list of dict blocks” shape (e.g., {"type": "...", "text": "..."}) to lock in behavior.

tests/integrations/tests/test_openai.py (1)

1131-1135: Consider making max_output_tokens scenario/config-driven to avoid slow/flaky runs across providers.
The bump to 1000 can materially increase latency/cost for slower/stricter providers; if you already have per-scenario budgets in config, it’d be more resilient to fetch a cap from there (or at least document why 1000 is required).

Also applies to: 1166-1170, 1198-1202, 1294-1299

tests/integrations/tests/test_anthropic.py (2)

675-675: Remove unused test_config parameter.

The test_config fixture parameter is not used in the test body.

Apply this diff to remove the unused parameter:

-def test_15_extended_thinking(self, anthropic_client, test_config, provider, model):
+def test_15_extended_thinking(self, anthropic_client, provider, model):

Based on static analysis hints.


754-754: Remove unused test_config parameter.

The test_config fixture parameter is not used in the test body.

Apply this diff to remove the unused parameter:

-def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model):
+def test_16_extended_thinking_streaming(self, anthropic_client, provider, model):

Based on static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5bb40 and 9fe9555.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
  • core/changelog.md (1 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/gemini/chat.go (0 hunks)
  • core/providers/gemini/gemini.go (7 hunks)
  • core/providers/gemini/responses.go (4 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • core/providers/vertex/vertex.go (3 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (8 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
  • transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (16)
  • plugins/otel/version
  • plugins/mocker/version
  • plugins/otel/changelog.md
  • plugins/governance/changelog.md
  • plugins/mocker/changelog.md
  • plugins/maxim/changelog.md
  • plugins/semanticcache/changelog.md
  • plugins/logging/version
  • plugins/telemetry/changelog.md
  • plugins/jsonparser/changelog.md
  • core/version
  • plugins/governance/version
  • plugins/maxim/version
  • framework/version
  • plugins/telemetry/version
  • plugins/logging/changelog.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • core/providers/gemini/utils.go
  • tests/integrations/pyproject.toml
  • core/providers/gemini/types.go
  • tests/integrations/tests/test_bedrock.py
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • plugins/jsonparser/version
  • plugins/semanticcache/version
  • core/providers/cohere/chat.go
  • core/providers/gemini/responses.go
  • tests/integrations/tests/utils/common.py
  • core/providers/vertex/vertex.go
  • tests/integrations/tests/test_anthropic.py
  • framework/changelog.md
  • core/providers/gemini/gemini.go
  • tests/integrations/tests/test_openai.py
  • tests/integrations/config.yml
  • core/changelog.md
  • tests/integrations/tests/test_google.py
  • transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/cohere/chat.go
  • core/providers/gemini/responses.go
  • core/providers/vertex/vertex.go
  • core/providers/gemini/gemini.go
🧬 Code graph analysis (6)
core/providers/cohere/chat.go (2)
core/providers/gemini/types.go (1)
  • Role (15-15)
core/schemas/chatcompletions.go (1)
  • ChatToolMessage (621-623)
core/providers/vertex/vertex.go (2)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ShouldSendBackRawResponse (559-564)
core/schemas/models.go (1)
  • Model (109-129)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/test_google.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md

[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

transports/changelog.md

[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

🪛 Ruff (0.14.8)
tests/integrations/tests/test_anthropic.py

675-675: Unused method argument: test_config

(ARG002)


754-754: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_openai.py

1387-1387: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_google.py

891-891: Unused method argument: test_config

(ARG002)


967-967: Unused method argument: test_config

(ARG002)


1025-1025: Local variable has_thoughts is assigned to but never used

Remove assignment to unused variable has_thoughts

(F841)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (25)
tests/integrations/tests/utils/common.py (1)

563-568: Good: standardizes OpenAI content extraction via get_content_string (handles string vs list blocks).
This should make the integration assertions more resilient as providers/models return structured content.

plugins/semanticcache/version (1)

1-1: This is a new semantic cache plugin being added to the repository, not a version bump of an existing plugin.

The file plugins/semanticcache/version (with value 1.3.47) is part of a complete new plugin being introduced in this PR. All files in the semantic cache directory are newly added (not modified), making 1.3.47 the initial release version. The changelog documents updates to the core (v1.2.37) and framework (v1.1.47) dependencies, which likely support the integration test improvements and Gemini provider enhancements mentioned in the PR objectives.

Likely an incorrect or invalid review comment.

framework/changelog.md (1)

1-3: Changelog entry reorder looks fine.
No semantic change implied by this hunk.

core/providers/vertex/vertex.go (2)

729-744: Good: Vertex→Gemini streaming now forwards sendBackRawRequest explicitly.
This aligns Vertex streaming behavior with Gemini’s updated stream handler signatures (raw-request capture at stream end).


1217-1232: Good: Vertex→Gemini responses streaming now forwards sendBackRawRequest explicitly.
Same rationale as chat streaming; keeps raw-request propagation consistent across endpoints.

core/providers/gemini/gemini.go (4)

476-491: Good: attach raw request only on the terminal stream chunk.
Avoids duplicating heavy payloads on every SSE event.


591-606: Good: same raw-request threading applied to Responses streaming path.

Also applies to: 608-623


772-794: Good: raw request is attached to the completed response only.


292-307: Signature change (+call-site) looks consistent: sendBackRawRequest is threaded through streaming handlers.

All four call sites match the function signatures correctly:

  • gemini.go:293 and gemini.go:591 pass the parameters in correct order with nil for postResponseConverter
  • vertex.go:730 and vertex.go:1218 pass the parameters with actual postResponseConverter implementations
  • Both sendBackRawRequest and sendBackRawResponse are consistently positioned as the 7th and 8th parameters across all handlers
core/providers/gemini/responses.go (4)

730-739: LGTM! Logical reordering of part processing.

Prioritizing Thought before regular Text and FunctionCall before ThoughtSignature aligns with proper processing order for Gemini's extended thinking responses. This ensures reasoning content is emitted before the final text output.


1679-1693: LGTM! Added explicit handling for regular text parts.

The new case part.Text != "" properly creates a ResponsesMessage with appropriate type and role for non-thought text content, complementing the reordered switch statement in processGeminiPart.


1726-1726: Good fix: Base64 encoding for thought signatures.

Encoding the binary ThoughtSignature to base64 is necessary for proper JSON serialization and transport. This aligns with the PR objective of improving Gemini 3 Pro compatibility.
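For reference, the encode step amounts to a single standard-library call; the helper name below is illustrative, not Bifrost's actual function:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeThoughtSignature converts Gemini's binary thought signature into a
// base64 string that survives JSON serialization. The helper name is a
// stand-in; the real conversion lives in core/providers/gemini/responses.go.
func encodeThoughtSignature(sig []byte) string {
	return base64.StdEncoding.EncodeToString(sig)
}

func main() {
	sig := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(encodeThoughtSignature(sig)) // 3q2+7w==
}
```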


1902-1903: Mutual exclusivity of Effort and MaxTokens is correctly enforced.

The guard condition correctly prevents setting ThinkingLevel when MaxTokens is specified. Gemini's API requires thinkingLevel and thinkingBudget (MaxTokens) to be mutually exclusive—supplying both will cause an API error.
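The guard reduces to branching like the sketch below; the struct and field names are assumed stand-ins, not the exact Bifrost types:

```go
package main

import "fmt"

// Illustrative stand-ins for the reasoning parameters and the Gemini thinking
// config; real definitions live in core/schemas and
// core/providers/gemini/types.go.
type reasoningParams struct {
	Effort    *string // e.g. "low", "medium", "high"
	MaxTokens *int    // explicit thinking-token budget
}

type thinkingConfig struct {
	ThinkingLevel  string
	ThinkingBudget int
}

// buildThinkingConfig maps at most one of Effort / MaxTokens, since Gemini
// rejects requests carrying both thinkingLevel and thinkingBudget.
func buildThinkingConfig(p reasoningParams) thinkingConfig {
	var cfg thinkingConfig
	if p.MaxTokens != nil {
		cfg.ThinkingBudget = *p.MaxTokens
	} else if p.Effort != nil {
		cfg.ThinkingLevel = *p.Effort
	}
	return cfg
}

func main() {
	budget := 2000
	high := "high"
	// MaxTokens wins; Effort is ignored rather than emitting both fields.
	fmt.Printf("%+v\n", buildThinkingConfig(reasoningParams{Effort: &high, MaxTokens: &budget}))
	fmt.Printf("%+v\n", buildThinkingConfig(reasoningParams{Effort: &high}))
}
```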

tests/integrations/config.yml (4)

56-56: LGTM! Anthropic thinking model configuration.

claude-opus-4-5 is correctly configured for extended thinking capabilities.


67-67: LGTM! Gemini thinking model configuration.

gemini-3-pro-preview aligns with the PR objective of improving Gemini 3 Pro reasoning capabilities.


125-125: LGTM! Enabled thinking scenarios for OpenAI and Gemini.

Adding thinking: true enables the extended thinking tests for these providers, consistent with the new model configurations.

Also applies to: 175-175


44-44: Likely an incorrect or invalid review comment.

tests/integrations/tests/test_google.py (4)

478-478: LGTM! Increased token limits for generation tests.

Higher max_output_tokens values (1000, 500) provide more buffer for varied response lengths.

Also applies to: 505-506


890-891: LGTM! Cross-provider parameterization for extended thinking tests.

Using get_cross_provider_params_for_scenario("thinking") enables running these tests across all providers that support thinking capabilities, consistent with the config.yml changes.


905-907: Verify thinking_budget and max_output_tokens values.

The thinking_budget=2000 and max_output_tokens=2500 values should be validated against provider limits. Ensure max_output_tokens is sufficient to accommodate both thinking tokens and response tokens.


921-936: LGTM! Provider-conditional thoughts assertion.

Wrapping the thoughts_token_count assertion in if provider == "gemini" correctly avoids failures for non-Gemini providers that may not expose thinking tokens in usage metadata.

tests/integrations/tests/test_anthropic.py (4)

278-278: LGTM: Appropriate token budget for end-to-end tool calling.

The increase from 100 to 500 max_tokens is reasonable for end-to-end tool calling scenarios that involve multiple turns and tool response processing.


602-602: LGTM: Adequate token budget for streaming tests.

The increase to 1000 max_tokens is appropriate for streaming tests to ensure substantial content for proper validation of streaming functionality.

Also applies to: 624-624


762-765: LGTM: Appropriate token budgets for streaming thinking test.

The max_tokens of 4000 and budget_tokens of 2000 are reasonable for streaming thinking scenarios. The reduction from the non-streaming test's 2500 budget_tokens is appropriate given streaming's incremental nature.


687-689: Verify that the reasoning_summary parameter is properly supported across all "thinking" scenario providers.

The reasoning_summary parameter in extra_body is supported by multiple providers in the Bifrost backend (Anthropic, Bedrock, Gemini) and is properly extracted and converted during request handling. The test correctly routes to different providers using the "provider/model" format via format_provider_model().

However, verify at runtime that:

  1. All providers in the "thinking" scenario (openai, anthropic, gemini, bedrock) correctly handle the reasoning_summary parameter without errors
  2. The test's use of @skip_if_no_api_key("anthropic") is appropriate for a multi-provider parametrized test
  3. Consider clarifying test architecture: using anthropic_client fixture for non-Anthropic providers through Bifrost may work but is architecturally unclear

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 9fe9555 to aa6e9b3 Compare December 12, 2025 07:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/providers/gemini/responses.go (2)

235-239: Critical: decode base64 EncryptedContent when mapping back to Gemini ThoughtSignature.

Right now you base64-encode thought signatures (Gemini → Responses), but you don’t base64-decode on the reverse path (Responses → Gemini). That will break Gemini 3 Pro “thought signature” roundtrips.

@@
-                        if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
-                            part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
-                        }
+                        if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
+                            if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+                                part.ThoughtSignature = decoded
+                            } else {
+                                // Back-compat fallback if upstream ever sends non-base64
+                                part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
+                            }
+                        }
@@
-                if msg.ResponsesReasoning.EncryptedContent != nil {
-                    currentParts = append(currentParts, &Part{
-                        ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent),
-                    })
-                }
+                if msg.ResponsesReasoning.EncryptedContent != nil {
+                    if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+                        currentParts = append(currentParts, &Part{ThoughtSignature: decoded})
+                    } else {
+                        currentParts = append(currentParts, &Part{ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent)})
+                    }
+                }
@@
-        if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil {
+        if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil {
+            thoughtSigBytes, err := base64.StdEncoding.DecodeString(*bifrostResp.Item.ResponsesReasoning.EncryptedContent)
+            if err != nil {
+                thoughtSigBytes = []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent)
+            }
             candidate.Content.Parts = append(candidate.Content.Parts, &Part{
-                ThoughtSignature: []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent),
+                ThoughtSignature: thoughtSigBytes,
             })
         }
@@
-                            part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent)
+                            if decoded, err := base64.StdEncoding.DecodeString(*nextMsg.ResponsesReasoning.EncryptedContent); err == nil {
+                                part.ThoughtSignature = decoded
+                            } else {
+                                part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent)
+                            }
                         }
                     }

Also applies to: 285-289, 423-427, 2161-2169


1860-1876: Model-version-aware parameter selection required: Gemini 3 and 2.5 have incompatible thinking parameters.

The conditional logic (setting Effort only when MaxTokens is nil) prevents both from being mapped in the same request, but the code lacks model-version detection. Gemini 3 uses thinkingLevel exclusively and rejects requests with thinkingBudget (returns a 400 error); Gemini 2.5 uses thinkingBudget. Without checking the model version at the call site in ToGeminiResponsesRequest (where bifrostReq.Model is available), the code may send incompatible parameters to either model version. Add model-version-aware branching to map Effort → thinkingLevel for Gemini 3 only and MaxTokens → thinkingBudget for Gemini 2.5 only.
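The suggested branching could be as simple as a prefix check on the model name; the naming convention below (e.g. "gemini-3-pro-preview" vs "gemini-2.5-flash") is an assumption, not Bifrost's exact logic:

```go
package main

import (
	"fmt"
	"strings"
)

// chooseThinkingParam decides which Gemini thinking parameter to emit based on
// the model family: Gemini 3 accepts only thinkingLevel, while Gemini 2.5
// accepts only thinkingBudget. The prefix check assumes standard Gemini model
// naming and is illustrative only.
func chooseThinkingParam(model string) string {
	if strings.HasPrefix(model, "gemini-3") {
		return "thinkingLevel"
	}
	return "thinkingBudget"
}

func main() {
	fmt.Println(chooseThinkingParam("gemini-3-pro-preview")) // thinkingLevel
	fmt.Println(chooseThinkingParam("gemini-2.5-flash"))     // thinkingBudget
}
```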

♻️ Duplicate comments (3)
plugins/jsonparser/version (1)

1-1: Version bump looks correct, but verify transports/go.mod update.

The version bump to 1.3.48 is coordinated with other plugin updates. However, a previous review flagged a critical issue where transports/go.mod still references governance and logging at v1.3.47. Ensure that transports/go.mod has been updated to reference v1.3.48 for all affected plugins before merging.

#!/bin/bash
# Verify that transports/go.mod references the updated plugin versions

echo "=== Checking transports/go.mod for plugin version references ==="
rg -n "plugins/(governance|jsonparser|logging)" transports/go.mod

echo ""
echo "=== Checking for any v1.3.47 references in transports/go.mod ==="
rg -n "v1\.3\.47" transports/go.mod

echo ""
echo "=== Checking for v1.3.48 references in transports/go.mod ==="
rg -n "v1\.3\.48" transports/go.mod
transports/changelog.md (1)

5-5: Hyphenate "inter-provider" for proper compound modifier format.

A previous review correctly identified that "inter provider" should be hyphenated as "inter-provider" per standard English grammar rules for compound modifiers.

Apply this diff:

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
core/changelog.md (1)

4-4: Hyphenate “inter-provider” in changelog entry.

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
🧹 Nitpick comments (6)
tests/integrations/tests/test_bedrock.py (1)

215-215: Token budget increases align with thinking/reasoning support.

The uniform increase to 500 tokens across these cross-provider tests is appropriate for the PR's addition of thinking/reasoning capabilities, which require additional output tokens.

Consider adding a constant or comment documenting the token budget choice:

# Token budget for tests with thinking/reasoning enabled
INFERENCE_MAX_TOKENS = 500

Then reference it in inferenceConfig:

inferenceConfig={"maxTokens": INFERENCE_MAX_TOKENS}

Also applies to: 254-254, 300-300, 689-689, 730-730

tests/integrations/tests/utils/common.py (1)

1878-1891: Harden get_content_string to handle list-of-strings (and avoid silently dropping content).

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
         parts: List[str] = []
         for c in content:
+            if isinstance(c, str):
+                parts.append(c)
+                continue
             if isinstance(c, dict):
                 parts.append(c.get("text", ""))
             elif hasattr(c, "text"):
                 parts.append(getattr(c, "text") or "")
         return " ".join(filter(None, parts))
     else:
         return ""
tests/integrations/tests/test_google.py (2)

890-936: Cross-provider parameterization correctly implemented.

The test now runs across multiple providers with appropriate Gemini-specific gating for thoughts assertions. Token budgets have been reduced to accommodate smaller context windows.

The test_config parameter is unused throughout this test. Consider removing it:

-    def test_26_extended_thinking(self, google_client, test_config, provider, model):
+    def test_26_extended_thinking(self, google_client, provider, model):

966-1031: Cross-provider streaming test correctly implemented.

The streaming test mirrors the non-streaming test's cross-provider pattern with appropriate Gemini-specific gating for thoughts token count assertions.

The test_config parameter is unused throughout this test. Consider removing it:

-    def test_27_extended_thinking_streaming(self, google_client, test_config, provider, model):
+    def test_27_extended_thinking_streaming(self, google_client, provider, model):
tests/integrations/tests/test_anthropic.py (2)

667-686: Cross-provider thinking test correctly implemented.

The test now supports multiple providers with appropriate token budget reductions. The addition of reasoning_summary in extra_body tests Gemini's reasoning summary parameter support mentioned in the PR objectives.

The test_config parameter is unused throughout this test. Consider removing it:

-    def test_15_extended_thinking(self, anthropic_client, test_config, provider, model):
+    def test_15_extended_thinking(self, anthropic_client, provider, model):

747-763: Cross-provider streaming thinking test correctly implemented.

The streaming test mirrors the non-streaming test's cross-provider pattern with appropriate token budget reductions for smaller context models.

The test_config parameter is unused throughout this test. Consider removing it:

-    def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model):
+    def test_16_extended_thinking_streaming(self, anthropic_client, provider, model):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fe9555 and aa6e9b3.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (34)
  • core/changelog.md (1 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/gemini/chat.go (0 hunks)
  • core/providers/gemini/gemini.go (7 hunks)
  • core/providers/gemini/responses.go (18 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • core/providers/vertex/vertex.go (3 hunks)
  • core/version (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (8 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
  • transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (2)
  • plugins/jsonparser/changelog.md
  • plugins/mocker/version
🚧 Files skipped from review as they are similar to previous changes (16)
  • plugins/logging/version
  • core/providers/gemini/utils.go
  • plugins/semanticcache/changelog.md
  • framework/version
  • plugins/maxim/changelog.md
  • plugins/otel/changelog.md
  • plugins/governance/changelog.md
  • plugins/logging/changelog.md
  • core/providers/gemini/types.go
  • core/providers/cohere/chat.go
  • plugins/maxim/version
  • core/providers/vertex/vertex.go
  • plugins/telemetry/version
  • plugins/otel/version
  • plugins/telemetry/changelog.md
  • tests/integrations/config.yml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • plugins/mocker/changelog.md
  • tests/integrations/tests/test_openai.py
  • core/providers/gemini/gemini.go
  • tests/integrations/tests/test_bedrock.py
  • tests/integrations/tests/test_google.py
  • tests/integrations/pyproject.toml
  • core/version
  • plugins/semanticcache/version
  • plugins/governance/version
  • plugins/jsonparser/version
  • core/providers/gemini/responses.go
  • tests/integrations/tests/test_anthropic.py
  • tests/integrations/tests/utils/common.py
  • core/changelog.md
  • transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/gemini/gemini.go
  • core/providers/gemini/responses.go
🧬 Code graph analysis (5)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_google.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
core/providers/gemini/responses.go (4)
core/providers/gemini/types.go (5)
  • Role (15-15)
  • FunctionCall (1147-1157)
  • FunctionResponse (1160-1182)
  • Content (978-986)
  • Type (783-783)
core/schemas/responses.go (3)
  • ResponsesInputMessageRoleUser (337-337)
  • ResponsesMessage (318-331)
  • ResponsesMessageContent (343-348)
ui/lib/types/logs.ts (3)
  • FunctionCall (157-160)
  • ResponsesMessage (423-438)
  • ResponsesMessageContent (400-400)
core/schemas/utils.go (1)
  • Ptr (16-18)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md

[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

transports/changelog.md

[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py

1387-1387: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_google.py

891-891: Unused method argument: test_config

(ARG002)


967-967: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_anthropic.py

668-668: Unused method argument: test_config

(ARG002)


748-748: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
plugins/semanticcache/version (1)

1-1: No review needed for this file—it contains no changes in this PR.

Likely an incorrect or invalid review comment.

plugins/mocker/changelog.md (1)

1-1: LGTM!

The changelog entry accurately documents the coordinated version bumps to core 1.2.37 and framework 1.1.47, consistent with the broader version alignment across the PR stack.

plugins/governance/version (1)

1-1: LGTM!

Version bump to 1.3.48 is properly coordinated with other plugin updates in the stack (jsonparser, logging also bumped to 1.3.48).

core/version (1)

1-1: LGTM!

Core version bump to 1.2.37 is properly coordinated with framework and plugin updates across the stack.

transports/changelog.md (1)

6-7: LGTM!

The new changelog entries accurately document the Gemini native API integration and fallback handling for custom models, aligning well with the PR objectives and broader stack changes.

tests/integrations/pyproject.toml (1)

35-35: The google-genai version constraint is valid. Version 1.50.0 exists on PyPI and has been released and patched (1.50.1), confirming it is a stable version. The constraint >=1.50.0 appropriately enables support for Gemini 3 Pro and thinking/reasoning capabilities as described in the PR.

tests/integrations/tests/utils/common.py (1)

563-568: Good: unified OpenAI content extraction to handle string-or-list shapes.

Also applies to: 619-620

core/providers/gemini/gemini.go (2)

246-249: Good fix: ParseAndSetRawRequest avoids []byte being JSON-marshaled as base64.


300-301: LGTM: raw-request flag plumbed through streaming and applied once at stream end.

Also applies to: 318-319, 484-486, 598-599, 616-617, 787-789

core/providers/gemini/responses.go (2)

720-735: Good: part processing order prioritizes thought → text and function_call before thought_signature.


1641-1656: Good: emit regular text parts separately after thought parts in Responses output conversion.

tests/integrations/tests/test_google.py (2)

478-478: LGTM! Token limits updated appropriately.

The increased token limits align with the test objectives for generation configuration and system instruction tests.

Also applies to: 505-505


1055-1076: LGTM! Well-designed helper function.

The function includes proper type checking, error handling, and clear documentation. The defensive programming approach with hasattr checks prevents attribute errors.

tests/integrations/tests/test_anthropic.py (1)

277-277: LGTM! Token limits increased appropriately.

The increased token limits accommodate more complex test scenarios:

  • End-to-end tool calling flows require more tokens for complete conversation cycles
  • Streaming tests benefit from larger token budgets for validation

Also applies to: 597-597, 619-619

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from aa6e9b3 to c93b41a Compare December 12, 2025 09:53
@TejasGhatte TejasGhatte force-pushed the 12-08-feat_added_support_for_gemini_native_converterswq branch from fd14a0e to 3154215 Compare December 12, 2025 09:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/providers/vertex/vertex.go (1)

501-525: Bug risk: Gemini non-streaming uses provider defaults for raw capture, but attaches based on ctx override.
In Gemini non-streaming branches, HandleProviderResponse(..., provider.sendBackRawRequest, provider.sendBackRawResponse) can produce rawRequest/rawResponse inconsistently with the later ShouldSendBackRawRequest/ShouldSendBackRawResponse checks. Also, ModelDeployment is assigned twice in the Gemini Responses branch.

@@
-        rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(resp.Body(), &geminiResponse, jsonBody, provider.sendBackRawRequest, provider.sendBackRawResponse)
+        sendBackRawReq := providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest)
+        sendBackRawResp := providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse)
+        rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(resp.Body(), &geminiResponse, jsonBody, sendBackRawReq, sendBackRawResp)
@@
-        if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) {
+        if sendBackRawReq {
             response.ExtraFields.RawRequest = rawRequest
         }
-        if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
+        if sendBackRawResp {
             response.ExtraFields.RawResponse = rawResponse
         }
@@
-        if request.Model != deployment {
-            response.ExtraFields.ModelDeployment = deployment
-        }
+        // (remove duplicate ModelDeployment assignment)

Also applies to: 1011-1039

tests/integrations/tests/test_anthropic.py (1)

265-311: Assert tool_use_id is found before sending tool_result (avoid opaque downstream failures).

         tool_use_id = None
         for content in response.content:
             if content.type == "tool_use":
                 tool_use_id = content.id
                 break
+        assert tool_use_id is not None, "Expected a tool_use block with an id"
core/providers/gemini/responses.go (1)

147-195: Fix double-base64 encoding of ThoughtSignature when round-tripping via EncryptedContent.

Lines 237 and 2168 assign []byte(*EncryptedContent) directly to part.ThoughtSignature. Since EncryptedContent is a base64-encoded string and ThoughtSignature is a []byte field, the JSON marshaler will base64-encode it again during serialization, corrupting the signature for Gemini 3 Pro flows.

Decode the base64 string before assigning to ThoughtSignature:

+import "encoding/base64"
+import "strings"
  ...
- part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
+ enc := strings.TrimSpace(*msg.ResponsesReasoning.EncryptedContent)
+ if decoded, err := base64.StdEncoding.DecodeString(enc); err == nil {
+     part.ThoughtSignature = decoded
+ } else {
+     // fallback (keeps backward-compat if EncryptedContent was not base64)
+     part.ThoughtSignature = []byte(enc)
+ }

Apply the same fix at line 2168 in convertResponsesMessagesToGeminiContents.

♻️ Duplicate comments (3)
transports/changelog.md (1)

5-5: Hyphenate “inter-provider” for readability/consistency (duplicate).

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
core/changelog.md (1)

4-4: Unresolved: Hyphenate "inter-provider" for consistency.

This issue was flagged in a previous review but remains unfixed. Apply this correction:

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
tests/integrations/tests/test_openai.py (1)

1385-1391: Parametrized “thinking” test should skip the _no_providers_ sentinel + mark unused fixture.
This matches earlier review feedback; without it, the test can run in “no providers configured” mode and Ruff will keep flagging test_config.

@@
 @skip_if_no_api_key("openai")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_38_responses_reasoning(self, openai_client, test_config, provider, model):
+def test_38_responses_reasoning(self, openai_client, _test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
     """Test Case 38: Responses API with reasoning (gpt-5 model)"""
🧹 Nitpick comments (3)
tests/integrations/tests/utils/common.py (1)

1878-1891: get_content_string is a solid, safe helper; consider one extra fallback.
The {"type":"text","text":"..."} shape is already handled, but some SDK objects/dicts may nest the text under another key; you may want to extend parsing later. Current behavior is safe (returns "").

tests/integrations/tests/test_openai.py (2)

1131-1135: max_output_tokens=1000 may make CI slower/flakier than needed.
Consider tightening the prompts + lowering token caps (or making caps configurable) to reduce latency/cost/rate-limit noise in cross-provider CI.

Also applies to: 1166-1170, 1198-1202, 1294-1299


1392-1524: Cross-provider “reasoning” fallback may miss some unsupported-parameter failures.
Right now the fallback only triggers when the exception string contains "reasoning" or "not supported". For non-OpenAI providers, failures may mention `include`, `unknown field`, etc. Consider broadening the fallback condition (or explicitly skipping providers that don’t advertise this capability) to reduce false negatives.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa6e9b3 and c93b41a.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
  • core/changelog.md (1 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/gemini/chat.go (0 hunks)
  • core/providers/gemini/gemini.go (7 hunks)
  • core/providers/gemini/responses.go (23 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (3 hunks)
  • core/providers/vertex/vertex.go (3 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (8 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
  • transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (5)
  • plugins/otel/changelog.md
  • plugins/governance/version
  • plugins/mocker/changelog.md
  • plugins/telemetry/changelog.md
  • plugins/maxim/version
🚧 Files skipped from review as they are similar to previous changes (12)
  • core/providers/gemini/utils.go
  • framework/version
  • plugins/logging/version
  • plugins/telemetry/version
  • framework/changelog.md
  • plugins/jsonparser/version
  • tests/integrations/pyproject.toml
  • tests/integrations/tests/test_bedrock.py
  • core/providers/cohere/chat.go
  • core/version
  • plugins/semanticcache/version
  • plugins/logging/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/providers/vertex/vertex.go
  • tests/integrations/tests/utils/common.py
  • plugins/maxim/changelog.md
  • plugins/jsonparser/changelog.md
  • core/changelog.md
  • core/providers/gemini/gemini.go
  • tests/integrations/tests/test_openai.py
  • plugins/otel/version
  • tests/integrations/tests/test_google.py
  • tests/integrations/config.yml
  • plugins/semanticcache/changelog.md
  • plugins/governance/changelog.md
  • core/providers/gemini/types.go
  • core/providers/gemini/responses.go
  • plugins/mocker/version
  • tests/integrations/tests/test_anthropic.py
  • transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/vertex/vertex.go
  • core/providers/gemini/gemini.go
  • core/providers/gemini/types.go
  • core/providers/gemini/responses.go
🧬 Code graph analysis (5)
core/providers/vertex/vertex.go (2)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ShouldSendBackRawResponse (559-564)
core/schemas/models.go (1)
  • Model (109-129)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
core/providers/gemini/responses.go (3)
core/providers/gemini/types.go (5)
  • Role (15-15)
  • FunctionCall (1147-1157)
  • FunctionResponse (1160-1182)
  • Content (978-986)
  • Type (783-783)
core/schemas/responses.go (7)
  • ResponsesInputMessageRoleUser (337-337)
  • ResponsesMessage (318-331)
  • ResponsesInputMessageRoleAssistant (336-336)
  • ResponsesMessageContent (343-348)
  • ResponsesMessageContentBlock (403-415)
  • ResponsesOutputMessageContentTypeText (396-396)
  • ResponsesMessageTypeMessage (294-294)
core/schemas/utils.go (1)
  • Ptr (16-18)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md

[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

transports/changelog.md

[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py

1387-1387: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_google.py

891-891: Unused method argument: test_config

(ARG002)


967-967: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_anthropic.py

668-668: Unused method argument: test_config

(ARG002)


748-748: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (16)
plugins/mocker/version (1)

1-1: Version bump looks fine; verify stack-wide version coherence.

Given this PR is part of a Graphite stack, please ensure plugins/mocker 1.3.47 is consistent with the other version bumps in the stack (core/framework) and any release automation that consumes these version files.

plugins/maxim/changelog.md (1)

1-1: Changelog entry looks fine; just ensure it matches the file’s convention/section placement.
Given this is a single list item, please confirm it’s under the correct heading (e.g., Unreleased / version block) and follows the ordering style used elsewhere in plugins/maxim/changelog.md.

plugins/governance/changelog.md (1)

1-1: Changelog entry looks fine—please verify it matches the stacked version bumps and placement conventions.
Since this PR is part of a Graphite stack, double-check that core 1.2.37 / framework 1.1.47 are the exact versions bumped in the stack and that this bullet is in the correct section/order for plugins/governance/changelog.md.

core/changelog.md (1)

5-6: LGTM!

The new entries are well-formed and properly reflect the Gemini provider enhancements in this PR.

plugins/jsonparser/changelog.md (1)

1-1: Confirm changelog scope: version-only vs. substantive changes.

The entry documents version bumps (core 1.2.37 and framework 1.1.47) but does not capture the substantive features and fixes mentioned in the PR objectives (e.g., Gemini thinking/reasoning support, integration test fixes, role mapping improvements). Verify whether this is intentional—i.e., whether plugins/jsonparser/changelog.md is meant to track only version updates for the jsonparser plugin, with provider/core changes documented in separate changelogs. If broader changelog coverage is expected here, the entry should be expanded to reflect the actual changes.

plugins/otel/version (1)

1-1: LGTM! Version bump is appropriate.

The patch version increment (1.0.46 → 1.0.47) aligns with the bug fixes and integration test improvements in this PR.

tests/integrations/tests/utils/common.py (1)

564-568: Good normalization of OpenAI “content” (string vs list).
Switching to get_content_string(...) removes a bunch of edge cases vs (content or "") and should make the assertions more robust across providers.

Also applies to: 619-620

core/providers/gemini/types.go (1)

920-958: Dual-case thinkingConfig unmarshalling looks correct.
Preferring camelCase and falling back to snake_case is the right approach for “GenAI SDK vs Gemini API” compatibility.

core/providers/vertex/vertex.go (1)

729-745: Nice: Vertex Gemini streaming now respects ctx override for raw request capture.
Threading providerUtils.ShouldSendBackRawRequest(...) into the shared Gemini streaming handlers aligns with the “raw request” feature behavior in other paths.

Also applies to: 1218-1233

core/providers/gemini/gemini.go (2)

246-249: Good fix: ParseAndSetRawRequest preserves JSON “object-like” shape vs base64-encoded bytes.
This aligns ChatCompletion RawRequest behavior with other providers/operations.


293-307: Streaming raw-request propagation looks consistent; final-chunk attachment is a good tradeoff.
Flag threading + ParseAndSetRawRequest(&response.ExtraFields, jsonBody) on the terminal event is coherent and avoids noisy per-chunk payloads.

Also applies to: 318-320, 483-489, 591-605, 616-618, 786-789

tests/integrations/config.yml (1)

44-45: Please verify the “thinking” model IDs exist + are accessible in CI (preview model names can break quickly).
If any are not universally available, consider an env-override (or a fallback list) to keep the suite stable.

Also applies to: 56-57, 67-68, 83-84, 125-126, 175-176

core/providers/gemini/responses.go (2)

523-525: Nice: monotonic output_index + cached TextOutputIndex should prevent collisions with “thoughts before text”.
The helper methods make the stream lifecycle much easier to reason about.

Also applies to: 553-554, 568-569, 603-604, 616-640, 720-730, 753-795, 820-827, 1282-1284


1849-1877: ThinkingConfig mapping change is sensible (avoid conflicting “effort” vs max-tokens intent).
Guarding ThinkingLevel when Reasoning.MaxTokens is provided should reduce surprising configs.
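A sketch of that precedence rule, assuming an explicit token budget should win over an effort level (type and field names here are illustrative, not the actual Bifrost or Gemini SDK types):

```go
package main

import "fmt"

// reasoningParams and thinkingConfig are illustrative stand-ins.
type reasoningParams struct {
	Effort    *string // e.g. "low", "medium", "high"
	MaxTokens *int
}

type thinkingConfig struct {
	ThinkingLevel  *string
	ThinkingBudget *int
}

// buildThinkingConfig applies the guard: an explicit token budget wins,
// and ThinkingLevel is only set when no budget was given, so the request
// never carries both (conflicting) settings.
func buildThinkingConfig(r reasoningParams) thinkingConfig {
	var cfg thinkingConfig
	if r.MaxTokens != nil {
		cfg.ThinkingBudget = r.MaxTokens
		return cfg
	}
	cfg.ThinkingLevel = r.Effort
	return cfg
}

func main() {
	budget := 2048
	effort := "high"
	cfg := buildThinkingConfig(reasoningParams{Effort: &effort, MaxTokens: &budget})
	fmt.Println(cfg.ThinkingBudget != nil, cfg.ThinkingLevel == nil) // true true
}
```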

tests/integrations/tests/test_anthropic.py (1)

667-686: extra_body / reasoning_summary usage: please confirm Anthropic SDK + Bifrost passthrough compatibility.
This is a good test, but it’s tightly coupled to SDK request-shape support.

tests/integrations/tests/test_google.py (1)

889-937: Thinking tests look solid; please confirm Google GenAI SDK field names are correct for the pinned version.
In particular: types.ThinkingConfig(... thinking_budget=...) and streaming usage_metadata.thoughts_token_count availability.

Also applies to: 965-1032

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch 2 times, most recently from 59dee28 to bbd6304 Compare December 12, 2025 10:42
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/providers/gemini/responses.go (2)

235-239: Critical: EncryptedContent is base64 text; it must be decoded before assigning Part.ThoughtSignature.
Right now you’re doing []byte(*EncryptedContent), which passes ASCII base64 bytes to Gemini instead of the original signature bytes (breaks Gemini 3 Pro thought-signature handling).

@@
-                        if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
-                            part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
-                        }
+                        if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
+                            if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+                                part.ThoughtSignature = decoded
+                            }
+                        }
@@
-                if msg.ResponsesReasoning.EncryptedContent != nil {
-                    currentParts = append(currentParts, &Part{
-                        ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent),
-                    })
-                }
+                if msg.ResponsesReasoning.EncryptedContent != nil {
+                    if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+                        currentParts = append(currentParts, &Part{ThoughtSignature: decoded})
+                    }
+                }
@@
-        if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil {
-            candidate.Content.Parts = append(candidate.Content.Parts, &Part{
-                ThoughtSignature: []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent),
-            })
-        }
+        if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.ResponsesReasoning.EncryptedContent != nil {
+            if decoded, err := base64.StdEncoding.DecodeString(*bifrostResp.Item.ResponsesReasoning.EncryptedContent); err == nil {
+                candidate.Content.Parts = append(candidate.Content.Parts, &Part{ThoughtSignature: decoded})
+            }
+        }
@@
-                        if nextMsg.Type != nil && *nextMsg.Type == schemas.ResponsesMessageTypeReasoning &&
-                            nextMsg.ResponsesReasoning != nil && nextMsg.ResponsesReasoning.EncryptedContent != nil {
-                            part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent)
-                        }
+                        if nextMsg.Type != nil && *nextMsg.Type == schemas.ResponsesMessageTypeReasoning &&
+                            nextMsg.ResponsesReasoning != nil && nextMsg.ResponsesReasoning.EncryptedContent != nil {
+                            if decoded, err := base64.StdEncoding.DecodeString(*nextMsg.ResponsesReasoning.EncryptedContent); err == nil {
+                                part.ThoughtSignature = decoded
+                            }
+                        }

Also applies to: 285-289, 422-427, 2162-2169


753-795: Text-item caching is good; add a small guard for TextOutputIndex/ItemIDs to avoid silent empty IDs.
In processGeminiTextPart, itemID := state.ItemIDs[outputIndex] will yield "" if something goes off-track; similarly closeGeminiTextItem assumes TextOutputIndex is valid. A defensive check (and early return) would make debugging much easier.

Also applies to: 1229-1230, 1282-1284

♻️ Duplicate comments (7)
core/changelog.md (1)

4-4: Hyphenate “inter-provider” for consistency/readability (LanguageTool).
This was already flagged previously; please apply the hyphenation.

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
plugins/semanticcache/changelog.md (1)

1-1: Verify changelog entries in core and framework changelogs.

This changelog entry documents the version bumps (core v1.2.37, framework v1.1.47), but as flagged in the previous review, corresponding entries should also be added to core/changelog.md and framework/changelog.md to maintain consistency across all component changelogs.

#!/bin/bash
# Verify if core and framework changelogs contain entries for the version bumps

echo "=== Checking core/changelog.md for version 1.2.37 ==="
if [ -f "core/changelog.md" ]; then
  grep -A 2 "1\.2\.37" core/changelog.md || echo "⚠️  No entry found for 1.2.37"
else
  echo "⚠️  core/changelog.md not found"
fi

echo ""
echo "=== Checking framework/changelog.md for version 1.1.47 ==="
if [ -f "framework/changelog.md" ]; then
  grep -A 2 "1\.1\.47" framework/changelog.md || echo "⚠️  No entry found for 1.1.47"
else
  echo "⚠️  framework/changelog.md not found"
fi

echo ""
echo "=== Checking if both files exist ==="
ls -la core/changelog.md framework/changelog.md 2>&1
tests/integrations/tests/test_google.py (1)

889-909: Thinking tests: add sentinel skip + mark unused test_config.
Both tests can receive ("_no_providers_", "_no_model_") and should skip like other parametrized tests. Also Ruff ARG002 applies.

 @skip_if_no_api_key("google")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_26_extended_thinking(self, google_client, test_config, provider, model):
+def test_26_extended_thinking(self, google_client, _test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
     """Test Case 26: Extended thinking/reasoning (non-streaming)"""
     from google.genai import types
@@
 @skip_if_no_api_key("google")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_27_extended_thinking_streaming(self, google_client, test_config, provider, model):
+def test_27_extended_thinking_streaming(self, google_client, _test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
     """Test Case 27: Extended thinking/reasoning (streaming)"""
     from google.genai import types

Also applies to: 965-985

tests/integrations/tests/test_anthropic.py (1)

667-686: Cross-provider thinking: verify extra_body.reasoning_summary is accepted for all providers; also mark unused test_config.
If some backends reject unknown request fields, you may need to gate extra_body by provider. Also Ruff ARG002 applies.

 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_15_extended_thinking(self, anthropic_client, test_config, provider, model):
+def test_15_extended_thinking(self, anthropic_client, _test_config, provider, model):
@@
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model):
+def test_16_extended_thinking_streaming(self, anthropic_client, _test_config, provider, model):
#!/bin/bash
set -euo pipefail
# Find where Bifrost/Anthropic path handles extra_body passthrough, and whether unknown fields are dropped or validated.
rg -n --type=go "extra_body|reasoning_summary|ParseAndSetRawRequest|unknown fields|ExtraParams" -S core | head -200

Also applies to: 747-765

transports/changelog.md (1)

5-5: Hyphenate “inter-provider” for consistency.
This matches the prior LanguageTool finding.

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
tests/integrations/tests/test_bedrock.py (1)

164-167: Unconditional skip needs a reason + tracking reference.

-        pytest.skip("Skipping text completion invoke test")
+        # TODO(<issue-id>): reason (e.g., flaky upstream / unsupported for some models / auth constraints)
+        pytest.skip("Skipping text completion invoke test - TODO(<issue-id>): <reason>")
tests/integrations/tests/test_openai.py (1)

1385-1391: Fix thinking test: skip sentinel tuple + silence unused test_config.
Right now get_cross_provider_params_for_scenario("thinking") can return ("_no_providers_", "_no_model_"), and this test doesn’t skip it (unlike others). Also Ruff flags test_config unused.

 @skip_if_no_api_key("openai")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_38_responses_reasoning(self, openai_client, test_config, provider, model):
+def test_38_responses_reasoning(self, openai_client, _test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
     """Test Case 38: Responses API with reasoning (gpt-5 model)"""
     # Use gpt-5 reasoning model
     model_to_use = format_provider_model(provider, model)
🧹 Nitpick comments (2)
tests/integrations/tests/test_bedrock.py (1)

210-216: Token limit bump to 500: OK, but consider centralizing.
Hard-coding maxTokens=500 across tests can drift; if you already have test_settings.max_tokens in tests/integrations/config.yml, prefer reading from config/util to keep providers aligned.

Also applies to: 249-255, 296-301, 684-690, 726-731

core/providers/gemini/responses.go (1)

523-525: Stream state/index refactor looks solid (monotonic nextOutputIndex, cached TextOutputIndex).
The helper split (closeTextItemIfOpen, nextOutputIndex, generateItemID) makes ordering changes (“thoughts before text”) much safer. Only nit: MessageID := fmt.Sprintf("msg_%d", state.CreatedAt) can collide across streams started in the same second—consider a higher-entropy suffix if IDs are externally visible/meaningful.

Also applies to: 553-554, 568-569, 603-604, 616-640
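One way to address the same-second collision nit: derive the ID suffix from crypto/rand instead of the creation timestamp. A sketch under the assumption that the ID only needs to be unique, not ordered (the real scheme is up to the provider code):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newMessageID returns an ID like "msg_a1b2c3d4e5f60708"; 8 random bytes
// make collisions across concurrent streams vanishingly unlikely.
func newMessageID() string {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // crypto/rand should not fail on supported platforms
	}
	return "msg_" + hex.EncodeToString(buf)
}

func main() {
	// Distinct even for streams started within the same second.
	fmt.Println(newMessageID())
	fmt.Println(newMessageID())
}
```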

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c93b41a and 59dee28.

⛔ Files ignored due to path filters (1)
  • tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
  • core/changelog.md (1 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/gemini/chat.go (0 hunks)
  • core/providers/gemini/gemini.go (7 hunks)
  • core/providers/gemini/responses.go (23 hunks)
  • core/providers/gemini/types.go (2 hunks)
  • core/providers/gemini/utils.go (3 hunks)
  • core/providers/vertex/vertex.go (3 hunks)
  • core/version (1 hunks)
  • framework/changelog.md (1 hunks)
  • framework/version (1 hunks)
  • plugins/governance/changelog.md (1 hunks)
  • plugins/governance/version (1 hunks)
  • plugins/jsonparser/changelog.md (1 hunks)
  • plugins/jsonparser/version (1 hunks)
  • plugins/logging/changelog.md (1 hunks)
  • plugins/logging/version (1 hunks)
  • plugins/maxim/changelog.md (1 hunks)
  • plugins/maxim/version (1 hunks)
  • plugins/mocker/changelog.md (1 hunks)
  • plugins/mocker/version (1 hunks)
  • plugins/otel/changelog.md (1 hunks)
  • plugins/otel/version (1 hunks)
  • plugins/semanticcache/changelog.md (1 hunks)
  • plugins/semanticcache/version (1 hunks)
  • plugins/telemetry/changelog.md (1 hunks)
  • plugins/telemetry/version (1 hunks)
  • tests/integrations/config.yml (6 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/tests/test_anthropic.py (5 hunks)
  • tests/integrations/tests/test_bedrock.py (6 hunks)
  • tests/integrations/tests/test_google.py (8 hunks)
  • tests/integrations/tests/test_openai.py (5 hunks)
  • tests/integrations/tests/utils/common.py (2 hunks)
  • transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
  • core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (3)
  • plugins/semanticcache/version
  • core/version
  • plugins/maxim/changelog.md
🚧 Files skipped from review as they are similar to previous changes (18)
  • plugins/logging/version
  • plugins/maxim/version
  • core/providers/cohere/chat.go
  • core/providers/gemini/utils.go
  • plugins/otel/version
  • core/providers/gemini/types.go
  • framework/version
  • plugins/mocker/version
  • core/providers/vertex/vertex.go
  • plugins/jsonparser/version
  • framework/changelog.md
  • plugins/telemetry/changelog.md
  • plugins/mocker/changelog.md
  • tests/integrations/tests/utils/common.py
  • plugins/otel/changelog.md
  • plugins/jsonparser/changelog.md
  • plugins/logging/changelog.md
  • plugins/governance/version
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • core/changelog.md
  • tests/integrations/tests/test_openai.py
  • tests/integrations/tests/test_anthropic.py
  • core/providers/gemini/gemini.go
  • core/providers/gemini/responses.go
  • tests/integrations/config.yml
  • tests/integrations/pyproject.toml
  • tests/integrations/tests/test_bedrock.py
  • plugins/telemetry/version
  • plugins/semanticcache/changelog.md
  • plugins/governance/changelog.md
  • tests/integrations/tests/test_google.py
  • transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • core/providers/gemini/gemini.go
  • core/providers/gemini/responses.go
🧬 Code graph analysis (4)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
  • ShouldSendBackRawRequest (551-556)
  • ParseAndSetRawRequest (447-454)
core/providers/gemini/responses.go (3)
core/providers/gemini/types.go (5)
  • Role (15-15)
  • FunctionCall (1144-1154)
  • FunctionResponse (1157-1179)
  • Content (975-983)
  • Type (780-780)
core/schemas/responses.go (3)
  • ResponsesMessage (318-331)
  • ResponsesMessageContent (343-348)
  • ResponsesMessageTypeMessage (294-294)
core/schemas/utils.go (1)
  • Ptr (16-18)
tests/integrations/tests/test_google.py (1)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md

[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

transports/changelog.md

[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...

(QB_NEW_EN_HYPHEN)

🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py

1387-1387: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_anthropic.py

668-668: Unused method argument: test_config

(ARG002)


748-748: Unused method argument: test_config

(ARG002)

tests/integrations/tests/test_google.py

891-891: Unused method argument: test_config

(ARG002)


967-967: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
plugins/telemetry/version (1)

1-1: Version bump looks fine; please confirm stack-wide version alignment.

Given this PR is part of a Graphite stack, please double-check that the telemetry plugin version 1.3.47 matches the repo’s overall versioning/release metadata expectations for the entire stack (e.g., any plugin index/manifest/changelog conventions, if applicable). Per the coding guidelines, this should be reviewed in light of the whole stack.

plugins/governance/changelog.md (1)

1-1: Verify changelog entry completeness and format consistency.

The entry mentions version bumps (core 1.2.37, framework 1.1.47) but doesn't detail the specific improvements (thinking/reasoning enhancements, role mapping fixes, etc.) introduced in this PR. Verify whether this minimal entry is appropriate for the governance plugin changelog (i.e., if only dependencies changed) or if more detail is needed to match the project's changelog conventions.

tests/integrations/tests/test_google.py (1)

475-480: max_output_tokens bump seems fine; confirm it matches intended coverage/cost.
No correctness issues seen.

Also applies to: 501-507

tests/integrations/tests/test_anthropic.py (1)

273-279: Token bumps look reasonable for avoiding truncation.
No correctness issues.

Also applies to: 594-600, 616-622

core/providers/gemini/gemini.go (3)

246-249: Correct raw-request handling (avoid []byte → base64 in JSON).
Using providerUtils.ParseAndSetRawRequest here aligns the raw request shape with other endpoints.


293-307: Raw request flag plumbing in chat streaming looks correct.
Attaching the parsed raw request only on the terminal chunk is a good tradeoff to avoid repeated payload bloat.

Also applies to: 311-325, 483-490
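The terminal-chunk tradeoff can be sketched as follows (illustrative names, not the actual streaming types): the raw request payload is attached only to the final chunk, so every intermediate delta stays small.

```go
package main

import "fmt"

// chunk is a stand-in for a streaming response chunk; Raw is populated
// only on the terminal chunk.
type chunk struct {
	Delta string
	Raw   string
}

// emit attaches the raw request to the last chunk only, avoiding
// repeating the payload on every delta.
func emit(deltas []string, raw string) []chunk {
	out := make([]chunk, 0, len(deltas))
	for i, d := range deltas {
		c := chunk{Delta: d}
		if i == len(deltas)-1 {
			c.Raw = raw // terminal chunk carries the payload
		}
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range emit([]string{"Hel", "lo"}, `{"model":"gemini"}`) {
		fmt.Printf("%+v\n", c)
	}
}
```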


591-605: Raw request flag plumbing in responses streaming looks correct.
Same terminal-event behavior applied consistently here.

Also applies to: 609-623, 786-794

tests/integrations/pyproject.toml (1)

35-35: Verify google-genai>=1.50.0 compatibility with your streaming and type expectations. Ensure the version bump doesn't break any reliance on generate_content_stream behavior or type shapes your tests depend on (e.g., usage_metadata, thoughts_token_count fields).

tests/integrations/config.yml (1)

44-45: File may not exist in the current repository state. The review comment references tests/integrations/config.yml at lines 44-45 and additional lines (56-57, 67-68, 83-84, 125-126, 175-176), but this file could not be located in the repository. This suggests either:

  1. The file is being added in a staged/pending change not yet visible in the working tree
  2. The file path may be incorrect
  3. The review is referencing a change that hasn't been made available to the verification environment

If the file exists and contains new thinking model IDs (gpt-5.1, claude-opus-4-5, gemini-3-pro-preview, etc.), the original concern remains valid: Configuration files with model definitions should be validated against any model_capabilities or provider support logic to ensure these new model IDs are recognized. However, this cannot be verified without access to the actual file and related validation code.

tests/integrations/tests/test_openai.py (1)

1131-1135: max_output_tokens=1000 parameter is fully supported in openai SDK 1.30.0+.

The minimum required version in pyproject.toml (openai>=1.30.0) includes full support for the Responses API with the max_output_tokens parameter. All usages across the test cases (lines 1131–1135, 1166–1170, 1198–1202, 1294–1299) are correct and compatible.

core/providers/gemini/responses.go (4)

174-182: Role mapping tweak looks intentional; verify no “system/developer” output messages are mis-bucketed.
Gemini only supports "user"/"model", so defaulting everything except ResponsesInputMessageRoleUser to "model" is reasonable—just ensure bifrostResp.Output never includes system-ish roles that should instead become SystemInstruction.
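The bucketing described above amounts to a two-way mapping; a minimal sketch (function and names are illustrative, not the converter's actual API):

```go
package main

import "fmt"

// mapRole buckets a Bifrost message role into one of the two roles
// Gemini contents accept: "user" and "model". Anything that is not a
// user turn (assistant, tool, etc.) maps to "model"; system/developer
// text should be routed to systemInstruction before reaching this point.
func mapRole(role string) string {
	if role == "user" {
		return "user"
	}
	return "model"
}

func main() {
	for _, r := range []string{"user", "assistant", "tool"} {
		fmt.Printf("%s -> %s\n", r, mapRole(r))
	}
}
```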


720-730: Processing order (thoughts before text; function calls before signatures) matches the PR intent.
This is consistent with “prioritize thoughts before regular text” and the text-close-before-non-text pattern you added.


1642-1657: Nice: explicit “regular text” candidate conversion + reasoning effort guard.
The added case part.Text != "" in convertGeminiCandidatesToResponsesOutput makes mixed-part candidates far clearer, and the “only set thinking level when max tokens isn’t set” guard prevents conflicting thinking config.

Also applies to: 1865-1867
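The guard can be sketched like this (field names are assumptions for illustration; the real config lives in the Gemini types): the thinking level is applied only when no token budget is already set, so the two knobs never conflict.

```go
package main

import "fmt"

// thinkingConfig is an illustrative stand-in for Gemini's thinking
// configuration, with a token budget and a discrete thinking level.
type thinkingConfig struct {
	ThinkingBudget *int
	ThinkingLevel  *string
}

// applyLevel sets the thinking level only when no max-token budget is
// present, mirroring the "only set thinking level when max tokens
// isn't set" guard.
func applyLevel(cfg *thinkingConfig, level string) {
	if cfg.ThinkingBudget == nil {
		cfg.ThinkingLevel = &level
	}
}

func main() {
	cfg := &thinkingConfig{}
	applyLevel(cfg, "high")
	fmt.Println(*cfg.ThinkingLevel) // set, since no budget

	budget := 1024
	cfg2 := &thinkingConfig{ThinkingBudget: &budget}
	applyLevel(cfg2, "high")
	fmt.Println(cfg2.ThinkingLevel == nil) // untouched, budget wins
}
```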


968-970: Address sonic.Marshal behavioral differences and add error handling.

The code at lines 968–970 and 1663–1665 silently ignores sonic.Marshal errors. Two issues:

  1. HTML escaping mismatch: sonic v1.14.1 does NOT escape HTML by default (unlike encoding/json). If tool arguments contain HTML-like characters (<, >, &, etc.), sonic's output will differ. Use sonic.ConfigStd or an encoder with EscapeHTML enabled if byte-for-byte parity is required.

  2. Silent error suppression: Errors are ignored, leaving argsJSON as an empty string if marshaling fails. This masks real issues (circular references, unsupported types) and is inconsistent with error handling elsewhere in the codebase (logging/returning errors in other sonic.Marshal calls). Log or return the error instead of silently dropping it.

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from bbd6304 to 78f731b Compare December 12, 2025 11:18
@TejasGhatte TejasGhatte force-pushed the 12-08-feat_added_support_for_gemini_native_converterswq branch from 3154215 to d5b03f0 Compare December 12, 2025 11:25
@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 78f731b to 3475937 Compare December 12, 2025 11:25
@TejasGhatte TejasGhatte changed the base branch from 12-08-feat_added_support_for_gemini_native_converterswq to graphite-base/1066 December 12, 2025 11:40

Pratham-Mishra04 commented Dec 12, 2025

Merge activity

@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 3475937 to 595f34e Compare December 12, 2025 14:32
@TejasGhatte TejasGhatte changed the base branch from graphite-base/1066 to 12-08-feat_added_support_for_gemini_native_converterswq December 12, 2025 14:32
@TejasGhatte TejasGhatte force-pushed the 12-10-fix_integration_test_cases_fixes branch from 595f34e to 5b9f593 Compare December 12, 2025 14:33
@Pratham-Mishra04 Pratham-Mishra04 changed the base branch from 12-08-feat_added_support_for_gemini_native_converterswq to graphite-base/1066 December 12, 2025 14:44
@Pratham-Mishra04 Pratham-Mishra04 changed the base branch from graphite-base/1066 to main December 12, 2025 14:44
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 12-10-fix_integration_test_cases_fixes branch from 5b9f593 to 3f22dc6 Compare December 12, 2025 14:45
@Pratham-Mishra04 Pratham-Mishra04 merged commit 653191b into main Dec 12, 2025
8 checks passed
@Pratham-Mishra04 Pratham-Mishra04 deleted the 12-10-fix_integration_test_cases_fixes branch December 12, 2025 14:46
@coderabbitai coderabbitai bot mentioned this pull request Dec 20, 2025
