fix: integration test cases fixes #1066

Conversation
Caution: Review failed. The pull request is closed.

📝 Walkthrough

Summary by CodeRabbit

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Stop propagating ThoughtSignature in Gemini streaming; refactor Gemini responses state with indexed item IDs, text buffering, sonic/json changes, and base64 handling; add GenerationConfigThinkingConfig UnmarshalJSON; thread optional raw-request capture through Gemini/Vertex streaming; broaden cross-provider thinking tests and bump versions/deps.

Changes

Sequence Diagram(s): (mermaid diagram omitted)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
📒 Files selected for processing (36)
This stack of pull requests is managed by Graphite. Learn more about stacking.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/gemini/utils.go (1)
670-680: Role mapping may incorrectly map tool messages.

The current logic maps all non-user roles to "model", but ChatMessageRoleTool (tool responses from the user providing function call results) should map to "user" in Gemini's format, not "model". Also, lines 673-674 are now dead code since the role is always overwritten. Consider this correction:

```diff
 if len(parts) > 0 {
 	content := Content{
 		Parts: parts,
-		Role:  string(message.Role),
 	}
-	if message.Role == schemas.ChatMessageRoleUser {
+	switch message.Role {
+	case schemas.ChatMessageRoleUser, schemas.ChatMessageRoleTool:
 		content.Role = "user"
-	} else {
+	default:
 		content.Role = "model"
 	}
 	contents = append(contents, content)
 }
```
🧹 Nitpick comments (5)
tests/integrations/tests/test_bedrock.py (2)
164-167: Unconditional skip makes test body effectively dead.

pytest.skip("Skipping text completion invoke test") at the top means test_01_text_completion_invoke never executes, even when Bedrock keys are present. If this is a long-term decision, consider either marking the test itself as @pytest.mark.skip(reason=...) and removing the unreachable body, or documenting the reason (flaky, deprecated, etc.) so future maintainers know whether/when to re-enable it.
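As a sketch of the marker-based alternative (the reason text here is hypothetical), which makes the skip visible at collection time rather than burying it in the test body:

```python
import pytest

# Hypothetical reason string; substitute the real rationale for disabling the test.
@pytest.mark.skip(reason="Bedrock text-completion invoke disabled pending follow-up")
def test_01_text_completion_invoke():
    # With the marker, pytest reports this test as skipped and never runs the
    # body; re-enable it later by simply deleting the decorator.
    raise AssertionError("unreachable while the skip marker is present")
```

One advantage of the decorator over an in-body `pytest.skip(...)` is that the reason shows up in `pytest -rs` summaries without executing any setup code.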
210-216: Higher Bedrock maxTokens values look appropriate.

Bumping inferenceConfig={"maxTokens": 500} for tool-calling, image analysis, and end-to-end tool tests aligns with extended reasoning and reduces truncation risk, and 500 is still a conservative budget for Bedrock. If you find yourself tweaking these in more places, it may be worth centralizing typical test budgets (e.g., via test_settings or a small helper) instead of scattering literals.

Also applies to: 249-255, 296-301, 685-691, 726-731
tests/integrations/tests/test_openai.py (1)
1385-1391: Cross-provider reasoning test is robust; address unused test_config.

The refactored test_38_responses_reasoning nicely parameterizes over get_cross_provider_params_for_scenario("thinking") and degrades gracefully by retrying without reasoning when a backend doesn't support those parameters. Two small nits:

- test_config is unused here and Ruff flags it (ARG002); either drop it from the signature or explicitly mark this test with # noqa: ARG002 if you want consistent fixture signatures.
- The docstring still says "(gpt-5 model)" even though the test now runs across multiple providers; consider updating it to reflect the broader scope.
Also applies to: 1411-1523
tests/integrations/tests/test_anthropic.py (1)
673-691: Extended thinking tests are well-structured; tidy up unused test_config.

The non-streaming and streaming extended-thinking tests now parameterize over the thinking scenario and use format_provider_model(provider, model) with reduced max_tokens/budget_tokens, plus extra_body={"reasoning_summary": "detailed"}, which aligns with the new reasoning support. The assertions on thinking blocks and reasoning keywords are strong. The only minor issue is that test_config isn't used in either test, triggering Ruff ARG002; you can safely drop it from these two signatures or add a # noqa: ARG002 if you prefer to keep a uniform fixture pattern.

Also applies to: 699-745, 752-769, 771-852
tests/integrations/tests/test_google.py (1)
889-909: Extended thinking tests fit the new Gemini semantics; fix small lint issues.

The non-streaming (test_26_extended_thinking) and streaming (test_27_extended_thinking_streaming) tests correctly:

- Parameterize over the thinking scenario using format_provider_model(provider, model).
- Configure ThinkingConfig(include_thoughts=True, thinking_budget=2000) with larger max_output_tokens to exercise reasoning.
- Scope strict "thoughts token count" checks to provider == "gemini", which avoids penalizing other providers.

Two minor cleanups:

- test_config is unused in both tests (Ruff ARG002); consider removing it from the signatures or adding # noqa: ARG002 on each definition to keep the fixture pattern without tripping lint.
- In test_27, has_thoughts is assigned but never used (F841). If you want to assert on Gemini thoughts as you do in the non-streaming test, add an assert has_thoughts under if provider == "gemini"; otherwise, you can drop has_thoughts entirely and just log thoughts_token_count to satisfy the linter. For example:

```diff
-            if provider == "gemini":
-                has_thoughts = False
-                thoughts_token_count = 0
-
-                if hasattr(final_usage, "thoughts_token_count"):
-                    thoughts_token_count = final_usage.thoughts_token_count
-                    has_thoughts = thoughts_token_count > 0
-                    print(f"Found thoughts with {thoughts_token_count} tokens")
+            if provider == "gemini" and hasattr(final_usage, "thoughts_token_count"):
+                thoughts_token_count = final_usage.thoughts_token_count
+                print(f"Found thoughts with {thoughts_token_count} tokens")
```

Also applies to: 920-937, 971-985, 1018-1027
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
- core/providers/gemini/chat.go (0 hunks)
- core/providers/gemini/responses.go (4 hunks)
- core/providers/gemini/types.go (3 hunks)
- core/providers/gemini/utils.go (2 hunks)
- tests/integrations/config.yml (6 hunks)
- tests/integrations/pyproject.toml (1 hunks)
- tests/integrations/tests/test_anthropic.py (5 hunks)
- tests/integrations/tests/test_bedrock.py (6 hunks)
- tests/integrations/tests/test_google.py (8 hunks)
- tests/integrations/tests/test_openai.py (5 hunks)
- tests/integrations/tests/utils/common.py (2 hunks)
💤 Files with no reviewable changes (1)
- core/providers/gemini/chat.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- tests/integrations/config.yml
- core/providers/gemini/utils.go
- tests/integrations/tests/utils/common.py
- core/providers/gemini/types.go
- tests/integrations/tests/test_openai.py
- core/providers/gemini/responses.go
- tests/integrations/tests/test_bedrock.py
- tests/integrations/pyproject.toml
- tests/integrations/tests/test_google.py
- tests/integrations/tests/test_anthropic.py
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- core/providers/gemini/utils.go
- core/providers/gemini/types.go
- core/providers/gemini/responses.go
🧬 Code graph analysis (3)
core/providers/gemini/utils.go (3)
- core/schemas/utils.go (1): SafeExtractStringPointer (468-483)
- core/providers/gemini/types.go (1): Role (15-15)
- core/schemas/chatcompletions.go (1): ChatMessageRoleUser (462-462)
core/providers/gemini/types.go (3)
- core/providers/anthropic/types.go (1): Alias (103-103)
- core/providers/bedrock/types.go (1): Alias (101-101)
- core/providers/openai/types.go (2): Alias (79-79), Alias (190-190)
tests/integrations/tests/test_anthropic.py (1)
- tests/integrations/tests/utils/parametrize.py (2): get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py
1387-1387: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_google.py
891-891: Unused method argument: test_config
(ARG002)
967-967: Unused method argument: test_config
(ARG002)
1025-1025: Local variable has_thoughts is assigned to but never used
Remove assignment to unused variable has_thoughts
(F841)
tests/integrations/tests/test_anthropic.py
675-675: Unused method argument: test_config
(ARG002)
754-754: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check (repeated 26 times)
🔇 Additional comments (13)
core/providers/gemini/types.go (1)

981-1019: LGTM - Well-structured dual-case JSON unmarshalling.

The implementation correctly uses pointer types to detect field presence and properly falls back from camelCase to snake_case. This ensures compatibility with both the Gemini API (camelCase) and the genai SDK (snake_case).
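The camelCase-then-snake_case fallback pattern described above can be illustrated outside Go; a minimal Python sketch (the function name and field set are illustrative, mirroring ThinkingConfig rather than reproducing Bifrost's actual code):

```python
import json

def parse_thinking_config(raw: str) -> dict:
    """Accept both Gemini API camelCase and genai SDK snake_case keys,
    preferring camelCase when both are present."""
    data = json.loads(raw)
    out = {}
    for camel, snake in [("includeThoughts", "include_thoughts"),
                         ("thinkingBudget", "thinking_budget")]:
        if camel in data:       # presence check, analogous to a non-nil pointer in Go
            out[snake] = data[camel]
        elif snake in data:     # fall back to the SDK's snake_case spelling
            out[snake] = data[snake]
    return out
```

The key point is checking presence rather than truthiness, so explicit `false`/`0` values from either casing are preserved.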
core/providers/gemini/utils.go (1)
35-37: LGTM - Proper extraction of reasoning_summary.

The code correctly uses the SafeExtractStringPointer utility to safely extract the reasoning_summary from ExtraParams and populate params.Reasoning.Summary.

core/providers/gemini/responses.go (4)
730-740: LGTM - Reordering of part processing for thought prioritization.The switch case reordering ensures thought/reasoning content is processed before regular text, and function calls are handled before thought signatures. This aligns with the PR objective of prioritizing thought responses before regular text.
1679-1693: LGTM - Proper handling of regular text parts.

The new case correctly handles non-thought text content, creating a properly structured ResponsesMessage with the appropriate type and content block structure.
1723-1736: LGTM - Consistent base64 encoding of thought signatures.

The change to use base64.StdEncoding.EncodeToString for thought signatures is consistent with the streaming path (line 928) and the standalone ThoughtSignature case (line 1868), ensuring uniform encoding across all code paths.
1898-1914: Mutual exclusivity of ThinkingBudget and ThinkingLevel is correctly enforced.

The condition that sets ThinkingLevel only when MaxTokens is nil is correct. Gemini API documentation confirms that thinkingLevel and thinkingBudget must not be sent together in the same request, as this can cause an error. The code properly prioritizes explicit token budgets (MaxTokens → ThinkingBudget) over effort-based reasoning levels (Effort → ThinkingLevel), preventing conflicting configurations.

tests/integrations/config.yml (1)
36-45: Thinking-capable model wiring is consistent with new tests.

Defining thinking models for OpenAI, Anthropic, Gemini, and Bedrock and enabling the thinking scenario for those providers lines up with the new get_cross_provider_params_for_scenario("thinking") usages in the test suite. Please just double-check that these model IDs exist and are configured in Bifrost (and any staging/prod envs) so the new cross-provider reasoning tests don't fail due to missing models or routing.

Also applies to: 51-57, 63-68, 78-84, 108-126, 159-176, 184-201
tests/integrations/tests/utils/common.py (1)

545-568: Content normalization via get_content_string is a solid improvement.

Routing OpenAI content through get_content_string in both assert_valid_chat_response and assert_valid_image_response correctly handles the newer "list of content parts" shape and prevents .lower() calls on non-strings. The helper's handling of dict/text parts is defensive and should make the assertions more robust across SDK versions and providers.

Also applies to: 600-620, 1878-1891
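A unit-test sketch along those lines could lock the behavior in; note that the local get_content_string below is an illustrative stand-in for the helper, not its actual implementation:

```python
def get_content_string(content):
    """Stand-in for the helper under test: pass strings through,
    join the "text" fields of OpenAI-style content-part dicts."""
    if isinstance(content, str):
        return content
    parts = []
    for part in content or []:
        if isinstance(part, dict) and part.get("type") == "text":
            parts.append(part.get("text", ""))
    return " ".join(parts)

def test_get_content_string_handles_part_lists():
    blocks = [
        {"type": "text", "text": "Hello"},
        {"type": "image_url", "image_url": {"url": "https://example.com/x.png"}},
        {"type": "text", "text": "world"},
    ]
    # Non-text blocks are skipped; text blocks are joined in order.
    assert get_content_string(blocks) == "Hello world"
    # Plain strings and None must also be safe inputs.
    assert get_content_string("plain") == "plain"
    assert get_content_string(None) == ""
```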
tests/integrations/pyproject.toml (1)
24-37: google-genai>=1.50.0 matches new ThinkingConfig usage.

Raising the minimum google-genai version to >=1.50.0 is consistent with the use of types.ThinkingConfig, richer speech APIs, and streaming semantics in test_google.py. Just ensure your CI and any developer environments running these integration tests are actually upgraded to at least this version.

tests/integrations/tests/test_openai.py (1)
1126-1135: Larger max_output_tokens for Responses tests are reasonable.

Increasing max_output_tokens to 1000 for the various Responses API tests (simple text, system message, image, and streaming) makes sense for richer cross-provider outputs and should help avoid truncation, especially with reasoning-heavy models. Keep an eye on any provider-specific hard limits, but 1000 is generally a safe upper bound for these integration tests.

Also applies to: 1162-1170, 1194-1202, 1290-1298
tests/integrations/tests/test_anthropic.py (1)
265-279: Increased Anthropic max_tokens match more complex flows.

Raising max_tokens to 500 for the end-to-end tool-calling flow and to 1000 for streaming (both plain and with tools) is appropriate for multi-turn, tool-heavy, or verbose outputs. These values are still conservative for modern models and should reduce flakiness from truncated responses.

Also applies to: 593-608, 618-627
tests/integrations/tests/test_google.py (2)
475-479: Higher max_output_tokens for Google integration-specific tests are sensible.

Using max_output_tokens=1000 for the creative story and max_output_tokens=500 for the system-instruction test gives the GenAI models enough room to respond naturally without being overly constrained. This is in line with the expanded reasoning/longer outputs tested elsewhere.

Also applies to: 500-507
1049-1071: Helper for extracting Google function calls is defensive and ready for reuse.

extract_google_function_calls correctly guards against missing function_calls and safely converts args to a dict, which should make future tests around tool-calling easier to write and less repetitive. Once you start asserting on function calls in more places, consider using this helper instead of open-coding the extraction.
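The helper's body isn't shown in this review, but a hedged sketch of what such defensive extraction typically looks like (attribute names follow the google-genai response shape; treat the function itself as illustrative):

```python
def extract_function_calls(response):
    """Collect (name, args-dict) pairs from a GenAI-style response,
    tolerating a missing function_calls attribute and empty args."""
    calls = []
    for fc in getattr(response, "function_calls", None) or []:
        args = getattr(fc, "args", None)
        # args may be None or a mapping-like object; normalize to a plain dict.
        calls.append((getattr(fc, "name", None), dict(args) if args else {}))
    return calls
```

The `getattr(..., None) or []` guard is what keeps the helper safe when a response carries no tool calls at all.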
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/integrations/tests/test_openai.py (1)
1385-1524: Test 38 needs the _no_providers_/_no_model_ sentinel skip (otherwise it will fail in empty-provider setups).

Other parametrized tests guard this; this one currently doesn't. Also, Ruff flags test_config as unused; use it (lightweight assert) to keep fixture wiring intact.

```diff
 @skip_if_no_api_key("openai")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
 def test_38_responses_reasoning(self, openai_client, test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
+    assert test_config is not None
     """Test Case 38: Responses API with reasoning (gpt-5 model)"""
     # Use gpt-5 reasoning model
     model_to_use = format_provider_model(provider, model)
```

core/providers/vertex/vertex.go (1)
1018-1038: Remove duplicated ModelDeployment assignment (merge artifact).

ModelDeployment is set twice (once at lines 1024-1026 and again at lines 1032-1034). Keep one.

```diff
@@
-	if request.Model != deployment {
-		response.ExtraFields.ModelDeployment = deployment
-	}
-
-	// Set raw response if enabled
+	// Set raw response if enabled
 	if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
 		response.ExtraFields.RawResponse = rawResponse
 	}
-	if request.Model != deployment {
-		response.ExtraFields.ModelDeployment = deployment
-	}
 	if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) {
 		response.ExtraFields.RawRequest = rawRequest
 	}
```

core/providers/cohere/chat.go (1)
686-697: Guard ChatToolMessage creation on non-nil ToolCallID and use the role constant.

When cm.ToolCallID is nil, creating ChatToolMessage produces invalid tool messages that downstream providers reject or skip. Bedrock explicitly validates this with: "tool message missing tool_call_id". Additionally, use schemas.ChatMessageRoleTool instead of the magic string "tool" for consistency with the rest of the codebase.

```diff
-	if cm.Role == "tool" {
+	if cm.Role == string(schemas.ChatMessageRoleTool) && cm.ToolCallID != nil {
 		bifrostMessage.ChatToolMessage = &schemas.ChatToolMessage{
 			ToolCallID: cm.ToolCallID,
 		}
 	}
```
🧹 Nitpick comments (4)
tests/integrations/tests/utils/common.py (1)
616-620: Good: image assertions now normalize OpenAI content via get_content_string(...).lower().

Only suggestion: consider adding/adjusting a small unit test for get_content_string() covering the expected OpenAI "list of dict blocks" shape (e.g., {"type": "...", "text": "..."}) to lock in behavior.

tests/integrations/tests/test_openai.py (1)
1131-1135: Consider making max_output_tokens scenario/config-driven to avoid slow/flaky runs across providers.

The bump to 1000 can materially increase latency/cost for slower/stricter providers; if you already have per-scenario budgets in config, it'd be more resilient to fetch a cap from there (or at least document why 1000 is required).

Also applies to: 1166-1170, 1198-1202, 1294-1299
tests/integrations/tests/test_anthropic.py (2)
675-675: Remove unused test_config parameter.

The test_config fixture parameter is not used in the test body. Apply this diff to remove the unused parameter:

```diff
-def test_15_extended_thinking(self, anthropic_client, test_config, provider, model):
+def test_15_extended_thinking(self, anthropic_client, provider, model):
```

Based on static analysis hints.
754-754: Remove unused test_config parameter.

The test_config fixture parameter is not used in the test body. Apply this diff to remove the unused parameter:

```diff
-def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model):
+def test_16_extended_thinking_streaming(self, anthropic_client, provider, model):
```

Based on static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/integrations/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
- core/changelog.md (1 hunks)
- core/providers/cohere/chat.go (1 hunks)
- core/providers/gemini/chat.go (0 hunks)
- core/providers/gemini/gemini.go (7 hunks)
- core/providers/gemini/responses.go (4 hunks)
- core/providers/gemini/types.go (3 hunks)
- core/providers/gemini/utils.go (2 hunks)
- core/providers/vertex/vertex.go (3 hunks)
- core/version (1 hunks)
- framework/changelog.md (1 hunks)
- framework/version (1 hunks)
- plugins/governance/changelog.md (1 hunks)
- plugins/governance/version (1 hunks)
- plugins/jsonparser/changelog.md (1 hunks)
- plugins/jsonparser/version (1 hunks)
- plugins/logging/changelog.md (1 hunks)
- plugins/logging/version (1 hunks)
- plugins/maxim/changelog.md (1 hunks)
- plugins/maxim/version (1 hunks)
- plugins/mocker/changelog.md (1 hunks)
- plugins/mocker/version (1 hunks)
- plugins/otel/changelog.md (1 hunks)
- plugins/otel/version (1 hunks)
- plugins/semanticcache/changelog.md (1 hunks)
- plugins/semanticcache/version (1 hunks)
- plugins/telemetry/changelog.md (1 hunks)
- plugins/telemetry/version (1 hunks)
- tests/integrations/config.yml (6 hunks)
- tests/integrations/pyproject.toml (1 hunks)
- tests/integrations/tests/test_anthropic.py (5 hunks)
- tests/integrations/tests/test_bedrock.py (6 hunks)
- tests/integrations/tests/test_google.py (8 hunks)
- tests/integrations/tests/test_openai.py (5 hunks)
- tests/integrations/tests/utils/common.py (2 hunks)
- transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
- core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (16)
- plugins/otel/version
- plugins/mocker/version
- plugins/otel/changelog.md
- plugins/governance/changelog.md
- plugins/mocker/changelog.md
- plugins/maxim/changelog.md
- plugins/semanticcache/changelog.md
- plugins/logging/version
- plugins/telemetry/changelog.md
- plugins/jsonparser/changelog.md
- core/version
- plugins/governance/version
- plugins/maxim/version
- framework/version
- plugins/telemetry/version
- plugins/logging/changelog.md
🚧 Files skipped from review as they are similar to previous changes (4)
- core/providers/gemini/utils.go
- tests/integrations/pyproject.toml
- core/providers/gemini/types.go
- tests/integrations/tests/test_bedrock.py
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- plugins/jsonparser/version
- plugins/semanticcache/version
- core/providers/cohere/chat.go
- core/providers/gemini/responses.go
- tests/integrations/tests/utils/common.py
- core/providers/vertex/vertex.go
- tests/integrations/tests/test_anthropic.py
- framework/changelog.md
- core/providers/gemini/gemini.go
- tests/integrations/tests/test_openai.py
- tests/integrations/config.yml
- core/changelog.md
- tests/integrations/tests/test_google.py
- transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- core/providers/cohere/chat.go
- core/providers/gemini/responses.go
- core/providers/vertex/vertex.go
- core/providers/gemini/gemini.go
🧬 Code graph analysis (6)
core/providers/cohere/chat.go (2)
- core/providers/gemini/types.go (1): Role (15-15)
- core/schemas/chatcompletions.go (1): ChatToolMessage (621-623)
core/providers/vertex/vertex.go (2)
- core/providers/utils/utils.go (2): ShouldSendBackRawRequest (551-556), ShouldSendBackRawResponse (559-564)
- core/schemas/models.go (1): Model (109-129)
tests/integrations/tests/test_anthropic.py (1)
- tests/integrations/tests/utils/parametrize.py (2): get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
- core/providers/utils/utils.go (2): ShouldSendBackRawRequest (551-556), ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_openai.py (1)
- tests/integrations/tests/utils/parametrize.py (2): get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
tests/integrations/tests/test_google.py (1)
- tests/integrations/tests/utils/parametrize.py (2): get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
transports/changelog.md
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_anthropic.py
675-675: Unused method argument: test_config
(ARG002)
754-754: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_openai.py
1387-1387: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_google.py
891-891: Unused method argument: test_config
(ARG002)
967-967: Unused method argument: test_config
(ARG002)
1025-1025: Local variable has_thoughts is assigned to but never used
Remove assignment to unused variable has_thoughts
(F841)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check (repeated 6 times)
🔇 Additional comments (25)
tests/integrations/tests/utils/common.py (1)
563-568: Good: standardizes OpenAI content extraction via get_content_string (handles string vs list blocks).

This should make the integration assertions more resilient as providers/models return structured content.

plugins/semanticcache/version (1)
1-1: This is a new semantic cache plugin being added to the repository, not a version bump of an existing plugin.

The file plugins/semanticcache/version (with value 1.3.47) is part of a complete new plugin being introduced in this PR. All files in the semantic cache directory are newly added (not modified), making 1.3.47 the initial release version. The changelog documents updates to the core (v1.2.37) and framework (v1.1.47) dependencies, which likely support the integration test improvements and Gemini provider enhancements mentioned in the PR objectives.

Likely an incorrect or invalid review comment.
framework/changelog.md (1)
1-3: Changelog entry reorder looks fine.

No semantic change implied by this hunk.

core/providers/vertex/vertex.go (2)
729-744: Good: Vertex→Gemini streaming now forwards sendBackRawRequest explicitly.

This aligns Vertex streaming behavior with Gemini's updated stream handler signatures (raw-request capture at stream end).
1217-1232: Good: Vertex→Gemini responses streaming now forwards sendBackRawRequest explicitly.

Same rationale as chat streaming; keeps raw-request propagation consistent across endpoints.

core/providers/gemini/gemini.go (4)
476-491: Good: attach raw request only on the terminal stream chunk.

Avoids duplicating heavy payloads on every SSE event.
591-606: Good: same raw-request threading applied to Responses streaming path.

Also applies to: 608-623
772-794: Good: raw request is attached to the completed response only.
292-307: Signature change (plus call sites) looks consistent: sendBackRawRequest is threaded through streaming handlers.

All four call sites match the function signatures correctly:

- gemini.go:293 and gemini.go:591 pass the parameters in correct order with nil for postResponseConverter
- vertex.go:730 and vertex.go:1218 pass the parameters with actual postResponseConverter implementations
- Both sendBackRawRequest and sendBackRawResponse are consistently positioned as the 7th and 8th parameters across all handlers

core/providers/gemini/responses.go (4)
730-739: LGTM! Logical reordering of part processing.

Prioritizing Thought before regular Text and FunctionCall before ThoughtSignature aligns with proper processing order for Gemini's extended thinking responses. This ensures reasoning content is emitted before the final text output.
1679-1693: LGTM! Added explicit handling for regular text parts.

The new case part.Text != "" properly creates a ResponsesMessage with appropriate type and role for non-thought text content, complementing the reordered switch statement in processGeminiPart.
1726-1726: Good fix: Base64 encoding for thought signatures.

Encoding the binary ThoughtSignature to base64 is necessary for proper JSON serialization and transport. This aligns with the PR objective of improving Gemini 3 Pro compatibility.
1902-1903: Mutual exclusivity of Effort and MaxTokens is correctly enforced.

The guard condition correctly prevents setting ThinkingLevel when MaxTokens is specified. Gemini's API requires thinkingLevel and thinkingBudget (MaxTokens) to be mutually exclusive; supplying both will cause an API error.

tests/integrations/config.yml (4)
56-56: LGTM! Anthropic thinking model configuration.

claude-opus-4-5 is correctly configured for extended thinking capabilities.
67-67: LGTM! Gemini thinking model configuration.

gemini-3-pro-preview aligns with the PR objective of improving Gemini 3 Pro reasoning capabilities.
125-125: LGTM! Enabled thinking scenarios for OpenAI and Gemini.

Adding thinking: true enables the extended thinking tests for these providers, consistent with the new model configurations.

Also applies to: 175-175
44-44: Likely an incorrect or invalid review comment.

tests/integrations/tests/test_google.py (2)
478-478: LGTM! Increased token limits for generation tests.

Higher max_output_tokens values (1000, 500) provide more buffer for varied response lengths.

Also applies to: 505-506
890-891: LGTM! Cross-provider parameterization for extended thinking tests.

Using get_cross_provider_params_for_scenario("thinking") enables running these tests across all providers that support thinking capabilities, consistent with the config.yml changes.
905-907: Verify thinking_budget and max_output_tokens values.

The thinking_budget=2000 and max_output_tokens=2500 values should be validated against provider limits. Ensure max_output_tokens is sufficient to accommodate both thinking tokens and response tokens.
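A sketch of the headroom relationship this comment asks to verify; the values come from the test, while the 100-token answer floor and the check itself are assumptions for illustration:

```python
def has_answer_headroom(thinking_budget: int, max_output_tokens: int,
                        min_answer_tokens: int = 100) -> bool:
    # When thinking tokens count against the output cap, the cap must leave
    # room for a usable final answer on top of the thinking budget.
    return max_output_tokens - thinking_budget >= min_answer_tokens
```

With the test's values, `has_answer_headroom(2000, 2500)` holds, leaving 500 tokens for the visible answer.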
921-936: LGTM! Provider-conditional thoughts assertion.

Wrapping the thoughts_token_count assertion in if provider == "gemini" correctly avoids failures for non-Gemini providers that may not expose thinking tokens in usage metadata.

tests/integrations/tests/test_anthropic.py (4)
278-278: LGTM: Appropriate token budget for end-to-end tool calling. The increase from 100 to 500 max_tokens is reasonable for end-to-end tool calling scenarios that involve multiple turns and tool response processing.
602-602: LGTM: Adequate token budget for streaming tests. The increase to 1000 max_tokens is appropriate for streaming tests to ensure substantial content for proper validation of streaming functionality.
Also applies to: 624-624
762-765: LGTM: Appropriate token budgets for streaming thinking test. The max_tokens of 4000 and budget_tokens of 2000 are reasonable for streaming thinking scenarios. The reduction from the non-streaming test's 2500 budget_tokens is appropriate given streaming's incremental nature.
687-689: Verify that the `reasoning_summary` parameter is properly supported across all "thinking" scenario providers. The `reasoning_summary` parameter in `extra_body` is supported by multiple providers in the Bifrost backend (Anthropic, Bedrock, Gemini) and is properly extracted and converted during request handling. The test correctly routes to different providers using the "provider/model" format via `format_provider_model()`. However, verify at runtime that:
- All providers in the "thinking" scenario (openai, anthropic, gemini, bedrock) correctly handle the `reasoning_summary` parameter without errors
- The test's use of `@skip_if_no_api_key("anthropic")` is appropriate for a multi-provider parametrized test
- Consider clarifying test architecture: using the `anthropic_client` fixture for non-Anthropic providers through Bifrost may work but is architecturally unclear
Force-pushed from 9fe9555 to aa6e9b3 (Compare)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/providers/gemini/responses.go (2)
235-239: Critical: decode base64 `EncryptedContent` when mapping back to Gemini `ThoughtSignature`. Right now you base64-encode thought signatures (Gemini → Responses), but you don't base64-decode on the reverse path (Responses → Gemini). That will break Gemini 3 Pro "thought signature" roundtrips.
@@
- if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
-     part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
- }
+ if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil {
+     if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+         part.ThoughtSignature = decoded
+     } else {
+         // Back-compat fallback if upstream ever sends non-base64
+         part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
+     }
+ }
@@
- if msg.ResponsesReasoning.EncryptedContent != nil {
-     currentParts = append(currentParts, &Part{
-         ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent),
-     })
- }
+ if msg.ResponsesReasoning.EncryptedContent != nil {
+     if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil {
+         currentParts = append(currentParts, &Part{ThoughtSignature: decoded})
+     } else {
+         currentParts = append(currentParts, &Part{ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent)})
+     }
+ }
@@
- if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil {
+ if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil {
+     thoughtSigBytes, err := base64.StdEncoding.DecodeString(*bifrostResp.Item.ResponsesReasoning.EncryptedContent)
+     if err != nil {
+         thoughtSigBytes = []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent)
+     }
      candidate.Content.Parts = append(candidate.Content.Parts, &Part{
-         ThoughtSignature: []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent),
+         ThoughtSignature: thoughtSigBytes,
      })
  }
@@
- part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent)
+ if decoded, err := base64.StdEncoding.DecodeString(*nextMsg.ResponsesReasoning.EncryptedContent); err == nil {
+     part.ThoughtSignature = decoded
+ } else {
+     part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent)
+ }
Also applies to: 285-289, 423-427, 2161-2169
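The decode-with-fallback pattern suggested for the Go code is easy to sanity-check in isolation. A Python sketch of the same rule (names are illustrative; the Go version uses `base64.StdEncoding`):

```python
import base64
import binascii

def decode_thought_signature(encrypted_content: str) -> bytes:
    """Decode a base64 EncryptedContent back to raw signature bytes.

    Falls back to the raw bytes when the value is not valid base64,
    mirroring the back-compat branch in the suggested diff.
    """
    try:
        return base64.b64decode(encrypted_content, validate=True)
    except (binascii.Error, ValueError):
        return encrypted_content.encode("utf-8")

# Round trip: encode on the Gemini -> Responses path, decode on the way back.
raw = b"\x01\x02signature-bytes"
encoded = base64.b64encode(raw).decode("ascii")
assert decode_thought_signature(encoded) == raw
# Non-base64 input survives as raw bytes instead of raising.
assert decode_thought_signature("not base64!!") == b"not base64!!"
```

Without the decode step, the signature would be base64-encoded a second time on serialization and no longer match what Gemini issued.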
1860-1876: Model-version-aware parameter selection required: Gemini 3 and 2.5 have incompatible thinking parameters. The conditional logic (setting Effort only when MaxTokens is nil) prevents both from being mapped in the same request, but the code lacks model-version detection. Gemini 3 uses `thinkingLevel` exclusively and rejects requests with `thinkingBudget` (returns 400 error); Gemini 2.5 uses `thinkingBudget`. Without checking the model version at the call site in `ToGeminiResponsesRequest` (where `bifrostReq.Model` is available), the code may send incompatible parameters to either model version. Add model-version-aware branching to map `Effort` → `thinkingLevel` for Gemini 3 only and `MaxTokens` → `thinkingBudget` for Gemini 2.5 only.
♻️ Duplicate comments (3)
plugins/jsonparser/version (1)
1-1: Version bump looks correct, but verify transports/go.mod update. The version bump to 1.3.48 is coordinated with other plugin updates. However, a previous review flagged a critical issue where `transports/go.mod` still references governance and logging at v1.3.47. Ensure that `transports/go.mod` has been updated to reference v1.3.48 for all affected plugins before merging.
#!/bin/bash
# Verify that transports/go.mod references the updated plugin versions
echo "=== Checking transports/go.mod for plugin version references ==="
rg -n "plugins/(governance|jsonparser|logging)" transports/go.mod
echo ""
echo "=== Checking for any v1.3.47 references in transports/go.mod ==="
rg -n "v1\.3\.47" transports/go.mod
echo ""
echo "=== Checking for v1.3.48 references in transports/go.mod ==="
rg -n "v1\.3\.48" transports/go.mod
transports/changelog.md (1)
5-5: Hyphenate "inter-provider" for proper compound modifier format. A previous review correctly identified that "inter provider" should be hyphenated as "inter-provider" per standard English grammar rules for compound modifiers.
Apply this diff:
-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
core/changelog.md (1)
4-4: Hyphenate "inter-provider" in changelog entry.
-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
🧹 Nitpick comments (6)
tests/integrations/tests/test_bedrock.py (1)
215-215: Token budget increases align with thinking/reasoning support. The uniform increase to 500 tokens across these cross-provider tests is appropriate for the PR's addition of thinking/reasoning capabilities, which require additional output tokens.
Consider adding a constant or comment documenting the token budget choice:
# Token budget for tests with thinking/reasoning enabled
INFERENCE_MAX_TOKENS = 500

Then reference it in inferenceConfig:
inferenceConfig={"maxTokens": INFERENCE_MAX_TOKENS}
Also applies to: 254-254, 300-300, 689-689, 730-730
tests/integrations/tests/utils/common.py (1)
1878-1891: Harden `get_content_string` to handle list-of-strings (and avoid silently dropping content).
 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
         parts: List[str] = []
         for c in content:
+            if isinstance(c, str):
+                parts.append(c)
+                continue
             if isinstance(c, dict):
                 parts.append(c.get("text", ""))
             elif hasattr(c, "text"):
                 parts.append(getattr(c, "text") or "")
         return " ".join(filter(None, parts))
     else:
         return ""
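The hardened helper is easy to exercise standalone. A runnable version with the list-of-strings branch included (`SimpleNamespace` stands in for SDK objects carrying a `.text` attribute):

```python
from types import SimpleNamespace
from typing import Any, List

def get_content_string(content: Any) -> str:
    """Get a string representation of content."""
    if isinstance(content, str):
        return content
    elif isinstance(content, list):
        parts: List[str] = []
        for c in content:
            if isinstance(c, str):
                parts.append(c)  # plain strings in a list are kept, not dropped
                continue
            if isinstance(c, dict):
                parts.append(c.get("text", ""))
            elif hasattr(c, "text"):
                parts.append(getattr(c, "text") or "")
        return " ".join(filter(None, parts))
    else:
        return ""

assert get_content_string("plain") == "plain"
assert get_content_string(["a", {"text": "b"}, SimpleNamespace(text="c")]) == "a b c"
assert get_content_string(None) == ""
```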
890-936: Cross-provider parameterization correctly implemented. The test now runs across multiple providers with appropriate Gemini-specific gating for thoughts assertions. Token budgets have been reduced to accommodate smaller context windows.
The `test_config` parameter is unused throughout this test. Consider removing it:
- def test_26_extended_thinking(self, google_client, test_config, provider, model):
+ def test_26_extended_thinking(self, google_client, provider, model):
966-1031: Cross-provider streaming test correctly implemented. The streaming test mirrors the non-streaming test's cross-provider pattern with appropriate Gemini-specific gating for thoughts token count assertions.
The `test_config` parameter is unused throughout this test. Consider removing it:
- def test_27_extended_thinking_streaming(self, google_client, test_config, provider, model):
+ def test_27_extended_thinking_streaming(self, google_client, provider, model):
667-686: Cross-provider thinking test correctly implemented. The test now supports multiple providers with appropriate token budget reductions. The addition of `reasoning_summary` in `extra_body` tests Gemini's reasoning summary parameter support mentioned in the PR objectives.
The `test_config` parameter is unused throughout this test. Consider removing it:
- def test_15_extended_thinking(self, anthropic_client, test_config, provider, model):
+ def test_15_extended_thinking(self, anthropic_client, provider, model):
747-763: Cross-provider streaming thinking test correctly implemented. The streaming test mirrors the non-streaming test's cross-provider pattern with appropriate token budget reductions for smaller context models.
The `test_config` parameter is unused throughout this test. Consider removing it:
- def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model):
+ def test_16_extended_thinking_streaming(self, anthropic_client, provider, model):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`tests/integrations/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (34)
core/changelog.md (1 hunks)
core/providers/cohere/chat.go (1 hunks)
core/providers/gemini/chat.go (0 hunks)
core/providers/gemini/gemini.go (7 hunks)
core/providers/gemini/responses.go (18 hunks)
core/providers/gemini/types.go (3 hunks)
core/providers/gemini/utils.go (2 hunks)
core/providers/vertex/vertex.go (3 hunks)
core/version (1 hunks)
framework/version (1 hunks)
plugins/governance/changelog.md (1 hunks)
plugins/governance/version (1 hunks)
plugins/jsonparser/changelog.md (1 hunks)
plugins/jsonparser/version (1 hunks)
plugins/logging/changelog.md (1 hunks)
plugins/logging/version (1 hunks)
plugins/maxim/changelog.md (1 hunks)
plugins/maxim/version (1 hunks)
plugins/mocker/changelog.md (1 hunks)
plugins/mocker/version (1 hunks)
plugins/otel/changelog.md (1 hunks)
plugins/otel/version (1 hunks)
plugins/semanticcache/changelog.md (1 hunks)
plugins/semanticcache/version (1 hunks)
plugins/telemetry/changelog.md (1 hunks)
plugins/telemetry/version (1 hunks)
tests/integrations/config.yml (6 hunks)
tests/integrations/pyproject.toml (1 hunks)
tests/integrations/tests/test_anthropic.py (5 hunks)
tests/integrations/tests/test_bedrock.py (6 hunks)
tests/integrations/tests/test_google.py (8 hunks)
tests/integrations/tests/test_openai.py (5 hunks)
tests/integrations/tests/utils/common.py (2 hunks)
transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
- core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (2)
- plugins/jsonparser/changelog.md
- plugins/mocker/version
🚧 Files skipped from review as they are similar to previous changes (16)
- plugins/logging/version
- core/providers/gemini/utils.go
- plugins/semanticcache/changelog.md
- framework/version
- plugins/maxim/changelog.md
- plugins/otel/changelog.md
- plugins/governance/changelog.md
- plugins/logging/changelog.md
- core/providers/gemini/types.go
- core/providers/cohere/chat.go
- plugins/maxim/version
- core/providers/vertex/vertex.go
- plugins/telemetry/version
- plugins/otel/version
- plugins/telemetry/changelog.md
- tests/integrations/config.yml
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
plugins/mocker/changelog.md
tests/integrations/tests/test_openai.py
core/providers/gemini/gemini.go
tests/integrations/tests/test_bedrock.py
tests/integrations/tests/test_google.py
tests/integrations/pyproject.toml
core/version
plugins/semanticcache/version
plugins/governance/version
plugins/jsonparser/version
core/providers/gemini/responses.go
tests/integrations/tests/test_anthropic.py
tests/integrations/tests/utils/common.py
core/changelog.md
transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/gemini.go
core/providers/gemini/responses.go
🧬 Code graph analysis (5)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47)
format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
ShouldSendBackRawRequest (551-556)
ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_google.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47)
format_provider_model (50-65)
core/providers/gemini/responses.go (4)
core/providers/gemini/types.go (5)
Role (15-15), FunctionCall (1147-1157), FunctionResponse (1160-1182), Content (978-986), Type (783-783)
core/schemas/responses.go (3)
ResponsesInputMessageRoleUser (337-337), ResponsesMessage (318-331), ResponsesMessageContent (343-348)
ui/lib/types/logs.ts (3)
FunctionCall (157-160), ResponsesMessage (423-438), ResponsesMessageContent (400-400)
core/schemas/utils.go (1)
Ptr(16-18)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47)
format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
transports/changelog.md
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py
1387-1387: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_google.py
891-891: Unused method argument: test_config
(ARG002)
967-967: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_anthropic.py
668-668: Unused method argument: test_config
(ARG002)
748-748: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
plugins/semanticcache/version (1)
1-1: No review needed for this file—it contains no changes in this PR.
Likely an incorrect or invalid review comment.
plugins/mocker/changelog.md (1)
1-1: LGTM! The changelog entry accurately documents the coordinated version bumps to core 1.2.37 and framework 1.1.47, consistent with the broader version alignment across the PR stack.
plugins/governance/version (1)
1-1: LGTM! Version bump to 1.3.48 is properly coordinated with other plugin updates in the stack (jsonparser, logging also bumped to 1.3.48).
core/version (1)
1-1: LGTM! Core version bump to 1.2.37 is properly coordinated with framework and plugin updates across the stack.
transports/changelog.md (1)
6-7: LGTM! The new changelog entries accurately document the Gemini native API integration and fallback handling for custom models, aligning well with the PR objectives and broader stack changes.
tests/integrations/pyproject.toml (1)
35-35: The google-genai version constraint is valid. Version 1.50.0 exists on PyPI and has been released and patched (1.50.1), confirming it is a stable version. The constraint `>=1.50.0` appropriately enables support for Gemini 3 Pro and thinking/reasoning capabilities as described in the PR.
tests/integrations/tests/utils/common.py (1)
563-568: Good: unified OpenAI content extraction to handle string-or-list shapes.
Also applies to: 619-620
core/providers/gemini/gemini.go (2)
246-249: Good fix: `ParseAndSetRawRequest` avoids `[]byte` being JSON-marshaled as base64.
300-301: LGTM: raw-request flag plumbed through streaming and applied once at stream end.
Also applies to: 318-319, 484-486, 598-599, 616-617, 787-789
core/providers/gemini/responses.go (2)
720-735: Good: part processing order prioritizes thought → text and function_call before thought_signature.
1641-1656: Good: emit regular text parts separately after thought parts in Responses output conversion.
tests/integrations/tests/test_google.py (2)
478-478: LGTM! Token limits updated appropriately. The increased token limits align with the test objectives for generation configuration and system instruction tests.
Also applies to: 505-505
1055-1076: LGTM! Well-designed helper function. The function includes proper type checking, error handling, and clear documentation. The defensive programming approach with `hasattr` checks prevents attribute errors.
tests/integrations/tests/test_anthropic.py (1)
277-277: LGTM! Token limits increased appropriately. The increased token limits accommodate more complex test scenarios:
- End-to-end tool calling flows require more tokens for complete conversation cycles
- Streaming tests benefit from larger token budgets for validation
Also applies to: 597-597, 619-619
Force-pushed from aa6e9b3 to c93b41a (Compare)
Force-pushed from fd14a0e to 3154215 (Compare)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/providers/vertex/vertex.go (1)
501-525: Bug risk: Gemini non-streaming uses provider defaults for raw capture, but attaches based on ctx override.
In Gemini non-streaming branches, `HandleProviderResponse(..., provider.sendBackRawRequest, provider.sendBackRawResponse)` can produce `rawRequest`/`rawResponse` inconsistently with the later `ShouldSendBackRawRequest`/`ShouldSendBackRawResponse` checks. Also, `ModelDeployment` is assigned twice in the Gemini Responses branch.
@@
- rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(resp.Body(), &geminiResponse, jsonBody, provider.sendBackRawRequest, provider.sendBackRawResponse)
+ sendBackRawReq := providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest)
+ sendBackRawResp := providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse)
+ rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(resp.Body(), &geminiResponse, jsonBody, sendBackRawReq, sendBackRawResp)
@@
- if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) {
+ if sendBackRawReq {
      response.ExtraFields.RawRequest = rawRequest
  }
- if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
+ if sendBackRawResp {
      response.ExtraFields.RawResponse = rawResponse
  }
@@
- if request.Model != deployment {
-     response.ExtraFields.ModelDeployment = deployment
- }
+ // (remove duplicate ModelDeployment assignment)
Also applies to: 1011-1039
tests/integrations/tests/test_anthropic.py (1)
265-311: Assert `tool_use_id` is found before sending `tool_result` (avoid opaque downstream failures).
 tool_use_id = None
 for content in response.content:
     if content.type == "tool_use":
         tool_use_id = content.id
         break
+assert tool_use_id is not None, "Expected a tool_use block with an id"
147-195: Fix double-base64 encoding of `ThoughtSignature` when round-tripping via `EncryptedContent`.
Lines 237 and 2168 assign `[]byte(*EncryptedContent)` directly to `part.ThoughtSignature`. Since `EncryptedContent` is a base64-encoded string and `ThoughtSignature` is a `[]byte` field, the JSON marshaler will base64-encode it again during serialization, corrupting the signature for Gemini 3 Pro flows.
Decode the base64 string before assigning to `ThoughtSignature`:
+import "encoding/base64"
+import "strings"
...
- part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent)
+ enc := strings.TrimSpace(*msg.ResponsesReasoning.EncryptedContent)
+ if decoded, err := base64.StdEncoding.DecodeString(enc); err == nil {
+     part.ThoughtSignature = decoded
+ } else {
+     // fallback (keeps backward-compat if EncryptedContent was not base64)
+     part.ThoughtSignature = []byte(enc)
+ }
Apply the same fix at line 2168 in `convertResponsesMessagesToGeminiContents`.
♻️ Duplicate comments (3)
transports/changelog.md (1)
5-5: Hyphenate "inter-provider" for readability/consistency (duplicate).
-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
core/changelog.md (1)
4-4: Unresolved: Hyphenate "inter-provider" for consistency. This issue was flagged in a previous review but remains unfixed. Apply this correction:
-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations
tests/integrations/tests/test_openai.py (1)
1385-1391: Parametrized "thinking" test should skip the `_no_providers_` sentinel + mark unused fixture.
This matches earlier review feedback; without it, the test can run in "no providers configured" mode and Ruff will keep flagging `test_config`.
@@
 @skip_if_no_api_key("openai")
 @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking"))
-def test_38_responses_reasoning(self, openai_client, test_config, provider, model):
+def test_38_responses_reasoning(self, openai_client, _test_config, provider, model):
+    if provider == "_no_providers_" or model == "_no_model_":
+        pytest.skip("No providers configured for this scenario")
     """Test Case 38: Responses API with reasoning (gpt-5 model)"""
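The sentinel check can be factored into a tiny predicate so each parametrized test stays one line; the helper name is an assumption, and the sentinel values come from the suggested diff:

```python
NO_PROVIDER_SENTINELS = {"_no_providers_", "_no_model_"}

def should_skip(provider: str, model: str) -> bool:
    """True when a cross-provider scenario resolved to the no-providers sentinel."""
    return provider in NO_PROVIDER_SENTINELS or model in NO_PROVIDER_SENTINELS

# In a test body: if should_skip(provider, model): pytest.skip("No providers configured")
assert should_skip("_no_providers_", "_no_model_")
assert not should_skip("openai", "gpt-5")
```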
🧹 Nitpick comments (3)
tests/integrations/tests/utils/common.py (1)
1878-1891: `get_content_string` is a solid, safe helper; consider one extra fallback.
If you expect SDK objects/dicts where the text is nested (e.g., `{"type":"text","text":"..."}` is already handled, but some SDKs may nest under another key), you may want to extend parsing later—but current behavior is safe (returns `""`).
1131-1135: `max_output_tokens=1000` may make CI slower/flakier than needed.
Consider tightening the prompts + lowering token caps (or making caps configurable) to reduce latency/cost/rate-limit noise in cross-provider CI.Also applies to: 1166-1170, 1198-1202, 1294-1299
1392-1524: Cross-provider “reasoning” fallback may miss some unsupported-parameter failures.
Right now the fallback only triggers when the exception string contains "reasoning" or "not supported". For non-OpenAI providers, failures may mention `include`, `unknown field`, etc. Consider broadening the fallback condition (or explicitly skipping providers that don't advertise this capability) to reduce false negatives.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`tests/integrations/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (35)
core/changelog.md (1 hunks)
core/providers/cohere/chat.go (1 hunks)
core/providers/gemini/chat.go (0 hunks)
core/providers/gemini/gemini.go (7 hunks)
core/providers/gemini/responses.go (23 hunks)
core/providers/gemini/types.go (3 hunks)
core/providers/gemini/utils.go (3 hunks)
core/providers/vertex/vertex.go (3 hunks)
core/version (1 hunks)
framework/changelog.md (1 hunks)
framework/version (1 hunks)
plugins/governance/changelog.md (1 hunks)
plugins/governance/version (1 hunks)
plugins/jsonparser/changelog.md (1 hunks)
plugins/jsonparser/version (1 hunks)
plugins/logging/changelog.md (1 hunks)
plugins/logging/version (1 hunks)
plugins/maxim/changelog.md (1 hunks)
plugins/maxim/version (1 hunks)
plugins/mocker/changelog.md (1 hunks)
plugins/mocker/version (1 hunks)
plugins/otel/changelog.md (1 hunks)
plugins/otel/version (1 hunks)
plugins/semanticcache/changelog.md (1 hunks)
plugins/semanticcache/version (1 hunks)
plugins/telemetry/changelog.md (1 hunks)
plugins/telemetry/version (1 hunks)
tests/integrations/config.yml (6 hunks)
tests/integrations/pyproject.toml (1 hunks)
tests/integrations/tests/test_anthropic.py (5 hunks)
tests/integrations/tests/test_bedrock.py (6 hunks)
tests/integrations/tests/test_google.py (8 hunks)
tests/integrations/tests/test_openai.py (5 hunks)
tests/integrations/tests/utils/common.py (2 hunks)
transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
- core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (5)
- plugins/otel/changelog.md
- plugins/governance/version
- plugins/mocker/changelog.md
- plugins/telemetry/changelog.md
- plugins/maxim/version
🚧 Files skipped from review as they are similar to previous changes (12)
- core/providers/gemini/utils.go
- framework/version
- plugins/logging/version
- plugins/telemetry/version
- framework/changelog.md
- plugins/jsonparser/version
- tests/integrations/pyproject.toml
- tests/integrations/tests/test_bedrock.py
- core/providers/cohere/chat.go
- core/version
- plugins/semanticcache/version
- plugins/logging/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/providers/vertex/vertex.go
tests/integrations/tests/utils/common.py
plugins/maxim/changelog.md
plugins/jsonparser/changelog.md
core/changelog.md
core/providers/gemini/gemini.go
tests/integrations/tests/test_openai.py
plugins/otel/version
tests/integrations/tests/test_google.py
tests/integrations/config.yml
plugins/semanticcache/changelog.md
plugins/governance/changelog.md
core/providers/gemini/types.go
core/providers/gemini/responses.go
plugins/mocker/version
tests/integrations/tests/test_anthropic.py
transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/vertex/vertex.gocore/providers/gemini/gemini.gocore/providers/gemini/types.gocore/providers/gemini/responses.go
🧬 Code graph analysis (5)
core/providers/vertex/vertex.go (2)
core/providers/utils/utils.go (2)
ShouldSendBackRawRequest (551-556)
ShouldSendBackRawResponse (559-564)
core/schemas/models.go (1)
Model(109-129)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
ShouldSendBackRawRequest (551-556)
ParseAndSetRawRequest (447-454)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47)
format_provider_model (50-65)
core/providers/gemini/responses.go (3)
core/providers/gemini/types.go (5)
Role (15-15), FunctionCall (1147-1157), FunctionResponse (1160-1182), Content (978-986), Type (783-783)
core/schemas/responses.go (7)
ResponsesInputMessageRoleUser (337-337), ResponsesMessage (318-331), ResponsesInputMessageRoleAssistant (336-336), ResponsesMessageContent (343-348), ResponsesMessageContentBlock (403-415), ResponsesOutputMessageContentTypeText (396-396), ResponsesMessageTypeMessage (294-294)
core/schemas/utils.go (1)
Ptr(16-18)
tests/integrations/tests/test_anthropic.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47)
format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
transports/changelog.md
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py
1387-1387: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_google.py
891-891: Unused method argument: test_config
(ARG002)
967-967: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_anthropic.py
668-668: Unused method argument: test_config
(ARG002)
748-748: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (16)
plugins/mocker/version (1)
1-1: Version bump looks fine; verify stack-wide version coherence. Given this PR is part of a Graphite stack, please ensure `plugins/mocker` 1.3.47 is consistent with the other version bumps in the stack (core/framework) and any release automation that consumes these version files.
plugins/maxim/changelog.md (1)
1-1: Changelog entry looks fine; just ensure it matches the file's convention/section placement. Given this is a single list item, please confirm it's under the correct heading (e.g., Unreleased / version block) and ordering style used elsewhere in `plugins/maxim/changelog.md`.
plugins/governance/changelog.md (1)
1-1: Changelog entry looks fine—please verify it matches the stacked version bumps and placement conventions. Since this PR is part of a Graphite stack, double-check that `core 1.2.37` / `framework 1.1.47` are the exact versions bumped in the stack and that this bullet is in the correct section/order for `plugins/governance/changelog.md`.
core/changelog.md (1)
5-6: LGTM! The new entries are well-formed and properly reflect the Gemini provider enhancements in this PR.
plugins/jsonparser/changelog.md (1)
1-1: Confirm changelog scope: version-only vs. substantive changes. The entry documents version bumps (`core 1.2.37` and `framework 1.1.47`) but does not capture the substantive features and fixes mentioned in the PR objectives (e.g., Gemini thinking/reasoning support, integration test fixes, role mapping improvements). Verify whether this is intentional—i.e., whether `plugins/jsonparser/changelog.md` is meant to track only version updates for the jsonparser plugin, with provider/core changes documented in separate changelogs. If broader changelog coverage is expected here, the entry should be expanded to reflect the actual changes.
plugins/otel/version (1)
1-1: LGTM! Version bump is appropriate. The patch version increment (1.0.46 → 1.0.47) aligns with the bug fixes and integration test improvements in this PR.
tests/integrations/tests/utils/common.py (1)
564-568: Good normalization of OpenAI “content” (string vs list).
Switching to `get_content_string(...)` removes a bunch of edge cases vs `(content or "")` and should make the assertions more robust across providers.

Also applies to: 619-620
core/providers/gemini/types.go (1)
920-958: Dual-case `thinkingConfig` unmarshalling looks correct.
Preferring camelCase and falling back to snake_case is the right approach for “GenAI SDK vs Gemini API” compatibility.

core/providers/vertex/vertex.go (1)
729-745: Nice: Vertex Gemini streaming now respects ctx override for raw request capture.
Threading `providerUtils.ShouldSendBackRawRequest(...)` into the shared Gemini streaming handlers aligns with the “raw request” feature behavior in other paths.

Also applies to: 1218-1233
core/providers/gemini/gemini.go (2)
246-249: Good fix: `ParseAndSetRawRequest` preserves JSON “object-like” shape vs base64-encoded bytes.
This aligns ChatCompletion RawRequest behavior with other providers/operations.
293-307: Streaming raw-request propagation looks consistent; final-chunk attachment is a good tradeoff.
Flag threading + `ParseAndSetRawRequest(&response.ExtraFields, jsonBody)` on the terminal event is coherent and avoids noisy per-chunk payloads.

Also applies to: 318-320, 483-489, 591-605, 616-618, 786-789
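The distinction behind `ParseAndSetRawRequest` — raw bytes vs. a preserved JSON object — can be shown with the stdlib alone. The struct names below are illustrative, not Bifrost's actual types: `encoding/json` base64-encodes `[]byte` fields, while `json.RawMessage` embeds the payload verbatim.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawAsBytes mimics storing the request body as raw bytes: encoding/json
// base64-encodes []byte fields, so the payload is no longer inspectable JSON.
type rawAsBytes struct {
	RawRequest []byte `json:"raw_request"`
}

// rawAsJSON keeps the body as json.RawMessage, which is embedded verbatim,
// preserving the object shape that ParseAndSetRawRequest aims for.
type rawAsJSON struct {
	RawRequest json.RawMessage `json:"raw_request"`
}

func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	body := []byte(`{"model":"gemini-3-pro-preview"}`)
	fmt.Println(encode(rawAsBytes{RawRequest: body})) // opaque base64 blob
	fmt.Println(encode(rawAsJSON{RawRequest: body}))  // readable JSON object
}
```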
tests/integrations/config.yml (1)
44-45: Please verify the “thinking” model IDs exist + are accessible in CI (preview model names can break quickly).
If any are not universally available, consider an env-override (or a fallback list) to keep the suite stable.

Also applies to: 56-57, 67-68, 83-84, 125-126, 175-176
core/providers/gemini/responses.go (2)
523-525: Nice: monotonic `output_index` + cached `TextOutputIndex` should prevent collisions with “thoughts before text”.
The helper methods make the stream lifecycle much easier to reason about.

Also applies to: 553-554, 568-569, 603-604, 616-640, 720-730, 753-795, 820-827, 1282-1284
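The index pattern described above can be sketched in a few lines. This is a trimmed-down, hypothetical version of the stream state, not the actual Bifrost struct: every item takes the next monotonic index, and the text item's index is cached so interleaved thought parts cannot steal it.

```go
package main

import "fmt"

// streamState is a hypothetical simplification of the responses stream state:
// indices are handed out monotonically, and the text item's index is cached.
type streamState struct {
	next            int
	textOutputIndex int // -1 until the text item is opened
}

func newStreamState() *streamState {
	return &streamState{textOutputIndex: -1}
}

// nextOutputIndex hands out indices strictly in order of item creation.
func (s *streamState) nextOutputIndex() int {
	idx := s.next
	s.next++
	return idx
}

// textIndex opens the text item lazily and reuses its cached index afterwards.
func (s *streamState) textIndex() int {
	if s.textOutputIndex == -1 {
		s.textOutputIndex = s.nextOutputIndex()
	}
	return s.textOutputIndex
}

func main() {
	s := newStreamState()
	thought := s.nextOutputIndex()     // a "thought" item arrives first
	text1 := s.textIndex()             // then streaming text opens its item
	text2 := s.textIndex()             // later text deltas reuse the same index
	fmt.Println(thought, text1, text2) // 0 1 1
}
```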
1849-1877: ThinkingConfig mapping change is sensible (avoid conflicting “effort” vs max-tokens intent).
GuardingThinkingLevelwhenReasoning.MaxTokensis provided should reduce surprising configs.tests/integrations/tests/test_anthropic.py (1)
667-686: `extra_body` / `reasoning_summary` usage: please confirm Anthropic SDK + Bifrost passthrough compatibility.
This is a good test, but it’s tightly coupled to SDK request-shape support.

tests/integrations/tests/test_google.py (1)
889-937: Thinking tests look solid; please confirm Google GenAI SDK field names are correct for the pinned version.
In particular: `types.ThinkingConfig(... thinking_budget=...)` and streaming `usage_metadata.thoughts_token_count` availability.

Also applies to: 965-1032
Force-pushed 59dee28 to bbd6304
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/providers/gemini/responses.go (2)
235-239: Critical: `EncryptedContent` is base64 text; it must be decoded before assigning `Part.ThoughtSignature`.
Right now you’re doing[]byte(*EncryptedContent), which passes ASCII base64 bytes to Gemini instead of the original signature bytes (breaks Gemini 3 Pro thought-signature handling).@@ - if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil { - part.ThoughtSignature = []byte(*msg.ResponsesReasoning.EncryptedContent) - } + if msg.ResponsesReasoning != nil && msg.ResponsesReasoning.EncryptedContent != nil { + if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil { + part.ThoughtSignature = decoded + } + } @@ - if msg.ResponsesReasoning.EncryptedContent != nil { - currentParts = append(currentParts, &Part{ - ThoughtSignature: []byte(*msg.ResponsesReasoning.EncryptedContent), - }) - } + if msg.ResponsesReasoning.EncryptedContent != nil { + if decoded, err := base64.StdEncoding.DecodeString(*msg.ResponsesReasoning.EncryptedContent); err == nil { + currentParts = append(currentParts, &Part{ThoughtSignature: decoded}) + } + } @@ - if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.EncryptedContent != nil { - candidate.Content.Parts = append(candidate.Content.Parts, &Part{ - ThoughtSignature: []byte(*bifrostResp.Item.ResponsesReasoning.EncryptedContent), - }) - } + if bifrostResp.Item != nil && bifrostResp.Item.ResponsesReasoning != nil && bifrostResp.Item.ResponsesReasoning.EncryptedContent != nil { + if decoded, err := base64.StdEncoding.DecodeString(*bifrostResp.Item.ResponsesReasoning.EncryptedContent); err == nil { + candidate.Content.Parts = append(candidate.Content.Parts, &Part{ThoughtSignature: decoded}) + } + } @@ - if nextMsg.Type != nil && *nextMsg.Type == schemas.ResponsesMessageTypeReasoning && - nextMsg.ResponsesReasoning != nil && nextMsg.ResponsesReasoning.EncryptedContent != nil { - part.ThoughtSignature = []byte(*nextMsg.ResponsesReasoning.EncryptedContent) - } + if nextMsg.Type != nil && *nextMsg.Type == 
schemas.ResponsesMessageTypeReasoning && + nextMsg.ResponsesReasoning != nil && nextMsg.ResponsesReasoning.EncryptedContent != nil { + if decoded, err := base64.StdEncoding.DecodeString(*nextMsg.ResponsesReasoning.EncryptedContent); err == nil { + part.ThoughtSignature = decoded + } + }Also applies to: 285-289, 422-427, 2162-2169
753-795: Text-item caching is good; add a small guard for `TextOutputIndex` / `ItemIDs` to avoid silent empty IDs.
In `processGeminiTextPart`, `itemID := state.ItemIDs[outputIndex]` will yield `""` if something goes off-track; similarly `closeGeminiTextItem` assumes `TextOutputIndex` is valid. A defensive check (and early return) would make debugging much easier.

Also applies to: 1229-1230, 1282-1284
♻️ Duplicate comments (7)
core/changelog.md (1)
4-4: Hyphenate “inter-provider” for consistency/readability (LanguageTool).
This was already flagged previously; please apply the hyphenation.

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations

plugins/semanticcache/changelog.md (1)
1-1: Verify changelog entries in core and framework changelogs. This changelog entry documents the version bumps (core v1.2.37, framework v1.1.47), but as flagged in the previous review, corresponding entries should also be added to `core/changelog.md` and `framework/changelog.md` to maintain consistency across all component changelogs.

#!/bin/bash
# Verify if core and framework changelogs contain entries for the version bumps
echo "=== Checking core/changelog.md for version 1.2.37 ==="
if [ -f "core/changelog.md" ]; then
  grep -A 2 "1\.2\.37" core/changelog.md || echo "⚠️ No entry found for 1.2.37"
else
  echo "⚠️ core/changelog.md not found"
fi
echo ""
echo "=== Checking framework/changelog.md for version 1.1.47 ==="
if [ -f "framework/changelog.md" ]; then
  grep -A 2 "1\.1\.47" framework/changelog.md || echo "⚠️ No entry found for 1.1.47"
else
  echo "⚠️ framework/changelog.md not found"
fi
echo ""
echo "=== Checking if both files exist ==="
ls -la core/changelog.md framework/changelog.md 2>&1

tests/integrations/tests/test_google.py (1)
889-909: Thinking tests: add sentinel skip + mark unused `test_config`.
Both tests can receive("_no_providers_", "_no_model_")and should skip like other parametrized tests. Also Ruff ARG002 applies.@skip_if_no_api_key("google") @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking")) -def test_26_extended_thinking(self, google_client, test_config, provider, model): +def test_26_extended_thinking(self, google_client, _test_config, provider, model): + if provider == "_no_providers_" or model == "_no_model_": + pytest.skip("No providers configured for this scenario") """Test Case 26: Extended thinking/reasoning (non-streaming)""" from google.genai import types @@ @skip_if_no_api_key("google") @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking")) -def test_27_extended_thinking_streaming(self, google_client, test_config, provider, model): +def test_27_extended_thinking_streaming(self, google_client, _test_config, provider, model): + if provider == "_no_providers_" or model == "_no_model_": + pytest.skip("No providers configured for this scenario") """Test Case 27: Extended thinking/reasoning (streaming)""" from google.genai import typesAlso applies to: 965-985
tests/integrations/tests/test_anthropic.py (1)
667-686: Cross-provider thinking: verify `extra_body.reasoning_summary` is accepted for all providers; also mark unused `test_config`.
If some backends reject unknown request fields, you may need to gateextra_bodyby provider. Also Ruff ARG002 applies.@pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking")) -def test_15_extended_thinking(self, anthropic_client, test_config, provider, model): +def test_15_extended_thinking(self, anthropic_client, _test_config, provider, model): @@ @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking")) -def test_16_extended_thinking_streaming(self, anthropic_client, test_config, provider, model): +def test_16_extended_thinking_streaming(self, anthropic_client, _test_config, provider, model):#!/bin/bash set -euo pipefail # Find where Bifrost/Anthropic path handles extra_body passthrough, and whether unknown fields are dropped or validated. rg -n --type=go "extra_body|reasoning_summary|ParseAndSetRawRequest|unknown fields|ExtraParams" -S core | head -200Also applies to: 747-765
transports/changelog.md (1)
5-5: Hyphenate “inter-provider” for consistency.
This matches the prior LanguageTool finding.

-enhancement: improved internal inter provider conversions for integrations
+enhancement: improved internal inter-provider conversions for integrations

tests/integrations/tests/test_bedrock.py (1)
164-167: Unconditional skip needs a reason + tracking reference.- pytest.skip("Skipping text completion invoke test") + # TODO(<issue-id>): reason (e.g., flaky upstream / unsupported for some models / auth constraints) + pytest.skip("Skipping text completion invoke test - TODO(<issue-id>): <reason>")tests/integrations/tests/test_openai.py (1)
1385-1391: Fix thinking test: skip sentinel tuple + silence unused `test_config`.
Right nowget_cross_provider_params_for_scenario("thinking")can return("_no_providers_", "_no_model_"), and this test doesn’t skip it (unlike others). Also Ruff flagstest_configunused.@skip_if_no_api_key("openai") @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("thinking")) -def test_38_responses_reasoning(self, openai_client, test_config, provider, model): +def test_38_responses_reasoning(self, openai_client, _test_config, provider, model): + if provider == "_no_providers_" or model == "_no_model_": + pytest.skip("No providers configured for this scenario") """Test Case 38: Responses API with reasoning (gpt-5 model)""" # Use gpt-5 reasoning model model_to_use = format_provider_model(provider, model)
🧹 Nitpick comments (2)
tests/integrations/tests/test_bedrock.py (1)
210-216: Token limit bump to 500: OK, but consider centralizing.
Hard-coding `maxTokens=500` across tests can drift; if you already have `test_settings.max_tokens` in `tests/integrations/config.yml`, prefer reading from config/util to keep providers aligned.

Also applies to: 249-255, 296-301, 684-690, 726-731
core/providers/gemini/responses.go (1)
523-525: Stream state/index refactor looks solid (monotonic `nextOutputIndex`, cached `TextOutputIndex`).
The helper split (`closeTextItemIfOpen`, `nextOutputIndex`, `generateItemID`) makes ordering changes (“thoughts before text”) much safer. Only nit: `MessageID := fmt.Sprintf("msg_%d", state.CreatedAt)` can collide across streams started in the same second—consider a higher-entropy suffix if IDs are externally visible/meaningful.

Also applies to: 553-554, 568-569, 603-604, 616-640
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`tests/integrations/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (35)
- core/changelog.md (1 hunks)
- core/providers/cohere/chat.go (1 hunks)
- core/providers/gemini/chat.go (0 hunks)
- core/providers/gemini/gemini.go (7 hunks)
- core/providers/gemini/responses.go (23 hunks)
- core/providers/gemini/types.go (2 hunks)
- core/providers/gemini/utils.go (3 hunks)
- core/providers/vertex/vertex.go (3 hunks)
- core/version (1 hunks)
- framework/changelog.md (1 hunks)
- framework/version (1 hunks)
- plugins/governance/changelog.md (1 hunks)
- plugins/governance/version (1 hunks)
- plugins/jsonparser/changelog.md (1 hunks)
- plugins/jsonparser/version (1 hunks)
- plugins/logging/changelog.md (1 hunks)
- plugins/logging/version (1 hunks)
- plugins/maxim/changelog.md (1 hunks)
- plugins/maxim/version (1 hunks)
- plugins/mocker/changelog.md (1 hunks)
- plugins/mocker/version (1 hunks)
- plugins/otel/changelog.md (1 hunks)
- plugins/otel/version (1 hunks)
- plugins/semanticcache/changelog.md (1 hunks)
- plugins/semanticcache/version (1 hunks)
- plugins/telemetry/changelog.md (1 hunks)
- plugins/telemetry/version (1 hunks)
- tests/integrations/config.yml (6 hunks)
- tests/integrations/pyproject.toml (1 hunks)
- tests/integrations/tests/test_anthropic.py (5 hunks)
- tests/integrations/tests/test_bedrock.py (6 hunks)
- tests/integrations/tests/test_google.py (8 hunks)
- tests/integrations/tests/test_openai.py (5 hunks)
- tests/integrations/tests/utils/common.py (2 hunks)
- transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (1)
- core/providers/gemini/chat.go
✅ Files skipped from review due to trivial changes (3)
- plugins/semanticcache/version
- core/version
- plugins/maxim/changelog.md
🚧 Files skipped from review as they are similar to previous changes (18)
- plugins/logging/version
- plugins/maxim/version
- core/providers/cohere/chat.go
- core/providers/gemini/utils.go
- plugins/otel/version
- core/providers/gemini/types.go
- framework/version
- plugins/mocker/version
- core/providers/vertex/vertex.go
- plugins/jsonparser/version
- framework/changelog.md
- plugins/telemetry/changelog.md
- plugins/mocker/changelog.md
- tests/integrations/tests/utils/common.py
- plugins/otel/changelog.md
- plugins/jsonparser/changelog.md
- plugins/logging/changelog.md
- plugins/governance/version
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/changelog.md
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/test_anthropic.py
- core/providers/gemini/gemini.go
- core/providers/gemini/responses.go
- tests/integrations/config.yml
- tests/integrations/pyproject.toml
- tests/integrations/tests/test_bedrock.py
- plugins/telemetry/version
- plugins/semanticcache/changelog.md
- plugins/governance/changelog.md
- tests/integrations/tests/test_google.py
- transports/changelog.md
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
- core/providers/gemini/gemini.go
- core/providers/gemini/responses.go
🧬 Code graph analysis (4)
tests/integrations/tests/test_openai.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
core/providers/gemini/gemini.go (1)
core/providers/utils/utils.go (2)
ShouldSendBackRawRequest (551-556), ParseAndSetRawRequest (447-454)
core/providers/gemini/responses.go (3)
core/providers/gemini/types.go (5)
Role (15-15), FunctionCall (1144-1154), FunctionResponse (1157-1179), Content (975-983), Type (780-780)
core/schemas/responses.go (3)
ResponsesMessage (318-331), ResponsesMessageContent (343-348), ResponsesMessageTypeMessage (294-294)
core/schemas/utils.go (1)
Ptr(16-18)
tests/integrations/tests/test_google.py (1)
tests/integrations/tests/utils/parametrize.py (2)
get_cross_provider_params_for_scenario (12-47), format_provider_model (50-65)
🪛 LanguageTool
core/changelog.md
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
transports/changelog.md
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...api enhancement: improved internal inter provider conversions for integrations fe...
(QB_NEW_EN_HYPHEN)
🪛 Ruff (0.14.8)
tests/integrations/tests/test_openai.py
1387-1387: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_anthropic.py
668-668: Unused method argument: test_config
(ARG002)
748-748: Unused method argument: test_config
(ARG002)
tests/integrations/tests/test_google.py
891-891: Unused method argument: test_config
(ARG002)
967-967: Unused method argument: test_config
(ARG002)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
plugins/telemetry/version (1)
1-1: Version bump looks fine; please confirm stack-wide version alignment. Given this PR is part of a Graphite stack, please double-check the telemetry plugin version `1.3.47` matches the repo’s overall versioning/release metadata expectations for the entire stack (e.g., any plugin index/manifest/changelog conventions, if applicable). As per coding guidelines, reviewing in light of the whole stack.

plugins/governance/changelog.md (1)
1-1: Verify changelog entry completeness and format consistency. The entry mentions version bumps (core 1.2.37, framework 1.1.47) but doesn't detail the specific improvements (thinking/reasoning enhancements, role mapping fixes, etc.) introduced in this PR. Verify whether this minimal entry is appropriate for the governance plugin changelog (i.e., if only dependencies changed) or if more detail is needed to match the project's changelog conventions.
tests/integrations/tests/test_google.py (1)
475-480: `max_output_tokens` bump seems fine; confirm it matches intended coverage/cost.
No correctness issues seen.

Also applies to: 501-507
tests/integrations/tests/test_anthropic.py (1)
273-279: Token bumps look reasonable for avoiding truncation.
No correctness issues.

Also applies to: 594-600, 616-622
core/providers/gemini/gemini.go (3)
246-249: Correct raw-request handling (avoid `[]byte` → base64 in JSON).
Using `providerUtils.ParseAndSetRawRequest` here aligns the raw request shape with other endpoints.
293-307: Raw request flag plumbing in chat streaming looks correct.
Attaching the parsed raw request only on the terminal chunk is a good tradeoff to avoid repeated payload bloat.

Also applies to: 311-325, 483-490
591-605: Raw request flag plumbing in responses streaming looks correct.
Same terminal-event behavior applied consistently here.

Also applies to: 609-623, 786-794
tests/integrations/pyproject.toml (1)
35-35: Verify `google-genai>=1.50.0` compatibility with your streaming and type expectations. Ensure the version bump doesn't break any reliance on `generate_content_stream` behavior or type shapes your tests depend on (e.g., `usage_metadata`, `thoughts_token_count` fields).

tests/integrations/config.yml (1)
44-45: File may not exist in the current repository state. The review comment references `tests/integrations/config.yml` at lines 44-45 and additional lines (56-57, 67-68, 83-84, 125-126, 175-176), but this file could not be located in the repository. This suggests either:
- The file is being added in a staged/pending change not yet visible in the working tree
- The file path may be incorrect
- The review is referencing a change that hasn't been made available to the verification environment
If the file exists and contains new thinking model IDs (gpt-5.1, claude-opus-4-5, gemini-3-pro-preview, etc.), the original concern remains valid: Configuration files with model definitions should be validated against any model_capabilities or provider support logic to ensure these new model IDs are recognized. However, this cannot be verified without access to the actual file and related validation code.
tests/integrations/tests/test_openai.py (1)
1131-1135: `max_output_tokens=1000` parameter is fully supported in openai SDK 1.30.0+. The minimum required version in `pyproject.toml` (`openai>=1.30.0`) includes full support for the Responses API with the `max_output_tokens` parameter. All usages across the test cases (lines 1131–1135, 1166–1170, 1198–1202, 1294–1299) are correct and compatible.

core/providers/gemini/responses.go (4)
174-182: Role mapping tweak looks intentional; verify no “system/developer” output messages are mis-bucketed.
Gemini only supports `"user"` / `"model"`, so defaulting everything except `ResponsesInputMessageRoleUser` to `"model"` is reasonable—just ensure `bifrostResp.Output` never includes system-ish roles that should instead become `SystemInstruction`.
720-730: Processing order (thoughts before text; function calls before signatures) matches the PR intent.
This is consistent with “prioritize thoughts before regular text” and the text-close-before-non-text pattern you added.
1642-1657: Nice: explicit “regular text” candidate conversion + reasoning effort guard.
The added `case part.Text != ""` in `convertGeminiCandidatesToResponsesOutput` makes mixed-part candidates far clearer, and the “only set thinking level when max tokens isn’t set” guard prevents conflicting thinking config.

Also applies to: 1865-1867
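The guard can be sketched as follows; the type and field names here are illustrative, not Bifrost's actual schemas. The point is that when a thinking-token budget is supplied, the effort level is intentionally dropped so the provider never receives both.

```go
package main

import "fmt"

// reasoningParams is a hypothetical request shape: callers may set an effort
// level, a thinking-token budget, or both.
type reasoningParams struct {
	Effort    string
	MaxTokens int
}

// thinkingConfig mirrors the guard described above: only one of the two
// fields is ever populated, avoiding a conflicting "effort + budget" combo.
type thinkingConfig struct {
	ThinkingLevel  string
	ThinkingBudget int
}

func buildThinkingConfig(p reasoningParams) thinkingConfig {
	cfg := thinkingConfig{}
	if p.MaxTokens > 0 {
		cfg.ThinkingBudget = p.MaxTokens
		return cfg // effort level is intentionally dropped
	}
	cfg.ThinkingLevel = p.Effort
	return cfg
}

func main() {
	fmt.Println(buildThinkingConfig(reasoningParams{Effort: "high"}))
	fmt.Println(buildThinkingConfig(reasoningParams{Effort: "high", MaxTokens: 1024}))
}
```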
968-970: Address sonic.Marshal behavioral differences and add error handling. The code at lines 968–970 and 1663–1665 silently ignores `sonic.Marshal` errors. Two issues:

HTML escaping mismatch: sonic v1.14.1 does NOT escape HTML by default (unlike `encoding/json`). If tool arguments contain HTML-like characters (`<`, `>`, `&`, etc.), sonic's output will differ. Use `sonic.ConfigStd` or an encoder with `EscapeHTML` enabled if byte-for-byte parity is required.

Silent error suppression: Errors are ignored, leaving `argsJSON` as an empty string if marshaling fails. This masks real issues (circular references, unsupported types) and is inconsistent with error handling elsewhere in the codebase (logging/returning errors in other sonic.Marshal calls). Log or return the error instead of silently dropping it.
Force-pushed bbd6304 to 78f731b
Force-pushed 3154215 to d5b03f0
Force-pushed 78f731b to 3475937
Merge activity
Force-pushed d5b03f0 to 828c298
Force-pushed 3475937 to 595f34e
Force-pushed 595f34e to 5b9f593
…ty in custom providers
Force-pushed 5b9f593 to 3f22dc6

Summary
Enhances Gemini provider support for reasoning/thinking capabilities, improves thought signature handling, and fixes response processing order for better compatibility with Gemini 3 Pro.
Changes
Type of change
Affected areas
How to test
Test the Gemini provider with reasoning/thinking capabilities:
Breaking changes
Related issues
Improves compatibility with Gemini 3 Pro and its reasoning capabilities.
Security considerations
No security implications as this is enhancing existing provider functionality.
Checklist