fix: integration test fixes and support responseSchema in Gemini JSON response format #1255
Conversation
📝 Walkthrough (Summary by CodeRabbit)
`buildOpenAIResponseFormat` gained an optional `Schema` parameter and now marshals, unmarshals, and normalizes response schemas as a fallback to `responseJsonSchema`; Gemini conversion stops setting tool names on function responses; configs add Vertex/Cohere entries; tests and utilities expand streaming/thinking handling; and Anthropic now accepts `extra_body` for streaming.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed.
This stack of pull requests is managed by Graphite.
🧪 Test Suite Available: this PR can be tested by a repository admin.
Force-pushed from d3ef392 to b2315af.
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @tests/integrations/tests/test_openai.py:
- Around line 931-938: The inline comment above the similarity assertions is
inconsistent (it says "> 0.7" but the assertions use "> 0.6"); update the
comment near the similarity checks (around the assertions for similarity_1_2,
similarity_1_3, similarity_2_3 in the test file) to state the actual threshold
being enforced ("> 0.6") or change the assertions to > 0.7 if you intend that
higher threshold—ensure the comment and the assertions (similarity_1_2,
similarity_1_3, similarity_2_3) match.
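For illustration, a minimal sketch of the aligned comment and assertions the instruction asks for (a self-contained toy: `cosine_similarity` and the placeholder embedding vectors are hypothetical stand-ins, not the test's actual helpers or data):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for the three embeddings returned by the API.
embedding_1, embedding_2, embedding_3 = [1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [1.0, 0.8, 0.3]

# Similar texts should have high similarity (> 0.6); the comment now states
# the same threshold that the assertions below actually enforce.
for pair in [(embedding_1, embedding_2), (embedding_1, embedding_3), (embedding_2, embedding_3)]:
    similarity = cosine_similarity(*pair)
    assert similarity > 0.6, f"Expected similarity > 0.6, got {similarity}"
```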
🧹 Nitpick comments (2)
core/providers/gemini/utils.go (1)
165-165: Response schema handling for Gemini→Responses looks correct; watch for non-map ResponseJSONSchema
The new `buildOpenAIResponseFormat(responseJsonSchema, responseSchema)` logic and the updated call:
- Prefer an explicit `ResponseJSONSchema` (must be a `map[string]interface{}`) and only fall back to `ResponseSchema` when the JSON schema map is absent.
- When only `ResponseSchema` is present, marshalling to a map plus `convertTypeToLowerCase` and `buildJSONSchemaFromMap` gives you a reasonable OpenAI-style `json_schema` format, including lower-cased `"type"` fields.
- If `ResponseJSONSchema` is ever set to a non-map value (e.g., a typed struct), this will silently degrade to `Format.Type = "json_object"` rather than `json_schema`.
Functionally this adds the desired support for both `responseJsonSchema` and `responseSchema` while keeping existing behavior intact; just ensure all writers of `ResponseJSONSchema` in the stack keep using a plain `map[string]interface{}` to avoid accidental fallback to `json_object`.
Also applies to: 1068-1127
tests/integrations/tests/test_openai.py (1)
1018-1021: Update comment to reflect bedrock exclusion.
The code now excludes both `gemini` and `bedrock` from usage data checks, but the comment only mentions gemini and openai. Consider updating the comment to clarify why bedrock is excluded.
🔎 Suggested comment fix

```diff
-        if provider != "gemini" and provider != "bedrock":  # gemini does not return usage data and openai does not return usage data for long text
+        if provider != "gemini" and provider != "bedrock":  # gemini and bedrock do not return usage data for long text embeddings
             assert response.usage is not None, "Usage should be reported for longer text"
             assert response.usage.total_tokens > 20, "Longer text should consume more tokens"
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/utils.go
- tests/integrations/tests/utils/common.py
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-24T07:38:16.990Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: tests/integrations/tests/test_google.py:2030-2030
Timestamp: 2025-12-24T07:38:16.990Z
Learning: In Python tests under tests/integrations/tests, allow a fixture parameter named test_config in test functions even if unused; do not flag it as unused. This is an internal convention to ensure consistency for integration tests.
Applied to files:
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
🧬 Code graph analysis (1)
core/providers/gemini/utils.go (2)
core/providers/gemini/types.go (2)
- Schema (734-785)
- Type (788-788)
core/schemas/responses.go (2)
- ResponsesTextConfig (120-123)
- ResponsesTextConfigFormat (125-130)
🔇 Additional comments (5)
tests/integrations/tests/test_anthropic.py (1)
799-801: Streaming thinking test extra_body aligns with non-streaming path
Using `extra_body={"reasoning_summary": "detailed"}` here keeps the streaming thinking test consistent with the non-streaming variant and should exercise the same summary behavior. No changes needed.
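For context, a minimal sketch of the request shape the test exercises, assuming the pinned Anthropic Python SDK accepts `extra_body` on `messages.create` as the review discusses; the model id and prompt here are placeholders, not the test's actual values:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# extra_body forwards provider-specific fields that are not first-class SDK parameters.
stream = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[{"role": "user", "content": "Think step by step: what is 17 * 24?"}],
    stream=True,
    extra_body={"reasoning_summary": "detailed"},
)

for event in stream:
    # Streaming yields typed events; thinking/summary content arrives in deltas.
    print(type(event).__name__)
```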
tests/integrations/config.yml (1)
13-13: Config updates for cohere/OpenAI/Bedrock look consistent; just confirm model availability
- Line 13: Adding `cohere` under `bifrost.endpoints` matches the new `providers.cohere` block and the `provider_api_keys`/`get_api_key` wiring.
- Line 40: Pointing `openai.file` to `gpt-4o` keeps file operations aligned with the rest of the OpenAI config (chat/vision) and avoids depending on a separate gpt-5-only entry.
- Line 124: Updating Bedrock `embeddings` to `global.cohere.embed-v4:0` is consistent with using Cohere v4 embeddings through Bedrock.
Assuming these model IDs exist in the deployed environment, the config is coherent with the rest of this stack.
Also applies to: 40-40, 124-124
tests/integrations/tests/utils/common.py (2)
1814-1815: Cohere API key wiring matches config and skip helpers
Adding `"cohere": "COHERE_API_KEY"` to `get_api_key` keeps the runtime mapping consistent with `provider_api_keys` in `config.yml` and lets `skip_if_no_api_key("cohere")` behave as expected for new cohere scenarios. Just make sure any direct `get_api_key("cohere")` usages in tests are either wrapped in `skip_if_no_api_key` or otherwise guarded so missing env vars result in skips rather than hard failures.
2498-2506: Reasoning/thinking extraction in get_content_string_with_summary is aligned with new provider formats
The extended handling in `get_content_string_with_summary`:
- Detects Anthropic-style thinking blocks (`{"type": "thinking", "thinking": ...}`) and treats them as reasoning content.
- Supports Gemini-style reasoning blocks that surface `summary` or `content` lists with `text` fields.
- Handles plain string items in a LangChain `response.content` list so mixed content shapes still contribute to the aggregated text.
This should make reasoning tests more robust across Anthropic, Gemini, and LangChain/OpenAI response shapes without regressing existing behavior; a simplified sketch of this block handling follows the line references below.
Also applies to: 2523-2525
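A simplified sketch of the block handling described above; this is an illustrative reimplementation under the assumptions stated in the review, not the actual helper, and the real function in common.py tracks more state:

```python
def extract_text_and_reasoning(content):
    """Aggregate text from a mixed LangChain-style content list.

    Returns (aggregated_text, has_reasoning_content).
    """
    parts = []
    has_reasoning = False
    # content may be a plain string or a list of strings/dicts
    items = content if isinstance(content, list) else [content]
    for item in items:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict):
            block_type = item.get("type")
            if block_type == "thinking":            # Anthropic-style thinking block
                parts.append(item.get("thinking", ""))
                has_reasoning = True
            elif block_type == "reasoning":         # reasoning block with summary/content lists
                for sub in item.get("summary", []) + item.get("content", []):
                    if isinstance(sub, dict) and "text" in sub:
                        parts.append(sub["text"])
                has_reasoning = True
            elif "text" in item:                    # ordinary text block
                parts.append(item["text"])
    return "".join(parts), has_reasoning
```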
tests/integrations/tests/test_openai.py (1)
1238-1241: Test input data is correctly aligned with assertions.
`RESPONSES_SIMPLE_TEXT_INPUT` contains a space exploration prompt ("Tell me a fun fact about space exploration"), which properly matches the updated keyword assertions expecting space exploration-related content.
Force-pushed from b2315af to e53ced3.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integrations/config.yml (1)
699-704: Virtual key still enabled despite PR summary
The PR description says "Disabled virtual key testing by default," but `virtual_key.enabled` is still set to `true` here. If the intent is to stop double-running cross-provider tests, this should probably be `false`.
Proposed config tweak

```diff
-virtual_key:
-  enabled: true
-  value: "sk-bf-test-key"
+virtual_key:
+  enabled: false
+  value: "sk-bf-test-key"
```
🧹 Nitpick comments (2)
tests/integrations/tests/utils/common.py (1)
2480-2526: Reasoning/thinking content extraction for LangChain is sound
The updates to `get_content_string_with_summary` now:
- Recognize Anthropic-style `{"type": "thinking", "thinking": ...}` blocks and mark `has_reasoning_content`.
- Handle `{"type": "reasoning", "summary": [...]}` and `{"type": "reasoning", "content": [...]}` blocks, accumulating their `text`.
- Accept plain strings in `response.content` lists.
This should make reasoning-aware assertions more reliable across LangChain/Anthropic/Gemini without impacting existing paths. If you see more response variants later, consider centralizing the per-block handling into small helper functions to keep this function from growing further.
core/providers/gemini/utils.go (1)
1068-1145: `buildOpenAIResponseFormat` handles both schema sources with safe fallbacks
The refactored `buildOpenAIResponseFormat(responseJsonSchema, responseSchema)` behaves sensibly:
- Prefers `responseJsonSchema` when non-nil and a `map[string]interface{}`.
- Falls back to `json_object` if `responseJsonSchema` is present but not a map.
- Otherwise, if `responseSchema` is provided:
  - Marshals/unmarshals it, normalizes `"type"` fields via `convertTypeToLowerCase`, and builds a proper `json_schema` format.
  - On any marshal/unmarshal/type-assert failure, falls back to `json_object`.
- With neither set, defaults to `json_object`.
This gives robust behavior even with malformed or unexpected schemas, and keeps the OpenAI Responses config API surface consistent.
You might want a brief comment noting that `responseJsonSchema` intentionally takes precedence over `responseSchema` when both are set, so future maintainers don't change that ordering accidentally.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integrations/tests/test_openai.py
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/utils.go
- tests/integrations/tests/utils/common.py
- tests/integrations/tests/test_anthropic.py
- tests/integrations/config.yml
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-24T07:38:16.990Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: tests/integrations/tests/test_google.py:2030-2030
Timestamp: 2025-12-24T07:38:16.990Z
Learning: In Python tests under tests/integrations/tests, allow a fixture parameter named test_config in test functions even if unused; do not flag it as unused. This is an internal convention to ensure consistency for integration tests.
Applied to files:
tests/integrations/tests/test_anthropic.py
🧬 Code graph analysis (1)
core/providers/gemini/utils.go (2)
core/providers/gemini/types.go (2)
- Schema (734-785)
- Type (788-788)
core/schemas/responses.go (2)
- ResponsesTextConfig (120-123)
- ResponsesTextConfigFormat (125-130)
🔇 Additional comments (9)
tests/integrations/config.yml (3)
11-18: Cohere endpoint wiring looks consistent
The new `bifrost.endpoints.cohere: "cohere"` entry is consistent with the `cohere` provider block and `provider_api_keys.cohere` in this file, and with `get_api_key("cohere")` in tests/integrations/tests/utils/common.py. Nothing blocking here.
38-47: Switching `openai.file` to `gpt-4o` is reasonable
Using `gpt-4o` for file-related tests aligns with the rest of the OpenAI capabilities in this config and avoids depending on `gpt-5`. No additional changes needed here.
116-125: Bedrock embeddings model updated to Cohere – verify availability
Pointing `bedrock.embeddings` to `global.cohere.embed-v4:0` matches the new Cohere focus in this PR, but the actual ARN/name has to exist and be enabled in the target AWS account/region. Please double-check in your Bedrock console (or via CLI) that `global.cohere.embed-v4:0` is the correct, available embeddings model for your test environment.
tests/integrations/tests/test_anthropic.py (1)
789-802: Streaming thinking test correctly exercises `extra_body` – confirm SDK support
Passing `extra_body={"reasoning_summary": "detailed"}` in the streaming extended-thinking test now mirrors the non-streaming test and validates the new Anthropic request shape; the loop over `stream` remains unchanged. Please confirm the pinned Anthropic Python SDK version in this repo documents the `extra_body` keyword on `messages.create(stream=True)` so CI doesn't fail on an unexpected argument.
tests/integrations/tests/utils/common.py (2)
1805-1815: Cohere API key mapping is consistent
Adding `"cohere": "COHERE_API_KEY"` to `key_map` matches `provider_api_keys.cohere` in tests/integrations/config.yml, so Cohere tests will pick up the correct env var.
2584-2612: Input-tokens validation helper aligns with Anthropic and OpenAI usage
`assert_valid_input_tokens_response` cleanly separates expectations:
- `library == "google"`: checks `total_tokens` (Gemini count endpoint).
- `library == "openai"`: checks `object` contains `"input_tokens"` and validates `input_tokens`.
- Else (Anthropic and others): requires a positive `input_tokens` attribute.
This matches how the new Anthropic tests call it (`library="anthropic"`), and should be flexible enough for future providers; a simplified sketch of this branching is shown below.
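A minimal sketch of the branching just described; this is an illustrative reimplementation based on the bullets above, and the real helper in common.py validates additional fields:

```python
def assert_valid_input_tokens_response(response, library: str) -> None:
    """Validate token-count responses per client library, as described above."""
    if library == "google":
        # Gemini count-tokens endpoint reports a total only.
        assert getattr(response, "total_tokens", 0) > 0, "Expected positive total_tokens"
    elif library == "openai":
        # OpenAI-style responses expose an object name plus input_tokens.
        assert "input_tokens" in getattr(response, "object", ""), "Unexpected object type"
        assert getattr(response, "input_tokens", 0) > 0, "Expected positive input_tokens"
    else:
        # Anthropic and other libraries: a positive input_tokens attribute is enough.
        assert getattr(response, "input_tokens", 0) > 0, "Expected positive input_tokens"
```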
core/providers/gemini/utils.go (3)
162-179: Responses parameters now correctly surface response schemas
When `ResponseMIMEType` is `application/json`, you now build `params.Text` via `buildOpenAIResponseFormat(config.ResponseJSONSchema, config.ResponseSchema)` and also expose `response_schema`/`response_json_schema` via `ExtraParams`. This gives the OpenAI-style Responses layer both the structured format and the raw Gemini schema knobs, which is exactly what downstream tests expect.
951-1065: JSON-schema normalization helper is appropriate for Responses JSON format
`normalizeSchemaTypes` and `buildJSONSchemaFromMap`:
- Lower-case `"type"` fields (and nested properties/items/anyOf/oneOf).
- Populate `Type`, `Properties`, `Required`, `Description`, `AdditionalProperties`, and `Name`/`Title` into `ResponsesTextConfigFormatJSONSchema`.
This matches the OpenAI Responses JSON schema shape, while keeping the original map structure intact for downstream use. No issues spotted.
1279-1309: OpenAI `response_format` → Gemini schema extraction remains compatible
`extractSchemaMapFromResponseFormat` continues to:
- Guard on `type == "json_schema"`.
- Pull the nested `json_schema.schema` map.
- Normalize via `normalizeSchemaForGemini`.
This is still the right place to adapt OpenAI Response JSON schemas for Gemini, and works nicely with the new `buildOpenAIResponseFormat` path that turns Gemini's schema back into an OpenAI-style `ResponsesTextConfigFormatJSONSchema`.
Force-pushed from e53ced3 to 0a1551a.
Actionable comments posted: 0
🧹 Nitpick comments (1)
core/providers/gemini/utils.go (1)
1086-1119: Schema conversion logic is correct but could benefit from error visibility.
The marshal/unmarshal approach to convert `responseSchema` to a normalized map is a common Go pattern and works correctly. The defensive error handling ensures graceful degradation to `json_object` mode. Consider logging the conversion errors before falling back to `json_object` mode (lines 1090, 1100, 1113). This would aid debugging when schema conversion fails silently in production.
Example: Add debug logging for conversion failures

```diff
 data, err := sonic.Marshal(responseSchema)
 if err != nil {
+    // Consider: log.Debug("Failed to marshal responseSchema, falling back to json_object", "error", err)
     // If marshaling fails, fall back to json_object mode
     return &schemas.ResponsesTextConfig{
```

Similar logging could be added at lines 1100 and 1113.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/integrations/config.yml
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/test_anthropic.py
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/utils.go
- tests/integrations/tests/utils/common.py
🧠 Learnings (3)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go
🧬 Code graph analysis (1)
core/providers/gemini/utils.go (2)
core/providers/gemini/types.go (2)
- Schema (734-785)
- Type (788-788)
core/schemas/responses.go (2)
- ResponsesTextConfig (120-123)
- ResponsesTextConfigFormat (125-130)
🔇 Additional comments (3)
tests/integrations/tests/utils/common.py (2)
1814-1814: LGTM - Cohere provider mapping added.
The Cohere API key mapping follows the established pattern and aligns with the new Cohere endpoint configuration mentioned in the PR objectives.
2498-2525: Enhanced reasoning content detection across multiple providers.
The additions properly handle thinking/reasoning blocks from various providers (Anthropic, Gemini, OpenAI) with appropriate fallbacks for plain string items. The logic correctly:
- Detects Anthropic thinking blocks and extracts thinking text
- Handles reasoning blocks with summary or content fields
- Sets the `has_reasoning_content` flag consistently
- Gracefully handles plain string items in content lists
core/providers/gemini/utils.go (1)
165-165: Function signature extended to support responseSchema parameter.
The addition of the `responseSchema *Schema` parameter enables fallback support when `responseJsonSchema` is not provided, implementing the feature described in the PR objectives. The call site at line 165 correctly passes `config.ResponseSchema`.
Also applies to: 1068-1068
Force-pushed from 0a1551a to 9afa095.
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @tests/integrations/tests/test_openai.py:
- Around line 1674-1835: The test unpacks event_types in
test_38a_responses_reasoning_streaming but never uses it (causing linter
warnings) and both 38a/38b lack a fallback for providers that don't support
streaming/reasoning; rename the unused binding to _event_types (or assert
something lightweight like "assert isinstance(event_types, list)") in
test_38a_responses_reasoning_streaming, and wrap the streaming call +
collect_responses_streaming_content in a try/except that mirrors the existing
fallback pattern used in test_38_responses_reasoning (catch the
provider-not-supported exception and fall back to a non-streaming/simple
response assertion) so a lagging backend won’t fail the entire cross-provider
run.
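For illustration, a self-contained sketch of the try/except fallback shape the instruction describes; the exception type, call names, and content values are placeholders, since the actual provider-not-supported error and the helpers used in test_38_responses_reasoning are not shown here:

```python
def run_with_streaming_fallback(stream_call, fallback_call):
    """Try a streaming call first; fall back to a simpler call if the provider rejects it.

    Mirrors the pattern suggested for test_38a/38b: one lagging backend should not
    fail the whole cross-provider run. (The unused unpacked variable would likewise
    be renamed to _event_types in the real test.)
    """
    try:
        chunks = list(stream_call())
        content = "".join(chunks)
        assert content.strip(), "Streaming should produce some content"
        return content
    except NotImplementedError:  # placeholder for a provider-not-supported error
        content = fallback_call()
        assert content.strip(), "Fallback response should still produce content"
        return content


# Toy usage: a backend that does not support streaming triggers the fallback.
def unsupported_stream():
    raise NotImplementedError("streaming reasoning not supported")

print(run_with_streaming_fallback(unsupported_stream, lambda: "plain response"))
```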
🧹 Nitpick comments (2)
core/providers/gemini/utils.go (1)
1068-1127: Improve fallback when `responseJsonSchema` is non-map but `responseSchema` is available
Right now any non-`map[string]interface{}` `responseJsonSchema` (e.g., a typed struct) causes an immediate fallback to `json_object`, even if `responseSchema` is populated. You could make this more robust by letting `responseSchema` act as a secondary source before giving up, e.g.:
Suggested control-flow tweak
func buildOpenAIResponseFormat(responseJsonSchema interface{}, responseSchema *Schema) *schemas.ResponsesTextConfig { name := "json_response" - var schemaMap map[string]interface{} - - // Try to use responseJsonSchema first - if responseJsonSchema != nil { - // Use responseJsonSchema directly if it's a map - var ok bool - schemaMap, ok = responseJsonSchema.(map[string]interface{}) - if !ok { - // If not a map, fall back to json_object mode - return &schemas.ResponsesTextConfig{ - Format: &schemas.ResponsesTextConfigFormat{ - Type: "json_object", - }, - } - } - } else if responseSchema != nil { + var schemaMap map[string]interface{} + + // Prefer responseJsonSchema when it's a usable map + if responseJsonSchema != nil { + if m, ok := responseJsonSchema.(map[string]interface{}); ok { + schemaMap = m + } else if responseSchema == nil { + // No usable schema at all – fall back + return &schemas.ResponsesTextConfig{ + Format: &schemas.ResponsesTextConfigFormat{ + Type: "json_object", + }, + } + } + } + + // If we still don't have a map, try building one from responseSchema + if schemaMap == nil && responseSchema != nil { // Convert responseSchema to map using JSON marshaling and type normalization data, err := sonic.Marshal(responseSchema) if err != nil { // If marshaling fails, fall back to json_object mode return &schemas.ResponsesTextConfig{ Format: &schemas.ResponsesTextConfigFormat{ Type: "json_object", }, } } @@ - normalized := convertTypeToLowerCase(rawMap) - var ok bool - schemaMap, ok = normalized.(map[string]interface{}) + normalized := convertTypeToLowerCase(rawMap) + m, ok := normalized.(map[string]interface{}) + schemaMap = m if !ok { // If type assertion fails, fall back to json_object mode return &schemas.ResponsesTextConfig{ Format: &schemas.ResponsesTextConfigFormat{ Type: "json_object", }, } } - } else { - // No schema provided - use json_object mode - return &schemas.ResponsesTextConfig{ - Format: &schemas.ResponsesTextConfigFormat{ - Type: "json_object", - }, - } }This keeps the existing safety net but avoids silently dropping structured schemas when
`responseJsonSchema` is present but not already a plain map.
tests/integrations/tests/test_openai.py (1)
1018-1021: Update long-text usage comment to match the providers being exempted
The condition now skips the usage assertion for `provider == "gemini"` and `provider == "bedrock"`, but the comment still mentions Gemini and OpenAI. To avoid confusion, consider updating it to mention Gemini and Bedrock instead, or broaden it to "providers that don't return usage for this path".
Minimal comment tweak

```diff
-        if provider != "gemini" and provider != "bedrock":  # gemini does not return usage data and openai does not return usage data for long text
+        # gemini/bedrock do not return usage data for this long-text embeddings path
+        if provider != "gemini" and provider != "bedrock":
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
💤 Files with no reviewable changes (1)
- core/providers/gemini/responses.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integrations/tests/test_anthropic.py
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- tests/integrations/tests/utils/common.py
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_openai.py
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-24T07:38:16.990Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: tests/integrations/tests/test_google.py:2030-2030
Timestamp: 2025-12-24T07:38:16.990Z
Learning: In Python tests under tests/integrations/tests, allow a fixture parameter named test_config in test functions even if unused; do not flag it as unused. This is an internal convention to ensure consistency for integration tests.
Applied to files:
tests/integrations/tests/test_openai.py
🧬 Code graph analysis (2)
core/providers/gemini/utils.go (2)
core/providers/gemini/types.go (2)
- Schema (734-785)
- Type (788-788)
core/schemas/responses.go (2)
- ResponsesTextConfig (120-123)
- ResponsesTextConfigFormat (125-130)
tests/integrations/tests/test_openai.py (2)
tests/integrations/tests/utils/common.py (2)
- skip_if_no_api_key (1829-1840)
- collect_responses_streaming_content (1935-1991)
tests/integrations/tests/utils/parametrize.py (2)
- get_cross_provider_params_with_vk_for_scenario (50-101)
- format_provider_model (126-141)
🪛 Ruff (0.14.10)
tests/integrations/tests/test_openai.py
1676-1676: Unused method argument: test_config
(ARG002)
1692-1692: Unpacked variable event_types is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1746-1746: Unused method argument: test_config
(ARG002)
🔇 Additional comments (7)
core/providers/gemini/utils.go (1)
162-179: JSON response-format wiring from GenerationConfig looks solid
Passing both `ResponseJSONSchema` and `ResponseSchema` into `buildOpenAIResponseFormat` for `application/json` while still exposing them via `ExtraParams` gives Gemini maximum information without changing existing fallbacks. No issues from a correctness standpoint.
tests/integrations/config.yml (1)
13-19: Cohere/Vertex provider configuration is consistent; double-check vertex routing
- `cohere` now has a dedicated endpoint, provider block, API key, and scenarios entry, which line up correctly.
- `vertex` is wired through providers, `provider_api_keys`, and `provider_scenarios` with capabilities that match what the tests exercise (no batch/files, embeddings/thinking enabled), and bedrock embeddings are updated to the global Cohere ARN.
One thing to verify: there is no `vertex` entry under `bifrost.endpoints`, so all vertex-backed tests must be routed via an existing integration (likely `google: genai`). If that's intentional, this config looks good as-is.
Also applies to: 37-67, 87-157, 116-125, 126-157, 134-135, 148-167, 291-329
tests/integrations/tests/utils/common.py (2)
1804-1816: API key routing for Cohere and Vertex is aligned with provider config
Adding `"cohere": "COHERE_API_KEY"` and `"vertex": "VERTEX_API_KEY"` here keeps `get_api_key` consistent with `provider_api_keys` in `config.yml`, so cross-provider tests can reuse the same helper without special-casing.
1969-1975: Streaming and reasoning content extraction updates look correct
- Including `response.summary_text.delta` in `collect_responses_streaming_content` ensures reasoning summaries contribute to the assembled content used by tests.
- The LangChain branch in `get_content_string_with_summary` now correctly recognizes Anthropic-style `thinking` blocks, Gemini-style `reasoning` blocks (both `summary` and `content`), and plain string items, while flagging `has_reasoning_content` appropriately.
These changes should make reasoning/summary assertions much more reliable across clients.
Also applies to: 2503-2531
tests/integrations/tests/test_openai.py (3)
1197-1205: File-input expectations keyed to "hello world" PDF look reasonable
Both the chat and responses file tests now assert that the model's answer references generic document/file keywords (including "hello"/"world" tied to the embedded PDF). This keeps the assertions flexible but still grounded in the actual fixture content.
Also applies to: 1361-1376
1238-1242: Space-exploration keyword checks for simple responses are appropriate
The updated keyword list ("space", "exploration", "astronaut", "moon", "mars", etc.) matches the test prompt and still leaves enough flexibility across providers while ensuring the response actually talks about space exploration.
1539-1543: Higher max_output_tokens for reasoning responses is a good safeguard
Bumping `max_output_tokens` to 1200 for the high-effort reasoning test reduces the risk of truncating chain-of-thought style answers, especially now that you're also asserting on multiple reasoning/step indicators.
Force-pushed from 9afa095 to 652111d.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @tests/integrations/tests/test_openai.py:
- Around line 1017-1021: The inline comment and the conditional are
inconsistent: the code currently skips usage checks for providers by evaluating
if provider != "gemini" and provider != "bedrock" but the comment says "gemini
does not return usage data and openai does not return usage data for long text."
Update the comment to correctly reflect the providers being excluded (gemini and
bedrock) or, if you intended to exclude openai instead of bedrock, change the
condition to check provider != "gemini" and provider != "openai"; ensure this
uses the same provider names as the condition that guards the response.usage and
response.usage.total_tokens assertions so the comment and the if-check are
consistent with each other.
🧹 Nitpick comments (1)
core/providers/gemini/utils.go (1)
1068-1145: Consider adding fallback to `responseSchema` when `responseJsonSchema` is non-nil but invalid
The function handles the happy path well, but there's a robustness gap: if
`responseJsonSchema` is non-nil but not a `map[string]interface{}` (which can happen when overridden via `ExtraParams["response_json_schema"]`), it immediately falls back to `json_object` mode without attempting to use a non-nil `responseSchema`. Since both parameters can coexist, the function should try
`responseSchema` before giving up entirely:
Optional robustness improvement
- if responseJsonSchema != nil { - // Use responseJsonSchema directly if it's a map - var ok bool - schemaMap, ok = responseJsonSchema.(map[string]interface{}) - if !ok { - // If not a map, fall back to json_object mode - return &schemas.ResponsesTextConfig{ - Format: &schemas.ResponsesTextConfigFormat{ - Type: "json_object", - }, - } - } - } else if responseSchema != nil { + if responseJsonSchema != nil { + // Use responseJsonSchema directly if it's a map + var ok bool + schemaMap, ok = responseJsonSchema.(map[string]interface{}) + if !ok && responseSchema == nil { + // If not a map and we have no Gemini schema to fall back to, use json_object + return &schemas.ResponsesTextConfig{ + Format: &schemas.ResponsesTextConfigFormat{ + Type: "json_object", + }, + } + } + } + + if schemaMap == nil && responseSchema != nil { // Convert responseSchema to map using JSON marshaling and type normalization data, err := sonic.Marshal(responseSchema) if err != nil { // If marshaling fails, fall back to json_object mode return &schemas.ResponsesTextConfig{ Format: &schemas.ResponsesTextConfigFormat{ Type: "json_object", }, } } ... - } else { + } else if schemaMap == nil { // No schema provided - use json_object mode return &schemas.ResponsesTextConfig{ Format: &schemas.ResponsesTextConfigFormat{ Type: "json_object", }, } }Not required for correctness today, but it makes the helper more tolerant of mixed inputs, particularly when
response_json_schemais provided viaExtraParamswith an unexpected type.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- core/providers/gemini/responses.go
- core/providers/gemini/utils.go
- tests/integrations/config.yml
- tests/integrations/tests/test_anthropic.py
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
💤 Files with no reviewable changes (1)
- core/providers/gemini/responses.go
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integrations/tests/test_anthropic.py
- tests/integrations/config.yml
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
- core/providers/gemini/utils.go
- tests/integrations/tests/test_openai.py
- tests/integrations/tests/utils/common.py
🧠 Learnings (4)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go
📚 Learning: 2025-12-24T07:38:16.990Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1095
File: tests/integrations/tests/test_google.py:2030-2030
Timestamp: 2025-12-24T07:38:16.990Z
Learning: In Python tests under tests/integrations/tests, allow a fixture parameter named test_config in test functions even if unused; do not flag it as unused. This is an internal convention to ensure consistency for integration tests.
Applied to files:
tests/integrations/tests/test_openai.py
🧬 Code graph analysis (2)
core/providers/gemini/utils.go (2)
core/providers/gemini/types.go (2)
- Schema (734-785)
- Type (788-788)
core/schemas/responses.go (2)
- ResponsesTextConfig (120-123)
- ResponsesTextConfigFormat (125-130)
tests/integrations/tests/test_openai.py (2)
tests/integrations/tests/utils/common.py (2)
- skip_if_no_api_key (1829-1840)
- collect_responses_streaming_content (1935-1991)
tests/integrations/tests/utils/parametrize.py (2)
- get_cross_provider_params_with_vk_for_scenario (50-101)
- format_provider_model (126-141)
🪛 Ruff (0.14.10)
tests/integrations/tests/test_openai.py
1676-1676: Unused method argument: test_config
(ARG002)
1692-1692: Unpacked variable event_types is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1746-1746: Unused method argument: test_config
(ARG002)
🔇 Additional comments (8)
tests/integrations/tests/utils/common.py (3)
1805-1816: Cohere/Vertex API key mapping is consistent with existing helpers
Adding `"cohere"` → `COHERE_API_KEY` and `"vertex"` → `VERTEX_API_KEY` fits the existing pattern and keeps `get_api_key` aligned with new providers.
1950-1991: Including `response.summary_text.delta` in streaming aggregation is correct
Capturing `response.summary_text.delta` alongside `response.output_text.delta` ensures reasoning summaries from the Responses API are not dropped during streaming collection and matches how tests consume `content`.
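A minimal sketch of the aggregation idea, using the event type names mentioned in the review; the real `collect_responses_streaming_content` helper also tracks event types and other details, and the demo events below are hypothetical:

```python
from types import SimpleNamespace

def collect_streaming_text(events):
    """Aggregate output text and reasoning-summary text from streaming events."""
    content_parts = []
    for event in events:
        event_type = getattr(event, "type", "")
        if event_type == "response.output_text.delta":
            # Regular assistant text deltas.
            content_parts.append(getattr(event, "delta", ""))
        elif event_type == "response.summary_text.delta":
            # Reasoning-summary deltas, previously dropped, now included.
            content_parts.append(getattr(event, "delta", ""))
    return "".join(content_parts)

demo_events = [
    SimpleNamespace(type="response.summary_text.delta", delta="Reasoning: compare densities. "),
    SimpleNamespace(type="response.output_text.delta", delta="Oil floats on water."),
]
print(collect_streaming_text(demo_events))
```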
2485-2531: Expanded reasoning/thinking extraction for LangChain messages looks solid
The added handling for Anthropic-style `{"type": "thinking", "thinking": "..."}` blocks and plain string items in the `content` list correctly sets `has_reasoning_content` and preserves all text content without impacting existing reasoning/summary paths.
core/providers/gemini/utils.go (1)
162-179: Passing both `ResponseJSONSchema` and `ResponseSchema` into `buildOpenAIResponseFormat` matches the new behavior
Wiring `config.ResponseJSONSchema` and `config.ResponseSchema` into `buildOpenAIResponseFormat` in the `application/json` branch ensures Responses text config is derived from either OpenAI-style JSON schema or Gemini-native `Schema`, while still exposing both originals via `ExtraParams`.
tests/integrations/tests/test_openai.py (4)
924-938: Embedding similarity comment now matches the 0.6 threshold
The inline comment for similar-text embeddings has been updated to "> 0.6", matching the actual assertions for `similarity_1_2`, `similarity_1_3`, and `similarity_2_3`. This keeps the test expectations self-consistent.
1201-1205: PDF document keyword assertions align with actual test fixture
Both the chat completion and Responses API "with file" tests now assert against keywords like "hello", "world", "testing", "pdf", "file", which matches the testing PDF content and should be robust across providers while remaining flexible.
Also applies to: 1371-1376
1238-1242: Space-exploration keyword checks match the prompt
The `keywords` list for `test_32_responses_simple_text` (space, exploration, astronaut, moon, mars, rocket, nasa, satellite) is well aligned with the "fun fact about space exploration" prompt and should work across providers without being overly strict.
1539-1543: Higher max_output_tokens for reasoning responses is reasonable
Increasing `max_output_tokens` from 800 to 1200 in `test_38_responses_reasoning` gives reasoning models more room for detailed chains of thought and summaries, consistent with the new streaming variants.

Summary
Enhanced Gemini provider to support both `responseJsonSchema` and `responseSchema` for JSON response formatting, and updated integration tests to improve reliability and coverage.
Changes
- `responseSchema` in Gemini's `buildOpenAIResponseFormat` function
Affected areas
How to test
Run the integration tests to verify the changes:
Breaking changes
Related issues
Improves JSON schema handling for Gemini provider responses
Security considerations
No security implications.
Checklist