feat: add code execution tool support for OpenAI, Anthropic and Gemini #1333
base: graphite-base/1333
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
📝 Walkthrough
Summary by CodeRabbit (Release Notes)
Walkthrough: Adds code execution tool support across OpenAI, Anthropic, and Gemini providers with container lifecycle management, response round-tripping capabilities, token counting refinements including modality breakdown, and comprehensive integration tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
core/schemas/responses.go (2)
266-298: Token breakdown extension looks good; comments are slightly misleading.
The `ToolUseTokens` comment in `ResponsesResponseOutputTokens` says "input", and the `ModalityTokenCount` comment in the output struct also says "input". No functional issue, but worth correcting to avoid confusion.
43-84: CreatedAt will serialize as 0 unless every code path populates it.
The field is currently ``CreatedAt int `json:"created_at"` `` without `omitempty`. Multiple response initialization paths create empty `BifrostResponsesResponse{}` structures without setting this field (framework/streaming/types.go, core/providers/utils/utils.go, etc.). Some streaming handlers do check `if state.CreatedAt == 0` and set it to `time.Now().Unix()`, but this is inconsistent. When any code path leaves CreatedAt unset, it serializes as `"created_at": 0`, presenting epoch time to clients.
Either make it pointer + omitempty (``CreatedAt *int `json:"created_at,omitempty"` ``) and set it when available, or ensure all creation sites populate it, especially in empty response handlers and streaming state resets.
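A minimal sketch of the pointer-based option (struct and helper names as used in this review; every creation and streaming site would need the same treatment):

// Sketch only: an unset CreatedAt is omitted instead of serializing as epoch 0.
type BifrostResponsesResponse struct {
	CreatedAt *int `json:"created_at,omitempty"`
	// ... remaining fields unchanged
}

// At creation/streaming sites, populate it when the timestamp is known,
// using the repo's pointer helper rather than & per project convention:
// resp.CreatedAt = schemas.Ptr(int(time.Now().Unix()))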
core/providers/gemini/responses.go (1)
1936-2209: ExecutableCode → Responses only attaches the first CodeExecutionResult; this likely drops additional outputs.
The look-ahead loop `break`s on the first result. If Gemini returns multiple `codeExecutionResult` parts, only one becomes a `ResponsesCodeInterpreterToolCall.Outputs` entry.
Proposed fix (collect consecutive results)
  for j := partIdx + 1; j < len(candidate.Content.Parts); j++ {
      nextPart := candidate.Content.Parts[j]
      if nextPart.CodeExecutionResult != nil {
          @@
          msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs = append(
              msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs,
          @@
          )
-         break
+         continue
      }
      // Stop looking if we hit another executable code or other significant content
      if nextPart.ExecutableCode != nil || nextPart.Text != "" || nextPart.FunctionCall != nil {
          break
      }
  }
core/providers/anthropic/responses.go (1)
4021-4096: Remove `fmt.Println` debug logs from provider code.
This will spam stdout in production and can leak tool configuration.
Proposed fix
- fmt.Println("tool", tool.Name)
  // Handle special tool types first
  if tool.Type != nil {
-     fmt.Println("tool type", *tool.Type)
      switch *tool.Type {
      @@
- bfToolJSON, _ := json.MarshalIndent(bifrostTool, "", "  ")
- fmt.Println("bifrostTool", string(bfToolJSON))
  return bifrostTool
🤖 Fix all issues with AI agents
In `@core/providers/anthropic/responses.go`:
- Around line 2098-2108: The loop that propagates response.Container uses the
wrong nil-check and can nil-deref: replace the current condition with a guard
that verifies msg.Type != nil && *msg.Type ==
schemas.ResponsesMessageTypeCodeInterpreterCall and also that
msg.ResponsesToolMessage != nil and
msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall != nil before
assigning; then set
outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ContainerID
= response.Container.ID and .ExpiresAt = &response.Container.ExpiresAt only
after these checks to avoid panics.
- Around line 3306-3327: The bifrost message created for code-interpreter
tool_use is missing the CallID so downstream tool-result matching (which checks
msg.ResponsesToolMessage.CallID) fails; set
bifrostMsg.ResponsesToolMessage.CallID = block.CallID (or the appropriate block
field holding the call id) when building the ResponsesToolMessage in the code
path that creates bifrostMsg (inside the if block where block.Name ==
AnthropicToolNameBashCodeExecution), ensuring the CallID is propagated alongside
Code and Outputs so the tool-result handler can match correctly.
- Around line 3876-3969: convertBifrostCodeInterpreterCallToAnthropicBlocks
currently requires msg.ID and ignores the tool call CallID and the
CodeInterpreter Language and leading whitespace; change linkage to use
msg.ResponsesToolMessage.CallID as the primary ID source (fall back to msg.ID
only if CallID is nil) for server_tool_use.ID and
bash_code_execution_tool_result.ToolUseID, and when wrapping code use
strings.TrimSpace on code and consult codeInterpreter.Language to decide wrapper
(e.g., if Language == "python" wrap with python3 heredoc, if "bash" leave as-is,
otherwise choose a sensible default or no wrap); update references in the
function to use CallID where building serverToolUseBlock and bashResultBlock and
ensure trimming of leading whitespace before prefix checks.
- Around line 4091-4095: The mapping for AnthropicToolTypeCodeExecution should
not set the Name field because CodeInterpreter is a singleton infra tool; update
the block that returns a *schemas.ResponsesTool for
AnthropicToolTypeCodeExecution to omit the Name assignment (leave only Type:
schemas.ResponsesToolTypeCodeInterpreter), removing any reference to tool.Name
so the fixed provider name is used instead.
In `@core/providers/anthropic/types.go`:
- Around line 166-182: The MarshalJSON validation incorrectly only errors when
all three fields are set; change it to count non-nil fields (ContentStr,
ContentBlock, ContentBlocks) and return an error if more than one is non-nil.
After that, simplify the marshaling branches to explicitly handle each non-nil
case (if ContentStr != nil -> marshal *ContentStr; else if ContentBlock != nil
-> marshal *ContentBlock; else if ContentBlocks != nil -> marshal ContentBlocks)
and otherwise marshal nil; keep references to AnthropicContent.MarshalJSON,
ContentStr, ContentBlock, and ContentBlocks so the logic enforces mutual
exclusivity and never silently falls through. (A hedged Go sketch of this check appears right after this list.)
In `@core/providers/gemini/responses.go`:
- Around line 149-152: The code assigns a pointer to response.ResponseID
(bifrostResp.ID = &response.ResponseID), which retains the whole response on the
heap; instead copy the string into a new pointer using the project helper
(schemas.Ptr) so only the string is retained. Replace the direct address-of
usage in the mapping code that sets bifrostResp.ID (where response.ResponseID is
checked) with a call to schemas.Ptr(response.ResponseID) so you return a pointer
to a standalone string rather than &response.ResponseID.
- Around line 335-398: The code that maps code-interpreter return codes to
outcomes currently sets any non-zero logs.ReturnCode to OutcomeFailed; update
the mapping in the code path inside the ResponsesCodeInterpreterToolCall
handling (where OutcomeOK and OutcomeFailed are used) to special-case return
code 124 and set outcome = OutcomeDeadlineExceeded when logs.ReturnCode != nil
&& *logs.ReturnCode == 124, otherwise keep OutcomeFailed for other non-zero
codes (i.e., if logs.ReturnCode != nil { if *logs.ReturnCode == 124 { outcome =
OutcomeDeadlineExceeded } else if *logs.ReturnCode != 0 { outcome =
OutcomeFailed } }). Ensure you reference OutcomeOK, OutcomeFailed,
OutcomeDeadlineExceeded and logs.ReturnCode when making the change so the
reverse mapping remains consistent. (A hedged sketch of this mapping also appears after this list.)
In `@core/providers/gemini/utils.go`:
- Around line 486-503: Loop over metadata.CandidatesTokensDetails may contain
nil elements; before accessing detail.Modality or detail.TokenCount in the block
that builds outputDetails and sets usage.OutputTokensDetails fields, add a
nil-check (skip iteration if detail == nil). Apply the same defensive check
pattern used for PromptTokensDetails in count_tokens.go, and also review and add
similar nil guards for the other token-detail loops handling Candidates/Prompt/…
around the other sections (the blocks previously at lines ~437, ~455, ~475) to
avoid panics when elements are nil.
In `@tests/integrations/python/tests/test_anthropic.py`:
- Around line 2323-2443: The class-scoped fixture and tests misuse the
integration helpers: remove the class-level anthropic_client fixture and reuse
the existing module-level anthropic_client fixture (symbol: anthropic_client)
which calls get_api_key("anthropic"); add the decorator
`@skip_if_no_api_key`("anthropic") to each test method (skip_if_no_api_key);
import os at the top if you still use os.getenv for base_url, otherwise remove
os usage; and change the print statements that use f-strings with no
placeholders to regular strings (replace prints like f"✓ ... test passed!" with
plain string literals). Ensure all references to get_api_key use the lowercase
"anthropic".
In `@tests/integrations/python/tests/test_google.py`:
- Around line 2703-2724: The google_genai_client fixture is broken: call
get_api_key with provider name "google" (get_api_key("google")) instead of the
env var name; remove the inline use of skip_if_no_api_key (it's a decorator) and
instead apply `@skip_if_no_api_key`("google") to the test(s) or class; pass an
HttpOptions instance to genai.Client (e.g., HttpOptions(api_version="v1alpha",
base_url=...)) rather than a plain dict and avoid mutating client._base_url
after creation—provide the base URL via the HttpOptions or client constructor
parameters when creating genai.Client; keep references to fixture
google_genai_client, function get_api_key(), decorator skip_if_no_api_key,
class/constructor genai.Client, and HttpOptions to locate the changes.
In `@tests/integrations/python/tests/test_openai.py`:
- Around line 3045-3156: Insert the same "skip when unconfigured" guard at the
top of each new test (test_64_code_execution_math,
test_65_code_execution_data_analysis, test_66_code_execution_equation_solving):
check if provider == "_no_providers_" or model == "_no_model_" and call
pytest.skip("No providers configured for this scenario") before any client setup
or API calls so the tests behave like the other parameterized tests when no
providers are configured.
- Around line 3048-3065: In test_64_code_execution_math, replace the raw model
argument passed to client.chat.completions.create with the formatted model by
calling format_provider_model(provider, model) so it matches other tests (keep
the tools/extra_body/extra_query logic intact); also address the unused
test_config parameter by renaming it to _test_config (or removing it) in the
test signature to satisfy static analysis and test framework conventions.
- Around line 3086-3103: In test_65_code_execution_data_analysis, remove the
unused test_config parameter from the function signature, ensure the model is
normalized by calling format_provider_model(model, provider) before using it
(e.g., assign back to model), and fix the stray f-string by replacing the
f-prefix with a plain string or adding the intended placeholder so no f-string
is used without interpolation; update references in the
client.chat.completions.create call to use the normalized model variable.
- Around line 3124-3141: In test_66_code_execution_equation_solving, remove the
unused test_config parameter from the function signature (or use it if
intended), call format_provider_model(provider, model) and assign its result
back to model before calling client.chat.completions.create so the model is
normalized, and fix the stray f-string (remove the leading f from any string
literal that has no placeholders or add the correct {placeholder}) to avoid
invalid formatting; adjust references to model/provider in the call (e.g.,
model=model and extra_body={"provider": provider} if not vk_enabled)
accordingly.
- Around line 3063-3065: The test passes redundant provider info by supplying
extra_body={"provider": provider} and extra_query={"vk":"true"} in the code
execution test call; remove these explicit extras and rely on
get_provider_openai_client to configure the provider (API key/base URL) and
x-bf-vk header when vk_enabled=True, or if you intentionally need them for this
specific code-execution path, add a short comment explaining why; update the
call site that uses extra_body and extra_query (the code-execution test
invocation) to stop passing those parameters unless a documented, test-specific
reason exists.
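As a rough illustration of the `AnthropicContent.MarshalJSON` mutual-exclusivity check requested above (field and type names are taken from the prompt; the real method may shape its output differently, and `encoding/json` plus `fmt` are assumed to be imported):

// Sketch only: enforce that at most one content variant is populated.
func (c AnthropicContent) MarshalJSON() ([]byte, error) {
	set := 0
	if c.ContentStr != nil {
		set++
	}
	if c.ContentBlock != nil {
		set++
	}
	if c.ContentBlocks != nil {
		set++
	}
	if set > 1 {
		return nil, fmt.Errorf("anthropic content: only one of ContentStr, ContentBlock, ContentBlocks may be set")
	}
	switch {
	case c.ContentStr != nil:
		return json.Marshal(*c.ContentStr)
	case c.ContentBlock != nil:
		return json.Marshal(*c.ContentBlock)
	case c.ContentBlocks != nil:
		return json.Marshal(c.ContentBlocks)
	default:
		return json.Marshal(nil)
	}
}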
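And a hedged sketch of the Gemini return-code mapping requested above (constant and variable names as referenced in the prompt; 124 is the conventional timeout exit status):

// Inside the ResponsesCodeInterpreterToolCall handling:
outcome := OutcomeOK
if logs.ReturnCode != nil {
	switch {
	case *logs.ReturnCode == 124:
		outcome = OutcomeDeadlineExceeded // timeouts map to deadline-exceeded
	case *logs.ReturnCode != 0:
		outcome = OutcomeFailed // any other non-zero code stays a failure
	}
}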
🧹 Nitpick comments (10)
core/providers/gemini/utils.go (1)
463-471: Non-deterministic ordering when merging modality counts.
Iterating over the `modalityMap` produces non-deterministic slice ordering. This is functionally correct but could cause test flakiness if tests compare exact output. Consider sorting by modality name if deterministic output is needed for testing or caching.
♻️ Optional fix for deterministic ordering
  // Convert back to slice
  details := make([]schemas.ModalityTokenCount, 0, len(modalityMap))
+ // Sort modalities for deterministic output
+ modalities := make([]string, 0, len(modalityMap))
+ for modality := range modalityMap {
+     modalities = append(modalities, modality)
+ }
+ sort.Strings(modalities)
+ for _, modality := range modalities {
+     details = append(details, schemas.ModalityTokenCount{
+         Modality:   modality,
+         TokenCount: modalityMap[modality],
+     })
+ }
- for modality, count := range modalityMap {
-     details = append(details, schemas.ModalityTokenCount{
-         Modality:   modality,
-         TokenCount: count,
-     })
- }
  usage.InputTokensDetails.ModalityTokenCount = details
tests/integrations/python/tests/test_google.py (1)
2700-2735: Consider adding skip decorator and using config-based model selection.
The new test class lacks the `@skip_if_no_api_key("google")` decorator that other Gemini-specific tests use (e.g., line 2613). This will cause failures in CI environments without the API key.
Additionally, other tests in this file use `get_model("google", "chat")` or parametrization for model selection, which provides flexibility. Consider whether these tests should follow that pattern.
Suggested improvement
+@skip_if_no_api_key("gemini")
 class TestGoogleCodeExecution:
     """Tests for code execution tool with Google Gemini SDK"""
And for model selection:
 response = client.models.generate_content(
-    model="gemini-2.0-flash",
+    model=get_model("google", "code_execution") or "gemini-2.0-flash",
tests/integrations/python/tests/test_openai.py (1)
3077-3081: Remove extraneous `f` prefix from strings without placeholders.
Per static analysis hints, lines 3080, 3118, and 3155 have f-strings without any placeholders. This is a minor issue but should be cleaned up for consistency.
Proposed fix
-    print(f"✓ Code execution (math) test passed!")
+    print("✓ Code execution (math) test passed!")
     print(f"  Response: {message.content[:200]}...")
core/schemas/responses.go (1)
808-887: Code interpreter metadata additions look reasonable; consider omitempty for provider-specific fields.
`ExpiresAt` (Anthropic) and possibly `Code`/`Outputs` may be absent depending on provider; without `omitempty` they'll serialize as `null`. If the intent is "only present when provided", add `omitempty` on provider-specific fields.
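A possible shape, with types inferred from how these fields are assigned elsewhere in this PR (abbreviated; a sketch, not the actual struct):

type ResponsesCodeInterpreterToolCall struct {
	ContainerID string                           `json:"container_id"`
	Code        *string                          `json:"code,omitempty"`
	Outputs     []ResponsesCodeInterpreterOutput `json:"outputs,omitempty"`
	ExpiresAt   *string                          `json:"expires_at,omitempty"` // Anthropic-specific
}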
core/providers/gemini/types.go (2)
67-133: Custom contents (un)marshal looks good; avoid stale slice data on reuse.
Consider setting `g.Contents = nil` before processing `aux.Contents` (and similarly `c.Parts = nil` in Content.UnmarshalJSON) so unmarshaling into a reused struct can't retain prior values when the JSON omits the field.
1061-1116: Custom parts (un)marshal looks good; same "stale data on reuse" nit.
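A hedged sketch of the reset both nits ask for (the receiver type name is illustrative and the surrounding custom decoding is abridged):

func (g *GeminiContents) UnmarshalJSON(data []byte) error { // type name assumed
	// Clear previously-held values so a reused struct cannot keep stale
	// entries when the incoming JSON omits the field.
	g.Contents = nil // likewise c.Parts = nil in Content.UnmarshalJSON

	type alias GeminiContents
	aux := (*alias)(g)
	if err := json.Unmarshal(data, aux); err != nil {
		return err
	}
	// ... existing per-element processing of the decoded contents continues here ...
	return nil
}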
core/providers/gemini/responses.go (3)
461-507: Avoid aliasing the same slice for `PromptTokensDetails` and `ToolUsePromptTokensDetails` unless intentional.
Both fields point to `details`; if any downstream code mutates one slice, it mutates both.
Safer approach (two slices)
- geminiResp.UsageMetadata.PromptTokensDetails = details
- // Also set ToolUsePromptTokensDetails to the same value
- // Gemini might use either field depending on context
- geminiResp.UsageMetadata.ToolUsePromptTokensDetails = details
+ geminiResp.UsageMetadata.PromptTokensDetails = details
+ // Gemini might use either field depending on context; avoid aliasing.
+ toolDetails := make([]*ModalityTokenCount, len(details))
+ copy(toolDetails, details)
+ geminiResp.UsageMetadata.ToolUsePromptTokensDetails = toolDetails
1882-1888: Deduplicate CodeExecution tool emission.
If `tools` contains multiple entries with `CodeExecution != nil`, this appends multiple `code_interpreter` tools.
Proposed fix
 func convertGeminiToolsToResponsesTools(tools []Tool) []schemas.ResponsesTool {
     var responsesTools []schemas.ResponsesTool
+    var addedCodeExecution bool
     for _, tool := range tools {
         @@
-        if tool.CodeExecution != nil {
+        if tool.CodeExecution != nil && !addedCodeExecution {
             responsesTool := schemas.ResponsesTool{
                 Type: schemas.ResponsesToolTypeCodeInterpreter,
             }
             responsesTools = append(responsesTools, responsesTool)
+            addedCodeExecution = true
         }
     }
2418-2426: Tool type comparison uses the raw string `"function"`; prefer the typed constant for consistency.
This avoids drift if constants change and improves readability.
Proposed fix
- if tool.Type == "function" {
+ if tool.Type == schemas.ResponsesToolTypeFunction {
core/providers/anthropic/responses.go (1)
2178-2197: Format validation for `ExpiresAt` is already documented; consider aligning with batch timestamp handling.
The schema comment already documents `ExpiresAt` as an ISO 8601 timestamp from Anthropic. However, note that batch processing uses `parseAnthropicTimestamp()` for explicit validation/parsing, while container extraction here only checks for empty strings. For consistency, consider either parsing the timestamp to int64 (as done in batch.go) or adding format validation before assignment.
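If the stricter option is taken, a minimal sketch of the validation (RFC 3339 being the concrete layout usually used for these ISO 8601 timestamps; the error-handling policy is up to the caller):

if response.Container != nil && response.Container.ExpiresAt != "" {
	if _, err := time.Parse(time.RFC3339, response.Container.ExpiresAt); err != nil {
		// Malformed expiry: log and leave the field unset rather than
		// propagating an unparseable timestamp.
	}
}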
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
core/changelog.md
core/providers/anthropic/responses.go
core/providers/anthropic/types.go
core/providers/gemini/count_tokens.go
core/providers/gemini/responses.go
core/providers/gemini/responses_test.go
core/providers/gemini/types.go
core/providers/gemini/utils.go
core/providers/utils/utils.go
core/schemas/responses.go
tests/integrations/python/config.yml
tests/integrations/python/tests/test_anthropic.py
tests/integrations/python/tests/test_google.py
tests/integrations/python/tests/test_openai.py
transports/changelog.md
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
tests/integrations/python/config.yml, core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, transports/changelog.md, core/providers/gemini/count_tokens.go, tests/integrations/python/tests/test_openai.py, core/changelog.md, core/schemas/responses.go, core/providers/anthropic/types.go, tests/integrations/python/tests/test_google.py, core/providers/gemini/types.go, tests/integrations/python/tests/test_anthropic.py, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
🧠 Learnings (15)
📚 Learning: 2026-01-14T04:32:14.023Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 1312
File: tests/integrations/python/config.yml:170-171
Timestamp: 2026-01-14T04:32:14.023Z
Learning: Guideline: In HuggingFace provider configuration, follow the provider-prefixed model path format fal-ai/<namespace>/<model> where the first segment fal-ai/ is the provider routing prefix and the remainder is the actual model path on fal.ai. Do not double-include the provider prefix. Example: fal-ai/flux/dev is the actual model path; when combined with the provider prefix you get fal-ai/fal-ai/flux/dev, which correctly indicates provider + model path.
Applied to files:
tests/integrations/python/config.yml
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/schemas/responses.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2025-12-29T11:54:55.836Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1153
File: framework/configstore/rdb.go:2221-2246
Timestamp: 2025-12-29T11:54:55.836Z
Learning: In Go reviews, do not flag range-over-int patterns like for i := range n as compile-time errors, assuming Go 1.22+ semantics. Only flag actual range-capable values (slices, arrays, maps, channels, strings) and other compile-time issues. This applies to all Go files across the repository.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/schemas/responses.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-14T04:40:11.480Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1312
File: framework/modelcatalog/pricing.go:276-426
Timestamp: 2026-01-14T04:40:11.480Z
Learning: In the Bifrost codebase, ImageUsage and other usage types guarantee that TotalTokens is populated (computed as InputTokens + OutputTokens if providers don’t supply TotalTokens). Reviewers can rely on this invariant and should not assume TotalTokens may be missing when input/output tokens exist. When implementing tiering logic or token-based decisions, you can safely use TotalTokens without extra null/zero guards, provided you’re in a context where InputTokens and OutputTokens are present. If a branch might discard tokens, ensure the invariant is preserved or add explicit checks only where the inputs are confirmed to be valid.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/schemas/responses.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-14T13:30:28.760Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 1326
File: plugins/semanticcache/test_utils.go:545-559
Timestamp: 2026-01-14T13:30:28.760Z
Learning: In the maximhq/bifrost repository, prefer using bifrost.Ptr() to create pointers instead of the address operator (&) even when & would be valid syntactically. Apply this consistently across all code paths, including test utilities, to improve consistency and readability. Replace occurrences of &value where a *T is expected with bifrost.Ptr(value) (or an equivalent call) and ensure the function is in scope and used correctly for the target pointer type.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/schemas/responses.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-10T11:27:47.535Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 1256
File: core/providers/openai/openai.go:2276-2385
Timestamp: 2026-01-10T11:27:47.535Z
Learning: Validate image generation requests for nil and missing prompts before dispatch. Follow the same pattern used here: core/bifrost.go validates nil/empty prompts, providerUtils.CheckContextAndGetRequestBody returns a structured error when the request converter yields nil, and apply this across all providers (including OpenAI) to avoid sending null bodies.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-14T10:53:44.658Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1326
File: core/providers/gemini/gemini.go:1679-1754
Timestamp: 2026-01-14T10:53:44.658Z
Learning: Validate image generation inputs in core/bifrost.go before invoking any provider handler. Ensure in all provider implementations (e.g., core/providers/gemini/gemini.go) that the request and request.Input are non-nil before use, to prevent nil dereferences and provide clear error handling. Apply this invariant broadly to all providers and add tests for nil input scenarios.
Applied to files:
core/providers/gemini/utils.go, core/providers/utils/utils.go, core/providers/gemini/responses_test.go, core/providers/gemini/count_tokens.go, core/providers/anthropic/types.go, core/providers/gemini/types.go, core/providers/gemini/responses.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-10T15:52:10.995Z
Learnt from: akshaydeo
Repo: maximhq/bifrost PR: 1298
File: core/providers/utils/utils.go:983-1006
Timestamp: 2026-01-10T15:52:10.995Z
Learning: In Go with fasthttp v1.68.0, Response.BodyStream() returns an io.Reader, but the concrete type also implements io.Closer. In reviews, consider adding a safe type assertion pattern to close the stream when possible, e.g. if closer, ok := bodyStream.(io.Closer); ok { if err := closer.Close(); err != nil { // handle } }. This check should be applied in all code paths where BodyStream() is consumed, not only in this specific file, to ensure proper resource cleanup when the underlying reader supports closing.
Applied to files:
core/providers/utils/utils.go
📚 Learning: 2025-12-15T10:16:21.909Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/huggingface/huggingface_test.go:12-63
Timestamp: 2025-12-15T10:16:21.909Z
Learning: In provider tests under core/providers/<provider>/*_test.go, do not require or flag the use of defer for Shutdown(); instead call client.Shutdown() at the end of each test function. This pattern appears consistent across all provider tests. Apply this rule only within this path; for other tests or resources, defer may still be appropriate.
Applied to files:
core/providers/gemini/responses_test.go
📚 Learning: 2026-01-13T17:10:07.064Z
Learnt from: Radheshg04
Repo: maximhq/bifrost PR: 1312
File: tests/integrations/python/tests/test_openai.py:1166-1258
Timestamp: 2026-01-13T17:10:07.064Z
Learning: In tests under tests/integrations/python, prefer using the OpenAI image generation model 'gpt-image-1' via the config key providers.openai.image_generation for image-generation scenarios. This avoids DALLE-3 parameter limitations (e.g., n>1, quality/size combos). Ensure tests reference this provider in mocks/fixtures and document why this choice is used for test determinism.
Applied to files:
tests/integrations/python/tests/test_openai.py, tests/integrations/python/tests/test_google.py, tests/integrations/python/tests/test_anthropic.py
📚 Learning: 2026-01-11T14:08:10.341Z
Learnt from: Pratham-Mishra04
Repo: maximhq/bifrost PR: 1298
File: core/providers/anthropic/anthropic.go:682-699
Timestamp: 2026-01-11T14:08:10.341Z
Learning: In Anthroplic streaming implementations (and analogous providers), ensure that the final 'summary' chunk, which carries usage information and metadata, is emitted after all delta chunks and uses a chunk index of last_delta_index + 1. This differentiates the summary chunk from content delta chunks. Apply this convention consistently in the anthropic provider code and in similar streaming providers, and consider adding a targeted test to assert the ordering and chunk index logic.
Applied to files:
core/providers/anthropic/types.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-13T13:36:35.221Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 1319
File: core/providers/anthropic/responses.go:937-937
Timestamp: 2026-01-13T13:36:35.221Z
Learning: In core/providers/anthropic/responses.go, when handling Anthropic API streaming responses, ensure that content_block_start events include a signature field set to an empty string (e.g., contentBlock.Signature = ""). The actual signature is delivered later via signature_delta events. This behavior is per Anthropic's specification and should not be treated as an error. This guideline should apply to all Anthropic response handling files under core/providers/anthropic/ and similar go files that process streaming blocks.
Applied to files:
core/providers/anthropic/types.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-14T06:57:42.750Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 1319
File: core/providers/anthropic/types.go:248-336
Timestamp: 2026-01-14T06:57:42.750Z
Learning: For Anthropic citation types (page_location, char_location, content_block_location), ensure there is an optional string field file_id to reference uploaded files. Update the Go structs modeling these citations to include FileID *string (or string with omitempty) and document its optionality in comments, so code consuming these types can handle absence of file_id gracefully.
Applied to files:
core/providers/anthropic/types.go, core/providers/anthropic/responses.go
📚 Learning: 2026-01-14T06:57:42.750Z
Learnt from: TejasGhatte
Repo: maximhq/bifrost PR: 1319
File: core/providers/anthropic/types.go:248-336
Timestamp: 2026-01-14T06:57:42.750Z
Learning: In core/providers/anthropic/types.go, ensure the web_search_result_location citation type includes a string field named 'url' alongside the existing fields 'encrypted_index', 'title', and 'cited_text'. If the field is missing, add it with type string and appropriate struct tags (e.g., json and/or db tags) and update any related serialization or usage accordingly.
Applied to files:
core/providers/anthropic/types.go, core/providers/anthropic/responses.go
🧬 Code graph analysis (8)
core/providers/gemini/utils.go (2)
- core/providers/gemini/types.go (2): ModalityTokenCount (1488-1493), Modality (799-799)
- core/schemas/responses.go (1): ModalityTokenCount (279-282)
core/providers/gemini/responses_test.go (3)
- core/providers/gemini/types.go (13): GenerateContentResponse (1524-1540), Candidate (1449-1475), Content (1062-1070), Part (1122-1146), ExecutableCode (1309-1314), CodeExecutionResult (1282-1288), Outcome (1291-1291), OutcomeOK (1297-1297), Role (25-25), FinishReason (34-34), GenerateContentResponseUsageMetadata (1496-1521), Type (871-871), FunctionCall (1328-1338)
- core/schemas/responses.go (5): ResponsesMessageTypeReasoning (324-324), ResponsesReasoning (776-779), ResponsesMessageTypeCodeInterpreterCall (314-314), ResponsesToolMessage (497-517), ResponsesCodeInterpreterToolCall (809-815)
- core/providers/gemini/responses.go (1): ToGeminiResponsesResponse (178-511)
core/providers/gemini/count_tokens.go (2)
- core/providers/gemini/types.go (2): ModalityTokenCount (1488-1493), Modality (799-799)
- core/schemas/responses.go (1): ModalityTokenCount (279-282)
tests/integrations/python/tests/test_openai.py (2)
- tests/integrations/python/tests/utils/parametrize.py (1): get_cross_provider_params_with_vk_for_scenario (50-101)
- tests/integrations/python/tests/utils/common.py (1): assert_valid_chat_response (815-847)
core/schemas/responses.go (3)
- core/providers/gemini/types.go (2): ModalityTokenCount (1488-1493), Modality (799-799)
- core/schemas/json_native.go (2): Marshal (8-10), Unmarshal (18-20)
- core/schemas/json_wasm.go (2): Marshal (8-10), Unmarshal (22-24)
core/providers/anthropic/types.go (1)
- ui/lib/types/logs.ts (1): ContentBlock (93-104)
tests/integrations/python/tests/test_google.py (2)
- tests/integrations/python/tests/utils/common.py (2): get_api_key (1805-1827), skip_if_no_api_key (1830-1841)
- core/providers/gemini/types.go (1): Tool (693-720)
tests/integrations/python/tests/test_anthropic.py (3)
- tests/integrations/python/tests/utils/common.py (2): get_api_key (1805-1827), skip_if_no_api_key (1830-1841)
- core/schemas/bifrost.go (1): Anthropic (35-35)
- tests/integrations/python/tests/test_google.py (3): test_code_execution_math (2726-2761), test_code_execution_data_analysis (2763-2798), test_code_execution_equation_solving (2800-2835)
🪛 Ruff (0.14.11)
tests/integrations/python/tests/test_openai.py
3048-3048: Unused method argument: test_config
(ARG002)
3080-3080: f-string without any placeholders
Remove extraneous f prefix
(F541)
3086-3086: Unused method argument: test_config
(ARG002)
3118-3118: f-string without any placeholders
Remove extraneous f prefix
(F541)
3124-3124: Unused method argument: test_config
(ARG002)
3155-3155: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integrations/python/tests/test_google.py
2760-2760: f-string without any placeholders
Remove extraneous f prefix
(F541)
2797-2797: f-string without any placeholders
Remove extraneous f prefix
(F541)
2834-2834: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integrations/python/tests/test_anthropic.py
2330-2330: Undefined name skip_if_no_api_key
(F821)
2334-2334: Undefined name os
(F821)
2370-2370: f-string without any placeholders
Remove extraneous f prefix
(F541)
2406-2406: f-string without any placeholders
Remove extraneous f prefix
(F541)
2442-2442: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (18)
core/changelog.md (1)
6-7: LGTM! Changelog entries accurately reflect the PR scope and follow the existing format.
transports/changelog.md (1)
6-7: LGTM! Changelog entries are consistent with core/changelog.md and follow the established format.
core/providers/gemini/count_tokens.go (3)
19-45: LGTM! The modality token count conversion logic is well-structured:
- Correctly preserves original modality casing in the output while using lowercase for comparison
- Properly handles nil entries in the slice
- Accumulates both aggregate counts and modality-specific breakdowns
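For readers skimming the diff, a compressed, hypothetical sketch of the accumulation pattern being described (type and field names follow the symbols referenced in this review; the real helper also folds in cached and audio counts):

func sumModalities(details []*ModalityTokenCount) (total int32, breakdown []schemas.ModalityTokenCount) {
	for _, d := range details {
		if d == nil {
			continue // defensive: skip nil entries from the provider
		}
		total += d.TokenCount
		// Lowercase only for comparison; the original casing is preserved in the output.
		if strings.EqualFold(string(d.Modality), "audio") {
			// e.g. bump an audio-specific counter here
		}
		breakdown = append(breakdown, schemas.ModalityTokenCount{
			Modality:   string(d.Modality),
			TokenCount: int(d.TokenCount),
		})
	}
	return total, breakdown
}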
58-61: Verify additive audio token counting from cache.
When `CacheTokensDetails` contains audio modality tokens, they are added to `AudioTokens` on top of any tokens already counted from `PromptTokensDetails`. Confirm this additive behavior is intentional and won't cause double-counting if the same audio tokens appear in both sources.
78-96: LGTM! The reverse conversion is clean and handles the optional CachedTokens field appropriately.
core/providers/gemini/utils.go (1)
425-444: LGTM! The simplified cached tokens assignment and new tool use token handling are well-implemented. The modality breakdown for tool use tokens correctly converts provider-specific types to the Bifrost schema.
tests/integrations/python/config.yml (1)
187-187: LGTM! Code execution capability is correctly enabled for OpenAI, Anthropic, and Gemini providers, with the scenario properly mapped to the "tools" capability type. This aligns with the PR's scope.
Also applies to: 251-251, 294-294, 463-463
core/providers/utils/utils.go (1)
402-424: LGTM - Improved error response handling.
The refactored error body handling now:
- Explicitly sets `RawResponse` to `nil` for empty responses (line 402)
- Defers unmarshalling with a string fallback (lines 407-412), ensuring `RawResponse` is never nil for non-empty bodies
- Uses the same `rawErrorResponse` for the successful JSON parsing path
This provides consistent behavior and supports code execution scenarios where error payloads may contain varied content types.
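A hedged sketch of that fallback flow (variable and field names approximate; the real code lives in core/providers/utils/utils.go):

var rawErrorResponse interface{}
switch {
case len(body) == 0:
	// Empty body: RawResponse is explicitly nil.
	bifrostErr.RawResponse = nil
case json.Unmarshal(body, &rawErrorResponse) != nil:
	// Not valid JSON: keep the payload visible as a plain string.
	bifrostErr.RawResponse = string(body)
default:
	// Parsed JSON path reuses the same rawErrorResponse value.
	bifrostErr.RawResponse = rawErrorResponse
}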
core/providers/gemini/responses_test.go (2)
14-147: LGTM - Comprehensive round-trip test for code execution.
The test thoroughly validates the bidirectional conversion between Gemini and Bifrost formats:
- Creates a realistic Gemini response with `executableCode`, `codeExecutionResult`, and `text` parts
- Validates thought signatures are preserved in the correct locations
- Ensures no standalone `thoughtSignature` parts are created
- Verifies usage metadata round-trips correctly
Minor: Line 128's assertion is effectively redundant with line 122 (both assert `part0.ExecutableCode` is not nil).
149-254: LGTM - Well-structured test using real Gemini response data.
This test provides excellent coverage by:
- Using a realistic JSON response from the Gemini API (based on an actual bug report scenario)
- Validating the original response structure before conversion
- Ensuring the round-trip preserves all critical data, including thought signatures
- The loop at lines 246-252 properly validates that no standalone `thoughtSignature` parts are created, which was likely the bug being fixed
The inline JSON is lengthy but serves as valuable documentation of the expected Gemini response format.
core/providers/anthropic/types.go (4)
215-229: LGTM - New content block type constants.
The new constants `BashCodeExecutionToolResult` and `BashCodeExecutionResult` are properly defined and follow the existing naming conventions.
252-254: LGTM - New fields for bash code execution results.
The new `StdOut`, `StdErr`, and `ReturnCode` fields are correctly typed as pointers (optional) with appropriate JSON tags matching Anthropic's API conventions (`stdout`, `stderr`, `return_code`).
379-385: LGTM - New tool name constants.
The new `AnthropicToolNameBashCodeExecution` and `AnthropicToolNameCodeExecution` constants are well-defined and follow the existing pattern.
472-472: LGTM - Container support in responses.
The new `Container` field on `AnthropicMessageResponse` and the `AnthropicContainer` type are well-structured. The `ExpiresAt` field correctly uses a string type to handle ISO 8601 timestamps, consistent with how other timestamp fields are handled in this file (e.g., `parseAnthropicFileTimestamp`).
Also applies to: 497-500
tests/integrations/python/tests/test_anthropic.py (1)
2310-2321: validate_cache_read(): change looks OK.
core/providers/anthropic/responses.go (2)
4246-4251: The tool type and name constants are correctly configured. `AnthropicToolTypeCodeExecution` resolves to `"code_execution_20250825"` and `AnthropicToolNameCodeExecution` to `"code_execution"`, both matching the Anthropic API specification.
2627-2661: Ordering is correct: thinking blocks properly precede tool_use blocks.
The implementation correctly prepends `pendingReasoningContentBlocks` before `pendingToolCalls`, ensuring thinking blocks appear first in the assistant message as required by Anthropic's API. This ordering (thinking → tool_use) is mandatory when extended reasoning is enabled, and the code handles it consistently across all tool-call types (FunctionCall, WebSearchCall, CodeInterpreterCall).
core/providers/gemini/responses.go (1)
168-333: Clarify role handling in candidate grouping: user-role messages in `bifrostResp.Output` would produce invalid Gemini responses.
The code does check for and handle `role == "user"` (line 212), creating candidates with role "user". However, in the standard response conversion flow (Gemini → `convertGeminiCandidatesToResponsesOutput` → `bifrostResp.Output`), the helper function only ever emits `ResponsesInputMessageRoleAssistant`, so user-role messages should never appear here.
If external code populates `bifrostResp.Output` with user-role messages and passes them to `ToGeminiResponsesResponse`, this will produce invalid Gemini response candidates. Consider either:
- Explicitly filtering out or rejecting user-role messages in this function with an error or log warning
- Documenting the contract that `bifrostResp.Output` must only contain assistant-role messages when passed to this function
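If the first option is chosen, the guard could look roughly like this (field and constant names assumed from the surrounding discussion):

for _, msg := range bifrostResp.Output {
	if msg.Role != nil && *msg.Role == schemas.ResponsesInputMessageRoleUser {
		// A user-role output message cannot become a valid Gemini response
		// candidate; skip it (or surface an error) instead of emitting one.
		continue
	}
	// ... existing candidate grouping ...
}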
// Add container ID and container expire at using the Anthropic Response
if response.Container != nil {
	// Find the corresponding code interpreter call by type=code_interpreter_call
	for i, msg := range outputMessages {
		if msg.Type != nil && *msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall && msg.ResponsesCodeInterpreterToolCall != nil {
			outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ContainerID = response.Container.ID
			outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ExpiresAt = &response.Container.ExpiresAt
			break
		}
	}
}
Fix code-interpreter container propagation: wrong field check + possible nil deref.
The condition checks msg.ResponsesCodeInterpreterToolCall (likely not a field) but then writes through outputMessages[i].ResponsesToolMessage....
Proposed fix
// Add container ID and container expire at using the Anthropic Response
if response.Container != nil {
// Find the corresponding code interpreter call by type=code_interpreter_call
for i, msg := range outputMessages {
- if msg.Type != nil && *msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall && msg.ResponsesCodeInterpreterToolCall != nil {
+ if msg.Type != nil && *msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall &&
+ outputMessages[i].ResponsesToolMessage != nil &&
+ outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall != nil {
outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ContainerID = response.Container.ID
- outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ExpiresAt = &response.Container.ExpiresAt
+ outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ExpiresAt = schemas.Ptr(response.Container.ExpiresAt)
break
}
}
}
🤖 Prompt for AI Agents
In `@core/providers/anthropic/responses.go` around lines 2098 - 2108, The loop
that propagates response.Container uses the wrong nil-check and can nil-deref:
replace the current condition with a guard that verifies msg.Type != nil &&
*msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall and also that
msg.ResponsesToolMessage != nil and
msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall != nil before
assigning; then set
outputMessages[i].ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ContainerID
= response.Container.ID and .ExpiresAt = &response.Container.ExpiresAt only
after these checks to avoid panics.
// Check if its code execution tool use
if block.Name != nil && *block.Name == string(AnthropicToolNameBashCodeExecution) {
	bifrostMsg := schemas.ResponsesMessage{
		ID:     block.ID,
		Type:   schemas.Ptr(schemas.ResponsesMessageTypeCodeInterpreterCall),
		Status: schemas.Ptr("completed"),
		ResponsesToolMessage: &schemas.ResponsesToolMessage{
			ResponsesCodeInterpreterToolCall: &schemas.ResponsesCodeInterpreterToolCall{
				Outputs: []schemas.ResponsesCodeInterpreterOutput{},
			},
		},
	}
	if block.Input != nil {
		if inputMap, ok := block.Input.(map[string]interface{}); ok {
			if code, ok := inputMap["command"].(string); ok {
				bifrostMsg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Code = &code
			}
		}
	}
	// Set container ID and container expire at using the Anthropic Response
	bifrostMessages = append(bifrostMessages, bifrostMsg)
}
Code-interpreter tool_use → Bifrost is missing CallID (breaks tool_result linkage).
Later, the tool-result handler matches by msg.ResponsesToolMessage.CallID, but you never set it here.
Proposed fix
if block.Name != nil && *block.Name == string(AnthropicToolNameBashCodeExecution) {
bifrostMsg := schemas.ResponsesMessage{
ID: block.ID,
Type: schemas.Ptr(schemas.ResponsesMessageTypeCodeInterpreterCall),
Status: schemas.Ptr("completed"),
ResponsesToolMessage: &schemas.ResponsesToolMessage{
+ CallID: block.ID,
ResponsesCodeInterpreterToolCall: &schemas.ResponsesCodeInterpreterToolCall{
Outputs: []schemas.ResponsesCodeInterpreterOutput{},
},
},
}
if block.Input != nil {
if inputMap, ok := block.Input.(map[string]interface{}); ok {
if code, ok := inputMap["command"].(string); ok {
bifrostMsg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Code = &code
+ // Optional: set language explicitly if the schema supports it.
+ // bifrostMsg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Language = schemas.Ptr("bash")
}
}
}
bifrostMessages = append(bifrostMessages, bifrostMsg)
}
🤖 Prompt for AI Agents
In `@core/providers/anthropic/responses.go` around lines 3306 - 3327, The bifrost
message created for code-interpreter tool_use is missing the CallID so
downstream tool-result matching (which checks msg.ResponsesToolMessage.CallID)
fails; set bifrostMsg.ResponsesToolMessage.CallID = block.CallID (or the
appropriate block field holding the call id) when building the
ResponsesToolMessage in the code path that creates bifrostMsg (inside the if
block where block.Name == AnthropicToolNameBashCodeExecution), ensuring the
CallID is propagated alongside Code and Outputs so the tool-result handler can
match correctly.
case AnthropicContentBlockTypeBashCodeExecutionToolResult:
	// find the corresponding code interpreter call by tool_use_id
	if block.ToolUseID != nil {
		for i := len(bifrostMessages) - 1; i >= 0; i-- {
			msg := &bifrostMessages[i]
			if msg.Type != nil && *msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall &&
				msg.ResponsesToolMessage != nil &&
				msg.ResponsesToolMessage.CallID != nil &&
				*msg.ResponsesToolMessage.CallID == *block.ToolUseID {
				codeExecutionBlock := block.Content.ContentBlock
				if codeExecutionBlock == nil || codeExecutionBlock.Type != AnthropicContentBlockTypeBashCodeExecutionResult {
					continue
				}
				// Add this result to the code interpreter call outputs
				var log schemas.ResponsesCodeInterpreterOutputLogs
				log.Type = "logs"
				log.ReturnCode = codeExecutionBlock.ReturnCode
				if codeExecutionBlock.StdOut != nil {
					log.Logs = *codeExecutionBlock.StdOut
				} else if codeExecutionBlock.StdErr != nil {
					log.Logs = *codeExecutionBlock.StdErr
				}
				msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs = append(msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs, schemas.ResponsesCodeInterpreterOutput{
					ResponsesCodeInterpreterOutputLogs: &log,
				})
			}
		}
	}
Tool-result attachment needs nil checks and should break after the match.
As written, it can panic (block.Content nil, or ResponsesCodeInterpreterToolCall nil) and keeps scanning even after attaching.
Proposed fix
case AnthropicContentBlockTypeBashCodeExecutionToolResult:
// find the corresponding code interpreter call by tool_use_id
if block.ToolUseID != nil {
for i := len(bifrostMessages) - 1; i >= 0; i-- {
msg := &bifrostMessages[i]
if msg.Type != nil && *msg.Type == schemas.ResponsesMessageTypeCodeInterpreterCall &&
msg.ResponsesToolMessage != nil &&
msg.ResponsesToolMessage.CallID != nil &&
*msg.ResponsesToolMessage.CallID == *block.ToolUseID {
- codeExecutionBlock := block.Content.ContentBlock
+ if block.Content == nil || block.Content.ContentBlock == nil {
+ break
+ }
+ if msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall == nil {
+ break
+ }
+ codeExecutionBlock := block.Content.ContentBlock
if codeExecutionBlock == nil || codeExecutionBlock.Type != AnthropicContentBlockTypeBashCodeExecutionResult {
- continue
+ break
}
@@
msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs = append(msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.Outputs, schemas.ResponsesCodeInterpreterOutput{
ResponsesCodeInterpreterOutputLogs: &log,
})
+ break
}
}
}

// convertBifrostCodeInterpreterCallToAnthropicBlocks converts a Bifrost code_interpreter_call to Anthropic server_tool_use and bash_code_execution_tool_result blocks
func convertBifrostCodeInterpreterCallToAnthropicBlocks(msg *schemas.ResponsesMessage) []AnthropicContentBlock {
	if msg.ResponsesToolMessage == nil || msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall == nil {
		return nil
	}

	var blocks []AnthropicContentBlock
	codeInterpreter := msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall

	// Ensure we have a valid ID for the tool use block (critical for linkage)
	if msg.ID == nil {
		// Cannot proceed without an ID - this would break tool_use_id linkage
		return nil
	}

	// 1. Create server_tool_use block for the code interpreter
	serverToolUseBlock := AnthropicContentBlock{
		Type: AnthropicContentBlockTypeServerToolUse,
		Name: schemas.Ptr(string(AnthropicToolNameBashCodeExecution)),
		ID:   msg.ID, // Always set - required for tool result linkage
	}

	// Wrap the code in bash command format: python3 << 'EOF'\n...\nEOF\n
	if codeInterpreter.Code != nil {
		command := *codeInterpreter.Code
		// Wrap in heredoc format if not already wrapped
		if !strings.HasPrefix(command, "python") && !strings.HasPrefix(command, "bash") {
			command = fmt.Sprintf("python3 << 'EOF'\n%s\nEOF\n", command)
		}
		input := map[string]interface{}{
			"command": command,
		}
		serverToolUseBlock.Input = input
	}

	blocks = append(blocks, serverToolUseBlock)

	// 2. Create bash_code_execution_tool_result block if outputs are present
	if len(codeInterpreter.Outputs) > 0 {
		for _, output := range codeInterpreter.Outputs {
			// Initialize stdout and stderr with empty strings (Anthropic expects these fields)
			stdout := ""
			stderr := ""
			var returnCode *int

			// Handle logs output
			if output.ResponsesCodeInterpreterOutputLogs != nil {
				logs := output.ResponsesCodeInterpreterOutputLogs
				returnCode = logs.ReturnCode
				if returnCode != nil {
					if *returnCode == 0 {
						stdout = logs.Logs
					} else {
						stderr = logs.Logs
					}
				} else {
					// If return code is not present, use the logs as stdout
					stdout = logs.Logs
				}
			}

			// Create the bash_code_execution_result content block
			// This must include type, stdout, stderr, return_code, and an empty content array
			bashResultContent := AnthropicContentBlock{
				Type:       AnthropicContentBlockTypeBashCodeExecutionResult,
				StdOut:     &stdout,
				StdErr:     &stderr,
				ReturnCode: returnCode,
				Content: &AnthropicContent{
					ContentBlocks: []AnthropicContentBlock{}, // Empty array as per Anthropic spec
				},
			}

			// Ensure we have a valid tool_use_id (critical for linkage)
			if msg.ID == nil {
				// Skip if no ID - this would be a hard break
				continue
			}

			// Create the bash_code_execution_tool_result block
			bashResultBlock := AnthropicContentBlock{
				Type:      AnthropicContentBlockTypeBashCodeExecutionToolResult,
				ToolUseID: msg.ID, // Must match the server_tool_use.id
				Content: &AnthropicContent{
					ContentBlock: &bashResultContent,
				},
			}

			blocks = append(blocks, bashResultBlock)
		}
	}

	return blocks
}
🧩 Analysis chain
🌐 Web query:
Anthropic API bash code execution tool server_tool_use input schema documentation
💡 Result:
You can find Anthropic’s documentation for the code-execution / bash tool and the server_tool_use input schema here:
- Code execution (bash + file) tool — includes response format showing "server_tool_use" blocks and the bash tool input format. [1]
- "How to implement tool use" — describes the tools array and the input_schema JSON Schema field used when defining client tools. [2]
- Bash tool reference — details the bash tool behavior, versioning, and that it is schema-less (Claude’s built-in bash tool uses a built-in schema). [3]
References:
[1] Anthropic — Code execution tool docs. (docs.anthropic.com)
[2] Anthropic — How to implement tool use. (docs.anthropic.com)
[3] Anthropic — Bash tool docs. (docs.anthropic.com)
If you want, I can extract the exact JSON examples (server_tool_use event, bash input shape) into a ready-to-run curl snippet.
Citations:
- 1: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/code-execution-tool?utm_source=openai
- 2: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use?utm_source=openai
- 3: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/bash-tool?utm_source=openai
🌐 Web query:
Anthropic tool_use block format required fields tool_use_id linkage
💡 Result:
Summary — required fields & linkage
- tool_use content block (assistant -> contains):
  - type: "tool_use".
  - id: a unique identifier for that tool invocation (used to link results).
  - name: tool name (must match a tool defined in your request).
  - input: object matching the tool's input_schema. (docs.anthropic.com)
- tool_result content block (user -> contains results):
  - type: "tool_result".
  - tool_use_id: must equal the tool_use block's id (this is the linkage).
  - content: the tool output (string, nested content blocks, or document blocks).
  - is_error: optional boolean if execution failed. (docs.anthropic.com)
- Important formatting rules:
  - tool_result blocks must immediately follow the corresponding assistant tool_use in the message history (no intervening messages).
  - In the user message containing tool_result blocks, all tool_result blocks must come first in the content array; any text must follow them. Violating this causes a 400 error. (docs.anthropic.com)
If you want, I can paste minimal example JSON for a tool_use + matching tool_result.
Citations:
- 1: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use?utm_source=openai
- 2: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use?utm_source=openai
- 3: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use?utm_source=openai
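To make the linkage rule above concrete, here is a minimal sketch with locally stubbed types (not the real Anthropic SDK or Bifrost types); the only point it illustrates is that the result block's tool_use_id must echo the tool-use block's id:

```go
// Sketch only: stubbed content-block type, illustrating the id linkage rule.
package sketch

type contentBlock struct {
	Type      string  `json:"type"`
	ID        *string `json:"id,omitempty"`          // set on the server_tool_use block
	ToolUseID *string `json:"tool_use_id,omitempty"` // set on the matching result block
}

// pairedBlocks returns a server_tool_use block and its result block linked by
// the same identifier, in the order Anthropic expects them to appear.
func pairedBlocks(id string) []contentBlock {
	return []contentBlock{
		{Type: "server_tool_use", ID: &id},
		{Type: "bash_code_execution_tool_result", ToolUseID: &id},
	}
}
```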
Use CallID as primary ID source and respect the Language field for code wrapping.
The hard check msg.ID == nil => return nil is fragile; the code should use msg.ResponsesToolMessage.CallID as the primary source for linkage (matching how the rest of the codebase links tool_use and tool_result blocks). Additionally, the code wrapping heuristic only checks for "python" and "bash" string prefixes but ignores the Language field available in ResponsesCodeInterpreterToolCall, and does not account for leading whitespace that may affect code execution.
🤖 Prompt for AI Agents
In `@core/providers/anthropic/responses.go` around lines 3876 - 3969,
convertBifrostCodeInterpreterCallToAnthropicBlocks currently requires msg.ID and
ignores the tool call CallID and the CodeInterpreter Language and leading
whitespace; change linkage to use msg.ResponsesToolMessage.CallID as the primary
ID source (fall back to msg.ID only if CallID is nil) for server_tool_use.ID and
bash_code_execution_tool_result.ToolUseID, and when wrapping code use
strings.TrimSpace on code and consult codeInterpreter.Language to decide wrapper
(e.g., if Language == "python" wrap with python3 heredoc, if "bash" leave as-is,
otherwise choose a sensible default or no wrap); update references in the
function to use CallID where building serverToolUseBlock and bashResultBlock and
ensure trimming of leading whitespace before prefix checks.
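A minimal sketch of the suggested behaviour, using stubbed types whose field names (CallID, Language, Code) are taken from the review text above; the real structs in responses.go may differ:

```go
// Sketch only: prefers CallID over ID for linkage and wraps Python source
// for the bash-based code execution tool. Types are local stubs.
package sketch

import "strings"

type codeInterpreterCall struct {
	ID       *string // block ID (fallback)
	CallID   *string // preferred linkage ID, mirrors tool_use_id
	Language *string // e.g. "python"
	Code     string
}

// resolveToolUseID prefers CallID and falls back to ID; ok=false means the
// caller should skip the block because nothing can be linked.
func resolveToolUseID(c codeInterpreterCall) (string, bool) {
	switch {
	case c.CallID != nil:
		return *c.CallID, true
	case c.ID != nil:
		return *c.ID, true
	default:
		return "", false
	}
}

// wrapCode trims leading whitespace before inspecting the snippet and wraps
// Python source in a heredoc so the bash tool can run it; shell commands
// pass through unchanged.
func wrapCode(c codeInterpreterCall) string {
	code := strings.TrimSpace(c.Code)
	if c.Language != nil && strings.EqualFold(*c.Language, "python") &&
		!strings.HasPrefix(code, "python") {
		return "python3 <<'EOF'\n" + code + "\nEOF"
	}
	return code
}
```

The heredoc wrapper is only one possible convention for handing Python source to the bash-based tool; the important parts are preferring CallID for linkage and trimming whitespace before any prefix check.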
```go
case AnthropicToolTypeCodeExecution:
	return &schemas.ResponsesTool{
		Type: schemas.ResponsesToolTypeCodeInterpreter,
		Name: &tool.Name,
	}
```
Remove Name field from CodeInterpreter tool mapping; it's a singleton tool without variable names.
CodeInterpreter is an infrastructure tool like LocalShell—the schema documents the Name field only for "Function" and "Custom" tools. Gemini's CodeInterpreter mapping correctly omits it, and LocalShell follows the same pattern. The fixed tool name "code_execution" comes from the provider, not from dynamic configuration.
🤖 Prompt for AI Agents
In `@core/providers/anthropic/responses.go` around lines 4091 - 4095, The mapping
for AnthropicToolTypeCodeExecution should not set the Name field because
CodeInterpreter is a singleton infra tool; update the block that returns a
*schemas.ResponsesTool for AnthropicToolTypeCodeExecution to omit the Name
assignment (leave only Type: schemas.ResponsesToolTypeCodeInterpreter), removing
any reference to tool.Name so the fixed provider name is used instead.
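A stubbed sketch of the suggested mapping (the real ResponsesTool and its constants live in core/schemas; only the shape matters here):

```go
// Sketch only: CodeInterpreter maps as a singleton tool, so Name stays unset.
package sketch

type responsesToolType string

const responsesToolTypeCodeInterpreter responsesToolType = "code_interpreter"

type responsesTool struct {
	Type responsesToolType
	Name *string // documented only for Function and Custom tools
}

func codeExecutionTool() *responsesTool {
	return &responsesTool{Type: responsesToolTypeCodeInterpreter}
}
```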
```python
@pytest.mark.parametrize(
    "provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
def test_64_code_execution_math(self, test_config, provider, model, vk_enabled):
    """Test Case 64: Code Execution - Mathematical Computation"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test solving a mathematical equation using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Calculate the sum of all prime numbers between 1 and 50 using Python. Show your work."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )

    # Validate response
    assert_valid_chat_response(response)
    assert response.choices, "Response should have choices"
    assert len(response.choices) > 0, "Should have at least one choice"

    message = response.choices[0].message
    assert message.content, "Message should have content"

    # The response should contain the result (sum of primes 1-50 = 328)
    content_text = message.content.lower()
    assert any(keyword in content_text for keyword in ["328", "prime", "sum"]), \
        f"Response should contain calculation result. Got: {message.content}"

    print(f"✓ Code execution (math) test passed!")
    print(f" Response: {message.content[:200]}...")

@pytest.mark.parametrize(
    "provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
def test_65_code_execution_data_analysis(self, test_config, provider, model, vk_enabled):
    """Test Case 65: Code Execution - Data Analysis"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test statistical analysis using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Calculate the mean and standard deviation of the following numbers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Use Python to compute these statistics."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )

    # Validate response
    assert_valid_chat_response(response)
    assert response.choices, "Response should have choices"

    message = response.choices[0].message
    assert message.content, "Message should have content"

    # The response should contain statistical calculations
    # Mean should be 5.5, std dev ~2.87
    content_text = message.content.lower()
    assert any(keyword in content_text for keyword in ["mean", "average", "5.5", "standard deviation", "std"]), \
        f"Response should contain statistical results. Got: {message.content}"

    print(f"✓ Code execution (data analysis) test passed!")
    print(f" Response: {message.content[:200]}...")

@pytest.mark.parametrize(
    "provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
def test_66_code_execution_equation_solving(self, test_config, provider, model, vk_enabled):
    """Test Case 66: Code Execution - Equation Solving"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test equation solving using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Solve the equation 3x + 11 = 14 for x using Python."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )

    # Validate response
    assert_valid_chat_response(response)
    assert response.choices, "Response should have choices"

    message = response.choices[0].message
    assert message.content, "Message should have content"

    # The solution should be x = 1
    content_text = message.content.lower()
    assert any(keyword in content_text for keyword in ["x = 1", "x=1", "1.0", "solution"]), \
        f"Response should contain equation solution. Got: {message.content}"

    print(f"✓ Code execution (equation solving) test passed!")
    print(f" Response: {message.content[:200]}...")
```
Missing skip condition for unconfigured providers.
Other parameterized tests in this file include a skip condition at the beginning of the test method:

```python
if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")
```

This condition is missing from all three new code execution tests, which could cause test failures when no providers are configured for the code_execution scenario.

Proposed fix for test_64:

```diff
     def test_64_code_execution_math(self, test_config, provider, model, vk_enabled):
         """Test Case 64: Code Execution - Mathematical Computation"""
+        if provider == "_no_providers_" or model == "_no_model_":
+            pytest.skip("No providers configured for code_execution scenario")
+
         client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
```

Apply the same fix to test_65 and test_66.
🧰 Tools
🪛 Ruff (0.14.11)
3048-3048: Unused method argument: test_config
(ARG002)
3080-3080: f-string without any placeholders
Remove extraneous f prefix
(F541)
3086-3086: Unused method argument: test_config
(ARG002)
3118-3118: f-string without any placeholders
Remove extraneous f prefix
(F541)
3124-3124: Unused method argument: test_config
(ARG002)
3155-3155: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@tests/integrations/python/tests/test_openai.py` around lines 3045 - 3156,
Insert the same "skip when unconfigured" guard at the top of each new test
(test_64_code_execution_math, test_65_code_execution_data_analysis,
test_66_code_execution_equation_solving): check if provider == "_no_providers_"
or model == "_no_model_" and call pytest.skip("No providers configured for this
scenario") before any client setup or API calls so the tests behave like the
other parameterized tests when no providers are configured.
```python
def test_64_code_execution_math(self, test_config, provider, model, vk_enabled):
    """Test Case 64: Code Execution - Mathematical Computation"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test solving a mathematical equation using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Calculate the sum of all prime numbers between 1 and 50 using Python. Show your work."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
Model parameter should use format_provider_model for consistency.
Other tests in this file use format_provider_model(provider, model) to format the model parameter, but these new tests pass model directly. This inconsistency could cause issues with cross-provider testing.
Additionally, the test_config parameter is unused (per static analysis hint). Consider removing it or using _ prefix if it's required by the test framework.
Proposed fix
@pytest.mark.parametrize(
"provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
- def test_64_code_execution_math(self, test_config, provider, model, vk_enabled):
+ def test_64_code_execution_math(self, provider, model, vk_enabled):
"""Test Case 64: Code Execution - Mathematical Computation"""
client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
# Test solving a mathematical equation using code execution
response = client.chat.completions.create(
- model=model,
+ model=format_provider_model(provider, model),
messages=[
{"role": "user", "content": "Calculate the sum of all prime numbers between 1 and 50 using Python. Show your work."}
],

🧰 Tools
🪛 Ruff (0.14.11)
3048-3048: Unused method argument: test_config
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integrations/python/tests/test_openai.py` around lines 3048 - 3065, In
test_64_code_execution_math, replace the raw model argument passed to
client.chat.completions.create with the formatted model by calling
format_provider_model(provider, model) so it matches other tests (keep the
tools/extra_body/extra_query logic intact); also address the unused test_config
parameter by renaming it to _test_config (or removing it) in the test signature
to satisfy static analysis and test framework conventions.
```python
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
Redundant provider passing in code execution tests.
The code execution tests (64-66) pass provider via extra_body and vk via extra_query, but this duplicates information already established during client initialization. Other chat completion tests (01-09) and response API tests (52-53) do not use this pattern. Since get_provider_openai_client already configures the client with the provider's API key and base URL, and sets the x-bf-vk header when vk_enabled=True, explicitly passing these parameters again appears unnecessary. Either remove the redundant parameters or document why code execution specifically requires this approach.
🤖 Prompt for AI Agents
In `@tests/integrations/python/tests/test_openai.py` around lines 3063 - 3065, The
test passes redundant provider info by supplying extra_body={"provider":
provider} and extra_query={"vk":"true"} in the code execution test call; remove
these explicit extras and rely on get_provider_openai_client to configure the
provider (API key/base URL) and x-bf-vk header when vk_enabled=True, or if you
intentionally need them for this specific code-execution path, add a short
comment explaining why; update the call site that uses extra_body and
extra_query (the code-execution test invocation) to stop passing those
parameters unless a documented, test-specific reason exists.
```python
def test_65_code_execution_data_analysis(self, test_config, provider, model, vk_enabled):
    """Test Case 65: Code Execution - Data Analysis"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test statistical analysis using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Calculate the mean and standard deviation of the following numbers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Use Python to compute these statistics."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
Same issues apply to test_65_code_execution_data_analysis.
This test has the same issues:
- Unused test_config parameter
- Missing format_provider_model() call
- f-string without placeholder at line 3118
Proposed fix
@pytest.mark.parametrize(
"provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
- def test_65_code_execution_data_analysis(self, test_config, provider, model, vk_enabled):
+ def test_65_code_execution_data_analysis(self, provider, model, vk_enabled):
"""Test Case 65: Code Execution - Data Analysis"""
client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
# Test statistical analysis using code execution
response = client.chat.completions.create(
- model=model,
+ model=format_provider_model(provider, model),
messages=[

📝 Committable suggestion
```python
def test_65_code_execution_data_analysis(self, provider, model, vk_enabled):
    """Test Case 65: Code Execution - Data Analysis"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
    # Test statistical analysis using code execution
    response = client.chat.completions.create(
        model=format_provider_model(provider, model),
        messages=[
            {"role": "user", "content": "Calculate the mean and standard deviation of the following numbers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Use Python to compute these statistics."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
🧰 Tools
🪛 Ruff (0.14.11)
3086-3086: Unused method argument: test_config
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integrations/python/tests/test_openai.py` around lines 3086 - 3103, In
test_65_code_execution_data_analysis, remove the unused test_config parameter
from the function signature, ensure the model is normalized by calling
format_provider_model(model, provider) before using it (e.g., assign back to
model), and fix the stray f-string by replacing the f-prefix with a plain string
or adding the intended placeholder so no f-string is used without interpolation;
update references in the client.chat.completions.create call to use the
normalized model variable.
```python
def test_66_code_execution_equation_solving(self, test_config, provider, model, vk_enabled):
    """Test Case 66: Code Execution - Equation Solving"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)

    # Test equation solving using code execution
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Solve the equation 3x + 11 = 14 for x using Python."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
Same issues apply to test_66_code_execution_equation_solving.
This test has the same issues as the previous two:
- Unused test_config parameter
- Missing format_provider_model() call
- f-string without placeholder at line 3155
Proposed fix
@pytest.mark.parametrize(
"provider,model,vk_enabled", get_cross_provider_params_with_vk_for_scenario("code_execution")
)
- def test_66_code_execution_equation_solving(self, test_config, provider, model, vk_enabled):
+ def test_66_code_execution_equation_solving(self, provider, model, vk_enabled):
"""Test Case 66: Code Execution - Equation Solving"""
client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
# Test equation solving using code execution
response = client.chat.completions.create(
- model=model,
+ model=format_provider_model(provider, model),
messages=[

📝 Committable suggestion
```python
def test_66_code_execution_equation_solving(self, provider, model, vk_enabled):
    """Test Case 66: Code Execution - Equation Solving"""
    client = get_provider_openai_client(provider, vk_enabled=vk_enabled)
    # Test equation solving using code execution
    response = client.chat.completions.create(
        model=format_provider_model(provider, model),
        messages=[
            {"role": "user", "content": "Solve the equation 3x + 11 = 14 for x using Python."}
        ],
        tools=[
            {
                "type": "code_interpreter"
            }
        ],
        extra_body={"provider": provider} if not vk_enabled else None,
        extra_query={"vk": "true"} if vk_enabled else None
    )
```
🧰 Tools
🪛 Ruff (0.14.11)
3124-3124: Unused method argument: test_config
(ARG002)
🤖 Prompt for AI Agents
In `@tests/integrations/python/tests/test_openai.py` around lines 3124 - 3141, In
test_66_code_execution_equation_solving, remove the unused test_config parameter
from the function signature (or use it if intended), call
format_provider_model(provider, model) and assign its result back to model
before calling client.chat.completions.create so the model is normalized, and
fix the stray f-string (remove the leading f from any string literal that has no
placeholders or add the correct {placeholder}) to avoid invalid formatting;
adjust references to model/provider in the call (e.g., model=model and
extra_body={"provider": provider} if not vk_enabled) accordingly.

Added support for the code execution tool across OpenAI, Anthropic, and Gemini
This PR adds support for the code execution tool across OpenAI, Anthropic, and Gemini providers, enabling users to execute Python code and other commands through these LLM providers.
Changes
Type of change
Affected areas
How to test
Test code execution with each provider using the Python integration tests:
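For example, a command along the lines of `pytest tests/integrations/python/tests/test_openai.py -k "code_execution" -v` (exact path and selector may differ per setup) exercises the new test cases 64-66 against each configured provider.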
Breaking changes
Related issues
Implements code execution tool support across major providers
Security considerations
Code execution is sandboxed by the providers themselves. This PR only implements the interface to these existing provider capabilities.
Checklist