Description
Priority: Medium
Problem Statement
Even with semantic filtering, some agents need access to many capabilities. Exposing each capability as a separate tool adds context overhead and degrades tool selection accuracy. Research from Cloudflare Code Mode and Anthropic Code Execution suggests that LLMs are better at writing code than at selecting from large tool sets.
Proposed Solution
Provide a CodeSandbox interface that:
- Accepts tool definitions and exposes them as a callable typed SDK within the sandbox
- Executes agent-generated code in isolation
- Returns results to the agent
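One way the "callable typed SDK" in the list above could be surfaced to the agent is as generated function stubs. The sketch below is only an illustration: the `render_sdk_stubs` helper and the `read_file` tool are hypothetical names, and it assumes tools are plain Python callables with type hints and docstrings.

```python
import inspect
from typing import Callable, Dict


def render_sdk_stubs(tools: Dict[str, Callable]) -> str:
    """Render one typed stub per tool from its signature and docstring.

    Hypothetical helper: the stub text could be placed in the agent's
    context in place of one tool definition per capability.
    """
    lines = []
    for name, func in tools.items():
        sig = inspect.signature(func)       # e.g. "(path: str) -> str"
        doc = inspect.getdoc(func) or ""
        lines.append(f"def {name}{sig}:")
        lines.append(f'    """{doc}"""')
    return "\n".join(lines)


def read_file(path: str) -> str:
    """Return the contents of a file."""
    raise NotImplementedError  # hypothetical tool body, not needed for the stub


stubs = render_sdk_stubs({"read_file": read_file})
print(stubs)
```

The agent would see only signatures and docstrings, which keeps the context cost per capability small compared with a full tool schema.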
A single execute_code tool replaces dozens of individual tools. The agent accomplishes its work in one code execution call instead of chaining N tool calls. Fewer round-trips mean faster results and less opportunity to drift off track.
Security boundary: the sandbox exposes only the tool SDK, not raw system access. Tools themselves define the capability boundary.
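A minimal sketch of the proposed shape, under stated assumptions: `ToolDef`, `CodeSandbox`, the `SAFE_BUILTINS` allowlist, and the convention that agent code assigns its answer to `result` are all hypothetical and not part of any existing API. Python's `exec` with trimmed builtins is used only to illustrate the boundary; it is not real isolation, and a production sandbox would need a separate process, container, or VM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolDef:
    """Hypothetical tool definition: a name, a description, and a callable."""
    name: str
    description: str
    func: Callable[..., Any]


class CodeSandbox:
    """Illustrative only: exec() with trimmed builtins is NOT real isolation."""

    # Small allowlist of builtins the generated code may use (an assumption).
    SAFE_BUILTINS = {"len": len, "range": range, "sorted": sorted, "sum": sum,
                     "min": min, "max": max, "str": str, "int": int,
                     "list": list, "dict": dict}

    def __init__(self, tools: List[ToolDef]) -> None:
        self._tools = {t.name: t.func for t in tools}

    def execute_code(self, code: str) -> Dict[str, Any]:
        # Only the tool SDK and the allowlisted builtins are visible to the code.
        scope: Dict[str, Any] = {"__builtins__": dict(self.SAFE_BUILTINS)}
        scope.update(self._tools)
        try:
            exec(code, scope)
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        # Assumed convention: the agent's code assigns its answer to `result`.
        return {"ok": True, "result": scope.get("result")}


# In-memory stand-ins for real file tools.
files = {"a.txt": "hello", "b.txt": "hi"}
sandbox = CodeSandbox([
    ToolDef("list_files", "List file names", lambda: sorted(files)),
    ToolDef("read_file", "Read a file by name", lambda name: files[name]),
])

# One execute_code call does what would otherwise take three tool calls.
outcome = sandbox.execute_code(
    "result = sum(len(read_file(n)) for n in list_files())"
)
blocked = sandbox.execute_code("import os")  # fails: no import machinery exposed
```

Here the agent-generated snippet lists the files, reads each one, and totals their lengths in a single round-trip, while the attempt to import a module fails because only the tool SDK is in scope.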
Use Case
- Developer assistants needing file operations, git commands, HTTP requests, JSON parsing, etc.
- Reducing context overhead from irrelevant tool definitions
- Improving tool selection accuracy by narrowing the active set
Additional Context
Part of the Context Management epic, Track 2: Tool Context.