PoC: Simulating Code Execution via MCP Meta-Tools #1118

yannbonzom · 2025-11-15T21:52:05Z

Adds an opt-in “code-mode” wrapper that fronts multiple downstream MCP servers via four meta-tools: three for discovery (list_mcp_servers, list_tool_names, get_tool_implementation) plus call_tool for execution. Agents progressively disclose server → tool → schema, so they load only what they need.

Motivation and Context

Per Anthropic's recent article on code execution providing massive token savings over MCP servers, I was curious whether we could emulate that file-system-based code execution (listing directories and reading files to execute scripts) with MCP meta-tools: one config stores many MCP servers, so the agent can list MCP servers, list tools, and call tools.

With the wrapper in place, routing Playwright through it drops a simple “open a site” run from 17 k tokens to 13.7 k. Given Cursor’s 11.1 k base prompt, that’s 5.9 k tokens of overhead without the wrapper versus 2.6 k with it: a 56 % reduction (about 3.3 k tokens) while still keeping all downstream tools available. The LLM does spend a couple of extra tool calls on discovery, but the workflow mirrors “explore files → read file,” so even with many servers the prompt stays slim.
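The overhead arithmetic above can be checked with a few lines of TypeScript. The figures are the ones quoted in this description; the variable names are only illustrative.

```typescript
// Token counts quoted above, in thousands of tokens.
const basePrompt = 11.1;     // Cursor's base prompt
const withoutWrapper = 17.0; // "open a site" run with Playwright MCP linked directly
const withWrapper = 13.7;    // same run through the code-mode wrapper

// MCP overhead = total prompt size minus the base prompt.
const overheadWithout = withoutWrapper - basePrompt;
const overheadWith = withWrapper - basePrompt;

// Absolute and relative savings on the overhead only (not on the whole prompt).
const saved = overheadWithout - overheadWith;
const reductionPct = (saved / overheadWithout) * 100;

console.log(`overhead without wrapper: ${overheadWithout.toFixed(1)} k`); // 5.9 k
console.log(`overhead with wrapper:    ${overheadWith.toFixed(1)} k`);    // 2.6 k
console.log(`saved: ${saved.toFixed(1)} k (${reductionPct.toFixed(0)} %)`); // 3.3 k (56 %)
```

Note that the 56 % applies to the MCP overhead; measured against the full 17 k prompt, the same 3.3 k is a reduction of roughly 19 %.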

Please note: I'm opening this PR to hear what you all think of this sort of setup. I did this fairly quickly, so there are likely cleaner approaches to accomplishing it. I would love to hear your thoughts and hunches around this, and whether building it out further might be worthwhile! Thanks for your thoughts :).

How Has This Been Tested?

npm run typecheck
npm run test
Manual end-to-end test in Cursor with the wrapper configured via code-config.mcp-servers.json, Playwright MCP linked to the local SDK, and the client invoking list_mcp_servers → list_tool_names(serverId=playwright) → get_tool_implementation → call_tool.

This is how I set it up:
In ~/.cursor/code-config.mcp-servers.json:

{
    "downstreams": [
        {
            "id": "playwright",
            "description": "Browser automation via Playwright",
            "command": "node",
            "args": ["/Users/yannbonzom/Desktop/projects/playwright-mcp/cli.js", "--headless", "--browser=chromium"]
        }
    ]
}
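For readers who want to sanity-check such a config before starting the wrapper, here is a minimal sketch of a validator. The field names (downstreams, id, description, command, args) come from the JSON above; the type names, the parseCodeModeConfig helper, and the choice to treat description and args as optional are assumptions made for illustration, not necessarily what the PR's code does.

```typescript
// Shape inferred from the example config; type names are hypothetical.
interface DownstreamServer {
  id: string;
  description: string;
  command: string;
  args: string[];
}

interface CodeModeConfig {
  downstreams: DownstreamServer[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parses the JSON text and rejects malformed entries or duplicate ids.
function parseCodeModeConfig(json: string): CodeModeConfig {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("config must be a JSON object");
  }
  const list = (raw as Record<string, unknown>)["downstreams"];
  if (!Array.isArray(list)) {
    throw new Error('config must contain a "downstreams" array');
  }

  const seen = new Set<string>();
  const downstreams = list.map((entry: unknown, i: number): DownstreamServer => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`downstreams[${i}] must be an object`);
    }
    const e = entry as Record<string, unknown>;
    const id = e["id"];
    const command = e["command"];
    const description = e["description"];
    // Assumption: a missing "args" means "no arguments".
    const args = e["args"] === undefined ? [] : e["args"];

    if (typeof id !== "string" || id.length === 0) {
      throw new Error(`downstreams[${i}].id must be a non-empty string`);
    }
    if (seen.has(id)) {
      throw new Error(`duplicate downstream id: ${id}`);
    }
    seen.add(id);
    if (typeof command !== "string" || command.length === 0) {
      throw new Error(`downstreams[${i}].command must be a non-empty string`);
    }
    if (!isStringArray(args)) {
      throw new Error(`downstreams[${i}].args must be an array of strings`);
    }
    // Assumption: a missing "description" becomes an empty string.
    return { id, description: typeof description === "string" ? description : "", command, args };
  });

  return { downstreams };
}

// Usage with a placeholder path (not the path from the config above).
const exampleConfig = parseCodeModeConfig(JSON.stringify({
  downstreams: [{
    id: "playwright",
    description: "Browser automation via Playwright",
    command: "node",
    args: ["/path/to/playwright-mcp/cli.js", "--headless", "--browser=chromium"],
  }],
}));
console.log(exampleConfig.downstreams.map((d) => d.id));
```

Rejecting duplicate ids matters here because list_tool_names takes a serverId, so each id has to resolve to exactly one downstream.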

In ~/.cursor/mcp.json:

{
  "mcpServers": {
    "code-mode-mcp-servers": {
      "command": "npm",
      "args": [
        "--prefix",
        "/Users/yannbonzom/Desktop/projects/typescript-sdk",
        "run",
        "code-mode",
        "--",
        "--config",
        "/Users/yannbonzom/.cursor/code-config.mcp-servers.json"
      ]
    }
  }
}

Breaking Changes

No breaking changes. The wrapper is opt-in and runs as its own CLI (npm run code-mode -- --config ...). Existing SDK usage is untouched.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

The config file is named code-config.mcp-servers.json to underline that it lists multiple downstream MCP servers.
Meta-tools are hierarchical to keep token usage predictable: list_mcp_servers returns server summaries; list_tool_names requires serverId and returns just name + short description; get_tool_implementation is the only call that returns full schemas/stubs.
This gives agents a code-execution-like exploration experience without needing a real filesystem view, and lets users keep many MCP servers active simultaneously while still saving tokens. I realize there's the risk of too-many-MCP-servers (similar to the too-many-tools problem causing agents to get confused), so it will need experimentation to see if we encounter similar problems. My hunch is that it's a lot easier for an agent to discern across, say, [notion, playwright, jira] than having many similar-sounding tool names.
Sample chat to show how it's able to dynamically retrieve what it needs:

[Screenshot: sample chat showing the agent exploring its tools first, then deciding which one to use]
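To make the three discovery tiers described above concrete, here is a minimal in-memory sketch. It is not the PR's implementation: the MetaToolRegistry class, the type names, and the sample tool entries are hypothetical, and call_tool is left out because it would simply forward to the downstream server.

```typescript
interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

interface ServerDef {
  id: string;
  description: string;
  tools: ToolDef[];
}

// Hypothetical registry modelling the progressive-disclosure hierarchy.
class MetaToolRegistry {
  private readonly servers: Map<string, ServerDef>;

  constructor(servers: ServerDef[]) {
    this.servers = new Map(servers.map((s): [string, ServerDef] => [s.id, s]));
  }

  // Tier 1 (list_mcp_servers): server summaries only, no tool data.
  listMcpServers(): { id: string; description: string }[] {
    return Array.from(this.servers.values()).map(({ id, description }) => ({ id, description }));
  }

  // Tier 2 (list_tool_names): names and short descriptions for one server.
  listToolNames(serverId: string): { name: string; description: string }[] {
    return this.requireServer(serverId).tools.map(({ name, description }) => ({ name, description }));
  }

  // Tier 3 (get_tool_implementation): the only call that returns a full schema.
  getToolImplementation(serverId: string, toolName: string): ToolDef {
    const tool = this.requireServer(serverId).tools.find((t) => t.name === toolName);
    if (!tool) {
      throw new Error(`unknown tool "${toolName}" on server "${serverId}"`);
    }
    return tool;
  }

  private requireServer(serverId: string): ServerDef {
    const server = this.servers.get(serverId);
    if (!server) {
      throw new Error(`unknown server "${serverId}"`);
    }
    return server;
  }
}

// Illustrative data; the tool entries are simplified stand-ins.
const registry = new MetaToolRegistry([
  {
    id: "playwright",
    description: "Browser automation via Playwright",
    tools: [
      {
        name: "browser_navigate",
        description: "Navigate to a URL",
        inputSchema: { type: "object", properties: { url: { type: "string" } }, required: ["url"] },
      },
      {
        name: "browser_click",
        description: "Click an element",
        inputSchema: { type: "object", properties: { element: { type: "string" } }, required: ["element"] },
      },
    ],
  },
]);

const serverSummaries = registry.listMcpServers();
const toolNames = registry.listToolNames("playwright");
const navigateTool = registry.getToolImplementation("playwright", "browser_navigate");
console.log(serverSummaries, toolNames, navigateTool);
```

The point of the split is that the agent pays for a full input schema only for the tools it actually decides to use; the first two tiers return short strings regardless of how many servers or tools are configured.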

pkg-pr-new · 2025-11-15T21:52:52Z

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/sdk@1118

commit: 9d8a259

mattzcarey · 2025-11-17T14:07:02Z

Love this! Will leave it to the other team to decide whether it lives here yet. Codemode is awesome :)

PoC: Simulating Code Execution via MCP Meta-Tools

9d8a259

yannbonzom marked this pull request as ready for review November 15, 2025 21:56

yannbonzom requested a review from a team as a code owner November 15, 2025 21:56
