Why Your AI Agent Is Drowning in Tools (And How Code Mode Saves It)
When your MCP server hits 50+ tools, your LLM can lose up to 7% of its context window before processing a single message. Discover the emerging "code mode" pattern that slashes token consumption and prevents tool hallucination.

Imagine you use various MCP servers for work. As a developer, you might connect a Figma MCP, Context7 MCP, or Jira MCP server to your agent, allowing you to leverage tools using natural language. Sounds perfect, right?
But you've probably already hit the wall: too many tools flooding your LLM's context window.
This creates two critical problems. First, context window bloat. Every tool name, description, and parameter schema consumes tokens on every request. At 50+ tools, this can eat 5 to 7 percent of the model's context before a single user message arrives, crowding out conversation history, document content, and reasoning space.
Second, tool hallucination. When you have too many semantically similar tools, the LLM starts inventing tool names that don't exist, conflating parameters between tools, or calling the right tool with arguments from a different tool's schema. For a deeper dive into this issue, check out the excellent MCP Tool Design article.
So how do we fix this?
Solution 1: Tool Reduction
The most straightforward approach is reducing the number of tools. You can do this in two ways: limit what the AI agent sees, or decrease the number of tools on the MCP side.
Agent-Side Filtering
With this approach, you define a curated set of tools that solve the specific problem your AI agent is facing. You request the full tool listing from the MCP, then filter it before passing it to the LLM. Fewer tools mean less context consumed.
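As a sketch, agent-side filtering is little more than an allowlist applied to the listing before it reaches the model. The tool names and shape below are illustrative, not from any real server; a production client would fetch the listing via the MCP `tools/list` request rather than hard-coding it:

```typescript
// Minimal sketch of agent-side tool filtering, assuming a hypothetical
// Jira-style tool listing. Only the curated subset is serialized into
// the LLM's context.
interface ToolDef {
  name: string;
  description: string;
}

// Stand-in for the full listing a real client would get from tools/list.
const allTools: ToolDef[] = [
  { name: "jira_create_issue", description: "Create a Jira issue" },
  { name: "jira_search_issues", description: "Search Jira issues" },
  { name: "jira_admin_delete_project", description: "Delete a project" },
];

// Curated set for this agent's specific scenario.
const allowlist = new Set(["jira_create_issue", "jira_search_issues"]);

function filterTools(tools: ToolDef[], allowed: Set<string>): ToolDef[] {
  return tools.filter((t) => allowed.has(t.name));
}

const visibleTools = filterTools(allTools, allowlist);
// visibleTools is what gets passed to the LLM: two tools, not three.
```

The allowlist itself is the part you keep reevaluating as new tools appear upstream.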
Benefits:
- Very transparent: every call and argument is visible and easy to debug
- Simple to implement: follow the MCP spec and you're done
Drawbacks:
- Bloat: Large APIs (hundreds or thousands of endpoints) still flood the listing with tools
- Round-trips: Multi-step workflows require many round-trips
- Complex control flow: Loops, retries, and branching are clumsy when expressed as individual tool invocations
- Ongoing evaluation: You need to reevaluate new tools to see if they fit your scenario
MCP-Side Reduction
If you own the MCP server, you can reduce tools at the source. Don't just wrap every REST API call into a tool. Think about the use cases you're trying to solve (again, see MCP Tool Design). One use case might combine one or two APIs together, or wrap a single API call with business logic.
This is an iterative process; our team went through numerous tedious rounds of cleanup on our own tools. Once you have clear use cases, ensure each tool addresses a distinct problem to avoid the hallucination issue.
If you still end up with too many tools (something our team is slowly facing), consider configuration options. You might add a persistence layer where users can pick and choose tools, or provide configuration options to hide certain tools when running the MCP locally. Each approach has trade-offs.
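To make the use-case idea concrete, here's a hedged sketch of a single MCP-side tool that wraps two hypothetical REST calls behind one piece of business logic. `fetchUser` and `fetchOrders` are illustrative stand-ins, stubbed so the example runs standalone:

```typescript
// Sketch: one use-case tool ("get_customer_summary") instead of two
// endpoint-shaped tools. The underlying API clients are stubbed.
interface User { id: string; name: string }
interface Order { id: string; total: number }

async function fetchUser(userId: string): Promise<User> {
  return { id: userId, name: "Ada" }; // stub for illustration
}

async function fetchOrders(userId: string): Promise<Order[]> {
  return [{ id: "o1", total: 40 }, { id: "o2", total: 2 }]; // stub
}

// The single tool the MCP server exposes: it combines both calls and
// the aggregation logic, so the LLM never sees the raw endpoints.
async function getCustomerSummary(userId: string) {
  const [user, orders] = await Promise.all([
    fetchUser(userId),
    fetchOrders(userId),
  ]);
  return {
    customer: user.name,
    orderCount: orders.length,
    totalSpent: orders.reduce((sum, o) => sum + o.total, 0),
  };
}
```

One tool, one description, one schema in context, where a naive API wrapper would have cost two of each.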
Solution 2: Code Mode (Tool Wrapping)
Here's where things get interesting. Instead of exposing tools directly to the LLM, you make them searchable and executable through code. This might sound similar to tool reduction, but the mechanism is fundamentally different.
The key insight: LLMs, trained on massive code datasets, tend to be better at writing code than at making direct tool calls. This approach was pioneered by Cloudflare in their Code Mode article, with impressive results:
- Can handle more (and more complex) tools
- Dramatically reduces context window usage when multiple calls are needed
Let's examine the trade-offs.
Benefits:
- Massive token reduction: Only a small SDK + a few examples are in context instead of the whole API schema. Multi-step workflows become one generated program + one execution, instead of many tool calls.
- Better control flow: Loops, conditionals, retries, and batching are just normal code.
- Fewer LLM round-trips: One execution can encapsulate dozens of real API calls.
- Stronger isolation: Code runs in a well-scoped sandbox with tightly controlled outbound access.
Drawbacks:
- Harder to debug: You inspect generated code and its logs, not a clean sequence of tool calls.
- Requires infrastructure: You need a code execution environment and sandboxing.
- Less "pure MCP": You're layering a mini runtime and SDK on top.
How Code Mode Works
This approach can be applied on either the agent side or the MCP side. Cloudflare's implementation works on the agent side. When you connect to an MCP server in "code mode," the Agents SDK fetches the MCP server's schema and converts it into a TypeScript API with doc comments. It then exposes two tools to the LLM:
- search: Allows the model to search over the pre-resolved OpenAPI spec using a JavaScript async arrow function. This returns only the relevant endpoints, types, and examples instead of stuffing the entire spec into context.
- execute: Allows the model to run a JavaScript async arrow function in a sandboxed Dynamic Worker isolate, where it can call endpoints, handle pagination, add conditionals/loops, and compose multi-step workflows.
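To illustrate what the execute step can look like, here's a sketch of the kind of program a model might generate: one sandbox execution replacing a whole paginate-filter-act workflow. `listIssues` and `closeIssue` are hypothetical SDK functions the harness would inject; they're stubbed here so the sketch is self-contained:

```typescript
// Illustrative model-generated program for an "execute" tool. The SDK
// functions are stubs standing in for real injected API bindings.
async function listIssues(
  page: number
): Promise<{ id: string; stale: boolean }[]> {
  return page === 0
    ? [{ id: "i1", stale: true }, { id: "i2", stale: false }]
    : []; // stub: one page of results, then empty
}

async function closeIssue(id: string): Promise<void> {
  // stub: a real binding would call the underlying API here
}

async function main(): Promise<string[]> {
  const closed: string[] = [];
  // Pagination, filtering, and retries are just ordinary code: one
  // sandbox run instead of one LLM round-trip per API call.
  for (let page = 0; ; page++) {
    const issues = await listIssues(page);
    if (issues.length === 0) break;
    for (const issue of issues) {
      if (issue.stale) {
        await closeIssue(issue.id);
        closed.push(issue.id);
      }
    }
  }
  return closed;
}
```

Only the final return value of `main` needs to flow back into the model's context, which is where the token savings come from.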
Anthropic introduced a similar approach in Code execution with MCP, using a file-tree structure that the model traverses to discover viable tools. Same concept, different search implementation.
Agent-Side vs. MCP-Side Implementation
While both patterns I've described live on the agent side, you're not limited to that approach. You can implement the same functionality on the MCP side by:
- Adding a search tool alongside your other tools
- Making all tools searchable (exposing just one search tool, or all tools plus search)
- Adding an execution tool for the passed code
The execution tool can live on either side. Search implementations vary: in Anthropic's approach, for example, the search tool traverses a file structure and returns matches. But that's just one option among many.
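A minimal sketch of what an MCP-side search tool could look like, assuming naive keyword matching over a tool registry (the registry contents are invented for illustration; real implementations might use embeddings or a file-tree traversal instead):

```typescript
// Sketch of an MCP-side "search" tool: the server exposes one search
// entry point and returns only the tool definitions that match the
// model's query, instead of listing everything up front.
interface ToolDef {
  name: string;
  description: string;
}

// Hypothetical registry of a wiki-style MCP server's tools.
const registry: ToolDef[] = [
  { name: "create_page", description: "Create a wiki page" },
  { name: "search_pages", description: "Full-text search over wiki pages" },
  { name: "upload_attachment", description: "Attach a file to a task" },
];

// Naive substring matching over names and descriptions.
function searchTools(query: string): ToolDef[] {
  const q = query.toLowerCase();
  return registry.filter(
    (t) => t.name.includes(q) || t.description.toLowerCase().includes(q)
  );
}
```

The design choice here is that the full registry stays server-side; the model only ever pays context for the handful of definitions a query returns.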
Conclusion
I've seen it repeatedly: new approaches like code mode initially seem to dismiss MCP entirely. But look closer, and you'll see MCP still plays a crucial role thanks to its interoperability, scalability, and security features. I'd also take the token-reduction claims with a grain of salt: code mode saves tokens when you have repetitive operations or only care about part of a response's data structure, which isn't always the case.
The key takeaway? Regardless of which approach you choose, you still need to keep the number of tools in your MCP server reasonable. What "reasonable" means depends entirely on your use cases; there's no magic number like 5, 20, or 50.
Start with your use cases, iterate on your tool design, and choose the pattern that best fits your constraints. Your AI agent (and your context window) will thank you.
