2026-02-20

Cloudflare's Code Mode: Fitting an Entire API into 1,000 Tokens

Cloudflare just shipped a new MCP server that exposes the entire Cloudflare API — 2,500+ endpoints — using just two tools and roughly 1,000 tokens. The trick: agents write code instead of calling tools.

By Neo
Cloudflare · MCP · AI Agents · Developer Tools · Architecture

There's a quiet crisis in agentic AI: the more capable you make an agent, the more of its context window you consume just describing what it can do. Model Context Protocol (MCP) was supposed to solve tool integration — and it did — but every tool definition you add eats tokens. For small APIs that's fine. For large ones, it becomes untenable.

Today Cloudflare shipped their answer ¹: a single MCP server exposing the entire Cloudflare API — DNS, Workers, R2, Zero Trust, WAF, and everything else — in approximately 1,000 tokens.

The full API, done properly as traditional MCP tools, would cost 1.17 million tokens ¹. That's more than the context window of any current frontier model.

The Root Problem

When you expose an API over MCP the traditional way, each operation becomes a tool definition: name, description, parameters, schema. For a 10-endpoint API that's manageable. For the Cloudflare API with 2,500+ endpoints, you'd need to load the entire OpenAPI spec into the model's context before it could do anything. At 1.17M tokens, that's a non-starter.
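For a sense of scale: each endpoint rendered as a traditional MCP tool carries its own name, description, and JSON Schema. A single definition might look like the following hypothetical sketch (not Cloudflare's actual definition); multiply something of this size by 2,500+ endpoints and the 1.17M-token figure follows:

```json
{
  "name": "dns_record_create",
  "description": "Create a new DNS record in a zone.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "zone_id": { "type": "string", "description": "Zone identifier" },
      "type": { "type": "string", "enum": ["A", "AAAA", "CNAME", "TXT", "MX"] },
      "name": { "type": "string", "description": "Record name" },
      "content": { "type": "string", "description": "Record content" }
    },
    "required": ["zone_id", "type", "name", "content"]
  }
}
```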

The workaround most teams reach for: hand-pick a subset of endpoints and maintain dedicated MCP servers per product. Cloudflare had been doing exactly this — separate servers for DNS, Workers Observability, etc. It doesn't scale.

Code Mode: Write Code, Not Descriptions

The insight behind Code Mode is a shift in how the agent interacts with the API surface. Instead of calling pre-defined tools, the agent writes JavaScript code that gets executed in a sandboxed V8 isolate (a Cloudflare Worker).

The entire MCP server exposes exactly two tools ¹:

[
  {
    "name": "search",
    "description": "Search the Cloudflare OpenAPI spec.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": { "type": "string", "description": "JavaScript async arrow function to search the OpenAPI spec" }
      }
    }
  },
  {
    "name": "execute",
    "description": "Execute JavaScript code against the Cloudflare API.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": { "type": "string", "description": "JavaScript async arrow function to execute" }
      }
    }
  }
]

That's it. Two tools, fixed token cost, regardless of how many endpoints Cloudflare adds in the future.

How Discovery Works

When the agent needs to find the right endpoint, it calls search(). It receives a spec object — the full Cloudflare OpenAPI spec with all $refs pre-resolved — and writes JavaScript to query it:

async () => {
  // Scan every path in the resolved spec, keeping only the WAF and
  // ruleset operations under /zones/. The spec object stays in the
  // sandbox; only this filtered result list returns to the model.
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &&
       (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}

The spec never enters the model context. The agent interacts with it through code — and only the relevant subset of results comes back. In the example above, 2,500 endpoints narrowed to the 10 WAF and ruleset endpoints needed for DDoS protection configuration.

The agent can also introspect schemas before acting — drilling into specific response shapes, finding enum values, checking required fields — all without loading documentation into context.
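As a sketch of what that introspection looks like, the snippet below drills into a response schema to find required fields. It runs against a tiny mock `spec` object, since the real resolved spec only exists inside the sandbox; the path and schema here are illustrative, not Cloudflare's actual definitions:

```javascript
// Minimal mock of a resolved OpenAPI spec -- illustrative only; the
// real `spec` object inside the sandbox covers 2,500+ endpoints.
const spec = {
  paths: {
    "/zones/{zone_id}/rulesets": {
      get: {
        summary: "List zone rulesets",
        responses: {
          "200": {
            content: {
              "application/json": {
                schema: {
                  type: "object",
                  properties: {
                    result: { type: "array" },
                    success: { type: "boolean" }
                  },
                  required: ["success"]
                }
              }
            }
          }
        }
      }
    }
  }
};

// Drill into the 200-response schema of a known operation to check
// its shape and required fields before composing a request.
const introspect = async () => {
  const op = spec.paths["/zones/{zone_id}/rulesets"].get;
  const schema = op.responses["200"].content["application/json"].schema;
  return {
    summary: op.summary,
    required: schema.required,
    fields: Object.keys(schema.properties)
  };
};

introspect().then((info) => console.log(JSON.stringify(info)));
```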

How Execution Works

When the agent knows what to call, it uses execute(). The sandbox provides a cloudflare.request() client pre-configured with the user's scoped API token. The agent can chain multiple API calls, handle pagination, and compose complex workflows in a single execution:

async () => {
  const ddos = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  });
  const waf = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/http_request_firewall_managed/entrypoint`
  });
  // inspect, modify, return only what matters
}

What would require dozens of sequential tool calls in a traditional MCP setup can happen in a single execute() call. The agent programs the retrieval, not just the invocation.
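As a sketch of what "handle pagination" means in practice: assuming the Cloudflare v4 convention of `page`/`per_page` query parameters and a `result_info` envelope (an assumption here, since exact shapes vary by endpoint), an agent could walk every page inside one execute() call. `cloudflare.request` is mocked below so the sketch is self-contained:

```javascript
// Mock of the sandbox's pre-configured client -- the real one proxies
// requests to the Cloudflare API with the user's scoped token.
const records = Array.from({ length: 120 }, (_, i) => ({ id: `rec-${i}` }));
const cloudflare = {
  request: async ({ path }) => {
    const page = Number(new URLSearchParams(path.split("?")[1]).get("page")) || 1;
    const per_page = 50;
    return {
      result: records.slice((page - 1) * per_page, page * per_page),
      result_info: { page, per_page, total_count: records.length }
    };
  }
};

// Collect every DNS record in a single execution, instead of one tool
// call per page. (Endpoint path and envelope shape are assumptions.)
const listAll = async (zoneId) => {
  const all = [];
  let page = 1;
  while (true) {
    const res = await cloudflare.request({
      method: "GET",
      path: `/zones/${zoneId}/dns_records?page=${page}&per_page=50`
    });
    all.push(...res.result);
    const { total_count, per_page } = res.result_info;
    if (page * per_page >= total_count) break;
    page++;
  }
  return all;
};

listAll("abc123").then((all) => console.log(all.length)); // logs 120
```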

The Sandbox

Both tools run inside a Dynamic Worker isolate ² — a lightweight V8 sandbox with:

  • No filesystem access
  • No environment variable leakage
  • External fetches disabled by default
  • Explicit outbound fetch control when needed

This matters for security: MCP servers are increasingly targeted by prompt injection attacks ³. Running generated code inside an isolate rather than directly on the host limits the blast radius considerably.

How This Compares

Cloudflare's post does a useful comparison of the three main approaches to MCP token reduction ¹:

Client-side Code Mode — the agent writes TypeScript against typed SDKs and runs it locally in a sandboxed environment. Implemented in Goose and in Claude's Programmatic Tool Calling. Requires secure sandbox access on the client side.

CLI-based discovery — MCP servers get converted to CLIs (via tools like MCPorter), giving agents progressive capability discovery through shell commands. More attack surface than a sandboxed isolate, and it requires the agent to have shell access. Tools like OpenClaw use this pattern.

Dynamic tool search — a search function surfaces a subset of relevant tools per task. Used in Claude Code. Smaller context use, but the search mechanism itself needs maintenance, and matched tools still cost tokens.

Server-side Code Mode combines the best of these: fixed token footprint, no client-side modifications required, progressive discovery built in, and execution inside a proper sandbox. The agent doesn't need a shell. The server doesn't need to maintain a tool per endpoint.

The New MCP Server

The Cloudflare MCP server is available now at https://mcp.cloudflare.com/mcp. It uses OAuth 2.1 — when your agent connects, you authorize it and select which permissions to grant. The token gets downscoped to exactly what you approved.

Add it to any MCP client:

{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp"
    }
  }
}

For CI/CD or automation, a static API token works too.
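Many MCP clients accept per-server request headers, so a static-token setup might look like the following (the `headers` key and environment-variable interpolation are assumptions that vary by client — check your client's documentation):

```json
{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp",
      "headers": {
        "Authorization": "Bearer ${CLOUDFLARE_API_TOKEN}"
      }
    }
  }
}
```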

The Code Mode SDK is also open-sourced in the Cloudflare Agents SDK, so you can apply the same pattern to your own MCP servers.
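The core of the pattern, independent of the SDK's actual API (not sketched here), is an execute handler that evaluates an agent-supplied async arrow function with only an approved client in scope. A minimal in-process illustration, using `Function` construction where a real deployment would use an isolated V8 sandbox:

```javascript
// Minimal illustration of the execute-tool pattern: run agent-supplied
// code with only an allow-listed client binding in scope. A real
// deployment would use an isolated V8 sandbox, not Function() in-process.
const makeExecuteTool = (client) => async (code) => {
  // Compile the agent's async arrow expression; `cloudflare` is the
  // only binding the code receives.
  const fn = new Function("cloudflare", `"use strict"; return (${code})();`);
  return fn(client);
};

// Demo with a stub client standing in for the real API proxy.
const stub = { request: async ({ path }) => ({ ok: true, path }) };
const execute = makeExecuteTool(stub);

execute(`async () => {
  const res = await cloudflare.request({ method: "GET", path: "/zones" });
  return res.path;
}`).then((out) => console.log(out)); // logs "/zones"
```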

Why This Matters Beyond Cloudflare

The 99.9% token reduction is striking, but the more important shift is architectural. Code Mode suggests that MCP tool definitions — as a mechanism for capability discovery — may be the wrong abstraction for large APIs.

The traditional model: define every operation upfront, load it all into context, hope the model picks the right one.

The Code Mode model: define an execution environment, let the agent explore and compose on demand.

As APIs grow larger and agents become more capable, the second approach has obvious scaling advantages. Cloudflare has 2,500 endpoints today. The server handles that with two tools. At 25,000 endpoints, it would still be two tools.

That's a more durable architecture.


Sources:

¹ Cloudflare Blog — "Code Mode: give agents an entire API in 1,000 tokens" (Feb 20, 2026): blog.cloudflare.com/code-mode-mcp

² Cloudflare Docs — Dynamic Worker Loader: developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader

³ Anthropic Engineering — "Code Execution with MCP": anthropic.com/engineering/code-execution-with-mcp

MCPorter — MCP to CLI converter: github.com/steipete/mcporter

Cloudflare Code Mode SDK (open source): github.com/cloudflare/agents/tree/main/packages/codemode

intelliBrain

AI-augmented software development. Based in Zürich, working globally.
