Giving agents hands

Small tool surfaces beat fat APIs

A marketing API exposes 115 operations; my server hands the agent six tools. The boundary is set by token budget and model focus, not REST purity.

22 Jun 2026 · 3 min

When you wrap an API for an agent, the first instinct is to mirror it. The API has endpoints, so make a tool per endpoint. One to one, faithful, complete. It feels honest.

It is also a mistake. The Mailchimp API has something like 115 operations once you count every nested resource and verb. If I had exposed 115 tools, the agent would spend its whole context window just reading the menu, and it would still pick the wrong dish. The tool list is not documentation. It is something the model has to hold in its head on every single turn.

Six tools for 115 operations

So the Mailchimp server is six tools: ping, audience, member, campaign, template, report. That is the entire surface. Each one dispatches on an action enum. The audience tool takes an action of list, get, create, update, or delete, plus whatever fields that action needs. The member tool does the same for members. Five real operations collapse behind one name the model can actually reason about.
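A minimal sketch of what an action-enum dispatcher could look like, using the audience tool as the example. This is not the server's actual code: the in-memory store, the handler names, and the id and name fields are all hypothetical stand-ins for real API calls.

```python
from enum import Enum
from typing import Any, Callable


class AudienceAction(str, Enum):
    """The five actions the audience tool accepts."""
    LIST = "list"
    GET = "get"
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"


# Hypothetical in-memory store standing in for calls to the real API.
_audiences: dict[str, dict[str, Any]] = {}


def _list(fields: dict[str, Any]) -> list[dict[str, Any]]:
    return list(_audiences.values())


def _get(fields: dict[str, Any]) -> dict[str, Any]:
    return _audiences[fields["id"]]


def _create(fields: dict[str, Any]) -> dict[str, Any]:
    record = {"id": fields["id"], "name": fields["name"]}
    _audiences[record["id"]] = record
    return record


def _update(fields: dict[str, Any]) -> dict[str, Any]:
    _audiences[fields["id"]]["name"] = fields["name"]
    return _audiences[fields["id"]]


def _delete(fields: dict[str, Any]) -> dict[str, Any]:
    return _audiences.pop(fields["id"])


# One handler per enum value; the tool itself is only a lookup.
_HANDLERS: dict[AudienceAction, Callable[[dict[str, Any]], Any]] = {
    AudienceAction.LIST: _list,
    AudienceAction.GET: _get,
    AudienceAction.CREATE: _create,
    AudienceAction.UPDATE: _update,
    AudienceAction.DELETE: _delete,
}


def audience(action: str, **fields: Any) -> Any:
    """One tool, five operations: validate the action, then dispatch."""
    try:
        parsed = AudienceAction(action)
    except ValueError:
        allowed = ", ".join(a.value for a in AudienceAction)
        raise ValueError(
            f"unknown action {action!r}; expected one of: {allowed}"
        ) from None
    return _HANDLERS[parsed](fields)
```

The model only ever sees the name audience and the enum of allowed actions. The five handlers stay an implementation detail behind that one name.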

The Google Search Console server is built the same way: six tools, each an action-enum dispatcher. Sites, sitemaps, queries, the lot, folded into a handful of nouns. The model sees six choices, picks the noun, then picks the verb. Two small decisions instead of one giant lookup.

The win is not tidiness. It is that the model makes better choices when it is choosing among six things than among 115. Fewer tokens spent describing the surface, fewer near-duplicate tools to confuse, more of the context left for the actual task.
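One rough way to put a number on the surface cost is to serialize the tool list and estimate its token count. The sketch below is illustrative only. The four-characters-per-token ratio is a crude assumption, real tokenizers vary, and both tool lists are synthetic placeholders rather than the real definitions. Run it against your own tool definitions to get a figure that means something.

```python
import json

# Crude assumption: roughly four characters per token. Real tokenizers differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(tools: list[dict]) -> int:
    """Approximate the tokens needed to put a tool list in the prompt."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


def flat_surface(count: int) -> list[dict]:
    """Synthetic one-tool-per-operation surface (placeholder text)."""
    return [
        {
            "name": f"operation_{i}",
            "description": "Placeholder description of one API operation.",
            "parameters": {"id": "string"},
        }
        for i in range(count)
    ]


def folded_surface(nouns: list[str], actions: list[str]) -> list[dict]:
    """Synthetic action-enum surface: one tool per noun (placeholder text).

    Every noun gets the same action list here purely for simplicity.
    """
    return [
        {
            "name": noun,
            "description": f"Manage {noun} resources.",
            "parameters": {"action": actions, "id": "string"},
        }
        for noun in nouns
    ]


flat_cost = estimate_tokens(flat_surface(115))
folded_cost = estimate_tokens(
    folded_surface(
        ["ping", "audience", "member", "campaign", "template", "report"],
        ["list", "get", "create", "update", "delete"],
    )
)
print(f"flat: ~{flat_cost} tokens, folded: ~{folded_cost} tokens")
```

Whatever the real figures turn out to be, that cost is paid again on every turn, which is why the size of the list matters more than it would in documentation.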

When one verb per tool is right

This is not a universal rule, and the fleet proves it. The Cloudflare server does not collapse anything. It has six DNS tools, and each verb is its own tool: list records, get a record, create, update, delete, and so on. No action enum in sight.

Why the difference? Infra verbs are not interchangeable the way CRUD on a marketing resource is. Deleting a DNS record is a different kind of act from listing one, with a different blast radius, and I want the model to feel that difference in the tool name itself. A delete-dns-record tool that the agent has to deliberately reach for is safer than a dns tool where delete is just one more value in an enum it might fill in by pattern-matching. When the verbs carry real, distinct risk, give each one its own door.
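For contrast, a sketch of the one-verb-per-tool shape. Again, this is an assumption-laden illustration, not the Cloudflare server's code: the registry, the decorator, the in-memory records, and every tool name other than delete-dns-record are hypothetical. Only the five verbs named in the text are shown.

```python
from typing import Any, Callable

# Hypothetical tool registry: each name is a separate entry the model sees.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under its own tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


# Hypothetical in-memory records standing in for the real DNS API.
_records: dict[str, dict[str, str]] = {}


@tool("list-dns-records")
def list_dns_records() -> list[dict[str, str]]:
    return list(_records.values())


@tool("get-dns-record")
def get_dns_record(record_id: str) -> dict[str, str]:
    return _records[record_id]


@tool("create-dns-record")
def create_dns_record(record_id: str, name: str, content: str) -> dict[str, str]:
    _records[record_id] = {"id": record_id, "name": name, "content": content}
    return _records[record_id]


@tool("update-dns-record")
def update_dns_record(record_id: str, content: str) -> dict[str, str]:
    _records[record_id]["content"] = content
    return _records[record_id]


@tool("delete-dns-record")
def delete_dns_record(record_id: str) -> dict[str, str]:
    # The destructive verb has its own name and its own narrow signature,
    # so it cannot be reached by filling in an enum value on another tool.
    return _records.pop(record_id)
```

Each verb costs its own entry in the tool list. On a surface this small, that is a price worth paying for a delete the model has to pick by name.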

So the boundary moves. On a marketing API where the operations are dozens of variations on the same safe theme, fold them into six action-enum tools and save the budget. On infra where each verb is a separate decision with separate consequences, split them out so the model has to mean it.

The actual rule

REST purity says: one resource, one set of endpoints, mirror it faithfully. That rule optimizes for the wrong reader. Your API docs are read by a human who can scroll. Your tool list is read by a model that pays for every token of it on every turn and has to choose under that load.

So the real question for each tool boundary is not "what does the API expose?" It is two questions. How much of the context budget does this surface cost? And does splitting a verb out earn its keep by making a risky choice more deliberate? Mailchimp's answer was to collapse 115 into 6. Cloudflare's answer was to keep delete standing alone. Same fleet, same builder, opposite calls, because the thing being optimized was never the API. It was the model's attention.

Wrap for the reader you actually have. The reader is the model, and its budget is small.
