Giving agents hands
Guardrails for agents in production
A catalogue of the guards I actually ship: typed confirmations, blast-radius escalation, pay-to-play gating, ordered workflows, and read-only by default.
18 Jun 2026 · 3 min
The scary moment with an agent is not when it fails. A failure throws an error and stops. The scary moment is when it succeeds at the wrong thing, confidently, against production, because nothing was in the way. Every guard below exists because I pictured that moment and built something to stand in front of it.
This is a catalogue, not a manifesto. Every guard in it is live in a server I run.
Typed confirmation literals
A boolean confirm flag is no guard at all. The model fills booleans in by pattern. So the destructive tools do not take confirm: boolean. They take a literal. Dropping a database requires confirm: z.literal("yes"). The model cannot satisfy that by guessing a shape; it has to produce the exact string, which means it has to have read the description that told it the string. The type is the speed bump.
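A minimal sketch of the literal check, written without dependencies so it runs on its own. It assumes the arguments arrive as untyped JSON; `requireLiteral` and `dropDatabase` are hypothetical names, and the hand-written comparison stands in for what a schema such as z.literal("yes") would enforce.

```typescript
// Hypothetical tool arguments as they arrive from the model: untyped JSON.
type RawArgs = Record<string, unknown>;

// Stand-in for a literal schema check: only the exact string passes.
// true, "y", "YES" and "yes " are all rejected.
function requireLiteral(args: RawArgs, expected: string): void {
  if (args.confirm !== expected) {
    throw new Error(`confirm must be exactly "${expected}"`);
  }
}

// Hypothetical destructive tool. The real drop call would sit after the check;
// here it only returns a message so the sketch has no side effects.
function dropDatabase(args: RawArgs): string {
  requireLiteral(args, "yes");
  return `dropped ${String(args.name)}`;
}
```

A call with confirm: true throws before anything destructive runs; only confirm: "yes" gets through.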
Escalate confirmation by blast radius
Not every destructive act is equally destructive, so the password gets harder as the damage grows. In the s3 server, deleting a single object needs confirm: "yes". Deleting a whole prefix, which could be thousands of objects, needs confirm: "yes-delete-all". The longer, uglier literal is deliberate friction sized to the consequence. You cannot fat-finger your way through a prefix delete the way you might through one object. The string you have to type grows with the hole you are about to dig.
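One way to express the escalation is a table from scope to required literal. The two strings are the ones quoted above; `CONFIRM_BY_SCOPE`, `DeleteScope` and `checkDeleteConfirm` are hypothetical names for illustration, not the s3 server's actual code.

```typescript
// Required literal per blast radius. The strings come from the post;
// the surrounding names are assumptions made for this sketch.
const CONFIRM_BY_SCOPE = {
  object: "yes",
  prefix: "yes-delete-all",
} as const;

type DeleteScope = keyof typeof CONFIRM_BY_SCOPE;

// Throws unless the caller supplied the literal that matches the scope.
// The short literal never unlocks the wide delete.
function checkDeleteConfirm(scope: DeleteScope, confirm: unknown): void {
  const expected = CONFIRM_BY_SCOPE[scope];
  if (confirm !== expected) {
    throw new Error(`${scope} delete requires confirm: "${expected}"`);
  }
}
```

Keeping the literals in one table makes the escalation easy to audit: a new, wider scope gets a new, longer string in the same place.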
Pay-to-play gating
Some calls cost money, and an agent in a loop can spend it faster than you can watch. The image tool puts a gate in front of that spend: the billable, high-quality path only fires when high_fidelity: true is set explicitly. The default path is the free or cheap one. There is no way to reach the expensive call by accident; the agent has to opt into spending, and that opt-in is visible in the call it made. When the bill shows up, the log shows exactly which request chose to pay.
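The gate can be sketched as a strict check on the flag. Only `high_fidelity` comes from the post; `ImageRequest`, `chooseImagePath` and `describeCall` are hypothetical, and the log format is an assumption.

```typescript
// Hypothetical request shape. high_fidelity is typed unknown because the
// value comes from the model and may not be a boolean at all.
type ImageRequest = { prompt: string; high_fidelity?: unknown };
type ImagePath = "cheap" | "billable";

// Only the boolean true opts in. Missing, false, "true" and 1 all stay cheap.
function chooseImagePath(req: ImageRequest): ImagePath {
  return req.high_fidelity === true ? "billable" : "cheap";
}

// Builds a log line that records which path was taken and what flag value
// the request carried, so a charge can be traced to the call that chose it.
function describeCall(req: ImageRequest): string {
  const flag = JSON.stringify(req.high_fidelity ?? null);
  return `image path=${chooseImagePath(req)} high_fidelity=${flag}`;
}
```

The strict comparison matters: a truthy check would let a stray string or number buy the expensive path.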
Encode the workflow order in the server
Some operations are only safe in a sequence, and the model does not know the sequence unless you tell it. The dokku server carries its workflow in the server instructions themselves: create the app, add the domain, set the letsencrypt email, set the ports, deploy, and only then enable SSL, because SSL can only succeed after DNS is actually pointing at the box and the first deploy has landed. Enabling letsencrypt as step two looks reasonable to a model reading the tool list cold. It fails every time. So the ordering is not left to inference. It is written down where the agent reads it, as a required sequence, so the obvious-but-wrong path is closed before it is tried.
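A sketch of how that sequence might be kept as data and rendered into instruction text. The order is the one described above; the step labels, `renderInstructions` and `missingBefore` are hypothetical. The `missingBefore` check goes beyond what the post describes, which only says the order is written into the instructions.

```typescript
// Order taken from the post; the step labels themselves are assumptions.
const DEPLOY_STEPS = [
  "create app",
  "add domain",
  "set letsencrypt email",
  "set ports",
  "deploy",
  "enable SSL",
] as const;

type DeployStep = (typeof DEPLOY_STEPS)[number];

// Renders the sequence as numbered text suitable for a server instructions field.
function renderInstructions(): string {
  const lines = DEPLOY_STEPS.map((step, i) => `${i + 1}. ${step}`);
  return ["Required order for a new app:", ...lines].join("\n");
}

// Optional extra, not described in the post: list the steps that still have
// to happen before `next`, given the steps already completed.
function missingBefore(next: DeployStep, done: ReadonlySet<DeployStep>): DeployStep[] {
  const idx = DEPLOY_STEPS.indexOf(next);
  return DEPLOY_STEPS.slice(0, idx).filter((step) => !done.has(step));
}
```

With only "create app" done, asking for "enable SSL" returns the four steps still outstanding, which a tool could hand back as an error message instead of attempting the call.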
Read-only by default
This one underwrites all the others. Every server ships read-only. List, get, report, describe: those are on from day one. Create, set, delete: those are off until I have watched an agent drive the server in read-only mode long enough to trust it. Widening is a decision I make per verb, never a default the server ships with.
Read-only first is what makes the rest cheap. By the time a write verb is enabled, I have already seen how the model reaches for this system, what it tends to get wrong, where it needs a literal and where it needs an ordering. The guard is informed by the watching. You do not design the speed bumps in the abstract. You design them after you have seen the car take the corner too fast.
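The per-verb switch can be sketched as an allowlist that starts empty. The verbs are the ones named above; `exposedTools`, the tool names in the usage note, and the idea of filtering at registration time are assumptions for illustration.

```typescript
type Verb = "list" | "get" | "report" | "describe" | "create" | "set" | "delete";

// Read verbs are always on.
const READ_VERBS: ReadonlySet<Verb> = new Set<Verb>(["list", "get", "report", "describe"]);

// A verb is allowed if it is a read verb or has been switched on explicitly.
function isAllowed(verb: Verb, enabledWrites: ReadonlySet<Verb>): boolean {
  return READ_VERBS.has(verb) || enabledWrites.has(verb);
}

// Given a hypothetical table of tool name to verb, return the tools the
// server should expose. enabledWrites defaults to empty: read-only.
function exposedTools(
  tools: Record<string, Verb>,
  enabledWrites: ReadonlySet<Verb> = new Set<Verb>(),
): string[] {
  return Object.entries(tools)
    .filter(([, verb]) => isAllowed(verb, enabledWrites))
    .map(([name]) => name);
}
```

Called with list_apps, set_config and delete_app and no second argument, only list_apps is exposed. Passing a set containing "set" adds set_config and nothing else, so widening stays a one-verb decision.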
The shape of all of it
None of these are clever. A literal string, a longer literal string, an explicit cost flag, an ordered list in the instructions, a write switch that stays off. The cleverness is in deciding which act gets which guard, and that judgment comes from running the thing, not from a framework.
An agent with hands is useful precisely because it can act without asking. Guards are how you decide, in advance, the small set of acts where it has to ask anyway. Pick those carefully. Everything else, let it run.