How to let an agent write your code without giving it the keys

· series: Working with the Machine

This is the technical companion to The colleague who hallucinates. That post told the story — an audit agent that reported a passing gate (twenty-seven of twenty-seven) as failing and nearly turned a hallucination into four real commits — and the principle that fell out of it: you don’t make a coding agent trustworthy, you make its output checkable, and you build the workspace so the dangerous moves are either impossible or self-reporting. This one is the build guide. It walks through the actual machinery — a ~150-line launcher, two fail-closed hooks, and the dispatch loop — close enough to the real thing that you can stand up your own version.

It’s built on Claude Code (the CLI), and the specific mechanism it leans on — a hook that runs before a tool call and can veto it — is Claude Code’s PreToolUse hook. If your agent harness has an equivalent pre-execution interception point, the same shapes port over. Linux and bash are assumed. The snippets are trimmed for the page — error handling and a few escape hatches are elided — but the control flow is what runs.

You don’t have to have read the companion to use this. The launch pattern, the deny-by-JSON hook contract, the scope-sidecar trick, and the audit-sidecar discipline generalize to any setup where you’re handing real write access to something that occasionally lies with confidence.

Why this is hard

Here’s the asymmetry that makes the whole problem.

When you hand an agent a structured work order — implement this, in these files, validated by this command — you have a lot of surface to enforce against. The scope is a list of paths. The acceptance check is a command with an exit code. You can gate all of it.

When you send an agent to look around and tell you what it found, you have almost nothing. The input is a freeform prompt. The output is a paragraph. And from the dispatcher’s end, a paragraph that says “the gate is failing” when it is reads exactly like the same paragraph when it isn’t. The model is best at producing fluent, confident prose, which is exactly the output you can least verify.

So the strategy splits in two:

  1. For agents that write — fence the filesystem. Make it physically impossible to touch a file outside the work order, and fail closed when the work order is missing.
  2. For agents that investigate — force every factual claim to name its own proof, up front, before the agent even runs.

Both are enforced at the same chokepoint: a hook that fires before the tool call and can refuse it. Everything below is variations on that one move.

The one mechanism everything hangs on

A Claude Code PreToolUse hook is a command that runs before a tool executes. It receives the tool call as JSON on stdin, and it can allow the call (exit 0, no output) or deny it by printing a small JSON decision and exiting 0:

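deny() {
  # jq escapes the reason, so a quote or newline in a file path can't
  # corrupt the JSON (an unparseable decision is not a reliable deny).
  if jq -n --arg reason "$1" '{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: $reason
    }
  }'; then
    exit 0
  fi
  # No jq? Exit 2 is also a deny, with the reason on stderr.
  echo "$1" >&2
  exit 2
}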

That’s the entire contract. Print that, and the tool call never happens; the permissionDecisionReason is fed back to the agent as the reason it was blocked. Note the convention: you exit 0 even when denying — the decision is in the JSON, not the exit code. (Exiting 2 with a message on stderr does the same thing; the JSON form is just easier to read and lets you write a precise reason.)

Every fence in this post is a hook that decides whether to call deny. Hold onto that and the rest is detail.
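To see the whole contract in one place, here is a toy hook. It's a minimal sketch, not one of Skaldborn's: it reads the tool call from stdin, denies any Write or Edit whose target ends in `.env` (a hypothetical rule picked only for illustration), and otherwise prints nothing. It assumes `jq` is installed, and relies on Claude Code's Read, Edit, and Write tools passing their target as `tool_input.file_path`.

```bash
# toy-hook.sh: a minimal PreToolUse hook (illustrative rule, not Skaldborn's).

deny() {
  jq -n --arg reason "$1" '{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: $reason
    }
  }'
  exit 0
}

toy_hook() {
  local input tool path
  input=$(cat)                                   # the tool call, as JSON
  tool=$(jq -r '.tool_name // ""' <<<"$input")
  path=$(jq -r '.tool_input.file_path // ""' <<<"$input")

  # Hypothetical rule: never let the agent write a .env file.
  if [[ "$tool" == Write || "$tool" == Edit ]] && [[ "$path" == *.env ]]; then
    deny "Writes to $path are blocked by toy-hook.sh"
  fi
  exit 0   # no output: the call proceeds
}

# In a real hook file, the last line would simply be: toy_hook
```

Piping `{"tool_name":"Write","tool_input":{"file_path":"/repo/.env"}}` into `toy_hook` prints the deny decision; the same call aimed at `/repo/src/main.rs` prints nothing, and the write goes ahead.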

Piece 1 — the launcher

Before any fences matter, you need a clean way to start an agent in a known role. On Skaldborn that’s a ~150-line bash script, start-claude.sh. It does three things worth copying.

It binds the role to a model at launch. A “writer” and a “reviewer” aren’t the same actor with different instructions — they’re different sessions, on deliberately different models, so the thing that reviews the work was never the thing that wrote it:

role_to_model() {
  case "$1" in
    architect|steward|orchestrator|writer) echo "claude-opus-4-8" ;;
    reviewer|verifier)                      echo "claude-opus-4-7" ;;
    *) return 1 ;;   # unknown role → hard error, don't guess
  esac
}

It launches detached, with the role’s model and a remote-control handle:

INNER="claude --dangerously-skip-permissions --remote-control $RC_NAME"
[[ -n "$MODEL"  ]] && INNER="$INNER --model $MODEL"
[[ -n "$PROMPT" ]] && INNER="$INNER $(sq_escape "$PROMPT")"

tmux new-session -d -s "$SESSION" -e "CLAUDE_SESSION_SLUG=$SLUG" "$INNER"

Yes, --dangerously-skip-permissions. That flag is what makes the agent able to run unattended — and it’s also exactly why the hooks below have to be airtight. Skipping the interactive permission prompt doesn’t skip the hooks. The hooks are the permission system once the human is no longer in the loop. If you’re going to use that flag, the fences aren’t optional; they’re the replacement for the thing you turned off.

(sq_escape is a POSIX single-quote escaper. The prompt travels bash → tmux → sh -c → claude, and a prompt with a stray quote or newline will shred the argv chain. Wrap in '...', replace embedded ' with '\'', move on with your life.)
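The launcher's own sq_escape isn't shown, so here is a minimal pure-bash sketch of the same idea. It walks the string, closing the single-quoted run at each embedded quote, emitting an escaped quote, and reopening. `printf %q` is tempting, but it can emit bash-specific `$'...'` quoting, which a plain `sh` may not accept.

```bash
# sq_escape: wrap a string in POSIX single quotes so it survives `sh -c`.
# Each embedded ' becomes '\'' (close quote, escaped quote, reopen).
sq_escape() {
  local rest=$1 out="'"
  while [[ $rest == *"'"* ]]; do
    out+="${rest%%"'"*}'\\''"   # text before the quote, then '\''
    rest=${rest#*"'"}            # drop everything through that quote
  done
  printf '%s' "$out$rest'"
}
```

With it, `INNER="$INNER $(sq_escape "$PROMPT")"` hands `sh` a single, correctly quoted word whatever the prompt contains, newlines and `$` included.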

It makes “done” a fact on disk, not a claim. The launcher prints — and a sentinel file records — the exact path that will exist when the session finishes its work:

Role: writer (model: claude-opus-4-8)
Expected handoff: state/coordination/writer-query-api-handoff.md
Active-session sentinel: state/coordination/active-sessions/writer-query-api.json

This matters more than it looks. An agent that says “I’m done!” and an agent that crashed mid-task produce the same silence from the outside. By contract, completion is the handoff file landing at the declared path — nothing else counts. A caller can block on it without parsing any chat:

until [ -f state/coordination/writer-query-api-handoff.md ]; do sleep 60; done
echo "DONE: writer-query-api"

The sentinel (written at launch, removed by a tmux session-closed hook when the session dies) answers the other question — is it still running? Present means alive; absent means stopped. Together they give you a cheap, file-based view of a fleet of agents without a scheduler.
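The sentinel also covers the weak spot in that `until` loop: if the session dies without writing a handoff, the loop polls forever. A sketch that checks both files, assuming only the layout shown above (`<slug>-handoff.md` and `active-sessions/<slug>.json` under `state/coordination/`); `session_status`, `wait_for_handoff`, and the poll-interval argument are hypothetical:

```bash
# Hypothetical helpers over the handoff/sentinel layout shown above.
COORD=state/coordination

# session_status SLUG: print done, running, or stopped.
session_status() {
  local slug=$1
  local handoff="$COORD/$slug-handoff.md"
  local sentinel="$COORD/active-sessions/$slug.json"
  if [[ -f "$handoff" ]]; then
    echo done
  elif [[ -f "$sentinel" ]]; then
    echo running
  elif [[ -f "$handoff" ]]; then
    # Re-check: the session may have written its handoff and exited
    # (removing the sentinel) between the two tests above.
    echo done
  else
    echo stopped   # gone without a handoff: treat as a crash
  fi
}

# wait_for_handoff SLUG [INTERVAL]: return 0 when the handoff lands,
# 1 if the session stops without one. INTERVAL defaults to 60 seconds.
wait_for_handoff() {
  local slug=$1 interval=${2:-60}
  while true; do
    case $(session_status "$slug") in
      done)    return 0 ;;
      stopped) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

`wait_for_handoff writer-query-api || echo "writer-query-api stopped without a handoff" >&2` is the crash-aware version of the loop above.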

Piece 2 — the fence around the writer

Now the important one. A writer is dispatched with a scope: the component it may touch, expressed as a set of allowed directory roots in that component’s contract file. The boundary hook (enforce-component-boundary.sh, wired to fire before every Read, Edit, and Write) enforces it.

The first problem is mechanical and worth knowing if you build this yourself: environment variables don’t propagate to subagents, and when the dispatcher issues the spawn, the writer doesn’t exist yet, so there’s no agent id to key its scope on. The fix is a single-shot file. The dispatcher drops one pending-scope.json before spawning the writer; the writer’s very first tool call claims it, renaming it to a sidecar keyed on the agent id that call carries:

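# No scope in the environment? Try to claim the pending scope file.
if [[ -z "${COMPONENT_CONTRACT:-}" ]]; then
  AGENT_ID=$(jq -r '.agent_id // ""'   <<<"$INPUT")
  AGENT_TYPE=$(jq -r '.agent_type // ""' <<<"$INPUT")
  PENDING="state/coordination/pending-scope.json"

  # Only calls that carry an agent_id may claim. Without this guard, a call
  # with no agent_id (e.g. from the main session) would grab the writer's
  # scope as active-scope-.json.
  if [[ -n "$AGENT_ID" ]]; then
    SIDECAR="state/coordination/active-scope-$AGENT_ID.json"
    # First tool call claims the pending scope as this agent's active scope.
    # mv is a rename, so if two agents race for it only one claim succeeds.
    [[ ! -f "$SIDECAR" && -f "$PENDING" ]] && mv "$PENDING" "$SIDECAR" 2>/dev/null
    [[ -f "$SIDECAR" ]] && COMPONENT_CONTRACT=$(jq -r '.contract // ""' "$SIDECAR")
  fi
fi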

Then the part that makes it a fence and not a suggestion — fail closed. A writer that arrives with no scope resolved is a misconfigured dispatch (the orchestrator forgot to stage it). The wrong move is to shrug and allow everything; that silently re-opens the exact hole the fence exists to close. So:

# A writer with no resolved scope is a misconfiguration. Deny, don't default-open.
if [[ -z "${COMPONENT_CONTRACT:-}" && "$AGENT_TYPE" == writer-* ]]; then
  deny "Component scope missing for $AGENT_TYPE. The orchestrator must stage \
state/coordination/pending-scope.json before dispatching the writer."
fi

With a scope resolved, the actual check is almost boring. Read the allowed roots out of the contract; if the target path is inside one, allow; otherwise, deny:

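# $CONTRACT_PATH: the contract file identified by $COMPONENT_CONTRACT above.
ALLOWED_ROOTS=$(jq -r '(.source_roots + .test_roots + .shared_dependencies // [])[]' "$CONTRACT_PATH")

# Normalize first, so a path like <root>/../elsewhere can't pass a prefix test
# for <root>. realpath -m (GNU) resolves .. and symlinks, and the file needn't
# exist yet, which matters for Write.
FILE_PATH=$(realpath -m -- "$FILE_PATH")

while IFS= read -r root; do
  [[ -z "$root" ]] && continue
  [[ "$root" != /* ]] && root="$PWD/$root"
  root=$(realpath -m -- "$root")
  if [[ "$FILE_PATH" == "$root" || "$FILE_PATH" == "$root/"* ]]; then
    exit 0          # inside the fence: allow
  fi
done <<<"$ALLOWED_ROOTS"

COMPONENT_NAME=$(jq -r '.component // "unknown"' "$CONTRACT_PATH")
deny "Component boundary violation: $FILE_PATH is outside allowed roots for \
component '$COMPONENT_NAME'."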

There’s a small universal allowlist on top of this — every agent can always read the shared contracts, the coordination directory, the build files — so the fence constrains source, not the agent’s ability to orient itself. And reads of the agent’s own component directory are always fine. But the spine is the loop above: a writer scoped to the query API cannot open a simulation file, full stop, no matter how reasonable its plan to “just check one thing” sounds.
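The allowlist's actual contents aren't shown, so here is its shape as a sketch with hypothetical paths (`contracts/`, `state/coordination/`, a `Makefile`): a short list of exemptions that applies only to `Read`, so orienting never turns into editing. `read_is_allowlisted` is a hypothetical name; it normalizes paths with `realpath -m` (GNU coreutils) so a `..` segment can't walk out of an allowlisted directory.

```bash
# Hypothetical read allowlist, relative to the repo root ($PWD in the hook).
READ_ALLOWLIST=(
  "contracts"            # shared contracts (hypothetical location)
  "state/coordination"   # coordination directory
  "Makefile"             # build files (hypothetical)
)

# read_is_allowlisted TOOL PATH: return 0 only for a Read of an allowlisted path.
read_is_allowlisted() {
  local tool=$1 path entry
  [[ "$tool" == Read ]] || return 1
  path=$(realpath -m -- "$2")
  for entry in "${READ_ALLOWLIST[@]}"; do
    entry=$(realpath -m -- "$PWD/$entry")
    if [[ "$path" == "$entry" || "$path" == "$entry/"* ]]; then
      return 0
    fi
  done
  return 1
}
```

In the hook it would sit just above the root loop, as `read_is_allowlisted "$TOOL_NAME" "$FILE_PATH" && exit 0`, where `TOOL_NAME` is assumed to hold the input's `tool_name`.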

One deliberate non-choice: I don’t isolate writers in git worktrees. I tried; eleven stale worktrees once stranded eighteen commits and I spent an afternoon recovering them like lost luggage. The boundary hook plus the scope file gives the isolation without the graveyard. Writers commit straight to the main branch, inside their fence.

Piece 3 — the fence around the investigator

The boundary hook guards writing. It does nothing for the more insidious case: an agent sent to investigate, which can’t damage a file but can hand you a confident, fabricated finding that you then act on. (That’s the failure the companion post opens with — an audit agent that reported a passing gate as failing and nearly turned a hallucination into four real commits.)

The second hook (enforce-subagent-dispatch-discipline.sh, wired to fire before the Agent tool — the one that spawns sub-agents) gates exactly the investigation-class spawns and lets everything else through:

SUBAGENT_TYPE=$(jq -r '.tool_input.subagent_type // ""' <<<"$INPUT")
case "$SUBAGENT_TYPE" in
  Explore|general-purpose) ;;   # the fact-finding agents — gated below
  *) exit 0 ;;                  # writers, reviewers, etc. — not our problem here
esac

The rule for the gated ones: you may not dispatch a fact-finder without first declaring, on disk, how its finding will be proven. Either of two sidecars satisfies that: an audit sidecar, for any dispatch that will assert a fact, or a freeform sidecar with a stated reason, for genuine open-ended search (“find the usages of this function”) where there’s no fact to prove yet.

AUDIT="state/coordination/pending-subagent-audit.json"
FREEFORM="state/coordination/pending-subagent-freeform.json"

if [[ -f "$AUDIT" ]]; then
  if ! err=$(validate_audit "$AUDIT"); then
    deny "audit sidecar invalid: $err"
  fi
  consume "$AUDIT"; exit 0      # claim succeeds — dispatch allowed
fi

if [[ -f "$FREEFORM" ]]; then
  reason=$(jq -r '.reason // ""' "$FREEFORM")
  [[ -n "$reason" && "$reason" != "null" ]] || deny "freeform bypass requires a non-empty reason"
  consume "$FREEFORM"; exit 0
fi

deny "audit sidecar required for Explore/general-purpose dispatch. Write \
pending-subagent-audit.json or pending-subagent-freeform.json before the call."

The whole trick is in what validate_audit insists on. An audit packet has to name the evidence, and the evidence has to be the kind you can’t fake:

validate_audit() {
  local file="$1" errors=""
  jq empty "$file" 2>/dev/null || { echo "not valid JSON"; return 1; }

  local fmt;  fmt=$(jq -r '.expected_evidence_format // ""' "$file")
  local nc;   nc=$(jq '(.canonical_commands // []) | length' "$file")
  local nf;   nf=$(jq '(.canonical_files    // []) | length' "$file")

  case "$fmt" in raw_stdout|parsed_jsonl|file_read) ;;
    *) errors+="expected_evidence_format must be raw_stdout|parsed_jsonl|file_read; " ;;
  esac

  # Must name at least one evidence source.
  (( nc == 0 && nf == 0 )) && errors+="name at least one canonical_command or canonical_file; "
  # "I read a file" must name the file; "I ran a command" must name the command.
  [[ "$fmt" == "file_read" ]]                            && (( nf == 0 )) && errors+="file_read needs canonical_files; "
  [[ "$fmt" == "raw_stdout" || "$fmt" == "parsed_jsonl" ]] && (( nc == 0 )) && errors+="$fmt needs canonical_commands; "

  [[ -n "$errors" ]] && { echo "${errors% }"; return 1; }
  return 0
}

So a real audit sidecar looks like this:

{
  "packet_type": "audit",
  "canonical_commands": ["make validate-governance-enforcement-declared"],
  "expected_evidence_format": "raw_stdout"
}

expected_evidence_format: raw_stdout is the load-bearing field. It means the agent’s report must quote that command’s actual output verbatim — not summarize it, not characterize it, paste it. You cannot claim a gate is red when the gate’s own stdout, sitting three lines down in your own report, says green. The lie and its disproof end up in the same document. The fence doesn’t stop the agent from being wrong; it stops the agent from being wrong invisibly.
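On the dispatcher's side, writing that file is one `jq` call. A sketch for the command-based formats (`stage_audit` is a hypothetical helper; the fields are the ones `validate_audit` checks). `jq --args` (jq 1.6 or later) places each command into `canonical_commands` as one correctly escaped string, and a write-then-rename keeps the hook from ever reading a half-written file:

```bash
# stage_audit FORMAT CMD...: write the audit sidecar for the next
# Explore/general-purpose dispatch. Hypothetical helper.
stage_audit() {
  local fmt=$1; shift
  local dir=state/coordination
  mkdir -p "$dir"
  # Write to a temp file in the same directory, then rename into place.
  jq -n --arg fmt "$fmt" '{
    packet_type: "audit",
    canonical_commands: $ARGS.positional,
    expected_evidence_format: $fmt
  }' --args "$@" > "$dir/.pending-subagent-audit.json.tmp" &&
    mv "$dir/.pending-subagent-audit.json.tmp" "$dir/pending-subagent-audit.json"
}
```

`stage_audit raw_stdout "make validate-governance-enforcement-declared"` produces the sidecar shown above.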

consume is the audit trail. The sidecar doesn’t get deleted — it’s moved, stamped with the tool-call id that claimed it, so afterward you can match “what did the dispatcher promise” against “which spawn actually consumed it”:

consume() {
  local src="$1"
  mkdir -p state/coordination/consumed
  mv "$src" "state/coordination/consumed/${TOOL_USE_ID}-$(basename "$src")"
}

The freeform escape hatch is deliberate, and deliberately visible. The discipline I want isn’t “fill out a form forever” — it’s “if you’re going to assert a fact about the repo, name your proof; if you’re just poking around, say so out loud.” Because every freeform reason is logged to the same trail, “freeform everything” can’t quietly become the path of least resistance without leaving fingerprints.
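Because each consumed sidecar keeps its original file name behind the tool-call id, the trail is easy to query. A sketch, assuming only the `consumed/` naming scheme above (`trail_summary` is a hypothetical name), that counts both kinds and prints every freeform reason, so a drift toward freeform shows up at a glance:

```bash
# trail_summary: count consumed audit vs freeform sidecars and list
# every freeform reason with the file it came from.
trail_summary() {
  local dir=state/coordination/consumed f
  local audits=0 freeforms=0
  local restore; restore=$(shopt -p nullglob)
  shopt -s nullglob   # an empty trail counts as zero, not as a literal glob
  for f in "$dir"/*-pending-subagent-audit.json; do
    audits=$((audits + 1))
  done
  for f in "$dir"/*-pending-subagent-freeform.json; do
    freeforms=$((freeforms + 1))
    printf 'freeform %s: %s\n' "${f##*/}" "$(jq -r '.reason // ""' "$f")"
  done
  eval "$restore"     # put nullglob back the way the caller had it
  printf 'audit=%d freeform=%d\n' "$audits" "$freeforms"
}
```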

Wiring it together

The hooks are inert until you register them. In .claude/settings.json, you attach each to the tool it guards via a matcher:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read|Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/enforce-component-boundary.sh" }
        ]
      },
      {
        "matcher": "Agent",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/enforce-subagent-dispatch-discipline.sh" }
        ]
      }
    ]
  }
}

Read|Edit|Write is a regex over tool names — the boundary hook fires before any file op. Agent is the spawn tool — the dispatch-discipline hook fires before any sub-agent is created. That’s the entire integration. (Skaldborn runs a few more hooks at each point — one that keeps the orchestrator out of source code, one that blocks destructive git, an audit logger — but those two are the load-bearing pair this post is about.)
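Because a hook is just a command that reads JSON on stdin, you can smoke-test the wiring without starting an agent. A sketch of a hypothetical `hook_decision` helper: it treats HOOK_CMD as a shell command string, like the `command` field above, feeds it a fake tool call, and classifies the result by the contract from earlier (JSON deny on exit 0, or exit 2).

```bash
# hook_decision HOOK_CMD JSON: print deny, allow, or "error (exit N)"
# for one fake tool call.
hook_decision() {
  local hook=$1 input=$2 out status
  out=$(printf '%s' "$input" | sh -c "$hook" 2>/dev/null)
  status=$?
  if (( status == 2 )); then
    echo deny                      # exit 2 blocks, reason on stderr
  elif (( status != 0 )); then
    echo "error (exit $status)"    # a crash is not a deny
  elif [[ "$(jq -r '.hookSpecificOutput.permissionDecision // ""' \
              <<<"$out" 2>/dev/null)" == deny ]]; then
    echo deny                      # JSON decision with exit 0
  else
    echo allow                     # no decision: the call proceeds
  fi
}
```

For example, `hook_decision .claude/hooks/enforce-subagent-dispatch-discipline.sh '{"tool_name":"Agent","tool_input":{"subagent_type":"Explore"}}'` should print `deny` when no sidecar is staged. Run it with nothing staged, since a successful claim consumes the sidecar, and treat any `error` result as worth a look.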

The loop that drives it

The fences are passive — they only say no. Something has to do the dispatching, and on Skaldborn that’s the orchestrator: a session whose entire job is to run a phase of work by handing scoped packets to fresh agents. Its loop, stripped to the spine:

  1. Read the next unblocked work packet.
  2. Stage the writer’s scope: write pending-scope.json naming the component contract.
  3. Spawn the writer. The boundary hook now confines it to that component.
  4. When the handoff file lands, spawn a separate reviewer against the diff and the contract.
  5. On approval, spawn a verifier to run the regression check.
  6. Only then mark the packet done.

Two constraints fall out of the mechanics. Writers are serial — there’s one pending-scope.json at a time, so you stage, spawn, wait for the first tool call to claim it, then stage the next. (Parallel writers need a per-spawn keying scheme the single-shot file doesn’t give you.) And the orchestrator itself is forbidden from reading source — enforced by yet another boundary hook — because the moment the coordinator starts reading code to “just check,” it stops being the clean context that can notice when a packet is mis-scoped. If it can’t dispatch from contracts, logs, and packet history alone, the correct output isn’t a guess. It’s “the contract is underspecified,” and a stop.
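The serial stage-spawn-claim cycle can be made explicit. A sketch, assuming the pending file holds just the `contract` field the boundary hook reads; `stage_scope`, `wait_for_claim`, and the default timeout are hypothetical:

```bash
PENDING=state/coordination/pending-scope.json

# stage_scope CONTRACT: stage the next writer's scope, refusing to
# overwrite one that no writer has claimed yet (writers are serial).
stage_scope() {
  if [[ -e "$PENDING" ]]; then
    echo "pending scope not yet claimed; refusing to stage another" >&2
    return 1
  fi
  mkdir -p "${PENDING%/*}"
  jq -n --arg c "$1" '{contract: $c}' > "$PENDING.tmp" && mv "$PENDING.tmp" "$PENDING"
}

# wait_for_claim [TIMEOUT_SECONDS]: after spawning the writer, wait for its
# first tool call to claim (rename away) the pending file.
wait_for_claim() {
  local deadline=$((SECONDS + ${1:-120}))
  while [[ -e "$PENDING" ]]; do
    (( SECONDS >= deadline )) && return 1   # never claimed: the spawn likely failed
    sleep 1
  done
}
```

A cycle is `stage_scope "$CONTRACT"`, spawn the writer, then `wait_for_claim`. If the wait times out, delete the stale pending file before staging anything else; otherwise the next agent to make a tool call inherits a scope meant for a writer that never started.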

No single agent both writes code and approves it. No agent that reasons about the system is also the one reporting facts about it. Those separations are the point, and they’re structural, not polite.

Gotchas worth knowing before you build this

A few things cost me time. Learn them here instead:

  • Env vars don’t reach sub-agents, and the sub-agent doesn’t exist when the parent’s spawn hook fires. This is the reason scope is passed by single-shot file instead of an environment variable. Don’t fight it — drop a file, claim it on first tool call.
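  • exit 0 is how you deny. The decision lives in the JSON on stdout, not the exit code (exit 2 is the only code that blocks on its own). An uncaught error that exits with any other non-zero code, say 1 from a failed jq, is not a deny: Claude Code treats it as a non-blocking error and the call goes ahead. Which is why the next point matters.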
  • Decide your failure direction on purpose. These hooks shell out to jq. If jq is missing, what happens? Skaldborn’s dispatch-discipline hook fails closed — no jq, no dispatch — with a narrow, explicit environment-variable escape hatch for the rare case you need to override. A security fence that fails open the moment a tool is missing isn’t a fence.
  • Per-session tmux hooks don’t fire on session close — by the time session-closed runs, the session and its attached hooks are already gone. Use a global hook that extracts the session name from the hook context. (This is the bit that cleans up the sentinel files.)
  • Quote-escape the prompt across the whole chain. bash → tmux → sh -c → claude will happily destroy a prompt containing a quote or newline. POSIX single-quote escaping survives it; bash-only tricks don’t, because tmux runs the inner command through sh.
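A sketch of the fail-closed jq check from the third point. The escape-hatch variable name is hypothetical, and the deny is a fixed string on purpose: jq, the tool you would normally build it with, is the thing that's missing.

```bash
# require_jq_or_deny: call at the top of a hook; fail closed without jq.
# DISPATCH_ALLOW_NO_JQ is a hypothetical, deliberately narrow override.
require_jq_or_deny() {
  command -v jq >/dev/null 2>&1 && return 0
  if [[ "${DISPATCH_ALLOW_NO_JQ:-}" == 1 ]]; then
    echo "jq missing; dispatch allowed by DISPATCH_ALLOW_NO_JQ=1" >&2
    exit 0
  fi
  # No jq to escape anything, so emit a fixed, known-valid decision.
  printf '%s\n' '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"jq not found; dispatch denied (fail closed)."}}'
  exit 0
}
```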

How Skaldborn consumes this

None of this is a side project — it’s how the game actually gets built. The substrate work, the content pipeline, the client: most of it, by volume, is written by agents running inside these fences, dispatched by the loop above, while I’m reviewing something else or asleep. The launcher starts them in role. The boundary hook keeps each one inside its component. The dispatch-discipline hook keeps the fact-finders honest. The handoff files tell me what finished. I read diffs and verdicts, not keystrokes.

The fences don’t make the agents smarter. They make the agents’ mistakes cheap and loud instead of expensive and silent — a misconfigured dispatch stops at the door, an out-of-bounds edit never lands, a fabricated finding arrives stapled to its own disproof. That’s the whole trade, and it’s the only reason I’m comfortable handing real write access to something I’ve watched confidently report green as red.

What’s next

There’s a piece I deferred and still owe: a post-run check that reads the agent’s transcript and verifies each promised command’s output actually appears in its report — turning “detectable in review” into “mechanically impossible to fake.” The pre-dispatch sidecar forces the agent to name its proof; it doesn’t yet force the agent to quote it truthfully. That’s the next fence.
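For a sense of the shape, here is a deliberately naive approximation that swaps in a different technique from the transcript-reading check described above: it re-runs each canonical command and requires every non-blank line of its stdout to appear verbatim in the agent's report. It assumes the report is a file on disk and that the commands are deterministic enough to re-run; output with timestamps or durations would need normalizing first, and a very short line like `ok` can match by accident. `check_report` is a hypothetical name.

```bash
# check_report REPORT CMD...: re-run each command and fail if any non-blank
# line of its stdout is missing from the report (exact substring match).
check_report() {
  local report=$1; shift
  local cmd line missing=0
  for cmd in "$@"; do
    while IFS= read -r line; do
      [[ -z "${line//[[:space:]]/}" ]] && continue   # skip blank lines
      if ! grep -qF -- "$line" "$report"; then
        printf 'missing from report (%s): %s\n' "$cmd" "$line" >&2
        missing=1
      fi
    done < <(sh -c "$cmd" 2>/dev/null)
  done
  return "$missing"
}
```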

If you want to follow along, subscribe via the form at the bottom of any page — one short email when the next post lands. If you build a version of this and something breaks — or you want to argue — write to devlog@skaldborn.com. I read everything.