Out-of-process orchestration for Claude Code

We've been building Claudeverse, a tool for running multiple Claude Code agents in parallel. The longer we worked on it, the clearer it got that a cluster of unrelated-looking complaints about Claude Code all trace back to one architectural decision — and that decision is worth writing down.

The symptoms

Skim the anthropics/claude-code issue tracker and you'll find variations of the same frustration:

Subagent output overflows the parent and crashes the session. #23463: seven parallel subagents, ~150K characters of results, "8 consecutive 'Prompt is too long' errors with zero recovery mechanism." The session freezes and the work is gone.
Subagents share one token budget. #10212: "all sub-agents share the parent session's 200K token budget." Run five, starve all five.
The orchestrator pattern is structurally blocked. #46424: "Only the top-level REPL can call Agent()." A subagent can't coordinate its own children.
Restarts wipe everything. #39663 and #43696: a multi-hour debugging session, gone on restart; --continue and --resume don't bring the context back.
Parallel sessions fight over the working tree. #52051: two sessions on one repo "step on each other's working tree." The community is already building worktree-juggling tools around this (Gwt-Claude on HN).

These read like five different bugs. They're closer to one.

The common cause

In Claude Code, the orchestrator is an agent. The thing that decides to spawn subagents is itself a turn in a Claude session. So everything the orchestrator coordinates lives inside its process, its context window, its working directory, and its session lifetime:

Subagent results come back into the parent's context — so enough of them overflow it.
Subagents draw on the parent's token budget — so they compete for it.
Only the top-level REPL can dispatch — so hierarchy is impossible.
State lives in the session — so a restart is a wipe.
Every agent shares the one checkout — so parallel agents collide.

It's not that any one of these was designed wrong. It's that putting the orchestrator inside an agent collapses them into the same problem: the coordinator and the work it coordinates have no isolation from each other.

Moving the orchestrator out

The alternative we landed on: the orchestrator is not an agent. It's a separate process that spawns Claude Code agents as child subprocesses and coordinates them from the outside. Our architecture is designed around that split, and it's what makes the five symptoms decomposable:

Context isolation. Each agent runs as its own Claude CLI process with its own session context. A subagent's output is captured by the orchestrator process — it doesn't get concatenated into a parent agent's prompt. One agent filling its context is one agent's problem, not a session-wide crash.

Separate context windows. Because each agent is its own process and its own session, agents aren't dividing one session's context window between them. (That's isolation at the process level — not a claim about how your Claude account's quota behaves, which isn't ours to change.)

Peer dispatch. There's no "top-level REPL." The orchestrator is ordinary code; any agent it runs is a peer subprocess. An agent that coordinates its own sub-agents is just more subprocesses.

State outside the process. The orchestrator tracks cycle and task state outside any single agent process. The agent process is disposable; the record isn't — so a crash or restart can resume from that state instead of from zero.

Workspace isolation. Because each agent is launched as its own CLI process with an explicit working directory, the orchestrator can put agents in separate checkouts or worktrees instead of forcing every worker through one shared repo state. That turns working-tree collisions into an orchestration policy problem, not an unavoidable side effect of one parent session.

Being precise about what this is

A few honest caveats, because HN will (correctly) ask:

"State outside the process" is cycle/task state, not a full session replay. What survives a restart is the structured record — what was attempted, what status it ended in, which tasks were touched — not the agent's entire in-context reasoning chain. It stops the re-investigation loop; it isn't time travel.
This is a coordination layer, not a model change. Claudeverse wraps the Claude CLI. It doesn't make the model better at any one task; it makes many tasks survivable to run at once.
Out-of-process has its own costs. Subprocess supervision, draining output without unbounded buffering, killing process groups so descendant processes don't leak, coordinating shared resources across crashes. We traded a context-isolation problem for a process-supervision problem — and we think that's the better trade, because supervision is a comparatively well-understood discipline and sharing a context window is not.

Why write this down

Mostly because the issue tracker shows a lot of people independently hitting the same wall and treating it as five separate annoyances. It's worth naming the shared cause: in-process orchestration couples the coordinator to the work. The fix isn't a flag — it's moving the coordinator out.