Turn loop + MCP

How the harness wakes up, what it asks claude to do, and what tools claude has access to in return.

The loop

Each agent harness (hive serve — one binary for all agents) runs:

  1. Long-poll Recv on its socket. The host-side broker (broker.rs::recv_blocking_batch) returns immediately if there's a pending message, otherwise waits up to 30 s for a broker Sent event for this recipient.
  2. Pop one message. Peek the remaining inbox depth with Status.
  3. Emit LiveEvent::TurnStart { from, body, unread } onto the SSE bus.
  4. Spawn claude (one process per turn) and pipe the wake prompt over stdin.
  5. Stream stdout (JSON lines) into the bus as LiveEvent::Stream(value). Pump stderr as Note.
  6. Wait for claude to exit.

     Compaction is two-pronged: reactive on "Prompt is too long" and proactive on a context watermark (see Compaction below).

     Rate-limit detection: on stderr the harness does a raw-line match for 429 / rate_limit markers; on stdout it only fires on parsed {"type":"error"} JSON events (avoiding false positives when agents discuss rate_limit_error in conversation text). On detection the harness sets the rate_limited sentinel (Bus::emit_status("rate_limited")), sleeps HIVE_RATE_LIMIT_SLEEP_SECS (default 300), then retries. The dashboard and per-agent page show a ⊘ rate limited badge while the harness is parked.

     Auth-failed detection: both stdout and stderr pumps also match AUTH_FAIL_MARKERS ("authentication_failed", 401, etc.). On the first 401, drive_turn retries the same prompt once immediately (transient token-refresh races and brief API hiccups can cause a 401 that clears on retry). Only if the retry also returns AuthFailed does drive_turn bubble it up to the serve loop, which then writes {state_dir}/hyperhive-needs-login, emits needs_login_idle status, requeues the inflight message (so it replays after re-auth), and parks in wait_for_login, the same path used at boot. The operator re-authenticates via the per-agent web UI login flow; on success the sentinel is cleared and the queued message drives the next turn normally.

     Mtime-snapshot resumption: wait_for_login snapshots the ~/.claude/ dir (newest file mtime + file count) at entry and only resumes when that snapshot advances, not just when credentials exist on disk. This prevents a silent infinite-401 loop: stale credentials already on disk at the time of the 401 no longer cause an immediate false-resume. The DirSnapshot struct tracks both axes; either an mtime advance OR a file-count change triggers resume (the count axis handles filesystems where modified() errors on every file).
  7. Emit LiveEvent::TurnEnd { ok, note }. Sleep poll_ms to avoid tight loops on transient failures.
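The mtime-snapshot resume rule from step 6 can be sketched as a small comparison between two snapshots. This is a minimal sketch, not the harness source: the field names, the advanced_to method, and the handling of an mtime that becomes readable after being unreadable are all assumptions made for illustration.

```rust
use std::time::SystemTime;

/// Snapshot of the credentials directory: newest file mtime plus file count.
/// Field names are illustrative; only the two axes come from the text above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DirSnapshot {
    /// None when modified() failed for every file in the directory.
    pub newest_mtime: Option<SystemTime>,
    pub file_count: usize,
}

impl DirSnapshot {
    /// True when `now` shows activity relative to `self` (the snapshot taken
    /// on entry to wait_for_login): a newer mtime OR a different file count.
    pub fn advanced_to(&self, now: &DirSnapshot) -> bool {
        let mtime_advanced = match (self.newest_mtime, now.newest_mtime) {
            (Some(before), Some(after)) => after > before,
            // Assumption: an mtime appearing where none was readable counts
            // as an advance.
            (None, Some(_)) => true,
            _ => false,
        };
        mtime_advanced || now.file_count != self.file_count
    }
}
```

Comparing against the entry snapshot, rather than checking that credentials exist, is what keeps stale credentials from triggering an immediate resume.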

Harness binary shape

One hive binary for all agents. The earlier split into hive-ag3nt + hive-m1nd was collapsed because the privilege boundary lives server-side at the broker socket (/run/hive/mcp.sock): ManagerRequest calls are refused by the standard agent socket regardless of who sends them.

Three subcommands:

Surface trait + zero-sized type tags

AgentRequest / AgentResponse (= ManagerRequest / ManagerResponse — type aliases) are the wire types. There is one role: agent. bin/hive.rs factors the turn loop through a Surface trait with one zero-sized impl (AgentSurface) wrapping:

main() calls serve_main::<AgentSurface> for all roles. The turn loop (serve_loop / handle_turn / wake) has no per-role branches.

Boot wiring

serve_main reads HIVE_PORT (default DEFAULT_WEB_PORT) and HIVE_LABEL (default "hive" for standalone runs; the meta flake sets it unconditionally for any container-deployed agent; see docs/conventions.md::Hive identity for the env stack). It then opens the turn-stats sqlite, prepares the on-boot files (see below), installs claude plugins, and spawns forge_notify::run + web_ui::serve. Finally it either drops into serve_loop directly (Online) or parks on the login flow first (NeedsLogin).

Plugin install failures are not fatal: each entry comes back as a human-readable failure string that gets routed via Surface::send_to_parent to the agent's topology parent (the broker resolves <parent> per topology::parent_of; root agents and the manager fall through to operator).

Turn outcomes

turn::TurnOutcome drives the post-claude branch:

Outcome          Action
Ok / Compacted   ack_turn
RateLimited      sleep HIVE_RATE_LIMIT_SLEEP_SECS (default 300), requeue inflight, status back to online
AuthFailed       emit needs_login_idle sentinel, requeue inflight, park in wait_for_login
Failed(err)      route "[system] claude turn failed:\n<err>" to <parent> via send_to_parent
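The outcome dispatch can be pictured as a single match. The sketch below is illustrative only: the Action enum and action_for function are hypothetical names that do not exist in hyperhive, the payload type of Failed is assumed to be a String, and the exact failure message layout is inferred from the table row.

```rust
/// Outcomes named in the table above. Payload types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum TurnOutcome {
    Ok,
    Compacted,
    RateLimited,
    AuthFailed,
    Failed(String),
}

/// Hypothetical description of the follow-up work; not a hyperhive type.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    AckTurn,
    SleepThenRequeue { secs: u64 },
    ParkForLogin,
    NotifyParent(String),
}

/// Maps an outcome to its follow-up. `rate_limit_sleep_secs` stands in for
/// HIVE_RATE_LIMIT_SLEEP_SECS (default 300).
pub fn action_for(outcome: &TurnOutcome, rate_limit_sleep_secs: u64) -> Action {
    match outcome {
        TurnOutcome::Ok | TurnOutcome::Compacted => Action::AckTurn,
        TurnOutcome::RateLimited => Action::SleepThenRequeue {
            secs: rate_limit_sleep_secs,
        },
        TurnOutcome::AuthFailed => Action::ParkForLogin,
        // Assumed message layout: system prefix, then the error on its own line.
        TurnOutcome::Failed(err) => {
            Action::NotifyParent(format!("[system] claude turn failed:\n{err}"))
        }
    }
}
```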

After the outcome handler, the stats sink records a row and the hyperhive-continue sentinel (dropped by the request_next_turn MCP tool) is consumed if present. handle_turn reports the result to serve_loop via TurnControl { auth_failed, continue_requested, pending }. When a continue was requested, the turn did not auth-fail, and the inbox is empty (pending == 0), serve_loop drives the next turn in-process with a synthetic { from: "self", body: "continue" } message (synthetic_continue) — it never goes through the broker, so the self-continue doesn't persist to sqlite or show up as a recv'able inbox message. If real messages are already pending the continue is dropped: those messages drive the next turn(s) via recv_next, so an explicit self-wake isn't needed (this is the request_next_turn contract — "no effect if a new inbox message arrives before this turn ends"). The should_self_continue predicate encodes exactly that decision.
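The self-continue decision described above reduces to a three-way conjunction. A minimal sketch, assuming plain bool and integer field types for TurnControl (the real struct may differ):

```rust
/// Result of one turn as reported to serve_loop. Field types are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TurnControl {
    pub auth_failed: bool,
    pub continue_requested: bool,
    /// Inbox depth after the turn.
    pub pending: u64,
}

/// Drive a synthetic "continue" turn only when one was requested, the turn
/// did not auth-fail, and no real messages are waiting.
pub fn should_self_continue(ctl: &TurnControl) -> bool {
    ctl.continue_requested && !ctl.auth_failed && ctl.pending == 0
}
```

With pending > 0 the predicate is false, which matches the request_next_turn contract: queued messages drive the next turn, so the explicit self-wake is dropped.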

The claude invocation

claude --print --verbose --output-format stream-json --model <name> \
  --continue --settings /run/hive/claude-settings.json \
  --system-prompt-file /run/hive/claude-system-prompt.md \
  --mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \
  --tools <builtins> --allowedTools <builtins+mcp>
# wake prompt piped over stdin

<name> is read from Bus::model() on each turn. The initial default is set by hyperhive.model in the agent's agent.nix (NixOS option; propagates via HIVE_DEFAULT_MODEL env var; falls back to "haiku" if unset). The operator can flip it at runtime with /model <name> in the web terminal — the next turn picks it up. The choice is persisted to /harness/hyperhive-model so it survives restart; override path: HYPERHIVE_MODEL_FILE env var for tests.
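One plausible reading of the startup precedence is: persisted operator choice first, then HIVE_DEFAULT_MODEL, then "haiku". The sketch below assumes that ordering and assumes that blank values are ignored; initial_model is a hypothetical helper name, and the caller is expected to have read the model file and the env var already.

```rust
/// Returns the trimmed value, or None when it is blank.
fn non_empty(s: &str) -> Option<String> {
    let trimmed = s.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// `persisted`: contents of the model file (e.g. /harness/hyperhive-model).
/// `env_default`: value of HIVE_DEFAULT_MODEL.
/// Assumed precedence: persisted choice, then env default, then "haiku".
pub fn initial_model(persisted: Option<&str>, env_default: Option<&str>) -> String {
    persisted
        .and_then(non_empty)
        .or_else(|| env_default.and_then(non_empty))
        .unwrap_or_else(|| "haiku".to_string())
}
```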

Context-window size is looked up per-model via events::context_window_tokens(model). Resolution order (first match wins):

  1. HIVE_CONTEXT_WINDOW_TOKENS_<KEY> env var, where KEY (lowercased) is a substring of the active model name. Injected by the meta flake from services.hyperhive.c0re.contextWindowTokens (host-level NixOS option; defaults: haiku=200k, sonnet=1M, opus=1M). Override that option to change the window for all agents at once, without a per-agent config change.
  2. HIVE_CONTEXT_WINDOW_TOKENS — single global override for any model (useful in dev / test).
  3. Hard fallback: 200_000 (conservative; only reached outside NixOS where the env vars aren't set).
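The resolution order above can be sketched as a pure function over a list of environment pairs. Several details are assumptions: the environment is passed in as a slice instead of being read from std::env, values are plain integers, unparsable values are skipped, and when several per-model variables match, the first one in iteration order wins.

```rust
const PER_MODEL_PREFIX: &str = "HIVE_CONTEXT_WINDOW_TOKENS_";
const GLOBAL_VAR: &str = "HIVE_CONTEXT_WINDOW_TOKENS";
const FALLBACK_TOKENS: u64 = 200_000;

/// Resolves the context window for `model` from `env` (name, value) pairs.
pub fn context_window_tokens(model: &str, env: &[(&str, &str)]) -> u64 {
    let model = model.to_lowercase();

    // 1. Per-model override: the lowercased KEY must be a substring of the model.
    for (name, value) in env {
        if let Some(key) = name.strip_prefix(PER_MODEL_PREFIX) {
            if !key.is_empty() && model.contains(&key.to_lowercase()) {
                if let Ok(tokens) = value.parse::<u64>() {
                    return tokens;
                }
            }
        }
    }

    // 2. Single global override for any model.
    for (name, value) in env {
        if *name == GLOBAL_VAR {
            if let Ok(tokens) = value.parse::<u64>() {
                return tokens;
            }
        }
    }

    // 3. Conservative hard fallback.
    FALLBACK_TOKENS
}
```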

The effective window drives watermarks and is exposed at runtime via /api/state.context_window_tokens so the UI can show a percentage-of-window ctx badge.

--continue keeps a persistent session per agent (claude stores sessions in ~/.claude/projects/, which is bind-mounted persistently). Auto-compact and auto-memory are disabled via --settings because hyperhive owns compaction (see Compaction below). A one-shot --continue suppression is available via POST /api/new-session (or the /new-session slash command in the per-agent terminal): Bus::take_skip_continue() flips an AtomicBool once per turn, so the next claude invocation drops --continue and every subsequent turn resumes normal behaviour.
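The one-shot suppression maps naturally onto an atomic swap. In this minimal sketch the Bus struct is cut down to the single flag, and request_new_session and session_args are hypothetical helper names; only take_skip_continue is named in the text.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Cut-down stand-in for the real Bus: only the skip-continue flag.
#[derive(Default)]
pub struct Bus {
    skip_continue: AtomicBool,
}

impl Bus {
    /// Hypothetical setter, e.g. called by the /api/new-session handler.
    pub fn request_new_session(&self) {
        self.skip_continue.store(true, Ordering::SeqCst);
    }

    /// Reads the flag and clears it in one atomic step, so it fires once.
    pub fn take_skip_continue(&self) -> bool {
        self.skip_continue.swap(false, Ordering::SeqCst)
    }
}

/// Session-related argv fragment for the next claude invocation.
pub fn session_args(bus: &Bus) -> Vec<&'static str> {
    if bus.take_skip_continue() {
        vec![]
    } else {
        vec!["--continue"]
    }
}
```

Because swap returns the previous value and resets it, exactly one turn runs without --continue per request.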

Compaction

claude's own in-session auto-compact is off (--settings); hyperhive owns it explicitly in turn::drive_turn. There are two triggers:

The compact watermark defaults to 75% of context_window_tokens(model) (dynamically derived — 150k for haiku, 750k for sonnet/opus). Override with HIVE_COMPACT_WATERMARK_TOKENS (absolute token count); set to 0 to disable proactive compaction entirely (the reactive path always applies). The proactive path is best-effort — a failed checkpoint turn or /compact is surfaced as a Note but never fails the turn that already succeeded. The operator can also force a compaction any time via /api/compact.

To disable proactive compaction for a specific agent, use the nix option:

hyperhive.autoCompact = false;  # default true

Setting autoCompact = false is equivalent to HIVE_COMPACT_WATERMARK_TOKENS=0. Useful for agents running large-context models (sonnet/opus) where the 75% heuristic fires before the session is actually full — the reactive path (compact-on-overflow when the session hits the hard limit) still applies.
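The watermark arithmetic can be checked with a short sketch. The function names are hypothetical, and whether the real comparison is "at or above" or "strictly above" the watermark is an assumption here.

```rust
/// `override_tokens` stands in for HIVE_COMPACT_WATERMARK_TOKENS.
/// Returns None when proactive compaction is disabled (override of 0).
pub fn compact_watermark(window_tokens: u64, override_tokens: Option<u64>) -> Option<u64> {
    match override_tokens {
        Some(0) => None,
        Some(tokens) => Some(tokens),
        // Default: 75% of the model's context window.
        None => Some(window_tokens * 3 / 4),
    }
}

/// Assumption: compaction triggers once usage reaches the watermark.
pub fn should_compact(used_tokens: u64, watermark: Option<u64>) -> bool {
    watermark.map_or(false, |limit| used_tokens >= limit)
}
```

For a 200k window this gives 150k, and for a 1M window 750k, matching the defaults quoted above.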

The child runs with cwd = /state (when the bind exists; falls back to the parent's cwd in dev), so any relative path in a tool call (Read foo.md, Bash ls, Write notes.md) lands in the agent's durable bind-mounted dir. CLAUDE.md auto-load walks upward from /state — drop a per-agent CLAUDE.md there if you want long-term hints that survive destroy/recreate.

The wake prompt is intentionally minimal: just the popped message's from/body, plus an inline ({unread} more pending — drain via …) hint when unread > 0. Claude drives any further recv/send itself via the embedded MCP server.

Whenever hive-c0re starts / restarts / rebuilds a container, it also drops a system message into the agent's inbox via Coordinator::kick_agent — a one-line "you were just (re)started, check /state/ for your notes, --continue session is intact". The next turn picks it up like any other inbox message.

On-boot files

hive_ag3nt::turn::write_* writes three files next to the per-agent socket at /run/hive/ once at startup:

The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, write_settings, write_system_prompt, run_turn, drive_turn, emit_turn_end, wait_for_login, compact_session} so the two binaries can't drift.

Agent icon

hyperhive.icon = ./icon.svg;  # default: null (falls back to shared hyperhive logo)

Path to an SVG file used as this agent's visual identity — shown in the per-agent page header, as the page favicon, and uploaded to the agent's Forgejo profile avatar (via the forge-avatar-sync boot unit) and Matrix profile avatar (via matrix-avatar-sync). Commit the SVG next to agent.nix in the config repo and reference it as a relative path.

When null (the default), the agent falls back to the shared hyperhive branding mark. The harness serves whichever icon is active at GET /icon on the per-agent web port.

user.passwordlessSudo

hyperhive.user.passwordlessSudo = true;  # default

Grants the per-agent unix user passwordless sudo (NOPASSWD: ALL). Enabled by default so claude's shell tools work for operations that need root inside the container (systemctl, package managers in dev shells, etc.) — the same privilege surface the previous root-user shape had, now elevated explicitly rather than implicitly.

Set to false for agents that should be strictly unprivileged. Any tool invocation that needs root then fails loudly with the standard sudo rejection rather than silently succeeding — easier to audit.

hyperhive.user.uid, hyperhive.user.gid, and hyperhive.user.name are the companion options; see docs/agent-hierarchy.md — "Harness systemd unit shape" for the full user.* surface.

hyperhive.dashboardLinks = [
  { label = "Stats"; icon = "📊"; url = "http://localhost:9001/stats"; }
  { label = "Scratchpad"; url = "http://localhost:8080"; }
];

Declares extra navigation links that appear on the agent's dashboard card and in the per-agent page header alongside the built-in forge / config / container links. Each entry has:

Field   Required   Description
label   yes        Display text shown in the icon strip tooltip and meta-nav.
url     yes        Absolute URL; may include a different port (the dashboard renders it as a plain anchor).
icon    no         Emoji or short glyph prefix. Defaults to empty string.

The list is written to <state>/hyperhive-dashboard-links.json by a one-shot systemd unit at container boot. hive-c0re reads the file on each container-view snapshot and attaches the links to the agent card (kind = External) without any code change. Omitting the option (default empty) produces no extra links.

Custom static files

hyperhive.frontend.extraFiles = {
  "games/bitburner" = {
    source = ./bitburner-dist;   # path relative to agent.nix
    # target defaults to attribute name: "games/bitburner"
  };
  "my-page" = {
    source = ./my-page.html;
    target = "my-page.html";    # explicit override
  };
};

Layers additional files over the default per-agent web UI dist. Each attribute defines one overlay entry:

Constraints: target must start with an alphanumeric or _ and contain only alphanumerics, _, ., /, -. .. segments are rejected by a config assertion. The merge step refuses to overwrite files already present in the default dist — pick a target name that does not collide with existing paths (static/, index.html, etc.).
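The target constraints are enforced by a Nix config assertion; the Rust function below only restates the same rules so they can be tested in isolation. It is a sketch under two assumptions: "alphanumeric" means ASCII, and a ".." segment is one delimited by "/". The name is_valid_target is hypothetical.

```rust
/// Restates the documented target rules; not the actual Nix assertion.
pub fn is_valid_target(target: &str) -> bool {
    // Must start with an alphanumeric or underscore (also rejects "").
    let first_ok = match target.chars().next() {
        Some(c) => c.is_ascii_alphanumeric() || c == '_',
        None => false,
    };
    // Only alphanumerics, '_', '.', '/', '-' anywhere in the path.
    let charset_ok = target
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '/' | '-'));
    // No parent-directory segments.
    let no_dotdot = target.split('/').all(|segment| segment != "..");

    first_ok && charset_ok && no_dotdot
}
```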

The default dist ships at hyperhive.frontend.dist (the hyperhive-frontend package output, read-only). To replace the entire UI rather than layer on top, override frontend.dist directly.

Connectivity overrides

Two hyperhive.forge.* / hyperhive.matrix.* options override where the per-agent daemons connect. Both rarely need changing on a standard single-host deploy, but are useful for multi-hive or custom-network setups.

hyperhive.forge.url = "http://localhost:3000";   # default
hyperhive.matrix.url = "http://localhost:8008";  # default

hyperhive.forge.url — base URL of the Forgejo instance. Used by a one-shot boot unit (tea-login) that writes ~/.config/tea/config.yml directly from the agent's forge-token, so tea and hive-forge work without an interactive auth step. The unit is a no-op when forge-token is absent. Override when the agent should connect to a Forgejo on a different host or port (e.g. a swarm peer's forge). Validated: must be an http:// or https:// URL or the empty string.

hyperhive.matrix.url — homeserver URL used by hive-matrix-daemon when connecting via the matrix-sdk. Default points at the in-host tuwunel (localhost:8008), reachable over the host loopback in shared-netns mode. Override per-agent when an agent should talk to a different homeserver — for example a remote hive's tuwunel reached over a VPN, or an external Matrix server for a federation-only agent.
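The validation rule stated for hyperhive.forge.url (http://, https://, or empty) can be restated as a simple prefix check. This is a sketch only: the real check lives in the Nix module, is_valid_forge_url is a hypothetical name, and a prefix test is assumed to be all the validation does.

```rust
/// Accepts the empty string or anything with an http:// or https:// prefix.
pub fn is_valid_forge_url(url: &str) -> bool {
    url.is_empty() || url.starts_with("http://") || url.starts_with("https://")
}
```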

Claude Code plugins

The harness installs Claude Code plugins before the serve loop opens. Two per-agent agent.nix options control this:

hyperhive.claudeMarketplaces = [ "anthropics/claude-plugins-official" ];  # default
hyperhive.claudePlugins = [ "formatter@my-marketplace" ];                  # default: []
hyperhive.claudePluginsAutoUpdate = false;                                 # default

cargo.shortMessages

hyperhive.cargo.shortMessages = true;  # default

When enabled (the default), the harness injects a cargo shell function into /etc/hyperhive/bash-env.sh that transparently appends --message-format short to compile subcommands (build, check, clippy, test, run, doc, bench, install, rustc, fix). This suppresses the per-crate progress lines that flood the response window, leaving only warnings and errors.

The function handles +toolchain selectors (cargo +nightly build) and passes through cleanly when --message-format is already present. Non-compile subcommands (new, add, third-party cargo-*) are left untouched.
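The real wrapper is a shell function, but its argument handling can be illustrated in Rust. This sketch makes several assumptions: the flag is inserted directly after the subcommand (so it never lands after a "--" separator), an existing flag is only looked for before "--", and rewrite_cargo_args is a hypothetical name.

```rust
const COMPILE_SUBCOMMANDS: [&str; 10] = [
    "build", "check", "clippy", "test", "run", "doc", "bench", "install", "rustc", "fix",
];

/// Returns the argument list cargo would be invoked with.
pub fn rewrite_cargo_args(args: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = args.iter().map(|arg| (*arg).to_string()).collect();

    // A leading +toolchain selector shifts the subcommand by one position.
    let sub_idx = if args.first().map_or(false, |arg| arg.starts_with('+')) {
        1
    } else {
        0
    };
    let is_compile = args
        .get(sub_idx)
        .map_or(false, |sub| COMPILE_SUBCOMMANDS.contains(sub));

    // Respect an explicit --message-format given before any "--" separator.
    let already_set = args
        .iter()
        .take_while(|arg| **arg != "--")
        .any(|arg| *arg == "--message-format" || arg.starts_with("--message-format="));

    if is_compile && !already_set {
        out.insert(sub_idx + 1, "short".to_string());
        out.insert(sub_idx + 1, "--message-format".to_string());
    }
    out
}
```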

Set to false for agents that parse cargo's JSON output programmatically and do not pass --message-format json themselves.

MCP surface

The harness ships an embedded MCP server (rmcp 1.7). Claude launches it as a stdio child via --mcp-config. The server is registered under the name hyperhive, so the tools land in claude as mcp__hyperhive__<tool>.

Tool access is gated by tool groups (HIVE_TOOL_GROUPS). The default preset (AGENT_DEFAULT) includes messaging, meta, inbox, and execution. Privileged groups (lifecycle, approvals, scheduling, diagnostics) are opt-in via the P3RM1SS10NS tab.

Core tools (always available)

Messaging (messaging group): send(to, body, in_reply_to?), recv(wait_seconds?, max?), ask(question, options?, multi?, ttl_seconds?, to?), answer(id, answer).

Inbox (inbox group): get_loose_ends(agent?), cancel_loose_end(kind, id), remind(message, delay_seconds? | at_unix_timestamp?), request_next_turn().

Meta (meta group): set_status(text), get_agent_meta(name?).

Privileged tools (by tool group)

Waking the agent from inside the container

External MCP servers (and any other in-container process) can inject a wake-up event into the agent's inbox via the per-agent socket at /run/hive/mcp.sock. Two equivalent paths:

The wake event lands in the broker as {from:<label>, to:<agent>, body}, waking whatever recv call the harness is currently blocked on. The next turn fires with the wake prompt formed from that message.

Identity = socket: anything that can connect to /run/hive/mcp.sock is implicitly trusted to inject these — the bind-mount is the agent's own container only.

Authoritative state

hive_ag3nt::events::Bus carries the current turn-loop state in addition to the broadcast channel and the events history. Variants:

The harness flips state at the relevant transitions (set_state(Thinking) before drive_turn, set_state(Idle) after; set_state(Compacting) around compact_session). Exposed via /api/state.turn_state + turn_state_since (unix seconds); the agent page renders this rather than deriving from SSE events.
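A minimal sketch of the state-plus-timestamp pair follows. It covers only the three states named in the paragraph above (the real enum may have more), StateTracker is a hypothetical stand-in for the relevant part of Bus, and leaving the timestamp untouched when the state does not change is an assumption.

```rust
/// Only the states named in the text; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnState {
    Idle,
    Thinking,
    Compacting,
}

/// Hypothetical stand-in for the state-tracking part of Bus.
#[derive(Debug)]
pub struct StateTracker {
    state: TurnState,
    since_unix: u64,
}

impl StateTracker {
    pub fn new(now_unix: u64) -> Self {
        Self { state: TurnState::Idle, since_unix: now_unix }
    }

    /// Records a transition. Assumption: setting the same state again keeps
    /// the original timestamp.
    pub fn set_state(&mut self, next: TurnState, now_unix: u64) {
        if next != self.state {
            self.state = next;
            self.since_unix = now_unix;
        }
    }

    /// The pair a state endpoint would expose: (turn_state, turn_state_since).
    pub fn snapshot(&self) -> (TurnState, u64) {
        (self.state, self.since_unix)
    }
}
```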

Tool envelope

mcp::run_tool_envelope: every MCP tool handler logs the request, runs the body, logs the result. Pre-/post-log only — the inbox status hint moved to the wake prompt + UI header.
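The pre-/post-log shape can be sketched as a generic wrapper around the tool body. The signature is an assumption: the real run_tool_envelope likely logs through a logging framework and may be async, whereas this sketch pushes lines into a Vec so the behaviour is observable.

```rust
use std::fmt::Debug;

/// Logs the request, runs the body, logs the result, and returns the result.
/// `log` is an illustrative sink, not the harness's actual logger.
pub fn run_tool_envelope<Req: Debug, Res: Debug>(
    log: &mut Vec<String>,
    tool: &str,
    request: &Req,
    body: impl FnOnce() -> Res,
) -> Res {
    log.push(format!("{tool} request: {request:?}"));
    let result = body();
    log.push(format!("{tool} result: {result:?}"));
    result
}
```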

Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS)

Bash is disallowed — shell execution goes through mcp__bash__run (background tasks with structured output + task-id tracking) instead of an interactive shell. The run / status MCP tools (mcp__bash__run / mcp__bash__status) are always in the --allowedTools list.

WebFetch / WebSearch are off by default; to turn them on, enable the web_tools tool group in the P3RM1SS10NS tab and rebuild the agent.