Agent hierarchy & privileges

Design + audit doc for the agent-privileges + tree-shape milestone (the issue tree). The implementation lands in pieces; this doc tracks what's done, what's planned, and what currently special-cases the manager.

Current state (as of this PR)

Topology lives in the hive-c0re-owned meta repo, alongside flake.nix, at /var/lib/hyperhive/meta/topology.json:

{
  "manager": null,
  "alice":   "manager",
  "bob":     "alice"
}

null marks a root-level agent. Today only the manager is root by default; other agents land under "manager" on first sync. Re-parenting is operator-driven:

All three surfaces converge on topology::set_parent, which delegates the validation rules to a pure apply_set_parent helper. The helper refuses:

The manager is reparentable like any other agent. There is no "structurally root" carve-out: the manager's privileges live on its MCP socket, not in its tree position, and the cycle walk above catches the only real safety concern (moving the manager under one of its own descendants).

An idempotent no-op fast path skips the disk write when the agent already has the requested parent. After a successful write, the surfaces call Coordinator::rescan_containers_and_emit so connected dashboard viewers see the tree repaint without polling (ContainerView.parent is sourced from topology.json).
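A minimal sketch of what a pure apply_set_parent could look like, assuming the name-to-parent map shape shown for topology.json. The refusal list is not reproduced in this section, so the error variants here (unknown agent, unknown parent, cycle) are hypothetical; only the cycle walk and the no-op fast path come from the text. Returning Unchanged is what lets a caller skip the disk write and the rescan.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// agent name -> parent (None = root), as in topology.json.
type Topology = BTreeMap<String, Option<String>>;

/// Hypothetical refusal reasons; the real helper's list may differ.
#[derive(Debug, PartialEq)]
enum SetParentError {
    UnknownAgent(String),
    UnknownParent(String),
    /// The move would place the agent under itself or one of its descendants.
    Cycle,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Parent already matches: the caller can skip the disk write.
    Unchanged,
    /// Map updated: the caller writes topology.json and triggers a rescan.
    Changed,
}

/// Pure validation + mutation with no I/O, so it can be unit-tested directly.
fn apply_set_parent(
    topo: &mut Topology,
    agent: &str,
    new_parent: Option<&str>,
) -> Result<Outcome, SetParentError> {
    let current = topo
        .get(agent)
        .ok_or_else(|| SetParentError::UnknownAgent(agent.to_string()))?;
    if current.as_deref() == new_parent {
        return Ok(Outcome::Unchanged); // idempotent no-op fast path
    }
    if let Some(p) = new_parent {
        if !topo.contains_key(p) {
            return Err(SetParentError::UnknownParent(p.to_string()));
        }
        // Walk up from the proposed parent; reaching `agent` means a cycle.
        let mut seen = BTreeSet::new();
        let mut cursor = Some(p.to_string());
        while let Some(name) = cursor {
            if name == agent {
                return Err(SetParentError::Cycle);
            }
            if !seen.insert(name.clone()) {
                break; // pre-existing cycle elsewhere in the file; stop walking
            }
            cursor = topo.get(&name).cloned().flatten();
        }
    }
    topo.insert(agent.to_string(), new_parent.map(str::to_string));
    Ok(Outcome::Changed)
}
```

Keeping the helper free of I/O means the same checks run identically whichever surface triggered the move.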

Caveat: today the move is purely a JSON edit. Only the top-level manager (root) gets /var/lib/hyperhive/agents bind-mounted at /agents in its container, so sub-agents can't yet see their would-be children's state. Once sub-manager bind mounts land alongside cap enforcement, set_parent will grow a companion umount-old / mount-new / restart-cascade step.

Why meta, not per-agent agent.nix

An agent shouldn't be able to claim a parent without that parent's consent, and operator-driven re-parenting shouldn't require touching the moved agent's config. Topology IS a system-level concern; meta is where system-level facts live.

Flow

  1. Read: topology::read() parses topology.json into a BTreeMap<String, Option<String>>. Missing / unparsable file → empty map → every agent treated as root (safe degradation for fresh installs that haven't run meta::sync_agents yet).
  2. Reconcile: meta::sync_agents calls topology::reconcile alongside its flake.nix regeneration. New agents land at their default position (manager as parent, manager itself as root); removed agents drop. Existing entries are preserved as-is so operator overrides stick across regenerations.
  3. Inject: meta::render_flake looks up each agent's parent and passes it to mkAgent. When non-null, the mkAgent body sets HIVE_PARENT = parent in the agent's systemd service environment so the harness / claude prompts can see it.
  4. Surface: container_view::build_all reads topology.json and populates ContainerView.parent: Option<String> on every rescan. The dashboard renders the field as a tree.
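The reconcile and lookup steps can be sketched over the same BTreeMap<String, Option<String>>. This assumes reconcile receives the current agent names as a slice (the real signature isn't shown) and leaves out JSON parsing; parent_of is a hypothetical helper showing why an empty map degrades to "every agent is root". The sketch does not repair an entry whose parent was removed, since how the real reconcile handles that isn't stated here.

```rust
use std::collections::BTreeMap;

/// agent name -> parent (None = root), as in topology.json.
type Topology = BTreeMap<String, Option<String>>;

/// Sketch of reconcile (signature hypothetical): keep existing entries for
/// agents that still exist, add new agents at their default position, and
/// drop agents that are no longer in `agents`.
fn reconcile(existing: &Topology, agents: &[&str]) -> Topology {
    agents
        .iter()
        .map(|&name| {
            let parent = match existing.get(name) {
                // Operator overrides stick across regenerations.
                Some(p) => p.clone(),
                // Defaults for new agents: manager is root, others go under it.
                None if name == "manager" => None,
                None => Some("manager".to_string()),
            };
            (name.to_string(), parent)
        })
        .collect()
}

/// Hypothetical lookup helper: an agent missing from the map (for example
/// because topology.json was absent or unparsable and read() returned an
/// empty map) is treated as root.
fn parent_of<'a>(topo: &'a Topology, name: &str) -> Option<&'a str> {
    topo.get(name).and_then(|p| p.as_deref())
}
```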

Target topology semantics

Once enforcement lands, the rules collapse into:

operation | who can do it
--- | ---
kill / start / restart / update (any descendant) | any ancestor
request_init_config (spawn a new child) | any agent, child added under self
request_apply_commit (any descendant's config) | any ancestor
get_logs (any descendant) | any ancestor
moderate questions / reminders (cancel any open thread of a descendant) | any ancestor
send / recv routing | parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else
request_update_meta_inputs (bump meta lock) | root agents only (today: just the manager)

"Ancestor" walks ContainerView.parent chains; cycles are guarded by a visited-set at dispatch time (a malformed topology.json can't lock the dispatcher into a loop).

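A sketch of the ancestor walk with a visited-set guard, plus one possible reading of the send / recv routing row. can_route, the (from, to) allow-list shape, checking only from the sender's side, and treating two root agents as non-siblings are all assumptions, not settled semantics.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// agent name -> parent (None = root), as in topology.json.
type Topology = BTreeMap<String, Option<String>>;

fn parent<'a>(topo: &'a Topology, agent: &str) -> Option<&'a str> {
    topo.get(agent).and_then(|p| p.as_deref())
}

/// True if `ancestor` sits somewhere on `agent`'s parent chain (an agent is
/// not its own ancestor). The visited set ends the walk if a malformed
/// topology.json contains a cycle, instead of looping forever.
fn is_ancestor(topo: &Topology, ancestor: &str, agent: &str) -> bool {
    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut cursor = parent(topo, agent);
    while let Some(name) = cursor {
        if name == ancestor {
            return true;
        }
        if !seen.insert(name) {
            return false; // revisited a node: cycle, bail out
        }
        cursor = parent(topo, name);
    }
    false
}

/// Hypothetical routing check, evaluated from the sender's side: self,
/// parent, same-parent siblings and descendants pass; anything else needs
/// an explicit (from, to) entry in the allow-list.
fn can_route(topo: &Topology, allow: &BTreeSet<(String, String)>, from: &str, to: &str) -> bool {
    let siblings = matches!(
        (parent(topo, from), parent(topo, to)),
        (Some(a), Some(b)) if a == b
    );
    from == to
        || parent(topo, from) == Some(to)
        || siblings
        || is_ancestor(topo, from, to)
        || allow.contains(&(from.to_string(), to.to_string()))
}
```

The same is_ancestor check would back every "any ancestor" row in the table.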
Current manager special-casings — the audit

What currently makes the manager different from every other agent, and which axis (topology or per-agent cap) each special case maps onto after the milestone:

A — naming + bootstrap

B — wire-protocol privileges

The ManagerRequest::* variants in hive-sh4re/src/lib.rs are operations that the manager-flavour socket can issue and sub-agent sockets can't:

variant | semantic | post-milestone
--- | --- | ---
RequestInitConfig | seed an agent's proposed config repo | topology (descendants only)
RequestApplyCommit | submit a commit sha for operator approval | topology (descendants only)
Kill / Start / Restart / Update | container lifecycle on an existing agent | topology (descendants only)
RequestUpdateMetaInputs | bump meta flake.lock | per-agent cap (root-only today; a future "let coder bump its own input" might grant it)
GetLogs | journalctl scrape of a sub-agent | topology (descendants only)
Wake | inject a from: <X> message into self's inbox | not really privileged: the wire surface exists because daemon co-processes (e.g. forge_notify) need it. Sub-agents have the same via their own socket.
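One way to encode the post-milestone column is a function from request to the check it needs. The enum below is a hypothetical, trimmed-down stand-in for ManagerRequest (field names are guesses), and Gate and the "update_meta_inputs" cap name are invented for illustration. RequestInitConfig follows the target-semantics table: any caller may spawn, and the new agent lands under the caller.

```rust
/// Hypothetical, trimmed-down stand-in for ManagerRequest; the real enum in
/// hive-sh4re/src/lib.rs has its own fields.
enum ManagerRequest {
    RequestInitConfig { name: String },
    RequestApplyCommit { agent: String, sha: String },
    Kill { agent: String },
    Start { agent: String },
    Restart { agent: String },
    Update { agent: String },
    RequestUpdateMetaInputs,
    GetLogs { agent: String },
    Wake { from: String },
}

/// Which check a request needs once enforcement lands (names hypothetical).
#[derive(Debug, PartialEq)]
enum Gate<'a> {
    /// Caller must be an ancestor of the named agent.
    Descendant(&'a str),
    /// Any caller; the new agent is recorded as the caller's child.
    SpawnUnderSelf,
    /// Caller must hold a per-agent capability.
    Cap(&'static str),
    /// Only touches the caller's own inbox; no extra privilege.
    SelfOnly,
}

fn gate(req: &ManagerRequest) -> Gate<'_> {
    use ManagerRequest::*;
    match req {
        RequestInitConfig { .. } => Gate::SpawnUnderSelf,
        RequestApplyCommit { agent, .. }
        | Kill { agent }
        | Start { agent }
        | Restart { agent }
        | Update { agent }
        | GetLogs { agent } => Gate::Descendant(agent),
        RequestUpdateMetaInputs => Gate::Cap("update_meta_inputs"),
        Wake { .. } => Gate::SelfOnly,
    }
}
```

Because the match is exhaustive, adding a new variant forces a decision about which gate it falls under.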

C — storage / mounts (hive-c0re::lifecycle)

The manager container's nspawn bind set:

Tree-shape version:

D — drop legacy /state for manager ✓ done

lifecycle.rs no longer binds /state for the manager. HYPERHIVE_STATE_DIR is now injected uniformly via systemd.globalEnvironment in meta.rs for every container (manager included), so all token/state paths resolve through $HYPERHIVE_STATE_DIR. The harness-base shell scripts (tea-login, forge-avatar-sync, matrix-avatar-sync) were simplified from glob + for loops to a direct read of $HYPERHIVE_STATE_DIR/<token>.

E — prompt + tools

F — drive-by checks across c0re

G — sub-agents inside the same container

Future work: when enabled for an agent, the agent could spawn temporary "sub-agents" that run inside its own container, lighter-weight than a full nspawn agent. Open questions, not yet wired:

Harness systemd unit shape

One harness binary (hive), one harness-base.nix template, one service unit (systemd.services.hive-ag3nt) for all agents. There is no longer a separate manager service name or role distinction in the harness — privilege differences live server-side in the broker socket (which tool groups and manager-surface calls each agent receives).

agent-base.nix and manager.nix both import harness-base.nix. manager.nix additionally sets forge defaults to suppress the subscription/participation firehose so ruth's inbox stays focused on direct mentions, reviews, and assignments.

Environment variables set on the unit

PATH setup (the wrapper-dir trick)

path = [ "/run/wrappers" "/run/current-system/sw" ];

/run/wrappers comes first so setuid wrappers (notably sudo) resolve before bare nix-store binaries. NixOS's systemd.services.<unit>.path appends /bin to every entry via lib.makeBinPath, so passing /run/wrappers/bin directly produces /run/wrappers/bin/bin, which doesn't exist (see docs/gotchas.md, "systemd.services.*.path appends /bin to every entry"). This matters because the harness runs as the per-agent user: without the wrapper dir on PATH, sudo resolves to the non-setuid nix-store binary, which refuses to run with "must be owned by uid 0 and have the setuid bit set" regardless of hyperhive.user.passwordlessSudo.
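A small model of the /bin/bin pitfall, assuming plain string entries: make_bin_path mimics only the suffix-and-join behaviour of lib.makeBinPath (derivation outputs and the rest of NixOS's PATH assembly are ignored), and resolve is a hypothetical first-match PATH lookup with the filesystem replaced by a predicate.

```rust
/// Models only the part of lib.makeBinPath that matters here, for plain
/// string entries: append "/bin" to each entry and join with ':'.
fn make_bin_path(entries: &[&str]) -> String {
    entries
        .iter()
        .map(|e| format!("{e}/bin"))
        .collect::<Vec<_>>()
        .join(":")
}

/// Hypothetical first-match PATH lookup, like a shell resolving a bare
/// command name. `exists` stands in for a filesystem check.
fn resolve(path: &str, bin: &str, exists: impl Fn(&str) -> bool) -> Option<String> {
    path.split(':')
        .map(|dir| format!("{dir}/{bin}"))
        .find(|candidate| exists(candidate.as_str()))
}
```

With "/run/wrappers" the lookup finds the setuid wrapper first; with "/run/wrappers/bin" the first entry becomes /run/wrappers/bin/bin, matches nothing, and the lookup falls through to the nix-store sudo.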

serviceConfig highlights

Cross-references