Agent hierarchy & privileges

Design + audit doc for the agent-privileges + tree-shape milestone (the issue tree). The implementation lands in pieces; this doc tracks what's done, what's planned, and what currently special-cases the manager.

Current state (as of this PR)

Topology lives in the hive-c0re-owned meta repo, alongside flake.nix, at /var/lib/hyperhive/meta/topology.json:

{
  "ruth": null,
  "alice": null,
  "bob": "alice"
}

null = root-level agent. New agents default to root (null parent) — there is no structural manager that everything hangs under. Hierarchy is built explicitly: an agent that requests a sub-agent gets a requester-as-parent edge written at its init_config approval (so bob above was spawned by alice), and the operator can reparent any agent. The bootstrap container (ruth) is just another root. Re-parenting is operator-driven:

CLI: hivectl agents set-parent <child> --parent <new> (or --root to promote). Exactly one of --parent / --root is required.
Dashboard: POST /api/topology/set-parent (form fields child, optional new_parent — absent / empty ⇒ promote to root).
Wire: HostRequest::SetParent { child, new_parent: Option<String> }.

All three converge on topology::set_parent, which delegates the validation rules to a pure apply_set_parent helper. Refuses:

unknown child / new_parent (typo guard),
self-parenting,
cycles (32-hop ancestor walk, mirroring is_descendant_of).

The manager is reparentable like any other agent — there's no "structurally root" carve-out; the manager's privileges live on its MCP socket, not its tree position, and the cycle walk above catches the only real safety concern (moving the manager under one of its own descendants).

Idempotent no-op fast path skips the disk write when the parent is already what's requested. After a successful write the surfaces call Coordinator::rescan_containers_and_emit so connected dashboard viewers see the tree repaint without polling (ContainerView.parent is sourced from topology.json).

Today's caveat: the move is purely a JSON edit. Only the top-level manager (root) gets /var/lib/hyperhive/agents bind-mounted at /agents in its container, so sub-agents don't yet see their would-be children's state. Once sub-manager bind mounts land alongside cap enforcement, set_parent grows a companion umount-old / mount-new / restart-cascade step.

Why meta, not per-agent `agent.nix`

An agent shouldn't be able to claim a parent without that parent's consent, and operator-driven re-parenting shouldn't require touching the moved agent's config. Topology IS a system-level concern; meta is where system-level facts live.

Flow

Read: topology::read() parses topology.json into a BTreeMap<String, Option<String>>. Missing / unparsable file → empty map → every agent treated as root (safe degradation for fresh installs that haven't run meta::sync_agents yet).
Reconcile: meta::sync_agents calls topology::reconcile alongside its flake.nix regeneration. New agents default to root (null parent) — an agent-requested sub-agent already carries an explicit requester-as-parent edge from its init_config approval, so only user/operator-initiated spawns hit this default, and those are roots; removed agents drop. Existing entries are preserved as-is so operator overrides stick across regenerations. Pending-init agents (an operator-approved proposed config repo but no container yet — Coordinator::pending_init_names) are kept too, so the child -> parent edge written when request_init_config is approved survives the gap until the first apply-commit spawns the container.
Inject: meta::render_flake looks up each agent's parent and passes it to mkAgent. When non-null, the mkAgent body sets HIVE_PARENT = parent in the agent's systemd service environment so the harness / claude prompts can see it.
Surface: container_view::build_all reads topology.json and populates ContainerView.parent: Option<String> on every rescan. The dashboard renders the field as a tree.

Target topology semantics

Once enforcement lands the rules collapse into:

operation	who can do it
`kill` / `start` / `restart` / `update` (any descendant)	any ancestor
`request_init_config` (spawn a new child)	any agent, child added under self
config change via forge PR (any descendant's config)	any ancestor
`get_logs` (any descendant)	any ancestor
moderate questions / reminders (cancel any open thread of a descendant)	any ancestor
`send` / `recv` routing	parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else
`request_update_meta_inputs` (bump meta lock)	root agents only (today: just `manager`)

"Ancestor" walks ContainerView.parent chains; cycles are guarded by a visited-set at dispatch time (a malformed topology.json can't lock the dispatcher into a loop).

Current manager special-casings — the audit

What currently makes the manager different from every other agent, and which axis the post-milestone version reads each special-case along:

A — naming + bootstrap

MANAGER_AGENT = "ruth" (broker recipient name), MANAGER_NAME = "ruth" (logical name, state-dir key), and MANAGER_CONTAINER = "h-ruth" (nixos-container name); the h- prefix lets lifecycle::list() use a single starts_with("h-") filter.
auto_update::ensure_manager runs at hive-c0re boot and spawns h-ruth if missing. Topology: ruth defaults to root-level (no parent); hive-c0re handles the bootstrap lifecycle directly.

B — wire-protocol privileges

The ManagerRequest::* variants in hive-sh4re/src/lib.rs are operations the manager flavour socket can make that sub-agent sockets can't:

variant	semantic	post-milestone
`RequestInitConfig`	seed an agent's proposed config repo	topology — existing direct child (re-init) or a brand-new name (child added under self on approval); a name owned by a different parent is refused
`Kill` / `Start` / `Restart` / `Update`	container lifecycle on an existing agent	topology — descendants only
`RequestUpdateMetaInputs`	bump meta `flake.lock`	per-agent cap (root-only today; a future "let coder bump its own input" might grant it)
`GetLogs`	journalctl scrape of a sub-agent	topology — descendants only
`Wake`	inject a `from: <X>` message into self's inbox	not really privileged — the wire surface exists because daemon co-processes (e.g. `forge_notify`) need it. Sub-agents have the same via their own socket.

C — storage / mounts (`hive-c0re::lifecycle`)

The manager container's nspawn bind set:

HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents RW — so the manager can edit any agent's proposed config repo
HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied RO — so the manager can diff against what's deployed
HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta RO — so the manager can read the system-wide deploy log

Tree-shape version:

Each agent gets RW to /agents/<descendant>/ for every descendant in its subtree. The root agent (today: manager) gets RW to the full forest as a special case of "the root has every other agent as a descendant".
RO /meta access if the agent holds a "meta read" cap.
request_update_meta_inputs is the only path that actually writes flake.lock, gated by the cap; everyone else stays RO.

D — drop legacy `/state` for manager ✓ done

lifecycle.rs no longer binds /state for the manager. HYPERHIVE_STATE_DIR is now injected uniformly via systemd.globalEnvironment in meta.rs for every container (manager included), so all token/state paths resolve through $HYPERHIVE_STATE_DIR. The agent-module shell scripts (tea-login, forge-avatar-sync) simplified from glob+for loops to a direct $HYPERHIVE_STATE_DIR/<token> read.

E — prompt + tools

prompts/system.md with  /  marker blocks, assembled by hive_ag3nt::prompt::render based on flavor. Per-agent cap list of what the agent can do — already a single parametrised prompt; once per-agent cap groups land the marker grammar grows cap:<group> blocks the renderer reads from the per-agent ToolGroup set.
mcp.rs::Flavor::{Agent, Manager} controls which MCP tools claude sees. Already structured this way internally — the per-flavour allow-list becomes a per-cap-set lookup.

F — drive-by checks across c0re

loose_ends.rs: manager sees hive-wide loose-ends, sub-agents only their own. Topology — every agent sees its own + its descendants'.
operator_questions.rs + broker.rs: "manager can cancel any question" override on the owner check. Topology — agents can moderate threads of their descendants.
reminder_scheduler.rs: same override pattern for reminder cancel. Topology — descendants only.
actions.rs: destroy refuses to act on MANAGER_NAME (no foot-shooting). Topology — agents can destroy descendants but never themselves or ancestors.
crash_watch.rs: skips ContainerCrash for the manager (it auto-restarts via systemd). Topology — the root container has different recovery semantics, every other agent falls into the same watch loop.

G — sub-agents inside the same container

Future work: when enabled for an agent, it can spawn temporary "sub-agents" that run inside its own container. Lighter than a full nspawn agent. Open questions, not yet wired:

Inherit caps from parent, or take an explicit narrower set?
Survive container restart, or always ephemeral?
Inbox: separate from parent, or shared?
Filesystem: share parent's /state RW, or a sub-dir?
Identity: distinct broker recipient name, or address the parent?

Harness systemd unit shape

One harness serve binary (hive-agent, with its hive-agent-mcp sibling), one shared nix/agent-modules/ tree, one service unit (systemd.services.hive-agent) for all agents. There is no longer a separate manager service name or role distinction in the harness — privilege differences live server-side in the broker socket (which tool groups and manager-surface calls each agent receives).

agent.nix and ruth.nix both import the shared nix/agent-modules/. ruth.nix additionally sets forge defaults to suppress the subscription/participation firehose so ruth's inbox stays focused on direct mentions, reviews, and assignments.

Environment variables set on the unit

HOME = /home/<userName> — systemd defaults HOME to / for services without User= set; with the per-agent user the harness needs the right home so claude finds its bind-mounted ~/.claude/ session dir.
HIVE_STATIC_DIR = <mergedDist> — tower_http::ServeDir root for the per-agent web UI; merged dist = agent default + every hyperhive.frontend.extraFiles overlay.
HIVE_ASSETS_DIR = pkgs.hyperhive-assets/share/hyperhive — set directly on the unit, not via environment.variables, because the latter only populates /etc/profile which systemd services don't inherit.

`PATH` setup (the wrapper-dir trick)

path = [ "/run/wrappers" "/run/current-system/sw" ];

/run/wrappers comes first so setuid wrappers (notably sudo) resolve before bare nix-store binaries. NixOS's systemd.services.<unit>.path appends /bin to every entry via lib.makeBinPath; passing /run/wrappers/bin directly produces /run/wrappers/bin/bin which doesn't exist (docs/gotchas.md:: systemd.services.*.path appends /bin to every entry). With the harness running as the per-agent user this matters: without the wrapper dir on PATH, sudo resolves to the un-setuid nix-store binary and rejects with must be owned by uid 0 and have the setuid bit set regardless of hyperhive.user.passwordlessSudo.

`serviceConfig` highlights

ExecStart = pkgs.hyperhive/bin/hive-agent — same binary for every agent.
Restart = on-failure, RestartSec = 2 — keeps the harness resilient across transient crashes without thundering retries.
RuntimeDirectory = "hive-config" → /run/hive-config/ owned by User=, auto-cleared on stop. The harness writes regenerated claude-{mcp-config,settings,system-prompt} files there (paths::config_dir). Deliberately separate from /run/hive, which the host bind-mounts in root-owned and which holds hive-c0re's mcp.sock.
User = Group = userName — drops root inside the container; sudo is the explicit escalation surface (hyperhive.user.passwordlessSudo).

Cross-references

Milestone: "Agent privileges and sub-agents"
Dashboard render: "show agent topology in container list"
Audit table source: milestone comment
Operator/agent trust boundary (orthogonal axis): boundary.md