Agent hierarchy & privileges
Design + audit doc for the agent-privileges + tree-shape milestone (the issue tree). The implementation lands in pieces; this doc tracks what's done, what's planned, and what currently special-cases the manager.
Current state (as of this PR)
Topology lives in the hive-c0re-owned meta repo, alongside
flake.nix, at /var/lib/hyperhive/meta/topology.json:
{
  "manager": null,
  "alice": "manager",
  "bob": "alice"
}
null = root-level agent. Today only the manager qualifies by default.
Other agents land under "manager" on first sync. Re-parenting is
operator-driven:
- CLI: hive-c0re set-parent <child> --parent <new> (or --root to promote). Exactly one of --parent / --root is required.
- Dashboard: POST /api/topology/set-parent (form fields child and optional new_parent; absent / empty ⇒ promote to root).
- Wire: HostRequest::SetParent { child, new_parent: Option<String> }.
All three converge on topology::set_parent, which delegates the
validation rules to a pure apply_set_parent helper. Refuses:
- unknown child / new_parent (typo guard),
- self-parenting,
- cycles (32-hop ancestor walk, mirroring is_descendant_of).
The manager is reparentable like any other agent — there's no "structurally root" carve-out; the manager's privileges live on its MCP socket, not its tree position, and the cycle walk above catches the only real safety concern (moving the manager under one of its own descendants).
An idempotent no-op fast path skips the disk write when the parent is
already what's requested. After a successful write, all three surfaces call
Coordinator::rescan_containers_and_emit so connected dashboard
viewers see the tree repaint without polling
(ContainerView.parent is sourced from topology.json).
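The validation rules and the no-op fast path can be sketched as a pure function over the parsed map. This is a minimal illustration, not the actual apply_set_parent: the SetParentError and Outcome types, the MAX_HOPS constant, and the choice to refuse when the hop budget runs out are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Agent name -> parent; None marks a root-level agent.
type Topology = BTreeMap<String, Option<String>>;

/// Hop budget for the ancestor walk (the 32-hop limit described above).
const MAX_HOPS: usize = 32;

/// Hypothetical error type; the real variant names are not given in this doc.
#[derive(Debug, PartialEq)]
enum SetParentError {
    UnknownAgent(String),
    SelfParent,
    Cycle,
}

/// Hypothetical outcome: tells the caller whether topology.json needs a write.
#[derive(Debug, PartialEq)]
enum Outcome {
    Unchanged,
    Changed,
}

fn apply_set_parent(
    topo: &mut Topology,
    child: &str,
    new_parent: Option<&str>,
) -> Result<Outcome, SetParentError> {
    // Typo guard: both names must already exist in the topology.
    if !topo.contains_key(child) {
        return Err(SetParentError::UnknownAgent(child.to_string()));
    }
    if let Some(p) = new_parent {
        if !topo.contains_key(p) {
            return Err(SetParentError::UnknownAgent(p.to_string()));
        }
        if p == child {
            return Err(SetParentError::SelfParent);
        }
        // Cycle guard: walk up from the proposed parent. Reaching `child`
        // means the proposed parent is one of child's descendants.
        let mut cursor = Some(p.to_string());
        let mut hops = 0;
        while let Some(name) = cursor {
            if name == child {
                return Err(SetParentError::Cycle);
            }
            if hops == MAX_HOPS {
                // Assumption: an exhausted hop budget is refused like a cycle.
                return Err(SetParentError::Cycle);
            }
            hops += 1;
            cursor = topo.get(&name).cloned().flatten();
        }
    }
    // Idempotent fast path: nothing to write if the parent already matches.
    if topo.get(child).map(|p| p.as_deref()) == Some(new_parent) {
        return Ok(Outcome::Unchanged);
    }
    topo.insert(child.to_string(), new_parent.map(str::to_string));
    Ok(Outcome::Changed)
}
```

With the example topology above, moving manager under alice is rejected by the cycle walk, while re-requesting bob's existing parent returns Unchanged without touching the map.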
Today's caveat: the move is purely a JSON edit. Only the
top-level manager (root) gets /var/lib/hyperhive/agents
bind-mounted at /agents in its container, so sub-agents don't yet
see their would-be children's state. Once sub-manager bind mounts
land alongside cap enforcement, set_parent grows a companion
umount-old / mount-new / restart-cascade step.
Why meta, not per-agent agent.nix
An agent shouldn't be able to claim a parent without that parent's consent, and operator-driven re-parenting shouldn't require touching the moved agent's config. Topology IS a system-level concern; meta is where system-level facts live.
Flow
- Read: topology::read() parses topology.json into a BTreeMap<String, Option<String>>. Missing / unparsable file → empty map → every agent treated as root (safe degradation for fresh installs that haven't run meta::sync_agents yet).
- Reconcile: meta::sync_agents calls topology::reconcile alongside its flake.nix regeneration. New agents land at their default position (manager as parent, manager itself as root); removed agents drop. Existing entries are preserved as-is so operator overrides stick across regenerations.
- Inject: meta::render_flake looks up each agent's parent and passes it to mkAgent. When non-null, the mkAgent body sets HIVE_PARENT = parent in the agent's systemd service environment so the harness / claude prompts can see it.
- Surface: container_view::build_all reads topology.json and populates ContainerView.parent: Option<String> on every rescan. The dashboard renders the field as a tree.
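The reconcile step is the one with real merge logic, so a rough sketch may help. It assumes the default parent is the literal name "manager" (as in the JSON example), takes the live agent list as a plain slice, and leaves entries whose parent was removed untouched because this doc does not say how orphans are handled. The signature is illustrative, not the actual topology::reconcile.

```rust
use std::collections::BTreeMap;

/// Agent name -> parent; None marks a root-level agent.
type Topology = BTreeMap<String, Option<String>>;

/// Assumed default parent name, taken from the JSON example above.
const DEFAULT_PARENT: &str = "manager";

/// Merge the stored topology with the set of agents that currently exist.
fn reconcile(mut topo: Topology, live_agents: &[&str]) -> Topology {
    // Removed agents drop out of the map.
    topo.retain(|name, _| live_agents.contains(&name.as_str()));
    // New agents land at their default position; existing entries (including
    // operator overrides) are left exactly as stored.
    for name in live_agents {
        topo.entry(name.to_string()).or_insert_with(|| {
            if *name == DEFAULT_PARENT {
                None // the manager itself defaults to root
            } else {
                Some(DEFAULT_PARENT.to_string())
            }
        });
    }
    topo
}

/// Lookup used by readers: an agent missing from the map is treated as root,
/// which is the safe degradation for an empty or unparsable file.
fn parent_of(topo: &Topology, agent: &str) -> Option<String> {
    topo.get(agent).cloned().flatten()
}
```

Calling reconcile with an empty map reproduces the fresh-install layout: the manager at root and every other agent directly beneath it.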
Target topology semantics
Once enforcement lands the rules collapse into:
| operation | who can do it |
|---|---|
| kill / start / restart / update (any descendant) | any ancestor |
| request_init_config (spawn a new child) | any agent, child added under self |
| request_apply_commit (any descendant's config) | any ancestor |
| get_logs (any descendant) | any ancestor |
| moderate questions / reminders (cancel any open thread of a descendant) | any ancestor |
| send / recv routing | parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else |
| request_update_meta_inputs (bump meta lock) | root agents only (today: just manager) |
"Ancestor" walks ContainerView.parent chains; cycles are guarded by a
visited-set at dispatch time (a malformed topology.json can't lock the
dispatcher into a loop).
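A minimal sketch of such an ancestor check, assuming the parent chain is available as the same name-to-parent map used for topology.json. The function name is_ancestor and the map-based signature are illustrative; the real dispatcher walks ContainerView.parent.

```rust
use std::collections::{BTreeMap, HashSet};

/// Agent name -> parent; None marks a root-level agent.
type Topology = BTreeMap<String, Option<String>>;

/// True when `ancestor` sits strictly above `agent` in the parent chain.
fn is_ancestor(topo: &Topology, ancestor: &str, agent: &str) -> bool {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut cursor = agent;
    // Follow parent links until a root or an unknown name ends the chain.
    while let Some(Some(parent)) = topo.get(cursor) {
        if parent.as_str() == ancestor {
            return true;
        }
        // Seeing the same parent twice means the file is malformed (a cycle);
        // give up instead of spinning.
        if !visited.insert(parent.as_str()) {
            return false;
        }
        cursor = parent.as_str();
    }
    false
}
```

An "any ancestor" row in the table then reduces to is_ancestor(topo, caller, target), and an agent is never its own ancestor.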
Current manager special-casings — the audit
What currently makes the manager different from every other agent, and the axis (topology position or per-agent cap) along which the post-milestone version decides each special case:
A — naming + bootstrap
- MANAGER_AGENT = "ruth" (broker recipient name), MANAGER_NAME = "ruth" (logical name, state-dir key), and MANAGER_CONTAINER = "h-ruth" (nixos-container name); the h- prefix lets lifecycle::list() use a single starts_with("h-") filter.
- auto_update::ensure_manager runs at hive-c0re boot and spawns h-ruth if missing.
- Topology: ruth defaults to root-level (no parent); hive-c0re handles the bootstrap lifecycle directly.
B — wire-protocol privileges
The ManagerRequest::* variants in hive-sh4re/src/lib.rs are
operations the manager flavour socket can make that sub-agent sockets
can't:
| variant | semantic | post-milestone |
|---|---|---|
| RequestInitConfig | seed an agent's proposed config repo | topology: descendants only |
| RequestApplyCommit | submit a commit sha for operator approval | topology: descendants only |
| Kill / Start / Restart / Update | container lifecycle on an existing agent | topology: descendants only |
| RequestUpdateMetaInputs | bump meta flake.lock | per-agent cap (root-only today; a future "let coder bump its own input" might grant it) |
| GetLogs | journalctl scrape of a sub-agent | topology: descendants only |
| Wake | inject a from: <X> message into self's inbox | not really privileged; the wire surface exists because daemon co-processes (e.g. forge_notify) need it. Sub-agents have the same via their own socket. |
C — storage / mounts (hive-c0re::lifecycle)
The manager container's nspawn bind set:
- HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents RW, so the manager can edit any agent's proposed config repo
- HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied RO, so the manager can diff against what's deployed
- HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta RO, so the manager can read the system-wide deploy log
Tree-shape version:
- Each agent gets RW to /agents/<descendant>/ for every descendant in its subtree. The root agent (today: manager) gets RW to the full forest as a special case of "the root has every other agent as a descendant".
- RO /meta access if the agent holds a "meta read" cap.
- request_update_meta_inputs is the only path that actually writes flake.lock, gated by the cap; everyone else stays RO.
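To make the per-subtree rule concrete, here is a sketch that derives an agent's RW bind list from the topology map. It is an illustration only: the helper names descendants and rw_binds are hypothetical, and the host-side layout (one directory per agent under HOST_AGENTS_ROOT) is an assumption extrapolated from the manager's current bind set.

```rust
use std::collections::BTreeMap;

/// Agent name -> parent; None marks a root-level agent.
type Topology = BTreeMap<String, Option<String>>;

/// Host directory holding per-agent state (HOST_AGENTS_ROOT above).
const HOST_AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";

/// Every agent strictly below `agent`, returned in sorted order.
fn descendants(topo: &Topology, agent: &str) -> Vec<String> {
    let mut found: Vec<String> = Vec::new();
    let mut frontier = vec![agent.to_string()];
    while let Some(current) = frontier.pop() {
        for (name, parent) in topo {
            let is_child = parent.as_deref() == Some(current.as_str());
            // The `found` check keeps a malformed, cyclic file from looping.
            if is_child && name.as_str() != agent && !found.contains(name) {
                found.push(name.clone());
                frontier.push(name.clone());
            }
        }
    }
    found.sort();
    found
}

/// Hypothetical (host path, container path) pairs for an agent's RW binds.
fn rw_binds(topo: &Topology, agent: &str) -> Vec<(String, String)> {
    descendants(topo, agent)
        .into_iter()
        .map(|d| {
            (
                format!("{}/{}", HOST_AGENTS_ROOT, d),
                format!("/agents/{}", d),
            )
        })
        .collect()
}
```

For the example topology, alice would get a single bind for bob, and the manager's list covers both alice and bob, which matches the "root sees the full forest" special case as long as every other agent sits under it.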
D — drop legacy /state for manager ✓ done
lifecycle.rs no longer binds /state for the manager.
HYPERHIVE_STATE_DIR is now injected uniformly via
systemd.globalEnvironment in meta.rs for every container
(manager included), so all token/state paths resolve through
$HYPERHIVE_STATE_DIR. The harness-base shell scripts
(tea-login, forge-avatar-sync, matrix-avatar-sync) simplified
from glob+for loops to a direct $HYPERHIVE_STATE_DIR/<token>
read.
E — prompt + tools
- prompts/system.md with <!-- role:agent --> / <!-- role:manager --> marker blocks, assembled by hive_ag3nt::prompt::render based on flavor. Per-agent cap list of what the agent can do: already a single parametrised prompt; once per-agent cap groups land, the marker grammar grows cap:<group> blocks the renderer reads from the per-agent ToolGroup set.
- mcp.rs::Flavor::{Agent, Manager} controls which MCP tools claude sees. Already structured this way internally; the per-flavour allow-list becomes a per-cap-set lookup.
F — drive-by checks across c0re
- loose_ends.rs: manager sees hive-wide loose-ends, sub-agents only their own. Topology: every agent sees its own + its descendants'.
- operator_questions.rs + broker.rs: "manager can cancel any question" override on the owner check. Topology: agents can moderate threads of their descendants.
- reminder_scheduler.rs: same override pattern for reminder cancel. Topology: descendants only.
- actions.rs: destroy refuses to act on MANAGER_NAME (no foot-shooting). Topology: agents can destroy descendants but never themselves or ancestors.
- crash_watch.rs: skips ContainerCrash for the manager (it auto-restarts via systemd). Topology: the root container has different recovery semantics; every other agent falls into the same watch loop.
G — sub-agents inside the same container
Future work: when enabled for an agent, it can spawn temporary "sub-agents" that run inside its own container. Lighter than a full nspawn agent. Open questions, not yet wired:
- Inherit caps from parent, or take an explicit narrower set?
- Survive container restart, or always ephemeral?
- Inbox: separate from parent, or shared?
- Filesystem: share parent's /state RW, or a sub-dir?
- Identity: distinct broker recipient name, or address the parent?
Harness systemd unit shape
One harness binary (hive), one harness-base.nix template, one
service unit (systemd.services.hive-ag3nt) for all agents. There
is no longer a separate manager service name or role distinction in
the harness — privilege differences live server-side in the broker
socket (which tool groups and manager-surface calls each agent
receives).
agent-base.nix and manager.nix both import harness-base.nix.
manager.nix additionally sets forge defaults to suppress the
subscription/participation firehose so ruth's inbox stays focused
on direct mentions, reviews, and assignments.
Environment variables set on the unit
- HOME = /home/<userName>: systemd defaults HOME to / for services without User= set; with the per-agent user the harness needs the right home so claude finds its bind-mounted ~/.claude/ session dir.
- HIVE_STATIC_DIR = <mergedDist>: tower_http::ServeDir root for the per-agent web UI; merged dist = agent default + every hyperhive.frontend.extraFiles overlay.
- HIVE_ASSETS_DIR = pkgs.hyperhive-assets/share/hyperhive: set directly on the unit, not via environment.variables, because the latter only populates /etc/profile, which systemd services don't inherit.
PATH setup (the wrapper-dir trick)
path = [ "/run/wrappers" "/run/current-system/sw" ];
/run/wrappers comes first so setuid wrappers (notably sudo)
resolve before bare nix-store binaries. NixOS's
systemd.services.<unit>.path appends /bin to every entry via
lib.makeBinPath; passing /run/wrappers/bin directly produces
/run/wrappers/bin/bin, which doesn't exist (see docs/gotchas.md, "systemd.services.*.path appends /bin to every entry"). With the
harness running as the per-agent user this matters: without the
wrapper dir on PATH, sudo resolves to the un-setuid nix-store
binary and fails with "must be owned by uid 0 and have the setuid bit set", regardless of hyperhive.user.passwordlessSudo.
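The suffixing behaviour is easy to model outside Nix. The snippet below is a simplified Rust stand-in for what lib.makeBinPath does to the list (append /bin to each entry, join with colons); it is not the Nix implementation, and the function name make_bin_path is only a label for this example.

```rust
/// Simplified model of the suffixing step: append "/bin" to every entry and
/// join the results with ':' to form a PATH string.
fn make_bin_path(entries: &[&str]) -> String {
    entries
        .iter()
        .map(|entry| format!("{}/bin", entry))
        .collect::<Vec<_>>()
        .join(":")
}
```

Feeding it the two entries from the unit yields /run/wrappers/bin:/run/current-system/sw/bin, while passing /run/wrappers/bin directly yields the non-existent /run/wrappers/bin/bin described above.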
serviceConfig highlights
- ExecStart = pkgs.hyperhive/bin/hive serve (single binary).
- Restart = on-failure, RestartSec = 2: keeps the harness resilient across transient crashes without thundering retries.
- RuntimeDirectory = "hive-config" → /run/hive-config/ owned by User=, auto-cleared on stop. The harness writes regenerated claude-{mcp-config,settings,system-prompt} files there (paths::config_dir). Deliberately separate from /run/hive, which the host bind-mounts in root-owned and which holds hive-c0re's mcp.sock.
- User = Group = userName: drops root inside the container; sudo is the explicit escalation surface (hyperhive.user.passwordlessSudo).
Cross-references
- Milestone: "Agent privileges and sub-agents"
- Dashboard render: "show agent topology in container list"
- Audit table source: milestone comment
- Operator/agent trust boundary (orthogonal axis): boundary.md