The operator/agent boundary

Design rationale for hyperhive's two-principal trust model. The implementation work — container network isolation, the unifying gateway, core-daemon privsep — is tracked as area:ops issues on the forge.

The operator/agent boundary is now technically enforced, not just a convention. Containers run in private netns (network isolation is always on), the gateway proxies all operator-facing traffic, and hive-c0re runs as the unprivileged hive-core user. All three area:ops pillars — network isolation, the gateway, and privsep — are complete and active.

Two principals, two paths

Operator — reaches every UI (the dashboard + every per-agent page) through the gateway, on one origin. Operator-authority actions (approve / deny, answer-as-operator, lifecycle POSTs) are served by the core daemon and only reachable via the gateway.
Agent — speaks only for itself, only over its per-agent unix socket. The socket's identity is the agent (see docs/conventions.md, "identity = socket"). An agent must not be able to reach the core daemon's HTTP surface, another agent's socket, or another agent's web UI.

Design rule

Operator-authority actions never get a per-agent-socket entry point. They live on the core backend.

Worked example — answering an operator-targeted question is a POST /answer-question/{id} on the core dashboard, never an AgentRequest variant. If it were a per-agent-socket request, an agent could curl its own socket and spoof an operator answer. The per-agent web UI POSTs cross-origin to the core for these (see the inline-answer feature — the loose-ends section on each agent page).
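One way to picture the rule is as a property of the request type itself. The following is a minimal Python sketch, not hyperhive's actual code: the `AgentRequest` variants and the `handle_agent_request` dispatcher are hypothetical, and only the name `AgentRequest` and the operator actions come from this document.

```python
from enum import Enum


class AgentRequest(Enum):
    """Hypothetical per-agent-socket request kinds (illustrative names only)."""
    SEND_MESSAGE = "send_message"
    ASK_OPERATOR = "ask_operator"


# Operator-authority actions named in this document. They are deliberately
# absent from AgentRequest, so no socket message can express them.
OPERATOR_ACTIONS = {"approve", "deny", "answer_question", "lifecycle"}


def handle_agent_request(agent: str, kind: str) -> str:
    """Dispatch a request that arrived on `agent`'s socket.

    The caller's identity is the socket the request arrived on, never a
    field in the request body, so `agent` is supplied by the listener.
    """
    try:
        request = AgentRequest(kind)
    except ValueError:
        # Unknown kinds, including every operator action, are refused.
        raise PermissionError(f"{kind!r} is not an agent request") from None
    return f"{agent}:{request.value}"
```

Because the operator actions have no variant at all, an agent that writes to its own socket has nothing to spoof; the only code path that can answer as the operator sits behind the core's HTTP surface.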

Why network isolation is the load-bearing step

Without network isolation, containers share the host network namespace and can reach localhost:<core-port>, the dashboard, and every other agent's web port — the operator/agent split is on the honour system and every boundary claim above is aspirational. Network isolation is what makes the boundary real; the gateway and privsep are ergonomics and defence-in-depth layered on top.

Network isolation is now complete and always on: every agent container runs in a private netns behind the hive bridge. The shared-netns mode was removed. See docs/network.md.

Concretely, the core daemon's dashboard /api carries no application-layer authentication: operator-authority routes are served unauthenticated at the HTTP layer. Their protection rests entirely on (a) the gateway, which fronts all operator traffic and is where operator auth lives, and (b) network isolation, which keeps agents (and hive-ci's untrusted PR builds) off host-loopback so that nothing can reach 127.0.0.1:<dashboard_port> directly. This is deliberate given the load-bearing role of network isolation above, but it is a standing invariant: the /api must never be bound to a non-loopback address or exposed outside the gateway, and every new operator-authority route inherits that assumption. hive-ci is treated like an agent for this purpose: it runs untrusted PR code and is netns-isolated for the same reason.
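The loopback-only invariant is the kind of assumption that can be checked at startup. The sketch below is a minimal Python illustration under that assumption; `is_loopback_bind` and `check_dashboard_bind` are hypothetical helper names, not part of hyperhive.

```python
import ipaddress
from typing import Tuple


def is_loopback_bind(host: str) -> bool:
    """Return True only if `host` is a literal loopback IP address.

    Hostnames (even "localhost") and wildcard addresses such as 0.0.0.0
    are rejected, since they may resolve or bind beyond loopback.
    """
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def check_dashboard_bind(host: str, port: int) -> Tuple[str, int]:
    """Refuse to start the unauthenticated /api on a non-loopback address."""
    if not is_loopback_bind(host):
        raise ValueError(f"refusing to bind dashboard /api to {host}:{port}")
    return host, port
```

Rejecting hostnames outright is a conservative choice: it keeps the check independent of resolver configuration, at the cost of requiring a literal address in the config.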

The area:ops issues followed this sequencing:

1. Gateway: pure ergonomics win, unblocks same-origin (lets the cross-origin CORS shim on /answer-question/{id} go away), no behavioural risk. An nginx nixos-container now sits in front of all surfaces; per-agent UIs are proxied under /agent/<name>/.
2. Network isolation: the load-bearing step that turns the honour-system split into an enforced boundary. Complete: always-on, unconditional; the shared-netns mode was removed.
3. Privsep: defence in depth on the core process; hive-c0re runs as the unprivileged hive-core user and delegates root operations to hive-priv, a narrow socket-activated helper. See docs/security.md for the privilege boundary table.
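The gateway's path layout can be illustrated with a small routing function. This is a Python sketch of the /agent/<name>/ convention only; the real gateway is nginx, and both the `route` function and the choice to strip the prefix before forwarding are assumptions made for illustration.

```python
def route(path):
    """Classify a gateway request path.

    Returns ("agent", name, rest) for paths under /agent/<name>/, where
    `rest` is the path with the prefix stripped, and ("core", None, path)
    for everything else (the dashboard and operator-authority routes).
    """
    prefix = "/agent/"
    if path.startswith(prefix):
        name, sep, rest = path[len(prefix):].partition("/")
        # Require a non-empty agent name followed by a slash.
        if name and sep:
            return ("agent", name, "/" + rest)
    return ("core", None, path)
```

Everything that is not an agent page falls through to the core, which matches the rule that operator-authority actions are served by the core daemon and reached only via the gateway.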

hive-priv socket activation

hive-priv is always socket-activated by the hive-priv.socket systemd unit. The unit binds /run/hive/priv.sock with SocketGroup=hive-core and mode 0660 and passes the ready listener to the helper as fd 3 (LISTEN_FDS). The helper requires this and bails if it isn't socket-activated — there is intentionally no self-bind fallback.
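The "require activation, never self-bind" behaviour follows the standard systemd listener-passing protocol described in sd_listen_fds(3). The sketch below shows that check in Python for illustration; it is not hive-priv's code, and the function names are hypothetical.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd systemd passes to an activated service


def activated_listener_fd(environ, pid):
    """Return the inherited listener fd, or raise if not socket-activated.

    Follows the sd_listen_fds(3) protocol: LISTEN_PID must name this
    process and LISTEN_FDS must count the passed descriptors. Exactly one
    listener is expected here, so it arrives as fd 3.
    """
    if environ.get("LISTEN_PID") != str(pid):
        raise RuntimeError("not socket-activated: LISTEN_PID does not match")
    if environ.get("LISTEN_FDS") != "1":
        raise RuntimeError("expected exactly one listener in LISTEN_FDS")
    return SD_LISTEN_FDS_START


def open_listener():
    """Wrap fd 3 as a socket object; there is no self-bind fallback."""
    fd = activated_listener_fd(os.environ, os.getpid())
    return socket.socket(fileno=fd)
```

Taking the environment and pid as parameters keeps the check testable without systemd; `open_listener` is the only part that touches the real process state.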

Dropping the old fallback removed a dev/prod divergence: when hive-priv bound the socket itself, the socket file was group-owned by root's primary group rather than hive-core, so a hive-core client could not connect the way the socket unit's SocketGroup grant intends. Requiring socket activation everywhere means dev and prod take the exact same path and the group grant always holds.

host admin socket access (`hivectl`)

hivectl drives the whole hive — spawn / kill / destroy / rebuild / deploy — over the host admin socket /run/hyperhive/host.sock, socket-activated by the hive-c0re.socket unit. That socket is the full-control surface, so who can connect to it is a real trust boundary.

By default the socket is 0660 group-owned by hive-admin, an empty group — so it is effectively root-only until an operator is explicitly granted access. Grant sudoless hivectl by listing login users in services.hyperhive.c0re.adminUsers; each is added to hive-admin, and members connect without sudo. The runtime dir /run/hyperhive is 0751 (traverse-only, no listing) so the group can reach the socket path; the socket's own 0660 hive-admin mode gates the connection, and the per-agent subdirs under it keep their own restrictive perms. Keep adminUsers to trusted operators — membership is equivalent to root over the hive.
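The interaction between the 0751 directory and the 0660 socket can be worked through with the classic Unix owner/group/other check. This Python sketch is a simplified model (no ACLs or capabilities); the root ownership of the directory and socket is an assumption, while the modes and the hive-admin group come from the text above.

```python
def may_access(mode, owner_uid, owner_gid, uid, gids, want):
    """Classic Unix check: pick the owner, group, or other permission triad.

    `want` is a mask of 4 (read), 2 (write), 1 (execute/traverse).
    Root (uid 0) bypasses the mode bits in this simplified model.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        bits = (mode >> 6) & 7
    elif owner_gid in gids:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


def can_connect(uid, gids, hive_admin_gid):
    """Can this user connect to /run/hyperhive/host.sock?

    Assumes the directory is owned by root:root and the socket by
    root:hive-admin (ownership is an assumption of this sketch).
    """
    # Traverse (x) on the 0751 runtime directory.
    traverse = may_access(0o751, 0, 0, uid, gids, 1)
    # Linux requires write permission on the socket file to connect.
    write = may_access(0o660, 0, hive_admin_gid, uid, gids, 2)
    return traverse and write
```

Under this model a member of hive-admin passes both checks, a non-member can traverse the directory but is stopped at the socket, and neither can list the directory because the "other" triad of 0751 lacks the read bit.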
