Conventions

Code-style and process expectations across the workspace. Most of these exist because something already went wrong without them.

Naming

Hive identity (label + domain + display names)

Four env vars cover the identity surface, read by hive_ag3nt::identity:

hive_name + swarm_name are distinct from HYPERHIVE_HIVE_DOMAIN: the domain may carry the hive name as its leftmost label by convention, but the convention isn't machine-readable, and federated hives at different DNS domains can share a swarm name. Humans want both: the address (@darkest.space) AND the prose name (pr1ma). Matrix MXIDs still use the domain-based convention untouched.

qualify(label) has the same shape as qualified_label() but applies to an arbitrary label the caller already has (e.g. a peer name from the broker); it's the right surface for rendering a peer's name when the caller knows the peer is hive-local.

Identity = socket

There are no auth tokens on the per-agent unix sockets. The socket path identifies the principal; perms come from "who has the bind-mount." A sub-agent only sees its own /run/hive/mcp.sock; the manager has access to its privileged socket; hive-c0re owns the host admin socket.

Wake injection

AgentRequest::Wake { from, body } (and the manager-flavour mirror) is the wake-event-injection surface. Recipient is implicit — the agent the socket belongs to — and from is caller-chosen so the wake prompt can label the source verbatim ("matrix: new message in #general", "forge: PR #42 opened", etc.). Typical caller: an in-container background task (the matrix daemon, a scraper, the forge-notify webhook subscriber) that needs to signal "external work has arrived" without going through the broker as a peer agent.

Identity = socket means anything that can connect to /run/hive/mcp.sock is implicitly trusted to inject wakes. That's fine: the bind-mount only exposes the socket inside the agent's own container, so the trust boundary is the container's process namespace, not the wire surface.
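A minimal sketch of what an in-container caller might write to the socket, assuming the request serialises as an externally tagged JSON object ({"Wake": {...}}) followed by a newline. The tag shape, the write_wake helper, and the hand-rolled json_string escaper are all illustrative assumptions; the real types live in hive-sh4re and should be serialised from there.

```rust
use std::io::{self, Write};

/// Minimal JSON string escaping, enough for this sketch.
/// Hypothetical helper: real code would serialise the hive-sh4re types instead.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Writes one newline-terminated Wake request to `sink`.
/// ASSUMPTION: the wire form is {"Wake":{"from":..,"body":..}}; check hive-sh4re.
fn write_wake<W: Write>(mut sink: W, from: &str, body: &str) -> io::Result<()> {
    let line = format!(
        "{{\"Wake\":{{\"from\":{},\"body\":{}}}}}\n",
        json_string(from),
        json_string(body)
    );
    sink.write_all(line.as_bytes())
}
```

Inside the container the sink would be a std::os::unix::net::UnixStream connected to /run/hive/mcp.sock. There is no recipient field because the socket itself identifies the agent.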

Recipient sentinels

A few recipient names are reserved by the broker and have special meaning that ordinary agent labels can never collide with — agent name validation rejects any character outside [a-z0-9_-], so the angle-bracket and asterisk shapes below are structurally safe.

When a <children> or <parent> send resolves to real recipients, the broker stores the resolved label(s) as the message recipient(s) — the dashboard and recv side see the real routes. The sentinels are purely send-time addressing conveniences.
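The name rule that keeps sentinels collision-free can be sketched as a single predicate. The function name is hypothetical, and the non-empty requirement is an assumption; the text above only fixes the allowed character set.

```rust
/// Returns true when `name` uses only characters from [a-z0-9_-].
/// ASSUMPTION: empty names are rejected too; the source does not say.
fn is_valid_agent_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_' || b == b'-')
}
```

Because angle brackets and the asterisk are outside the set, is_valid_agent_name("<children>") is false, so no registered agent can ever shadow a sentinel.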

Wire protocol

JSON line-delimited over unix sockets in both directions (host admin / manager / agent). SSE streams (/dashboard/stream on hive-c0re, /events/stream on the per-agent web UIs) are text/event-stream; each frame carries a seq field for the snapshot-dedupe dance (see docs/web-ui.md). Request/response types live in hive-sh4re — change them in one place. The dashboard event vocabulary lives in hive-c0re::dashboard_events::DashboardEvent.

Broker delivery + ack cycle

AgentRequest::Recv is the only path that delivers messages to an agent. It always returns a list (Messages { messages }): empty when nothing is pending, a single message when max = None (the default of 1), and a batch of up to max when the caller asks for more (the server-side cap is 32; larger values clamp silently). wait_seconds long-polls for the first message; once one arrives, or if one is already pending, the call drains up to max in total before returning, so a single Recv call coalesces a burst.
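The clamping and drain rules can be sketched as below. RECV_MAX_CAP, effective_max, and drain_batch are illustrative names, not the broker's own; the long-poll itself is left out, and the behaviour for max = Some(0) is not specified in the text (this sketch returns an empty batch).

```rust
use std::collections::VecDeque;

/// Server-side cap on one Recv batch (32 per the text above).
const RECV_MAX_CAP: usize = 32;

/// None means the single-message default; larger values clamp silently.
fn effective_max(max: Option<usize>) -> usize {
    max.unwrap_or(1).min(RECV_MAX_CAP)
}

/// Drains up to the effective max from the pending queue, oldest first.
/// Returns an empty Vec when nothing is pending.
fn drain_batch<T>(pending: &mut VecDeque<T>, max: Option<usize>) -> Vec<T> {
    let n = effective_max(max).min(pending.len());
    pending.drain(..n).collect()
}
```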

Per-row bookkeeping inside the broker:

AgentRequest::AckTurn closes out the in-memory list — the harness fires it after TurnOutcome::Ok, marking every message popped since the last ack as fully handled. Claude doesn't see this surface; it's strictly a harness↔broker pairing. On TurnOutcome::Failed the harness intentionally skips the ack so the unacked rows stay in-flight in the DB and get picked up by the next requeue sweep.

AgentRequest::RequeueInflight is the recovery pair: fired by the harness exactly once at boot, before the serve loop starts. Catches the crashed-mid-turn / OOM-killed / container-restarted cases where a previous harness session popped messages but never drove them to a clean turn-end. Resets delivered_at back to NULL on every unacked row (so the next Recv pops them again), and remembers each id in a per-recipient in-memory set so the next Recv can tag the row with redelivered: true. Idempotent + cheap when there's nothing in flight, so the at-boot fire is unconditional.
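The pop / ack / requeue cycle can be modelled in memory as follows. This is a toy for a single recipient, not the broker's implementation: the delivered flag stands in for a non-NULL delivered_at, and Row, Broker, and the method names are hypothetical.

```rust
use std::collections::HashSet;

struct Row {
    id: u64,
    delivered: bool, // stands in for delivered_at IS NOT NULL
    acked: bool,
}

#[derive(Default)]
struct Broker {
    rows: Vec<Row>,
    in_flight: Vec<u64>,       // ids popped since the last ack
    redelivered: HashSet<u64>, // ids reset by a requeue sweep
}

impl Broker {
    fn push(&mut self, id: u64) {
        self.rows.push(Row { id, delivered: false, acked: false });
    }

    /// Recv: pops up to `max` undelivered rows as (id, redelivered) pairs.
    fn recv(&mut self, max: usize) -> Vec<(u64, bool)> {
        let mut out = Vec::new();
        for row in self.rows.iter_mut().filter(|r| !r.delivered && !r.acked).take(max) {
            row.delivered = true;
            self.in_flight.push(row.id);
            // remove() returns true only if a requeue had remembered this id
            out.push((row.id, self.redelivered.remove(&row.id)));
        }
        out
    }

    /// AckTurn: everything popped since the last ack is fully handled.
    fn ack_turn(&mut self) {
        for id in self.in_flight.drain(..) {
            if let Some(row) = self.rows.iter_mut().find(|r| r.id == id) {
                row.acked = true;
            }
        }
    }

    /// RequeueInflight: reset unacked delivered rows; returns how many were reset.
    fn requeue_inflight(&mut self) -> usize {
        self.in_flight.clear(); // a fresh harness session has no in-memory list
        let mut reset = 0;
        for row in self.rows.iter_mut().filter(|r| r.delivered && !r.acked) {
            row.delivered = false;
            self.redelivered.insert(row.id);
            reset += 1;
        }
        reset
    }
}
```

A failed turn simply never calls ack_turn, so the rows stay delivered-but-unacked until the next requeue_inflight resets them and the following recv tags them as redelivered.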

Question routing (Ask / Answer)

AgentRequest::Ask (and the manager-flavour mirror) surfaces a structured question that either lands in the operator's dashboard queue or in a peer agent's inbox. The recipient is the to field:

Shape fields are uniform across both targets:

Response shape is always QuestionQueued { id } — the asker stores the id and correlates the asynchronous answer event when it lands. Authorisation on Answer: only the question's target agent (or the operator via the dashboard) is permitted to reply; an answer attempt from anyone else fails the wire-side check.
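The Answer authorisation rule reduces to a small predicate. Party and may_answer are hypothetical names for illustration; how the real handler represents the operator and the target is not shown in this document.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Party {
    Operator,
    Agent(String),
}

/// The operator may always answer (via the dashboard); an agent may answer
/// only a question that was addressed to it.
fn may_answer(target: &Party, caller: &Party) -> bool {
    match caller {
        Party::Operator => true,
        Party::Agent(_) => caller == target,
    }
}
```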

Loose-ends wire shape

LooseEnd is the per-row response shape for GetLooseEnds (both the agent-flavour and manager-flavour requests). Tagged enum so new thread kinds (forge PRs, long-running approvals from a privileged bot, etc.) can land later without breaking existing handlers. Each row carries enough context that the caller renders it directly as a bulleted list, no follow-up fetch needed.

Per-flavour scoping is uniform across the three variants:

Per-variant fields:

age_seconds saturates at zero on any clock anomaly (back-step, unsynchronised wall clock, etc.) so the bulleted list never shows nonsense ages.
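One way to get the saturating behaviour with the standard library, assuming timestamps are held as SystemTime (the actual storage type is not stated here): duration_since returns an error when the clock has stepped backwards, and mapping that error to zero gives the clamp.

```rust
use std::time::SystemTime;

/// Age in whole seconds, clamped to zero when `now` is earlier than `created_at`.
fn age_seconds(created_at: SystemTime, now: SystemTime) -> u64 {
    now.duration_since(created_at)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```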

CancelLooseEnd { kind, id } is the matching write surface. The kind enum (Question / Reminder / Approval) selects which underlying store the dispatcher reaches into. Question and Reminder cancel from either surface subject to ownership checks (asker for the question, scheduler for the reminder). Approval is manager-only — sub-agents don't submit approvals so they have nothing of their own to withdraw; their wire surface returns a clear error if they try. Cancelling an approval transitions the row to ApprovalStatus::Cancelled and fires ApprovalResolved { status: "cancelled" } so the dashboard pulls the card out of the pending pane.

Agent metadata

AgentRequest::GetAgentMeta { name } returns identity + status for an agent. Self-introspection when name = None (replaces the older Whoami request); target query when name = Some.

Response is AgentMeta { name, running, hyperhive_rev, status_text, status_set_at, hive_name, swarm_name }:

Tool groups

The MCP tool surface an agent receives is derived from a set of named ToolGroup values (hive_sh4re::ToolGroup), not from a hardcoded binary flavor.

| Group | Tools |
| --- | --- |
| messaging | send, recv, ask, answer |
| meta | get_agent_meta (set_status is always-on, see below) |
| inbox | get_loose_ends, cancel_loose_end, remind, request_next_turn |
| execution | vestigial: mcp__bash__run / mcp__bash__status are always available unconditionally via extraMcpServers; this group's entries expand to non-existent mcp__hyperhive__run / mcp__hyperhive__status and have no effect. See docs/tools/bash.md. |
| lifecycle | kill, start, restart, update (privileged) |
| approvals | request_init_config, request_apply_commit, request_update_meta_inputs (privileged) |
| scheduling | request_schedule_prompt, fire_schedule_now, cancel_schedule, edit_schedule, list_schedules (privileged) |
| diagnostics | get_logs (privileged) |

Always-on tools: set_status is exposed to every agent regardless of which groups it holds (ToolGroup::ALWAYS_ON_TOOLS). The operator dashboard depends on every agent being able to report its status chip, and the server-side SetStatus handler has no tool-group check (only length validation), so gating it would only desync the --allowedTools list from what the host actually accepts. Revoking meta therefore drops get_agent_meta but never set_status.
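A reduced sketch of how an allowed-tool list could be derived from groups plus the always-on set. Only three groups are modelled, bare tool names are used, and allowed_tools is a hypothetical stand-in for allowed_mcp_tools; the real definitions live in hive-sh4re/src/lib.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolGroup {
    Messaging,
    Meta,
    Inbox,
}

impl ToolGroup {
    /// Exposed to every agent regardless of group membership.
    const ALWAYS_ON_TOOLS: &'static [&'static str] = &["set_status"];

    fn tools(self) -> &'static [&'static str] {
        match self {
            ToolGroup::Messaging => &["send", "recv", "ask", "answer"],
            ToolGroup::Meta => &["get_agent_meta"],
            ToolGroup::Inbox => {
                &["get_loose_ends", "cancel_loose_end", "remind", "request_next_turn"]
            }
        }
    }
}

/// Always-on tools first, then each group's tools, without duplicates.
fn allowed_tools(groups: &[ToolGroup]) -> Vec<&'static str> {
    let mut out: Vec<&'static str> = ToolGroup::ALWAYS_ON_TOOLS.to_vec();
    for group in groups {
        for tool in group.tools() {
            if !out.contains(tool) {
                out.push(*tool);
            }
        }
    }
    out
}
```

Dropping Meta from the slice removes get_agent_meta from the result, while set_status survives even for an empty group list.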

Config storage — per-agent tool groups live in /var/lib/hyperhive/meta/tool-groups.json (hive-c0re-owned, committed to the meta repo alongside topology.json). Format: { "alice": ["messaging", "meta", "inbox", "lifecycle"], "bob": ["messaging", "meta", "inbox"] }. An absent entry means "use role default". Tool permissions are intentionally NOT configurable from agent.nix — that file goes through the manager's approval flow, so letting it declare its own groups would let the manager grant itself any tool by submitting a config commit, bypassing the operator gate.

Setting groups — the operator sets groups via the dashboard or hive-c0re::tool_groups::set_groups(name, groups). After a change meta::sync_agents commits the updated file; the next agent rebuild picks up the new HIVE_TOOL_GROUPS env var. Agents with no entry get no var.

Runtime resolution — at session start the harness reads HIVE_TOOL_GROUPS (a comma-separated list of snake_case group names injected by the meta renderer from tool-groups.json). Unrecognised tokens are logged and skipped. Falls back to ToolGroup::AGENT_DEFAULT (messaging, meta, inbox, execution) when the var is absent or empty.
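The resolution rules can be sketched as below. from_token and resolve_groups are hypothetical names, and three behaviours are assumptions the text does not settle: whitespace around tokens is trimmed, duplicates are dropped, and a non-empty value made up entirely of unknown tokens resolves to an empty set, not to the default.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolGroup {
    Messaging,
    Meta,
    Inbox,
    Execution,
    Lifecycle,
    Approvals,
    Scheduling,
    Diagnostics,
}

impl ToolGroup {
    const AGENT_DEFAULT: &'static [ToolGroup] = &[
        ToolGroup::Messaging,
        ToolGroup::Meta,
        ToolGroup::Inbox,
        ToolGroup::Execution,
    ];

    fn from_token(token: &str) -> Option<ToolGroup> {
        match token {
            "messaging" => Some(ToolGroup::Messaging),
            "meta" => Some(ToolGroup::Meta),
            "inbox" => Some(ToolGroup::Inbox),
            "execution" => Some(ToolGroup::Execution),
            "lifecycle" => Some(ToolGroup::Lifecycle),
            "approvals" => Some(ToolGroup::Approvals),
            "scheduling" => Some(ToolGroup::Scheduling),
            "diagnostics" => Some(ToolGroup::Diagnostics),
            _ => None,
        }
    }
}

/// `raw` is the value of HIVE_TOOL_GROUPS, or None when the var is unset.
fn resolve_groups(raw: Option<&str>) -> Vec<ToolGroup> {
    let raw = raw.map(str::trim).unwrap_or("");
    if raw.is_empty() {
        return ToolGroup::AGENT_DEFAULT.to_vec();
    }
    let mut groups: Vec<ToolGroup> = Vec::new();
    for token in raw.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        match ToolGroup::from_token(token) {
            Some(g) if !groups.contains(&g) => groups.push(g),
            Some(_) => {} // duplicate, already present
            None => eprintln!("unrecognised tool group {:?}, skipping", token),
        }
    }
    groups
}
```

At session start the call would look like resolve_groups(std::env::var("HIVE_TOOL_GROUPS").ok().as_deref()).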

Updating the surface — when a new #[tool] fn is added to HiveServer in hive-ag3nt/src/mcp.rs, add its name to the matching ToolGroup::tools() slice in hive-sh4re/src/lib.rs. That's the single source of truth; allowed_mcp_tools reads it at session start.

Capabilities

Capabilities gate system-level access that goes beyond the MCP tool surface — things an agent can access, not just call. Parallel to tool groups but orthogonal: an agent can have a tool group that registers a tool AND a capability that allows the underlying resource access.

| Capability | Effect |
| --- | --- |
| manage_root_agent | may lifecycle-manage the root/manager agent via kill/start/restart |
| read_host_journal | get_host_journal MCP tool is registered + GET /journal-host requests are served |
| query_agent_state | may call get_loose_ends / CountPendingReminders targeting non-child agents |

Config storage — per-agent capabilities live in /var/lib/hyperhive/meta/capabilities.json alongside tool-groups.json. Format: { "atlas": ["read_host_journal"], "ruth": ["manage_root_agent"] }. An absent entry means "no extra capabilities". render_flake in meta.rs reads this file and injects HIVE_CAPABILITIES (comma-separated snake_case names) into each agent's systemd service env; absent entries emit no env var so agents without capabilities don't trigger a spurious rebuild.

Setting capabilities — the operator sets capabilities via the C4P4B1L1T13S section in the dashboard's P3RM1SS10NS tab. hive-c0re::capabilities::set_caps(name, caps) is the write path. After a change meta::sync_agents commits the updated file; the next agent rebuild picks up the new HIVE_CAPABILITIES env var.

Runtime resolution — at session start the harness reads HIVE_CAPABILITIES and resolves each token to a Capability variant. Unrecognised tokens are logged and skipped. An absent or empty var means no extra capabilities.

Capabilities are NOT configurable from agent.nix, for the same reason as tool groups: an agent that could grant its own capabilities via a config commit would bypass the operator approval gate.

Adding a new capability — add a variant to Capability in hive-sh4re/src/lib.rs + an arm to as_str. Add it to Capability::ALL (the source of truth for the permissions UI columns). Implement the access check in the relevant handler (agent_server.rs, mcp.rs, or dashboard.rs).
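A sketch of the enum shape those steps imply, using the three capabilities from the table above. Deriving the token parser from ALL plus as_str is a design choice of this sketch (it means a new variant needs no third edit), not necessarily how hive-sh4re does it; from_token and resolve_capabilities are hypothetical names, and logging of unknown tokens is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    ManageRootAgent,
    ReadHostJournal,
    QueryAgentState,
}

impl Capability {
    /// Source of truth for the permissions UI columns.
    const ALL: &'static [Capability] = &[
        Capability::ManageRootAgent,
        Capability::ReadHostJournal,
        Capability::QueryAgentState,
    ];

    fn as_str(self) -> &'static str {
        match self {
            Capability::ManageRootAgent => "manage_root_agent",
            Capability::ReadHostJournal => "read_host_journal",
            Capability::QueryAgentState => "query_agent_state",
        }
    }

    /// Reverse lookup built from ALL + as_str, so the two cannot drift apart.
    fn from_token(token: &str) -> Option<Capability> {
        Self::ALL.iter().copied().find(|c| c.as_str() == token)
    }
}

/// Parses a HIVE_CAPABILITIES value; unknown or empty tokens are skipped.
fn resolve_capabilities(raw: &str) -> Vec<Capability> {
    raw.split(',')
        .map(str::trim)
        .filter_map(Capability::from_token)
        .collect()
}
```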

Async forms

Dashboard + per-agent mutating forms carry data-async; a delegated submit listener in assets/tabs.js intercepts, shows a spinner, POSTs application/x-www-form-urlencoded (axum's Form extractor rejects multipart), calls refreshState() on success. New mutating forms should add data-async and optionally data-confirm (for a JS-side confirm() prompt) or data-prompt="…" (for a window.prompt() whose answer goes into a hidden input named by data-prompt-field, default note).

refreshState defers automatically when document.activeElement sits inside a managed section so the operator's typing isn't lost; collapsible <details data-restore-key=…> survive the re-render via snapshotOpenDetails / restoreOpenDetails.

rebuild is the reconcile verb

lifecycle::rebuild idempotently rewrites /etc/nixos-containers/<C>.conf (PRIVATE_NETWORK=0, clears HOST_ADDRESS / LOCAL_ADDRESS, sets EXTRA_NSPAWN_FLAGS), regenerates applied/<name>/flake.nix, writes the systemd limits drop-in, then nixos-container update + stop + start.

Anything that changes per-container state on the host should be re-applied here so a manual ↻ R3BU1LD from the dashboard is sufficient to recover.

Actions are factored

approve / deny / destroy (and the lifecycle helper) live in actions.rs / dashboard.rs. The admin socket and the dashboard POST handlers both call into them so the two surfaces never drift.

Commit messages

Short, lowercase, no Co-Authored-By trailer. Imperative mood, no period. Body explains why if non-obvious; otherwise the subject alone is fine. Wrap at ~72 cols.

Commit before test

Stage and commit when work looks ready, then run validation (cargo check, nix flake check, real deploy). Failures get a follow-up commit rather than an amend. The commit history is the work log; rewriting it loses signal.

Best-effort oneshot services

The harness ships a family of one-shot systemd services that configure agent-side surfaces from values hive-c0re writes into the state dir at provisioning time:

Shape contract — every one of these:

  1. Always exit 0, even on internal failure. A non-zero exit would mark the unit failed, which in turn aborts nixos-container update and blocks rebuilds. The agent's capability surface is not allowed to gate the container build.
  2. No set -e in the script body. Subshell failures must not propagate. Use ... || true on every external call that can fail (forge unreachable, missing icon, parse error, etc.).
  3. Skip silently when prerequisites are missing: no token file, no icon, no reachable upstream → echo a short skip line + exit 0. The next boot tries again.
  4. Wired to multi-user.target so they run on every boot (lets a rotated token / new icon take effect without systemctl restart gymnastics).
  5. Re-runnable: a second invocation produces the same final state (idempotent uploads, idempotent config rewrites). Used by the .path watchers that re-fire on token appearance (see docs/persistence.md::matrix-avatar-sync).

The artefact lives under the agent user's home where applicable (~/.config/tea/config.yml) and is chown'd to that user, but the service itself stays root-owned so the bootstrap ordering doesn't need a user-existence check before each fire.

This pattern keeps the rebuild path resilient: any failure inside these services degrades the corresponding surface (no tea config, no avatar) but never blocks the container from coming up. The operator notices through journalctl -u <unit> rather than a broken switch-to-configuration.