Conventions
Code-style and process expectations across the workspace. Most of these exist because something already went wrong without them.
Naming
- Containers are length-bounded by nixos-container (≤ 11 chars).
- Sub-agents are h-<name> with <name> ≤ 9 chars.
- The manager is ruth (fixed name). MAX_AGENT_NAME in lifecycle.rs enforces the cap.
- Per-agent web UI port = WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE (8100..8999) for every agent including the manager; the dashboard listens on cfg.dashboardPort (default 7000).
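The naming cap and the port formula can be sketched as below. This is a minimal illustration, not the code in lifecycle.rs: the 64-bit FNV-1a width, the constant values (inferred from the 8100..8999 range), the rejection of empty names, and the helper names container_name and web_port are all assumptions.

```rust
/// Cap on `<name>` from the text; "h-" + 9 chars stays within 11.
const MAX_AGENT_NAME: usize = 9;
/// Assumed values, inferred from the documented 8100..8999 range.
const WEB_PORT_BASE: u16 = 8100;
const WEB_PORT_RANGE: u64 = 900;

/// 64-bit FNV-1a. The hash width used by the real code is an assumption.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: container name for a sub-agent, `h-<name>`.
/// Rejects empty names (an assumption) and names over the cap.
fn container_name(name: &str) -> Result<String, String> {
    if name.is_empty() || name.len() > MAX_AGENT_NAME {
        return Err(format!(
            "agent name must be 1..={} chars: {:?}",
            MAX_AGENT_NAME, name
        ));
    }
    Ok(format!("h-{name}"))
}

/// Hypothetical helper: deterministic per-agent web UI port.
fn web_port(name: &str) -> u16 {
    // The remainder is below 900, so the cast and the sum cannot overflow.
    WEB_PORT_BASE + (fnv1a_64(name.as_bytes()) % WEB_PORT_RANGE) as u16
}
```

Because the port is a pure function of the name, two agents can hash to the same port; how the real code handles such a collision is not covered here.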
Hive identity (label + domain + display names)
Four env vars cover the identity surface, read by
hive_ag3nt::identity:
- HIVE_LABEL: short, hive-local agent label (iris, damocles). label() returns it; falls back to empty string if the env var is missing so downstream callers can decide how to surface "unknown agent" rather than getting a panic from this module.
- HYPERHIVE_HIVE_DOMAIN: the hive's canonical DNS domain (e.g. darkest.space), set by hive-c0re.nix from services.hyperhive.domain. When configured, qualified_label() returns ${label}@${domain} (e.g. iris@darkest.space); when unset (single-hive deployments, dev/test) it degrades to just the short label so existing callers see no change. The qualified form surfaces in the per-agent web UI title, the system-prompt template, and /api/state.qualified_label.
- HYPERHIVE_HIVE_NAME: human display name of this hive (pr1ma). Read by hive_name(); None when unset.
- HYPERHIVE_SWARM_NAME: human display name of the wider swarm this hive belongs to (constellat1on). Read by swarm_name(); federated hives at different DNS domains can share a swarm name.
hive_name + swarm_name are distinct from
HYPERHIVE_HIVE_DOMAIN: the domain may carry the hive name as
its leftmost label by convention, but the convention isn't
machine-readable, and federated hives at different DNS domains
can share a swarm name. Humans want both: the address
(@darkest.space) AND the prose name (pr1ma). Matrix MXIDs
still use the domain-based convention untouched.
qualify(label) is the same shape as qualified_label() but
applies to an arbitrary label the caller already has (e.g. a peer
name from the broker); it's the right surface when rendering a
peer's name when the caller knows it's hive-local.
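The degrade-to-short-label behaviour can be sketched as follows. The split into a pure qualify_with helper plus an env-reading wrapper is an assumption made for illustration, as is treating an empty domain the same as an unset one; the real hive_ag3nt::identity module may be shaped differently.

```rust
use std::env;

/// Hypothetical pure helper: join a hive-local label with the hive domain,
/// degrading to the bare label when no domain is configured.
fn qualify_with(label: &str, domain: Option<&str>) -> String {
    match domain {
        // Treating an empty domain like an unset one is an assumption.
        Some(d) if !d.is_empty() => format!("{label}@{d}"),
        _ => label.to_string(),
    }
}

/// Env-reading wrapper with the documented `qualify(label)` shape.
fn qualify(label: &str) -> String {
    let domain = env::var("HYPERHIVE_HIVE_DOMAIN").ok();
    qualify_with(label, domain.as_deref())
}
```

Keeping the env lookup out of the pure helper makes the formatting rule testable without mutating process state.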
Identity = socket
There are no auth tokens on the per-agent unix sockets. The socket
path identifies the principal; perms come from "who has the
bind-mount." A sub-agent only sees its own /run/hive/mcp.sock; the
manager has access to its privileged socket; hive-c0re owns the host
admin socket.
Wake injection
AgentRequest::Wake { from, body } (and the manager-flavour mirror)
is the wake-event-injection surface. Recipient is implicit — the
agent the socket belongs to — and from is caller-chosen so the
wake prompt can label the source verbatim ("matrix: new message in #general", "forge: PR #42 opened", etc.). Typical caller: an
in-container background task (the matrix daemon, a scraper, the
forge-notify webhook subscriber) that needs to signal "external work
has arrived" without going through the broker as a peer agent.
Identity = socket means anything that can connect to
/run/hive/mcp.sock is implicitly trusted to inject wakes. That's
fine: the bind-mount only exposes the socket inside the agent's own
container, so the trust boundary is the container's process
namespace, not the wire surface.
Recipient sentinels
A few recipient names are reserved by the broker and have special
meaning that ordinary agent labels can never collide with — agent
name validation rejects any character outside [a-z0-9_-], so the
angle-bracket and asterisk shapes below are structurally safe.
- *: broadcast. Deliver to every running agent except the sender (agent_server::handle_send fans out via Coordinator::broadcast_send).
- operator: the human at the dashboard. Messages accumulate in the inbox view; no agent ever recv's them.
- <parent>: the sender's parent per topology.json. Rewritten at send time by topology::resolve_recipient: looks up parent_of(sender) and falls back to operator when the sender is a root agent (or absent from topology entirely). Lets agents address their parent without learning the label, so runtime reparenting propagates with zero agent-side restart.
- <children>: fan-out to every direct descendant of the sender per topology.json. Resolved in agent_server::handle_send via topology::children_of(sender): one message is delivered to each child, bypassing the allow-list check (structural fan-out targets are never user-listed peers). No-op for leaf agents (returns Ok when the child set is empty). Lets a sub-manager nudge its subtree without enumerating labels.
When a <children> or <parent> send resolves to real recipients, the
broker stores the resolved label(s) as the message recipient(s) — the
dashboard and recv side see the real routes. The sentinels are purely
send-time addressing conveniences.
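A simplified model of send-time resolution is sketched below. The Topology struct is a hypothetical in-memory stand-in for topology.json, and resolve folds all four cases into one function for illustration; in the real code the work is split between topology::resolve_recipient, agent_server::handle_send, and Coordinator::broadcast_send, and allow-list checks are omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for topology.json: child -> parent.
struct Topology {
    parent: HashMap<String, String>,
}

impl Topology {
    fn parent_of(&self, agent: &str) -> Option<&str> {
        self.parent.get(agent).map(String::as_str)
    }

    fn children_of(&self, agent: &str) -> Vec<String> {
        let mut kids: Vec<String> = self
            .parent
            .iter()
            .filter(|(_, p)| p.as_str() == agent)
            .map(|(c, _)| c.clone())
            .collect();
        kids.sort(); // HashMap order is arbitrary; sort for stable output
        kids
    }
}

/// Expand a recipient into the concrete labels the broker would store.
/// `running` lists every running agent and is only consulted for `*`.
fn resolve(topo: &Topology, running: &[&str], sender: &str, to: &str) -> Vec<String> {
    match to {
        "*" => running
            .iter()
            .copied()
            .filter(|a| *a != sender)
            .map(str::to_string)
            .collect(),
        // Root agents and agents missing from the topology fall back to operator.
        "<parent>" => vec![topo.parent_of(sender).unwrap_or("operator").to_string()],
        // Empty for leaf agents, which makes the send a no-op.
        "<children>" => topo.children_of(sender),
        other => vec![other.to_string()],
    }
}
```

Because the sentinels contain characters that agent-name validation rejects, the literal match arms can never shadow a real label.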
Wire protocol
JSON line-delimited over unix sockets in both directions (host admin
/ manager / agent). SSE streams (/dashboard/stream on hive-c0re,
/events/stream on the per-agent web UIs) are text/event-stream;
each frame carries a seq field for the snapshot-dedupe dance
(see docs/web-ui.md). Request/response types live in hive-sh4re
— change them in one place. The dashboard event vocabulary lives
in hive-c0re::dashboard_events::DashboardEvent.
Broker delivery + ack cycle
AgentRequest::Recv is the only path that delivers messages to an
agent. Always returns a list (Messages { messages }) — empty when
nothing's pending, single-pop when max = None (default 1, the
single-message behaviour), batched up to max when caller asks for
more (server-side cap is 32; values above clamp silently).
wait_seconds long-polls for the first message; once one arrives —
or one is already pending — the call drains up to max in total
before returning, so a single Recv call coalesces a burst.
Per-row bookkeeping inside the broker:
- delivered_at = NOW set on every popped row.
- Each recipient has an in-memory unacked_ids list of every row delivered since the last AckTurn.
- redelivered = true on a row if RequeueInflight resurfaced it (the harness prepends a "may already be handled" hint when this flag is set so the per-message warning is visible).
AgentRequest::AckTurn closes out the in-memory list — the harness
fires it after TurnOutcome::Ok, marking every message popped since
the last ack as fully handled. Claude doesn't see this surface; it's
strictly a harness↔broker pairing. On TurnOutcome::Failed the
harness intentionally skips the ack so the unacked rows stay
in-flight in the DB and get picked up by the next requeue sweep.
AgentRequest::RequeueInflight is the recovery pair: fired by the
harness exactly once at boot, before the serve loop starts. Catches
the crashed-mid-turn / OOM-killed / container-restarted cases where
a previous harness session popped messages but never drove them to
a clean turn-end. Resets delivered_at back to NULL on every
unacked row (so the next Recv pops them again), and remembers
each id in a per-recipient in-memory set so the next Recv can tag
the row with redelivered: true. Idempotent + cheap when there's
nothing in flight, so the at-boot fire is unconditional.
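The Recv / AckTurn / RequeueInflight cycle described above can be modelled in memory as follows. This is a toy for a single recipient under stated assumptions: the Inbox, Row, and Delivered types are hypothetical, the long-poll is left out, a boolean stands in for delivered_at, and acked rows are simply dropped (how the real broker persists acked rows is not described here).

```rust
use std::collections::HashSet;

const MAX_BATCH: usize = 32; // server-side cap from the text

struct Row {
    id: u64,
    body: String,
    delivered: bool, // stands in for `delivered_at IS NOT NULL`
}

struct Delivered {
    id: u64,
    body: String,
    redelivered: bool,
}

#[derive(Default)]
struct Inbox {
    rows: Vec<Row>,
    unacked_ids: Vec<u64>,
    resurfaced: HashSet<u64>,
}

impl Inbox {
    fn push(&mut self, id: u64, body: &str) {
        self.rows.push(Row { id, body: body.to_string(), delivered: false });
    }

    /// Recv without the long-poll: pop up to `max` pending rows
    /// (default 1, silently clamped to MAX_BATCH).
    fn recv(&mut self, max: Option<usize>) -> Vec<Delivered> {
        let limit = max.unwrap_or(1).min(MAX_BATCH);
        let mut out = Vec::new();
        for row in self.rows.iter_mut().filter(|r| !r.delivered).take(limit) {
            row.delivered = true;
            self.unacked_ids.push(row.id);
            out.push(Delivered {
                id: row.id,
                body: row.body.clone(),
                // True only for rows a requeue sweep resurfaced.
                redelivered: self.resurfaced.remove(&row.id),
            });
        }
        out
    }

    /// AckTurn: everything popped since the last ack is fully handled.
    /// Dropping the rows is a simplification of this sketch.
    fn ack_turn(&mut self) {
        let acked: HashSet<u64> = self.unacked_ids.drain(..).collect();
        self.rows.retain(|r| !acked.contains(&r.id));
    }

    /// RequeueInflight: make delivered-but-unacked rows pending again and
    /// remember them so the next recv tags them. Returns the row count.
    fn requeue_inflight(&mut self) -> usize {
        let mut n = 0;
        for row in self.rows.iter_mut().filter(|r| r.delivered) {
            row.delivered = false;
            self.resurfaced.insert(row.id);
            n += 1;
        }
        self.unacked_ids.clear();
        n
    }
}
```

Calling requeue_inflight on an inbox with nothing in flight returns 0 and changes nothing, which is what makes an unconditional at-boot call safe.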
Question routing (Ask / Answer)
AgentRequest::Ask (and the manager-flavour mirror) surfaces a
structured question that either lands in the operator's dashboard
queue or in a peer agent's inbox. The recipient is the to field:
- to = None or to = Some("operator"): routes to the operator-question queue. The dashboard renders the question with any options as a chip strip plus a free-text fallback (Other…) so the operator is never trapped by an incomplete list. The legacy AskOperator variant collapses into this case.
- to = Some(<agent>): peer Q&A. The target agent receives a HelperEvent::QuestionAsked { id, asker, question, options, multi } in their inbox. They reply via AgentRequest::Answer (or ManagerRequest::Answer if they're the manager); the answer threads back to the asker as a HelperEvent::QuestionAnswered event.
Shape fields are uniform across both targets:
- options is advisory: the dashboard chips are decoration over a free-text fallback; peer-agent recipients see the list in their QuestionAsked event and can return any string.
- multi = true lets the answerer pick multiple options (checkboxes in the dashboard, a hint in the peer-agent event). The answer comes back as a single string with selections joined by ", ".
- ttl_seconds auto-cancels with answer [expired] (and answerer: "ttl-watchdog") when the wait becomes moot. None = wait indefinitely or until manual cancel.
Response shape is always QuestionQueued { id } — the asker stores
the id and correlates the asynchronous answer event when it lands.
Authorisation on Answer: only the question's target agent (or
the operator via the dashboard) is permitted to reply; an answer
attempt from anyone else fails the wire-side check.
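The routing and authorisation rules above reduce to a few small functions. The sketch below is illustrative only: the Route enum and the function names are hypothetical, and the operator is represented by the string "operator" for simplicity.

```rust
/// Hypothetical routing outcome for an Ask request.
#[derive(Debug, PartialEq)]
enum Route {
    Operator,
    Peer(String),
}

/// `to = None` and `to = Some("operator")` both land in the operator queue.
fn route(to: Option<&str>) -> Route {
    match to {
        None | Some("operator") => Route::Operator,
        Some(agent) => Route::Peer(agent.to_string()),
    }
}

/// Multi-select answers travel as one string joined by ", ".
fn join_selections(picked: &[&str]) -> String {
    picked.join(", ")
}

/// Only the question's target, or the operator, may answer.
fn may_answer(target: &Route, answerer: &str) -> bool {
    match target {
        Route::Operator => answerer == "operator",
        Route::Peer(agent) => agent.as_str() == answerer || answerer == "operator",
    }
}
```

Since options are advisory, a receiver that needs the individual selections back has to split on ", " itself and tolerate free-text answers that happen to contain that separator.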
Loose-ends wire shape
LooseEnd is the per-row response shape for GetLooseEnds (both
the agent-flavour and manager-flavour requests). Tagged enum so
new thread kinds (forge PRs, long-running approvals from a
privileged bot, etc.) can land later without breaking existing
handlers. Each row carries enough context that the caller renders
it directly as a bulleted list, no follow-up fetch needed.
Per-flavour scoping is uniform across the three variants:
- agent-flavour GetLooseEnds only surfaces rows the calling agent has standing in. Approval rows only appear when the calling agent is the manager (sub-agents don't submit approvals). Question rows surface where the agent is asker OR target (the routing semantics from the Ask/Answer subsection above). Reminder rows are scoped to owner == self.
- manager-flavour GetLooseEnds lists every pending row in the swarm (full audit view).
Per-variant fields:
- Approval { id, agent, commit_ref, description?, age_seconds }: agent is the affected agent (target of the spawn / config commit), not the asker. description is the manager's free-text blurb shown on the dashboard card. commit_ref is the kind-specific payload (see docs/approvals.md::Approval kinds (wire shapes)).
- Question { id, asker, target?, question, age_seconds }: target = None = operator-routed (dashboard); Some(agent) = peer-to-peer thread.
- Reminder { id, owner, message, due_at, age_seconds }: due_at is the absolute unix timestamp the scheduler is targeting; clients compute time-until-fire as due_at - now.
age_seconds saturates at zero on any clock anomaly (back-step,
unsynchronised wall clock, etc.) so the bulleted list never
shows nonsense ages.
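A caller-side sketch of the tagged enum and the bulleted rendering follows. The field types (u64 ids and timestamps), the helper names, and the exact bullet wording are assumptions; only the variant and field names come from the shapes listed above.

```rust
/// Assumed Rust shape of the per-row response; field types are guesses.
enum LooseEnd {
    Approval { id: u64, agent: String, commit_ref: String, description: Option<String>, age_seconds: u64 },
    Question { id: u64, asker: String, target: Option<String>, question: String, age_seconds: u64 },
    Reminder { id: u64, owner: String, message: String, due_at: u64, age_seconds: u64 },
}

/// Age that saturates at zero when the clock stepped backwards.
fn saturating_age(created_at: u64, now: u64) -> u64 {
    now.saturating_sub(created_at)
}

/// Render one row as a bullet using only the fields it carries.
fn bullet(row: &LooseEnd, now: u64) -> String {
    match row {
        LooseEnd::Approval { id, agent, commit_ref, description, age_seconds } => format!(
            "- approval #{id} for {agent} ({commit_ref}): {} [{age_seconds}s]",
            description.as_deref().unwrap_or("no description")
        ),
        LooseEnd::Question { id, asker, target, question, age_seconds } => format!(
            "- question #{id} from {asker} to {}: {question} [{age_seconds}s]",
            target.as_deref().unwrap_or("operator")
        ),
        // An overdue reminder shows 0s in this sketch rather than a negative value.
        LooseEnd::Reminder { id, owner, message, due_at, age_seconds } => format!(
            "- reminder #{id} for {owner}: {message} (fires in {}s) [{age_seconds}s]",
            due_at.saturating_sub(now)
        ),
    }
}
```

A match without a wildcard arm means a newly added variant becomes a compile error at every render site, which is one way to keep handlers honest as thread kinds grow.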
CancelLooseEnd { kind, id } is the matching write surface. The
kind enum (Question / Reminder / Approval) selects which
underlying store the dispatcher reaches into. Question and
Reminder cancel from either surface subject to ownership
checks (asker for the question, scheduler for the reminder).
Approval is manager-only — sub-agents don't submit approvals
so they have nothing of their own to withdraw; their wire
surface returns a clear error if they try. Cancelling an approval
transitions the row to ApprovalStatus::Cancelled and fires
ApprovalResolved { status: "cancelled" } so the dashboard pulls
the card out of the pending pane.
Agent metadata
AgentRequest::GetAgentMeta { name } returns identity + status for
an agent. Self-introspection when name = None (replaces the older
Whoami request); target query when name = Some.
Response is AgentMeta { name, running, hyperhive_rev, status_text, status_set_at, hive_name, swarm_name }:
- hyperhive_rev: None only when the configured flake URL has no canonical path. Otherwise carries the rev the target is currently pinned at.
- running: whether the target's container is currently up. When false, the host clears status_text / status_set_at, because on-disk values from before the stop are stale snapshots and shouldn't be shown as live status. Defaults to true on the wire (older harnesses never serialised it, and the host only knew how to ask about live containers; this keeps backwards compatibility with payloads that predate the running field).
- status_text / status_set_at: last value written via SetStatus, plus its unix timestamp. Both None when the target has never set a status, when the agent name is unknown, or when running = false (see above).
- hive_name / swarm_name: display names read from HYPERHIVE_HIVE_NAME / HYPERHIVE_SWARM_NAME env (sourced from services.hyperhive.hiveName / services.hyperhive.swarmName). Both None when the options aren't configured.
Tool groups
The MCP tool surface an agent receives is derived from a set of named
ToolGroup values (hive_sh4re::ToolGroup), not from a hardcoded
binary flavor.
| Group | Tools |
|---|---|
| messaging | send, recv, ask, answer |
| meta | get_agent_meta (set_status is always-on, see below) |
| inbox | get_loose_ends, cancel_loose_end, remind, request_next_turn |
| execution | vestigial: mcp__bash__run / mcp__bash__status are always available unconditionally via extraMcpServers; this group's entries expand to non-existent mcp__hyperhive__run / mcp__hyperhive__status and have no effect. See docs/tools/bash.md. |
| lifecycle | kill, start, restart, update (privileged) |
| approvals | request_init_config, request_apply_commit, request_update_meta_inputs (privileged) |
| scheduling | request_schedule_prompt, fire_schedule_now, cancel_schedule, edit_schedule, list_schedules (privileged) |
| diagnostics | get_logs (privileged) |
Always-on tools — set_status is exposed to every agent regardless of
which groups it holds (ToolGroup::ALWAYS_ON_TOOLS). The operator dashboard
depends on every agent being able to report its status chip, and the
server-side SetStatus handler has no tool-group check (only length
validation), so gating it would only desync the --allowedTools list from
what the host actually accepts. Revoking meta therefore drops
get_agent_meta but never set_status.
Config storage — per-agent tool groups live in
/var/lib/hyperhive/meta/tool-groups.json (hive-c0re-owned, committed to the
meta repo alongside topology.json). Format: { "alice": ["messaging", "meta", "inbox", "lifecycle"], "bob": ["messaging", "meta", "inbox"] }. An absent entry
means "use role default". Tool permissions are intentionally NOT configurable
from agent.nix — that file goes through the manager's approval flow, so
letting it declare its own groups would let the manager grant itself any tool by
submitting a config commit, bypassing the operator gate.
Setting groups — the operator sets groups via the dashboard or
hive-c0re::tool_groups::set_groups(name, groups). After a change
meta::sync_agents commits the updated file; the next agent rebuild picks up
the new HIVE_TOOL_GROUPS env var. Agents with no entry get no var.
Runtime resolution — at session start the harness reads HIVE_TOOL_GROUPS
(a comma-separated list of snake_case group names injected by the meta renderer
from tool-groups.json). Unrecognised tokens are logged and skipped. Falls back
to ToolGroup::AGENT_DEFAULT (messaging, meta, inbox, execution) when
the var is absent or empty.
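The runtime resolution rule can be sketched as a small parser. This is an assumption-laden illustration rather than the hive-sh4re code: the enum mirrors the group table, parse and resolve_groups are hypothetical names, and the value of the env var is passed in as a parameter so the logic stays testable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolGroup {
    Messaging,
    Meta,
    Inbox,
    Execution,
    Lifecycle,
    Approvals,
    Scheduling,
    Diagnostics,
}

impl ToolGroup {
    /// Role default for ordinary agents, as listed in the text.
    const AGENT_DEFAULT: [ToolGroup; 4] = [
        ToolGroup::Messaging,
        ToolGroup::Meta,
        ToolGroup::Inbox,
        ToolGroup::Execution,
    ];

    /// Hypothetical parser for one snake_case token.
    fn parse(token: &str) -> Option<ToolGroup> {
        Some(match token {
            "messaging" => ToolGroup::Messaging,
            "meta" => ToolGroup::Meta,
            "inbox" => ToolGroup::Inbox,
            "execution" => ToolGroup::Execution,
            "lifecycle" => ToolGroup::Lifecycle,
            "approvals" => ToolGroup::Approvals,
            "scheduling" => ToolGroup::Scheduling,
            "diagnostics" => ToolGroup::Diagnostics,
            _ => return None,
        })
    }
}

/// Resolve the HIVE_TOOL_GROUPS value (None when the var is absent).
fn resolve_groups(raw: Option<&str>) -> Vec<ToolGroup> {
    let raw = match raw.map(str::trim) {
        Some(s) if !s.is_empty() => s,
        _ => return ToolGroup::AGENT_DEFAULT.to_vec(), // absent or empty
    };
    let mut groups = Vec::new();
    for token in raw.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        match ToolGroup::parse(token) {
            Some(group) => groups.push(group),
            None => eprintln!("unknown tool group {token:?}, skipping"),
        }
    }
    groups
}
```

A harness would call it as resolve_groups(std::env::var("HIVE_TOOL_GROUPS").ok().as_deref()). In this sketch a variable containing only unrecognised tokens yields an empty list rather than the default; whether the real harness does the same is not stated above.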
Updating the surface — when a new #[tool] fn is added to HiveServer
in hive-ag3nt/src/mcp.rs, add its name to the matching ToolGroup::tools()
slice in hive-sh4re/src/lib.rs. That's the single source of truth;
allowed_mcp_tools reads it at session start.
Capabilities
Capabilities gate system-level access that goes beyond the MCP tool surface — things an agent can access, not just call. Parallel to tool groups but orthogonal: an agent can have a tool group that registers a tool AND a capability that allows the underlying resource access.
| Capability | Effect |
|---|---|
| manage_root_agent | may lifecycle-manage the root/manager agent via kill/start/restart |
| read_host_journal | get_host_journal MCP tool is registered + GET /journal-host requests are served |
| query_agent_state | may call get_loose_ends / CountPendingReminders targeting non-child agents |
Config storage — per-agent capabilities live in
/var/lib/hyperhive/meta/capabilities.json alongside tool-groups.json.
Format: { "atlas": ["read_host_journal"], "ruth": ["manage_root_agent"] }.
An absent entry means "no extra capabilities". render_flake in meta.rs
reads this file and injects HIVE_CAPABILITIES (comma-separated
snake_case names) into each agent's systemd service env; absent entries emit
no env var so agents without capabilities don't trigger a spurious rebuild.
Setting capabilities — the operator sets capabilities via the
C4P4B1L1T13S section in the dashboard's P3RM1SS10NS tab.
hive-c0re::capabilities::set_caps(name, caps) is the write path.
After a change meta::sync_agents commits the updated file; the next agent
rebuild picks up the new HIVE_CAPABILITIES env var.
Runtime resolution — at session start the harness reads HIVE_CAPABILITIES
and resolves each token to a Capability variant. Unrecognised tokens are
logged and skipped. An absent or empty var means no extra capabilities.
Capabilities are NOT configurable from agent.nix, for the same reason as tool
groups: an agent that could grant its own capabilities via a config commit would
bypass the operator approval gate.
Adding a new capability — add a variant to Capability in
hive-sh4re/src/lib.rs + an arm to as_str. Add it to Capability::ALL (the
source of truth for the permissions UI columns). Implement the access check in
the relevant handler (agent_server.rs, mcp.rs, or dashboard.rs).
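The three touch points (variant, as_str arm, ALL entry) and a handler-side check might look like the sketch below. The enum mirrors the capability table; parse, resolve_caps, and serve_host_journal are hypothetical names, and deriving parse from ALL is a design choice of this sketch, not a claim about hive-sh4re.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    ManageRootAgent,
    ReadHostJournal,
    QueryAgentState,
}

impl Capability {
    /// Source of truth for the permissions UI columns.
    const ALL: [Capability; 3] = [
        Capability::ManageRootAgent,
        Capability::ReadHostJournal,
        Capability::QueryAgentState,
    ];

    fn as_str(self) -> &'static str {
        match self {
            Capability::ManageRootAgent => "manage_root_agent",
            Capability::ReadHostJournal => "read_host_journal",
            Capability::QueryAgentState => "query_agent_state",
        }
    }

    /// Derived from ALL + as_str, so a new variant needs no extra parse arm.
    fn parse(token: &str) -> Option<Capability> {
        Capability::ALL.iter().copied().find(|c| c.as_str() == token)
    }
}

/// Resolve the HIVE_CAPABILITIES value; absent or empty means none.
fn resolve_caps(raw: Option<&str>) -> Vec<Capability> {
    let mut caps = Vec::new();
    for token in raw.unwrap_or("").split(',').map(str::trim).filter(|t| !t.is_empty()) {
        match Capability::parse(token) {
            Some(cap) => caps.push(cap),
            None => eprintln!("unknown capability {token:?}, skipping"),
        }
    }
    caps
}

/// Hypothetical handler-side access check.
fn serve_host_journal(caps: &[Capability]) -> Result<&'static str, &'static str> {
    if caps.contains(&Capability::ReadHostJournal) {
        Ok("journal contents")
    } else {
        Err("missing capability: read_host_journal")
    }
}
```

The exhaustive match in as_str makes the compiler flag a forgotten arm, but nothing flags a variant missing from ALL, so that entry is the one to double-check in review.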
Async forms
Dashboard + per-agent mutating forms carry data-async; a delegated
submit listener in assets/tabs.js intercepts, shows a spinner,
POSTs application/x-www-form-urlencoded (axum's Form extractor
rejects multipart), calls refreshState() on success. New mutating
forms should add data-async and optionally data-confirm (for a
JS-side confirm() prompt) or data-prompt="…" (for a
window.prompt() whose answer goes into a hidden input named by
data-prompt-field, default note).
refreshState defers automatically when document.activeElement
sits inside a managed section so the operator's typing isn't lost;
collapsible <details data-restore-key=…> survive the re-render
via snapshotOpenDetails / restoreOpenDetails.
rebuild is the reconcile verb
lifecycle::rebuild idempotently rewrites
/etc/nixos-containers/<C>.conf (PRIVATE_NETWORK=0, clears
HOST_ADDRESS / LOCAL_ADDRESS, sets EXTRA_NSPAWN_FLAGS),
regenerates applied/<name>/flake.nix, writes the systemd limits
drop-in, then nixos-container update + stop + start.
Anything that changes per-container state on the host should be
re-applied here so a manual ↻ R3BU1LD from the dashboard is
sufficient to recover.
Actions are factored
approve / deny / destroy (and the lifecycle helper) live in
actions.rs / dashboard.rs. The admin socket and the dashboard
POST handlers both call into them so the two surfaces never drift.
Commit messages
Short, lowercase, no Co-Authored-By trailer. Imperative mood, no
period. Body explains why if non-obvious; otherwise the subject
alone is fine. Wrap at ~72 cols.
Commit before test
Stage and commit when work looks ready, then run validation
(cargo check, nix flake check, real deploy). Failures get a
follow-up commit rather than an amend. The commit history is the
work log; rewriting it loses signal.
Best-effort oneshot services
The harness ships a family of one-shot systemd services that configure agent-side surfaces from values hive-c0re writes into the state dir at provisioning time:
- tea-login: writes ~/.config/tea/config.yml from the forge-token written by hive-c0re::forge::ensure_user_for, so tea repos create / tea pulls create work without interactive prompts.
- forge-avatar-sync: uploads the hyperhive.icon SVG to the agent's Forgejo profile, so the icon shows up on commits / PRs / issue comments.
- matrix-avatar-sync: same idea for matrix profile avatars (two-step media upload → set avatar_url dance; see docs/persistence.md::matrix-avatar-sync for the protocol detail).
Shape contract — every one of these:
- Always exit 0, even on internal failure. A non-zero exit would mark the unit failed, which in turn aborts nixos-container update and blocks rebuilds. The agent's capability surface is not allowed to gate the container build.
- No set -e in the script body. Subshell failures must not propagate. Use ... || true on every external call that can fail (forge unreachable, missing icon, parse error, etc.).
- Skip silently when prerequisites are missing: no token file, no icon, no reachable upstream → echo a short skip line + exit 0. The next boot tries again.
- Wired to multi-user.target so they run on every boot (lets a rotated token / new icon take effect without systemctl restart gymnastics).
- Re-runnable: a second invocation produces the same final state (idempotent uploads, idempotent config rewrites). Used by the .path watchers that re-fire on token appearance (see docs/persistence.md::matrix-avatar-sync).
The artefact lives under the agent user's home where applicable
(~/.config/tea/config.yml) and is chown'd to that user, but the
service itself stays root-owned so the bootstrap ordering doesn't
need a user-existence check before each fire.
This pattern keeps the rebuild path resilient: any failure inside
these services degrades the corresponding surface (no tea config,
no avatar) but never blocks the container from coming up. The
operator notices through journalctl -u <unit> rather than a
broken switch-to-configuration.