Approvals + manager + helper events

The approval queue is hyperhive's pivot: nothing that changes the shape of an agent (its config, whether it exists) happens without an operator click. The manager (ruth) is the policy gate in front of that queue; helper events are how it stays informed about what happens after a decision lands.

End-to-end approval flow

  1. Manager edits files under /agents/<name>/config/ (any tracked path, but agent.nix is the contract entry point) and commits with its own git identity.
  2. Manager submits the commit sha via request_apply_commit(agent, commit_ref). commit_ref must be a commit sha (7-40 hex chars, short or full) — a branch or tag name is rejected so the approval pins an immutable commit.
  3. hive-c0re immediately fetches that commit from the proposed repo into the applied repo and tags it proposal/<id>. Because git fetch <remote> <sha>:<dst> can't fetch by a bare sha (the left side of a refspec is a remote ref name), the resolution happens on hive-c0re's side: it resolves the sha locally against the proposed repo, fetches all of proposed's heads into applied's object db, then tags the resolved commit. The approval row stores both the manager-supplied sha and the canonical hive-c0re-vouched sha. From here on the proposed repo is irrelevant for this approval: the manager can amend, force-push, or rm -rf the proposed repo and the queued approval still points at an immutable git object inside applied.
  3a. Flake validation (ApplyCommit only): after the proposal tag is planted, hive-c0re reads proposal/<id>:flake.lock and runs two checks. If either check fails, no pending approval is created for the operator; instead the row is marked failed and surfaces on the dashboard with the validation message.
    • Stale lock: materialises the commit in a temp worktree, runs nix flake lock (no --update-input flags, so it only fills missing entries), and rejects if the committed flake.lock differs from the result. This fires when the manager added or removed inputs in flake.nix without re-running nix flake lock. Fix: run nix flake lock in the config repo, commit, and re-submit.
    • Duplicate inputs: groups lock nodes by their canonical original field and rejects if two or more nodes share the same source. This usually means an input is missing inputs.<x>.inputs.nixpkgs.follows = "nixpkgs". Fix: add the follows directive, re-lock, and re-submit.
    Both checks only flag new violations. Agents whose lock already carried duplicates before the check was added are unaffected until the manager runs a coordinated config-change pass.
  4. Operator sees the proposal as a card on the dashboard — a full multi-file diff, toggleable between three bases (vs the running tree / vs the last approved proposal / vs the previous queued proposal) — and clicks ◆ APPR0VE (or hive-c0re approve <id> on the CLI).
  5. hive-c0re moves the working tree to proposal/<id> and runs the build under a sequence of tags (see below). On success, applied/main fast-forwards to the proposal commit. On failure, main stays put and the working tree resets back to the previous deployed commit.
  6. HelperEvent::ApprovalResolved (and Rebuilt for the ApplyCommit kind) land in the manager's inbox, carrying both the canonical sha and the terminal tag.
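The two shape checks from steps 2 and 3a can be sketched in Rust. This is a minimal sketch, not hive-c0re's code: is_commit_sha and duplicate_sources are hypothetical names, and the duplicate check assumes each lock node's original field has already been serialised into one canonical string (parsing flake.lock JSON is left out so the example stays std-only).

```rust
use std::collections::BTreeMap;

/// Shape check for commit_ref on ApplyCommit: 7-40 hex chars (short or full sha).
/// Branch and tag names such as "main" or "v1.2" fail it. A branch that happens
/// to be named like hex ("deadbeef") would pass, so the ref must still resolve
/// to a commit object afterwards.
pub fn is_commit_sha(s: &str) -> bool {
    (7..=40).contains(&s.len()) && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Duplicate-inputs check: group lock nodes by their canonical `original`
/// source and keep only the sources claimed by two or more nodes.
/// Input pairs are (node name, canonical original). The root node, which has
/// no `original`, is assumed to be filtered out by the caller.
pub fn duplicate_sources<'a>(nodes: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_source: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(node, original) in nodes {
        by_source.entry(original).or_default().push(node);
    }
    // A source with a single node is fine; only shared sources are violations.
    by_source.retain(|_, names| names.len() > 1);
    by_source
}
```

A non-empty map from duplicate_sources is the situation the "missing follows" validation message describes: typically nixpkgs plus a second nixpkgs_2 node pulled in by some other input.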

Withdrawing a pending approval

The manager can call cancel_loose_end(kind: "approval", id) to withdraw an approval that hasn't been acted on yet. The row transitions to ApprovalStatus::Cancelled (distinct from Denied/Failed), the dashboard pulls the card out of the pending pane, and ApprovalResolved { status: "cancelled" } fires on the manager + dashboard channels. Approvals that have already been approved, denied, or failed return an error: the resolution is final once the operator (or a lifecycle failure) has acted on the row.

The sub-agent surface refuses the approval kind with a clear error: sub-agents don't submit approvals, so they have nothing of their own to withdraw. Withdrawal is manager-only.

InitConfig approvals are the first step in a two-step spawn flow. On approve, hive-c0re seeds the proposed config repo with a default agent.nix template and sends the manager HelperEvent::ConfigReady { agent }. The manager then reviews, edits, and commits the template before calling request_apply_commit to proceed to an ApplyCommit approval. The first ApplyCommit creates the container; subsequent ones rebuild it with new config. This gives the manager (and operator) an explicit review gate on the initial configuration before any container is created.

Approval kinds (wire shapes)

ApprovalKind carries five variants; each maps to a different commit_ref encoding because that field is overloaded as the kind-specific payload carrier.

Scheduled prompts (submit paths)

Two ways a row lands in scheduled_prompts:

No self-target shortcut: even schedules an agent aims at itself need approval. The existing remind MCP tool stays the quick self-wake path (no approval; it lands directly in the agent's own inbox). Scheduled prompts are the heavier mechanism: multi-recipient and operator-visible.

Scheduled prompt worker (catch-up clamp)

When hive-c0re comes back from being down, the worker sees rows whose next_fire_at_unix is well in the past. For recurring rows that would mean firing N delayed pulses in a row — spammy and useless. Instead the worker fires once per row and bumps next_fire_at_unix to the next interval slot ≥ now, recording how many cycles were skipped in last_result (per-target). Operators see "fired late, caught up from 17 skipped" instead of 17 wake-up storms.

One-shot rows fire once (if past due, on the next worker pass) and are deleted by the worker; recurring rows survive until cancelled.
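The catch-up clamp reduces to a little slot arithmetic. The sketch below is a hypothetical model, not the worker's code: it assumes a row is described by next_fire_at_unix plus an optional interval in seconds (None meaning one-shot), and the Pass type and plan_pass name are made up for illustration.

```rust
/// What the worker does with one row on this pass (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Pass {
    /// next_fire_at_unix is still in the future.
    NotDue,
    /// One-shot row: fire once, then delete the row.
    FireAndDelete,
    /// Recurring row: fire once, move to the next slot, record skipped cycles.
    FireAndReschedule { next_fire_at_unix: i64, skipped: u64 },
}

/// `interval_secs` is None for one-shot rows (an assumed encoding).
pub fn plan_pass(next_fire_at_unix: i64, interval_secs: Option<i64>, now: i64) -> Pass {
    if now < next_fire_at_unix {
        return Pass::NotDue;
    }
    match interval_secs {
        None => Pass::FireAndDelete,
        Some(interval) => {
            assert!(interval > 0, "recurring rows need a positive interval");
            // Slots next_fire_at_unix + k * interval that are already due (<= now).
            let due_slots = (now - next_fire_at_unix) / interval + 1;
            Pass::FireAndReschedule {
                // First slot after the last due one, so it always lies past now.
                next_fire_at_unix: next_fire_at_unix + due_slots * interval,
                // One pulse fires; every other due slot is collapsed into it.
                skipped: (due_slots - 1) as u64,
            }
        }
    }
}
```

With a 10-second interval, a row due at t=100 that the worker first sees at t=275 fires once, moves to t=280, and reports 17 skipped cycles, which is the "caught up from 17 skipped" case above.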

targets is its own table (scheduled_prompt_targets) so partial cancellation flips a single row and the dashboard can show last-fired / last-result per recipient. Cancelling every target reaps the parent row on the next worker pass.

Missing-target failure

When a target name doesn't resolve to a known agent (container destroyed, operator typo, etc.) the worker:

  1. Records last_result = "no such agent: <name>" on the per-target row.
  2. Sends a single advisory Message from system to operator naming the schedule, target, and reason.
  3. Continues fanning out to the other live targets.
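The per-target behaviour can be sketched as a fan-out loop that records a result for every target and never aborts early. Everything here is hypothetical: the fan_out function, the "ok" success string, and the advisory wording are placeholders, and the deliver closure stands in for the broker insert. The real worker additionally emits a tracing::warn on transient errors, which this std-only sketch leaves out.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub struct TargetResult {
    pub target: String,
    pub last_result: String,
}

pub struct FanOut {
    pub results: Vec<TargetResult>,
    /// Advisory texts meant for a single system -> operator Message each.
    pub advisories: Vec<String>,
}

pub fn fan_out<F>(schedule: &str, targets: &[&str], known: &HashSet<&str>, mut deliver: F) -> FanOut
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut out = FanOut { results: Vec::new(), advisories: Vec::new() };
    for &target in targets {
        let last_result = if !known.contains(target) {
            // Missing target: annotate the row, queue one advisory, keep going.
            let reason = format!("no such agent: {target}");
            out.advisories.push(format!("scheduled prompt {schedule}: target {target}: {reason}"));
            reason
        } else {
            match deliver(target) {
                Ok(()) => "ok".to_string(),
                // Transient broker error (e.g. sqlite lock contention): same annotation path.
                Err(e) => e,
            }
        };
        out.results.push(TargetResult { target: target.to_string(), last_result });
    }
    out
}
```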

Transient broker errors (sqlite lock contention, etc.) get the same last_result annotation plus a tracing::warn, and then:

Reminder delivery: file-path semantics

A reminder may carry a file_path (the agent-visible path inside its container, e.g. /agents/<name>/state/foo.md). On delivery hive-c0re:

  1. Translates the container path to the host path (/var/lib/hyperhive/agents/<name>/state/foo.md) so c0re can write from outside the container.
  2. Validates the path: rejects anything outside the agent's own state subtree, containing .. (path traversal), or with an empty relative tail. On rejection the write is skipped and the original message is delivered inline with a warning — the reminder still fires.
  3. Defends against symlink escape: after create_dir_all, the parent dir is canonicalized and re-verified to live under the agent's host state root. The final file is opened with O_NOFOLLOW | O_CREAT | O_TRUNC so an existing symlink at the basename cannot redirect the write to an arbitrary host path.
  4. Writes the body to disk and delivers a short pointer message in its place, keeping the agent's inbox / wake-prompt small while the bulky payload is read out of band.
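Steps 1 and 2 (translation plus lexical validation) can be sketched with std::path alone. This is a sketch under assumptions: host_path_for is a hypothetical name, and "containing .." is read as "has a .. path component". The symlink defence in step 3 needs the post-create_dir_all canonicalisation and an O_NOFOLLOW open, which are not shown here.

```rust
use std::path::{Component, Path, PathBuf};

const HOST_AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";

/// Maps /agents/<agent>/state/<tail> to the host path, rejecting anything
/// outside the agent's own state subtree, any `..` component, and an empty tail.
pub fn host_path_for(agent: &str, container_path: &str) -> Result<PathBuf, String> {
    let state_root = PathBuf::from(format!("/agents/{agent}/state"));
    // strip_prefix compares whole components, so /agents/<agent>/statefoo is rejected.
    let tail = Path::new(container_path)
        .strip_prefix(&state_root)
        .map_err(|_| format!("{container_path}: outside /agents/{agent}/state"))?;
    if tail.as_os_str().is_empty() {
        return Err(format!("{container_path}: empty relative tail"));
    }
    // Only plain names are allowed; ParentDir (..) is the traversal case.
    if !tail.components().all(|c| matches!(c, Component::Normal(_))) {
        return Err(format!("{container_path}: path traversal"));
    }
    Ok(Path::new(HOST_AGENTS_ROOT).join(agent).join("state").join(tail))
}
```

On an Err the reminder still fires: the caller skips the write and delivers the message inline with a warning, as step 2 describes.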

Atomicity of the inbox INSERT + reminders.sent_at UPDATE is handled inside Broker::deliver_reminders_batch; the scheduler only computes the body strings before calling it.

Destroy semantics

HostRequest::Destroy { name, purge } is the lifecycle tear-down, not an approval. Stops + removes the nspawn container, drops the systemd drop-in, fails any pending approvals. Persistent state (proposed/applied repos, claude credentials, /state/ notes) is kept by default — recreating the agent with the same name reuses prior config + login. With purge = true the agent's /var/lib/hyperhive/{agents,applied}/<name>/ trees are also wiped (config history + creds + notes gone forever). The manager refuses to destroy itself.

Meta flake

The hive-c0re-owned repo at /var/lib/hyperhive/meta/ declares one flake input per agent (agent-<n>.url = "git+file:///var/lib/hyperhive/applied/<n>") and one nixosConfigurations.<n> output per agent. Each output wraps inputs.agent-<n>.nixosModules.default with the identity + HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT injection module that setup_applied used to generate inline. Containers run against --flake /var/lib/hyperhive/meta#<n>.

Per-deploy lock flow (two-phase, owned by actions::run_apply_commit via meta::{prepare,finalize,abort}_deploy):

  1. meta::prepare_deploy(name) runs nix flake lock --update-input agent-<n> without committing. Working tree of meta now points the input at applied/<n>/main (which run_apply_commit already fast-forwarded to proposal/<id>).
  2. lifecycle::rebuild_no_meta runs nixos-container update <c> --flake meta#<name>. Nix evaluates against the staged lock.
  3. On success, meta::finalize_deploy(name, sha, "deployed/<id>") stages flake.lock and commits with deploy <n> deployed/<id> <sha12>. Meta's git log gains one entry per successful deploy.
  4. On failure, meta::abort_deploy() runs git restore flake.lock so the meta history shows only successes; the failure stays as an annotated failed/<id> tag in applied/<n>.
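The control flow of the two-phase deploy can be sketched with closures standing in for meta::prepare_deploy, lifecycle::rebuild_no_meta, meta::finalize_deploy and meta::abort_deploy, so the ordering is visible without shelling out to nix or git. The deploy function, the u64 approval id, and routing a prepare failure through abort as well are assumptions of this sketch (restoring an unchanged flake.lock is harmless).

```rust
/// Stage the lock, rebuild, then either commit the lock or restore it.
pub fn deploy<P, R, F, A>(prepare: P, rebuild: R, finalize: F, abort: A) -> Result<(), String>
where
    P: FnOnce() -> Result<(), String>,
    R: FnOnce() -> Result<(), String>,
    F: FnOnce() -> Result<(), String>,
    A: FnOnce() -> Result<(), String>,
{
    match prepare().and_then(|()| rebuild()) {
        // Success: commit the staged flake.lock as one meta history entry.
        Ok(()) => finalize(),
        // Failure: restore flake.lock so meta's log only records successes.
        Err(err) => match abort() {
            Ok(()) => Err(err),
            Err(abort_err) => Err(format!("{err}; restoring flake.lock also failed: {abort_err}")),
        },
    }
}

/// Meta commit message for a successful deploy: "deploy <n> deployed/<id> <sha12>".
pub fn deploy_commit_message(name: &str, id: u64, sha: &str) -> String {
    // Shas are ASCII hex, so slicing by byte index is safe.
    let sha12 = &sha[..sha.len().min(12)];
    format!("deploy {name} deployed/{id} {sha12}")
}
```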

Single-phase variants exist for paths without rollback semantics: meta::lock_update_for_rebuild(name) for the manual ↻ R3BU1LD button (commits if the lock changed) and meta::lock_update_hyperhive() for the auto-update flake-rev bump (one shot before per-agent rebuilds, commits if the lock changed).

meta::sync_agents(hyperhive_flake, dashboard_port, &agents) is the idempotent reconciler called by spawn, destroy, rebuild, and the startup migration. Renders flake.nix from the agent list; if it differs from disk, runs nix flake lock + commits as regenerate meta flake (or seed meta from N agent(s) on the very first call).
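The idempotence of sync_agents comes from rendering deterministically and only writing when the text differs. The sketch below renders just the per-agent inputs block, using the agent-<n>.url shape given above; the function names, the sorting and de-duplication, and the surrounding layout are assumptions, and the nixosConfigurations outputs are omitted.

```rust
/// Renders the per-agent flake inputs. Sorting makes the output independent
/// of the order agents are passed in, so an unchanged set renders identically.
pub fn render_inputs(agents: &[&str]) -> String {
    let mut names: Vec<&str> = agents.to_vec();
    names.sort_unstable();
    names.dedup();
    let mut out = String::from("  inputs = {\n");
    for n in names {
        out.push_str(&format!(
            "    agent-{n}.url = \"git+file:///var/lib/hyperhive/applied/{n}\";\n"
        ));
    }
    out.push_str("  };\n");
    out
}

/// Regenerate (write, nix flake lock, commit) only when disk differs.
/// `on_disk` is None when flake.nix doesn't exist yet (the first-call seed).
pub fn needs_regen(rendered: &str, on_disk: Option<&str>) -> bool {
    on_disk != Some(rendered)
}
```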

The manager has /meta RO-bound inside its container: git -C /meta log --oneline is the swarm-wide deploy log, cat /meta/flake.lock | jq '.nodes["agent-<n>"].locked' resolves which sha each agent is pinned at right now. Dashboard surfaces the same info as a deployed:<sha12> chip per container row.

Two repos per agent

/var/lib/hyperhive/agents/<name>/config/    proposed — manager RW
└── <anything>                              # any files the manager
                                            # wants in the commit.
                                            # agent.nix is the
                                            # convention entry
                                            # point; flake.nix is
                                            # tracked boilerplate
                                            # (manager doesn't edit
                                            # it).

/var/lib/hyperhive/applied/<name>/          applied — core-only
├── .git/                                   # tag-rich history
├── flake.nix                               # tracked, fixed
│                                           # boilerplate exporting
│                                           # nixosModules.default
├── agent.nix                               # working tree of main
└── <other manager files>                   # also tracked

/var/lib/hyperhive/meta/                    swarm-wide flake — core
├── .git/                                   # one commit per successful
│                                           # deploy
├── flake.nix                               # generated from agent set
└── flake.lock                              # pins each agent's sha

Why two physical repos: the manager's /agents/<n>/config/ is RW — a buggy or hostile agent can git clean -fdx its own proposed tree. The applied repo is never bind-mounted (except the read-only .git exposure described below) so a destructive move inside the container cannot reach it.

The container's --flake ref is /var/lib/hyperhive/meta#<name> (see "Meta flake" above). The agent's own applied/<n>/flake.nix is a fixed boilerplate that exports nixosModules.default = import ./agent.nix; the meta flake imports that module and wraps it with identity + HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT.

Tag state machine

Every approval id walks through a fixed set of tags on the underlying commit inside the applied repo:

Tag              When                                  Annotated?
proposal/<id>    request_apply_commit, after fetch     no
approved/<id>    operator approve                      no
building/<id>    rebuild started                       no
deployed/<id>    rebuild succeeded (main ff's here)    no
failed/<id>      rebuild failed                        yes (body = error)
denied/<id>      operator deny                         yes (body = operator note)

applied/main is always the latest deployed/*. denied/ and failed/ are terminal; the manager submits a new commit + new approval id to retry. Because tags are first-class git objects, rejected and failed trees stay browsable forever — git log --tags in the applied repo is the audit trail.
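The tag table can be modelled as a small enum. This is a hypothetical sketch: the Stage type is not hyperhive's, the u64 id is assumed, and the allowed transitions are read off the table's order (proposal, approved, building, then deployed or failed; denied straight from proposal).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage { Proposal, Approved, Building, Deployed, Failed, Denied }

impl Stage {
    pub fn prefix(self) -> &'static str {
        match self {
            Stage::Proposal => "proposal",
            Stage::Approved => "approved",
            Stage::Building => "building",
            Stage::Deployed => "deployed",
            Stage::Failed => "failed",
            Stage::Denied => "denied",
        }
    }

    /// Tag name inside the applied repo, e.g. "failed/42".
    pub fn tag(self, id: u64) -> String {
        format!("{}/{id}", self.prefix())
    }

    /// Only failed/ and denied/ carry a message body (error / operator note).
    pub fn annotated(self) -> bool {
        matches!(self, Stage::Failed | Stage::Denied)
    }

    /// End of the walk for this id; retrying means a new commit + new id.
    pub fn is_terminal(self) -> bool {
        matches!(self, Stage::Deployed | Stage::Failed | Stage::Denied)
    }

    pub fn can_advance_to(self, next: Stage) -> bool {
        use Stage::*;
        matches!(
            (self, next),
            (Proposal, Approved) | (Proposal, Denied) | (Approved, Building)
                | (Building, Deployed) | (Building, Failed)
        )
    }
}
```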

Dispatch via rebuild_queue

Long-running approval work — ApplyCommit, UpdateMetaInputs, Spawn — no longer runs inline inside actions::approve. Instead the approval handler enqueues a QueueEntry into the global rebuild_queue:

ApprovalKind       QueueKind queued   QueueSource
ApplyCommit        Rebuild            Approval
UpdateMetaInputs   MetaUpdate         Approval
Spawn              Spawn              Approval
InitConfig         (none; runs inline, sub-second git seed)
SchedulePrompt     (none; runs inline, single sqlite insert)

Each queue entry carries the originating approval_id so the worker can re-fetch the approval row when it dispatches, run the kind-specific pipeline (run_approval_apply_commit / run_approval_update_meta_inputs / run_approval_spawn), and fire the matching HelperEvent::* on completion via finish_approval.

Two visible consequences:

QueueSource::Approval carries the approval_id so a tail-end build failure surfaces back as a failed approval row, not just a silent queue entry. QueueSource::Manual (dashboard ↻ R3BU1LD) and QueueSource::AutoUpdate (boot-time sweep) use the same queue but skip the approval row plumbing.
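The dispatch table maps directly onto a function returning an optional queue entry. The enum variants come from the table above; the QueueEntry field names, the u64 approval id, and the entry_for_approval helper are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalKind { ApplyCommit, UpdateMetaInputs, Spawn, InitConfig, SchedulePrompt }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueueKind { Rebuild, MetaUpdate, Spawn }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum QueueSource {
    /// Carries the approval id so a late build failure lands on the approval row.
    Approval { approval_id: u64 },
    /// Dashboard rebuild button: no approval row plumbing.
    Manual,
    /// Boot-time sweep: no approval row plumbing.
    AutoUpdate,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QueueEntry {
    pub agent: String,
    pub kind: QueueKind,
    pub source: QueueSource,
}

/// None means the kind is cheap enough to run inline inside approve.
pub fn queue_kind(kind: ApprovalKind) -> Option<QueueKind> {
    match kind {
        ApprovalKind::ApplyCommit => Some(QueueKind::Rebuild),
        ApprovalKind::UpdateMetaInputs => Some(QueueKind::MetaUpdate),
        ApprovalKind::Spawn => Some(QueueKind::Spawn),
        ApprovalKind::InitConfig | ApprovalKind::SchedulePrompt => None,
    }
}

pub fn entry_for_approval(kind: ApprovalKind, approval_id: u64, agent: &str) -> Option<QueueEntry> {
    queue_kind(kind).map(|kind| QueueEntry {
        agent: agent.to_string(),
        kind,
        source: QueueSource::Approval { approval_id },
    })
}
```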

Forge mirror

When the bundled hive-forge container is running (on by default via hyperhive.forge.enable), hive-c0re mirrors every agent's applied repo into a private agent-configs Forgejo org. forge::push_config(<name>) pushes applied/main plus every tag to agent-configs/<name> after each ref mutation: the spawn that seeds deployed/0, every request_apply_commit (which plants proposal/<id>), every approve / deny, and a sweep at startup. Pushes are best-effort; a missing or stopped forge never blocks a deploy.

The org is private and agents are not members, so only the core user (a Forgejo site admin) can read it: an agent can't reach another agent's config — or even its own — through the forge. The tokenised push URL is passed inline to git push, never written into applied/<n>/.git/config; that repo is RO-bind-mounted into the manager, and a stored token would leak core's admin credential to an agent.
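Passing the URL inline means the credential only ever appears in the git process's arguments, never in a config file. The sketch builds, but does not run, such a push. The push_url shape (token as the basic-auth password, a plain http forge host) and both function names are assumptions for illustration; only the refs pushed (main plus all tags) and the agent-configs/<name> path come from the text.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical URL shape: user + token as basic auth on an internal forge host.
pub fn push_url(forge_host: &str, user: &str, token: &str, agent: &str) -> String {
    format!("http://{user}:{token}@{forge_host}/agent-configs/{agent}.git")
}

/// Builds (does not run) the mirror push for one agent's applied repo.
pub fn mirror_push(applied_repo: &Path, push_url: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C")
        .arg(applied_repo)
        .arg("push")
        // URL passed inline: no `git remote add`, so nothing lands in .git/config
        // of the repo that is RO-bind-mounted into the manager.
        .arg(push_url)
        .arg("refs/heads/main")
        // --tags pushes every refs/tags/* ref in addition to main.
        .arg("--tags");
    cmd
}
```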

The dashboard deep-links into this org — a config repo link per container row and a commit on forge link per approval card. See docs/web-ui.md.

Manager view of applied + meta

The manager container gets three host-side bind mounts via set_nspawn_flags:

Each proposed repo (/agents/<n>/config/) is pre-configured with applied as a git remote pointing at /applied/<n>/.git. Useful incantations from inside the manager:

git -C /agents/<n>/config fetch applied
git -C /agents/<n>/config log applied/main --oneline
git -C /agents/<n>/config show applied/refs/tags/deployed/<id>
git -C /agents/<n>/config show applied/refs/tags/failed/<id>   # body = build error
git -C /agents/<n>/config show applied/refs/tags/denied/<id>   # body = operator note
git -C /agents/<n>/config rebase applied/main                 # base in-flight work on what's deployed

git -C /meta log --oneline                                    # swarm-wide deploy history
cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))'

The RO binds block push at the kernel level, so the manager can only fetch / read — git plumbing inside the container cannot corrupt either authoritative repo.

Migration from the pre-tag / pre-meta schemes

Both overhauls (tag-driven flow + meta flake) ship in-place migrations that run on every hive-c0re startup. Idempotent; each phase is a no-op once already applied. Behaviour:

No state loss in either migration. claude creds, /state/ notes, the events DB, proposed history, and applied history all survive. The manager keeps its session; sub-agents stay logged in.

Manager (ruth) is hive-c0re-managed

The manager container runs through the same lifecycle as sub-agents. On hive-c0re serve startup, if ruth is missing, hive-c0re creates it. The manager's flake lives at /var/lib/hyperhive/applied/ruth/; its proposed config at /var/lib/hyperhive/agents/ruth/config/. Manager can edit its own agent.nix (visible inside the container at /agents/ruth/config/) and submit request_apply_commit("ruth", <sha>) for operator approval.

Differences from sub-agents:

Migration note (for older hosts): drop any containers.root = { ... } block from your host NixOS config. hyperhive creates and updates the manager itself.

Manager policy

From hive-ag3nt/prompts/system.md (the <!-- role:manager --> block, rendered via hive_ag3nt::prompt::render): the manager does NOT rubber-stamp sub-agent config requests. Before committing and calling request_apply_commit, it checks role match, package legitimacy, whether a cheaper alternative exists, and blast radius.

For ambiguous cases or anything that needs human signal, the manager calls ask(question, options?, multi?, ttl_seconds?, to?) — queues the question and returns the id immediately. When to is omitted (or "operator") the question shows up on the dashboard; when to is a sub-agent's name, the recipient receives a HelperEvent::QuestionAsked and answers via their own answer tool. Either way the answer arrives back as HelperEvent::QuestionAnswered { id, question, answer, answerer } in the asker's inbox. Storage is hive-c0re::operator_questions (sqlite) — same table, with a nullable target column (NULL = operator). Dispatch goes through hive-c0re/src/questions.rs::{handle_ask, handle_answer} so both the agent + manager surfaces stay aligned. The answer flow is:

Operator path (dashboard):
  POST /answer-question/{id}
    → OperatorQuestions::answer(_, _, "operator")
    → notify_agent(asker, QuestionAnswered { answerer: "operator", ... })

Agent path (answer tool):
  Answer { id, answer }
    → questions::handle_answer
    → OperatorQuestions::answer(_, _, agent)
    → notify_agent(asker, QuestionAnswered { answerer: agent, ... })
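The routing rule for ask's to argument (omitted or "operator" goes to the dashboard; anything else names a sub-agent) and the nullable target column can be sketched as one small type. QuestionTarget and its methods are hypothetical names; how an empty string is treated is not specified above and is left as an agent name here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum QuestionTarget {
    Operator,
    Agent(String),
}

impl QuestionTarget {
    /// `to` omitted or "operator" means the dashboard; anything else is a sub-agent.
    pub fn from_to(to: Option<&str>) -> Self {
        match to {
            None | Some("operator") => QuestionTarget::Operator,
            Some(name) => QuestionTarget::Agent(name.to_string()),
        }
    }

    /// Value for the nullable `target` column: NULL (None) means operator.
    pub fn column(&self) -> Option<&str> {
        match self {
            QuestionTarget::Operator => None,
            QuestionTarget::Agent(name) => Some(name.as_str()),
        }
    }

    /// The `answerer` echoed back in QuestionAnswered.
    pub fn answerer(&self) -> &str {
        self.column().unwrap_or("operator")
    }
}
```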

Two more paths resolve a pending question with a sentinel answer:

Helper events to the manager

Coordinator::notify_manager(&HelperEvent) enqueues an inbox message from sender system with the event JSON in the body. The manager harness no longer short-circuits these — they drive a regular claude turn so the manager can react. Variants (hive_sh4re::HelperEvent):

Optional sha field on ApprovalResolved, Spawned, and Rebuilt carries the canonical hive-c0re-vouched commit sha. Optional tag on ApprovalResolved and Rebuilt only — the spawn path always lands at deployed/0, so the tag is implicit and not echoed. The tag values for the variants that do carry it: deployed/<id> / failed/<id> / denied/<id> for approval-driven flows; approved/<id> for the rare bare-approval case where no underlying action runs. Both fields are Option: None on the rebuild paths that don't change the deployed commit (e.g. auto_update::rebuild_agent reapplying the existing main, or the dashboard ↻ R3BU1LD button when the lock didn't move). When set, git show <sha> against /agents/<n>/applied.git inside the manager container yields the exact tree that was referenced.
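On the manager's side, the optional tag on ApprovalResolved and Rebuilt can be decoded into an outcome. This sketch assumes approval ids parse as u64; the Outcome type and outcome function are hypothetical, and it deliberately reads only the tag string (the Spawned event, which never carries a tag, is out of scope).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Deployed(u64),
    Failed(u64),
    Denied(u64),
    /// Bare approval where no underlying action ran.
    Approved(u64),
    /// No tag on the event, e.g. a rebuild that didn't move the deployed commit.
    NoTag,
    Unknown(String),
}

pub fn outcome(tag: Option<&str>) -> Outcome {
    let Some(tag) = tag else { return Outcome::NoTag };
    // "deployed/12" -> ("deployed", 12); anything unparsable falls through.
    let parsed = tag
        .split_once('/')
        .and_then(|(kind, id)| Some((kind, id.parse::<u64>().ok()?)));
    match parsed {
        Some(("deployed", id)) => Outcome::Deployed(id),
        Some(("failed", id)) => Outcome::Failed(id),
        Some(("denied", id)) => Outcome::Denied(id),
        Some(("approved", id)) => Outcome::Approved(id),
        _ => Outcome::Unknown(tag.to_string()),
    }
}
```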

To add a new event: add the HelperEvent variant and its call sites, then update prompts/system.md (the <!-- role:manager --> block's lifecycle-event list) so the manager knows the new shape.

Auto-update on startup

hive-c0re serve runs auto_update::run in a background task right after opening the coordinator. It enumerates managed containers and rebuilds any whose recorded hyperhive rev differs from the current one — sub-agents and manager go through the same lifecycle::rebuild path.

"Rev" = canonical filesystem path of cfg.hyperhiveFlake. Marker file: /var/lib/hyperhive/applied/.<name>.hyperhive-rev. If the flake input has no canonical path (e.g. a github: URL), auto-update is a no-op — rebuild manually.

The dashboard surfaces pending updates per agent: a clickable "needs update ↻" badge appears whenever the marker differs from current rev. The badge POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent path so manual triggers and the startup scan can't drift. When at least one container is stale, a top-level ↻ UPD4TE 4LL button appears that loops over every stale container.
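The staleness test behind both the startup scan and the "needs update ↻" badge can be sketched as a marker-file comparison. Assumptions of this sketch: needs_update is a hypothetical name, a missing marker counts as stale, and any canonicalize failure (including a github: URL, which is not a filesystem path) is treated as "no canonical path", so the result is None and auto-update is skipped.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Some(true) = stale, Some(false) = current, None = no canonical path (skip).
pub fn needs_update(applied_root: &Path, name: &str, hyperhive_flake: &str) -> io::Result<Option<bool>> {
    // "Rev" is the canonical filesystem path of the hyperhive flake.
    let current = match fs::canonicalize(hyperhive_flake) {
        Ok(p) => p,
        Err(_) => return Ok(None),
    };
    // Marker: <applied_root>/.<name>.hyperhive-rev
    let marker = applied_root.join(format!(".{name}.hyperhive-rev"));
    match fs::read_to_string(&marker) {
        Ok(recorded) => Ok(Some(Path::new(recorded.trim()) != current.as_path())),
        // Never recorded: treat as stale so the first scan rebuilds it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Some(true)),
        Err(e) => Err(e),
    }
}
```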