hive-ci: Forgejo Actions Runner

The hive-ci module runs a Forgejo Actions runner in a hive-ci nixos-container, executing CI jobs from .forgejo/workflows/ci.yml on every PR.

CI checks

Three jobs run on every PR (and on workflow_dispatch for manual re-triggers):

Job	What it runs	Currently required
nix flake check	treefmt + rustfmt formatting, `cargo clippy -D warnings`, `cargo test`, module evaluation	yes
tracker-tag lint	flags `#NNN` issue tags in source and comments (`scripts/check-issue-refs.sh`)	no (red, non-blocking)
comment-block lint	flags contiguous comment blocks over 30 lines (`scripts/check-comment-blocks.sh`)	no (red, non-blocking)

The tracker-tag and comment-block checks are non-blocking today (a hit fails the check but does not prevent merge) while the legacy backlog is cleaned up. They are expected to become required checks once the tree is clean.
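As a rough illustration of what the two lints look for, the sketch below flags `#NNN` tags and over-long comment runs in a single file. It is not the content of scripts/check-issue-refs.sh or scripts/check-comment-blocks.sh: the function names, the tag regex, and the assumption that comment lines start with `//` or `#` are all hypothetical.

```sh
#!/bin/sh
# Illustrative sketch only: the real checks live in scripts/ and may use
# different patterns. Function names and regexes here are assumptions.

# Print "line:text" for every "#<digits>" tag in FILE ($1).
# Returns 1 if at least one tag was found, 0 if the file is clean.
find_tracker_tags() {
    if grep -nE '#[0-9]+' "$1"; then
        return 1
    fi
    return 0
}

# Report every run of consecutive comment lines in FILE ($2) longer than
# MAX ($1). A comment line is assumed to start with "//" or "#" after
# optional indentation.
# Returns 1 if any over-long block was found, 0 otherwise.
find_long_comment_blocks() {
    awk -v max="$1" -v file="$2" '
        function report() {
            if (run > max) {
                printf "%s:%d: %d-line comment block\n", file, start, run
                bad = 1
            }
            run = 0
        }
        /^[ \t]*(\/\/|#)/ { if (run == 0) start = NR; run++; next }
        { report() }
        END { report(); exit (bad ? 1 : 0) }
    ' "$2"
}
```

With the threshold from the table, the second check would be called as `find_long_comment_blocks 30 path/to/file`.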

Running checks locally

Don't run nix flake check directly — it dispatches to the shared build farm and wastes a remote-builder slot. Use the devshell equivalents instead:

nix develop -c cargo clippy --all-targets -- -D warnings
nix develop -c cargo test
nix develop -c treefmt          # same as nix fmt; treefmt covers rustfmt + nixfmt + taplo
sh scripts/check-issue-refs.sh  # tracker-tag lint
sh scripts/check-comment-blocks.sh  # comment-block lint

A git pre-push hook that automates the two lint checks is provided at scripts/pre-push. Install it once per clone:

ln -sf ../../scripts/pre-push .git/hooks/pre-push

After that, any git push automatically runs both lints and aborts with a diagnostic if either fails — catching the issue locally before CI sees it. Note that the hook does not run cargo clippy or cargo test (those are slow); run those manually before pushing Rust changes.
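For orientation, a hook of this kind can be as small as the sketch below. It is not the contents of scripts/pre-push; the `run_lints` helper and its message text are assumptions.

```sh
#!/bin/sh
# Sketch of a pre-push hook body; the real scripts/pre-push may differ.
# git runs pre-push from the root of the working tree, so relative
# scripts/ paths resolve without extra path handling.

# Run every lint script given as an argument. All lints run even if an
# earlier one fails, so a single push attempt reports every problem.
run_lints() {
    status=0
    for lint in "$@"; do
        if ! sh "$lint"; then
            echo "pre-push: $lint failed, push aborted" >&2
            status=1
        fi
    done
    return "$status"
}
```

A hook built on this would end with `run_lints scripts/check-issue-refs.sh scripts/check-comment-blocks.sh || exit 1`; a non-zero exit from a pre-push hook makes git abort the push.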

Operator bootstrap

Set services.hyperhive.forge.ci.enable = true in the host NixOS config. That's it — no manual token provisioning.

Requirements:

The internal forge is always present (mandatory), so the runner always has a hive-forge instance to register against — nothing extra to enable.
Optional: tune services.hyperhive.forge.ci.name (runner name in the forge admin panel), concurrency (parallel job capacity), labels (workflow targeting), and jobTimeout (per-job wall-clock cap as a Go duration string, e.g. "3h"; default "1h"). A job that exceeds jobTimeout is killed, so a hung or runaway build can't hold a runner slot indefinitely.

Container design

Private netns, bridge-attached: the container runs in its own network namespace (privateNetwork = true, hostBridge) and reaches hive-forge through the gateway at http://<forge.domain> (resolved to the bridge IP via networking.extraHosts). It cannot reach host-loopback services — the core dashboard at 127.0.0.1:7000 and the raw forge port are unreachable from CI. Requires forge.behindGateway = true.
Non-ephemeral: runner credentials persist across restarts (written to the container's stateDir on first registration, reused thereafter).
Sandbox fallback: nspawn containers can't create user namespaces, so nix's sandboxing would always fail. The module sets nix.settings.sandbox-fallback = true in the container, so nix builds run unsandboxed (acceptable because the container is itself isolated; the security section below covers the limits). See docs/gotchas.md.
Credential isolation: the forge admin token (forge-core-token) never enters the container. hive-c0re holds it and performs all forge API calls (runner validation + registration-token mint, in forge/ci_runner.rs); via hive-priv it writes only the runner registration token to the host env-file /run/hive-ci/runner-token, which the container bind-mounts read-only.

Auto-registration flow

Registration is off the container's boot-critical path — hive-c0re owns it and runs it out of band, so a slow forge or core-token never delays the container's start. (The earlier design ran a host-side hive-ci-prefetch.service that gated container@hive-ci start on a forge round-trip, which could exceed the nspawn start timeout and trip a restart loop; moving registration into hive-c0re removed that.) The core admin token is held only by hive-c0re on the host; only the runner registration token reaches the container.

hive-c0re side (`forge/ci_runner.rs`, run during the startup sweep)

Gated on HYPERHIVE_FORGE_CI_ENABLED (the nix module sets it on hive-c0re.service when forge.ci.enable). Best-effort — failures are logged and never abort the sweep; a healthy runner is never restarted.

If .runner exists at /var/lib/nixos-containers/hive-ci/var/lib/gitea-runner/hive/.runner, validate its id against GET /api/v1/admin/runners/{id} with the core admin token:
- 200: still registered — done, no restart.
- 404 / other non-200 / malformed: stale — re-register (below).
- transport error (forge unreachable): keep the existing creds; a network blip must not wipe a valid runner.
If absent or stale: mint a fresh token from GET /api/v1/admin/runners/registration-token, then hand it to hive-priv's RegisterCiRunner, which (as root) writes TOKEN=<real> in place to the host env-file /run/hive-ci/runner-token (preserving the inode nspawn pinned into the container at start) and restarts gitea-runner-hive.service inside the container so it picks up the credential and registers.
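The branches above reduce to a small mapping from the validation outcome to an action. The real logic is Rust code in forge/ci_runner.rs; the shell function below only mirrors the documented branches, and its name and the outcome labels (absent, unreachable, malformed) are made up for illustration.

```sh
#!/bin/sh
# Mirrors the documented branches only; not the hive-c0re implementation.
# OUTCOME ($1) is one of (labels are assumptions of this sketch):
#   absent       no .runner file on disk
#   unreachable  transport error talking to the forge
#   malformed    the stored id or the API response could not be parsed
#   <digits>     HTTP status from GET /api/v1/admin/runners/{id}
decide_registration() {
    case "$1" in
        200)         echo keep ;;      # still registered: no restart
        unreachable) echo keep ;;      # a network blip must not wipe creds
        *)           echo register ;;  # absent, 404, other non-200, malformed
    esac
}
```

Only the `register` outcome leads to a token mint and a restart of gitea-runner-hive.service; both `keep` cases leave the running service untouched.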

Container side

The container boots immediately — nothing gates its start on registration.
tmpfiles seeds /run/hive-ci/runner-token with TOKEN=placeholder so the runner's EnvironmentFile always exists.
gitea-runner-hive.service has an ExecStartPre precond (ahead of the nix-daemon wait) that fails fast unless it is already registered (.runner present) or a real, non-placeholder token is in place. Restart=on-failure (no start-limit cap) self-heals it: a runner that precond-fails at boot keeps retrying until hive-c0re writes the token (c0re's explicit restart is the primary path; the retry is the safety net).
Convergence: because the token write targets the host file, even if c0re's restart races the container being down, the container later starts, reads the now-real token, passes the precond, and registers on its own.
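The precondition can be pictured as a check like the one below. This is a sketch, not the ExecStartPre script the module installs: the function name and argument order are assumptions, and the token file is assumed to hold a single TOKEN=... line as described above.

```sh
#!/bin/sh
# Hypothetical precondition check. Succeeds (returns 0) when the runner
# may start, fails (non-zero) when it should back off and retry.
#   $1  path to the .runner credentials file
#   $2  path to the env-file carrying TOKEN=<value>
runner_precond() {
    runner_file=$1
    token_file=$2

    # Already registered: persisted credentials are enough.
    [ -e "$runner_file" ] && return 0

    # Otherwise a real registration token must be in place.
    [ -r "$token_file" ] || return 1
    token=$(sed -n 's/^TOKEN=//p' "$token_file" | head -n 1)
    [ -n "$token" ] && [ "$token" != "placeholder" ]
}
```

Combined with Restart=on-failure, a non-zero return from a check like this is what keeps the unit retrying until the real token has been written.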

Actions checkout mirror

When forge.ci.enable is set, hive-c0re auto-seeds an actions/checkout pull-mirror on the local forge and sets Forgejo's DEFAULT_ACTIONS_URL to point at the local instance. As a result, `uses: actions/checkout@vN` steps in CI workflows resolve entirely against the local forge, with no external DNS on the CI critical path.

The mirror is seeded by hive-c0re itself during its forge provisioning sweep (forge.rs::ensure_mirrors). The nix module forwards the effective mirror list as HYPERHIVE_FORGE_MIRRORS in the hive-c0re service environment (JSON-encoded [{upstream, dest}] list). hive-c0re already holds the admin token for the rest of the forge provisioning sweep (orgs, agent accounts, etc.), so mirror seeding lives in the same place rather than a separate host-side unit.

General-purpose mirrors: you can pre-seed any external repo as a pull-mirror via services.hyperhive.forge.mirrors:

services.hyperhive.forge.mirrors = [
  { upstream = "https://github.com/actions/checkout"; dest = "actions/checkout"; }
  { upstream = "https://github.com/example/tool";    dest = "mirrors/tool"; }
];

Each entry is created as a real Forgejo pull-mirror, not a one-off clone. Forgejo re-syncs the mirror in the background on its own interval and serves clones from its local copy, so a DNS blip during a sync only delays the refresh and does not surface to the runner as a git clone failure. The <owner> org in dest is auto-created. Keep mirror dests out of the hive-c0re-managed namespaces (config/, shared/, agents/, core/) to avoid provisioning collisions.
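A pre-flight check along the following lines can catch a colliding dest before deployment. It is a hypothetical helper, not something the module is documented to run; only the four reserved prefixes come from the text above, and the owner/repo shape of dest is inferred from the examples.

```sh
#!/bin/sh
# Hypothetical helper: returns 0 if DEST ($1) looks like "owner/repo" and
# does not fall inside a hive-c0re-managed namespace, 1 otherwise.
mirror_dest_ok() {
    case "$1" in
        config/*|shared/*|agents/*|core/*) return 1 ;;  # reserved namespaces
        */*/*) return 1 ;;                               # more than owner/repo
        ?*/?*) return 0 ;;                               # plain owner/repo
        *)     return 1 ;;                               # missing owner or repo
    esac
}
```

For example, `mirror_dest_ok mirrors/tool` succeeds, while `mirror_dest_ok core/tool` fails.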

CI workflow

Three jobs are defined in .forgejo/workflows/ci.yml: nix flake check, tracker-tag lint, and comment-block lint. Only nix flake check is currently required; a failure in either lint shows red but does not block merge while the legacy backlog is cleaned up. hive-forge ci-rerun --pr N dispatches a workflow_dispatch retrigger without an empty commit.

Security: unsandboxed builds and trusted contributors

hive-ci should only run CI for trusted contributors. The security boundary is weaker than it looks:

What unsandboxed builds mean

nspawn containers cannot create user-namespaces, so nix.settings.sandbox-fallback = true is set in the container. This means every nix build (and nix flake check) runs without a build sandbox — the build process has full access to the container filesystem, network, and any bind-mounts during the build phase.

A malicious default.nix or build script in a PR can therefore:

Make arbitrary network requests to any address reachable from the container. The container runs in its own netns behind the hive bridge, so it reaches the forge only through the gateway (http://<forge.domain>, public/read endpoints — no admin credentials) and cannot reach host-loopback services: the unauthenticated core dashboard at 127.0.0.1:7000 and the raw forge port are off-limits (bridge→127.0.0.0/8 is dropped).
Write to the container filesystem, including corrupting the runner's state dir or .runner credentials.

The core admin token (forge-core-token) is not bind-mounted into the container. It is held and used only by hive-c0re on the host (forge/ci_runner.rs), which mints per-runner registration tokens; only that registration token reaches the container's env-file. A build process can still reach forge over the network, but cannot use the admin token to issue privileged API calls.

Note: nix flake check --no-build (eval-only) reduces the attack surface but does not eliminate it — builtins.fetchGit, builtins.fetchurl, and import-from-derivation can reach the network and filesystem during evaluation. The default CI workflow runs full nix flake check (builds derivations), which is the higher-risk path.

Mitigation

For a hive used by a single operator or a small trusted team, the risk is low — all contributors are already trusted with forge access anyway.

For repos with external contributors or fork PRs:

Use Forgejo's fork PR approval workflow (repository.settings → "Require approval for fork PRs from first-time contributors") to gate CI until a maintainer approves the first PR.
Or restrict the CI workflow trigger to push events on branches (not pull_request from forks) — forks can't push to upstream branches.

The current design is appropriate for a trusted-team hive where all contributors have implicit forge access.

Host store maintenance (recommended)

The CI runner builds derivations through the host nix-daemon — the hive-ci container shares the host store and has no daemon of its own. Build outputs accumulate in /nix/store with no automatic collection, and a busy CI day can fill the disk until every job fails fast with ENOSPC.

Store GC is a host-level concern, so it belongs in the host's own NixOS configuration, not in the hyperhive service modules — a single service should not reach out and change the host's global nix-daemon options. Add the following to your host config:

{
  # Daily GC: delete store paths not referenced by a live root and older
  # than a day. Keeps the store bounded between builds.
  nix.gc = {
    automatic = true;
    dates = "daily";
    options = "--delete-older-than 1d";
  };

  # Disk-pressure GC: when free space drops below min-free mid-build, the
  # daemon collects garbage up to max-free before continuing. This is the
  # real-time net the daily timer can't provide — a same-day build burst is
  # what fills the disk. Tune to your disk size.
  nix.settings.min-free = 20 * 1024 * 1024 * 1024;  # 20 GiB
  nix.settings.max-free = 50 * 1024 * 1024 * 1024;  # 50 GiB
}

Remote builders: if CI dispatches builds to a remote builder (e.g. via nix.buildMachines / ssh-ng://), the build outputs land in that host's store, so the same GC config should be applied wherever the builder runs — GC on the coordinator host won't reclaim space on the builder.

References

nix/host-modules/hive-ci.nix: runner configuration, auto-registration script, container setup.
.forgejo/workflows/ci.yml: workflow definition.
docs/gotchas.md: nix sandboxing limitations in containers.