hive-gateway

Single nginx in front of every hyperhive web surface. Container hive-gateway, shared host netns, system-config (not meta-flake managed). Configured via services.hyperhive.gateway.* + per-subsystem opt-in flags in services.hyperhive.{forge,matrix,...}.

Vhost map

| URL | vhost | upstream | source |
|---|---|---|---|
| <hive>/ | _ (catch-all) | hive-c0re dashboard (7000) | always |
| <hive>/agent/<name>/ | _ | per-agent harness (UDS or TCP) | agents.conf (runtime-generated) |
| <hive>/.well-known/matrix/{client,server} | _ | inline JSON (no upstream) | matrix.enable && domain != null |
| <hive>/matrix/ (deprecated) | _ | 301 → matrix.<hive>/ | matrix.gui.enable |
| forge.<hive>/ | forge.<hive> | forgejo (3000) | forge.behindGateway |
| matrix.<hive>/_matrix/* | matrix.<hive> | tuwunel (8008) | matrix.gatewayHost != null |
| matrix.<hive>/ | matrix.<hive> | fluffychat-web static | matrix.gui.enable |
| matrix.<hive>/config.json | matrix.<hive> | inline JSON (FluffyChat boot config) | matrix.gui.enable && domain != null |

Per-agent UIs stay sub-path because they're hyperhive-internal and base-path-aware. External standard apps (forge / matrix) get sub-domains because their defaults work cleanly at a sub-domain root and because per-origin cookie / storage isolation matters.

Discovery flow (matrix)

Operator points client at <hive>. Sequence:

  1. Client fetches https://<hive>/.well-known/matrix/client, which returns {"m.homeserver":{"base_url":"https://matrix.<hive>"}} (no port suffix when the gateway listens on 443). In http-only mode (selfSignedTls = false with no tls.certDir or tls.acme) the scheme drops to http and the port suffix reflects the bare port instead.
  2. Client connects to matrix.<hive>/_matrix/client/....
  3. Gateway routes /_matrix/* → tuwunel at 127.0.0.1:8008.

matrix-dart-sdk (FluffyChat etc.) hardcodes https for the well-known fetch regardless of input scheme, so the discovery endpoint MUST be https — see "Self-signed TLS" below for the cert generation that backs the default-on path.

Federation peers fetch .well-known/matrix/server, which returns {"m.server":"matrix.<hive>"}, and connect to matrix.<hive>:8448 per spec default. The gateway only listens on the configured port (+ httpsPort when TLS is on); cross-hive federation needs either an SRV record (_matrix._tcp.matrix.<hive> → port 80 / 443) OR matrix.openFirewall = true so peers reach tuwunel's federation port directly. Hyperhive is mostly closed/internal, so this rarely bites.

SPA fallback (Accept-header pattern)

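The <hive> catch-all and the matrix.<hive> vhost both serve a Flutter SPA (per-agent UI, fluffychat). Two requirements collide: browser navigations to client-side routes must receive index.html, while requests for missing assets must still return a real 404 rather than the HTML shell.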

Solution: an nginx http-context map $http_accept $matrix_spa_target { ... } keyed on the request's Accept header. Browser navigations (Accept: text/html,...) get index.html; asset fetches (Accept: image/*, */*, etc.) get a sentinel nonexistent path → try_files falls through to =404. No extension allowlist, no if block, no regex heuristics.
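The decision that map encodes can be modelled in a few lines of Rust. This is a sketch of the logic only, not the nginx config itself; the sentinel path name and the plain substring match on text/html are assumptions made for illustration.

```rust
/// Hypothetical sentinel: a path assumed never to exist on disk, so that
/// try_files falls through to =404.
const MISSING_SENTINEL: &str = "/__spa_missing__";

/// Model of the Accept-keyed map: HTML navigations get the SPA shell,
/// every other request (assets, XHR, missing header) gets the sentinel.
fn spa_target(accept: Option<&str>) -> &'static str {
    match accept {
        Some(value) if value.to_ascii_lowercase().contains("text/html") => "/index.html",
        _ => MISSING_SENTINEL,
    }
}
```

A browser navigation sends an Accept header that lists text/html, so it lands on index.html; an image or script fetch does not, so a missing asset stays a 404.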

Local dev (localHostsEntry)

services.hyperhive.gateway.localHostsEntry = true adds the gateway's hostnames (the hive domain and its sub-domains) to the host's /etc/hosts.

lib.unique de-dupes if any sub-domain happens to equal another entry. Operators with real DNS leave it off.

Sub-domain shape (rationale)

Operator decision: sub-domain over sub-path for forge + matrix, sub-path for per-agent UIs.

services.hyperhive.{forge.domain,matrix.gatewayHost} take the full hostname (forge.darkest.space, git.example.com) rather than a label that gets concatenated with hive-domain — operators want control over the full shape, not a forced <label>.<hive-domain> pattern.

Tuning knobs

Per-vhost timeouts + body-size limits live in each vhost's location blocks.

SSH for forge stays direct on cfg.sshPort — separate listener protocol, not HTTP-over-nginx.

Per-agent unix-socket upstream

All agents bind their web UI on a unix-domain socket at /run/hive-agent/<name>/web.sock — the HIVE_WEB_SOCKET env var is now set unconditionally for every agent. The mechanism:

  1. Agent side. HIVE_WEB_SOCKET=/run/hive-agent/<name>/web.sock is set on every harness service env; web_ui::serve binds a UnixListener at that path. The deprecated hyperhive.web.useUnixSocket option is now a no-op.
  2. Host side. hive-c0re bind-mounts the per-agent subdir (/run/hive-agent/<name>/) into the agent's container. Dir bind, not file bind — file bind-mounts don't survive the harness's unlink + bind(2) cycle on socket replace. Per-agent subdir keeps each agent's container blind to siblings' sockets.
  3. Marker gate. After successful bind_unix, the harness drops <dir>/hyperhive-socket-bound next to the socket. c0re's agent_sockets::write filters its JSON map by marker presence — only agents whose harness has actually bound the socket appear there. (Legacy name .bound also accepted during the transition window.)
  4. Gateway side. gateway_nginx::write generates /var/lib/hyperhive/gateway/agents.conf — a plain nginx include file with one location /agent/<name>/ block per agent. UDS upstream (http://unix:/run/hive-agent/<name>/web.sock:/) when hyperhive-socket-bound marker present; TCP loopback for agents that haven't yet been rebuilt under the new config. The gateway container bind-mounts /var/lib/hyperhive/gateway/ at /run/hive-state/; nginx includes /run/hive-state/agents.conf. After each write, c0re triggers the appropriate nginx action inside the gateway container via hive-priv (which runs as root and has --machine=hive-gateway transport rights that hive-c0re lacks). hive-priv queries ActiveState and dispatches:
    • active → systemctl reload nginx (SIGHUP, zero-downtime)
    • failed → systemctl reset-failed nginx + systemctl start nginx
    • otherwise → systemctl start nginx

     This is intentionally host-side: IN_MOVED_TO from an atomic rename does not propagate across the nspawn mount-namespace boundary, so a path unit inside the container would never fire.
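The ActiveState dispatch in step 4 amounts to a three-way match. The sketch below is illustrative; the enum and function names are hypothetical and are not taken from hive-priv's source.

```rust
/// What to do with nginx inside the gateway container.
#[derive(Debug, PartialEq, Eq)]
enum NginxAction {
    Reload,
    ResetFailedThenStart,
    Start,
}

/// Map the unit's ActiveState string to an action, as listed above.
fn action_for(active_state: &str) -> NginxAction {
    match active_state.trim() {
        "active" => NginxAction::Reload,
        "failed" => NginxAction::ResetFailedThenStart,
        _ => NginxAction::Start,
    }
}

/// systemctl argument vectors for each action. The --machine transport
/// flag mentioned above would be added by the caller.
fn commands(action: &NginxAction) -> Vec<Vec<&'static str>> {
    match action {
        NginxAction::Reload => vec![vec!["systemctl", "reload", "nginx"]],
        NginxAction::ResetFailedThenStart => vec![
            vec!["systemctl", "reset-failed", "nginx"],
            vec!["systemctl", "start", "nginx"],
        ],
        NginxAction::Start => vec![vec!["systemctl", "start", "nginx"]],
    }
}
```

Keeping the state-to-action mapping separate from command execution makes the dispatch easy to test without a running container.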

c0re regenerates agents.conf (and triggers a reload) on two triggers: every topology change (new/removed agents) and every 10s marker poll tick (agent_sockets::spawn_poll). write() is idempotent — skips the rename when content is unchanged. Failed reloads are retried automatically on subsequent poll ticks via gateway_nginx::reload_if_pending.
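On each poll tick the generator has to decide, per agent, between the unix-socket upstream and the TCP fallback. A minimal sketch of that marker check follows; the type and function names are hypothetical, and run_root stands in for /run/hive-agent.

```rust
use std::path::Path;

/// Upstream that the generated location block should proxy to.
#[derive(Debug, PartialEq, Eq)]
enum Upstream {
    Unix(String),
    Tcp(u16),
}

/// Choose the unix socket when the harness has dropped its marker file,
/// otherwise fall back to the agent's TCP loopback port.
fn upstream_for(run_root: &Path, name: &str, tcp_port: u16) -> Upstream {
    let dir = run_root.join(name);
    // Current marker name plus the legacy `.bound` name from the transition window.
    let bound = ["hyperhive-socket-bound", ".bound"]
        .iter()
        .any(|marker| dir.join(marker).exists());
    if bound {
        Upstream::Unix(format!("http://unix:{}:/", dir.join("web.sock").display()))
    } else {
        Upstream::Tcp(tcp_port)
    }
}

/// Render the proxy_pass target for an upstream.
fn proxy_pass(upstream: &Upstream) -> String {
    match upstream {
        Upstream::Unix(url) => url.clone(),
        Upstream::Tcp(port) => format!("http://127.0.0.1:{port}/"),
    }
}
```

Because the choice depends only on marker presence, re-running it every 10s is cheap, and an agent flips from TCP to UDS on the first tick after its harness binds the socket.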

Agent port map (agent-ports.json)

/var/lib/hyperhive/run/agent-ports.json is a flat JSON object keyed by logical agent name → TCP web port:

{
  "iris":     8178,
  "atlas":    8304,
  "argus":    8267,
  "damocles": 8549
}

Written alongside agents.conf on every topology change. Ports come from lifecycle::agent_web_port(name) — a pure FNV-1a hash of the name, reproducible from the name alone. The manager is excluded: it always uses a unix socket (HIVE_WEB_SOCKET is unconditionally set for the manager role), so its TCP port never appears in the fallback map. The gateway routes /agent/root/ to the manager's unix socket via agents.conf alongside sub-agents.

The file doubles as a human-readable audit artifact — cat agent-ports.json shows every registered sub-agent and its deterministic port assignment. TCP loopback upstreams in agents.conf reference these ports for agents that haven't opted into unix-socket mode yet.
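For readers unfamiliar with FNV-1a, a name-to-port derivation of this kind can be sketched as below. Two things are assumptions here: the 32-bit hash width and the modulo fold into 8100..=8999. The real lifecycle::agent_web_port may differ on both, so this sketch is not claimed to reproduce the ports in the example above.

```rust
/// 32-bit FNV-1a over the UTF-8 bytes of `name`.
fn fnv1a32(name: &str) -> u32 {
    const OFFSET_BASIS: u32 = 0x811c_9dc5;
    const PRIME: u32 = 0x0100_0193;
    name.bytes().fold(OFFSET_BASIS, |hash, byte| {
        // XOR the byte in first, then multiply: that order is what makes it FNV-1a.
        (hash ^ u32::from(byte)).wrapping_mul(PRIME)
    })
}

/// Hypothetical fold of the hash into the per-agent port range.
fn sketch_agent_web_port(name: &str) -> u16 {
    const BASE: u32 = 8100;
    const SPAN: u32 = 900; // 8100..=8999
    (BASE + fnv1a32(name) % SPAN) as u16
}
```

The property that matters is determinism: the same name always yields the same port, so any component can recompute it without shared state.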

Both agent-ports.json and agents.conf use atomic <path>.tmp + rename() writes so a crashing c0re process never leaves a partial or unparseable file behind.
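The write pattern, combined with the skip-when-unchanged behaviour described for write(), might look like the following. It is a minimal sketch with a hypothetical function name; a production version may also fsync the temp file and its directory before renaming.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write `content` to `path` via `<path>.tmp` + rename.
/// Returns Ok(false) when the file already holds exactly `content`.
fn write_atomic_if_changed(path: &Path, content: &[u8]) -> io::Result<bool> {
    // Idempotence: identical content means no rename and no reload trigger.
    if let Ok(existing) = fs::read(path) {
        if existing == content {
            return Ok(false);
        }
    }
    // Build "<path>.tmp" next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    fs::write(&tmp, content)?;
    // rename() replaces the target atomically: readers see old or new, never partial.
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

The boolean return lets the caller trigger an nginx reload only when the file actually changed.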

When the gateway is in front, the SW4RM tab builds per-agent links as same-origin /agent/<name>/… URLs instead of the legacy direct http://<host>:<container.port>/ TCP shape. The signal comes from StateSnapshot.gateway_enabled, sourced from the HIVE_GATEWAY_ENABLED env the c0re NixOS module sets when services.hyperhive.gateway.enable = true. Three render sites flip together: the primary agent-name link, the favicon fetch (<url>/icon), and the nav-strip container-kind links from DashboardState.links (GET /api/dashboard-state). forge-kind nav-strip links still resolve against http://<host>:3000 (separate sub-domain transition tracked by forge.behindGateway); external-kind links are already absolute. See docs/web-ui/dashboard.md::Container row for the frontend-side derivation.
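The link flip can be illustrated with a small helper. The dashboard frontend itself is not shown in this document, so the Rust below only models the rule; the function names and parameters are hypothetical.

```rust
/// Build the per-agent link: same-origin sub-path behind the gateway,
/// direct TCP URL otherwise.
fn agent_link(gateway_enabled: bool, host: &str, name: &str, port: u16) -> String {
    if gateway_enabled {
        format!("/agent/{name}/")
    } else {
        format!("http://{host}:{port}/")
    }
}

/// The favicon lives at <url>/icon relative to the agent link.
fn icon_url(agent_url: &str) -> String {
    format!("{}/icon", agent_url.trim_end_matches('/'))
}
```

Deriving the icon URL from the agent link keeps the render sites consistent: flipping gateway_enabled changes both at once.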

TLS modes

Four modes:

| mode | config | cert source | .well-known scheme |
|---|---|---|---|
| self-signed (default) | selfSignedTls = true | auto-generated RSA-4096, 10-year | https |
| ACME (Let's Encrypt) | selfSignedTls = false + tls.acme.enable = true | nginx inside container via HTTP-01 | https |
| operator cert | selfSignedTls = false + tls.certDir set | bind-mounted from host | https |
| http-only | selfSignedTls = false, no tls.certDir, no tls.acme | none | http |
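The four rows, together with the mutual-exclusion assertions described in the sections below, can be read as a small decision function. The sketch models the rules in Rust; it is not the NixOS module's actual assertion code, and all names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
enum TlsMode {
    SelfSigned,
    Acme,
    OperatorCert,
    HttpOnly,
}

/// The three options that select a mode.
struct TlsConfig {
    self_signed_tls: bool,
    acme_enable: bool,
    cert_dir: Option<String>,
}

/// Resolve the mode, rejecting any combination of two or more cert sources.
fn resolve_tls_mode(cfg: &TlsConfig) -> Result<TlsMode, String> {
    match (cfg.self_signed_tls, cfg.acme_enable, cfg.cert_dir.is_some()) {
        (true, false, false) => Ok(TlsMode::SelfSigned),
        (false, true, false) => Ok(TlsMode::Acme),
        (false, false, true) => Ok(TlsMode::OperatorCert),
        (false, false, false) => Ok(TlsMode::HttpOnly),
        _ => Err("selfSignedTls, tls.acme.enable and tls.certDir are mutually exclusive".to_string()),
    }
}

/// Scheme emitted in .well-known responses for a mode.
fn well_known_scheme(mode: &TlsMode) -> &'static str {
    match mode {
        TlsMode::HttpOnly => "http",
        _ => "https",
    }
}
```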

ACME / Let's Encrypt (tls.acme)

Simplest production path for operators with a public domain:

services.hyperhive.gateway = {
  selfSignedTls = false;
  openFirewall  = true;
  tls.acme = {
    enable = true;
    email  = "admin@example.com";
  };
};

nginx inside the gateway container obtains and auto-renews certs via the ACME HTTP-01 challenge on port (default 80). The gateway container shares the host network namespace (privateNetwork = false) so outbound ACME requests work without any extra routing. Certs are stored inside the container's persistent state dir (/var/lib/acme/ inside hive-gateway; survives restarts because ephemeral = false).

Requirements: services.hyperhive.domain must be publicly DNS-resolvable to this host, and openFirewall = true so Let's Encrypt can reach /.well-known/acme-challenge/. Each active vhost (main domain, forge.<domain>, matrix.<domain>) gets its own cert via separate ACME challenges.

Mutual exclusion: selfSignedTls = true or tls.certDir set together with tls.acme.enable = true fails an assertion.

Swarm peers: CA-signed certs are trusted by default — remote hives need no certFingerprint in swarm.peers.

Self-signed TLS (selfSignedTls)

On by default. The gateway generates a self-signed RSA-4096 cert at first boot (10-year validity) and listens on httpsPort (default 443) on every vhost, alongside the plain-http port (default 80).

Why on by default: matrix-dart-sdk (FluffyChat's SDK) hardcodes https://<host>/.well-known/matrix/client for homeserver discovery and refuses to fall back to plain http. Without TLS the browser client cannot bootstrap.

Cert shape: subject CN = bare hive domain; subjectAltName covers <hive> + wildcard *.<hive> so all current and future sub-domain vhosts (matrix, forge, ...) validate under the same cert. Stored at /var/lib/hive-gateway/tls/{cert,key}.pem inside the gateway container (ephemeral = false, so persisted across container restart).

Regeneration: the generator unit (hive-gateway-self-signed-cert.service) always runs (idempotent). To rotate (e.g. cert leak, expiry approaching), delete cert.pem inside the gateway container and restart nginx.service.

Cert prompts: browsers warn once per host on first visit. With the wildcard SAN, https://<hive>/, https://matrix.<hive>/, and https://forge.<hive>/ are covered by the same cert, but the browser still prompts per origin.

Operator-provided cert (tls.certDir)

For operators with a real CA cert (Let's Encrypt, corporate CA, etc.):

services.hyperhive.gateway = {
  selfSignedTls = false;
  tls.certDir = "/var/lib/acme/example.com";  # nixpkgs security.acme output dir
  # tls.certName = "cert.pem";   # default — matches security.acme layout
  # tls.keyName  = "key.pem";    # default — matches security.acme layout
};

The directory is bind-mounted read-only into the gateway container at /run/hive-tls/. nginx uses cert.pem + key.pem (override tls.certName/tls.keyName for different filenames). Like the self-signed mode, this mode listens on httpsPort (default 443) and emits https:// in .well-known responses.

selfSignedTls = true and tls.certDir set together is an assertion error.

Key file permissions: nixpkgs's security.acme outputs private keys as 0640 root:acme by default. nginx inside the gateway container runs as the nginx user and cannot read a key with that ownership. Fix with:

security.acme.certs."example.com".group = "nginx";

or make the key world-readable (0644) if your threat model allows it. nginx errors out at startup on a key it can't read — the error is explicit in the journal, not a silent failure.

Peer hive config: when using a CA-signed cert, peer hives can declare this hive without certFingerprint in swarm.peers — the standard CA bundle validates:

services.hyperhive.swarm.peers."example.com" = { };  # no certFingerprint needed

HTTP-only (selfSignedTls = false, no tls.certDir)

For operators who front the gateway with an external TLS terminator (caddy, traefik, nginx + ACME on the host). The gateway listens on port (default 80) only. .well-known/matrix responses use http://, which breaks matrix-dart-sdk discovery — acceptable when matrix GUI is off or the external proxy handles the .well-known redirect.

.well-known/matrix/{client,server} scheme: https when TLS is active (self-signed, ACME, or operator cert), http when http-only. The scheme must match what clients see at the external hostname.
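Putting the scheme rule together with the port-suffix rule from the discovery flow, the inline client document could be rendered as in this sketch. The helper names are hypothetical, and omitting the suffix for port 80 in http mode is an assumption made by analogy with the 443 case.

```rust
/// Base URL advertised to Matrix clients. The port suffix is dropped
/// when the port is the scheme's default (443 for https, 80 for http).
fn base_url(tls_active: bool, host: &str, port: u16) -> String {
    let (scheme, default_port) = if tls_active { ("https", 443) } else { ("http", 80) };
    if port == default_port {
        format!("{scheme}://{host}")
    } else {
        format!("{scheme}://{host}:{port}")
    }
}

/// Body of /.well-known/matrix/client. No JSON escaping is done, which
/// is fine for plain hostnames but not for arbitrary input.
fn well_known_client(tls_active: bool, matrix_host: &str, port: u16) -> String {
    format!(
        r#"{{"m.homeserver":{{"base_url":"{}"}}}}"#,
        base_url(tls_active, matrix_host, port)
    )
}
```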

Firewall posture (host-level)

hive-c0re.nix opens the per-agent web-port range 8100..8999 in the host firewall only when services.hyperhive.gateway.enable = false. With the gateway on (default), it's the sole external entry point and proxies to 127.0.0.1:<port> internally — leaving the per-agent ports firewall-open would defeat the single-front-door story.

services.hyperhive.gateway.openFirewall = true opens port plus httpsPort when selfSignedTls = true (default). Operators who flip selfSignedTls = false to front the gateway with a real TLS-terminating reverse proxy on the host get only port opened.

The manager hashes into the same port range as sub-agents (no "manager pinned at 8000" special case), so one range opening covers every container.

The dashboard port (cfg.dashboardPort, default 7000) is not listed in either case — it binds 127.0.0.1 only, so a firewall hole would be a no-op. Remote dashboard access flows through the gateway. Operators who opt out of the gateway lose external dashboard reach by design — the surface is privileged (approve / deny / destroy) and must not be exposed without a real reverse proxy in front.
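The firewall rules in this section reduce to a short decision procedure. The sketch below restates them in Rust for clarity; the struct and field names are hypothetical stand-ins for the NixOS options, and the actual module is written in Nix.

```rust
/// Stand-in for the relevant services.hyperhive.gateway options.
struct GatewayCfg {
    enable: bool,
    open_firewall: bool,
    self_signed_tls: bool,
    port: u16,
    https_port: u16,
}

/// Host firewall openings that result from the config.
#[derive(Debug, PartialEq, Eq)]
struct FirewallPlan {
    ports: Vec<u16>,
    agent_range: Option<(u16, u16)>,
}

fn firewall_plan(cfg: &GatewayCfg) -> FirewallPlan {
    if !cfg.enable {
        // No gateway: the per-agent web ports are the only way in.
        return FirewallPlan { ports: Vec::new(), agent_range: Some((8100, 8999)) };
    }
    let mut ports = Vec::new();
    if cfg.open_firewall {
        ports.push(cfg.port);
        if cfg.self_signed_tls {
            ports.push(cfg.https_port);
        }
    }
    // The dashboard port is never listed: it binds 127.0.0.1 only.
    FirewallPlan { ports, agent_range: None }
}
```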

HIVE_FORGE_URL: domain via gateway for isolated agents, loopback for shared-netns

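Agents poll HIVE_FORGE_URL for Forgejo notifications + run all hive-forge calls against it. hive-c0re.nix sets this based on the network isolation mode: network-isolated agents get the forge domain routed through the gateway, while shared-netns agents get the loopback address.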

hive-forge container shape

Private Forgejo wrapped in a nixos-container (hive-forge, not h-* — keeps c0re's lifecycle scanner out of the picture; the operator manages it via the standard nixos-container CLI). The container also keeps hive-forge from fighting any services.forgejo the operator already runs on the host — separate systemd namespace, separate state dir, separate port unless the operator deliberately collides.

Container shares the host network namespace (privateNetwork = false) so agents reach the forge at http://localhost:<httpPort> without extra plumbing — nixos-container is here for state + systemd-unit isolation, not network isolation.

State lives at /var/lib/nixos-containers/hive-forge/var/lib/forgejo/ and survives container restart / host reboot. To wipe, destroy the container.

Network and port configuration

services.hyperhive.forge = {
  httpPort   = 3000;   # default — HTTP listener; outside hyperhive's 7000/8100-8999 range
  sshPort    = 2222;   # default — git-over-SSH; kept off 22 so it doesn't collide with the host openssh
  openFirewall = false; # default; set to true to open httpPort + sshPort in the host firewall
};

httpPort (default 3000) is the port Forgejo's HTTP server binds to. It sits outside hyperhive's reserved ranges (dashboard 7000, agents 8100–8999) so a default install has no port fights. Change it only if you already have another process bound to 3000.

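sshPort (default 2222) is the port Forgejo's built-in SSH server uses for git clone/push/pull over SSH. Because the port is not 22, remotes need the URL form ssh://git@<domain>:2222/owner/repo.git; the scp-style git@<domain>:owner/repo.git shorthand cannot carry a port. Port 22 is left alone on the host for openssh.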

openFirewall (default false) controls whether httpPort and sshPort are opened in the host firewall. Off by default (secure by default): every agent container reaches Forgejo at localhost:<httpPort> via the shared host netns without a firewall hole. Flip to true when clients outside the host need to reach httpPort or sshPort directly.

Forgejo served through the gateway (forge.behindGateway = true) does not need openFirewall — the gateway's own openFirewall option covers that path.

rootUrl override

services.hyperhive.forge.rootUrl = "https://forge.example.com/";

rootUrl (default null) overrides the Forgejo ROOT_URL that is auto-derived from forge.domain + gateway state. The auto-derivation covers most cases:

| Shape | Auto-derived ROOT_URL |
|---|---|
| behindGateway = true | http://<forge.domain>/ (port suffix omitted when gateway.port == 80) |
| behindGateway = false | http://<forge.domain>:<httpPort>/ |

The auto-derivation always uses http://. Set rootUrl explicitly when you need https:// (e.g. behind a TLS-terminating reverse proxy or when selfSignedTls = true and clone URLs must carry https://), or when forge.domain resolves differently from the public URL. Must end with / (Forgejo requirement; an assertion enforces this).
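The derivation and the trailing-slash assertion can be summarised in one function. This is a Rust rendering of the rules above for illustration only; the real logic lives in the NixOS module, and the function and parameter names here are hypothetical.

```rust
/// Compute Forgejo's ROOT_URL: an explicit override wins (and must end
/// with '/'), otherwise derive an http:// URL from the domain and ports.
fn root_url(
    override_url: Option<&str>,
    forge_domain: &str,
    behind_gateway: bool,
    gateway_port: u16,
    http_port: u16,
) -> Result<String, String> {
    if let Some(url) = override_url {
        return if url.ends_with('/') {
            Ok(url.to_string())
        } else {
            Err(format!("rootUrl must end with '/': {url}"))
        };
    }
    Ok(if behind_gateway {
        if gateway_port == 80 {
            format!("http://{forge_domain}/")
        } else {
            format!("http://{forge_domain}:{gateway_port}/")
        }
    } else {
        format!("http://{forge_domain}:{http_port}/")
    })
}
```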

Per-agent static frontend split

When services.hyperhive.frontend is configured, hive-c0re injects HIVE_AGENT_FRONTEND_DIR = "${cfg.frontend}/agent" into its service environment. The nginx include generator (gateway_nginx::write) reads this variable and, when set, emits split location blocks per agent instead of the legacy single-proxy block.

Location priority for /agent/<name>/...:

# 1. Compiled assets — content-addressed nix store path, cache forever
location ^~ /agent/<name>/static/ {
    alias <frontend>/static/;
    expires 1y;
    add_header Cache-Control "public, immutable, max-age=31536000";
}

# 2. Static dist + proxy fallback for dynamic paths
location /agent/<name>/ {
    alias <frontend>/;
    try_files $uri $uri.html $uri/index.html @<name>_dynamic;
}

# 3. Proxy catchall — API, events, icon, send, login, …
location @<name>_dynamic {
    proxy_pass <upstream>;
    proxy_set_header X-Forwarded-Prefix /agent/<name>;
    proxy_intercept_errors on;
    error_page 502 503 504 = /__hive_agent_unreachable;
    # … (full proxy header block)
}

try_files resolution (nginx applies the alias mapping before checking file existence):

| request | resolved | outcome |
|---|---|---|
| /agent/iris/ | <frontend>/index.html | main agent page |
| /agent/iris/stats | <frontend>/stats.html | stats page |
| /agent/iris/screen | <frontend>/screen.html | screen page |
| /agent/iris/static/app.js | caught by ^~ block first | served with immutable cache |
| /agent/iris/api/state | no file match → @iris_dynamic | proxied to agent daemon |
| /agent/iris/events/live | no file match → @iris_dynamic | proxied (SSE) |

Adding a new HTML page to the frontend dist (dist/<page>.html) automatically makes it reachable at /agent/<name>/<page> — no generator change needed.

Why ^~ for /static/: the ^~ prefix gives this block higher priority than the plain prefix location /agent/<name>/, so compiled JS/CSS assets skip try_files entirely and get the immutable cache headers. Nix store paths are content-addressed — the hash changes on any content change — so max-age=31536000 is safe.

Why nix store is reachable from the gateway container: nspawn containers bind-mount /nix/store read-only by default. The HIVE_AGENT_FRONTEND_DIR path is a nix store path baked in at hive-c0re build time — the same path is visible to both c0re (writing agents.conf) and the gateway nginx (serving files from it).

Graceful degradation: if HIVE_AGENT_FRONTEND_DIR is empty or unset (e.g. a build that predates cfg.frontend), each agent gets the legacy single-proxy block and all traffic is forwarded to the agent daemon as before.

extraFiles: per-agent hyperhive.frontend.extraFiles are in mergedDist, not in the base cfg.frontend dist. They are not under the nix-store alias path, so requests for them fall through try_files to @<name>_dynamic and are served by the agent daemon as before.

Per-agent error pages

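/agent/<name>/ requests hit two failure modes: the agent is not found (not-found.html), or its upstream is unreachable (unreachable.html, served for 502 / 503 / 504). Both get static HTML pages instead of nginx's default error chrome.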

Both pages are built at deploy time via pkgs.runCommand (one nix derivation hyperhive-agent-error-pages with not-found.html + unreachable.html inside) and served via two internal nginx locations with alias to the exact file. internal keeps the files from being directly request-able by operators — only nginx's own error-handling can reach them.

Page styling: minimal inline CSS matching the dashboard's catppuccin palette (#1e1e2e bg, #cdd6f4 text, #cba6f7 heading). No dependencies on the frontend dist — these pages render even when hive-c0re itself is down.

Scope is intentionally narrow: only routes already special-cased in the nginx config get custom error pages. Other gateway routes (forge / matrix / fluffychat) get nginx defaults — extending the custom-error pattern there is a separate follow-up.

HTTP Basic auth

services.hyperhive.gateway.auth.enable = true gates every request to the main vhost (_) behind HTTP Basic auth. nginx's built-in auth_basic module validates credentials; no extra service or host-side daemon is required.

Setup:

services.hyperhive.gateway.auth = {
  enable = true;
  # realm = "hyperhive";  # optional, default shown
};

The credential store lives at the fixed path /var/lib/hyperhive/gateway/gateway.htpasswd on the host. A tmpfiles rule pre-creates the file on first boot; no manual path configuration is required. The file is exposed inside the gateway container at /run/hive-state/gateway.htpasswd via the existing gateway state bind-mount.

Manage users with hivectl gateway (defaults to the standard path — no --file flag needed for the common case):

# Add or update a user (prompted for password):
hivectl gateway create-user alice --password-stdin

# Add with inline password (visible in shell history — avoid for sensitive creds):
hivectl gateway create-user bob --password hunter2

# Remove a user:
hivectl gateway delete-user bob

# List current usernames:
hivectl gateway list-users

hivectl gateway create-user hashes passwords with BCrypt (cost 12) and writes $2y$-prefixed hashes that nginx accepts natively. No external htpasswd binary is required. Pass --file <path> to target a non-default file.

What is not gated: per-agent UI routes emitted into agents.conf (served under /agent/<name>/) inherit no auth from / — nginx applies auth_basic per-location. Full per-agent coverage is a follow-up.

Realm: the WWW-Authenticate: Basic realm="..." string browsers display in the credential dialog. Defaults to "hyperhive". Must not contain " or $.

Custom 401 page: when credentials are absent or wrong, nginx serves a Catppuccin-styled unauthorized.html page (built into the same Nix derivation as the agent error pages) that tells the operator which hivectl command to run to create a user. The response status is still 401 (error_page 401 =401 /__hive_auth_unauthorized) so browsers present the login dialog on the first visit — users who dismiss the dialog see the human-readable hint. The internal exact-match location (= /__hive_auth_unauthorized) beats location / in nginx's prefix ordering, preventing the subrequest from looping back through auth_basic.

Security headers

The following headers are emitted at server scope on every gateway vhost (_, forge.<domain>, matrix.<domain>):

| Header | Value |
|---|---|
| X-Frame-Options | SAMEORIGIN |
| X-Content-Type-Options | nosniff |
| Referrer-Policy | strict-origin-when-cross-origin |

nginx's add_header inheritance rule: a location block that sets its own add_header does not inherit server-scope headers. API locations that carry their own CORS headers (e.g. /.well-known/matrix/client, /_matrix/) are therefore unaffected. HTML-serving and proxy locations with no add_header of their own pick the security headers up automatically.
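The inheritance rule is all-or-nothing per level, which is easy to misread as a merge. A tiny Rust model of the rule follows; it is illustrative only, and the CORS header used in the usage note is an example, not taken from the gateway config.

```rust
type Header = (&'static str, &'static str);

/// nginx semantics: a location inherits the server's add_header set only
/// when it declares none of its own. There is no merging.
fn effective_headers<'a>(server: &'a [Header], location: &'a [Header]) -> &'a [Header] {
    if location.is_empty() {
        server
    } else {
        location
    }
}
```

So a location carrying a single header such as ("Access-Control-Allow-Origin", "*") ends up with that header alone, while a location with no add_header directives receives all three security headers.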

HSTS (gateway.hsts)

HSTS is opt-in and disabled by default:

services.hyperhive.gateway.hsts = {
  enable = true;          # default: false
  maxAge = 31536000;      # default: 1 year (required for preload list)
  includeSubDomains = true;  # default: true
};

When enabled, a Strict-Transport-Security: max-age=...[; includeSubDomains] header is added alongside the other security headers.

Opt-in rationale: once a browser has seen the header it caches the HTTPS-only policy for max-age seconds; enabling it on a deployment that later loses TLS locks browsers out until max-age expires. Only enable when TLS is permanent.

Assertion: hsts.enable = true without a TLS mode configured (selfSignedTls, tls.certDir, or tls.acme.enable) is a NixOS build-time assertion failure — HSTS over plain HTTP is harmless but almost always a misconfiguration.
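The header value and the build-time assertion can be modelled together. The sketch below mirrors the rules in Rust with hypothetical names; the real check is a NixOS assertion, not Rust code.

```rust
/// Build the Strict-Transport-Security value.
/// Ok(None) means HSTS is disabled; Err models the failed assertion.
fn hsts_header(
    enable: bool,
    tls_configured: bool,
    max_age: u64,
    include_sub_domains: bool,
) -> Result<Option<String>, String> {
    if !enable {
        return Ok(None);
    }
    if !tls_configured {
        return Err("hsts.enable requires selfSignedTls, tls.certDir, or tls.acme.enable".to_string());
    }
    let mut value = format!("max-age={max_age}");
    if include_sub_domains {
        value.push_str("; includeSubDomains");
    }
    Ok(Some(value))
}
```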