hive-network

Host-side bridge + per-agent private-netns isolation — always on whenever hyperhive is enabled. Configured via services.hyperhive.network.*.

Isolation is the only mode — there is no shared-netns fallback. The former services.hyperhive.network.enable, services.hyperhive.network.isolateContainers and services.hyperhive.network.upstreamDns options were removed; a config that still sets one fails eval with a removal message.

Network map

One picture of the whole hive. There are two planes: infra containers share the host netns and bind host ports directly; compute containers (agents + CI) each get a private netns behind the bridge. The unix-socket control plane rides the VFS and is untouched by any of it.

                        internet
                           │  uplink NIC — NAT MASQUERADE for the
                           │  bridge subnet (10.42.0.0/24 default)
┌──────────────────────────┴─────────────────────────────────────────┐
│ host netns — the host itself plus gateway / forge / matrix         │
│                                                                    │
│   nginx :80/:443                                  [hive-gateway]   │
│   dnsmasq 10.42.0.1:53 (DNS) + :67 (DHCP)         [hive-gateway]   │
│   forgejo :3000 http, :2222 git-ssh               [hive-forge]     │
│   tuwunel :8008 client API                        [hive-matrix]    │
│   hive-c0re dashboard 127.0.0.1:7000              (host service)   │
│   wg-hive :51820/udp — swarm mesh, when enabled   (host iface)     │
│                                                                    │
│                  hive-br0   10.42.0.1/24                           │
│              ┌──────────┼──────────────┐                           │
└──────────────┼──────────┼──────────────┼───────────────────────────┘
          vb-h-<a>    vb-h-<b>      vb-hive-ci      veth pairs
               │          │              │
          ┌────┴────┐ ┌───┴─────┐ ┌──────┴──┐   one private netns
          │ agent a │ │ agent b │ │ hive-ci │   each; eth0 leases
          │  eth0   │ │  eth0   │ │  eth0   │   from the DHCP pool
          └─────────┘ └─────────┘ └─────────┘

| container | netns | IPv4 | listens / reached via |
| --- | --- | --- | --- |
| `hive-gateway` | host (shared) | host addresses | nginx `:80`/`:443` (every vhost); dnsmasq `bridgeIp:53` + DHCP `:67` on the bridge |
| `hive-forge` | host (shared) | host addresses | forgejo `:3000` http, `:2222` git-ssh; fronted by the `forge.<domain>` vhost |
| `hive-matrix` | host (shared) | host addresses | tuwunel `:8008` (+ optional federation port); fronted by the matrix vhost |
| `hive-ci` | private, veth on bridge | DHCP pool | outbound only (runner → forge); no inbound surface |
| `h-<agent>` | private, veth on bridge | DHCP pool | web UI via UDS `/run/hive-agent/<name>` → nginx sub-path; in-container UI port hashed 8100–8999 |

The flows, end to end:

DHCP — agent dhcpcd broadcasts on eth0 → veth → bridge → host firewall (udp 67 hole) → dnsmasq pool → lease + router option.
DNS — agents query bridgeIp:53; hive zones are answered authoritatively with the bridge IP, everything else forwards to the host's resolvers (see Resolver behaviour below).
HTTP — forge.<domain> / matrix / dashboard names all resolve to the bridge IP, land on nginx :80/:443, and proxy to forgejo :3000, tuwunel :8008, hive-c0re 127.0.0.1:7000, or a per-agent UI unix socket.
Internet egress — agent default route points at the bridge IP; the host forwards + masquerades out its uplink.
Swarm — peer hives connect over the wg-hive WireGuard mesh and reach each other's gateway/forge across it (docs/swarm.md).
Control plane (no network) — per-agent broker socket /run/hive/mcp.sock, privileged helper /run/hive/priv.sock, operator admin /run/hyperhive/host.sock, and the per-agent UI sockets under /run/hive-agent/ are unix domain sockets bind-mounted through the VFS; private netns does not affect them.

Container shape (where dnsmasq lives)

dnsmasq is co-located in the existing hive-gateway container: a single front door for both DNS and HTTP, one fewer sibling container, and a single systemd unit and state surface to monitor. The gateway shares the host netns (privateNetwork = false), so dnsmasq's bind-interfaces listener on bridgeIp binds directly on the host's bridge interface.

Configuration

{
  services.hyperhive = {
    enable = true;
    domain = "darkest.space";
    # network.bridgeIp = "10.42.0.1";   # default
  };
}

Requires services.hyperhive.domain to be set — the dnsmasq resolver is authoritative for <hive-domain> and its sub-domains.

Bridge addressing

The default subnet is 10.42.0.0/24, with the host-side gateway at 10.42.0.1. This is RFC 1918 space and unlikely to clash with the operator's existing setup; override bridgeIp + bridgePrefixLength if that range is already in use. A /24 gives 254 usable host addresses (253 for containers once the bridge takes its own), enough for any single-host hive; bigger swarms or tighter addressing schemes pick their own.
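The capacity arithmetic can be checked with a few lines of Rust. This is a minimal sketch, not code from the hive: the function names are hypothetical, and the real DHCP pool may be narrower than the upper bound computed here.

```rust
/// Number of usable host addresses in an IPv4 subnet of the given prefix
/// length (network and broadcast addresses excluded). Returns None for
/// prefixes longer than /30, where that convention no longer applies.
fn usable_hosts(prefix_len: u8) -> Option<u64> {
    if prefix_len > 30 {
        return None;
    }
    // u64 so that a /0 (2^32 addresses) does not overflow.
    let total = 1u64 << (32 - u32::from(prefix_len));
    Some(total - 2)
}

/// Upper bound on container addresses once the bridge takes one for itself.
/// Hypothetical helper; the actual dnsmasq pool range is not modelled.
fn max_container_addresses(prefix_len: u8) -> Option<u64> {
    usable_hosts(prefix_len).map(|n| n - 1)
}
```

For the default bridgePrefixLength of 24, `usable_hosts(24)` yields 254 and `max_container_addresses(24)` yields 253.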

Resolver behaviour

dnsmasq is authoritative for the hive's own zones: it answers <hive-domain>, forge.<hive-domain>, and matrix.<hive-domain> queries with the bridge IP (where nginx is reachable). Everything else is forwarded to the host's own resolvers. dnsmasq reads the gateway container's /etc/resolv.conf, which nixos-container copies from the host at each container start, so a host resolver change is picked up on the next gateway restart. Containers don't need to know the upstream; they query the bridge IP and dnsmasq does the right thing per name.
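The per-name split can be modelled in a few lines. The sketch below is only an illustration of the decision, not dnsmasq's implementation: `Decision`, `in_hive_zone`, and `decide` are hypothetical names, and it assumes plain suffix matching on the hive domain (the domain itself plus any sub-domain), case-insensitive and tolerant of a trailing dot.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
enum Decision {
    Answer(Ipv4Addr), // hive zone: reply with the bridge IP
    Forward,          // anything else: hand to the upstream resolvers
}

/// True when `qname` is the hive domain or one of its sub-domains.
fn in_hive_zone(qname: &str, hive_domain: &str) -> bool {
    let q = qname.trim_end_matches('.').to_ascii_lowercase();
    let d = hive_domain.trim_end_matches('.').to_ascii_lowercase();
    // The leading dot keeps "notdarkest.space" out of "darkest.space".
    q == d || q.ends_with(&format!(".{}", d))
}

fn decide(qname: &str, hive_domain: &str, bridge_ip: Ipv4Addr) -> Decision {
    if in_hive_zone(qname, hive_domain) {
        Decision::Answer(bridge_ip)
    } else {
        Decision::Forward
    }
}
```

With the domain from the Configuration example, `decide("forge.darkest.space", "darkest.space", bridge_ip)` answers locally, while `decide("api.anthropic.com", "darkest.space", bridge_ip)` forwards.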

bind-interfaces + interface = [ bridgeName "lo" ] means the listener only accepts queries from the bridge interface (plus lo for container health-checks). External hosts can't reach it — no DNS-amplification surface even when the operator opens port 80 for gateway HTTP.

resolveLocalQueries = false keeps dnsmasq out of the host's own resolution stack — the host's resolver (systemd-resolved, plain glibc nss, dnscrypt-proxy, etc.) keeps doing whatever the operator configured. The hive resolver is purely for inbound queries from agent containers.

Firewall posture

networking.firewall.interfaces.<bridge>.allowedUDPPorts = [ 53 67 ]
networking.firewall.interfaces.<bridge>.allowedTCPPorts = [ 53 80 443 ]

Port 53 opens the resolver on the bridge interface only. Other interfaces stay closed. The hive resolver isn't an external-facing service.
Port 67 (UDP) admits DHCP requests to the dnsmasq pool. dnsmasq receives DHCP via a regular UDP socket (it does not use a netfilter-bypassing raw socket), so the hole is mandatory — without it containers never get a lease and fall back to 169.254.x.x.
Ports 80 and 443 let isolated agents reach nginx (gateway container, shared host netns) for the forge sub-domain, per-agent UI proxies, and any other HTTP services.

The host firewall is the only firewall. The shared-netns infra containers (gateway, forge, matrix) set networking.firewall.enable = false: a NixOS firewall inside a shared-netns container runs against the host ruleset — at container boot its firewall-start flushes the nixos-fw chains, rebuilds them from the container's (empty) port list, and deletes the host's nixos-nat-* chains without recreating them, silently wiping the bridge holes above plus the agents' NAT. Private-netns containers (agents, hive-ci) may keep their own firewall — it is scoped to their namespace.

Reaching host services (`exposeHostPorts`)

By default agents can only reach the host on 80/443 (+53 DNS), so a host-side service on another port — e.g. a dev OTEL collector for services.hyperhive.otel.endpoint (see docs/observability.md) — is unreachable.

Setting services.hyperhive.network.exposeHostPorts = [ 4318 ]; adds each listed TCP port P to the bridge interface's allowedTCPPorts, so an agent can connect to <bridgeIp>:P. For the collector example, point the endpoint at http://<bridgeIp>:4318 (http://10.42.0.1:4318 with the default bridge).

This is firewall-only: the host service must bind an address reachable from the bridge — 0.0.0.0 or the bridge IP — not loopback only. The bridge→127.0.0.0/8 DROP rule (below) is unchanged, so a service bound to 127.0.0.1 only stays unreachable; rebind it to 0.0.0.0.
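The binding requirement follows from ordinary socket semantics: a listener accepts a connection to <bridgeIp>:P only if it is bound to the wildcard address or to the bridge IP itself. A minimal sketch of that check, with a hypothetical function name, which an operator could adapt when auditing which services are worth exposing:

```rust
use std::net::Ipv4Addr;

/// Whether a TCP listener bound to `bind` will accept connections that an
/// agent addresses to `bridge_ip`. Firewall rules are not modelled here;
/// this only captures the bind-address side of the requirement.
fn accepts_on_bridge_ip(bind: Ipv4Addr, bridge_ip: Ipv4Addr) -> bool {
    // 0.0.0.0 listens on every local address, the bridge IP included.
    bind.is_unspecified() || bind == bridge_ip
}
```

With the default bridge, `accepts_on_bridge_ip` returns true for 0.0.0.0 and 10.42.0.1 and false for 127.0.0.1, which matches the rebind advice above.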

The port is reachable by every agent on the bridge subnet (like DNS/gateway), so only expose services safe for any agent to reach.

Container isolation

Each agent container runs in a private network namespace with a dedicated veth pair attached to the bridge. The following table summarises what the nix side sets up unconditionally:

| effect | mechanism |
| --- | --- |
| IP forwarding | `boot.kernel.sysctl."net.ipv4.ip_forward" = 1` |
| Internet NAT | `networking.nat { enable = true; internalInterfaces = [ bridgeName ]; }`; MASQUERADE on packets leaving via any external NIC |
| Loopback DROP | `networking.firewall.extraInputRules`: drops bridge-subnet → `127.0.0.0/8` traffic; defence-in-depth against routing table leaks |
| Gateway access | `networking.firewall.interfaces.<bridge>.allowedTCPPorts = [ 80 443 ]`: lets isolated agents (private netns, veth on bridge) reach nginx on the host |
| c0re signal | `HIVE_NETWORK_ISOLATION=1`, `HIVE_NETWORK_BRIDGE`, `HIVE_NETWORK_SUBNET` in `systemd.services.hive-c0re.environment` |

HIVE_NETWORK_SUBNET is the host-side bridge IP + prefix (e.g. 10.42.0.1/24), not the canonical network address. The Rust side must normalise (bitwise-AND with mask) before subnet membership checks or address arithmetic.
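The normalisation step can be sketched as follows. This is an illustrative example using only the standard library, not the actual hive-c0re code: `BridgeSubnet`, `netmask`, and `parse_bridge_subnet` are hypothetical names.

```rust
use std::net::Ipv4Addr;

/// Parsed form of a value such as "10.42.0.1/24".
#[derive(Debug, PartialEq)]
struct BridgeSubnet {
    host_ip: Ipv4Addr, // address part, verbatim (the bridge itself)
    network: Ipv4Addr, // host_ip AND mask: the canonical network address
    prefix_len: u8,
}

fn netmask(prefix_len: u8) -> u32 {
    // A shift by 32 would overflow, so /0 is handled separately.
    if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix_len))
    }
}

fn parse_bridge_subnet(value: &str) -> Option<BridgeSubnet> {
    let (addr, prefix) = value.trim().split_once('/')?;
    let host_ip: Ipv4Addr = addr.parse().ok()?;
    let prefix_len: u8 = prefix.parse().ok()?;
    if prefix_len > 32 {
        return None;
    }
    let network = Ipv4Addr::from(u32::from(host_ip) & netmask(prefix_len));
    Some(BridgeSubnet { host_ip, network, prefix_len })
}

impl BridgeSubnet {
    /// Membership test against the normalised network, not the raw host IP.
    fn contains(&self, addr: Ipv4Addr) -> bool {
        (u32::from(addr) & netmask(self.prefix_len)) == u32::from(self.network)
    }
}
```

Parsing "10.42.0.1/24" this way keeps 10.42.0.1 as the host IP and yields 10.42.0.0 as the network, so a lease such as 10.42.0.77 tests as a member while 10.43.0.1 does not. Comparing candidate addresses against the raw 10.42.0.1 instead would reject every address except the bridge itself.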

What the Rust side does

hive-c0re reads HIVE_NETWORK_ISOLATION and passes PRIVATE_NETWORK=1, LOCAL_ADDRESS= (empty), HOST_ADDRESS=<bridge-ip>, and HOST_BRIDGE=<bridgeName> via lifecycle::set_nspawn_flags when creating or updating containers. LOCAL_ADDRESS is left empty so the container's dhcpcd acquires an address from the bridge dnsmasq pool (networking.useDHCP = true in nix/agent-modules/network.nix). This applies uniformly to every private-netns container, agents and hive-ci alike; the shared-netns infra containers (gateway, forge, matrix) stay on the host netns as described in the network map.

HOST_ADDRESS is the bridge gateway IP (the address part of HIVE_NETWORK_SUBNET, via lifecycle::bridge_gateway_ip — taken verbatim so a non-.1 operator override still resolves to wherever the bridge actually lives). It is load-bearing: nixos-container's container-side network setup only installs a default route (ip route add default via $HOST_ADDRESS) when HOST_ADDRESS is non-empty. In bridge mode the host-side address/route setup is skipped, so writing it only affects the container's default route — without it the container comes up with an IP but no path off the bridge subnet (no internet, no api.anthropic.com).
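A rough sketch of how the flag set might be derived from HIVE_NETWORK_SUBNET is shown here. It is not the real lifecycle::bridge_gateway_ip or lifecycle::set_nspawn_flags, whose signatures are not documented in this file; `gateway_ip_from_subnet` and `private_netns_flags` are hypothetical stand-ins that only mirror the behaviour described above.

```rust
use std::net::Ipv4Addr;

/// Address part of a "<ip>/<prefix>" value, returned verbatim (no masking),
/// so a bridge on e.g. 10.42.0.254/24 still yields 10.42.0.254.
fn gateway_ip_from_subnet(subnet: &str) -> Option<Ipv4Addr> {
    let addr = subnet.trim().split('/').next()?;
    addr.parse().ok()
}

/// Key/value pairs for a private-netns container on the bridge.
/// Returns None when the subnet value has no parseable address, because an
/// empty HOST_ADDRESS would leave the container without a default route.
fn private_netns_flags(
    subnet: &str,
    bridge_name: &str,
) -> Option<Vec<(&'static str, String)>> {
    let gateway = gateway_ip_from_subnet(subnet)?;
    Some(vec![
        ("PRIVATE_NETWORK", "1".to_string()),
        ("LOCAL_ADDRESS", String::new()), // empty: dhcpcd takes a lease
        ("HOST_ADDRESS", gateway.to_string()),
        ("HOST_BRIDGE", bridge_name.to_string()),
    ])
}
```

Refusing to build the flag set when the gateway cannot be determined is a design choice of this sketch: it turns the silent "IP but no route" failure into an explicit error at container creation time.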

How the isolated container gets its resolver

nixos-container copies the host's /etc/resolv.conf into the container at every start. The host resolver (e.g. 127.0.0.53 from systemd-resolved, or a LAN router) is unreachable from a private netns and isn't authoritative for the hive's own zones, so it is replaced with the bridge dnsmasq at boot. Because the copy happens on every start, a declarative environment.etc."resolv.conf" would be clobbered — so the wiring is runtime:

hive-priv drops a marker file (/etc/hyperhive-bridge-dns, carrying the gateway IP) into each container's /etc.
the hyperhive-isolated-dns oneshot (nix/agent-modules/network.nix), gated on that marker, rewrites /etc/resolv.conf to nameserver <gateway-ip> at boot. It is ordered before the harness (hive-ag3nt), the matrix daemon, and tea-login so the resolver is correct before the first DNS lookup.
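The gating logic of the two steps above can be expressed as a small pure function. This is a sketch only: the real oneshot lives in nix/agent-modules/network.nix and is not reproduced here, the function name is hypothetical, and the marker is assumed to contain just the gateway IP with optional surrounding whitespace.

```rust
use std::net::Ipv4Addr;

/// Contents to write to /etc/resolv.conf, or None when nothing should be
/// rewritten (marker absent, or its contents are not an IPv4 address).
fn resolv_conf_from_marker(marker: Option<&str>) -> Option<String> {
    // Assumption: the marker file holds only the gateway IP.
    let gateway: Ipv4Addr = marker?.trim().parse().ok()?;
    Some(format!("nameserver {}\n", gateway))
}
```

A caller would read /etc/hyperhive-bridge-dns if it exists, pass its contents in, and overwrite /etc/resolv.conf only when the result is `Some`. Validating the marker before writing avoids replacing a working resolver with a malformed line.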

Why isolation is safe: all hive-c0re communication goes through unix domain sockets (/run/hive/mcp.sock for agent requests, /run/hive/priv.sock for privileged ops). These are bind-mounted into containers via the nspawn conf. UDS paths traverse the VFS, not the network stack, so PRIVATE_NETWORK=1 does not affect them.
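The point that a unix domain socket is addressed by a filesystem path, with no IP address or network interface involved, can be demonstrated with the standard library alone. The sketch below is a generic illustration, not the hive-c0re broker protocol; `uds_round_trip` is a hypothetical helper and the caller supplies a scratch path.

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Sends `message` over a unix domain socket at `socket_path` and returns
/// what the listening side received. No network configuration is involved.
fn uds_round_trip(socket_path: &Path, message: &[u8]) -> std::io::Result<Vec<u8>> {
    let _ = std::fs::remove_file(socket_path); // ignore a missing stale socket
    let listener = UnixListener::bind(socket_path)?;

    // connect() completes against the listen backlog, so one thread suffices.
    let mut client = UnixStream::connect(socket_path)?;
    client.write_all(message)?;
    drop(client); // close the client end so the server sees EOF

    let (mut server, _addr) = listener.accept()?;
    let mut received = Vec::new();
    server.read_to_end(&mut received)?;

    std::fs::remove_file(socket_path)?;
    Ok(received)
}
```

Because the only shared resource is the socket file, bind-mounting that path into a container is enough for the two sides to talk, regardless of which network namespace each side runs in.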

The nix side also enables IP forwarding + NAT (agents reach the internet through the host) and drops bridge-subnet → loopback traffic (defence-in-depth against a compromised agent reaching the c0re dashboard HTTP at 127.0.0.1). Agents have no legitimate reason to reach the dashboard over loopback — the hive-c0re admin socket is a UDS, not TCP.

Cross-references

docs/gateway.md — vhost map + the gateway container's other duties