hive-network

Host-side bridge + per-agent DNS resolver — the foundation that makes container netns isolation safe to land. Configured via services.hyperhive.network.*; off by default during rollout.

Why ship before netns isolation

If netns isolation lands first, agent containers lose /etc/resolv.conf propagation from the host and DNS breaks until a separate resolver is up. Inverting the sequence — bridge + dnsmasq first, netns flip second — makes the flag day boring: the resolver endpoint is already live, agents just discover it via veth instead of shared netns.

v1 vs v2

| feature | v1 (this PR) | v2 (after netns isolation) |
|---|---|---|
| bridge interface | created on host, no slave NICs | per-agent veth pairs attach |
| dnsmasq binding | bridge IP (reachable via host loopback in shared netns) | bridge IP (reachable via veth in private netns) |
| agent container netns | shared host | private |
| agent /etc/resolv.conf | unchanged (host DNS) | nameserver <bridge-ip> |
| address rules | target <bridge-ip> (works in both modes) | unchanged from v1 |

The address rules ship pointing at the bridge IP from v1 so the DNS contract is fixed before any container actually depends on it — minimises the things that flip on netns day.

Container shape (where dnsmasq lives)

Co-located in the existing hive-gateway container — single front-door for both DNS and HTTP, saves a sibling container, single systemd-unit / state surface to monitor. The gateway shares host netns (privateNetwork = false) so dnsmasq's bind-interfaces listener on bridgeIp works without any veth gymnastics today; when agent containers flip to private netns the binding doesn't change (it's still on the host's bridge interface).

Configuration

{
  services.hyperhive = {
    enable = true;
    domain = "darkest.space";
    network.enable = true;            # opt in to bridge + DNS
    network.bridgeIp = "10.42.0.1";   # default
    network.upstreamDns = [            # default Cloudflare + Quad9
      "1.1.1.1"
      "9.9.9.9"
    ];
  };
}

The module asserts that services.hyperhive.domain != null (the resolver needs a domain to be authoritative for) and that services.hyperhive.gateway.enable = true (the resolver lives in the gateway container).

Bridge addressing

Default subnet is 10.42.0.0/24, with the host-side gateway at 10.42.0.1. This is RFC 1918 space and unlikely to clash with an operator's existing setup; override bridgeIp and bridgePrefixLength if the range is already in use. A /24 has 254 usable host addresses, one of which is taken by the bridge itself, leaving 253 for agents. That is enough for any single-host hive; bigger swarms or tighter addressing schemes can pick their own prefix.
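When choosing a different bridgePrefixLength, the number of addresses left for agents follows from the prefix alone. The sketch below is illustrative only; agent_capacity is a hypothetical helper, not part of hive-c0re, and it assumes the bridge occupies exactly one host address.

```rust
/// Number of addresses available to agents in a bridge subnet with the
/// given prefix length. Hypothetical helper for illustration.
///
/// Subtracts the network address, the broadcast address, and the one
/// address assumed to be held by the host-side bridge.
pub fn agent_capacity(prefix_len: u8) -> Option<u32> {
    // /31 and /32 leave no room for a bridge plus at least one agent;
    // /0 is rejected to keep the shift below in range.
    if !(1..=30).contains(&prefix_len) {
        return None;
    }
    let total = 1u32 << (32 - u32::from(prefix_len));
    Some(total - 3)
}
```

For the default /24 this gives 253; a /30 leaves room for a single agent.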

Resolver behaviour

dnsmasq is authoritative for the hive's own zones — answers <hive-domain>, forge.<hive-domain>, matrix.<hive-domain> queries with the bridge IP (where nginx is reachable). Everything else gets forwarded to upstreamDns. Containers don't need to know the upstream — they query the bridge IP and dnsmasq does the right thing per-name.

bind-interfaces + interface = [ bridgeName "lo" ] means the listener only accepts queries from the bridge interface (plus lo for container health-checks). External hosts can't reach it — no DNS-amplification surface even when the operator opens port 80 for gateway HTTP.

resolveLocalQueries = false keeps dnsmasq out of the host's own resolution stack — the host's resolver (systemd-resolved, plain glibc nss, dnscrypt-proxy, etc.) keeps doing whatever the operator configured. The hive resolver is purely for inbound queries from agent containers.

Firewall posture

networking.firewall.interfaces.<bridge>.allowedUDPPorts = [ 53 ]

When isolateContainers = true, allowedTCPPorts is extended with [ 80 443 ] so isolated agents can reach nginx (gateway container, shared host netns) for the forge sub-domain, per-agent UI proxies, and any other HTTP services.

Container isolation

services.hyperhive.network.isolateContainers (default false) flips agent containers from shared host netns to private netns. Set it only after network.enable = true has been stable in production; an assertion rejects isolateContainers = true while network.enable is still false.

What the nix side does when isolateContainers = true

| effect | mechanism |
|---|---|
| IP forwarding | boot.kernel.sysctl."net.ipv4.ip_forward" = 1 |
| Internet NAT | networking.nat { enable = true; internalInterfaces = [ bridgeName ]; }; MASQUERADE on packets leaving via any external NIC |
| Loopback DROP | networking.firewall.extraInputRules; drops bridge-subnet → 127.0.0.0/8 traffic; defence-in-depth against routing table leaks |
| Gateway access | networking.firewall.interfaces.<bridge>.allowedTCPPorts = [ 80 443 ]; lets isolated agents reach nginx on the host (shared netns) |
| Forge URL | HIVE_FORGE_URL flips from http://127.0.0.1:3000 to http://forge.<domain>; agents resolve via dnsmasq, nginx proxies to forgejo |
| c0re signal | HIVE_NETWORK_ISOLATION=1, HIVE_NETWORK_BRIDGE, HIVE_NETWORK_SUBNET in systemd.services.hive-c0re.environment |
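As a rough illustration of how a consumer might read the three c0re-signal variables, here is a minimal sketch. The IsolationConfig type and isolation_from_env function are hypothetical names, not the actual hive-c0re code, and treating only the value "1" as "enabled" is an assumption. The lookup closure stands in for std::env::var so the parser can be exercised without touching the process environment.

```rust
use std::net::Ipv4Addr;

/// Hypothetical view of the isolation settings exported by the nix module.
#[derive(Debug, PartialEq)]
pub struct IsolationConfig {
    pub bridge: String,
    /// Host-side bridge IP exactly as exported (not yet normalised).
    pub host_ip: Ipv4Addr,
    pub prefix_len: u8,
}

/// Returns Ok(None) when isolation is off, Ok(Some(..)) when it is on and
/// the companion variables parse, and Err(..) when they are missing or malformed.
pub fn isolation_from_env<F>(lookup: F) -> Result<Option<IsolationConfig>, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: only the literal value "1" enables isolation.
    if lookup("HIVE_NETWORK_ISOLATION").as_deref() != Some("1") {
        return Ok(None);
    }
    let bridge = lookup("HIVE_NETWORK_BRIDGE").ok_or("HIVE_NETWORK_BRIDGE is not set")?;
    let subnet = lookup("HIVE_NETWORK_SUBNET").ok_or("HIVE_NETWORK_SUBNET is not set")?;
    let (ip, len) = subnet
        .split_once('/')
        .ok_or("HIVE_NETWORK_SUBNET must look like <ip>/<prefix>")?;
    let host_ip = ip
        .parse::<Ipv4Addr>()
        .map_err(|e| format!("bad address {ip:?}: {e}"))?;
    let prefix_len = len
        .parse::<u8>()
        .map_err(|e| format!("bad prefix length {len:?}: {e}"))?;
    if prefix_len > 32 {
        return Err(format!("prefix length {prefix_len} exceeds 32"));
    }
    Ok(Some(IsolationConfig { bridge, host_ip, prefix_len }))
}
```

In a real binary the closure would typically be `|key| std::env::var(key).ok()`.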

HIVE_NETWORK_SUBNET is the host-side bridge IP + prefix (e.g. 10.42.0.1/24), not the canonical network address. The Rust side must normalise (bitwise-AND with mask) before subnet membership checks or address arithmetic.
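The normalisation step can be sketched with nothing but the standard library. The function names below (normalise_subnet, subnet_contains) are illustrative assumptions, not the names used in hive-c0re.

```rust
use std::net::Ipv4Addr;

/// Netmask for a prefix length, as a u32. A zero-length prefix is handled
/// separately because shifting a u32 by 32 bits would overflow.
fn prefix_mask(len: u8) -> u32 {
    if len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(len))
    }
}

/// Parse "<ip>/<prefix>" and return the canonical network address
/// (host bits cleared with a bitwise AND) plus the prefix length.
pub fn normalise_subnet(cidr: &str) -> Option<(Ipv4Addr, u8)> {
    let (addr, len) = cidr.split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let len: u8 = len.parse().ok()?;
    if len > 32 {
        return None;
    }
    let network = u32::from(addr) & prefix_mask(len);
    Some((Ipv4Addr::from(network), len))
}

/// Membership check that masks both sides, so it gives the same answer
/// whether `network` was normalised beforehand or not.
pub fn subnet_contains(network: Ipv4Addr, len: u8, ip: Ipv4Addr) -> bool {
    let mask = prefix_mask(len);
    (u32::from(ip) & mask) == (u32::from(network) & mask)
}
```

With the example value above, normalise_subnet("10.42.0.1/24") yields the network address 10.42.0.0 with prefix length 24.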

What the Rust side does

hive-c0re reads HIVE_NETWORK_ISOLATION and, when set, passes PRIVATE_NETWORK=1, LOCAL_ADDRESS=<deterministic-ip>, and HOST_BRIDGE=<bridgeName> via lifecycle::set_nspawn_flags when creating or updating containers. Each agent gets a deterministic IP derived from its name so the address is reproducible across destroy/recreate. This applies uniformly to all containers including the manager — no special case.
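The text does not say how the deterministic IP is derived, so the following is only one plausible scheme, shown to make the idea concrete: hash the agent name with FNV-1a and map it into the usable host range, skipping the bridge address. agent_ip and isolation_flags are hypothetical helpers; the real lifecycle::set_nspawn_flags signature is not shown here. A hash alone cannot rule out two names landing on the same address, so a real implementation would need some form of collision handling.

```rust
use std::net::Ipv4Addr;

/// 32-bit FNV-1a. Written out by hand so the name-to-address mapping does
/// not depend on the standard library's default hasher, whose algorithm
/// is not guaranteed to stay the same between Rust releases.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for b in bytes {
        hash ^= u32::from(*b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Map an agent name to a host address inside the bridge subnet.
/// `subnet_ip` may be the un-normalised bridge IP; host bits are masked off.
/// Never returns the network, broadcast, or gateway address.
pub fn agent_ip(
    name: &str,
    subnet_ip: Ipv4Addr,
    prefix_len: u8,
    gateway: Ipv4Addr,
) -> Option<Ipv4Addr> {
    if !(1..=30).contains(&prefix_len) {
        return None;
    }
    let host_bits = 32 - u32::from(prefix_len);
    let base = u32::from(subnet_ip) & (u32::MAX << host_bits);
    let usable = (1u32 << host_bits) - 2; // excludes network and broadcast
    let start = fnv1a(name.as_bytes()) % usable;
    // Probe forward from the hashed slot until a non-gateway address is found.
    (0..usable)
        .map(|i| Ipv4Addr::from(base + 1 + (start + i) % usable))
        .find(|ip| *ip != gateway)
}

/// The three settings named above, as key/value pairs.
pub fn isolation_flags(bridge: &str, ip: Ipv4Addr) -> Vec<(&'static str, String)> {
    vec![
        ("PRIVATE_NETWORK", "1".to_string()),
        ("LOCAL_ADDRESS", ip.to_string()),
        ("HOST_BRIDGE", bridge.to_string()),
    ]
}
```

Because the result depends only on the name and the subnet, destroying and recreating an agent yields the same LOCAL_ADDRESS, which is the property the paragraph above relies on.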

Why isolation is safe for the manager: all hive-c0re communication goes through unix domain sockets (/run/hive/mcp.sock for agent requests, /run/hive/priv.sock for privileged ops, per-agent manager sockets). These are bind-mounted into containers via the nspawn conf. UDS paths traverse the VFS, not the network stack, so PRIVATE_NETWORK=1 does not affect them.

The nix side also enables IP forwarding + NAT (agents reach the internet through the host) and drops bridge-subnet → loopback traffic (defence-in-depth against a compromised agent reaching the c0re dashboard HTTP at 127.0.0.1). Agents have no legitimate reason to reach the dashboard over loopback — the hive-c0re admin socket is a UDS, not TCP.

Prerequisites before flipping on

Migration behaviour

Containers are destroyed and re-created when the flag flips. Agent state under /agents/<name>/state/ is bind-mounted and survives; the container rootfs is recreated cleanly from the nix store.

Cross-references