hive-network
Host-side bridge + per-agent DNS resolver — the foundation that
makes container netns isolation safe to land. Configured via
services.hyperhive.network.*; off by default during rollout.
Why ship before netns isolation
If netns isolation lands first, agent containers lose
/etc/resolv.conf propagation from the host and DNS breaks until a
separate resolver is up. Inverting the sequence — bridge + dnsmasq
first, netns flip second — makes the flag day boring: the resolver
endpoint is already live, agents just discover it via veth instead
of shared netns.
v1 vs v2
| feature | v1 (this PR) | v2 (after netns isolation) |
|---|---|---|
| bridge interface | created on host, no slave NICs | per-agent veth pairs attach |
| dnsmasq binding | bridge IP (reachable via host loopback in shared netns) | bridge IP (reachable via veth in private netns) |
| agent container netns | shared host | private |
| agent /etc/resolv.conf | unchanged (host DNS) | nameserver <bridge-ip> |
| address rules target | <bridge-ip> (works in both modes) | unchanged from v1 |
The address rules ship pointing at the bridge IP from v1 so the
DNS contract is fixed before any container actually depends on it
— minimises the things that flip on netns day.
Container shape (where dnsmasq lives)
Co-located in the existing hive-gateway container — single
front-door for both DNS and HTTP, saves a sibling container, single
systemd-unit / state surface to monitor. The gateway shares host
netns (privateNetwork = false) so dnsmasq's bind-interfaces
listener on bridgeIp works without any veth gymnastics today; when
agent containers flip to private netns the binding doesn't change
(it's still on the host's bridge interface).
Configuration
    {
      services.hyperhive = {
        enable = true;
        domain = "darkest.space";
        network.enable = true;           # opt in to bridge + DNS
        network.bridgeIp = "10.42.0.1";  # default
        network.upstreamDns = [          # default Cloudflare + Quad9
          "1.1.1.1"
          "9.9.9.9"
        ];
      };
    }
The module asserts two things: services.hyperhive.domain != null (the
resolver needs a domain to be authoritative for) and
services.hyperhive.gateway.enable = true (the resolver lives in the
gateway container).
Bridge addressing
Default subnet is 10.42.0.0/24, host-side gateway at 10.42.0.1.
RFC 1918 space, unlikely to clash with an operator's existing setup;
override bridgeIp + bridgePrefixLength if a different range is
already in use. /24 gives 254 usable host addresses, one of which is
the bridge itself, leaving 253 for agents: enough for any single-host
hive. Bigger swarms or tighter addressing schemes pick their own
prefix length.
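The capacity arithmetic for a given prefix length can be checked with a short Rust sketch. The function names are illustrative and not part of the module; the only assumption beyond the text is that the bridge itself consumes one of the usable host addresses.

```rust
/// Usable host addresses in an IPv4 subnet of the given prefix length,
/// excluding the network and broadcast addresses.
/// Returns None for prefixes longer than /30, where that rule does not apply.
fn usable_hosts(prefix_len: u8) -> Option<u32> {
    if prefix_len > 30 {
        return None;
    }
    // 2^(32 - prefix) addresses in total, minus network and broadcast.
    let total: u64 = 1u64 << (32 - u32::from(prefix_len));
    Some((total - 2) as u32)
}

/// Addresses left for agents once the host-side bridge has taken one
/// (assumption: the bridge IP lives inside the same subnet).
fn agent_capacity(prefix_len: u8) -> Option<u32> {
    usable_hosts(prefix_len).map(|n| n - 1)
}
```

For the default /24, usable_hosts(24) yields 254 and agent_capacity(24) yields 253.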
Resolver behaviour
dnsmasq is authoritative for the hive's own zones — answers
<hive-domain>, forge.<hive-domain>, matrix.<hive-domain>
queries with the bridge IP (where nginx is reachable). Everything
else gets forwarded to upstreamDns. Containers don't need to know
the upstream — they query the bridge IP and dnsmasq does the right
thing per-name.
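The per-name decision described above can be modelled in a few lines of Rust. This is a sketch of the behaviour as documented, not dnsmasq's own matching logic: the Answer type and route_query function are hypothetical, and only the three names listed above are treated as local.

```rust
use std::net::Ipv4Addr;

/// What the resolver does with a query (hypothetical model type).
#[derive(Debug, PartialEq)]
enum Answer {
    /// Answered locally with the bridge IP.
    Local(Ipv4Addr),
    /// Handed to one of the upstreamDns servers.
    Forward,
}

/// Decide how a query name is handled, given the hive domain.
fn route_query(name: &str, hive_domain: &str, bridge_ip: Ipv4Addr) -> Answer {
    // DNS names are case-insensitive; a trailing dot marks a fully
    // qualified name and is not significant for comparison.
    let name = name.trim_end_matches('.').to_ascii_lowercase();
    let domain = hive_domain.trim_end_matches('.').to_ascii_lowercase();

    // The names the document lists as served from the bridge IP.
    let local = [
        domain.clone(),
        format!("forge.{}", domain),
        format!("matrix.{}", domain),
    ];

    if local.contains(&name) {
        Answer::Local(bridge_ip)
    } else {
        Answer::Forward
    }
}
```

With domain = "darkest.space", a query for forge.darkest.space is answered with the bridge IP, while example.org is forwarded upstream.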
bind-interfaces + interface = [ bridgeName "lo" ] means the
listener only accepts queries from the bridge interface (plus lo for
container health-checks). External hosts can't reach it — no
DNS-amplification surface even when the operator opens port 80 for
gateway HTTP.
resolveLocalQueries = false keeps dnsmasq out of the host's own
resolution stack — the host's resolver (systemd-resolved, plain
glibc nss, dnscrypt-proxy, etc.) keeps doing whatever the operator
configured. The hive resolver is purely for inbound queries from
agent containers.
Firewall posture
networking.firewall.interfaces.<bridge>.allowedUDPPorts = [ 53 ] +
allowedTCPPorts = [ 53 ] opens the resolver on the bridge interface
only. Other interfaces stay closed. The hive resolver isn't an
external-facing service.
When isolateContainers = true, allowedTCPPorts is extended with
[ 80 443 ] so isolated agents can reach nginx (gateway container,
shared host netns) for the forge sub-domain, per-agent UI proxies,
and any other HTTP services.
Container isolation
services.hyperhive.network.isolateContainers (default false) flips
agent containers from shared host netns to private netns. Set it only after
network.enable = true is stable in production; an assertion rejects
isolateContainers = true while network.enable is off.
What the nix side does when isolateContainers = true
| effect | mechanism |
|---|---|
| IP forwarding | boot.kernel.sysctl."net.ipv4.ip_forward" = 1 |
| Internet NAT | networking.nat { enable = true; internalInterfaces = [ bridgeName ]; } — MASQUERADE on packets leaving via any external NIC |
| Loopback DROP | networking.firewall.extraInputRules — drops bridge-subnet → 127.0.0.0/8 traffic; defence-in-depth against routing table leaks |
| Gateway access | networking.firewall.interfaces.<bridge>.allowedTCPPorts = [ 80 443 ] — lets isolated agents reach nginx on the host (shared netns) |
| Forge URL | HIVE_FORGE_URL flips from http://127.0.0.1:3000 to http://forge.<domain> — agents resolve via dnsmasq, nginx proxies to forgejo |
| c0re signal | HIVE_NETWORK_ISOLATION=1, HIVE_NETWORK_BRIDGE, HIVE_NETWORK_SUBNET in systemd.services.hive-c0re.environment |
HIVE_NETWORK_SUBNET is the host-side bridge IP + prefix (e.g.
10.42.0.1/24), not the canonical network address. The Rust side
must normalise (bitwise-AND with mask) before subnet membership checks or
address arithmetic.
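A minimal Rust sketch of that normalisation, using only the standard library. The function names are illustrative and are not taken from hive-c0re.

```rust
use std::net::Ipv4Addr;

/// Netmask for a prefix length, as a u32. Lengths above 32 are clamped.
fn prefix_mask(len: u8) -> u32 {
    let len = u32::from(len.min(32));
    // A shift by 32 would overflow, so /0 is handled separately.
    if len == 0 {
        0
    } else {
        u32::MAX << (32 - len)
    }
}

/// Parse "a.b.c.d/len" (the HIVE_NETWORK_SUBNET shape) and return the
/// canonical network address plus the prefix length.
fn parse_subnet(cidr: &str) -> Option<(Ipv4Addr, u8)> {
    let (addr, len) = cidr.split_once('/')?;
    let addr: Ipv4Addr = addr.trim().parse().ok()?;
    let len: u8 = len.trim().parse().ok()?;
    if len > 32 {
        return None;
    }
    // Bitwise-AND with the mask clears the host bits.
    let network = u32::from(addr) & prefix_mask(len);
    Some((Ipv4Addr::from(network), len))
}

/// True if `candidate` falls inside the subnet.
fn subnet_contains(network: Ipv4Addr, len: u8, candidate: Ipv4Addr) -> bool {
    let mask = prefix_mask(len);
    (u32::from(candidate) & mask) == (u32::from(network) & mask)
}
```

Given the example value 10.42.0.1/24, parse_subnet returns the network address 10.42.0.0 with prefix length 24, which is what membership checks should compare against.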
What the Rust side does
hive-c0re reads HIVE_NETWORK_ISOLATION and, when set, passes
PRIVATE_NETWORK=1, LOCAL_ADDRESS=<deterministic-ip>, and
HOST_BRIDGE=<bridgeName> via lifecycle::set_nspawn_flags when
creating or updating containers. Each agent gets a deterministic IP
derived from its name so the address is reproducible across destroy/recreate.
This applies uniformly to all containers including the manager — no special
case.
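The text does not say how the deterministic IP is derived, so the following Rust sketch shows one possible scheme rather than hive-c0re's actual one. It assumes a /24 with the bridge at .1, hashes the agent name with FNV-1a, and maps the result into .2 to .254. agent_ip and isolation_flags are hypothetical helpers, and the sketch does not resolve hash collisions between agent names.

```rust
use std::net::Ipv4Addr;

/// FNV-1a 32-bit hash. Chosen here because its output is fixed by the
/// algorithm, whereas std's DefaultHasher is unspecified across releases.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for b in bytes {
        hash ^= u32::from(*b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Hypothetical derivation: same name, same address, every time.
/// Assumes a /24 whose first three octets match the bridge IP.
fn agent_ip(name: &str, bridge_ip: Ipv4Addr) -> Ipv4Addr {
    let [a, b, c, _] = bridge_ip.octets();
    // 253 slots (.2 through .254); .0, .1 and .255 are never produced.
    let host = 2 + (fnv1a(name.as_bytes()) % 253) as u8;
    Ipv4Addr::new(a, b, c, host)
}

/// The three settings the document says are passed for each container.
/// How lifecycle::set_nspawn_flags consumes them is not shown in the
/// source, so this only builds the key/value pairs.
fn isolation_flags(name: &str, bridge_name: &str, bridge_ip: Ipv4Addr) -> Vec<(String, String)> {
    vec![
        ("PRIVATE_NETWORK".to_string(), "1".to_string()),
        ("LOCAL_ADDRESS".to_string(), agent_ip(name, bridge_ip).to_string()),
        ("HOST_BRIDGE".to_string(), bridge_name.to_string()),
    ]
}
```

A real implementation would need a collision strategy (for example, probing to the next free address and persisting the choice), since two names can hash to the same slot.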
Why isolation is safe for the manager: all hive-c0re communication goes
through unix domain sockets (/run/hive/mcp.sock for agent requests,
/run/hive/priv.sock for privileged ops, per-agent manager sockets).
These are bind-mounted into containers via the nspawn conf. UDS paths
traverse the VFS, not the network stack, so PRIVATE_NETWORK=1 does not
affect them.
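To illustrate the point, a unix domain socket is bound and connected purely by filesystem path, with no IP address or port involved. The Rust sketch below uses only the standard library and runs a listener and a client in one process. It does not create a network namespace, and uds_round_trip is an illustrative name rather than hive-c0re code.

```rust
use std::io::{Read, Write};
use std::net::Shutdown;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Bind a listener at `path`, connect to it by the same path, send `msg`
/// and return what the listening side received.
fn uds_round_trip(path: &Path, msg: &[u8]) -> std::io::Result<Vec<u8>> {
    // Clear a stale socket file from an earlier run; a missing file is fine.
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?;

    // Client side: the connection target is a path in the filesystem.
    // The connection is queued in the listener's backlog until accepted.
    let mut client = UnixStream::connect(path)?;
    client.write_all(msg)?;
    // Closing the write half lets the reader see end-of-stream.
    client.shutdown(Shutdown::Write)?;

    // Server side: accept the queued connection and read until EOF.
    let (mut server, _addr) = listener.accept()?;
    let mut received = Vec::new();
    server.read_to_end(&mut received)?;

    std::fs::remove_file(path)?;
    Ok(received)
}
```

Because the rendezvous point is a file, a socket bind-mounted into a container stays reachable there regardless of which network namespace the container runs in.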
The nix side also enables IP forwarding + NAT (agents reach the internet
through the host) and drops bridge-subnet → loopback traffic (defence-in-depth
against a compromised agent reaching the c0re dashboard HTTP at
127.0.0.1). Agents have no legitimate reason to reach the dashboard over
loopback — the hive-c0re admin socket is a UDS, not TCP.
Prerequisites before flipping on
- All agents must have hyperhive.web.useUnixSocket = true. Agents that
  still bind TCP on 0.0.0.0:<port> will be reachable at their bridge IP
  from other agents on the same subnet, defeating the isolation goal.
  The gateway routes via unix sockets so gateway reach is unaffected.
Migration behaviour
Containers are destroyed and re-created when the flag flips. Agent state
under /agents/<name>/state/ is bind-mounted and survives; the container
rootfs is recreated cleanly from the nix store.
Cross-references
- docs/gateway.md: vhost map + the gateway container's other duties