Docker
These apply whether you run standalone Docker or Docker inside Kubernetes.
Image Build
Use a minimal base image
Use alpine or distroless to reduce attack surface. Ensure the image includes the libc dependencies required by go-quai’s CGO components (LevelDB, kawpow).
Copy the chain config into the image
Bake the default configuration files into the image so the node can start without external volume mounts for config.
Container Runtime
Raise the file descriptor limit to 65536+
The default container ulimit is often 1024. libp2p’s dial limiter will throttle outbound peer connections at low FD counts, reducing connectivity and forcing circuit relay fallback.
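One way to confirm the limit from inside the container is to read the process's own `RLIMIT_NOFILE`. The following is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `nofile_shortfall` is illustrative, and the 65536 threshold is taken from the recommendation above.

```python
import resource

REQUIRED_NOFILE = 65536  # threshold from the recommendation above


def nofile_shortfall(soft_limit: int, required: int = REQUIRED_NOFILE) -> int:
    """Return how many descriptors the soft limit is short by (0 if sufficient)."""
    if soft_limit == resource.RLIM_INFINITY:
        return 0
    return max(0, required - soft_limit)


if __name__ == "__main__":
    # Read the limits that apply to this process (and so to the container).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    missing = nofile_shortfall(soft)
    if missing:
        print(f"soft nofile limit {soft} is {missing} below {REQUIRED_NOFILE} (hard limit: {hard})")
    else:
        print(f"soft nofile limit {soft} is sufficient")
```

Run it inside the node's container; a shortfall means the ulimit was not raised for that container.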
Allocate sufficient CPU
go-quai performs PoW verification (kawpow, scrypt) on every received block. CPU starvation delays block processing and causes orphans. Allocate a minimum of 4 cores.
Allocate sufficient memory
The working set is approximately 2 GB (LevelDB memdb + buffer pool + kawpow cache). Set the memory limit to at least 2x the working set (about 4 GB) to avoid GC thrashing.
Use local SSD storage for chain data
LevelDB is I/O intensive. Mount chain data from a local SSD, not network-attached storage.
Publish p2p ports on the host
The container must be directly reachable by peers. Publish both TCP and UDP on the p2p port.
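The runtime settings above map onto standard `docker run` flags. The sketch below assembles such a command as an argument list. The image name, the port number 4001, the host path, and the `/data` container path are placeholders rather than values from go-quai; substitute your own.

```python
import shlex


def docker_run_args(image: str, p2p_port: int, data_dir: str, cpus: int = 4) -> list[str]:
    """Build a `docker run` argument list applying the runtime settings above."""
    return [
        "docker", "run", "-d",
        "--ulimit", "nofile=65536:65536",    # raise soft and hard FD limits
        "--cpus", str(cpus),                 # CPU ceiling; use at least 4
        "--memory", "4g",                    # 2x the ~2 GB working set
        "-v", f"{data_dir}:/data",           # local SSD path; /data is an assumed container path
        "-p", f"{p2p_port}:{p2p_port}/tcp",  # publish the p2p port over TCP
        "-p", f"{p2p_port}:{p2p_port}/udp",  # and over UDP
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image name, port, and host path.
    print(shlex.join(docker_run_args("example/go-quai:latest", 4001, "/mnt/ssd/quai")))
```

Note that `--cpus` sets a ceiling rather than a reservation, so the host still needs that many cores free for the node.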
Kubernetes
These apply on top of the Docker requirements when orchestrating with Kubernetes.
Resource Configuration
Use Guaranteed QoS to prevent CPU throttling
Burstable pods are subject to CFS (Completely Fair Scheduler) throttling. When the node is throttled mid-block-processing, it cannot validate blocks in time, causing orphans. Set CPU and memory requests equal to limits for every container in the pod so that Kubernetes assigns the Guaranteed QoS class.
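A quick way to catch a pod spec that will not be classed as Guaranteed is to compare each container's requests and limits before deploying. The sketch below does this over plain dictionaries shaped like the `containers` list of a pod spec. The function name and the sample values are illustrative, and quantities are compared as literal strings, so `"4"` and `"4000m"` are treated as different.

```python
def is_guaranteed(containers: list[dict]) -> bool:
    """Return True if every container sets CPU and memory with requests == limits."""
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            if name not in limits:
                return False
            # Kubernetes defaults a missing request to the limit, so only an
            # explicit, different request breaks the Guaranteed class.
            if requests.get(name, limits[name]) != limits[name]:
                return False
    return bool(containers)


# Illustrative container entries (names and sizes are assumptions).
guaranteed = {"name": "go-quai", "resources": {
    "requests": {"cpu": "4", "memory": "4Gi"},
    "limits": {"cpu": "4", "memory": "4Gi"},
}}
burstable = {"name": "go-quai", "resources": {
    "requests": {"cpu": "2", "memory": "4Gi"},
    "limits": {"cpu": "4", "memory": "4Gi"},
}}

if __name__ == "__main__":
    print(is_guaranteed([guaranteed]), is_guaranteed([burstable]))
```

After deployment, the class Kubernetes actually assigned is reported in the pod's `status.qosClass` field.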
Networking
Use hostNetwork or hostPort for p2p ports
Overlay networks (Flannel, Calico, Cilium) add NAT layers that prevent direct peer connections. libp2p falls back to circuit relay when it cannot establish direct connections, adding significant latency.
Set the external address flag to the host's public IP
The node must advertise an address that peers can actually reach.
Ensure network policies allow both ingress and egress
If using Cilium or Calico network policies, remember that once any policy selects an endpoint, all unmatched traffic is denied. You must explicitly allow:
- Ingress to the p2p port from external peers
- Egress to other peers and DNS
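The two rules above can be expressed as a standard Kubernetes `NetworkPolicy`. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The policy name, pod label, and port 4001 are placeholders. Allowing all egress is a simplification chosen here because peers may listen on arbitrary addresses and ports; tighten it if your environment requires.

```python
import json


def p2p_network_policy(name: str, pod_labels: dict, p2p_port: int) -> dict:
    """Build a NetworkPolicy allowing p2p ingress on TCP/UDP and all egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress", "Egress"],
            # No "from" clause: any source may reach the p2p port.
            "ingress": [{"ports": [
                {"protocol": "TCP", "port": p2p_port},
                {"protocol": "UDP", "port": p2p_port},
            ]}],
            # A single empty rule allows all egress, covering peers and DNS.
            "egress": [{}],
        },
    }


if __name__ == "__main__":
    # Hypothetical name, label, and port.
    policy = p2p_network_policy("quai-p2p", {"app": "go-quai"}, 4001)
    print(json.dumps(policy, indent=2))
```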
Storage
Use local PersistentVolumes for chain data
Chain data must survive pod restarts. Use a PersistentVolumeClaim backed by local SSD storage, not network-attached volumes.
Availability
Reverse Proxy (nginx)
If the node sits behind a reverse proxy (e.g., nginx on an edge VPS forwarding to the node), these settings are critical.
Configuration
Set proxy_timeout to 300s or higher
P2P connections are long-lived with idle periods between block messages. A low timeout (e.g., 1s) terminates connections during idle periods, forcing peers to reconnect through circuit relay.
Create separate server blocks for TCP and UDP
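listen <port>; is TCP only. QUIC transport requires an explicit UDP listener (listen <port> udp;). Without it, UDP traffic is silently dropped even if the port is exposed.
Do not limit proxy_responses for UDP
proxy_responses 1 closes the UDP session after a single response datagram. QUIC exchanges many datagrams per session, so the response count must stay unlimited, with proxy_timeout handling session cleanup. Unlimited is the nginx default when proxy_responses is not set; the nginx documentation defines a value of 0 as "no response is expected", so do not rely on 0 to mean unlimited.
Monitoring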
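Monitor the following areas:
- CPU Throttling
- File Descriptors
- pprof
- Proxy Connectivity
For CPU throttling, check whether the container is being throttled by CFS by reading the nr_periods and nr_throttled counters in the container cgroup's cpu.stat file.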
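A minimal sketch of that check, assuming a cgroup v2 host where the container sees its own counters at `/sys/fs/cgroup/cpu.stat` (on cgroup v1 the file usually lives under a `cpu` controller directory instead). It parses the `nr_periods` and `nr_throttled` counters and reports the share of scheduling periods in which the container was throttled. The function names are illustrative.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the `key value` lines of a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


if __name__ == "__main__":
    # Assumed cgroup v2 path as seen from inside the container.
    stat_file = Path("/sys/fs/cgroup/cpu.stat")
    if stat_file.exists():
        stats = parse_cpu_stat(stat_file.read_text())
        print(f"throttled in {throttled_ratio(stats):.1%} of periods")
```

The counters are cumulative, so sampling twice and comparing the difference shows whether throttling is happening now rather than at some earlier point.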
Bare Metal vs Containerized Comparison
| Factor | Bare Metal | Containerized (Default) | Fix |
|---|---|---|---|
| CPU scheduling | Unthrottled | CFS quota throttling | Guaranteed QoS, dedicated cores |
| Networking | Direct peer connections | NAT/overlay, circuit relay fallback | hostNetwork or hostPort |
| File descriptors | System default (65536+) | Container default (1024) | Raise ulimit to 65536+ |
| Memory | Unrestricted | Cgroup OOM risk | Set limits to 2x working set |
| Storage I/O | Local NVMe/SSD | Potentially network-attached | Use local PVs |
| Connection lifetime | Unlimited | Proxy may terminate early | proxy_timeout 300s |
