Running go-quai inside containers introduces networking, resource, and configuration constraints that do not exist on bare metal. This guide covers the requirements across three layers: Docker image and container runtime, Kubernetes orchestration, and reverse proxy configuration.

Docker

These apply whether you run standalone Docker or Docker inside Kubernetes.

Image Build

1. Use a minimal base image

Use alpine or distroless to reduce attack surface. Ensure the image includes the libc and C++ runtime dependencies required by go-quai's CGO components (LevelDB, kawpow); note that alpine ships musl rather than glibc, so a glibc-built binary needs either a compatible base image or a musl build.
2. Copy the chain config into the image

Bake the default configuration files into the image so the node can start without external volume mounts for config.
3. Expose both TCP and UDP on the p2p port

In your Dockerfile, declare both protocols:
EXPOSE 4002/tcp
EXPOSE 4002/udp
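
The three image-build steps above can be combined in a multi-stage Dockerfile along these lines. This is a sketch, not go-quai's official build: the Go version, the ./cmd/go-quai package path, and the config location are illustrative assumptions.

```dockerfile
# Build stage: compile with CGO enabled, since LevelDB and kawpow need it.
FROM golang:1.21-alpine AS builder
RUN apk add --no-cache build-base git
WORKDIR /src
COPY . .
RUN CGO_ENABLED=1 go build -o /usr/local/bin/go-quai ./cmd/go-quai

# Runtime stage: same alpine base, so the musl-linked binary runs unchanged.
FROM alpine:3.19
RUN apk add --no-cache libstdc++
COPY --from=builder /usr/local/bin/go-quai /usr/local/bin/go-quai
# Bake the default chain config into the image (step 2).
COPY config/ /root/.quai/config/
# Declare both protocols on the p2p port (step 3).
EXPOSE 4002/tcp
EXPOSE 4002/udp
ENTRYPOINT ["go-quai", "start"]
```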

Container Runtime

1. Raise the file descriptor limit to 65536+

The default container ulimit is often 1024. libp2p’s dial limiter will throttle outbound peer connections at low FD counts, reducing connectivity and forcing circuit relay fallback.
docker run --ulimit nofile=65536:65536 ...
2. Allocate sufficient CPU

go-quai performs PoW verification (kawpow, scrypt) on every received block. CPU starvation delays block processing and causes orphans. Allocate a minimum of 4 cores.
docker run --cpus=4 ...
3. Allocate sufficient memory

The working set is approximately 2 GB (LevelDB memdb + buffer pool + kawpow cache). Set the memory limit to at least 2x the working set (4 GB) to avoid GC thrashing; the example uses 8 GB for extra headroom.
docker run --memory=8g ...
4. Use local SSD storage for chain data

LevelDB is I/O intensive. Mount chain data from a local SSD, not network-attached storage.
docker run -v /mnt/ssd/quai-data:/root/.quai ...
5. Publish p2p ports on the host

The container must be directly reachable by peers. Publish both TCP and UDP on the p2p port.
docker run -p 4002:4002/tcp -p 4002:4002/udp ...
6. Enable pprof for diagnostics

Always enable pprof in containerized environments so performance issues can be profiled in place. Keep the pprof endpoint bound to localhost or otherwise restricted, since it exposes internal runtime state.
docker run ... go-quai start --node.pprof=true
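
Putting the runtime flags together, a full invocation might look like the following. The image name and host paths are placeholders to adapt to your setup.

```shell
docker run -d --name quai-node \
  --ulimit nofile=65536:65536 \
  --cpus=4 \
  --memory=8g \
  -v /mnt/ssd/quai-data:/root/.quai \
  -p 4002:4002/tcp -p 4002:4002/udp \
  <your-go-quai-image> start --node.pprof=true
```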

Kubernetes

These apply on top of the Docker requirements when orchestrating with Kubernetes.

Resource Configuration

1. Use Guaranteed QoS to prevent CPU throttling

Burstable pods are subject to CFS (Completely Fair Scheduler) throttling. When the node gets throttled mid-block-processing, it cannot validate blocks in time, causing orphans. Set requests equal to limits.
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
2. Raise file descriptor limits

Set ulimits via the container entrypoint or runtime class configuration. Verify inside the pod with ulimit -n.
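
Kubernetes has no first-class ulimit field, so one common workaround is a shell wrapper in the container spec. This is a sketch: the image name is hypothetical, and it only raises the soft limit — the hard limit comes from the container runtime (e.g. containerd's LimitNOFILE), so raising it there may also be necessary.

```yaml
containers:
  - name: quai-node
    image: <your-go-quai-image>
    command: ["/bin/sh", "-c"]
    # Raise the soft FD limit, then exec the node so it stays PID 1.
    args: ["ulimit -n 65536 && exec go-quai start"]
```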

Networking

1. Use hostNetwork or hostPort for p2p ports

Overlay networks (Flannel, Calico, Cilium) add NAT layers that prevent direct peer connections. libp2p falls back to circuit relay when it cannot establish direct connections, adding significant latency.
# Option A: hostNetwork (simplest, gives full host networking)
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet

# Option B: hostPort (exposes only specific ports)
ports:
  - containerPort: 4002
    hostPort: 4002
    protocol: TCP
  - containerPort: 4002
    hostPort: 4002
    protocol: UDP
2. Set the external address flag to the host's public IP

The node must advertise an address that peers can actually reach.
args:
  - --node.external-addr=/ip4/<PUBLIC_IP>/tcp/4002
  - --node.force-public=true
  - --node.portmap=false
3. Ensure network policies allow both ingress and egress

If using Cilium or Calico network policies, remember that once any policy selects an endpoint, all unmatched traffic is denied. You must explicitly allow:
  • Ingress to the p2p port from external peers
  • Egress to other peers and DNS
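
As a sketch, a plain Kubernetes NetworkPolicy covering both rules could look like this. The label selector, the wide-open 0.0.0.0/0 CIDRs, and the port-53 DNS rule are assumptions to tighten for your environment.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quai-node-p2p
spec:
  podSelector:
    matchLabels:
      app: quai-node
  policyTypes: ["Ingress", "Egress"]
  ingress:
    # Inbound p2p from external peers, both protocols.
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 4002
        - protocol: UDP
          port: 4002
  egress:
    # Outbound dials to peers on any port.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
    # DNS resolution.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```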
4. Verify pod labels match network policy selectors

A common mistake is defining a network policy that selects on labels the pod does not have. The policy silently does nothing and traffic may be blocked or unexpectedly open.
# Verify labels match
kubectl get pods -n <namespace> --show-labels
kubectl get networkpolicy -n <namespace> -o yaml

Storage

1. Use local PersistentVolumes for chain data

Chain data must survive pod restarts. Use a PersistentVolumeClaim backed by local SSD storage, not network-attached volumes.
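
One way to wire this up is a local PersistentVolume pinned to a node via nodeAffinity (which local volumes require), plus a matching claim. Names, capacity, the storage class, and the mount path below are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: quai-data-pv
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/ssd/quai-data
  # Local volumes must declare which node hosts the disk.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["<target-node>"]
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quai-data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-ssd
  resources:
    requests:
      storage: 500Gi
```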
2. Pin pods to specific nodes

Use nodeSelector or node affinity to keep the pod on the same host as its local storage and maintain a stable network identity.
spec:
  nodeSelector:
    kubernetes.io/hostname: <target-node>

Availability

1. Set a PodDisruptionBudget

Prevent Kubernetes from evicting the node pod during rolling updates or cluster maintenance.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: quai-node-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: quai-node

Reverse Proxy (nginx)

If the node sits behind a reverse proxy (e.g., nginx on an edge VPS forwarding to the node), these settings are critical.
A misconfigured proxy is the single most common cause of orphan rate issues in containerized deployments. The proxy sits in the critical path of every peer connection.

Configuration

1. Use the stream module, not http

P2P traffic is raw TCP/UDP, not HTTP. Use nginx’s stream module.
stream {
    # ...
}
2. Set proxy_timeout to 300s or higher

P2P connections are long-lived with idle periods between block messages. A low timeout (e.g., 1s) terminates connections during idle periods, forcing peers to reconnect through circuit relay.
proxy_timeout 300s;
proxy_connect_timeout 10s;
3. Create separate server blocks for TCP and UDP

listen <port>; is TCP only. QUIC transport requires an explicit UDP listener. Without it, UDP traffic is silently dropped even if the port is exposed.
stream {
    upstream backend_tcp {
        server <node-address>:4002;
    }

    upstream backend_udp {
        server <node-address>:4002;
    }

    server {
        listen 4002;
        proxy_pass backend_tcp;
        proxy_timeout 300s;
        proxy_connect_timeout 10s;
    }

    server {
        listen 4002 udp reuseport;
        proxy_pass backend_udp;
        proxy_timeout 300s;
        proxy_responses 0;
    }
}
4. Set proxy_responses to 0 for UDP

proxy_responses 1 closes the UDP session after a single response packet. QUIC requires many packets per session. Set to 0 (unlimited) and let proxy_timeout handle session cleanup.
5. Raise worker_connections

The default of 1024 may be insufficient for a p2p node maintaining hundreds of peer connections. Set to 4096 or higher.
events {
    worker_connections 4096;
}

Monitoring

Check if the container is being CPU-throttled by CFS:
# From inside the container (cgroup v2 path; it differs under cgroup v1)
grep throttled /sys/fs/cgroup/cpu.stat

# From Prometheus
# Alert on: container_cpu_cfs_throttled_periods_total

Bare Metal vs Containerized Comparison

| Factor | Bare Metal | Containerized (Default) | Fix |
|---|---|---|---|
| CPU scheduling | Unthrottled | CFS quota throttling | Guaranteed QoS, dedicated cores |
| Networking | Direct peer connections | NAT/overlay, circuit relay fallback | hostNetwork or hostPort |
| File descriptors | System default (65536+) | Container default (1024) | Raise ulimit to 65536+ |
| Memory | Unrestricted | Cgroup OOM risk | Set limits to 2x working set |
| Storage I/O | Local NVMe/SSD | Potentially network-attached | Use local PVs |
| Connection lifetime | Unlimited | Proxy may terminate early | proxy_timeout 300s |