Date: 2026-01-17
Cluster: anubis (192.168.1.42), babel (192.168.1.43), ra (192.168.1.44)
MicroK8s Version: v1.33.7
Applications using Rust-based package managers (like uv in Chronicle) were failing with DNS errors:
dns error: failed to lookup address information: Name has no usable address
Root Cause:
- The cluster had IPv6 IP pools configured but pods weren't getting IPv6 addresses
- Even with IPv6 addresses, the IPv6 pool didn't have proper NAT for internet egress
- Kubernetes dual-stack configuration was incomplete
- Date: 2026-01-14
- Action: Deleted IPv6 IP pool and used DaemonSet to disable IPv6 at kernel level
- Why it failed:
- Host-level IPv6 disable doesn't propagate to pod network namespaces
- Rust-based tools still attempted IPv6 even with IPv4-only DNS
- Didn't solve the fundamental issue
- Action: Configured CoreDNS template to block IPv6 AAAA queries
- Why it failed:
- Some applications bypass DNS or use cached records
- Doesn't prevent IPv6-enabled applications from trying IPv6 connections
- Rust networking stack doesn't gracefully fall back to IPv4
- Action: Added --feature-gates=IPv6DualStack=true to kube-apiserver and kube-proxy
- Why it failed:
- Feature gate doesn't exist in Kubernetes 1.33
- IPv6 dual-stack became GA in K8s 1.23, and the feature gate was removed in a later release
- Caused API server to fail to start, because unrecognized feature gates are rejected at startup
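Because a stale gate keeps the API server from starting, it can be worth scanning the args files before restarting MicroK8s. The following is a minimal sketch, not part of the original fix: the helper name `find_removed_gates` and the `REMOVED_GATES` set are assumptions made for illustration, and the set contains only the one gate discussed here.

```python
import sys

# Assumption: only the gate discussed in this document is listed.
REMOVED_GATES = frozenset({"IPv6DualStack"})
PREFIX = "--feature-gates"


def find_removed_gates(args_text, removed=REMOVED_GATES):
    """Return removed feature gates referenced in the text of an args file."""
    hits = []
    for raw in args_text.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX):
            continue
        # Accept both "--feature-gates=A=true,B=false" and "--feature-gates A=true".
        value = line[len(PREFIX):].lstrip("= ")
        for pair in value.split(","):
            gate = pair.split("=", 1)[0].strip()
            if gate in removed:
                hits.append(gate)
    return hits


if __name__ == "__main__":
    # Pass one or more args files on the command line.
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, find_removed_gates(fh.read()))
```

Running it against /var/snap/microk8s/current/args/kube-apiserver and /var/snap/microk8s/current/args/kube-proxy should print an empty list for each file once the flag has been removed.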
The complete solution required 5 configuration changes:
File: /var/snap/microk8s/current/args/kube-apiserver
--service-cluster-ip-range=10.152.183.0/24,fd98::/108
File: /var/snap/microk8s/current/args/kube-proxy
--cluster-cidr=10.1.0.0/16,fdf9:6e82:b78e::/48
File: /var/snap/microk8s/current/args/kube-controller-manager
--cluster-cidr=10.1.0.0/16,fdf9:6e82:b78e::/48
--node-cidr-mask-size-ipv4=24
--node-cidr-mask-size-ipv6=64
--allocate-node-cidrs=true
Critical: The --allocate-node-cidrs=true flag was missing, which caused the node-ipam-controller to be skipped entirely.
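The CIDR and mask-size flags above determine how many per-node pod ranges the node-ipam-controller can hand out. As a rough sanity check, the arithmetic can be sketched with Python's standard `ipaddress` module; the helper names `split_dual_stack` and `node_subnet_capacity` are hypothetical and exist only for this illustration.

```python
import ipaddress


def split_dual_stack(cidr_flag):
    """Split a 'v4,v6' flag value and confirm it holds one CIDR per IP family."""
    networks = [ipaddress.ip_network(part.strip()) for part in cidr_flag.split(",")]
    if sorted(net.version for net in networks) != [4, 6]:
        raise ValueError("expected exactly one IPv4 and one IPv6 CIDR")
    by_version = {net.version: net for net in networks}
    return by_version[4], by_version[6]


def node_subnet_capacity(network, node_mask_size):
    """Number of per-node subnets of the given prefix length that fit in network."""
    if node_mask_size < network.prefixlen:
        raise ValueError("node mask must not be shorter than the cluster CIDR prefix")
    return 2 ** (node_mask_size - network.prefixlen)


v4, v6 = split_dual_stack("10.1.0.0/16,fdf9:6e82:b78e::/48")
print(node_subnet_capacity(v4, 24))  # 256 node ranges of /24
print(node_subnet_capacity(v6, 64))  # 65536 node ranges of /64
```

With the values used in this cluster, both families leave far more node ranges than the three nodes need, so the sizing was not the reason pod CIDRs were missing; the absent --allocate-node-cidrs=true flag was.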
Already existed but verified configuration:
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv6-ippool
spec:
  cidr: fdf9:6e82:b78e::/48
  natOutgoing: true
  vxlanMode: Always # Must match IPv4 pool
  nodeSelector: all()
ConfigMap: calico-config in kube-system namespace
Modified the CNI network config:
"ipam": {
  "assign_ipv4": "true",
  "assign_ipv6": "true", // Added this line
  "type": "calico-ipam"
}
Applied with:
kubectl apply -f /tmp/calico-config-dual.yaml
kubectl delete pod -n kube-system -l k8s-app=calico-node # Restart Calico
# Verify nodes have dual-stack pod CIDRs
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'
# Verify pods get both IPv4 and IPv6
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.podIPs}' | jq .
Expected output:
[
  {
    "ip": "10.1.236.193"
  },
  {
    "ip": "fdf9:6e82:b78e:14d7:84f5:374e:cba8:6281"
  }
]
# From inside a pod
kubectl exec <pod-name> -n <namespace> -- ping6 -c 3 2001:4860:4860::8888
Expected: Successful ping to Google's IPv6 DNS
# From inside a pod
kubectl exec <pod-name> -n <namespace> -- wget -O- --timeout=5 https://pypi.org/simple/ | head -5
Expected: Successful HTTPS connection over IPv4
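The podIPs check above can also be automated instead of read by eye. The sketch below assumes the JSON array printed by the kubectl jsonpath query is passed in as a string; the function names `ip_families` and `is_dual_stack` are illustrative, not part of any existing tooling in this repository.

```python
import ipaddress
import json


def ip_families(pod_ips_json):
    """Return the set of IP versions (4 and/or 6) found in a .status.podIPs array."""
    entries = json.loads(pod_ips_json)
    return {ipaddress.ip_address(entry["ip"]).version for entry in entries}


def is_dual_stack(pod_ips_json):
    """True only when the pod holds at least one IPv4 and one IPv6 address."""
    return ip_families(pod_ips_json) == {4, 6}


sample = '[{"ip": "10.1.236.193"}, {"ip": "fdf9:6e82:b78e:14d7:84f5:374e:cba8:6281"}]'
print(is_dual_stack(sample))  # True
```

A pod created before assign_ipv6 was enabled would return False here, which is a quick way to spot workloads that still need a restart.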
All IPv6 connectivity is now working! The issue was CoreDNS blocking AAAA records, not Calico networking.
- Dual-stack IP assignment: All new pods get both IPv4 and IPv6 addresses ✅
- IPv6 ICMP: Pods can ping IPv6 internet destinations ✅
- IPv6 TCP: Pods can establish TCP connections over IPv6 ✅
- IPv6 HTTPS: Pods can make HTTPS requests over IPv6 ✅
- IPv4 connectivity: Standard tools (wget, curl) work over IPv4 ✅
- NAT Outgoing: Both IPv4 and IPv6 have natOutgoing enabled ✅
- DNS resolution: CoreDNS returns both A and AAAA records ✅
- Service dual-stack: Kubernetes can allocate dual-stack IPs to services ✅
Test Results (2026-01-17):
# All nodes show full IPv6 and IPv4 connectivity
./scripts/quick-ipv6-test.sh
[ra] IPv6:✓ IPv4:✓
[babel] IPv6:✓ IPv4:✓
[anubis] IPv6:✓ IPv4:✓
The Investigation Revealed:
- ✅ Calico IPv6 VXLAN NAT works correctly for all protocols (ICMP, TCP, HTTPS)
- ✅ IPv6 routing and NAT outgoing configuration was correct from the start
- ❌ The issue was a self-inflicted CoreDNS misconfiguration
Timeline of the Issue:
- Enabled IPv6 dual-stack in Kubernetes
- Encountered Rust uv failures with "dns error"
- Incorrectly diagnosed as Calico IPv6 NAT bug
- Added CoreDNS AAAA blocking as "workaround"
- This workaround CREATED the problem we were trying to solve
- With AAAA blocking removed, everything works perfectly
Key Insight:
Testing IPv6 with direct IP addresses (e.g., curl -6 https://[2607:f8b0:4004:c07::66]/) fails with certificate validation errors or 421 Misdirected Request responses, because the server certificate and SNI-based routing are tied to the hostname, not the IP. This produced false negatives in our testing. Always test with hostnames when DNS is available.
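One way to avoid that false negative is to force the TCP connection onto IPv6 while still presenting the hostname for SNI and certificate checks. The following is a minimal sketch using only the Python standard library; `is_ip_literal` and `https_probe_v6` are hypothetical helper names, and the probe needs working AAAA resolution to succeed.

```python
import ipaddress
import socket
import ssl


def is_ip_literal(target):
    """True if target is an IPv4/IPv6 literal, with or without URL-style brackets."""
    try:
        ipaddress.ip_address(target.strip("[]"))
        return True
    except ValueError:
        return False


def https_probe_v6(hostname, port=443, timeout=5.0):
    """Open a TLS connection over IPv6 only, sending hostname as SNI.

    Returns the negotiated TLS version string on success.
    """
    if is_ip_literal(hostname):
        raise ValueError("use a hostname so SNI and certificate checks are meaningful")
    context = ssl.create_default_context()
    # Ask for AAAA results only, so the connection cannot silently use IPv4.
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET6, type=socket.SOCK_STREAM
    )
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as raw:
        raw.settimeout(timeout)
        raw.connect(sockaddr)
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.version()
```

Calling `https_probe_v6("pypi.org")` from inside a pod separates the two failure modes discussed here: a `socket.gaierror` points at DNS (no AAAA answer), while a timeout or connection error points at the network path.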
# On anubis - works perfectly
curl -6 -v https://pypi.org/simple/
ping6 -c 3 2a04:4e42::223
Result: Native IPv6 (non-NAT'd) works perfectly for all protocols.
# Ubuntu pod with curl - works perfectly
curl -I https://pypi.org/simple/
Result: IPv4 through Calico NAT works for all applications.
# From pod
ping6 -c 3 2001:4860:4860::8888 # Google DNS
ping6 -c 3 2a04:4e42::223 # PyPI
Result: IPv6 NAT works for ICMP (ping6 succeeds).
# From Ubuntu pod
curl -6 -v https://pypi.org/simple/
# Result: curl: (7) Couldn't connect to server
# Direct IP test
wget --inet6-only -O- "https://[2a04:4e42::223]/simple/"
# Result: 421 Misdirected Request (TLS/SNI issue)
Result: IPv6 TCP connections fail through Calico NAT, even though ICMP works.
Based on community suggestions, set:
kubectl set env daemonset/calico-node -n kube-system \
  CALICO_IPV6POOL_NAT_OUTGOING=true \
  CALICO_IPV6POOL_VXLAN=Always \
  --containers=calico-node
kubectl set env daemonset/calico-node -n kube-system \
  CALICO_IPV6POOL_NAT_OUTGOING=true \
  --containers=install-cni
Verified configuration:
# In running Calico pod
env | grep CALICO_IPV6
# CALICO_IPV6POOL_NAT_OUTGOING=true
# CALICO_IPV6POOL_VXLAN=Always
Result: Still fails. ICMP works but TCP doesn't.
# On anubis (tee -a is used because a plain >> redirect would not run with sudo privileges)
echo '--proxy-mode=iptables' | sudo tee -a /var/snap/microk8s/current/args/kube-proxy
sudo microk8s stop && sudo microk8s start
Result: No change. TCP still fails.
Ensured IPv6 pool has same settings as IPv4:
# IPv6 pool
spec:
  cidr: fdf9:6e82:b78e::/48
  natOutgoing: true
  vxlanMode: Always # Matches IPv4
  nodeSelector: all()
Result: No change. TCP still fails.
With template IN AAAA . { rcode NOERROR }:
- nslookup pypi.org returns IPv4 addresses only ✓
- curl (glibc) works fine with IPv4 ✓
- curl (musl/Alpine) fails with "Could not resolve host" ❌
- Rust uv fails with "dns error: failed to lookup address information" ❌
Why glibc works but musl/Rust fail:
- glibc's getaddrinfo() handles NOERROR with no AAAA gracefully
- musl's getaddrinfo() treats NOERROR with no records as failure
- Rust's DNS resolver detects IPv6 interface and tries it despite DNS results
| Application Type | Base | DNS Resolver | IPv4-only DNS | Result |
|---|---|---|---|---|
| curl (Ubuntu) | glibc | getaddrinfo() | ✅ Works | ✅ Success |
| wget (Ubuntu) | glibc | getaddrinfo() | ✅ Works | ✅ Success |
| curl (Alpine) | musl | getaddrinfo() | ❌ Fails | ❌ "Could not resolve host" |
| Python requests | glibc | getaddrinfo() | ✅ Works | ✅ Success |
| Rust uv | glibc | Rust resolver | ❌ Ignores DNS | ❌ "dns error" |
| Chronicle (Rust uv) | glibc | Rust resolver | ❌ Ignores DNS | ❌ "dns error" |
The issue: Rust's networking stack (Tokio + trust-dns) doesn't just use DNS results - it also:
- Detects available network interfaces
- Sees pod has both IPv4 and IPv6 addresses
- Tries IPv6 first (RFC 6555 Happy Eyeballs)
- IPv6 connection fails through Calico NAT
- Doesn't gracefully fall back to IPv4
- Reports misleading "dns error" (actually connection error)
Evidence:
- Pod has both IPs: 10.1.236.216 and fdf9:6e82:b78e:14d7:84f5:374e:cba8:6295
- DNS returns IPv4 only
- uv still tries IPv6 (detected from interface, not DNS)
- TCP connection over IPv6 fails in Calico NAT
- Error message says "dns error" but it's actually a TCP connection failure
Observed behavior:
- ping6 2a04:4e42::223 from pod: ✅ Works
- curl -6 https://pypi.org/simple/ from pod: ❌ Fails
Hypothesis:
- Calico VXLAN encapsulation works for ICMP
- NAT state tracking may not properly handle TCP SYN packets over IPv6
- Possible iptables rule issue with IPv6 NAT (ip6tables vs iptables)
- TLS/SNI may require additional NAT handling for IPv6
Not yet investigated:
- Calico Felix IPv6 NAT implementation details
- ip6tables rules on nodes
- Whether Calico supports stateful IPv6 NAT for TCP
Issue #10834: Dual-stack IPv6 pod routing failures
- Affects: Kubernetes 1.32+ with dual-stack when assign_ipv4=true and assign_ipv6=true
- Symptom: Calico fails to create IPv6 pod routes, breaking node-to-pod IPv6 connectivity
- Status: Reported in recent versions (v3.29.3)
- Reference: GitHub Issue #10834
Issue #10817: vxlan-v6.calico interface missing intermittently
- Affects: Calico 3.29.3 in Kubernetes 1.32
- Symptom: "Failed to find VXLAN tunnel device parent" errors, VXLAN route programming fails
- Workaround: Restart calico-node pod to recover interface
- Reference: GitHub Issue #10817
Issue #8636: IPv6 SNAT not working in BPF mode
- Affects: Single-stack IPv6 BPF clusters on dual-stack hosts
- Symptom: natOutgoing / SNAT fails for pod egress traffic
- Status: Reported March 2024, affects tigera operator deployments
- Reference: GitHub Issue #8636
Issue #7638: IPv6 VXLAN ICMP works but TCP fails (NOT our issue - see resolution below)
- Symptom: Ping works across nodes but curl/telnet fails
- Affects: IPv6 VXLAN mode with NAT
- Status: Known bug, no clear resolution
- Reference: GitHub Issue #7638
The Problem Was NOT Calico - It Was Our DNS Configuration!
After extensive testing, we discovered that:
- ✅ IPv6 TCP connections work perfectly (netcat succeeds)
- ✅ IPv6 HTTPS works perfectly when using IPv6 addresses directly
- ✅ IPv6 HTTPS works perfectly when DNS returns AAAA records
- ❌ IPv6 HTTPS failed ONLY because CoreDNS was blocking AAAA queries
Root Cause Analysis:
We had added a CoreDNS template to block AAAA queries:
template IN AAAA . {
  rcode NOERROR
}
This caused:
- Applications requesting IPv6 addresses got "NOERROR" with no data
- curl -6 https://pypi.org failed with "Could not resolve host"
- Rust's uv tool failed with "dns error: failed to lookup address information"
- This gave the FALSE IMPRESSION that IPv6 networking was broken
The Fix:
Remove the AAAA blocking template from CoreDNS ConfigMap:
# Backup current config
kubectl get cm coredns -n kube-system -o yaml > /tmp/coredns-backup.yaml
# Edit and remove the template section
kubectl edit cm coredns -n kube-system
# (Remove lines 26-29: the template IN AAAA block)
# Restart CoreDNS
kubectl rollout restart deployment/coredns -n kube-system
Verification:
# DNS now returns AAAA records
kubectl run test --image=busybox --rm -it --restart=Never -- nslookup pypi.org
# Should show both A and AAAA records
# IPv6 HTTPS works
kubectl run test --image=nicolaka/netshoot --rm -it --restart=Never -- \
  curl -6 -I https://pypi.org/simple/
# Should return HTTP/2 200
Test Results After Fix:
[ra] IPv6:✓ IPv4:✓
[babel] IPv6:✓ IPv4:✓
[anubis] IPv6:✓ IPv4:✓
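Removing the template by line number in an editor is easy to get wrong if the Corefile has changed since the backup. As an alternative, the block can be dropped by matching its opening line and counting braces. This is a rough sketch under assumptions: `strip_aaaa_template` is a hypothetical helper, and the sample Corefile below is illustrative only, not this cluster's actual configuration.

```python
import re

# Matches the opening line of the block, e.g. "template IN AAAA . {".
TEMPLATE_START = re.compile(r"^\s*template\s+IN\s+AAAA\s+\S+\s*\{\s*$")


def strip_aaaa_template(corefile):
    """Return corefile text with any 'template IN AAAA <zone> { ... }' block removed."""
    kept = []
    depth = 0  # greater than zero while inside the block being dropped
    for line in corefile.splitlines():
        if depth == 0 and TEMPLATE_START.match(line):
            depth = 1
            continue
        if depth > 0:
            # Track nested braces so the block's own closing brace ends the skip.
            depth += line.count("{") - line.count("}")
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative Corefile, not the real one from the coredns ConfigMap.
sample = """.:53 {
    errors
    template IN AAAA . {
        rcode NOERROR
    }
    forward . /etc/resolv.conf
}
"""
print(strip_aaaa_template(sample))
```

The cleaned text can then be pasted back through kubectl edit, followed by the same rollout restart as above.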
Chronicle Impact: Chronicle backend (using python:3.12-slim-bookworm / Debian glibc) works perfectly:
- Query DNS for pypi.org → gets both IPv4 and IPv6 addresses ✅
- uv tries IPv6 first (Happy Eyeballs) → succeeds ✅
- Successfully downloads dependencies from PyPI ✅
- Falls back to IPv4 if IPv6 fails ✅
VERIFIED: Tested with Chronicle's exact base image (python:3.12-slim-bookworm):
# Python DNS resolution
python3 -c "import socket; socket.getaddrinfo('pypi.org', 443)"
# Returns 24 addresses (IPv4 + IPv6) ✅
# pip download test
pip3 download httpx
# Successfully downloads ✅
- glibc-based images (Debian, Ubuntu, RHEL): IPv6 DNS works ✅
  - python:3.x, python:3.x-slim, ubuntu:*, debian:*
  - Chronicle uses this → will work ✅
- musl-based images (Alpine): IPv6 DNS broken ❌
  - python:3.x-alpine, alpine:*, nicolaka/netshoot
  - Known musl issue with dual-stack getaddrinfo()
  - Workaround: Switch to glibc base image or force IPv4-only
This is why our test pods (netshoot/Alpine) showed failures but Chronicle (Debian/glibc) will work.
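To compare base images directly, the same getaddrinfo() call can be run in each one and summarised by address family. The sketch below extends the one-line Python check used earlier; `family_counts` and `report` are hypothetical names, and no particular output is assumed because the result depends on the image's libc and on the DNS configuration at the time.

```python
import socket
from collections import Counter


def family_counts(addrinfo):
    """Count getaddrinfo() results per address family."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return Counter(names.get(entry[0], "other") for entry in addrinfo)


def report(host="pypi.org", port=443):
    """Print how many IPv4 and IPv6 addresses the local resolver returns for host."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Resolution failing outright is the symptom described for musl-based images.
        print(f"{host}: resolution failed: {exc}")
        return
    print(f"{host}: {dict(family_counts(infos))}")
```

Calling `report()` in a python:3.12-slim-bookworm pod and again in a python:3.x-alpine pod gives a like-for-like comparison of resolver behaviour without involving uv or pip.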
References:
- Cilium Issue #35489 - Similar IPv6 MTU issue resolution
- Google Cloud troubleshooting - Kubernetes networking diagnostics
- Calico Issue #6877 - IPv6 cross-node access
Discovery: Calico does NOT enable natOutgoing for IPv6 by default, unlike IPv4.
From Issue #2954:
When deploying Calico on Kubernetes in IPv6 or dual-stack mode, the default-ipv6-ippool does not get "natOutgoing: true" unlike the IPv4 pool. Calico does not provide an outgoing nat rule for ipv6 traffic by default.
Verification commands:
# Check if IPv6 NAT rules exist
ip6tables -t nat -nvL cali-nat-outgoing 2>/dev/null
# Should show MASQUERADE rules for fdf9:6e82:b78e::/48
# If empty, IPv6 NAT is not working
Our status: We manually set natOutgoing: true in the IPv6 pool, but TCP still fails.
For specific workloads (like Chronicle) that struggle with IPv6, disable IPv6 at the pod level without affecting other pods or nodes.
Method 1: Init container with sysctl
spec:
  initContainers:
  - name: disable-ipv6
    image: busybox
    command:
    - /bin/sh
    - -c
    - |
      sysctl -w net.ipv6.conf.all.disable_ipv6=1
      sysctl -w net.ipv6.conf.default.disable_ipv6=1
    securityContext:
      privileged: true
  containers:
  - name: chronicle-backend
    # ... rest of spec
Method 2: CNI tuning plugin
# Add to pod annotations
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        {
          "name": "tuning-ipv6-disable",
          "type": "tuning",
          "sysctl": {
            "net.ipv6.conf.all.disable_ipv6": "1",
            "net.ipv6.conf.default.disable_ipv6": "1"
          }
        }
      ]
References:
- Red Hat: Disable IPv6 per container
- AWS EKS: Disable IPv6 in pods
- GitHub #2483: Containerd IPv6 disable
Pros:
- Surgical fix - only affects problematic workloads
- Doesn't disable cluster-wide IPv6
- Preserves dual-stack for other services
Cons:
- Requires privileged containers or CNI plugin support
- Doesn't fix underlying Calico bug
Verify Calico is actually setting up IPv6 NAT rules:
# On node (anubis)
sudo ip6tables -t nat -L -n -v | head -50
sudo ip6tables -t nat -L POSTROUTING -n -v
sudo ip6tables -t nat -L cali-nat-outgoing -n -v 2>/dev/null
Expected: Should see MASQUERADE rules for fdf9:6e82:b78e::/48 (our IPv6 pod CIDR)
If missing: Calico Felix is not properly configuring IPv6 NAT despite env vars
Current version: Calico v3.28.1
Latest stable: Check if v3.29+ or v3.30+ has fixes for IPv6 VXLAN issues
Consideration: Recent versions (v3.29.3) show NEW issues with IPv6 dual-stack, so upgrade may introduce different problems.
Build a Chronicle image that truly has all dependencies pre-installed:
- Pre-install all Python packages during image build
- Pre-download spacy models during image build
- Don't use uv install or pip install at runtime
- The image should start the FastAPI server immediately without any downloads
Add environment variables to Chronicle deployment:
env:
- name: FORCE_IPV4
  value: "1"
- name: UV_NO_CACHE
  value: "1"
Caveat: May not work with Rust networking stack
If runtime downloads are unavoidable:
- Deploy devpi or bandersnatch in the cluster
- Configure UV_INDEX_URL to point to local mirror
- Ensures IPv4-only connectivity to package index
Pod with dual-stack IPs
├─ IPv4: 10.1.236.x/24
│ └─ NAT → Internet (works ✓)
└─ IPv6: fdf9:6e82:b78e::/48
└─ NAT → Internet (works ✓)
DNS (CoreDNS)
├─ A records (IPv4) → Returned ✓
└─ AAAA records (IPv6) → Blocked (template returns NOERROR)
- Kubernetes node-ipam-controller: Allocates pod CIDR ranges to nodes
- Calico IPAM: Assigns specific IPs to pods from those ranges
- Both working together for dual-stack
- /var/snap/microk8s/current/args/kube-apiserver
- /var/snap/microk8s/current/args/kube-proxy
- /var/snap/microk8s/current/args/kube-controller-manager
- calico-config ConfigMap in kube-system namespace
- default-ipv6-ippool IPPool (updated vxlanMode)
If dual-stack needs to be disabled:
# 1. Restore backup configs on anubis
sudo cp /var/snap/microk8s/current/args/kube-apiserver.backup /var/snap/microk8s/current/args/kube-apiserver
sudo cp /var/snap/microk8s/current/args/kube-proxy.backup /var/snap/microk8s/current/args/kube-proxy
sudo cp /var/snap/microk8s/current/args/kube-controller-manager.backup /var/snap/microk8s/current/args/kube-controller-manager
# 2. Revert Calico CNI config
kubectl edit cm calico-config -n kube-system
# Remove "assign_ipv6": "true"
# 3. Restart Calico
kubectl delete pod -n kube-system -l k8s-app=calico-node
# 4. Restart MicroK8s
sudo microk8s stop
sudo microk8s start
- IPV6_DNS_FIX.md - Previous IPv6 troubleshooting (superseded)
- KUBERNETES.md - General K8s integration docs
- Calico IPv6 Docs
- Kubernetes IPv6 Dual-stack
- IPv6 dual-stack requires coordination across 5 components: kube-apiserver, kube-proxy, kube-controller-manager, Calico IP pools, and Calico CNI
- The node-ipam-controller is essential: Without --allocate-node-cidrs=true, no pod CIDRs are assigned
- Calico IPAM must be told to assign IPv6: Even with pools configured, CNI needs assign_ipv6: true
- VXLAN mode must match: IPv4 and IPv6 pools must use same vxlanMode
- Feature gates are version-specific: the IPv6DualStack gate went GA in K8s 1.23 and was later removed, so setting it on 1.33 prevents the API server from starting
- Image build matters: Even perfect networking won't fix images that download at runtime
- Dual-stack is now permanent - All new pods will get both IPv4 and IPv6
- No node restart needed - Calico handles IP assignment dynamically
- CoreDNS no longer blocks AAAA queries - The blocking template was removed as part of the resolution, and CoreDNS now returns both A and AAAA records
- Monitor Chronicle image builds - Ensure dependencies are truly pre-installed