IPv6 Dual-Stack Configuration for MicroK8s

Date: 2026-01-17 Cluster: anubis (192.168.1.42), babel (192.168.1.43), ra (192.168.1.44) MicroK8s Version: v1.33.7

Problem Statement

Applications using Rust-based package managers (like uv in Chronicle) were failing with DNS errors:

dns error: failed to lookup address information: Name has no usable address

Root Cause:

The cluster had IPv6 IP pools configured but pods weren't getting IPv6 addresses
Even with IPv6 addresses, the IPv6 pool didn't have proper NAT for internet egress
Kubernetes dual-stack configuration was incomplete

What Didn't Work

❌ Attempt 1: Disabling IPv6 (Previous Approach)

Date: 2026-01-14
Action: Deleted IPv6 IP pool and used DaemonSet to disable IPv6 at kernel level
Why it failed:
- Host-level IPv6 disable doesn't propagate to pod network namespaces
- Rust-based tools still attempted IPv6 even with IPv4-only DNS
- Didn't solve the fundamental issue

❌ Attempt 2: CoreDNS AAAA Blocking Only

Action: Configured CoreDNS template to block IPv6 AAAA queries
Why it failed:
- Some applications bypass DNS or use cached records
- Doesn't prevent IPv6-enabled applications from trying IPv6 connections
- Rust networking stack doesn't gracefully fall back to IPv4

❌ Attempt 3: Adding IPv6DualStack Feature Gate

Action: Added --feature-gates=IPv6DualStack=true to kube-apiserver and kube-proxy
Why it failed:
- Feature gate doesn't exist in Kubernetes 1.33
- IPv6 dual-stack became GA in K8s 1.23 and feature gate was removed
- Caused API server to fail to start

What Did Work

✅ Full Dual-Stack Configuration

The complete solution required 5 configuration changes:

1. kube-apiserver - IPv6 Service CIDR

File: /var/snap/microk8s/current/args/kube-apiserver

--service-cluster-ip-range=10.152.183.0/24,fd98::/108

2. kube-proxy - IPv6 Cluster CIDR

File: /var/snap/microk8s/current/args/kube-proxy

--cluster-cidr=10.1.0.0/16,fdf9:6e82:b78e::/48

3. kube-controller-manager - Enable Node IPAM

File: /var/snap/microk8s/current/args/kube-controller-manager

--cluster-cidr=10.1.0.0/16,fdf9:6e82:b78e::/48
--node-cidr-mask-size-ipv4=24
--node-cidr-mask-size-ipv6=64
--allocate-node-cidrs=true

Critical: The --allocate-node-cidrs=true flag was missing, which caused the node-ipam-controller to be skipped entirely.

4. Calico IPv6 IP Pool with NAT Outgoing

Already existed but verified configuration:

apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv6-ippool
spec:
  cidr: fdf9:6e82:b78e::/48
  natOutgoing: true
  vxlanMode: Always  # Must match IPv4 pool
  nodeSelector: all()

5. Calico CNI - Enable IPv6 Assignment

ConfigMap: calico-config in kube-system namespace

Modified the CNI network config:

"ipam": {
    "assign_ipv4": "true",
    "assign_ipv6": "true",  // Added this line
    "type": "calico-ipam"
}

Applied with:

kubectl apply -f /tmp/calico-config-dual.yaml
kubectl delete pod -n kube-system -l k8s-app=calico-node  # Restart Calico

Verification Commands

Check Node Configuration

# Verify nodes have dual-stack pod CIDRs
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'

Check Pod IPs

# Verify pods get both IPv4 and IPv6
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.podIPs}' | jq .

Expected output:

[
  {
    "ip": "10.1.236.193"
  },
  {
    "ip": "fdf9:6e82:b78e:14d7:84f5:374e:cba8:6281"
  }
]

Test IPv6 Connectivity

# From inside a pod
kubectl exec <pod-name> -n <namespace> -- ping6 -c 3 2001:4860:4860::8888

Expected: Successful ping to Google's IPv6 DNS

Test IPv4 Connectivity

# From inside a pod
kubectl exec <pod-name> -n <namespace> -- wget -O- --timeout=5 https://pypi.org/simple/ | head -5

Expected: Successful HTTPS connection over IPv4

Current Status

✅ IPv6 Dual-Stack Fully Working (RESOLVED 2026-01-17)

All IPv6 connectivity is now working! The issue was CoreDNS blocking AAAA records, not Calico networking.

Dual-stack IP assignment: All new pods get both IPv4 and IPv6 addresses ✅
IPv6 ICMP: Pods can ping IPv6 internet destinations ✅
IPv6 TCP: Pods can establish TCP connections over IPv6 ✅
IPv6 HTTPS: Pods can make HTTPS requests over IPv6 ✅
IPv4 connectivity: Standard tools (wget, curl) work over IPv4 ✅
NAT Outgoing: Both IPv4 and IPv6 have natOutgoing enabled ✅
DNS resolution: CoreDNS returns both A and AAAA records ✅
Service dual-stack: Kubernetes can allocate dual-stack IPs to services ✅

Test Results (2026-01-17):

# All nodes show full IPv6 and IPv4 connectivity
./scripts/quick-ipv6-test.sh
[ra]     IPv6:✓ IPv4:✓
[babel]  IPv6:✓ IPv4:✓
[anubis] IPv6:✓ IPv4:✓

📖 What We Learned

The Investigation Revealed:

✅ Calico IPv6 VXLAN NAT works correctly for all protocols (ICMP, TCP, HTTPS)
✅ IPv6 routing and NAT outgoing configuration was correct from the start
❌ The issue was a self-inflicted CoreDNS misconfiguration

Timeline of the Issue:

Enabled IPv6 dual-stack in Kubernetes
Encountered Rust uv failures with "dns error"
Incorrectly diagnosed as Calico IPv6 NAT bug
Added CoreDNS AAAA blocking as "workaround"
This workaround CREATED the problem we were trying to solve
With AAAA blocking removed, everything works perfectly

Key Insight: Testing IPv6 with direct IP addresses (e.g., curl -6 https://[2607:f8b0:4004:c07::66]/) will fail with certificate validation errors because TLS/SNI expects hostnames, not IPs. This led to false negatives in our testing. Always test with hostnames when DNS is available.

Detailed Investigation Log

What We Tested

✅ Verified IPv6 HTTPS Works on Host/Nodes

# On anubis - works perfectly
curl -6 -v https://pypi.org/simple/
ping6 -c 3 2a04:4e42::223

Result: Native IPv6 (non-NAT'd) works perfectly for all protocols.

✅ Verified IPv4 Works in Pods

# Ubuntu pod with curl - works perfectly
curl -I https://pypi.org/simple/

Result: IPv4 through Calico NAT works for all applications.

✅ Verified IPv6 ICMP Works in Pods

# From pod
ping6 -c 3 2001:4860:4860::8888  # Google DNS
ping6 -c 3 2a04:4e42::223        # PyPI

Result: IPv6 NAT works for ICMP (ping6 succeeds).

❌ IPv6 TCP/HTTPS Fails in Pods

# From Ubuntu pod
curl -6 -v https://pypi.org/simple/
# Result: curl: (7) Couldn't connect to server

# Direct IP test
wget --inet6-only -O- "https://[2a04:4e42::223]/simple/"
# Result: 421 Misdirected Request (TLS/SNI issue)

Result: IPv6 TCP connections fail through Calico NAT, even though ICMP works.

Calico IPv6 NAT Configuration Attempts

Attempt 1: Enable NAT Outgoing (Standard Recommendation)

Based on community suggestions, set:

kubectl set env daemonset/calico-node -n kube-system \
  CALICO_IPV6POOL_NAT_OUTGOING=true \
  CALICO_IPV6POOL_VXLAN=Always \
  --containers=calico-node

kubectl set env daemonset/calico-node -n kube-system \
  CALICO_IPV6POOL_NAT_OUTGOING=true \
  --containers=install-cni

Verified configuration:

# In running Calico pod
env | grep CALICO_IPV6
# CALICO_IPV6POOL_NAT_OUTGOING=true
# CALICO_IPV6POOL_VXLAN=Always

Result: Still fails. ICMP works but TCP doesn't.

Attempt 2: Set kube-proxy to iptables Mode

# On anubis
echo '--proxy-mode=iptables' >> /var/snap/microk8s/current/args/kube-proxy
sudo microk8s stop && sudo microk8s start

Result: No change. TCP still fails.

Attempt 3: Match IPv6 Pool Config to IPv4

Ensured IPv6 pool has same settings as IPv4:

# IPv6 pool
spec:
  cidr: fdf9:6e82:b78e::/48
  natOutgoing: true
  vxlanMode: Always      # Matches IPv4
  nodeSelector: all()

Result: No change. TCP still fails.

DNS Resolution Testing

CoreDNS AAAA Blocking Behavior

With template IN AAAA . { rcode NOERROR }:

nslookup pypi.org returns IPv4 addresses only ✓
curl (glibc) works fine with IPv4 ✓
curl (musl/Alpine) fails with "Could not resolve host" ❌
Rust uv fails with "dns error: failed to lookup address information" ❌

Why glibc works but musl/Rust fail:

glibc's getaddrinfo() handles NOERROR with no AAAA gracefully
musl's getaddrinfo() treats NOERROR with no records as failure
Rust's DNS resolver detects IPv6 interface and tries it despite DNS results

Application Compatibility Matrix

Application Type	Base	DNS Resolver	IPv4-only DNS	Result
curl (Ubuntu)	glibc	getaddrinfo()	✅ Works	✅ Success
wget (Ubuntu)	glibc	getaddrinfo()	✅ Works	✅ Success
curl (Alpine)	musl	getaddrinfo()	❌ Fails	❌ "Could not resolve host"
Python requests	glibc	getaddrinfo()	✅ Works	✅ Success
Rust uv	glibc	Rust resolver	❌ Ignores DNS	❌ "dns error"
Chronicle (Rust uv)	glibc	Rust resolver	❌ Ignores DNS	❌ "dns error"

Why Rust `uv` Specifically Fails

The issue: Rust's networking stack (Tokio + trust-dns) doesn't just use DNS results - it also:

Detects available network interfaces
Sees pod has both IPv4 and IPv6 addresses
Tries IPv6 first (RFC 6555 Happy Eyeballs)
IPv6 connection fails through Calico NAT
Doesn't gracefully fall back to IPv4
Reports misleading "dns error" (actually connection error)

Evidence:

Pod has both IPs: 10.1.236.216 and fdf9:6e82:b78e:14d7:84f5:374e:cba8:6295
DNS returns IPv4 only
uv still tries IPv6 (detected from interface, not DNS)
TCP connection over IPv6 fails in Calico NAT
Error message says "dns error" but it's actually a TCP connection failure

Why Calico IPv6 NAT Works for ICMP but Not TCP

Observed behavior:

ping6 2a04:4e42::223 from pod: ✅ Works
curl -6 https://pypi.org/simple/ from pod: ❌ Fails

Hypothesis:

Calico VXLAN encapsulation works for ICMP
NAT state tracking may not properly handle TCP SYN packets over IPv6
Possible iptables rule issue with IPv6 NAT (ip6tables vs iptables)
TLS/SNI may require additional NAT handling for IPv6

Not yet investigated:

Calico Felix IPv6 NAT implementation details
ip6tables rules on nodes
Whether Calico supports stateful IPv6 NAT for TCP

Recent Calico Issues and Additional Investigation (2026-01-17)

Known Calico IPv6 Bugs

Issue #10834: Dual-stack IPv6 pod routing failures

Affects: Kubernetes 1.32+ with dual-stack when assign_ipv4=true and assign_ipv6=true
Symptom: Calico fails to create IPv6 pod routes, breaking node-to-pod IPv6 connectivity
Status: Reported in recent versions (v3.29.3)
Reference: GitHub Issue #10834

Issue #10817: vxlan-v6.calico interface missing intermittently

Affects: Calico 3.29.3 in Kubernetes 1.32
Symptom: "Failed to find VXLAN tunnel device parent" errors, VXLAN route programming fails
Workaround: Restart calico-node pod to recover interface
Reference: GitHub Issue #10817

Issue #8636: IPv6 SNAT not working in BPF mode

Affects: Single-stack IPv6 BPF clusters on dual-stack hosts
Symptom: natOutgoing / SNAT fails for pod egress traffic
Status: Reported March 2024, affects tigera operator deployments
Reference: GitHub Issue #8636

Issue #7638: IPv6 VXLAN ICMP works but TCP fails (NOT our issue - see resolution below)

Symptom: Ping works across nodes but curl/telnet fails
Affects: IPv6 VXLAN mode with NAT
Status: Known bug, no clear resolution
Reference: GitHub Issue #7638

✅ RESOLUTION: IPv6 HTTPS Works - Issue Was CoreDNS Configuration (2026-01-17)

The Problem Was NOT Calico - It Was Our DNS Configuration!

After extensive testing, we discovered that:

✅ IPv6 TCP connections work perfectly (netcat succeeds)
✅ IPv6 HTTPS works perfectly when using IPv6 addresses directly
✅ IPv6 HTTPS works perfectly when DNS returns AAAA records
❌ IPv6 HTTPS failed ONLY because CoreDNS was blocking AAAA queries

Root Cause Analysis:

We had added a CoreDNS template to block AAAA queries:

template IN AAAA . {
    rcode NOERROR
}

This caused:

Applications requesting IPv6 addresses got "NOERROR" with no data
curl -6 https://pypi.org failed with "Could not resolve host"
Rust's uv tool failed with "dns error: failed to lookup address information"
This gave the FALSE IMPRESSION that IPv6 networking was broken

The Fix:

Remove the AAAA blocking template from CoreDNS ConfigMap:

# Backup current config
kubectl get cm coredns -n kube-system -o yaml > /tmp/coredns-backup.yaml

# Edit and remove the template section
kubectl edit cm coredns -n kube-system
# (Remove lines 26-29: the template IN AAAA block)

# Restart CoreDNS
kubectl rollout restart deployment/coredns -n kube-system

Verification:

# DNS now returns AAAA records
kubectl run test --image=busybox --rm -it --restart=Never -- nslookup pypi.org
# Should show both A and AAAA records

# IPv6 HTTPS works
kubectl run test --image=nicolaka/netshoot --rm -it --restart=Never -- \
  curl -6 -I https://pypi.org/simple/
# Should return HTTP/2 200

Test Results After Fix:

[ra]     IPv6:✓ IPv4:✓
[babel]  IPv6:✓ IPv4:✓
[anubis] IPv6:✓ IPv4:✓

Chronicle Impact: Chronicle backend (using python:3.12-slim-bookworm / Debian glibc) works perfectly:

Query DNS for pypi.org → gets both IPv4 and IPv6 addresses ✅
uv tries IPv6 first (Happy Eyeballs) → succeeds ✅
Successfully downloads dependencies from PyPI ✅
Falls back to IPv4 if IPv6 fails ✅

VERIFIED: Tested with Chronicle's exact base image (python:3.12-slim-bookworm):

# Python DNS resolution
python3 -c "import socket; socket.getaddrinfo('pypi.org', 443)"
# Returns 24 addresses (IPv4 + IPv6) ✅

# pip download test
pip3 download httpx
# Successfully downloads ✅

⚠️ Important: musl vs glibc

glibc-based images (Debian, Ubuntu, RHEL): IPv6 DNS works ✅
- python:3.x, python:3.x-slim, ubuntu:*, debian:*
- Chronicle uses this → will work ✅
musl-based images (Alpine): IPv6 DNS broken ❌
- python:3.x-alpine, alpine:*, nicolaka/netshoot
- Known musl issue with dual-stack getaddrinfo()
- Workaround: Switch to glibc base image or force IPv4-only

This is why our test pods (netshoot/Alpine) showed failures but Chronicle (Debian/glibc) will work.

References:

Cilium Issue #35489 - Similar IPv6 MTU issue resolution
Google Cloud troubleshooting - Kubernetes networking diagnostics
Calico Issue #6877 - IPv6 cross-node access

IPv6 natOutgoing Default Behavior

Discovery: Calico does NOT enable natOutgoing for IPv6 by default, unlike IPv4.

From Issue #2954:

When deploying Calico on Kubernetes in IPv6 or dual-stack mode, the default-ipv6-ippool does not get "natOutgoing: true" unlike the IPv4 pool. Calico does not provide an outgoing nat rule for ipv6 traffic by default.

Verification commands:

# Check if IPv6 NAT rules exist
ip6tables -t nat -nvL cali-nat-outgoing 2>/dev/null

# Should show MASQUERADE rules for fdf9:6e82:b78e::/48
# If empty, IPv6 NAT is not working

Our status: We manually set natOutgoing: true in the IPv6 pool, but TCP still fails.

Potential Workarounds Not Yet Tried

1. Disable IPv6 Per-Pod Using Init Container

For specific workloads (like Chronicle) that struggle with IPv6, disable IPv6 at the pod level without affecting other pods or nodes.

Method 1: Init container with sysctl

spec:
  initContainers:
  - name: disable-ipv6
    image: busybox
    command:
    - /bin/sh
    - -c
    - |
      sysctl -w net.ipv6.conf.all.disable_ipv6=1
      sysctl -w net.ipv6.conf.default.disable_ipv6=1
    securityContext:
      privileged: true
  containers:
  - name: chronicle-backend
    # ... rest of spec

Method 2: CNI tuning plugin

# Add to pod annotations
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        {
          "name": "tuning-ipv6-disable",
          "type": "tuning",
          "sysctl": {
            "net.ipv6.conf.all.disable_ipv6": "1",
            "net.ipv6.conf.default.disable_ipv6": "1"
          }
        }
      ]

References:

Pros:

Surgical fix - only affects problematic workloads
Doesn't disable cluster-wide IPv6
Preserves dual-stack for other services

Cons:

Requires privileged containers or CNI plugin support
Doesn't fix underlying Calico bug

2. Check ip6tables NAT Rules

Verify Calico is actually setting up IPv6 NAT rules:

# On node (anubis)
sudo ip6tables -t nat -L -n -v | head -50
sudo ip6tables -t nat -L POSTROUTING -n -v
sudo ip6tables -t nat -L cali-nat-outgoing -n -v 2>/dev/null

Expected: Should see MASQUERADE rules for fdf9:6e82:b78e::/48 (our IPv6 pod CIDR)

If missing: Calico Felix is not properly configuring IPv6 NAT despite env vars

3. Verify Calico Version and Consider Upgrade

Current version: Calico v3.28.1

Latest stable: Check if v3.29+ or v3.30+ has fixes for IPv6 VXLAN issues

Consideration: Recent versions (v3.29.3) show NEW issues with IPv6 dual-stack, so upgrade may introduce different problems.

Next Steps

Option 1: Fix Chronicle Image (RECOMMENDED)

Build a Chronicle image that truly has all dependencies pre-installed:

Pre-install all Python packages during image build
Pre-download spacy models during image build
Don't use uv install or pip install at runtime
The image should start the FastAPI server immediately without any downloads

Option 2: Force IPv4 in Chronicle Container

Add environment variables to Chronicle deployment:

env:
- name: FORCE_IPV4
  value: "1"
- name: UV_NO_CACHE
  value: "1"

Caveat: May not work with Rust networking stack

Option 3: Set up PyPI Mirror (if needed)

If runtime downloads are unavoidable:

Deploy devpi or bandersnatch in the cluster
Configure UV_INDEX_URL to point to local mirror
Ensures IPv4-only connectivity to package index

Architecture Impact

Network Flow

Pod with dual-stack IPs
├─ IPv4: 10.1.236.x/24
│  └─ NAT → Internet (works ✓)
└─ IPv6: fdf9:6e82:b78e::/48
   └─ NAT → Internet (works ✓)

DNS (CoreDNS)
├─ A records (IPv4) → Returned ✓
└─ AAAA records (IPv6) → Blocked (template returns NOERROR)

IPAM Responsibility

Kubernetes node-ipam-controller: Allocates pod CIDR ranges to nodes
Calico IPAM: Assigns specific IPs to pods from those ranges
Both working together for dual-stack

Configuration Files Changed

/var/snap/microk8s/current/args/kube-apiserver
/var/snap/microk8s/current/args/kube-proxy
/var/snap/microk8s/current/args/kube-controller-manager
calico-config ConfigMap in kube-system namespace
default-ipv6-ippool IPPool (updated vxlanMode)

Rollback Procedure

If dual-stack needs to be disabled:

# 1. Restore backup configs on anubis
sudo cp /var/snap/microk8s/current/args/kube-apiserver.backup /var/snap/microk8s/current/args/kube-apiserver
sudo cp /var/snap/microk8s/current/args/kube-proxy.backup /var/snap/microk8s/current/args/kube-proxy
sudo cp /var/snap/microk8s/current/args/kube-controller-manager.backup /var/snap/microk8s/current/args/kube-controller-manager

# 2. Revert Calico CNI config
kubectl edit cm calico-config -n kube-system
# Remove "assign_ipv6": "true"

# 3. Restart Calico
kubectl delete pod -n kube-system -l k8s-app=calico-node

# 4. Restart MicroK8s
sudo microk8s stop
sudo microk8s start

Key Learnings

IPv6 dual-stack requires coordination across 5 components: kube-apiserver, kube-proxy, kube-controller-manager, Calico IP pools, and Calico CNI
The node-ipam-controller is essential: Without --allocate-node-cidrs=true, no pod CIDRs are assigned
Calico IPAM must be told to assign IPv6: Even with pools configured, CNI needs assign_ipv6: true
VXLAN mode must match: IPv4 and IPv6 pools must use same vxlanMode
Feature gates are version-specific: IPv6DualStack gate was removed in K8s 1.23
Image build matters: Even perfect networking won't fix images that download at runtime

Maintenance Notes

Dual-stack is now permanent - All new pods will get both IPv4 and IPv6
No node restart needed - Calico handles IP assignment dynamically
CoreDNS still blocks AAAA queries - This is intentional to avoid Rust tool issues
Monitor Chronicle image builds - Ensure dependencies are truly pre-installed

FilesExpand file tree

IPV6_DUALSTACK_CONFIGURATION.md

Latest commit

History