Describe the bug
Dgraph Alpha nodes running v25.0.0 experience unbounded heap growth leading to repeated OOMKill events in Kubernetes. The Go garbage collector's target heap size (next_gc) grows beyond the container's 20Gi memory limit because:
- No GOMEMLIMIT is set — Go has no awareness of the container's memory ceiling.
- GOGC=100 (default) — GC triggers at 2× live heap, so with ~10GB live data the GC goal reaches ~20GB, exceeding the 20Gi cgroup limit.
- Posting cache size bugs — v25.0.0 has two known bugs where the posting list cache underestimates entry sizes (fix(cache): Estimate size of posting lists #9515) and does not enforce the max cost limit (fix(cache): make updating the max cost of posting cache work again #9526), causing the cache to consume far more memory than the configured --cache size-mb=4096.
- GOMAXPROCS=128 — Dgraph v25.0.0 hardcodes GOMAXPROCS from the node CPU count (128) rather than the container's CPU limit (6), increasing scheduling overhead and memory fragmentation.
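The interaction between GOGC and the container limit can be seen with a first-order sketch of the GC pacer's heap goal (live heap × (1 + GOGC/100)). This is an approximation for illustration; the real pacer also accounts for stacks, globals, and GOMEMLIMIT when one is set.

```go
package main

import "fmt"

// gcGoalGB approximates the Go GC pacer's heap goal: the collector
// aims to finish the next cycle when the heap reaches
// live * (1 + GOGC/100). First-order sketch only.
func gcGoalGB(liveGB float64, gogc int) float64 {
	return liveGB * (1 + float64(gogc)/100)
}

func main() {
	// With ~10GB live heap and the default GOGC=100, the goal is ~20GB,
	// which already equals the 20Gi container limit before any
	// non-heap RSS (Badger mmap, goroutine stacks) is counted.
	fmt.Printf("GC goal at GOGC=100: %.2f GB\n", gcGoalGB(10, 100))
	fmt.Printf("GC goal at GOGC=50:  %.2f GB\n", gcGoalGB(10, 50))
}
```

Lowering GOGC shrinks the goal proportionally, but only GOMEMLIMIT gives the runtime an absolute ceiling.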
The alphas OOMKill in rotation, and the cycle keeps repeating: alpha-2 was killed on 2026-03-06 at 19:10 UTC, alpha-0 on 2026-03-02, and alpha-1's GC goal currently sits at 21.00GB (above the 20Gi limit), making it the likely next victim.
To Reproduce
- Deploy Dgraph v25.0.0 alpha StatefulSet with --cache size-mb=4096, memory limit of 20Gi, and no GOMEMLIMIT/GOGC env vars.
- Allow normal production query and mutation workload to run over days.
- Observe go_memstats_heap_alloc_bytes growing steadily due to the posting cache bug exceeding its configured budget.
- With GOGC=100, the GC goal (go_memstats_next_gc_bytes) reaches 2× live heap (~18-22GB), exceeding the 20Gi container limit.
- Kubernetes OOMKills the alpha (exit code 137, reason: OOMKilled). The pattern rotates across alphas as load shifts after each restart.
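The failing condition in step 4 can be checked from inside the process itself. A minimal diagnostic sketch, assuming the 20Gi limit is passed in rather than read from the cgroup:

```go
package main

import (
	"fmt"
	"runtime"
)

// heapHeadroom reports the GC pacer's next trigger point (NextGC,
// exported to Prometheus as go_memstats_next_gc_bytes) and whether it
// already exceeds the given container memory limit.
func heapHeadroom(limitBytes uint64) (nextGC uint64, overLimit bool) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.NextGC, ms.NextGC >= limitBytes
}

func main() {
	const limit = 20 << 30 // 20Gi container limit
	next, over := heapHeadroom(limit)
	fmt.Printf("next GC goal: %d bytes, exceeds 20Gi limit: %v\n", next, over)
}
```

When overLimit is true, the kernel's OOM killer will fire before the GC ever reaches its trigger point, which is exactly the observed exit-137 pattern.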
Expected behavior
The Go garbage collector should trigger frequently enough to keep heap usage well within the 20Gi container memory limit. The posting list cache should respect the configured --cache size-mb=4096 (4GB) budget and not grow without bound.
Screenshots
Prometheus metrics captured on 2026-03-06 ~20:55 UTC:
| Alpha | Heap Live | GC Goal (next_gc) | RSS | Container Limit | Status |
|---|---|---|---|---|---|
| alpha-1 | 12.75 GB | 21.00 GB | 16.73 GB | 20Gi | GC goal exceeds limit |
| alpha-0 | 9.20 GB | 18.32 GB | 17.02 GB | 20Gi | High OOM risk |
| alpha-2 | 4.17 GB | 6.86 GB | 9.54 GB | 20Gi | Recovering (OOMKilled 1h47m prior) |
Environment
• OS: Linux (GKE nodes: Container-Optimized OS, c4d-standard-8 — 8 vCPU, 31GB RAM)
• Orchestration: Kubernetes (GKE cluster)
• Language: Go (toolchain v1.24, bundled with Dgraph v25.0.0)
• Dgraph Version: v25.0.0
• Go runtime config: GOGC=100 (default), GOMEMLIMIT=not set, GOMAXPROCS=128 (hardcoded by Dgraph from node CPUs)
• Container resources: requests cpu=4 / memory=16Gi, limits cpu=6 / memory=20Gi
• Dgraph flags: --cache size-mb=4096, --raft snapshot-after-entries=100000, --limit mutations=strict
Additional context
• Posting cache hit ratios: posting list 70.6%, block cache 93.5% — the cache is actively used, but its unbounded growth defeats the configured 4GB budget.