KEP-2570: update v1.36 memory protection details #5977

Open
QiWang19 wants to merge 1 commit into kubernetes:master from QiWang19:doc-memlow-2570

Conversation

@QiWang19
Contributor

  • One-line PR description: update v1.36 MemoryQoS cgroup v2 memory protection details

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 24, 2026
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 24, 2026
@k8s-ci-robot
Contributor

Hi @QiWang19. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 24, 2026
@QiWang19
Contributor Author

@sohankunkerkar PTAL

@QiWang19 QiWang19 closed this Mar 25, 2026
@QiWang19 QiWang19 reopened this Mar 25, 2026
### Future Considerations (Beta candidates)

1. **Tiered memory protection (memory.low for Burstable/BestEffort)**: Currently, `memory.min` (hard protection) is used for all QoS classes. If Alpha v3 feedback or benchmarks show that padded requests cause excessive OOM thrash, implement a tiered approach for Beta: `memory.min` for Guaranteed pods, `memory.low` (soft protection) for Burstable/BestEffort. This allows the kernel to reclaim unused memory under pressure while still providing best-effort protection. Tracked in [kubernetes/kubernetes#131077](https://github.com/kubernetes/kubernetes/issues/131077).
1. **Tiered memory protection follow-up (BestEffort)**: Alpha v1.36 already uses `memory.min` for Guaranteed and `memory.low` for Burstable. A potential follow-up for later phases is evaluating whether BestEffort-specific `memory.low` memory protection should be added. Tracked in [kubernetes/kubernetes#131077](https://github.com/kubernetes/kubernetes/issues/131077)
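The tiered approach described above can be sketched as a simple mapping from QoS class to protection knob. This is an illustrative sketch only, not kubelet code; the function name and return shape are hypothetical:

```python
# Hypothetical sketch of the tiered protection mapping described above.
# Not kubelet code; names and structure are illustrative only.

def memory_protection(qos_class: str, memory_request_bytes: int) -> dict:
    """Return cgroup v2 memory protection settings for a pod.

    Guaranteed pods get hard protection (memory.min); Burstable pods get
    soft protection (memory.low), which the kernel may still reclaim
    under pressure. BestEffort pods have no memory request, so there is
    nothing to protect.
    """
    if qos_class == "Guaranteed":
        return {"memory.min": memory_request_bytes, "memory.low": 0}
    if qos_class == "Burstable":
        return {"memory.min": 0, "memory.low": memory_request_bytes}
    return {"memory.min": 0, "memory.low": 0}  # BestEffort: zero requests


print(memory_protection("Guaranteed", 512 << 20))
print(memory_protection("Burstable", 256 << 20))
print(memory_protection("BestEffort", 0))
```

The BestEffort branch makes the reviewer's point below concrete: with zero requests, both knobs stay at 0, so there is no distinct BestEffort tier to design.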
Member


You can remove this part completely. BestEffort pods have no memory request (requests.memory is not set). There's nothing to protect. memory.min and memory.low are set to requests.memory, and BestEffort has zero requests, so no protection.

Contributor Author


Removed the whole section Future Considerations (Beta candidates). The benchmark testing is already properly documented in the "Beta Graduation" section.

/cgroup2/kubepods/pod<UID>/memory.min=sum(pod.spec.containers[i].resources.requests[memory]) // Guaranteed
/cgroup2/kubepods/burstable/pod<UID>/memory.low=sum(pod.spec.containers[i].resources.requests[memory]) // Burstable
// QoS ancestor cgroup
/cgroup2/kubepods/burstable/memory.min=sum(pod[i].spec.containers[j].resources.requests[memory])
Member


It should be:

// QoS ancestor cgroups
/cgroup2/kubepods/memory.min=sum(guaranteed_requests + burstable_requests)  // parent covers all children
/cgroup2/kubepods/burstable/memory.low=sum(burstable_pod_requests)          // soft protection
/cgroup2/kubepods/burstable/memory.min=0                                    // no hard protection for burstable
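The ancestor values above can be sketched numerically. This is an illustrative calculation of the sums, not kubelet code, and the helper name is hypothetical:

```python
# Illustrative sketch of the QoS ancestor cgroup values above.
# Inputs are per-pod memory requests in bytes; names are hypothetical.

def ancestor_cgroup_values(guaranteed_requests, burstable_requests):
    """Compute node-level (kubepods) and burstable QoS cgroup settings.

    The kubepods parent carries memory.min covering all protected
    children; the burstable QoS cgroup gets only soft protection
    (memory.low) and no hard protection (memory.min stays 0).
    """
    return {
        "/cgroup2/kubepods/memory.min": sum(guaranteed_requests) + sum(burstable_requests),
        "/cgroup2/kubepods/burstable/memory.low": sum(burstable_requests),
        "/cgroup2/kubepods/burstable/memory.min": 0,
    }


# Two Guaranteed pods (2Gi, 1Gi) and one Burstable pod (512Mi):
values = ancestor_cgroup_values([2 << 30, 1 << 30], [512 << 20])
for path, v in values.items():
    print(f"{path}={v}")
```

The key design point: hard protection is accumulated only at the kubepods parent, because cgroup v2 requires a parent's protection to cover whatever its children claim.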

- Upgrade: Enabling MemoryQoS on a running kubelet correctly sets memory.min/memory.high on new pods and updates node-level cgroups
- Rollback: Disabling MemoryQoS resets memory.min to 0 and memory.high to max for all managed cgroups
- Upgrade: Enabling `MemoryQoS`, `memoryReservationPolicy: TieredReservation` on a running kubelet correctly sets `memory.min`/`memory.low`/`memory.high` on new pods and updates node-level cgroups
- Rollback: Disabling MemoryQoS reconciles QoS class level memory.min/memory.low to 0.
Member


You might need to add this--> pod-level memory.min/memory.low values are not reset during rollback because reconcilePodMemoryProtection is gated behind the MemoryQoS feature gate. When the gate is off, the reconciler doesn't run, so per-pod values persist. However, they become ineffective because the parent QoS cgroup has memory.min=0/memory.low=0 (kernel cgroup hierarchy rule: child protection can't exceed parent).
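The hierarchy rule mentioned here (a child's protection cannot exceed what its parent passes down) can be sketched as follows. This is a deliberate simplification of the cgroup v2 semantics, not kernel code:

```python
# Simplified illustration of why stale pod-level values become
# ineffective after rollback: in cgroup v2, a child's effective
# protection is capped by the protection its parent receives.
# Hypothetical helper; real kernel distribution is more involved.

def effective_protection(child_value: int, parent_value: int) -> int:
    """A child's effective memory.min/memory.low cannot exceed its
    parent's protection."""
    return min(child_value, parent_value)


# After rollback the burstable QoS cgroup has memory.low=0, so a pod
# that still carries memory.low=256Mi gets no protection in practice.
stale_pod_low = 256 << 20
parent_low_after_rollback = 0
print(effective_protection(stale_pod_low, parent_low_after_rollback))  # prints 0
```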

Contributor Author

@QiWang19 QiWang19 Mar 27, 2026


@sohankunkerkar For the review #5977 (comment), are you also suggesting we drop this line 506

/cgroup2/kubepods/memory.min=sum(pod[i].spec.containers[j].resources.requests[memory])

Signed-off-by: Qi Wang <qiwan@redhat.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 27, 2026