KEP #5309: first draft of Self-Orchestrating Pod KEP #5351

Open
wants to merge 1 commit into base: master

Conversation

SergeyKanzhelev
Member

  • One-line PR description:

Self-orchestrating Pod is a proposed new concept. This is a first draft of the KEP.

/sig node

@k8s-ci-robot added the sig/node, cncf-cla: yes, and kind/kep labels May 29, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SergeyKanzhelev
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr, wojtek-t for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) May 29, 2025
@ArangoGutierrez
Contributor

/cc

the specified container in the Pod.
- Declare the communication protocol between the kubelet and a container
that is versioned and extensible.
- Declare enough primitives to satisfy two scenarios:

Is the goal of this KEP limited to terminating or restarting containers? If so, wouldn't sharing the PID namespace be sufficient to achieve that?
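
As an illustration of the shared-PID-namespace alternative raised in this comment, here is a minimal Python sketch. It assumes the Pod sets `shareProcessNamespace: true`, so a sidecar can see the main container's processes under `/proc`. The helper names (`find_pids_by_name`, `terminate_by_name`) and any process name passed to them are hypothetical and do not come from the KEP.

```python
import os
import signal
from pathlib import Path
from typing import List


def find_pids_by_name(name: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name`.

    Note: the kernel truncates comm to 15 characters, so long process
    names must be matched by their truncated form.
    """
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            # The process exited between listing and reading, or access was denied.
            continue
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def terminate_by_name(name: str, proc_root: str = "/proc") -> List[int]:
    """Send SIGTERM to every process named `name`; return the PIDs signaled."""
    signaled = []
    for pid in find_pids_by_name(name, proc_root):
        if pid == os.getpid():
            continue  # never signal the sidecar itself
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            continue
        signaled.append(pid)
    return signaled
```

Signaling the process covers termination only. Whether the container then comes back depends on the Pod's `restartPolicy`, and this approach by itself gives the sidecar no notification that the container was restarted.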


- Reduced API Server Load: Workloads manage their own supporting containers
without frequent API Server interactions.
- Fine-Grained Workload Control: Pods can create and terminate sub-containers
Member

The use cases below don't talk about creating containers, only about terminating and restarting them.

Creation becomes much trickier, I think. Do we need to include it?

and lifecycle management. However, certain advanced use cases require
self-managed, dynamic pod orchestration within a node while minimizing direct
API Server interactions. This KEP proposes Self-Orchestrating Pods (SOPs), a
mechanism that allows a pod to create, manage, and terminate its
Member

The use cases below don't talk about "create".

Comment on lines +71 to +72
The KEP introduces the communication channel between kubelet and a container in
a Pod, which may be extended to a lot of other scenarios in future.
Member

I do not fully understand this KEP; maybe that is because it requires some context. Why is a channel with the kubelet required if the Pod is self-orchestrated? Is it the process in the Pod that instructs the kubelet to create a container in its own Pod? Why doesn't it run new processes instead of containers?

Comment on lines +100 to +101
- Reduced API Server Load: Workloads manage their own supporting containers
without frequent API Server interactions.
Member

I find it odd that the workload can work in isolation without having to connect back to the API server to get state.

Comment on lines +120 to +125
- Declare enough primitives to satisfy two scenarios:
  - Sidecar to be able to terminate the main container in the Pod effectively
    stopping the Job execution.
  - Sidecar to be able to restart the main container and receive a signal that it was
    restarted. This will allow in-place restart of a single Pod of a large training job to
    restart the job from the last checkpoint.
Member

Can't this be done today? A process can kill another process within the container, right?

## Proposal

The overall idea will be to expose the gRPC endpoint from the container
and declare it in the container spec. Kubelet will connect to this endpoint
Member

@aojea May 29, 2025

Is this guaranteed? Are these pod-network Pods or only host-network Pods? What happens with more complex runtimes like Kata or gVisor?

I see the "### Error handling" section also touches on this.

ports:
- containerPort: 50051
podManagement:
  port: 50051
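
To make the fragment above concrete, here is a small Python sketch that models it and checks that the `podManagement` port is one of the declared container ports. The class names, the `validate` helper, and the check itself are assumptions made for illustration; the excerpt does not say that the KEP requires the two ports to match.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContainerPort:
    container_port: int


@dataclass
class PodManagement:
    # Hypothetical model of the proposed podManagement stanza.
    port: int


@dataclass
class ContainerSpec:
    ports: List[ContainerPort] = field(default_factory=list)
    pod_management: Optional[PodManagement] = None


def validate(spec: ContainerSpec) -> List[str]:
    """Return a list of human-readable problems; empty means the spec looks fine."""
    errors: List[str] = []
    if spec.pod_management is None:
        return errors  # nothing to check when the container does not opt in
    port = spec.pod_management.port
    if not 1 <= port <= 65535:
        errors.append(f"podManagement.port {port} is not a valid port number")
    declared = {p.container_port for p in spec.ports}
    if port not in declared:
        # Assumed rule, not taken from the KEP text.
        errors.append(f"podManagement.port {port} is not a declared containerPort")
    return errors


# The fragment from the excerpt: both ports are 50051.
spec = ContainerSpec(
    ports=[ContainerPort(container_port=50051)],
    pod_management=PodManagement(port=50051),
)
problems = validate(spec)
```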
Member

Everyone will be able to connect to this port, right?

// Response for command stream.
message CommandResponse {
  oneof commandResponse {
    TerminateContainerCommandResponse terminate_container_response = 1;
Member

stderr + stdout?

Comment on lines +251 to +260
1. The sidecar container orchestrates the job. Job is a heavy process requiring
special GPU hardware connected with other Pod.
2. The sidecar receives the signal that the job should be abruptly terminated
and started from the beginning.
3. Instead of terminating the whole Pod, sidecar issues a command to kubelet to
restart a specific container.
4. Kubelet will report back when the container is restarted.
5. Sidecar may need to keep other sidecar containers running or have them also
be restarted, depending on the function of that sidecar container. Ordering
of requests to the kubelet to restart things will be a sidecar decision.
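
The steps above can be sketched as a small in-memory simulation in Python. `FakeKubelet`, `restart_container`, `restart_job`, the container names, and the response fields are all hypothetical stand-ins; the real channel in the KEP is gRPC, and stopping at the first failed restart is an assumption rather than something the KEP text states.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeKubelet:
    """In-memory stand-in for the kubelet side of the channel (hypothetical)."""

    restart_counts: Dict[str, int] = field(default_factory=dict)

    def restart_container(self, name: str) -> dict:
        # Step 4: the kubelet reports back once the container is restarted.
        if name not in self.restart_counts:
            return {"container": name, "restarted": False, "error": "unknown container"}
        self.restart_counts[name] += 1
        return {
            "container": name,
            "restarted": True,
            "restart_count": self.restart_counts[name],
        }


def restart_job(kubelet: FakeKubelet, order: List[str]) -> List[dict]:
    """Steps 3 and 5: restart containers in the order the sidecar decided.

    Stops at the first failure so later containers are left untouched
    (an assumed policy for this sketch).
    """
    responses = []
    for name in order:
        response = kubelet.restart_container(name)
        responses.append(response)
        if not response["restarted"]:
            break
    return responses


# The sidecar decides to restart a helper first, then the main container,
# and leaves the "metrics" sidecar running.
kubelet = FakeKubelet({"main": 0, "log-shipper": 0, "metrics": 0})
responses = restart_job(kubelet, ["log-shipper", "main"])
```

Keeping the ordering in the caller mirrors step 5, where the sequence of restart requests is the sidecar's decision rather than the kubelet's.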
Member

We run containers in Pods today. Why isn't it an alternative for the Pod to run Docker-in-Docker or something similar and handle the entire container lifecycle itself, instead of bringing this back to the kubelet?

Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory)
sig/node (Categorizes an issue or PR as relevant to SIG Node.)
size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
6 participants