

@kad
Contributor

kad commented Sep 15, 2025

This change introduces a more generic CPUAffinity property of Process to specify the desired CPU affinities for the create, start, and exec operations.

As originally discussed in PR #1253, the existing implementation covers only the exec use case, whereas setting affinity for OCI hooks and the initial container process would benefit a wider set of workloads.

@giuseppe
Member

@kolyshkin PTAL

@cyphar
Member

cyphar commented Sep 15, 2025

Given issues like golang/sys#259, I think it might be nice to have a way to indicate "all CPUs" as a way to reset affinity (but without doing 0-1024, which is ~300x slower than the memset approach). In runc we implicitly do this now, but users might want to reset affinity for other stages explicitly.

@kad
Contributor Author

kad commented Sep 15, 2025

> Given issues like golang/sys#259, I think it might be nice to have a way to indicate "all CPUs" as a way to reset affinity (but without doing 0-1024, which is ~300x slower than the memset approach). In runc we implicitly do this now, but users might want to reset affinity for other stages explicitly.

Theoretically, the higher-level runtime that generates the OCI spec might be filling it in with the right value based on detected system information, e.g. using the result of sched_getaffinity(2) in the parent process.

@haircommander

fyi @bitoku

@cyphar
Member

cyphar commented Sep 18, 2025

@kad

> Theoretically, the higher-level runtime that generates the OCI spec might be filling it in with the right value based on detected system information, e.g. using the result of sched_getaffinity(2) in the parent process.

My experience is that this rarely happens -- usually higher-level runtimes either hide new knobs like this (requiring you to patch config.json through experimental or unsupported hacks) or they transparently forward the values to the lower-level runtime without adding new functionality. Even if they do implement it, there is no guarantee that the behaviour or syntax will be standardised between runtimes, which leads to more problems than it solves.

Given that we have seen practical issues with container runtimes being spawned with suboptimal CPU affinity values, I would suggest that having an "all" or "max" special value would be a good idea.

Also you do not need to detect anything with sched_getaffinity(2) for this -- you just need to memset(&cpuset, 0xFF, sizeof(cpuset)) to reset the affinity to the maximum possible value. In fact, you don't want to use sched_getaffinity(2) or Go's runtime.NumCPU because they give you the current affinity, which is precisely the value you don't want.
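A minimal C sketch of the memset approach, assuming glibc's cpu_set_t and the Linux sched_setaffinity(2) call; the helper names reset_affinity_all and allowed_cpu_count are hypothetical and not taken from runc. Note that pid 0 means "the calling thread", so a multi-threaded runtime would have to do this on each thread it cares about.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

/* Hypothetical helper: reset the calling thread's affinity to "all CPUs"
 * by setting every bit of the mask in one memset, instead of looping
 * CPU_SET() over every CPU number. The kernel intersects the requested
 * mask with the CPUs that exist and are permitted (e.g. by a cpuset
 * cgroup), so bits for non-existent CPUs are ignored. */
int reset_affinity_all(void)
{
    cpu_set_t set;
    memset(&set, 0xFF, sizeof(set));
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: number of CPUs the calling thread may currently
 * run on, or -1 if sched_getaffinity fails. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Because the kernel does the intersection, the result of reset_affinity_all() is still bounded by any cpuset cgroup the process lives in, which is the distinction from "use whatever sched_getaffinity returned at startup".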

Member

@cyphar cyphar left a comment


Also, when it comes to formatting -- in the OCI specs we do not hard-wrap lines at N columns. Each complete sentence should be one line (this makes diffs easier to read).

@kad
Contributor Author

kad commented Sep 22, 2025

> @kad
>
> > Theoretically, the higher-level runtime that generates the OCI spec might be filling it in with the right value based on detected system information, e.g. using the result of sched_getaffinity(2) in the parent process.
>
> My experience is that this rarely happens -- usually higher-level runtimes either hide new knobs like this (requiring you to patch config.json through experimental or unsupported hacks) or they transparently forward the values to the lower-level runtime without adding new functionality. Even if they do implement it, there is no guarantee that the behaviour or syntax will be standardised between runtimes, which leads to more problems than it solves.

My point was that the upper-level runtime might be pinned to a subset of CPUs (e.g. an "infra reserved" partition of the system). An OCI runtime spawned from it inherits the affinity of its parent process, so calling sched_getaffinity early at OCI runtime startup should return the parent runtime's actual affinity.
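A minimal C sketch of capturing the inherited affinity early and restoring it later, assuming the runtime can run this before anything re-pins it. The names inherited_affinity, save_inherited_affinity and restore_inherited_affinity are hypothetical and not part of runc or the spec; like any sched_*affinity call with pid 0, it only affects the calling thread.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical snapshot of the affinity inherited from the parent
 * (e.g. containerd or CRI-O), taken as early as possible. */
static cpu_set_t inherited_affinity;

/* Record the calling thread's current (inherited) affinity. */
int save_inherited_affinity(void)
{
    CPU_ZERO(&inherited_affinity);
    return sched_getaffinity(0, sizeof(inherited_affinity), &inherited_affinity);
}

/* Restore the recorded affinity, e.g. for a hypothetical "default"
 * value, instead of widening to every CPU on the host. */
int restore_inherited_affinity(void)
{
    return sched_setaffinity(0, sizeof(inherited_affinity), &inherited_affinity);
}
```

The design trade-off is the one discussed in this thread: restoring the snapshot keeps the runtime inside the parent's "infra reserved" CPUs, whereas the memset approach deliberately escapes a badly chosen inherited mask.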

> Given that we have seen practical issues with container runtimes being spawned with suboptimal CPU affinity values, I would suggest that having an "all" or "max" special value would be a good idea.

I don't mind adding the special value "all"; I see value in it and will update the PR.
However, I'm also thinking about additional special values, e.g. "default", which would be the result of sched_getaffinity, to reset to the inherited default affinity? Or maybe "online", which would be the same as the content of /sys/devices/system/cpu/online, though I suspect sysfs might not always be available in some scenarios.
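The proposed "online" value could be computed roughly as follows. This is a hypothetical C sketch: parse_cpu_list and read_online_cpus are illustrative names, and it assumes the comma-separated range format (e.g. "0-3,8-11") that Linux uses for /sys/devices/system/cpu/online. read_online_cpus returns -1 when the file cannot be read, which is where a runtime would need a fallback if sysfs is unavailable.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a CPU list such as "0-3,8,10-11" into a cpu_set_t.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_cpu_list(const char *s, cpu_set_t *set)
{
    CPU_ZERO(set);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;
        long hi = lo;
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        if (hi >= CPU_SETSIZE)          /* does not fit in cpu_set_t */
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            CPU_SET(cpu, set);
        if (*end == ',')
            end++;
        else if (*end != '\0' && *end != '\n')
            return -1;
        s = end;
    }
    return 0;
}

/* Read the online CPU list from sysfs; -1 if sysfs is not available. */
int read_online_cpus(cpu_set_t *set)
{
    char buf[4096];
    FILE *f = fopen("/sys/devices/system/cpu/online", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (line == NULL)
        return -1;
    return parse_cpu_list(buf, set);
}
```

For example, parse_cpu_list("0-3,8,10-11", &set) marks seven CPUs (0, 1, 2, 3, 8, 10 and 11).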

> Also you do not need to detect anything with sched_getaffinity(2) for this -- you just need to memset(&cpuset, 0xFF, sizeof(cpuset)) to reset the affinity to the maximum possible value. In fact, you don't want to use sched_getaffinity(2) or Go's runtime.NumCPU because they give you the current affinity, which is precisely the value you don't want.

Having all bits set might also be considered unwanted. As I mentioned, there are setups where runtimes such as containerd or CRI-O are restricted to a subset of CPUs, and in those setups it may be undesirable for lower-layer OCI runtimes to use CPU time on other cores.

@kolyshkin
Contributor

kolyshkin commented Sep 24, 2025

A naive question -- is there a use case when we want different CPU affinities for different OCI hooks?

@cyphar
Member

cyphar commented Sep 25, 2025

@kolyshkin I think the point is to have the affinity change at different stages rather than it be hook-specific -- hooks will probably inherit them but hooks can also set their own affinity if they want to. The naming is similar to hooks because both correspond to runtime lifecycle stages.

@kad
Contributor Author

kad commented Nov 26, 2025

@kolyshkin @giuseppe @cyphar update pushed with the following changes, as suggested:

  • Markdown formatting fixed
  • introduced the special value "all"
  • fields renamed to match the hooks naming scheme
  • codespell fixes (sigh)

@kad kad requested a review from cyphar November 26, 2025 15:22
This change introduces a more generic `cpuAffinity` property of `Process`
to specify the desired CPU affinity for the create, start, and exec
operations.

Signed-off-by: Alexander Kanevskiy <[email protected]>

7 participants