Conversation

@last-las (Contributor) commented Sep 1, 2025

cgroup v2 doesn't have a file similar to cpuacct.usage_all. Therefore, proc_stat_read on cgroup v2 always falls back to the host's /proc/stat. As a result, the CPU count reported by this file is incorrect; see e.g. issues #593 and #654.

However, the similar proc_cpuinfo_read function does report the right CPU count. It works by filtering the host's file using both cpuset.cpus and cpu.max. This patch fixes proc_stat_read by using the same method.
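
For illustration, here is a minimal sketch of that filtering idea (not the actual lxcfs code; the parsing and the round-up choice are simplified assumptions): count the CPUs listed in cpuset.cpus, then cap the result by the quota/period ratio from cpu.max.

    /* Minimal sketch of the filtering idea; not the actual lxcfs code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Count CPUs in a cpuset list such as "0-3,8,10-11". */
    static int cpuset_count(const char *set)
    {
        int count = 0;
        char *dup = strdup(set), *saveptr = NULL;

        for (char *tok = strtok_r(dup, ",", &saveptr); tok;
             tok = strtok_r(NULL, ",", &saveptr)) {
            int lo, hi;
            if (sscanf(tok, "%d-%d", &lo, &hi) == 2)
                count += hi - lo + 1;
            else
                count += 1;
        }
        free(dup);
        return count;
    }

    /* CPUs from cpuset.cpus, capped by the cpu.max quota/period ratio. */
    static int effective_cpu_count(const char *cpuset, const char *cpu_max)
    {
        int cpus = cpuset_count(cpuset);
        long long quota, period;

        /* cpu.max reads "max <period>" (no limit) or "<quota> <period>";
         * "max" fails the numeric conversion, so no cap is applied. */
        if (sscanf(cpu_max, "%lld %lld", &quota, &period) == 2 &&
            quota > 0 && period > 0) {
            int max = (int)((quota + period - 1) / period); /* round up */
            if (max < cpus)
                cpus = max;
        }
        return cpus;
    }

    int main(void)
    {
        /* Example: 8 CPUs in the cpuset, a quota worth 2 CPUs -> 2. */
        printf("%d\n", effective_cpu_count("0-7", "200000 100000"));
        return 0;
    }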

@stgraber (Member) commented Sep 1, 2025

@last-las can you edit your commit message to include the required Signed-off-by line? (see CONTRIBUTING.md)

@last-las (Contributor, Author) commented Sep 2, 2025

@last-las can you edit your commit message to include the required Signed-off-by line? (see CONTRIBUTING.md)

done!

@stgraber requested a review from mihalicyn on September 2, 2025 03:17
@stgraber (Member) commented Sep 2, 2025

@mihalicyn can you take a look?

@divinity76 commented Oct 6, 2025

On cgroups v2 only, host CPU usage bleeds into the container's htop CPU bars, and I think (but haven't tested) that this will fix it 👍

(This does not happen on cgroups v1. On Proxmox 8 I switched to cgroups v1 to avoid this issue, but Proxmox 9 dropped support for cgroups v1.)

@mihalicyn (Member) commented

Hi @last-las,

thanks for working on this and preparing a PR.

I was thinking about this approach too, before. But my concern was and still is that if we handle it like this, we end up with incorrect /proc/stat statistics anyway. Yes, the CPU count will be correct with this applied, but all the per-CPU parameters:

			   "%*s"        /* <skip> */
			   " %" PRIu64  /* user */
			   " %" PRIu64  /* nice */
			   " %" PRIu64  /* system */
			   " %" PRIu64  /* idle */
			   " %" PRIu64  /* iowait */
			   " %" PRIu64  /* irq */
			   " %" PRIu64  /* softirq */
			   " %" PRIu64  /* steal */
			   " %" PRIu64  /* guest */
			   " %" PRIu64, /* guest_nice */

will still be taken from the host, won't they?

So the question is: should we instead add proper kernel support for this CPU time accounting (as existed in cgroup v1), or just apply this half-measure?

Could you elaborate on why this CPU-filtering-only solution works for you? What kind of use case do you have for /proc/stat inside the container? To me it looks like instead of giving the user one kind of wrong information, we will now be giving them another kind of wrong information. :)
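
For context, cgroup v2 does account CPU time, but only as cgroup-wide totals in cpu.stat; a minimal sketch of reading that file (the key names come from the kernel's cgroup v2 documentation, the parsing here is purely illustrative) shows why the per-CPU fields above cannot be reconstructed from it:

    /* Illustrative sketch: cgroup v2's cpu.stat exposes aggregate CPU
     * time (usage_usec, user_usec, system_usec), but no per-CPU
     * breakdown like cgroup v1's cpuacct.usage_all, so the per-CPU
     * /proc/stat fields cannot be synthesized from it. */
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/fs/cgroup/cpu.stat", "r");
        char key[64];
        uint64_t val;

        if (!f) {
            perror("fopen");
            return 1;
        }

        /* Each line is "<key> <value>"; every value is a total across
         * all CPUs the cgroup has ever run on. */
        while (fscanf(f, "%63s %" SCNu64, key, &val) == 2)
            printf("%s = %" PRIu64 "\n", key, val);

        fclose(f);
        return 0;
    }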

@mihalicyn (Member) commented

(This does not happen on cgroups v1. On Proxmox 8 I switched to cgroups v1 to avoid this issue, but Proxmox 9 dropped support for cgroups v1.)

Hi @divinity76,

please can you also share your use case for /proc/stat inside the container? Do you have some kind of monitoring daemon inside? This feedback is very valuable for us so we can prioritize the things we should work on to improve the LXC container experience in general.

@mihalicyn (Member) left a comment

In general, if it helps users, I see no problem merging this after we discuss the use cases.

@divinity76 commented Oct 7, 2025

(This does not happen on cgroups v1. On Proxmox 8 I switched to cgroups v1 to avoid this issue, but Proxmox 9 dropped support for cgroups v1.)

Hi @divinity76,

please can you also share your use case for /proc/stat inside the container? Do you have some kind of monitoring daemon inside? This feedback is very valuable for us so we can prioritize the things we should work on to improve the LXC container experience in general.

Virtualized HestiaCP instances send high-CPU-usage warning emails to the registered HestiaCP admin when the hypervisor's CPU usage is high; they shouldn't. Ideally, HestiaCP should not see the hypervisor's CPU usage at all.

htop also reports the host's CPU usage, not the container's, when humans inspect it manually.

This does not happen with cgroups v1.

@stgraber merged commit e2973d1 into lxc:main on Oct 7, 2025 (21 checks passed)
@last-las (Contributor, Author) commented Oct 9, 2025

Sorry for the late reply.

So the question is: should we instead add proper kernel support for this CPU time accounting (as existed in cgroup v1), or just apply this half-measure?

I agree that the perfect solution would be to support it in the kernel.

Could you elaborate on why this CPU-filtering-only solution works for you? What kind of use case do you have for /proc/stat inside the container? To me it looks like instead of giving the user one kind of wrong information, we will now be giving them another kind of wrong information. :)

  • Some applications (e.g. Node.js) in our cluster use /proc/stat to determine the number of available CPUs, which could cause OOMs under cgroup v2 (a sketch of this counting pattern appears below);
  • Like you said, the rest of the work has to be done in the kernel, but I think the patch fixes some of the problems, or at least it doesn't make anything worse.
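
As an aside, a minimal sketch of the counting pattern mentioned above (illustrative only; this is not Node.js's actual implementation):

    /* Illustrative only: derive the CPU count by counting the per-CPU
     * "cpuN" lines in /proc/stat, which is why an unfiltered host view
     * inflates the number an application sees. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/stat", "r");
        char line[512];
        int cpus = 0;

        if (!f) {
            perror("fopen");
            return 1;
        }

        while (fgets(line, sizeof(line), f)) {
            /* Per-CPU lines are "cpu0 ...", "cpu1 ..."; the aggregate
             * line is just "cpu ..." and is skipped. */
            if (strncmp(line, "cpu", 3) == 0 &&
                isdigit((unsigned char)line[3]))
                cpus++;
        }
        fclose(f);

        printf("CPUs visible via /proc/stat: %d\n", cpus);
        return 0;
    }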

@last-las deleted the fix-proc-stat-read branch on October 10, 2025 00:30