smtstate.service fails to offline CPUs #106

Open
Gelbpunkt opened this issue Mar 17, 2025 · 8 comments

Comments

@Gelbpunkt

Gelbpunkt commented Mar 17, 2025

Hi, I am running an IBM POWER8 S822L system with OPAL and Fedora 41. I want to persist the SMT state (SMT=1) to be able to use libvirtd.
I ran ppc64_cpu --smt off and then smtstate --save:

$ cat /var/lib/powerpc-utils/smt.state
# smt.state configures the SMT default value for next
# boots.

# SMT state. If smt_init.service is enabled, this value will be
# used to set SMT in the next boot. If smtd.service is enabled,
# this value will be updated whenever ppc64_cpu --smt=<value>
# is issued.
SMT_VALUE=1
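
To double-check what `smtstate --save` recorded, the state file can be read back with a small helper. This is only a sketch: `read_smt_value` is a hypothetical name, and it assumes the plain `KEY=VALUE` layout shown in the output above rather than anything smtstate itself guarantees.

```shell
#!/bin/sh
# Hypothetical helper: print the SMT_VALUE stored in a state file.
# Assumes a KEY=VALUE layout where lines starting with '#' are comments.
read_smt_value() {
    state_file=${1:-/var/lib/powerpc-utils/smt.state}
    # Print only lines that start with SMT_VALUE=, with the key stripped,
    # and keep the last one in case the key appears more than once.
    sed -n 's/^SMT_VALUE=//p' "$state_file" | tail -n 1
}

# Example:
#   read_smt_value /var/lib/powerpc-utils/smt.state
```

If this prints nothing, the file has no uncommented `SMT_VALUE=` line.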

Then I ran systemctl enable smtstate.service, but upon next boot it fails and SMT is still set to 8:

$ sudo systemctl status smtstate
× smtstate.service - SMT automatic initialization service
     Loaded: loaded (/usr/lib/systemd/system/smtstate.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: failed (Result: exit-code) since Mon 2025-03-17 02:14:42 CET; 5min ago
 Invocation: 4f3229c3e07c4c97a5f17b84997d6fd8
       Docs: man:smtstate(8)
    Process: 2611 ExecStart=/usr/sbin/smtstate --load (code=exited, status=1/FAILURE)
   Main PID: 2611 (code=exited, status=1/FAILURE)
   Mem peak: 27.1M
        CPU: 33ms

Mar 17 02:14:44 sirius smtstate[2622]: One or more cpus could not be on/offlined
Mar 17 02:14:42 sirius systemd[1]: Starting smtstate.service - SMT automatic initialization service...
Mar 17 02:14:42 sirius systemd[1]: smtstate.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 02:14:42 sirius systemd[1]: smtstate.service: Failed with result 'exit-code'.
Mar 17 02:14:42 sirius systemd[1]: Failed to start smtstate.service - SMT automatic initialization service.

But when I manually run ppc64_cpu --smt off it works. I assume the service is run too early? But even if I change the unit to multi-user.target it still fails.
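
One way to test the "too early" theory is to retry the command for a bounded time instead of running it once. The following is a minimal sketch that assumes the failure is transient; `retry_until_ok` is a hypothetical helper, not part of powerpc-utils.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it succeeds or a time budget runs out.
# Usage: retry_until_ok TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Note: only the time spent sleeping is counted, not the command's own runtime.
retry_until_ok() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        # Give up once the time budget is used up.
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example (needs root on the affected machine):
#   retry_until_ok 120 5 ppc64_cpu --smt=1
```

If the command only starts succeeding after some number of seconds, that would point at an ordering problem rather than a permanent failure.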

@hramrach
Contributor

hramrach commented Mar 17, 2025

There is some timeout so if you have many CPUs this may fail. However, if your SMT remains at 8 then there is likely a different problem.

I am not familiar with smtstate.service or what exactly it does. In any case, your logs do not show whatever error happened.

For this purpose I use the smt_off.service which just unconditionally sets SMT to 1 which is more efficient and less error-prone.

@hramrach
Contributor

Also with recent kernels you could pass smt=1 or something to the kernel but it's not possible to pass kernel parameters when automatically booting using the opal firmware.

@Gelbpunkt
Author

Gelbpunkt commented Mar 17, 2025

> There is some timeout so if you have many CPUs this may fail. However, if your SMT remains at 8 then there is likely a different problem.

The system only has 20 CPUs.

> For this purpose I use the smt_off.service which just unconditionally sets SMT to 1 which is more efficient and less error-prone.

smt_off fails with the same error

> Also with recent kernels you could pass smt=1 or something to the kernel but it's not possible to pass kernel parameters when automatically booting using the opal firmware.

I've tried that as well, but it breaks boot entirely and the system won't ever come up. The ASMI logs System Hypervisor Firmware errors continuously, and the serial console prints nothing after kexec.

@Gelbpunkt
Author

> I assume ppc64_cpu --smt=1 also works?

Yes. I can also get smt_off.service to work in general if I add an ExecStartPre that sleeps for 120s, so apparently it just can't run early at boot.
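
For anyone wanting to reproduce that workaround without editing the packaged unit, a systemd drop-in is one option. The sketch below writes such a drop-in; `write_delay_dropin`, the file name `10-delay.conf`, and the `/usr/bin/sleep` path are assumptions made for illustration, not something shipped by powerpc-utils.

```shell
#!/bin/sh
# Hypothetical helper: create a drop-in that delays a unit's start.
# Usage: write_delay_dropin UNIT DELAY_SECONDS [SYSTEMD_DIR]
# SYSTEMD_DIR defaults to /etc/systemd/system (writing there needs root).
write_delay_dropin() {
    unit=$1
    delay=$2
    root=${3:-/etc/systemd/system}
    dropin_dir="$root/$unit.d"
    mkdir -p "$dropin_dir"
    # Assumption: sleep lives at /usr/bin/sleep on this system.
    printf '[Service]\nExecStartPre=/usr/bin/sleep %s\n' "$delay" \
        > "$dropin_dir/10-delay.conf"
}

# Example (as root):
#   write_delay_dropin smt_off.service 120
#   systemctl daemon-reload
```

A fixed delay only hides whatever the unit is racing against, so this is best treated as a diagnostic aid rather than a fix.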

@tyreld
Member

tyreld commented Mar 20, 2025

Ok, I am in no way a systemd expert. Both of these services were added with limited scope, and are not regularly tested. I had mostly forgotten they are even in the code base, aside from a faint memory from long ago of when the smt_off service was added to accomplish exactly what you are trying to do with libvirtd.

@hramrach
Contributor

I tried mounting tmpfs over sysfs to simulate not having it mounted and I get a different error:

Could not determine system cpu/thread information.
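
To check from inside a unit whether the per-CPU sysfs entries are visible at that point in boot, something like the following could be logged before the SMT change. It is a sketch with a hypothetical function name; it assumes the usual `/sys/devices/system/cpu/cpuN/online` layout and treats a CPU without an `online` file as online.

```shell
#!/bin/sh
# Hypothetical helper: list the logical CPUs that sysfs reports as online.
# Usage: list_online_cpus [CPU_ROOT]
list_online_cpus() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for dir in "$cpu_root"/cpu[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        # Assumption: a missing 'online' file means the CPU cannot be
        # offlined, so it is counted as online.
        if [ ! -f "$dir/online" ] || [ "$(cat "$dir/online")" = "1" ]; then
            echo "${dir##*/}"
        fi
    done
}

# Example:
#   list_online_cpus | wc -l
```

An empty result would suggest the CPU directories are not there yet, which is a different situation from threads that refuse to go offline.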

@hramrach
Contributor

In fact I don't know where "One or more cpus could not be on/offlined" would come from

@hramrach
Contributor

it was changed to uppercase in ae2cfe7
