Description
The Salt Minion scheduler (salt-minion 3006.x) on Ubuntu 20.04 and 24.04 stops executing scheduled jobs after they have run once or twice. The schedule appears correctly right after the service is restarted (systemctl restart salt-minion), but then silently disappears from the runtime: salt-call schedule.list returns an empty schedule.
Steps to Reproduce:
Restart the Salt Minion:
systemctl restart salt-minion
Verify the schedule is initially listed:
salt-call schedule.list show_all=True
Wait for one or two scheduled executions. After that, check again:
salt-call schedule.list show_all=True
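To see exactly when the schedule disappears, the check from the steps above can be repeated on a timer. A minimal POSIX shell sketch, assuming an empty schedule prints the line schedule: {} (as in the listing later in this report); schedule_is_empty and watch_schedule are hypothetical helper names, not Salt commands:

```shell
#!/bin/sh
# Hypothetical helpers for catching the moment the runtime schedule vanishes.

# schedule_is_empty OUTPUT
# Succeeds (exit 0) when the given "salt-call schedule.list" output
# contains the literal line fragment "schedule: {}" (empty schedule).
schedule_is_empty() {
    printf '%s\n' "$1" | grep -qF 'schedule: {}'
}

# watch_schedule [INTERVAL_SECONDS]
# Polls the runtime schedule every INTERVAL_SECONDS (default 60).
# When the schedule is empty, prints a UTC timestamp plus the output
# and returns, so the time can be matched against the minion logs.
watch_schedule() {
    interval=${1:-60}
    while :; do
        out=$(salt-call schedule.list show_all=True 2>&1)
        if schedule_is_empty "$out"; then
            printf '%s schedule is empty:\n%s\n' \
                "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$out"
            return 0
        fi
        sleep "$interval"
    done
}
```

For example, run watch_schedule 60 as root, then compare the printed timestamp with journalctl -u salt-minion output from the same minute.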
Expected Behavior:
Schedules persist, and the scheduled jobs keep running at their defined intervals.
The Salt Minion terminates cleanly, including all of its child processes, so the scheduler stays intact across service restarts.
Actual Behavior:
Scheduled jobs silently vanish from the runtime scheduler after running once or twice, and nothing further executes until the minion is restarted.
Child processes become defunct (zombie processes), causing the scheduler to lose its state and stop executing scheduled tasks.
$ sudo salt-call schedule.list
local:
    schedule:
      highstate:
        enabled: true
        function: state.highstate
        jid_include: true
        maxrunning: 1
        name: highstate
        saved: true
        seconds: 600
        splay: 49
$ ll /etc/salt.lastcontact ; date
-rw-r--r-- 1 root root 14 Mar 12 03:36 /etc/salt.lastcontact
Wed Mar 12 03:38:54 UTC 2025
$ ll /etc/salt.lastcontact ; date
-rw-r--r-- 1 root root 14 Mar 12 03:46 /etc/salt.lastcontact
Wed Mar 12 03:48:23 UTC 2025
Here, highstate ran twice (/etc/salt.lastcontact was updated at 03:36 and again at 03:46), and then the schedule vanished:
$ sudo salt-call schedule.list
local:
    schedule: {}
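Because the schedule vanishes without an error, an external freshness check on /etc/salt.lastcontact (which, judging by the timestamps above, changes on each highstate run) is one way to notice it. A minimal sketch, assuming GNU coreutils as shipped on Ubuntu (stat -c %Y prints the modification time in epoch seconds); is_stale is a hypothetical helper, and the 1200-second threshold in the usage line is an illustrative choice of two 600-second intervals:

```shell
#!/bin/sh
# is_stale FILE MAX_AGE_SECONDS
# Hypothetical helper. Succeeds (exit 0) when FILE is missing or was
# last modified more than MAX_AGE_SECONDS ago; fails otherwise.
# Assumes GNU stat (-c %Y = modification time as epoch seconds).
is_stale() {
    file=$1
    max_age=$2
    [ -e "$file" ] || return 0
    mtime=$(stat -c %Y "$file") || return 0
    now=$(date +%s)
    [ $((now - mtime)) -gt "$max_age" ]
}
```

For example: is_stale /etc/salt.lastcontact 1200 && echo 'no highstate in the last 20 minutes'. A cron job or monitoring agent running this check would flag the failure even though the minion logs stay quiet.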
Logs and Observations:
Observed Logs:
systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: salt-minion.service: Killing process 1341396 (python3.10) with signal SIGKILL.
systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
systemd[1]: salt-minion.service: Failed with result 'timeout'.
systemd[1]: salt-minion.service: Unit process 1341414 (/opt/saltstack/) remains running after unit stopped.
systemd[1]: Stopped salt-minion.service - The Salt Minion.
systemd[1]: salt-minion.service: Consumed 1min 6.565s CPU time, 329.1M memory peak, 0B memory swap peak.
systemd[1]: salt-minion.service: Found left-over process 1341414 (/opt/saltstack/) in control group while starting unit. Ignoring.
systemd[1]: salt-minion.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
systemd[1]: Starting salt-minion.service - The Salt Minion...
systemd[1]: Started salt-minion.service - The Salt Minion.
salt-minion[1897212]: The Salt Minion is shutdown.
salt-minion[1341414]: Minion process encountered exception: [Errno 3] No such process
systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: salt-minion.service: Failed with result 'exit-code'.
systemd[1]: salt-minion.service: Consumed 1.030s CPU time.
systemd[1]: salt-minion.service: Scheduled restart job, restart counter is at 1.
systemd[1]: Starting salt-minion.service - The Salt Minion...
systemd[1]: Started salt-minion.service - The Salt Minion.
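The log above shows process 1341414 surviving the unit stop and then failing with [Errno 3] No such process (ESRCH on Linux). To look for defunct children or left-over processes after a stop, the process table can be filtered by state. A minimal sketch; list_zombies is a hypothetical helper that reads ps -eo pid,ppid,stat,comm output, where a STAT value starting with Z marks a defunct (zombie) process:

```shell
#!/bin/sh
# list_zombies
# Hypothetical helper. Reads "ps -eo pid,ppid,stat,comm" output on stdin,
# skips the header line, and prints only rows whose STAT column starts
# with "Z": processes that have exited but were not reaped by their parent.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/'
}
```

On a live host: ps -eo pid,ppid,stat,comm | list_zombies. The PPID column shows which process failed to reap each zombie; systemctl status salt-minion may additionally list, under CGroup, processes still attached to the unit.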
The scheduler initially loads correctly (salt-call --local config.get schedule shows the expected schedules).
The logs show that the scheduler evaluates the schedule once or twice, runs the scheduled states, and then clears or loses the schedule internally without logging any obvious error.
Scheduler commands such as schedule.reload and schedule.enable do not restore the vanished schedules.
In summary, the symptoms I see are:
The schedule configuration is read at startup (config.get schedule returns it correctly).
The schedule briefly appears and even runs once or twice.
The scheduler then silently clears or forgets the schedule (schedule.list becomes empty).
The minion logs report no meaningful errors; it is a silent runtime failure in the scheduler process.
Workarounds Attempted:
Renaming the schedule to remove special characters from its name (it was previously core|salt|highstate).
Increasing seconds from 600 to 1800 and splay to 500 seconds.
Adding a startup_splay of 30 seconds.
Clearing the minion cache.
Increasing file descriptor limits.
Explicitly reloading the schedule (schedule.reload).
Adjusting the upstream systemd unit file /lib/systemd/system/salt-minion.service with these settings:
KillMode=mixed: sends SIGTERM to the main process only; if processes remain after the stop timeout, SIGKILL goes to every process left in the unit's control group.
TimeoutStopSec=900: allows up to 15 minutes for a running highstate to finish before the service is killed.
Restart=on-failure: restarts the service automatically after an unexpected exit.
RestartSec=30: waits 30 seconds before restarting.
However, none of these permanently resolves the issue.
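Edits made directly to /lib/systemd/system/salt-minion.service can be overwritten when the salt-minion package is upgraded; a drop-in under /etc/systemd/system/salt-minion.service.d/ keeps the same settings across upgrades. A minimal sketch that writes such a drop-in with the settings listed above. The file name override.conf matches what systemctl edit salt-minion creates; the helper name write_minion_override and its DIR argument are hypothetical, added so the function can be tried against a scratch directory:

```shell
#!/bin/sh
# write_minion_override [DIR]
# Hypothetical helper. Writes a systemd drop-in carrying the settings
# from this report. DIR defaults to the standard drop-in directory.
write_minion_override() {
    dir=${1:-/etc/systemd/system/salt-minion.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
# SIGTERM goes to the main process only; SIGKILL after the timeout
# goes to every process left in the unit's control group.
KillMode=mixed
TimeoutStopSec=900
Restart=on-failure
RestartSec=30
EOF
}
```

As root: write_minion_override && systemctl daemon-reload && systemctl restart salt-minion. Running systemctl cat salt-minion then shows the packaged unit followed by the drop-in, which confirms the override is in effect.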
Affected Environments (issue occurs):
Ubuntu 24.04 with Salt 3006.x
Ubuntu 20.04 with Salt 3006.x
Versions Report
$ sudo salt-call --versions-report
Salt Version:
Salt: 3006.9

Python Version:
Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]

Dependency Versions:
cffi: 1.17.1
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.4
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.17.0
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4

System Versions:
dist: ubuntu 24.04.1 noble
locale: utf-8
machine: x86_64
release: 6.8.0-48-generic
system: Linux
version: Ubuntu 24.04.1 noble

$ lsb_release -a; uname -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.1 LTS
Release: 24.04
Codename: noble
Linux salt-master02 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Severity:
Critical: causes silent failure and configuration drift in production environments.
Additional Notes:
The issue strongly suggests a compatibility problem related to system libraries or runtime conditions on Ubuntu 20.04 and 24.04.
Request:
Please prioritize investigating the compatibility and scheduling behavior of Salt Minion 3006.x.
Let me know if additional logs or tests are required.
Affected Versions and Environment:
Salt Minion version: 3006.x
Operating systems: Ubuntu 24.04, 20.04
Python versions: 3.10, 3.8
Setup:
On Ubuntu 20.04 and 24.04, the schedule is configured in /etc/salt/minion.d/schedule.conf.
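The report does not include the actual contents of /etc/salt/minion.d/schedule.conf. Judging from the schedule.list output earlier in this report, it likely resembled the sketch below: the keys and values are copied from that listing, except name and saved, which schedule.list appears to add at runtime. This is a hypothetical reconstruction, not the reporter's file, and write_schedule_conf with its PATH argument is a hypothetical helper:

```shell
#!/bin/sh
# write_schedule_conf [PATH]
# Hypothetical reconstruction of the minion schedule, based on the
# schedule.list output in this report. PATH defaults to the file
# named in the setup; pass another path to try it in a scratch location.
write_schedule_conf() {
    path=${1:-/etc/salt/minion.d/schedule.conf}
    mkdir -p "$(dirname "$path")" || return 1
    cat > "$path" <<'EOF'
schedule:
  highstate:
    function: state.highstate
    seconds: 600
    splay: 49
    maxrunning: 1
    jid_include: true
    enabled: true
EOF
}
```

After writing the file, restarting the minion and comparing salt-call --local config.get schedule with salt-call schedule.list reproduces the check described in the steps above.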