Commit 325044b

Initial import of current spank plugins project to googlecode.
1 parent 8fd891c commit 325044b

73 files changed, 17561 additions and 0 deletions

COPYING

+340
Large diffs are not rendered by default.

ChangeLog

+516
Large diffs are not rendered by default.

DISCLAIMER

+24
@@ -0,0 +1,24 @@
+This work was produced at the Lawrence Livermore National Laboratory
+(LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44) between
+the U.S. Department of Energy (DOE) and Lawrence Livermore National
+Security, LLC (LLNS) for the operation of LLNL.
+
+This work was prepared as an account of work sponsored by an agency of
+the United States Government. Neither the United States Government nor
+Lawrence Livermore National Security, LLC nor any of their employees,
+makes any warranty, express or implied, or assumes any liability or
+responsibility for the accuracy, completeness, or usefulness of any
+information, apparatus, product, or process disclosed, or represents
+that its use would not infringe privately-owned rights.
+
+Reference herein to any specific commercial products, process, or
+services by trade name, trademark, manufacturer or otherwise does
+not necessarily constitute or imply its endorsement, recommendation,
+or favoring by the United States Government or Lawrence Livermore
+National Security, LLC. The views and opinions of authors expressed
+herein do not necessarily state or reflect those of the United States
+Government or Lawrence Livermore National Security, LLC, and shall
+not be used for advertising or product endorsement purposes.
+
+The precise terms and conditions for copying, distribution, and
+modification are specified in the file "COPYING".

META

+4
@@ -0,0 +1,4 @@
+Name: chaos-spankings
+Version: 0.34
+Release: 1
+Author: Mark Grondona <[email protected]>

Makefile

+44
@@ -0,0 +1,44 @@
+
+CFLAGS = -Wall -ggdb
+
+all: renice.so \
+	oom-detect.so \
+	system-safe-preload.so system-safe.so \
+	iotrace.so \
+	tmpdir.so \
+	auto-affinity.so \
+	pty.so \
+	addr-no-randomize.so \
+	preserve-env.so \
+	subdirs
+
+SUBDIRS = use-env overcommit-memory cpuset
+
+.SUFFIXES: .c .o .so
+
+.c.o:
+	$(CC) $(CFLAGS) -o $@ -fPIC -c $<
+.o.so:
+	$(CC) -shared -o $*.so $< $(LIBS)
+
+subdirs:
+	@for d in $(SUBDIRS); do make -C $$d; done
+
+system-safe-preload.so : system-safe-preload.o
+	$(CC) -shared -o $*.so $< -ldl
+
+auto-affinity.so : auto-affinity.o lib/split.o lib/list.o lib/fd.o
+	$(CC) -shared -o $*.so auto-affinity.o lib/split.o lib/list.o -lslurm
+
+preserve-env.so : preserve-env.o lib/list.o
+	$(CC) -shared -o $*.so preserve-env.o lib/list.o
+
+pty.so : pty.o
+	$(CC) -shared -o $*.so $< -lutil
+
+clean: subdirs-clean
+	rm -f *.so *.o lib/*.o
+
+subdirs-clean:
+	@for d in $(SUBDIRS); do make -C $$d clean; done
+

NEWS

+73
@@ -0,0 +1,73 @@
+Version 0.34 (2008-09-25):
+- auto-affinity: Fix for using auto-affinity module with jobs using
+  --use-cpusets=task. The auto-affinity module now checks to make sure
+  CPU mask has not changed in task context, and if so, silently
+  does nothing.
+- preserve-env: New plugin which, when enabled with --preserve-slurm-env
+  option, will attempt to keep the remote SLURM_* environment variables
+  the same as in the current context. Useful for invoking
+  "srun -n1 --pty bash" from within an allocation shell.
+
+Version 0.33 (2008-09-11):
+- Fix for critical locking bug in cpuset plugin. The cpuset plugin
+  now uses a global lockfile in /var/lock instead of locking files
+  under /dev/cpuset.
+- Fix for generation of SLURM_CMDLINE in use-env plugin.
+
+Version 0.32 (2008-08-21):
+- oom-detect: Optionally log OOM killed jobs via syslog(3), if
+  the do_syslog parameter is used in plugstack.conf. The syslog
+  message has the form "slurmd: OOM detected: jobid=JOBID uid=UID"
+
+Version 0.31 (2008-08-19):
+- oom-detect: Delay slightly if an OOM killed process is detected
+  to give the error message time to make it to srun stderr.
+
+Version 0.30 (2008-08-04):
+- cpuset: Slightly improve config file error messages.
+- cpuset: Minor fixes for man pages.
+- auto-affinity: Update --auto-affinity=help message.
+
+Version 0.29 (2008-07-29):
+- cpuset: Major overhaul of SLURM cpuset support. Now includes a PAM
+  module, pam_slurm_cpuset.so, and a global config file in
+  /etc/slurm/slurm-cpuset.conf. For more information, see the
+  new manual pages included with the distribution.
+- auto-affinity: Do not set CPU affinity by default if the number
+  of available CPUs is not evenly divisible by the number of tasks.
+
+Version 0.28 (2008-07-22):
+- auto-affinity: Fix error where spank_post_opt hook was incorrectly
+  run in srun, which caused an immediate error and abort.
+
+Version 0.27 (2008-07-16):
+- cpuset: Expand cpuset support to per-task cpusets via --use-cpusets=tasks.
+
+Version 0.26 (2008-07-16):
+- cpuset: Add support for per-job-step cpusets via the new srun option
+  '--use-cpusets'. See the README or --use-cpusets=help for more information.
+- auto-affinity: Delay detection of current cpuset until after user
+  option processing in the event that user option changed our cpuset.
+
+Version 0.25 (2008-07-10):
+- cpuset: Added cpuset plugin to constrain jobs to number of CPUs
+  allocated on shared, but not oversubscribed nodes.
+- auto-affinity: Make auto-affinity plugin cpuset-aware. CPU affinity
+  is assigned as if the job were running on a node the size of the
+  current cpuset. If cpusets are not enabled, the auto-affinity behavior
+  is unchanged.
+
+Version 0.24 (2008-06-10):
+- auto-affinity: Query SLURM controller for number of CPUs allocated
+  to the current job in exclusive_only mode if the environment variable
+  SLURM_JOB_CPUS_PER_NODE is not set.
+
+Version 0.23 (2008-06-10):
+- auto-affinity: Add 'exclusive_only' flag to auto-affinity plugin
+  to constrain plugin activity to only those jobs that have exclusive
+  use of the current node.
+
+(2008-06-10):
+- Started NEWS file.
+
+$Id: NEWS 7811 2008-09-25 22:21:11Z grondo $

README

+156
@@ -0,0 +1,156 @@
+SLURM spank plugins README
+==================================
+
+This package includes several SLURM spank plugins developed
+at LLNL and used on production compute clusters onsite. A few
+of these plugins are only valid when used on LLNL's software
+stack (oom-detect.so, for example, requires LLNL-specific patches
+to track jobs terminated by the OOM killer). However, the
+source for all plugins is provided here in the hope that they
+might be useful to other plugin developers. The following
+is a short description of most of the plugins in this package.
+
+addr-no-randomize
+-----------------
+
+The addr-no-randomize plugin allows sysadmins to set a default
+policy for address space randomization (when supported and
+enabled in the Linux kernel), and provides an option for users
+to enable/disable randomization on a per-job basis.
+
+auto-affinity
+-----------------
+
+Automatically assign CPU affinity using best-guess defaults.
+
+The default behavior of this plugin attempts to accommodate
+multi-threaded apps by assigning more than one CPU per task
+if the number of tasks running on the node is evenly divisible
+into the number of CPUs. Otherwise, CPU affinity is not enabled
+unless the cpus_per_task (cpt) option is specified. The default
+behavior may be modified using the --auto-affinity options
+listed below. Also, the srun(1) --cpu_bind option is processed
+after auto-affinity, and thus may be used to override any CPU
+affinity settings from this module.
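The default policy described above can be sketched in a few lines of C. This is an illustration only, not code from auto-affinity.c: the function names `default_cpus_per_task` and `first_cpu_for_task` are hypothetical, and the contiguous block layout (task 0 gets the first block of CPUs, task 1 the next, and so on) is an assumption the README does not state.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from auto-affinity.c: number of CPUs to
 * give each task under the default policy, or 0 to leave affinity unset.
 */
int default_cpus_per_task(int ncpus, int ntasks)
{
    assert(ncpus > 0 && ntasks > 0);  /* both counts must be positive */

    if (ncpus % ntasks != 0)
        return 0;  /* tasks do not divide evenly into CPUs: do nothing */
    return ncpus / ntasks;
}

/*
 * Assumed layout: each local task gets one contiguous block of CPUs.
 * Returns the first CPU of the block for local task `taskid`.
 */
int first_cpu_for_task(int taskid, int cpus_per_task)
{
    return taskid * cpus_per_task;
}
```

With 8 CPUs and 4 tasks each task would get 2 CPUs; with 8 CPUs and 3 tasks the helper returns 0 and affinity is left alone, which matches the "Otherwise, CPU affinity is not enabled" rule.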
+
+This plugin should not be used alone on systems using node
+sharing. In that case, it should be used along with
+the cpuset plugin below (and auto-affinity.so should be listed
+*after* cpuset.so in the plugstack.conf).
+
+cpuset
+-----------------
+
+The cpuset plugin uses Linux cpusets to constrain jobs to the
+number of CPUs they have been allocated on nodes. The plugin
+is specifically designed for systems sharing nodes and using CPU
+scheduling (i.e. using the select/cons_res plugin). The plugin
+will not work on systems where CPUs are oversubscribed to jobs
+(i.e. strict node sharing without the use of select/cons_res).
+
+The plugin also has a pam_slurm_cpuset counterpart, which
+replaces pam_slurm and provides identical functionality,
+except that user login sessions are constrained to their
+currently allocated CPUs on a node.
+
+The cpuset plugin requires the SGI libbitmask and libcpuset
+libraries available from
+
+    http://oss.sgi.com/projects/cpusets
+
+(See also cpuset/README)
+
+iorelay
+-----------------
+
+The iorelay plugin is an experimental proof-of-concept plugin
+for remounting required filesystems for a parallel job from
+the first allocated node to all others. It is meant to reduce
+the load on global NFS servers.
+
+It has not been used in production.
+
+
+iotrace
+-----------------
+
+The iotrace plugin is another experimental plugin which
+uses "plasticfs" to log filesystem access on a per-job
+basis.
+
+
+oom-detect
+-----------------
+
+The oom-detect plugin detects jobs that have been victims
+of the OOM killer using some special code added to the LLNL
+Linux kernel. As tasks exit after having been killed by
+the OOM killer, a message is printed to the user's stderr
+along with some memory information about the task.
+
+overcommit-memory
+-----------------
+
+The overcommit-memory plugin is an attempt to allow users
+to tune global overcommit behavior of the Linux kernel on
+a per-job basis. It is currently buggy and thus not used.
+
+preserve-env
+-----------------
+
+The preserve-env plugin adds an srun option
+
+    --preserve-slurm-env
+
+which attempts to preserve the current state of all SLURM_*
+environment variables in the remotely executed environment. This
+is meant solely to be used from an allocation shell with
+the syntax
+
+    srun -n1 -N1 --pty --preserve-slurm-env $SHELL
+
+as a sort of "remote" allocation shell.
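The core idea of preserving SLURM_* variables can be sketched as "snapshot every SLURM_* entry, then re-apply the snapshot later". The C below is a minimal sketch under that assumption and is not taken from preserve-env.c. The names `collect_slurm_env` and `apply_slurm_env` are hypothetical, and how the real plugin carries the values from srun to the remote side is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/*
 * Hypothetical helper: copy up to `max` "NAME=value" entries whose name
 * starts with SLURM_ into `out`. Returns the number of entries stored.
 * The copies are made with strdup() and belong to the caller.
 */
size_t collect_slurm_env(char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (char **e = environ; e != NULL && *e != NULL && n < max; e++) {
        if (strncmp(*e, "SLURM_", 6) != 0)
            continue;
        char *copy = strdup(*e);
        if (copy == NULL)
            break;              /* out of memory: return what we have */
        out[n++] = copy;
    }
    return n;
}

/*
 * Hypothetical helper: re-apply a snapshot, overwriting current values.
 * Returns 0 on success, -1 if a name is too long or setenv() fails.
 */
int apply_slurm_env(char *const *vars, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const char *eq = strchr(vars[i], '=');
        char name[256];

        if (eq == NULL)
            continue;           /* not a NAME=value entry: skip it */
        size_t len = (size_t)(eq - vars[i]);
        if (len >= sizeof name)
            return -1;
        memcpy(name, vars[i], len);
        name[len] = '\0';
        if (setenv(name, eq + 1, 1) != 0)
            return -1;
    }
    return 0;
}
```

Copying the entries rather than keeping pointers into `environ` keeps the snapshot valid even if the environment is modified between capture and re-application.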
+
+pty
+-----------------
+
+The pty plugin provides the SLURM --pty option, introduced
+in slurm-1.3, for slurm-1.2. It isn't fully functional at this
+point, but is a good example of a complex feature added solely
+from a spank plugin.
+
+
+renice
+-----------------
+
+The renice plugin is the same as the example code in the
+spank(8) man page. It provides a new srun option "--renice=VALUE"
+which allows users to set the nice value of their remote
+tasks (down to a minimum value configured by sysadmin).
+
+system-safe
+------------------
+
+The system-safe plugin provides an MPI-safe system(3)
+replacement through an LD_PRELOAD library (most of the work
+is done in system-safe-preload.c). The preloaded library
+interposes a version of system(3) that does not fork. Instead,
+the command line is passed through a pipe to a copy of the
+program which was pre-forked before MPI_Init(). The return
+value of the real system() call is passed back through the
+pipe and returned to the calling application, for which there
+is no noticeable difference with the real system(3).
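The pre-forked helper pattern described above can be sketched in plain C. This is a simplified illustration, not the code in system-safe-preload.c: it does not interpose system(3) via LD_PRELOAD, and the names `helper_start`, `safe_system`, and `helper_stop` are hypothetical. It only shows the pipe protocol: the caller writes a length-prefixed command, a helper forked earlier runs the real system(3), and the status travels back over a second pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cmd_fd = -1;      /* parent -> helper: command strings */
static int status_fd = -1;   /* helper -> parent: system() results */
static pid_t helper_pid = -1;

/* Read or write exactly len bytes, retrying on EINTR; 0 on success. */
static int xfer(int fd, void *buf, size_t len, int writing)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = writing ? write(fd, p, len) : read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;       /* error or EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Helper side: run each received command with the real system(3). */
static void helper_loop(int in, int out)
{
    size_t len;
    while (xfer(in, &len, sizeof len, 0) == 0) {
        char *cmd = malloc(len + 1);
        if (cmd == NULL || xfer(in, cmd, len, 0) != 0)
            _exit(1);
        cmd[len] = '\0';
        int status = system(cmd);
        free(cmd);
        if (xfer(out, &status, sizeof status, 1) != 0)
            _exit(1);
    }
    _exit(0);                /* parent closed the command pipe */
}

/* Fork the helper. Meant to be called early, e.g. before MPI_Init(). */
int helper_start(void)
{
    int to_helper[2], from_helper[2];
    if (pipe(to_helper) != 0)
        return -1;
    if (pipe(from_helper) != 0) {
        close(to_helper[0]);
        close(to_helper[1]);
        return -1;
    }
    helper_pid = fork();
    if (helper_pid < 0)
        return -1;
    if (helper_pid == 0) {   /* child: keep only its ends of the pipes */
        close(to_helper[1]);
        close(from_helper[0]);
        helper_loop(to_helper[0], from_helper[1]);
    }
    close(to_helper[0]);
    close(from_helper[1]);
    cmd_fd = to_helper[1];
    status_fd = from_helper[0];
    return 0;
}

/* Fork-free stand-in for system(3): ask the helper, return its status. */
int safe_system(const char *cmd)
{
    size_t len = strlen(cmd);
    int status;

    assert(cmd_fd >= 0 && status_fd >= 0);  /* helper_start() must run first */
    if (xfer(cmd_fd, &len, sizeof len, 1) != 0 ||
        xfer(cmd_fd, (void *)cmd, len, 1) != 0 ||
        xfer(status_fd, &status, sizeof status, 0) != 0)
        return -1;
    return status;
}

/* Close the pipes so the helper sees EOF, then reap it. */
void helper_stop(void)
{
    close(cmd_fd);
    close(status_fd);
    waitpid(helper_pid, NULL, 0);
    cmd_fd = status_fd = -1;
}
```

The value returned by `safe_system` is the raw wait status from the helper's system(3) call, so callers can inspect it with WIFEXITED and WEXITSTATUS exactly as they would for the real function.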
+
+use-env
+------------------
+
+The use-env plugin allows system administrators and users to
+modify the environment of SLURM jobs using a set of simple
+yet very flexible config files. Environment variables can
+be overridden, set only if unset, set based on conditional
+syntax, and even defined in a per-task context. The config
+files have access to key slurm variables such as SLURM_NNODES,
+SLURM_NPROCS, etc., so variables can even be defined differently
+depending on the size of the job.
+
+See README.use-env for further information.
