|
| 1 | +SLURM spank plugins README |
| 2 | +================================== |
| 3 | + |
| 4 | +This package includes several SLURM spank plugins developed |
| 5 | +at LLNL and used on production compute clusters onsite. A few |
| 6 | +of these plugins are only valid when used on LLNL's software |
| 7 | +stack (oom-detect.so, for example, requires LLNL-specific patches |
| 8 | +to track jobs terminated by the OOM killer). However, the |
| 9 | +source for all plugins is provided here in the hope that they |
| 10 | +might be useful to other plugin developers. The following |
| 11 | +is a short description of most of the plugins in this package. |
| 12 | + |
| 13 | +addr-no-randomize |
| 14 | +----------------- |
| 15 | + |
| 16 | +The addr-no-randomize plugin allows sysadmins to set a default |
| 17 | +policy for address space randomization (when supported and |
| 18 | +enabled in the Linux kernel), and provides an option for users |
| 19 | +to enable/disable randomization on a per-job basis. |
| 20 | + |
| 21 | +auto-affinity |
| 22 | +----------------- |
| 23 | + |
| 24 | +Automatically assign CPU affinity using best-guess defaults. |
| 25 | + |
| 26 | +By default, this plugin attempts to accommodate |
| 27 | +multi-threaded apps by assigning more than one CPU per task |
| 28 | +if the number of CPUs on the node is evenly divisible by the |
| 29 | +number of tasks on that node. Otherwise, CPU affinity is not enabled |
| 30 | +unless the cpus_per_task (cpt) option is specified. The default |
| 31 | +behavior may be modified using the --auto-affinity options |
| 32 | +listed below. Also, the srun(1) --cpu_bind option is processed |
| 33 | +after auto-affinity, and thus may be used to override any CPU |
| 34 | +affinity settings from this module. |
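
The default rule described above can be sketched in C as follows. This is
a minimal illustration, not code from auto-affinity.so: the function name
cpus_per_task() and its parameters are hypothetical, and the sketch assumes
that an explicit cpt value takes precedence even when the task count
divides the CPU count evenly, which this README does not state.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the auto-affinity source.
 * Returns the number of CPUs to bind to each task, or 0 when CPU
 * affinity should be left disabled.
 *
 *   ncpus   CPUs available on the node
 *   ntasks  tasks the job runs on this node
 *   cpt     user-requested CPUs per task, 0 if not given
 */
static int cpus_per_task(int ncpus, int ntasks, int cpt)
{
    assert(ncpus > 0 && ntasks > 0 && cpt >= 0);

    if (cpt > 0)                /* assumption: explicit request wins */
        return cpt;
    if (ncpus % ntasks == 0)    /* tasks divide the CPUs evenly */
        return ncpus / ntasks;
    return 0;                   /* no even split: affinity disabled */
}
```

Under these assumptions, 4 tasks on an 8-CPU node get 2 CPUs each, while
3 tasks on the same node get no affinity unless cpt is given.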
| 35 | + |
| 36 | +This plugin should not be used alone on systems using node |
| 37 | +sharing. In that case, it should be used along with |
| 38 | +the cpuset plugin below (and auto-affinity.so should be listed |
| 39 | +*after* cpuset.so in the plugstack.conf). |
| 40 | + |
| 41 | +cpuset |
| 42 | +----------------- |
| 43 | + |
| 44 | +The cpuset plugin uses Linux cpusets to constrain jobs to the |
| 45 | +number of CPUs they have been allocated on nodes. The plugin |
| 46 | +is specifically designed for systems sharing nodes and using CPU |
| 47 | +scheduling (i.e. using the select/cons_res plugin). The plugin |
| 48 | +will not work on systems where CPUs are oversubscribed to jobs |
| 49 | +(i.e. strict node sharing without the use of select/cons_res). |
| 50 | + |
| 51 | +The plugin also has a pam_slurm_cpuset counterpart, which |
| 52 | +replaces pam_slurm and provides identical functionality, |
| 53 | +except that user login sessions are constrained to their |
| 54 | +currently allocated CPUs on a node. |
| 55 | + |
| 56 | +The cpuset plugin requires the SGI libbitmask and libcpuset |
| 57 | +libraries available from |
| 58 | + |
| 59 | + http://oss.sgi.com/projects/cpusets |
| 60 | + |
| 61 | +(See also cpuset/README) |
| 62 | + |
| 63 | +iorelay |
| 64 | +----------------- |
| 65 | + |
| 66 | +The iorelay plugin is an experimental proof-of-concept plugin |
| 67 | +for remounting required filesystems for a parallel job from |
| 68 | +the first allocated node to all others. It is meant to reduce |
| 69 | +the load on global NFS servers. |
| 70 | + |
| 71 | +It has not been used in production. |
| 72 | + |
| 73 | + |
| 74 | +iotrace |
| 75 | +----------------- |
| 76 | + |
| 77 | +The iotrace plugin is another experimental plugin which |
| 78 | +uses "plasticfs" to log filesystem access on a per-job |
| 79 | +basis. |
| 80 | + |
| 81 | + |
| 82 | +oom-detect |
| 83 | +----------------- |
| 84 | + |
| 85 | +The oom-detect plugin detects jobs that have been victims |
| 86 | +of the OOM killer using some special code added to the LLNL |
| 87 | +Linux kernel. As tasks exit after having been killed by |
| 88 | +the OOM killer, a message is printed to the user's stderr |
| 89 | +along with some memory information about the task. |
| 90 | + |
| 91 | +overcommit-memory |
| 92 | +----------------- |
| 93 | + |
| 94 | +The overcommit-memory plugin is an attempt to allow users |
| 95 | +to tune global overcommit behavior of the Linux kernel on |
| 96 | +a per-job basis. It is currently buggy and thus not used. |
| 97 | + |
| 98 | +preserve-env |
| 99 | +----------------- |
| 100 | + |
| 101 | +The preserve-env plugin adds an srun option |
| 102 | + |
| 103 | + --preserve-slurm-env |
| 104 | + |
| 105 | +which attempts to preserve the current state of all SLURM_* |
| 106 | +environment variables in the remotely executed environment. This |
| 107 | +is meant solely to be used from an allocation shell with |
| 108 | +the syntax |
| 109 | + |
| 110 | + srun -n1 -N1 --pty --preserve-slurm-env $SHELL |
| 111 | + |
| 112 | +as a sort of "remote" allocation shell. |
| 113 | + |
| 114 | +pty |
| 115 | +----------------- |
| 116 | + |
| 117 | +The pty plugin provides the SLURM --pty option, introduced |
| 118 | +in slurm-1.3, for slurm-1.2. It isn't fully functional at this |
| 119 | +point, but is a good example of a complex feature added solely |
| 120 | +from a spank plugin. |
| 121 | + |
| 122 | + |
| 123 | +renice |
| 124 | +----------------- |
| 125 | + |
| 126 | +The renice plugin is the same as the example code in the |
| 127 | +spank(8) man page. It provides a new srun option "--renice=VALUE" |
| 128 | +which allows users to set the nice value of their remote |
| 129 | +tasks (down to a minimum value configured by sysadmin). |
| 130 | + |
| 131 | +system-safe |
| 132 | +------------------ |
| 133 | + |
| 134 | +The system-safe plugin provides an MPI-safe system(3) |
| 135 | +replacement through an LD_PRELOAD library (most of the work |
| 136 | +is done in system-safe-preload.c). The preloaded library |
| 137 | +interposes a version of system(3) that does not fork. Instead, |
| 138 | +the command line is passed through a pipe to a copy of the |
| 139 | +program which was pre-forked before MPI_Init(). The return |
| 140 | +value of the real system() call is passed back through the |
| 141 | +pipe and returned to the calling application, which sees |
| 142 | +no noticeable difference from the real system(3). |
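
The pre-fork and pipe mechanism described above can be sketched roughly
as follows. This is not the code in system-safe-preload.c: the names
helper_start(), helper_loop(), and safe_system(), and the length-prefixed
message format, are assumptions made for illustration. The LD_PRELOAD
interposition and the MPI_Init() hook are omitted, and error handling is
kept minimal.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int cmd_fd = -1;   /* caller writes command lines here */
static int rc_fd  = -1;   /* caller reads system() status here */

/* Write exactly n bytes; return 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t k = write(fd, p, n);
        if (k <= 0)
            return -1;
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Read exactly n bytes; return 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t k = read(fd, p, n);
        if (k <= 0)
            return -1;
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Runs in the pre-forked helper until the command pipe is closed. */
static void helper_loop(int in, int out)
{
    size_t len;
    while (read_all(in, &len, sizeof len) == 0) {
        char *cmd = malloc(len + 1);
        if (cmd == NULL || read_all(in, cmd, len) < 0)
            break;
        cmd[len] = '\0';
        int rc = system(cmd);       /* the real system(3) */
        free(cmd);
        if (write_all(out, &rc, sizeof rc) < 0)
            break;
    }
    _exit(0);
}

/* Fork the helper once, early (before MPI_Init() in the plugin's model). */
static int helper_start(void)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0 || pipe(from_child) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* helper process */
        close(to_child[1]);
        close(from_child[0]);
        helper_loop(to_child[0], from_child[1]);
    }
    close(to_child[0]);
    close(from_child[1]);
    cmd_fd = to_child[1];
    rc_fd  = from_child[0];
    return 0;
}

/* Stand-in for the interposed system(): the caller itself never forks. */
static int safe_system(const char *cmd)
{
    size_t len = strlen(cmd);
    int rc = -1;
    if (write_all(cmd_fd, &len, sizeof len) < 0 ||
        write_all(cmd_fd, cmd, len) < 0 ||
        read_all(rc_fd, &rc, sizeof rc) < 0)
        return -1;
    return rc;
}
```

After a successful helper_start(), a call such as safe_system("true")
returns the same wait status that system("true") would, but the fork
happens in the helper rather than in the calling process.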
| 143 | + |
| 144 | +use-env |
| 145 | +------------------ |
| 146 | + |
| 147 | +The use-env plugin allows system administrators and users to |
| 148 | +modify the environment of SLURM jobs using a set of simple |
| 149 | +yet very flexible config files. Environment variables can |
| 150 | +be overridden, set only if unset, set based on conditional |
| 151 | +syntax, and even defined in a per-task context. The config |
| 152 | +files have access to key slurm variables such as SLURM_NNODES, |
| 153 | +SLURM_NPROCS, etc., so variables can even be defined differently |
| 154 | +depending on the size of the job. |
| 155 | + |
| 156 | +See README.use-env for further information. |