Commit 789ac9c

Start of work to add demo containers

The developer should easily be able to test spindle, and the user should be able to run a small example or tutorial. Ideally we can also extend a container to build and test in CI.

Signed-off-by: vsoch <[email protected]>
1 parent 8b66554 commit 789ac9c

6 files changed, +495 -0 lines changed

docker/Dockerfile

+89
@@ -0,0 +1,89 @@
FROM centos:7

# docker build -t vanessa/slurm:20.11.8 .

LABEL org.label-schema.vcs-url="https://github.com/hpc/spindle" \
      org.label-schema.docker.cmd="docker-compose up -d" \
      org.label-schema.name="spindle" \
      org.label-schema.description="Spindle with SLURM on Centos 7" \
      maintainer="Vanessa Sochat"

ARG SLURM_TAG=slurm-20-11-8-1

RUN set -ex \
    && yum makecache fast \
    && yum -y update \
    && yum -y install epel-release \
    && yum -y install \
       wget \
       bzip2 \
       perl \
       gcc \
       gcc-c++ \
       git \
       gnupg \
       make \
       munge \
       munge-devel \
       python-devel \
       python-pip \
       python3 \
       python3-devel \
       python3-pip \
       mariadb-server \
       mariadb-devel \
       psmisc \
       bash-completion \
       vim-enhanced \
       automake \
    && yum clean all \
    && rm -rf /var/cache/yum

RUN pip install Cython nose && pip3 install Cython nose

RUN set -x \
    && git clone https://github.com/SchedMD/slurm.git \
    && pushd slurm \
    && git checkout tags/$SLURM_TAG \
    && ./configure --enable-debug --prefix=/usr --sysconfdir=/etc/slurm \
        --with-mysql_config=/usr/bin --libdir=/usr/lib64 \
    && make install \
    && install -D -m644 etc/cgroup.conf.example /etc/slurm/cgroup.conf.example \
    && install -D -m644 etc/slurm.conf.example /etc/slurm/slurm.conf.example \
    && install -D -m644 etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf.example \
    && install -D -m644 contribs/slurm_completion_help/slurm_completion.sh /etc/profile.d/slurm_completion.sh \
    && popd \
    && rm -rf slurm \
    && groupadd -r --gid=995 slurm \
    && useradd -r -g slurm --uid=995 slurm \
    && mkdir /etc/sysconfig/slurm \
        /var/spool/slurmd \
        /var/run/slurmd \
        /var/run/slurmdbd \
        /var/lib/slurmd \
        /var/log/slurm \
        /data \
    && touch /var/lib/slurmd/node_state \
        /var/lib/slurmd/front_end_state \
        /var/lib/slurmd/job_state \
        /var/lib/slurmd/resv_state \
        /var/lib/slurmd/trigger_state \
        /var/lib/slurmd/assoc_mgr_state \
        /var/lib/slurmd/assoc_usage \
        /var/lib/slurmd/qos_usage \
        /var/lib/slurmd/fed_mgr_state \
    && chown -R slurm:slurm /var/*/slurm* \
    && /sbin/create-munge-key

COPY slurm.conf /etc/slurm/slurm.conf
COPY slurmdbd.conf /etc/slurm/slurmdbd.conf

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]

RUN yum install -y net-tools openssh-server openssh-clients singularity && \
    yum install -y epel-release centos-release-scl lsof sudo httpd24-mod_ssl httpd24-mod_ldap

RUN groupadd spindle && \
    useradd --create-home --gid spindle spindle && \
    echo -n "spindle" | passwd --stdin spindle

docker/Dockerfile.node

+3
@@ -0,0 +1,3 @@
FROM vanessa/slurm:20.11.8

# This container will be built on docker-compose up -d

docker/README.md

+195
@@ -0,0 +1,195 @@
# Spindle in Docker

This directory contains a set of container recipes and scripts that let you
quickly bring up your own tiny cluster with [docker-compose](https://docs.docker.com/compose/install/), install
spindle, and give it a try. You will need both [docker-compose](https://docs.docker.com/compose/install/)
and [Docker](https://docs.docker.com/get-docker/) installed for this tutorial.

## 1. Build Containers

First, build a base container with Slurm on CentOS 7 from the [Dockerfile](Dockerfile) here:

```bash
$ docker build -t vanessa/slurm:20.11.8 .
```
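
The image tag `20.11.8` mirrors the `SLURM_TAG` build argument (`slurm-20-11-8-1`) declared in the Dockerfile. As a minimal sketch, assuming you want to build against a different Slurm release and keep the two in sync, a small helper could derive the image version from the git tag. The function name `slurm_tag_to_version` is hypothetical, and the naming pattern (`slurm-<major>-<minor>-<patch>-<release>`) is inferred from the single tag used here:

```bash
# Hypothetical helper: turn a Slurm git tag such as "slurm-20-11-8-1"
# into a dotted version such as "20.11.8" (drops the trailing release number).
slurm_tag_to_version() {
    printf '%s\n' "$1" | sed -e 's/^slurm-//' -e 's/-[0-9][0-9]*$//' | tr '-' '.'
}

# Usage (not run here): pass the tag as a build argument and reuse it for the image tag.
#   tag=slurm-20-11-8-1
#   docker build --build-arg SLURM_TAG="$tag" -t "vanessa/slurm:$(slurm_tag_to_version "$tag")" .
```

If you do change the tag, the image name referenced in `Dockerfile.node` and `docker-compose.yml` needs to change with it.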

Then building the remaining containers is as easy as:

```bash
$ docker-compose build
```

And then bringing them up:

```bash
$ docker-compose up -d
```

And checking that they are running:

```bash
$ docker-compose ps
  Name                 Command               State          Ports
------------------------------------------------------------------------
c1          /usr/local/bin/docker-entr ...   Up      6818/tcp
c2          /usr/local/bin/docker-entr ...   Up      6818/tcp
mysql       docker-entrypoint.sh mysqld      Up      3306/tcp, 33060/tcp
slurmctld   /usr/local/bin/docker-entr ...   Up      6817/tcp
slurmdbd    /usr/local/bin/docker-entr ...   Up      6819/tcp
```
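
To script this check rather than read the table by eye, one option is to filter the `docker-compose ps` output for services whose state is not `Up`. This is a sketch that assumes the two-line header and column layout shown above (other Compose versions may print a different table); `services_not_up` is a hypothetical name:

```bash
# Hypothetical helper: read `docker-compose ps` output on stdin and print the
# name of every service whose row has no "Up" field.
# Assumes the first two lines are the header and the dashed separator.
services_not_up() {
    awk 'NR > 2 && NF > 0 {
        up = 0
        for (i = 2; i <= NF; i++) if ($i == "Up") up = 1
        if (!up) print $1
    }'
}

# Usage (not run here): docker-compose ps | services_not_up
```

An empty result means every listed service reported `Up`.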

Both c1 and c2 are compute nodes for our cluster, and slurmctld acts like the login node. Open a shell on it:

```bash
$ docker exec -it slurmctld bash
```

Try running a job!

```bash
$ sbatch --wrap="sleep 20"
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 1    normal     wrap     root  R       0:00      1 c1
```
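
To wait for the job to finish instead of re-running `squeue` by hand, a polling loop is one option. This sketch uses `sbatch --parsable` (prints only the job id, plus a cluster name on multi-cluster setups) and `squeue -h -j <id>` (no header, one job); `wait_for_job` and the one-second default interval are illustrative choices, not part of the original tutorial:

```bash
# Hypothetical helper: block until the given job id no longer appears in squeue.
# $1 = job id, $2 = polling interval in seconds (default 1).
wait_for_job() {
    while [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]; do
        sleep "${2:-1}"
    done
}

# Usage (not run here):
#   jobid=$(sbatch --parsable --wrap="sleep 20")
#   wait_for_job "$jobid"
```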

## 2. Install spindle

Now let's follow the instructions to install spindle.

```bash
$ git clone https://github.com/hpc/spindle
$ cd spindle
```

We configure the build with the paths to munge and slurm, then build and install:

```bash
./configure --with-munge-dir=/etc/munge --enable-sec-munge --with-slurm-dir=/etc/slurm --enable-testsuite=no
make
make install
```
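
Whether to pass `--enable-testsuite=no` depends on whether an MPI toolchain is present in the container. As a minimal sketch, assuming that an `mpicc` wrapper on `PATH` is a reasonable proxy for "MPI is installed" (an assumption, not spindle's documented check), the flag could be chosen automatically. The name `testsuite_flag` is hypothetical:

```bash
# Hypothetical helper: print the --enable-testsuite value to pass to configure.
# Assumption: an `mpicc` wrapper on PATH is a good-enough sign that MPI is installed.
testsuite_flag() {
    if command -v mpicc >/dev/null 2>&1; then
        echo "--enable-testsuite=yes"
    else
        echo "--enable-testsuite=no"
    fi
}

# Usage (not run here):
#   ./configure --with-munge-dir=/etc/munge --enable-sec-munge \
#       --with-slurm-dir=/etc/slurm "$(testsuite_flag)"
```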

Note that we disable the test suite; otherwise the install fails because no
MPI library is detected. Now we can see spindle!

```
$ spindle --help
Usage: spindle [OPTION...] mpi_command

 These options specify what types of files should be loaded through the Spindle
 network
  -a, --reloc-aout=yes|no    Relocate the main executable through Spindle.
                             Default: yes
  -f, --follow-fork=yes|no   Relocate objects in fork'd child processes.
                             Default: yes
  -l, --reloc-libs=yes|no    Relocate shared libraries through Spindle.
                             Default: yes
  -x, --reloc-exec=yes|no    Relocate the targets of exec/execv/execve/...
                             calls. Default: yes
  -y, --reloc-python=yes|no  Relocate python modules (.py/.pyc) files when
                             loaded via python. Default: yes

 These options specify how the Spindle network should distibute files. Push is
 better for SPMD programs. Pull is better for MPMD programs. Default is push.
  -p, --push                 Use a push model where objects loaded by any
                             process are made available to all processes
  -q, --pull                 Use a pull model where objects are only made
                             available to processes that require them

 These options configure Spindle's network model. Typical Spindle runs should
 not need to set these.
  -c, --cobo                 Use a tree-based cobo network for distributing
                             objects
  -t, --port=port1-port2     TCP/IP port range for Spindle servers. Default:
                             21940-21964

 These options specify the security model Spindle should use for validating TCP
 connections. Spindle will choose a default value if no option is specified.
      --security-munge       Use munge for security authentication

 These options specify the job launcher Spindle is being run with. If
 unspecified, Spindle will try to autodetect.
      --launcher-startup     Launch spindle daemons using the system's job
                             launcher (requires an already set-up session).
      --no-mpi               Run serial jobs instead of MPI job
      --openmpi              MPI job is launched with the OpenMPI job jauncher.

      --slurm                MPI job is launched with the srun job launcher.
      --wreck                MPI Job is launched with the wreck job launcher.

 Options for managing sessions, which can run multiple jobs out of one spindle
 cache.
      --end-session=session-id   End a persistent Spindle session with the
                             given session-id
      --run-in-session=session-id
                             Run a new job in the given session
      --start-session        Start a persistent Spindle session and print the
                             session-id to stdout

 Misc options
  -b, --shmcache-size=size   Size of client shared memory cache in kilobytes,
                             which can be used to improve performance if
                             multiple processes are running on each node.
                             Default: 0
      --cache-prefix=path    Alias for python-prefix
      --cleanup-proc=yes|no  Fork a dedicated process to clean-up files
                             post-spindle. Useful for high-fault situations.
                             Default: no
  -d, --debug=yes|no         If yes, hide spindle from debuggers so they think
                             libraries come from the original locations. May
                             cause extra overhead. Default: yes
  -e, --preload=FILE         Provides a text file containing a white-space
                             separated list of files that should be relocated
                             to each node before execution begins
      --enable-rsh=yes|no    Enable startint daemons with an rsh tree, if the
                             startup mode supports it. Default: No
      --hostbin=EXECUTABLE   Path to a script that returns the hostlist for a
                             job on a cluster
  -h, --no-hide              Don't hide spindle file descriptors from
                             application
  -k, --audit-type=subaudit|audit
                             Use the new-style subaudit interface for
                             intercepting ld.so, or the old-style audit
                             interface. The subaudit option reduces memory
                             overhead, but is more complex. Default is audit.
      --msgcache-buffer=size Enables message buffering if size is non-zero,
                             otherwise sets the size of the buffer in
                             kilobytes
      --msgcache-timeout=timeout   Enables message buffering if size is
                             non-zero, otherwise sets the buffering timeout in
                             milliseconds
  -n, --noclean=yes|no       Don't remove local file cache after execution.
                             Default: no (removes the cache)
  -o, --location=directory   Back-end directory for storing relocated files.
                             Should be a non-shared location such as a ramdisk.
                             Default: $TMPDIR
      --persist=yes|no       Allow spindle servers to persist after the last
                             client job has exited. Default: No
  -r, --python-prefix=path   Colon-seperated list of directories that contain
                             the python install location
  -s, --strip=yes|no         Strip debug and symbol information from binaries
                             before distributing them. Default: yes

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Report bugs to [email protected].
```

## 3. Use Spindle

**TODO** we need a dummy example here
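
Until a worked example is added, the session options listed in the help text suggest one possible shape for a run. The sketch below is untested against a real spindle install and only strings together flags that appear in `spindle --help` (`--start-session`, `--run-in-session`, `--end-session`); the wrapper name `run_in_spindle_session` and the `srun` command line in the usage comment are assumptions:

```bash
# Hypothetical wrapper: start a persistent session, run one command in it,
# then end the session and return the command's exit status.
run_in_spindle_session() {
    sid=$(spindle --start-session) || return 1   # session-id is printed to stdout
    spindle --run-in-session="$sid" "$@"
    status=$?
    spindle --end-session="$sid"
    return "$status"
}

# Usage (not run here): run_in_spindle_session srun -N 2 hostname
```

Ending the session even when the command fails keeps a failed run from leaving a persistent session behind.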

## 4. Clean Up

When you are done, exit the container, then stop and remove your containers:

```bash
$ docker-compose stop
$ docker-compose rm
```
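
Note that `stop` and `rm` leave the named volumes declared in `docker-compose.yml` (for example `etc_munge` and `var_lib_mysql`) in place. If you also want those removed, `docker-compose down --volumes` does it in one step. A minimal sketch, where the `cleanup_cluster` name and the `COMPOSE` override variable are illustrative:

```bash
# Hypothetical helper: stop the cluster and remove its containers, default
# network, and named volumes. Set COMPOSE to use a different compose command.
cleanup_cluster() {
    ${COMPOSE:-docker-compose} down --volumes
}

# Usage (not run here): cleanup_cluster
```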

docker/docker-compose.yml

+77
@@ -0,0 +1,77 @@
version: "2.2"

services:
  mysql:
    image: mysql:5.7
    hostname: mysql
    container_name: mysql
    environment:
      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
      MYSQL_DATABASE: slurm_acct_db
      MYSQL_USER: slurm
      MYSQL_PASSWORD: password
    volumes:
      - var_lib_mysql:/var/lib/mysql

  slurmdbd:
    image: vanessa/slurm:20.11.8
    command: "slurmdbd"
    container_name: slurmdbd
    hostname: slurmdbd
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - var_log_slurm:/var/log/slurm
    expose:
      - "6819"
    depends_on:
      - mysql

  slurmctld:
    image: vanessa/slurm:20.11.8
    command: "slurmctld"
    container_name: slurmctld
    hostname: slurmctld
    volumes_from:
      - slurmdbd
    expose:
      - "6817"
    depends_on:
      - "slurmdbd"

  c1:
    build:
      context: .
      dockerfile: Dockerfile.node
    command: "slurmd"
    privileged: true
    hostname: c1
    container_name: c1
    volumes_from:
      - slurmctld
    expose:
      - "6818"
    depends_on:
      - "slurmctld"

  c2:
    build:
      context: .
      dockerfile: Dockerfile.node
    command: "slurmd"
    privileged: true
    hostname: c2
    container_name: c2
    volumes_from:
      - slurmctld
    expose:
      - "6818"
    depends_on:
      - "slurmctld"

volumes:
  etc_munge:
  etc_slurm:
  slurm_jobdir:
  var_lib_mysql:
  var_log_slurm: