We also provide an interactive learning configuration with Jupyter Notebook and *ipywidgets*, where you can select the algorithm, environment, and general learning settings from dropdown lists and sliders with a few clicks! A video demonstrating the usage is shown below. The interactive mode can be used with [`rlzoo/interactive/main.ipynb`](https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/interactive/main.ipynb): run `$ jupyter notebook` to open it.
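For reference, the same choices the widgets expose (algorithm, environment, learning settings) can also be made in plain code. Below is a minimal sketch following RLzoo's standard quick-start API; the algorithm and environment chosen here are arbitrary examples:

```python
# Scripted equivalent of the interactive widget choices, assuming RLzoo's
# standard quick-start API: pick an algorithm, an environment, and load
# the default hyper-parameters for that combination.
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3  # example algorithm choice

env = build_env('Pendulum-v0', 'classic_control')  # example environment choice
alg_params, learn_params = call_default_params(env, 'classic_control', 'TD3')
alg = TD3(**alg_params)
alg.learn(env=env, mode='train', render=False, **learn_params)
```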
RLzoo supports distributed training across multiple computational nodes, each with multiple CPUs/GPUs, using the [KungFu](https://github.com/lsds/KungFu) package. Installing KungFu requires *CMake* and *Golang* to be installed first; see the [KungFu repository](https://github.com/lsds/KungFu) for details.
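A hedged sketch of the installation steps follows; the `srcs/go/cmd/kungfu-run` path reflects the KungFu docs at the time of writing and may change, so treat the KungFu repository as authoritative:

```bash
# Assumes CMake and Golang are already installed.
git clone https://github.com/lsds/KungFu.git
cd KungFu
pip3 install .
# Build the kungfu-run launcher used by RLzoo's distributed script.
GOBIN=$(pwd)/bin go install -v ./srcs/go/cmd/kungfu-run
export PATH=$PWD/bin:$PATH
```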
An example of distributed training is contained in the folder `rlzoo/distributed`; running the following command launches the distributed training process:

```bash
rlzoo/distributed/run_dis_train.sh
```

<details><summary><b>Code in Bash script</b> <i>[click to expand]</i></summary>
<div>

```bash
#!/bin/sh
set -e

cd $(dirname $0)

kungfu_flags() {
    echo -q
    echo -logdir logs

    local ip1=127.0.0.1
    local np1=$np

    local ip2=127.0.0.10
    local np2=$np
    local H=$ip1:$np1,$ip2:$np2
    local m=cpu,gpu

    echo -H $ip1:$np1
}

prun() {
    local np=$1
    shift
    kungfu-run $(kungfu_flags) -np $np $@
}

n_learner=2  # number of policy learners
n_actor=2    # number of actors
n_server=1   # number of inference servers

flags() {
    echo -l $n_learner
    echo -a $n_actor
    echo -s $n_server
}

rl_run() {
    local n=$((n_learner + n_actor + n_server))
    prun $n python3 training_components.py $(flags)
}

main() {
    rl_run
}

main
```
The script specifies the IP addresses of the different computational nodes, as well as the numbers of policy learners (which update the models), actors (which sample through interaction with environments), and inference servers (which perform policy forward inference during the sampling process), via `n_learner`, `n_actor`, and `n_server` respectively.
</div>
</details>
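With the default counts above (`n_learner=2`, `n_actor=2`, `n_server=1`), the script resolves to a single `kungfu-run` invocation roughly equivalent to:

```bash
# 2 learners + 2 actors + 1 inference server = 5 processes on 127.0.0.1
kungfu-run -q -logdir logs -H 127.0.0.1:5 -np 5 \
    python3 training_components.py -l 2 -a 2 -s 1
```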
Other training details are specified in a separate Python script named `training_components.py`, **within the same directory** as `run_dis_train.sh`, shown below.
<details><summary><b>Code in Python script</b> <i>[click to expand]</i></summary>
<div>
```python
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.policy_networks import *
from rlzoo.common.value_networks import *
from rlzoo.algorithms.dppo_clip_distributed.dppo_clip import DPPO_CLIP
from functools import partial

# Specify the training configurations
training_conf = {
    'total_step': int(1e7),  # overall training timesteps
    'traj_len': 200,         # length of the rollout trajectory
    'train_n_traj': 2,       # update the models after every certain number of trajectories for each learner
    'save_interval': 10,     # save the models after every certain number of updates
    # ... (the remaining configuration entries and component definitions are omitted here)
}
```

Users can specify the environment, network architectures, optimizers, and other training details in this script; a rough sketch of such components is shown after this block.
</div>
</details>
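For a flavor of those user-specified components, here is a hedged sketch; the names `env_maker`, `critic_optimizer`, and `actor_optimizer` and their exact signatures are illustrative assumptions, and the shipped `training_components.py` is the authoritative reference:

```python
import tensorflow as tf
from functools import partial
from rlzoo.common.env_wrappers import build_env

# Environment factory: each distributed actor builds its own instance
# (the environment chosen here is an arbitrary example).
env_maker = partial(build_env, 'CartPole-v0', 'classic_control')

# Optimizers for the value and policy networks (RLzoo builds on TensorFlow 2;
# the learning rates are illustrative assumptions).
critic_optimizer = tf.optimizers.Adam(1e-3)
actor_optimizer = tf.optimizers.Adam(1e-4)
```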
Note: if RLzoo is installed, you can create the two scripts `run_dis_train.sh` and `training_components.py` in any directory to launch distributed training, as long as the two scripts are in the same directory.
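This works because the launch script changes into its own directory (`cd $(dirname $0)`) before running, so it can be invoked from anywhere. For example, using a hypothetical directory:

```bash
# Hypothetical directory; the two scripts just need to sit side by side.
mkdir -p ~/my_experiment
cp run_dis_train.sh training_components.py ~/my_experiment/
sh ~/my_experiment/run_dis_train.sh
```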