
Commit 6811a32

Merge pull request #23 from tensorlayer/dev-distributed
Dev distributed
2 parents 1788af8 + dae4afc commit 6811a32


78 files changed: 16,473 additions & 15,599 deletions

README.md

Lines changed: 154 additions & 2 deletions
@@ -49,7 +49,7 @@ Please check our [**Online Documentation**](https://rlzoo.readthedocs.io) for de
 - [Contents](#contents)
 - [Algorithms](#algorithms)
 - [Environments](#environments)
-- [Configurations](#configuration)
+- [Configurations](#configurations)
 - [Properties](#properties)
 - [Troubleshooting](#troubleshooting)
 - [Credits](#credits)
@@ -66,8 +66,14 @@ the coming months after initial release. We will keep improving the potential pr
 
 <details><summary><b>Version History</b> <i>[click to expand]</i></summary>
 <div>
+
+* 1.0.4 (Current version)
 
-* 1.0.3 (Current version)
+Changes:
+
+* Add distributed training for DPPO algorithm, using Kungfu
+
+* 1.0.3
 
 Changes:
 
@@ -279,6 +285,148 @@ python algorithms/ac/run_ac.py
We also provide an interactive learning configuration with Jupyter Notebook and *ipywidgets*, where you can select the algorithm, environment, and general learning settings simply by clicking on dropdown lists and sliders! A video demonstrating the usage is shown below. The interactive mode can be used with [`rlzoo/interactive/main.ipynb`](https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/interactive/main.ipynb) by running `$ jupyter notebook` to open it.

![Interactive Video](https://github.com/tensorlayer/RLzoo/blob/master/gif/interactive.gif)

### Distributed Training
RLzoo supports distributed training across multiple computational nodes, each with multiple CPUs/GPUs, using the [KungFu](https://github.com/lsds/KungFu) package. Installing KungFu requires *CMake* and *Golang* to be installed first; see the [KungFu website](https://github.com/lsds/KungFu) for details.
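For reference, one common way to install KungFu from source looks roughly like the sketch below; these commands are not part of RLzoo, so consult the KungFu README for the authoritative steps.

```bash
# Sketch of a KungFu source install (assumes CMake, Golang and Python 3 are already available).
git clone https://github.com/lsds/KungFu.git
cd KungFu
pip3 install .    # builds and installs the KungFu Python package
# kungfu-run, the launcher used by run_dis_train.sh below, is built from the same
# repository with Go; see the KungFu README for the exact command, then add it to PATH.
```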
An example of distributed training is provided in the folder `rlzoo/distributed`. Running the following command launches the distributed training process:

```bash
rlzoo/distributed/run_dis_train.sh
```
<details><summary><b>Code in Bash script</b> <i>[click to expand]</i></summary>
<div>

```bash
#!/bin/sh
set -e

cd $(dirname $0)

kungfu_flags() {
    echo -q
    echo -logdir logs

    local ip1=127.0.0.1
    local np1=$np

    local ip2=127.0.0.10
    local np2=$np
    local H=$ip1:$np1,$ip2:$np2
    local m=cpu,gpu

    echo -H $ip1:$np1
}

prun() {
    local np=$1
    shift
    kungfu-run $(kungfu_flags) -np $np $@
}

n_learner=2
n_actor=2
n_server=1

flags() {
    echo -l $n_learner
    echo -a $n_actor
    echo -s $n_server
}

rl_run() {
    local n=$((n_learner + n_actor + n_server))
    prun $n python3 training_components.py $(flags)
}

main() {
    rl_run
}

main
```
The script specifies the IP addresses of the different computational nodes, as well as the number of policy learners (which update the models), actors (which sample through interaction with environments) and inference servers (which run policy forward inference during sampling), via `n_learner`, `n_actor` and `n_server` respectively; the `kungfu-run` command this produces is sketched after this block.

</div>
</details>
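For intuition, with the defaults above (`n_learner=2`, `n_actor=2`, `n_server=1`, i.e. five processes in total), `prun` expands to roughly the following single command; the exact flags accepted depend on the installed KungFu version.

```bash
# Approximate expansion of `prun 5 python3 training_components.py $(flags)` with the defaults above;
# note that kungfu_flags only passes the first host (127.0.0.1) via -H.
kungfu-run -q -logdir logs -H 127.0.0.1:5 -np 5 \
    python3 training_components.py -l 2 -a 2 -s 1
```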
Other training details are specified in a separate Python script named `training_components.py`, located **in the same directory** as `run_dis_train.sh`, shown below.
<details><summary><b>Code in Python script</b> <i>[click to expand]</i></summary>
<div>

```python
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.policy_networks import *
from rlzoo.common.value_networks import *
from rlzoo.algorithms.dppo_clip_distributed.dppo_clip import DPPO_CLIP
from functools import partial

# Specify the training configurations
training_conf = {
    'total_step': int(1e7),  # overall training timesteps
    'traj_len': 200,         # length of the rollout trajectory
    'train_n_traj': 2,       # update the models after every certain number of trajectories for each learner
    'save_interval': 10,     # saving the models after every certain number of updates
}

# Specify the environment and launch it
env_name, env_type = 'CartPole-v0', 'classic_control'
env_maker = partial(build_env, env_name, env_type)
temp_env = env_maker()
obs_shape, act_shape = temp_env.observation_space.shape, temp_env.action_space.shape

env_conf = {
    'env_name': env_name,
    'env_type': env_type,
    'env_maker': env_maker,
    'obs_shape': obs_shape,
    'act_shape': act_shape,
}


def build_network(observation_space, action_space, name='DPPO_CLIP'):
    """ build networks for the algorithm """
    hidden_dim = 256
    num_hidden_layer = 2
    critic = ValueNetwork(observation_space, [hidden_dim] * num_hidden_layer, name=name + '_value')

    actor = StochasticPolicyNetwork(observation_space, action_space,
                                    [hidden_dim] * num_hidden_layer,
                                    trainable=True,
                                    name=name + '_policy')
    return critic, actor


def build_opt(actor_lr=1e-4, critic_lr=2e-4):
    """ choose the optimizer for learning """
    import tensorflow as tf
    return [tf.optimizers.Adam(critic_lr), tf.optimizers.Adam(actor_lr)]


net_builder = partial(build_network, temp_env.observation_space, temp_env.action_space)
opt_builder = partial(build_opt, )

agent_conf = {
    'net_builder': net_builder,
    'opt_builder': opt_builder,
    'agent_generator': partial(DPPO_CLIP, net_builder, opt_builder),
}
del temp_env

from rlzoo.distributed.start_dis_role import main

print('Start Training.')
main(training_conf, env_conf, agent_conf)
print('Training Finished.')
```
Users can specify the environment, network architectures, optimizers and other training details in this script; a small variation is sketched after this block.

</div>
</details>
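For illustration, the fragment below (a sketch, not part of the repository) shows the kind of edit one might make to `training_components.py`: it swaps in a different classic-control environment and narrower networks, while everything else stays as in the script above; `'Pendulum-v0'` is assumed to be supported by `build_env`.

```python
# Hypothetical variation of training_components.py: different environment, smaller networks.
from functools import partial

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.policy_networks import StochasticPolicyNetwork
from rlzoo.common.value_networks import ValueNetwork

env_name, env_type = 'Pendulum-v0', 'classic_control'  # assumed to be available via build_env
env_maker = partial(build_env, env_name, env_type)
temp_env = env_maker()


def build_network(observation_space, action_space, name='DPPO_CLIP'):
    """Narrower networks than the default: 2 hidden layers of 64 units each."""
    hidden_dim, num_hidden_layer = 64, 2
    critic = ValueNetwork(observation_space, [hidden_dim] * num_hidden_layer, name=name + '_value')
    actor = StochasticPolicyNetwork(observation_space, action_space,
                                    [hidden_dim] * num_hidden_layer,
                                    trainable=True, name=name + '_policy')
    return critic, actor
```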
Note: if RLzoo is installed, you can create the two scripts `run_dis_train.sh` and `training_components.py` in any directory and launch distributed training from there, as long as the two scripts sit in the same directory.
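For example (the paths below are placeholders), a run can be set up from a fresh directory like this:

```bash
# Illustrative only: $RLZOO_SRC stands for wherever the RLzoo sources are checked out.
mkdir my_dppo_run && cd my_dppo_run
cp $RLZOO_SRC/rlzoo/distributed/run_dis_train.sh .
cp $RLZOO_SRC/rlzoo/distributed/training_components.py .
# edit training_components.py as needed, then launch:
sh run_dis_train.sh
```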


## Contents
@@ -399,8 +547,12 @@ Our core contributors include:
 [Tianyang Yu](https://github.com/Tokarev-TT-33),
 [Yanhua Huang](https://github.com/Officium),
 [Hongming Zhang](https://github.com/initial-h),
+[Guo Li](https://github.com/lgarithm),
+Quancheng Guo,
+[Luo Mai](https://github.com/luomai),
 [Hao Dong](https://github.com/zsdonghao)
 
+
 ## Citing
 
 ```

rlzoo/.gitignore

File mode changed: 100644 → 100755
Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-*.pyc
-/img
-/log
-/model
+*.pyc
+/img
+/log
+/model

rlzoo/__init__.py

File mode changed: 100644 → 100755

rlzoo/algorithms/__init__.py

File mode changed: 100644 → 100755
Lines changed: 14 additions & 14 deletions
@@ -1,14 +1,14 @@
-from .ac.ac import AC
-from .pg.pg import PG
-from .dqn.dqn import DQN
-from .a3c.a3c import A3C
-from .ddpg.ddpg import DDPG
-from .td3.td3 import TD3
-from .sac.sac import SAC
-from .ppo.ppo import PPO
-from .ppo_penalty.ppo_penalty import PPO_PENALTY
-from .ppo_clip.ppo_clip import PPO_CLIP
-from .dppo.dppo import DPPO
-from .dppo_penalty.dppo_penalty import DPPO_PENALTY
-from .dppo_clip.dppo_clip import DPPO_CLIP
-from .trpo.trpo import TRPO
+from .ac.ac import AC
+from .pg.pg import PG
+from .dqn.dqn import DQN
+from .a3c.a3c import A3C
+from .ddpg.ddpg import DDPG
+from .td3.td3 import TD3
+from .sac.sac import SAC
+from .ppo.ppo import PPO
+from .ppo_penalty.ppo_penalty import PPO_PENALTY
+from .ppo_clip.ppo_clip import PPO_CLIP
+from .dppo.dppo import DPPO
+from .dppo_penalty.dppo_penalty import DPPO_PENALTY
+from .dppo_clip.dppo_clip import DPPO_CLIP
+from .trpo.trpo import TRPO

rlzoo/algorithms/a3c/__init__.py

File mode changed: 100644 → 100755
