| 1 | +# Predicate tasks. |
| 2 | + |
| 3 | +This package contains tasks associated with "Behavior Priors for Efficient |
| 4 | +Reinforcement Learning" (https://arxiv.org/abs/2010.14274), "Exploiting Hierarchy |
| 5 | +for Learning and Transfer in KL-Regularized RL" (https://arxiv.org/abs/1903.07438) |
| 6 | +and "Information asymmetry in KL-regularized RL" |
| 7 | +(https://arxiv.org/abs/1905.01240). |
| 8 | +This is research code. It depends on more stable code that is available as |
| 9 | +part of [`dm_control`], in particular on components in |
| 10 | +[`dm_control.locomotion`] and [`dm_control.manipulation`]. |
| 11 | + |
| 12 | +To get access to preconfigured Python environments for the tasks, see the |
| 13 | +`task_examples.py` file. To use the MuJoCo interactive viewer (from dm_control) |
| 14 | +to load the environments, see `explore.py`. |
| 15 | + |
| 16 | +<p float="left"> |
| 17 | + <img src="tasks.png" height="200"> |
| 18 | +</p> |
| 19 | + |
| 20 | +## Installation instructions |
| 21 | + |
| 22 | +1. Download [MuJoCo Pro](https://mujoco.org/) and extract the zip archive as |
| 23 | + `~/.mujoco/mujoco200_$PLATFORM` where `$PLATFORM` is one of `linux`, |
| 24 | + `macos`, or `win64`. |
| 25 | + |
| 26 | +2. Ensure that a valid MuJoCo license key file is located at |
| 27 | + `~/.mujoco/mjkey.txt`. |
| 28 | + |
| 29 | +3. Clone the `deepmind-research` repository: |
| 30 | + |
| 31 | + ```shell |
| 32 | + git clone https://github.com/deepmind/deepmind-research.git |
| 33 | + cd deepmind-research |
| 34 | + ``` |
| 35 | + |
| 36 | +4. Create and activate a Python virtual environment: |
| 37 | + |
| 38 | + ```shell |
| 39 | + python3 -m virtualenv box_arrangement |
| 40 | + source box_arrangement/bin/activate |
| 41 | + ``` |
| 42 | + |
| 43 | +5. Install the package: |
| 44 | + |
| 45 | + ```shell |
| 46 | + pip install ./box_arrangement |
| 47 | + ``` |
| 48 | + |
| 49 | +## Quickstart |
| 50 | + |
| 51 | +To instantiate and step through the "go to one of K targets" task: |
| 52 | + |
| 53 | +```python |
| 54 | +from box_arrangement import task_examples |
| 55 | +import numpy as np |
| 56 | + |
| 57 | +# Build an example environment. |
| 58 | +env = task_examples.go_to_k_targets() |
| 59 | + |
| 60 | +# Get the `action_spec` describing the control inputs. |
| 61 | +action_spec = env.action_spec() |
| 62 | + |
| 63 | +# Step through the environment for one episode with random actions. |
| 64 | +time_step = env.reset() |
| 65 | +while not time_step.last(): |
| 66 | +  action = np.random.uniform(action_spec.minimum, action_spec.maximum, |
| 67 | +                             size=action_spec.shape) |
| 68 | +  time_step = env.step(action) |
| 69 | +  print("reward = {}, discount = {}, observations = {}.".format( |
| 70 | +      time_step.reward, time_step.discount, time_step.observation)) |
| 71 | +``` |
| 72 | + |
| 73 | +The above code snippet can also be used for other tasks by replacing |
| 74 | +`go_to_k_targets` with one of (`move_box`, `move_box_or_gtt` and |
| 75 | +`move_box_and_gtt`). |
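
Because every task exposes the same `reset`/`step` interface, the episode loop from the snippet above can be factored into a small helper and reused across tasks. The following is a minimal sketch, not part of this package: `run_episode`, `policy`, and `max_steps` are hypothetical names introduced for illustration, and the helper only assumes that `env.reset()` and `env.step(action)` return objects with `last()` and `reward`, as in the Quickstart.

```python
def run_episode(env, policy, max_steps=1000):
  """Runs one episode and returns (episode_return, num_steps).

  `policy` is any callable mapping the current time step to an action.
  `max_steps` is a hypothetical safety cap, not a setting of the tasks.
  """
  time_step = env.reset()
  episode_return = 0.0
  num_steps = 0
  while not time_step.last() and num_steps < max_steps:
    time_step = env.step(policy(time_step))
    # The reward is treated as zero if the environment reports `None`.
    if time_step.reward is not None:
      episode_return += time_step.reward
    num_steps += 1
  return episode_return, num_steps
```

To use it with a random policy, pass a callable that samples from the `action_spec` bounds as in the Quickstart, for example `run_episode(task_examples.move_box(), lambda ts: np.random.uniform(spec.minimum, spec.maximum, size=spec.shape))`, where `spec` is the result of `env.action_spec()` for that environment.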
| 76 | + |
| 77 | +## Visualization |
| 78 | + |
| 79 | +[`dm_control.viewer`] can be used to visualize and interact with the |
| 80 | +environment. We provide the `explore.py` script specifically for this. If you |
| 81 | +followed our installation instructions above, this can be launched for the |
| 82 | +"go to one of K targets" task via: |
| 83 | + |
| 84 | +```shell |
| 85 | +python3 -m box_arrangement.explore --task='go_to_target' |
| 86 | +``` |
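
The viewer can also be launched directly from Python. The sketch below is an assumption about one way to do this and is not necessarily how `explore.py` is implemented: `launch_viewer` is a hypothetical helper, and it relies on `dm_control.viewer.launch` accepting a callable that builds the environment. The imports are deferred into the function so that defining it does not require MuJoCo or a display.

```python
def launch_viewer(task_name="go_to_k_targets"):
  """Opens the dm_control viewer on one of the `task_examples` builders.

  `task_name` must be the name of a zero-argument builder function in
  `box_arrangement.task_examples`, e.g. "go_to_k_targets" or "move_box".
  """
  # Deferred imports: both packages need a working MuJoCo installation.
  from dm_control import viewer
  from box_arrangement import task_examples

  environment_loader = getattr(task_examples, task_name)
  viewer.launch(environment_loader=environment_loader)
```

Calling `launch_viewer("move_box")` should then open an interactive window for that task, provided the installation steps above were completed.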
| 87 | + |
| 88 | +## Citation |
| 89 | + |
| 90 | +If you use the code or data in this package, please cite: |
| 91 | + |
| 92 | +``` |
| 93 | +@misc{tirumala2020behavior, |
| 94 | + title={Behavior Priors for Efficient Reinforcement Learning}, |
| 95 | + author={Dhruva Tirumala and Alexandre Galashov and Hyeonwoo Noh and Leonard Hasenclever and Razvan Pascanu and Jonathan Schwarz and Guillaume Desjardins and Wojciech Marian Czarnecki and Arun Ahuja and Yee Whye Teh and Nicolas Heess}, |
| 96 | + year={2020}, |
| 97 | + eprint={2010.14274}, |
| 98 | + archivePrefix={arXiv}, |
| 99 | + primaryClass={cs.AI} |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +[`dm_control`]: https://github.com/deepmind/dm_control |
| 104 | +[`dm_control.locomotion`]: https://github.com/deepmind/dm_control/tree/master/dm_control/locomotion |
| 105 | +[`dm_control.manipulation`]: https://github.com/deepmind/dm_control/tree/master/dm_control/manipulation |
| 106 | +[`dm_control.viewer`]: https://github.com/deepmind/dm_control/tree/master/dm_control/viewer |