
Commit 388273c

Upgrade docs
Former-commit-id: f6f70cc
1 parent ee87abc commit 388273c

6 files changed: +20 -11 lines changed


CONTRIBUTING.md

Lines changed: 14 additions & 4 deletions
@@ -3,8 +3,18 @@ Contribution to this project is greatly appreciated! If you find any bugs or hav
 
 ## Roadmaps
 
-* **Human-Bot Visualization.** Develop GUI for human interactions (currently we only support terminal-based GUI)
-* **Analysis Tools.** Develop tools to visualize the decisions of the agents.
-* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation (currently we only support Leduc Hold'em and UNO)
-* **Leaderboard.** Develop a platform that enables everyone to upload his/her trained model and compete with each other worldwide.
+* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
 * **More Games and Algorithms.** Develop more games and algorithms.
+* **Keras Implementation.** Provide a Keras implementation of the algorithms.
+* **Hyperparameter Search.** Search for good hyperparameters for each environment and update the examples with the best ones.
+
+## How to create a pull request?
+
+If this is your first time contributing to a project, please follow the instructions below. You may find [Creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) helpful. In short, you need to take the following steps to send a pull request:
+
+* Click **Fork** in the upper-right corner of the project main page to create a copy of the repository under your own GitHub account.
+* Clone the forked repository to your computer.
+* Make your changes locally.
+* Commit and push the changes to your fork.
+* Send a pull request to merge your branch into the RLCard project.
+

README.md

Lines changed: 2 additions & 1 deletion
@@ -55,7 +55,7 @@ import rlcard
 from rlcard.agents.random_agent import RandomAgent
 
 env = rlcard.make('blackjack')
-env.set_agents([RandomAgent()])
+env.set_agents([RandomAgent(action_num=env.action_num)])
 
 trajectories, payoffs = env.run()
 ```
@@ -150,6 +150,7 @@ The purposes of the main modules are listed as below:
 * **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw (string) action.
 * **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
 * **env.init_game()**: Initialize a game. Return the state and the first player ID.
+* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
 * **env.run()**: Run a complete game and return trajectories and payoffs. The function can be used after the agents are set up.
 * **State Definition**: State will always have observation `state['obs']` and legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, state will have raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.
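
To see how these interfaces fit together, here is a minimal sketch (not part of the diff) of a hand-rolled game loop. It assumes `env.step(action)` returns the next state together with the next player ID and that the environment exposes an `env.is_over()` check; neither is stated above, so treat both as assumptions.

```python
import random

import rlcard

env = rlcard.make('blackjack')

# Drive one game manually instead of calling env.run().
# Assumption: env.step(action) returns (next_state, next_player_id)
# and env.is_over() reports whether the game has finished.
state, player_id = env.init_game()
while not env.is_over():
    # state['legal_actions'] is documented in the State Definition above.
    action = random.choice(state['legal_actions'])
    state, player_id = env.step(action)

# get_payoffs() returns one payoff per player at the end of the game.
print(env.get_payoffs())
```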

docs/algorithms.md

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@
 Deep-Q Learning (DQN) [[paper]](https://arxiv.org/abs/1312.5602) is a basic reinforcement learning (RL) algorithm. We wrap DQN as an example to show how RL algorithms can be connected to the environments. In the DQN agent, the following classes are implemented:
 
 * `DQNAgent`: The agent class that interacts with the environment.
-* `Normalizer`: The responsibility of this class is to keep a running mean and std. The Normalizer will first preprocess the state before feeding the state into the model.
 * `Memory`: A memory buffer that manages the storing and sampling of transitions.
 * `Estimator`: The neural network that is used to make predictions.
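
For intuition, below is a toy sketch of what a transition buffer in the spirit of the `Memory` class above might look like. It is an illustrative stand-in written for this note, not the class that ships with RLCard.

```python
import random
from collections import namedtuple

Transition = namedtuple('Transition', ['state', 'action', 'reward', 'next_state', 'done'])

class ToyMemory:
    """Illustrative replay buffer: stores transitions and samples mini-batches."""

    def __init__(self, memory_size, batch_size):
        self.memory_size = memory_size
        self.batch_size = batch_size
        self.memory = []

    def save(self, state, action, reward, next_state, done):
        # Drop the oldest transition once the buffer is full.
        if len(self.memory) == self.memory_size:
            self.memory.pop(0)
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random mini-batch for the Estimator to train on.
        return random.sample(self.memory, self.batch_size)
```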

docs/customizing-environments.md

Lines changed: 2 additions & 2 deletions
@@ -5,10 +5,10 @@ In addition to the default state representation and action encoding, we also all
 To define our own state representation, we can modify the ``_extract_state`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L33).
 
 ## Action Encoding
-To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L69).
+To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L75).
 
 ## Reward Calculation
-To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L60).
+To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L67).
 
 ## Modifying Game
 We can change the parameters of a game to adjust its difficulty. For example, we can change the number of players or the number of allowed raises in Limit Texas Hold'em in the ``__init__`` function in [rlcard/games/limitholdem/game.py](../rlcard/games/limitholdem/game.py#L11).
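
As a rough sketch of the customization pattern described in this file, one could also subclass the environment and override these hooks. The class name `LimitholdemEnv`, the helper `self._get_legal_actions()`, and `self.game.get_payoffs()` are assumptions about the code in limitholdem.py, so verify them against the actual file before reusing this.

```python
import numpy as np

# Assumption: the env class defined in rlcard/envs/limitholdem.py is LimitholdemEnv.
from rlcard.envs.limitholdem import LimitholdemEnv

class CustomLimitholdemEnv(LimitholdemEnv):
    """Hypothetical subclass with a custom state representation and reward."""

    def _extract_state(self, state):
        # Placeholder state representation: an all-zero observation vector.
        # A real implementation would encode the hand and public cards here.
        return {
            'obs': np.zeros(72),
            'legal_actions': self._get_legal_actions(),  # assumed helper
        }

    def get_payoffs(self):
        # Clip chip differences to -1/0/+1 instead of using the raw amounts.
        return [float(np.sign(p)) for p in self.game.get_payoffs()]
```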

docs/games.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Each player will receive a reward 0 (lose) or 1 (win) in the end of the game.
 
 ## Mahjong
 Mahjong is a tile-based game developed in China, and has spread throughout the world since the 20th century. It is commonly played
-but 4 players. The game is played with a set of 136 tiles. In turn players draw and discard tiles until
+by 4 players. The game is played with a set of 136 tiles. In turn players draw and discard tiles until
 The goal of the game is to complete the legal hand using the 14th drawn tile to form 4 sets and a pair.
 We revised the game into a simple version in which all of the winning sets are equal, and a player will win as long as she completes
 forming 4 sets and a pair. Please refer to the details on [Wikipedia](https://en.wikipedia.org/wiki/Mahjong) or [Baike](https://baike.baidu.com/item/麻将/215).

docs/high-level-design.md

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,7 @@ This document introduces the high-level design for the environments, the games,
 ## Environments
 We wrap each game with an `Env` class. The responsibility of `Env` is to help you generate trajectories of the games. For developing Reinforcement Learning (RL) algorithms, we recommend using the following interfaces:
 
-* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` class. For example, `env.set_agent([RandomAgent(), RandomAgent()])` indicates that two random agents will be used to generate the trajectories.
+* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` objects. For example, `env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(2)])` indicates that two random agents will be used to generate the trajectories.
 * `run`: After setting the agents, this interface will run a complete trajectory of the game, calculate the reward for each transition, and reorganize the data so that it can be directly fed into an RL algorithm.
 
 For advanced access to the environment, such as traversal of the game tree, we provide the following interfaces:
@@ -16,7 +16,6 @@ For advanced access to the environment, such as traversal of the game tree, we p
 We also support single-agent mode and human mode. Examples can be found in [examples/](../examples).
 
 * Single agent mode: single-agent environments are developed by simulating the other players with pre-trained models or rule-based models. You can enable single-agent mode with `rlcard.make(ENV_ID, config={'single_agent_mode':True})`. Then the `step` function will return `(next_state, reward, done)` just as in common single-agent environments. `env.reset()` will reset the game and return the first state.
-* Human mode: we provide interfaces to play with the trained agents. You can enable human mode by `rlcard.make(ENV_ID, config={'human_mode':True})`. Then the terminal will print out game information and we play with the agents.
 
 ## Games
 Card games usually have similar structures. We abstract some concepts in card games and follow the same design pattern. In this way, users/developers can easily dig into the code and change the rules for research purposes. Specifically, the following classes are used in all the games:
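
To make the single-agent mode concrete, here is a small sketch of the loop described in the bullet above. `'leduc-holdem'` stands in for ENV_ID, and reading actions from `state['legal_actions']` follows the state definition in README.md; both are assumptions for this sketch rather than facts from the diff.

```python
import random

import rlcard

# Single-agent mode: the other players are simulated by pre-trained or rule-based models.
env = rlcard.make('leduc-holdem', config={'single_agent_mode': True})

state = env.reset()
done = False
total_reward = 0
while not done:
    # Act randomly over the legal actions exposed in the state.
    action = random.choice(state['legal_actions'])
    state, reward, done = env.step(action)
    total_reward += reward

print('Episode reward:', total_reward)
```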
