CONTRIBUTING.md (14 additions & 4 deletions)
@@ -3,8 +3,18 @@ Contribution to this project is greatly appreciated! If you find any bugs or hav
## Roadmaps

-* **Human-Bot Visualization.** Develop a GUI for human interactions (currently we only support a terminal-based GUI).
-* **Analysis Tools.** Develop tools to visualize the decisions of the agents.
-* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation (currently we only support Leduc Hold'em and UNO).
-* **Leaderboard.** Develop a platform that enables everyone to upload his/her trained model and compete with each other worldwide.
+* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
* **More Games and Algorithms.** Develop more games and algorithms.
+* **Keras Implementation.** Provide a Keras implementation of the algorithms.
+* **Hyperparameter Search.** Search hyperparameters for each environment and update the best ones in the examples.
+
+## How to create a pull request?
+
+If this is your first time contributing to a project, kindly follow the instructions below. You may find [Creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) helpful. Mainly, you need to take the following steps to send a pull request:
+
+* Click **Fork** in the upper-right corner of the project main page to create a copy of the repository under your own GitHub account.
+* Clone the forked repo from your GitHub account to your computer.
+* Make changes on your computer.
+* Commit and push your local changes to your forked repo.
+* Send a pull request to merge your branch into the branches of the RLCard project.
@@ -150,6 +150,7 @@ The purposes of the main modules are listed as below:
* **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw action (string).
* **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
* **env.init_game()**: Initialize a game. Return the state and the first player ID.
+* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.run()**: Run a complete game and return trajectories and payoffs. The function can be used after the agents are set up.
* **State Definition**: State will always have the observation `state['obs']` and the legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, the state will also have the raw observation `state['raw_obs']` and the raw legal actions `state['raw_legal_actions']`.
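The interfaces above can be combined into a short driver loop. The sketch below is illustrative rather than code taken from the repository: it assumes a Leduc Hold'em environment, that `allow_step_back` can be passed through the same `config` dict as the other options mentioned in these docs, that `state['legal_actions']` holds integer action IDs, and that `env.is_over()` is available; exact import paths and signatures may differ between versions.

```python
import random
import rlcard

# allow_step_back is only needed if you plan to call env.step_back().
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})

state, player_id = env.init_game()        # start a game; returns the first state and player ID
while not env.is_over():
    action = random.choice(list(state['legal_actions']))  # pick any legal action ID
    state, player_id = env.step(action)                   # advance the game by one step

payoffs = env.get_payoffs()               # one payoff per player at the end of the game
env.step_back()                           # undo the last step, e.g. for tree-traversal algorithms such as CFR
```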
docs/algorithms.md (0 additions & 1 deletion)
@@ -9,7 +9,6 @@
Deep-Q Learning (DQN) [[paper]](https://arxiv.org/abs/1312.5602) is a basic reinforcement learning (RL) algorithm. We wrap DQN as an example to show how RL algorithms can be connected to the environments. In the DQN agent, the following classes are implemented:

* `DQNAgent`: The agent class that interacts with the environment.
-* `Normalizer`: The responsibility of this class is to keep a running mean and std. The Normalizer will first preprocess the state before feeding the state into the model.
* `Memory`: A memory buffer that manages the storing and sampling of transitions.
* `Estimator`: The neural network that is used to make predictions.
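For orientation, here is a hedged sketch of how these classes are typically wired together: the agent is attached to the environment, trajectories from `env.run` are fed back through `DQNAgent.feed`, which stores transitions in `Memory` and periodically updates the `Estimator`. The constructor arguments, the TensorFlow session handling, and the import paths are assumptions modeled on the TensorFlow-based examples of this era of the library and may differ in your version.

```python
import tensorflow as tf
import rlcard
from rlcard.agents.dqn_agent import DQNAgent
from rlcard.agents.random_agent import RandomAgent

env = rlcard.make('leduc-holdem')

with tf.Session() as sess:
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     state_shape=env.state_shape,
                     mlp_layers=[64, 64])
    env.set_agents([agent, RandomAgent(action_num=env.action_num)])
    sess.run(tf.global_variables_initializer())

    for episode in range(1000):
        # Generate one complete game and feed the transitions of player 0 to the agent.
        trajectories, _ = env.run(is_training=True)
        for ts in trajectories[0]:
            agent.feed(ts)  # stores the transition in Memory and trains the Estimator periodically
```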
docs/customizing-environments.md (2 additions & 2 deletions)
@@ -5,10 +5,10 @@ In addition to the default state representation and action encoding, we also all
To define our own state representation, we can modify the ``_extract_state`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L33).

## Action Encoding
-To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L69).
+To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L75).

## Reward Calculation
-To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L60).
+To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L67).

## Modifying Game
We can change the parameters of a game to adjust its difficulty. For example, we can change the number of players and the number of allowed raises in Limit Texas Hold'em in the ``__init__`` function in [rlcard/games/limitholdem/game.py](../rlcard/games/limitholdem/game.py#L11).
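If you prefer to prototype without editing the library file in place, the same three hooks can also be overridden in a subclass. This is a hedged sketch, not the documented workflow: it assumes the environment class in `/rlcard/envs/limitholdem.py` is named `LimitholdemEnv`, that the extracted state is a dict containing an `obs` array, and that the parent implementations can be reused via `super()`.

```python
import numpy as np
from rlcard.envs.limitholdem import LimitholdemEnv

class MyLimitholdemEnv(LimitholdemEnv):
    ''' Limit Hold'em with a customized state representation and reward. '''

    def _extract_state(self, state):
        # Start from the default encoding and adjust it, e.g. change the dtype
        # or append extra features to state['obs'].
        extracted = super()._extract_state(state)
        extracted['obs'] = extracted['obs'].astype(np.float32)
        return extracted

    def _decode_action(self, action_id):
        # Map integer action IDs to game actions; reuse the default mapping here.
        return super()._decode_action(action_id)

    def get_payoffs(self):
        # Example reward shaping: clip the default payoffs to [-1, 1].
        payoffs = super().get_payoffs()
        return [max(-1.0, min(1.0, p)) for p in payoffs]
```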
docs/games.md (1 addition & 1 deletion)
@@ -139,7 +139,7 @@ Each player will receive a reward 0 (lose) or 1 (win) in the end of the game.
## Mahjong
Mahjong is a tile-based game developed in China that has spread throughout the world since the 20th century. It is commonly played
-but 4 players. The game is played with a set of 136 tiles. In turn players draw and discard tiles until
+by 4 players. The game is played with a set of 136 tiles. In turn, players draw and discard tiles until one of them completes a legal hand.
The goal of the game is to complete a legal hand by using the 14th drawn tile to form 4 sets and a pair.
We revised the game into a simplified version in which all winning sets are equal, and a player wins as long as she finishes
forming 4 sets and a pair. Please refer to the details on [Wikipedia](https://en.wikipedia.org/wiki/Mahjong) or [Baike](https://baike.baidu.com/item/麻将/215).
docs/high-level-design.md (1 addition & 2 deletions)
@@ -4,7 +4,7 @@ This document introduces the high-level design for the environments, the games,
## Environments
We wrap each game with an `Env` class. The responsibility of `Env` is to help you generate trajectories of the games. For developing Reinforcement Learning (RL) algorithms, we recommend using the following interfaces:

-* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` instances. For example, `env.set_agents([RandomAgent(), RandomAgent()])` indicates that two random agents will be used to generate the trajectories.
+* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` instances. For example, `env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(2)])` indicates that two random agents will be used to generate the trajectories.
* `run`: After setting the agents, this interface will run a complete trajectory of the game, calculate the reward for each transition, and reorganize the data so that it can be directly fed into an RL algorithm.
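A minimal sketch of these two interfaces used together is given below; the environment ID and the import path are assumptions and may differ in your version.

```python
import rlcard
from rlcard.agents.random_agent import RandomAgent

env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(2)])

# `run` plays one complete game: `trajectories` holds the transitions seen by each
# player, and `payoffs` holds one final reward per player.
trajectories, payoffs = env.run()
```

Learning agents follow the same pattern: they simply replace `RandomAgent` in the list passed to `set_agents`.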
For advanced access to the environment, such as traversal of the game tree, we provide the following interfaces:
@@ -16,7 +16,6 @@ For advanced access to the environment, such as traversal of the game tree, we p
We also support single-agent mode and human mode. Examples can be found in [examples/](../examples).

* Single agent mode: single-agent environments are developed by simulating the other players with pre-trained models or rule-based models. You can enable single-agent mode with `rlcard.make(ENV_ID, config={'single_agent_mode':True})`. Then the `step` function will return `(next_state, reward, done)` just as in common single-agent environments, and `env.reset()` will reset the game and return the first state.
-* Human mode: we provide interfaces to play with the trained agents. You can enable human mode with `rlcard.make(ENV_ID, config={'human_mode':True})`. Then the terminal will print out the game information and we can play against the agents.
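As a quick illustration of the single-agent mode described above, here is a minimal sketch rather than code from the repo; it assumes the Leduc Hold'em environment ID and that `state['legal_actions']` contains action IDs.

```python
import random
import rlcard

# In single-agent mode the other players are simulated internally,
# so the environment behaves like a common single-agent RL environment.
env = rlcard.make('leduc-holdem', config={'single_agent_mode': True})

state = env.reset()                      # reset the game and get the first state
done = False
while not done:
    action = random.choice(list(state['legal_actions']))  # replace with your own policy
    state, reward, done = env.step(action)
print('Episode reward:', reward)
```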
## Games
Card games usually have similar structures. We abstract some concepts in card games and follow the same design pattern. In this way, users/developers can easily dig into the code and change the rules for research purposes. Specifically, the following classes are used in all the games: