CONTRIBUTING.md (14 additions & 4 deletions)
@@ -3,8 +3,18 @@ Contribution to this project is greatly appreciated! If you find any bugs or hav
## Roadmaps

-* **Human-Bot Visualization.** Develop a GUI for human interactions (currently we only support a terminal-based GUI).
-* **Analysis Tools.** Develop tools to visualize the decisions of the agents.
-* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation (currently we only support Leduc Hold'em and UNO).
-* **Leaderboard.** Develop a platform that enables everyone to upload his/her trained model and compete with each other worldwide.
+* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
* **More Games and Algorithms.** Develop more games and algorithms.
+* **Keras Implementation.** Provide a Keras implementation of the algorithms.
+* **Hyperparameter Search.** Search hyperparameters for each environment and update the best ones in the examples.
+
+## How to create a pull request?
+
+If this is your first time contributing to a project, kindly follow the instructions below. You may find [Creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) helpful. Mainly, you need to take the following steps to send a pull request:
+
+* Click **Fork** in the upper-right corner of the project main page to create a copy of the repository under your own GitHub account.
+* Clone the forked repo from your GitHub account to your computer.
+* Make changes on your computer.
+* Commit and push your local changes to your forked repo.
+* Send a pull request to merge your branch into the branches of the RLCard project.
@@ -150,6 +150,7 @@ The purposes of the main modules are listed as below:
* **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw action (string).
* **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
* **env.init_game()**: Initialize a game. Return the state and the first player ID.
+* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.run()**: Run a complete game and return trajectories and payoffs. The function can be used after the agents are set up.
* **State Definition**: State will always have the observation `state['obs']` and the legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, the state will also have the raw observation `state['raw_obs']` and the raw legal actions `state['raw_legal_actions']`.
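The interfaces above can be combined into a short driver loop. The sketch below is illustrative rather than code taken from the repository: it assumes a Leduc Hold'em environment, that `allow_step_back` can be passed through the same `config` dict as the other options mentioned in these docs, that `state['legal_actions']` holds integer action IDs, and that `env.is_over()` is available; exact import paths and signatures may differ between versions.

```python
import random
import rlcard

# allow_step_back is only needed if you plan to call env.step_back().
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})

state, player_id = env.init_game()        # start a game; returns the first state and player ID
while not env.is_over():
    action = random.choice(list(state['legal_actions']))  # pick any legal action ID
    state, player_id = env.step(action)                   # advance the game by one step

payoffs = env.get_payoffs()               # one payoff per player at the end of the game
env.step_back()                           # undo the last step, e.g. for tree-traversal algorithms such as CFR
```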
docs/algorithms.md (0 additions & 1 deletion)
@@ -9,7 +9,6 @@
Deep-Q Learning (DQN) [[paper]](https://arxiv.org/abs/1312.5602) is a basic reinforcement learning (RL) algorithm. We wrap DQN as an example to show how RL algorithms can be connected to the environments. In the DQN agent, the following classes are implemented:

* `DQNAgent`: The agent class that interacts with the environment.
-* `Normalizer`: The responsibility of this class is to keep a running mean and std. The Normalizer will first preprocess the state before feeding the state into the model.
* `Memory`: A memory buffer that manages the storing and sampling of transitions.
* `Estimator`: The neural network that is used to make predictions.
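For orientation, here is a hedged sketch of how these classes are typically wired together: the agent is attached to the environment, trajectories from `env.run` are fed back through `DQNAgent.feed`, which stores transitions in `Memory` and periodically updates the `Estimator`. The constructor arguments, the TensorFlow session handling, and the import paths are assumptions modeled on the TensorFlow-based examples of this era of the library and may differ in your version.

```python
import tensorflow as tf
import rlcard
from rlcard.agents.dqn_agent import DQNAgent
from rlcard.agents.random_agent import RandomAgent

env = rlcard.make('leduc-holdem')

with tf.Session() as sess:
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     state_shape=env.state_shape,
                     mlp_layers=[64, 64])
    env.set_agents([agent, RandomAgent(action_num=env.action_num)])
    sess.run(tf.global_variables_initializer())

    for episode in range(1000):
        # Generate one complete game and feed the transitions of player 0 to the agent.
        trajectories, _ = env.run(is_training=True)
        for ts in trajectories[0]:
            agent.feed(ts)  # stores the transition in Memory and trains the Estimator periodically
```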
docs/customizing-environments.md (2 additions & 2 deletions)
@@ -5,10 +5,10 @@ In addition to the default state representation and action encoding, we also all
To define our own state representation, we can modify the ``_extract_state`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L33).

## Action Encoding
-To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L69).
+To define our own action encoding, we can modify the ``_decode_action`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L75).

## Reward Calculation
-To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L60).
+To define our own reward calculation, we can modify the ``get_payoffs`` function in [/rlcard/envs/limitholdem.py](../rlcard/envs/limitholdem.py#L67).

## Modifying Game
We can change the parameters of a game to adjust its difficulty. For example, we can change the number of players and the number of allowed raises in Limit Texas Hold'em in the ``__init__`` function in [rlcard/games/limitholdem/game.py](../rlcard/games/limitholdem/game.py#L11).
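If you prefer to prototype without editing the library file in place, the same three hooks can also be overridden in a subclass. This is a hedged sketch, not the documented workflow: it assumes the environment class in `/rlcard/envs/limitholdem.py` is named `LimitholdemEnv`, that the extracted state is a dict containing an `obs` array, and that the parent implementations can be reused via `super()`.

```python
import numpy as np
from rlcard.envs.limitholdem import LimitholdemEnv

class MyLimitholdemEnv(LimitholdemEnv):
    ''' Limit Hold'em with a customized state representation and reward. '''

    def _extract_state(self, state):
        # Start from the default encoding and adjust it, e.g. change the dtype
        # or append extra features to state['obs'].
        extracted = super()._extract_state(state)
        extracted['obs'] = extracted['obs'].astype(np.float32)
        return extracted

    def _decode_action(self, action_id):
        # Map integer action IDs to game actions; reuse the default mapping here.
        return super()._decode_action(action_id)

    def get_payoffs(self):
        # Example reward shaping: clip the default payoffs to [-1, 1].
        payoffs = super().get_payoffs()
        return [max(-1.0, min(1.0, p)) for p in payoffs]
```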
docs/games.md (1 addition & 1 deletion)
@@ -139,7 +139,7 @@ Each player will receive a reward 0 (lose) or 1 (win) in the end of the game.
## Mahjong
Mahjong is a tile-based game developed in China that has spread throughout the world since the 20th century. It is commonly played
-but 4 players. The game is played with a set of 136 tiles. In turn players draw and discard tiles until
+by 4 players. The game is played with a set of 136 tiles. In turn, players draw and discard tiles until one of them completes a legal hand.
The goal of the game is to complete a legal hand by using the 14th drawn tile to form 4 sets and a pair.
We revised the game into a simplified version in which all winning sets are equal, and a player wins as long as she finishes
forming 4 sets and a pair. Please refer to the details on [Wikipedia](https://en.wikipedia.org/wiki/Mahjong) or [Baike](https://baike.baidu.com/item/麻将/215).
docs/high-level-design.md (1 addition & 2 deletions)
@@ -4,7 +4,7 @@ This document introduces the high-level design for the environments, the games,
## Environments
We wrap each game with an `Env` class. The responsibility of `Env` is to help you generate trajectories of the games. For developing Reinforcement Learning (RL) algorithms, we recommend using the following interfaces:

-* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` instances. For example, `env.set_agents([RandomAgent(), RandomAgent()])` indicates that two random agents will be used to generate the trajectories.
+* `set_agents`: This function tells the `Env` what agents will be used to perform actions in the game. Different games may have a different number of agents. The input of the function is a list of `Agent` instances. For example, `env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(2)])` indicates that two random agents will be used to generate the trajectories.
* `run`: After setting the agents, this interface will run a complete trajectory of the game, calculate the reward for each transition, and reorganize the data so that it can be directly fed into an RL algorithm.
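A minimal sketch of these two interfaces used together is given below; the environment ID and the import path are assumptions and may differ in your version.

```python
import rlcard
from rlcard.agents.random_agent import RandomAgent

env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(2)])

# `run` plays one complete game: `trajectories` holds the transitions seen by each
# player, and `payoffs` holds one final reward per player.
trajectories, payoffs = env.run()
```

Learning agents follow the same pattern: they simply replace `RandomAgent` in the list passed to `set_agents`.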
For advanced access to the environment, such as traversal of the game tree, we provide the following interfaces:
@@ -16,7 +16,6 @@ For advanced access to the environment, such as traversal of the game tree, we p
We also support single-agent mode and human mode. Examples can be found in [examples/](../examples).

* Single agent mode: single-agent environments are developed by simulating the other players with pre-trained models or rule-based models. You can enable single-agent mode with `rlcard.make(ENV_ID, config={'single_agent_mode':True})`. Then the `step` function will return `(next_state, reward, done)` just as in common single-agent environments, and `env.reset()` will reset the game and return the first state.
-* Human mode: we provide interfaces to play with the trained agents. You can enable human mode with `rlcard.make(ENV_ID, config={'human_mode':True})`. Then the terminal will print out the game information and we can play against the agents.
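As a quick illustration of the single-agent mode described above, here is a minimal sketch rather than code from the repo; it assumes the Leduc Hold'em environment ID and that `state['legal_actions']` contains action IDs.

```python
import random
import rlcard

# In single-agent mode the other players are simulated internally,
# so the environment behaves like a common single-agent RL environment.
env = rlcard.make('leduc-holdem', config={'single_agent_mode': True})

state = env.reset()                      # reset the game and get the first state
done = False
while not done:
    action = random.choice(list(state['legal_actions']))  # replace with your own policy
    state, reward, done = env.step(action)
print('Episode reward:', reward)
```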
## Games
Card games usually have similar structures. We abstract some concepts in card games and follow the same design pattern. In this way, users/developers can easily dig into the code and change the rules for research purposes. Specifically, the following classes are used in all the games: