- The training code is heavily based on [this repo](https://github.com/asrivat1/DeepLearningVideoGames), which trains DQN on Atari Pong or Tetris. The main difference from that code is the **frame skipping strategy**: the Atari code picks an action and [repeats the action](https://github.com/asrivat1/DeepLearningVideoGames/blob/master/deep_q_network.py#L133-L139) for the next K-1 frames, whereas the flappy bird code [keeps idle](https://github.com/yenchenlin/DeepLearningFlappyBird/blob/master/deep_q_network.py#L122-L131) for the next K-1 frames. This is due to the nature of the game: flapping too often makes the bird fly too high and crash.
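
  A minimal sketch of the two variants (not code from either repo; it assumes a Gym-style `env.step(action)` interface, K = 4, and an illustrative idle action index of 0):

  ```python
  K = 4  # number of emulator frames per agent decision

  def step_atari_style(env, action):
      """Repeat the chosen action for all K frames."""
      total_reward, done = 0.0, False
      for _ in range(K):
          obs, reward, done, _ = env.step(action)
          total_reward += reward
          if done:
              break
      return obs, total_reward, done

  def step_flappy_style(env, action, idle_action=0):
      """Apply the chosen action once, then stay idle for the next K-1 frames."""
      total_reward, done = 0.0, False
      for i in range(K):
          obs, reward, done, _ = env.step(action if i == 0 else idle_action)
          total_reward += reward
          if done:
              break
      return obs, total_reward, done
  ```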
- Another difference is that the Atari code uses random play to initialize the experience replay buffer, while the flappy bird repo uses the $\epsilon$-greedy training routine from the start. This should not make a huge difference.
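
  For reference, a sketch of $\epsilon$-greedy action selection with a linearly annealed $\epsilon$ (the constants and names here are illustrative, not taken from either repo):

  ```python
  import random
  import numpy as np

  EPSILON_START, EPSILON_END, ANNEAL_STEPS = 1.0, 0.05, 100000
  NUM_ACTIONS = 2  # flap or do nothing

  def epsilon_greedy(q_values, step):
      """Return a random action with probability epsilon, else the greedy action."""
      frac = min(step / ANNEAL_STEPS, 1.0)
      epsilon = EPSILON_START + frac * (EPSILON_END - EPSILON_START)
      if random.random() < epsilon:
          return random.randrange(NUM_ACTIONS)
      return int(np.argmax(q_values))
  ```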
- This repo does not implement the target network. In order to copy one network to another, use an op like `tf.assign(target_net_param, net_param)`, as shown in [this repo](https://github.com/initial-h/FlappyBird_DQN_with_target_network/blob/master/DQN_with_target_network.py#L321) and [this one](https://github.com/dennybritz/reinforcement-learning/blob/master/DQN/dqn.py#L150-L169).
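
  A sketch of the copy operation, assuming the online and target networks are built under the (hypothetical) variable scopes `q_net` and `target_net`:

  ```python
  import tensorflow as tf  # TF 1.x API, matching the linked repos

  def build_copy_op(online_scope="q_net", target_scope="target_net"):
      """Create one tf.assign per parameter pair and group them into a single op."""
      online_vars = sorted(
          tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=online_scope),
          key=lambda v: v.name)
      target_vars = sorted(
          tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=target_scope),
          key=lambda v: v.name)
      return tf.group(*[tf.assign(t, o) for o, t in zip(online_vars, target_vars)])

  # Inside the training loop, refresh the target network every N steps:
  # sess.run(copy_op)
  ```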
- Two things to remember when using the global step variable:
  1. Create the global step variable (either through `tf.Variable` or `tf.train.create_global_step`).
  2. Pass it to `tf.train.Optimizer.minimize` for automatic increment.
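
A minimal, self-contained sketch of these two steps (TF 1.x API; the loss below is only a stand-in for the actual DQN loss):

```python
import tensorflow as tf  # TF 1.x API

# 1. Create the global step variable.
global_step = tf.train.create_global_step()  # or: tf.Variable(0, trainable=False, name="global_step")

# A placeholder loss, only for illustration.
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)

# 2. Pass the global step to minimize(); it is then incremented on every training step.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-6)
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run(global_step))  # -> 1
```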