Question about reproducing the result #26
Not answering your questions, but what kind of environment are you testing? Breakout-v0? How long does it take for about 3M steps in your setting?
Hello,
I believe there's something wrong (hardware or software). When I run a similar program (made by the assistant professor of the deep learning class at NCTU), I can only get 10 points on average and 20 at maximum, while a friend told me he can easily get 30 or 40 on average and 238 at maximum.
For hardware, I use a GTX 1080 (8 GB), an Intel Core i7-7700 3.6 GHz (4 cores), an ASUS PRIME Z270-K (ATX/DDR4*4/1A1D1H/U3.1 A+C/M.2/COM) motherboard, Kingston 8 GB DDR4-2400 RAM (KVR24N17S8/8, 288-pin), and a Seagate 1 TB ST1000DM010 BarraCuda hard disk. For software, I installed OpenAI Gym 0.7.3 and TensorFlow 0.12.0 (from the binary).
Two or three months ago I tried to run the program; 3M steps take around 5 hours (1 minute 07 seconds per 10 thousand frames). To speed things up I used a trick: if at the n-th step we get 1 point, at step n-1 we set the reward to 0.85, at step n-2 to 0.85^2, and so on.
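Roughly, the trick amounts to using the 0.85-decayed sum of future points as the per-step reward; a quick sketch of what I mean (illustrative only, not the exact code I used):

```python
import numpy as np

def smear_rewards(rewards, decay=0.85):
    """Replace each step's reward with the decayed sum of the points scored
    from that step onwards (rewards are mostly 0, with 1 where a point is scored)."""
    shaped = np.zeros(len(rewards), dtype=np.float32)
    carry = 0.0
    for t in reversed(range(len(rewards))):
        carry = rewards[t] + decay * carry
        shaped[t] = carry
    return shaped

# a point scored at step 3: earlier steps get 0.85, 0.85^2, 0.85^3
print(smear_rewards([0, 0, 0, 1]))  # [0.614125 0.7225 0.85 1.]
```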
I didn't compare this result to n-step DQN or https://arxiv.org/pdf/1611.01606.pdf. As far as I remember, it could reach an average episode reward of 1 in one hour and 3 in two hours, but the issue is that the dramatic growth of the Q-value (to around 9) makes the learning process very insensitive, so we need a mechanism to slow down the growth of the Q-value (at that time I dynamically adjusted the discount factor; let me not go into the details since I think it is very immature). This method also does not work well on A3C-like algorithms: too many consecutive frames accelerate the growth of the Q-value, and in this case n-step A3C works much better.
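By n-step I mean building the target from several real rewards before bootstrapping, something like the following sketch (the function name and gamma value are just for illustration):

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """n-step return for the first state: r_0 + gamma*r_1 + ... + gamma**n * V(s_n)."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# 3-step example: rewards [0, 1, 0] and a bootstrapped value of 2.0 at the third next state
print(n_step_target([0.0, 1.0, 0.0], bootstrap_value=2.0))  # 0.99*1 + 0.99**3 * 2.0 ≈ 2.93
```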
Even now I haven't figured out what is wrong with the setting, so I am now trying to work on A3C-like algorithms, since they do not rely on the hardware so much. In any case, thanks for your patience and for the kind reply.
Best,
Chih-Chieh
I am running the model, and I get very similar performance to yours.
Hello,
I tried to reproduce the result (with n_action_repeat set to 1) on a computer with a GTX 1080, but the performance is not as good as shown in the figure. After 2.88M steps the average reward is 0.0174, the average ep_reward is 3.1071, and the max ep_reward is 7.
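In case I am reading the statistics differently from you, this is how I interpret the two numbers (a made-up helper, not the code from this repository): the average reward is per environment step, while ep_reward is the total per episode.

```python
def summarize(step_rewards, episode_ends):
    """step_rewards: reward at every env step; episode_ends: True where an episode finished."""
    totals, current = [], 0.0
    for r, done in zip(step_rewards, episode_ends):
        current += r
        if done:
            totals.append(current)
            current = 0.0
    avg_reward = sum(step_rewards) / len(step_rewards)
    return avg_reward, sum(totals) / len(totals), max(totals)

# toy check: two episodes of 4 steps each, scoring 1 point per episode
print(summarize([0, 1, 0, 0, 0, 0, 1, 0],
                [False, False, False, True, False, False, False, True]))
# -> (0.25, 1.0, 1.0)  i.e. avg reward, avg ep_reward, max ep_reward
```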
Maybe I did something wrong in the setting or misread some information. Could you give me some suggestions? Thanks a lot!
Chih-Chieh