Question about reproducing the result #26
Not answering your questions, but what kind of environment are you testing? Breakout-v0? How long does it take for about 3M steps in your setting?
Hello,
I believe there's something wrong (hardware or software). When I run a similar program (made by the assistant professor of the deep learning class at NCTU), I can only get 10 points on average and 20 at maximum, while a friend told me he can easily get 30 or 40 on average and 238 at maximum.
For hardware, I use a GTX 1080 (8 GB), an Intel Core i7-7700 3.6 GHz (4 cores), an ASUS PRIME Z270-K (ATX/DDR4*4/1A1D1H/U3.1 A+C/M.2/COM) motherboard, Kingston 8 GB DDR4-2400 RAM (KVR24N17S8/8, 288-pin), and a Seagate 1 TB ST1000DM010 BarraCuda hard disk. For software, I installed OpenAI Gym 0.7.3 and TensorFlow 0.12.0 (from the binary).
Two or three months ago I tried to run the program; 3M steps take around 5 hours (1 minute 07 seconds per 10 thousand frames). To speed things up I used a trick: if at the n-th step we get 1 point, at step n-1 we set the reward to 0.85, at step n-2 to 0.85^2, and so on.
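Roughly, the trick amounts to using the 0.85-decayed sum of future points as the per-step reward; a quick sketch of what I mean (illustrative only, not the exact code I used):

```python
import numpy as np

def smear_rewards(rewards, decay=0.85):
    """Replace each step's reward with the decayed sum of the points scored
    from that step onwards (rewards are mostly 0, with 1 where a point is scored)."""
    shaped = np.zeros(len(rewards), dtype=np.float32)
    carry = 0.0
    for t in reversed(range(len(rewards))):
        carry = rewards[t] + decay * carry
        shaped[t] = carry
    return shaped

# a point scored at step 3: earlier steps get 0.85, 0.85^2, 0.85^3
print(smear_rewards([0, 0, 0, 1]))  # [0.614125 0.7225 0.85 1.]
```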
I didn't compare this result to n-step DQN or https://arxiv.org/pdf/1611.01606.pdf. As far as I remember, it could reach an average episode reward of 1 in one hour and 3 in two hours, but the issue is that the dramatic growth of the Q-value (to around 9) makes the learning process very insensitive, so we need a mechanism to slow down the growth of the Q-value (at that time I dynamically adjusted the discount factor; let me not go into the details since I think it is very immature). This method also does not work well on A3C-like algorithms: too many consecutive frames accelerate the growth of the Q-value, and in this case n-step A3C works much better.
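By n-step I mean building the target from several real rewards before bootstrapping, something like the following sketch (the function name and gamma value are just for illustration):

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """n-step return for the first state: r_0 + gamma*r_1 + ... + gamma**n * V(s_n)."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# 3-step example: rewards [0, 1, 0] and a bootstrapped value of 2.0 at the third next state
print(n_step_target([0.0, 1.0, 0.0], bootstrap_value=2.0))  # 0.99*1 + 0.99**3 * 2.0 ≈ 2.93
```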
Even now I haven't figured out what is wrong with the setting, so I am now trying to work on A3C-like algorithms, since they do not rely on the hardware so much. In any case, thanks for your patience and for the kind reply.
Best,
Chih-Chieh
I am running the model, and I get very similar performance to yours.
Hello,
I tried to reproduce the result (with n_action_repeat set to 1) on a computer with a GTX 1080, but the performance is not as good as shown in the figure. After 2.88M steps the average reward is 0.0174, the average ep_reward is 3.1071, and the max ep_reward is 7.
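In case I am reading the statistics differently from you, this is how I interpret the two numbers (a made-up helper, not the code from this repository): the average reward is per environment step, while ep_reward is the total per episode.

```python
def summarize(step_rewards, episode_ends):
    """step_rewards: reward at every env step; episode_ends: True where an episode finished."""
    totals, current = [], 0.0
    for r, done in zip(step_rewards, episode_ends):
        current += r
        if done:
            totals.append(current)
            current = 0.0
    avg_reward = sum(step_rewards) / len(step_rewards)
    return avg_reward, sum(totals) / len(totals), max(totals)

# toy check: two episodes of 4 steps each, scoring 1 point per episode
print(summarize([0, 1, 0, 0, 0, 0, 1, 0],
                [False, False, False, True, False, False, False, True]))
# -> (0.25, 1.0, 1.0)  i.e. avg reward, avg ep_reward, max ep_reward
```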
Maybe I did something wrong in the setting or misread some information. Could you give me some suggestions? Thanks a lot!
Chih-Chieh