Hi Morvan,
I am trying to apply your Batch Normalization tutorial to your DDPG algorithm tutorial, but I'm having a hard time understanding some of the bits. One of my problems is this:
```python
self.a_loss = -tf.reduce_mean(q)  # maximize q by minimizing its negative
self.atrain = tf.train.AdamOptimizer(LR_A).minimize(self.a_loss, var_list=a_params)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # batch-norm moving-average updates
with tf.control_dependencies(update_ops):
    with tf.control_dependencies(target_update):  # soft replacement happens here
        self.q_target = self.R + GAMMA * (1 - self.Done) * q_
        self.td_error = tf.losses.mean_squared_error(labels=self.q_target, predictions=q)
        self.ctrain = tf.train.AdamOptimizer(LR_C).minimize(self.td_error, var_list=c_params)
```
Since you said those update_ops need to run, I imagined it should look something like the above, but then the atrain op is not covered by them, assuming the rest is even correct in the first place.
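For concreteness, here is a minimal sketch of what I mean, wrapping atrain in the same kind of dependency (this handling of update_ops is my assumption, not something from your tutorial):

```python
# Sketch only (my assumption): make atrain depend on the BN update ops too,
# otherwise the moving mean/variance are only refreshed when ctrain runs.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    self.a_loss = -tf.reduce_mean(q)  # maximize q
    self.atrain = tf.train.AdamOptimizer(LR_A).minimize(self.a_loss, var_list=a_params)
```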
Furthermore, if you could give some pointers on how to implement it in your DDPG implementation, that would be nice.
Jan
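
P.S. In case it helps to see what I am attempting, here is a minimal sketch of how I am adding the BN layers themselves, following your Batch Normalization tutorial; the tf_is_training placeholder and the layer size are just my assumptions:

```python
import tensorflow as tf

# Switches BN between batch statistics (training) and the
# learned moving averages (acting / evaluation).
tf_is_training = tf.placeholder(tf.bool, None, name='is_training')

def dense_bn_relu(x, units=30):
    # Dense layer followed by batch norm; training=tf_is_training
    # registers the moving-average update ops in tf.GraphKeys.UPDATE_OPS.
    net = tf.layers.dense(x, units, activation=None)
    net = tf.layers.batch_normalization(net, training=tf_is_training)
    return tf.nn.relu(net)
```

I then feed {tf_is_training: True} in learn() and False in choose_action().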