Bug: `successor.local_gradient_for_argument(self)` could be corrupted (#52)
The math notation is rendering incorrectly, so I'll post the raw markdown here.
I have been slow to respond to this comment because it is quite detailed and I want to make sure I give it the time it deserves. For starters, one cannot compose

Next, I believe the structure of the code ensures that all gradients are computed before any update steps occur. You can see this by applying the following patch (attached as a file as well) and running any of the integration tests in

```diff
diff --git a/neural_network/neural_network.py b/neural_network/neural_network.py
index cee6a4a..db639e1 100644
--- a/neural_network/neural_network.py
+++ b/neural_network/neural_network.py
@@ -107,9 +107,11 @@ class Node:
     def do_gradient_descent_step(self, step_size):
         '''The core gradient step subroutine: compute the gradient for each of this node's
         tunable parameters, step away from the gradient.'''
+        print("Starting one gradient descent step")
         if self.has_parameters:
             for i, gradient_entry in enumerate(self.global_parameter_gradient):
                 # step away from the gradient
+                print("Doing gradient update for parameter %s" % i)
                 self.parameters[i] -= step_size * gradient_entry

     '''Gradient computations which don't depend on the node's definition.'''
@@ -147,6 +149,7 @@ class Node:
     @property
     def local_gradient(self):
         if self.cache.local_gradient is None:
+            print("Computing local gradient for %s" % (self, ))
             self.cache.local_gradient = self.compute_local_gradient()
         return self.cache.local_gradient
@@ -159,6 +162,7 @@ class Node:
     @property
     def local_parameter_gradient(self):
         if self.cache.local_parameter_gradient is None:
+            print("Computing local parameter gradient for %s" % (self, ))
             self.cache.local_parameter_gradient = self.compute_local_parameter_gradient()
         return self.cache.local_parameter_gradient
@@ -391,6 +395,7 @@ class NeuralNetwork:
         return self.error_node.compute_error(inputs, label)

     def backpropagation_step(self, inputs, label, step_size=None):
+        print("Doing one backpropagation step")
         self.compute_error(inputs, label)
         self.for_each(lambda node: node.do_gradient_descent_step(step_size))
```

The test output should show that all gradient computations occur before all parameter updates. E.g., here is what I see in one gradient descent step of the xor example:
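The intended two-phase structure the patch above is probing can be sketched with a toy stand-in (this `Node` class and its names are illustrative only, not the repository's actual code): phase 1 lazily computes and caches every gradient, and phase 2 applies the parameter updates afterwards.

```python
# Toy sketch of the "compute all gradients, then update" ordering.
# Assumed/hypothetical names: Node, gradient, do_gradient_descent_step.
events = []

class Node:
    def __init__(self, name):
        self.name = name
        self.param = 1.0
        self._cached_gradient = None

    @property
    def gradient(self):
        # Lazily computed, then cached -- mirrors the cache.local_gradient pattern.
        if self._cached_gradient is None:
            events.append(("compute", self.name))
            self._cached_gradient = 2 * self.param  # toy gradient
        return self._cached_gradient

    def do_gradient_descent_step(self, step_size):
        events.append(("update", self.name))
        self.param -= step_size * self.gradient  # cache hit: no recompute

nodes = [Node("a"), Node("b")]

# Phase 1: evaluating the error would pull every gradient through the cache.
for n in nodes:
    _ = n.gradient

# Phase 2: all updates happen afterwards, reusing the cached values.
for n in nodes:
    n.do_gradient_descent_step(0.1)

# All "compute" events precede all "update" events.
assert [kind for kind, _ in events] == ["compute", "compute", "update", "update"]
```

If the update loop itself triggers gradient computation (a cache miss in phase 2), the interleaving this issue describes becomes possible.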
The mnist example looks similar to me. It's possible this issue is still occurring and I'm too dense to see it. Perhaps you could formulate your issue as a failing test case? Or else, just an example network in the code with print statements that demonstrates the issue?
@j2kun I have come up with a test case: https://github.com/Banyc/programmers-introduction-to-mathematics/blob/banyc/issue_52/neural_network/neural_network_test.py#L257-L316 The bug is reproduced by that test case.
Also, thanks for the instructions on the
Issue
`local_gradient_for_argument` might read a corrupted float value from the successor.
Say we have the computation graph:
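The failure mode can be sketched with a toy example (the class and method names below follow the issue title but are hypothetical simplifications, not the repository's actual implementation): if the successor's parameter is updated *before* a predecessor asks for `local_gradient_for_argument`, and the successor computes that gradient on demand from its current parameter, the predecessor silently receives a value derived from the already-updated parameter.

```python
# Toy illustration of reading a gradient after an interleaved update.
class Successor:
    def __init__(self):
        self.param = 3.0

    def local_gradient_for_argument(self, arg):
        # Computed on demand from the *current* parameter value.
        return self.param

class Predecessor:
    def __init__(self, successor):
        self.successor = successor

    def global_gradient(self):
        return self.successor.local_gradient_for_argument(self)

succ = Successor()
pred = Predecessor(succ)

correct = pred.global_gradient()   # uses the pre-update parameter, 3.0

# Interleaved update: the successor steps its parameter first...
succ.param -= 0.5 * 2.0            # param is now 2.0

# ...so the predecessor's later read is based on the new parameter.
corrupted = pred.global_gradient()
assert correct != corrupted
```

Nothing crashes; the gradient is simply wrong, which is what makes the bug hard to spot.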
Solutions
I can come up with two solutions:
- `local_gradient`
- `local_gradient`
- `global_gradient`
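One way the general shape of a fix could look, as a hedged sketch (the `Node` class and `safe_backpropagation_step` below are hypothetical, not code from the repository): force every gradient to be computed from the pre-update parameters in a first pass, and only then mutate any parameters.

```python
# Sketch of a two-pass step that freezes gradients before any update.
class Node:
    def __init__(self, param):
        self.param = param

    @property
    def gradient(self):
        return 2 * self.param  # toy gradient, recomputed from current param

def safe_backpropagation_step(nodes, step_size):
    # Pass 1: read every gradient while all parameters still hold their
    # pre-update values.
    gradients = [node.gradient for node in nodes]
    # Pass 2: only now mutate parameters, using the frozen gradients.
    for node, g in zip(nodes, gradients):
        node.param -= step_size * g

nodes = [Node(1.0), Node(2.0)]
safe_backpropagation_step(nodes, 0.1)
# Each update used the gradient from before any node was modified.
```

Whether this is done by eagerly filling the gradient caches or by splitting `backpropagation_step` into two `for_each` passes is an implementation choice.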
PS: Thank you for your book and the sample code; they helped me understand neural networks more deeply!