diff --git a/backpropagation.qmd b/backpropagation.qmd
index 9831156..668a710 100644
--- a/backpropagation.qmd
+++ b/backpropagation.qmd
@@ -705,7 +705,7 @@ Let's compute the backward pass output:
 $$\begin{aligned}
  &= [\mathbf{g}_{\texttt{out}}^a, \mathbf{g}_{\texttt{out}}^b] \frac{\partial [\mathbf{x}_{\texttt{out}}^a, \mathbf{x}_{\texttt{out}}^b]}{\partial \mathbf{x}_{\texttt{in}}}\\
  &= [\mathbf{g}_{\texttt{out}}^a, \mathbf{g}_{\texttt{out}}^b] \frac{\partial [\mathbf{x}_{\texttt{in}}, \mathbf{x}_{\texttt{in}}]}{\partial \mathbf{x}_{\texttt{in}}}\\
  &= [\mathbf{g}_{\texttt{out}}^a, \mathbf{g}_{\texttt{out}}^b][1, 1]^\mathsf{T}\\
- &= \mathbf{g}_{\texttt{out}}^b + \mathbf{g}_{\texttt{out}}^b
+ &= \mathbf{g}_{\texttt{out}}^a + \mathbf{g}_{\texttt{out}}^b
 \end{aligned}$$
 So, branching just sums both the gradients passed backward to it.
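
Review note: the corrected final line can be sanity-checked numerically. The sketch below is not part of the patch. The helper names (`branch_forward`, `branch_backward`, `loss`, `numeric_grad`) and the linear loss are assumptions chosen for illustration, not code from `backpropagation.qmd`. It compares the summed gradient $\mathbf{g}_{\texttt{out}}^a + \mathbf{g}_{\texttt{out}}^b$ against central finite differences.

```python
def branch_forward(x_in):
    """Branch: copy the input to two outputs, x_out_a = x_out_b = x_in."""
    return list(x_in), list(x_in)


def branch_backward(g_out_a, g_out_b):
    """Backward pass of the branch: sum the two incoming gradients."""
    return [ga + gb for ga, gb in zip(g_out_a, g_out_b)]


def loss(x_in, w_a, w_b):
    """Hypothetical scalar loss, linear in each branch output (an assumption)."""
    x_a, x_b = branch_forward(x_in)
    return sum(w * x for w, x in zip(w_a, x_a)) + sum(w * x for w, x in zip(w_b, x_b))


def numeric_grad(x_in, w_a, w_b, eps=1e-6):
    """Central finite differences of the loss with respect to x_in."""
    grad = []
    for i in range(len(x_in)):
        plus = list(x_in)
        minus = list(x_in)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss(plus, w_a, w_b) - loss(minus, w_a, w_b)) / (2 * eps))
    return grad


x = [0.5, -1.0, 2.0]
w_a = [1.0, 2.0, 3.0]    # gradient arriving at output a (d loss / d x_out_a)
w_b = [-4.0, 0.5, 1.0]   # gradient arriving at output b (d loss / d x_out_b)

g_in = branch_backward(w_a, w_b)   # g_out^a + g_out^b
g_num = numeric_grad(x, w_a, w_b)

print(g_in)  # [-3.0, 2.5, 4.0]
```

With the pre-patch expression ($\mathbf{g}_{\texttt{out}}^b + \mathbf{g}_{\texttt{out}}^b$) the result would instead be `[-8.0, 1.0, 2.0]`, which does not match the finite-difference estimate in `g_num`.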