# Backpropagation
With an error function that decomposes into a sum over data points,
$$
E(\mathbf{w})=\sum_{n=1}^{N} E_{n}(\mathbf{w}),
$$
the goal is to evaluate the per-example gradient
$$
\frac{\partial E_{n}(\mathbf{w})}{\partial \mathbf{w}},
$$
which drives the optimization of the parameters via [[Stochastic Gradient Descent]].
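For example, one stochastic gradient step uses a single term of the sum (here $\eta$ denotes the learning rate and $\tau$ the iteration index):
$$
\mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}-\eta\,\frac{\partial E_{n}\!\left(\mathbf{w}^{(\tau)}\right)}{\partial \mathbf{w}}
$$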
Output/hidden activations: $a_{j}^{(l)}=\sum_{i} w_{j i}^{(l)} z_{i}^{(l-1)}$, with $z_{i}^{(0)}=x_{i}$ for the inputs
Output/hidden units: $z_{j}^{(l)}=h^{(l)}\!\left(a_{j}^{(l)}\right)$, where $h^{(l)}$ is the activation function of layer $l$
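Chaining these two equations for a network with one hidden layer gives, for output unit $k$, the familiar nested form:
$$
y_{k}=h^{(2)}\!\left(\sum_{j} w_{k j}^{(2)}\, h^{(1)}\!\left(\sum_{i} w_{j i}^{(1)} x_{i}\right)\right)
$$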
Two stages:
1. Forward propagation: compute all $a_{j}^{(l)}$ and $z_{j}^{(l)}$
2. Backward propagation: compute all derivatives $\frac{\partial E_{n}}{\partial w_{j i}^{(l)}}$ by propagating errors back through the network (see the equations and sketch below)
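Stage 2 rests on the chain rule: writing $\delta_{j}^{(l)} \equiv \partial E_{n} / \partial a_{j}^{(l)}$, the gradient factorizes and the deltas satisfy a backward recursion,
$$
\frac{\partial E_{n}}{\partial w_{j i}^{(l)}}=\delta_{j}^{(l)}\, z_{i}^{(l-1)},
\qquad
\delta_{j}^{(l)}=h^{(l)\prime}\!\left(a_{j}^{(l)}\right) \sum_{k} w_{k j}^{(l+1)}\, \delta_{k}^{(l+1)},
$$
starting from the output layer (e.g. $\delta_{k}=y_{k}-t_{k}$ for a sum-of-squares error with linear output units).

A minimal NumPy sketch of both stages, assuming a sum-of-squares error $E_{n}=\tfrac{1}{2}\lVert\mathbf{y}-\mathbf{t}\rVert^{2}$, $\tanh$ hidden units, a linear output layer, and no bias terms; all names here are illustrative, not a reference implementation:

```python
import numpy as np

def forward(weights, x, h=np.tanh):
    """Forward pass: record every pre-activation a^(l) and output z^(l)."""
    zs, pre = [x], []
    z = x
    for l, W in enumerate(weights):
        a = W @ z                                   # a_j = sum_i w_ji z_i
        z = a if l == len(weights) - 1 else h(a)    # linear output layer
        pre.append(a)
        zs.append(z)
    return pre, zs

def backward(weights, pre, zs, t, h_prime=lambda a: 1.0 - np.tanh(a) ** 2):
    """Backward pass: propagate deltas and assemble dE_n/dw_ji per layer."""
    grads = [None] * len(weights)
    delta = zs[-1] - t        # output delta for sum-of-squares + linear output
    for l in reversed(range(len(weights))):
        grads[l] = np.outer(delta, zs[l])           # dE_n/dw_ji = delta_j z_i
        if l > 0:             # delta_j = h'(a_j) sum_k w_kj delta_k
            delta = h_prime(pre[l - 1]) * (weights[l].T @ delta)
    return grads

# Hypothetical toy network: 2 inputs -> 3 tanh hidden units -> 1 linear output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
x, t = np.array([0.5, -1.0]), np.array([0.2])
pre, zs = forward(weights, x)
grads = backward(weights, pre, zs, t)   # one gradient array per weight matrix
```

Because the output activation is linear and the error is sum-of-squares, the output delta simplifies to $y_{k}-t_{k}$; a different error function or output activation changes only that initial delta, not the recursion.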