# Backpropagation
With an error function that decomposes into a sum over data points,
$$
E(\mathbf{w})=\sum_{n=1}^{N} E_{n}(\mathbf{w}),
$$
the goal is to evaluate the per-example gradient
$$
\frac{\partial E_{n}(\mathbf{w})}{\partial \mathbf{w}},
$$
which drives the optimization of the parameters via [[Stochastic Gradient Descent]].
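For example, one stochastic gradient step uses a single term of the sum (here $\eta$ denotes the learning rate and $\tau$ the iteration index):
$$
\mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}-\eta\,\frac{\partial E_{n}\!\left(\mathbf{w}^{(\tau)}\right)}{\partial \mathbf{w}}
$$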
Output/hidden activations: $a_{j}^{(l)}=\sum_{i} w_{j i}^{(l)} z_{i}^{(l-1)}$, with $z_{i}^{(0)}=x_{i}$ for the inputs
Output/hidden units: $z_{j}^{(l)}=h^{(l)}\!\left(a_{j}^{(l)}\right)$, where $h^{(l)}$ is the activation function of layer $l$
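Chaining these two equations for a network with one hidden layer gives, for output unit $k$, the familiar nested form:
$$
y_{k}=h^{(2)}\!\left(\sum_{j} w_{k j}^{(2)}\, h^{(1)}\!\left(\sum_{i} w_{j i}^{(1)} x_{i}\right)\right)
$$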
Two stages:
1. Forward propagation: compute all $a_{j}^{(l)}$ and $z_{j}^{(l)}$
2. Backward propagation: compute all derivatives $\frac{\partial E_{n}}{\partial w_{j i}^{(l)}}$ by propagating errors back through the network (see the equations and sketch below)
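Stage 2 rests on the chain rule: writing $\delta_{j}^{(l)} \equiv \partial E_{n} / \partial a_{j}^{(l)}$, the gradient factorizes and the deltas satisfy a backward recursion,
$$
\frac{\partial E_{n}}{\partial w_{j i}^{(l)}}=\delta_{j}^{(l)}\, z_{i}^{(l-1)},
\qquad
\delta_{j}^{(l)}=h^{(l)\prime}\!\left(a_{j}^{(l)}\right) \sum_{k} w_{k j}^{(l+1)}\, \delta_{k}^{(l+1)},
$$
starting from the output layer (e.g. $\delta_{k}=y_{k}-t_{k}$ for a sum-of-squares error with linear output units).

A minimal NumPy sketch of both stages, assuming a sum-of-squares error $E_{n}=\tfrac{1}{2}\lVert\mathbf{y}-\mathbf{t}\rVert^{2}$, $\tanh$ hidden units, a linear output layer, and no bias terms; all names here are illustrative, not a reference implementation:

```python
import numpy as np

def forward(weights, x, h=np.tanh):
    """Forward pass: record every pre-activation a^(l) and output z^(l)."""
    zs, pre = [x], []
    z = x
    for l, W in enumerate(weights):
        a = W @ z                                   # a_j = sum_i w_ji z_i
        z = a if l == len(weights) - 1 else h(a)    # linear output layer
        pre.append(a)
        zs.append(z)
    return pre, zs

def backward(weights, pre, zs, t, h_prime=lambda a: 1.0 - np.tanh(a) ** 2):
    """Backward pass: propagate deltas and assemble dE_n/dw_ji per layer."""
    grads = [None] * len(weights)
    delta = zs[-1] - t        # output delta for sum-of-squares + linear output
    for l in reversed(range(len(weights))):
        grads[l] = np.outer(delta, zs[l])           # dE_n/dw_ji = delta_j z_i
        if l > 0:             # delta_j = h'(a_j) sum_k w_kj delta_k
            delta = h_prime(pre[l - 1]) * (weights[l].T @ delta)
    return grads

# Hypothetical toy network: 2 inputs -> 3 tanh hidden units -> 1 linear output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
x, t = np.array([0.5, -1.0]), np.array([0.2])
pre, zs = forward(weights, x)
grads = backward(weights, pre, zs, t)   # one gradient array per weight matrix
```

Because the output activation is linear and the error is sum-of-squares, the output delta simplifies to $y_{k}-t_{k}$; a different error function or output activation changes only that initial delta, not the recursion.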