# Universal Approximation Theorem

Let $f$ be any continuous function on a compact subset of $\mathbb{R}^D$ and $h$ any fixed analytic function that is not a polynomial (e.g. the logistic function, the tanh function, ...). Given any small acceptable error $\epsilon>0$, we can find a number $M$ and weights $w_m^{(2)}, w_{m d}^{(1)} \in \mathbb{R}$ such that

$$
\left|f(\boldsymbol{x})-y\left(\boldsymbol{x}, \boldsymbol{W}^{(1)}, \boldsymbol{w}^{(2)}\right)\right|<\epsilon \quad \text{for all } \boldsymbol{x},
$$

where $y$ is a two-layer NN with $M$ hidden units, e.g. (bias terms omitted for brevity)

$$
y\left(\boldsymbol{x}, \boldsymbol{W}^{(1)}, \boldsymbol{w}^{(2)}\right)=\sum_{m=1}^{M} w_m^{(2)}\, h\!\left(\sum_{d=1}^{D} w_{m d}^{(1)} x_d\right).
$$

- For smaller $\epsilon$ we need more hidden units $\Rightarrow$ larger $M$.
- We may also use deeper networks, which are usually able to approximate more complex functions with fewer units.

In words, the universality theorem states that a neural network with a single hidden layer can approximate any continuous function (on a compact domain) to any desired precision. A minimal numerical sketch of this is given after the references.

---

## References

1. Michael Nielsen's intuitive visual proof: http://neuralnetworksanddeeplearning.com/chap4.html
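---

## Numerical sketch

The following is a minimal sketch (not part of the original notes) of the theorem in action: a two-layer tanh network $y(x)=\sum_m w_m^{(2)} \tanh(w_m^{(1)} x + b_m)$ is fit to $f(x)=\sin(x)$ on a compact interval by plain gradient descent. The target function, $M=20$, the learning rate, and the number of steps are arbitrary illustrative choices; increasing $M$ (and training longer) typically shrinks the remaining error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function f on a compact interval, sampled on a grid
xs = np.linspace(-3.0, 3.0, 200)
f = np.sin(xs)

M = 20                          # number of hidden units
w1 = rng.normal(size=M)         # input-to-hidden weights
b = rng.normal(size=M)          # hidden biases
w2 = 0.1 * rng.normal(size=M)   # hidden-to-output weights

lr = 0.01
n = len(xs)
for step in range(20000):
    # Forward pass: hidden activations and network output y(x)
    z = np.outer(xs, w1) + b          # shape (n, M)
    hidden = np.tanh(z)               # shape (n, M)
    y = hidden @ w2                   # shape (n,)

    # Mean squared error and its gradients (backprop written out by hand)
    err = y - f                                         # shape (n,)
    grad_w2 = hidden.T @ err / n
    grad_z = np.outer(err, w2) * (1.0 - hidden**2)      # dLoss/dz
    grad_w1 = (grad_z * xs[:, None]).sum(axis=0) / n
    grad_b = grad_z.sum(axis=0) / n

    # Gradient descent step
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1
    b -= lr * grad_b

# Worst-case approximation error on the grid (plays the role of epsilon)
y = np.tanh(np.outer(xs, w1) + b) @ w2
print("max |f(x) - y(x)| on the grid:", np.abs(f - y).max())
```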