# Universal Approximation Theorem
Let $f$ be any continuous function on a compact subset of $\mathbb{R}^D$ and let $h$ be any fixed analytic function that is not a polynomial (e.g. the logistic function, the tanh function, ...). Then, given any acceptable error $\epsilon>0$, we can find a number $M$ of hidden units and weights $w_m^{(2)}, w_{m d}^{(1)} \in \mathbb{R}$ such that:
$
\left|f(\boldsymbol{x})-y\left(\boldsymbol{x}, \boldsymbol{W}^{(1)}, \boldsymbol{w}^{(2)}\right)\right|<\epsilon
$
for all $\boldsymbol{x}$ in the domain of $f$, where $y$ denotes the output of a two-layer $\mathrm{NN}$ with $M$ hidden units and activation function $h$.
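One possible explicit form of this network (an assumption for illustration; the exact parameterization, e.g. whether bias terms are included, is not specified above) is
$
y\left(\boldsymbol{x}, \boldsymbol{W}^{(1)}, \boldsymbol{w}^{(2)}\right)=\sum_{m=1}^{M} w_m^{(2)}\, h\!\left(\sum_{d=1}^{D} w_{m d}^{(1)} x_d\right)
$
where $\boldsymbol{W}^{(1)}$ collects the hidden-layer weights $w_{m d}^{(1)}$ and $\boldsymbol{w}^{(2)}$ the output weights $w_m^{(2)}$.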
- For a smaller $\epsilon$ we need more hidden units $\Rightarrow$ a larger $M$ (see the numerical sketch below)
- We may also use deeper networks, which are often able to approximate more complex functions with fewer units
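As a concrete illustration, the sketch below builds such a two-layer tanh network by hand rather than by training, in the spirit of Nielsen's visual proof [1]: each hidden unit acts as an approximate step function, and summing $M$ of them yields a staircase that tracks $f$. The target function, the interval $[0, 1]$, and the steepness constant are arbitrary choices for this demo, not part of the theorem.

```python
import numpy as np

def two_layer_nn(x, M, f, lo=0.0, hi=1.0, steepness=200.0):
    """Hand-constructed single-hidden-layer tanh network that
    approximates f on [lo, hi] using M hidden units."""
    # Split [lo, hi] into M bins; each hidden unit is a steep tanh
    # "step" located at a bin's left edge, so the sum of all steps is
    # a staircase that matches f at the bin midpoints.
    edges = np.linspace(lo, hi, M + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    heights = f(mids)
    # jumps[m] is how much the staircase rises (or falls) at edge m.
    jumps = np.concatenate(([heights[0]], np.diff(heights)))
    y = np.zeros_like(x, dtype=float)
    for m in range(M):
        # 0.5 * (1 + tanh(k * (x - edge))) ~ unit step at edges[m]
        y += jumps[m] * 0.5 * (1.0 + np.tanh(steepness * (x - edges[m])))
    return y

# Example: the approximation error shrinks as M grows, i.e. a smaller
# epsilon requires a larger M, as noted in the list above.
f = lambda x: np.sin(2.0 * np.pi * x) + 0.3 * x
x = np.linspace(0.0, 1.0, 2000)
for M in (5, 20, 100):
    err = np.max(np.abs(f(x) - two_layer_nn(x, M, f)))
    print(f"M = {M:4d}   max |f(x) - y(x)| = {err:.4f}")
```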
Put informally, the universality theorem says that a neural network with a single hidden layer can approximate any continuous function (on a compact domain) to any desired precision, provided it has enough hidden units.
---
## References
1. Michael Nielsen's intuitive visual proof: http://neuralnetworksanddeeplearning.com/chap4.html