# Loss Functions
## Mean-Squared-Error Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Gaussian:
$p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(t \mid y(\mathbf{x},\mathbf{w}),\beta^{-1})$
Single target $\rightarrow$ single output unit: $y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)$
Targets are real-valued: identity output activation function:
$y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)=a^{\text {out }}$
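Assuming the $N$ data points are drawn independently, the likelihood of the targets factorizes as
$
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} \mathcal{N}\left(t_{n} \mid y\left(\mathbf{x}_{n}, \mathbf{w}\right), \beta^{-1}\right)
$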
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\frac{\beta}{2} \sum_{n=1}^{N}\left\{y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right\}^{2}-\frac{N}{2} \ln \beta+\frac{N}{2} \ln 2 \pi
$
Dropping the additive constants and the scale factor $\beta$, neither of which affects the location of the minimum, this is equivalent to minimizing
$
E(\mathbf{w})=\frac{1}{2} \sum_{n=1}^{N}\left\{y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right\}^{2}
$
commonly referred to as the Mean-Squared Error (MSE) or Quadratic Loss (strictly, MSE divides by $N$, which does not change the minimizer).
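As a minimal NumPy sketch (the names `sum_of_squares_error`, `y_pred`, and `t` are illustrative; `y_pred` stands in for the network outputs $y(\mathbf{x}_n, \mathbf{w})$):

```python
import numpy as np

def sum_of_squares_error(y_pred: np.ndarray, t: np.ndarray) -> float:
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * np.sum((y_pred - t) ** 2)

# Example: three real-valued predictions against their targets.
y_pred = np.array([0.9, 2.1, -0.3])
t = np.array([1.0, 2.0, 0.0])
print(sum_of_squares_error(y_pred, t))  # 0.5 * (0.01 + 0.01 + 0.09) ≈ 0.055
```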
## Binary Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Bernoulli, with the output predicting the probability of class 1:
$y(\mathbf{x}, \mathbf{w})=p(t=1 \mid \mathbf{x})$
$
p(t \mid \mathbf{x}, \mathbf{w})=y(\mathbf{x}, \mathbf{w})^{t}\left(1-y(\mathbf{x}, \mathbf{w})\right)^{1-t}
$
Targets are binary: [[Activation Functions#Logistic Sigmoid|Sigmoid]] output activation function:
$
y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)=\sigma\left(a^{\text {out }}\right)
$
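Again assuming independent data points, the likelihood is
$
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} y\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{n}}\left(1-y\left(\mathbf{x}_{n}, \mathbf{w}\right)\right)^{1-t_{n}}
$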
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\sum_{n=1}^{N}\left\{t_{n} \ln y\left(\mathbf{x}_{n}, \mathbf{w}\right)+\left(1-t_{n}\right) \ln \left(1-y\left(\mathbf{x}_{n}, \mathbf{w}\right)\right)\right\}
$
commonly referred to as Binary [[Cross entropy]] loss.
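A minimal NumPy sketch; the clipping constant `eps` is an implementation choice to avoid $\ln 0$, not part of the derivation:

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(y_pred: np.ndarray, t: np.ndarray, eps: float = 1e-12) -> float:
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)]."""
    y = np.clip(y_pred, eps, 1.0 - eps)  # keep predictions away from 0 and 1
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

# Example: sigmoid outputs for three binary targets.
a_out = np.array([2.0, -1.0, 0.5])  # pre-activations
y_pred = sigmoid(a_out)             # predicted p(t=1 | x)
t = np.array([1.0, 0.0, 1.0])
print(binary_cross_entropy(y_pred, t))
```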
## Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and one-hot encoded targets $\mathbf{T}=\left(\mathbf{t}_{1}, \ldots, \mathbf{t}_{N}\right)^{T}$
Assume the target distribution is a generalized Bernoulli (categorical) distribution:
$
p\left(\mathbf{t}_{n} \mid \mathbf{x}_{n}, \mathbf{w}\right)=\prod_{k=1}^{K} y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{nk}}
$
$K$ classes $\rightarrow$ $K$ output units: $y_{k}(\mathbf{x}, \mathbf{w})=h_{k}^{(L)}\left(\mathbf{a}^{\text {out }}\right)$
Categorical targets: [[Softmax]] output activation function:
$
y_{k}(\mathbf{x}, \mathbf{w})=h_{k}^{(L)}\left(\mathbf{a}^{\text {out }}\right)=\frac{\exp \left(a_{k}^{\text {out }}\right)}{\sum_{j=1}^{K} \exp \left(a_{j}^{\text {out }}\right)}
$
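Assuming independent data points, the likelihood over the whole data set is
$
p(\mathbf{T} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} \prod_{k=1}^{K} y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{nk}}
$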
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)
$
commonly referred to as [[Cross entropy]] loss.
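A minimal NumPy sketch, assuming `a_out` holds the pre-softmax activations and `T` the one-hot targets; the max-shift and log-sum-exp are a standard numerical-stability trick, not part of the derivation:

```python
import numpy as np

def cross_entropy(a_out: np.ndarray, T: np.ndarray) -> float:
    """E(w) = -sum_n sum_k t_nk ln y_k(x_n, w), with y = softmax(a_out).

    a_out: (N, K) pre-softmax activations; T: (N, K) one-hot targets.
    Computes ln y_k via log-sum-exp rather than softmax followed by log,
    which avoids overflow for large activations.
    """
    a_shift = a_out - a_out.max(axis=1, keepdims=True)  # stable shift
    log_y = a_shift - np.log(np.exp(a_shift).sum(axis=1, keepdims=True))
    return -np.sum(T * log_y)

# Example: N=2 points, K=3 classes.
a_out = np.array([[2.0, 0.5, -1.0],
                  [0.1, 0.2, 3.0]])
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
print(cross_entropy(a_out, T))
```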
## Other Loss Functions
[[Polyloss]]