# Loss Functions
## Mean-Squared-Error Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Gaussian:
$p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(t \mid y(\mathbf{x},\mathbf{w}),\beta^{-1})$
Single target $\rightarrow$ single output unit: $y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)$
Targets are real-valued: identity output activation function:
$y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)=a^{\text {out }}$
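Assuming the $N$ data points are drawn independently, the likelihood of the targets factorizes as
$
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} \mathcal{N}\left(t_{n} \mid y\left(\mathbf{x}_{n}, \mathbf{w}\right), \beta^{-1}\right)
$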
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\frac{\beta}{2} \sum_{n=1}^{N}\left\{y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right\}^{2}-\frac{N}{2} \ln \beta+\frac{N}{2} \ln 2 \pi
$
Dropping the additive constants and the scale factor $\beta$, neither of which affects the location of the minimum, this is equivalent to minimizing
$
E(\mathbf{w})=\frac{1}{2} \sum_{n=1}^{N}\left\{y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right\}^{2}
$
commonly referred to as the Mean-Squared Error (MSE) or Quadratic Loss (strictly, MSE divides by $N$, which does not change the minimizer).
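As a minimal NumPy sketch (the names `sum_of_squares_error`, `y_pred`, and `t` are illustrative; `y_pred` stands in for the network outputs $y(\mathbf{x}_n, \mathbf{w})$):

```python
import numpy as np

def sum_of_squares_error(y_pred: np.ndarray, t: np.ndarray) -> float:
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * np.sum((y_pred - t) ** 2)

# Example: three real-valued predictions against their targets.
y_pred = np.array([0.9, 2.1, -0.3])
t = np.array([1.0, 2.0, 0.0])
print(sum_of_squares_error(y_pred, t))  # 0.5 * (0.01 + 0.01 + 0.09) ≈ 0.055
```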
## Binary Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Bernoulli, with the output predicting the probability of class 1:
$y(\mathbf{x}, \mathbf{w})=p(t=1 \mid \mathbf{x})$
$
p(t \mid \mathbf{x}, \mathbf{w})=y(\mathbf{x}, \mathbf{w})^{t}\left(1-y(\mathbf{x}, \mathbf{w})\right)^{1-t}
$
Targets are binary: [[Activation Functions#Logistic Sigmoid|Sigmoid]] output activation function:
$
y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)=\sigma\left(a^{\text {out }}\right)
$
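Again assuming independent data points, the likelihood is
$
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} y\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{n}}\left(1-y\left(\mathbf{x}_{n}, \mathbf{w}\right)\right)^{1-t_{n}}
$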
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\sum_{n=1}^{N}\left\{t_{n} \ln y\left(\mathbf{x}_{n}, \mathbf{w}\right)+\left(1-t_{n}\right) \ln \left(1-y\left(\mathbf{x}_{n}, \mathbf{w}\right)\right)\right\}
$
commonly referred to as Binary [[Cross entropy]] loss.
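A minimal NumPy sketch; the clipping constant `eps` is an implementation choice to avoid $\ln 0$, not part of the derivation:

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(y_pred: np.ndarray, t: np.ndarray, eps: float = 1e-12) -> float:
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)]."""
    y = np.clip(y_pred, eps, 1.0 - eps)  # keep predictions away from 0 and 1
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

# Example: sigmoid outputs for three binary targets.
a_out = np.array([2.0, -1.0, 0.5])  # pre-activations
y_pred = sigmoid(a_out)             # predicted p(t=1 | x)
t = np.array([1.0, 0.0, 1.0])
print(binary_cross_entropy(y_pred, t))
```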
## Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T},$ and one-hot encoded targets $\mathbf{T}=\left(\mathbf{t}_{1}, \ldots, \mathbf{t}_{N}\right)^{T}$
Assume the target distribution is a generalized Bernoulli (categorical) distribution:
$
p\left(\mathbf{t}_{n} \mid \mathbf{x}_{n}, \mathbf{w}\right)=\prod_{k=1}^{K} y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{nk}}
$
$K$ classes $\rightarrow$ $K$ output units: $y_{k}(\mathbf{x}, \mathbf{w})=h_{k}^{(L)}\left(\mathbf{a}^{\text {out }}\right)$
Categorical targets: [[Softmax]] output activation function:
$
y_{k}(\mathbf{x}, \mathbf{w})=h_{k}^{(L)}\left(\mathbf{a}^{\text {out }}\right)=\frac{\exp \left(a_{k}^{\text {out }}\right)}{\sum_{j=1}^{K} \exp \left(a_{j}^{\text {out }}\right)}
$
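Assuming independent data points, the likelihood over the whole data set is
$
p(\mathbf{T} \mid \mathbf{X}, \mathbf{w})=\prod_{n=1}^{N} \prod_{k=1}^{K} y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)^{t_{nk}}
$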
Maximum Likelihood/minimum negative log likelihood:
$
E(\mathbf{w})=-\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)
$
commonly referred to as [[Cross entropy]] loss.
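A minimal NumPy sketch, assuming `a_out` holds the pre-softmax activations and `T` the one-hot targets; the max-shift and log-sum-exp are a standard numerical-stability trick, not part of the derivation:

```python
import numpy as np

def cross_entropy(a_out: np.ndarray, T: np.ndarray) -> float:
    """E(w) = -sum_n sum_k t_nk ln y_k(x_n, w), with y = softmax(a_out).

    a_out: (N, K) pre-softmax activations; T: (N, K) one-hot targets.
    Computes ln y_k via log-sum-exp rather than softmax followed by log,
    which avoids overflow for large activations.
    """
    a_shift = a_out - a_out.max(axis=1, keepdims=True)  # stable shift
    log_y = a_shift - np.log(np.exp(a_shift).sum(axis=1, keepdims=True))
    return -np.sum(T * log_y)

# Example: N=2 points, K=3 classes.
a_out = np.array([[2.0, 0.5, -1.0],
                  [0.1, 0.2, 3.0]])
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
print(cross_entropy(a_out, T))
```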
## Other Loss Functions
[[Polyloss]]