# Gaussian Distribution
In the univariate case, the Gaussian distribution is given by
$
p\left(x ; \mu, \sigma^{2}\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}\right)
$
The coefficient in front, $\frac{1}{\sqrt{2 \pi} \sigma}$, is a constant that does not depend on $x$; hence, we can think of it as simply a "normalization factor" used to ensure that
$
\frac{1}{\sqrt{2 \pi} \sigma} \int_{-\infty}^{\infty} \exp \left(-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}\right) \mathrm{d} x=1
$
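As a quick sanity check, here is a minimal sketch (numpy and scipy assumed; $\mu$ and $\sigma$ are arbitrary example values) that evaluates this density and verifies the normalization numerically:
```python
import numpy as np
from scipy.integrate import quad

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    coeff = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

mu, sigma = 1.5, 0.7  # example values
total, _ = quad(gaussian_pdf, -np.inf, np.inf, args=(mu, sigma))
print(total)  # ~1.0, confirming the normalization factor
```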
### Expectation
$\mathbb{E}[x]=\int_{-\infty}^{\infty} \mathcal{N}\left(x \mid \mu, \sigma^{2}\right) x \mathrm{d} x=\mu$
$\mathbb{E}\left[x^{2}\right]=\int_{-\infty}^{\infty} \mathcal{N}\left(x \mid \mu, \sigma^{2}\right) x^{2} \mathrm{d} x=\mu^{2}+\sigma^{2}$
$\operatorname{var}[x]=\mathbb{E}\left[x^{2}\right]-\mathbb{E}[x]^{2}=\sigma^{2}$
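A short Monte Carlo sketch (numpy assumed; example $\mu$ and $\sigma$) confirming these three identities empirically:
```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5  # example values
x = rng.normal(mu, sigma, size=1_000_000)

print(x.mean())        # ~ mu
print((x**2).mean())   # ~ mu^2 + sigma^2
print(x.var())         # ~ sigma^2
```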
### Multivariate Gaussian Distribution
A vector-valued random variable $X=\left[X_{1} \cdots X_{n}\right]^{T}$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbf{R}^{n}$ and covariance matrix $\Sigma \in \mathbf{S}_{++}^{n}$ if its probability density function is given by
$
p(x ; \mu, \Sigma)=\frac{1}{(2 \pi)^{n / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right)
$
Note that $\mathbf{S}_{++}^{n}$ is the space of symmetric positive definite $n \times n$ matrices, defined as $\mathbf{S}_{++}^{n}=\left\{A \in \mathbf{R}^{n \times n}: A=A^{T}\right.$ and $x^{T} A x>0$ for all $x \in \mathbf{R}^{n}$ such that $\left.x \neq 0\right\}$.
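The formula can be checked directly against a library implementation. The sketch below (scipy assumed; example $\mu$ and $\Sigma$) evaluates the density from the definition and compares it to `scipy.stats.multivariate_normal`:
```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, computed from the definition."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = (2.0 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm_const

mu = np.array([0.0, 1.0])          # example mean
Sigma = np.array([[2.0, 0.3],      # example covariance (positive definite)
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))  # should agree
```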
## Properties of Gaussian Distributions
### Marginalization property
If two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian.
Consider a distribution:
$
p\left(x_{1}, x_{2}\right)=\mathcal{N}\left(\left[\begin{array}{l}
x_{1} \\
x_{2}
\end{array}\right] \mid\left[\begin{array}{l}
\mu_{1} \\
\mu_{2}
\end{array}\right],\left[\begin{array}{ll}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{array}\right]\right)
$
Then the marginals are given by
$
p\left(x_{1}\right)=\mathcal{N}\left(x_{1} \mid \mu_{1}, \Sigma_{11}\right)
$
$
p\left(x_{2}\right)=\mathcal{N}\left(x_{2} \mid \mu_{2}, \Sigma_{22}\right)
$
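In code, marginalization is just slicing: the relevant blocks of the mean and covariance are read off directly. A minimal sketch with hypothetical example numbers (each block one-dimensional):
```python
import numpy as np

mu = np.array([1.0, -2.0])         # [mu1, mu2]
Sigma = np.array([[1.0, 0.8],      # [[Sigma11, Sigma12],
                  [0.8, 2.0]])     #  [Sigma21, Sigma22]]

mu1, Sigma11 = mu[0], Sigma[0, 0]  # p(x1) = N(mu1, Sigma11)
mu2, Sigma22 = mu[1], Sigma[1, 1]  # p(x2) = N(mu2, Sigma22)
print(mu1, Sigma11, mu2, Sigma22)
```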
### Conditioning property
Similarly, the conditional for the above distribution is another Gaussian:
$
p\left(x_{1} \mid x_{2}\right)=\mathcal{N}\left(\mu_{1 \mid 2}, \Sigma_{1 \mid 2}\right)
$
with
$
\begin{array}{l}
\mu_{1 \mid 2}=\mu_{1}+\Sigma_{12} \Sigma_{22}^{-1}\left(x_{2}-\mu_{2}\right) \\
\Sigma_{1 \mid 2}=\Sigma_{11}-\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
\end{array}
$
![[conditional and marginal.jpg]]
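A direct implementation of the conditioning formulas above, with hypothetical one-dimensional example blocks for $\mu$ and $\Sigma$:
```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S21, S22, x2):
    """Return mean and covariance of p(x1 | x2) for a joint Gaussian."""
    K = S12 @ np.linalg.inv(S22)       # Sigma12 Sigma22^{-1}
    mu_cond = mu1 + K @ (x2 - mu2)     # mu_{1|2}
    Sigma_cond = S11 - K @ S21         # Sigma_{1|2}
    return mu_cond, Sigma_cond

# Example blocks (each 1x1 here for readability)
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11 = np.array([[1.0]]); S12 = np.array([[0.6]])
S21 = S12.T;             S22 = np.array([[2.0]])

print(condition_gaussian(mu1, mu2, S11, S12, S21, S22, x2=np.array([2.0])))
```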
### Summation property
The sum of two independent Gaussian random variables is also a Gaussian random variable.
If
$
\begin{array}{l}
x \sim \mathcal{N}(\mu, \Sigma) \\
y \sim \mathcal{N}\left(\mu^{\prime}, \Sigma^{\prime}\right)
\end{array}
$
Then $z=x+y \quad \rightarrow \quad z \sim \mathcal{N}\left(\mu+\mu^{\prime}, \Sigma+\Sigma^{\prime}\right)$
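An empirical sketch of this property (numpy assumed; example parameters): sum samples from two independent bivariate Gaussians and check that the sample mean and covariance match $\mu+\mu^{\prime}$ and $\Sigma+\Sigma^{\prime}$:
```python
import numpy as np

rng = np.random.default_rng(1)
mu,   Sigma   = np.array([0.0, 1.0]),  np.array([[1.0, 0.2], [0.2, 1.0]])
mu_p, Sigma_p = np.array([2.0, -1.0]), np.array([[0.5, 0.0], [0.0, 0.3]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
y = rng.multivariate_normal(mu_p, Sigma_p, size=500_000)
z = x + y  # independent by construction

print(z.mean(axis=0))           # ~ mu + mu'
print(np.cov(z, rowvar=False))  # ~ Sigma + Sigma'
```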
### Reparameterization Trick
If we sample a vector $\mathbf{x}$ from a Gaussian $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and if $\mathbf{y}=\boldsymbol{\mu}+\mathbf{A x}$, then we have
$
\mathbf{y} \sim \mathcal{N}\left(\boldsymbol{\mu}, \mathbf{A} \mathbf{A}^{T}\right)
$
where $\mathbf{\Sigma}=\mathbf{A} \mathbf{A}^{T}$
This means that if you have access to a sampler for uncorrelated Gaussian variables, you can create correlated samples for a given mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$:
$
y \sim \mathcal{N}(\mu, \Sigma)
$
For a given $\boldsymbol{\Sigma}$, you can compute $\mathbf{A}$ with a Cholesky decomposition $\mathbf{\Sigma}=\mathbf{A A}^{T}$, such that $\mathbf{A}$ is lower triangular. Alternatively, you can compute the eigendecomposition $\mathbf{\Sigma}=\mathbf{U} \mathbf{\Lambda} \mathbf{U}^{T}$ and take $\mathbf{A}=\mathbf{U} \mathbf{\Lambda}^{1 / 2}$.
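A sketch of the trick via the Cholesky route (numpy assumed; example $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$):
```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])         # example target mean
Sigma = np.array([[2.0, 0.5],      # example target covariance
                  [0.5, 1.0]])

A = np.linalg.cholesky(Sigma)          # lower triangular, Sigma = A A^T
x = rng.standard_normal((100_000, 2))  # x ~ N(0, I), uncorrelated sampler
y = mu + x @ A.T                       # y = mu + A x, so y ~ N(mu, Sigma)

print(y.mean(axis=0))           # ~ mu
print(np.cov(y, rowvar=False))  # ~ Sigma
```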
### Approximating other distributions
Any distribution in $d$ dimensions can be generated by taking a set of $d$ normally distributed variables and mapping them through a sufficiently complicated function (e.g., a neural network). This trick is the core idea behind [[Variational Autoencoders]].
By using a sufficient number of Gaussians in a linear combination, and by adjusting their means and covariances as well as the mixing coefficients, almost any continuous density can be approximated to arbitrary accuracy.
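As a small illustration of this point, a sketch (scipy assumed; hypothetical example weights, means, and scales) of a two-component mixture whose density is bimodal, something no single Gaussian can represent:
```python
import numpy as np
from scipy.stats import norm

xs = np.linspace(-6.0, 6.0, 1201)
# Two-component mixture: weights 0.4/0.6, example means and scales
mixture = 0.4 * norm.pdf(xs, loc=-2.0, scale=0.8) \
        + 0.6 * norm.pdf(xs, loc=2.0, scale=1.2)

dx = xs[1] - xs[0]
print(mixture.sum() * dx)  # ~1: the mixture is still a normalized density
```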
## Some useful results
The derivative of the density with respect to $\boldsymbol{\mu}_k$ (when the covariance matrix is positive definite) is given by
$
\frac{\partial}{\partial \boldsymbol{\mu}_{k}} \mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}_{k}, \mathbf{\Sigma}_{k}\right)={\mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}_{k}, \mathbf{\Sigma}_{k}\right)}\left(\mathbf{x}-\boldsymbol{\mu}_{k}\right)^{T} \mathbf{\Sigma}_{k}^{-1}
$
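A finite-difference check of this gradient (scipy assumed; example $\mathbf{x}$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$):
```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.array([0.5, -0.2])             # example point
mu_k = np.array([0.0, 0.0])           # example mean
Sigma_k = np.array([[1.0, 0.3],       # example positive definite covariance
                    [0.3, 2.0]])

pdf = multivariate_normal(mu_k, Sigma_k).pdf(x)
# N(x|mu_k, Sigma_k) Sigma_k^{-1} (x - mu_k), as a vector of partials
analytic = pdf * np.linalg.solve(Sigma_k, x - mu_k)

eps = 1e-6
numeric = np.array([
    (multivariate_normal(mu_k + eps * e, Sigma_k).pdf(x) - pdf) / eps
    for e in np.eye(2)
])
print(analytic, numeric)  # should agree to ~1e-5
```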
## Limitations of Gaussian Distribution
1. Very sensitive to outliers, since the estimated mean is pulled toward them.
2. Unimodal, which limits the densities it can represent (but this can be handled with latent variables, giving rise to the [[Gaussian Mixture Model]]).
---
## References
1. http://cs229.stanford.edu/section/gaussians.pdf
2. Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*, Section 2.3