# Gaussian Distribution

In the univariate case, the Gaussian distribution is given by

$$ p\left(x ; \mu, \sigma^{2}\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}\right) $$

The coefficient in front, $\frac{1}{\sqrt{2 \pi} \sigma}$, is a constant that does not depend on $x$; hence we can think of it as simply a "normalization factor" that ensures

$$ \frac{1}{\sqrt{2 \pi} \sigma} \int_{-\infty}^{\infty} \exp \left(-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}\right) \mathrm{d} x=1 $$

### Expectation

$$ \mathbb{E}[x]=\int_{-\infty}^{\infty} \mathcal{N}\left(x \mid \mu, \sigma^{2}\right) x \,\mathrm{d} x=\mu $$

$$ \mathbb{E}\left[x^{2}\right]=\int_{-\infty}^{\infty} \mathcal{N}\left(x \mid \mu, \sigma^{2}\right) x^{2} \,\mathrm{d} x=\mu^{2}+\sigma^{2} $$

$$ \operatorname{var}[x]=\mathbb{E}\left[x^{2}\right]-\mathbb{E}[x]^{2}=\sigma^{2} $$

### Multivariate Gaussian Distribution

A vector-valued random variable $X=\left[X_{1} \cdots X_{n}\right]^{T}$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbf{R}^{n}$ and covariance matrix $\Sigma \in \mathbf{S}_{++}^{n}$ if its probability density function is given by

$$ p(x ; \mu, \Sigma)=\frac{1}{(2 \pi)^{n / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right) $$

Note that $\mathbf{S}_{++}^{n}$ is the space of symmetric positive definite $n \times n$ matrices, defined as

$$ \mathbf{S}_{++}^{n}=\left\{A \in \mathbf{R}^{n \times n}: A=A^{T} \text{ and } x^{T} A x>0 \text{ for all } x \in \mathbf{R}^{n} \text{ such that } x \neq 0\right\} $$

## Properties of Gaussian Distributions

### Marginalization property

If two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian.

Consider a distribution

$$ p\left(x_{1}, x_{2}\right)=\mathcal{N}\left(\left[\begin{array}{l} x_{1} \\ x_{2} \end{array}\right] \Bigm| \left[\begin{array}{l} \mu_{1} \\ \mu_{2} \end{array}\right],\left[\begin{array}{ll} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array}\right]\right) $$

Then the marginals are given by

$$ p\left(x_{1}\right)=\mathcal{N}\left(x_{1} \mid \mu_{1}, \Sigma_{11}\right) $$

$$ p\left(x_{2}\right)=\mathcal{N}\left(x_{2} \mid \mu_{2}, \Sigma_{22}\right) $$

### Conditioning property

Similarly, the conditional for the above distribution is another Gaussian

$$ p\left(x_{1} \mid x_{2}\right)=\mathcal{N}\left(\mu_{1 \mid 2}, \Sigma_{1 \mid 2}\right) $$

with

$$ \begin{aligned} \mu_{1 \mid 2} &= \mu_{1}+\Sigma_{12} \Sigma_{22}^{-1}\left(x_{2}-\mu_{2}\right) \\ \Sigma_{1 \mid 2} &= \Sigma_{11}-\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \end{aligned} $$

![[conditional and marginal.jpg]]

### Summation property

The sum of two independent Gaussian random variables is also a Gaussian random variable. If

$$ \begin{aligned} x &\sim \mathcal{N}(\mu, \Sigma) \\ y &\sim \mathcal{N}\left(\mu^{\prime}, \Sigma^{\prime}\right) \end{aligned} $$

then

$$ z=x+y \quad \rightarrow \quad z \sim \mathcal{N}\left(\mu+\mu^{\prime}, \Sigma+\Sigma^{\prime}\right) $$

### Reparameterization Trick

If we sample a vector $\mathbf{x}$ from a Gaussian $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and if $\mathbf{y}=\boldsymbol{\mu}+\mathbf{A} \mathbf{x}$, then we have

$$ \mathbf{y} \sim \mathcal{N}\left(\boldsymbol{\mu}, \mathbf{A} \mathbf{A}^{T}\right) $$

where $\boldsymbol{\Sigma}=\mathbf{A} \mathbf{A}^{T}$.

This means that if you have access to a sampler for uncorrelated Gaussian variables, you can create correlated samples

$$ \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) $$

for a given mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. For a given $\boldsymbol{\Sigma}$ you can compute $\boldsymbol{\Sigma}=\mathbf{A} \mathbf{A}^{T}$ with the Cholesky decomposition, such that $\mathbf{A}$ is lower triangular. Alternatively, you can compute the eigendecomposition $\boldsymbol{\Sigma}=\mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{T}$ and take $\mathbf{A}=\mathbf{U} \boldsymbol{\Lambda}^{1 / 2}$.
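As a quick sanity check tying these properties together, here is a minimal NumPy sketch: it draws correlated samples via the Cholesky-based reparameterization above and compares the conditioning formulas against empirical moments. The 2-D mean, covariance, and conditioning value are made-up illustration values, not from the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D parameters (arbitrary values chosen for this sketch)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Reparameterization: y = mu + A x with x ~ N(0, I) and A A^T = Sigma
A = np.linalg.cholesky(Sigma)         # lower-triangular factor
x = rng.standard_normal((200_000, 2))
y = mu + x @ A.T                      # rows are samples y ~ N(mu, Sigma)

print(y.mean(axis=0))                 # ~ mu
print(np.cov(y, rowvar=False))        # ~ Sigma

# Conditioning: analytic parameters of p(y1 | y2 = 0) from the formulas above
y2 = 0.0
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y2 - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

# Empirical check: keep samples whose second coordinate lies near y2
sel = y[np.abs(y[:, 1] - y2) < 0.05, 0]
print(mu_cond, var_cond)              # analytic conditional moments
print(sel.mean(), sel.var())          # empirical conditional moments
```

NumPy's built-in `Generator.multivariate_normal` performs essentially this factor-and-transform internally.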
### Approximating other distributions

Any distribution in $d$ dimensions can be generated by taking a set of $d$ variables that are normally distributed and mapping them through a sufficiently complicated function (e.g., a neural network). This trick is the core idea behind [[Variational Autoencoders]]. Similarly, by using a sufficient number of Gaussians, and by adjusting their means and covariances as well as the coefficients in the linear combination, almost any continuous density can be approximated to arbitrary accuracy.

## Some useful results

The derivative of the density with respect to $\boldsymbol{\mu}_{k}$ (when the covariance matrix is positive definite) is given by

$$ \frac{\partial}{\partial \boldsymbol{\mu}_{k}} \mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}_{k}, \boldsymbol{\Sigma}_{k}\right)=\mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}_{k}, \boldsymbol{\Sigma}_{k}\right)\left(\mathbf{x}-\boldsymbol{\mu}_{k}\right)^{T} \boldsymbol{\Sigma}_{k}^{-1} $$

which follows from differentiating the quadratic form in the exponent (a numerical check appears at the end of this note).

## Limitations of Gaussian Distribution

1. Very sensitive to outliers, since the mean is affected by them.
2. Unimodal (but this can be handled with latent variables, giving rise to the [[Gaussian Mixture Model]]).

---

## References

1. CS229 section notes on Gaussians: http://cs229.stanford.edu/section/gaussians.pdf
2. Bishop (2006), Section 2.3
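As referenced under "Some useful results" above, here is a quick finite-difference check of the density derivative. The test point and parameters are made-up values, and `scipy` is assumed to be available.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up test values for the check
mu = np.array([0.5, -0.5])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 0.8]])
x = np.array([1.0, 2.0])

# Analytic gradient: N(x | mu, Sigma) * Sigma^{-1} (x - mu)
# (the row form (x - mu)^T Sigma^{-1} has the same components since Sigma is symmetric)
pdf = multivariate_normal(mu, Sigma).pdf(x)
grad = pdf * np.linalg.solve(Sigma, x - mu)

# Central finite differences, one coordinate of mu at a time
eps = 1e-6
fd = np.array([
    (multivariate_normal(mu + eps * e, Sigma).pdf(x)
     - multivariate_normal(mu - eps * e, Sigma).pdf(x)) / (2 * eps)
    for e in np.eye(2)
])

print(grad)  # analytic
print(fd)    # numerical; should agree to roughly 1e-9
```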