# Monte-Carlo Estimation
## Motivation
We are often interested in computing quantities (statistics) of random variables, for example:
- the average response to a drug
- probability of a particular sum when throwing two dice
- the average reconstruction of a VAE given an input.
These statistics are often intractable to compute:
- Cannot derive a perfect drug response model (too complex)
- Cannot enumerate all possible dice combinations (too lazy)
- Computationally infeasible (intractable integrals)
## Monte Carlo Integration
Use random sampling instead of analytical computation. A single random sample might not be enough, but many random samples can give us a reliable estimate. E.g., by throwing two dice many times we can obtain a histogram of probabilities for each possible sum. If we throw the dice only once, the histogram will be very wrong (just a single bar), but if we repeat hundreds of times and average, we get very close.
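A minimal sketch of this dice experiment in NumPy (the sample size `n` and the random seed are arbitrary choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Throw two dice n times and estimate the probability of each possible sum (2..12).
n = 100_000
throws = rng.integers(1, 7, size=(n, 2)).sum(axis=1)
sums, counts = np.unique(throws, return_counts=True)
probs = counts / n

for s, p in zip(sums, probs):
    print(f"P(sum = {s:2d}) ~ {p:.3f}")
# With n = 1 the "histogram" is a single bar; as n grows, the estimates
# approach the exact values (e.g. P(sum = 7) = 6/36 ~ 0.167).
```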
More formally, in MC integration we treat the inputs $x$ as random variables with pdf $p(x)$. Our desired statistic $y$ is the output, obtained by integrating $f$ over all possible $x$:
$
y=\int_{x} f(x) p(x) d x
$
This integral is equivalent to an expectation
$
y = \int_{x} f(x) p(x) d x = \mathbb{E}_{x \sim p(x)}[f(x)]
$
We can approximate this expectation by random sampling and summation
$y=\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right)=\hat{y}$ where $x_{i}$ is sampled from $p(x)$.
Note that $\hat{y}$ is called an estimator because it only approximates the true value of $y$.
If we want to compute a quantity $y$ that we can express as an integral of a function $f$ over a probability space with a known and easy-to-sample pdf $p(x)$, we can replace the exact but intractable computation with a tractable MC estimator:
$
y=\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right), x_{i} \sim p(x)
$
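A small, self-contained illustration of the estimator, using a toy choice of $f(x) = x^2$ and $p(x) = \mathcal{N}(0, 1)$ (not from the lecture), for which the exact value is $y = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: y = E_{x ~ N(0, 1)}[x^2], which is exactly 1 (the variance of x).
def f(x):
    return x ** 2

n = 10_000
x = rng.standard_normal(n)      # x_i ~ p(x) = N(0, 1)
y_hat = f(x).mean()             # (1/n) * sum_i f(x_i)

print(f"MC estimate: {y_hat:.4f}  (exact value: 1.0)")
```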
If we cannot express the quantity as such an integral, we cannot estimate it with MC. For instance, we cannot use MC on the following, because neither $\log p(\boldsymbol{x} \mid \mathbf{z})$ nor $\nabla_{\varphi} q_{\varphi}(\mathbf{z} \mid \boldsymbol{x})$ is a probability density:
$
\nabla_{\varphi} \mathbb{E}_{\mathbf{z} \sim q_{\varphi}(\mathbf{z} \mid x)}[\log p(\boldsymbol{x} \mid \mathbf{z})]=\int_{Z} \log p(\boldsymbol{x} \mid \mathbf{z}) \nabla_{\varphi} q_{\varphi}(\mathbf{z} \mid \boldsymbol{x}) d \mathbf{z}
$
## Estimator mean and variance
Our estimator is itself a random variable, so it has its own mean $\mu_{\hat{y}}=\mathbb{E}[\hat{y}]$ and variance $\operatorname{Var}[\hat{y}]=\mathbb{E}\left[\left(\hat{y}-\mu_{\hat{y}}\right)^{2}\right]$.
The higher the variance, the more the estimation fluctuates after every new experiment.
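To see this empirically, we can rerun the toy $\mathbb{E}[x^2]$ estimator from above many times and inspect the resulting distribution of $\hat{y}$ (the sample size and repetition count below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(n):
    """One MC estimate of E[x^2] with x ~ N(0, 1), using n samples."""
    x = rng.standard_normal(n)
    return (x ** 2).mean()

# Repeat the whole experiment many times: y_hat is itself a random variable.
estimates = np.array([mc_estimate(n=100) for _ in range(10_000)])
print(f"mean of y_hat:     {estimates.mean():.4f}")   # close to the true y = 1
print(f"variance of y_hat: {estimates.var():.5f}")    # fluctuation across experiments
```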
### Estimator bias
An estimator is unbiased if, in expectation, it matches the true statistic
$
\mathbb{E}[\hat{y}]=y
$
Otherwise, it is biased, with bias
$
\text{bias} = \mathbb{E}[\hat{y}] - y
$
It is better to have unbiased estimators, but in some cases a bit of bias is acceptable: we trade some accuracy for tractability.
MC estimators are unbiased, and by the law of large numbers they converge to the true value: "As the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean."
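For completeness (this step is not spelled out in the lecture), unbiasedness can also be verified directly via linearity of expectation, since each $x_i$ is drawn from $p(x)$:
$
\mathbb{E}[\hat{y}]=\mathbb{E}\left[\frac{1}{n} \sum_{i} f\left(x_{i}\right)\right]=\frac{1}{n} \sum_{i} \mathbb{E}_{x_{i} \sim p(x)}\left[f\left(x_{i}\right)\right]=\mathbb{E}_{x \sim p(x)}[f(x)]=y
$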
## Standard error of MC estimator
The MC estimator is a sample mean
$
\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right)
$
The standard error of a sample mean, where $\sigma$ is the standard deviation of a single sample $f(x)$, is
$
\sigma_{\hat{y}}=\frac{\sigma}{\sqrt{n}}
$
The more samples we take, the less the estimator deviates, but the deviation shrinks only as $1 / \sqrt{n}$: with $4\times$ more samples we reduce the error only by a factor of two.
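A quick numerical check of this $1 / \sqrt{n}$ behaviour, again on the toy $\mathbb{E}_{x \sim \mathcal{N}(0,1)}[x^2]$ example (sample sizes and repetition count are arbitrary): quadrupling $n$ from 100 to 400 roughly halves the empirical standard deviation of $\hat{y}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def standard_error(n, repeats=20_000):
    """Empirical std of the MC estimate of E[x^2], x ~ N(0, 1), over many reruns."""
    x = rng.standard_normal((repeats, n))
    return (x ** 2).mean(axis=1).std()

# Quadrupling the sample size roughly halves the standard error (1/sqrt(n) scaling).
print(f"n = 100:  std of y_hat ~ {standard_error(100):.4f}")
print(f"n = 400:  std of y_hat ~ {standard_error(400):.4f}")
```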