# Monte-Carlo Estimation

## Motivation

We are often interested in computing quantities (statistics) of random variables. For example,

- the average response to a drug
- the probability of a particular sum when throwing two dice
- the average reconstruction in a VAE given an input.

These statistics are often intractable to compute:

- cannot derive a perfect drug response model (too complex)
- cannot enumerate all possible dice combinations (too lazy)
- computationally infeasible (intractable integrals).

## Monte Carlo Integration

Use random sampling instead of analytical computation. A single random sample might not be enough, but many random samples can give us a reliable quantification. E.g., by throwing dice many times we can obtain a histogram of probabilities for each possible sum. If we throw the dice once, the histogram will be very wrong (just a single bar). But if we repeat hundreds of times and average, we will get very close.

More formally, in MC integration we treat the inputs $x$ as random variables with pdf $p(x)$. Our desired statistic $y$ is the output, and we integrate over all possible $x$:

$$
y=\int_{x} f(x) p(x) d x
$$

This integral is equivalent to an expectation:

$$
y = \int_{x} f(x) p(x) d x = \mathbb{E}_{x \sim p(x)}[f(x)]
$$

We can approximate this expectation by random sampling and summation:

$$
y=\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right)=\hat{y}
$$

where $x_{i}$ is sampled from $p(x)$. Note that $\hat{y}$ is an estimator because it only approximately estimates the value of $y$.

If we can express a quantity $y$ as an integral of a function $f$ over a probability space $x$ with a known and easy-to-sample pdf $p(x)$, we can replace the exact but intractable computation with a tractable MC estimator:

$$
y=\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right), \quad x_{i} \sim p(x)
$$

If we cannot write the quantity as such an integral, we cannot estimate it this way. For instance, we cannot use MC on the following, because neither $\log p(\boldsymbol{x} \mid \mathbf{z})$ nor $\nabla_{\varphi} q_{\varphi}(\mathbf{z} \mid \boldsymbol{x})$ is a probability density we can sample from:

$$
\nabla_{\varphi} \mathbb{E}_{\mathbf{z} \sim q_{\varphi}(\mathbf{z} \mid \boldsymbol{x})}[\log p(\boldsymbol{x} \mid \mathbf{z})]=\int_{Z} \log p(\boldsymbol{x} \mid \mathbf{z}) \nabla_{\varphi} q_{\varphi}(\mathbf{z} \mid \boldsymbol{x}) d \mathbf{z}
$$

## Estimator mean and variance

Our estimator is itself a random variable: it has its own mean $\mu_{\hat{y}}=\mathbb{E}[\hat{y}]$ and variance $\operatorname{Var}[\hat{y}]=\mathbb{E}\left[\left(\hat{y}-\mu_{\hat{y}}\right)^{2}\right]$. The higher the variance, the more the estimate fluctuates from one experiment to the next.

### Estimator bias

An estimator is unbiased if in expectation it matches the true statistic:

$$
\mathbb{E}[\hat{y}]=y
$$

Otherwise, it is biased, with bias

$$
\text{bias}=\mathbb{E}[\hat{y}]-y
$$

Unbiased estimators are preferable, but in some cases a bit of bias is acceptable: we trade some accuracy for tractability.

The MC estimator is unbiased: by linearity of expectation, $\mathbb{E}[\hat{y}]=\frac{1}{n}\sum_{i}\mathbb{E}[f(x_i)]=y$. The law of large numbers further guarantees that the estimate converges to the true value: "As the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean."
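As a small illustration of the estimator defined above, the sketch below (a minimal example, not from the lecture; the sample sizes are arbitrary choices) approximates the expected sum of two fair dice, whose true value is 7:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sampler, n):
    """Monte Carlo estimate of E_{x ~ p}[f(x)]: the mean of f over n samples from p."""
    x = sampler(n)        # x_i ~ p(x)
    return np.mean(f(x))  # (1/n) * sum_i f(x_i) = y_hat

# Example: expected sum of two fair dice (true value: 7).
sample_dice = lambda n: rng.integers(1, 7, size=(n, 2))  # each row is one throw of two dice
dice_sum = lambda x: x.sum(axis=1)                       # f(x) = sum of the two dice

for n in [10, 100, 10_000]:
    print(f"n={n:6d}  estimate={mc_estimate(dice_sum, sample_dice, n):.3f}")
```

With few samples the estimate fluctuates noticeably; with many samples it settles close to 7, in line with the law of large numbers.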
## Standard error of MC estimator

The MC estimator is a sample mean:

$$
\mathbb{E}_{x \sim p(x)}[f(x)] \approx \frac{1}{n} \sum_{i} f\left(x_{i}\right)
$$

The standard error of a sample mean is

$$
\sigma_{\hat{f}}=\frac{\sigma}{\sqrt{n}}
$$

where $\sigma$ is the standard deviation of $f(x)$ under $p(x)$. The more samples we take, the less the estimator deviates. But the deviation shrinks only as $1/\sqrt{n}$: with $4\times$ more samples we only halve the error (see the numerical sketch at the end of these notes).

---

## References

1. Lecture 12.1, UvA DL Course 2020
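To make the $\sigma / \sqrt{n}$ scaling concrete, here is a minimal sketch (not from the lecture; the target quantity $\mathbb{E}_{x \sim \mathcal{N}(0,1)}[x^2] = 1$ and the repetition count are arbitrary choices). It repeats the MC estimate many times for each $n$ and measures the spread of the estimates, which should roughly halve every time $n$ is quadrupled:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(n):
    """One MC estimate of E[x^2] for x ~ N(0, 1); the true value is 1."""
    x = rng.standard_normal(n)
    return np.mean(x ** 2)

# Empirical standard deviation of the estimator over many repetitions.
# Quadrupling n should roughly halve the spread (sigma / sqrt(n)).
for n in [100, 400, 1600, 6400]:
    estimates = [mc_estimate(n) for _ in range(2000)]
    print(f"n={n:5d}  std of estimator = {np.std(estimates):.4f}")
```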