# REINFORCE - score function estimator

The estimator exploits the following property of the logarithm:

$$
\frac{d}{d x} \log f(x)=\frac{1}{f(x)} \cdot \frac{d f(x)}{d x}
$$

When the function $f(x)$ is a probability density $p_{\varphi}(x)$:

$$
\nabla_{\varphi} \log p_{\varphi}(x)=\frac{1}{p_{\varphi}(x)} \nabla_{\varphi} p_{\varphi}(x) \\
\Rightarrow \nabla_{\varphi} p_{\varphi}(x)=p_{\varphi}(x) \nabla_{\varphi} \log p_{\varphi}(x)
$$

where $\nabla_{\varphi} \log p_{\varphi}(x)$ is known as the score function. This gives a neat trick: the gradient of a density can be rewritten as the density itself times the score, so an integral against $\nabla_{\varphi} p_{\varphi}(x)$ becomes an expectation under $p_{\varphi}(x)$.

## Deriving the score-function estimator for VAE

As a use case, take the following expectation from the VAE: $\nabla_{\varphi} \mathbb{E}_{z \sim q_{\varphi}(z \mid x)}[\log p(x \mid z)]$. Note that $\log p(x \mid z)$ does not depend on $\varphi$, so once the gradient is moved inside the integral it acts only on $q_{\varphi}(z \mid x)$.

$$
\begin{array}{l}
\nabla_{\varphi} \mathbb{E}_{z \sim q_{\varphi}(z \mid x)}[\log p(x \mid z)] \\
=\nabla_{\varphi} \int_{z} \log p(x \mid z) q_{\varphi}(z \mid x) d z \\
=\int_{z} \log p(x \mid z) \nabla_{\varphi} q_{\varphi}(z \mid x) d z \\
=\int_{z} \log p(x \mid z) q_{\varphi}(z \mid x) \nabla_{\varphi} \log q_{\varphi}(z \mid x) d z \\
=\mathbb{E}_{z \sim q_{\varphi}(z \mid x)}\left[\log p(x \mid z) \nabla_{\varphi} \log q_{\varphi}(z \mid x)\right] \\
\approx \frac{1}{n} \sum_{i=1}^{n} \log p\left(x \mid z^{(i)}\right) \nabla_{\varphi} \log q_{\varphi}\left(z^{(i)} \mid x\right), \quad z^{(i)} \sim q_{\varphi}(z \mid x)
\end{array}
$$

Thus REINFORCE turns an integral against $\nabla_{\varphi} q_{\varphi}(z \mid x)$, which is not a density, into an expectation under the density $q_{\varphi}(z \mid x)$, which allows us to use MC estimation!

## Score-function estimator properties

- Works for any function $f(x)$, since $f$ only needs to be evaluated, not differentiated
	- Good for simulators or black-box functions (RL)
- The density $p_{\varphi}(x)$ must be differentiable w.r.t. its parameters $\varphi$
- It must be easy to sample from $p_{\varphi}(x)$
- Unbiased estimator
- High-variance estimator
	- Individual gradient estimates deviate a lot from the true gradient, but the average is accurate in the limit of many samples
	- The variance increases with the number of dimensions
	- If you sample only once, this can be a problem and slow down or stop learning
	- Variance reduction methods like [[Control variates]] are usually needed
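
The "unbiased but high variance" behaviour of the score-function estimator can be checked numerically. The following is a minimal sketch rather than part of the derivation above: it assumes a toy setup with $q_{\mu}(z)=\mathcal{N}(\mu, 1)$ and a hypothetical black-box objective $f(z)=z^{2}$, for which the exact gradient is $\nabla_{\mu} \mathbb{E}[f(z)]=2\mu$ and the score is $\nabla_{\mu} \log q_{\mu}(z)=z-\mu$. The names `score_function_gradient` and `f` are illustrative only.

```python
import random
import statistics


def score_function_gradient(f, mu, n_samples, rng):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, 1)}[f(z)] via the score function."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu, 1.0)   # sample z ~ q_mu(z) = N(mu, 1)
        score = z - mu           # d/dmu log N(z; mu, 1)
        total += f(z) * score    # f is only evaluated, never differentiated
    return total / n_samples


def f(z):
    # Hypothetical black-box objective (assumption for this sketch).
    # E[z^2] = mu^2 + 1, so the exact gradient w.r.t. mu is 2 * mu.
    return z * z


rng = random.Random(0)
mu = 1.5
exact = 2.0 * mu

# Many samples: the average should land close to the exact gradient.
large_estimate = score_function_gradient(f, mu, 200_000, rng)

# Repeated single-sample estimates: unbiased, but widely spread.
single_estimates = [score_function_gradient(f, mu, 1, rng) for _ in range(5_000)]
single_std = statistics.stdev(single_estimates)

print(f"exact gradient:            {exact:.3f}")
print(f"estimate (200k samples):   {large_estimate:.3f}")
print(f"std of 1-sample estimates: {single_std:.3f}")
```

In this toy setup the standard deviation of a single-sample estimate is analytically $\sqrt{\mu^{4}+14\mu^{2}+15} \approx 7.2$, more than twice the gradient itself, which illustrates why averaging over many samples or a variance reduction method is usually needed.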