# REINFORCE - score function estimator
REINFORCE exploits the following property of the logarithm (the log-derivative trick):
$
\frac{d}{d x} \log f(x)=\frac{1}{f(x)} \cdot \frac{d f(x)}{d x}
$
When the function $f(x)$ is a probability density $p_{\varphi}(x)$ with parameters $\varphi$, this becomes
$
\begin{array}{l}
\nabla_{\varphi} \log p_{\varphi}(x)=\frac{1}{p_{\varphi}(x)} \nabla_{\varphi} p_{\varphi}(x) \\
\Rightarrow \nabla_{\varphi} p_{\varphi}(x)=p_{\varphi}(x) \nabla_{\varphi} \log p_{\varphi}(x)
\end{array}
$
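where $\nabla_{\varphi} \log p_{\varphi}(x)$ is known as the score function.
This gives a neat trick: the gradient of a density can be rewritten as the density itself times its score function, so an integral against $\nabla_{\varphi} p_{\varphi}(x)$ becomes an expectation under $p_{\varphi}(x)$.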
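
As a quick sanity check of the identity above, the sketch below compares a finite-difference gradient of a density with the density times its score. It assumes a unit-variance Gaussian whose mean plays the role of $\varphi$; the helper names (`gaussian_pdf`, `score_mu`, `grad_pdf_finite_diff`) and the evaluation point are illustrative choices, not part of the original note.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def score_mu(x, mu, sigma=1.0):
    """Score w.r.t. the mean: d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2."""
    return (x - mu) / sigma ** 2

def grad_pdf_finite_diff(x, mu, eps=1e-5):
    """Central finite-difference approximation of d/dmu p_mu(x)."""
    return (gaussian_pdf(x, mu + eps) - gaussian_pdf(x, mu - eps)) / (2 * eps)

x, mu = 0.7, 0.2  # arbitrary evaluation point and parameter value
lhs = grad_pdf_finite_diff(x, mu)            # gradient of the density
rhs = gaussian_pdf(x, mu) * score_mu(x, mu)  # density times score
print(lhs, rhs)
```

The two printed values should agree up to the finite-difference error.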
## Deriving the score-function estimator for VAE
As a use case, consider the gradient of the following expectation from the VAE objective: $\nabla_{\varphi} \mathbb{E}_{\mathbf{z} \sim q_{\varphi}(z \mid x)}[\log p(x \mid z)]$
$
\begin{array}{l}
\nabla_{\varphi} \mathbb{E}_{\mathbf{z} \sim q_{\varphi}(z \mid x)}[\log p(x \mid z)]= \\
=\nabla_{\varphi} \int_{z} \log p(x \mid z) q_{\varphi}(z \mid x) d z \\
=\int_{z} \log p(x \mid z) \nabla_{\varphi} q_{\varphi}(z \mid x) d z \\
=\int_{z} \log p(x \mid z) q_{\varphi}(z \mid x) \nabla_{\varphi} \log q_{\varphi}(z \mid x) d z \\
=\mathbb{E}_{z \sim q_{\varphi}(z \mid x)}\left[\log p(x \mid z) \nabla_{\varphi} \log q_{\varphi}(z \mid x)\right] \\
\approx \frac{1}{n} \sum_{i=1}^{n} \log p\left(x \mid z^{(i)}\right) \nabla_{\varphi} \log q_{\varphi}\left(z^{(i)} \mid x\right), \quad z^{(i)} \sim q_{\varphi}(z \mid x)
\end{array}
$
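Thus with REINFORCE, we were able to rewrite an integral against $\nabla_{\varphi} q_{\varphi}(z \mid x)$, which is not a density, as an expectation under the density $q_{\varphi}(z \mid x)$, which allows us to use Monte Carlo (MC) estimation! The derivation assumes that the gradient and the integral can be exchanged and that $\log p(x \mid z)$ does not depend on $\varphi$.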
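
The Monte Carlo estimate can be sketched on a toy problem where the exact gradient is known. This is only an illustration under stated assumptions: a unit-variance Gaussian with mean `mu` stands in for $q_{\varphi}(z \mid x)$, and the hypothetical function $f(z) = z^2$ stands in for $\log p(x \mid z)$. In that case $\mathbb{E}[f(z)] = \mu^2 + 1$, so the exact gradient w.r.t. $\mu$ is $2\mu$.

```python
import random

def score_function_gradient(f, mu, n_samples, rng):
    """Estimate d/dmu E_{z ~ N(mu, 1)}[f(z)] with the score-function estimator.

    f is treated as a black box: only its values are used, never its derivative.
    """
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu, 1.0)   # sample z ~ q_mu
        score = z - mu           # d/dmu log N(z; mu, 1)
        total += f(z) * score    # f(z) * score, as in the derivation
    return total / n_samples

rng = random.Random(0)           # fixed seed for reproducibility
mu = 2.0
estimate = score_function_gradient(lambda z: z ** 2, mu, 200_000, rng)
exact = 2 * mu                   # closed-form gradient for this toy f
print(estimate, exact)
```
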
## Score-function estimator properties
- Applicable to any function $f(x)$; $f$ itself does not need to be differentiable
    - Good for simulators or black-box functions (e.g. in RL)
- The density $p_{\varphi}(x)$ must be differentiable w.r.t. its parameters $\varphi$
- It must be easy to sample from $p_{\varphi}(x)$
- Unbiased estimator
- High-variance estimator
    - Individual gradient estimates can deviate a lot from the true gradient, but their average converges to it in the limit of many samples
    - The variance increases with the number of dimensions
    - If you sample only once, this can slow down or even stop learning
    - Variance reduction methods like [[Control variates]] are usually needed
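
The effect of a simple variance reduction can be sketched by subtracting a constant baseline from $f$ before multiplying by the score. Because the score has zero mean, the estimator stays unbiased. The setup is an assumption for illustration: a unit-variance Gaussian with mean `mu`, the toy function $f(z) = z^2$, and the baseline $\mathbb{E}[f(z)] = \mu^2 + 1$, which is known in closed form here but would have to be estimated in practice. The helper `per_sample_gradients` is a hypothetical name.

```python
import random
import statistics

def per_sample_gradients(f, mu, n_samples, rng, baseline=0.0):
    """Single-sample score-function gradients (f(z) - baseline) * score."""
    grads = []
    for _ in range(n_samples):
        z = rng.gauss(mu, 1.0)                      # z ~ N(mu, 1)
        grads.append((f(z) - baseline) * (z - mu))  # score w.r.t. mu is z - mu
    return grads

f = lambda z: z ** 2
mu = 2.0
rng = random.Random(0)

plain = per_sample_gradients(f, mu, 50_000, rng)
centered = per_sample_gradients(f, mu, 50_000, rng, baseline=mu ** 2 + 1.0)

# Both means estimate the same gradient (2 * mu); the variances differ.
mean_plain, mean_centered = statistics.fmean(plain), statistics.fmean(centered)
var_plain, var_centered = statistics.variance(plain), statistics.variance(centered)
print(mean_plain, mean_centered)
print(var_plain, var_centered)
```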
---
## References