# Wasserstein GAN
Instead of KL/JS, use [[Wasserstein Distance]]
$
W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\|x - y\|\right]
$
$D_{KL}$ gives us infinity when the two distributions are disjoint. In the classic example of two uniform distributions supported on parallel vertical lines at $x = 0$ and $x = \theta$, $D_{JS}$ has a sudden jump and is not differentiable at $\theta = 0$. Only the Wasserstein metric provides a smooth measure, which is super helpful for a stable learning process using gradient descent.
So the overall benefit is that even when the supports do not overlap, the distance is still meaningful and still provides usable gradients.
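A quick numerical sketch of this point (my own illustration, not from the paper): in 1-D the optimal transport plan just matches sorted samples, so $W_1$ between two equal-size empirical samples reduces to the mean absolute difference of their order statistics. For a distribution and a shifted copy of itself, the supports become disjoint for large shifts, yet $W_1$ still equals the shift and shrinks smoothly to 0, unlike $D_{KL}$ (infinite) or $D_{JS}$ (saturated at $\log 2$).

```python
import numpy as np

def w1(x, y):
    # 1-D Wasserstein-1 distance between equal-size empirical samples:
    # the optimal coupling in 1-D matches sorted samples, so W1 is the
    # mean absolute difference of the order statistics.
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10_000)     # samples from p_r on [0, 1]
for theta in [2.0, 1.0, 0.1, 0.0]:
    y = x + theta                     # p_g: p_r shifted by theta
    print(f"theta={theta}: W1 = {w1(x, y):.3f}")
```

For `theta = 2.0` the supports `[0, 1]` and `[2, 3]` are disjoint, so $D_{KL}$ would already be infinite, yet $W_1$ reports exactly 2 and decreases smoothly as `theta` goes to 0.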
---
## References
1. Arjovsky, M., Chintala, S., Bottou, L. "Wasserstein GAN" (2017)
2. Lilian Weng, "From GAN to WGAN": https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html#wasserstein-gan-wgan