# Cross entropy
Cross entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when a coding scheme optimized for $q$ is used instead of one optimized for the true distribution $p$:
$H(p, q) = -\sum_i p_i \log_2(q_i)$
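As a minimal sketch of this formula, the snippet below computes the cross entropy in bits for two discrete distributions; the helper name `cross_entropy` and the example distributions are illustrative, not from the text, and NumPy is assumed.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits between discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

# Example: true distribution p and a mismatched coding distribution q
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

print(cross_entropy(p, p))  # 1.5  bits -> the entropy of p itself
print(cross_entropy(p, q))  # 1.75 bits -> extra cost of coding with q
```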
If the true distribution $p$ is equal to the predicted distribution $q$, the cross entropy reduces to the entropy $H(p)$. In general the two quantities are related by
$
H(p, q) = H(p) + D_{KL}(p \,\|\, q)
$
The number of extra bits by which the cross entropy exceeds the entropy is the KL divergence $D_{KL}(p \,\|\, q)$.
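The decomposition can be checked numerically; the sketch below (assuming NumPy, with the same illustrative distributions as above) confirms that $H(p, q) = H(p) + D_{KL}(p \,\|\, q)$.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

entropy = -np.sum(p * np.log2(p))           # H(p)       = 1.5  bits
cross_entropy = -np.sum(p * np.log2(q))     # H(p, q)    = 1.75 bits
kl_divergence = np.sum(p * np.log2(p / q))  # D_KL(p||q) = 0.25 bits

# Cross entropy equals entropy plus the KL divergence
print(cross_entropy, entropy + kl_divergence)  # both print 1.75
```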