# Jensen's Inequality
If a function $g(x)$ is a [[Convex Function]] (for a twice-differentiable $g$, this means $g''(x) \geq 0$ everywhere), then for any random variable $X$ we have
$$
\mathbb{E}[g(X)] \geq g(\mathbb{E}[X]) \quad \text{for convex } g,
$$
with the inequality reversed when $g$ is concave.
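A minimal numeric sanity check of the statement, assuming Python with NumPy (not part of this note), using the convex function $g(x) = e^x$ and a standard normal $X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random variable X (standard normal here, but any distribution works).
x = rng.standard_normal(1_000_000)

# A convex function g; exp is convex on all of R.
g = np.exp

lhs = g(x).mean()   # Monte Carlo estimate of E[g(X)]
rhs = g(x.mean())   # g evaluated at the Monte Carlo estimate of E[X]

# For X ~ N(0, 1): E[exp(X)] = exp(1/2) ≈ 1.65, while exp(E[X]) ≈ exp(0) = 1.
print(f"E[g(X)] ≈ {lhs:.4f} >= g(E[X]) ≈ {rhs:.4f}")
assert lhs >= rhs
```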
Intuition:
1. Algebraic view: The function evaluated at the average of inputs is less than or equal to the average of the function evaluated at each input.
2. Geometric view: The chord connecting any two points on the graph of a convex function lies above (or on) the curve between them - this is the visual definition of convexity (a small worked example follows after this list).
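A tiny two-point instance of both views: take the convex function $g(x) = x^2$ with inputs $1$ and $3$, each weighted $\tfrac{1}{2}$. Then
$$
g\!\left(\frac{1 + 3}{2}\right) = g(2) = 4 \;\leq\; \frac{g(1) + g(3)}{2} = \frac{1 + 9}{2} = 5,
$$
and geometrically the chord from $(1, 1)$ to $(3, 9)$ passes through $(2, 5)$, which lies above the point $(2, 4)$ on the parabola.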
It's essentially the go-to tool for proving convexity/concavity properties and establishing bounds in information theory, machine learning, and probability theory. Examples:
- [[KL Divergence]] non-negativity (a derivation is sketched after this list)
- Entropy concavity - Shows that Shannon entropy is a concave function
- Mutual information non-negativity - Proves that I(X;Y) ≥ 0
- Convexity of loss functions - Establishes convexity for cross-entropy and other ML losses
- Log-sum-exp properties - Proves convexity and bounds for this common function
- f-divergence non-negativity - Generalizes beyond KL to other divergence measures
- Arithmetic-geometric mean inequality - Special case when applied to the log function (also sketched after this list)
- Concentration inequalities - Used in proving bounds like Hoeffding's inequality
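To sketch the first item: KL non-negativity follows by applying Jensen's inequality to the convex function $-\log$ (equivalently, the concavity of $\log$):
$$
D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[-\log \frac{q(x)}{p(x)}\right] \;\geq\; -\log \mathbb{E}_{x \sim p}\!\left[\frac{q(x)}{p(x)}\right] = -\log \sum_x p(x)\,\frac{q(x)}{p(x)} = -\log 1 = 0.
$$
The same template gives mutual information non-negativity, since $I(X;Y)$ is the KL divergence between the joint distribution and the product of the marginals.

For the arithmetic-geometric mean item: applying Jensen's inequality to the concave $\log$ (so the inequality flips) with equal weights over positive numbers $a_1, \dots, a_n$,
$$
\log\!\left(\frac{1}{n}\sum_{i=1}^{n} a_i\right) \;\geq\; \frac{1}{n}\sum_{i=1}^{n} \log a_i = \log\!\left(\prod_{i=1}^{n} a_i\right)^{\!1/n},
$$
and exponentiating both sides gives $\frac{1}{n}\sum_i a_i \geq \left(\prod_i a_i\right)^{1/n}$.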
---
## References
1. Intuition behind Jensen's inequality: https://www.youtube.com/watch?v=HfCb1K4Nr8M