# Jensen's Inequality

If a function $g(x)$ is a [[Convex Function]], i.e. its second derivative is non-negative (for twice-differentiable $g$; equivalently, any chord lies on or above its graph), then

$$
\mathbb{E}[g(X)] \geq g(\mathbb{E}[X]) \quad \text{for convex } g
$$

For concave $g$ the inequality reverses: $\mathbb{E}[g(X)] \leq g(\mathbb{E}[X])$.

Intuition:

1. Algebraic view: the function evaluated at the average of the inputs is less than or equal to the average of the function evaluated at each input.
2. Geometric view: a chord connecting any two points on a convex function lies above (or on) the function curve - this is the visual definition of convexity, and Jensen's inequality extends it from two-point averages to arbitrary expectations.

It's essentially the go-to tool for proving convexity/concavity properties and establishing bounds in information theory, machine learning, and probability theory. (A quick numerical check of the inequality is sketched at the end of this note.)

Examples:

- [[KL Divergence]] non-negativity - Proves that $D_{KL}(P \| Q) \geq 0$ (a one-line derivation is sketched after the references)
- Entropy concavity - Shows that Shannon entropy is a concave function
- Mutual information non-negativity - Proves that I(X;Y) ≥ 0
- Convexity of loss functions - Establishes convexity for cross-entropy and other ML losses
- Log-sum-exp properties - Proves convexity and bounds for this common function
- f-divergence non-negativity - Generalizes beyond KL to other divergence measures
- Arithmetic-geometric mean inequality - Special case when applied to the (concave) log function
- Concentration inequalities - Used in proving bounds like Hoeffding's inequality

---

## References

1. Intuition behind Jensen's inequality: https://www.youtube.com/watch?v=HfCb1K4Nr8M
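
The first example above, [[KL Divergence]] non-negativity, follows from Jensen's inequality in one line. Here is the standard derivation, assuming discrete distributions $P$ and $Q$ with the same support and using the convexity of $-\log$:

$$
D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)} = \mathbb{E}_{P}\left[-\log \frac{Q(X)}{P(X)}\right] \geq -\log \mathbb{E}_{P}\left[\frac{Q(X)}{P(X)}\right] = -\log \sum_x P(x)\,\frac{Q(x)}{P(x)} = -\log 1 = 0
$$

The same pattern (pull the expectation inside a convex function to get a lower bound) gives f-divergence non-negativity, and applying it to the concave $\log$ gives the arithmetic-geometric mean inequality.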
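
For a quick sanity check, the sketch below verifies the inequality with Monte Carlo samples. NumPy, the choice $g = \exp$, and the standard normal $X$ are illustrative assumptions, not part of the note.

```python
# Minimal numerical sketch of Jensen's inequality (assumes NumPy is installed).
# For a convex g, the sample mean of g(X) should dominate g(sample mean of X).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)  # samples of X ~ N(0, 1)

g = np.exp  # exp is convex, so Jensen predicts E[exp(X)] >= exp(E[X])

lhs = g(x).mean()   # Monte Carlo estimate of E[g(X)]  (about e^{1/2} ~ 1.65 here)
rhs = g(x.mean())   # g(E[X])                          (about e^0 = 1 here)
print(f"E[g(X)] = {lhs:.4f},  g(E[X]) = {rhs:.4f}")
assert lhs >= rhs

# The same check for KL divergence non-negativity on random discrete distributions.
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))
kl = np.sum(p * np.log(p / q))
print(f"D_KL(p || q) = {kl:.4f}")  # Jensen guarantees this is >= 0
assert kl >= 0
```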