# Information Theory
According to Shannon's theory, information and uncertainty are two sides of the same coin: the more uncertainty there is, the more information we gain by removing it. In his seminal 1948 work, Shannon proposed a measure of the uncertainty associated with a memoryless random source, called entropy.
The concept of entropy first emerged in thermodynamics in the 19th century, in the work of Carnot. Boltzmann later established the connection between entropy and probability, and the notion of information as used by Shannon is a generalization of the notion of entropy.
For a discrete source $X$ that emits symbol $a$ with probability $p_a$, the entropy is

$H(X) = -\sum_a p_a \log_2 p_a .$
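As a quick illustration of the definition, here is a minimal Python sketch that evaluates $H$ for a few discrete distributions (the `entropy` function and the example distributions are illustrative assumptions, not part of the original text):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum_a p_a * log(p_a), in bits by default.

    Terms with p_a == 0 contribute nothing, by the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of uncertainty.
print(entropy([0.5, 0.5]))   # 1.0
# A heavily biased coin is more predictable, hence less uncertain.
print(entropy([0.9, 0.1]))   # ~0.469
# A certain outcome carries no uncertainty at all.
print(entropy([1.0]))        # 0.0
```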
Shannon showed that entropy is the only quantity that satisfies three important conditions:
1. $H(X)$ is always non-negative: each term $-p_a \log_2 p_a$ is non-negative, so removing uncertainty never yields a negative amount of information.
2. The uniform distribution maximizes $H(X)$, since it also maximizes uncertainty.
3. Additivity: the entropy of two independent events is the sum of their individual entropies. For instance, in thermodynamics, the total entropy of two isolated systems coexisting in equilibrium is the sum of the entropies of each system in isolation. (Properties 2 and 3 are checked numerically in the sketch after this list.)
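A small numerical check of these properties, reusing the `entropy` sketch from above; the specific distributions (a fair coin and a fair four-sided die) are illustrative assumptions:

```python
import math
from itertools import product

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Property 2: among distributions over 4 outcomes, the uniform one
# maximizes entropy (log2(4) = 2 bits).
print(entropy([0.25] * 4))             # 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]))   # ~1.357, strictly less

# Property 3 (additivity): for independent X and Y, the joint distribution
# factorizes, and H(X, Y) = H(X) + H(Y).
px = [0.5, 0.5]    # fair coin
py = [0.25] * 4    # fair four-sided die
joint = [p * q for p, q in product(px, py)]  # independence => product form
print(entropy(joint))               # 3.0
print(entropy(px) + entropy(py))    # 3.0
```

All printed values are non-negative, which also illustrates property 1.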