Shannon’s entropy formula for a discrete random variable is

$H(X) = - \sum_x P(x) \log_2 P(x) \ \text{bits}$

Entropy is the [[expectation]] of the [[information]] across all outcomes. This gives us a way to quantify how much uncertainty a random variable has and, in turn, how much information we gain when we observe it.

## motivating example

If you've ever played the game "Guess Who?", you might already have an intuition for how to use the concept of entropy. In the game, you're trying to guess which person your opponent picked from a board of 24 characters. Each character has visible features like hair color, glasses, hats, etc. You can ask yes/no questions like "Does your person wear glasses?" to eliminate possibilities.

If we assume your opponent has chosen in a way that makes each character equally likely, the probability of any one character is

$P(x) = \frac{1}{24}$

Using the formula for entropy, we find that the initial entropy of the game is

$H(X) = - 24 \cdot \frac{1}{24} \log_2 \Big( \frac{1}{24} \Big) = \log_2(24) \approx 4.58 \ \text{bits}$

On average, it will take about $4.58$ yes/no questions to uniquely identify the chosen person, provided each question is as informative as possible, which is to say it cuts the field roughly in half. However, the game was purposefully designed so that any single feature (e.g., glasses, a beard, baldness, eye color, or gender) is shared by at most six characters.

Let's say our first question is "Does your person wear glasses?", where six characters wear glasses. What [[information]] can we gain from this question? The information gained depends on your opponent's answer. If the answer is "no", you've eliminated only six characters and eighteen remain. The information is

$I(\text{No}) = - \log_2 \Big( \frac{18}{24} \Big) \approx 0.42 \ \text{bits}$

If the answer is "yes", the information is

$I(\text{Yes}) = - \log_2 \Big( \frac{6}{24} \Big) = 2 \ \text{bits}$

That's a lot of information! One bit of information narrows a set of possibilities to half its size, so two bits narrow it to a quarter. It's equivalent to asking two questions that each cut the field in half! In this case, the entropy remaining on the board is $4.58 - 2 = 2.58 \ \text{bits}$.

However, to understand the value of the question itself, we can calculate the expected [[information gain]] from the question: the probability-weighted information across the two outcomes (yes with probability $\frac{6}{24} = 0.25$, no with probability $\frac{18}{24} = 0.75$):

$(0.25 \cdot 2) + (0.75 \cdot 0.42) \approx 0.5 + 0.31 \approx 0.81 \ \text{bits}$

The question reduces your uncertainty by $0.81$ bits on average. The ideal question, which would eliminate 12 characters whether the answer is yes or no, would yield exactly 1 bit of information.
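To make the arithmetic concrete, here's a minimal sketch in Python (the `entropy` helper and the variable names are my own, purely for illustration) that reproduces the numbers above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum of P(x) * log2(P(x))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 24 equally likely characters
H_start = entropy([1 / 24] * 24)
print(f"initial entropy: {H_start:.2f} bits")       # ~4.58

# "Does your person wear glasses?": 6 of the 24 characters do
p_yes, p_no = 6 / 24, 18 / 24
info_yes = -math.log2(p_yes)   # 2.00 bits: the field shrinks to a quarter
info_no = -math.log2(p_no)     # ~0.42 bits: only 6 characters eliminated

# Expected information gain: probability-weighted average over answers
expected_gain = p_yes * info_yes + p_no * info_no
print(f"expected gain: {expected_gain:.2f} bits")   # ~0.81

# An ideal 12/12 split would yield exactly 1 bit either way
print(f"ideal question: {-math.log2(12 / 24):.2f} bits")
```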
Now what if we have some insight into our opponent? Let's say we're playing against our kid sister, who picks a female character 50% of the time, and there are only five female characters on the board (leaving nineteen males). The probability of each female character is now 10% (up from about 4.2%):

$P(\text{female}) = 0.5 \cdot \frac{1}{5} = 0.10$

The probability of each male character is

$P(\text{male}) = 0.5 \cdot \frac{1}{19} \approx 0.0263$

This insight reduces the entropy of the game from 4.58 bits to 4.28 bits:

$H(X) = - \sum_{\text{females}} 0.10 \log_2(0.10) - \sum_{\text{males}} 0.0263 \log_2(0.0263) \approx 4.28 \ \text{bits}$

Now the best question to ask is "Is your person female?"

If the answer is yes (which you believe it will be 50% of the time), each of the remaining $5$ females is equally likely and the remaining entropy is

$H(X \mid \text{Yes}) = \log_2(5) \approx 2.32 \ \text{bits}$

If the answer is no, each of the remaining $19$ males is equally likely and the remaining entropy is

$H(X \mid \text{No}) = \log_2(19) \approx 4.25 \ \text{bits}$

So the expected entropy after the question (again based on your prior belief that your kid sister picks a female 50% of the time) is

$(0.5 \cdot 2.32) + (0.5 \cdot 4.25) = 3.285 \ \text{bits}$

The information gain is

$H_{\text{before}} - H_{\text{after}} = 4.28 - 3.285 = 0.995 \ \text{bits}$

That's almost 1 bit of information; with unrounded values it is exactly 1 bit, because under your prior the answer itself is a 50/50 coin flip, and a fair yes/no answer carries exactly one bit. Very efficient! It's no coincidence, either: your prior belief about your kid sister's strategy gives you the insight needed to make a smart first question. See this [article](https://www.geekyhobbies.com/how-to-win-guess-who-within-six-turns/) for a full strategy to beat your kid sister at Guess Who.
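The biased-opponent calculation fits the same sketch (again with illustrative names, and the `entropy` helper repeated so the snippet runs on its own); note how the gain comes out to exactly 1 bit:

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Kid-sister prior: a female character 50% of the time (5 on the board),
# a male character the other 50% (19 on the board)
board = [0.5 / 5] * 5 + [0.5 / 19] * 19
H_before = entropy(board)                  # ~4.28 bits

# "Is your person female?" leaves a uniform group either way
H_yes, H_no = math.log2(5), math.log2(19)  # ~2.32 and ~4.25 bits
H_after = 0.5 * H_yes + 0.5 * H_no         # ~3.28 bits

# The answer is a fair coin flip under this prior, so the gain is
# exactly 1 bit (up to floating-point rounding)
print(f"information gain: {H_before - H_after:.3f} bits")
```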