Positive pointwise mutual information (PPMI) is a weighting scheme applied to word–context co-occurrence counts when building [[word embeddings]], leveraging the concept of [[pointwise mutual information]]. The pointwise mutual information between a target word $w$ and a context $c$ (a document in a term–document matrix, or a nearby word in a word–context matrix), where $N$ is the total number of word–context observations in the corpus, is
$\text{PMI}(w,c) = \log_2 \frac{P(w,c)}{P(w)P(c)} $
where
$P(w,c) = \frac{\text{count}(w,c)}{N}$
$P(w) = \frac{\text{count}(w)}{N}$
$P(c) = \frac{\text{count}(c)}{N}$
(when the context is a document, $\text{count}(c)$ is simply the document's size in tokens).
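As a concrete illustration, here is a minimal Python sketch of these estimates, assuming the raw counts have already been collected (the function name and the example numbers are hypothetical):

```python
import math

def pmi(count_wc: int, count_w: int, count_c: int, N: int) -> float:
    """PMI(w, c) from raw counts; N is the total number of word-context observations."""
    p_wc = count_wc / N   # P(w, c)
    p_w = count_w / N     # P(w)
    p_c = count_c / N     # P(c)
    return math.log2(p_wc / (p_w * p_c))

# Hypothetical example: the word occurs 50 times, the context 1000 times,
# they co-occur 20 times, and the corpus has N = 100_000 observations.
print(pmi(20, 50, 1000, 100_000))  # log2(0.0002 / (0.0005 * 0.01)) = log2(40) ≈ 5.32
```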
Because negative PMI values are typically unreliable unless the corpus is very large, PMI is clipped at zero, giving
$\text{PPMI}(w, c) = \max(\text{PMI}(w,c), 0)$
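For a full co-occurrence matrix the same computation can be vectorised; the sketch below assumes a small word-by-context count matrix and uses NumPy (the function name and the toy matrix are illustrative):

```python
import numpy as np

def ppmi_matrix(counts: np.ndarray) -> np.ndarray:
    """PPMI for a word-by-context co-occurrence count matrix."""
    N = counts.sum()
    p_wc = counts / N                              # joint P(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / N    # marginal P(w)
    p_c = counts.sum(axis=0, keepdims=True) / N    # marginal P(c)
    with np.errstate(divide="ignore"):             # log2(0) -> -inf for zero counts
        pmi = np.log2(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)                    # PPMI = max(PMI, 0) clips -inf to 0

counts = np.array([[10, 0, 3],
                   [ 2, 8, 1]])
print(ppmi_matrix(counts))
```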
However, PPMI is biased towards rare events: infrequent contexts tend to receive very high PMI values. Raising the probability of the context to a power $\alpha$ is one technique to overcome this:
$\text{PPMI}_\alpha(w, c) = \max\left(\log_2 \frac{P(w,c)}{P(w)P_{\alpha}(c)},\, 0\right)$
where
$P_{\alpha}(c) = \frac{\text{count}(c)^{\alpha}}{\sum_{c'} \text{count}(c')^{\alpha}}$
Levy et al.[^1] found that setting $\alpha = 0.75$ improved performance of embeddings on a wide range of tasks.
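A sketch of the $\alpha$-smoothed variant, under the same assumptions as the NumPy example above: only the context marginal changes.

```python
import numpy as np

def ppmi_alpha_matrix(counts: np.ndarray, alpha: float = 0.75) -> np.ndarray:
    """PPMI with the context distribution raised to the power alpha."""
    N = counts.sum()
    p_wc = counts / N
    p_w = counts.sum(axis=1, keepdims=True) / N
    context_counts = counts.sum(axis=0)
    # P_alpha(c): smoothed context distribution
    p_c_alpha = context_counts ** alpha / (context_counts ** alpha).sum()
    with np.errstate(divide="ignore"):             # zero co-occurrences -> -inf
        pmi = np.log2(p_wc / (p_w * p_c_alpha))
    return np.maximum(pmi, 0.0)                    # clipped to 0, as before
```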
Another possible solution is add-$k$ [[smoothing]]: before computing PMI, a small constant $k$ (values of 0.1–3 are common) is added to each of the counts, which shrinks (discounts) all the non-zero values. The larger the $k$, the more the non-zero counts are discounted.
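A sketch of the add-$k$ variant, again assuming a word-by-context count matrix: the constant is added to every cell before the probabilities are estimated, so no zero entries remain.

```python
import numpy as np

def ppmi_add_k(counts: np.ndarray, k: float = 0.5) -> np.ndarray:
    """Add-k smoothed PPMI: add k to every count before estimating probabilities."""
    smoothed = counts + k                               # every cell, including zeros, gets +k
    N = smoothed.sum()
    p_wc = smoothed / N
    p_w = smoothed.sum(axis=1, keepdims=True) / N
    p_c = smoothed.sum(axis=0, keepdims=True) / N
    return np.maximum(np.log2(p_wc / (p_w * p_c)), 0.0)
```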
[^1]: Levy, O., Y. Goldberg, and I. Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. TACL, 3:211–225.