Positive pointwise mutual information (PPMI) is a weighting scheme applied to word–context co-occurrence counts when building [[word embeddings]], leveraging the concept of [[pointwise mutual information]]. The pointwise mutual information between a target word $w$ and a context $c$ (a document in a term–document matrix, or a nearby word in a word–context matrix), where $N$ is the total number of word–context observations in the corpus, is
$\text{PMI}(w,c) = \log_2 \frac{P(w,c)}{P(w)P(c)} $
where
$P(w,c) = \frac{\text{count}(w,c)}{N}$
$P(w) = \frac{\text{count}(w)}{N}$
$P(c) = \frac{\text{count}(c)}{N}$
(when the context is a document, $\text{count}(c)$ is simply the document's size in tokens).
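As a concrete illustration, here is a minimal Python sketch of these estimates, assuming the raw counts have already been collected (the function name and the example numbers are hypothetical):

```python
import math

def pmi(count_wc: int, count_w: int, count_c: int, N: int) -> float:
    """PMI(w, c) from raw counts; N is the total number of word-context observations."""
    p_wc = count_wc / N   # P(w, c)
    p_w = count_w / N     # P(w)
    p_c = count_c / N     # P(c)
    return math.log2(p_wc / (p_w * p_c))

# Hypothetical example: the word occurs 50 times, the context 1000 times,
# they co-occur 20 times, and the corpus has N = 100_000 observations.
print(pmi(20, 50, 1000, 100_000))  # log2(0.0002 / (0.0005 * 0.01)) = log2(40) ≈ 5.32
```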
Because negative PMI values are typically unreliable unless the corpus is very large, PMI is clipped at zero, giving
$\text{PPMI}(w, c) = \max(\text{PMI}(w,c), 0)$
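For a full co-occurrence matrix the same computation can be vectorised; the sketch below assumes a small word-by-context count matrix and uses NumPy (the function name and the toy matrix are illustrative):

```python
import numpy as np

def ppmi_matrix(counts: np.ndarray) -> np.ndarray:
    """PPMI for a word-by-context co-occurrence count matrix."""
    N = counts.sum()
    p_wc = counts / N                              # joint P(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / N    # marginal P(w)
    p_c = counts.sum(axis=0, keepdims=True) / N    # marginal P(c)
    with np.errstate(divide="ignore"):             # log2(0) -> -inf for zero counts
        pmi = np.log2(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)                    # PPMI = max(PMI, 0) clips -inf to 0

counts = np.array([[10, 0, 3],
                   [ 2, 8, 1]])
print(ppmi_matrix(counts))
```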
However, PPMI is biased towards rare events: infrequent contexts tend to receive very high PMI values. Raising the probability of the context to a power $\alpha$ is one technique to overcome this:
$\text{PPMI}_\alpha(w, c) = \max\left(\log_2 \frac{P(w,c)}{P(w)P_{\alpha}(c)},\, 0\right)$
where
$P_{\alpha}(c) = \frac{\text{count}(c)^{\alpha}}{\sum_{c'} \text{count}(c')^{\alpha}}$
Levy et al.[^1] found that setting $\alpha = 0.75$ improved performance of embeddings on a wide range of tasks.
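A sketch of the $\alpha$-smoothed variant, under the same assumptions as the NumPy example above: only the context marginal changes.

```python
import numpy as np

def ppmi_alpha_matrix(counts: np.ndarray, alpha: float = 0.75) -> np.ndarray:
    """PPMI with the context distribution raised to the power alpha."""
    N = counts.sum()
    p_wc = counts / N
    p_w = counts.sum(axis=1, keepdims=True) / N
    context_counts = counts.sum(axis=0)
    # P_alpha(c): smoothed context distribution
    p_c_alpha = context_counts ** alpha / (context_counts ** alpha).sum()
    with np.errstate(divide="ignore"):             # zero co-occurrences -> -inf
        pmi = np.log2(p_wc / (p_w * p_c_alpha))
    return np.maximum(pmi, 0.0)                    # clipped to 0, as before
```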
Another possible solution is add-$k$ [[smoothing]]: before computing PMI, a small constant $k$ (values of 0.1–3 are common) is added to each of the counts, which shrinks (discounts) all the non-zero values. The larger the $k$, the more the non-zero counts are discounted.
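A sketch of the add-$k$ variant, again assuming a word-by-context count matrix: the constant is added to every cell before the probabilities are estimated, so no zero entries remain.

```python
import numpy as np

def ppmi_add_k(counts: np.ndarray, k: float = 0.5) -> np.ndarray:
    """Add-k smoothed PPMI: add k to every count before estimating probabilities."""
    smoothed = counts + k                               # every cell, including zeros, gets +k
    N = smoothed.sum()
    p_wc = smoothed / N
    p_w = smoothed.sum(axis=1, keepdims=True) / N
    p_c = smoothed.sum(axis=0, keepdims=True) / N
    return np.maximum(np.log2(p_wc / (p_w * p_c)), 0.0)
```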
[^1]: Levy, O., Y. Goldberg, and I. Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. TACL, 3:211–225.