The size of the vocabulary in a [[corpus]] grows somewhat faster than the square root of the corpus size. $|V| = kN^{\beta}$ where $k$ and $\beta$ are free variables (usually $10 \le k \le 100$ and $0.6 \le \beta \le 0.7$.)