An unbiased [[estimator]] of the population variance $\sigma^2$ is the sample variance

$S^2 = \frac{\sum_{i=1}^n(X_i - \bar X)^2}{n-1}$

The denominator $n-1$ is known as the [[degrees of freedom]]. For the normal distribution, $S^2$ and $\bar X$ are independent.

The sample variance can also be specified as

$\tilde S^2 := \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n}$

Note that $\tilde S^2$ is a biased estimator. We can see this by calculating its [[expectation]]:

$E[\tilde S^2] = E \Big [\frac{\sum_{i=1}^n (X_i - \bar X)^2}{n} \Big] = \frac{n-1}{n} \sigma^2$

From this, we discover that multiplying by the fraction $\frac{n}{n-1}$ removes the bias; in so doing, we arrive at the unbiased estimator $S^2$ above.

The sample variance from a [[normal distribution]], suitably scaled, follows a [[chi-squared distribution]] with $n-1$ degrees of freedom:

$\frac{S^2(n-1)}{\sigma^2} \sim \chi^2(n-1)$

$S^2$ is a [[consistent estimator]] of $\sigma^2$. We can use the sample variance in place of the population variance provided $n$ is sufficiently large ($n > 30$):

$\frac{\bar X - \mu}{S / \sqrt{n}} \sim N(0,1)$

When $n$ is small and the population distribution is the [[normal distribution]], use the [[t-distribution]] with $n - 1$ degrees of freedom:

$\frac{\bar X - \mu}{S / \sqrt{n}} \sim t(n-1)$

Another form of the sample variance (derived by expanding the square in the numerator and distributing the sums) is

$S^2 = \frac{\sum_{i=1}^n X_i^2 - (\sum_{i=1}^n X_i)^2 / n}{n-1}$

This form is useful when the only data reported are the sum of the $X_i$ and the sum of their squares.

#refactor #expand Discuss how the MLE for $\sigma^2$ gives you the biased estimator $\tilde S^2$, and the correction to $n-1$ resulting from the calculation of the expectation of that estimator.

Week 2 Lesson 3 DS5002
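The bias result $E[\tilde S^2] = \frac{n-1}{n}\sigma^2$ can be checked by simulation. This is a minimal sketch (not from the lesson) using only the Python standard library; `statistics.pvariance` divides by $n$ (the biased $\tilde S^2$) while `statistics.variance` divides by $n-1$ (the unbiased $S^2$):

```python
import random
import statistics

random.seed(0)

n = 5            # small sample size makes the bias visible
sigma2 = 4.0     # true population variance (sigma = 2)
trials = 200_000

sum_biased = 0.0    # accumulates S~^2 values (denominator n)
sum_unbiased = 0.0  # accumulates S^2 values (denominator n-1)
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    sum_biased += statistics.pvariance(xs)   # divides by n
    sum_unbiased += statistics.variance(xs)  # divides by n - 1

mean_biased = sum_biased / trials
mean_unbiased = sum_unbiased / trials
print(mean_biased)    # close to (n-1)/n * sigma^2 = 3.2
print(mean_unbiased)  # close to sigma^2 = 4.0
```

Averaged over many samples, the $n$-denominator estimator settles near $\frac{n-1}{n}\sigma^2$, while the $n-1$ version settles near $\sigma^2$, which is exactly the $\frac{n}{n-1}$ correction described above.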
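The distributional claim $\frac{S^2(n-1)}{\sigma^2} \sim \chi^2(n-1)$ can also be spot-checked by matching moments: a $\chi^2(n-1)$ variable has mean $n-1$ and variance $2(n-1)$. A rough simulation sketch (my own, not from the lesson):

```python
import random
import statistics

random.seed(1)

n = 10
sigma = 2.0
trials = 100_000

# Simulate (n-1) * S^2 / sigma^2 for normal samples; it should
# behave like chi^2(n-1): mean n-1, variance 2(n-1).
vals = []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    vals.append((n - 1) * statistics.variance(xs) / sigma**2)

mean_val = statistics.fmean(vals)
var_val = statistics.pvariance(vals)
print(mean_val)  # close to n - 1 = 9
print(var_val)   # close to 2(n-1) = 18
```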
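The algebraic equivalence between the definitional form of $S^2$ and the shortcut form can be verified numerically. A small sketch with made-up data, computing the variance both ways:

```python
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
n = len(xs)

# Definitional form: sum of squared deviations, divided by n - 1.
xbar = sum(xs) / n
s2_def = sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Shortcut form: needs only sum(x) and sum(x^2), no mean pass.
sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(s2_def, s2_shortcut)  # agree up to floating-point rounding
```

The shortcut form is handy when only the totals $\sum X_i$ and $\sum X_i^2$ were recorded, since the mean never has to be computed from raw data.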