The Neyman-Pearson Lemma identifies the test statistic on which the [[best test]] is based in [[hypothesis testing]].
Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with [[probability density function|pdf]] $f(x; \theta)$ and [[joint probability density function|joint pdf]]
$f(\vec x; \theta) \overset{iid}{=}\prod_{i=1}^n f(x_i; \theta)$
For the simple versus simple test $H_0 : \theta = \theta_0$ vs. $H_1: \theta = \theta_1$, the best test of size $\alpha$ is to reject $H_0$ in favor of $H_1$ if $\vec X \in R^*$, where
$R^* = \Big \{\vec x: \frac{f(\vec x; \theta_0)}{f(\vec x; \theta_1)} \le c \Big \}$
In other words, the [[rejection region]] contains all the [[vector|vectors]] $\vec x$ for which the ratio is small. #expand This ratio is referred to as a "likelihood ratio" because the numerator and denominator are each likelihoods, in much the same sense as the term is used for the [[maximum likelihood estimator]].
For a [[discrete random variable]], the joint pdf $f(\vec x; \theta)$ is the probability $P(X_1 = x_1, X_2=x_2, \dots, X_n = x_n; \theta)$. When $H_0$ is true, $f(\vec x; \theta_0)$, which is the joint probability under the assumption that $\theta = \theta_0$, is relatively large. Since it is the numerator in the above ratio, the ratio is also large. Conversely, when $H_1$ is true, $f(\vec x; \theta_1)$ is relatively large and so the ratio is small.
The Neyman-Pearson lemma holds similarly for any [[continuous random variable]].
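As a quick numerical illustration of this intuition, here is a minimal Python sketch (not part of the lemma itself; the rates $\lambda_0$, $\lambda_1$ and sample size $n$ are arbitrary illustration values, and it borrows the exponential distribution used in the example below). It computes the ratio of joint pdfs for data simulated under $H_0$ and under $H_1$; the ratio tends to be large in the first case and small in the second.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood_ratio(x, lam0, lam1):
    """f(x; lam0) / f(x; lam1), computed directly as a ratio of joint exponential pdfs."""
    return np.prod(lam0 * np.exp(-lam0 * x)) / np.prod(lam1 * np.exp(-lam1 * x))

lam0, lam1, n = 1.0, 3.0, 20                     # hypothetical values for illustration
x_h0 = rng.exponential(scale=1 / lam0, size=n)   # data generated under H0 (rate lam0)
x_h1 = rng.exponential(scale=1 / lam1, size=n)   # data generated under H1 (rate lam1)

print(likelihood_ratio(x_h0, lam0, lam1))        # typically large when H0 is true
print(likelihood_ratio(x_h1, lam0, lam1))        # typically small when H1 is true
```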
# Example
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from the [[exponential distribution]] with rate $\lambda > 0$.
Find the best test of size $\alpha$ for testing $H_0: \lambda = \lambda_0$ vs. $H_1: \lambda = \lambda_1$ where $\lambda_1 > \lambda_0$.
First, write the pdf and joint pdf.
$f(x; \lambda) = \lambda e^{-\lambda x}$
$f(\vec x; \lambda) \overset{iid}{=} \prod_{i=1}^n f(x_i; \lambda)$
Next, substitute in the pdf and solve the product.
$\prod_{i=1}^n f(x_i; \lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$
Now evaluate the likelihood ratio. Simply plug in the relevant $\lambda$ for each joint pdf and simplify.
$\frac{f(\vec x; \lambda_0)}{f(\vec x; \lambda_1)} = \frac{\lambda_0^ne^{-\lambda_0 \sum x_i}}{\lambda_1^ne^{-\lambda_1 \sum x_i}} = \Big ( \frac{\lambda_0}{\lambda_1} \Big )^n e^{-(\lambda_0 - \lambda_1) \sum x_i}$
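As a quick sanity check (a sketch with made-up values for $\lambda_0$, $\lambda_1$, and the data; none of these numbers come from the derivation itself), the simplified closed form above agrees with the direct ratio of joint pdfs:

```python
import numpy as np

lam0, lam1 = 0.5, 2.0                  # hypothetical rates with lam1 > lam0
x = np.array([0.3, 1.7, 0.9, 2.4])     # made-up sample of size n = 4
n, s = len(x), x.sum()

# Direct ratio of joint pdfs versus the simplified closed form.
direct = np.prod(lam0 * np.exp(-lam0 * x)) / np.prod(lam1 * np.exp(-lam1 * x))
closed = (lam0 / lam1) ** n * np.exp(-(lam0 - lam1) * s)

print(direct, closed)                  # the two values should match
```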
Thus, we have the statistic and direction (steps 1 and 2 of hypothesis testing). We will reject the null hypothesis in favor of the alternate hypothesis if
$\Big ( \frac{\lambda_0}{\lambda_1} \Big )^n e^{-(\lambda_0 - \lambda_1) \sum x_i} \le c$
We can further simplify by pulling the constants together on the $c$ side of the inequality. The quantities $\lambda_0$, $\lambda_1$, and $n$ are all fixed, known quantities and can be treated as constants. We want to get down to just the expression that contains the $x_i$s. In doing so, the inequality may flip, which determines the direction of the test.
To simplify, divide both sides by the term $(\lambda_0 / \lambda_1)^n$; since this quantity is positive, the direction of the inequality is unchanged. We don't need to keep track of exactly what the constant $c$ is; it is just some constant to be determined, so we can simply fold the term into $c$.
$e^{(-\lambda_0 - \lambda_1) \sum x_i} \le c$
Next take the $\log$ of both sides (because $\log$ is a monotonically increasing function, it will not affect the direction of the inequality).
$-(\lambda_0 - \lambda_1) \sum_{i=1}^n x_i \le c$
Finally, divide both sides by the term $-(\lambda_0 - \lambda_1)$. Importantly, it was given that $\lambda_1 > \lambda_0$, so this quantity is positive and does not flip the inequality.
$\sum_{i=1}^n X_i \le c$
The sum of all $x_i$s is the best test statistic, and the direction of the test is to reject the null hypothesis in favor of the alternate hypothesis if the sum of the $x_i$s is less than some constant $c$. In reporting our final best test, we also change the $x$ to capital $X$ to indicate a random variable (as opposed to some observed data).
For ease of calculation, we can divide both sides by $n$ to use the mean, rather than the sum, of all $x_i$s. This is still a best test but possibly easier to work with depending on how the data are provided.
$\bar X \le c$
We find $c$ in the usual way, by setting the probability of a [[Type I Error]] equal to $\alpha$.
$\begin{align}
\alpha &= P(\text{Type I Error}) \\
&= P(\text{Reject }H_0; \lambda_0) \\
&= P(\bar X < c; \lambda_0)
\end{align}$
We know that the sample mean of $n$ exponential random variables can be converted to the [[chi-squared distribution]] by multiplying by $2n\lambda$, where the resulting chi-squared random variable ($W$) has $2n$ degrees of freedom.
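This works because the sum of $n$ iid exponential($\lambda$) random variables follows a gamma distribution, and rescaling that gamma by $2\lambda$ gives a chi-squared distribution (using the rate parameterization of the gamma):
$\sum_{i=1}^n X_i \sim \text{Gamma}(n, \lambda) \quad \Rightarrow \quad 2n\lambda \bar X = 2\lambda \sum_{i=1}^n X_i \sim \text{Gamma}\Big(n, \tfrac{1}{2}\Big) = \chi^2(2n)$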
$\begin{align}
\alpha &= P(2n\lambda_0 \bar X < 2n\lambda_0 c) \\
&= P(W < 2n\lambda_0 c)
\end{align}$
To solve for $c$, we want $2n\lambda_0 c$ to be the critical value from the $\chi^2(2n)$ distribution that has area $1-\alpha$ to its right (equivalently, area $\alpha$ to its left).
$2n\lambda_0c = \chi^2_{1-\alpha,2n}$
which solves to
$c = \frac{\chi^2_{1-\alpha,2n}}{2n\lambda_0}$
Thus the best test of size $\alpha$ for testing $H_0: \lambda = \lambda_0$ vs. $H_1: \lambda = \lambda_1$ where $\lambda_1 > \lambda_0$ is to reject $H_0$ in favor of $H_1$ if
$\bar X < \frac{\chi^2_{1-\alpha,2n}}{2n\lambda_0}$
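As a final sketch (in Python, with hypothetical values for $\alpha$, $\lambda_0$, and $n$), the cutoff $c$ can be computed from the chi-squared quantile function, and a quick Monte Carlo simulation under $H_0$ confirms that the test rejects roughly $\alpha$ of the time:

```python
import numpy as np
from scipy.stats import chi2

alpha, lam0, n = 0.05, 2.0, 15        # hypothetical size, null rate, and sample size

# chi2.ppf(alpha, 2n) returns the value with area alpha to its LEFT, i.e. the
# critical value written chi^2_{1-alpha, 2n} above (area 1 - alpha to its right).
c = chi2.ppf(alpha, df=2 * n) / (2 * n * lam0)

# Monte Carlo check: under H0 the test should reject about alpha of the time.
rng = np.random.default_rng(1)
samples = rng.exponential(scale=1 / lam0, size=(100_000, n))
rejection_rate = np.mean(samples.mean(axis=1) < c)
print(c, rejection_rate)              # rejection_rate should be close to 0.05
```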