The Neyman-Pearson Lemma identifies the test statistic on which the [[best test]] is based in [[hypothesis testing]].
Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with [[probability density function|pdf]] $f(x; \theta)$ and [[joint probability density function|joint pdf]]
$f(\vec x; \theta) \overset{iid}{=}\prod_{i=1}^n f(x_i; \theta)$
For the simple versus simple test $H_0 : \theta = \theta_0$ vs. $H_1: \theta = \theta_1$, the best test of size $\alpha$ is to reject $H_0$ in favor of $H_1$ if $\vec X \in R^*$, where
$R^* = \Big \{\vec x: \frac{f(\vec x; \theta_0)}{f(\vec x; \theta_1)} \le c \Big \}$
In other words, the [[rejection region]] contains all the [[vector|vectors]] $\vec x$ for which the ratio is small. #expand This ratio is referred to as a "likelihood ratio" because the numerator and denominator are each likelihoods, in much the same sense as the term is used for the [[maximum likelihood estimator]].
For a [[discrete random variable]], the joint pdf $f(\vec x; \theta)$ is the probability $P(X_1 = x_1, X_2=x_2, \dots, X_n = x_n; \theta)$. When $H_0$ is true, $f(\vec x; \theta_0)$, which is the joint probability under the assumption that $\theta = \theta_0$, is relatively large. Since it is the numerator in the above ratio, the ratio is also large. Conversely, when $H_1$ is true, $f(\vec x; \theta_1)$ is relatively large and so the ratio is small.
The Neyman-Pearson lemma holds similarly for any [[continuous random variable]].
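As a quick numerical illustration of this intuition, here is a minimal Python sketch (not part of the lemma itself; the rates $\lambda_0$, $\lambda_1$ and sample size $n$ are arbitrary illustration values, and it borrows the exponential distribution used in the example below). It computes the ratio of joint pdfs for data simulated under $H_0$ and under $H_1$; the ratio tends to be large in the first case and small in the second.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood_ratio(x, lam0, lam1):
    """f(x; lam0) / f(x; lam1), computed directly as a ratio of joint exponential pdfs."""
    return np.prod(lam0 * np.exp(-lam0 * x)) / np.prod(lam1 * np.exp(-lam1 * x))

lam0, lam1, n = 1.0, 3.0, 20                     # hypothetical values for illustration
x_h0 = rng.exponential(scale=1 / lam0, size=n)   # data generated under H0 (rate lam0)
x_h1 = rng.exponential(scale=1 / lam1, size=n)   # data generated under H1 (rate lam1)

print(likelihood_ratio(x_h0, lam0, lam1))        # typically large when H0 is true
print(likelihood_ratio(x_h1, lam0, lam1))        # typically small when H1 is true
```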
# Example
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from the [[exponential distribution]] with rate $\lambda > 0$.
Find the best test of size $\alpha$ for testing $H_0: \lambda = \lambda_0$ vs. $H_1: \lambda = \lambda_1$ where $\lambda_1 > \lambda_0$.
First, write the pdf and joint pdf.
$f(x; \lambda) = \lambda e^{-\lambda x}$
$f(\vec x; \lambda) \overset{iid}{=} \prod_{i=1}^n f(x_i; \lambda)$
Next, substitute in the pdf and solve the product.
$\prod_{i=1}^n f(x_i; \lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$
Now evaluate the likelihood ratio. Simply plug in the relevant $\lambda$ for each joint pdf and simplify.
$\frac{f(\vec x; \lambda_0)}{f(\vec x; \lambda_1)} = \frac{\lambda_0^ne^{-\lambda_0 \sum x_i}}{\lambda_1^ne^{-\lambda_1 \sum x_i}} = \Big ( \frac{\lambda_0}{\lambda_1} \Big )^n e^{-(\lambda_0 - \lambda_1) \sum x_i}$
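As a quick sanity check (a sketch with made-up values for $\lambda_0$, $\lambda_1$, and the data; none of these numbers come from the derivation itself), the simplified closed form above agrees with the direct ratio of joint pdfs:

```python
import numpy as np

lam0, lam1 = 0.5, 2.0                  # hypothetical rates with lam1 > lam0
x = np.array([0.3, 1.7, 0.9, 2.4])     # made-up sample of size n = 4
n, s = len(x), x.sum()

# Direct ratio of joint pdfs versus the simplified closed form.
direct = np.prod(lam0 * np.exp(-lam0 * x)) / np.prod(lam1 * np.exp(-lam1 * x))
closed = (lam0 / lam1) ** n * np.exp(-(lam0 - lam1) * s)

print(direct, closed)                  # the two values should match
```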
Thus, we have the statistic and direction (steps 1 and 2 of hypothesis testing). We will reject the null hypothesis in favor of the alternate hypothesis if
$\Big ( \frac{\lambda_0}{\lambda_1} \Big )^n e^{-(\lambda_0 - \lambda_1) \sum x_i} \le c$
We can further simplify by pulling the constants together on the $c$ side of the inequality. The quantities $\lambda_0$, $\lambda_1$, and $n$ are all fixed, known quantities and can be treated as constants. We want to get down to just the expression that contains the $x_i$s. In doing so, the inequality may flip, which determines the direction of the test.
To simplify, divide both sides by the term $(\lambda_0 / \lambda_1)^n$; since this quantity is positive, the direction of the inequality is unchanged. We don't need to keep track of exactly what the constant $c$ is; it is just some constant to be determined, so we can simply fold the term into $c$.
$e^{(-\lambda_0 - \lambda_1) \sum x_i} \le c$
Next take the $\log$ of both sides (because $\log$ is a monotonically increasing function, it will not affect the direction of the inequality).
$-(\lambda_0 - \lambda_1) \sum_{i=1}^n x_i \le c$
Finally, divide both sides by the term $-(\lambda_0 - \lambda_1)$. Importantly, it was given that $\lambda_1 > \lambda_0$, so this quantity is positive and does not flip the inequality.
$\sum_{i=1}^n X_i \le c$
The sum of all $x_i$s is the best test statistic, and the direction of the test is to reject the null hypothesis in favor of the alternate hypothesis if the sum of the $x_i$s is less than some constant $c$. In reporting our final best test, we also change the $x$ to capital $X$ to indicate a random variable (as opposed to some observed data).
For ease of calculation, we can divide both sides by $n$ to use the mean, rather than the sum, of all $x_i$s. This is still a best test but possibly easier to work with depending on how the data are provided.
$\bar X \le c$
We find $c$ in the usual way, by setting the probability of a [[Type I Error]] equal to $\alpha$.
$\begin{align}
\alpha &= P(\text{Type I Error}) \\
&= P(\text{Reject }H_0; \lambda_0) \\
&= P(\bar X < c; \lambda_0)
\end{align}$
We know that the sample mean of $n$ exponential random variables can be converted to the [[chi-squared distribution]] by multiplying by $2n\lambda$, where the resulting chi-squared random variable ($W$) has $2n$ degrees of freedom.
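This works because the sum of $n$ iid exponential($\lambda$) random variables follows a gamma distribution, and rescaling that gamma by $2\lambda$ gives a chi-squared distribution (using the rate parameterization of the gamma):
$\sum_{i=1}^n X_i \sim \text{Gamma}(n, \lambda) \quad \Rightarrow \quad 2n\lambda \bar X = 2\lambda \sum_{i=1}^n X_i \sim \text{Gamma}\Big(n, \tfrac{1}{2}\Big) = \chi^2(2n)$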
$\begin{align}
\alpha &= P(2n\lambda_0 \bar X < 2n\lambda_0 c) \\
&= P(W < 2n\lambda_0 c)
\end{align}$
To solve for $c$, we want $2n\lambda_0 c$ to be the critical value from the $\chi^2(2n)$ distribution that has area $1-\alpha$ to its right (equivalently, area $\alpha$ to its left).
$2n\lambda_0c = \chi^2_{1-\alpha,2n}$
which solves to
$c = \frac{\chi^2_{1-\alpha,2n}}{2n\lambda_0}$
Thus the best test of size $\alpha$ for testing $H_0: \lambda = \lambda_0$ vs. $H_1: \lambda = \lambda_1$ where $\lambda_1 > \lambda_0$ is to reject $H_0$ in favor of $H_1$ if
$\bar X < \frac{\chi^2_{1-\alpha,2n}}{2n\lambda_0}$
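As a final sketch (in Python, with hypothetical values for $\alpha$, $\lambda_0$, and $n$), the cutoff $c$ can be computed from the chi-squared quantile function, and a quick Monte Carlo simulation under $H_0$ confirms that the test rejects roughly $\alpha$ of the time:

```python
import numpy as np
from scipy.stats import chi2

alpha, lam0, n = 0.05, 2.0, 15        # hypothetical size, null rate, and sample size

# chi2.ppf(alpha, 2n) returns the value with area alpha to its LEFT, i.e. the
# critical value written chi^2_{1-alpha, 2n} above (area 1 - alpha to its right).
c = chi2.ppf(alpha, df=2 * n) / (2 * n * lam0)

# Monte Carlo check: under H0 the test should reject about alpha of the time.
rng = np.random.default_rng(1)
samples = rng.exponential(scale=1 / lam0, size=(100_000, n))
rejection_rate = np.mean(samples.mean(axis=1) < c)
print(c, rejection_rate)              # rejection_rate should be close to 0.05
```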