The Cramér-Rao Lower Bound (CRLB) gives a lower bound on the variance of all unbiased [[estimator|estimators]]. Suppose that $\hat \tau(\theta)$ is an unbiased estimator of the function $\tau (\theta)$. Then

$$V[\hat \tau(\theta)] \geq \frac{[\tau'(\theta)]^2}{I_n(\theta)}$$

where $I_n(\theta)$ is the [[Fisher information]].

> [!NOTE]
> ![[Fisher information]]

## UMVUE

The Uniformly Minimum Variance Unbiased Estimator (UMVUE) is the unbiased estimator with the lowest possible variance among all unbiased estimators, for every value of $\theta$. An unbiased estimator whose variance achieves the CRLB is therefore the UMVUE (although a UMVUE does not always attain the bound).

## Restrictions

The proof of the CRLB depends on the [[Cauchy-Schwarz inequality]], and results in the following conditions for the CRLB to be valid.

The Fisher information must be between 0 and $\infty$.

$$0 < E \Big [ \Big( \frac{\partial}{\partial \theta} \ln f(\vec{X};\theta) \Big )^2\Big ] < \infty$$

The derivative of the $\log$ of the pdf must exist.

$$\frac{\partial}{\partial \theta} \ln (f(\vec{x}; \theta)) \text{ exists}$$

You must be able to pull the derivative outside of the integral, so the following integrals must be equivalent.

$$\frac{\partial}{\partial \theta} \int f(\vec{x}; \theta) \, dx = \int \frac{\partial}{\partial \theta} f(\vec{x}; \theta) \, dx$$

This condition does not hold whenever the parameter is included in the definition of the support of the distribution, which is true, for example, of the [[uniform distribution]]. This last condition is the one most likely to be violated.

## Computational simplifications

Some computational simplifications are available for finding the CRLB.

The expected value of the [[score function]] (the derivative of the log of the joint pdf; the Fisher information is the expectation of its square) equals 0.

$$E \Big [ \frac{\partial}{\partial \theta} \ln f(\vec{X}; \theta) \Big ] = 0$$

This is not often useful on its own, but it is used to derive the remaining computational simplifications described below.

The Fisher information is the negative of the expected value of the second derivative of the $\log$ of the joint pdf.

$$I_n(\theta) = -E \Big [ \frac{\partial^2}{\partial \theta^2} \ln f(\vec{X};\theta) \Big ]$$

You can use this when calculating a [[second derivative]] is helpful, typically when the second derivative is a constant.

When data are [[independent and identically distributed]] (iid), the Fisher information is equal to $n$ times the Fisher information for the first random variable (simplifying the calculation of the expectation by removing the [[vector|vectors]] associated with the joint probability). This is true because the Fisher information is the same for each iid variable. This avoids the need to bump up to the distribution of the sum of multiple random variables.

## Process for finding the CRLB

In general, the process for finding the CRLB includes the following steps (a sketch of this process in code appears at the end of this section).

1. Write the pdf
2. Write the joint pdf (*computational simplification*: if iid, use $n$ times the Fisher information of a single variable instead of writing the joint pdf)
3. Calculate the Fisher information (drop the indicator portion of the equation)
    1. take the natural log of the pdf
    2. find the derivative (*computational simplification*: if the second derivative is a constant, the Fisher information is the negative of this constant)
    3. substitute the term $\sum x_i$ with the appropriate distribution from the [[sums of random variables]] as $Y$
    4. calculate the expectation (*computational simplification*: check whether the expectation is an expression of the variance for the distribution; substitute from a lookup table if so)
    5. simplify
4. Calculate the derivative of $\tau(\theta)$
5. Substitute $(\tau'(\theta))^2$ and the Fisher information into the formula for the CRLB
6. Solve the inequality

Often, but certainly not always, the Fisher information is simply the inverse of the variance provided in a lookup table. This will be true when the estimator provided is the UMVUE and $\tau'(\theta) = 1$.
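To make these steps concrete, here is a minimal sketch in Python using `sympy` (the library choice and the Bernoulli pdf driving it are my own illustration, not part of this note). It follows the steps for a single-parameter pdf, using the iid simplification $I_n(\theta) = n \, I_1(\theta)$ and computing the expectation directly rather than via the sums-of-random-variables substitution:

```python
import sympy as sp

# Symbols: the parameter p, the sample size n, and a single observation x
p, n = sp.symbols('p n', positive=True)
x = sp.symbols('x')

# Step 1: pdf of a single Bernoulli observation (indicator portion dropped)
pdf = p**x * (1 - p)**(1 - x)

# Steps 3.1-3.2: natural log of the pdf and its derivative (the score)
log_pdf = sp.expand_log(sp.log(pdf), force=True)
score = sp.diff(log_pdf, p)

# Step 3.4: Fisher information for one observation, E[score^2].
# For a Bernoulli variable, E[g(X)] = g(1) * p + g(0) * (1 - p).
I1 = sp.simplify(score.subs(x, 1)**2 * p + score.subs(x, 0)**2 * (1 - p))

# Computational simplification for iid data: I_n = n * I_1
I_n = n * I1

# Steps 4-6: tau(p) = p, so tau'(p) = 1, and the CRLB is tau'(p)^2 / I_n
tau = p
crlb = sp.simplify(sp.diff(tau, p)**2 / I_n)

print(I1)    # equivalent to 1/(p*(1 - p))
print(crlb)  # equivalent to p*(1 - p)/n
```

For the Bernoulli pdf this reproduces the hand calculation in the example below: $I_n(p) = n/[p(1-p)]$ and a CRLB of $p(1-p)/n$.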
## Example

Find the CRLB of the variance of all unbiased estimators of $p$ for the [[Bernoulli distribution]] given $X_1, \dots, X_n \overset{iid}{\sim} Bern(p)$. In this example, $\theta = p$ and $\tau(p) = p$.

The [[probability density function|pdf]] for a single Bernoulli random variable is

$$f(x;p) = p^x(1-p)^{1-x}$$

and the [[joint probability density function|joint pdf]] can be written as (note all sums are from $i=1$ to $n$)

$$f(\vec{x};p) = p^{\sum x_i} \ (1-p)^{n - \sum x_i}$$

The [[Fisher information]] is the expectation of the square of the derivative of the log of the joint pdf. First we take the [[logarithm]] of the joint pdf

$$\ln f(\vec{x};p) = \Big ( \sum x_i \Big ) \ln p + \Big ( n - \sum x_i \Big) \ln (1-p)$$

Then we take the [[derivative]], recalling that the derivative of $\ln x$ is $1 / x$. Finding a common denominator and simplifying, we have

$$\displaylines{ \begin{align} \frac{\partial}{\partial p} \ln f(\vec{x}; p) &= \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} \\ &=\frac{(1-p)\sum x_i}{p(1-p)} - \frac{p \Big(n - \sum x_i \Big)}{p(1-p)} \\ &=\frac{\sum x_i - np}{p(1-p)} \end{align}}$$

Noting that the sum of multiple Bernoulli random variables (the $\sum x_i$ in the above equation) is a [[Binomial distribution|binomial random variable]], we can let $Y = \sum x_i \sim bin(n, p)$. Using the [[sums of random variables]] at this step is a very common strategy for finding the CRLB. Substituting $Y$ into the derivative we just found, we have

$$\frac{\sum x_i - np}{p(1-p)} = \frac{Y - np}{p(1-p)}$$

Plugging this into the formula for Fisher information we have

$$I_n(p) = E \Big [ \Big ( \frac{Y-np}{p(1-p)} \Big)^2 \Big ] = \frac{1}{p^2(1-p)^2} E \Big [ (Y - np)^2 \Big ]$$

Notice that $E[(Y - np)^2]$ is the variance of the binomial distribution, because $Y$ is the binomial random variable (by definition above) and $np$ is its mean. Thus we have the expected squared deviation of the random variable from its mean, which is the definition of the [[variance]] of a random variable. The variance of a binomial random variable is given by $np(1-p)$, which we can substitute in to get

$$I_n(p) = \frac{1}{p^2(1-p)^2} \cdot np(1-p) = \frac{n}{p(1-p)}$$

Thankfully, we can reuse this Fisher information for all future CRLB calculations involving the Bernoulli distribution!

Let's plug in what we have for the CRLB. First, note that the derivative of $\tau (p) = p$ with respect to $p$ is 1.

$$V(\hat p) \geq \frac{[\tau'(p)]^2}{I_n(p)} = \frac{1}{n/[p(1-p)]} = \frac{p(1-p)}{n}$$

The natural unbiased estimator here is the sample mean $\hat p = \bar X$. Looking up the variance for the Bernoulli distribution, $V(X_i) = p(1-p)$, so $V(\bar X) = p(1-p)/n$, which is exactly the CRLB. Its variance achieves the CRLB, so $\bar X$ is the UMVUE.
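As a quick numerical sanity check on this example (a simulation sketch of my own; the specific values of `p`, `n`, and the number of replications below are arbitrary choices, not from the note), we can estimate the variance of $\hat p = \bar X$ by simulation and compare it to the CRLB $p(1-p)/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 50, 200_000   # arbitrary example values

# Draw `reps` Bernoulli samples of size n and compute p-hat = sample mean of each
samples = rng.binomial(1, p, size=(reps, n))
p_hats = samples.mean(axis=1)

crlb = p * (1 - p) / n          # the bound derived above
sim_var = p_hats.var()          # simulated variance of the estimator

print(f"CRLB:               {crlb:.6f}")
print(f"Simulated variance: {sim_var:.6f}")   # should land very close to the CRLB
```

Because $\bar X$ attains the bound, the two printed numbers agree up to simulation noise; for an unbiased estimator that does not attain the CRLB, the simulated variance would sit above it.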