The Akaike Information Criterion (AIC) is a means of selecting the best model from a set of candidate models. Conceptually, AIC estimates the distance between the unknown "true" model and the model of interest; the best model is the one with the smallest AIC. AIC is based on [[information theory]] and uses the [[Kullback-Leibler divergence]]:

$$D_{KL}(f, g) = \int f(x) \ln \Big( \frac{f(x)}{g(x; \beta)} \Big)\, dx$$

We cannot compute this directly because we know neither $f(x)$ nor $\beta$. Expanding with [[logarithm rules]] and splitting the integral gives

$$D_{KL}(f, g) = \int f(x) \ln f(x)\, dx - \int f(x) \ln g(x; \beta)\, dx$$

The first term is constant with respect to $g$, so only the second term matters for comparing models. Its expected value can be estimated by the maximized log-likelihood with a bias correction, $-\ln L(\hat\beta) + p + 1 + c$, where $p + 1$ counts the estimated parameters and $c$ is a constant that does not depend on the model. AIC drops the $c$ and multiplies by $2$:

$$AIC = 2(p+1) - 2 \ln L(\hat\beta)$$

Note that this equation balances the number of parameters (the first term) against goodness of fit (the second term).

In the context of [[linear regression]], AIC can be further simplified by plugging in the [[maximum likelihood estimator]] for $\beta$ and $\sigma^2$; up to an additive constant,

$$AIC = 2(p+1) + n \ln \Big( \frac{RSS}{n} \Big)$$

See also [[Bayes Information Criterion]].
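
As a concrete illustration of the linear-regression formula, here is a minimal Python sketch (the helper name `aic_linear_regression` and the simulated data are hypothetical, not from any particular library) that computes $AIC = 2(p+1) + n \ln(RSS/n)$ from an ordinary least squares fit and compares two candidate models. Note that software packages differ in which constants they keep and how they count parameters, so these values are only meaningful relative to each other.

```python
import numpy as np

def aic_linear_regression(X, y):
    """AIC for an OLS fit, using the note's convention AIC = 2(p+1) + n*ln(RSS/n).

    X is an (n, p) design matrix (include a column of ones for the intercept);
    p counts the regression coefficients, and the +1 accounts for sigma^2.
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS = maximum likelihood fit
    rss = np.sum((y - X @ beta_hat) ** 2)             # residual sum of squares
    return 2 * (p + 1) + n * np.log(rss / n)

# Compare two nested models on simulated data: the smaller AIC is preferred.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)                # true relationship is linear in x

X_linear = np.column_stack([np.ones(n), x])           # intercept + x
X_quad = np.column_stack([np.ones(n), x, x**2])       # adds an unnecessary x^2 term

print("AIC (linear):   ", aic_linear_regression(X_linear, y))
print("AIC (quadratic):", aic_linear_regression(X_quad, y))
```

The quadratic model fits the sample slightly better (smaller RSS), but the extra parameter increases the penalty term, so the linear model typically ends up with the smaller AIC, which is the trade-off the formula is designed to capture.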