The Akaike Information Criterion (AIC) is a means of selecting the best model from a set of candidate models. Conceptually, AIC reflects the distance between the "true" data-generating model and the model of interest. AIC is based on [[information theory]]. The best model is the one with the smallest AIC.
AIC uses the [[Kullback-Leibler divergence]].
$D_{KL}(f,g) = \int f(x) \log \Big ( \frac{f(x)}{g(x; \beta)} \Big ) dx$
We cannot compute this directly because we know neither $f(x)$ nor $\beta$. Using [[logarithm rules]] and linearity of the integral, the divergence expands to
$D_{KL}(f,g) = \int f(x) \log f(x) \, dx - \int f(x) \log g(x; \beta) \, dx$
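As a sanity check, here is a minimal numerical sketch of this decomposition, assuming a toy setting where the "true" density $f$ is a standard normal and the candidate $g$ is a shifted, wider normal (both choices are illustrative, not part of the derivation).

```python
# Minimal numerical sketch: KL divergence computed two ways (illustrative densities).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

f = norm(loc=0.0, scale=1.0)   # hypothetical "true" density f(x)
g = norm(loc=0.5, scale=1.5)   # hypothetical candidate density g(x; beta)

# Direct form: integral of f(x) * log(f(x) / g(x)) dx
kl_direct, _ = quad(lambda x: f.pdf(x) * np.log(f.pdf(x) / g.pdf(x)), -10, 10)

# Decomposed form: integral of f log f dx  minus  integral of f log g dx
first_term, _ = quad(lambda x: f.pdf(x) * np.log(f.pdf(x)), -10, 10)
second_term, _ = quad(lambda x: f.pdf(x) * np.log(g.pdf(x)), -10, 10)
kl_split = first_term - second_term

print(kl_direct, kl_split)  # the two forms agree; only the second term depends on g
```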
The first term is constant with respect to $g$, so only the second term matters for comparing models. Its expectation can be estimated by the maximized log-likelihood $\ln L(\hat \beta)$, but that plug-in estimate is biased by approximately the number of estimated parameters. Correcting for this bias gives an estimate of the relative expected divergence,
$-\ln L(\hat \beta) + (p + 1) + c$
where $c$ collects the constant first term, which is the same for every candidate model.
Since $c$ is the same for every model, AIC drops it and multiplies everything by $2$ to get
$AIC = 2(p+1) - 2 \ln L(\hat \beta)$
Note that this equation balances the number of parameters (from the first term) with goodness of fit (from the second term).
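To make this balance concrete, here is a minimal sketch that computes AIC from a maximized log-likelihood, assuming a hypothetical Gaussian sample and two candidate models fit by maximum likelihood (all names and data are illustrative).

```python
# Minimal sketch: AIC = 2 * (number of parameters) - 2 * maximized log-likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100)   # hypothetical sample

def aic(log_lik, n_params):
    # penalty for complexity minus reward for goodness of fit
    return 2 * n_params - 2 * log_lik

# Model 1: mean fixed at 0, estimate only sigma (1 parameter)
sigma1 = np.sqrt(np.mean(data ** 2))
ll1 = norm.logpdf(data, loc=0.0, scale=sigma1).sum()

# Model 2: estimate both mu and sigma (2 parameters)
mu2, sigma2 = data.mean(), data.std()
ll2 = norm.logpdf(data, loc=mu2, scale=sigma2).sum()

print(aic(ll1, 1), aic(ll2, 2))   # the smaller AIC is preferred
```

Because the data are actually centered away from zero, the second model's extra parameter should more than pay for its two-unit penalty, giving it the smaller AIC.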
In the context of [[linear regression]] with Gaussian errors, AIC can be further simplified by plugging in the [[maximum likelihood estimator]] for $\beta$ and $\sigma^2$: up to an additive constant that is the same for all models, $-2 \ln L(\hat \beta, \hat \sigma^2) = n \ln \Big( \frac{RSS}{n} \Big)$, so
$AIC = 2(p+1) + n \ln \Big( \frac{RSS}{n} \Big)$
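A minimal sketch of this regression form, assuming a toy design matrix, an ordinary least-squares fit, and the convention that the $p+1$ columns of $X$ are the intercept plus $p$ predictors; additive constants common to all models are dropped, as in the formula above.

```python
# Minimal sketch: AIC for linear regression via the residual sum of squares.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                      # an irrelevant predictor
y = 1.0 + 2.0 * x1 + rng.normal(size=n)      # hypothetical true relationship

def aic_ols(X, y):
    """AIC (up to a constant) from an ordinary least-squares fit."""
    n, p_plus_1 = X.shape                    # columns include the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * p_plus_1

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
print(aic_ols(X_small, y), aic_ols(X_big, y))
```

Adding the irrelevant predictor $x_2$ can only lower the RSS slightly, so the $2(p+1)$ penalty typically leaves the smaller model with the smaller AIC.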
See also [[Bayes Information Criterion]].