Generative models are models for [[classification]] that model the distribution of the predictors within each class and use [[base/Probability/Bayes rule]] to flip $P(X=x \mid Y=k)$ into $P(Y=k \mid X=x)$, the probability of class $k$ given some data $x$.
Generative models include
- [[linear discriminant analysis]]
- [[quadratic discriminant analysis]]
- [[naive Bayes]]
## linear discriminant analysis
Linear discriminant analysis (LDA) assumes that within each class the predictors follow a multivariate [[normal distribution]] (Gaussian) with a common covariance matrix across classes; this shared-covariance assumption is what makes the decision boundaries linear. Equivalently, LDA finds the projection that maximizes the ratio of between-class variance to within-class variance, giving maximum separation between the classes. The key assumption is the single shared covariance matrix. If it does not hold, use [[quadratic discriminant analysis]].
LDA is often used for feature extraction in facial recognition systems.
The [[estimator|estimators]] for LDA with a single predictor ($p = 1$) are
$\hat \mu_k = \frac{1}{n_k} \sum_{i: y_i=k} x_i$
$\hat \sigma^2 = \frac{1}{n - K} \sum_{k=1}^K \ \sum_{i: y_i=k} (x_i - \hat \mu_k)^2$
$\hat \pi_k = \frac{n_k}{n}$
where $\hat \pi_k$ is the [[prior]] for class $k$: simply the proportion of training observations in class $k$.
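As a quick illustration, these estimates can be computed directly in base R on toy data (the values below are made up):
```R
# Toy training data: one predictor, two classes
x <- c(1.0, 1.2, 0.8, 3.1, 2.9, 3.0)
y <- c("A", "A", "A", "B", "B", "B")
n <- length(x); K <- length(unique(y))

mu.hat <- tapply(x, y, mean)                    # per-class means: A = 1.0, B = 3.0
pi.hat <- table(y) / n                          # class priors n_k / n
sigma2.hat <- sum((x - mu.hat[y])^2) / (n - K)  # pooled variance: 0.025
```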
Select the class that maximizes the discriminant (delta) function for class $k$, given for $p=1$ by
$
\delta_k(x) = x \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2 \sigma^2} + \ln(\pi_k)
$
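A small sketch of this rule in base R, plugging in hypothetical estimates for two classes:
```R
# One-predictor discriminant score; mu, sigma2, pi are toy estimates
delta1 <- function(x0, mu, sigma2, pi) {
  x0 * mu / sigma2 - mu^2 / (2 * sigma2) + log(pi)
}
mu.hat <- c(A = 1.0, B = 3.0); sigma2.hat <- 0.025; pi.hat <- c(A = 0.5, B = 0.5)
scores <- delta1(2.6, mu.hat, sigma2.hat, pi.hat)
names(which.max(scores))  # "B": 2.6 is much closer to class B's mean
```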
With more than one predictor ($p > 1$), the delta function is given in [[matrix-vector form]] as
$
\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \ln(\pi_k)
$
where
- $x$ is the observation (feature vector)
- $\Sigma$ is the common covariance matrix
- $\mu_k$ is the mean vector for class $k$
- $\pi_k$ is the prior probability for class $k$
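The matrix-vector score can be computed directly with `solve()`; the means, covariance, and priors below are toy values:
```R
# Multivariate discriminant score with a shared covariance matrix
delta.lda <- function(x, mu, Sigma, pi) {
  Sinv <- solve(Sigma)  # inverse of the common covariance
  drop(t(x) %*% Sinv %*% mu - 0.5 * t(mu) %*% Sinv %*% mu) + log(pi)
}
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)  # shared covariance (toy)
x0 <- c(1.8, 1.5)                          # new observation
scores <- c(A = delta.lda(x0, c(0, 0), Sigma, 0.5),
            B = delta.lda(x0, c(2, 2), Sigma, 0.5))
names(which.max(scores))  # "B": x0 lies closer to class B's mean
```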
### R
```R
library(MASS)  # lda() lives in the MASS package
lda.fit <- lda(response ~ predictors, data=data)
```
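A minimal worked run, using the `iris` data shipped with R (the two predictors are chosen arbitrarily):
```R
library(MASS)
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)        # pred$class holds predicted labels
mean(pred$class == iris$Species)  # training accuracy
```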
## quadratic discriminant analysis
Quadratic discriminant analysis (QDA) assumes each class has its own covariance matrix. Otherwise the setup is the same as [[linear discriminant analysis]].
Maximize the delta function given by
$
\delta_k(x) = -\frac{1}{2} \ln|\Sigma_k| - \frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \ln(\pi_k)
$
where:
- $x$: The observation (feature vector).
- $\Sigma_k$: The covariance matrix for class $k$ (not shared across classes as in LDA).
- $\mu_k$: The mean vector for class $k$.
- $\pi_k$: The prior probability for class $k$.
- $|\Sigma_k|$: The determinant of the covariance matrix $\Sigma_k$.
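A base-R sketch of this score on toy values: the two classes share a mean but differ in covariance, so the class with the tighter covariance wins for a nearby point:
```R
# QDA score with a class-specific covariance matrix (toy estimates assumed)
delta.qda <- function(x, mu, Sigma, pi) {
  d <- x - mu
  -0.5 * log(det(Sigma)) - 0.5 * drop(t(d) %*% solve(Sigma) %*% d) + log(pi)
}
Sigma.A <- diag(c(1, 1)); Sigma.B <- diag(c(4, 4))  # same center, different spread
scores <- c(A = delta.qda(c(1, 1), c(0, 0), Sigma.A, 0.5),
            B = delta.qda(c(1, 1), c(0, 0), Sigma.B, 0.5))
names(which.max(scores))  # "A": the point is well within the tighter class
```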
### R
```R
library(MASS)  # qda() is also in the MASS package
qda.fit <- qda(response ~ predictors, data=data)
```
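As with LDA, a quick worked run on the built-in `iris` data (predictors chosen arbitrarily):
```R
library(MASS)
fit <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)$class  # predicted labels
mean(pred == iris$Species)        # training accuracy
```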
## naive Bayes
Naive Bayes differs from [[linear discriminant analysis]] and [[quadratic discriminant analysis]] in that it makes no assumption about the form of the class-conditional density $f_k$; instead, it assumes that within class $k$ all $p$ predictors are independent.
The density therefore factors into a product of one-dimensional densities, one per predictor:
$f_k(x) = \prod_{j=1}^p f_{kj}(x_j)$
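If each one-dimensional density is taken to be Gaussian, the product can be written out by hand with `dnorm`; the per-predictor means and standard deviations below are toy values:
```R
# Gaussian naive Bayes by hand: multiply per-predictor densities within a class
x0 <- c(2.0, 1.1)                                    # new observation, p = 2
f.A <- prod(dnorm(x0, mean = c(1, 1), sd = c(1, 1)))  # class A density at x0
f.B <- prod(dnorm(x0, mean = c(3, 3), sd = c(1, 1)))  # class B density at x0
post <- c(A = 0.5 * f.A, B = 0.5 * f.B)               # posterior ~ prior * density
names(which.max(post))  # "A": x0 is far from class B's means
```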
### R
Install the package `e1071` for the `naiveBayes` classifier.
```R
install.packages("e1071")
library(e1071)
nb.fit <- naiveBayes(response ~ predictors, data=data)
```
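A minimal worked run on the built-in `iris` data:
```R
library(e1071)
fit <- naiveBayes(Species ~ ., data = iris)
pred <- predict(fit, iris)   # predicted class labels
mean(pred == iris$Species)   # training accuracy
```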