Generative models are models for [[classification]] that model the distribution of the predictors within each class and use [[base/Probability/Bayes rule]] to flip $P(X=x \mid Y=k)$ into $P(Y=k \mid X=x)$, the probability of class $k$ given some data $x$.
Generative models include
- [[linear discriminant analysis]]
- [[quadratic discriminant analysis]]
- [[naive Bayes]]
## linear discriminant analysis
Linear discriminant analysis (LDA) assumes that within each class the predictors follow a multivariate [[normal distribution]] (Gaussian) with a common covariance matrix across classes; this shared-covariance assumption is what makes the decision boundaries linear. Equivalently, LDA finds the projection that maximizes the ratio of between-class variance to within-class variance, giving maximum separation between the classes. The key assumption is the single shared covariance matrix. If it does not hold, use [[quadratic discriminant analysis]].
LDA is often used for feature extraction in facial recognition systems.
The [[estimator|estimators]] for LDA with a single predictor ($p = 1$) are
$\hat \mu_k = \frac{1}{n_k} \sum_{i: y_i=k} x_i$
$\hat \sigma^2 = \frac{1}{n - K} \sum_{k=1}^K \ \sum_{i: y_i=k} (x_i - \hat \mu_k)^2$
$\hat \pi_k = \frac{n_k}{n}$
where $\hat \pi_k$ is the [[prior]] for class $k$: simply the proportion of training observations in class $k$.
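As a quick illustration, these estimates can be computed directly in base R on toy data (the values below are made up):
```R
# Toy training data: one predictor, two classes
x <- c(1.0, 1.2, 0.8, 3.1, 2.9, 3.0)
y <- c("A", "A", "A", "B", "B", "B")
n <- length(x); K <- length(unique(y))

mu.hat <- tapply(x, y, mean)                    # per-class means: A = 1.0, B = 3.0
pi.hat <- table(y) / n                          # class priors n_k / n
sigma2.hat <- sum((x - mu.hat[y])^2) / (n - K)  # pooled variance: 0.025
```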
Select the class that maximizes the discriminant (delta) function for class $k$, given for $p=1$ by
$
\delta_k(x) = x \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2 \sigma^2} + \ln(\pi_k)
$
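A small sketch of this rule in base R, plugging in hypothetical estimates for two classes:
```R
# One-predictor discriminant score; mu, sigma2, pi are toy estimates
delta1 <- function(x0, mu, sigma2, pi) {
  x0 * mu / sigma2 - mu^2 / (2 * sigma2) + log(pi)
}
mu.hat <- c(A = 1.0, B = 3.0); sigma2.hat <- 0.025; pi.hat <- c(A = 0.5, B = 0.5)
scores <- delta1(2.6, mu.hat, sigma2.hat, pi.hat)
names(which.max(scores))  # "B": 2.6 is much closer to class B's mean
```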
With more than one predictor ($p > 1$), the delta function is given in [[matrix-vector form]] as
$
\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \ln(\pi_k)
$
where
- $x$ is the observation (feature vector)
- $\Sigma$ is the common covariance matrix
- $\mu_k$ is the mean vector for class $k$
- $\pi_k$ is the prior probability for class $k$
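The matrix-vector score can be computed directly with `solve()`; the means, covariance, and priors below are toy values:
```R
# Multivariate discriminant score with a shared covariance matrix
delta.lda <- function(x, mu, Sigma, pi) {
  Sinv <- solve(Sigma)  # inverse of the common covariance
  drop(t(x) %*% Sinv %*% mu - 0.5 * t(mu) %*% Sinv %*% mu) + log(pi)
}
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)  # shared covariance (toy)
x0 <- c(1.8, 1.5)                          # new observation
scores <- c(A = delta.lda(x0, c(0, 0), Sigma, 0.5),
            B = delta.lda(x0, c(2, 2), Sigma, 0.5))
names(which.max(scores))  # "B": x0 lies closer to class B's mean
```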
### R
```R
library(MASS)  # lda() lives in the MASS package
lda.fit <- lda(response ~ predictors, data=data)
```
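A minimal worked run, using the `iris` data shipped with R (the two predictors are chosen arbitrarily):
```R
library(MASS)
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)        # pred$class holds predicted labels
mean(pred$class == iris$Species)  # training accuracy
```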
## quadratic discriminant analysis
Quadratic discriminant analysis (QDA) assumes each class has its own covariance matrix. Otherwise the setup is the same as [[linear discriminant analysis]].
Maximize the delta function given by
$
\delta_k(x) = -\frac{1}{2} \ln|\Sigma_k| - \frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \ln(\pi_k)
$
where:
- $x$: The observation (feature vector).
- $\Sigma_k$: The covariance matrix for class $k$ (not shared across classes as in LDA).
- $\mu_k$: The mean vector for class $k$.
- $\pi_k$: The prior probability for class $k$.
- $|\Sigma_k|$: The determinant of the covariance matrix $\Sigma_k$.
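A base-R sketch of this score on toy values: the two classes share a mean but differ in covariance, so the class with the tighter covariance wins for a nearby point:
```R
# QDA score with a class-specific covariance matrix (toy estimates assumed)
delta.qda <- function(x, mu, Sigma, pi) {
  d <- x - mu
  -0.5 * log(det(Sigma)) - 0.5 * drop(t(d) %*% solve(Sigma) %*% d) + log(pi)
}
Sigma.A <- diag(c(1, 1)); Sigma.B <- diag(c(4, 4))  # same center, different spread
scores <- c(A = delta.qda(c(1, 1), c(0, 0), Sigma.A, 0.5),
            B = delta.qda(c(1, 1), c(0, 0), Sigma.B, 0.5))
names(which.max(scores))  # "A": the point is well within the tighter class
```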
### R
```R
library(MASS)  # qda() is also in the MASS package
qda.fit <- qda(response ~ predictors, data=data)
```
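As with LDA, a quick worked run on the built-in `iris` data (predictors chosen arbitrarily):
```R
library(MASS)
fit <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)$class  # predicted labels
mean(pred == iris$Species)        # training accuracy
```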
## naive Bayes
Naive Bayes differs from [[linear discriminant analysis]] and [[quadratic discriminant analysis]] in that it makes no assumption about the form of the class-conditional density $f_k$; instead, it assumes that within class $k$ all $p$ predictors are independent.
The density therefore factors into a product of one-dimensional densities, one per predictor:
$f_k(x) = \prod_{j=1}^p f_{kj}(x_j)$
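If each one-dimensional density is taken to be Gaussian, the product can be written out by hand with `dnorm`; the per-predictor means and standard deviations below are toy values:
```R
# Gaussian naive Bayes by hand: multiply per-predictor densities within a class
x0 <- c(2.0, 1.1)                                    # new observation, p = 2
f.A <- prod(dnorm(x0, mean = c(1, 1), sd = c(1, 1)))  # class A density at x0
f.B <- prod(dnorm(x0, mean = c(3, 3), sd = c(1, 1)))  # class B density at x0
post <- c(A = 0.5 * f.A, B = 0.5 * f.B)               # posterior ~ prior * density
names(which.max(post))  # "A": x0 is far from class B's means
```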
### R
Install the package `e1071` for the `naiveBayes` classifier.
```R
install.packages("e1071")
library(e1071)
nb.fit <- naiveBayes(response ~ predictors, data=data)
```
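A minimal worked run on the built-in `iris` data:
```R
library(e1071)
fit <- naiveBayes(Species ~ ., data = iris)
pred <- predict(fit, iris)   # predicted class labels
mean(pred == iris$Species)   # training accuracy
```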