Classification is used to predict categories. Classification can be either binary (Yes/No) or multi-class ("Cat", "Dog", "Platypus"). Classification models include [[logistic regression]], [[support vector machine]], [[decision tree]], and [[neural network]].

## decision tree

A decision tree is a useful [[classification]] approach that provides an interpretable output. Multiple decision trees are combined in the [[ensemble]] method [[random forests]].

Decision tree induction is a top-down, recursive, [[divide-and-conquer]] [[greedy algorithm]]. At each split, choose the attribute whose split provides the most [[information]]. Common splitting criteria include information gain, information gain ratio, and the Gini Index, which is used by CART.

### information gain

Information gain is the difference between the information (entropy) of the class labels and the expected information remaining after splitting on the considered attribute. At each split, select the attribute or feature that yields the largest information gain, where information is defined as

$I(D) = - \sum_i p_i \log_2(p_i)$

This is equivalent to asking "Which attribute, if I branch on it, reduces my uncertainty about the target label the most?"

### information gain ratio

Gain ratio divides information gain by the split information (the entropy of the subset sizes), penalizing attributes that carve the data into many small subsets. CART instead uses the Gini Index and always creates binary branches.
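The entropy formula and the information-gain criterion above can be sketched in Python. This is a minimal illustration, not a library API; the function names and the dict-per-row data layout are my own choices:

```python
import math
from collections import Counter

def entropy(labels):
    """I(D) = -sum p_i * log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the size-weighted entropy of the
    subsets produced by branching on `attribute`."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    expected = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - expected

# A perfectly predictive attribute recovers all the label entropy:
rows = [{"x": 0}, {"x": 0}, {"x": 1}, {"x": 1}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "x"))  # → 1.0
```

Greedy induction would call `information_gain` for every candidate attribute at each node and branch on the argmax, then recurse on each subset.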