# Decision Trees

- Split the input space into axis-aligned rectangles (boundaries parallel to the axes).
- We use sequential binary decisions which can be summarized in a tree structure.
- Used for both classification and regression.

## Common splitting criteria

1. Entropy (used for information gain): $\text{Entropy}(t) = -\sum_{i=0}^{c-1} p(i) \log_2 p(i)$
2. Gini index: $\text{Gini}(t) = 1 - \sum_{i=0}^{c-1} [p(i)]^2$
3. Misclassification rate: $\text{Error}(t) = 1 - \max_i p(i)$

where $p(i)$ is the fraction of samples of class $i$ at node $t$ and $c$ is the number of classes (a small sketch of these criteria is given at the end of this note).

![[decision_tree_criteria.png]]

- The tree is built recursively by greedily minimizing the impurity (for classification) or the squared error (for regression): at each step, we add the feature split that reduces the loss the most.
- Stopping criteria can be, for example, a minimum number of data points per region, a maximum depth/height, or the decrease in loss falling below a certain threshold.
- Pruning: penalize trees with a large number of leaves to prevent unnecessary overfitting.
- Random forests: by combining bootstrapping and feature bagging, we ensure that the individual trees are built from different samples and features. The trees are therefore less correlated, which usually improves accuracy (see the second sketch at the end of this note).

## Advantages

- Interpretable
- Combining with boosting strongly increases performance

## Disadvantages

- Large trees easily overfit, while small trees underfit (this can be mitigated by growing a large tree and then sequentially removing the nodes that reduce the error the least, i.e. pruning).
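
## Sketches

A minimal sketch (not part of the original notes) of the three splitting criteria above, assuming `p` holds the class proportions at a node:

```python
import numpy as np

def entropy(p):
    """Entropy of a node: -sum_i p(i) * log2 p(i); terms with p(i) = 0 are dropped."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini index of a node: 1 - sum_i p(i)^2."""
    return 1.0 - np.sum(p ** 2)

def misclassification_rate(p):
    """Misclassification rate of a node: 1 - max_i p(i)."""
    return 1.0 - np.max(p)

# Example: a node containing 70% / 20% / 10% of three classes
p = np.array([0.7, 0.2, 0.1])
print(entropy(p), gini(p), misclassification_rate(p))
```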
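For concreteness, a second sketch using scikit-learn (the library choice and the hyperparameter values are assumptions, not from the notes): a single tree with stopping criteria and a leaf-count penalty via cost-complexity pruning (`ccp_alpha`), and a random forest that combines bootstrapping with feature bagging (`max_features`).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single tree: depth / leaf-size limits act as stopping criteria, and
# ccp_alpha penalizes trees with many leaves (pruning).
tree = DecisionTreeClassifier(criterion="gini", max_depth=5,
                              min_samples_leaf=5, ccp_alpha=0.01)
tree.fit(X_train, y_train)

# Random forest: bootstrapping (bootstrap=True) plus feature bagging
# (max_features="sqrt") decorrelates the individual trees.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print("tree  :", tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```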