# Decision Trees
- Splits the input space into axis-aligned rectangles (region boundaries are parallel to the coordinate axes)
- We use sequential binary decisions which can be summarized in a tree structure.
- Used for classification and regression
## Common splitting criteria
1. Entropy (used for Information Gain): $H(t)=-\sum_{i=0}^{c-1} p(i) \log _2 p(i)$
2. Gini index: $\text{Gini}(t)=1-\sum_{i=0}^{c-1}[p(i)]^2$
3. Misclassification Rate: $E(t)=1-\max _i[p(i)]$
![[decision_tree_criteria.png]]
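The sketch below computes the three criteria above for the class distribution at a single node; it is only an illustration and the function name is made up:

```python
import numpy as np

def node_impurities(labels):
    """Entropy, Gini index, and misclassification rate
    of the class distribution at one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()             # class proportions p(i)
    entropy = -np.sum(p * np.log2(p))     # - sum_i p(i) log2 p(i)
    gini = 1.0 - np.sum(p ** 2)           # 1 - sum_i p(i)^2
    misclass = 1.0 - p.max()              # 1 - max_i p(i)
    return entropy, gini, misclass

# Example: a node with 4 samples of class 0 and 2 of class 1
print(node_impurities(np.array([0, 0, 0, 0, 1, 1])))
```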
- The tree is built recursively and greedily by minimizing the chosen criterion (e.g., the squared error for regression). At each step, we add the split (feature and threshold) that reduces the error the most (see the split-search sketch after this list)
- Stopping criteria include, for example, a minimum number of data points per region, a maximum depth/height, or the decrease in loss falling below a certain threshold
- Pruning: penalize trees with a large number of leaves (cost-complexity pruning) to prevent unnecessary overfitting
- Random forests: by combining bootstrapping and feature bagging, we ensure that the individual trees are built from different data samples and feature subsets. The trees are therefore less correlated, which typically improves accuracy (a minimal sketch follows below)
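As a concrete illustration of the greedy building step for regression, the sketch below searches all features and thresholds for the single split that most reduces the squared error. All names are illustrative and the stopping criteria are reduced to a minimum leaf size:

```python
import numpy as np

def best_split(X, y, min_samples_leaf=5):
    """Greedy search over all features and thresholds for the split
    that minimizes the total squared error of the two child nodes."""
    n, d = X.shape
    parent_sse = np.sum((y - y.mean()) ** 2)
    best = None  # (error_reduction, feature_index, threshold)
    for j in range(d):
        for t in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue                           # stopping criterion
            sse = (np.sum((left - left.mean()) ** 2)
                   + np.sum((right - right.mean()) ** 2))
            reduction = parent_sse - sse
            if best is None or reduction > best[0]:
                best = (reduction, j, t)
    return best  # None means no admissible split -> make this node a leaf
```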
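A minimal scikit-learn sketch of the random-forest idea, assuming the standard `RandomForestClassifier` API: `bootstrap=True` resamples the training data per tree, and `max_features` draws a random feature subset at every split, which decorrelates the trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bootstrapping (bagging) plus per-split feature subsampling
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```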
## Advantages
- Interpretable
- Combining with boosting strongly increases performance
## Disadvantages
- Large trees easily overfit while small trees underfit; this can be mitigated by growing a large tree and then pruning it, i.e. sequentially removing the nodes that reduce the error the least (see the pruning sketch below)
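The grow-then-prune strategy above corresponds to cost-complexity pruning, i.e. minimizing $\text{Loss}(T) + \alpha |T|$ where $|T|$ is the number of leaves. A minimal sketch using scikit-learn's `ccp_alpha` / `cost_complexity_pruning_path` support; dataset choice and step size are arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate penalties alpha from the pruning path of a fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha => stronger penalty on the number of leaves => smaller tree
for alpha in path.ccp_alphas[::10]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test acc={tree.score(X_test, y_test):.3f}")
```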