# Ensemble Methods
When does it work:
- When the errors of the individual estimators are (at least partially) independent, so they can cancel out when the predictions are combined (see the simulation sketch after this list).
- When differently trained models converge to different local minima, which makes their predictions diverse.
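As a quick illustration of the independence point, here is a minimal simulation sketch (not from the original notes; the noise model and constants are illustrative assumptions). It compares the error of a single noisy predictor with the error of the average of ten independent noisy predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n_estimators, n_trials = 10, 10_000

# Each estimator's prediction = true value + independent Gaussian noise.
predictions = true_value + rng.normal(scale=1.0, size=(n_trials, n_estimators))

single_mse = np.mean((predictions[:, 0] - true_value) ** 2)
ensemble_mse = np.mean((predictions.mean(axis=1) - true_value) ** 2)

print(f"single estimator MSE: {single_mse:.3f}")    # ~1.0
print(f"10-estimator avg MSE: {ensemble_mse:.3f}")  # ~0.1, i.e. variance / n
```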
## Bagging
- Bagging (Bootstrap Aggregating) is a technique that trains independent estimators on different bootstrap samples of the original data set and averages (or votes across) their predictions.
- Bagging assumes that if the individual predictors make independent errors, a majority vote (or average) of their outputs will be better than any individual prediction.
- Since multiple model predictions are averaged to form the final prediction, bagging reduces variance and helps avoid overfitting.
- Bagging is most helpful when the base models overfit (high variance).
- Random Forests are a popular bagging model (a minimal sketch follows this list).
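A minimal bagging sketch using scikit-learn. The toy dataset and hyperparameters are illustrative assumptions, not from the notes; `BaggingClassifier` uses a decision tree as its default base estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each tree (default base estimator) sees a different bootstrap
# sample of the training data; predictions are combined by voting.
bagging = BaggingClassifier(
    n_estimators=100,
    bootstrap=True,
    random_state=0,
).fit(X_train, y_train)

# Random Forest: bagging of decision trees plus random feature subsets per split.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("bagged trees accuracy:", bagging.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```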
## Boosting
- In boosting, models are trained sequentially: each new model is fitted to correct the mistakes (e.g., the residual errors) of the models trained so far, and the final prediction is a combination of all of them.
- Boosting is most helpful when the base models underfit (high bias).
- Popular boosting libraries include XGBoost (Extreme Gradient Boosting) and LightGBM; a minimal boosting sketch follows this list.
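The notes name XGBoost and LightGBM; to keep the example dependency-free, here is a minimal sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in. The dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees are weak (high-bias) learners on their own; each new tree
# fits the residual errors of the current ensemble, so bias drops as
# n_estimators grows.
booster = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=2,   # deliberately simple base learner
    random_state=0,
).fit(X_train, y_train)

print("boosted trees accuracy:", booster.score(X_test, y_test))
```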
## Bagging vs Boosting
- Bagging is a variance-reduction scheme, while boosting reduces bias.
- With bagging, we start with complex models (low bias, high variance) and remove variance by averaging.
- With boosting, we use simpler models with low variance (but high bias) and aim to decrease the bias.
## Stacking (Mixture of Experts)
- Model stacking is a method for combining models to reduce their biases. The predictions of each individual model are stacked together and used as input to a final estimator that computes the final prediction. This final estimator is trained via cross-validation (see the sketch below).
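A minimal stacking sketch with scikit-learn's `StackingClassifier`. The choice of base estimators, final estimator, and dataset is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", LinearSVC(random_state=0)),
    ],
    # Out-of-fold predictions of the base models (cv=5) become the input
    # features of the final estimator, which learns how to combine them.
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X_train, y_train)

print("stacked model accuracy:", stack.score(X_test, y_test))
```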
## Snapshot Ensembling
- Model snapshots are saved to disk whenever an improved validation metric is observed, and the top-k snapshots are used to average the predictions (see the sketch below).
- Variations such as changing the random seed after each new snapshot is saved, or using cyclic learning-rate schedulers, are used to improve the diversity of the snapshots.
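A minimal sketch of averaging predictions from the top-k saved snapshots. It assumes each snapshot's class probabilities have already been computed and stored as `.npy` files; the file paths and the value of k are hypothetical.

```python
import numpy as np

top_k_paths = [
    "snapshots/epoch_17_probs.npy",  # hypothetical files, one per snapshot
    "snapshots/epoch_23_probs.npy",
    "snapshots/epoch_29_probs.npy",
]

# Each file holds an array of shape (n_samples, n_classes).
probs_per_snapshot = [np.load(path) for path in top_k_paths]

# Average the class probabilities across snapshots, then take the argmax.
ensemble_probs = np.mean(probs_per_snapshot, axis=0)
ensemble_pred = ensemble_probs.argmax(axis=1)
```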