# Cross Validation
Cross validation is a (frequentist) technique in which the available data is split into $K$ folds. In each run, $K-1$ folds are used for training and the remaining fold is used for validation. This is repeated $K$ times so that each fold serves as the validation set exactly once, and the cross-validation performance is the arithmetic mean of the $K$ performance estimates from the validation folds.
Equivalently, the prediction error is estimated by averaging the per-sample losses over all $K$ runs:
$CV(\hat{y})=\frac{1}{N} \sum_{i=1}^{N} L\left(\hat{y}^{-\kappa(i)}\left(\boldsymbol{x}_{i}\right), t_{i}\right)$
where $\kappa(i)$ denotes the fold that sample $i$ belongs to, and $\hat{y}^{-\kappa(i)}$ is the model trained with that fold held out.
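A minimal sketch of this procedure, assuming scikit-learn, a synthetic regression task, and squared error as the loss $L$ (all of these are illustrative choices, not part of the original notes):

```python
# Minimal K-fold cross-validation sketch (assumed setup: scikit-learn,
# synthetic regression data, squared-error loss for L).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

X, t = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

K = 5
kf = KFold(n_splits=K, shuffle=True, random_state=0)
squared_errors = np.empty(len(t))

for train_idx, val_idx in kf.split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], t[train_idx])  # train on K-1 folds
    preds = model.predict(X[val_idx])                         # predict on the held-out fold
    squared_errors[val_idx] = (preds - t[val_idx]) ** 2       # L(y_hat^{-kappa(i)}(x_i), t_i)

cv_error = squared_errors.mean()   # CV(y_hat) = (1/N) * sum of per-sample losses
print(f"{K}-fold CV estimate of the squared error: {cv_error:.3f}")
```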
Cross validation is used for two main tasks:
1. To select optimal hyperparameters.
2. To estimate model performance.
A drawback of cross validation is its computational cost: the number of training runs scales with both the number of folds and the number of hyperparameter configurations, so it is most practical for small to medium-sized datasets.
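To illustrate task 1 and the cost just noted, a sketch of hyperparameter selection with scikit-learn's `GridSearchCV` (an assumed tool here, with a hypothetical Ridge model and grid); the total number of fits is (number of configurations) × (number of folds):

```python
# Hyperparameter selection via cross-validation (assumed: scikit-learn, Ridge).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, t = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}       # 4 candidate configurations
search = GridSearchCV(Ridge(), param_grid, cv=5,     # 4 configurations x 5 folds = 20 fits
                      scoring="neg_mean_squared_error")
search.fit(X, t)

print("best alpha:", search.best_params_["alpha"])
print("CV error of best alpha:", -search.best_score_)
```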
## Stratification
Stratification in cross validation ensures that each fold has roughly the same class distribution as the overall dataset.
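A sketch of stratified splitting with scikit-learn's `StratifiedKFold` (assumed library) on a synthetic imbalanced classification task; each fold preserves the roughly 9:1 class ratio:

```python
# Stratified K-fold: each fold keeps the overall class proportions
# (assumed setup: scikit-learn, an imbalanced binary classification task).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (_, val_idx) in enumerate(skf.split(X, y)):
    positive_rate = y[val_idx].mean()
    print(f"fold {fold}: fraction of positive class = {positive_rate:.2f}")
```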
## Nested cross validation
Split the dataset into $M$ outer folds and hold out fold $m$ for testing.
Split the remaining $M-1$ folds into $k$ inner folds and use inner cross validation to find the best hyperparameters. Then retrain the model on all $M-1$ folds with the best hyperparameters and test it on the unseen fold $m$. Repeat so that each outer fold is used for testing once.
![[nested cross validation.jpg]]
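A sketch of nested cross validation under the same assumptions as above (scikit-learn, a Ridge model, a hypothetical hyperparameter grid): the outer loop estimates performance on unseen folds, while an inner `GridSearchCV` selects hyperparameters using only the $M-1$ training folds.

```python
# Nested cross-validation sketch (assumed: scikit-learn, Ridge regression).
# Outer loop: performance estimate on unseen folds. Inner loop: hyperparameter search.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold

X, t = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

outer = KFold(n_splits=5, shuffle=True, random_state=0)   # M outer folds
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

outer_errors = []
for train_idx, test_idx in outer.split(X):
    # Inner CV on the M-1 training folds selects the best hyperparameters,
    # then (by default) refits the best model on all training folds.
    inner = GridSearchCV(Ridge(), param_grid, cv=3,
                         scoring="neg_mean_squared_error")
    inner.fit(X[train_idx], t[train_idx])
    preds = inner.predict(X[test_idx])                     # evaluate on the unseen outer fold m
    outer_errors.append(mean_squared_error(t[test_idx], preds))

print(f"nested CV error estimate: {np.mean(outer_errors):.3f}")
```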
---