Tags: #notesonai #mnemonic
Topics: [[Deep Learning]]
ID: 20230107190956
---
# Babysitting Deep Neural Networks
## Before Training
1. Preprocess the data so it is zero-centered (and typically scaled to unit variance)
2. Take care of [[Weight Initialization]]
3. Use [[Normalization]]
4. Prefer residual connections; they make a big difference. (A sketch of these steps follows the list.)
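
A minimal PyTorch sketch of these steps, assuming a toy batch and layer sizes; all names and dimensions here are placeholders, not from any particular codebase:

```python
import torch
import torch.nn as nn

# 1. Zero-center (and scale) inputs using statistics from the training set.
x = torch.randn(64, 128)  # placeholder batch
x = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

# 2. Explicit weight initialization (Kaiming, suited to ReLU nets).
linear = nn.Linear(128, 128)
nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
nn.init.zeros_(linear.bias)

# 3./4. Normalization plus a residual (skip) connection around the block.
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        # Pre-norm residual: output = input + f(norm(input))
        return x + self.fc(self.norm(x)).relu()

out = ResidualBlock(128)(x)
```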
## Common Implementation Mistakes
1. Forgetting to toggle train/eval mode for the net; dropout and batch norm behave differently in each. Be careful with `is_training`-style flags.
2. Passing softmaxed outputs to a loss function that expects raw logits. Check whether the loss function applies (log-)softmax internally, e.g. PyTorch's `CrossEntropyLoss` does.
3. Forgetting `bias=False` for a Linear/Conv2d layer that feeds into BatchNorm (the norm's learned shift makes the bias redundant), or conversely dropping the bias on the output layer, which has no norm after it.
4. Forgetting to specify the reduction dimension/axis when summing or averaging. (Mistakes 1-4 are illustrated in the sketch after this list.)
5. Forgetting to reset the computation graph between tests (if applicable, e.g. `tf.reset_default_graph()` in TensorFlow 1.x).
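
A compact PyTorch sketch illustrating mistakes 1-4; the model, shapes, and data are invented placeholders:

```python
import torch
import torch.nn as nn

# Mistake 3: no bias on the conv feeding into BatchNorm (BN's shift replaces it);
# the final output layer keeps its bias (the default).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),  # output layer: bias stays on
)

x = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

# Mistake 1: toggle modes explicitly; BN and dropout behave differently in each.
model.train()                       # training: BN uses batch stats, dropout active
logits = model(x)

# Mistake 2: CrossEntropyLoss applies log-softmax internally, so pass raw logits.
loss = nn.CrossEntropyLoss()(logits, labels)

# Mistake 4: be explicit about the reduction axis.
probs = logits.softmax(dim=1)       # over classes, not over the batch
per_example_sum = probs.sum(dim=1)  # sums to 1 per example

model.eval()                        # evaluation: running stats, dropout off
with torch.no_grad():
    eval_logits = model(x)
```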
## Sanity Checks
1. Try to overfit a single batch first. With regularization turned off, you should be able to drive the loss to (near) zero; turning regularization back on should make the loss higher.
2. Make sure all the trainable variables are actually being updated between steps. (Both checks are sketched below.)
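
A minimal sketch of both checks, assuming a toy model and random placeholder data:

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One fixed batch (placeholder data).
x = torch.randn(16, 10)
y = torch.randint(0, 2, (16,))

before = copy.deepcopy(model.state_dict())

# Check 1: overfit this single batch; loss should drive toward ~0.
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")  # expect near zero

# Check 2: every trainable parameter should have moved from its initial value.
for name, p in model.named_parameters():
    assert not torch.equal(p, before[name]), f"{name} was never updated"
```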
---
## References
1. Karpathy's Twitter thread on common mistakes https://twitter.com/karpathy/status/1013244313327681536
2. A Recipe for Training Neural Networks by Karpathy http://karpathy.github.io/2019/04/25/recipe/
3. Unit testing in machine learning https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765