Tags: #notesonai #mnemonic
Topics: [[Deep Learning]]
ID: 20230107190956

---

# Babysitting Deep Neural Networks

## Before training

1. Preprocess the data so it is centered (zero mean).
2. Take care of [[Weight Initialization]].
3. Use [[Normalization]].
4. Prefer residual connections; they make a big difference.

## Common Implementation Mistakes

1. Forgetting to toggle train/eval mode for the net (dropout, batch norm). Be careful with `is_training` parameters (sketch below).
2. Passing softmax outputs to a loss function that expects logits. Check whether the loss function applies softmax itself (sketch below).
3. Not using `bias=False` for a Linear/Conv2d layer that is followed by BatchNorm, or conversely forgetting to keep the bias for the output layer (sketch below).
4. Forgetting to specify the reduction dimension/axis when doing a sum or average (sketch below).
5. Forgetting to reset the graph between each test (if applicable).

## Sanity Checks

1. Try to overfit a single batch first. Turn off regularization and drive the loss to (near) zero; then turn regularization back on and check that the loss is higher (sketch below).
2. Make sure all the trainable variables are actually getting trained (sketch below).

---

## References

1. Karpathy's Twitter thread on common mistakes: https://twitter.com/karpathy/status/1013244313327681536
2. A Recipe for Training Neural Networks (Karpathy): http://karpathy.github.io/2019/04/25/recipe/
3. Unit testing in machine learning: https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
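---

## Example Sketches

A minimal PyTorch sketch of the train/eval toggle from the mistakes list above; the model here is a made-up placeholder, not from any particular codebase:

```python
import torch
import torch.nn as nn

# Placeholder model with dropout; dropout behaves differently in train vs. eval mode.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 10))

model.train()   # enable dropout (and batch-norm batch statistics) for training
# ... training loop ...

model.eval()    # disable dropout, use running batch-norm statistics for evaluation
with torch.no_grad():
    preds = model(torch.randn(8, 32))
```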
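For the softmax-vs-logits mistake, a sketch using `nn.CrossEntropyLoss`, which applies log-softmax internally and therefore expects raw logits:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 10)            # raw model outputs, no softmax applied
targets = torch.randint(0, 10, (8,))

criterion = nn.CrossEntropyLoss()       # applies log-softmax internally: pass logits

loss_correct = criterion(logits, targets)

# Common mistake: applying softmax first. The call still runs, but the gradients
# correspond to a double softmax, which typically shows up as slow or stalled training.
probs = torch.softmax(logits, dim=1)
loss_wrong = criterion(probs, targets)
```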
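For the bias/BatchNorm point, a sketch of where the bias can be dropped and where it should stay (layer sizes are arbitrary):

```python
import torch.nn as nn

# BatchNorm's learned shift (beta) makes the preceding layer's bias redundant,
# so it can be turned off; the output layer has no norm after it, so keep its bias.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
head = nn.Linear(64, 10, bias=True)   # output layer: keep the bias
```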
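A small sketch of the reduction-dimension pitfall: forgetting `dim` reduces over every axis at once.

```python
import torch

x = torch.randn(8, 10)        # (batch, classes)
per_example = x.mean(dim=1)   # shape (8,): average over classes for each example
everything  = x.mean()        # scalar: averages over ALL dims, often not what was intended
```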
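A sketch of the overfit-a-single-batch check; the model, learning rate, and step count are arbitrary placeholders:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One fixed batch, reused every step.
x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

model.train()
for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())   # should be close to 0 with regularization turned off
```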
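One way (an assumption, not the only approach) to check that every trainable parameter is actually receiving gradients, again with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

loss = criterion(model(torch.randn(16, 32)), torch.randint(0, 10, (16,)))
loss.backward()

# Every parameter with requires_grad=True should now have a non-trivial gradient.
for name, p in model.named_parameters():
    if p.requires_grad and (p.grad is None or p.grad.abs().sum() == 0):
        print(f"suspicious: {name} received no gradient")
```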