# The Inception Net
After [[AlexNet]] and [[VGGNet]], researchers moved on to the remaining challenges in object recognition with deep nets:
- Salient parts have great variation in size. Hence, the receptive fields should vary in size accordingly
- Intuitively, deeper models are preferred, but very deep nets are prone to overfitting
## Inception Module
Multiple kernel filters of different sizes (1x1, 3x3, 5x5) in the same layer!
- Naive version: apply every filter to the full input and concatenate the outputs
- Very expensive!
Solution: add intermediate 1x1 convolutions that compress the channel dimension before the costly 3x3 and 5x5 convolutions
![[inception.jpg]]
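The saving from the 1x1 bottleneck can be sketched with a quick parameter count. The channel sizes below are illustrative, not GoogLeNet's actual ones:

```python
# Weight count (biases ignored) for one Inception branch, with and
# without a 1x1 bottleneck in front of the expensive 5x5 convolution.

def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution mapping c_in -> c_out channels."""
    return c_in * c_out * k * k

c_in, c_out = 192, 32   # example input/output channels (assumed values)
c_mid = 16              # 1x1 bottleneck width (assumed value)

naive = conv_params(c_in, c_out, 5)                                  # direct 5x5
bottleneck = conv_params(c_in, c_mid, 1) + conv_params(c_mid, c_out, 5)

print(naive)       # 153600
print(bottleneck)  # 15872
```

Roughly a 10x reduction in this branch, at the cost of an intermediate compression that in practice loses little accuracy.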
Although driven by good intuitions, trial and error was also a big part.
## Architecture
- 9 Inception Modules
- 22 layers deep (27 with the pooling layers)
- Global average pooling at the end of the last Inception Module
- Because of the increased depth -> Vanishing gradients
- Inception solution to vanishing gradients: intermediate (auxiliary) classifiers attached partway through the net, injecting extra gradient during training; removed after training
![[googlenet.jpg]]
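Global average pooling also keeps the classifier cheap. A rough comparison, assuming a GoogLeNet-like final feature map of 7x7x1024 and 1000 classes:

```python
# Classifier parameters: flatten the final feature map into a dense layer
# vs. global-average-pool to one value per channel first.

h, w, c = 7, 7, 1024   # spatial size and channels of last feature map (assumed)
classes = 1000

fc_after_flatten = h * w * c * classes  # dense layer on the flattened map
fc_after_gap = c * classes              # dense layer after global average pooling

print(fc_after_flatten)  # 50176000
print(fc_after_gap)      # 1024000
```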
## Inception v2, v3, v4, ...
Factorize 5x5 into two stacked 3x3 filters
Factorize nxn into an nx1 followed by a 1xn filter (quite a lot cheaper)
Make nets wider
RMSprop, BatchNorms, ...
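The factorization savings above come down to simple weight counts per filter position:

```python
# Weights per filter (per input/output channel pair) for the
# Inception-v2/v3 factorizations.

def full(n):
    """Single n x n filter."""
    return n * n

def two_3x3():
    """5x5 receptive field built from two stacked 3x3 filters."""
    return 2 * 3 * 3

def asymmetric(n):
    """n x n receptive field built from an n x 1 followed by a 1 x n filter."""
    return 2 * n

print(full(5), two_3x3())      # 25 18
print(full(7), asymmetric(7))  # 49 14
```

The asymmetric split scales linearly instead of quadratically in n, which is why it is "quite a lot cheaper" for larger filters.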
---
## References
1. Lecture 5.4, UvA DL course 2020