# The Inception Net

After [[AlexNet]] and [[VGGNet]], researchers moved on to the remaining challenges in object recognition with deep nets:

- Salient parts vary greatly in size, so the receptive fields should vary in size accordingly
- Intuitively, deeper models are preferred, but very deep nets are prone to overfitting

## Inception Module

Multiple kernel filters of different sizes (1x1, 3x3, 5x5) in the same layer!

- Naive version: very expensive!
- Solution: add intermediate 1x1 convolutions that compress the channel dimension before the larger filters (see the sketch at the end of this note)

![[inception.jpg]]

Although driven by good intuitions, trial and error was also a big part of the design.

## Architecture

- 9 Inception modules
- 22 layers deep (27 with the pooling layers)
- Global average pooling after the last Inception module, instead of large fully connected layers
- Because of the increased depth -> vanishing gradients
- Inception's solution to vanishing gradients: intermediate (auxiliary) classifiers, which were removed after training

![[googlenet.jpg]]

## Inceptions v2, v3, v4, ...

- Factorize 5x5 into two 3x3 filters
- Factorize nxn into 1xn and nx1 filters (quite a lot cheaper)
- Make the nets wider
- RMSProp, BatchNorm, ...
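To make the module concrete, here is a minimal PyTorch sketch of the compressed version: 1x1 "bottleneck" convolutions before the 3x3 and 5x5 branches, plus a pooling branch with a 1x1 projection. The example channel split follows GoogLeNet's inception (3a) block; the class and argument names are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 compression, then 3x3
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Branch 3: 1x1 compression, then 5x5
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True),
        )
        # Branch 4: 3x3 max-pool, then 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # All branches preserve the spatial size, so outputs concatenate on channels
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Channel split of GoogLeNet's inception (3a): 64 + 128 + 32 + 32 = 256 out channels
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```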
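The auxiliary classifiers can likewise be sketched as small heads that tap an intermediate feature map and contribute a down-weighted loss during training only (GoogLeNet used a weight of 0.3); the pooling and channel sizes below only loosely follow GoogLeNet's heads and should be read as assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxHead(nn.Module):
    """Intermediate classifier: injects gradients into earlier layers
    during training, discarded at test time."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)   # shrink the feature map to 4x4
        self.conv = nn.Conv2d(in_ch, 128, 1)  # 1x1 compression
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(self.pool(x)))
        x = F.relu(self.fc1(torch.flatten(x, 1)))
        return self.fc2(F.dropout(x, 0.7, self.training))

# Training-time objective (schematic):
#   loss = main_loss + 0.3 * (aux1_loss + aux2_loss)
# At inference, only the main classifier's output is used.
```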
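Finally, the cost argument behind the v2/v3 factorizations is plain parameter counting: a 5x5 kernel has 25 weights per in/out channel pair, while two stacked 3x3 kernels (same 5x5 receptive field) have 18; an nxn kernel has n^2 weights, while the 1xn + nx1 pair has only 2n. A small sketch verifying this, with an arbitrary channel count of 64:

```python
import torch.nn as nn

c = 64  # channels, arbitrary choice for illustration

# 5x5 vs. two stacked 3x3 (same effective receptive field)
conv5 = nn.Conv2d(c, c, 5, padding=2)
conv33 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                       nn.Conv2d(c, c, 3, padding=1))

# nxn vs. 1xn followed by nx1, here n = 7: 49 vs. 14 weights per channel pair
conv7 = nn.Conv2d(c, c, 7, padding=3)
conv_fact = nn.Sequential(nn.Conv2d(c, c, (1, 7), padding=(0, 3)),
                          nn.Conv2d(c, c, (7, 1), padding=(3, 0)))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(conv5), n_params(conv33))     # 102464 vs. 73856 (~28% fewer)
print(n_params(conv7), n_params(conv_fact))  # 200768 vs. 57472 (~71% fewer)
```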