# VGGNet

7.3% top-5 error rate on ImageNet, compared to 18.2% for [[AlexNet]]. Although the network is deeper, the number of weights does not explode.

![[vggnet.jpg]]
Credit: Arden Dertat

## Notable properties

- All convolution filters are of size 3x3.
- Cascades of sequential convolutions.
- Convolutions with stride 1 (with padding) preserve the spatial resolution but change the depth (number of channels) of the volume.
- 2x2 max pooling with stride 2 downsamples the spatial resolution but preserves the depth of the volume.

### Effective receptive field

The number of input pixels that contribute to an activation in the $l$-th layer.

- Not just the ones from the previous layer, but also those from all the layers before it.

A large filter can be replaced by a deeper stack of successive smaller filters.

- Two stacked 3x3 filters have the receptive field of one 5x5.
- Three stacked 3x3 filters have the receptive field of one 7x7.

Depth increases the effective receptive field.

- Every "pixel" in the 2nd layer corresponds to a 3x3 region in the previous one.

![[receptive-fields.jpg]]

### Why 3x3 filters?

They are the smallest filters that can capture the notions of up, down, left and right.

A deeper stack of smaller filters is likely more powerful than a single large filter.

- More nonlinearities for the same "size" of pattern learning: three stacked 3x3 layers apply three nonlinearities where a single 7x7 layer applies one.
- Fewer parameters, which also acts as regularization.

---

## References

1. Lecture 5.1, UvA DL course 2020
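
## Example: receptive field and parameter count

The claims in the notes above about stacked 3x3 filters (same receptive field as a 5x5 or 7x7, with fewer parameters) can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, biases are ignored, every layer is assumed to have the same number of input and output channels, and the channel width of 64 is an arbitrary choice.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Side length (in pixels) of the receptive field of stacked stride-1 convolutions."""
    # Each stride-1 layer widens the receptive field by (kernel - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) of kernel x kernel convolutions that map
    `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # assumed channel width, for illustration only
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = conv_weights(3, channels, num_layers=n)
    single = conv_weights(rf, channels)
    print(f"{n} x 3x3: receptive field {rf}x{rf}, "
          f"{stacked} weights vs {single} for one {rf}x{rf}")
```

Under these assumptions, three 3x3 layers need 27C² weights against 49C² for a single 7x7 layer with C channels, which is where the "fewer parameters" point comes from.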