# VGGNet
7.3% top-5 error rate on ImageNet, compared to 18.2% for [[AlexNet]]. Although the network is deeper, the number of weights does not explode.
![[vggnet.jpg]]
Credit: Arden Dertat
## Notable properties
- All convolution filters are of size 3x3
- Cascades of sequential convolutions.
- Convolutions with stride 1 (and suitable padding) preserve spatial resolution but can change the depth (number of channels) of the volume.
- 2x2 max pooling with stride 2 downsamples the spatial resolution but preserves the depth.
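The shape bookkeeping behind the last two properties can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names are hypothetical, and the padding of 1, the 224x224x3 input and the 64 output channels are assumptions chosen for the example rather than values stated in these notes.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (padding=1 is an assumption here)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


def trace_block(height, width, channels, out_channels, num_convs):
    """Follow (height, width, channels) through num_convs 3x3 convs and one 2x2 max pool."""
    for _ in range(num_convs):
        # Stride-1 conv: spatial size is preserved, depth becomes out_channels.
        height = conv_output_size(height)
        width = conv_output_size(width)
        channels = out_channels
    # 2x2 max pool with stride 2: spatial size is halved, depth is untouched.
    height = pool_output_size(height)
    width = pool_output_size(width)
    return height, width, channels


# Illustrative input: a 224x224 RGB image through two convs with 64 filters each.
shape = trace_block(224, 224, 3, out_channels=64, num_convs=2)
print(shape)
```

Only the convolutions change the depth and only the pooling changes the spatial size, which is why the two kinds of layer can be reasoned about separately.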
### Effective receptive field
The number of input pixels that contribute to an activation in the $l$-th layer
- Not just the ones from the previous layers, but the others before that too
A large filter can be replaced by a deeper stack of successive smaller filters
- Two stacked 3x3 filters have the receptive field of one 5x5
- Three stacked 3x3 filters have the receptive field of one 7x7
Depth increases effective receptive field
- Every "pixel" in the 2nd layer corresponds to a 3x3 region in the previous one
![[receptive-fields.jpg]]
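The 5x5 and 7x7 equivalences can be checked with the standard receptive-field recurrence: each layer adds `(kernel - 1)` times the cumulative stride of the layers before it. The function below is a small sketch of that recurrence; its name and the `(kernel, stride)` layer encoding are hypothetical choices made for this example.

```python
def receptive_field(layers):
    """Effective receptive field, in input pixels, of a stack of layers.

    layers is a list of (kernel, stride) pairs, ordered from input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring activations
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


two_convs = receptive_field([(3, 1), (3, 1)])
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
# A 2x2 pool with stride 2 doubles how much each later conv adds.
with_pool = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)])
print(two_convs, three_convs, with_pool)
```

With stride 1 everywhere, each extra 3x3 layer grows the receptive field by 2 pixels per side length, so depth alone is enough to cover larger patterns.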
### Why 3x3 filters?
They are the smallest filters that can capture the notions of up, down, left and right
A deeper stack of smaller filters is likely more powerful than a single large filter
- Three nonlinearities instead of one for the same "size" of learned pattern (three 3x3 layers vs. one 7x7)
- Fewer parameters, which also acts as regularization
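The parameter saving is simple arithmetic: with $C$ input and $C$ output channels, three 3x3 layers use $3 \cdot 3^2 C^2 = 27C^2$ weights, while one 7x7 layer uses $7^2 C^2 = 49C^2$. The sketch below checks this, assuming biases are ignored and using $C = 64$ purely as an illustrative value; the helper names are hypothetical.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in one square convolution layer (biases ignored by assumption)."""
    return kernel * kernel * in_channels * out_channels


def stack_weights(kernel, channels, depth):
    """Weights in `depth` stacked layers that all keep `channels` channels."""
    return depth * conv_weights(kernel, channels, channels)


C = 64  # illustrative channel count, not taken from the notes
three_3x3 = stack_weights(3, C, depth=3)  # 27 * C^2
one_7x7 = stack_weights(7, C, depth=1)    # 49 * C^2
print(three_3x3, one_7x7, three_3x3 / one_7x7)
```

The ratio 27/49 does not depend on $C$, so the stacked version needs roughly 55% of the weights of the single large filter at any width.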
---
## References
1. Lecture 5.1, UvA DL course 2020