GoogLeNet
GoogLeNet, also called Inception-v1, is a particular incarnation of the Inception architecture proposed by Szegedy et al. in 2015 [1].
Network features:
- 22 layers deep (counting only layers with parameters)
- uses inception modules
- uses 1x1 convolutions as bottlenecks before the 3x3 and 5x5 convolutions (see the worked example after this list)
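To see why the bottlenecks matter, take the channel counts of the first inception module of GoogLeNet [1] (biases ignored): a 5x5 convolution applied directly to the 192-channel input to produce 32 channels needs 5x5x192x32 = 153,600 weights, whereas first reducing the 192 channels to 16 with a 1x1 convolution (1x1x192x16 = 3,072 weights) brings the 5x5 convolution down to 5x5x16x32 = 12,800 weights, i.e. roughly 10x fewer parameters in total.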
Architecture
Convolution
As explained by the authors in their paper [1], all the convolutions are followed by a rectified linear unit (ReLU) activation.
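This convention can be captured in a reusable block. The sketch below uses PyTorch (my choice, not the authors' original framework), and `conv_relu` is a hypothetical helper name:

```python
import torch.nn as nn

def conv_relu(in_ch, out_ch, **kwargs):
    """Convolution followed by ReLU, the pattern used throughout GoogLeNet [1]."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, **kwargs),
        nn.ReLU(inplace=True),
    )
```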
Inception module
The inception module is designed to let the CNN benefit from multi-level feature extraction by applying filters of various sizes (e.g. 1x1, 3x3, 5x5) in the same layer of the network. This allows the network to capture information at various scales and complexities [3].
Depth concatenation
The outputs of all the filters and of the pooling layer are concatenated along the channel dimension before being fed to the next layer. This concatenation ensures that the subsequent layers can access features extracted at different scales [3].
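A minimal PyTorch sketch of one such module (an assumption, not the authors' code; the default channel counts are those of the first inception module, "inception (3a)", of GoogLeNet [1], and `conv_relu` is the hypothetical helper defined above):

```python
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch, **kwargs):
    # Same convolution + ReLU helper as sketched above.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, **kwargs), nn.ReLU(inplace=True))

class InceptionModule(nn.Module):
    def __init__(self, in_ch=192, c1=64, c3_red=96, c3=128, c5_red=16, c5=32, pool_proj=32):
        super().__init__()
        self.branch1 = conv_relu(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            conv_relu(in_ch, c3_red, kernel_size=1),          # 1x1 bottleneck
            conv_relu(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            conv_relu(in_ch, c5_red, kernel_size=1),          # 1x1 bottleneck
            conv_relu(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            conv_relu(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Depth concatenation: stack branch outputs along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )
```

With the default channel counts, a 192x28x28 input yields 64 + 128 + 32 + 32 = 256 output channels at the same 28x28 resolution, since every branch preserves the spatial size.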
Global average pooling
Global average pooling, used at the end of the network, averages each 7x7 feature map down to 1x1. The difference between global average pooling and a fully connected layer is the number of weights: if a fully connected layer maps the 1024x7x7 feature maps to a 1024-dimensional vector, it needs 1024x7x7x1024 weights (cf. the figure below).
If global average pooling is used on the same block, there are no weights at all (cf. the figure below).
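These parameter counts are easy to verify; a quick check in PyTorch (my illustration, not code from [1]):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)      # final 7x7 feature maps of GoogLeNet

# Global average pooling: each 7x7 map is averaged down to 1x1, no weights.
gap = nn.AvgPool2d(kernel_size=7)
print(gap(x).shape)                              # torch.Size([1, 1024, 1, 1])
print(sum(p.numel() for p in gap.parameters()))  # 0

# Fully connected alternative mapping the 1024x7x7 block to 1024 values.
fc = nn.Linear(1024 * 7 * 7, 1024)
print(fc(x.flatten(1)).shape)                    # torch.Size([1, 1024])
print(fc.weight.numel())                         # 51380224 = 1024*7*7*1024
```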
Bibliography
- [1] [Paper] Going deeper with convolutions
- [2] [Medium] Going deeper with convolutions: the inception paper, explained
- [3] [Blog] Understanding the inception module in deep learning
- [4] [Medium] Review: GoogLeNet (Inception v1) - Winner of ILSVRC 2014 (Image Classification)
- [5] [Dockship] GoogLeNet pretrained model