AlexNet
AlexNet is a convolutional neural network introduced by Krizhevsky et al. in 2012 [1]; it won the ImageNet (ILSVRC) 2012 image classification challenge.
Network features:
- ~60M parameters (62,378,344 exactly; see the table below)
- 8 layers (5 convolutional + 3 fully connected)
- ReLU as the activation function
- Local Response Normalization (LRN)
Architecture
Parameters
Layer | Activation Shape | Filter Shape | Weights | Biases | Parameters |
---|---|---|---|---|---|
Input | 3x227x227 | - | 0 | 0 | 0 |
Conv1 | 96x55x55 | 11x11 | 34,848 | 96 | 34,944 |
Pool1 | 96x27x27 | - | 0 | 0 | 0 |
Conv2 | 256x27x27 | 5x5 | 614,400 | 256 | 614,656 |
Pool2 | 256x13x13 | - | 0 | 0 | 0 |
Conv3 | 384x13x13 | 3x3 | 884,736 | 384 | 885,120 |
Conv4 | 384x13x13 | 3x3 | 1,327,104 | 384 | 1,327,488 |
Conv5 | 256x13x13 | 3x3 | 884,736 | 256 | 884,992 |
Pool3 | 256x6x6 | - | 0 | 0 | 0 |
FC1 | 4096x1x1 | - | 37,748,736 | 4,096 | 37,752,832 |
FC2 | 4096x1x1 | - | 16,777,216 | 4,096 | 16,781,312 |
FC3 | 1000x1x1 | - | 4,096,000 | 1,000 | 4,097,000 |
Total | | | | | 62,378,344 |
See [5] for an explanation of the parameter calculations. Note that the paper states a 224x224 input, but a 227x227 input is required for the 11x11 convolution with stride 4 to produce the 55x55 Conv1 output.
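As a cross-check of the table, here is a minimal PyTorch sketch of the single-branch variant used above (an assumption on my part: the original paper's two-GPU grouped convolutions are not used, and the parameter-free LRN layers are omitted for brevity). Summing its parameters reproduces the 62,378,344 total.

```python
import torch
import torch.nn as nn

# Single-branch AlexNet sketch matching the table above
# (227x227 input, no grouped convolutions, LRN omitted).
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # Conv1 -> 96x55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Pool1 -> 96x27x27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # Conv2 -> 256x27x27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Pool2 -> 256x13x13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # Conv3 -> 384x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # Conv4 -> 384x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # Conv5 -> 256x13x13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Pool3 -> 256x6x6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                   # FC1
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                          # FC2
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                   # FC3
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNet()
print(sum(p.numel() for p in model.parameters()))  # 62378344
```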
Activation Function
ReLU is used after each convolutional and fully connected layer. AlexNet was one of the first neural networks to use this activation function. The authors chose ReLU because "Deep convolutional neural networks with ReLUs train several times faster than their equivalents with tanh units" [1].
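For reference, ReLU simply computes f(x) = max(0, x) element-wise:

```python
import numpy as np

# ReLU: f(x) = max(0, x), applied element-wise.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0. 0. 0. 1.5]
```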
Dropout
To reduce overfitting, dropout is used after the first two fully connected layers. This technique, presented by Hinton et al. [2], randomly deactivates each neuron of a layer (here, the probability of being deactivated is 0.5). Deactivated neurons take part in neither the forward pass nor backpropagation, so a neuron cannot rely on the presence of any particular other neuron, which prevents complex co-adaptations on the training data. Each training input thus sees a different network architecture, but the weights are shared between all these architectures. At test time, all neurons are used and their outputs are multiplied by 0.5 to take the "geometric mean of the predictive distributions produced by the exponentially-many dropout networks" [1].
See also [3].
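A minimal NumPy sketch of this (original, non-inverted) dropout scheme, including the test-time scaling described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dropout as described above: during training each activation is kept
# with probability p = 0.5; at test time all units are active and the
# outputs are scaled by p to approximate the ensemble average.
def dropout(x, p=0.5, train=True):
    if train:
        mask = rng.random(x.shape) < p  # 1 = kept, 0 = deactivated
        return x * mask
    return x * p  # test time: scale outputs by the keep probability

x = np.ones(8)
print(dropout(x, train=True))   # roughly half the units zeroed
print(dropout(x, train=False))  # all units scaled by 0.5
```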
Local Response Normalization (LRN)
LRN, introduced in the AlexNet paper [1], is applied after the ReLU to normalize activations and keep them from growing unboundedly. There are two types of LRN, inter-channel and intra-channel; AlexNet uses the inter-channel form. For more explanations see [4].
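In the inter-channel form, each activation is divided by a term that sums the squared activations of the n neighboring channels at the same spatial position; the paper uses k = 2, n = 5, alpha = 1e-4, beta = 0.75 [1]. A minimal NumPy sketch:

```python
import numpy as np

# Inter-channel LRN from [1]: each activation a[i, x, y] is divided by
# (k + alpha * sum of squared activations over the n neighboring
# channels at the same position) ** beta.
def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    channels = a.shape[0]
    b = np.empty_like(a)
    for i in range(channels):
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.default_rng(0).standard_normal((96, 55, 55))  # Conv1 output
print(lrn(a).shape)  # (96, 55, 55)
```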
Bibliography
- [1] [Paper] ImageNet Classification with Deep Convolutional Neural Networks
- [2] [Paper] Improving neural networks by preventing co-adaptation of feature detectors
- [3] [Machine Learning Mastery] A Gentle Introduction to Dropout for Regularizing Deep Neural Networks
- [4] [Medium] Difference between Local Response Normalization and Batch Normalization
- [5] [LearnOpenCV] Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN)