VGGNet

VGGNet, also called VGG (Visual Geometry Group), is a convolutional neural network, deeper than AlexNet. VGG-16 is the most popular variant of this network.

You can find the variants of the VGGNet on this page.

Network features:

138M parameters
16 layers (13 convolutional + 3 fully connected)
use ReLU as activation function

Architecture

Parameters

Layer	Activation Shape	Filter Shape	Weights	Biases	Parameters
Input	3x224x224	-	0	0	0
Conv1	64x224x224	3x3	1,728	64	1,792
Conv2	64x224x224	3x3	36,864	64	36,928
Pool1	64x112x112	-	0	0	0
Conv3	128x112x112	3x3	73,728	128	73,856
Conv4	128x112x112	3x3	147,456	128	147,584
Pool2	128x56x56	-	0	0	0
Conv5	256x56x56	3x3	294,912	256	295,168
Conv6	256x56x56	3x3	589,824	256	590,080
Conv7	256x56x56	3x3	589,824	256	590,080
Pool3	256x28x28	-	0	0	0
Conv8	512x28x28	3x3	1,179,648	512	1,180,160
Conv9	512x28x28	3x3	2,359,296	512	2,359,808
Conv10	512x28x28	3x3	2,359,296	512	2,359,808
Pool4	512x14x14	-	0	0	0
Conv11	512x14x14	3x3	2,359,296	512	2,359,808
Conv12	512x14x14	3x3	2,359,296	512	2,359,808
Conv13	512x14x14	3x3	2,359,296	512	2,359,808
Pool5	512x7x7	-	0	0	0
FC1	4096x1x1	-	102,760,448	4096	102,764,544
FC2	4096x1x1	-	16,777,216	4096	16,781,312
FC3	1000x1x1	-	4,096,000	1000	4,097,000
Total					138,357,544

Stack of Convolutional Layers

Unlike AlexNet which uses big filters in the convolution layers at the beginning of the network (11x11 and 5x5), VGG uses only 3x3 filters. Using a stack several convolutional layers, without spatial pooling in between, increase the receptive field in the input of the stack. The figure below presents a stack of two 3x3 convolutional layers and the respective receptive fields (in blue and red).

As we can observed, the effective receptive field of two 3x3 convolutions is the same than using a single 5x5 convolution. The figure below presents the same technique with a stack of three 3x3 convolutions. In that case, the effective receptive field is the same than using a single 7x7 convolution.

The authors explained that the use of several small convolutions decreases the number of parameters. For a single 7x7 convolutional layer there are $49C^2$ parameters (with $C$ the number of channels) while a stack of three 3x3 convolutional layers has $27C^2$ parameters. Moreover, the use of three ReLU rather than one (for a stack of three) makes the decision function more discriminative.

For more information about the receptive field, see [3].

VGGNet

VGGNet

Architecture

Parameters

Stack of Convolutional Layers

Bibliography

results matching ""

No results matching ""