ZFNet

ZFNet, named after Zeiler and Fergus, is a slight improvement of AlexNet. The authors attach a deconvnet to the CNN in order to visualize the feature activations and modify the network accordingly.

Architecture

DeconvNet

In order to improve AlexNet, a deconvnet is attached to the CNN. The architecture of the ZFNet-Deconvnet is presented below.

The left part of the figure is the ZFNet as presented in the “Architecture” section (with the ReLU activation functions in order to understand why ReLU is also used in DeconvNet) while the right part is the DeconvNet.

2D Unpooling

To understand how 2D max unpooling works, we will first remind how to do a 2D max pooling. On the figure below, a 2x2 max pooling with a stride 2 is applied on a 4x4 feature map. The colored regions represent the max pooling areas and the bold values in the input feature map are the maximum values for each area.

2D Max Pooling

There are two outputs of the 2D max pooling: one is the pooled map which will be used by the next layer of the neural network and the second output is the max locations. These locations are the row and column indices of the max value inside the corresponding pooling area. For example, in the red area the maximum value is $110$ which is located at $(1, 0)$ (row = 1, column = 0). This second output will be used by the unpooling layer.

The figure below represents a 2x2 max unpooling with a stride 2 applied on a 2x2 feature map.

2D Max Unpooling

The max locations map is used to place the values of the input feature map into the new one. Note that unpooling is a partial reconstruction since the non maximum values are lost (replaced by $0$).

Rectification

The “rectification” is simply the use of ReLU activation function. Since ReLU is used to obtain only positive values in the CNN part, for the same reason that it is used in the CNN.

Filtering

“The convnet uses learned filters to convolve the feature maps from the previous layer. To invert this, the deconvnet uses transposed versions of the same filters, but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.” In the ZFNet-DeconvNet architecture, the “deconvolution” is noted “deconv”.

About the deconvolution

The deconvolution layer used here is not the same than a transposed convolution. You can find more information in this post [2].

Bibliography

results matching ""

    No results matching ""