Architecture insights

Instead of choosing a particular filter size as in the previous architectures, the GoogLeNet designers applied all the three filters of sizes 1 x 1, 3 x 3, and 5 x 5 on the same patch, with a 3 x 3 max pooling and concatenation into a single output vector.

The use of 1 x 1 convolutions decreases the dimensions wherever the computation is increased by the expensive 3 x 3 and 5 x 5 convolutions. 1 x 1 convolutions with the ReLU activation function are used before the expensive 3 x 3 and 5 x 5 convolutions.

In GoogLeNet, inception modules are stacked one over the other. This stacking allows us to modify each module without affecting the later layers. For example, you can increase or decrease the width of any layer:

GoogLeNet ...

Get Practical Convolutional Neural Networks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.