Batch normalization helps our networks train faster and perform better overall. It is also fairly easy to apply in practice; however, exactly why it works is still debated by researchers.
When using batch normalization, for each minibatch we normalize the layer's activations to have zero mean and unit variance, after (or before) each nonlinearity. The layer then learns a scale and shift (usually written gamma and beta) that are applied to the normalized values, so the network can still represent any range it needs. This gives each layer a normalized input to learn from, which makes that layer more efficient at learning.
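To make the normalization step concrete, here is a minimal NumPy sketch of the core computation. This is illustrative only: it shows the per-feature normalization over a minibatch and ignores the learned gamma/beta parameters and the running statistics that a real batch normalization layer also maintains; the function name `batch_normalize` and the epsilon value are my own choices, not from Keras.

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    """Normalize a minibatch x of shape (batch, features) to
    zero mean and unit variance per feature (core BN step only)."""
    mean = x.mean(axis=0)          # per-feature mean over the batch
    var = x.var(axis=0)            # per-feature variance over the batch
    return (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero

# A batch of 64 samples with 8 features, deliberately off-center and scaled
batch = np.random.randn(64, 8) * 3.0 + 2.0
normed = batch_normalize(batch)
# Each feature column of `normed` now has mean ~0 and variance ~1.
```

Note that at inference time a real batch normalization layer uses running averages of the mean and variance collected during training, rather than the statistics of the current batch.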
Batch normalization layers are easy to implement in Keras, and I'll be using them after each convolutional layer for the examples in this chapter. The layer is imported as follows:
from keras.layers import BatchNormalization ...