There are several ways to control the training of CNNs and prevent overfitting during the training phase, for example L2/L1 regularization, max-norm constraints, and dropout:
- L2 regularization: This is perhaps the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. Under the gradient descent parameter update, L2 regularization ultimately means that every weight is decayed linearly toward zero: W += -lambda * W.
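A minimal sketch of this update, using NumPy with illustrative values for the regularization strength `lam` and learning rate `lr` (both assumed, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))     # weight matrix
grad = rng.standard_normal((4, 3))  # gradient of the data loss w.r.t. W

lam, lr = 0.1, 0.01  # assumed hyperparameters for illustration

# The L2 penalty contributes lam * W to the gradient, so each step
# decays every weight linearly toward zero by lr * lam * W,
# on top of the usual data-loss update.
W_new = W - lr * (grad + lam * W)
```

The decay term is why L2 regularization is often called "weight decay": the shrinkage applied to each weight is proportional to the weight itself.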
- L1 regularization: This is another relatively common form of regularization, where for each weight w we add the term λ∣w∣ to the objective. However, it is also possible to combine the L1 regularization with ...