Activation functions in PyTorch

Part of the trick that makes ANNs perform as well as they do is the use of nonlinear activation functions. A first thought is simply to use a step function: a particular unit produces an output only when its input exceeds zero. The problem with the step function is that it cannot be usefully differentiated: it is discontinuous at zero, and everywhere else it consists of flat sections where the gradient is exactly zero, so gradient-based learning has nothing to work with.
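A minimal sketch of the problem in PyTorch follows. Implementing the step function with a comparison (one common way to write it, assumed here for illustration) produces a tensor that is cut off from the autograd graph, so no gradient can flow back through it:

```python
import torch

# Inputs we would like to learn through.
x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)

# A step function: 1 when the input exceeds zero, 0 otherwise.
# The comparison produces a boolean tensor that is not part of the
# autograd graph, so converting it back to float does not restore gradients.
step = (x > 0).float()

print(step)                 # tensor([0., 1., 1.])
print(step.requires_grad)   # False -- backpropagation cannot proceed
```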

Another method is to use a linear activation function; however, this restricts our output to a linear function as well, because a composition of linear layers is itself linear. This is not what we want, since we need to model highly nonlinear real-world data. It turns out that we can inject nonlinearity into our networks by using nonlinear activation functions.
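As a short sketch, PyTorch exposes the standard nonlinear activations (sigmoid, tanh, and ReLU, among others) both as modules in torch.nn and as plain functions, and gradients flow through them as expected:

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.0, 3.0, 7)

# Activations as modules, convenient inside nn.Sequential models.
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()

print(relu(x))     # max(0, x): zero for negative inputs, identity for positive
print(sigmoid(x))  # squashes inputs into the range (0, 1)
print(tanh(x))     # squashes inputs into the range (-1, 1)

# Unlike the step function, these are differentiable (almost) everywhere,
# so autograd can compute useful gradients through them.
y = torch.tensor([-1.0, 2.0], requires_grad=True)
torch.relu(y).sum().backward()
print(y.grad)      # tensor([0., 1.])
```

The same activations are also available functionally as torch.relu, torch.sigmoid, and torch.tanh, which is handy when writing a custom forward method.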
