Implementation of the softmax layer

We will now look at how we can implement a softmax layer. As we have already discussed, a sigmoid layer is used for assigning multiple labels to a sample; that is, if there are several nonexclusive characteristics you want to infer from an input, you should use a sigmoid layer. A softmax layer is used when you want to assign only a single class to a sample by inference: it computes a probability for each possible class, with the probabilities over all classes, of course, summing to 100%. We can then select the class with the highest probability to give the final classification.
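To make the distinction concrete, here is a small NumPy sketch (added purely for illustration; it is not code from this chapter, and the variable names are just placeholders). It shows that sigmoid outputs are independent per-label probabilities, while softmax outputs form a single distribution that sums to 1, from which we take the largest entry as the predicted class:

    import numpy as np

    logits = np.array([2.0, -1.0, 0.5])  # example raw scores, one per label/class

    # Sigmoid: each output is an independent probability in (0, 1);
    # the values need not sum to 1, so several labels can apply at once.
    sigmoid = 1.0 / (1.0 + np.exp(-logits))

    # Softmax: the outputs form one probability distribution over the classes,
    # so they are all positive and sum to exactly 1.
    softmax = np.exp(logits) / np.sum(np.exp(logits))

    print(sigmoid, sigmoid.sum())  # roughly [0.88 0.27 0.62], sum is not 1
    print(softmax, softmax.sum())  # roughly [0.79 0.04 0.18], sum is 1
    print(np.argmax(softmax))      # index 0: the class we would report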

Now, let's see exactly what the softmax layer does: given a collection of N real numbers (c0, ..., cN-1), the softmax function maps each value ck to exp(ck) / (exp(c0) + exp(c1) + ... + exp(cN-1)). Each output lies between 0 and 1, and the N outputs sum to exactly 1, which is what allows us to interpret them as class probabilities.
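To give a feel for how this definition could be computed on the GPU, here is a minimal PyCUDA sketch, assuming the gpuarray/ElementwiseKernel tools used elsewhere in this book (the kernel name exp_ker and the surrounding code are my own placeholders, not the implementation developed in this chapter). One elementwise kernel exponentiates the inputs on the device, and the division by the sum is then done on the host:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # Hypothetical elementwise kernel that computes expf of every input value.
    exp_ker = ElementwiseKernel(
        "float *out, float *c",
        "out[i] = expf(c[i]);",
        "exp_ker")

    c = gpuarray.to_gpu(np.array([2.0, -1.0, 0.5], dtype=np.float32))  # (c0, ..., cN-1)
    exps = gpuarray.empty_like(c)
    exp_ker(exps, c)

    # Normalize by the sum of the exponentials so the outputs sum to 1.
    total = float(gpuarray.sum(exps).get())
    softmax = (exps / total).get()
    print(softmax, softmax.sum())

    # In practice you would subtract max(c) from every value before exponentiating
    # to avoid overflow; this does not change the result of the softmax.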
