Our output layer will contain one neuron for each class. Each class's neuron will be trained to predict the probability of that class as a value between 0 and 1. We will use a special activation function called softmax to ensure that all of these outputs sum to one; we will cover the details of softmax shortly.
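As a quick preview of the normalization property described above, here is a minimal NumPy sketch of softmax (the function and variable names are illustrative, not from the original text): it exponentiates each raw output and divides by the total, so the results are all between 0 and 1 and sum to exactly one.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical
    # stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw outputs, one per class
probs = softmax(logits)

print(probs.sum())  # the probabilities sum to 1
```

Larger raw outputs map to larger probabilities, so the predicted class is simply the neuron with the highest value.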
This means that we will need to create a one-hot (categorical) encoding of our classes. For example, if we had y = [0, 3, 2, 1] and we encoded it categorically, we would have a matrix y like this:

    [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]
Luckily, Keras provides a convenient function to make this conversion for us.
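The convenient function in question is `to_categorical`. A minimal sketch of its use, assuming the Keras bundled with TensorFlow (the array values here mirror the example above):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([0, 3, 2, 1])          # integer class labels
y_one_hot = to_categorical(y, num_classes=4)  # one row per sample

print(y_one_hot)
```

Each row has a single 1 in the column matching that sample's class label; `num_classes` can be omitted, in which case Keras infers it from the largest label present.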