We have already tried running the simplest neural network, consisting of a single layer and an activation function. Let's see how a more complicated network performs. We will reuse the network that showed very good performance on the MNIST dataset and check whether it does equally well on CIFAR-10. Consider the following code:
arch = @mx.chain mx.Variable(:data) =>
    mx.FullyConnected(num_hidden=128) =>
    mx.Activation(act_type=:relu) =>
    mx.FullyConnected(num_hidden=64) =>
    mx.Activation(act_type=:relu) =>
    mx.FullyConnected(num_hidden=10) =>
    mx.Activation(act_type=:relu) =>
    mx.SoftmaxOutput(mx.Variable(:label))

nnet = mx.FeedForward(arch, context = mx.cpu())

mx.fit(nnet, mx.ADAM(), train_data_provider,
       eval_data = validation_data_provider, ...
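The fit call in the listing above is truncated. A minimal sketch of a complete invocation is shown below; it assumes the train and validation data providers defined earlier, and the epoch count and the speedometer callback are illustrative choices rather than settings taken from the original code:

using MXNet

# Train the feed-forward model with the ADAM optimizer.
# n_epoch = 50 and the speedometer callback (which reports training
# throughput) are assumptions added for illustration.
mx.fit(nnet, mx.ADAM(), train_data_provider,
       eval_data = validation_data_provider,
       n_epoch = 50,
       callbacks = [mx.speedometer()])

During training, MXNet prints the evaluation metric on the validation provider after each epoch, which is what we will use to compare this architecture's behavior on CIFAR-10 against its MNIST results.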