Nevertheless, I still believe it is possible to approach 100% accuracy by stacking more LSTM layers. The following are the hyperparameters I would tune next to see how the accuracy responds:
// Hyperparameters for the LSTM training
val learningRate = 0.001f
val trainingIters = trainingDataCount * 1000 // Loop 1000 times on the dataset
val batchSize = 1500 // I would set it to 5000 and see the performance
val displayIter = 15000 // To show test set accuracy during training
val numLstmLayer = 3 // 5, 7, 9, etc.
There are many other variants of the LSTM cell. One particularly popular variant is the Gated Recurrent Unit (GRU) cell, which is a slightly more dramatic variation on the LSTM. It also merges the cell state and hidden ...
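To make the merging of states concrete, the GRU's update can be sketched with its commonly cited equations (biases omitted for brevity; this is the standard formulation from Cho et al., not necessarily the exact variant implemented by the library used in this chapter):

```latex
% GRU cell: a single hidden state h_t replaces the LSTM's
% separate cell state and hidden state.
\begin{aligned}
z_t &= \sigma\!\left(W_z \,[h_{t-1}, x_t]\right)
  && \text{update gate (merges the LSTM's forget and input gates)} \\
r_t &= \sigma\!\left(W_r \,[h_{t-1}, x_t]\right)
  && \text{reset gate} \\
\tilde{h}_t &= \tanh\!\left(W \,[r_t \odot h_{t-1},\, x_t]\right)
  && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
  && \text{single merged state}
\end{aligned}
```

Because the GRU keeps only one state vector and one fewer gate, it has fewer parameters than an LSTM of the same hidden size, which is one reason it is often faster to train.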