Now that we can recursively select the state with the highest reward under the best policy (see the nextState method in the following code), the Q-learning algorithm can be trained online, for example for options trading.
Once the Q-learning model has been trained on the supplied data, the next state can be predicted by overriding the data transformation method (the PipeOperator, |>) with a transformation from a state to a predicted goal state:
override def |> : PartialFunction[QLState[T], Try[QLState[T]]] = {
  // Prediction is available only once a model has been trained
  case st: QLState[T] if isModel =>
    // Return the state itself if it is already a goal state;
    // otherwise recurse toward the goal via nextState
    Try(if (st.isGoal) st else nextState(QLIndexedState[T](st, 0)).state)
}
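For context, a minimal sketch of the recursive nextState selection might look as follows. This is an illustration only: the QLState and QLIndexedState case classes, and the qValue and neighbors functions, are assumed shapes inferred from the names used above, not the actual library definitions.

final case class QLState[T](id: Int, isGoal: Boolean)
final case class QLIndexedState[T](state: QLState[T], iter: Int)

// Hypothetical greedy traversal: at each step, follow the transition
// with the highest Q-value until a goal state or the iteration limit
// is reached.
class NextStateSketch[T](
    qValue: (QLState[T], QLState[T]) => Double,  // assumed Q-value lookup
    neighbors: QLState[T] => Seq[QLState[T]],    // assumed reachable states
    maxIters: Int) {

  @annotation.tailrec
  final def nextState(iSt: QLIndexedState[T]): QLIndexedState[T] = {
    val candidates = neighbors(iSt.state)
    if (candidates.isEmpty || iSt.iter >= maxIters) iSt
    else {
      // Greedy selection of the neighbor with the highest Q-value
      val best = candidates.maxBy(qValue(iSt.state, _))
      if (best.isGoal) QLIndexedState(best, iSt.iter + 1)
      else nextState(QLIndexedState(best, iSt.iter + 1))
    }
  }
}

The iteration counter carried by QLIndexedState bounds the recursion, so the traversal terminates even when no goal state is reachable from the starting state.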