Once high-frequency trading models have been identified, they are back-tested to ensure their viability. The back-testing software should be a "paper"-based prototype of the eventual live system: the back-testing engine should run on tick-by-tick data to reenact past market conditions, and the code implementing the main functionality of the back-testing modules should then be reused in the live system, so that both run the same code.
To ensure statistically significant inferences, the model "training" period T should be sufficiently large. A common rule of thumb based on the central limit theorem (CLT) holds that 30 observations is the bare minimum for any statistical significance, and 200 observations is considered a reasonable number. Given strong seasonality in intra-day data (recurrent price and volatility changes at specific times of the day), benchmark high-frequency models are back-tested on several years of tick-by-tick data.
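One way to see why more observations help: the standard error of a sample mean is s/√n, so for a fixed mean and dispersion the t-statistic of the mean grows roughly with √n. The minimal sketch below computes the t-statistic for the hypothesis that the mean return is zero. The function name `mean_t_statistic` and the alternating sample returns are hypothetical illustrations, not data or code from the text.

```python
import math
import statistics


def mean_t_statistic(returns):
    """Hypothetical helper: t-statistic for H0 'mean return = 0'.

    Uses the sample standard deviation (n - 1 denominator), so the
    standard error of the mean is s / sqrt(n).
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(returns)
    s = statistics.stdev(returns)
    return mean / (s / math.sqrt(n))


# Illustrative (made-up) returns with identical mean and spread,
# sampled 30 and 200 times respectively.
small = [0.002, -0.001] * 15    # n = 30
large = [0.002, -0.001] * 100   # n = 200
print(round(mean_t_statistic(small), 3), round(mean_t_statistic(large), 3))
```

For this particular sample the statistic works out to √(n − 1)/3, about 1.80 at n = 30 and about 4.70 at n = 200: the same average edge becomes far more convincing as the number of observations grows.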
The main difference between the live trading model and the back-test model should be the origin of the quote data. The back-test system includes a historical quote-streaming module that reads historical tick data from archives and feeds it sequentially to the module containing the main functionality. In the live trading system, a different quote module receives real-time tick data originating at the broker-dealers.
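The separation described above can be sketched as a single quote-source interface with interchangeable implementations feeding one shared core. This is a minimal sketch under stated assumptions: the names `Tick`, `QuoteSource`, `HistoricalQuoteSource`, `StrategyCore`, and `run` are hypothetical, the "strategy" is a toy mid-price tracker, and the live broker-dealer feed is only indicated in a comment rather than implemented.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Tick:
    timestamp: float  # assumed representation: seconds since some epoch
    symbol: str
    bid: float
    ask: float


class QuoteSource(ABC):
    """Hypothetical interface: the only component that differs between
    the back-test and live systems."""

    @abstractmethod
    def ticks(self) -> Iterator[Tick]:
        ...


class HistoricalQuoteSource(QuoteSource):
    """Replays archived ticks sequentially, in timestamp order."""

    def __init__(self, archived: Iterable[Tick]):
        self._ticks = sorted(archived, key=lambda t: t.timestamp)

    def ticks(self) -> Iterator[Tick]:
        return iter(self._ticks)


# A live system would add another QuoteSource subclass that yields
# real-time ticks from a broker-dealer feed (not implemented here).


class StrategyCore:
    """Hypothetical shared core logic; here it only records mid-prices."""

    def __init__(self) -> None:
        self.mids: List[float] = []

    def on_tick(self, tick: Tick) -> None:
        self.mids.append((tick.bid + tick.ask) / 2)


def run(source: QuoteSource, core: StrategyCore) -> None:
    """Feed every tick from whichever source is plugged in to the core."""
    for tick in source.ticks():
        core.on_tick(tick)


archive = [Tick(2.0, "XYZ", 101.0, 102.0), Tick(1.0, "XYZ", 100.0, 101.0)]
core = StrategyCore()
run(HistoricalQuoteSource(archive), core)
print(core.mids)  # [100.5, 101.5]
```

Because `run` and `StrategyCore` depend only on the `QuoteSource` interface, switching from back-testing to live trading means swapping the source object, leaving the core code unchanged.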
Except for differences in receiving quotes, both live and back-test systems should be identical; they can be ...