8.11. Genetic Algorithm Warning Label

A central danger in using the GA (and many other techniques) is that you may fall deep into the data mine. Recall that if you test 100 relationships that do not really exist, about five of them will still appear significant at the 5 percent level purely by chance. If you test a million such relationships, about 50,000 will pass at that level. During the early stages of our use of the GA technology, a chromosome evolved that encoded a variable based on a 15-month average lagged 7 months minus a 7-month average lagged 15 months. It had a nice symmetry to it. It had high statistical significance. Was this data mining? You bet. Was the high predictive power spurious? A virtual certainty. It was ignominiously retired to the bit bucket.
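To make the arithmetic above concrete, here is a minimal sketch (not the authors' procedure) that correlates one random "target" series with many independent pure-noise candidate predictors and counts how many pass a 5 percent significance test. It assumes Gaussian noise, a Pearson correlation test, and a normal approximation to Student's t distribution, which is close for a couple of hundred observations. The function names `pearson`, `approx_p_value`, and `mine_noise` are hypothetical. Because every candidate is noise by construction, every "hit" is a false positive.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: correlation = 0.

    Converts r to t = r * sqrt(n - 2) / sqrt(1 - r^2) and uses the
    standard normal tail in place of Student's t (an assumption that
    is a close approximation for large n).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return math.erfc(abs(t) / math.sqrt(2.0))


def mine_noise(num_tests: int, n_obs: int, alpha: float = 0.05, seed: int = 0):
    """Test num_tests pure-noise predictors against one pure-noise target.

    Returns (number of predictors with p < alpha, smallest p-value seen).
    Every 'significant' result is a false positive by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    best_p = 1.0
    for _ in range(num_tests):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        p = approx_p_value(pearson(candidate, target), n_obs)
        if p < alpha:
            hits += 1
        best_p = min(best_p, p)
    return hits, best_p


if __name__ == "__main__":
    hits, best_p = mine_noise(num_tests=1000, n_obs=200, seed=42)
    print(f"{hits} of 1000 noise predictors 'significant' at 5%")
    print(f"best p-value found: {best_p:.5f}")
```

With 1,000 candidates you should expect roughly 50 hits, and the single best candidate will typically show a p-value around 0.001 even though none of them carries any information. Real GA candidates, such as overlapping moving averages with different windows and lags, are correlated with one another, so the exact count will differ, but the best-of-many effect is the same.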

With the GA, we know that we can dig down to the deepest region of the data mine and produce models with much better statistics and even wackier variables than the previous example. On randomly generated data, it is very easy to produce models with near-perfect apparent predictive power and coefficients significant at the 0.1 percent level. We must be very careful not to fool ourselves. Here are some of the ways we guard against that:

  • Wetware before software. Wetware is the gray matter between your ears. The brain needs to be engaged before the GA is put into gear. The starting population always includes models developed using established econometric and quantitative methods. They make sense economically. Often, the result of ...
