Chapter 6. Stupid Data Miner Tricks

To Err Is Human. To Really Screw Up, You Need a Computer.

—Popular campus T-shirt, ca. 1980

This chapter started out over 10 years ago as a set of joke slides showing silly, spurious correlations. Originally, my quant equity research group planned on deliberately abusing the genetic algorithm (see Chapter 8 on evolutionary computation) to find the wackiest relationships, but as it turned out, we didn't need to get that fancy. Just looking at enough data using plain-vanilla regression would more than suffice.
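The point about "looking at enough data" can be illustrated with a minimal sketch. It is not the original analysis: every series below is random noise, and the function names, the sample size of 10 observations, and the candidate counts are all assumptions chosen for illustration. The sketch regresses one random "market" series on each of many unrelated random series and keeps the best R-squared, which for a one-variable regression equals the squared Pearson correlation.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_spurious_r2(n_obs=10, n_candidates=1000, seed=0):
    """Best R-squared found by trying many unrelated random predictors.

    Hypothetical setup: the target and every candidate are independent
    Gaussian noise, so any fit found here is meaningless by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        # For a single-predictor regression, R-squared is r squared.
        r2 = pearson_r(candidate, target) ** 2
        best = max(best, r2)
    return best


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, round(best_spurious_r2(n_candidates=k), 3))
```

Because the seed is fixed, each larger search includes the candidates from the smaller one, so the best fit can only improve as more series are tried. None of these "relationships" means anything; the search itself manufactures them.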

We uncovered utterly meaningless but statistically appealing relationships between the stock market and third world dairy products and livestock populations. These have been cited often: in BusinessWeek, the Wall Street Journal, the book A Mathematician Plays the Stock Market,[] and many others. Students from Bill Sharpe's classes at Stanford seem to be familiar with them. The slides were expanded to include some actual content about data mining and reissued as an academic working paper in 2001. Occasional requests for the paper arrive from distant corners of the world. An updated version appeared in the Journal of Investing in 2007.[]

[] This article originally appeared in the Spring 2007 issue of the Journal of Investing ("Stupid Data Miner Tricks: Overfitting the S&P 500"). It is reprinted with permission. To view the original article, please go to iijoi.com.

Without taking too much of a hatchet to the original, the advice here is still valuable—perhaps ...
