6.3. Enough Regression Tricks

To hammer a little harder on this point about the dangers of data mining, look at another equally bogus example. Who wants to go count pregnant sheep in Bangladesh to figure out next year's sheep population? We'll get away from ordinary linear regressions and show how we can fit a perfect model, with R² = 100 percent, using only one variable: the year's digits.

Figure 6.5. First-degree polynomial fit: just a plain old line.

This has to be about the most accessible data on the planet. There is no need to go counting sheep. Instead of regression, we'll use a different prediction method to make this work, a polynomial fit. Everyone with a recollection of junior high school math knows that there is a line (a first-degree polynomial) through any two points, as shown in Figure 6.5.

Put in a third point and you can fit a parabola, or second-degree polynomial, through all three points, as shown in Figure 6.6.
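The same idea extends to any number of points: n points with distinct x-values have exactly one polynomial of degree at most n - 1 passing through all of them. The sketch below illustrates this with Lagrange interpolation, which is one standard way to build such a polynomial and not necessarily the method used to produce the figures. The function name lagrange_eval and all the data values are made up for illustration; they are not real market data. It uses Python's fractions module so the arithmetic is exact rather than floating point.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree <= n-1 through n points.

    points is a list of (x_i, y_i) pairs with distinct x_i.
    All arithmetic is done with exact rationals.
    """
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        # Build the i-th Lagrange basis term: equals y_i at x_i, 0 at every other x_j.
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


# Three made-up points: a parabola (second-degree polynomial) goes through all of them.
three_points = [(1, 2), (2, 3), (3, 6)]
print(lagrange_eval(three_points, 4))  # prints 11, the value of x^2 - 2x + 3 at x = 4

# Ten made-up placeholder values keyed by year (NOT real index closes).
years = list(range(1983, 1993))
placeholder_values = [164, 167, 211, 242, 247, 277, 353, 330, 417, 435]
ten_points = list(zip(years, placeholder_values))

# A ninth-degree polynomial reproduces every one of the ten values exactly.
exact_hits = all(lagrange_eval(ten_points, yr) == val for yr, val in ten_points)
print(exact_hits)  # prints True
```

Because every intermediate value is an exact fraction, the check at the end compares with == rather than with a tolerance.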

We have 10 points in the S&P 500 annual series from 1983 to 1992, so we fit a ninth-degree polynomial. However, as Mr. Wizard says, "Don't try this at home," unless you have some sort of infinite precision math tool like Mathematica or Maple. The ordinary floating point arithmetic in a spreadsheet or regular programming language isn't accurate enough for this to work. That said, our ninth-degree polynomial hits every annual close exactly. We have a ...
