Calculate offensive, defensive, and ballpark factors simultaneously by fitting score data to a model.
In this hack, we introduce a mathematical model and fit it to a simple data set derived from the scores of every game in a league's season. If the scores are available, you can apply this method to any league, anywhere.
We start with the list of scores for a season. Each game's result is typically recorded as the home team, the visiting team, the ballpark, and the two teams' scores. For example, consider the opening game of the 2004 Cincinnati Reds (CIN) campaign, in which the Chicago Cubs (CHN) beat the Reds 7–4 at Great American Ball Park.
From this one game's score, we can identify five parameters (the Reds' offense and defense, the Cubs' offense and defense, and the Cincinnati ballpark) and two run totals. In other words, we can make the following two observations:
The Reds’ offense and the Cubs’ defense at the Cincinnati ballpark led to four runs.
The Cubs’ offense and the Reds’ defense at the Cincinnati ballpark led to seven runs.
You might also represent this as the following set of observations, organized as (OFFENSE, DEFENSE, BALLPARK, RUNS):
CIN CHN CIN 4
CHN CIN CIN 7
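To make the conversion concrete, here is a minimal Python sketch that turns one game record into the two observation rows above. The `Game` record, its field names, and the `observations` helper are hypothetical, chosen for illustration rather than taken from any particular data source.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """One game's result (hypothetical field names, for illustration only)."""
    home: str
    visitor: str
    park: str
    home_runs: int
    visitor_runs: int


def observations(game: Game) -> list[tuple[str, str, str, int]]:
    """Split one game into two (OFFENSE, DEFENSE, BALLPARK, RUNS) rows."""
    return [
        # Home team's offense against the visitors' defense
        (game.home, game.visitor, game.park, game.home_runs),
        # Visiting team's offense against the home team's defense
        (game.visitor, game.home, game.park, game.visitor_runs),
    ]


# The 2004 Reds opener: Cubs 7, Reds 4, in Cincinnati
opener = Game(home="CIN", visitor="CHN", park="CIN", home_runs=4, visitor_runs=7)
for row in observations(opener):
    print(*row)
```

Running it prints the same two rows shown above, one per line. Applying `observations` to every game in a season and concatenating the results produces the full data set.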
Thus, a full Major League Baseball season yields two rows of observations per game. With 30 teams each playing 162 games, and two teams in every game, there are 15 x 162 = 2,430 games, or 2,430 x 2 = 4,860 rows of data.
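The row count can be checked with a few lines of Python. This assumes every team completes exactly 162 games; in practice a rained-out game is occasionally never made up, so a real season's file may be slightly shorter.

```python
TEAMS = 30
GAMES_PER_TEAM = 162
ROWS_PER_GAME = 2  # one row for each team's offense

# Each game involves two teams, so divide by 2 to avoid counting it twice.
games_in_season = TEAMS * GAMES_PER_TEAM // 2
rows = games_in_season * ROWS_PER_GAME
print(f"{games_in_season:,} games -> {rows:,} rows")
```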
Our objective is to use this data to understand how much each component contributed to the season's scores. More specifically, we will fit this data to a model to solve ...
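The model itself is not specified at this point, so the following is only a minimal sketch under an assumption of our own: an additive model, runs ≈ mu + off[offense] + dfn[defense] + park[ballpark], fit by ordinary least squares with NumPy's `np.linalg.lstsq`. The function `fit_additive` and the three-team toy league are hypothetical, and the effect sizes are made up so the fit can be checked against known values. This may well differ from the model the hack goes on to use.

```python
from itertools import permutations

import numpy as np


def fit_additive(rows):
    """Least-squares fit of an ASSUMED additive model:
        runs ~ mu + off[offense] + dfn[defense] + park[ballpark]

    rows: list of (offense, defense, ballpark, runs) tuples.
    Returns (mu, off, dfn, park); the last three are dicts of effects.
    A positive dfn means more runs allowed; a positive park means more runs scored.
    """
    teams = sorted({r[0] for r in rows} | {r[1] for r in rows})
    parks = sorted({r[2] for r in rows})
    n_t = len(teams)
    t_idx = {t: i for i, t in enumerate(teams)}
    p_idx = {p: i for i, p in enumerate(parks)}

    # Columns: intercept | offense dummies | defense dummies | park dummies
    X = np.zeros((len(rows), 1 + 2 * n_t + len(parks)))
    y = np.array([r[3] for r in rows], dtype=float)
    for i, (o, d, p, _) in enumerate(rows):
        X[i, 0] = 1.0
        X[i, 1 + t_idx[o]] = 1.0
        X[i, 1 + n_t + t_idx[d]] = 1.0
        X[i, 1 + 2 * n_t + p_idx[p]] = 1.0

    # Each dummy group sums to the intercept column, so X is rank-deficient
    # and lstsq returns the minimum-norm solution. Only differences within a
    # group (one team's offense vs. another's) are meaningful.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    off = {t: beta[1 + i] for t, i in t_idx.items()}
    dfn = {t: beta[1 + n_t + i] for t, i in t_idx.items()}
    park = {p: beta[1 + 2 * n_t + i] for p, i in p_idx.items()}
    return beta[0], off, dfn, park


# Hypothetical, noise-free toy league: three teams, each with its own park,
# every team hosting every other team once. All effects are made up.
TRUE_MU = 4.5
TRUE_OFF = {"A": 1.0, "B": 0.0, "C": -1.0}
TRUE_DFN = {"A": 0.5, "B": 0.0, "C": -0.5}
TRUE_PARK = {"A": 0.3, "B": 0.0, "C": -0.3}

rows = []
for home, visitor in permutations(TRUE_OFF, 2):
    # Two observation rows per game, both at the home team's park
    for o, d in ((home, visitor), (visitor, home)):
        runs = TRUE_MU + TRUE_OFF[o] + TRUE_DFN[d] + TRUE_PARK[home]
        rows.append((o, d, home, runs))

mu, off, dfn, park = fit_additive(rows)
print(f"offense A - B: {off['A'] - off['B']:.3f}")
print(f"park A - C:    {park['A'] - park['C']:.3f}")
```

Because the toy data contain no noise, the fit recovers the made-up differences exactly (1.000 and 0.600). With real scores the same call applies to the 4,860 rows, but the estimates carry sampling noise, and a multiplicative model (fit on log runs) is an equally reasonable alternative assumption.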