Use R and the Baseball DataBank database in MySQL to identify, statistically, the 10 most underpaid outfielders for the years 1999–2003.
The most common conversation between any two baseball fans (after arguing about whose favorite team is better, of course) is about which player is the best in the game. Baseball, more than any other sport, it seems, is obsessed with ranking players. Certainly, the fact that every player gets a turn at bat feeds the perception that ranking them should be a “simple” matter. Once the two fans have failed to establish the proper all-time pecking order, the debate turns to how high players’ salaries have become. Of course, both parties agree on that, and then they’re friends again. But the truth is that ranking players is highly subjective, and the distribution of player salaries is highly skewed. For every A-Rod making $22 million, there are 100 younger players making a “scant” $300,000.
So, I won’t even try to claim who the “best” players are; rather, I will simply use R to help identify which players have attributes “similar” to one another and then attempt to predict what their salaries can be expected to be, based on those attributes. Players who were paid much less than expected, given their similarity to other players, are considered “undervalued.” Likewise, players paid much more than expected are considered “overvalued.” And to help simplify things, I look only at outfielders for the years 1999–2003.
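The idea of an “expected” salary can be sketched in a few lines of R. This is only an illustration under stated assumptions, not necessarily the method used in the analysis itself. The data are synthetic (no real players or salaries), the columns HR, OBP, and age are stand-ins for whatever attributes are actually used, and an ordinary linear regression on log salary stands in for the “similarity” step.

```r
# Synthetic data only: these are not real players or salaries.
set.seed(42)
n <- 40
players <- data.frame(
  playerID = sprintf("of%02d", 1:n),   # hypothetical IDs
  HR       = rpois(n, lambda = 15),    # assumed attribute: home runs
  OBP      = round(rnorm(n, mean = 0.340, sd = 0.030), 3),  # assumed attribute
  age      = sample(23:36, n, replace = TRUE)               # assumed attribute
)

# Invented salary rule plus noise, so there is something to model.
players$salary <- round(exp(12.5 + 0.04 * players$HR +
                            6 * (players$OBP - 0.340) +
                            0.08 * (players$age - 23) +
                            rnorm(n, sd = 0.4)))

# Model log salary because salaries are heavily right-skewed.
fit <- lm(log(salary) ~ HR + OBP + age, data = players)

# Salary the model predicts, and the gap between actual and predicted.
players$expected <- exp(fitted(fit))
players$gap      <- residuals(fit)   # log(actual) - log(predicted)

# The most negative gaps belong to the players paid least
# relative to others with similar attributes.
undervalued <- head(players[order(players$gap), ], 10)
print(undervalued[, c("playerID", "salary", "expected", "gap")])
```

Sorting the same gaps in decreasing order would give the “overvalued” end of the list. Working on the log scale keeps one $22 million contract from dominating the fit.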
(This analysis isn’t completely ...