
Book description


Baseball Hacks isn't your typical baseball book: it's a book about how to watch, research, and understand baseball. It's an instruction manual for the free baseball databases and a cookbook for baseball research. Every part of this book is designed to teach baseball fans how to do something. In short, it's a how-to book, one that will increase your enjoyment and knowledge of the game.

So much of the way baseball is played today hinges on interpreting statistical data. Players are acquired based on their performance in the statistical categories that ownership deems most important. Managers make in-game decisions based not on instinct but on probability: how a particular batter might fare against left-handed pitching, for instance.

The goal of this unique book is to show fans all the baseball-related stuff that they can do for free (or close to free). Just as open source projects have made great software freely available, collaborative projects such as Retrosheet and Baseball DataBank have made great data freely available. You can use these data sources to research your favorite players, win your fantasy league, or appreciate the game of baseball even more than you do now.

Baseball Hacks shows how easy it is to get data, process it, and use it to truly understand baseball. The book lists a number of sources for current and historical baseball data, and explains how to load it into a database for analysis. It then introduces several powerful statistical tools for understanding data and forecasting results.

For the uninitiated baseball fan, author Joseph Adler walks readers through the core statistical categories for hitters (batting average, on-base percentage, etc.), pitchers (earned run average, strikeout-to-walk ratio, etc.), and fielders (putouts, errors, etc.). He then builds on these numbers to examine more advanced measures such as career averages, team stats, season-by-season comparisons, and more. Whether you're a mathematician, scientist, or season-ticket holder to your favorite team, Baseball Hacks is sure to have something for you.
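To make those core categories concrete, here is a minimal Python sketch using the standard textbook definitions of batting average, on-base percentage, ERA, strikeout-to-walk ratio, and fielding percentage. The function names and the sample season numbers are hypothetical illustrations, and the book's own formulas may differ in detail. Innings pitched are passed as outs recorded, which avoids the box-score convention where "180.1" means 180 1/3 innings.

```python
# Sketch of standard definitions for a few core statistics.
# Function names and sample numbers are hypothetical, for illustration only.

def batting_average(h: int, ab: int) -> float:
    """AVG = H / AB."""
    return h / ab

def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def earned_run_average(er: int, outs: int) -> float:
    """ERA = 9 * ER / IP, where IP = outs / 3, so ERA = 27 * ER / outs."""
    return 27 * er / outs

def strikeout_to_walk(so: int, bb: int) -> float:
    """K/BB = SO / BB."""
    return so / bb

def fielding_percentage(po: int, a: int, e: int) -> float:
    """FPCT = (PO + A) / (PO + A + E)."""
    return (po + a) / (po + a + e)

if __name__ == "__main__":
    # Hypothetical season lines, not real player data.
    print(f"AVG  {batting_average(150, 500):.3f}")            # 0.300
    print(f"OBP  {on_base_percentage(150, 60, 5, 500, 5):.3f}")  # 0.377
    print(f"ERA  {earned_run_average(60, 540):.2f}")          # 3.00 (180 IP)
    print(f"K/BB {strikeout_to_walk(200, 50):.2f}")           # 4.00
    print(f"FPCT {fielding_percentage(300, 400, 10):.3f}")    # 0.986
```

Each function is a plain ratio of counting stats. Loading real counting stats from a database like the ones the book describes is the part that takes real work.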

Advance praise for Baseball Hacks:

"Baseball Hacks is the best book ever written for understanding and practicing baseball analytics. A must-read for baseball professionals and enthusiasts alike."

-- Ari Kaplan, database consultant to the Montreal Expos, San Diego Padres, and Baltimore Orioles

"The game was born in the 19th century, but the passion for its analysis continues to grow into the 21st. In Baseball Hacks, Joe Adler not only demonstrates that the latest data-mining technologies have useful application to the study of baseball statistics, he also teaches the reader how to do the analysis himself, arming the dedicated baseball fan with tools to take his understanding of the game to a higher level."

-- Mark E. Johnson, Ph.D., Founder, SportMetrika, Inc. and Baseball Analyst for the 2004 St. Louis Cardinals

Table of Contents

  1. Baseball Hacks
    1. Credits
      1. About the Author
      2. Contributors
      3. Acknowledgments
    2. Preface
      1. Why Baseball Hacks?
      2. How to Use This Book
      3. How This Book Is Organized
      4. Conventions Used in This Book
      5. Using Code Examples
      6. How to Contact Us
      7. Got a Hack?
      8. Safari® Enabled
    3. 1. Basics of Baseball
      1. Hacks 1–7: Introduction
        1. Baseball 101
      2. Score a Baseball Game
        1. Traditional Scoring
          1. Record starting players’ names.
          2. Record plays during the game.
          3. Record substitutions.
          4. Record other information.
        2. Hacking the Hack
        3. See Also
      3. Make a Box Score from a Score Sheet
        1. The Official Rules for Scoring
        2. Calculating a Box Score from a Score Sheet
          1. Step 1: Draw columns for names, at bats, runs, hits, and anything else.
          2. Step 2: Copy the batters’ names.
          3. Step 3: Count statistics for each batter.
          4. Step 4: Count statistics for each pitcher.
          5. Step 5: Prove the box score.
        3. Hacking the Hack
      4. Keep Score, Project Scoresheet–Style
        1. The Contents of a Play Code
          1. Play code structure.
          2. Fielding.
          3. Type of play.
          4. Description.
          5. Base running.
          6. Example play codes.
          7. Pitch codes.
      5. Follow Pitches During a Game
        1. Following the Pitching Strategy
          1. Set up a pitch outside.
          2. Follow breaking balls with a fastball.
          3. Follow fastballs with a breaking ball.
          4. Always throw the same impossible-to-hit pitch.
          5. Move the player off the plate.
        2. Identifying Pitches
          1. Step 1: Watch the umpire for location.
          2. Step 2: Watch the catcher for location.
          3. Step 3: Look at pitch speeds to determine pitch type.
          4. Step 4: Watch what the ball does at the plate.
          5. Step 5: Watch the pitcher react to the catcher’s signals.
          6. Step 6: Watch the catcher’s signals.
      6. Follow the Game Online
        1. Player Statistics
        2. Independent Commentary (Including Blogs)
        3. Hacking the Hack
      7. Add Baseball Searches to Firefox
        1. Adding Search Engines to Firefox
        2. Running the Hack
        3. Hacking the Hack
      8. Find Images of Stadiums
        1. Better Pictures and Distances with Google Earth
    4. 2. Baseball Games from Past Years
      1. Hacks 8–23: Introduction
      2. Get and Install MySQL
        1. Installation on Windows
          1. Step 0: Buy and install a software or hardware firewall.
          2. Step 1: Download and unpack the installer.
          3. Step 2: Run the Installation wizard.
          4. Step 3: Run the Configuration wizard.
        2. Testing the Installation
        3. Hacking the Hack
        4. See Also
      3. Get an Access Database of Player and Team Statistics
        1. A Player and Team Statistics Database for Microsoft Access
          1. Step 1: Download the file.
          2. Step 2: Decompress and save the file.
          3. Step 3: Open the database file.
          4. Step 4: Test the database.
        2. The Contents of the Database
      4. Get a MySQL Database of Player and Team Statistics
        1. Step 1: Download the File
        2. Step 2: Decompress the File
        3. Step 3: Create the Database
        4. Step 4: Import the Database
        5. Step 5: Check That Everything Is There
        6. The Contents of the Database
        7. Hacking the Hack
          1. Annual updates.
          2. Getting baseball statistics as text files.
      5. Make Your Own Stats Book
        1. Write the Queries
          1. Step 1: Create “batters who played in 2004” query.
          2. Step 2: Create “fielding by games” query.
          3. Step 3: Create “fielding by most frequent position” query.
          4. Step 4: Create “team names” query.
          5. Step 5: Create “batting plus” query.
        2. Build the Report
        3. Hacking the Hack
      6. Get Perl
        1. Getting and Installing Perl
        2. Install the Perl Modules Required in This Book
        3. Hacking the Hack
          1. Step 1: Download the Cygwin installer.
          2. Step 2: Run the Cygwin installer.
          3. Step 3: Configure Cygwin.
      7. Learn Perl
        1. The Basics
          1. Statements.
          2. Variables.
          3. Datatypes.
          4. Control structures.
          5. Comments.
        2. An Example Program
        3. Some Not-so-Basic Basics
          1. Pattern matching through regular expressions.
          2. Subroutines.
          3. Modules and packages.
        4. Editors
        5. Hacking the Hack
      8. Get Historical Play-by-Play Data
        1. Retrosheet Event Files
        2. The Code
        3. Running the Hack
        4. See Also
      9. Make Box Scores or Database Tables from Play-by-Play Data with Retrosheet Tools
        1. Running the Tools
          1. Preprocessing event files with BEVENT.
        2. Chadwick
        3. See Also
      10. Use SQL to Explore Game Data
        1. Talking to Your Database
        2. Tables
        3. Queries
          1. Joins.
          2. Aggregates.
          3. Subqueries.
          4. Saving results in tables.
          5. Deleting tables.
        4. Running Scripts
        5. Getting More Information and Help
      11. Use Microsoft Access to Run SQL Queries
        1. SQL Queries in Access
        2. Changing SQL Queries to Graphical Queries
        3. Subqueries in Access
      12. Get a GUI for MySQL
        1. Other Tools
          1. MySQL Administrator.
          2. Tora and Toad.
          3. Aqua Data Studio.
      13. Move Data from a Database to Excel
        1. Select the Right Data for a Spreadsheet
        2. Running the Hack
          1. Moving data from Access to Excel.
          2. Moving data from MySQL Query Browser to Excel.
          3. Moving data from MySQL to Excel.
        3. Hacking the Hack
      14. Load Baseball Data into MySQL
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
        4. See Also
      15. Load Retrosheet Game Logs
        1. The Code
      16. Make a Historical Play-by-Play Database
        1. The Code
          1. Fetching the data.
          2. Transforming the data.
          3. Creating a database import statement.
          4. Creating a play-by-play database and tables.
      17. Use Regular Expressions to Identify Events
        1. Hacking the Hack
        2. See Also
    5. 3. Stats from the Current Season
      1. Hacks 24–29: Introduction
      2. Use Microsoft Excel Web Queries to Get Stats
        1. Web Queries
        2. Web Query Example: Up-to-Date Park Factors
          1. Step 1: Find the data.
          2. Step 2: Running the queries.
          3. Step 3: Name stuff.
          4. Step 4: Create a results table.
        3. Hacking the Hack
      3. Spider Baseball Sites for Data
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
        4. See Also
      4. Discover How Live Score Applications Work
        1. Use Your Router’s Content Filtering Feature
        2. Use a Proxy Server
        3. Packet Filters
      5. Keep Your Stats Database Up-to-Date
        1. The Code
          1. Create the box score database.
          2. The update script.
          3. The bootstrapping script.
          4. The helping code.
        2. Running the Hack
        3. Hacking the Hack
      6. Get Recent Play-by-Play Data
        1. The Code
          1. The spider script.
          2. The parser script.
        2. Running the Hack
        3. Hacking the Hack
        4. See Also
      7. Find Data on Hit Locations
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
    6. 4. Visualize Baseball Statistics
      1. Hacks 30–39: Introduction
      2. Plot Histograms in Excel
        1. The Code
        2. Hacking the Hack
      3. Get R and R Packages
      4. Analyze Baseball with R
        1. Calculations in R
        2. Assignment in R
        3. Arrays
        4. Data Frames
        5. Comments
        6. Functions
        7. Graphics in R
        8. Hacking the Hack
        9. See Also
      5. Access Databases Directly from Excel or R
        1. Use ODBC in R
        2. Use ODBC in Excel
        3. Hacking the Hack
      6. Load Text Files into R
        1. The Code
        2. See Also
      7. Compare Teams and Players with Lattices
        1. The Code
        2. Running the Hack
      8. Compare Teams Using Chernoff Faces
        1. The Code
        2. Run the Hack
        3. Hacking the Hack
      9. Plot Spray Charts
          1. Step 1: Load the file into a data frame.
          2. Step 2: Set up the axes and the diamond.
          3. Step 3: Plot matchups.
        1. Batter Spray Diagrams
        2. Hexagonal Binning
          1. Step 1: Get the hexbin package.
          2. Step 2: Load the hexbin package.
          3. Step 3: Plot the graph.
        3. Hacking the Hack
      10. Chart Team Stats in Real Time
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
          1. Create a batch file to plow through several teams.
          2. Still more automation.
          3. Don’t export individual PNG images.
      11. Slice and Dice Teams with Cubes
        1. Prerequisites
        2. The Code
          1. Step 1: Define local cube contents.
          2. Step 2: Create the local cube.
          3. Step 3: Create a local web application to interact with the cube.
        3. Running the Hack
        4. Hacking the Hack
    7. 5. Formulas
      1. Hacks 40–59: Introduction
        1. How I Chose the Formulas
          1. Summary Statistics for the Formulas
          2. Using Formulas for Fantasy Baseball
          3. Who Came Up with These Things?
      2. Measure Batting with Batting Average
        1. Sample Code
          1. Batting average formula.
        2. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution.
          4. Box plot.
      3. Measure Batting with On-Base Percentage
        1. Sample Code
        2. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      4. Measure Batting with SLG
        1. The Formula
        2. Sample Code
        3. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      5. Measure Batting with OPS
        1. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      6. Measure Power with ISO
        1. Sample Code
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      7. Measure Batting with Runs Created
        1. The Formula
        2. Sample Code
          1. Summary statistics.
          2. Top 10.
          3. Histogram and box plot.
      8. Measure Batting with Linear Weights
        1. Estimating the Weights from an Expected Runs Matrix
        2. Palmer’s Formula
        3. Sample Code
        4. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Histogram and box plot.
        5. Hacking the Hack
          1. Estimate weights with linear regression.
      9. Measure Pitching with ERA
        1. The Code
        2. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      10. Measure Pitching with WHIP
        1. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      11. Measure Pitching with Linear Weights
        1. Sample Code
        2. Running the Hack
          1. Summary statistics.
          2. Top 10.
          3. Distribution and box plot.
      12. Measure Defense with Defensive Efficiency
        1. The Formula
        2. Sample Code
        3. Summary statistics.
          1. Distribution in last 10 years (1994–2003).
          2. Distribution and box plot.
      13. Measure Pitching with DIPS
        1. The Formula
        2. Sample Code
          1. Last year (2003).
          2. Last 10 years (1994–2003).
          3. Last 50 years (1955–2003).
        3. Lucky and Unlucky Players
      14. Measure Base Running Through EqBR
        1. Equivalent Batter Runs
        2. The Code
          1. Summary statistics.
          2. Top 10.
          3. Histogram.
        3. Hacking the Hack
      15. Measure Fielding with Fielding Percentage
        1. The Formula
        2. Sample Code
      16. Measure Fielding with Range Factor
        1. Sample Code
          1. Summary statistics.
          2. Top 10.
          3. Histogram.
          4. Box plot.
        2. Hacking the Hack
      17. Measure Fielding with Linear Weights
        1. The Formula
        2. Calculating Fielding Runs
          1. Step 1: Calculate league totals.
          2. Step 2: Calculate team totals.
          3. Step 3: Calculate expected values for each player.
          4. Step 4: Calculate fielding runs for each player.
        3. Sample Code
        4. Summary statistics.
          1. Descriptive statistics.
          2. Top 10.
          3. Distribution and box plot.
        5. See Also
      18. Measure Park Effects
        1. Approaches
          1. Requirements for a good park factor.
        2. Methodology
        3. Sample Code
        4. Using Park Factors
          1. Park factors for run-based measurements.
          2. Park factors for other statistics.
        5. Hacking the Hack
          1. Enhancements.
          2. Compare individual offensive stats (singles, doubles, triples, home runs, etc.).
          3. Fancy statistical approaches.
        6. See Also
      19. Calculate Fan Save Value
        1. Saves
          1. Saves are subjective.
        2. Fan Save Value
          1. How does a fan save value compare to the standard save value?
        3. Sample Code
        4. Using the Fan Save Value Formula
      20. Calculate Save Value
        1. The Formula
        2. Using the Save Value
        3. Sample Code
        4. Using the Save Value Formula
      21. Calculate Holds and Decent Holds for Relief Pitchers
        1. What Is a Hold?
          1. Analysis of reliever statistics.
          2. Analysis of the hold statistic.
        2. The Code
        3. Decent Hold
    8. 6. Sabermetric Thinking
      1. Hacks 60–71: Introduction
        1. Thinking About Baseball
      2. Calculate Expected Runs
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
          1. What actually happened when teams bunted?
          2. Is bunting ever a good strategy?
        4. See Also
      3. Calculate an Expected Hits Matrix
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
          1. Strikeouts and the count.
          2. Extra base hits and the count.
          3. Balls in play and the count.
      4. Look for Evidence of Platoon Effects
        1. Average Platoon Effects
        2. Switch Hitters
        3. Hacking the Hack
      5. Significant Number of At Bats
        1. Find the Distribution of At Bats
        2. Statistical Significance
          1. OBP, AVG, and accuracy.
          2. Testing the hypothesis.
      6. Find “Clutch” Players
        1. Identify Clutch Players
        2. Measure Player Performance in Clutch Situations
        3. Compare Players
        4. Understanding the Results
        5. The Code
          1. Top players in clutch situations.
          2. Significant clutch performances.
      7. Calculate Expected Number of Wins
        1. The Pythagorean Wins Formula
          1. The code.
        2. The Pythagenport Formula
        3. The Back-of-the-Envelope Method
      8. Measure Hits by Pitch Count
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
      9. OBP, SLG, and Scoring Runs
        1. The Data and the Code
        2. The Results
      10. Measure Skill Versus Luck
        1. The Code
      11. Odds of the Best Team Winning the World Series
      12. Top 10 Bargain Outfielders
        1. The Code
          1. Prerequisites.
          2. Define SQL query for getting raw data.
          3. Import data into R.
        2. Running the Hack
          1. Identify common attributes.
          2. Look at correlations.
          3. Identify possible explanations for correlations.
          4. Assign attribute scores.
          5. Group players based on similarity.
          6. Attach group membership to data set.
          7. Transform salary variable.
          8. Create linear regression model.
          9. Compare predicted versus actual salaries.
          10. Identify most-underpaid players.
          11. Identify most-overpaid players.
        3. Hacking the Hack
          1. Vary the numbers of factors and/or number of clusters.
          2. Look at different positions.
          3. Look at different years.
          4. Include more variables.
      13. Fitting Game Scores to a Strength Model
        1. The Data and the Code
        2. Strength of Schedule
        3. Conclusion
    9. 7. The Bullpen
      1. Hacks 72–75: Introduction
      2. Start or Join a Fantasy League
        1. The Basics
          1. Methods of ranking teams.
          2. Methods of picking teams.
          3. Find or form a fantasy league.
        2. See Also
      3. Draft Your Fantasy Team
        1. The Basics
          1. Pick a closer.
          2. Pick an RBI man.
          3. Focus on AVG, not OBP.
        2. Draft Tips from an Expert
          1. Be patient with your money.
          2. Perform the “mustard toss”.
          3. Find a bargain.
          4. Target your favorites.
          5. Know thine enemy.
      4. Make a Scoreboard Widget
        1. The Code
        2. Running the Hack
        3. Hacking the Hack
        4. See Also
      5. Analyze Other Sports
        1. Possessions
        2. Clock Strategy
        3. See Also
    10. A. Where to Learn More Stuff
      1. Baseball Books
      2. Baseball Web Sites
      3. Statistics and Data Mining Books
      4. Databases and Computer Languages
    11. B. Abbreviations
    12. Index
    13. Colophon