Baseball Hacks by Joseph Adler

Compare Teams Using Chernoff Faces

Use your innate ability to recognize facial features to compare teams.

People are naturally very good at recognizing visual patterns, particularly similarities and differences in human faces. In 1973, statistician Herman Chernoff developed a novel technique for comparing data points: plot data points as human faces, where each facial characteristic (mouth size, mouth expression, face shape, eye shape, etc.) represents a different variable in the data. This hack shows you how to apply this idea to baseball teams (or to anything else that you want to compare).

I found two sources of free code for plotting Chernoff faces. The first is from Dr. Hans Peter Wolf. You can find the code on his home page at http://www.wiwi.uni-bielefeld.de/~wolf. The second source is from Shigenobu AOKI, available at http://aoki2.si.gunma-u.ac.jp/R/face.html. In this book, I use Dr. Wolf’s code. (It doesn’t implement the original algorithm exactly, but it’s a lot easier to use.) Just copy all the code from http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/faces/faces.R, paste it into your R window, and hit Return. Or, easier yet, you can just use R’s source() command to load the code in one step. (I show this in the next section.)

Dr. Wolf’s faces() code requires a matrix of values to run. The data in the first column controls the height of the face, the data in the second controls the width, and so on. (See his site for the complete details.) Here are mappings you can use to find teams that are similar offensively:

Table 4-4.

Column  Facial characteristic  Variable
1       Height of face         HR
2       Width of face          H
3       Shape of face          HA
4       Height of mouth        HRA
5       Width of mouth         SOA
6       Curve of smile         BB
7       Height of eyes         BBA

As a small wrinkle, the faces() function uses the row names to label the diagram. Because it’s a lot nicer to know which face corresponds to which team than it is to associate row numbers such as “2416,” there’s an extra step to grab the team names. You’ll see this in the code.
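If the matrix layout is hard to picture, here is a minimal sketch that builds one by hand. It is an illustration, not part of Dr. Wolf’s code: the three rows are copied from the al2003 output in the “Run the Hack” section, and the names stats, m, and m_swapped are placeholders I chose. The last line plots only if faces() has already been loaded into your session.

```r
# Three rows copied from the al2003 output shown in "Run the Hack"
stats <- data.frame(
  teamID = c("ANA", "BAL", "BOS"),
  HR  = c(150, 152, 238),
  H   = c(1473, 1516, 1667),
  HA  = c(1444, 1579, 1503),
  HRA = c(190, 198, 153),
  SOA = c(980, 981, 1141),
  BB  = c(476, 431, 620),
  BBA = c(486, 526, 488),
  stringsAsFactors = FALSE
)

# Column order decides which facial feature each statistic drives:
# column 1 is the height of the face, column 2 the width, and so on
m <- as.matrix(stats[, c("HR", "H", "HA", "HRA", "SOA", "BB", "BBA")])

# Row names become the labels on the faces
rownames(m) <- as.character(stats$teamID)

# To try a different mapping, reorder the columns; here BB would
# drive the height of the face and HR the curve of the smile
m_swapped <- m[, c("BB", "H", "HA", "HRA", "SOA", "HR", "BBA")]

# Plot only if faces() has already been loaded
if (exists("faces", mode = "function")) faces(m)
```

Because the features are assigned by column position, reordering the columns is the simplest way to experiment with which statistic shows up where.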

The Code

We’ll use an ODBC connection [Hack #33] to the Baseball DataBank database [Hack #10]. Because the SQL statement to select the data spans several lines, the code uses R’s paste() function to concatenate the multiple lines into a single string. Type the following code into R:

	# Load the required libraries
	library(RODBC)

	# Load the faces code
	source("http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/faces/faces.R")

	# Fetch the team IDs and the statistics in a single query, so that
	# each row of numbers stays paired with the right team
	channel <- odbcConnect('bballdata')
	al2003 <- sqlQuery(channel, paste(
	  "SELECT teamID, HR, H, HA, HRA, SOA, BB, BBA",
	  "FROM teams WHERE",
	  "lgID = 'AL' AND",
	  "yearID = 2003",
	  "ORDER BY teamID"))
	odbcClose(channel)

	# Save the team IDs as the row names, then drop that column
	row.names(al2003) <- as.character(al2003$teamID)
	al2003$teamID <- NULL

	# Run the faces program
	faces(as.matrix(al2003))

Run the Hack

Here’s the al2003 data set that I got when I ran the preceding code:

	> al2003
	     HR    H   HA HRA  SOA  BB BBA
	ANA 150 1473 1444 190  980 476 486
	BAL 152 1516 1579 198  981 431 526
	BOS 238 1667 1503 153 1141 620 488
	CHA 220 1445 1364 162 1056 519 518
	CLE 158 1413 1477 179  943 466 501
	DET 153 1312 1616 195  764 443 557
	KCA 162 1526 1569 190  865 476 566
	MIN 155 1567 1526 187  997 512 402
	NYA 230 1518 1512 145 1119 684 375
	OAK 176 1398 1336 140 1018 556 499
	SEA 139 1509 1340 173 1001 586 466
	TBA 137 1501 1454 196  877 420 639
	TEX 239 1506 1625 208 1009 488 603
	TOR 190 1580 1560 184  984 546 485

Figure 4-14 shows the output of the faces() plot for this data. As you can see, Boston hit many more home runs than Baltimore did (238 versus 152), so its face is much taller; Texas allowed a lot more home runs than Oakland did (208 versus 140), so its mouth is taller.

What I find most remarkable about this diagram is that it’s not total nonsense. For example, the Yankees and the Red Sox are fairly similar offensive teams in many ways, and their “faces” bear out this resemblance.
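One way to cross-check what your eye picks out is to compute distances between the teams directly. The sketch below is not part of the original hack. It re-enters the al2003 table printed above, standardizes each column so that no single statistic dominates, and lists each team’s nearest neighbor. Euclidean distance on standardized columns is my choice of measure here, not something faces() computes.

```r
# Re-enter the al2003 table printed in "Run the Hack". The header has
# one fewer field than the data rows, so read.table() uses the first
# column (the team IDs) as row names.
al2003 <- read.table(header = TRUE, text = "HR H HA HRA SOA BB BBA
ANA 150 1473 1444 190  980 476 486
BAL 152 1516 1579 198  981 431 526
BOS 238 1667 1503 153 1141 620 488
CHA 220 1445 1364 162 1056 519 518
CLE 158 1413 1477 179  943 466 501
DET 153 1312 1616 195  764 443 557
KCA 162 1526 1569 190  865 476 566
MIN 155 1567 1526 187  997 512 402
NYA 230 1518 1512 145 1119 684 375
OAK 176 1398 1336 140 1018 556 499
SEA 139 1509 1340 173 1001 586 466
TBA 137 1501 1454 196  877 420 639
TEX 239 1506 1625 208 1009 488 603
TOR 190 1580 1560 184  984 546 485")

# Standardize each column to mean 0 and standard deviation 1
z <- scale(as.matrix(al2003))

# Euclidean distances between every pair of teams
d <- as.matrix(dist(z))

# Ignore each team's zero distance to itself
diag(d) <- Inf

# For each team, the ID of the closest other team
nearest <- rownames(d)[apply(d, 1, which.min)]
names(nearest) <- rownames(d)
nearest
```

If the pairs that look alike in the plot also turn up as nearest neighbors here, that is some evidence the faces are reflecting the data rather than your imagination.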

Hacking the Hack

Here are some ideas for different things to compare with faces():

Compare groups of players

You can easily run this code on other groups of players to try to find similarities. I suggest trying groups of batters and pitchers.

Find players similar in some characteristics

If, for example, you want to look at pitcher injuries and compare similar players, you can use Chernoff faces to find pitchers who are most similar to one another.
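Here is a rough sketch of the first idea, under several assumptions that are mine rather than the book’s: that your copy of the Baseball DataBank database has a batting table with playerID, yearID, AB, HR, H, BB, and SO columns, and that the same bballdata ODBC source is available. The helper name player_face_matrix is hypothetical, and the 500 at-bat cutoff is an arbitrary way to keep the number of faces manageable.

```r
# Hypothetical helper: turn a data frame of per-player totals into
# the labeled matrix that faces() expects
player_face_matrix <- function(players, id_col = "playerID") {
  stat_cols <- setdiff(names(players), id_col)
  m <- as.matrix(players[, stat_cols, drop = FALSE])
  rownames(m) <- as.character(players[[id_col]])
  m
}

# Assumed query; table and column names may differ in your database.
# Totals are summed per player because a player who changed teams
# during the season can have more than one row.
batter_sql <- paste(
  "SELECT playerID, SUM(HR) AS HR, SUM(H) AS H,",
  "SUM(BB) AS BB, SUM(SO) AS SO",
  "FROM batting WHERE yearID = 2003",
  "GROUP BY playerID HAVING SUM(AB) >= 500",
  "ORDER BY playerID")

# Usage, once RODBC and faces() are loaded as in "The Code":
# channel <- odbcConnect('bballdata')
# batters <- sqlQuery(channel, batter_sql)
# odbcClose(channel)
# faces(player_face_matrix(batters))
```

Selecting the player ID in the same query as the statistics keeps each label attached to the right row, and the column order in the SELECT list decides which facial feature each statistic controls.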

Figure 4-14. Chernoff faces
