
A guest post by Tom Barker, a software engineer, an engineering manager, a professor and an author. Currently he is Director of Software Engineering and Development at Comcast, and an Adjunct Professor at Philadelphia University. He has authored Pro JavaScript Performance: Monitoring and Visualization, Pro Data Visualization using R and JavaScript, and Technical Management: A Primer, and can be reached at @tomjbarker.

Continuing the theme I’ve been writing about, let’s look at crafting data maps in R. But first let’s level set and make sure that we clearly define a data map. A data map is a representation of information over a spatial field – a marriage of statistics with cartography. Data maps are one of the most easily understood and widely used data visualizations, because their data is couched in something that we are all familiar with and use: maps.

One of the earliest and most famous data maps is the Cholera map created by John Snow in 1854. The Cholera map plotted the location of every diagnosed case of cholera in the London outbreak of that year. You can see the Cholera map below.


The shaded areas are recorded deaths from cholera, and the shaded circles on the map are water pumps. You can see from careful inspection that the recorded deaths seem to radiate out from the water pump on Broad Street.

Dr Snow had the Broad Street water pump closed and the outbreak ended. That was beautiful, concise, and logical.

While the Cholera map is considered one of the earliest examples of a data map, there are several notable contemporaries, including several by Charles Minard. Charles Minard was an engineer in nineteenth-century France. He is most widely remembered for his data visualization of Napoleon’s invasion of Russia in 1812.

But he also created several prominent data maps. Two of Minard’s most famous data maps are a data map demonstrating the source region and percentage of total cattle consumed in France, and a data map demonstrating the wine export path and destination from France.

Here is a copy of Charles Minard’s data map of cattle consumption:


And here is a copy of Minard’s data map of wine exports:


Today we see data maps everywhere. They can be informative and artistic expressions, like the wind map project from Fernanda Viegas and Martin Wattenberg (see below). The wind map project demonstrates the path and force of wind currents over the US.


This is a wind map showing wind speeds by region as Hurricane Sandy made landfall.

OK, now that we have some background information let’s get started.

Creating a Data Map in R

If you want to follow along with the exact data set that I’m using, I’ve uploaded my data. This is an excerpt from my access logs that I’ve already parsed and scrubbed, and then run through a geolocation API to associate country, state, and city information with each entry.

Because we are making a map we’ll need to install the maps package. Let’s open up R and from the console type in:
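A minimal sketch of that install step. Installing mapproj alongside maps is my assumption, since the projection and grid steps later in this post rely on it, and the CRAN mirror URL is just one common choice:

```r
# Install the mapping packages from CRAN, skipping any already present.
# "maps" supplies map(); "mapproj" supplies the projections and map.grid().
needed <- c("maps", "mapproj")
isInstalled <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
toInstall <- needed[!isInstalled]

if (length(toInstall) > 0) {
  # Naming a mirror explicitly lets this also run in a non-interactive session.
  install.packages(toInstall, repos = "https://cloud.r-project.org")
}
```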

Now we can begin! To reference the maps package in the R script, we need to load it into memory by calling the library() function.
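In script form this is a single call per package. Loading mapproj here as well is an assumption on my part, made because the projection used later comes from it:

```r
library(maps)     # map(), match.map(), map.scale()
library(mapproj)  # map projections and map.grid(); assumed to be installed
```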

Next let’s create several variables, one to point to our formatted access log data, and another to a list of column names. We create a third variable, logData, to hold the data frame created when we read in the flat file.
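Here is a sketch of those three variables. The real file is not reproduced here, so the snippet writes two made-up rows to a temporary file as a stand-in; the column names and the comma separator are assumptions as well, so adjust them to match your own log.

```r
# Hypothetical stand-in for the parsed access log: two made-up rows written
# to a temporary file so that the example runs without the real data set.
logDataFile <- tempfile(fileext = ".txt")
writeLines(c("192.0.2.10,2013-10-01,200, US,PA,Philadelphia",
             "203.0.113.7,2013-10-01,200, CA,ON,Toronto"),
           logDataFile)

# Assumed column names, one per comma-separated field.
logColumns <- c("IP", "date", "HTTPstatus", "country", "state", "city")

# Read the flat file into a data frame.
logData <- read.table(logDataFile, sep = ",", col.names = logColumns,
                      stringsAsFactors = FALSE)
```

Note that the sample country fields carry a leading space, which is the kind of stray whitespace the trimming step later on deals with.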

If you type logData in the console, you see the data frame formatted like so:

Clearly, you could start to track several different data points here. Let’s first look at mapping out what countries the traffic is coming from.

Begin by pulling the unique country names from logData. Store this in a variable named country:
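A small sketch of this step, using a made-up logData with only a country column so that it runs on its own:

```r
# Hypothetical stand-in for logData; only the country column matters here.
logData <- data.frame(country = c(" US", " US", " CA", " UA", " CA"),
                      stringsAsFactors = FALSE)

# unique() keeps the first occurrence of each value, in order of appearance.
country <- unique(logData$country)
```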

If you type country in the console, the data in country looks like the following:

These are the country codes that you get back from the geolocation lookup. The maps package uses its own set of country names, so we’ll need to convert the iphost country codes to the names that the package expects. You can do this by applying a function to the country list.

Let’s use sapply to apply an anonymous function of our own design to the list of country codes. In the anonymous function we will trim any whitespace and do a direct replacement of country codes:
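One way to sketch that anonymous function is shown below. The three codes are sample values only, and using switch() for the replacement is my choice here; a chain of if/else statements would do the same job.

```r
# Hypothetical sample of raw codes, with the stray whitespace left in.
country <- c(" US", " CA", " UA")

country <- sapply(country, function(countryCode) {
  # Trim leading and trailing whitespace from the code.
  countryCode <- gsub("^\\s+|\\s+$", "", countryCode)
  # Hard-coded replacements; any code not listed is returned unchanged.
  switch(countryCode,
         US = "USA",
         CA = "Canada",
         UA = "USSR",
         countryCode)
}, USE.NAMES = FALSE)
```

Setting USE.NAMES to FALSE keeps sapply from labelling each result with the original, untrimmed code.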

There are a couple of things to notice from the source code above. First, we are hard coding every country code that we have. Second, we have the country code “UA,” which is for Ukraine, and we need to convert that to “USSR.” Apparently that aspect of the maps package hasn’t been updated since the fall of the Soviet Union in 1991.

If you type country into the console again you’ll now see:

Next let’s use the maps package’s matching function to line up our countries with the package’s list of countries. The function creates a numeric vector where each element corresponds to a country in the world map. The elements of intersection – where countries in our country list match with countries in the world map – have values assigned to them, specifically the index number from the original country list. So the element that corresponds to USA has a 1, the element that corresponds to Canada has a 2, and so on. Where there is no intersection, the element has the value NA.
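The description above fits match.map() from the maps package, so the sketch below assumes that is the function meant. The three country names are sample values.

```r
library(maps)

# Hypothetical, already-converted country names.
country <- c("USA", "Canada", "China")

# One element per polygon in the "world" database: the index of the matching
# entry in country, or NA where the polygon belongs to none of them.
countryMatch <- match.map("world", country)
```

If you are on a recent release of maps, my understanding is that the world database has been refreshed and Ukraine appears under its own name, so check which names your installed version expects before relying on “USSR”.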

Let’s next use our countryMatch list to create a color-coded country match. To do this, simply apply a function that checks each element, and if it is not NA assign the color “#C6DBEF” to the element, which is a nice light blue. If the element is NA, set the element to white or “#FFFFFF”. Let’s save the result of this in a new list that we will call colorCountry.
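A sketch of that color assignment, run against a short made-up countryMatch so that it stands alone:

```r
# Hypothetical match result: indices where a country matched, NA elsewhere.
countryMatch <- c(1L, NA, 2L, NA)

colorCountry <- sapply(countryMatch, function(m) {
  if (!is.na(m)) "#C6DBEF" else "#FFFFFF"  # light blue for a match, else white
})

# A vectorized alternative that produces the same result.
colorCountryAlt <- ifelse(is.na(countryMatch), "#FFFFFF", "#C6DBEF")
```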

Now let’s create our first visualization with the map() function! The map function accepts several parameters.

The first is the name of the database to use. The database name can be either world, usa, state or county, and each contains data points that correlate to geographic areas that the map function will draw.

If you only want to draw a subset of the larger geographic database, you can specify an optional parameter named regions that lists the areas to draw.
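As a quick illustration of drawing a subset, here is a sketch that pulls two states out of the state database. The choice of states is arbitrary.

```r
library(maps)

# Draw only the named regions from the "state" database.
map("state", regions = c("pennsylvania", "new jersey"))

# The same selection without plotting, returning just the matched names.
stateNames <- map("state", regions = c("pennsylvania", "new jersey"),
                  plot = FALSE, namesonly = TRUE)
```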

You can also specify the map projection to use. If you don’t know what a map projection is, it is basically a way to represent a three-dimensional curved space on a flat surface. There are a number of predefined projections, and the mapproj package in R supports many of them. For the world map that we will be making, we will use an equal area projection, the identifier of which is “azequalarea”.

You also can specify the center point of the map, in latitude and longitude, using the orientation parameter.

Finally, let’s pass the colorCountry list that was just made to the col parameter:
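Putting those parameters together might look like the sketch below. The country names are sample values, and the orientation of c(41, -74, 0), which centers the map near New York with no rotation, is an assumption made for illustration.

```r
library(maps)
library(mapproj)  # supplies the "azequalarea" projection

# Hypothetical inputs, built the same way as in the earlier steps.
country <- c("USA", "Canada", "China")
countryMatch <- match.map("world", country)
colorCountry <- ifelse(is.na(countryMatch), "#FFFFFF", "#C6DBEF")

# fill = TRUE is what lets col assign one color per polygon.
map("world", projection = "azequalarea", orientation = c(41, -74, 0),
    boundary = TRUE, col = colorCountry, fill = TRUE)
```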

This produces the map below:


From this map you can see that the countries from the unique list are shaded blue and the rest of the countries are colored white. This is good, but let’s make it better.

Start by adding latitude and longitude lines. These will accentuate the curvature of the globe and give context to where the poles are. To create latitude and longitude lines, you must first create a new map object. Let’s set plot to FALSE so that the map is not drawn to the screen. Save this map object to a variable named m:
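A sketch of creating that map object without drawing it:

```r
library(maps)

# plot = FALSE returns the map data instead of drawing it.
m <- map("world", plot = FALSE)

# The object carries the coordinates plus the longitude/latitude extent,
# which the grid step can use as its limits.
print(m$range)
```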

Next call map.grid and pass in the stored map object:
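Here is a sketch of the grid call. It draws a plain projected base map first so that it runs by itself, and the styling arguments (dashed blue lines, no labels) are assumptions rather than required values.

```r
library(maps)
library(mapproj)  # map.grid() lives in mapproj

# A projected base map for the grid to be drawn on.
map("world", projection = "azequalarea", orientation = c(41, -74, 0))

# The stored map object supplies the limits for the grid.
m <- map("world", plot = FALSE)
map.grid(m, col = "blue", labels = FALSE, lty = 2, pretty = TRUE)
```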

While we’re at it, let’s add a scale to the chart:
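A sketch of adding the scale, again with a plain base map drawn first so that the snippet stands alone:

```r
library(maps)
library(mapproj)

# A projected base map to attach the scale to.
map("world", projection = "azequalarea", orientation = c(41, -74, 0))

# With no arguments, map.scale() chooses a default position on the plot.
map.scale()
```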

The completed R code should now look like so:
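What follows is a sketch of the whole script assembled from the steps in this post. As before, the input file is a made-up two-row stand-in, and the column names, the orientation, and the grid styling are assumptions; swap in your own log file and values.

```r
library(maps)
library(mapproj)

# Stand-in input: two made-up rows in a temporary file. Point logDataFile at
# your own parsed access log instead.
logDataFile <- tempfile(fileext = ".txt")
writeLines(c("192.0.2.10,2013-10-01,200, US,PA,Philadelphia",
             "203.0.113.7,2013-10-01,200, CA,ON,Toronto"),
           logDataFile)
logColumns <- c("IP", "date", "HTTPstatus", "country", "state", "city")
logData <- read.table(logDataFile, sep = ",", col.names = logColumns,
                      stringsAsFactors = FALSE)

# Unique country codes, trimmed and translated to map region names.
country <- unique(logData$country)
country <- sapply(country, function(countryCode) {
  countryCode <- gsub("^\\s+|\\s+$", "", countryCode)
  switch(countryCode, US = "USA", CA = "Canada", UA = "USSR", countryCode)
}, USE.NAMES = FALSE)

# For every polygon in the world map, the index of the matching country.
countryMatch <- match.map("world", country)

# Light blue where we have traffic, white everywhere else.
colorCountry <- sapply(countryMatch, function(m) {
  if (!is.na(m)) "#C6DBEF" else "#FFFFFF"
})

# Draw the projected, color-coded world map.
map("world", projection = "azequalarea", orientation = c(41, -74, 0),
    boundary = TRUE, col = colorCountry, fill = TRUE)

# Add latitude and longitude lines, then a scale.
m <- map("world", plot = FALSE)
map.grid(m, col = "blue", labels = FALSE, lty = 2, pretty = TRUE)
map.scale()
```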

And this outputs the world map that you can see here:


For more details about R, see the resources below from Safari Books Online.


R for Everyone: Advanced Analytics and Graphics shows how by using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. You’ll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques.
The Art of R Programming is both broad in its coverage of various language constructs and data structures, and deep and comprehensive in explaining them. It provides working examples, and illuminates the R philosophy: a clean functional language with strong vector operation support, and a “do more with less typing” foundation that can make programs an order of magnitude smaller and more expressive.
Pro Data Visualization using R and JavaScript, by Tom Barker, makes the R language approachable, and promotes the idea of data gathering and analysis. You’ll see how to use R to interrogate and analyze your data, and then use the D3 JavaScript library to format and display that data in an elegant, informative, and interactive way. You will learn how to gather data effectively, and also how to understand the philosophy and implementation of each type of chart, so as to be able to represent the results visually.
Pro JavaScript Performance: Monitoring and Visualization, by Tom Barker, gives you the tools to observe and track the performance of your web applications over time from multiple perspectives, so that you are always aware of, and can fix, all aspects of your performance.
Learning R will help you learn how to perform data analysis with the R language and software environment, even if you have little or no programming experience. With the tutorials in this hands-on guide, you’ll learn how to use the essential R tools you need to know to analyze data, including data types and programming concepts.
