Geocoding

When we first looked at the data, we thought it would be really important to geocode all 436,106 unique addresses. That is, we wanted to associate a latitude and longitude with each address so that it would be easy to explore fine-grained spatial effects. This is an interesting challenge: how can you geocode nearly half a million addresses?

We started by looking at the well-known web services provided by Google and Yahoo!. These were unsuitable for two reasons: they impose strict daily limits on the number of requests, and they place cumbersome restrictions on the use of the resulting data. The request limit alone meant that it would take well over a month to geocode all the addresses, and the licensing would then have affected publication of the results! After further investigation we found a very useful open service, the USC WebGIS, provided by the GIS research laboratory at the University of Southern California (Goldberg and Wilson 2008). This service is free for noncommercial use and places no restrictions on the use of the resulting data. There was no daily usage cap when we began using the service, but its speed imposes an implicit one: we could geocode only about 80,000 addresses per day, so it took us around five days to do them all. The disadvantage of this free service is that the quality of the geocoding is not quite as good (it uses only publicly available address data), but the creators were very helpful and have published an excellent free introduction ...
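The throughput arithmetic in the paragraph above can be checked with a minimal Python sketch. The address count and the rate of roughly 80,000 per day come from the text. The function name `days_to_geocode` and the 10,000-per-day cap are hypothetical, chosen only to illustrate how a strict daily limit stretches the job past a month; the text does not state the commercial services' actual limits.

```python
import math


def days_to_geocode(n_addresses: int, per_day: int) -> int:
    """Return the whole number of days needed at a fixed daily throughput."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return math.ceil(n_addresses / per_day)


N_ADDRESSES = 436_106      # unique addresses, from the text
USC_RATE = 80_000          # approximate addresses per day, from the text
HYPOTHETICAL_CAP = 10_000  # assumed daily request cap, for illustration only

# Exact ratio: about 5.45 days of continuous geocoding.
print(round(N_ADDRESSES / USC_RATE, 2))

# Rounded up to whole days.
print(days_to_geocode(N_ADDRESSES, USC_RATE))

# Under the assumed cap the same job takes well over a month.
print(days_to_geocode(N_ADDRESSES, HYPOTHETICAL_CAP))
```

At about 80,000 addresses per day the full set needs roughly five and a half days of processing, which agrees with the "around five days" reported above.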
