Shaking the XML Tree

Parsing well-formed and valid XML is much easier than parsing the Sheriff’s HTML. An XML parsing package is available for R; here’s how to install it from CRAN:

    > install.packages("XML")
    > library("XML") 

Warning

If you are behind a firewall or proxy and getting errors:

On Unix: Set your http_proxy environment variable.

On Windows: Try the custom install option in the R setup wizard and choose the internet2 option instead of “standard”.

Our goal is to extract the values contained within the <Latitude> and <Longitude> leaf nodes. These nodes live within the <Result> node, which lives inside a <ResultSet> node, which itself lies inside the root node.
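
The nesting described above can be sketched on a small hand-written document. This is only an illustration, not the actual service response: the XML string and coordinate values below are made-up placeholders, and the sketch assumes <ResultSet> is the top-level element that xmlRoot() returns.

```r
library(XML)

# Hand-written stand-in for a geocoder response; the values are placeholders.
sampleXml <- paste0(
  "<ResultSet>",
  "<Result>",
  "<Latitude>39.95</Latitude>",
  "<Longitude>-75.16</Longitude>",
  "</Result>",
  "</ResultSet>")

# asText=TRUE tells xmlTreeParse to treat the string as XML content,
# not as a file name.
doc <- xmlTreeParse(sampleXml, asText = TRUE)
root <- xmlRoot(doc)          # the <ResultSet> node
result <- root[["Result"]]    # first child named Result

# Each leaf node holds its text as a child; xmlValue() extracts it as a string.
lat <- as.numeric(xmlValue(result[["Latitude"]]))
lon <- as.numeric(xmlValue(result[["Longitude"]]))
```

Walking down one level at a time with [[ ]] mirrors the way the nodes are nested, which makes it easy to check each step at the prompt.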

To find an appropriate function for getting these values, call library(help=XML). This call lists the functions in the XML package.

    > library(help=XML) #hit space to scroll, q to exit
    > ?xmlTreeParse
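
Besides paging through the help index, the package's exported names can be searched from the prompt. The following is a small sketch using base R's ls() and grep(); the search pattern "Parse" is just an example choice.

```r
library(XML)

# ls() on an attached package lists the objects it exports.
xmlFunctions <- ls("package:XML")

# Narrow the list to names containing "Parse".
parseFunctions <- grep("Parse", xmlFunctions, value = TRUE)
print(parseFunctions)
```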

I see that the function xmlTreeParse accepts an XML file or URL and returns an R structure. Paste in this block after inserting your Yahoo App ID.

    > library(XML)
    > appid <- '<put your appid here>'
    > street <- "1 South Broad Street"
    > requestUrl <- paste(
        "http://local.yahooapis.com/MapsService/V1/geocode?appid=",
        appid,
        "&street=",
        URLencode(street),
        "&city=Philadelphia&state=PA",
        sep="")
    > xmlResult <- xmlTreeParse(requestUrl, isURL=TRUE)
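
Because the request URL is assembled from several pieces, it can help to wrap that step in a function and inspect the result before handing it to the parser. A minimal sketch follows; buildGeocodeUrl is a hypothetical helper name, not part of the XML package, and the city and state parameters (along with encoding them) are assumptions added here for illustration.

```r
# Hypothetical helper: assembles the geocode request URL used above.
buildGeocodeUrl <- function(appid, street,
                            city = "Philadelphia", state = "PA") {
  paste0(
    "http://local.yahooapis.com/MapsService/V1/geocode?appid=", appid,
    "&street=", URLencode(street),   # spaces become %20
    "&city=", URLencode(city),
    "&state=", URLencode(state))
}

requestUrl <- buildGeocodeUrl("<put your appid here>", "1 South Broad Street")
print(requestUrl)
```

Printing the URL first makes it easy to spot a missing App ID or an unencoded space before any network request is made.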

Warning

Are you behind a firewall or proxy on Windows, and is this example giving you trouble?

xmlTreeParse has no respect for your proxy settings. Do the following:

> Sys.setenv("http_proxy" ...
