Part Three: A Bag of Case Studies
Overview of all case studies
Case study | Description | Scraping and information extraction via... | Main packages | Important functions |
Collaboration Networks in the U.S. Senate | Scraping of bill cosponsorship data from the US Senate at thomas.loc.gov, assessment of collaboration network structure | URL manipulation, regular expressions | RCurl, stringr, igraph | getURL(), str_extract(), graph.edgelist(), get.adjacency() |
Parsing Information from Semi-Structured Documents | Scraping of climate data from Californian weather stations (ftp.wcc.nrcs.usda.gov), construction of a regex-based parser | FTP download, regular expressions and string manipulation tools | RCurl, stringr | getURL(), str_extract(), str_replace() |
Predicting the 2014 Academy Awards using Twitter | Collection of tweets from the Twitter Streaming API (dev.twitter.com/docs/api/streaming), frequency-based prediction of Oscar winners | Persistent connection to the Streaming API via streamR, regular expressions | streamR, twitteR, lubridate, stringr, plyr | filterStream(), parseTweets(), str_detect(), agrep() |
Mapping the Geographic Distribution of Names | Scraping of phone book data from dastelefonbuch.de, extraction of zip codes and matching with geo-coordinates, creation of family name maps | HTML forms, XPath and regular expressions, R geographic functionality | RCurl, stringr, XML, maptools, maps, rgdal | getForm(), htmlParse(), xpathSApply(), str_extract(), function() |
Gathering Data on Mobile Phones | Scraping of mobile ... |