R package maintainers

Another similarly straightforward data source might be the list of R package maintainers. We can download the names and e-mail addresses of the package maintainers from a public page of CRAN, where this data is stored in a nicely structured HTML table that is extremely easy to parse:

> library(XML)
> packages <- readHTMLTable(paste0('http://cran.r-project.org', 
+   '/web/checks/check_summary.html'), which = 2)

Extracting the names from the Maintainer column takes some quick data cleansing and transformation, mainly with regular expressions. Please note that the column name starts with a space, which is why we quoted it:

> maintainers <- sub('(.*) <(.*)>', '\\1', packages$' Maintainer')
> maintainers <- gsub(' ', ' ', ...
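To see what the first regular expression does without downloading anything, here is a minimal sketch on a toy data frame. The `toy_packages` object and the names and addresses in it are invented for illustration; only the pattern and the quoted column name come from the text above.

```r
# Hypothetical stand-in for the downloaded table; all values are invented.
# check.names = FALSE keeps the leading space in the column name.
toy_packages <- data.frame(
  ' Maintainer' = c('Jane Doe <jane.doe@example.org>',
                    'John Q. Public <jqp@example.com>'),
  check.names = FALSE, stringsAsFactors = FALSE)

# The column name starts with a space, so it has to be quoted after `$`.
raw_maintainers <- toy_packages$' Maintainer'

# '\\1' keeps the first capture group (the name) and drops the address.
maintainer_names <- sub('(.*) <(.*)>', '\\1', raw_maintainers)

# '\\2' would keep the second capture group (the address) instead.
maintainer_emails <- sub('(.*) <(.*)>', '\\2', raw_maintainers)

print(maintainer_names)
print(maintainer_emails)
```

Because the pattern matches the whole `Name <address>` string, `sub` replaces the entire value with the chosen capture group, so a single call per element is enough.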
