Web crawling frameworks

The following frameworks can be used to build web scrapers:

  • Scrapy: Scrapy is a free and open source web crawling framework written in Python. It was originally designed for web scraping, but it can also be used to extract data through APIs or as a general-purpose web crawler.
  • rvest: rvest is an R package written by Hadley Wickham that makes it simple to collect data from HTML web pages.
  • RSelenium: RSelenium is designed to make it easy to connect to a local or remote Selenium Server. It allows connections from the R environment to the Selenium WebDriver API.
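
To make concrete the kind of work these frameworks take off your hands, here is a minimal sketch of link extraction from an HTML page. It deliberately uses none of the frameworks above: it relies only on Python's standard-library html.parser module, so it runs without installing anything. The sample page, its example.com URLs, and the names LinkExtractor and extract_links are hypothetical and exist only for illustration. Frameworks such as Scrapy or rvest would typically replace the hand-written parser class with a single selector expression.

```python
from html.parser import HTMLParser

# Hypothetical sample page, inlined so the sketch needs no network access.
SAMPLE_HTML = """
<html><body>
  <h1>Frameworks</h1>
  <ul>
    <li><a href="https://example.com/scrapy">Scrapy</a></li>
    <li><a href="https://example.com/rvest">rvest</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (text, href) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(text, href)
```

Even for this small task the parser has to track state across start tags, text, and end tags by hand, which is the bookkeeping that a dedicated scraping framework is meant to hide.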
