Implementing the scraper

The scraper is a program that copies content from other websites. First, let's list what it needs to accomplish:

  • Downloading a web page
  • Parsing HTML
  • Cherry-picking attributes from the HTML
  • Saving the results

For a modern way to fetch content from the web, we will skip the standard urllib library and go straight to requests, a third-party HTTP library from the Python community with a simpler API.

For parsing and drilling into web pages, we'll use BeautifulSoup, the de facto standard library for this in the Python world.

Let's fetch these via pip:

$ pip install requests beautifulsoup
Requirement already satisfied (use --upgrade to upgrade): requests in /Library/Python/2.7/site-packages/requests-2.2.1-py2.7.egg
Downloading/unpacking ...
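
To make the four goals listed earlier concrete, here is a minimal sketch of how they might map onto the two libraries. It is not the implementation this chapter builds. It assumes Python 3 and the newer beautifulsoup4 package (imported as bs4) rather than the Python 2.7 setup shown in the pip output. The function names download, extract_links, and save, the choice of links as the attribute to pick, and JSON as the output format are all illustrative assumptions.

```python
import json

import requests
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package


def download(url, timeout=10):
    """Goal 1: download a web page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx rather than parsing an error page.
    response.raise_for_status()
    return response.text


def extract_links(html):
    """Goals 2 and 3: parse the HTML, then cherry-pick link text and href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "href": a["href"]}
        # href=True skips <a> tags that have no href attribute.
        for a in soup.find_all("a", href=True)
    ]


def save(records, path):
    """Goal 4: save the results as a JSON file (a hypothetical format choice)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```

Chaining the calls, for example save(extract_links(download(url)), "links.json") with a URL of our choosing, runs all four steps in order. Keeping the download separate from the parsing also means extract_links can be tried on a plain HTML string without touching the network.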
