Implementing the scraper

The scraper is a system that copies content from other websites using web scraping. First, let's state a few of the things we want it to accomplish:

Downloading a web page
Parsing HTML
Cherry-picking attributes from the HTML
Saving the results

For a modern way to fetch content from the web, we will avoid the standard urllib library and go directly with the nicer requests library from the Python community.

For parsing and drilling into web pages, we'll use BeautifulSoup, which is close to the de facto standard for this in the Python world.

Let's fetch these via pip:

$ pip install requests beautifulsoup
Requirement already satisfied (use --upgrade to upgrade): requests in /Library/Python/2.7/site-packages/requests-2.2.1-py2.7.egg
Downloading/unpacking ...
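To make the four goals listed earlier concrete, here is a minimal sketch of how they might fit together. It is an illustration under stated assumptions, not the book's implementation: it assumes Python 3 and the newer beautifulsoup4 package (imported as bs4) rather than the older beautifulsoup package named in the pip command, and the function names fetch_page, pick_links, and save_results are hypothetical. Cherry-picking link text and href values, and saving to JSON, are likewise just example choices.

```python
import json


def fetch_page(url, timeout=10):
    """Step 1: download a web page and return its HTML as text."""
    # Third-party dependency, imported here so the other steps
    # stay usable even when requests is not installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def pick_links(html):
    """Steps 2 and 3: parse the HTML and cherry-pick each link's text and href."""
    # Assumption: the beautifulsoup4 package, which provides the bs4 module.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "href": a["href"]}
        for a in soup.find_all("a", href=True)  # skip anchors without an href
    ]


def save_results(records, path):
    """Step 4: save the extracted records to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```

A call such as save_results(pick_links(fetch_page(url)), "links.json") would chain the steps for a single page, where url is whatever page you want to scrape. Keeping each step in its own function makes it easier to swap any one of them out later.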


Mastering RabbitMQ by Emrah Ayanoglu, Yusuf Aytaş, Dotan Nahum
