Part I. Building Scrapers

This first part of the book focuses on the basic mechanics of web scraping: how to use Python to request information from a web server, how to perform basic handling of the server’s response, and how to begin interacting with a website in an automated fashion. By the end, you’ll be cruising around the Internet with ease, building scrapers that can hop from one domain to another, gather information, and store that information for later use.

To be honest, web scraping is a fantastic field to get into if you want a huge payout for relatively little upfront investment. In all likelihood, 90% of web scraping projects you’ll encounter will draw on techniques used in just the next six chapters. This section covers what the general (albeit technically savvy) public tends to think of when they think of “web scrapers”:

  • Retrieving HTML data from a domain name
  • Parsing that data for target information
  • Storing the target information
  • Optionally, moving to another page to repeat the process

This will give you a solid foundation before moving on to more complex projects in part II. Don’t be fooled into thinking that this first section isn’t as important as some of the more advanced projects in the second half. You will use nearly all the information in the first half of this book on a daily basis while writing web scrapers!

Get Web Scraping with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.