How to do it

So how do you go about being a good scraper? Several factors are involved, and we will cover them in this chapter:

  • Start by respecting the site's robots.txt file
  • Don't crawl every link you find on a site; crawl only those listed in its sitemap
  • Throttle your requests. As Han Solo said to Chewbacca, "Fly casual"; in other words, crawl casually, so that you don't look like you are repeatedly taking content
  • Identify yourself so that you are known to the site
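
To make the four practices above concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration under stated assumptions, not necessarily the approach the recipes in this chapter take: the names USER_AGENT, build_robot_rules, polite_urls, sitemap_urls, Throttle, and build_request are all hypothetical, and the example.com URLs are placeholders. Nothing in the sketch touches the network; it parses robots.txt and sitemap text that you have already downloaded.

```python
import time
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical identity string: use your own bot name and a contact URL.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

# XML namespace used by the sitemaps.org sitemap format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rules object."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt allows this user agent to fetch."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


def sitemap_urls(sitemap_xml):
    """Return the page URLs listed in the <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self._last_request = None

    def wait(self):
        """Sleep if the last request was too recent; return seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.delay_seconds - elapsed
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_request = time.monotonic()
        return slept


def build_request(url, user_agent=USER_AGENT):
    """Create (but do not send) a request that identifies the scraper."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Sample inputs standing in for files fetched from a real site.
    robots = "User-agent: *\nDisallow: /private/\n"
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/index.html</loc></url>"
        "<url><loc>https://example.com/private/data.html</loc></url>"
        "</urlset>"
    )
    rules = build_robot_rules(robots)
    throttle = Throttle(delay_seconds=0.1)
    for url in polite_urls(sitemap_urls(sitemap), rules):
        throttle.wait()
        request = build_request(url)
        print("would fetch", request.full_url)
```

In this sketch the crawl list comes from the sitemap rather than from links discovered on pages, robots.txt filters that list before any request is made, and the throttle and User-Agent header apply to every request that remains. The delay value is arbitrary; choose one appropriate to the site you are scraping.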

 
