Preventing bans by scraping via proxies

Sometimes a site that you are scraping will block you because it has identified you as a scraper. Often this happens because the webmaster sees all of the scrape requests coming from a single IP address, at which point they simply block access from that IP.

To help prevent this problem, you can use proxy randomization middleware within Scrapy, so that requests are routed through different proxies rather than all originating from one address. One library that implements this feature is scrapy-proxies.
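To make the idea concrete, here is a minimal sketch of what a proxy randomization middleware might look like. It is not the scrapy-proxies implementation; the class name RandomProxyMiddleware and the PROXY_URLS setting are hypothetical names chosen for illustration. The only Scrapy behavior it relies on is that the built-in HttpProxyMiddleware routes a request through whatever URL is stored in request.meta["proxy"].

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware: assigns a random proxy to each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URLS is an assumed setting name for this sketch,
        # e.g. ["http://10.0.0.1:8080", "http://10.0.0.2:8080"].
        return cls(crawler.settings.getlist("PROXY_URLS"))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request normally
```

To try something like this, the class would be registered under DOWNLOADER_MIDDLEWARES in the project's settings.py with a priority lower than that of HttpProxyMiddleware (750 by default), so that the proxy is chosen before the built-in middleware handles it. A library such as scrapy-proxies adds further features on top of this basic idea, so consult its documentation for its actual settings.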
