Retrying failed page downloads

Failed page requests can be easily handled by Scrapy using retry middleware. When installed, Scrapy will attempt retries when receiving the following HTTP error codes:

[500, 502, 503, 504, 408]

The process can be further configured using the following parameters:

  • RETRY_ENABLED (True/False - default is True)
  • RETRY_TIMES (# of times to retry on any errors - default is 2)
  • RETRY_HTTP_CODES (a list of HTTP error codes which should be retried - default is [500, 502, 503, 504, 408])

Get Python Web Scraping Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.