How to do it

Scrapy's depth control middleware is installed in the middleware pipeline by default, so no extra setup is needed to enable it. An example of depth limiting is contained in the 06/06_limit_depth.py script. This script crawls the static site provided with the source code on port 8080, and allows you to configure the depth limit. The site consists of three levels (0, 1, and 2) with three pages at each level. The files are named CrawlDepth<level><pagenumber>.html. Page 1 on each level links to the other two pages on the same level, as well as to the first page on the next level. Level 2 is the deepest level, so the chain of links to the next level ends there. Because the structure is small and regular, it is easy to see exactly which pages Scrapy reaches at a given depth limit.
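To make the effect of a depth limit on this site concrete, the following is a minimal sketch that models the link structure described above in plain Python and walks it breadth-first. It is not the 06/06_limit_depth.py script and does not use Scrapy. The helper names (`page_name`, `build_links`, `crawl`), the 1-based page numbers, and the assumption that pages 2 and 3 contain no links are all hypothetical choices made for illustration. Depth here means the number of link hops from the start page, which is how Scrapy counts it when applying its `DEPTH_LIMIT` setting.

```python
from collections import deque

LEVELS = 3            # levels 0, 1, and 2, as described in the text
PAGES_PER_LEVEL = 3   # three pages per level


def page_name(level, page):
    # Follows the CrawlDepth<level><pagenumber>.html pattern.
    # Numbering pages from 1 is an assumption.
    return f"CrawlDepth{level}{page}.html"


def build_links():
    """Return a dict mapping each page to the pages it links to."""
    links = {}
    for level in range(LEVELS):
        # Assumption: pages 2 and 3 on each level have no outgoing links.
        for page in range(1, PAGES_PER_LEVEL + 1):
            links[page_name(level, page)] = []
        first = page_name(level, 1)
        # Page 1 links to the other two pages on its own level...
        links[first] = [page_name(level, p)
                        for p in range(2, PAGES_PER_LEVEL + 1)]
        # ...and to page 1 of the next level, if there is one.
        if level + 1 < LEVELS:
            links[first].append(page_name(level + 1, 1))
    return links


def crawl(links, start, max_depth=None):
    """Breadth-first walk; returns {page: hop depth} for every page reached.

    max_depth=None means no limit. Links that would lead to a page deeper
    than max_depth are not followed.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        next_depth = depths[url] + 1
        if max_depth is not None and next_depth > max_depth:
            continue  # following links from here would exceed the limit
        for target in links[url]:
            if target not in depths:
                depths[target] = next_depth
                queue.append(target)
    return depths


site = build_links()
for limit in (1, 2, None):
    reached = crawl(site, page_name(0, 1), max_depth=limit)
    print(f"limit={limit}: {len(reached)} pages reached")
```

Under these assumptions, a limit of 1 reaches four pages, a limit of 2 reaches seven, and an unlimited crawl reaches all nine. Note that hop depth and site level are not the same thing: pages 2 and 3 on level 2 sit three hops from the start page, so a limit of 2 leaves them out. In Scrapy itself the unlimited case is expressed as `DEPTH_LIMIT = 0` rather than `None`.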
