There's more...

First, for detailed information on robots.txt, see https://developers.google.com/search/reference/robots_txt.

Note that not all sites have a robots.txt file, and its absence does not imply that you have unrestricted rights to crawl all of the content.

Also, a robots.txt file may contain information on where to find the sitemap(s) for the website. We examine these sitemaps in the next recipe.
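As a quick illustration, the following is a minimal sketch that uses Python's standard-library urllib.robotparser to check whether a URL may be fetched and to list any sitemaps declared in robots.txt (the example.com URLs are placeholders; site_maps() requires Python 3.8 or later):

```python
from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (example.com is a placeholder)
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Check whether a given user agent may fetch a specific URL
print(rp.can_fetch("*", "https://www.example.com/some/page"))

# List any Sitemap: entries found in robots.txt (None if there are none)
print(rp.site_maps())
```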

Scrapy can also read robots.txt and find sitemaps for you, as sketched below.
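Here is a minimal sketch of how this might look (the spider name and example.com URL are placeholder assumptions): the ROBOTSTXT_OBEY setting tells Scrapy to honor robots.txt rules, and pointing a SitemapSpider at a robots.txt URL causes it to extract and crawl the sitemaps listed there:

```python
from scrapy.spiders import SitemapSpider

class ExampleSpider(SitemapSpider):
    name = "example"
    # ROBOTSTXT_OBEY makes Scrapy download robots.txt and respect its rules
    custom_settings = {"ROBOTSTXT_OBEY": True}
    # When a URL here points to a robots.txt, Scrapy extracts its Sitemap: entries
    sitemap_urls = ["https://www.example.com/robots.txt"]

    def parse(self, response):
        # Yield each page URL discovered via the sitemap(s)
        yield {"url": response.url}
```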
