More on robots.txt

Using robots.txt is the original way to tell crawlers what not to crawl. This method is particularly helpful when you do not want search engines to crawl some or all portions of your website. Perhaps your website is not ready to be browsed by the general public, or you simply have material that is not appropriate for inclusion in the SERPs.

Think of robots.txt strictly in the context of crawling, never indexing. Crawling rules govern which documents on your website a spider is permitted to access. The robots.txt standard is almost always applied at the sitewide level, whereas the robots HTML meta tag operates at the page level or lower. It is possible to target individual files in robots.txt, but you should avoid this practice because of the additional maintenance overhead it creates.
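To illustrate the difference in scope, here is a minimal sketch: a robots.txt file at the site root that blocks crawling of a hypothetical /private/ directory sitewide, contrasted with a robots meta tag placed in the head of a single page. The directory name is purely an example, not a recommendation for your site.

    # robots.txt -- lives at the site root, applies to the whole site
    User-agent: *
    Disallow: /private/

    <!-- robots meta tag -- placed in the <head> of one specific page -->
    <meta name="robots" content="noindex, nofollow">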

Note

Not all web spiders interpret or support the robots.txt file in exactly the same way. Although the big three search engines have started to collaborate on the robots.txt standard, they still deviate in how they support it.

Is robots.txt an absolute requirement for every website? In short, no; but its use is highly encouraged, as it can play a vital role in addressing SEO issues such as content duplication.
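As a sketch of the duplication point, consider a site that serves print-friendly copies of its pages under a hypothetical /print/ directory. Keeping crawlers out of those duplicate URLs takes only two lines (the directory name is an assumption for illustration):

    User-agent: *
    Disallow: /print/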

Creation of robots.txt

Creating robots.txt is straightforward and can be done in any simple text editor. Once you’ve created the file, you should give it read permissions so that it is visible to the outside world. On ...
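For instance, on a Unix-like server with a hypothetical document root of /var/www/html, a minimal permissive robots.txt (an empty Disallow allows everything) could be created and made world-readable like this; the path is an assumption for illustration:

    $ cat /var/www/html/robots.txt
    User-agent: *
    Disallow:

    $ chmod 644 /var/www/html/robots.txt    # readable by everyone, writable only by the owner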
