Improving our robots.txt file

As mentioned in previous chapters, the robots.txt file should only be used to tell search engines which paths on the website we do and do not wish to be crawled. Ideally, only our main pages (products, categories, and CMS pages) would be crawled and cached by search engines.

The robots.txt file should be updated whenever a page is created that we do not wish to be crawled; however, the following list is a good place to start and will help to reduce the number of unnecessary pages cached by search engines.

Inside the robots.txt file, we would add the following directives (one per line, under User-agent: *):

User-agent: *
Disallow: /checkout/ # To stop our checkout pages being crawled
Disallow: /review/ # To stop our review pages being crawled
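
Before deploying rules like these, it can be worth confirming that they match the URLs we expect. The short sketch below uses Python's standard-library urllib.robotparser module to parse the rules locally; the example.com URLs are purely illustrative and not part of the book's example store.

from urllib.robotparser import RobotFileParser

# The rules from the section above; in practice these would come
# from the site's live robots.txt file.
rules = """\
User-agent: *
Disallow: /checkout/
Disallow: /review/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs used purely for illustration.
for url in ("http://www.example.com/checkout/cart/",
            "http://www.example.com/review/product/list/",
            "http://www.example.com/some-category.html"):
    verdict = "allowed" if parser.can_fetch("*", url) else "disallowed"
    print(url, "->", verdict)

Running this should report the checkout and review URLs as disallowed and the category page as allowed, because Disallow rules match any URL path that begins with the given prefix.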
