Improving our robots.txt file
As mentioned in previous chapters, the robots.txt file should only be used to let search engines know which pages/paths on the website we wish or do not wish to be crawled. Ideally, we would only want our main pages to be crawled and cached by search engines (products, categories, and CMS pages).
The robots.txt file should be updated whenever a page is created that we do not wish to be crawled; however, the following list is a good place to start and will help to reduce the number of unnecessary pages cached by search engines.
Inside the robots.txt file, we would add the following options (one per line, under User-Agent: *):
Disallow: /checkout/ # To stop our checkout pages being crawled
Disallow: /review/ # To disallow ...
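As a fuller sketch of what such a starting point might look like, the fragment below adds a few paths that are commonly kept out of search indexes on Magento stores. The extra paths (customer account, wishlist, and internal search pages) are illustrative assumptions, not taken from this chapter; verify each one against your own store's URL structure before blocking it.

```
User-Agent: *
# Transactional pages should not appear in search results
Disallow: /checkout/
# Illustrative additions (assumed typical Magento paths -- confirm before use):
# customer account pages carry no SEO value
Disallow: /customer/
# wishlists are per-visitor and should not be crawled
Disallow: /wishlist/
# internal search result pages create near-duplicate content
Disallow: /catalogsearch/
```

Note that Disallow only asks compliant crawlers not to fetch these paths; it does not remove already-indexed pages, so pair it with other measures where a page must disappear from results.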