14.4. Securing robots.txt

If you have a web site, you can assume that search engines will find it and index the text and code of your pages, adding them to their extensive catalogs of sites for users to search. Many administrators do not want their sites to appear in search engines, for a variety of reasons. The robots.txt file is a plain text file at the root of your web host that tells a robot whether it may access a given file or directory. It is intended for sites that want to keep portions of their content from being crawled or indexed by search engines; note that compliance is voluntary, so only well-behaved robots honor it. It is also flexible, in that different rules can be specified based on the robot's user agent. A sample robots.txt is as follows:

User-agent: *
Disallow: ...
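You can check how a given robots.txt will be interpreted before deploying it. The sketch below uses Python's standard `urllib.robotparser` module to parse a hypothetical rule set (the `/private/` path is an assumed example, not from this chapter) and test whether a crawler would be allowed to fetch specific URLs:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block all user agents from /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) reports whether the rules permit the fetch.
print(rp.can_fetch("*", "http://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
```

In practice you would point the parser at a live site with `rp.set_url("http://example.com/robots.txt")` followed by `rp.read()`; remember that this only predicts the behavior of robots that choose to obey the file.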
