- The script begins by importing the Robots class from reppy.robots:
from reppy.robots import Robots
- The code then uses Robots.fetch to retrieve the robots.txt file for amazon.com:
url = "http://www.amazon.com"robots = Robots.fetch(url + "/robots.txt")
- Using the fetched rules, the script checks whether several paths are accessible:
paths = ['/', '/gp/dmusic/', '/gp/dmusic/promotions/PrimeMusic/', '/gp/registry/wishlist/']
for path in paths:
    print("{0}: {1}".format(robots.allowed(path, '*'), url + path))
Running this code produces the following output:
True: http://www.amazon.com/
False: http://www.amazon.com/gp/dmusic/
True: http://www.amazon.com/gp/dmusic/promotions/PrimeMusic/
False: http://www.amazon.com/gp/registry/wishlist/
The call to robots.allowed is given the path to check and a user-agent string; passing '*' checks the path against the site's default (wildcard) rules, and the method returns True when crawling that path is permitted.
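To query on behalf of a specific crawler rather than the wildcard, reppy also exposes an Agent object via robots.agent, which bundles the rules for one user agent. A short sketch, reusing the robots and paths objects from above (the agent name 'my-bot' is hypothetical; substitute your crawler's actual token):

# Look up the rules that apply to one named user agent.
agent = robots.agent('my-bot')  # 'my-bot' is a hypothetical agent name
for path in paths:
    print("{0}: {1}".format(agent.allowed(path), url + path))
# Crawl-delay for this agent, if the site specifies one (None otherwise).
print(agent.delay)

Caching the Agent object this way avoids re-matching the user-agent string against robots.txt groups on every allowed check.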