HTTP: The Definitive Guide by Brian Totty, Marjorie Sayer, Sailu Reddy, Anshu Aggarwal, David Gourley

Chapter 9. Web Robots

We continue our tour of HTTP architecture with a close look at the self-animating user agents called web robots.

Web robots are software programs that automate a series of web transactions without human interaction. Many robots wander from web site to web site, fetching content, following hyperlinks, and processing the data they find. These kinds of robots are given colorful names such as "crawlers," "spiders," "worms," and "bots" because of the way they automatically explore web sites, seemingly with minds of their own.

Here are a few examples of web robots:

  • Stock-graphing robots issue HTTP GETs to stock market servers every few minutes and use the data to build stock price trend graphs.

  • Web-census robots gather "census" information about the scale and evolution of the World Wide Web. They wander the Web counting the number of pages and recording the size, language, and media type of each page.[1]

  • Search-engine robots collect all the documents they find to create search databases.

  • Comparison-shopping robots gather web pages from online store catalogs to build databases of products and their prices.
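
To make the web-census example concrete, here is a minimal sketch of the tallying step only, assuming the pages have already been fetched. The function name `summarize_census`, the `(headers, body)` input shape, and the lowercase header names are illustrative assumptions, not something the chapter prescribes.

```python
from collections import Counter


def summarize_census(pages):
    """Tally basic statistics over already-fetched pages.

    `pages` is an iterable of (headers, body) pairs, where `headers` is a
    dict with lowercase header names and `body` is the raw bytes of the
    response. This input shape is an assumption made for the sketch.
    """
    media_types = Counter()
    languages = Counter()
    page_count = 0
    total_bytes = 0

    for headers, body in pages:
        page_count += 1
        total_bytes += len(body)

        # Drop parameters such as "; charset=utf-8" from Content-Type.
        content_type = headers.get("content-type", "unknown")
        media_types[content_type.split(";")[0].strip().lower()] += 1

        # Pages that do not declare a language are counted as "unknown".
        language = headers.get("content-language", "unknown")
        languages[language.strip().lower()] += 1

    return {
        "pages": page_count,
        "total_bytes": total_bytes,
        "media_types": dict(media_types),
        "languages": dict(languages),
    }
```

A real census robot would also have to decide how to treat pages whose headers are missing or wrong; this sketch simply files them under "unknown".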

Crawlers and Crawling

Web crawlers are robots that recursively traverse information webs, fetching first one web page, then all the web pages to which that page points, then all the web pages to which those pages point, and so on. When a robot recursively follows web links, it is called a crawler or a spider because it "crawls" along the web created by HTML hyperlinks. ...
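
The traversal described above (one page, then the pages it points to, then the pages those point to) can be sketched as a breadth-first walk. This is a minimal illustration under stated assumptions, not a production crawler: the `crawl` function, its `fetch` callback, and the `max_pages` limit are hypothetical names introduced here, and the example pages live in an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_url, fetch, max_pages=100):
    """Breadth-first crawl starting at root_url.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or returns None if the page cannot
    be retrieved. Returns the URLs in the order they were fetched.
    """
    visited = {root_url}        # URLs already queued, to avoid revisiting
    queue = deque([root_url])   # URLs waiting to be fetched
    fetched = []

    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes so the
            # same page is not queued twice under different names.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in visited:
                visited.add(absolute)
                queue.append(absolute)

    return fetched


# Usage with a tiny in-memory "web" (hypothetical URLs).
pages = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="/c.html#top">C</a>',
    "http://example.test/b.html": '<a href="/a.html">A</a>',
    "http://example.test/c.html": "no links here",
}
order = crawl("http://example.test/", pages.get)
```

The `visited` set is the design choice that matters most here: without it, the link from `a.html` back to the root page would send the robot around the same pages indefinitely.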
