Collective Intelligence in Action by Satnam Alag

Chapter 6. Intelligent web crawling

This chapter covers
  • A brief overview of web crawling and intelligent crawling
  • A step-by-step implementation of a web crawler
  • Crawling with Nutch
  • Scalable web crawling

No one knows the exact number of web pages on the Internet. But we do know that the World Wide Web is

  • Huge, with billions of web pages
  • Dynamic, with pages being constantly added, removed, or updated
  • Growing rapidly

Given the huge amount of information available on the Internet, how does one find information of interest?

In this chapter, we continue our theme of gathering information from outside one’s application. You’ll be introduced to the field of intelligent web crawling to retrieve relevant information. Search engines crawl the web periodically ...
