Cover by Carl D. Meyer, Amy N. Langville

Chapter Two

Crawling, Indexing, and Query Processing

Spiders are the building blocks of search engines. Decisions about the design of the crawler and the capabilities of its spiders affect the design of the other modules, such as the indexing and query processing modules.

So in this chapter, we begin our description of the basic components of a web search engine with the crawler and its spiders. We purposely exclude one component, the ranking component, since it is the focus of this book and is covered in the remaining chapters. The goals and challenges of web crawlers are introduced in section 2.1, and a simple program for crawling the Web is ...
