Google's PageRank and Beyond

Book Description

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings, and how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these questions and more.

The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.

The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB programs and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.

Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.

  • Many illustrative examples and entertaining asides
  • MATLAB code
  • Accessible and informal style
  • Complete and self-contained section for mathematics review

Table of Contents

  1. Cover
  2. Half title
  3. Title
  4. Contents
  5. Preface
  6. Chapter 1. Introduction to Web Search Engines
    1. 1.1 A Short History of Information Retrieval
    2. 1.2 An Overview of Traditional Information Retrieval
    3. 1.3 Web Information Retrieval
  7. Chapter 2. Crawling, Indexing, and Query Processing
    1. 2.1 Crawling
    2. 2.2 The Content Index
    3. 2.3 Query Processing
  8. Chapter 3. Ranking Webpages by Popularity
    1. 3.1 The Scene in 1998
    2. 3.2 Two Theses
    3. 3.3 Query-Independence
  9. Chapter 4. The Mathematics of Google’s PageRank
    1. 4.1 The Original Summation Formula for PageRank
    2. 4.2 Matrix Representation of the Summation Equations
    3. 4.3 Problems with the Iterative Process
    4. 4.4 A Little Markov Chain Theory
    5. 4.5 Early Adjustments to the Basic Model
    6. 4.6 Computation of the PageRank Vector
    7. 4.7 Theorem and Proof for Spectrum of the Google Matrix
  10. Chapter 5. Parameters in the PageRank Model
    1. 5.1 The α Factor
    2. 5.2 The Hyperlink Matrix H
    3. 5.3 The Teleportation Matrix E
  11. Chapter 6. The Sensitivity of PageRank
    1. 6.1 Sensitivity with respect to α
    2. 6.2 Sensitivity with respect to H
    3. 6.3 Sensitivity with respect to v
    4. 6.4 Other Analyses of Sensitivity
    5. 6.5 Sensitivity Theorems and Proofs
  12. Chapter 7. The PageRank Problem as a Linear System
    1. 7.1 Properties of (I − αS)
    2. 7.2 Properties of (I − αH)
    3. 7.3 Proof of the PageRank Sparse Linear System
  13. Chapter 8. Issues in Large-Scale Implementation of PageRank
    1. 8.1 Storage Issues
    2. 8.2 Convergence Criterion
    3. 8.3 Accuracy
    4. 8.4 Dangling Nodes
    5. 8.5 Back Button Modeling
  14. Chapter 9. Accelerating the Computation of PageRank
    1. 9.1 An Adaptive Power Method
    2. 9.2 Extrapolation
    3. 9.3 Aggregation
    4. 9.4 Other Numerical Methods
  15. Chapter 10. Updating the PageRank Vector
    1. 10.1 The Two Updating Problems and their History
    2. 10.2 Restarting the Power Method
    3. 10.3 Approximate Updating Using Approximate Aggregation
    4. 10.4 Exact Aggregation
    5. 10.5 Exact vs. Approximate Aggregation
    6. 10.6 Updating with Iterative Aggregation
    7. 10.7 Determining the Partition
    8. 10.8 Conclusions
  16. Chapter 11. The HITS Method for Ranking Webpages
    1. 11.1 The HITS Algorithm
    2. 11.2 HITS Implementation
    3. 11.3 HITS Convergence
    4. 11.4 HITS Example
    5. 11.5 Strengths and Weaknesses of HITS
    6. 11.6 HITS’s Relationship to Bibliometrics
    7. 11.7 Query-Independent HITS
    8. 11.8 Accelerating HITS
    9. 11.9 HITS Sensitivity
  17. Chapter 12. Other Link Methods for Ranking Webpages
    1. 12.1 SALSA
    2. 12.2 Hybrid Ranking Methods
    3. 12.3 Rankings based on Traffic Flow
  18. Chapter 13. The Future of Web Information Retrieval
    1. 13.1 Spam
    2. 13.2 Personalization
    3. 13.3 Clustering
    4. 13.4 Intelligent Agents
    5. 13.5 Trends and Time-Sensitive Search
    6. 13.6 Privacy and Censorship
    7. 13.7 Library Classification Schemes
    8. 13.8 Data Fusion
  19. Chapter 14. Resources for Web Information Retrieval
    1. 14.1 Resources for Getting Started
    2. 14.2 Resources for Serious Study
  20. Chapter 15. The Mathematics Guide
    1. 15.1 Linear Algebra
    2. 15.2 Perron–Frobenius Theory
    3. 15.3 Markov Chains
    4. 15.4 Perron Complementation
    5. 15.5 Stochastic Complementation
    6. 15.6 Censoring
    7. 15.7 Aggregation
    8. 15.8 Disaggregation
  21. Chapter 16. Glossary
  22. Bibliography
  23. Index