An Introduction to Search Engines and Web Navigation

Book Description

This book is a second edition, updated and expanded to explain the technologies that help us find information on the web. Search engines and web navigation tools have become ubiquitous in our day-to-day use of the web as an information source, a tool for commercial transactions, and a social computing tool. Moreover, through the mobile web we have access to the web's services when we are on the move. This book demystifies the tools that we use when interacting with the web and gives the reader a detailed overview of where we are and where we are going in terms of search engine and web navigation technologies.

Table of Contents

  1. Copyright
  2. Preface
    1. MOTIVATION
    2. AUDIENCE AND PREREQUISITES
    3. TIMELINESS
    4. ACKNOWLEDGMENTS
  3. 1. INTRODUCTION
    1. 1.1. BRIEF SUMMARY OF CHAPTERS
    2. 1.2. BRIEF HISTORY OF HYPERTEXT AND THE WEB
    3. 1.3. BRIEF HISTORY OF SEARCH ENGINES
  4. 2. THE WEB AND THE PROBLEM OF SEARCH
    1. 2.1. CHAPTER OBJECTIVES
    2. 2.2. SOME STATISTICS
      1. 2.2.1. Web Size Statistics
      2. 2.2.2. Web Usage Statistics
    3. 2.3. TABULAR DATA VERSUS WEB DATA
    4. 2.4. STRUCTURE OF THE WEB
      1. 2.4.1. Bow-Tie Structure of the Web
      2. 2.4.2. Small-World Structure of the Web
    5. 2.5. INFORMATION SEEKING ON THE WEB
      1. 2.5.1. Direct Navigation
      2. 2.5.2. Navigation within a Directory
      3. 2.5.3. Navigation using a Search Engine
      4. 2.5.4. Problems with Web Information Seeking
    6. 2.6. INFORMATIONAL, NAVIGATIONAL, AND TRANSACTIONAL QUERIES
    7. 2.7. COMPARING WEB SEARCH TO TRADITIONAL INFORMATION RETRIEVAL
      1. 2.7.1. Recall and Precision
    8. 2.8. LOCAL SITE SEARCH VERSUS GLOBAL WEB SEARCH
    9. 2.9. DIFFERENCE BETWEEN SEARCH AND NAVIGATION
    10. 2.10. CHAPTER SUMMARY
    11. 2.11. EXERCISES
  5. 3. THE PROBLEM OF WEB NAVIGATION
    1. 3.1. CHAPTER OBJECTIVES
    2. 3.2. GETTING LOST IN HYPERSPACE AND THE NAVIGATION PROBLEM
    3. 3.3. HOW CAN THE MACHINE ASSIST IN USER SEARCH AND NAVIGATION
      1. 3.3.1. The Potential Use of Machine Learning Algorithms
      2. 3.3.2. The Naive Bayes Classifier for Categorizing Web Pages
        1. 3.3.2.1. Bayes Rule
        2. 3.3.2.2. Naive Bayes Assumption
    4. 3.4. TRAILS SHOULD BE FIRST CLASS OBJECTS
    5. 3.5. ENTER MARKOV CHAINS AND TWO INTERPRETATIONS OF ITS PROBABILITIES
      1. 3.5.1. Markov Chains and the Markov Property
        1. 3.5.1.1. Markov Property
      2. 3.5.2. Markov Chains and the Probabilities of Following Links
      3. 3.5.3. Markov Chains and the Relevance of Links
    6. 3.6. CONFLICT BETWEEN WEB SITE OWNER AND VISITOR
    7. 3.7. CONFLICT BETWEEN SEMANTICS OF WEB SITE AND THE BUSINESS MODEL
    8. 3.8. CHAPTER SUMMARY
    9. 3.9. EXERCISES
  6. 4. SEARCHING THE WEB
    1. 4.1. CHAPTER OBJECTIVES
    2. 4.2. MECHANICS OF A TYPICAL SEARCH
    3. 4.3. SEARCH ENGINES AS INFORMATION GATEKEEPERS OF THE WEB
    4. 4.4. SEARCH ENGINE WARS, IS THE DUST SETTLING?
      1. 4.4.1. Competitor Number One: Google
      2. 4.4.2. Competitor Number Two: Yahoo
      3. 4.4.3. Competitor Number Three: Bing
      4. 4.4.4. Other Competitors
    5. 4.5. STATISTICS FROM STUDIES OF SEARCH ENGINE QUERY LOGS
      1. 4.5.1. Search Engine Query Logs
      2. 4.5.2. Search Engine Query Syntax
      3. 4.5.3. The Most Popular Search Keywords
    6. 4.6. ARCHITECTURE OF A SEARCH ENGINE
      1. 4.6.1. The Search Index
      2. 4.6.2. The Query Engine
      3. 4.6.3. The Search Interface
    7. 4.7. CRAWLING THE WEB
      1. 4.7.1. Crawling Algorithms
      2. 4.7.2. Refreshing Web Pages
      3. 4.7.3. The Robots Exclusion Protocol
      4. 4.7.4. Spider Traps
    8. 4.8. WHAT DOES IT TAKE TO DELIVER A GLOBAL SEARCH SERVICE?
    9. 4.9. CHAPTER SUMMARY
    10. 4.10. EXERCISES
  7. 5. HOW DOES A SEARCH ENGINE WORK
    1. 5.1. CHAPTER OBJECTIVES
    2. 5.2. CONTENT RELEVANCE
      1. 5.2.1. Processing Web Pages
      2. 5.2.2. Interpreting the Query
      3. 5.2.3. Term Frequency
      4. 5.2.4. Inverse Document Frequency
      5. 5.2.5. Computing Keyword TF–IDF Values
      6. 5.2.6. Caching Queries
      7. 5.2.7. Phrase Matching
      8. 5.2.8. Synonyms
      9. 5.2.9. Link Text
      10. 5.2.10. URL Analysis
      11. 5.2.11. Date Last Updated
      12. 5.2.12. HTML Structure Weighting
      13. 5.2.13. Spell Checking
      14. 5.2.14. Non-English Queries
      15. 5.2.15. Home Page Detection
      16. 5.2.16. Related Searches and Query Suggestions
    3. 5.3. LINK-BASED METRICS
      1. 5.3.1. Referential and Informational Links
      2. 5.3.2. Combining Link Analysis with Content Relevance
      3. 5.3.3. Are Links the Currency of the Web?
      4. 5.3.4. PageRank Explained
        1. 5.3.4.1. PageRank:
      5. 5.3.5. Online Computation of PageRank
      6. 5.3.6. Monte Carlo Methods in PageRank Computation
      7. 5.3.7. Hyperlink-Induced Topic Search
        1. 5.3.7.1. HITS:
      8. 5.3.8. Stochastic Approach for Link-Structure Analysis
        1. 5.3.8.1. SALSA:
      9. 5.3.9. Counting Incoming Links
      10. 5.3.10. The Bias of PageRank against New Pages
      11. 5.3.11. PageRank within a Community
      12. 5.3.12. Influence of Weblogs on PageRank
      13. 5.3.13. Link Spam
      14. 5.3.14. Citation Analysis
      15. 5.3.15. The Wide Ranging Interest in PageRank
    4. 5.4. POPULARITY-BASED METRICS
      1. 5.4.1. Direct Hit's Popularity Metric
      2. 5.4.2. Document Space Modification
      3. 5.4.3. Using Query Log Data to Improve Search
      4. 5.4.4. Learning to Rank
      5. 5.4.5. BrowseRank
    5. 5.5. EVALUATING SEARCH ENGINES
      1. 5.5.1. Search Engine Awards
      2. 5.5.2. Evaluation Metrics
      3. 5.5.3. Performance Measures
      4. 5.5.4. Eye Tracking Studies
      5. 5.5.5. Test Collections
      6. 5.5.6. Inferring Ranking Algorithms
    6. 5.6. CHAPTER SUMMARY
    7. 5.7. EXERCISES
  8. 6. DIFFERENT TYPES OF SEARCH ENGINES
    1. 6.1. CHAPTER OBJECTIVES
    2. 6.2. DIRECTORIES AND CATEGORIZATION OF WEB CONTENT
    3. 6.3. SEARCH ENGINE ADVERTISING
      1. 6.3.1. Paid Inclusion
      2. 6.3.2. Banner Ads
      3. 6.3.3. Sponsored Search and Paid Placement
      4. 6.3.4. Behavioral Targeting
      5. 6.3.5. User Behavior
      6. 6.3.6. The Trade-Off between Bias and Demand
      7. 6.3.7. Sponsored Search Auctions
      8. 6.3.8. Pay per Action
      9. 6.3.9. Click Fraud and Other Forms of Advertising Fraud
    4. 6.4. METASEARCH
      1. 6.4.1. Fusion Algorithms
      2. 6.4.2. Operational Metasearch Engines
      3. 6.4.3. Clustering Search Results
      4. 6.4.4. Classifying Search Results
    5. 6.5. PERSONALIZATION
      1. 6.5.1. Personalization versus Customization
      2. 6.5.2. Personalized Results Tool
      3. 6.5.3. Privacy and Scalability
      4. 6.5.4. Relevance Feedback
      5. 6.5.5. Personalized PageRank
      6. 6.5.6. Outride's Personalized Search
    6. 6.6. QUESTION ANSWERING (Q&A) ON THE WEB
      1. 6.6.1. Natural Language Annotations
      2. 6.6.2. Factual Queries
      3. 6.6.3. Open Domain Question Answering
      4. 6.6.4. Semantic Headers
    7. 6.7. IMAGE SEARCH
      1. 6.7.1. Text-Based Image Search
      2. 6.7.2. Content-Based Image Search
      3. 6.7.3. VisualRank
      4. 6.7.4. CAPTCHA and reCAPTCHA
      5. 6.7.5. Image Search for Finding Location-Based Information
    8. 6.8. SPECIAL PURPOSE SEARCH ENGINES
    9. 6.9. CHAPTER SUMMARY
    10. 6.10. EXERCISES
  9. 7. NAVIGATING THE WEB
    1. 7.1. CHAPTER OBJECTIVES
    2. 7.2. FRUSTRATION IN WEB BROWSING AND NAVIGATION
      1. 7.2.1. HTML and Web Site Design
      2. 7.2.2. Hyperlinks and Surfing
      3. 7.2.3. Web Site Design and Usability
    3. 7.3. NAVIGATION TOOLS
      1. 7.3.1. The Basic Browser Tools
      2. 7.3.2. The Back and Forward Buttons
      3. 7.3.3. Search Engine Toolbars
      4. 7.3.4. The Bookmarks Tool
      5. 7.3.5. The History List
      6. 7.3.6. Identifying Web Pages
      7. 7.3.7. Breadcrumb Navigation
      8. 7.3.8. Quicklinks
      9. 7.3.9. Hypertext Orientation Tools
      10. 7.3.10. Hypercard Programming Environment
    4. 7.4. NAVIGATIONAL METRICS
      1. 7.4.1. The Potential Gain
      2. 7.4.2. Structural Analysis of a Web Site
      3. 7.4.3. Measuring the Usability of Web Sites
    5. 7.5. WEB DATA MINING
      1. 7.5.1. Three Perspectives on Data Mining
      2. 7.5.2. Measuring the Success of a Web Site
      3. 7.5.3. Web Analytics
      4. 7.5.4. E-Metrics
      5. 7.5.5. Web Analytics Tools
      6. 7.5.6. Weblog File Analyzers
      7. 7.5.7. Identifying the Surfer
      8. 7.5.8. Sessionizing
      9. 7.5.9. Supplementary Analyses
      10. 7.5.10. Markov Chain Model of Web Site Navigation
      11. 7.5.11. Applications of Web Usage Mining
      12. 7.5.12. Information Extraction
    6. 7.6. THE BEST TRAIL ALGORITHM
      1. 7.6.1. Effective View Navigation
      2. 7.6.2. Web Usage Mining for Personalization
      3. 7.6.3. Developing a Trail Engine
    7. 7.7. VISUALIZATION THAT AIDS NAVIGATION
      1. 7.7.1. How to Visualize Navigation Patterns
      2. 7.7.2. Overview Diagrams and Web Site Maps
      3. 7.7.3. Fisheye Views
      4. 7.7.4. Visualizing Trails within a Web Site
      5. 7.7.5. Visual Search Engines
      6. 7.7.6. Social Data Analysis
      7. 7.7.7. Mapping Cyberspace
    8. 7.8. NAVIGATION IN VIRTUAL AND PHYSICAL SPACES
      1. 7.8.1. Real-World Web Usage Mining
      2. 7.8.2. The Museum Experience Recorder
      3. 7.8.3. Navigating in the Real World
    9. 7.9. CHAPTER SUMMARY
    10. 7.10. EXERCISES
  10. 8. THE MOBILE WEB
    1. 8.1. CHAPTER OBJECTIVES
    2. 8.2. THE PARADIGM OF MOBILE COMPUTING
      1. 8.2.1. Wireless Markup Language
      2. 8.2.2. The i-mode Service
    3. 8.3. MOBILE WEB SERVICES
      1. 8.3.1. M-Commerce
      2. 8.3.2. Delivery of Personalized News
      3. 8.3.3. Delivery of Learning Resources
    4. 8.4. MOBILE DEVICE INTERFACES
      1. 8.4.1. Mobile Web Browsers
      2. 8.4.2. Information Seeking on Mobile Devices
      3. 8.4.3. Text Entry on Mobile Devices
      4. 8.4.4. Voice Recognition for Mobile Devices
      5. 8.4.5. Presenting Information on a Mobile Device
    5. 8.5. THE NAVIGATION PROBLEM IN MOBILE PORTALS
      1. 8.5.1. Click-Distance
      2. 8.5.2. Adaptive Mobile Portals
      3. 8.5.3. Adaptive Web Navigation
    6. 8.6. MOBILE SEARCH
      1. 8.6.1. Mobile Search Interfaces
      2. 8.6.2. Search Engine Support for Mobile Devices
      3. 8.6.3. Focused Mobile Search
      4. 8.6.4. Laid Back Mobile Search
      5. 8.6.5. Mobile Query Log Analysis
      6. 8.6.6. Personalization of Mobile Search
      7. 8.6.7. Location-Aware Mobile Search
    7. 8.7. CHAPTER SUMMARY
    8. 8.8. EXERCISES
  11. 9. SOCIAL NETWORKS
    1. 9.1. CHAPTER OBJECTIVES
    2. 9.2. WHAT IS A SOCIAL NETWORK?
      1. 9.2.1. Milgram's Small-World Experiment
      2. 9.2.2. Collaboration Graphs
      3. 9.2.3. Instant Messaging Social Network
      4. 9.2.4. The Social Web
      5. 9.2.5. Social Network Start-Ups
    3. 9.3. SOCIAL NETWORK ANALYSIS
      1. 9.3.1. Social Network Terminology
      2. 9.3.2. The Strength of Weak Ties
      3. 9.3.3. Centrality
      4. 9.3.4. Web Communities
      5. 9.3.5. Pajek: Large Network Analysis Software
    4. 9.4. PEER-TO-PEER NETWORKS
      1. 9.4.1. Centralized P2P Networks
      2. 9.4.2. Decentralized P2P Networks
      3. 9.4.3. Hybrid P2P Networks
      4. 9.4.4. Distributed Hash Tables
      5. 9.4.5. BitTorrent File Distribution
      6. 9.4.6. JXTA P2P Search
      7. 9.4.7. Incentives in P2P Systems
    5. 9.5. COLLABORATIVE FILTERING
      1. 9.5.1. Amazon.com
      2. 9.5.2. Collaborative Filtering Explained
      3. 9.5.3. User-Based Collaborative Filtering
        1. 9.5.3.1. User-Based CF
      4. 9.5.4. Item-Based Collaborative Filtering
        1. 9.5.4.1. Item-Based CF:
      5. 9.5.5. Model-Based Collaborative Filtering
      6. 9.5.6. Content-Based Recommendation Systems
      7. 9.5.7. Evaluation of Collaborative Filtering Systems
      8. 9.5.8. Scalability of Collaborative Filtering Systems
      9. 9.5.9. A Case Study of Amazon.co.uk
      10. 9.5.10. The Netflix Prize
      11. 9.5.11. Some Other Collaborative Filtering Systems
    6. 9.6. WEBLOGS (BLOGS)
      1. 9.6.1. Blogrolling
      2. 9.6.2. Blogspace
      3. 9.6.3. Blogs for Testing Machine Learning Algorithms
      4. 9.6.4. Spreading Ideas via Blogs
      5. 9.6.5. The Real-Time Web and Microblogging
    7. 9.7. POWER-LAW DISTRIBUTIONS IN THE WEB
      1. 9.7.1. Detecting Power-Law Distributions
      2. 9.7.2. Power-Law Distributions in the Internet
      3. 9.7.3. A Law of Surfing and a Law of Participation
      4. 9.7.4. The Evolution of the Web via Preferential Attachment
      5. 9.7.5. The Evolution of the Web as a Multiplicative Process
      6. 9.7.6. The Evolution of the Web via HOT
      7. 9.7.7. Small-World Networks
      8. 9.7.8. The Robustness and Vulnerability of a Scale-Free Network
    8. 9.8. SEARCHING IN SOCIAL NETWORKS
      1. 9.8.1. Social Navigation
      2. 9.8.2. Social Search Engines
      3. 9.8.3. Navigation Within Social Networks
      4. 9.8.4. Navigation Within Small-World Networks
      5. 9.8.5. Testing Navigation Strategies in Social Networks
    9. 9.9. SOCIAL TAGGING AND BOOKMARKING
      1. 9.9.1. Flickr—Sharing Your Photos
      2. 9.9.2. YouTube—Broadcast Yourself
      3. 9.9.3. Delicious for Social Bookmarking
      4. 9.9.4. Communities Within Content Sharing Sites
      5. 9.9.5. Sharing Scholarly References
      6. 9.9.6. Folksonomy
      7. 9.9.7. Tag Clouds
      8. 9.9.8. Tag Search and Browsing
      9. 9.9.9. The Efficiency of Tagging
      10. 9.9.10. Clustering and Classifying Tags
    10. 9.10. OPINION MINING
      1. 9.10.1. Feature-Based Opinion Mining
      2. 9.10.2. Sentiment Classification
      3. 9.10.3. Comparative Sentence and Relation Extraction
    11. 9.11. WEB 2.0 AND COLLECTIVE INTELLIGENCE
      1. 9.11.1. Ajax
      2. 9.11.2. Syndication
      3. 9.11.3. Open APIs, Mashups, and Widgets
      4. 9.11.4. Software as a Service
      5. 9.11.5. Collective Intelligence
      6. 9.11.6. Algorithms for Collective Intelligence
      7. 9.11.7. Wikipedia—The World's Largest Encyclopedia
      8. 9.11.8. eBay—The World's Largest Online Trading Community
    12. 9.12. CHAPTER SUMMARY
    13. 9.13. EXERCISES
  12. 10. THE FUTURE OF WEB SEARCH AND NAVIGATION
  13. BIBLIOGRAPHY