Web Caching and Replication

Book Description

"Rabinovich and Spatscheck report a wealth of detailed information about how to implement Web caching and replication mechanisms, but more importantly, they teach me how to think about the general problem of content distribution. I'm pleased that there is finally a comprehensive book on this important subject."
--Larry Peterson, Professor of Computer Science, Princeton University

"This book is a remarkable piece of work, well-organized and clearly articulated. The authors have masterfully presented advanced topics in Internet Web infrastructure and content delivery networks in a way that is suitable for both novices and experts."
--Steve McCanne, Chief Technology Officer, Inktomi

As the Internet grows, evolving from a research tool into a staple of daily life, it is essential that the Web's scalability and performance keep up with increased demand and expectations. Every day, more and more users turn to the Internet to use resource-hungry applications like on-demand video and audio and distributed games. At the same time, more and more computer applications are built to rely on the Web, but with much higher sensitivity to delays of even a few milliseconds. The key to satisfying these growing demands and expectations lies in the practices of caching and replication and in the added scalability they provide.

Web Caching and Replication provides essential material based on the extensive real-world experience of two experts from AT&T Labs. This comprehensive examination of caching, replication, and load-balancing practices for the Web brings together information from and for the commercial world, including real-life products; technical standards communities, such as IETF and W3C; and academic research.

By focusing on the underlying, fundamental ideas that are behind the varied technologies currently used in caching and replication, this book will remain a relevant, much-needed resource as the multi-billion dollar industries that rely on the Web continue to grow and evolve.

The book approaches its two central topics in two distinct parts. The part on caching includes coverage of:

  • Proxy caching, including latency reduction and TCP connection caching

  • Transparent and nontransparent proxy deployment

  • Cooperative caching

  • Cache consistency

  • Replacement policies

  • Prefetching

  • "Caching the uncacheable"

The part on replication includes coverage of:

  • Basic mechanisms for request distribution, including content-blind and content-aware request distribution

  • CDNs, including DNS request distribution, streaming content delivery, and secure content access

  • Server selection

Examples and illustrations are included throughout the book. Extensive cross-referencing also enables readers to identify the corresponding parts of each section. Web Caching and Replication concludes with a thorough look into the future. It not only considers how new services can be implemented on caching and replication platforms, but also outlines emerging technologies that will allow for cooperation between different caching and replication enterprises in order to improve the overall performance of the Web.


    Table of Contents

    1. Copyright
    2. Preface
    3. Acknowledgments
    4. Introduction
    5. Background
      1. Network Layers and Protocols
        1. The ISO/OSI Reference Model
        2. Network Components at Different Layers
        3. Overview of Internet Protocols
        4. Summary
      2. The Internet Protocol and Routing
        1. Addressing
        2. IP Datagram Header
        3. Routing
        4. Multicast
        5. Summary
      3. Transmission Control Protocol
        1. Segment Header
        2. Opening a Connection
        3. Closing a Connection
        4. Flow Control
        5. Congestion Control
        6. Retransmission
        7. Summary
      4. Application Protocols for the Web
        1. Uniform Resource Locators
        2. The Domain Name System
        3. The HyperText Transfer Protocol
        4. The HTTP Message Exchange
        5. Hyperlinks and Embedded Objects
        6. Summary
      5. HTTP Support for Caching and Replication
        1. Conditional Requests
        2. Age and Expiration of Cached Objects
        3. Request Redirection
        4. Range Requests
        5. The cache-control Header
        6. Storing State for a Stateless Server: Cookies
        7. Support for Server Sharing
        8. Expanded Object Identifiers
        9. Learning the Proxy Chain
        10. Cacheability of Web Content
        11. Summary
      6. Web Behavior Rules of Thumb
        1. Evaluation Methods
        2. Object Size
        3. Object Types and Cacheability
        4. Object Popularity
        5. Locality of Reference
        6. Rate of Object Modifications
        7. Other Observations
        8. Summary: Rules of Thumb for the Web
    6. Web Caching
      1. Proxy Caching: Realistic Expectations
        1. Do Proxy Caches Deserve a Hearing?
        2. Latency Reduction
        3. Bandwidth Savings
        4. Proxies and Streaming Media
        5. Summary
      2. Proxy Deployment
        1. Overview of Internet Connectivity Architectures
        2. Nontransparent Proxy Deployment
        3. Transparent Proxy Deployment
        4. Security and Access Control Issues
        5. Summary
      3. Cooperative Proxy Caching
        1. Shared Cache: How Big Is Big Enough?
        2. Issues in Cooperative Proxy Caching
        3. Location Management
        4. Caching on a Global Scale: Proxy Pruning
        5. An Overview of Existing Platforms
        6. Summary
      4. Cache Consistency
        1. Cache Validation
        2. Cache Invalidation
        3. Issues in Cooperative Cache Consistency
        4. Summary
      5. Replacement Policy
        1. Replacement Policy Metrics
        2. Replacement Policy Algorithms
        3. The Value of Replacement Policy
        4. Summary
      6. Prefetching
        1. Performance Metrics
        2. Performance Bounds of Prefetching
        3. Taxonomy
        4. Nondata Prefetching
        5. Nontransparent Prefetching
        6. Server Push versus Client Pull
        7. Information Used in Prefetching Algorithms
        8. Prediction Algorithms
        9. Summary
      7. Caching the Uncacheable
        1. A Note on Implementation
        2. Modified Content and Stale Delivery Avoidance
        3. Cookied Content
        4. Expressly Uncacheable Content and Hit Metering
        5. Dynamic Content
        6. Active Proxies
        7. Summary
    7. Web Replication
      1. Basic Mechanisms for Request Distribution
        1. Content-Blind Request Distribution with Full Replication
        2. Content-Blind Request Distribution with Partial Replication
        3. Content-Aware Request Distribution
        4. Summary
      2. Content Delivery Networks
        1. Types of CDNs
        2. Delivering Requests to a CDN
        3. Finding Origin Servers
        4. Request Distribution in CDNs
        5. Pitfalls of DNS-Based Request Distribution
        6. Fine-Tuning DNS Request Distribution
        7. Data Consistency in CDNs
        8. Streaming Content Delivery
        9. Supporting Secure Content Access
        10. Summary
      3. Server Selection
        1. Metrics
        2. Algorithms
        3. Server Selection with Multiple Metrics
        4. DNS-Based Server Selection
        5. Why Choose a Server When You Can Have Them All?
        6. Summary
    8. Further Directions
      1. Adding Value at the Edge
        1. Content Filtering
        2. Content Transcoding
        3. Watermarking
        4. Custom Usage Reporting
        5. Implementing New Services with an Edge Server API
        6. The ICAP Protocol
        7. Distributing Web Applications
        8. Summary
      2. Content Distribution Internetworking
        1. Pros and Cons of CDI
        2. Request Distribution
        3. Content Distribution
        4. Accounting
        5. Summary
    9. Glossary
    10. Bibliography