Book description

On the World Wide Web, speed and efficiency are vital. Users have little patience for slow web pages, while network administrators want to make the most of their available bandwidth. A properly designed web cache reduces network traffic and improves access times to popular web sites--a boon to network administrators and web users alike. Web Caching hands you all the technical information you need to design, deploy, and operate an effective web caching service. It starts with the basics of how web caching works, from the HTTP headers that govern cachability to cache validation and replacement algorithms. Topics covered in this book include:

  • Designing an effective cache solution

  • Configuring web browsers to use a cache

  • Setting up a collection of caches that can talk to each other

  • Configuring an interception cache or proxy

  • Monitoring and fine-tuning the performance of a cache

  • Configuring web servers to cooperate with web caches

  • Benchmarking cache products

The book also covers the important political aspects of web caching, including privacy, intellectual property, and security issues. Internet service providers, large corporations, or educational institutions--in short, any network that provides connectivity to a wide variety of users--can reap enormous benefit from running a well-tuned web caching service. Web Caching shows you how to do it right.

Table of Contents

  1. Web Caching
    1. Preface
      1. Audience
      2. What You Will and Won’t Find Here
      3. Caching Resources
        1. Web Sites
        2. Mailing Lists
      4. Conventions Used in This Book
      5. How To Contact Us
      6. Acknowledgments
    2. 1. Introduction
      1. Web Architecture
        1. Clients and Servers
        2. Proxies
        3. Web Objects
        4. Resource Identifiers
      2. Web Transport Protocols
        1. HTTP
        2. FTP
        3. SSL/TLS
        4. Gopher
      3. Why Cache the Web?
        1. Latency
        2. Bandwidth
        3. Server Load
      4. Why Not Cache the Web?
      5. Types of Web Caches
        1. Browser Caches
        2. Caching Proxies
        3. Surrogates
      6. Caching Proxy Features
      7. Meshes, Clusters, and Hierarchies
      8. Products
    3. 2. How Web Caching Works
      1. HTTP Requests
        1. Origin Server Requests
        2. Proxy Requests
        3. Non-HTTP Proxy Requests
      2. Is It Cachable?
        1. Status Codes
        2. Request Methods
        3. Expiration and Validation
        4. Cache-control
        5. Authentication
        6. Cookies
        7. Dynamic Content
      3. Hits, Misses, and Freshness
      4. Hit Ratios
      5. Validation
        1. Last-modified Timestamps
        2. Entity Tags
        3. Weak and Strong Validators
      6. Forcing a Cache to Refresh
        1. The no-cache Directive
        2. The max-age Directive
        3. The min-fresh Directive
      7. Cache Replacement
        1. Least Recently Used (LRU)
        2. First In, First Out (FIFO)
        3. Least Frequently Used (LFU)
        4. Size
        5. GreedyDual-Size (GDS)
        6. Other Algorithms
    4. 3. Politics of Web Caching
      1. Privacy
        1. Access Logs
        2. Making Requests Anonymous
      2. Request Blocking
      3. Copyright
        1. Does Caching Infringe?
        2. Cases and Precedents
        3. The DMCA
        4. HTTP’s Role
      4. Offensive Content
      5. Dynamic Web Pages
        1. Java Applets
      6. Content Integrity
      7. Cache Busting and Server Busting
      8. Advertising
      9. Trust
      10. Effects of Proxies
    5. 4. Configuring Cache Clients
      1. Proxy Addresses
      2. Manual Proxy Configuration
        1. Configuring Microsoft Internet Explorer
        2. Configuring Netscape Navigator
        3. NCSA Mosaic, Lynx, and Wget
      3. Proxy Auto-Configuration Script
        1. Writing a Proxy Auto-Configuration Function
        2. Sample PAC Scripts
        3. Setting the Proxy Auto-Configuration Script
      4. Web Proxy Auto-Discovery
      5. Other Configuration Options
      6. The Bottom Line
    6. 5. Interception Proxying and Caching
      1. Overview
      2. The IP Layer: Routing
        1. Inline Caches
        2. Layer Four Switches
        3. WCCP
        4. Cisco Policy Routing
      3. The TCP Layer: Ports and Delivery
        1. Linux
          1. ipchains
          2. iptables
        2. FreeBSD
        3. Other Operating Systems
      4. The Application Layer: HTTP
      5. Debugging Interception
      6. Issues
        1. It’s Difficult for Users to Bypass
        2. Packet Transport Service
        3. Routing Changes
        4. It Affects More Than Browsers and Users
        5. No-Intercept Lists
        6. Are Port 80 Packets Always HTTP?
        7. HTTP Interoperation Problems
        8. IP Interoperation Problems
      7. To Intercept or Not To Intercept
    7. 6. Configuring Servers to Work with Caches
      1. Important HTTP Headers
        1. Date
        2. Last-modified
        3. Expires
        4. Cache-control
        5. Content-length
      2. Being Cache-Friendly
        1. Why?
          1. Latency
          2. Hiding network failures
          3. Server load reduction
        2. Ten Ways to be Cache-Friendly
        3. Apache
          1. The Expires header
          2. General header manipulation
          3. Setting headers from CGI scripts
        4. How to Choose Expiration Times
      3. Being Cache-Unfriendly
      4. Other Issues for Content Providers
        1. What About Dynamic Responses?
        2. What About Advertisements?
        3. Getting Accurate Access Counts
    8. 7. Cache Hierarchies
      1. How Hierarchies Work
      2. Why Join a Hierarchy?
        1. Performance
        2. Nondefault Routing
      3. Why Not Join a Hierarchy?
        1. Trust
        2. Low Hit Ratios
        3. Effects on Routing
        4. Freshness
        5. Large Families
        6. Abuses, Real and Imagined
        7. Error Messages
        8. False Hits
        9. Forwarding Loops
        10. Failures and Service Denial
      4. Optimizing Hierarchies
    9. 8. Intercache Protocols
      1. ICP
        1. History
        2. Features
          1. Hit prediction
          2. Probing the network
          3. Object data with hits
          4. Source RTT measurements
        3. Issues
          1. Delays
          2. Bandwidth
          3. False hits
          4. UDP
          5. No request method
          6. Queries for uncachable responses
          7. Interoperation
          8. Unwanted queries
        4. Multicast ICP
      2. CARP
      3. HTCP
        1. Issues
      4. Cache Digests
        1. Bloom Filters
        2. Comparing Digests and ICP
      5. Which Protocol to Use
    10. 9. Cache Clusters
      1. The Hot Spare
      2. Throughput and Load Sharing
      3. Bandwidth
    11. 10. Design Considerations for Caching Services
      1. Appliance or Software Solution
        1. Appliances
        2. Software
      2. Disk Space
      3. Memory
      4. Network Interfaces
      5. Operating Systems
      6. High Availability
      7. Intercepting Traffic
      8. Load Sharing
      9. Location
      10. Using a Hierarchy
    12. 11. Monitoring the Health of Your Caches
      1. What to Monitor?
      2. Monitoring Tools
        1. UCD-SNMP
        2. RRDTool
        3. Other Tools
    13. 12. Benchmarking Proxy Caches
      1. Metrics
        1. Throughput
        2. Response Time
        3. Hit Ratio
        4. Connection Capacity
        5. Cost
      2. Performance Bottlenecks
        1. Disk Throughput
        2. CPU Power
        3. NIC Bandwidth
        4. Memory
        5. Network State
      3. Benchmarking Tools
        1. Web Polygraph
        2. Blast
        3. Wisconsin Proxy Benchmark
        4. WebJamma
        5. Other Benchmarks
      4. Benchmarking Gotchas
        1. TCP Delayed ACKs
        2. Port Number Exhaustion
        3. NIC Duplex Mode
        4. Bad Ethernet Cables
        5. Full Caches
        6. Test Duration
        7. Long-Lived Connections
        8. Small Working Sets
        9. Clock Sync
        10. MSL (TIME_WAIT) Values
      5. How to Benchmark a Proxy Cache
        1. Configure Systems
        2. Test the Network
        3. No-Proxy Test
        4. Fill the Cache
        5. Run the Benchmark
      6. Sample Benchmark Results
        1. Throughput
        2. Response Time
        3. Hit Ratio
        4. Other Results
    14. A. Analysis of Production Cache Trace Data
      1. Reply and Object Sizes
      2. Content Types
      3. HTTP Headers
        1. Client Request Headers
        2. Client Reply Headers
      4. Protocols
      5. Port Numbers
      6. Popularity
        1. Size and Popularity
      7. Cachability
      8. Service Times
      9. Hit Ratios
      10. Object Life Cycle
      11. Request Methods
      12. Reply Status Code
    15. B. Internet Cache Protocol
      1. ICPv2 Message Format
        1. Opcode
        2. Version
        3. Message Length
        4. Reqnum
        5. Options
        6. Option Data
        7. Sender Host Address
        8. Payload
      2. Opcodes
      3. Option Flags
      4. Experimental Features
        1. Pointers
        2. Object Advertisement
        3. Request Notification
        4. Object Removal and Invalidation
        5. MD5 Object Keys
        6. Eliminating URLs from Replies
        7. Wiretapping
        8. Prefetching
    16. C. Cache Array Routing Protocol
      1. Membership Table
      2. Routing Function
      3. Examples
    17. D. Hypertext Caching Protocol
      1. Message Format and Magic Constants
        1. HEADER
        2. DATA
        3. AUTH
      2. HTCP Data Types
        1. COUNTSTR
        2. SPECIFIER
        3. DETAIL
        4. IDENTITY
      3. HTCP Opcodes
        1. NOP
        2. TST
          1. TST request
          2. TST response
        3. MON
          1. MON request
          2. MON response
        4. SET
          1. SET request
          2. SET response
        5. CLR
          1. CLR request
          2. CLR response
    18. E. Cache Digests
      1. The Cache Digest Implementation
        1. Keys
        2. Hash Functions
        3. Sizing the Filter
        4. Selecting Objects for the Digest
        5. False Hits and Digest Freshness
        6. Exchanging Digests
      2. Message Format
      3. An Example
    19. F. HTTP Status Codes
      1. 1xx Intermediate Status
      2. 2xx Successful Response
      3. 3xx Redirects
      4. 4xx Request Errors
      5. 5xx Server Errors
    20. G. U.S.C. 17 Sec. 512. Limitations on Liability Relating to Material Online
    21. List of Acronyms
    22. H. Bibliography
      1. Books and Articles
      2. Request For Comments
    23. Index
    24. Colophon