
Book description

Success on the web is measured by usage and growth. Web-based companies live or die by their ability to scale their infrastructure to accommodate increasing demand. This book is a hands-on, practical guide to planning for such growth, with many techniques and considerations to help you plan, deploy, and manage web application infrastructure. The Art of Capacity Planning is written by John Allspaw, the manager of data operations for the world-famous photo-sharing site Flickr.com, now owned by Yahoo! He combines personal anecdotes from many phases of Flickr's growth with insights from his colleagues in many other industries to give you solid guidelines for measuring your growth, predicting trends, and making cost-effective preparations. Topics include:

  • Evaluating tools for measurement and deployment

  • Capacity analysis and prediction for storage, database, and application servers

  • Designing architectures to easily add and measure capacity

  • Handling sudden spikes

  • Predicting exponential and explosive growth

  • How cloud services such as EC2 can fit into a capacity strategy

In this book, Allspaw draws on years of valuable experience, starting from the days when Flickr was relatively small and had to deal with the growing pains and cost/performance trade-offs typical of a company with a web presence. The advice he offers in The Art of Capacity Planning will not only help you prepare for explosive growth but also save you tons of grief.

Table of Contents

  1. The Art of Capacity Planning
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. Preface
      1. Why I Wrote This Book
      2. Focus and Topics
      3. Audience for This Book
      4. Organization of the Material
      5. Conventions Used in This Book
      6. Using Code Examples
      7. We'd Like to Hear from You
      8. Safari® Books Online
      9. Acknowledgments
    3. 1. Goals, Issues, and Processes in Capacity Planning
      1. Quick and Dirty Math
      2. Predicting When Your Systems Will Fail
      3. Make Your System Stats Tell Stories
      4. Buying Stuff: Procurement Is a Process
      5. Performance and Capacity: Two Different Animals
      6. The Effects of Social Websites and Open APIs
    4. 2. Setting Goals for Capacity
      1. Different Kinds of Requirements and Measurements
        1. Interpreting Formal Measurements
        2. Service Level Agreements
        3. Business Capacity Requirements
        4. User Expectations
      2. Architecture Decisions
        1. Providing Measurement Points
        2. Providing Scaling Points
        3. Hardware Decisions (Vertical, Horizontal, and Diagonal Scaling)
        4. Disaster Recovery
    5. 3. Measurement: Units of Capacity
      1. Aspects of Capacity Tracking Tools
        1. Fundamentals and Elements of Metric Collection Systems
        2. Round-Robin Database and RRDTool
        3. Ganglia
        4. SNMP
        5. Treating Logs As Past Metrics
        6. Monitoring As a Tool for Urgent Problem Identification
        7. Network Measurement and Planning
        8. Load Balancing
      2. Applications of Monitoring
        1. Application-Level Measurement
        2. Storage Capacity
          1. Consumption rates
          2. A real-world example: Tracking storage consumption
          3. Storage I/O patterns
          4. Logs and backup: The metacapacity issue
          5. Measuring loads on web servers
          6. A real-world example: Web server measurement
          7. Finding web server ceilings in a load-balancing environment
        3. Database Capacity
          1. A real-world example: Database measurement
          2. Finding database ceilings
        4. Caching Systems
          1. Cache efficiency: Working sets and dynamic data
        5. Establishing Caching System Ceilings
          1. A real-world example: Cache measurement
        6. Special Use and Multiple Use Servers
      3. API Usage and Its Effect on Capacity
      4. Examples and Reality
      5. Summary
    6. 4. Predicting Trends
      1. Riding Your Waves
        1. Trends, Curves, and Time
        2. Tying Application Level Metrics to System Statistics: Database Example
        3. Forecasting Peak-Driven Resource Usage: Web Server Example
        4. Caveats Concerning Small Data Sets
        5. Automating the Forecasting
        6. Safety Factors
      2. Procurement
        1. Procurement Time: The Killer Metric
        2. Just-In-Time Inventory
      3. The Effects of Increasing Capacity
      4. Long-Term Trends
        1. Traffic Pattern Changes
        2. Application Usage Changes and Product Planning
      5. Iteration and Calibration
        1. Best Guesses
        2. Diagonal Scaling Opportunities
      6. Summary
    7. 5. Deployment
      1. Automated Deployment Philosophies
        1. Goal: Minimize Time to Provision New Capacity
        2. Goal: All Changes Happen in One Place
        3. Goal: Never Log In to an Individual Server (for Management)
        4. Goal: Have New Servers Start Working Automatically
        5. Maintain Consistency for Easier Troubleshooting
      2. Automated Installation Tools
        1. Preparing the OS Image
        2. The Installation Process
      3. Automated Configuration
        1. Defining Roles and Services
        2. An Example: Splitting Off Static Web Content
        3. User Management and Access Control
        4. Ad Hockery
        5. Example 2: Multiple Data Centers
      4. Summary
    8. A. Virtualization and Cloud Computing
      1. Virtualization
      2. Cloud Computing
        1. Computing Resource Evolutions
        2. Mixed Definitions
        3. Cloud Capacity
          1. Use it or lose it (your wallet)
          2. Measuring the clouds
        4. Cloud Case Studies
        5. Cloud Use Case: Anonymous Desktop Software Company
          1. Lack of suitable SLAs
          2. Legal concerns of user data
          3. Cost
          4. Control and confidence
        6. Cloud Use Case: WordPress.com
        7. Cloud Use Case: Anonymous News Aggregation Site
        8. Cloud Use Case: SmugMug.com
          1. Capacity feedback loops
      3. Summary
    9. B. Dealing with Instantaneous Growth
      1. Mitigating Failure
        1. Disabling Heavy Features
        2. Baked Static Pages
        3. Cache But Serve Stale
      2. Handling Outages
    10. C. Capacity Tools
      1. Monitoring
        1. Metric Collection and Event Notification Systems
        2. Ad Hoc Measurement and Graphing Tools
      2. Deployment Tools
        1. Automated OS Installation
        2. Configuration Management
        3. Cluster Management
        4. Inventory Management
        5. Trend Analysis and Curve Fitting
        6. Books on Queuing Theory and the Mathematics of Capacity Planning
    11. Index
    12. About the Author
    13. COLOPHON