Architecting for Scale

Book Description

Every day, companies struggle to scale critical applications. As traffic volume and data demands increase, these applications become more complicated and brittle, exposing risks and compromising availability. This practical guide shows IT, devops, and system reliability managers how to prevent an application from becoming slow, inconsistent, or downright unavailable as it grows. Author Lee Atchison provides basic techniques for building applications that can handle huge quantities of traffic, data, and demand without affecting the quality your customers expect.

Table of Contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book
    2. Why I Wrote This Book
    3. A Word on Scale Today
    4. Navigating This Book
      1. Part I, “Availability”
      2. Part II, “Risk Management”
      3. Part III, “Services and Microservices”
      4. Part IV, “Scaling Applications”
      5. Part V, “Cloud Services”
      6. Part VI, “Conclusion”
    5. Online Resources
    6. Conventions Used in This Book
    7. Safari® Books Online
    8. How to Contact Us
    9. Acknowledgments
  3. I. Availability
  4. 1. What Is Availability?
    1. Availability Versus Reliability
    2. What Causes Poor Availability?
  5. 2. Five Focuses to Improve Application Availability
    1. Focus #1: Build with Failure in Mind
    2. Focus #2: Always Think About Scaling
    3. Focus #3: Mitigate Risk
    4. Focus #4: Monitor Availability
    5. Focus #5: Respond to Availability Issues in a Predictable and Defined Way
    6. Being Prepared
  6. 3. Measuring Availability
    1. The Nines
      1. What’s Reasonable?
    2. Don’t Be Fooled
    3. Availability by the Numbers
  7. 4. Improving Your Availability When It Slips
    1. Measure and Track Your Current Availability
    2. Automate Your Manual Processes
      1. Automated Deploys
      2. Configuration Management
      3. Change Experiments and High Frequency Changes
      4. Automated Change Sanity Testing
    3. Improve Your Systems
    4. Your Changing and Growing Application
    5. Keeping on Top of Availability
  8. II. Risk Management
  9. 5. What Is Risk Management?
    1. Managing Risk
    2. Identify Risk
    3. Remove Worst Offenders
    4. Mitigate
    5. Review Regularly
    6. Managing Risk Summary
  10. 6. Likelihood Versus Severity
    1. The Top 10 List: Low Likelihood, Low Severity Risk
    2. The Order Database: Low Likelihood, High Severity Risk
    3. Custom Fonts: High Likelihood, Low Severity Risk
    4. T-Shirt Photos: High Likelihood, High Severity Risk
  11. 7. The Risk Matrix
    1. Scope of the Risk Matrix
    2. Creating the Risk Matrix
      1. Brainstorming the List
      2. Set the Likelihood and Severity Fields
      3. Risk Item Details
      4. Mitigation Plan
      5. Triggered Plan
    3. Using the Risk Matrix for Planning
    4. Maintaining the Risk Matrix
  12. 8. Risk Mitigation
    1. Recovery Plans
    2. Disaster Recovery Plans
    3. Improving Our Risk Situation
  13. 9. Game Days
    1. Staging Versus Production Environments
    2. Concerns with Running Game Days in Production
    3. Game Day Testing
  14. 10. Building Systems with Reduced Risk
    1. Redundancy
    2. Examples of Idempotent Interfaces
    3. Redundancy Improvements That Increase Complexity
    4. Independence
    5. Security
    6. Simplicity
    7. Self-Repair
    8. Operational Processes
  15. III. Services and Microservices
  16. 11. Why Use Services?
    1. The Monolith Application
    2. The Service-Based Application
    3. The Ownership Benefit
    4. The Scaling Benefit
  17. 12. Using Microservices
    1. What Should Be a Service?
      1. Dividing into Services
      2. Guideline #1: Specific Business Requirements
      3. Guideline #2: Distinct and Separable Team Ownership
      4. Guideline #3: Naturally Separable Data
      5. Guideline #4: Shared Capabilities/Data
      6. Mixed Reasons
    2. Going Too Far
    3. The Right Balance
  18. 13. Dealing with Service Failures
    1. Cascading Service Failures
    2. Responding to a Service Failure
      1. Predictable Response
      2. Understandable Response
      3. Reasonable Response
    3. Determining Failures
    4. Appropriate Action
      1. Graceful Degradation
      2. Graceful Backoff
      3. Fail as Early as Possible
      4. Customer-Caused Problems
  19. IV. Scaling Applications
  20. 14. Two Mistakes High
    1. What Is “Two Mistakes High”?
    2. “Two Mistakes High” in Practice
      1. Losing a Node
      2. Problems During Upgrades
      3. Data Center Resiliency
      4. Hidden Shared Failure Types
      5. Failure Loops
    3. Managing Your Applications
    4. The Space Shuttle
  21. 15. Service Ownership
    1. Single Team Owned Service Architecture
    2. Advantages of a STOSA Application and Organization
    3. What Does It Mean to Be a Service Owner?
  22. 16. Service Tiers
    1. Application Complexity
    2. What Are Service Tiers?
    3. Assigning Service Tier Labels to Services
      1. Tier 1
      2. Tier 2
      3. Tier 3
      4. Tier 4
    4. Example: Online Store
    5. What’s Next?
  23. 17. Using Service Tiers
    1. Expectations
    2. Responsiveness
    3. Dependencies
      1. Critical Dependency
      2. Noncritical Dependency
    4. Summary
  24. 18. Service-Level Agreements
    1. What Are Service-Level Agreements?
    2. External Versus Internal SLAs
    3. Why Are Internal SLAs Important?
    4. SLAs as Trust
    5. SLAs for Problem Diagnosis
    6. Performance Measurements for SLAs
      1. Limit SLAs
      2. Top Percentile SLAs
      3. Latency Groups
    7. How Many and Which Internal SLAs?
    8. Additional Comments on SLAs
  25. 19. Continuous Improvement
    1. Examine Your Application Regularly
    2. Microservices
    3. Service Ownership
    4. Stateless Services
    5. Where’s the Data?
    6. Data Partitioning
    7. The Importance of Continuous Improvement
  26. V. Cloud Services
  27. 20. Change and the Cloud
    1. What Has Changed in the Cloud?
      1. Acceptance of Microservice-Based Architectures
      2. Smaller, More Specialized Services
      3. Greater Focus on the Application
      4. The Micro Startup
      5. Security and Compliance Has Matured
    2. Change Continues
  28. 21. Distributing the Cloud
    1. AWS Architecture
      1. AWS Region
      2. AWS Availability Zone
      3. Data Center
    2. Architecture Overview
    3. Availability Zones Are Not Data Centers
    4. Maintaining Location Diversity for Availability Reasons
  29. 22. Managed Infrastructure
    1. Structure of Cloud-Based Services
      1. Raw Resource
      2. Managed Resource (Server-Based)
      3. Managed Resource (Non-server-based)
    2. Implications of Using Managed Resources
    3. Implications of Using Non-Managed Resources
    4. Monitoring and CloudWatch
  30. 23. Cloud Resource Allocation
    1. Allocated-Capacity Resource Allocation
      1. Changing Allocations
      2. Reserved Capacity
    2. Usage-Based Resource Allocation
      1. The “Magic” of Usage-Based Resource Allocation
    3. The Pros and Cons of Resource Allocation Techniques
  31. 24. Scalable Computing Options
    1. Cloud-Based Servers
      1. Advantages
      2. Disadvantages
      3. Optimized Use Cases
    2. Compute Slices
      1. Advantages
      2. Disadvantages
      3. Optimized Use Cases
    3. Dynamic Containers
      1. Advantages
      2. Disadvantages
      3. Optimized Use Cases
    4. Microcompute
      1. Advantages
      2. Disadvantages
      3. Optimized Use Cases
    5. Now What?
  32. 25. AWS Lambda
    1. Using Lambda
      1. Event Processing
      2. Mobile Backend
      3. Internet of Things Data Intake
    2. Advantages and Disadvantages of Lambda
  33. VI. Conclusion
  34. 26. Putting It All Together
    1. Availability
    2. Risk Management
    3. Services
    4. Scaling
    5. Cloud
    6. Architecting for Scale
  35. Index