The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise, Second Edition

Book Description

A Comprehensive, Proven Approach to IT Scalability from Two Veteran Software, Technology, and Business Executives

In this second edition of The Art of Scalability, AKF Partners cofounders Martin L. Abbott and Michael T. Fisher cover everything product, technology, and business leaders must know to build products that can scale smoothly to meet any business requirement. Drawing on their experience managing some of the world’s highest-transaction-volume Web sites, the authors provide detailed models and best-practice approaches available in no other book.

Unlike previous books on scalability, The Art of Scalability doesn’t limit its coverage to technology. Writing for both technical and nontechnical decision-makers, this book covers everything that impacts scalability, including architecture, processes, people, and organizations. This second edition has been edited to improve readability and includes new and updated content, a new chapter on Agile architecture, and new case studies.

Throughout, the authors address a broad spectrum of real-world challenges, from performance testing to IT governance. Using their tools and guidance, organizations can systematically overcome obstacles to scalability and achieve unprecedented levels of technical and business performance.

New and updated coverage includes:

· Staffing the scalable organization: essential organizational, management, and leadership skills for technical leaders, with a special focus on how to build truly Agile organizations

· How to organize teams to maximize innovation and reduce value-destroying team conflict

· Building processes for scale: process lessons from hyper-growth companies, from technical issue resolution to crisis management

· How to manage risk and how to create effective change management in the age of continuous deployment

· Architecting scalable solutions: powerful proprietary models for identifying scalability needs and choosing the best approaches to meet them

· How to practice “Agile” architecture

· Optimizing performance through caching, application and database splitting, and asynchronous design

· New and updated scalability techniques for emerging technologies, including clouds and grids

· Planning for rapid data growth and new data centers

· Evolving monitoring strategies to tightly align with customer requirements

Table of Contents

  1. About This eBook
  2. Title Page
  3. Copyright Page
  4. Praise for The Art of Scalability, Second Edition
  5. Praise for the First Edition
  6. Dedication Page
  7. Contents
  8. Foreword
  9. Acknowledgments
  10. About the Authors
  11. Introduction
    1. Scalability: So Much More Than Just Technology
    2. Art Versus Science
    3. Who Needs Scalability?
    4. Book Organization and Structure
  12. Part I: Staffing a Scalable Organization
    1. Chapter 1. The Impact of People and Leadership on Scalability
      1. The Case Method
      2. Why People?
      3. Why Organizations?
      4. Why Management and Leadership?
      5. Conclusion
        1. Key Points
    2. Chapter 2. Roles for the Scalable Technology Organization
      1. The Effects of Failure
      2. Defining Roles
      3. Executive Responsibilities
        1. Chief Executive Officer
        2. Chief Financial Officer
        3. Business Unit Owners, General Managers, and P&L Owners
        4. Chief Technology Officer/Chief Information Officer
      4. Individual Contributor Responsibilities
        1. Architecture Responsibilities
        2. Engineering Responsibilities
        3. DevOps Responsibilities
        4. Infrastructure Responsibilities
        5. Quality Assurance Responsibilities
        6. Capacity Planning Responsibilities
      5. A Tool for Defining Responsibilities
      6. Conclusion
        1. Key Points
    3. Chapter 3. Designing Organizations
      1. Organizational Influences That Affect Scalability
      2. Team Size
        1. Warning Signs
        2. Growing or Splitting Teams
      3. Organizational Structure
        1. Functional Organization
        2. Matrix Organization
        3. Agile Organization
      4. Conclusion
        1. Key Points
    4. Chapter 4. Leadership 101
      1. What Is Leadership?
      2. Leadership: A Conceptual Model
      3. Taking Stock of Who You Are
      4. Leading from the Front
      5. Checking Your Ego at the Door
      6. Mission First, People Always
      7. Making Timely, Sound, and Morally Correct Decisions
      8. Empowering Teams and Scalability
      9. Alignment with Shareholder Value
      10. Transformational Leadership
      11. Vision
      12. Mission
      13. Goals
      14. Putting It All Together
      15. The Causal Roadmap to Success
      16. Conclusion
        1. Key Points
    5. Chapter 5. Management 101
      1. What Is Management?
      2. Project and Task Management
      3. Building Teams: A Sports Analogy
      4. Upgrading Teams: A Garden Analogy
      5. Measurement, Metrics, and Goal Evaluation
      6. The Goal Tree
      7. Paving the Path for Success
      8. Conclusion
        1. Key Points
    6. Chapter 6. Relationships, Mindset, and the Business Case
      1. Understanding the Experiential Chasm
        1. Why the Business Executive Might Be the Problem
        2. Why the Technology Executive Might Be the Problem
      2. Defeating the IT Mindset
      3. The Business Case for Scale
      4. Conclusion
        1. Key Points
  13. Part II: Building Processes for Scale
    1. Chapter 7. Why Processes Are Critical to Scale
      1. The Purpose of Process
      2. Right Time, Right Process
        1. A Process Maturity Framework
        2. When to Implement Processes
        3. Process Complexity
      3. When Good Processes Go Bad
      4. Conclusion
        1. Key Points
    2. Chapter 8. Managing Incidents and Problems
      1. What Is an Incident?
      2. What Is a Problem?
      3. The Components of Incident Management
      4. The Components of Problem Management
      5. Resolving Conflicts Between Incident and Problem Management
      6. Incident and Problem Life Cycles
      7. Implementing the Daily Incident Meeting
      8. Implementing the Quarterly Incident Review
      9. The Postmortem Process
      10. Putting It All Together
      11. Conclusion
        1. Key Points
    3. Chapter 9. Managing Crises and Escalations
      1. What Is a Crisis?
      2. Why Differentiate a Crisis from Any Other Incident?
      3. How Crises Can Change a Company
      4. Order Out of Chaos
        1. The Role of the Problem Manager
        2. The Role of Team Managers
        3. The Role of Engineering Leads
        4. The Role of Individual Contributors
      5. Communications and Control
      6. The War Room
      7. Escalations
      8. Status Communications
      9. Crisis Postmortem and Communication
      10. Conclusion
        1. Key Points
    4. Chapter 10. Controlling Change in Production Environments
      1. What Is a Change?
      2. Change Identification
      3. Change Management
        1. Change Proposal
        2. Change Approval
        3. Change Scheduling
        4. Change Implementation and Logging
        5. Change Validation
        6. Change Review
      4. The Change Control Meeting
      5. Continuous Process Improvement
      6. Conclusion
        1. Key Points
    5. Chapter 11. Determining Headroom for Applications
      1. Purpose of the Process
      2. Structure of the Process
      3. Ideal Usage Percentage
      4. A Quick Example Using Spreadsheets
      5. Conclusion
        1. Key Points
    6. Chapter 12. Establishing Architectural Principles
      1. Principles and Goals
      2. Principle Selection
      3. AKF’s Most Commonly Adopted Architectural Principles
        1. N + 1 Design
        2. Design for Rollback
        3. Design to Be Disabled
        4. Design to Be Monitored
        5. Design for Multiple Live Sites
        6. Use Mature Technologies
        7. Asynchronous Design
        8. Stateless Systems
        9. Scale Out, Not Up
        10. Design for at Least Two Axes of Scale
        11. Buy When Non-Core
        12. Use Commodity Hardware
        13. Build Small, Release Small, Fail Fast
        14. Isolate Faults
        15. Automation over People
      4. Conclusion
        1. Key Points
    7. Chapter 13. Joint Architecture Design and Architecture Review Board
      1. Fixing Organizational Dysfunction
      2. Designing for Scale Cross-Functionally
      3. JAD Entry and Exit Criteria
      4. From JAD to ARB
      5. Conducting the Meeting
      6. ARB Entry and Exit Criteria
      7. Conclusion
        1. Key Points
    8. Chapter 14. Agile Architecture Design
      1. Architecture in Agile Organizations
      2. Ownership of Architecture
      3. Limited Resources
      4. Standards
      5. ARB in the Agile Organization
      6. Conclusion
        1. Key Points
    9. Chapter 15. Focus on Core Competencies: Build Versus Buy
      1. Building Versus Buying, and Scalability
      2. Focusing on Cost
      3. Focusing on Strategy
      4. “Not Built Here” Phenomenon
      5. Merging Cost and Strategy
      6. Does This Component Create Strategic Competitive Differentiation?
      7. Are We the Best Owners of This Component or Asset?
      8. What Is the Competition for This Component?
      9. Can We Build This Component Cost-Effectively?
      10. The Best Buy Decision Ever
      11. Anatomy of a Build-It-Yourself Failure
      12. Conclusion
        1. Key Points
    10. Chapter 16. Determining Risk
      1. Importance of Risk Management to Scale
      2. Measuring Risk
      3. Managing Risk
      4. Conclusion
        1. Key Points
    11. Chapter 17. Performance and Stress Testing
      1. Performing Performance Testing
        1. Establish Success Criteria
        2. Establish the Appropriate Environment
        3. Define the Tests
        4. Execute the Tests
        5. Analyze the Data
        6. Report to Engineers
        7. Repeat the Tests and Analysis
      2. Don’t Stress over Stress Testing
        1. Identify the Objectives
        2. Identify the Key Services
        3. Determine the Load
        4. Establish the Appropriate Environment
        5. Identify the Monitors
        6. Create the Load
        7. Execute the Tests
        8. Analyze the Data
      3. Performance and Stress Testing for Scalability
      4. Conclusion
        1. Key Points
    12. Chapter 18. Barrier Conditions and Rollback
      1. Barrier Conditions
        1. Barrier Conditions and Agile Development
        2. Barrier Conditions and Waterfall Development
        3. Barrier Conditions and Hybrid Models
      2. Rollback Capabilities
        1. Rollback Window
        2. Rollback Technology Considerations
        3. Cost Considerations of Rollback
      3. Markdown Functionality: Design to Be Disabled
      4. Conclusion
        1. Key Points
    13. Chapter 19. Fast or Right?
      1. Tradeoffs in Business
      2. Relation to Scalability
      3. How to Think About the Decision
      4. Conclusion
        1. Key Points
  14. Part III: Architecting Scalable Solutions
    1. Chapter 20. Designing for Any Technology
      1. An Implementation Is Not an Architecture
      2. Technology-Agnostic Design
        1. TAD and Cost
        2. TAD and Risk
        3. TAD and Scalability
        4. TAD and Availability
      3. The TAD Approach
      4. Conclusion
        1. Key Points
    2. Chapter 21. Creating Fault-Isolative Architectural Structures
      1. Fault-Isolative Architecture Terms
      2. Benefits of Fault Isolation
        1. Fault Isolation and Availability: Limiting Impact
        2. Fault Isolation and Availability: Incident Detection and Resolution
        3. Fault Isolation and Scalability
        4. Fault Isolation and Time to Market
        5. Fault Isolation and Cost
      3. How to Approach Fault Isolation
        1. Principle 1: Nothing Is Shared
        2. Principle 2: Nothing Crosses a Swim Lane Boundary
        3. Principle 3: Transactions Occur Along Swim Lanes
      4. When to Implement Fault Isolation
        1. Approach 1: Swim Lane the Money-Maker
        2. Approach 2: Swim Lane the Biggest Sources of Incidents
        3. Approach 3: Swim Lane Along Natural Barriers
      5. How to Test Fault-Isolative Designs
      6. Conclusion
        1. Key Points
    3. Chapter 22. Introduction to the AKF Scale Cube
      1. The AKF Scale Cube
      2. The x-Axis of the Cube
      3. The y-Axis of the Cube
      4. The z-Axis of the Cube
      5. Putting It All Together
      6. When and Where to Use the Cube
      7. Conclusion
        1. Key Points
    4. Chapter 23. Splitting Applications for Scale
      1. The AKF Scale Cube for Applications
      2. The x-Axis of the AKF Application Scale Cube
      3. The y-Axis of the AKF Application Scale Cube
      4. The z-Axis of the AKF Application Scale Cube
      5. Putting It All Together
      6. Practical Use of the Application Cube
        1. Observations
      7. Conclusion
        1. Key Points
    5. Chapter 24. Splitting Databases for Scale
      1. Applying the AKF Scale Cube to Databases
      2. The x-Axis of the AKF Database Scale Cube
      3. The y-Axis of the AKF Database Scale Cube
      4. The z-Axis of the AKF Database Scale Cube
      5. Putting It All Together
      6. Practical Use of the Database Cube
        1. Ecommerce Implementation
        2. Search Implementation
        3. Business-to-Business SaaS Solution
        4. Observations
        5. Timeline Considerations
      7. Conclusion
        1. Key Points
    6. Chapter 25. Caching for Performance and Scale
      1. Caching Defined
      2. Object Caches
      3. Application Caches
        1. Proxy Caches
        2. Reverse Proxy Caches
        3. Caching Software
      4. Content Delivery Networks
      5. Conclusion
        1. Key Points
    7. Chapter 26. Asynchronous Design for Scale
      1. Synching Up on Synchronization
      2. Synchronous Versus Asynchronous Calls
        1. Scaling Synchronously or Asynchronously
        2. Example Asynchronous Systems
      3. Defining State
      4. Conclusion
        1. Key Points
  15. Part IV: Solving Other Issues and Challenges
    1. Chapter 27. Too Much Data
      1. The Cost of Data
      2. The Value of Data and the Cost-Value Dilemma
      3. Making Data Profitable
        1. Option Value
        2. Strategic Competitive Differentiation
        3. Cost-Justify the Solution (Tiered Storage Solutions)
        4. Transform the Data
      4. Handling Large Amounts of Data
        1. Big Data
        2. A NoSQL Primer
      5. Conclusion
        1. Key Points
    2. Chapter 28. Grid Computing
      1. History of Grid Computing
      2. Pros and Cons of Grids
        1. Pros of Grids
        2. Cons of Grids
      3. Different Uses for Grid Computing
        1. Production Grid
        2. Build Grid
        3. Data Warehouse Grid
        4. Back-Office Grid
      4. Conclusion
        1. Key Points
    3. Chapter 29. Soaring in the Clouds
      1. History and Definitions
        1. Public Versus Private Clouds
      2. Characteristics and Architecture of Clouds
        1. Pay by Usage
        2. Scale on Demand
        3. Multiple Tenants
        4. Virtualization
      3. Differences Between Clouds and Grids
      4. Pros and Cons of Cloud Computing
        1. Pros of Cloud Computing
        2. Cons of Cloud Computing
      5. Where Clouds Fit in Different Companies
        1. Environments
        2. Skill Sets
      6. Decision Process
      7. Conclusion
        1. Key Points
    4. Chapter 30. Making Applications Cloud Ready
      1. The Scale Cube in a Cloud
        1. x-Axis
        2. y- and z-Axes
      2. Overcoming Challenges
        1. Fault Isolation in a Cloud
        2. Variability in Input/Output
      3. Intuit Case Study
      4. Conclusion
        1. Key Points
    5. Chapter 31. Monitoring Applications
      1. “Why Didn’t We Catch That Earlier?”
      2. A Framework for Monitoring
        1. User Experience and Business Metrics
        2. Systems Monitoring
        3. Application Monitoring
      3. Measuring Monitoring: What Is and Isn’t Valuable?
      4. Monitoring and Processes
      5. Conclusion
        1. Key Points
    6. Chapter 32. Planning Data Centers
      1. Data Center Costs and Constraints
      2. Location, Location, Location
      3. Data Centers and Incremental Growth
      4. When Do I Consider IaaS?
      5. Three Magic Rules of Three
        1. The First Rule of Three: Three Magic Drivers of Data Center Costs
        2. The Second Rule of Three: Three Is the Magic Number for Servers
        3. The Third Rule of Three: Three Is the Magic Number for Data Centers
      6. Multiple Active Data Center Considerations
      7. Conclusion
        1. Key Points
    7. Chapter 33. Putting It All Together
      1. What to Do Now?
      2. Further Resources on Scalability
        1. Blogs
        2. Books
  16. Part V: Appendices
    1. Appendix A. Calculating Availability
      1. Hardware Uptime
      2. Customer Complaints
      3. Portion of Site Down
      4. Third-Party Monitoring Service
      5. Business Graph
    2. Appendix B. Capacity Planning Calculations
    3. Appendix C. Load and Performance Calculations
  17. Index
  18. Code Snippets