The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems, Volume 2

Book Description

“There’s an incredible amount of depth and thinking in the practices described here, and it’s impressive to see it all in one place.”

—Win Treese, coauthor of Designing Systems for Internet Commerce

The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.

Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:

Designing and building modern web and distributed systems

  • Fundamentals of large system design

  • Understand the new software engineering implications of cloud administration

  • Make systems that are resilient to failure and grow and scale dynamically

  • Implement DevOps principles and cultural changes

  • IaaS/PaaS/SaaS and virtual platform selection

Operating and running systems using the latest DevOps/SRE strategies

  • Upgrade production systems with zero downtime

  • What and how to automate; how to decide what not to automate

  • On-call best practices that improve uptime

  • Why distributed systems require fundamentally different system administration techniques

  • Identify and resolve resiliency problems before they surprise you

Assessing and evaluating your team’s operational effectiveness

  • Manage the scientific process of continuous improvement

  • A forty-page, pain-free assessment system you can start using today

Table of Contents

    1. About This eBook
    2. Title Page
    3. Copyright Page
    4. Contents at a Glance
    5. Contents
    6. Preface
      1. About This Book
      2. Acknowledgments
        1. Part I Design: Building It
        2. Part II Operations: Running It
        3. Part III Appendices
    7. About the Authors
    8. Introduction
      1. Business Objectives
      2. Ideal System Architecture
      3. Ideal Release Process
      4. Ideal Operations
    9. Part I: Design: Building It
      1. Chapter 1. Designing in a Distributed World
        1. 1.1 Visibility at Scale
        2. 1.2 The Importance of Simplicity
        3. 1.3 Composition
          1. 1.3.1 Load Balancer with Multiple Backend Replicas
          2. 1.3.2 Server with Multiple Backends
          3. 1.3.3 Server Tree
        4. 1.4 Distributed State
        5. 1.5 The CAP Principle
          1. 1.5.1 Consistency
          2. 1.5.2 Availability
          3. 1.5.3 Partition Tolerance
        6. 1.6 Loosely Coupled Systems
        7. 1.7 Speed
        8. 1.8 Summary
        9. Exercises
      2. Chapter 2. Designing for Operations
        1. 2.1 Operational Requirements
          1. 2.1.1 Configuration
          2. 2.1.2 Startup and Shutdown
          3. 2.1.3 Queue Draining
          4. 2.1.4 Software Upgrades
          5. 2.1.5 Backups and Restores
          6. 2.1.6 Redundancy
          7. 2.1.7 Replicated Databases
          8. 2.1.8 Hot Swaps
          9. 2.1.9 Toggles for Individual Features
          10. 2.1.10 Graceful Degradation
          11. 2.1.11 Access Controls and Rate Limits
          12. 2.1.12 Data Import Controls
          13. 2.1.13 Monitoring
          14. 2.1.14 Auditing
          15. 2.1.15 Debug Instrumentation
          16. 2.1.16 Exception Collection
          17. 2.1.17 Documentation for Operations
        2. 2.2 Implementing Design for Operations
          1. 2.2.1 Build Features in from the Beginning
          2. 2.2.2 Request Features as They Are Identified
          3. 2.2.3 Write the Features Yourself
          4. 2.2.4 Work with a Third-Party Vendor
        3. 2.3 Improving the Model
        4. 2.4 Summary
        5. Exercises
      3. Chapter 3. Selecting a Service Platform
        1. 3.1 Level of Service Abstraction
          1. 3.1.1 Infrastructure as a Service
          2. 3.1.2 Platform as a Service
          3. 3.1.3 Software as a Service
        2. 3.2 Type of Machine
          1. 3.2.1 Physical Machines
          2. 3.2.2 Virtual Machines
          3. 3.2.3 Containers
        3. 3.3 Level of Resource Sharing
          1. 3.3.1 Compliance
          2. 3.3.2 Privacy
          3. 3.3.3 Cost
          4. 3.3.4 Control
        4. 3.4 Colocation
        5. 3.5 Selection Strategies
        6. 3.6 Summary
        7. Exercises
      4. Chapter 4. Application Architectures
        1. 4.1 Single-Machine Web Server
        2. 4.2 Three-Tier Web Service
          1. 4.2.1 Load Balancer Types
          2. 4.2.2 Load Balancing Methods
          3. 4.2.3 Load Balancing with Shared State
          4. 4.2.4 User Identity
          5. 4.2.5 Scaling
        3. 4.3 Four-Tier Web Service
          1. 4.3.1 Frontends
          2. 4.3.2 Application Servers
          3. 4.3.3 Configuration Options
        4. 4.4 Reverse Proxy Service
        5. 4.5 Cloud-Scale Service
          1. 4.5.1 Global Load Balancer
          2. 4.5.2 Global Load Balancing Methods
          3. 4.5.3 Global Load Balancing with User-Specific Data
          4. 4.5.4 Internal Backbone
        6. 4.6 Message Bus Architectures
          1. 4.6.1 Message Bus Designs
          2. 4.6.2 Message Bus Reliability
          3. 4.6.3 Example 1: Link-Shortening Site
          4. 4.6.4 Example 2: Employee Human Resources Data Updates
        7. 4.7 Service-Oriented Architecture
          1. 4.7.1 Flexibility
          2. 4.7.2 Support
          3. 4.7.3 Best Practices
        8. 4.8 Summary
        9. Exercises
      5. Chapter 5. Design Patterns for Scaling
        1. 5.1 General Strategy
          1. 5.1.1 Identify Bottlenecks
          2. 5.1.2 Reengineer Components
          3. 5.1.3 Measure Results
          4. 5.1.4 Be Proactive
        2. 5.2 Scaling Up
        3. 5.3 The AKF Scaling Cube
          1. 5.3.1 x: Horizontal Duplication
          2. 5.3.2 y: Functional or Service Splits
          3. 5.3.3 z: Lookup-Oriented Split
          4. 5.3.4 Combinations
        4. 5.4 Caching
          1. 5.4.1 Cache Effectiveness
          2. 5.4.2 Cache Placement
          3. 5.4.3 Cache Persistence
          4. 5.4.4 Cache Replacement Algorithms
          5. 5.4.5 Cache Entry Invalidation
          6. 5.4.6 Cache Size
        5. 5.5 Data Sharding
        6. 5.6 Threading
        7. 5.7 Queueing
          1. 5.7.1 Benefits
          2. 5.7.2 Variations
        8. 5.8 Content Delivery Networks
        9. 5.9 Summary
        10. Exercises
      6. Chapter 6. Design Patterns for Resiliency
        1. 6.1 Software Resiliency Beats Hardware Reliability
        2. 6.2 Everything Malfunctions Eventually
          1. 6.2.1 MTBF in Distributed Systems
          2. 6.2.2 The Traditional Approach
          3. 6.2.3 The Distributed Computing Approach
        3. 6.3 Resiliency through Spare Capacity
          1. 6.3.1 How Much Spare Capacity
          2. 6.3.2 Load Sharing versus Hot Spares
        4. 6.4 Failure Domains
        5. 6.5 Software Failures
          1. 6.5.1 Software Crashes
          2. 6.5.2 Software Hangs
          3. 6.5.3 Query of Death
        6. 6.6 Physical Failures
          1. 6.6.1 Parts and Components
          2. 6.6.2 Machines
          3. 6.6.3 Load Balancers
          4. 6.6.4 Racks
          5. 6.6.5 Datacenters
        7. 6.7 Overload Failures
          1. 6.7.1 Traffic Surges
          2. 6.7.2 DoS and DDoS Attacks
          3. 6.7.3 Scraping Attacks
        8. 6.8 Human Error
        9. 6.9 Summary
        10. Exercises
    10. Part II Operations: Running It
      1. Chapter 7. Operations in a Distributed World
        1. 7.1 Distributed Systems Operations
          1. 7.1.1 SRE versus Traditional Enterprise IT
          2. 7.1.2 Change versus Stability
          3. 7.1.3 Defining SRE
          4. 7.1.4 Operations at Scale
        2. 7.2 Service Life Cycle
          1. 7.2.1 Service Launches
          2. 7.2.2 Service Decommissioning
        3. 7.3 Organizing Strategy for Operational Teams
          1. 7.3.1 Team Member Day Types
          2. 7.3.2 Other Strategies
        4. 7.4 Virtual Office
          1. 7.4.1 Communication Mechanisms
          2. 7.4.2 Communication Policies
        5. 7.5 Summary
        6. Exercises
      2. Chapter 8. DevOps Culture
        1. 8.1 What Is DevOps?
          1. 8.1.1 The Traditional Approach
          2. 8.1.2 The DevOps Approach
        2. 8.2 The Three Ways of DevOps
          1. 8.2.1 The First Way: Workflow
          2. 8.2.2 The Second Way: Improve Feedback
          3. 8.2.3 The Third Way: Continual Experimentation and Learning
          4. 8.2.4 Small Batches Are Better
          5. 8.2.5 Adopting the Strategies
        3. 8.3 History of DevOps
          1. 8.3.1 Evolution
          2. 8.3.2 Site Reliability Engineering
        4. 8.4 DevOps Values and Principles
          1. 8.4.1 Relationships
          2. 8.4.2 Integration
          3. 8.4.3 Automation
          4. 8.4.4 Continuous Improvement
          5. 8.4.5 Common Nontechnical DevOps Practices
          6. 8.4.6 Common Technical DevOps Practices
          7. 8.4.7 Release Engineering DevOps Practices
        5. 8.5 Converting to DevOps
          1. 8.5.1 Getting Started
          2. 8.5.2 DevOps at the Business Level
        6. 8.6 Agile and Continuous Delivery
          1. 8.6.1 What Is Agile?
          2. 8.6.2 What Is Continuous Delivery?
        7. 8.7 Summary
        8. Exercises
      3. Chapter 9. Service Delivery: The Build Phase
        1. 9.1 Service Delivery Strategies
          1. 9.1.1 Pattern: Modern DevOps Methodology
          2. 9.1.2 Anti-pattern: Waterfall Methodology
        2. 9.2 The Virtuous Cycle of Quality
        3. 9.3 Build-Phase Steps
          1. 9.3.1 Develop
          2. 9.3.2 Commit
          3. 9.3.3 Build
          4. 9.3.4 Package
          5. 9.3.5 Register
        4. 9.4 Build Console
        5. 9.5 Continuous Integration
        6. 9.6 Packages as Handoff Interface
        7. 9.7 Summary
        8. Exercises
      4. Chapter 10. Service Delivery: The Deployment Phase
        1. 10.1 Deployment-Phase Steps
          1. 10.1.1 Promotion
          2. 10.1.2 Installation
          3. 10.1.3 Configuration
        2. 10.2 Testing and Approval
          1. 10.2.1 Testing
          2. 10.2.2 Approval
        3. 10.3 Operations Console
        4. 10.4 Infrastructure Automation Strategies
          1. 10.4.1 Preparing Physical Machines
          2. 10.4.2 Preparing Virtual Machines
          3. 10.4.3 Installing OS and Services
        5. 10.5 Continuous Delivery
        6. 10.6 Infrastructure as Code
        7. 10.7 Other Platform Services
        8. 10.8 Summary
        9. Exercises
      5. Chapter 11. Upgrading Live Services
        1. 11.1 Taking the Service Down for Upgrading
        2. 11.2 Rolling Upgrades
        3. 11.3 Canary
        4. 11.4 Phased Roll-outs
        5. 11.5 Proportional Shedding
        6. 11.6 Blue-Green Deployment
        7. 11.7 Toggling Features
        8. 11.8 Live Schema Changes
        9. 11.9 Live Code Changes
        10. 11.10 Continuous Deployment
        11. 11.11 Dealing with Failed Code Pushes
        12. 11.12 Release Atomicity
        13. 11.13 Summary
        14. Exercises
      6. Chapter 12. Automation
        1. 12.1 Approaches to Automation
          1. 12.1.1 The Left-Over Principle
          2. 12.1.2 The Compensatory Principle
          3. 12.1.3 The Complementarity Principle
          4. 12.1.4 Automation for System Administration
          5. 12.1.5 Lessons Learned
        2. 12.2 Tool Building versus Automation
          1. 12.2.1 Example: Auto Manufacturing
          2. 12.2.2 Example: Machine Configuration
          3. 12.2.3 Example: Account Creation
          4. 12.2.4 Tools Are Good, But Automation Is Better
        3. 12.3 Goals of Automation
        4. 12.4 Creating Automation
          1. 12.4.1 Making Time to Automate
          2. 12.4.2 Reducing Toil
          3. 12.4.3 Determining What to Automate First
        5. 12.5 How to Automate
        6. 12.6 Language Tools
          1. 12.6.1 Shell Scripting Languages
          2. 12.6.2 Scripting Languages
          3. 12.6.3 Compiled Languages
          4. 12.6.4 Configuration Management Languages
        7. 12.7 Software Engineering Tools and Techniques
          1. 12.7.1 Issue Tracking Systems
          2. 12.7.2 Version Control Systems
          3. 12.7.3 Software Packaging
          4. 12.7.4 Style Guides
          5. 12.7.5 Test-Driven Development
          6. 12.7.6 Code Reviews
          7. 12.7.7 Writing Just Enough Code
        8. 12.8 Multitenant Systems
        9. 12.9 Summary
        10. Exercises
      7. Chapter 13. Design Documents
        1. 13.1 Design Documents Overview
          1. 13.1.1 Documenting Changes and Rationale
          2. 13.1.2 Documentation as a Repository of Past Decisions
        2. 13.2 Design Document Anatomy
        3. 13.3 Template
        4. 13.4 Document Archive
        5. 13.5 Review Workflows
          1. 13.5.1 Reviewers and Approvers
          2. 13.5.2 Achieving Sign-off
        6. 13.6 Adopting Design Documents
        7. 13.7 Summary
        8. Exercises
      8. Chapter 14. Oncall
        1. 14.1 Designing Oncall
          1. 14.1.1 Start with the SLA
          2. 14.1.2 Oncall Roster
          3. 14.1.3 Onduty
          4. 14.1.4 Oncall Schedule Design
          5. 14.1.5 The Oncall Calendar
          6. 14.1.6 Oncall Frequency
          7. 14.1.7 Types of Notifications
          8. 14.1.8 After-Hours Maintenance Coordination
        2. 14.2 Being Oncall
          1. 14.2.1 Pre-shift Responsibilities
          2. 14.2.2 Regular Oncall Responsibilities
          3. 14.2.3 Alert Responsibilities
          4. 14.2.4 Observe, Orient, Decide, Act (OODA)
          5. 14.2.5 Oncall Playbook
          6. 14.2.6 Third-Party Escalation
          7. 14.2.7 End-of-Shift Responsibilities
        3. 14.3 Between Oncall Shifts
          1. 14.3.1 Long-Term Fixes
          2. 14.3.2 Postmortems
        4. 14.4 Periodic Review of Alerts
        5. 14.5 Being Paged Too Much
        6. 14.6 Summary
        7. Exercises
      9. Chapter 15. Disaster Preparedness
        1. 15.1 Mindset
          1. 15.1.1 Antifragile Systems
          2. 15.1.2 Reducing Risk
        2. 15.2 Individual Training: Wheel of Misfortune
        3. 15.3 Team Training: Fire Drills
          1. 15.3.1 Service Testing
          2. 15.3.2 Random Testing
        4. 15.4 Training for Organizations: Game Day/DiRT
          1. 15.4.1 Getting Started
          2. 15.4.2 Increasing Scope
          3. 15.4.3 Implementation and Logistics
          4. 15.4.4 Experiencing a DiRT Test
        5. 15.5 Incident Command System
          1. 15.5.1 How It Works: Public Safety Arena
          2. 15.5.2 How It Works: IT Operations Arena
          3. 15.5.3 Incident Action Plan
          4. 15.5.4 Best Practices
          5. 15.5.5 ICS Example
        6. 15.6 Summary
        7. Exercises
      10. Chapter 16. Monitoring Fundamentals
        1. 16.1 Overview
          1. 16.1.1 Uses of Monitoring
          2. 16.1.2 Service Management
        2. 16.2 Consumers of Monitoring Information
        3. 16.3 What to Monitor
        4. 16.4 Retention
        5. 16.5 Meta-monitoring
        6. 16.6 Logs
          1. 16.6.1 Approach
          2. 16.6.2 Timestamps
        7. 16.7 Summary
        8. Exercises
      11. Chapter 17. Monitoring Architecture and Practice
        1. 17.1 Sensing and Measurement
          1. 17.1.1 Blackbox versus Whitebox Monitoring
          2. 17.1.2 Direct versus Synthesized Measurements
          3. 17.1.3 Rate versus Capability Monitoring
          4. 17.1.4 Gauges versus Counters
        2. 17.2 Collection
          1. 17.2.1 Push versus Pull
          2. 17.2.2 Protocol Selection
          3. 17.2.3 Server Component versus Agent versus Poller
          4. 17.2.4 Central versus Regional Collectors
        3. 17.3 Analysis and Computation
        4. 17.4 Alerting and Escalation Manager
          1. 17.4.1 Alerting, Escalation, and Acknowledgments
          2. 17.4.2 Silence versus Inhibit
        5. 17.5 Visualization
          1. 17.5.1 Percentiles
          2. 17.5.2 Stack Ranking
          3. 17.5.3 Histograms
        6. 17.6 Storage
        7. 17.7 Configuration
        8. 17.8 Summary
        9. Exercises
      12. Chapter 18. Capacity Planning
        1. 18.1 Standard Capacity Planning
          1. 18.1.1 Current Usage
          2. 18.1.2 Normal Growth
          3. 18.1.3 Planned Growth
          4. 18.1.4 Headroom
          5. 18.1.5 Resiliency
          6. 18.1.6 Timetable
        2. 18.2 Advanced Capacity Planning
          1. 18.2.1 Identifying Your Primary Resources
          2. 18.2.2 Knowing Your Capacity Limits
          3. 18.2.3 Identifying Your Core Drivers
          4. 18.2.4 Measuring Engagement
          5. 18.2.5 Analyzing the Data
          6. 18.2.6 Monitoring the Key Indicators
          7. 18.2.7 Delegating Capacity Planning
        3. 18.3 Resource Regression
        4. 18.4 Launching New Services
        5. 18.5 Reduce Provisioning Time
        6. 18.6 Summary
        7. Exercises
      13. Chapter 19. Creating KPIs
        1. 19.1 What Is a KPI?
        2. 19.2 Creating KPIs
          1. 19.2.1 Step 1: Envision the Ideal
          2. 19.2.2 Step 2: Quantify Distance to the Ideal
          3. 19.2.3 Step 3: Imagine How Behavior Will Change
          4. 19.2.4 Step 4: Revise and Select
          5. 19.2.5 Step 5: Deploy the KPI
        3. 19.3 Example KPI: Machine Allocation
          1. 19.3.1 The First Pass
          2. 19.3.2 The Second Pass
          3. 19.3.3 Evaluating the KPI
        4. 19.4 Case Study: Error Budget
          1. 19.4.1 Conflicting Goals
          2. 19.4.2 A Unified Goal
          3. 19.4.3 Everyone Benefits
        5. 19.5 Summary
        6. Exercises
      14. Chapter 20. Operational Excellence
        1. 20.1 What Does Operational Excellence Look Like?
        2. 20.2 How to Measure Greatness
        3. 20.3 Assessment Methodology
          1. 20.3.1 Operational Responsibilities
          2. 20.3.2 Assessment Levels
          3. 20.3.3 Assessment Questions and Look-For’s
        4. 20.4 Service Assessments
          1. 20.4.1 Identifying What to Assess
          2. 20.4.2 Assessing Each Service
          3. 20.4.3 Comparing Results across Services
          4. 20.4.4 Acting on the Results
          5. 20.4.5 Assessment and Project Planning Frequencies
        5. 20.5 Organizational Assessments
        6. 20.6 Levels of Improvement
        7. 20.7 Getting Started
        8. 20.8 Summary
        9. Exercises
      15. Epilogue
    11. Part III Appendices
      1. Appendix A. Assessments
        1. A.1 Regular Tasks (RT)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        2. A.2 Emergency Response (ER)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        3. A.3 Monitoring and Metrics (MM)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        4. A.4 Capacity Planning (CP)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        5. A.5 Change Management (CM)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        6. A.6 New Product Introduction and Removal (NPI/NPR)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        7. A.7 Service Deployment and Decommissioning (SDD)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        8. A.8 Performance and Efficiency (PE)
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        9. A.9 Service Delivery: The Build Phase
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        10. A.10 Service Delivery: The Deployment Phase
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        11. A.11 Toil Reduction
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
        12. A.12 Disaster Preparedness
          1. Sample Assessment Questions
          2. Level 1: Initial
          3. Level 2: Repeatable
          4. Level 3: Defined
          5. Level 4: Managed
          6. Level 5: Optimizing
      2. Appendix B. The Origins and Future of Distributed Computing and Clouds
        1. B.1 The Pre-Web Era (1985–1994)
          1. Availability Requirements
          2. Technology
          3. Scaling
          4. High Availability
          5. Costs
        2. B.2 The First Web Era: The Bubble (1995–2000)
          1. Availability Requirements
          2. Technology
          3. Scaling
          4. High Availability
          5. N + 1 Configurations
          6. N + 2 Configurations
          7. Costs
        3. B.3 The Dot-Bomb Era (2000–2003)
          1. Availability Requirements
          2. Technology
          3. High Availability
          4. Scaling
          5. Data Scaling
          6. Applicability
          7. Costs
        4. B.4 The Second Web Era (2003–2010)
          1. Availability Requirements
          2. Technology
          3. High Availability
          4. Scaling
          5. Costs
        5. B.5 The Cloud Computing Era (2010–present)
          1. Availability Requirements
          2. Costs
          3. Scaling and High Availability
          4. Technology
        6. B.6 Conclusion
        7. Exercises
      3. Appendix C. Scaling Terminology and Concepts
        1. C.1 Constant, Linear, and Exponential Scaling
        2. C.2 Big O Notation
        3. C.3 Limitations of Big O Notation
      4. Appendix D. Templates and Examples
        1. D.1 Design Document Template
        2. D.2 Design Document Example
        3. D.3 Sample Postmortem Template
      5. Appendix E. Recommended Reading
        1. DevOps:
        2. ITIL:
        3. Theory:
        4. Classic Google Papers:
        5. Classic Facebook Papers:
        6. Scalability:
        7. UNIX Internals:
        8. UNIX Systems Programming:
        9. Network Protocols:
    12. Bibliography
    13. Index