The Datacenter as a Computer

Book Description

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope this book will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms that may one day implement the equivalent of today's WSCs on a single board.

Table of Contents

  1. Cover
  2. Synthesis Lectures on Computer Architecture
  3. Copyright
  4. Title Page
  5. Acknowledgments
  6. Contents
  7. 1. Introduction
    1. 1.1 Warehouse-Scale Computers
    2. 1.2 Emphasis on Cost Efficiency
    3. 1.3 Not Just a Collection of Servers
    4. 1.4 One Datacenter vs. Several Datacenters
    5. 1.5 Why WSCs Might Matter to You
    6. 1.6 Architectural Overview of WSCs
      1. 1.6.1 Storage
      2. 1.6.2 Networking Fabric
      3. 1.6.3 Storage Hierarchy
      4. 1.6.4 Quantifying Latency, Bandwidth, and Capacity
      5. 1.6.5 Power Usage
      6. 1.6.6 Handling Failures
  8. 2. Workloads and Software Infrastructure
    1. 2.1 Datacenter vs. Desktop
    2. 2.2 Performance and Availability Toolbox
    3. 2.3 Cluster-Level Infrastructure Software
      1. 2.3.1 Resource Management
      2. 2.3.2 Hardware Abstraction and Other Basic Services
      3. 2.3.3 Deployment and Maintenance
      4. 2.3.4 Programming Frameworks
    4. 2.4 Application-Level Software
      1. 2.4.1 Workload Examples
      2. 2.4.2 Online: Web Search
      3. 2.4.3 Offline: Scholar Article Similarity
    5. 2.5 A Monitoring Infrastructure
      1. 2.5.1 Service-Level Dashboards
      2. 2.5.2 Performance Debugging Tools
      3. 2.5.3 Platform-Level Monitoring
    6. 2.6 Buy vs. Build
    7. 2.7 Further Reading
  9. 3. Hardware Building Blocks
    1. 3.1 Cost-Efficient Hardware
      1. 3.1.1 How About Parallel Application Performance?
      2. 3.1.2 How Low-End Can You Go?
      3. 3.1.3 Balanced Designs
  10. 4. Datacenter Basics
    1. 4.1 Datacenter Tier Classifications
    2. 4.2 Datacenter Power Systems
      1. 4.2.1 UPS Systems
      2. 4.2.2 Power Distribution Units
    3. 4.3 Datacenter Cooling Systems
      1. 4.3.1 CRAC Units
      2. 4.3.2 Free Cooling
      3. 4.3.3 Air Flow Considerations
      4. 4.3.4 In-Rack Cooling
      5. 4.3.5 Container-Based Datacenters
  11. 5. Energy and Power Efficiency
    1. 5.1 Datacenter Energy Efficiency
      1. 5.1.1 Sources of Efficiency Losses in Datacenters
      2. 5.1.2 Improving the Energy Efficiency of Datacenters
    2. 5.2 Measuring the Efficiency of Computing
      1. 5.2.1 Some Useful Benchmarks
      2. 5.2.2 Load vs. Efficiency
    3. 5.3 Energy-Proportional Computing
      1. 5.3.1 Dynamic Power Range of Energy-Proportional Machines
      2. 5.3.2 Causes of Poor Energy Proportionality
      3. 5.3.3 How to Improve Energy Proportionality
    4. 5.4 Relative Effectiveness of Low-Power Modes
    5. 5.5 The Role of Software in Energy Proportionality
    6. 5.6 Datacenter Power Provisioning
      1. 5.6.1 Deployment and Power Management Strategies
      2. 5.6.2 Advantages of Oversubscribing Facility Power
    7. 5.7 Trends in Server Energy Usage
    8. 5.8 Conclusions
      1. 5.8.1 Further Reading
  12. 6. Modeling Costs
    1. 6.1 Capital Costs
    2. 6.2 Operational Costs
    3. 6.3 Case Studies
      1. 6.3.1 Real-World Datacenter Costs
      2. 6.3.2 Modeling a Partially Filled Datacenter
  13. 7. Dealing with Failures and Repairs
    1. 7.1 Implications of Software-Based Fault Tolerance
    2. 7.2 Categorizing Faults
      1. 7.2.1 Fault Severity
      2. 7.2.2 Causes of Service-Level Faults
    3. 7.3 Machine-Level Failures
      1. 7.3.1 What Causes Machine Crashes?
      2. 7.3.2 Predicting Faults
    4. 7.4 Repairs
    5. 7.5 Tolerating Faults, Not Hiding Them
  14. 8. Closing Remarks
    1. 8.1 Hardware
    2. 8.2 Software
    3. 8.3 Economics
    4. 8.4 Key Challenges
      1. 8.4.1 Rapidly Changing Workloads
      2. 8.4.2 Building Balanced Systems from Imbalanced Components
      3. 8.4.3 Curbing Energy Usage
      4. 8.4.4 Amdahl’s Cruel Law
    5. 8.5 Conclusions
  15. References
  16. Author Biographies