Performance Modeling and Design of Computer Systems

Book Description

Tackling the questions that systems designers care about, this book brings queueing theory decisively back to computer science. It is written with computer scientists and engineers in mind and is full of examples from computer systems, as well as from manufacturing and operations research. Fun and readable, the book is highly approachable, even for undergraduates, while remaining thoroughly rigorous and covering a much wider span of topics than many queueing books. Readers benefit from a lively mix of motivation and intuition, with illustrations, examples, and more than 300 exercises, all while acquiring the skills needed to model, analyze, and design large-scale systems with good performance and low cost. The exercises are an important feature, teaching research-level counterintuitive lessons in the design of computer systems. The goal is to train readers not only to customize existing analyses but also to invent their own.

Table of Contents

  1. Cover
  2. Performance Modeling and Design of Computer Systems
  3. Title Page
  4. Copyright
  5. Dedication
  6. Epigraph
  7. Table of Contents
  8. Preface
  9. Acknowledgments
  10. I Introduction to Queueing
    1. 1 Motivating Examples of the Power of Analytical Modeling
      1. 1.1 What Is Queueing Theory?
      2. 1.2 Examples of the Power of Queueing Theory
    2. 2 Queueing Theory Terminology
      1. 2.1 Where We Are Heading
      2. 2.2 The Single-Server Network
      3. 2.3 Classification of Queueing Networks
      4. 2.4 Open Networks
      5. 2.5 More Metrics: Throughput and Utilization
      6. 2.6 Closed Networks
        1. 2.6.1 Interactive (Terminal-Driven) Systems
        2. 2.6.2 Batch Systems
        3. 2.6.3 Throughput in a Closed System
      7. 2.7 Differences between Closed and Open Networks
        1. 2.7.1 A Question on Modeling
      8. 2.8 Related Readings
      9. 2.9 Exercises
  11. II Necessary Probability Background
    1. 3 Probability Review
      1. 3.1 Sample Space and Events
      2. 3.2 Probability Defined on Events
      3. 3.3 Conditional Probabilities on Events
      4. 3.4 Independent Events and Conditionally Independent Events
      5. 3.5 Law of Total Probability
      6. 3.6 Bayes Law
      7. 3.7 Discrete versus Continuous Random Variables
      8. 3.8 Probabilities and Densities
        1. 3.8.1 Discrete: Probability Mass Function
        2. 3.8.2 Continuous: Probability Density Function
      9. 3.9 Expectation and Variance
      10. 3.10 Joint Probabilities and Independence
      11. 3.11 Conditional Probabilities and Expectations
      12. 3.12 Probabilities and Expectations via Conditioning
      13. 3.13 Linearity of Expectation
      14. 3.14 Normal Distribution
        1. 3.14.1 Linear Transformation Property
        2. 3.14.2 Central Limit Theorem
      15. 3.15 Sum of a Random Number of Random Variables
      16. 3.16 Exercises
    2. 4 Generating Random Variables for Simulation
      1. 4.1 Inverse-Transform Method
        1. 4.1.1 The Continuous Case
        2. 4.1.2 The Discrete Case
      2. 4.2 Accept-Reject Method
        1. 4.2.1 Discrete Case
        2. 4.2.2 Continuous Case
        3. 4.2.3 Some Harder Problems
      3. 4.3 Readings
      4. 4.4 Exercises
    3. 5 Sample Paths, Convergence, and Averages
      1. 5.1 Convergence
      2. 5.2 Strong and Weak Laws of Large Numbers
      3. 5.3 Time Average versus Ensemble Average
        1. 5.3.1 Motivation
        2. 5.3.2 Definition
        3. 5.3.3 Interpretation
        4. 5.3.4 Equivalence
        5. 5.3.5 Simulation
        6. 5.3.6 Average Time in System
      4. 5.4 Related Readings
      5. 5.5 Exercise
  12. III The Predictive Power of Simple Operational Laws: “What-If” Questions and Answers
    1. 6 Little’s Law and Other Operational Laws
      1. 6.1 Little’s Law for Open Systems
      2. 6.2 Intuitions
      3. 6.3 Little’s Law for Closed Systems
      4. 6.4 Proof of Little’s Law for Open Systems
        1. 6.4.1 Statement via Time Averages
        2. 6.4.2 Proof
        3. 6.4.3 Corollaries
      5. 6.5 Proof of Little’s Law for Closed Systems
        1. 6.5.1 Statement via Time Averages
        2. 6.5.2 Proof
      6. 6.6 Generalized Little’s Law
      7. 6.7 Examples Applying Little’s Law
      8. 6.8 More Operational Laws: The Forced Flow Law
      9. 6.9 Combining Operational Laws
      10. 6.10 Device Demands
      11. 6.11 Readings and Further Topics Related to Little’s Law
      12. 6.12 Exercises
    2. 7 Modification Analysis: “What-If” for Closed Systems
      1. 7.1 Review
      2. 7.2 Asymptotic Bounds for Closed Systems
      3. 7.3 Modification Analysis for Closed Systems
      4. 7.4 More Modification Analysis Examples
      5. 7.5 Comparison of Closed and Open Networks
      6. 7.6 Readings
      7. 7.7 Exercises
  13. IV From Markov Chains to Simple Queues
    1. 8 Discrete-Time Markov Chains
      1. 8.1 Discrete-Time versus Continuous-Time Markov Chains
      2. 8.2 Definition of a DTMC
      3. 8.3 Examples of Finite-State DTMCs
        1. 8.3.1 Repair Facility Problem
        2. 8.3.2 Umbrella Problem
        3. 8.3.3 Program Analysis Problem
      4. 8.4 Powers of P: n-Step Transition Probabilities
      5. 8.5 Stationary Equations
      6. 8.6 The Stationary Distribution Equals the Limiting Distribution
      7. 8.7 Examples of Solving Stationary Equations
        1. 8.7.1 Repair Facility Problem with Cost
        2. 8.7.2 Umbrella Problem
      8. 8.8 Infinite-State DTMCs
      9. 8.9 Infinite-State Stationarity Result
      10. 8.10 Solving Stationary Equations in Infinite-State DTMCs
      11. 8.11 Exercises
    2. 9 Ergodicity Theory
      1. 9.1 Ergodicity Questions
      2. 9.2 Finite-State DTMCs
        1. 9.2.1 Existence of the Limiting Distribution
        2. 9.2.2 Mean Time between Visits to a State
        3. 9.2.3 Time Averages
      3. 9.3 Infinite-State Markov Chains
        1. 9.3.1 Recurrent versus Transient
        2. 9.3.2 Infinite Random Walk Example
        3. 9.3.3 Positive Recurrent versus Null Recurrent
      4. 9.4 Ergodic Theorem of Markov Chains
      5. 9.5 Time Averages
      6. 9.6 Limiting Probabilities Interpreted as Rates
      7. 9.7 Time-Reversibility Theorem
      8. 9.8 When Chains Are Periodic or Not Irreducible
        1. 9.8.1 Periodic Chains
        2. 9.8.2 Chains that Are Not Irreducible
      9. 9.9 Conclusion
      10. 9.10 Proof of Ergodic Theorem of Markov Chains*
      11. 9.11 Exercises
    3. 10 Real-World Examples: Google, Aloha, and Harder Chains*
      1. 10.1 Google’s PageRank Algorithm
        1. 10.1.1 Google’s DTMC Algorithm
        2. 10.1.2 Problems with Real Web Graphs
        3. 10.1.3 Google’s Solution to Dead Ends and Spider Traps
        4. 10.1.4 Evaluation of the PageRank Algorithm
        5. 10.1.5 Practical Implementation Considerations
      2. 10.2 Aloha Protocol Analysis
        1. 10.2.1 The Slotted Aloha Protocol
        2. 10.2.2 The Aloha Markov Chain
        3. 10.2.3 Properties of the Aloha Markov Chain
        4. 10.2.4 Improving the Aloha Protocol
      3. 10.3 Generating Functions for Harder Markov Chains
        1. 10.3.1 The z-Transform
        2. 10.3.2 Solving the Chain
      4. 10.4 Readings and Summary
      5. 10.5 Exercises
    4. 11 Exponential Distribution and the Poisson Process
      1. 11.1 Definition of the Exponential Distribution
      2. 11.2 Memoryless Property of the Exponential
      3. 11.3 Relating Exponential to Geometric via δ-Steps
      4. 11.4 More Properties of the Exponential
      5. 11.5 The Celebrated Poisson Process
      6. 11.6 Merging Independent Poisson Processes
      7. 11.7 Poisson Splitting
      8. 11.8 Uniformity
      9. 11.9 Exercises
    5. 12 Transition to Continuous-Time Markov Chains
      1. 12.1 Defining CTMCs
      2. 12.2 Solving CTMCs
      3. 12.3 Generalization and Interpretation
        1. 12.3.1 Interpreting the Balance Equations for the CTMC
        2. 12.3.2 Summary Theorem for CTMCs
      4. 12.4 Exercises
    6. 13 M/M/1 and PASTA
      1. 13.1 The M/M/1 Queue
      2. 13.2 Examples Using an M/M/1 Queue
      3. 13.3 PASTA
      4. 13.4 Further Reading
      5. 13.5 Exercises
  14. V Server Farms and Networks: Multi-server, Multi-queue Systems
    1. 14 Server Farms: M/M/k and M/M/k/k
      1. 14.1 Time-Reversibility for CTMCs
      2. 14.2 M/M/k/k Loss System
      3. 14.3 M/M/k
      4. 14.4 Comparison of Three Server Organizations
      5. 14.5 Readings
      6. 14.6 Exercises
    2. 15 Capacity Provisioning for Server Farms
      1. 15.1 What Does Load Really Mean in an M/M/k?
      2. 15.2 The M/M/∞
        1. 15.2.1 Analysis of the M/M/∞
        2. 15.2.2 A First Cut at a Capacity Provisioning Rule for the M/M/k
      3. 15.3 Square-Root Staffing
      4. 15.4 Readings
      5. 15.5 Exercises
    3. 16 Time-Reversibility and Burke’s Theorem
      1. 16.1 More Examples of Finite-State CTMCs
        1. 16.1.1 Networks with Finite Buffer Space
        2. 16.1.2 Batch System with M/M/2 I/O
      2. 16.2 The Reverse Chain
      3. 16.3 Burke’s Theorem
      4. 16.4 An Alternative (Partial) Proof of Burke’s Theorem
      5. 16.5 Application: Tandem Servers
      6. 16.6 General Acyclic Networks with Probabilistic Routing
      7. 16.7 Readings
      8. 16.8 Exercises
    4. 17 Networks of Queues and Jackson Product Form
      1. 17.1 Jackson Network Definition
      2. 17.2 The Arrival Process into Each Server
      3. 17.3 Solving the Jackson Network
      4. 17.4 The Local Balance Approach
      5. 17.5 Readings
      6. 17.6 Exercises
    5. 18 Classed Network of Queues
      1. 18.1 Overview
      2. 18.2 Motivation for Classed Networks
      3. 18.3 Notation and Modeling for Classed Jackson Networks
      4. 18.4 A Single-Server Classed Network
      5. 18.5 Product Form Theorems
      6. 18.6 Examples Using Classed Networks
        1. 18.6.1 Connection-Oriented ATM Network Example
        2. 18.6.2 Distribution of Job Classes Example
        3. 18.6.3 CPU-Bound and I/O-Bound Jobs Example
      7. 18.7 Readings
      8. 18.8 Exercises
    6. 19 Closed Networks of Queues
      1. 19.1 Motivation
      2. 19.2 Product-Form Solution
        1. 19.2.1 Local Balance Equations for Closed Networks
        2. 19.2.2 Example of Deriving Limiting Probabilities
      3. 19.3 Mean Value Analysis (MVA)
        1. 19.3.1 The Arrival Theorem
        2. 19.3.2 Iterative Derivation of Mean Response Time
        3. 19.3.3 An MVA Example
      4. 19.4 Readings
      5. 19.5 Exercises
  15. VI Real-World Workloads: High Variability and Heavy Tails
    1. 20 Tales of Tails: A Case Study of Real-World Workloads
      1. 20.1 Grad School Tales ... Process Migration
      2. 20.2 UNIX Process Lifetime Measurements
      3. 20.3 Properties of the Pareto Distribution
      4. 20.4 The Bounded Pareto Distribution
      5. 20.5 Heavy Tails
      6. 20.6 The Benefits of Active Process Migration
      7. 20.7 Pareto Distributions Are Everywhere
      8. 20.8 Exercises
    2. 21 Phase-Type Distributions and Matrix-Analytic Methods
      1. 21.1 Representing General Distributions by Exponentials
      2. 21.2 Markov Chain Modeling of PH Workloads
      3. 21.3 The Matrix-Analytic Method
      4. 21.4 Analysis of Time-Varying Load
        1. 21.4.1 High-Level Ideas
        2. 21.4.2 The Generator Matrix, Q
        3. 21.4.3 Solving for R
        4. 21.4.4 Finding π₀
        5. 21.4.5 Performance Metrics
      5. 21.5 More Complex Chains
      6. 21.6 Readings and Further Remarks
      7. 21.7 Exercises
    3. 22 Networks with Time-Sharing (PS) Servers (BCMP)
      1. 22.1 Review of Product-Form Networks
      2. 22.2 BCMP Result
        1. 22.2.1 Networks with FCFS Servers
        2. 22.2.2 Networks with PS Servers
      3. 22.3 M/M/1/PS
      4. 22.4 M/Cox/1/PS
      5. 22.5 Tandem Network of M/G/1/PS Servers
      6. 22.6 Network of PS Servers with Probabilistic Routing
      7. 22.7 Readings
      8. 22.8 Exercises
    4. 23 The M/G/1 Queue and the Inspection Paradox
      1. 23.1 The Inspection Paradox
      2. 23.2 The M/G/1 Queue and Its Analysis
      3. 23.3 Renewal-Reward Theory
      4. 23.4 Applying Renewal-Reward to Get Expected Excess
      5. 23.5 Back to the Inspection Paradox
      6. 23.6 Back to the M/G/1 Queue
      7. 23.7 Exercises
    5. 24 Task Assignment Policies for Server Farms
      1. 24.1 Task Assignment for FCFS Server Farms
      2. 24.2 Task Assignment for PS Server Farms
      3. 24.3 Optimal Server Farm Design
      4. 24.4 Readings and Further Follow-Up
      5. 24.5 Exercises
    6. 25 Transform Analysis
      1. 25.1 Definitions of Transforms and Some Examples
      2. 25.2 Getting Moments from Transforms: Peeling the Onion
      3. 25.3 Linearity of Transforms
      4. 25.4 Conditioning
      5. 25.5 Distribution of Response Time in an M/M/1
      6. 25.6 Combining Laplace and z-Transforms
      7. 25.7 More Results on Transforms
      8. 25.8 Readings
      9. 25.9 Exercises
    7. 26 M/G/1 Transform Analysis
      1. 26.1 The z-Transform of the Number in System
      2. 26.2 The Laplace Transform of Time in System
      3. 26.3 Readings
      4. 26.4 Exercises
    8. 27 Power Optimization Application
      1. 27.1 The Power Optimization Problem
      2. 27.2 Busy Period Analysis of M/G/1
      3. 27.3 M/G/1 with Setup Cost
      4. 27.4 Comparing ON/IDLE versus ON/OFF
      5. 27.5 Readings
      6. 27.6 Exercises
  16. VII Smart Scheduling in the M/G/1
    1. 28 Performance Metrics
      1. 28.1 Traditional Metrics
      2. 28.2 Commonly Used Metrics for Single Queues
      3. 28.3 Today’s Trendy Metrics
      4. 28.4 Starvation/Fairness Metrics
      5. 28.5 Deriving Performance Metrics
      6. 28.6 Readings
    2. 29 Scheduling: Non-Preemptive, Non-Size-Based Policies
      1. 29.1 FCFS, LCFS, and RANDOM
      2. 29.2 Readings
      3. 29.3 Exercises
    3. 30 Scheduling: Preemptive, Non-Size-Based Policies
      1. 30.1 Processor-Sharing (PS)
        1. 30.1.1 Motivation behind PS
        2. 30.1.2 Ages of Jobs in the M/G/1/PS System
        3. 30.1.3 Response Time as a Function of Job Size
        4. 30.1.4 Intuition for PS Results
        5. 30.1.5 Implications of PS Results for Understanding FCFS
      2. 30.2 Preemptive-LCFS
      3. 30.3 FB Scheduling
      4. 30.4 Readings
      5. 30.5 Exercises
    4. 31 Scheduling: Non-Preemptive, Size-Based Policies
      1. 31.1 Priority Queueing
      2. 31.2 Non-Preemptive Priority
      3. 31.3 Shortest-Job-First (SJF)
      4. 31.4 The Problem with Non-Preemptive Policies
      5. 31.5 Exercises
    5. 32 Scheduling: Preemptive, Size-Based Policies
      1. 32.1 Motivation
      2. 32.2 Preemptive Priority Queueing
      3. 32.3 Preemptive-Shortest-Job-First (PSJF)
      4. 32.4 Transform Analysis of PSJF
      5. 32.5 Exercises
    6. 33 Scheduling: SRPT and Fairness
      1. 33.1 Shortest-Remaining-Processing-Time (SRPT)
      2. 33.2 Precise Derivation of SRPT Waiting Time*
      3. 33.3 Comparisons with Other Policies
        1. 33.3.1 Comparison with PSJF
        2. 33.3.2 SRPT versus FB
        3. 33.3.3 Comparison of All Scheduling Policies
      4. 33.4 Fairness of SRPT
      5. 33.5 Readings
  17. Bibliography
  18. Index