You are previewing Distributed Computing.
O'Reilly logo
Distributed Computing

Book Description

Designing distributed computing systems is a complex process requiring a solid understanding of the design problems and the theoretical and practical aspects of their solutions. This comprehensive textbook covers the fundamental principles and models underlying the theory, algorithms and systems aspects of distributed computing. Broad and detailed coverage of the theory is balanced with practical systems-related issues such as mutual exclusion, deadlock detection, authentication, and failure recovery. Algorithms are carefully selected, lucidly presented, and described without complex proofs. Simple explanations and illustrations are used to elucidate the algorithms. Important emerging topics such as peer-to-peer networks and network security are also considered. With vital algorithms, numerous illustrations, examples and homework problems, this textbook is suitable for advanced undergraduate and graduate students of electrical and computer engineering and computer science. Practitioners in data networking and sensor networks will also find this a valuable resource. Additional resources are available online at www.cambridge.org/9780521876346.

Table of Contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright
  5. Dedication
  6. Contents
  7. Preface
  8. 1. Introduction
    1. 1.1 Definition
    2. 1.2 Relation to computer system components
    3. 1.3 Motivation
    4. 1.4 Relation to parallel multiprocessor/multicomputer systems
    5. 1.5 Message-passing systems versus shared memory systems
    6. 1.6 Primitives for distributed communication
    7. 1.7 Synchronous versus asynchronous executions
    8. 1.8 Design issues and challenges
    9. 1.9 Selection and coverage of topics
    10. 1.10 Chapter summary
    11. 1.11 Exercises
    12. 1.12 Notes on references
    13. References
  9. 2. A model of distributed computations
    1. 2.1 A distributed program
    2. 2.2 A model of distributed executions
    3. 2.3 Models of communication networks
    4. 2.4 Global state of a distributed system
    5. 2.5 Cuts of a distributed computation
    6. 2.6 Past and future cones of an event
    7. 2.7 Models of process communications
    8. 2.8 Chapter summary
    9. 2.9 Exercises
    10. 2.10 Notes on references
    11. References
  10. 3. Logical time
    1. 3.1 Introduction
    2. 3.2 A framework for a system of logical clocks
    3. 3.3 Scalar time
    4. 3.4 Vector time
    5. 3.5 Efficient implementations of vector clocks
    6. 3.6 Jard–Jourdan’s adaptive technique
    7. 3.7 Matrix time
    8. 3.8 Virtual time
    9. 3.9 Physical clock synchronization: NTP
    10. 3.10 Chapter summary
    11. 3.11 Exercises
    12. 3.12 Notes on references
    13. References
  11. 4. Global state and snapshot recording algorithms
    1. 4.1 Introduction
    2. 4.2 System model and definitions
    3. 4.3 Snapshot algorithms for FIFO channels
    4. 4.4 Variations of the Chandy–Lamport algorithm
    5. 4.5 Snapshot algorithms for non-FIFO channels
    6. 4.6 Snapshots in a causal delivery system
    7. 4.7 Monitoring global state
    8. 4.8 Necessary and sufficient conditions for consistent global snapshots
    9. 4.9 Finding consistent global snapshots in a distributed computation
    10. 4.10 Chapter summary
    11. 4.11 Exercises
    12. 4.12 Notes on references
    13. References
  12. 5. Terminology and basic algorithms
    1. 5.1 Topology abstraction and overlays
    2. 5.2 Classifications and basic concepts
    3. 5.3 Complexity measures and metrics
    4. 5.4 Program structure
    5. 5.5 Elementary graph algorithms
    6. 5.6 Synchronizers
    7. 5.7 Maximal independent set (MIS)
    8. 5.8 Connected dominating set
    9. 5.9 Compact routing tables
    10. 5.10 Leader election
    11. 5.11 Challenges in designing distributed graph algorithms
    12. 5.12 Object replication problems
    13. 5.13 Chapter summary
    14. 5.14 Exercises
    15. 5.15 Notes on references
    16. References
  13. 6. Message ordering and group communication
    1. 6.1 Message ordering paradigms
    2. 6.2 Asynchronous execution with synchronous communication
    3. 6.3 Synchronous program order on an asynchronous system
    4. 6.4 Group communication
    5. 6.5 Causal order (CO)
    6. 6.6 Total order
    7. 6.7 A nomenclature for multicast
    8. 6.8 Propagation trees for multicast
    9. 6.9 Classification of application-level multicast algorithms
    10. 6.10 Semantics of fault-tolerant group communication
    11. 6.11 Distributed multicast algorithms at the network layer
    12. 6.12 Chapter summary
    13. 6.13 Exercises
    14. 6.14 Notes on references
    15. References
  14. 7. Termination detection
    1. 7.1 Introduction
    2. 7.2 System model of a distributed computation
    3. 7.3 Termination detection using distributed snapshots
    4. 7.4 Termination detection by weight throwing
    5. 7.5 A spanning-tree-based termination detection algorithm
    6. 7.6 Message-optimal termination detection
    7. 7.7 Termination detection in a very general distributed computing model
    8. 7.8 Termination detection in the atomic computation model
    9. 7.9 Termination detection in a faulty distributed system
    10. 7.10 Chapter summary
    11. 7.11 Exercises
    12. 7.12 Notes on references
    13. References
  15. 8. Reasoning with knowledge
    1. 8.1 The muddy children puzzle
    2. 8.2 Logic of knowledge
    3. 8.3 Knowledge in synchronous systems
    4. 8.4 Knowledge in asynchronous systems
    5. 8.5 Knowledge transfer
    6. 8.6 Knowledge and clocks
    7. 8.7 Chapter summary
    8. 8.8 Exercises
    9. 8.9 Notes on references
    10. References
  16. 9. Distributed mutual exclusion algorithms
    1. 9.1 Introduction
    2. 9.2 Preliminaries
    3. 9.3 Lamport’s algorithm
    4. 9.4 Ricart–Agrawala algorithm
    5. 9.5 Singhal’s dynamic information-structure algorithm
    6. 9.6 Lodha and Kshemkalyani’s fair mutual exclusion algorithm
    7. 9.7 Quorum-based mutual exclusion algorithms
    8. 9.8 Maekawa’s algorithm
    9. 9.9 Agarwal–El Abbadi quorum-based algorithm
    10. 9.10 Token-based algorithms
    11. 9.11 Suzuki–Kasami’s broadcast algorithm
    12. 9.12 Raymond’s tree-based algorithm
    13. 9.13 Chapter summary
    14. 9.14 Exercises
    15. 9.15 Notes on references
    16. References
  17. 10. Deadlock detection in distributed systems
    1. 10.1 Introduction
    2. 10.2 System model
    3. 10.3 Preliminaries
    4. 10.4 Models of deadlocks
    5. 10.5 Knapp’s classification of distributed deadlock detection algorithms
    6. 10.6 Mitchell and Merritt’s algorithm for the single-resource model
    7. 10.7 Chandy–Misra–Haas algorithm for the AND model
    8. 10.8 Chandy–Misra–Haas algorithm for the OR model
    9. 10.9 Kshemkalyani–Singhal algorithm for the P-out-of-Q model
    10. 10.10 Chapter summary
    11. 10.11 Exercises
    12. 10.12 Notes on references
    13. References
  18. 11. Global predicate detection
    1. 11.1 Stable and unstable predicates
    2. 11.2 Modalities on predicates
    3. 11.3 Centralized algorithm for relational predicates
    4. 11.4 Conjunctive predicates
    5. 11.5 Distributed algorithms for conjunctive predicates
    6. 11.6 Further classification of predicates
    7. 11.7 Chapter summary
    8. 11.8 Exercises
    9. 11.9 Notes on references
    10. References
  19. 12. Distributed shared memory
    1. 12.1 Abstraction and advantages
    2. 12.2 Memory consistency models
    3. 12.3 Shared memory mutual exclusion
    4. 12.4 Wait-freedom
    5. 12.5 Register hierarchy and wait-free simulations
    6. 12.6 Wait-free atomic snapshots of shared objects
    7. 12.7 Chapter summary
    8. 12.8 Exercises
    9. 12.9 Notes on references
    10. References
  20. 13. Checkpointing and rollback recovery
    1. 13.1 Introduction
    2. 13.2 Background and definitions
    3. 13.3 Issues in failure recovery
    4. 13.4 Checkpoint-based recovery
    5. 13.5 Log-based rollback recovery
    6. 13.6 Koo–Toueg coordinated checkpointing algorithm
    7. 13.7 Juang–Venkatesan algorithm for asynchronous checkpointing and recovery
    8. 13.8 Manivannan–Singhal quasi-synchronous checkpointing algorithm
    9. 13.9 Peterson–Kearns algorithm based on vector time
    10. 13.10 Helary–Mostefaoui–Netzer–Raynal communication-induced protocol
    11. 13.11 Chapter summary
    12. 13.12 Exercises
    13. 13.13 Notes on references
    14. References
  21. 14. Consensus and agreement algorithms
    1. 14.1 Problem definition
    2. 14.2 Overview of results
    3. 14.3 Agreement in a failure-free system (synchronous or asynchronous)
    4. 14.4 Agreement in (message-passing) synchronous systems with failures
    5. 14.5 Agreement in asynchronous message-passing systems with failures
    6. 14.6 Wait-free shared memory consensus in asynchronous systems
    7. 14.7 Chapter summary
    8. 14.8 Exercises
    9. 14.9 Notes on references
    10. References
  22. 15. Failure detectors
    1. 15.1 Introduction
    2. 15.2 Unreliable failure detectors
    3. 15.3 The consensus problem
    4. 15.4 Atomic broadcast
    5. 15.5 A solution to atomic broadcast
    6. 15.6 The weakest failure detectors to solve fundamental agreement problems
    7. 15.7 An implementation of a failure detector
    8. 15.8 An adaptive failure detection protocol
    9. 15.9 Exercises
    10. 15.10 Notes on references
    11. References
  23. 16. Authentication in distributed systems
    1. 16.1 Introduction
    2. 16.2 Background and definitions
    3. 16.3 Protocols based on symmetric cryptosystems
    4. 16.4 Protocols based on asymmetric cryptosystems
    5. 16.5 Password-based authentication
    6. 16.6 Authentication protocol failures
    7. 16.7 Chapter summary
    8. 16.8 Exercises
    9. 16.9 Notes on references
    10. References
  24. 17. Self-stabilization
    1. 17.1 Introduction
    2. 17.2 System model
    3. 17.3 Definition of self-stabilization
    4. 17.4 Issues in the design of self-stabilization algorithms
    5. 17.5 Methodologies for designing self-stabilizing systems
    6. 17.6 Communication protocols
    7. 17.7 Self-stabilizing distributed spanning trees
    8. 17.8 Self-stabilizing algorithms for spanning-tree construction
    9. 17.9 An anonymous self-stabilizing algorithm for 1-maximal independent set in trees
    10. 17.10 A probabilistic self-stabilizing leader election algorithm
    11. 17.11 The role of compilers in self-stabilization
    12. 17.12 Self-stabilization as a solution to fault tolerance
    13. 17.13 Factors preventing self-stabilization
    14. 17.14 Limitations of self-stabilization
    15. 17.15 Chapter summary
    16. 17.16 Exercises
    17. 17.17 Notes on references
    18. References
  25. 18. Peer-to-peer computing and overlay graphs
    1. 18.1 Introduction
    2. 18.2 Data indexing and overlays
    3. 18.3 Unstructured overlays
    4. 18.4 Chord distributed hash table
    5. 18.5 Content addressible networks (CAN)
    6. 18.6 Tapestry
    7. 18.7 Some other challenges in P2P system design
    8. 18.8 Tradeoffs between table storage and route lengths
    9. 18.9 Graph structures of complex networks
    10. 18.10 Internet graphs
    11. 18.11 Generalized random graph networks
    12. 18.12 Small-world networks
    13. 18.13 Scale-free networks
    14. 18.14 Evolving networks
    15. 18.15 Chapter summary
    16. 18.16 Exercises
    17. 18.17 Notes on references
    18. References
  26. Index