The Art of Concurrency

Book Description

If you're looking to take full advantage of multi-core processors with concurrent programming, this practical book provides the knowledge and hands-on experience you need. The Art of Concurrency is one of the few resources to focus on implementing algorithms in the shared-memory model of multi-core processors, rather than just theoretical models or distributed-memory architectures. The book provides detailed explanations and usable samples to help you transform algorithms from serial to parallel code, along with advice and analysis for avoiding mistakes that programmers typically make when first attempting these computations. Written by an Intel engineer with over two decades of parallel and concurrent programming experience, this book will help you:

  • Understand parallelism and concurrency

  • Explore the differences between programming for shared-memory and for distributed-memory architectures

  • Learn guidelines for designing multithreaded applications, including testing and tuning

  • Discover how to make best use of different threading libraries, including Windows threads, POSIX threads, OpenMP, and Intel Threading Building Blocks

  • Explore how to implement concurrent algorithms that involve sorting, searching, graphs, and other practical computations

The Art of Concurrency shows you how to keep algorithms scalable so they can take advantage of new processors with ever more cores. For anyone developing parallel algorithms and concurrent code, this book is a must.

Table of Contents

  1. Dedication
  2. Special Upgrade Offer
  3. A Note Regarding Supplemental Files
  4. Preface
    1. Why Should You Read This Book?
    2. Who Is This Book For?
    3. What’s in This Book?
    4. Conventions Used in This Book
    5. Using Code Examples
    6. Comments and Questions
    7. Safari® Books Online
    8. Acknowledgments
  5. 1. Want to Go Faster? Raise Your Hands if You Want to Go Faster!
    1. Some Questions You May Have
      1. What Is a Thread Monkey?
      2. Parallelism and Concurrency: What’s the Difference?
      3. Why Do I Need to Know This? What’s in It for Me?
      4. Isn’t Concurrent Programming Hard?
      5. Aren’t Threads Dangerous?
    2. Four Steps of a Threading Methodology
      1. Step 1. Analysis: Identify Possible Concurrency
      2. Step 2. Design and Implementation: Threading the Algorithm
      3. Step 3. Test for Correctness: Detecting and Fixing Threading Errors
      4. Step 4. Tune for Performance: Removing Performance Bottlenecks
        1. The testing and tuning cycle
      5. What About Concurrency from Scratch?
    3. Background of Parallel Algorithms
      1. Theoretical Models
      2. Distributed-Memory Programming
      3. Parallel Algorithms Literature
    4. Shared-Memory Programming Versus Distributed-Memory Programming
      1. Common Features
        1. Redundant work
        2. Dividing work
        3. Sharing data
        4. Static/dynamic allocation of work
      2. Features Unique to Shared Memory
        1. Local declarations and thread-local storage
        2. Memory effects
        3. Communication in memory
        4. Mutual exclusion
        5. Producer/consumer
        6. Readers/writer locks
    5. This Book’s Approach to Concurrent Programming
  6. 2. Concurrent or Not Concurrent?
    1. Design Models for Concurrent Algorithms
      1. Task Decomposition
        1. What are the tasks and how are they defined?
        2. What are the dependencies between tasks and how can they be satisfied?
        3. How are the tasks assigned to threads?
        4. Example: numerical integration
      2. Data Decomposition
        1. How should you divide the data into chunks?
        2. How can you ensure that the tasks for each chunk have access to all data required for updates?
        3. How are the data chunks (and tasks) assigned to threads?
        4. Example: Game of Life on a finite grid
      3. Concurrent Design Models Wrap-Up
    2. What’s Not Parallel
      1. Algorithms with State
      2. Recurrences
      3. Induction Variables
      4. Reduction
      5. Loop-Carried Dependence
        1. Not-so-typical loop-carried dependence
  7. 3. Proving Correctness and Measuring Performance
    1. Verification of Parallel Algorithms
    2. Example: The Critical Section Problem
      1. First Attempt
      2. Second Attempt
      3. Third Attempt
      4. Fourth Attempt
      5. Dekker’s Algorithm
        1. Case 1
        2. Case 2a: T0 is the favored thread
        3. Case 2b: T1 is the favored thread
        4. Case 3
        5. What about indefinite postponement?
      6. What Did You Learn?
      7. There Are No Evil Threads, Just Threads Programmed for Evil
    3. Performance Metrics (How Am I Doing?)
      1. Speedup
        1. Amdahl’s Law
        2. Gustafson-Barsis’s Law
      2. Efficiency
      3. One Final Note on Speedup and Efficiency
    4. Review of the Evolution for Supporting Parallelism in Hardware
  8. 4. Eight Simple Rules for Designing Multithreaded Applications
    1. Rule 1: Identify Truly Independent Computations
    2. Rule 2: Implement Concurrency at the Highest Level Possible
    3. Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores
    4. Rule 4: Make Use of Thread-Safe Libraries Wherever Possible
    5. Rule 5: Use the Right Threading Model
    6. Rule 6: Never Assume a Particular Order of Execution
    7. Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data
    8. Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency
    9. Summary
  9. 5. Threading Libraries
    1. Implicit Threading
      1. OpenMP
      2. Intel Threading Building Blocks
    2. Explicit Threading
      1. Pthreads
      2. Windows Threads
    3. What Else Is Out There?
    4. Domain-Specific Libraries
  10. 6. Parallel Sum and Prefix Scan
    1. Parallel Sum
      1. PRAM Algorithm
        1. A dash of reality
      2. A More Practical Algorithm
      3. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    2. Prefix Scan
      1. PRAM Algorithm
        1. A less heavy dash of reality
      2. A More Practical Algorithm
        1. What the main thread does
        2. What the spawned threads are doing
      3. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    3. Selection
      1. The Serial Algorithm
      2. The Concurrent Algorithm
        1. Finding the medians of subsequences
        2. Counting and marking elements for partitions
        3. The ArrayPack() function
      3. Some Design Notes
    4. A Final Thought
  11. 7. MapReduce
    1. Map As a Concurrent Operation
      1. Implementing a Concurrent Map
    2. Reduce As a Concurrent Operation
      1. Handcoded Reduction
      2. A Barrier Object Implementation
      3. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    3. Applying MapReduce
      1. Friendly Numbers Example Summary
    4. MapReduce As Generic Concurrency
  12. 8. Sorting
    1. Bubblesort
      1. Will It Work?
      2. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    2. Odd-Even Transposition Sort
      1. A Concurrent Code for Odd-Even Transposition Sort
      2. Trying to Push the Concurrency Higher
        1. Keeping threads awake longer without caffeine
      3. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    3. Shellsort
      1. Quick Review of Insertion Sort
      2. Serial Shellsort
      3. Concurrent Shellsort
      4. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    4. Quicksort
      1. Concurrency Within Recursion
      2. Concurrency Within an Iterative Version
        1. Iterative Quicksort
        2. Concurrent iterative version
          1. Letting threads know the work is done
          2. Finding work for threads
          3. Giving threads their pink slips
      3. Final Threaded Version
      4. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    5. Radix Sort
      1. Radix Exchange Sort
      2. Straight Radix Sort
        1. Using prefix scan to gather keys
        2. Keeping data movement stable
        3. Reducing the number of data touches
      3. The Concurrent Straight Radix Sort Solution
      4. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
  13. 9. Searching
    1. Unsorted Sequence
      1. Curtailing the Search
      2. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
    2. Binary Search
      1. But First, a Serial Version
      2. At Last, the Concurrent Solution
      3. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
  14. 10. Graph Algorithms
    1. Depth-First Search
      1. A Recursive Solution
      2. An Iterative Solution
      3. Not the Concurrent Solution, Yet
        1. How many locks do we need?
        2. Locking a conditional expression evaluation
      4. Now for the Concurrent Solution
        1. A little interleaving analysis
        2. Spawning the depth-first search threads
      5. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
      6. Breadth-First Search
        1. It’s all in the queue
      7. Static Graphs Versus Dynamic Graphs
    2. All-Pairs Shortest Path
      1. What About the Data Race on the kth Row?
      2. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
      3. Alternatives to Floyd’s Algorithm
    3. Minimum Spanning Tree
      1. Kruskal’s Algorithm
      2. Prim’s Algorithm
      3. Which Serial Algorithm Should We Start With?
      4. Concurrent Version of Prim’s Algorithm
      5. Design Factor Scorecard
        1. Efficiency
        2. Simplicity
        3. Portability
        4. Scalability
  15. 11. Threading Tools
    1. Debuggers
      1. Thread-Aware Debugger
      2. Thread Issue Debugger: Thread Checker
    2. Performance Tools
      1. Profiling
      2. Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler
    3. Anything Else Out There?
    4. Go Forth and Conquer
  16. Glossary
  17. A. Photo Credits
  18. Index
  19. About the Author
  20. Colophon
  21. Special Upgrade Offer
  22. Copyright