You are previewing Intel Threading Building Blocks.
O'Reilly logo
Intel Threading Building Blocks

Book Description

Multi-core chips from Intel and AMD offer a dramatic boost in speed and responsiveness, and plenty of opportunities for multiprocessing on ordinary desktop computers. But they also present a challenge: More than ever, multithreading is a requirement for good performance. This guide explains how to maximize the benefits of these processors through a portable C++ library that works on Windows, Linux, Macintosh, and Unix systems. With it, you'll learn how to use Intel Threading Building Blocks (TBB) effectively for parallel programming -- without having to be a threading expert. Written by James Reinders, Chief Evangelist of Intel Software Products, and based on the experience of Intel's developers and customers, this book explains the key tasks in multithreading and how to accomplish them with TBB in a portable and robust manner. With plenty of examples and full reference material, the book lays out common patterns of uses, reveals the gotchas in TBB, and gives important guidelines for choosing among alternatives in order to get the best performance. You'll learn how Intel Threading Building Blocks:

  • Enables you to specify tasks instead of threads for better portability, easier programming, more understandable source code, and better performance and scalability in general

  • Focuses on the goal of parallelizing computationally intensive work to deliver high-level solutions

  • Is compatible with other threading packages, and doesn't force you to pick one package for your entire program

  • Emphasizes scalable, data-parallel programming, which allows program performance to increase as you add processors

  • Relies on generic programming, which enables you to write the best possible algorithms with the fewest constraints

Any C++ programmer who wants to write an application to run on a multi-core system will benefit from this book. TBB is also very approachable for a C programmer or a C++ programmer without much experience with templates. Best of all, you don't need experience with parallel programming or multi-core processors to use this book.

Table of Contents

  1. Intel Threading Building Blocks
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. Foreword
    3. Note from the Lead Developer of Intel Threading Building Blocks
    4. Preface
      1. Assumptions This Book Makes
      2. Contents of This Book
      3. Conventions Used in This Book
        1. Informal Class Declarations
      4. Using Code Examples
      5. How to Contact Us
      6. Acknowledgments
    5. 1. Why Threading Building Blocks?
      1. Overview
      2. Benefits
        1. Comparison with Raw Threads and MPI
        2. Comparison with OpenMP
        3. Recursive Splitting, Task Stealing, and Algorithms
    6. 2. Thinking Parallel
      1. Elements of Thinking Parallel
      2. Decomposition
        1. Data Parallelism
        2. Task Parallelism
        3. Pipelining (Task and Data Parallelism Together)
        4. Mixed Solutions
        5. Achieving Parallelism
      3. Scaling and Speedup
        1. How Much Parallelism Is There in an Application?
          1. Amdahl’s Law
          2. Gustafson’s observations regarding Amdahl’s Law
          3. What did they really say?
          4. Serial versus parallel algorithms
      4. What Is a Thread?
        1. Programming Threads
        2. Safety in the Presence of Concurrency
      5. Mutual Exclusion and Locks
      6. Correctness
      7. Abstraction
      8. Patterns
      9. Intuition
    7. 3. Basic Algorithms
      1. Initializing and Terminating the Library
      2. Loop Parallelization
        1. parallel_for
          1. Grain size
          2. Automatic grain size
          3. Notes on automatic grain size
          4. parallel_for with partitioner
        2. parallel_reduce
          1. Advanced example
          2. Parallel_reduce with partitioner
        3. Advanced Topic: Other Kinds of Iteration Spaces
          1. Notes on blocked_range2d
        4. parallel_scan
          1. Parallel_scan with partitioner
      3. Recursive Range Specifications
        1. Splittable Concept
      4. Recursive Range Specifications
        1. Model Types: Splittable Ranges
          1. split Class
          2. Range Concept
        2. Model Types
          1. blocked_range<Value> Template Class
          2. blocked_range2d Template Class
          3. Partitioner Concept
        3. Model Types: Partitioners
          1. simple_partitioner Class
          2. auto_partitioner Class
          3. parallel_for<Range,Body> Template Function
          4. parallel_reduce<Range,Body> Template Function
          5. parallel_scan<Range,Body> Template Function
          6. pre_scan_tag and final_scan_tag Classes
      5. Summary of Loops
    8. 4. Advanced Algorithms
      1. Parallel Algorithms for Streams
        1. Cook Until Done: parallel_while
          1. Notes on parallel_while scaling
            1. parallel_while Template Class
        2. Working on the Assembly Line: Pipeline
          1. Throughput of pipeline
          2. Nonlinear pipelines
            1. pipeline Class
            2. filter Class
        3. parallel_sort
          1. parallel_sort<RandomAccessIterator, Compare> Template Function
    9. 5. Containers
      1. concurrent_queue
        1. Iterating over a concurrent_queue for Debugging
        2. When Not to Use Queues
          1. Template Class
      2. concurrent_vector
        1. concurrent_vector Template Class
      3. concurrent_vector
        1. Whole Vector Operations
        2. Concurrent Operations
        3. Parallel Iteration
        4. Capacity
        5. Iterators
      4. concurrent_hash_map
        1. More on HashCompare
          1. Template Class
        2. Whole-Table Operations
        3. Concurrent Access
          1. const_accessor
          2. accessor class
        4. Concurrent Operations: find, insert, erase
        5. Parallel Iteration
        6. Capacity
        7. Iterators
    10. 6. Scalable Memory Allocation
      1. Limitations
      2. Problems in Memory Allocation
      3. Memory Allocators
        1. Which Library to Link into Your Application
        2. Using the Allocator Argument to C++ STL Template Classes
      4. Replacing malloc, new, and delete
        1. Replace malloc, free, realloc, and calloc
        2. Replace new and delete
        3. Allocator Concept
        4. Model Types
          1. scalable_allocator<T> Template Class
          2. cache_aligned_allocator<T> Template Class
          3. aligned_space Template Class
    11. 7. Mutual Exclusion
      1. When to Use Mutual Exclusion
      2. Mutexes
        1. Mutex Flavors
        2. Reader-Writer Mutexes
        3. Upgrade/Downgrade
        4. Lock Pathologies
          1. Deadlock
          2. Convoying and priority inversion
      3. Mutexes
        1. Mutex Concept
          1. mutex Class
          2. spin_mutex Class
          3. queuing_mutex Class
        2. ReaderWriterMutex Concept
          1. Model Types
            1. spin_rw_mutex Class
            2. queuing_rw_mutex Class
      4. Atomic Operations
        1. Why atomic<T> Has No Constructors
        2. Memory Consistency and Fences
          1. atomic<T> Template Class
    12. 8. Timing
      1. tick_count Class
      2. tick_count::interval_t Class
    13. 9. Task Scheduler
      1. When Task-Based Programming Is Inappropriate
      2. Much Better Than Raw Native Threads
        1. Oversubscription
        2. Fair Scheduling
        3. High Coding Overhead
        4. Load Imbalance
        5. Portability
      3. Initializing the Library Is Your Job
      4. Example Program for Fibonacci Numbers
      5. Task Scheduling Overview
      6. How Task Scheduling Works
      7. Recommended Task Recurrence Patterns
        1. Blocking Style with Children
        2. Continuation-Passing Style with Children
          1. Recycling the parent as the continuation
          2. Recycling the parent as a child
      8. Making Best Use of the Scheduler
        1. Recursive Chain Reaction
        2. Continuation Passing
          1. Scheduler bypass
          2. Recycling
          3. Empty tasks
          4. Lazy copying
      9. Task Scheduler Interfaces
        1. task_scheduler_init Class
        2. task Class
        3. empty_task Class
        4. task_list Class
      10. Task Scheduler Summary
    14. 10. Keys to Success
      1. Key Steps to Success
      2. Relaxed Sequential Execution
      3. Safe Concurrency for Methods and Libraries
      4. Debug Versus Release
      5. For Efficiency’s Sake
      6. Enabling Debugging Features
        1. The TBB_DO_ASSERT Macro
        2. Do Not Ship Your Program Built with TBB_DO_ASSERT
        3. The TBB_DO_THREADING_TOOLS Macro
        4. Debug Versus Release Libraries
      7. Mixing with Other Threading Packages
      8. Naming Conventions
        1. The tbb Namespace
        2. The tbb::internal Namespace
        3. The _ _TBB Prefix
    15. 11. Examples
      1. The Aha! Factor
      2. A Few Other Key Points
      3. parallel_for Examples
        1. ParallelAverage
        2. Seismic
        3. Matrix Multiply
        4. ParallelMerge
        5. SubstringFinder
      4. The Game of Life
        1. Implementation
        2. Automaton
        3. Automata: Implementation
        4. Extending the Application
        5. Futher Reading
      5. Parallel_reduce Examples
        1. ParallelSum
        2. ParallelSum without Having to Specify a Grain Size
        3. ParallelPrime
      6. CountStrings: Using concurrent_hash_map
        1. CountStrings: Using concurrent_hash_map
          1. Switching from an STL map
      7. Quicksort: Visualizing Task Stealing
      8. A Better Matrix Multiply (Strassen)
      9. Advanced Task Programming
        1. Start a Large Task in Parallel with the Main Program
        2. Two Mouths: Feeding Two from the Same Task in a Pipeline
      10. Packet Processing Pipeline
        1. Parallel Programming for an Internet Device
        2. Local Network Router Example
        3. Pipelined Components for the Local Network Router
          1. Network address translation (NAT)
          2. Application Layer Gateway (ALG)
          3. Packet forwarding
          4. Our example
        4. Implementation
          1. The Threading Building Blocks pipeline
          2. Synchronization with the pipeline item and concurrent hash maps
        5. Filter Classes
          1. Class get_next_packet : public tbb::filter
          2. Class output_packet : public tbb::filter
          3. Class translator : public tbb::filter
          4. Class gateway : public tbb::filter
          5. Class forwarding : public tbb::filter
          6. Additional reading
      11. Memory Allocation
        1. Replacing new and delete
        2. Replacing malloc, calloc, realloc, and free
      12. Game Threading Example
        1. Threading Architecture: Physics + Rendering
        2. Overview of Keys to Scalability
        3. A Frame Loop
        4. Domain Decomposition Data Structure Needs
        5. Think Tasks, Not Threads
        6. Load Balancing Versus Task Stealing
        7. Synchronization Between Physics Threads
        8. Integrating the Example into a Real Game
        9. How to Measure Performance
      13. Physics Interaction and Update Code
      14. Open Dynamics Engine
        1. Look for Hotspots
        2. Improving on the First Solution
        3. The Code
    16. 12. History and Related Projects
      1. Libraries
      2. Languages
      3. Pragmas
      4. Generic Programming
        1. Concepts in Generic Programming
        2. Pseudosignatures in Generic Programming
        3. Models in Generic Programming
      5. Caches
      6. Costs of Time Slicing
      7. Quick Introduction to Lambda Functions
      8. Further Reading
    17. Index
    18. About the Author
    19. Colophon
    20. SPECIAL OFFER: Upgrade this ebook with O’Reilly