Heterogeneous Computing with OpenCL, 2nd Edition

Book Description

Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs) such as those based on AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you program more effectively for a heterogeneous future.

Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.

  • Explains principles and strategies for learning parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
  • Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
  • Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures.
  • Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Foreword to the Revised OpenCL 1.2 Edition
  6. Foreword to the First Edition
  7. Preface
    1. Our Heterogeneous World
    2. OpenCL
    3. This Text
  8. Acknowledgments
  9. About the Authors
  10. Chapter 1. Introduction to Parallel Programming
    1. Introduction
    2. OpenCL
    3. The Goals of This Book
    4. Thinking Parallel
    5. Concurrency and Parallel Programming Models
    6. Structure
    7. Reference
    8. Further Reading and Relevant Websites
  11. Chapter 2. Introduction to OpenCL
    1. Introduction
    2. Platform and Devices
    3. The Execution Environment
    4. Memory Model
    5. Writing Kernels
    6. Full Source Code Example for Vector Addition
    7. Vector Addition with C++ Wrapper
    8. Summary
    9. Reference
  12. Chapter 3. OpenCL Device Architectures
    1. Introduction
    2. Hardware trade-offs
    3. The architectural design space
    4. Summary
    5. References
  13. Chapter 4. Basic OpenCL Examples
    1. Introduction
    2. Example Applications
    3. Compiling OpenCL Host Applications
    4. Summary
  14. Chapter 5. Understanding OpenCL’s Concurrency and Execution Model
    1. Introduction
    2. Kernels, Work-Items, Workgroups, and the Execution Domain
    3. OpenCL Synchronization: Kernels, Fences, and Barriers
    4. Queuing and Global Synchronization
    5. The Host-Side Memory Model
    6. The Device-Side Memory Model
    7. Summary
  15. Chapter 6. Dissecting a CPU/GPU OpenCL Implementation
    1. Introduction
    2. OpenCL on an AMD Bulldozer CPU
    3. OpenCL on the AMD Radeon HD7970 GPU
    4. Memory Performance Considerations in OpenCL
    5. Summary
    6. References
  16. Chapter 7. Data Management
    1. Memory management
    2. Data transfer in a discrete environment
    3. Data placement in a shared-memory environment
    4. Example application—work group reduction
    5. References
  17. Chapter 8. OpenCL Case Study: Convolution
    1. Introduction
    2. Convolution Kernel
    3. Conclusions
    4. Code Listings
    5. Reference
  18. Chapter 9. OpenCL Case Study: Histogram
    1. Introduction
    2. Choosing the Number of Workgroups
    3. Choosing the Optimal Workgroup Size
    4. Optimizing Global Memory Data Access Patterns
    5. Using Atomics to Perform Local Histogram
    6. Optimizing Local Memory Access
    7. Local Histogram Reduction
    8. The Global Reduction
    9. Full Kernel Code
    10. Performance and Summary
  19. Chapter 10. OpenCL Case Study: Mixed Particle Simulation
    1. Introduction
    2. Overview of the Computation
    3. GPU Implementation
    4. CPU Implementation
    5. Load Balancing
    6. Performance and Summary
    7. Kernel for Uniform Grid Creation
    8. Kernels for Simulation
  20. Chapter 11. OpenCL Extensions
    1. Introduction
    2. Overview of Extension Mechanism
    3. Device Fission
    4. Double Precision
    5. References
  21. Chapter 12. Foreign Lands: Plugging OpenCL In
    1. Introduction
    2. Beyond C and C++
    3. Haskell OpenCL
    4. Summary
    5. References
  22. Chapter 13. OpenCL Profiling and Debugging
    1. Introduction
    2. Profiling with events
    3. AMD Accelerated Parallel Processing Profiler
    4. AMD Accelerated Parallel Processing KernelAnalyzer
    5. Walking through the AMD APP Profiler
    6. Debugging OpenCL Applications
    7. Overview of gDEBugger
    8. AMD Printf Extension
    9. Conclusion
  23. Chapter 14. Performance Optimization of an Image Analysis Application
    1. Introduction
    2. Description of the algorithm
    3. Migrating multithreaded CPU implementation to OpenCL
    4. Performance optimization
    5. Power and performance analysis
    6. Conclusion
    7. References
  24. Index