Parallel Programming with OpenACC

Book Description

Parallel Programming with OpenACC is a modern, practical guide to implementing dependable computing systems. The book explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas. The OpenACC directive-based programming model is designed to provide a simple yet powerful approach to using accelerators without significant programming effort.
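To give a sense of what such a directive looks like, here is a minimal sketch that is not taken from the book. The function name `saxpy` and its parameters are illustrative assumptions. The `#pragma acc parallel loop` directive with `copyin` and `copy` data clauses is standard OpenACC; a compiler built without OpenACC support ignores the pragma and runs the loop serially, so the result is the same either way.

```c
#include <stddef.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *x, float *y)
{
    /* Ask an OpenACC compiler to parallelize the loop on the accelerator.
     * copyin: x is only read on the device.
     * copy:   y is read and written, so it is copied in and back out. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(n, 3.0f, x, y)` with every `x[i]` set to 1 and every `y[i]` set to 2 leaves each `y[i]` equal to 5, whether or not the directive is honored.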

Author Rob Farber, working with a team of expert contributors, shows how to turn existing applications into portable, GPU-accelerated programs that deliver immediate speedups. The book also helps users get the most from the latest NVIDIA and AMD GPUs and multicore CPU architectures (and soon Intel® Xeon Phi™ as well). Downloadable example code provides hands-on OpenACC experience for common problems in scientific, commercial, big-data, and real-time systems.

Topics include writing reusable code, asynchronous capabilities, using libraries, multicore clusters, and much more. Each chapter explains where a specific aspect of OpenACC technology fits, how it works, and the pitfalls to avoid. Throughout, the book uses simple working examples that can be adapted to solve application needs.



  • Presents the simplest way to leverage GPUs to achieve application speedups
  • Shows how OpenACC works, including working examples that can be adapted for application needs
  • Allows readers to download source code and slides from the book's companion web page

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Contributors
  6. Foreword by Michael Wolfe
  7. Preface
  8. Acknowledgments
  9. Chapter 1: From serial to parallel programming using OpenACC
    1. Abstract
    2. A Simple Data-Parallel Loop
    3. A Simple Task-Parallel Example
    4. Amdahl’s Law and Scaling
    5. Parallel Execution and Race Conditions
    6. Lock-Free Programming
    7. Controlling Parallel Resources
    8. Make Your Life Simple
  10. Chapter 2: Profile-guided development with OpenACC
    1. Abstract
    2. Benchmark Code: Conjugate Gradient
    3. Describe Parallelism
    4. Describe Data Movement
    5. Optimize Loops
    6. Running in Parallel on Multicore
    7. Summary
  11. Chapter 3: Profiling performance of hybrid applications with Score-P and Vampir
    1. Abstract
    2. Performance Analysis Techniques and Terminology
    3. Evolutionary Performance Improvement
    4. A Particle-in-Cell Simulation of a Laser Driven Electron Beam
    5. Preparing the Measurement Through Code Instrumentation
    6. Recording Performance Information During the Application Run
    7. Looking at a First Parallel PIConGPU Implementation
    8. Freeing Up the Host Process
    9. Optimizing GPU Kernels
    10. Adding GPU Task Parallelism
    11. Investigating OpenACC Run Time Events With Score-P and Vampir
    12. Summary
  12. Chapter 4: Pipelining data transfers with OpenACC
    1. Abstract
    2. Introduction to Pipelining
    3. Example Code: Mandelbrot Generator
    4. Pipelining Across Multiple Devices
    5. Conclusions
  13. Chapter 5: Advanced data management
    1. Abstract
    2. Unstructured Data Regions
    3. Aggregate Types With Dynamic Data Members
    4. C++ Class Data Management
    5. Using Global and Module Variables in Routines
    6. Using Device Only Data
    7. Code Examples
    8. Runtime Results
    9. Summary
  14. Chapter 6: Tuning OpenACC loop execution
    1. Abstract
    2. The Loop Construct
    3. Basic Loop Optimization Clauses
    4. Advanced Loop Optimization Clauses
    5. Performance Results
    6. Conclusion
  15. Chapter 7: Multidevice programming with OpenACC
    1. Abstract
    2. Introduction
    3. Three Ways to Program Multiple Devices With OpenACC
    4. Example: Jacobi Solver for the 2D Poisson Equation
    5. Domain Decomposition
    6. Debugging and Profiling
    7. Conclusion
  16. Chapter 8: Using OpenACC for stencil and Feldkamp algorithms
    1. Abstract
    2. Introduction
    3. Experimental Setup
    4. Hybrid OpenMP/OpenACC
    5. Summary
  17. Chapter 9: Accelerating 3D wave equations using OpenACC
    1. Abstract
    2. Introduction
    3. Code Example: Solving 3D Scalar Wave Equation
    4. Converting Stack to Heap
    5. Measuring Host Baseline Scalability
    6. Using OpenACC Tools
    7. Using OpenACC Data Directives
    8. Targeting Multicore Systems With OpenACC
    9. Summary
  18. Chapter 10: The detailed development of an OpenACC application
    1. Abstract
    2. Introducing CloverLeaf
    3. Development Platform: Cray XK6
    4. Development of OpenACC CloverLeaf
    5. Conclusion
    6. Summary
    7. For More Information
  19. Chapter 11: GPU-accelerated molecular dynamics clustering analysis with OpenACC
    1. Abstract
    2. Acknowledgments
    3. Introduction
    4. Overview of MD Clustering Analysis
    5. Hardware Architecture Considerations
    6. Implementation
    7. Performance Results
    8. Summary and Conclusion
  20. Chapter 12: Incrementally accelerating the RI-MP2 correlated method of electronic structure theory using OpenACC compiler directives
    1. Abstract
    2. Acknowledgments
    3. Introduction
    4. Theory
    5. Implementation
    6. Results
    7. Summary and Conclusion
  21. Chapter 13: Using OpenACC to port large legacy climate and weather modeling code to GPUs
    1. Abstract
    2. Introduction
    3. Porting Approach: Step by Step
    4. Performance Optimization
    5. Results for the Radiation Parameterization
  22. Index