Chapter 9
Multi-GPU Programming

What's in this chapter?

  • Managing multiple GPUs
  • Executing kernels across multiple GPUs
  • Overlapping computation and communication between GPUs
  • Synchronizing across GPUs
  • Exchanging data using CUDA-aware MPI
  • Exchanging data using CUDA-aware MPI with GPUDirect RDMA
  • Scaling applications across a GPU-accelerated cluster
  • Understanding CPU and GPU affinity

So far, most of the examples in this book have used a single GPU. In this chapter, you will gain experience in multi-GPU programming: scaling your application across multiple GPUs within a single compute node or across multiple GPU-accelerated nodes. CUDA provides a number of features to facilitate multi-GPU programming, including multi-device management from one or more processes, direct access to other devices' memory using Unified Virtual Addressing (UVA) and GPUDirect, and computation-communication overlap across multiple devices using streams and asynchronous functions (several of these ideas are pulled together in the sketch after the list below). In this chapter, you will learn the necessary skills to:

  • Manage and execute kernels on multiple GPUs.
  • Overlap computation and communication across multiple GPUs.
  • Synchronize execution across multiple GPUs using streams and events.
  • Scale CUDA-aware MPI applications across a GPU-accelerated cluster.
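
To make these ideas concrete before diving in, here is a minimal sketch of the multi-device pattern this chapter builds on: enumerate the available GPUs, give each device its own stream, issue work asynchronously so the devices run concurrently, and coordinate between devices with an event. The MAX_GPUS cap, the placeholder addOne kernel, and the omission of error checking are simplifications made for brevity here, not code from the book.

#include <stdio.h>
#include <cuda_runtime.h>

#define N (1 << 20)
#define MAX_GPUS 8   /* assumed upper bound for this sketch */

/* Placeholder kernel: increments every element of the array. */
__global__ void addOne(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1.0f;
}

int main(void)
{
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus > MAX_GPUS) ngpus = MAX_GPUS;

    float *d_data[MAX_GPUS];
    cudaStream_t streams[MAX_GPUS];

    /* One stream per device: every call below is asynchronous with
       respect to the host, so the kernels on different GPUs execute
       concurrently. */
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&d_data[i], N * sizeof(float));
        cudaMemsetAsync(d_data[i], 0, N * sizeof(float), streams[i]);
        addOne<<<(N + 255) / 256, 256, 0, streams[i]>>>(d_data[i], N);
    }

    /* Cross-GPU synchronization with an event: GPU 1's stream waits
       for GPU 0's kernel to finish, then GPU 0's buffer is copied to
       GPU 1 (peer-to-peer when the hardware supports it, staged
       through the host otherwise). */
    if (ngpus >= 2) {
        cudaEvent_t done0;
        cudaSetDevice(0);
        cudaEventCreateWithFlags(&done0, cudaEventDisableTiming);
        cudaEventRecord(done0, streams[0]);

        cudaSetDevice(1);
        cudaStreamWaitEvent(streams[1], done0, 0);
        cudaMemcpyPeerAsync(d_data[1], 1, d_data[0], 0,
                            N * sizeof(float), streams[1]);

        /* Safe even though work is still pending: destruction is
           deferred until the event completes. */
        cudaEventDestroy(done0);
    }

    /* Drain all devices before exiting. */
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
        cudaFree(d_data[i]);
    }

    printf("Ran on %d GPU(s)\n", ngpus);
    return 0;
}

Note that cudaStreamWaitEvent works across devices: it lets GPU 1's stream wait on an event recorded in GPU 0's stream without blocking the host, which is the basis of the inter-GPU synchronization patterns covered later in this chapter.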
