Chapter 9
Multi-GPU Programming

What's in this chapter?

  • Managing multiple GPUs
  • Executing kernels across multiple GPUs
  • Overlapping computation and communication between GPUs
  • Synchronizing across GPUs
  • Exchanging data using CUDA-aware MPI
  • Exchanging data using CUDA-aware MPI with GPUDirect RDMA
  • Scaling applications across a GPU-accelerated cluster
  • Understanding CPU and GPU affinity

So far, most of the examples in this book have used a single GPU. In this chapter, you will gain experience in multi-GPU programming: scaling your application across multiple GPUs within a single compute node or across multiple GPU-accelerated nodes. CUDA provides a number of features to facilitate multi-GPU programming, including multi-device management from one or more processes, direct access to other devices' memory using Unified Virtual Addressing (UVA) and GPUDirect, and computation-communication overlap across multiple devices using streams and asynchronous functions (several of these ideas are pulled together in the sketch after the list below). In this chapter, you will learn the necessary skills to:

  • Manage and execute kernels on multiple GPUs.
  • Overlap computation and communication across multiple GPUs.
  • Synchronize execution across multiple GPUs using streams and events.
  • Scale CUDA-aware MPI applications across a GPU-accelerated cluster.
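
To make these ideas concrete before diving in, here is a minimal sketch of the multi-device pattern this chapter builds on: enumerate the available GPUs, give each device its own stream, issue work asynchronously so the devices run concurrently, and coordinate between devices with an event. The MAX_GPUS cap, the placeholder addOne kernel, and the omission of error checking are simplifications made for brevity here, not code from the book.

#include <stdio.h>
#include <cuda_runtime.h>

#define N (1 << 20)
#define MAX_GPUS 8   /* assumed upper bound for this sketch */

/* Placeholder kernel: increments every element of the array. */
__global__ void addOne(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1.0f;
}

int main(void)
{
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus > MAX_GPUS) ngpus = MAX_GPUS;

    float *d_data[MAX_GPUS];
    cudaStream_t streams[MAX_GPUS];

    /* One stream per device: every call below is asynchronous with
       respect to the host, so the kernels on different GPUs execute
       concurrently. */
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&d_data[i], N * sizeof(float));
        cudaMemsetAsync(d_data[i], 0, N * sizeof(float), streams[i]);
        addOne<<<(N + 255) / 256, 256, 0, streams[i]>>>(d_data[i], N);
    }

    /* Cross-GPU synchronization with an event: GPU 1's stream waits
       for GPU 0's kernel to finish, then GPU 0's buffer is copied to
       GPU 1 (peer-to-peer when the hardware supports it, staged
       through the host otherwise). */
    if (ngpus >= 2) {
        cudaEvent_t done0;
        cudaSetDevice(0);
        cudaEventCreateWithFlags(&done0, cudaEventDisableTiming);
        cudaEventRecord(done0, streams[0]);

        cudaSetDevice(1);
        cudaStreamWaitEvent(streams[1], done0, 0);
        cudaMemcpyPeerAsync(d_data[1], 1, d_data[0], 0,
                            N * sizeof(float), streams[1]);

        /* Safe even though work is still pending: destruction is
           deferred until the event completes. */
        cudaEventDestroy(done0);
    }

    /* Drain all devices before exiting. */
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
        cudaFree(d_data[i]);
    }

    printf("Ran on %d GPU(s)\n", ngpus);
    return 0;
}

Note that cudaStreamWaitEvent works across devices: it lets GPU 1's stream wait on an event recorded in GPU 0's stream without blocking the host, which is the basis of the inter-GPU synchronization patterns covered later in this chapter.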
