
Professional CUDA C Programming by Ty McKercher, Max Grossman, John Cheng


Chapter 6: Streams and Concurrency

What's in this chapter?

  • Understanding the nature of streams and events
  • Exploiting grid level concurrency
  • Overlapping kernel execution and data transfer
  • Overlapping CPU and GPU execution
  • Understanding synchronization mechanisms
  • Avoiding unwanted synchronization
  • Adjusting stream priorities
  • Registering device callback functions
  • Displaying application execution timelines with the NVIDIA Visual Profiler

Generally speaking, there are two levels of concurrency in CUDA C programming:

  • Kernel level concurrency
  • Grid level concurrency

Up to this point, your focus has been solely on kernel level concurrency, in which a single task, or kernel, is executed in parallel by many threads on the GPU. Previous chapters covered several ways to improve kernel performance from the programming model, execution model, and memory model points of view, and you developed the ability to dissect and analyze kernel behavior using the command-line profiler.
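As a quick reminder of kernel level concurrency, the following minimal sketch launches a single kernel whose parallelism comes entirely from the many threads of one grid. The kernel name, data sizes, and use of managed memory are illustrative choices for this sketch, not a listing from the book.

// A minimal sketch of kernel level concurrency (illustrative names and sizes):
// one kernel, executed in parallel by many threads of a single grid.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];            // each thread handles one element
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);             // managed memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);   // a single kernel launch
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}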

This chapter examines grid level concurrency, in which multiple kernels are launched and executed simultaneously on a single device, often leading to better device utilization. You will learn how to use CUDA streams to implement grid level concurrency, as previewed in the sketch below. ...
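The following sketch previews the pattern this chapter develops: work is partitioned across several CUDA streams so that kernels and asynchronous copies issued to different streams may overlap on the device. The kernel name, number of streams, and data sizes are placeholder choices for illustration, not taken from the book's listings; note that asynchronous copies require pinned host memory (allocated here with cudaMallocHost) to overlap with computation.

// A minimal sketch of grid level concurrency using CUDA streams
// (illustrative kernel, stream count, and sizes).
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int nStreams = 4;
    const int n = 1 << 22;                    // total elements
    const int chunk = n / nStreams;           // elements processed per stream
    size_t chunkBytes = chunk * sizeof(float);

    float *h_data, *d_data;
    cudaMallocHost(&h_data, n * sizeof(float));   // pinned host memory for async copies
    cudaMalloc(&d_data, n * sizeof(float));
    for (int i = 0; i < n; i++) h_data[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int i = 0; i < nStreams; i++) cudaStreamCreate(&streams[i]);

    // Issue copy-kernel-copy sequences into different streams; operations in
    // different streams are independent and may execute concurrently.
    for (int i = 0; i < nStreams; i++) {
        int offset = i * chunk;
        cudaMemcpyAsync(&d_data[offset], &h_data[offset], chunkBytes,
                        cudaMemcpyHostToDevice, streams[i]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[i]>>>(&d_data[offset], chunk);
        cudaMemcpyAsync(&h_data[offset], &d_data[offset], chunkBytes,
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize();                  // wait for all streams to finish

    printf("h_data[0] = %f\n", h_data[0]);    // expect 2.0
    for (int i = 0; i < nStreams; i++) cudaStreamDestroy(streams[i]);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}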
