Professional CUDA C Programming by Ty McKercher, Max Grossman, John Cheng

Chapter 3: CUDA Execution Model

What's in this chapter?

  • Developing kernels with a profile-driven approach
  • Understanding the nature of warp execution
  • Exposing more parallelism to the GPU
  • Mastering grid and block configuration heuristics
  • Learning various CUDA performance metrics and events
  • Probing dynamic parallelism and nested execution

Through the exercises in the last chapter, you learned how to organize threads into grids and blocks to deliver the best performance. While you can find the best execution configuration through trial and error, you might be left wondering why the selected configuration outperforms the others, and whether there are guidelines for selecting grid and block configurations. This chapter answers those questions and gives you deeper insight into kernel launch configurations and performance profile information from a different angle: the hardware perspective.

Introducing the CUDA Execution Model

In general, an execution model provides an operational view of how instructions are executed on a specific computing architecture. The CUDA execution model exposes an abstract view of the GPU parallel architecture, allowing you to reason about thread concurrency. In Chapter 2, you learned ...
