Professional CUDA C Programming by Ty McKercher, Max Grossman, John Cheng

Chapter 7: Tuning Instruction-Level Primitives

What's in this chapter?

  • Learning about multiple classes of CUDA instructions and their impact on application behavior
  • Observing the relative accuracy of single- and double-precision floating-point values
  • Experimenting with the performance and accuracy of standard and intrinsic functions
  • Uncovering undefined behavior from unsafe memory accesses
  • Understanding the significance of arithmetic instructions and the consequences of using them improperly

When you decide to use CUDA for a particular application, the primary motivator is usually the computational throughput of GPUs. As you learned in previous chapters, achieving high throughput on GPUs requires understanding which factors limit peak performance. You have already learned about CUDA tools that can help you determine whether your workload is sensitive to latency, bandwidth, or arithmetic operations. Based on this understanding, you can generally classify applications into two categories:

  • I/O-bound
  • Compute-bound

In this chapter, you will focus on tuning compute-bound workloads. The computational throughput of a processor can be measured by the number of operations it performs in a period of ...
