CUDA Application Design and Development by Rob Farber

Chapter 4. The CUDA Execution Model
The heart of CUDA performance and scalability lies in the execution model and in the simple partitioning of a computation into fixed-size blocks of threads, specified in the execution configuration. CUDA was created to map the parallelism within an application naturally onto the massive parallelism of the GPGPU hardware. From the high-level language expression of the kernel to the replication of the lowest-level hardware units, on-board GPU scalability is preserved while many common parallel programming pitfalls are avoided. The result is massive thread scalability and high application performance across GPGPU hardware generations. The CUDA toolkit provides the programmer with the tools needed to exploit parallelism at both ...
