Chapter 6. Reduction and Atomic Functions

In this chapter we deal with computations where all of the threads interact to contribute to a single output. Many such computations lead to a pattern known as reduction, which involves an input array whose elements are combined until a single output value is obtained. Applications include dot products (a.k.a. inner products or scalar products), image similarity measures, integral properties, and (with slight generalization) histograms.
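As a rough illustration of the reduction pattern, here is a minimal CPU-side sketch in C++ that computes a dot product by repeatedly combining pairs of partial sums until one value is left. It is not the chapter's implementation: the function name dot_tree is hypothetical, both inputs are assumed to have the same length, and the loops run serially. In a parallel version, each iteration of the inner loop would be handled by a separate thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pairwise (tree) reduction of a dot product, run serially.
// Assumes a.size() == b.size(); this is an illustrative sketch, not a GPU kernel.
double dot_tree(const std::vector<double>& a, const std::vector<double>& b) {
    // Step 1: elementwise products, one per input pair.
    std::vector<double> partial(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        partial[i] = a[i] * b[i];
    }

    // Step 2: each pass folds the upper part of the active range onto the
    // lower part, roughly halving the number of partial sums that remain.
    std::size_t n = partial.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // round up so odd lengths work
        for (std::size_t i = 0; i + half < n; ++i) {
            partial[i] += partial[i + half];
        }
        n = half;
    }

    // An empty input reduces to the additive identity.
    return partial.empty() ? 0.0 : partial[0];
}
```

The number of passes grows with the logarithm of the input length, which is why this combining order suits execution by many threads at once.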

Threads Interacting Globally

In Chapter 5, “Stencils and Shared Memory,” we took the first serious step toward dealing with interaction between computational threads, but stencil computations only involve local interactions between threads that are nearby in the grid. ...
