CUDA for Engineers: An Introduction to High-Performance Parallel Computing

Chapter 6. Reduction and Atomic Functions

In this chapter we deal with computations where all of the threads interact to contribute to a single output. Many such computations lead to a pattern known as reduction, which involves an input array whose elements are combined until a single output value is obtained. Applications include dot products (a.k.a. inner products or scalar products), image similarity measures, integral properties, and (with slight generalization) histograms.
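To make the reduction pattern concrete, here is a minimal serial C++ sketch of a dot product computed by pairwise combination. It is not GPU code and not the book's implementation. The function names `tree_reduce_sum` and `dot_reduce` are hypothetical, chosen for illustration. The inner loop over `i` stands in for work that independent threads could perform at the same time, and each pass of the outer loop halves the number of active elements until one value remains.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the input by repeatedly adding the upper half
// of the array onto the lower half, halving the active range each pass.
double tree_reduce_sum(std::vector<double> partial) {
    if (partial.empty()) return 0.0;

    // Pad to the next power of two with zeros (the identity for addition)
    // so every pass splits the active range exactly in half.
    std::size_t n = 1;
    while (n < partial.size()) n *= 2;
    partial.resize(n, 0.0);

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a parallel machine, each i could be handled by its own thread.
        for (std::size_t i = 0; i < stride; ++i) {
            partial[i] += partial[i + stride];
        }
    }
    return partial[0];  // the single output value
}

// Hypothetical helper: dot product as an elementwise multiply followed by
// a reduction. Uses the shorter length if the inputs differ in size.
double dot_reduce(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<double> products(n);
    for (std::size_t i = 0; i < n; ++i) {
        products[i] = a[i] * b[i];
    }
    return tree_reduce_sum(products);
}
```

For example, `dot_reduce({1, 2, 3, 4}, {5, 6, 7, 8})` combines the products 5, 12, 21, and 32 in two passes and returns 70. With n inputs the pairwise scheme needs about log2(n) passes, which is why the pattern suits hardware that can run many additions at once.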

Threads Interacting Globally

In Chapter 5, “Stencils and Shared Memory,” we took the first serious step toward dealing with interaction between computational threads, but stencil computations only involve local interactions between threads that are nearby in the grid. ...


CUDA for Engineers: An Introduction to High-Performance Parallel Computing by Mete Yurtoglu, Duane Storti

