Chapter 11. Streaming Workloads

Streaming workloads are among the simplest that can be ported to CUDA: computations where each data element can be computed independently of the others, often with such low computational density that the workload is bandwidth-bound. Streaming workloads do not use many of the hardware resources of the GPU, such as caches and shared memory, that are designed to optimize reuse of data.
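As an illustration, consider SAXPY (out[i] = alpha*x[i] + y[i]), used here as a representative streaming workload. Each element reads 8 bytes, writes 4 bytes, and performs 2 floating-point operations. That works out to about 1/6 flop per byte, so memory bandwidth, not arithmetic, limits performance. The following is a minimal sketch, not the book's code. The names `saxpyKernel` and `saxpyOnDevice` are hypothetical, and the launch configuration (256 threads per block, at most 1024 blocks) is an arbitrary assumption.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical streaming kernel: each output element depends only on the
// inputs at the same index, so no data is shared or reused between threads.
__global__ void saxpyKernel(float *out, const float *x, const float *y,
                            float alpha, size_t N)
{
    // Grid-stride loop: correct for any N regardless of grid size.
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < N;
         i += (size_t)blockDim.x * gridDim.x) {
        out[i] = alpha * x[i] + y[i];
    }
}

// Hypothetical host helper: copies inputs to device memory, launches the
// kernel, and copies the result back. Returns the first CUDA error seen.
cudaError_t saxpyOnDevice(float *hostOut, const float *hostX,
                          const float *hostY, float alpha, size_t N)
{
    float *dX = nullptr, *dY = nullptr, *dOut = nullptr;
    size_t bytes = N * sizeof(float);
    size_t blocks = (N + 255) / 256;   // assumed 256 threads per block
    cudaError_t status = cudaSuccess;

    if (N == 0) return cudaSuccess;    // a zero-block launch is invalid
    if (blocks > 1024) blocks = 1024;  // grid-stride loop covers the rest

    if ((status = cudaMalloc(&dX, bytes)) != cudaSuccess) goto Error;
    if ((status = cudaMalloc(&dY, bytes)) != cudaSuccess) goto Error;
    if ((status = cudaMalloc(&dOut, bytes)) != cudaSuccess) goto Error;
    if ((status = cudaMemcpy(dX, hostX, bytes, cudaMemcpyHostToDevice)) != cudaSuccess) goto Error;
    if ((status = cudaMemcpy(dY, hostY, bytes, cudaMemcpyHostToDevice)) != cudaSuccess) goto Error;

    saxpyKernel<<<(unsigned int)blocks, 256>>>(dOut, dX, dY, alpha, N);
    if ((status = cudaGetLastError()) != cudaSuccess) goto Error;

    // cudaMemcpy waits for the kernel to finish before copying.
    status = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

Error:
    cudaFree(dX);   // cudaFree(nullptr) is a no-op
    cudaFree(dY);
    cudaFree(dOut);
    return status;
}
```

In this sketch, the host-to-device and device-to-host copies move the same bytes the kernel touches, typically over a much slower bus than device memory. If `x` and `y` already reside in device memory, only the kernel launch is needed.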

Since GPUs deliver the biggest benefits on workloads with high computational density, it is worth reviewing the cases in which it still makes sense to port streaming workloads to GPUs.

• If the input and output are in device memory, it doesn’t make sense to transfer the data back to the CPU just to perform one operation.

• If the ...
