
OpenCL Parallel Programming Development Cookbook by Raymond Tay

Faster OpenCL implementation of the matrix multiplication by thread coarsening

In this section, let's try to make this beast run faster by applying a parallel programming technique: thread coarsening. This matters because when each work item accesses a single element, large matrices can mean millions of work items running at once. In general, that's not a good thing, because many devices today cannot support millions of work items in n dimensions unless the device is a supercomputer. Fortunately, there are often clever ways to reduce the number of work items needed.
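To make the work-item count concrete, here is a minimal sketch in plain C that emulates the idea on the host. It is not the recipe's kernel: the function names (`work_item_per_element`, `work_item_per_row`, `matmul_fine`, `matmul_coarse`) are hypothetical, the host loops merely stand in for an NDRange launch, and assigning one row per work item is just one possible way to coarsen, assumed here for illustration. Square, row-major matrices of size n x n are also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical): one emulated work item computes one element
 * C[row][col] of the n x n product C = A * B (row-major storage). */
static void work_item_per_element(size_t row, size_t col, size_t n,
                                  const float *a, const float *b, float *c)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

/* Coarsened (hypothetical): one emulated work item computes a whole row
 * of C, so it does the work of n fine-grained work items. */
static void work_item_per_row(size_t row, size_t n,
                              const float *a, const float *b, float *c)
{
    for (size_t col = 0; col < n; ++col) {
        float sum = 0.0f;
        for (size_t k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

/* Host loops stand in for a 2D NDRange of n x n work items.
 * Returns the number of emulated work items that were "launched". */
size_t matmul_fine(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t launched = 0;
    for (size_t row = 0; row < n; ++row) {
        for (size_t col = 0; col < n; ++col) {
            work_item_per_element(row, col, n, a, b, c);
            ++launched;
        }
    }
    return launched;
}

/* Host loop stands in for a 1D NDRange of n work items.
 * Returns the number of emulated work items that were "launched". */
size_t matmul_coarse(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t launched = 0;
    for (size_t row = 0; row < n; ++row) {
        work_item_per_row(row, n, a, b, c);
        ++launched;
    }
    return launched;
}
```

Calling both functions on the same inputs should yield identical results while the coarsened variant uses n work items instead of n * n. In a real OpenCL kernel the row index would typically come from `get_global_id(0)` rather than a host loop; whether this particular split performs well on a given device is something to measure rather than assume.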

Getting ready

The general technique here is to explore ways in which we can merge threads so that each thread now calculates multiple elements. When we reexamine the ...
