Parallelizing for two dimensions

Now, let's apply the filters to the images. The arguments for the Boost Compute kernel are set using set_arg() before execution. When the kernel is executed using enqueue_nd_range_kernel(), we pass the number of dimensions and the range of each dimension, which is the equivalent of a double for-loop in the C++ code. The corresponding x and y variables in the kernel are then fetched using get_global_id() in OpenCL.
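To make the double for-loop analogy concrete, here is a minimal CPU-only sketch. It does not use Boost Compute or OpenCL. The helper emulate_nd_range_2d() and the two fill functions are hypothetical names introduced only for illustration, and the sequential iteration order is an assumption of the sketch, since a real device runs the work items in parallel with no guaranteed order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Boost Compute or OpenCL: emulates a
// two-dimensional ND-range by invoking `kernel` once per (x, y) pair.
template <typename Kernel>
void emulate_nd_range_2d(const std::array<std::size_t, 2>& range,
                         Kernel kernel) {
  for (std::size_t y = 0; y < range[1]; ++y) {
    for (std::size_t x = 0; x < range[0]; ++x) {
      kernel(x, y);  // x plays the role of get_global_id(0), y of get_global_id(1)
    }
  }
}

// Plain C++ version: an explicit double for-loop over a row-major w*h grid.
std::vector<std::size_t> fill_with_loops(std::size_t w, std::size_t h) {
  auto grid = std::vector<std::size_t>(w * h);
  for (std::size_t y = 0; y < h; ++y) {
    for (std::size_t x = 0; x < w; ++x) {
      grid[y * w + x] = x + y;
    }
  }
  return grid;
}

// Kernel-style version: the loops are replaced by a range per dimension,
// and the body only sees the coordinates it is handed.
std::vector<std::size_t> fill_with_nd_range(std::size_t w, std::size_t h) {
  auto grid = std::vector<std::size_t>(w * h);
  const auto range = std::array<std::size_t, 2>{w, h};
  emulate_nd_range_2d(range, [&](std::size_t x, std::size_t y) {
    assert(x < w && y < h);  // coordinates stay inside the requested range
    grid[y * w + x] = x + y;
  });
  return grid;
}
```

Both functions produce the same grid. The point of the second form is that the body never mentions loop bounds, which is what allows a device to distribute the (x, y) pairs across many work items.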

Take notice of the similarities between STL algorithms and the Boost Compute equivalents as shown in the table below:

Box filter on CPU

Box filter on GPU

auto box_filter_test_cpu( int w, int h, int r) { using array_t = std::array<size_t,2>; // Create ...
