parallel_scan

A parallel_scan computes a parallel prefix, also known as a parallel scan. This computation is an advanced concept in parallel computing that is sometimes useful in scenarios that appear to have inherently serial dependencies.

A mathematical definition of the parallel prefix is as follows. Let ⊕ be an associative operation ⊕ with left-identity element id⊕. The parallel prefix of ⊕ over a sequence x0, x1 , … xn − 1 is a sequence y0, y1, y2 , …yn − 1 where:

  • y0 = id x0

  • yi = yi − 1 ⊕ i

For example, if ⊕ is addition, the parallel prefix corresponds to a running sum and the identity element is 0. A serial implementation of parallel prefix is:

	T temp = id⊕;
	for( int i=1; i<=n; ++i ) {
	    temp = temp ⊕ x[i];
	    y[i] = temp;
	}

Parallel prefix performs this in parallel by reassociating the application of ⊕ and using two passes. It may invoke ⊕ up to twice as many times as the serial prefix algorithm. But given the right grain size and sufficient hardware threads, it can out-perform the serial prefix because—even though it does more work—it can distribute the work across more than one hardware thread.

Tip

Because parallel_scan needs two passes, systems with only two hardware threads tend to exhibit only a small speedup. parallel_scan is better suited for future systems with more than two cores. It shows how a problem that appears inherently sequential can be parallelized.

Example 3-20 demonstrates how to use parallel_scan in a way similar to the sequential example.

Example 3-20. parallel_scan ...

Get Intel Threading Building Blocks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.