Parallel_scan with partitioner
parallel_scan has an optional third argument to specify a partitioner (Example 3-21). See the section “Automatic grain size” for more information.
Example 3-21. parallel_scan with partitioner argument
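using namespace tbb;

// ⊕ stands for the associative operation being scanned;
// id⊕ stands for the identity element of that operation.
class Body {
    T sum;
    T* const y;
    const T* const x;
public:
    Body( T y_[], const T x_[] ) : sum(id⊕), y(y_), x(x_) {}
    T get_sum() const {return sum;}

    // Tag says whether this is a pre-scan (summary only) or the final scan.
    template<typename Tag>
    void operator()( const blocked_range<int>& r, Tag ) {
        T temp = sum;
        for( int i=r.begin(); i<r.end(); ++i ) {
            temp = temp ⊕ x[i];
            if( Tag::is_final_scan() )
                y[i] = temp;
        }
        sum = temp;
    }

    // Splitting constructor: the new body starts from the identity.
    Body( Body& b, split ) : sum(id⊕), y(b.y), x(b.x) {}

    // Merge the summary of a, which covers the range to the left, into this body.
    void reverse_join( Body& a ) { sum = a.sum ⊕ sum;}
    void assign( Body& b ) {sum = b.sum;}
};

T DoParallelScan( T y[], const T x[], int n) {
    Body body(y,x);
    // No grainsize is given to blocked_range; auto_partitioner chooses the chunking.
    parallel_scan( blocked_range<int>(0,n), body, auto_partitioner() );
    return body.get_sum();
}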
Two important changes from the version of parallel_scan without a partitioner should be noted:
The call to parallel_scan takes a third argument, an auto_partitioner object.
The blocked_range constructor is not provided with a grainsize parameter.
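The Body in Example 3-21 is written with the placeholders T, ⊕, and id⊕, so it cannot be compiled as printed. The sketch below fills them in with float, +, and 0 and then drives the body by hand through one possible schedule over three chunks. It does not use TBB: PreScanTag, FinalScanTag, SplitTag, SumBody, and RunScanSketch are hypothetical stand-ins, and the call operator takes plain begin/end indices instead of a blocked_range. In a real run, the partitioner and the scheduler decide the chunk boundaries and the order of the calls, so this is an illustration of the protocol under those assumptions, not a picture of what the library actually does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for TBB's pre-scan tag, final-scan tag, and split.
struct PreScanTag   { static bool is_final_scan() { return false; } };
struct FinalScanTag { static bool is_final_scan() { return true;  } };
struct SplitTag {};

// Example 3-21's Body with T = float, the operation = +, and identity = 0.
class SumBody {
    float sum;
    float* const y;
    const float* const x;
public:
    SumBody(float y_[], const float x_[]) : sum(0.0f), y(y_), x(x_) {}
    SumBody(SumBody& b, SplitTag) : sum(0.0f), y(b.y), x(b.x) {}
    float get_sum() const { return sum; }

    // Takes [begin, end) directly instead of a blocked_range (an assumption
    // made to keep the sketch free of TBB headers).
    template <typename Tag>
    void operator()(int begin, int end, Tag) {
        float temp = sum;
        for (int i = begin; i < end; ++i) {
            temp = temp + x[i];
            if (Tag::is_final_scan()) y[i] = temp;
        }
        sum = temp;
    }

    void reverse_join(SumBody& a) { sum = a.sum + sum; }
    void assign(SumBody& b) { sum = b.sum; }
};

struct ScanResult {
    std::vector<float> y;
    float total;
};

// Hand-written schedule over the chunks [0,2), [2,4), [4,6).
ScanResult RunScanSketch() {
    std::vector<float> x = {1, 2, 3, 4, 5, 6};
    std::vector<float> y(x.size(), 0.0f);

    SumBody a(y.data(), x.data());
    SumBody b(a, SplitTag());

    a(0, 2, FinalScanTag());  // leftmost chunk: prefix is known, write y[0..1]
    b(2, 4, PreScanTag());    // middle chunk: summary only, y is not written
    b.reverse_join(a);        // b now summarizes [0,4)
    a(2, 4, FinalScanTag());  // a continues with the correct prefix, writes y[2..3]
    b(4, 6, FinalScanTag());  // b starts from the [0,4) summary, writes y[4..5]
    a.assign(b);              // copy the final state back to the original body

    assert(a.get_sum() == b.get_sum());
    return ScanResult{y, a.get_sum()};
}
```

The pre-scan over the middle chunk is what lets the last chunk begin its final scan without waiting for the middle chunk's output to be written. How finely the range is cut into such chunks is the decision that the grainsize parameter, or the auto_partitioner in its absence, controls.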