Automatic grain size
The parallel loop templates in the original release of Threading Building Blocks required a grainsize parameter. We have been looking into ways to determine the right value automatically, but it is not easy. Users have told us that they want automatic grain size determination even if it is not always optimal, so the grainsize parameter is now optional when constructing the range. When grainsize is not specified, a partitioner should be supplied to the algorithm template.
If both the partitioner and the grainsize are omitted, the effect is the same as specifying a grainsize of 1. That works acceptably when each iteration executes more than about 10,000 instructions. With fewer than a thousand or so instructions per iteration, there will be a serious performance hit.
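To see why a grainsize of 1 is costly for cheap iterations, it can help to count how many chunks a range breaks into. The following is a minimal sketch using only the standard library, not Threading Building Blocks itself. The names SimpleRange and count_chunks are hypothetical, and the model assumes a range is halved repeatedly for as long as it holds more iterations than the grainsize.

```cpp
#include <cstddef>

// Hypothetical stand-in for a half-open iteration range [begin, end).
// This is an illustrative model, not the library's own range class.
struct SimpleRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grainsize;  // assumed to be at least 1

    std::size_t size() const { return end - begin; }

    // Assumption: a range may be split only while it holds more
    // iterations than the grainsize.
    bool is_divisible() const { return size() > grainsize; }
};

// Halve the range recursively and count the chunks that remain once
// no piece can be split any further.
std::size_t count_chunks(const SimpleRange& r) {
    if (!r.is_divisible()) {
        return 1;  // this piece would be handed to the loop body as one chunk
    }
    const std::size_t mid = r.begin + r.size() / 2;
    const SimpleRange left{r.begin, mid, r.grainsize};
    const SimpleRange right{mid, r.end, r.grainsize};
    return count_chunks(left) + count_chunks(right);
}
```

Under this model, 1,000 iterations with a grainsize of 1 become 1,000 chunks, so the per-chunk overhead is paid once per iteration. With a grainsize of 100 the same range becomes 16 chunks of 62 or 63 iterations each, which spreads that overhead over many more instructions.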
A partitioner is an object that guides the chunking of a range. Currently, only auto_partitioner makes sense without a grainsize.
The auto_partitioner provides an alternative that heuristically chooses the grain size so that you do not have to specify one. The heuristic attempts to limit overhead while still providing ample opportunities for load balancing. Choosing a grain size heuristically is not easy, but the auto_partitioner has a connection to the task scheduler that gives it dynamic guidance, which can make its choice better than a static grain size.
Example 3-7 shows how to use an auto_partitioner instead of a grainsize. Notice that the grainsize parameter is omitted when constructing the blocked_range and that an auto_partitioner ...