Automatic grain size

The parallel loop templates in the original release of Threading Building Blocks required a grainsize parameter. We have been looking into ways to automatically determine the right value, but it’s not easy.

Feedback from users is that they want automatic grain size determination, even if it is not always optimal, so the grainsize parameter is now optional when constructing the range. When grainsize is not specified, a partitioner should be supplied to the algorithm template.

If both the partitioner and the grainsize are omitted, it’s the same as specifying a grainsize of 1. If each iteration executes more than about 10,000 instructions, that default performs acceptably. If each iteration executes fewer than a thousand or so, the overhead of scheduling so many tiny chunks causes a serious performance hit.
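To get a feel for why a grainsize of 1 is costly when iterations are cheap, the following minimal sketch counts how many chunks a range breaks into for a given grainsize. It assumes a simplified model in which a range is halved until a piece is no larger than the grainsize. It is not the library's code, and count_chunks is a hypothetical name introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (an assumption, not library source): the half-open range
// [begin, end) is split in half until a piece holds at most `grainsize`
// iterations. Returns the number of resulting chunks; each chunk stands for
// one unit of work that the scheduler would have to manage.
std::size_t count_chunks(std::size_t begin, std::size_t end, std::size_t grainsize) {
    assert(grainsize >= 1);
    assert(begin <= end);
    const std::size_t size = end - begin;
    if (size <= grainsize) {
        return 1;  // small enough: this piece is processed as a single chunk
    }
    const std::size_t middle = begin + size / 2;  // split point
    return count_chunks(begin, middle, grainsize) +
           count_chunks(middle, end, grainsize);
}
```

Under this model, count_chunks(0, 1000, 1) gives 1000 chunks, one per iteration, while count_chunks(0, 1000, 100) gives 16. The fewer instructions each iteration executes, the larger the share of time spent on per-chunk overhead in the first case.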

A partitioner is an object that guides the chunking of a range. Currently, only auto_partitioner makes sense without a grainsize.

The auto_partitioner provides an alternative that heuristically chooses the grain size so that you do not have to specify one. The heuristic attempts to limit overhead while still providing ample opportunities for load balancing. Choosing a grain size heuristically is not easy, but the auto_partitioner has a connection with the task scheduler that gives it dynamic guidance, which can make it better than a static choice of grain size.

Example 3-7 shows how to use an auto_partitioner instead of a grainsize. Notice that the grainsize parameter is omitted when constructing the blocked_range and that an auto_partitioner ...
