A Frame Loop

Simply put, we propose that instead of having several data-parallel loops in a frame, we have only two: one to move the objects and a second to update their positions for the next frame.
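
A minimal sketch of that two-loop frame, written with Threading Building Blocks, might look like the following. The GameObject type and its move() and commit() methods are hypothetical placeholders; only the structure of two parallel loops per frame, each ending in an implicit synchronization when parallel_for returns, reflects the proposal.

    #include <cstddef>
    #include <vector>
    #include <tbb/parallel_for.h>

    struct GameObject {
        void move()   { /* compute interactions and the new position */ }
        void commit() { /* publish the new position for the next frame */ }
    };

    void run_frame(std::vector<GameObject>& objects) {
        // Loop 1: move every object. parallel_for returns only when all
        // iterations are done, which forms the first synchronization point.
        tbb::parallel_for(std::size_t(0), objects.size(),
                          [&](std::size_t i) { objects[i].move(); });

        // Loop 2: update positions for the next frame; its completion is the
        // second (end-of-frame) synchronization point.
        tbb::parallel_for(std::size_t(0), objects.size(),
                          [&](std::size_t i) { objects[i].commit(); });
    }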

To compare threaded game architectures, consider data parallelism (Figure 11-24) and domain decomposition (Figure 11-25). In the data-parallel structure, several parallel loops run in each frame, with the operations on pieces of data distributed across each parallel loop. After each parallel loop executes, the worker threads synchronize with the root thread, which performs some serial computation; after several parallel loops, the root thread synchronizes with the rendering thread to allow the next frame to begin. Each of these synchronizations, sometimes called barriers, is terrible (3 on the scale) because (a code sketch of this structure follows the list):

  • The fastest thread must wait for the slowest thread at every loop, so there are more chances for load imbalance.

  • Synchronization with the root thread is of the much more contentious multiple-writer, single-reader type, with the root thread as the single reader.

How many synchronizations are really needed? Note that for each frame there are at least two: one at the end of the parallel interactions (SI), and one at the end of the frame, when all the object positions are updated so the next frame can start (SF, where F stands for frame). These two, SI and SF, are sufficient to keep all threads synchronized on each frame.
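
For contrast, here is a rough sketch of the data-parallel structure described above: several parallel loops per frame, each returning to the root thread before the next one starts. The GameObject type and the phase names (physics, AI, animation, update) are illustrative, not taken from the figures; what matters is that each parallel_for call ends in an implicit barrier.

    #include <cstddef>
    #include <vector>
    #include <tbb/parallel_for.h>

    struct GameObject {
        void physics()   { /* illustrative phase work */ }
        void ai()        { /* illustrative phase work */ }
        void animation() { /* illustrative phase work */ }
        void update()    { /* illustrative phase work */ }
    };

    void run_frame_many_barriers(std::vector<GameObject>& objects) {
        // parallel_for returns only when every iteration is finished, so each
        // call below ends in an implicit barrier at which the fastest worker
        // waits for the slowest.
        auto for_each_object = [&](auto phase) {
            tbb::parallel_for(std::size_t(0), objects.size(),
                              [&](std::size_t i) { phase(objects[i]); });
        };

        for_each_object([](GameObject& o) { o.physics();   });  // barrier 1
        // ... root thread does some serial computation here ...
        for_each_object([](GameObject& o) { o.ai();        });  // barrier 2
        for_each_object([](GameObject& o) { o.animation(); });  // barrier 3 (SI)
        for_each_object([](GameObject& o) { o.update();    });  // barrier 4 (SF)
        // ... then synchronize with the rendering thread before the next frame ...
    }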

With domain decomposition, the data is divided among the threads at the beginning into domains of objects, and fewer synchronizations are needed ...
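
Under the same assumptions, a domain-decomposition sketch might partition the objects into per-thread domains once and process each domain as a unit, leaving only the end-of-frame synchronization (SF). The Domain type and its per-phase loops are placeholders, not code from the figures.

    #include <cstddef>
    #include <vector>
    #include <tbb/parallel_for.h>

    struct GameObject { /* position, velocity, ... */ };

    struct Domain {
        std::vector<GameObject> objects;   // objects owned by this domain
        void run_frame() {
            // All per-object work for the frame stays inside the domain, so
            // no root-thread barrier is needed between phases.
            for (GameObject& o : objects) { (void)o; /* move the object     */ }
            for (GameObject& o : objects) { (void)o; /* update its position */ }
        }
    };

    void run_frame_domains(std::vector<Domain>& domains) {
        // One task per domain; the single implicit barrier here is SF.
        tbb::parallel_for(std::size_t(0), domains.size(),
                          [&](std::size_t i) { domains[i].run_frame(); });
    }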
