Loop unrolling

Loop unrolling is a technique that increases the number of data operations performed per iteration, so that the fixed overhead of running the loop is spread across more useful work. Take the following code:

{

  for (i=0;i<100;i++)

   q[i]=i;

}

In terms of assembly code, this will generate:

• A load of a register with 0 for the loop counter i.

• A test of the register against 100.

• A branch to either exit or execute the loop.

• An increment of the register holding the loop counter.

• An address calculation of array q indexed by i.

• A store of i to the calculated address.

Only the last of these instructions actually does any real work. The rest of the instructions are overhead, and all but the initial load are repeated on every iteration.
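To make the overhead visible in C itself, the loop can be written with explicit tests and jumps. This is only a sketch of what a compiler might emit, not actual compiler output; the function name fill_lowered and the parameter n are assumptions introduced for illustration.

```c
/* Hypothetical helper: stores q[i] = i for i in [0, n), written with
   explicit gotos so each step of a simple loop is visible. */
void fill_lowered(int *q, int n)
{
    int i = 0;              /* load the counter with 0 (done once) */
loop_test:
    if (!(i < n))           /* test the counter against the bound */
        goto loop_exit;     /* branch: exit or execute the body */
    {
        int *addr = q + i;  /* address calculation for q[i] */
        *addr = i;          /* store: the only real work */
    }
    i++;                    /* increment the counter */
    goto loop_test;         /* jump back for the next test */
loop_exit:
    return;
}
```

Every pass through the body pays for a test, a branch, an increment, and an address calculation in order to perform a single store.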

We can rewrite this C code as

{

  for (i=0;i<100;i+=4)

  {

   q[i]=i;

   q[i+1]=i+1;

   q[i+2]=i+2;

   q[i+3]=i+3;

  }

}
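As a fuller sketch, a loop unrolled by a factor of four can be wrapped in a function that also copes with array lengths that are not a multiple of four. The function name fill_unrolled, the parameter n, and the cleanup loop are assumptions added for illustration; they do not come from the text above.

```c
/* Hypothetical helper: stores q[i] = i for i in [0, n), four elements
   per iteration. Assumes n >= 0 and that q has room for n ints. */
void fill_unrolled(int *q, int n)
{
    int i = 0;
    int limit = n - (n % 4);    /* largest multiple of 4 that is <= n */

    /* Main loop: one test, branch, and increment per four stores. */
    for (; i < limit; i += 4) {
        q[i]     = i;
        q[i + 1] = i + 1;
        q[i + 2] = i + 2;
        q[i + 3] = i + 3;
    }

    /* Cleanup loop: handles the 0 to 3 leftover elements. */
    for (; i < n; i++) {
        q[i] = i;
    }
}
```

With n equal to 100, the cleanup loop never executes and the main loop body runs 25 times rather than 100, so the counter increment and loop branch are paid once per four stores.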
