Another interesting and useful way to compute the population count of a word is based on the "divide and
conquer" paradigm. This algorithm might be devised by reasoning,
"Suppose I had a way to compute the population count of a 16-bit quantity. Then I could run
that on the left and right halves of the 32-bit word, and add the
results, to obtain the population count of the 32-bit word." This
strategy won't pay off if the basic algorithm must be run sequentially
on the two halves and it takes time proportional to the number of bits
being analyzed, because it would then take 16*k* +
16*k* = 32*k* units of time, where
*k* is the constant of proportionality, plus another
instruction for the addition. But if we can somehow do the operation on
the two halfwords in parallel, there will be an improvement from,
essentially, 32*k* to 16*k* +
1.

To efficiently compute the population count of two 16-bit quantities, we need a way to do it for 8-bit quantities, and to do 4 of them in parallel. Continuing this reasoning, we need a way to compute the population count of 2-bit quantities, and to do 16 of them in parallel.

The algorithm to be described in no way depends on running
operations on separate processors, or on unusual instructions such as
the SIMD^{[19]}instructions found on some computers. It uses only the
facilities usually found on a conventional uniprocessor RISC or
CISC.

The plan is illustrated in Figure 10-1.

Figure 10-1. Counting 1-bits, "divide and conquer" strategy

The first line ...

Start Free Trial

No credit card required