Another interesting and useful way to compute the population count of a word is based on the "divide and conquer" paradigm. This algorithm might be devised by reasoning, "Suppose I had a way to compute the population count of a 16-bit quantity. Then I could run that on the left and right halves of the 32-bit word, and add the results, to obtain the population count of the 32-bit word." This strategy won't pay off if the basic algorithm must be run sequentially on the two halves and it takes time proportional to the number of bits being analyzed, because it would then take 16k + 16k = 32k units of time, where k is the constant of proportionality, plus another instruction for the addition. But if we can somehow do the operation on the two halfwords in parallel, there will be an improvement from, essentially, 32k to 16k + 1.
To efficiently compute the population count of two 16-bit quantities, we need a way to do it for 8-bit quantities, and to do 4 of them in parallel. Continuing this reasoning, we need a way to compute the population count of 2-bit quantities, and to do 16 of them in parallel.
The algorithm to be described in no way depends on running operations on separate processors, or on unusual instructions such as the SIMDinstructions found on some computers. It uses only the facilities usually found on a conventional uniprocessor RISC or CISC.
The plan is illustrated in Figure 10-1.
Figure 10-1. Counting 1-bits, "divide and conquer" strategy
The first line ...