The pseudocode for the up-sweep is given below. Note the parfor over the variable j: at a fixed k, the iterations of that loop write to distinct elements, so they can be executed in parallel by threads indexed by j:
input: x_0, ..., x_{n-1}
initialize:
    for i = 0 to n - 1:
        y_i := x_i
begin:
    for k = 0 to log2(n) - 1:
        parfor j = 0 to n - 1:
            if j is divisible by 2^(k+1):
                y_{j+2^(k+1)-1} = y_{j+2^k-1} + y_{j+2^(k+1)-1}
            else:
                continue
end
output: y_0, ..., y_{n-1}
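
As a rough illustration, the pseudocode above can be sketched in Python. This is a sequential simulation, not a parallel implementation: the inner loop stands in for the parfor and visits only the values of j that are divisible by 2^(k+1). The function name up_sweep is an illustrative choice, and the sketch assumes that n is a power of two so that log2(n) is an integer.

```python
def up_sweep(x):
    """Sequentially simulate the up-sweep (reduce) phase on a copy of x.

    Assumption: len(x) is a power of two, so log2(n) is an integer.
    """
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a positive power of two")

    y = list(x)                   # initialize: y_i := x_i
    levels = n.bit_length() - 1   # log2(n) for a power of two

    for k in range(levels):       # k = 0 .. log2(n) - 1
        half = 1 << k             # 2^k
        stride = 1 << (k + 1)     # 2^(k+1)
        # Stand-in for the parfor: only j divisible by 2^(k+1) do any work.
        # Each j writes a distinct element and reads no element written by
        # another j at this level, so the iterations are independent.
        for j in range(0, n, stride):
            y[j + stride - 1] = y[j + half - 1] + y[j + stride - 1]
    return y


if __name__ == "__main__":
    print(up_sweep([1, 2, 3, 4, 5, 6, 7, 8]))
    # prints [1, 3, 3, 10, 5, 11, 7, 36]
```

After the final level, the last element holds the sum of the whole input (36 in the example), and the other positions hold the partial sums left over from earlier levels.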