2011年6月22日 星期三

[Parallel Computing] Indexing tricks

When making device kernels (shaders,) how to assign threads to some tasks is very important.
Since the GPU has really little support of recursion, how to turn your recursive program into a non-recursive one becomes very important as well.

I saw this approach when I was working on GPU sorting.
The original Odd Even Merge Sort is described here

I'm not gonna describe the iterative version in detail but mention some tricks to accustom your code to the GPU kernel.

1. When your code has the thread assignment with a circular pattern:
For example:
8 Threads:
Thread Id 0 1 2 3 4 5 6 7
Activate/no n n y y n n y y

Solution:
Here we have a circular pattern with a period of 4.
Of course we can use something like if(Idx.x==xxx), but if the case is making an iterative execution, there will be too many cases to be considered.
A better way is to use (Idx & (period-1)) and set a threshold.
In our example, we can use (Idx&3) and set a threshold equal to 2.
So the values become:

Thread Id 0 1 2 3 4 5 6 7
Activate/no n n y y n n y y
Value 0 1 2 3 0 1 2 3
Thresholded n n y y n n y y

So it's exactly what we want for our program.

沒有留言:

張貼留言