2010-12-09, 15:09   #21
Andrew Thall

With regard to the GPU LLR work: I haven't looked at the sequential algorithms, but based on George W.'s description (a straight-line convolution in place of the circular convolution, with shift-add for the modular reduction), it actually sounds pretty close to my initial CUDA efforts on LL, before I dug into Crandall's paper and got a better handle on the IBDWT approach.
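
To make the shift-add part concrete, here's a rough sketch for the plain Mersenne case M_p = 2^p - 1, which is what my early LL code was aimed at. It's illustrative only (the function and names are made up for this post, not Prime95 or gpuLucas code), and the LLR modulus k*2^n - 1 needs a slightly fancier fold, since the high part also has to be split by k.

[CODE]
// Sketch only; not Prime95 or gpuLucas code.  Shift-add reduction of a
// 64-bit value modulo the Mersenne number M_p = 2^p - 1 (small p).  Since
// 2^p == 1 (mod M_p), the bits above position p can simply be shifted down
// and added back in; applied limb-wise to a multiprecision residue, that's
// the shift-add reduction in question.
__host__ __device__
unsigned long long modMersenne(unsigned long long x, int p)
{
    const unsigned long long mp = (1ULL << p) - 1ULL;   // modulus 2^p - 1
    while (x > mp)
        x = (x & mp) + (x >> p);      // fold the high bits back into the low p bits
    return (x == mp) ? 0ULL : x;      // canonical representative in [0, M_p)
}
[/CODE]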

You'll pay the cost of the larger FFTs. The shift-add modular reduction isn't too hard, but you'll also need a parallel scan-based carry adder if you need fully resolved carries; I have a hotwired CUDPP that does carry-add and subtract-with-borrow, so that's doable. (I can ask Mark Harris whether they'd like to include that in the standard CUDPP release.) The most recent gpuLucas forgoes that and uses a carry-save configuration to keep all computations local except for the FFTs themselves. Big time savings there.
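
Roughly, the carry-save step looks like the kernel below. This is just a sketch under simplifying assumptions (fixed-width digits, made-up array and parameter names), not the actual gpuLucas kernel, which works with a balanced variable-base IBDWT representation. Each thread rounds its iFFT output, keeps the low-order digit in place, and parks the overflow in a separate carry array for its neighbor; the next forward FFT just transforms digit[i] + carry[i], which absorbs the deferred carries without any global propagation pass.

[CODE]
// Sketch only; not the gpuLucas kernel.  Array names and the fixed-width
// digit split are assumptions made for illustration.
__global__ void carrySaveSplit(const double *convOut,   // unnormalized iFFT output
                               long long    *digit,     // low part, kept in place
                               long long    *carry,     // deferred carry for position i+1
                               int n, int baseBits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    long long v    = llrint(convOut[i]);             // round the convolution result
    long long mask = (1LL << baseBits) - 1;

    digit[i]           = v & mask;                    // local digit stays put
    carry[(i + 1) % n] = v >> baseBits;               // hand the overflow to the neighbor
                                                      // (wraps around for a 2^p - 1 style modulus)
}
// Before the next forward FFT, each position just loads digit[i] + carry[i];
// that single local add is all it takes to absorb the deferred carries.
[/CODE]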