Quote:
Originally Posted by The Carnivore
Yes, we know that some of you want GPUs for k*2^n+/-1 numbers, so quit repeating it every few weeks.

Just so you know, it is a nontrivial task to go from a discrete weighted transform (DWT) that supports Mersenne numbers to a DWT that works on k*2^n+/-1. In fact, a DWT can only support "small" k values (up to 50,000 or so).
To support all k values, you'll need to write C code or CUDA code to do the modular reduction at the same time as the carry propagation. This requires FFTs that are twice the size of those used for Mersenne numbers of the same bit length, with the upper half of the FFT data zeroed. Thus, you can expect the LLR test time for a 12,500,000 bit number to be just a tad slower than the LL test time for a 25,000,000 bit number.
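
To make the zero-padding idea concrete, here is a rough Python sketch of one squaring modulo N = k*2^n+/-1. It is only an illustration under several assumptions: the function name and the 16-bit word size are made up for this post, a schoolbook convolution stands in for the FFT, and the carry propagation and modular reduction run as two separate passes rather than fused together as a real C or CUDA implementation would do them.

```python
def square_mod_zero_padded(x, k, n, c, base_bits=16):
    """Square x modulo N = k*2^n + c, with c = +1 or -1 (illustrative sketch)."""
    assert c in (1, -1)
    N = k * (1 << n) + c
    assert 0 <= x < N
    mask = (1 << base_bits) - 1

    # Number of words needed to hold a residue mod N.
    words = -(-N.bit_length() // base_bits)
    size = 2 * words  # transform length is doubled

    # Split x into words and zero the upper half of the data.
    digits = [(x >> (base_bits * i)) & mask for i in range(words)] + [0] * words

    # Length-`size` cyclic convolution (stand-in for FFT, pointwise square,
    # inverse FFT). Because the upper half is zero, i + j never reaches
    # `size`, so nothing wraps around and we get the full product.
    conv = [0] * size
    for i in range(words):
        for j in range(words):
            conv[(i + j) % size] += digits[i] * digits[j]

    # Carry propagation: bring every column back into [0, 2^base_bits).
    product, carry = 0, 0
    for i, v in enumerate(conv):
        v += carry
        product |= (v & mask) << (base_bits * i)
        carry = v >> base_bits
    product += carry << (base_bits * size)

    # Modular reduction: k*2^n is congruent to -c mod N, so
    # q*(k*2^n) + r is congruent to r - c*q.
    q, r = divmod(product, k << n)
    return (r - c * q) % N


if __name__ == "__main__":
    # k = 99999 is above the "small k" range mentioned above.
    k, n, c = 99999, 40, 1
    x = 123456789012345
    print(square_mod_zero_padded(x, k, n, c) == pow(x, 2, k * (1 << n) + c))
```

The doubled `size` is where the cost comes from: the transform is as long as the one you would use for a Mersenne number with twice as many bits.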