Quote:
Originally Posted by Andrew Thall
I'd like to announce the implementation of a Lucas-Lehmer tester, gpuLucas, written in CUDA and running on Fermi-class NVidia cards. It's a full implementation of Crandall's IBDWT method and uses balanced integers and a few little tricks to make it fast on the GPU.
Example timing: demonstrated primality of M_{42643801} in 57.86 hours, at a rate of 4.88 msec per Lucas product. This used a DWT runlength of 2,359,296 = 2^{18}*3^{2}, taking advantage of good efficiency for CUFFT runlengths that are products of powers of small primes. Maximum error was 1.8e-1.
gpuLucas has been tested on GTX 480 and Tesla 2050 cards; there's actually very little difference in runtimes between the two... fears of a performance hit due to slow double precision on the 480 are bogus; it's a wicked fast card for GPGPU stuff. You get an additional 32 CUDA cores in place of the faster double precision, and it's clocked much faster than the Tesla. The Tesla only really shines when you overclock the heck out of it; I ran it up to 1402 MHz for the above test, at which point it is 15-20% faster than the GTX for the big Mersenne numbers. (It depends on the FFT length, though, and on where the greater number of processors on the GTX is offset by slower double precision, which is only used in the FFTs anyway.)
Finishing off a paper on the topic, and will post a preprint here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.
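
For readers who haven't seen it, the computation being accelerated is the standard Lucas-Lehmer iteration. A minimal plain-Python sketch follows; it uses Python's bignum multiply where gpuLucas instead performs the squaring via Crandall's IBDWT with balanced integers on the GPU (this sketch is for illustration only and is not from the gpuLucas code):

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test for the Mersenne number M_p = 2^p - 1, p an odd prime.

    Each loop pass is one "Lucas product" step; gpuLucas does the same
    modular squaring via an FFT-based IBDWT rather than bignum multiply.
    """
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value of the sequence
    for _ in range(p - 2):
        s = (s * s - 2) % m   # one squaring step mod M_p
    return s == 0             # M_p is prime iff the final residue is zero

print(lucas_lehmer(7))        # M_7 = 127 is prime -> True
print(lucas_lehmer(11))       # M_11 = 2047 = 23 * 89 -> False
```

The Python version is hopelessly slow at the sizes discussed above (p in the tens of millions); the whole point of the IBDWT is that it turns each squaring into an FFT-length-sized convolution with the mod-M_p reduction folded in for free.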

Truly awesome. Kudos.
Now it needs to be publicized. I am sure many users will take advantage of
it, but they need to know about it and how to install and run it.
It should also be folded into GIMPS.