mersenneforum.org

Fast Mersenne Testing on the GPU using CUDA (https://www.mersenneforum.org/showthread.php?t=14310)

Andrew Thall 2010-12-07 15:15

Fast Mersenne Testing on the GPU using CUDA
 
I'd like to announce the implementation of a Lucas-Lehmer tester, gpuLucas, written in CUDA and running on Fermi-class NVIDIA cards. It's a full implementation of Crandall's IBDWT method and uses balanced integers and a few little tricks to make it fast on the GPU.
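For readers new to the test, the Lucas-Lehmer recurrence itself can be sketched in a few lines of plain Python. This is only an illustration using Python's built-in big integers, not gpuLucas code; the function name `lucas_lehmer` is made up for the example, and nothing here reflects the IBDWT squaring or the GPU tricks mentioned above.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime.

    Assumes p is itself prime (a composite p always gives a composite M_p).
    """
    if p == 2:
        return True  # M_2 = 3; the recurrence below needs an odd p
    m = (1 << p) - 1
    s = 4
    # p - 2 squarings modulo M_p; each one is a "Lucas product".
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])  # prints [3, 5, 7, 13]
```

The whole cost sits in the modular squaring inside the loop, which is the step a fast tester replaces with an FFT-based multiply.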

Example timing: demonstrated primality of M[SUB]42643801[/SUB] in 57.86 hours, at a rate of 4.88 msec per Lucas product. This used a DWT runlength of 2,359,296 = 2[SUP]18[/SUP]*3[SUP]2[/SUP], taking advantage of CUFFT's good efficiency for runlengths that are products of powers of small primes. Maximum error was 1.8e-1.
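To make the runlength concrete, here is a rough Python sketch of the variable-word-size layout used by an irrational-base DWT, along with a toy squaring modulo 2^p - 1. It is written under simplifying assumptions and is not the gpuLucas implementation: the helper names (`word_size_range`, `digit_layout`, `ibdwt_square`) are hypothetical, the digits are unbalanced, the cyclic convolution is a direct O(n^2) sum where real code would use an FFT, and the final reduction uses big integers instead of carry propagation.

```python
def word_size_range(p, n):
    """Smallest and largest word size (in bits) when p bits are spread over n words."""
    return p // n, -(-p // n)


def digit_layout(p, n):
    """Bit offsets, per-word sizes, and DWT weights for exponent p and runlength n."""
    offsets = [-(-p * j // n) for j in range(n + 1)]          # ceil(p*j/n)
    bits = [offsets[j + 1] - offsets[j] for j in range(n)]
    weights = [2.0 ** (offsets[j] - p * j / n) for j in range(n)]
    return offsets, bits, weights


def ibdwt_square(x, p, n):
    """Square x (0 <= x < 2**p) modulo 2**p - 1 via a weighted cyclic convolution."""
    offsets, bits, weights = digit_layout(p, n)
    digits = [(x >> offsets[j]) & ((1 << bits[j]) - 1) for j in range(n)]
    w = [d * a for d, a in zip(digits, weights)]
    # Direct cyclic convolution; an FFT does this step in a real tester.
    conv = [sum(w[i] * w[(k - i) % n] for i in range(n)) for k in range(n)]
    # Remove the weights and round back to integers.
    z = [round(c / a) for c, a in zip(conv, weights)]
    # Reassemble with big integers (stand-in for carry propagation).
    return sum(v << offsets[k] for k, v in enumerate(z)) % ((1 << p) - 1)


print(2**18 * 3**2)                          # prints 2359296
print(word_size_range(42643801, 2359296))    # prints (18, 19)
print(ibdwt_square(13, 5, 2), 13 * 13 % 31)  # prints 14 14
```

With the runlength quoted above, each word carries 18 or 19 bits of the 42,643,801-bit residue, which is what keeps the rounding error in the double-precision FFT below 0.5.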

gpuLucas has been tested on GTX 480 and Tesla 2050 cards, and there's actually very little difference in runtimes between the two. Fears of a performance hit due to slow floating point on the 480 are bogus: it's a wicked fast card for the GPGPU stuff. You get an additional 32 CUDA cores in place of the faster double precision, and it's clocked much faster than the Tesla. The Tesla only really shines when you overclock the heck out of it; I ran it up to 1402 MHz for the above test, at which point it is 15-20% faster than the GTX for the big Mersenne numbers. (It depends on the FFT length, though, and on whether the greater number of processors on the GTX is offset by its slower double precision, which is only used in the FFTs anyway.)

Finishing off a paper on the topic, and will post a pre-print here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.

msft 2010-12-07 20:14

Hi, Andrew Thall
Congratulations! :smile:

Mini-Geek 2010-12-07 20:39

When they use the same FFT lengths, how does the speed of this program compare to MacLucasFFTW? In any case, the flexibility of having non-power-of-2 FFTs makes it a very attractive choice compared to MacLucasFFTW.

CRGreathouse 2010-12-07 22:47

[QUOTE=Andrew Thall;240510]Finishing off a paper on the topic, and will post a pre-print here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.[/QUOTE]

I'd love to see those if/when you get to them.

Uncwilly 2010-12-08 00:50

A verification run in 3 days!?!?! :w00t:

Brain 2010-12-08 00:57

Sounds great
 
We are very interested. I would buy a GTX 460 just for running your program. ;-) Verification in 3 days? Wow. What would CUDALucas have needed?

msft 2010-12-08 01:21

[QUOTE=Brain;240606] What would CUDALucas have needed?[/QUOTE]
9.04 (ms/iter) / 4.88 (ms/iter) * 57.86 (hours) = 107.2 (hours) :smile:
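The same estimate can be cross-checked from the iteration count instead of by scaling the reported runtime. The short Python sketch below assumes the full test takes p - 2 squarings and that both per-iteration figures quoted in this thread (4.88 ms and 9.04 ms) apply to this exponent; the names `P` and `run_hours` are just for the example.

```python
P = 42643801  # exponent from the announcement


def run_hours(ms_per_iter, p=P):
    """Estimated wall-clock hours for a Lucas-Lehmer test of M_p."""
    iterations = p - 2  # number of squarings in the test
    return iterations * ms_per_iter / 1000 / 3600


print(f"{run_hours(4.88):.1f}")  # prints 57.8
print(f"{run_hours(9.04):.1f}")  # prints 107.1
```

Both figures land within rounding of the 57.86 hours and 107.2 hours quoted above, since the per-iteration times are only given to three digits.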

ixfd64 2010-12-08 02:04

I'm usually a bit leery when a brand new user makes such a bold claim; after all, we do get a fair share of trolls and cranks here (for example, someone recently claimed to have written an OpenCL-enabled siever but stopped following up after his second post).

However, I am 99% sure that this is legit because the OP in this thread seems to know what he is talking about. If gpuLucas really works as claimed, it will greatly benefit the GIMPS community.

Mathew 2010-12-08 02:14

[URL="http://andrewthall.org/"]http://andrewthall.org/[/URL]

ixfd64 2010-12-08 02:52

[QUOTE=Mathew Steine;240625][URL="http://andrewthall.org/"]http://andrewthall.org/[/URL][/QUOTE]

This is the real deal, then!

No offense to msft, but it looks like CUDALucas just got owned!

msft 2010-12-08 03:17

[QUOTE=ixfd64;240634]No offense to msft, but it looks like CUDALucas just got owned![/QUOTE]
I can change the name to "YLucas"; "Y" is my initial. :lol:


All times are UTC.