mersenneforum.org  

2010-12-07, 15:15   #1
Andrew Thall

Dec 2010

23 Posts
Fast Mersenne Testing on the GPU using CUDA

I'd like to announce the implementation of a Lucas-Lehmer tester, gpuLucas, written in CUDA and running on Fermi-class NVidia cards. It's a full implementation of Crandall's IBDWT method and uses balanced integers and a few little tricks to make it fast on the GPU.
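
For readers new to the test, here is a minimal sketch of the plain Lucas-Lehmer iteration. It uses Python's built-in big integers purely for illustration; it is not gpuLucas code, and the function name `lucas_lehmer` is an assumption of this sketch. A GPU implementation along the lines described above would replace the big-integer squaring with an FFT-based IBDWT product.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime; the test is not defined this way for p = 2.
    """
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 squarings ("Lucas products")
        s = (s * s - 2) % m   # square, subtract 2, reduce mod M_p
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p))
```

The loop runs p - 2 times, so the cost of a full test is essentially the exponent times the cost of one modular squaring.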

Example timing: demonstrated primality of M42643801 in 57.86 hours, at a rate of 4.88 msec per Lucas product. This used a DWT runlength of 2,359,296 = 2^18 * 3^2, taking advantage of good efficiency for CUFFT runlengths that are products of powers of small primes. Maximum error was 1.8e-1.
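
The figures above can be cross-checked with a few lines of Python. This is only a sketch: the per-digit bit counts follow the standard Crandall-Fagin variable-base layout, which is assumed here rather than taken from gpuLucas, and the helper names (`ceil_div`, `digit_bits`, `estimated_hours`) are hypothetical.

```python
P = 42643801          # Mersenne exponent from the post
N = 2**18 * 3**2      # DWT runlength from the post (2,359,296)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def digit_bits(j: int, p: int = P, n: int = N) -> int:
    """Bits held by digit j in the variable-base representation.

    Assumption: Crandall-Fagin layout, ceil(p*(j+1)/n) - ceil(p*j/n).
    """
    return ceil_div(p * (j + 1), n) - ceil_div(p * j, n)


def estimated_hours(p: int = P, ms_per_iter: float = 4.88) -> float:
    """Wall time for p - 2 iterations at the given per-iteration cost."""
    return (p - 2) * ms_per_iter / 1000.0 / 3600.0


if __name__ == "__main__":
    print("runlength:", N)
    print("average bits per digit:", P / N)
    print("digit sizes seen:", sorted({digit_bits(j) for j in range(1000)}))
    print("estimated hours:", estimated_hours())
```

With this runlength each digit carries 18 or 19 bits (about 18.07 on average), and p - 2 iterations at 4.88 msec come to roughly 57.8 hours, in line with the reported 57.86 hours.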

gpuLucas has been tested on GTX 480 and Tesla 2050 cards; there's actually very little difference in runtimes between the two. Fears of a performance hit due to slow floating point on the 480 are bogus: it's a wicked fast card for the GPGPU stuff; you get an additional 32 CUDA cores in place of the faster double precision, and it's clocked much faster than the Tesla. The Tesla only really shines when you overclock the heck out of it; I ran it up to 1402 MHz for the above test, at which point it is 15-20% faster than the GTX for the big Mersenne numbers. (It depends on the FFT length, though, and on whether the greater number of processors on the GTX is offset by its slower double precision, which is only used in the FFTs anyway.)

Finishing off a paper on the topic, and will post a pre-print here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.

2010-12-07, 20:14   #2
msft

Jul 2009
Tokyo

2×5×61 Posts

Hi, Andrew Thall
Congratulations!

2010-12-07, 20:39   #3
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

3·1,423 Posts

When they use the same FFT lengths, how does the speed of this program compare to MacLucasFFTW? In any case, the flexibility of having non-power-of-2 FFTs makes it a very attractive choice compared to MacLucasFFTW.

Last fiddled with by Mini-Geek on 2010-12-07 at 20:40

2010-12-07, 22:47   #4
CRGreathouse

Aug 2006

3·1,993 Posts

Quote:
Originally Posted by Andrew Thall
Finishing off a paper on the topic, and will post a pre-print here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.
I'd love to see those if/when you get to them.

2010-12-08, 00:50   #5
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

2⁴·631 Posts

A verification run in 3 days!?!?!

2010-12-08, 00:57   #6
Brain

Dec 2009
Peine, Germany

331₁₀ Posts
Sounds great

We are very interested. I would buy a GTX 460 just for running your program. ;-) Verification in 3 days? Wow. What would CUDALucas have needed?

Last fiddled with by Brain on 2010-12-08 at 00:58

2010-12-08, 01:21   #7
msft

Jul 2009
Tokyo

1001100010₂ Posts

Quote:
Originally Posted by Brain
What would CUDALucas have needed?
9.04 (ms/iter) / 4.88 (msec) * 57.86 (hours) = 107.2 (hours)
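
The estimate above scales the reported gpuLucas wall time by the ratio of per-iteration times. A small sketch of that arithmetic, assuming both programs run the same number of iterations and that per-iteration time dominates; `scaled_hours` is a hypothetical helper name:

```python
def scaled_hours(ref_hours: float, ref_ms: float, other_ms: float) -> float:
    """Scale a reference wall time by the ratio of per-iteration times."""
    return ref_hours * other_ms / ref_ms


if __name__ == "__main__":
    # 57.86 hours at 4.88 msec/iter, rescaled to 9.04 msec/iter
    print(round(scaled_hours(57.86, 4.88, 9.04), 1))
```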

2010-12-08, 02:04   #8
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

3²×269 Posts

I'm usually a bit leery when a brand new user makes such a bold claim; after all, we do get a fair share of trolls and cranks here (for example, someone recently claimed to have written an OpenCL-enabled siever but stopped following up after his second post).

However, I am 99% sure that this is legit because the OP in this thread seems to know what he is talking about. If the "gpuLucas" really works as claimed, it will greatly benefit the GIMPS community.

Last fiddled with by ixfd64 on 2010-12-08 at 02:06

2010-12-08, 02:14   #9
Mathew

Nov 2009

2×5²×7 Posts

http://andrewthall.org/

2010-12-08, 02:52   #10
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

3²×269 Posts

Quote:
Originally Posted by Mathew Steine
This is the real deal, then!

No offense to msft, but it looks like that CUDALucas just got owned!

2010-12-08, 03:17   #11
msft

Jul 2009
Tokyo

610₁₀ Posts

Quote:
Originally Posted by ixfd64
No offense to msft, but it looks like that CUDALucas just got owned!
I can change the name to "YLucas"; "Y" is my initial.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.