mersenneforum.org  

Old 2010-12-08, 03:24   #12
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

10011100011111₂ Posts

Quote:
Originally Posted by Mathew Steine View Post
Quote:
All your irrational base are ours.
Old 2010-12-08, 04:35   #13
msft
 
 
Jul 2009
Tokyo

1001100010₂ Posts

Does "an old Cg Lucas-Lehmer implementation" mean CUDALucas?
I can choose a new name: "YLucas" or "an old Cg Lucas-Lehmer implementation".
Who is the Godfather?
Old 2010-12-08, 04:37   #14
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by msft View Post
Does "an old Cg Lucas-Lehmer implementation" mean CUDALucas?
I can choose a new name: "YLucas" or "an old Cg Lucas-Lehmer implementation".
Who is the Godfather?
I'd keep the current name for CUDALucas...since the new program is called gpuLucas, there shouldn't be too much trouble telling them apart.
Old 2010-12-08, 09:14   #15
frmky
 
 
Jul 2003
So Cal

2·1,097 Posts

Quote:
Originally Posted by Andrew Thall View Post
gpuLucas has been tested on GTX 480 and Tesla 2050 cards
Is it expected to work on compute 1.3 cards? I've got a Tesla S1070 that I could test it on.
Old 2010-12-08, 14:30   #16
Andrew Thall
 
Dec 2010

10₈ Posts

Certainly no intention of pwning anyone; this is purely research code. I was working from Crandall's original paper and with the understanding that others had gotten it to work with non-powers of two, so I really don't know all the excellent work you all have done with cudaLucas and macLucasFFTW and such. That's mainly why I posted this week...I can't finish my paper on this without mentioning other current work. If anyone would care to summarize the principal players and their programs, you'll get a grateful acknowledgment, for sure.

I'll post some timing results today or tomorrow...I've got a Friday deadline so finishing off my time trials right now.

As to whether it'll work with 1.3 cards...the implementation is pretty transparent, so it may need one or two mods but will probably work with any card that has true double precision and can run CUDA 3.2, though it does depend on the recent Fermi cards for a lot of its efficiency. Note that CUFFT has improved a lot in the most recent implementation, eliminating crippling bugs and substantially improving the non-power-of-two FFTs.

As to my credentials...no offense taken...I'm mainly an image-analysis guy, and these days teach undergrads, but I've been interested in Mersenne prime testing since 1995, when I was trying to parallelize LL for a Maspar MP-1. :) I was at Carolina in the late '90s when they were doing the original work with PixelFlow, so we were all excited about programmable graphics hardware. The obsolete Cg work from a few years back was using compiled shaders on 8800GT and 9800 cards, with my own homebrew extended-precision float-float FFTs and very baroque parallel carry-adds. Totally crazy, but perhaps y'all here might appreciate that. :)
Old 2010-12-08, 16:14   #17
R.D. Silverman
 
 
Nov 2003

1D24₁₆ Posts

Quote:
Originally Posted by Andrew Thall View Post
I'd like to announce the implementation of a Lucas-Lehmer tester, gpuLucas, written in CUDA and running on Fermi-class NVidia cards. It's a full implementation of Crandall's IBDWT method and uses balanced integers and a few little tricks to make it fast on the GPU.

Example timing: demonstrated primality of M42643801 in 57.86 hours, at a rate of 4.88 msec per Lucas product. This used a DWT runlength of 2,359,296 = 2^18 × 3^2, taking advantage of good efficiency for CUFFT runlengths of powers of small primes. Maximum error was 1.8e-1.

gpuLucas has been tested on GTX 480 and Tesla 2050 cards; there's actually very little difference in runtimes between the two...fears of a performance hit due to slow floating point on the 480 are bogus---it's a wicked fast card for the GPGPU stuff; you get an additional 32 CUDA cores in place of the faster double precision, and it's clocked much faster than the Tesla. The Tesla only really shines when you overclock the heck out of it; I ran it up to 1402 MHz for the above test, at which point it is 15-20% faster than the GTX for the big Mersenne numbers. (It depends on the FFT length, though, and when the greater number of processors on the GTX is offset by slower double precision, which is only used in the FFTs anyway.)

Finishing off a paper on the topic, and will post a pre-print here in a week or so. I'll make the code available publicly as well, and maybe set up a tutorial webpage if folks are interested and if time permits.
Truly awesome. Kudos.

Now, it needs to be publicized. I am sure many users will take advantage of it, but they need to know about it, how to install, run, etc.

It should also be folded in to GIMPS.
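
For readers following along, the test being discussed can be sketched in a few lines of plain Python. This is only the textbook Lucas-Lehmer recurrence on big integers, not gpuLucas itself (which does the squaring with an IBDWT on the GPU); the function name `lucas_lehmer` is made up for illustration. The last lines redo the quoted timing arithmetic: p - 2 Lucas products at 4.88 msec each.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Assumes p is an odd prime."""
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 "Lucas products"
        s = (s * s - 2) % m
    return s == 0

# Small exponents only; big-integer squaring gets slow quickly.
small = [p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)]
print(small)

# Rough run-time estimate for the exponent quoted in the announcement.
p = 42643801
hours = (p - 2) * 4.88e-3 / 3600
print(f"{hours:.1f} hours")
```

The estimate comes out a little under 58 hours, in line with the 57.86 hours reported in the announcement.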
Old 2010-12-08, 16:32   #18
ixfd64
Bemusing Prompter
 
 
"Danny"
Dec 2002
California

19×127 Posts

Research code or not, it's definitely very exciting. I certainly hope it'll find its way into Prime95 soon!

Also, I never doubted your work, so I hope you don't take it that way. Oh, and since nobody else has said it: welcome to the GIMPS forum!
Old 2010-12-09, 01:39   #19
Ken_g6
 
 
Jan 2005
Caught in a sieve

18B₁₆ Posts

Thanks, Andrew!

You also sound like the kind of person who would have the experience necessary to create an LLR test, considering George Woltman's requirements for such a test. Even a test for only small K's, as described in that post, would be of enormous benefit to PrimeGrid, the No Prime Left Behind search, and probably others as well.
Old 2010-12-09, 02:46   #20
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Ken_g6 View Post
Thanks, Andrew!

You also sound like the kind of person who would have the experience necessary to create an LLR test, considering George Woltman's requirements for such a test. Even a test for only small K's, as described in that post, would be of enormous benefit to PrimeGrid, the No Prime Left Behind search, and probably others as well.


I'm not sure entirely how much effort building a GPU LLR application would entail, but since LLR is an extension of LL, I imagine it could be at least partially derived from the existing application.
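
To make the "LLR is an extension of LL" remark concrete, here is a minimal sketch of the Lucas-Lehmer-Riesel test for N = k*2^n - 1, restricted to the easy case where k is odd, not divisible by 3, and k < 2^n, so that the starting value can be taken as V_k(4, 1). The function names are made up, it uses plain big integers rather than FFT multiplication, and it is not based on the actual LLR program's code.

```python
def lucas_v(k: int, p: int, mod: int) -> int:
    """V_k(p, 1) mod `mod`, using V_0 = 2, V_1 = p, V_{i+1} = p*V_i - V_{i-1}."""
    v_prev, v = 2 % mod, p % mod
    for _ in range(k - 1):
        v_prev, v = v, (p * v - v_prev) % mod
    return v

def llr_riesel(k: int, n: int) -> bool:
    """Primality of N = k*2**n - 1 for odd k with 3 not dividing k and k < 2**n."""
    if n < 2 or k < 1 or k % 2 == 0 or k % 3 == 0 or k >= (1 << n):
        raise ValueError("sketch only handles odd k, 3 not dividing k, k < 2**n, n >= 2")
    big_n = k * (1 << n) - 1
    if big_n % 3 == 0:            # starting value 4 is not valid here; decide directly
        return big_n == 3
    u = lucas_v(k, 4, big_n)      # k = 1 gives u = 4, i.e. the ordinary LL test
    for _ in range(n - 2):        # same squaring loop as Lucas-Lehmer
        u = (u * u - 2) % big_n
    return u == 0

print([n for n in range(3, 40) if llr_riesel(5, n)])
```

Apart from the starting value, the main loop is the same repeated squaring as in LL, which is why the hard part of a GPU port is the multiplication modulo k*2^n - 1 rather than the test logic.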

As Ken mentioned, such a program would be immensely beneficial to the many k*2^n±1 prime search projects out there. I myself am an assistant admin at NPLB and would be glad to help with testing such an app. (Our main admin, Gary, has a GTX 460 that he bought both for sieving, which is already available for CUDA, and to help test prospective CUDA LLR programs. He's not particularly savvy with this stuff but I have remote access to the GPU machine and can run stuff on it as needed.)

Max

Last fiddled with by mdettweiler on 2010-12-09 at 02:46
Old 2010-12-09, 15:09   #21
Andrew Thall
 
Dec 2010

1000₂ Posts

With regard to the GPU LLR work: I haven't looked at the sequential algorithms, but based on George W.'s description (straight-line in place of circular convolution, and shift-add for modular reduction), it actually sounds pretty close to my initial CUDA efforts on LL, before I dug into Crandall's paper and got a better handle on the IBDWT approach.
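
As a rough illustration of what the IBDWT approach involves: assuming the standard Crandall-Fagin layout, a number mod 2^q - 1 is split into N digits of mixed size, with digit j holding ceil(q(j+1)/N) - ceil(qj/N) bits and carrying a weight of 2^(ceil(qj/N) - qj/N). The sketch below only computes that layout. The names `ceil_div` and `ibdwt_layout` are hypothetical, and none of this is taken from gpuLucas.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)

def ibdwt_layout(q: int, n: int):
    """Bits per digit and DWT weights for arithmetic mod 2**q - 1 with n digits."""
    bits, weights = [], []
    for j in range(n):
        lo = ceil_div(q * j, n)            # bit position where digit j starts
        hi = ceil_div(q * (j + 1), n)      # bit position where digit j + 1 starts
        bits.append(hi - lo)
        weights.append(2.0 ** (lo - q * j / n))   # always in [1, 2)
    return bits, weights

# Toy case: M61 split into 8 digits.
bits, weights = ibdwt_layout(61, 8)
print(bits, sum(bits))

# Exponent and runlength quoted in the announcement: average digit size.
q, n = 42643801, 2359296
print(q // n, q % n)   # small digit size, and how many digits get one extra bit
```

For the run quoted in this thread (q = 42643801, N = 2,359,296), that works out to digits of 18 or 19 bits each.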

You'll pay the cost of the larger FFTs; shift-add modular reduction isn't too hard, but you'll also need a parallel scan-based carry-adder if you need fully resolved carries---I have a hotwired CUDPP that does carry-add and subtract with borrow, so that's doable. (I can ask Mark Harris if they'd like to include that in the standard CUDPP release.) The most recent gpuLucas forgoes that and uses a carry-save configuration to keep all computations local except for the FFTs themselves. Big time savings there.
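
The shift-add reduction mentioned above is easiest to see in the plain Mersenne case, where 2^p ≡ 1 (mod 2^p - 1), so the bits above position p can simply be folded back onto the low bits. Below is a minimal sequential sketch, assuming Python big integers and a hypothetical helper name `mod_mersenne`. A k*2^n±1 modulus needs more care, and a GPU version would work digit by digit with the carry handling described in the post.

```python
import random

def mod_mersenne(x: int, p: int) -> int:
    """Reduce non-negative x modulo 2**p - 1 using only shifts, masks and adds."""
    m = (1 << p) - 1
    while x > m:
        # x = hi * 2**p + lo, and 2**p is congruent to 1, so x is congruent to hi + lo
        x = (x & m) + (x >> p)
    return 0 if x == m else x

# Compare against the built-in remainder on random products of two residues.
p = 61
m = (1 << p) - 1
for _ in range(1000):
    a, b = random.randrange(m), random.randrange(m)
    assert mod_mersenne(a * b, p) == (a * b) % m
print("shift-add reduction agrees with % for p =", p)
```

In this form every fold produces carries that ripple through the whole number, which is the cost the carry-save configuration is meant to avoid.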
Old 2010-12-10, 00:09   #22
Oddball
 
 
May 2010

499 Posts

Quote:
Originally Posted by Ken_g6 View Post
Speaking of that post, here's a quick summary of the arguments for both sides:

Pro-LLR GPU side:
1.) Allows people with GPUs freedom of choice. If a GPU program for LLR is developed, those with GPUs can choose to either sieve or test for primes.
2.) Allows for faster verification of large (>1 million digit) primes.
3.) GPU clients are not optimized yet, so there's more potential for improvement.
4.) GPUs are more energy efficient than old CPUs (Pentium 4's, Athlons, etc), judging by the amount of electricity needed to LLR one k/n pair.

Anti-LLR GPU side:
1.) Reduces the number of participants. Those without fast CPUs would be discouraged from participating since they would no longer be able to do a significant amount of "meaningful" LLR work (defined as LLR work that has a reasonable chance of getting into the top 5000 list).
2.) GPUs are much less effective at primality testing than at sieving or trial factoring. Computing systems should be used for what they are best at, so CPU users should stick to LLR tests and GPU users should stick to sieving and factoring.
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than Core i7s and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.