#12
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
#13
Jul 2009
Tokyo
2·5·61 Posts
Is "an old Cg Lucas-Lehmer implementation" mean CUDALucas ?
I can choice new name "YLucas" or "an old Cg Lucas-Lehmer implementation". Who is Godfather ? ![]() |
#14
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts
I'd keep the current name for CUDALucas...since the new program is called gpuLucas, there shouldn't be too much trouble telling them apart.
#15
Jul 2003
So Cal
2·11·109 Posts
#16
Dec 2010
2³ Posts
Certainly no intention of pwning anyone; this is purely research code. I was working from Crandall's original paper and with the understanding that others had gotten it to work with non-powers of two, so I really don't know all the excellent work you all have done with cudaLucas and macLucasFFTW and such. That's mainly why I posted this week: I can't finish my paper on this without mentioning other current work. If anyone would care to summarize the principal players and their programs, you'll get a grateful acknowledgment, for sure.
I'll post some timing results today or tomorrow; I've got a Friday deadline, so I'm finishing off my time trials right now. As to whether it'll work with compute-capability 1.3 cards: the implementation is pretty transparent, so it may need one or two mods, but it will probably work with any card that has true double precision and can run CUDA 3.2, though it does depend on the recent Fermi cards for a lot of its efficiency. Note that CUFFT has improved a lot in the most recent release, eliminating crippling bugs and substantially improving the non-power-of-two FFTs.

As to my credentials...no offense taken. I'm mainly an image-analysis guy, and these days I teach undergrads, but I've been interested in Mersenne prime testing since 1995, when I was trying to parallelize LL for a MasPar MP-1. :) I was at Carolina in the late '90s when they were doing the original work with PixelFlow, so we were all excited about programmable graphics hardware. The obsolete Cg work from a few years back used compiled shaders on 8800GT and 9800 cards, with my own homebrew extended-precision float-float FFTs and very baroque parallel carry-adds. Totally crazy, but perhaps y'all here might appreciate that. :)
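As a side note on what "float-float" means here, below is a minimal sketch of the idea (a generic illustration with made-up helper names `two_sum` and `df64_add`, not the actual Cg or gpuLucas code). Each extended-precision value is stored as an unevaluated sum of two single-precision floats, and an error-free transformation (Knuth's two-sum) captures the rounding error of the high word in the low word, giving roughly 44 bits of effective significand on hardware that only has fast single precision.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Error-free addition (Knuth's two-sum): s + e == a + b exactly,
// with s = fl(a + b) and e the rounding error.
__device__ float2 two_sum(float a, float b) {
    float s  = a + b;
    float bb = s - a;
    float e  = (a - (s - bb)) + (b - bb);
    return make_float2(s, e);
}

// Add two float-float values (high word in .x, low word in .y), then renormalize.
__device__ float2 df64_add(float2 a, float2 b) {
    float2 s = two_sum(a.x, b.x);
    s.y += a.y + b.y;
    float hi = s.x + s.y;
    float lo = s.y - (hi - s.x);
    return make_float2(hi, lo);
}

__global__ void demo(float2 *out) {
    float2 one  = make_float2(1.0f, 0.0f);
    float2 tiny = make_float2(1.0e-10f, 0.0f);   // would vanish in a plain float addition
    *out = df64_add(one, tiny);
}

int main() {
    float2 h, *d;
    cudaMalloc(&d, sizeof(float2));
    demo<<<1, 1>>>(d);
    cudaMemcpy(&h, d, sizeof(float2), cudaMemcpyDeviceToHost);
    printf("hi = %.9g, lo = %.9g\n", h.x, h.y);   // lo preserves the 1e-10 contribution
    cudaFree(d);
    return 0;
}
```

A float-float multiply built along the same lines (splitting each float and summing the partial products) is the kind of building block such FFT butterflies would be assembled from on cards without fast double precision.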
#17
"Bob Silverman"
Nov 2003
North of Boston
111010010010₂ Posts
Now it needs to be publicized. I am sure many users will take advantage of it, but they need to know about it: how to install it, how to run it, etc. It should also be folded into GIMPS.
#18
Bemusing Prompter
"Danny"
Dec 2002
California
2,467 Posts
Research code or not, it's definitely very exciting. I certainly hope it'll find its way into Prime95 soon!
Also, I never doubted your work, so I hope you don't take it that way. Oh, and since nobody else has said it: welcome to the GIMPS forum!
#19
Jan 2005
Caught in a sieve
5×79 Posts
Thanks, Andrew!
You also sound like the kind of person who would have the experience necessary to create an LLR test, considering George Woltman's requirements for such a test. Even a test for only small k's, as described in that post, would be of enormous benefit to PrimeGrid, the No Prime Left Behind search, and probably others as well.
#20
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
I'm not entirely sure how much effort building a GPU LLR application would entail, but since LLR is an extension of LL, I imagine it could be at least partially derived from the existing application. As Ken mentioned, such a program would be immensely beneficial to the many k*2^n±1 prime search projects out there. I myself am an assistant admin at NPLB and would be glad to help with testing such an app. (Our main admin, Gary, has a GTX 460 that he bought both for sieving, which is already available for CUDA, and to help test prospective CUDA LLR programs. He's not particularly savvy with this stuff, but I have remote access to the GPU machine and can run stuff on it as needed.)

Max

Last fiddled with by mdettweiler on 2010-12-09 at 02:46
#21
Dec 2010
1000₂ Posts
With regard to the GPU LLR work: I haven't looked at the sequential algorithms, but based on George W.'s description (straight-line in place of circular convolution, and shift-add for modular reduction), it actually sounds pretty close to my initial CUDA efforts on LL, before I dug into Crandall's paper and got a better handle on the IBDWT approach.
You'll pay the cost of the larger FFTs. Shift-add modular reduction isn't too hard, but you'll also need a parallel scan-based carry-adder if you need fully resolved carries; I have a hotwired CUDPP that does carry-add and subtract-with-borrow, so that's doable. (I can ask Mark Harris if they'd like to include that in the standard CUDPP release.) The most recent gpuLucas forgoes that and uses a carry-save configuration to keep all computations local except for the FFTs themselves. Big time savings there.
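To make the shift-add reduction concrete, here is a toy, single-word sketch (my own illustration with a made-up function name `mod_k2n_minus_1`, not code from gpuLucas or any existing LLR program). For N = k*2^n − 1 we have k*2^n ≡ 1 (mod N), so a large value can be folded down using only shifts, a division by the small k, and additions; a real implementation applies the same folding to a multi-word number, which is where the parallel carry handling comes in.

```cuda
#include <cstdio>

// Reduce x modulo N = k*2^n - 1 using only shifts, a small division by k,
// and additions.  Toy version: everything fits in 64 bits; a real LLR code
// would do the same folding on a multi-precision value.
__host__ __device__ unsigned long long mod_k2n_minus_1(unsigned long long x,
                                                       unsigned long long k,
                                                       unsigned n)
{
    const unsigned long long kshift = k << n;        // assumes k*2^n fits in 64 bits
    const unsigned long long N = kshift - 1;
    while (x >= kshift) {
        unsigned long long lo = x & ((1ULL << n) - 1);   // x mod 2^n
        unsigned long long hi = x >> n;                  // x div 2^n
        // x = (hi/k)*k*2^n + (hi%k)*2^n + lo
        //   ≡ (hi/k)       + (hi%k)*2^n + lo   (mod N), since k*2^n ≡ 1 (mod N)
        x = hi / k + ((hi % k) << n) + lo;
    }
    return (x >= N) ? x - N : x;
}

int main()
{
    // N = 11*2^5 - 1 = 351; compare against the compiler's % operator.
    const unsigned long long k = 11, N = (k << 5) - 1;
    const unsigned long long x = 12345678901ULL;
    printf("%llu mod %llu = %llu (expected %llu)\n",
           x, N, mod_k2n_minus_1(x, k, 5), x % N);
    return 0;
}
```

For N = k*2^n + 1 the same split works with a subtraction instead of an addition, since there k*2^n ≡ −1 (mod N).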
#22
May 2010
499₁₀ Posts
Pro-LLR GPU side:

1.) Allows people with GPUs freedom of choice. If a GPU program for LLR is developed, those with GPUs can choose to either sieve or test for primes.
2.) Allows for faster verification of large (>1 million digit) primes.
3.) GPU clients are not optimized yet, so there's more potential for improvement.
4.) GPUs are more energy efficient than old CPUs (Pentium 4s, Athlons, etc.), judging by the amount of electricity needed to LLR one k/n pair.

Anti-LLR GPU side:

1.) Reduces the number of participants. Those without fast CPUs would be discouraged from participating, since they would no longer be able to do a significant amount of "meaningful" LLR work (defined as LLR work that has a reasonable chance of getting into the top-5000 list).
2.) GPUs are much less effective at primality testing than at sieving or trial factoring. Computing systems should be used for what they are best at, so CPU users should stick to LLR tests and GPU users should stick to sieving and factoring.
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than Core i7s and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stick to trial division.
Thread | Thread Starter | Forum | Replies | Last Post
--- | --- | --- | --- | ---
mfaktc: a CUDA program for Mersenne prefactoring | TheJudger | GPU Computing | 3541 | 2022-04-21 22:37
Do normal adults give themselves an allowance? (...to fast or not to fast - there is no question!) | jasong | jasong | 35 | 2016-12-11 00:57
Find Mersenne Primes twice as fast? | Derived | Number Theory Discussion Group | 24 | 2016-09-08 11:45
TPSieve CUDA Testing Thread | Ken_g6 | Twin Prime Search | 52 | 2011-01-16 16:09
Fast calculations modulo small mersenne primes like M61 | Dresdenboy | Programming | 10 | 2004-02-29 17:27