2010-12-08, 03:24  #12
6809 > 6502
Aug 2003
101×103 Posts
2×3×1,709 Posts 


2010-12-08, 04:35  #13
Jul 2009
Tokyo
2×5×61 Posts
Does "an old Cg Lucas-Lehmer implementation" mean CUDALucas?
I can choose a new name, "YLucas" or "an old Cg Lucas-Lehmer implementation". Who is the godfather?
2010-12-08, 04:37  #14
A Sunny Moo
Aug 2007
USA (GMT-5)
14151_{8} Posts 
I'd keep the current name for CUDALucas...since the new program is called gpuLucas, there shouldn't be too much trouble telling them apart.

2010-12-08, 09:14  #15
Jul 2003
So Cal
2^{4}×3×47 Posts 

2010-12-08, 14:30  #16
Dec 2010
2^{3} Posts 
Certainly no intention of pwning anyone; this is purely research code. I was working from Crandall's original paper, with the understanding that others had gotten it to work with non-powers of two, so I really don't know all the excellent work you all have done with cudaLucas, macLucasFFTW, and such. That's mainly why I posted this week...I can't finish my paper on this without mentioning other current work. If anyone would care to summarize the principal players and their programs, you'll get a grateful acknowledgment, for sure.

I'll post some timing results today or tomorrow...I've got a Friday deadline, so I'm finishing off my time trials right now. As to whether it'll work with 1.3 cards...the implementation is pretty transparent, so it may need one or two mods, but it will probably work with any card that has true double precision and can run CUDA 3.2, though it does depend on the recent Fermi cards for a lot of its efficiency. Note that CUFFT has improved a lot in the most recent implementation, eliminating crippling bugs and substantially improving the non-power-of-two FFTs.

As to my credentials...no offense taken...I'm mainly an image-analysis guy, and these days I teach undergrads, but I've been interested in Mersenne prime testing since 1995, when I was trying to parallelize LL for a MasPar MP-1. :) I was at Carolina in the late '90s when they were doing the original work with PixelFlow, so we were all excited about programmable graphics hardware. The obsolete Cg work from a few years back used compiled shaders on 8800GT and 9800 cards, with my own homebrew extended-precision float-float FFTs and very baroque parallel carry-adds. Totally crazy, but perhaps y'all here might appreciate that. :)
2010-12-08, 16:14  #17
Nov 2003
2^{2}·5·373 Posts 
Now, it needs to be publicized. I am sure many users will take advantage of it, but they need to know about it and how to install and run it. It should also be folded into GIMPS.

2010-12-08, 16:32  #18
Bemusing Prompter
"Danny"
Dec 2002
California
11·13·17 Posts 
Research code or not, it's definitely very exciting. I certainly hope it'll find its way into Prime95 soon!
Also, I never doubted your work, so I hope you don't take it that way. Oh, and since nobody else has said it: welcome to the GIMPS forum! 
2010-12-09, 01:39  #19
Jan 2005
Caught in a sieve
5×79 Posts 
Thanks, Andrew!
You also sound like the kind of person who would have the experience necessary to create an LLR test, considering George Woltman's requirements for such a test. Even a test for only small K's, as described in that post, would be of enormous benefit to PrimeGrid, the No Prime Left Behind search, and probably others as well. 
2010-12-09, 02:46  #20
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts 
I'm not entirely sure how much effort building a GPU LLR application would entail, but since LLR is an extension of LL, I imagine it could be at least partially derived from the existing application. As Ken mentioned, such a program would be immensely beneficial to the many k*2^n+1 prime search projects out there. I myself am an assistant admin at NPLB and would be glad to help with testing such an app. (Our main admin, Gary, has a GTX 460 that he bought both for sieving, which is already available for CUDA, and to help test prospective CUDA LLR programs. He's not particularly savvy with this stuff, but I have remote access to the GPU machine and can run stuff on it as needed.)
Max
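To make the "LLR is an extension of LL" remark concrete, here is a minimal sketch in plain Python (big integers, no FFT, no GPU). It is not taken from LLR, gpuLucas, or any program discussed in this thread. It covers only numbers of the form k*2^n - 1 with k odd, k not divisible by 3, k < 2^n, and n >= 3, where the starting value V_k(4, 1) can be used. The function names `lucas_v` and `is_prime_k2n_minus_1` are made up for illustration. With k = 1 the starting value is 4 and the loop is the ordinary Lucas-Lehmer test, so only the starting value and the modulus differ between the two.

```python
def lucas_v(k: int, p: int, n: int) -> int:
    """Return V_k(p, 1) mod n, with V_0 = 2, V_1 = p, V_{m+1} = p*V_m - V_{m-1}."""
    if k == 0:
        return 2 % n
    prev, cur = 2 % n, p % n
    for _ in range(k - 1):
        prev, cur = cur, (p * cur - prev) % n
    return cur


def is_prime_k2n_minus_1(k: int, n: int) -> bool:
    """Lucas-Lehmer-Riesel style test for N = k*2^n - 1.

    Assumptions of this sketch: k odd, 3 does not divide k, k < 2^n, n >= 3.
    """
    if k < 1 or k % 2 == 0 or k % 3 == 0 or n < 3 or k >= (1 << n):
        raise ValueError("sketch covers only odd k, 3 not dividing k, k < 2^n, n >= 3")
    big_n = k * (1 << n) - 1
    if big_n % 3 == 0:
        return False  # N >= 7 here, so a factor of 3 means composite
    u = lucas_v(k, 4, big_n)   # starting value; equals 4 when k = 1
    for _ in range(n - 2):     # the same s -> s*s - 2 loop as plain LL
        u = (u * u - 2) % big_n
    return u == 0
```

In a real implementation the squaring inside the loop is where the FFT multiplication goes; everything else here is bookkeeping.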

2010-12-09, 15:09  #21
Dec 2010
2^{3} Posts 
With regard to the GPU LLR work: I haven't looked at the sequential algorithms, but based on George W.'s description (straight-line in place of circular convolution, and shift-add for modular reduction), it actually sounds pretty close to my initial CUDA efforts on LL, before I dug into Crandall's paper and got a better handle on the IBDWT approach.
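As an illustration of what shift-add modular reduction means in the simplest case, the sketch below reduces a non-negative integer modulo a Mersenne number 2^p - 1 using only shifts, masks, and adds. This is the k = 1 special case only; the general k*2^n ± 1 moduli that LLR handles need more than this. The function name `mod_mersenne` is hypothetical, and Python big integers stand in for the digit arrays a GPU program would use.

```python
def mod_mersenne(x: int, p: int) -> int:
    """Reduce x >= 0 modulo 2^p - 1 without division.

    Because 2^p is congruent to 1 modulo 2^p - 1, the high bits of x
    can simply be shifted down and added onto the low p bits.
    """
    m = (1 << p) - 1
    while x > m:
        x = (x & m) + (x >> p)  # fold the high part onto the low part
    return 0 if x == m else x   # m itself is congruent to 0
```

After a straight-line (zero-padded) multiplication the product has about 2p bits, so one or two folds are enough in practice.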
You'll pay the cost of the larger FFTs; shift-add modular reduction isn't too hard, but you'll also need a parallel scan-based carry-adder if you need fully resolved carries. I have a hot-wired CUDPP that does carry-add and subtract-with-borrow, so that's doable. (I can ask Mark Harris if they'd like to include that in the standard CUDPP release.) The most recent gpuLucas forgoes that and uses a carry-save configuration to keep all computations local except for the FFTs themselves. Big time savings there.
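The difference between fully resolved carries and a carry-save step can be shown with a small sequential sketch. It is not gpuLucas or CUDPP code: the names `digits_value`, `resolve_carries`, and `carry_save_pass` are made up, digits are little-endian Python integers in an arbitrary base, and the wrap-around of the top carry that a mod 2^p - 1 computation would need is left out. Full resolution ripples a carry through every digit (the job a parallel scan does on a GPU), while a carry-save pass only hands each digit's overflow to its immediate neighbour, so every step is local.

```python
def digits_value(digits: list[int], base: int) -> int:
    """Integer represented by little-endian digits (digits may exceed base)."""
    return sum(d * base**i for i, d in enumerate(digits))


def resolve_carries(digits: list[int], base: int) -> list[int]:
    """Ripple carries all the way through so every digit ends up in [0, base)."""
    out, carry = [], 0
    for d in digits:
        carry, r = divmod(d + carry, base)
        out.append(r)
    while carry:
        carry, r = divmod(carry, base)
        out.append(r)
    return out


def carry_save_pass(digits: list[int], base: int) -> list[int]:
    """One local pass: each digit keeps its remainder and hands its carry
    to the next digit only. Digits may still exceed base afterwards, but
    the represented value is unchanged."""
    out = [d % base for d in digits] + [0]
    for i, d in enumerate(digits):
        out[i + 1] += d // base
    return out
```

For example, with base 10 and digits `[250, 999, 7]`, both functions return digit lists representing the same integer, but only `resolve_carries` guarantees every digit is below the base.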
2010-12-10, 00:09  #22
May 2010
499 Posts 
Pro-LLR GPU side:
1.) Allows people with GPUs freedom of choice. If a GPU program for LLR is developed, those with GPUs can choose to either sieve or test for primes.
2.) Allows for faster verification of large (>1 million digit) primes.
3.) GPU clients are not optimized yet, so there's more potential for improvement.
4.) GPUs are more energy efficient than old CPUs (Pentium 4s, Athlons, etc.), judging by the amount of electricity needed to LLR one k/n pair.

Anti-LLR GPU side:
1.) Reduces the number of participants. Those without fast CPUs would be discouraged from participating, since they would no longer be able to do a significant amount of "meaningful" LLR work (defined as LLR work that has a reasonable chance of getting into the top 5000 list).
2.) GPUs are much less effective at primality testing than at sieving or trial factoring. Computing systems should be used for what they are best at, so CPU users should stick to LLR tests and GPU users should stick to sieving and factoring.
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than Core i7s and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.
