2011-02-01, 18:37  #1
Dec 2010
1000_{2} Posts 
Talk on gpuLucas at GPGPU-4 Workshop in March
I'll be presenting a short paper describing the GPU Lucas-Lehmer code I reported on last month. If anyone is in the LA area, the workshop is being held in conjunction with the ASPLOS XVI conference; GPGPU-4 will meet Saturday, March 5.
I'm doing revisions on the paper that I'll be presenting, and I'd really like to say more about the other work that's been done on GPU-based Mersenne testing. If anyone could get in touch with me (thall) at my alma.edu address, I'd be grateful for more info on other systems, timing results, etc. I'll be glad to send you a copy of my current draft of the paper, and include credit for any information you provide in the final draft. The deadline is soon, however. I'll post the final preprint shortly, and I've been removing dependencies from the code and will distribute that as well. (You can email me about that, too, if you're interested.)
2011-02-02, 23:21  #2
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}×17×29 Posts 
Is anyone helping out? If not, let's get to it, folks!
Andrew, are there specific video cards and FFT lengths you are interested in? Are you at all interested in some non-GPU timings for additional data? If so, single-core or all-cores-working-on-the-one-exponent?
2011-02-02, 23:33  #3
Einyen
Dec 2003
Denmark
110011110001_{2} Posts 
I'm interested in helping with either compiling, if I can, or speed testing. I have a GeForce GTX 460, and if you need CPU testing I can test on a Core 2 Duo, Core 2 Quad, Pentium 4 (Prescott) and a laptop with a Celeron (Penryn).

2011-02-03, 00:46  #4
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}·17·29 Posts 
I read Andrew's post as wanting timings of CUDALucas on your GTX 460.

2011-02-03, 02:38  #5
Dec 2010
1000_{2} Posts 
Thanks, all. I've had a few volunteers by email already...it is mainly the CUDALucas timings I need, but I think we've got it covered for now. Just looking at msec per Lucas iteration for given FFT sizes on Fermi architecture cards.
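For readers who have not seen the quantity being timed here: one "Lucas iteration" is a single squaring-and-subtract step of the Lucas-Lehmer test, reduced modulo 2^p - 1. The sketch below uses Python's built-in big integers rather than an FFT-based multiply, so its timings say nothing about gpuLucas or CUDALucas; the function name `lucas_lehmer` and the timing wrapper are illustrative assumptions, not code from either program.

```python
import time


def lucas_lehmer(p):
    """Run the Lucas-Lehmer test on M_p = 2**p - 1 for an odd prime p.

    Returns (is_prime, msec_per_iteration). Illustrative only: real testers
    replace the big-integer squaring below with an FFT-based multiply.
    """
    m = (1 << p) - 1          # the Mersenne number under test
    s = 4                     # standard starting value
    iterations = p - 2
    start = time.perf_counter()
    for _ in range(iterations):
        s = (s * s - 2) % m   # one "Lucas iteration"
    elapsed = time.perf_counter() - start
    return s == 0, 1000.0 * elapsed / iterations


if __name__ == "__main__":
    for p in (7, 11, 13, 127):
        is_prime, msec = lucas_lehmer(p)
        print(f"p={p}: prime={is_prime}, {msec:.6f} msec/iteration")
```

The squaring dominates the cost of each iteration, which is why timings are usually quoted per iteration at a given FFT size.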
I pulled some single-CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. Too late to make it into this paper, though. Finally had a chance to dig into the CUDALucas source code...a very different method than my approach, which is your academic, massively data-parallel, digit-per-thread sort of technique. It's likely faster than mine for large N, but without non-power-of-two transforms, it's not going to be as fast for any given Mersenne number. I suspect its performance won't scale as rapidly with a higher multiprocessing level (more cores), but we'll have to see when we get some better cards.
Last fiddled with by Andrew Thall on 2011-02-03 at 02:40
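To illustrate the point about non-power-of-two transforms: if a given exponent needs at least some minimum transform length, a code restricted to powers of two must round that length up to the next power of two, while a code that also supports small odd radices can pick a length much closer to the minimum. The sketch below is only an illustration under assumptions: the radix set (2, 3, 5, 7) and both function names are hypothetical and are not taken from gpuLucas or CUDALucas.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n >= 1)."""
    length = 1
    while length < n:
        length *= 2
    return length


def next_smooth_length(n, radices=(2, 3, 5, 7)):
    """Smallest integer >= n (n >= 1) whose prime factors all lie in radices.

    The radix set is an assumption chosen for illustration.
    """
    candidate = n
    while True:
        rest = candidate
        for r in radices:
            while rest % r == 0:   # strip out every factor of r
                rest //= r
        if rest == 1:              # nothing left: candidate is smooth
            return candidate
        candidate += 1


if __name__ == "__main__":
    minimum = 1025  # hypothetical minimum transform length
    print(next_power_of_two(minimum))   # power-of-two-only choice
    print(next_smooth_length(minimum))  # choice when radices 3, 5, 7 are allowed
```

The further the minimum sits above a power of two, the more work the power-of-two-only choice wastes on padding.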
2011-02-03, 02:59  #6
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}·17·29 Posts 
Quote:
P.S. I'm glad the community sent you the information you needed. I eagerly await studying your preprint.
Last fiddled with by Prime95 on 2011-02-03 at 03:01

2011-02-03, 14:46  #7
Jun 2005
3·43 Posts 
Quote:

