2007-05-02, 03:25  #1
Jun 2006
Perth, Western Australia
2×5 Posts 
Me, and you, and GPGPU.
NVidia is releasing its GPGPU programming tools:
http://developer.nvidia.com/page/tools.html
And there is the 'Generic C' version of Prime:
http://mersenne.org/freeware.htm
ftp://ftp.netdoor.com/users/acurry/lucdwt.c
Would it be possible for someone to use both of these to create a version of Prime for the 8-series GeForce?
2007-05-02, 07:36  #2
May 2005
2·809 Posts 
Could someone create a "sticky" (or even a subforum) for the subject "Running Prime95 on graphics cards"? Eventually it will be both possible and feasible.

2007-05-02, 16:08  #3
∂^{2}ω=0
Sep 2002
República de California
2^{3}×1,229 Posts 
I suggest we wait until some clever programmer *shows* it to be feasible and effective before creating a separate subforum. Until then, we'll just have to put up with the monthly sequence of
New Thread: Hey! How about using grafixxx cards for prime searching??
Reply: These cards are single-precision only, and thus are useless for GIMPS work. Have a nice day.
(No offense to the thread starter intended; it's just that we've had about a bazillion of these in the history of the project, and none has come to anything. That's not to say none ever will, but the proof is in the pudding, as they say.)
I do seem to recall collecting a bunch of these threads (or links to a bunch of them) in a single thread somewhere a couple of years back; let me see if I can find that...
Last fiddled with by ewmayer on 2007-05-02 at 16:09
2007-05-03, 04:21  #4
"Richard B. Woods"
Aug 2002
Wisconsin USA
2^{2}×3×599 Posts 
Ernst, one of my standard answers is in http://mersenneforum.org/showthread.php?t=6852.
When the appropriately round tuit shows up on my doorstep, I'll compose a comprehensive answer to the GPU question, in the wiki, then link from here. 
2007-05-03, 05:03  #5
Jun 2003
231_{8} Posts 
Ernst,
I understand that the ATI and NVidia graphics cards that have been discussed before are 32-bit cards (i.e. single precision) and that Prime95 uses 64-bit (i.e. double precision) math. However, the PeakStream folks at www.peakstreaminc.com/reference/PeakStream_datasheet.pdf claim to be able to do C and C++ double-precision math using single-precision ATI 580 series graphics cards.
Maybe they split each 64-bit double-precision value into a high and a low 32-bit single-precision value and store this "coupled" high/low pair in a "coupled" pair of 32-bit single-precision registers. I think that their claims are worth a closer look and I hope you agree.
Last fiddled with by RMAC9.5 on 2007-05-03 at 05:05
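The high/low split guessed at above can be sketched in plain C. This is only an illustration of the general "double-single" idea (an unevaluated sum of two floats, combined with Knuth's TwoSum); nothing here is taken from PeakStream, and the names `dsfloat`, `two_sum`, `quick_two_sum`, `ds_from_double`, `ds_to_double` and `ds_add` are made up for the example. It assumes IEEE round-to-nearest float arithmetic, which graphics hardware may not provide.

```c
/* A "double-single" value: the unevaluated sum hi + lo of two floats.
   Hypothetical type and function names, for illustration only. */
typedef struct { float hi; float lo; } dsfloat;

/* Knuth's TwoSum: hi = fl(a + b), lo = the exact rounding error,
   so that a + b == hi + lo exactly. The volatiles keep the compiler
   from simplifying the error term away. */
static dsfloat two_sum(float a, float b) {
    volatile float s  = a + b;
    volatile float bv = s - a;   /* the part of b that made it into s */
    volatile float av = s - bv;  /* the part of a that made it into s */
    dsfloat r;
    r.hi = s;
    r.lo = (a - av) + (b - bv);
    return r;
}

/* Cheaper variant, valid only when |a| >= |b|. */
static dsfloat quick_two_sum(float a, float b) {
    volatile float s = a + b;
    volatile float t = s - a;
    dsfloat r;
    r.hi = s;
    r.lo = b - t;
    return r;
}

/* Split a double into a float plus a float-sized remainder. */
static dsfloat ds_from_double(double x) {
    dsfloat r;
    r.hi = (float)x;
    r.lo = (float)(x - (double)r.hi);
    return r;
}

static double ds_to_double(dsfloat a) {
    return (double)a.hi + (double)a.lo;
}

/* Add two double-single values (the simple, slightly less accurate form). */
static dsfloat ds_add(dsfloat a, dsfloat b) {
    dsfloat s = two_sum(a.hi, b.hi);
    float lo = s.lo + (a.lo + b.lo);
    return quick_two_sum(s.hi, lo);
}
```

For example, `1.0 + 0x1p-30` rounds to `1.0` as a single float, but `ds_from_double` keeps it as the pair (1.0, 2^-30), and `ds_add` of two such values gives 2 + 2^-29 where a plain float sum gives 2. Note the limits: two 24-bit significands give roughly 48 bits rather than the 53 of a true double, the exponent range stays that of a float, and each addition costs several float operations.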
2007-07-12, 22:42  #6
Sep 2006
The Netherlands
13×53 Posts 
hi,
Just ordered an ATI 2900 card. Delivery time is not sure yet. I intend to get a floating-point FFT to work on it. As soon as it works for proving primes, which might take a month after receiving the card, I'll open-source it.
I might need some help with the floating-point math, as I'm not a math guy but a programmer. The big problem is how aggressively to pack bits into each word, the way George does in Prime95. I have a basic C framework here, worked out with some others, that should be quick to get working for floating point, and I have some ideas on parallelizing it. That framework currently uses modular integer math, which is quite different from floating point, especially in the calculation of how many bits you can put in at a given size.
Optimizing the code really well might take years though, as there is very, very little hardware information available about how big the caches are on those cards and which bottlenecks to avoid on them.
Yet the future seems cool for those cards. That ATI 2900 card is so much cheaper than the best 8800 cards, not to mention those special calculation cards (Tesla, if I remember correctly), that it is just not funny.
Vincent
p.s. You can reach me at diep@xs4all.nl if you have some tips or questions.
Last fiddled with by diep on 2007-07-12 at 22:43
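The "how many bits at a given size" question can be illustrated with a deliberately conservative rule of thumb. This is a sketch under stated assumptions, not Prime95's actual method: it assumes that multiplying two length-2^k sequences of b-bit words produces convolution terms needing about 2b + k bits, and that these plus a few guard bits must fit in the floating-point significand. The function name `max_bits_per_word` and the guard-bit parameter are hypothetical.

```c
/* Conservative worst-case estimate (an assumption for illustration):
   require 2*b + log2_n + guard_bits <= mantissa_bits and solve for b.
   mantissa_bits: 53 for IEEE double, 24 for IEEE single.
   log2_n:        log2 of the FFT length.
   guard_bits:    hypothetical safety margin for accumulated round-off. */
static int max_bits_per_word(int mantissa_bits, int log2_n, int guard_bits) {
    int budget = mantissa_bits - log2_n - guard_bits;
    return budget > 0 ? budget / 2 : 0;
}
```

With a length-2^20 transform and 3 guard bits this gives `max_bits_per_word(53, 20, 3) == 15` for doubles and `max_bits_per_word(24, 20, 3) == 0` for singles, which is one way to see why single precision alone is a poor fit at these sizes. Production code can pack more bits per word than this worst-case bound suggests, because typical round-off error grows more slowly than the worst case; the modular-integer version has no round-off at all, so there the only limit is that the convolution terms stay below the modulus.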
2007-08-15, 17:09  #8
∂^{2}ω=0
Sep 2002
República de California
2^{3}·1,229 Posts 
Quote:
But, if I read things correctly, there is no direct mapping between video memory and main memory. So how would a proposed LL test [or other user program] make use of the GFX card? Basically, if I have C code, how do I build and run it on such special-purpose hardware? Some links about that kind of code-dev-for-GFX would be welcome. I have C code I could attempt to deploy any time, given access to the HW and the needed build tools.

2007-08-15, 18:56  #9
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×1,193 Posts 
Ernst,
I think CUDA already has all the info you need. http://www.mersenneforum.org/showthread.php?t=7150 If you read all their documentation, please let us know your opinions on how difficult it will be to integrate into your C code. 
2007-08-15, 19:03  #10
Sep 2006
The Netherlands
13·53 Posts 
Quote:
Basically all graphics cards work with 128-bit floating point, which is 4 x 32 bits. So single precision. If they modified that to double precision in *hardware*, the chip would have more like 4 billion transistors and would be clocked a lot lower than the roughly 500 MHz that the 8800 GTX currently gets.
There is a very simple marketing model for this: gains versus costs. The gain of moving from single precision to double precision is that they sell a couple of thousand more cards to dudes like us. The price they pay for that is lower yields and a lower clock for the chip. OpenGL/DirectX on paper could do with 16-bit floats, so 32-bit floats definitely do. When moving to double precision, Nvidia's new card would be slower than their old card for DirectX 9 games, and by a lot. So let's first sit, wait, and see whether they are prepared to lose billions in income before believing a single rumour on a website.
More interesting than whether the claim is true is how much that 1 Tflop single precision has been lied about. We know that some other FFT implementations managed to get just 50 Gflop single precision out of the 8800 GTX, which on paper delivers 0.5 Tflop according to Nvidia (compare: a quad-core Intel already gets nearly 40 Gflop in double precision). So just 10% of the claimed performance. Some bottlenecks in the card caused that. For Tesla, which amazingly is also claimed to deliver 0.5 Tflop despite being clocked more than 2x higher than the 8800 GTX, they claim to have removed that bottleneck.
The really interesting thing will be to see whether normal graphics cards released at a cheap price can get quite a tad more than 50 Gflop, say *more* than a factor of 2. Of course, so far it was easy for ATI and Nvidia to claim the sky as the limit for their cards in terms of Gflops, as it is very hard to disprove.
Now that there are unified shaders it is possible to disprove, and it seems Nvidia lies a tad harder than ATI so far, but I hope to find out within a few months how far ATI has been lying here with respect to the 0.5 Tflop that the 2900 is supposed to get.
Vincent
Last fiddled with by diep on 2007-08-15 at 19:10

2007-08-15, 19:12  #11
∂^{2}ω=0
Sep 2002
República de California
23150_{8} Posts 
Quote:
http://forums.nvidia.com/index.php?showtopic=36286
More reading for my upcoming week-and-a-half-long vacation. Based on a quick first run-through, it looks like the crux of the code-optimization task will be the rather different granularity of parallelism the GFX cards use compared to typical multicore CPUs. I've been targeting coarse-grained parallelism, i.e. a relatively small number of threads (1-16), each crunching a relatively large MB-sized data chunk as independently of the others as possible. GFX seems to be best for fine-grained parallelism: much smaller data chunks, with many more compute units working in parallel. Should make for an interesting problem.
