mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Software

Old 2007-05-02, 03:25   #1
TehPenguin
 
Jun 2006
Perth, Western Australia

2×5 Posts
Me, and you, and GPGPU.

We have NVidia releasing their GPGPU programming tools:
http://developer.nvidia.com/page/tools.html

And the 'Generic C' version of Prime:
http://mersenne.org/freeware.htm
ftp://ftp.netdoor.com/users/acurry/lucdwt.c

Would it be possible for someone to use both of these to create a version of Prime95 for the GeForce 8 series?
Old 2007-05-02, 07:36   #2
Cruelty
 
May 2005

2·809 Posts

Could someone create a "sticky" (or even a subforum) for the subject "Running Prime95 on graphics cards"? Eventually it will be possible and feasible.
Old 2007-05-02, 16:08   #3
ewmayer
2ω=0
 
Sep 2002
República de California

2³×1,229 Posts

I suggest we wait until some clever programmer *shows* it to be feasible and effective before we create a separate subforum. Until then, we'll just have to put up with the monthly sequence of

New Thread: Hey! How about using grafixxx cards for prime searching??

Reply: These cards are single-precision only, and thus are useless for GIMPS work. Have a nice day.

(No offense to the thread starter intended - it's just that we've had about a bazillion of these in the history of the project, and none has come to anything. That's not to say none ever will, but the proof is in the pudding, as they say.)

I do seem to recall collecting a bunch of these threads (or links to a bunch of them) in a single thread somewhere a couple years back - let me see if I can find that...

Last fiddled with by ewmayer on 2007-05-02 at 16:09
Old 2007-05-03, 04:21   #4
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2²×3×599 Posts

Ernst, one of my standard answers is in http://mersenneforum.org/showthread.php?t=6852.

When the appropriately round tuit shows up on my doorstep, I'll compose a comprehensive answer to the GPU question, in the wiki, then link from here.
Old 2007-05-03, 05:03   #5
RMAC9.5
 
Jun 2003

231₈ Posts

Ernst,
I understand that the ATI and NVidia graphics cards discussed so far are 32-bit (i.e. single-precision) cards, while Prime95 uses 64-bit (i.e. double-precision) math. However, the PeakStream folks at www.peakstreaminc.com/reference/PeakStream_datasheet.pdf claim to be able to do C and C++ double-precision math on single-precision ATI 580 series graphics cards. Perhaps they split each 64-bit double-precision value into a high/low pair of 32-bit single-precision values and store these "coupled" values in pairs of 32-bit single-precision registers. I think their claims are worth a closer look, and I hope you agree.

Last fiddled with by RMAC9.5 on 2007-05-03 at 05:05
Old 2007-07-12, 22:42   #6
diep
 
Sep 2006
The Netherlands

13×53 Posts

hi,

Just ordered an ATI 2900 card; delivery time not yet certain. I intend to get a floating-point FFT working on it. As soon as it works for proving primes, which might take a month after I receive the card, I'll open-source it. I might need some help with the floating-point math, as I'm not a math guy but a programmer. The big problem is how aggressively to pack bits into each word, the way George does in Prime95.

I have a basic C framework here, worked out with some others, that should adapt quite quickly to floating point, and I have some ideas on parallelizing it. That framework currently uses modular integer math, which is quite different from floating point, especially in the calculation of how many bits you can pack in at a given size.

Optimizing the code well might take years though, as there is very, very little hardware information available about how big the caches on those cards are and which bottlenecks to avoid.

Yet the future seems cool for those cards. The ATI 2900 is so much cheaper than the best 8800 cards, not to mention the special compute cards (Tesla, if I remember well), that it is just not funny.

Vincent
p.s. can reach me at diep@xs4all.nl if you have some tips or questions

Last fiddled with by diep on 2007-07-12 at 22:43
Old 2007-08-15, 08:34   #7
Cruelty
 
May 2005

11001010010₂ Posts

Rumor has it that the next generation of nVidia cards will support double precision. Read here.
Old 2007-08-15, 17:09   #8
ewmayer
2ω=0
 
Sep 2002
República de California

2³·1,229 Posts

Quote:
Originally Posted by Cruelty View Post
Rumor has it that the next generation of nVidia cards will support double precision. Read here.
Well, that could finally start to get interesting - thanks for the link.

But, if I read things correctly, there is no direct mapping between video memory and main memory. So how would a proposed LL test [or other user program] make use of the GFX card? Basically, if I have C code, how do I build and run it on such special-purpose hardware? Some links about that kind of code development for GFX would be welcome. I have C code I could attempt to deploy any time, given access to the hardware and the needed build tools.
Old 2007-08-15, 18:56   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3×1,193 Posts

Ernst,

I think CUDA already has all the info you need.

http://www.mersenneforum.org/showthread.php?t=7150

If you read all their documentation, please let us know your opinions on how difficult it will be to integrate into your C code.
Old 2007-08-15, 19:03   #10
diep
 
Sep 2006
The Netherlands

13·53 Posts

Quote:
Originally Posted by Cruelty View Post
Rumor has it that the next generation of nVidia cards will support double precision. Read here.
I guess another victory for the marketeers. Most likely it won't have double-precision floating point in hardware, and the reviewer fell for the trick that NVidia offers a software *emulation* of double precision.

Basically all graphics cards work with 128-bit floating point, which is 4 × 32 bits.

So single precision.

If they modified that to double precision in *hardware*, the chip would need more like 4 billion transistors and would clock a lot lower than the roughly 500 MHz the current 8800 GTX gets.

There is a very simple marketing model for this: gains versus returns. The gain of moving from single precision to double precision is that they sell a couple of thousand more cards to dudes like us.

The price they pay for that is lower yields and a lower clock for the GPU. OpenGL/DirectX could on paper make do with 16-bit floats, so 32-bit floats certainly suffice. Moving to double precision would make NVidia's new card slower than their old card for DirectX 9 games - a lot slower.

So let's sit and wait to see whether they are prepared to lose billions in income, before believing a single rumour on a website.

More interesting than whether the claim is true is how far that 1 Tflop single-precision figure has been exaggerated.

We know that some FFT implementations managed to get just 50 Gflop single precision out of the 8800 GTX, which on paper delivers 0.5 Tflop according to NVidia (for comparison, a quad-core Intel already gets nearly 40 Gflop in double precision).

So just 10% of the claimed performance.

Some bottlenecks in the card caused that. Tesla, which amazingly is also claimed to deliver 0.5 Tflop despite being clocked more than 2× higher than that 8800 GTX, is claimed to have that bottleneck removed.

The really interesting thing will be to see whether normal graphics cards, released at a cheap price, can get quite a bit more than 50 Gflop - say *more* than a factor of 2.

Of course, so far it was easy for ATI and NVidia to claim the sky as the limit for their cards in terms of Gflops, as it was very hard to disprove. Now that there are unified shaders it is possible to disprove, and so far NVidia seems to lie a tad harder than ATI; I hope to find out within a few months how far ATI has been lying with respect to the 0.5 Tflop the 2900 is supposed to get.

Vincent

Last fiddled with by diep on 2007-08-15 at 19:10
Old 2007-08-15, 19:12   #11
ewmayer
2ω=0
 
Sep 2002
República de California

23150₈ Posts

Quote:
Originally Posted by Prime95 View Post
Ernst,

I think CUDA already has all the info you need.

http://www.mersenneforum.org/showthread.php?t=7150

If you read all their documentation, please let us know your opinions on how difficult it will be to integrate into your C code.
Yes, I found the same link by googling around just now:

http://forums.nvidia.com/index.php?showtopic=36286

More reading for my upcoming week-and-a-half-long vacation.

Based on a quick first run-through, it looks like the crux of the code-optimization task will be the rather different granularity of parallelism the GFX cards use compared to typical multicore CPUs - I've been targeting coarse-grained ||ism, i.e. a relatively small number of threads (1-16), each crunching a relatively large MB-sized data chunk as independently of the others as possible. GFX seems to be best for fine-grained ||ism: much smaller data chunks, with many more compute units working in ||. Should make for an interesting problem.
