mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
Old 2010-09-23, 09:02   #331
nucleon
Mar 2003, Melbourne

Historian - I've never seen someone reach so far to prove so little.

Please do not dismiss the good work msft has done. I think msft has done a fantastic job.

I'm sorry, but there appear to be a number of people here who just can't accept that, for LL tests using 2^n-length FFTs, GPUs are unbeaten for time to result (latency), results per unit time (throughput), and results per cost (both upfront and ongoing).

-- Craig
Old 2010-09-23, 09:10   #332
ldesnogu
Jan 2008, France

I fully agree with Nucleon. msft, don't listen to Historian; he belongs to the 20th century. Keep up the good work!

Old 2010-09-23, 11:52   #333
Mini-Geek ("Tim Sorbera")
Aug 2006, San Antonio, TX USA

Quote:
Originally Posted by Historian
The development of GPU clients for LLR is a terrible idea. It's like the Prisoner's Dilemma:
This is only valid if you think that the absolute size of the primes you're finding has no meaning, and that only the competition (their size relative to others' primes) matters. I personally don't agree with that. I'll happily crunch primes at a size decent for my hardware (whatever that means to me at the moment), then upgrade my hardware and upgrade my expectations. To me, where I place relative to others is a side effect, not the goal. Some people don't think this way, and to them (probably including you), yes, adding GPUs just means higher costs for everyone.
There will always be people with better hardware who get more primes than others; they are willing to pay more, upfront and over time, for that. Adding GPUs to the mix just creates a different sort of step between different budgets.

Old 2010-09-23, 13:50   #334
Prime95 (P90 years forever!)
Aug 2002, Yeehaw, FL

Quote:
Originally Posted by The Carnivore
Yes, we know that some of you want GPUs for k*2^n+/-1 numbers, so quit repeating it every few weeks.
Just so you know, it is a non-trivial task to go from a discrete weighted transform (DWT) that supports Mersenne numbers to a DWT that works on k*2^n+/-1. In fact, a DWT can only support "small" k values (up to 50,000 or so).

To support all k values, you'll need to write C or CUDA code to do the modular reduction at the same time as the carry propagation. This requires FFTs twice the size used for a comparable Mersenne number, with the upper half of the FFT data zeroed. Thus, you can expect the LLR test time for a 12,500,000 bit number to be just a tad slower than the LL test time for a 25,000,000 bit number.
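
To make the arithmetic concrete, here is a toy Python sketch of one way such a reduction can work for N = k*2^n - 1, using the identity k*2^n ≡ 1 (mod N). It is only an illustration (the function names are made up, Python big integers stand in for the FFT digit array, and it is not the gwnum/llrCUDA code); in a real client this folding would be interleaved with the carry propagation over the zero-padded, double-length FFT output.

Code:
# Toy sketch: reduce a double-length product modulo N = k*2^n - 1
# by exploiting k*2^n ≡ 1 (mod N).  Illustrative only; not the
# algorithm used by gwnum or any CUDA LLR client.
import random

def fold_mod(x, k, n):
    """Return x mod N for N = k*2^n - 1, without a general division by N."""
    N = k * (1 << n) - 1
    while x > N:                      # x > N implies (x >> n) >= k, so q >= 1 below
        hi, lo = divmod(x, 1 << n)    # x = hi*2^n + lo
        q, s = divmod(hi, k)          # hi = q*k + s with 0 <= s < k
        # hi*2^n = q*(k*2^n) + s*2^n ≡ q + s*2^n (mod N), i.e. x -> x - q*N
        x = lo + q + (s << n)
    return 0 if x == N else x         # x == N is congruent to 0

if __name__ == "__main__":
    k, n = 7, 20                      # small made-up example: N = 7*2^20 - 1
    N = k * (1 << n) - 1
    for _ in range(1000):
        a = random.randrange(N)
        assert fold_mod(a * a, k, n) == (a * a) % N
    print("fold reduction matches % for N =", N)

The point of the sketch is that the full double-length product has to exist before it can be folded (which is why the FFT has to be twice the size, upper half zeroed), and that general k needs this extra division-by-k step, while small k can be handled inside the weighted transform itself.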
Old 2010-09-23, 18:35   #335
MooMoo2 ("Michael Kwok")
Aug 2010

Quote:
Originally Posted by Historian
Group B sees that their primes are quickly beginning to get wiped off the top 5000 list, so they buy GPUs and run them to prevent this from happening.
Quote:
adding GPUs just means higher costs for everyone
Quote:
they are now worse off. Members of group B each have to spend hundreds of dollars to get good GPUs, and the power consumption of both groups has more than tripled.
I don't think much will happen if there's a GPU LLR client. The primes found by CPUs won't be quickly wiped off the top 5000 list, and the non-GPU folks won't be rushing out to buy GPUs. Even the ones who already have good GPUs probably won't bother to run them, except for the really diehard people.

More than four years ago, the only machine I had was a (single-core) Pentium 4. I found a top 5000 prime within a few months, and another one several months later. People started using Core 2 Duos, and then came Core 2 Quads, Phenom IIs, and Core i7s. Despite this, both primes are still on the top 5000 list today, and I expect them to stay there at least until the end of the year.

The difference between a Core i7 and a Pentium 4 is far greater than the difference between a Core i7 and a GPU. If the primes that I found back then are still on the top 5000 list today, I don't see why any primes found on my high-end CPU today will disappear from that list anytime soon.

Like I said before, the additional computing power would be so little that it would hardly be worth the effort to develop an LLR GPU client. As Prime95 said, "you can expect the LLR test time for a 12,500,000 bit number to be just a tad slower than the LL test time for a 25,000,000 bit number", so a GPU wouldn't even be able to match a high-end quad core with all cores running. As for beating 6-core processors? Forget it.

I don't have a CUDA capable GPU, and I wouldn't get one even if they were sold at the 99 cents store.
Old 2010-09-23, 20:24   #336
mdettweiler (A Sunny Moo)
Aug 2007, USA (GMT-5)

Quote:
Originally Posted by MooMoo2
Like I said before, the additional computing power would be so little that it would hardly be worth the effort to develop an LLR GPU client. As Prime95 said, "you can expect the LLR test time for a 12,500,000 bit number to be just a tad slower than the LL test time for a 25,000,000 bit number", so a GPU wouldn't even be able to match a high-end quad core with all cores running. As for beating 6-core processors? Forget it.
Er...I think George was talking about a general limitation of the LLR test. That is, the situation of a 12,500,000 bit LLR test being a tad slower than a 25,000,000 bit LL test would hold true for CPUs as well.

And besides, this only kicks in for k>50000 or so. Most of the k*2^n-1 testing being done at this time is below that, so even if a CUDA LLR program only supported k<50000, it would still be immensely useful.
Old 2010-09-23, 21:18   #337
agent1
May 2010

Quote:
Originally Posted by mdettweiler
The first result is in from Gary's GTX 460:
M25652651
Amazing--a test that would have taken upward of 10 days on one core of a fast CPU took only a little over 2 days!
It's a poacher's dream.
Old 2010-09-23, 23:07   #338
The Carnivore
Jun 2010

Quote:
Originally Posted by mdettweiler
Hey, keep it cool man...my most recent post was mainly to let people know that now I actually have a GPU with which to help test this stuff. It does change the situation a bit and thus it seemed to warrant a new post.
OK. It's not directed at you specifically, but I still don't understand the impatience and pestering from the pro-GPU side. Those examples I posted earlier weren't the only ones showing the aggressive behavior of the pro-GPU side; there's also this post from another thread:
http://www.mersenneforum.org/showpos...83&postcount=3
Quote:
Would it be difficult to produce (if you haven't done so already) a version of LLR based on FFTW?
Not a lot of repetition there, but sensible posts like this one: http://www.mersenneforum.org/showpos...5&postcount=20
are getting slammed on:
http://www.mersenneforum.org/showpos...0&postcount=21
Quote:
So I guess these guys are employed by nVidia to spread "fraud"?

Now that's crazy, they got very nice speeds, and didn't get any money for that. They probably are stupid or are in fact employees of nVidia trying to spread bullsh.t
This isn't directed at one person specifically, but flooding other threads with the same request doesn't work; it just makes people annoyed.

What's the rush? It's not like there's a lack of GPU work anyway: there's ppsieve, tpsieve, LL testing for Mersenne numbers, and a trial division program that's used in Operation Billion Digits.
Old 2010-09-23, 23:28   #339
Prime95 (P90 years forever!)
Aug 2002, Yeehaw, FL

Quote:
Originally Posted by mdettweiler
Er...I think George was talking about a general limitation of the LLR test. That is, the situation of a 12,500,000 bit LLR test being a tad slower than a 25,000,000 bit LL test would hold true for CPUs as well.
Yes, if msft develops a CUDA LLR program then it will be modestly more powerful (in terms of throughput) than an i7 -- just like LL testing.

From a project admin's point of view, he'd rather GPUs did sieving than primality testing, as it seems a GPU will greatly exceed (as opposed to modestly exceed) the throughput of an i7.

In any event, we are all better off with GPUs doing useful work rather than sitting idle!
Old 2010-09-24, 08:44   #340
ldesnogu
Jan 2008, France

Quote:
Originally Posted by The Carnivore
Not a lot of repetition there, but sensible posts like this one: http://www.mersenneforum.org/showpos...5&postcount=20
are getting slammed on:
http://www.mersenneforum.org/showpos...0&postcount=21

This isn't directed at one person specifically, but flooding other threads with the same request doesn't work; it just makes people annoyed.
I'm not sure I get the relation here. I'm a GPU disbeliever (at least I don't buy the x100 speedups over CPU some "scientific" papers claim), so I won't be the one crying for some GPU code.

That being said, I think msft and TheJudger deserve respect for what they are doing. So when I read Vincent's (aka Diep) post, which seems to imply that no amateur work has shown GPU code to be faster than finely tuned CPU code, it made me angry. I admit I was slightly over-reacting.

Anyway, what Historian wrote about msft in this thread is not acceptable.
Old 2010-09-24, 17:55   #341
xilman (Bamboozled!)
May 2003, Down not across

Quote:
Originally Posted by ldesnogu
I'm not sure I get the relation here. I'm a GPU disbeliever (at least I don't buy the x100 speedups over CPU some "scientific" papers claim), so I won't be the one crying for some GPU code.
My experience, for what it's worth, is that the speed-up lies between 0.3x and 50x in the cases I've examined so far. That's comparing a Tesla C1060 to a 2.8 GHz Xeon; clearly, different hardware will compare differently. Problems which are essentially unparallelizable tend to run more slowly on the GPU, while problems which match the GPU architecture especially well run much faster on that platform.

Some crypto applications use only integer and logical operations on small word sizes and are embarrassingly parallel. Examples include direct key search on simple block ciphers or LFSR-based stream ciphers, together with similar computations to build Hellman tables or rainbow tables. These typically run very quickly on a GPU.
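
As a concrete (CPU-side, made-up) illustration of why that kind of workload parallelizes so well, here is a toy Python sketch of a direct key search against a small LFSR-based stream cipher. The cipher and all names are invented for the example; the point is only that each candidate key is tested independently of every other key.

Code:
# Toy example only: a 16-bit Galois LFSR "stream cipher" and a brute-force
# key search against it.  Each key trial is independent of the others,
# which is what makes this kind of search embarrassingly parallel.

def keystream(key16, nbytes):
    """Generate nbytes of keystream from a 16-bit Galois LFSR (taps 16,14,13,11)."""
    state = key16 & 0xFFFF
    bits = []
    for _ in range(nbytes * 8):
        lsb = state & 1
        bits.append(lsb)
        state >>= 1
        if lsb:
            state ^= 0xB400            # feedback polynomial x^16+x^14+x^13+x^11+1
    # pack bits into bytes, least-significant bit first
    return bytes(
        sum(bits[i + j] << j for j in range(8)) for i in range(0, nbytes * 8, 8)
    )

def encrypt(key16, plaintext):
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key16, len(plaintext))))

def brute_force(known_pt, known_ct):
    """Try every non-zero 16-bit key; each trial reads only shared, constant data."""
    for key in range(1, 1 << 16):
        if encrypt(key, known_pt) == known_ct:
            return key
    return None

if __name__ == "__main__":
    secret = 0xBEEF
    pt = b"attack at dawn"
    ct = encrypt(secret, pt)
    print(hex(brute_force(pt, ct)))    # recovers 0xbeef

A GPU version would simply map the body of brute_force's loop onto threads, one key (or one slice of the key space) per thread; nothing in one trial depends on the outcome of another, which is why such searches scale so well on that hardware.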


Paul