Old 2010-12-10, 02:14   #23
axn
Jun 2003
5,179 Posts

Quote:
Originally Posted by Oddball View Post
Speaking of that post, here's a quick summary of the arguments for both sides:
That post describes the technical complexities of developing a suitable FFT for LLR. It doesn't deal with any "sides".

Quote:
Originally Posted by Oddball View Post
Pro-LLR GPU side:
1.) Allows people with GPUs freedom of choice. If a GPU program for LLR is developed, those with GPUs can choose to either sieve or test for primes.
2.) Allows for faster verification of large (>1 million digit) primes.
3.) GPU clients are not optimized yet, so there's more potential for improvement.
4.) GPUs are more energy efficient than old CPUs (Pentium 4's, Athlons, etc), judging by the amount of electricity needed to LLR one k/n pair.

Anti-LLR GPU side:
1.) Reduces the number of participants. Those without fast CPUs would be discouraged from participating since they would no longer be able to do a significant amount of "meaningful" LLR work (defined as LLR work that has a reasonable chance of getting into the top 5000 list).
2.) GPUs are much less effective at primality testing than at sieving or trial factoring. Computing systems should be used for what they are best at, so CPU users should stick to LLR tests and GPU users should stick to sieving and factoring.
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.
Aside from you, I haven't actually seen arguments advanced by anyone else against developing a primality testing program for GPUs. One person hardly makes a "side". The arguments against GPU-based LLR read like a what's-what of fallacy files.

Old 2010-12-10, 02:24   #24
CRGreathouse
Aug 2006
5979₁₀ Posts

Quote:
Originally Posted by Oddball View Post
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.
Do we have numbers on those?

Old 2010-12-10, 06:17   #25
Oddball
May 2010
499 Posts

Quote:
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.
Quote:
Originally Posted by CRGreathouse View Post
Do we have numbers on those?
There's this:
http://www.mersenneforum.org/showpos...&postcount=152
"487W for GTX 295 under full load!"

The Phenom II system I have right now draws ~150 watts at full load.

The reference for claim #4 is here:
http://mersenneforum.org/showpost.ph...&postcount=379

"Consumer video cards are designed for gaming rather than technical computing, so they don't have as many error-checking features."
There's not enough data to provide more accurate figures.

Old 2010-12-10, 06:27   #26
Oddball
May 2010
1F3₁₆ Posts

Quote:
Originally Posted by axn View Post
That post describes the technical complexities of developing a suitable FFT for LLR. It doesn't deal with any "sides".
In that post, the quote that Prime95 posted was referring to The Carnivore, who was describing the impatience of the pro-GPU side. So I wasn't the first person to claim there were different sides.

Quote:
Aside from you, I haven't actually seen any arguments advanced by any persons for not developing a primality testing program for GPU. One person hardly makes a "side".
Here's another person with an anti-GPU point of view:
http://mersenneforum.org/showpost.ph...&postcount=327

Here's what George has to say:
http://mersenneforum.org/showpost.ph...&postcount=339

"if msft develops a CUDA LLR program then it will be modestly more powerful (in terms of throughput) than an i7 -- just like LL testing.
From a project admin's point of view, he'd rather GPUs did sieving than primality testing as it seems a GPU will greatly exceed (as opposed to modestly exceed) the throughput of an i7."

But I'm done debating this issue; it's been beaten to death, and none of the users involved are going to change their minds.

Old 2010-12-10, 06:37   #27
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts

Quote:
Originally Posted by Oddball View Post
"if msft develops a CUDA LLR program then it will be modestly more powerful (in terms of throughput) than an i7 -- just like LL testing.
From a project admin's point of view, he'd rather GPUs did sieving than primality testing as it seems a GPU will greatly exceed (as opposed to modestly exceed) the throughput of an i7."
The way I see it (from the perspective of a project admin), it's nice to at least have the ability to do both. As a case in point for why this matters: currently NPLB and PrimeGrid are collaborating on a large GPU sieving drive (covering all k<10000). With the combined GPU resources of our two projects, we blew through the n<2M range in no time at all, and the 2M-3M range is itself moving very rapidly. Yet the primary leading edges of both projects' LLR testing are below n=1M. We won't get to some of this work for years, by which time GPUs will likely be so much more advanced that much of the sieving done now will be a drop in the bucket compared to the optimal depth for the GPUs of that day.

Right now, the only work that k*2^n+-1 prime search projects can offer GPUs is sieving. Thus, in order to keep the GPUs busy at all, we have to keep sieving farther and farther up in n, which becomes increasingly suboptimal the further we get from our LLR leading edge. If we had the option of putting those GPUs to work on LLR once everything needed in the foreseeable future has been well sieved, even if it's not quite the GPUs' forte, we could at least be using them for something that's needed, rather than effectively wasting effort on sieving that could be done much more efficiently down the road.
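
For what it's worth, the rule of thumb behind "optimal depth" is simple: keep sieving as long as removing a candidate by sieving is cheaper than LLR-testing it at the leading edge. A minimal sketch (the timings are made-up placeholders, not NPLB or PrimeGrid measurements):

[CODE]
# Rule-of-thumb check for whether to sieve deeper or switch to LLR.
# The timings below are hypothetical placeholders, not project measurements.

def worth_sieving_deeper(sec_per_candidate_removed, sec_per_llr_test):
    """Sieving deeper pays off while removing one candidate by sieving
    is still cheaper than LLR-testing that candidate."""
    return sec_per_candidate_removed < sec_per_llr_test

# Hypothetical example: the sieve removes a candidate every 40 s of GPU time,
# while an LLR test at the leading edge takes about 600 s of CPU time.
print(worth_sieving_deeper(40, 600))   # True  -> keep sieving
print(worth_sieving_deeper(900, 600))  # False -> better to LLR the survivors
[/CODE]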

Anyway, that's my $0.02...not trying to beat this to death on this end either.

Old 2010-12-10, 06:58   #28
MooMoo2
Aug 2010
2²×3×5×11 Posts

Quote:
Originally Posted by mdettweiler View Post
The way I see it (from the perspective of a project admin), it's nice to at least have the ability to do both. As a case in point for why this matters: currently NPLB and PrimeGrid are collaborating on a large GPU sieving drive (covering all k<10000). With the combined GPU resources of our two projects, we blew through the n<2M range in no time at all, and the 2M-3M range is itself moving very rapidly. Yet the primary leading edges of both projects' LLR testing are below n=1M. We won't get to some of this work for years, by which time GPUs will likely be so much more advanced that much of the sieving done now will be a drop in the bucket compared to the optimal depth for the GPUs of that day.

Right now, the only work that k*2^n+-1 prime search projects can offer GPUs is sieving. Thus, in order to keep the GPUs busy at all, we have to keep sieving farther and farther up in n, which becomes increasingly suboptimal the further we get from our LLR leading edge. If we had the option of putting those GPUs to work on LLR once everything needed in the foreseeable future has been well sieved, even if it's not quite the GPUs' forte, we could at least be using them for something that's needed, rather than effectively wasting effort on sieving that could be done much more efficiently down the road.
You can direct the GPUs to the TPS forum if they're out of work.

Old 2010-12-10, 07:48   #29
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by MooMoo2 View Post
You can direct the GPUs to the TPS forum if they're out of work
Indeed, that is an option. However, speaking solely from the perspective of a project admin (that is, trying to maximize the utilization of resources within my own project), it would seem worthwhile to have GPU LLR as an option--so that if (say) you have a participant who wants to contribute with his GPU at NPLB but is not particularly interested in TPS, he can still have useful work to do. (Or vice versa.)

Old 2010-12-10, 14:20   #30
CRGreathouse
Aug 2006
3·1,993 Posts

Quote:
Originally Posted by Oddball View Post
There's this:
http://www.mersenneforum.org/showpos...&postcount=152
"487W for GTX 295 under full load!"

The Phenom II system I have right now draws ~150 watts at full load.
I'm seeing 181 watts for the i7 under load. So for your claim "Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs" to hold, the GTX 295 needs to be less than 2.7 times faster than the i7 -- or 10.8 times faster than a single (physical) core. Is that so?
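
In code form, that break-even works out as follows (a rough sketch, using only the wattage figures quoted in this thread):

[CODE]
# Break-even speedup for the GTX 295 vs. the i7 in primality tests per watt.
# Wattages are the figures quoted in this thread, not independent measurements.
gpu_watts = 487.0   # GTX 295 system under full load
cpu_watts = 181.0   # i7 system under load

breakeven_vs_i7 = gpu_watts / cpu_watts    # ~2.69x a whole i7
breakeven_vs_core = breakeven_vs_i7 * 4    # i7 has 4 physical cores -> ~10.8x one core

print(f"GPU needs >{breakeven_vs_i7:.1f}x an i7 ({breakeven_vs_core:.1f}x one core) to win on tests per watt")
[/CODE]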

Old 2010-12-10, 18:05   #31
Oddball
May 2010
499 Posts

Quote:
Originally Posted by CRGreathouse View Post
I'm seeing 181 watts for the i7 under load. So for your claim "Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs" to hold, the GTX 295 needs to be less than 2.7 times faster than the i7 -- or 10.8 times faster than a single (physical) core. Is that so?
Yes. See: http://mersenneforum.org/showpost.ph...&postcount=293

"in a worst-case (for the GPU) scenario, you still need all cores of your i7 working together to match its output! In a best-case scenario, it's closer to twice as fast as your CPU."

Old 2010-12-10, 18:56   #32
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
3·1,979 Posts

Please remember that your i7 CPU can keep running on most of its cores alongside the GPU app without much more power consumption.

Old 2010-12-10, 19:16   #33
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1110111111101₂ Posts

Quote:
Originally Posted by Oddball View Post
From a project admin's point of view, he'd rather GPUs did sieving than primality testing as it seems a GPU will greatly exceed (as opposed to modestly exceed) the throughput of an i7."
This argument holds less water with this new CUDA program. As expected, IBDWT has halved the iteration times.

A different conclusion is also possible: Perhaps prime95's TF code is in need of optimization.
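
For anyone wondering what iteration is being halved: a Lucas-Lehmer test is just repeated modular squaring, and the FFT (with the IBDWT) is how the squarings of multi-million-digit numbers are made fast. Here is a minimal big-integer sketch for Mersenne numbers, purely illustrative (it is not the CUDA program's code; real software replaces the modular squaring with the weighted transform):

[CODE]
def lucas_lehmer(p):
    """Lucas-Lehmer test: 2**p - 1 is prime iff s ends up 0 (p an odd prime)."""
    m = (1 << p) - 1              # the Mersenne number 2^p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m       # production code does this squaring via an FFT/IBDWT
    return s == 0

# 2^13 - 1 = 8191 is prime; 2^11 - 1 = 2047 = 23 * 89 is not.
print(lucas_lehmer(13), lucas_lehmer(11))   # True False
[/CODE]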