
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Fast Mersenne Testing on the GPU using CUDA (https://www.mersenneforum.org/showthread.php?t=14310)

axn 2010-12-10 02:14

[QUOTE=Oddball;241034]Speaking of that post, here's a quick summary of the arguments for both sides:[/quote]
That post describes the technical complexities of developing a suitable FFT for LLR. It doesn't deal with any "sides".

[QUOTE=Oddball;241034]Pro-LLR GPU side:
1.) Allows people with GPUs freedom of choice. If a GPU program for LLR is developed, those with GPUs can choose to either sieve or test for primes.
2.) Allows for faster verification of large (>1 million digit) primes.
3.) GPU clients are not optimized yet, so there's more potential for improvement.
4.) GPUs are more energy efficient than old CPUs (Pentium 4's, Athlons, etc), judging by the amount of electricity needed to LLR one k/n pair.

Anti-LLR GPU side:
1.) Reduces the number of participants. Those without fast CPUs would be discouraged from participating since they would no longer be able to do a significant amount of "meaningful" LLR work (defined as LLR work that has a reasonable chance of getting into the top 5000 list).
2.) GPUs are much less effective at primality testing than at sieving or trial factoring. Computing systems should be used for what they are best at, so CPU users should stick to LLR tests and GPU users should stick to sieving and factoring.
3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.[/QUOTE]

Aside from you, I haven't actually seen anyone advance arguments against developing a primality testing program for the GPU. One person hardly makes a "side". The arguments against a GPU-based LLR read like a what's-what of fallacy files.

CRGreathouse 2010-12-10 02:24

[QUOTE=Oddball;241034]3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.
4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.[/QUOTE]

Do we have numbers on those?

Oddball 2010-12-10 06:17

[quote]
[I]3.) GPUs have a high power consumption (~400 watts for a GPU system vs. ~150 watts for a CPU system). Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs.[/I]
[I]4.) GPUs have a higher error rate than CPUs. It's much easier to check factors than it is to check LLR residues, so GPUs should stay with doing trial division.[/I]
[/quote]
[QUOTE=CRGreathouse;241045]Do we have numbers on those?[/QUOTE]
There's this:
[URL]http://www.mersenneforum.org/showpost.php?p=213089&postcount=152[/URL]
"487W for GTX 295 under full load!"

The Phenom II system I have right now draws ~150 watts at full load.

The reference for claim #4 is here:
[URL]http://mersenneforum.org/showpost.php?p=238232&postcount=379[/URL]

"Consumer video cards are designed for gaming rather than technical computing, so they don't have as many error-checking features."
There's not enough data to provide more accurate figures.

Oddball 2010-12-10 06:27

[QUOTE=axn;241044]That post describes the technical complexities of developing a suitable FFT for LLR. It doesn't deal with any "sides".[/QUOTE]
In that post, the quote that Prime95 posted was referring to The Carnivore, who was describing the impatience of the pro-GPU side. I wasn't the first person to claim that there were different sides.

[quote]
Aside from you, I haven't actually seen any arguments advanced by any persons for not developing a primality testing program for GPU. One person hardly makes a "side".[/quote]
Here's another person with an anti-GPU point of view:
[URL]http://mersenneforum.org/showpost.php?p=231062&postcount=327[/URL]

Here's what George has to say:
[URL]http://mersenneforum.org/showpost.php?p=231172&postcount=339[/URL]

"if msft develops a CUDA LLR program then it will be modestly more powerful (in terms of throughput) than an i7 -- just like LL testing.
[B]From a project admin's point of view, he'd rather GPUs did sieving than primality testing[/B] as it seems a GPU will greatly exceed (as opposed to modestly exceed) the thoughput of an i7."

But I'm done debating this issue; it's been beaten to death, and none of the users involved are going to change their minds.

mdettweiler 2010-12-10 06:37

[QUOTE=Oddball;241055]"if msft develops a CUDA LLR program then it will be modestly more powerful (in terms of throughput) than an i7 -- just like LL testing.
[B]From a project admin's point of view, he'd rather GPUs did sieving than primality testing[/B] as it seems a GPU will greatly exceed (as opposed to modestly exceed) the thoughput of an i7."[/QUOTE]
The way I see it (from the perspective of a project admin), it's nice to at least have the [i]ability[/i] to do both. As a case in point of why this matters: currently NPLB and PrimeGrid are collaborating on a large (covering all k<10000) GPU sieving drive. With the combined GPU resources of our two projects, we blew through the n<2M range in no time at all--and the 2M-3M range is itself moving very rapidly. Yet the primary leading edges of both projects' LLR testing are below n=1M. We won't get to some of this stuff for years, by which time GPUs will likely be so much more advanced that much of the work done now will be a drop in the bucket compared to the optimal sieve depth for the GPUs of then.

Right now, the only work available from k*2^n+-1 prime search projects for GPUs is sieving. Thus, in order to keep the GPUs busy at all, we have to keep sieving farther and farther up in terms of n, which becomes increasingly suboptimal the further we depart from our LLR leading edge. If we had the option of putting those GPUs to work on LLR once everything needed in the foreseeable future has been well-sieved, even if it's not quite the GPUs' forte, we could at least be using them for something that's needed, rather than effectively throwing away sieving work that could be done much more efficiently down the road.

Anyway, that's my $0.02...not trying to beat this to death on this end either.
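
For what it's worth, the sieve-depth tradeoff described above is roughly the usual break-even rule of thumb; here is a minimal Python sketch of it (the numbers below are placeholders for illustration, not NPLB/PrimeGrid measurements):

[CODE]
# Usual sieve-depth heuristic: keep sieving only while an hour of sieving
# removes candidates worth more than an hour of LLR testing.
def keep_sieving(candidates_removed, sieve_hours, llr_hours_per_candidate):
    llr_hours_saved = candidates_removed * llr_hours_per_candidate
    return llr_hours_saved > sieve_hours

# Example: the last interval removed 120 candidates in 40 hours of sieving,
# and each candidate would take 0.5 hours to LLR at the current leading edge.
print(keep_sieving(candidates_removed=120, sieve_hours=40,
                   llr_hours_per_candidate=0.5))  # True -> keep sieving
[/CODE]

The point of the sketch is just that the further the sieve range runs ahead of the LLR leading edge, the smaller the LLR time saved per hour of sieving becomes.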

MooMoo2 2010-12-10 06:58

[QUOTE=mdettweiler;241056]The way I see it (from the perspective of a project admin), it's nice to at least have the [I]ability[/I] to do both. As a case in point of why this matters: currently NPLB and PrimeGrid are collaborating on a large (covering all k<10000) GPU sieving drive. With the combined GPU resources of our two projects, we blew through the n<2M range in no time at all--and the 2M-3M range is itself moving very rapidly. Yet the primary leading edges of both projects' LLR testing are below n=1M. We won't get to some of this stuff for years, by which time GPUs will likely be so much more advanced that much of the work done now will be a drop in the bucket compared to the optimal sieve depth for the GPUs of then.

Right now, the only work available from k*2^n+-1 prime search projects for GPUs is sieving. Thus, in order to keep the GPUs busy at all, we have to keep sieving farther and farther up in terms of n, which becomes increasingly suboptimal the further we depart from our LLR leading edge. If we had the option of putting those GPUs to work on LLR once everything needed in the foreseeable future has been well-sieved, even if it's not quite the GPUs' forte, we could at least be using them for something that's needed, rather than effectively throwing away sieving work that could be done much more efficiently down the road.
[/QUOTE]
You can direct the GPUs to the TPS forum if they're out of work :smile:

mdettweiler 2010-12-10 07:48

[QUOTE=MooMoo2;241058]You can direct the GPUs to the TPS forum if they're out of work :smile:[/QUOTE]
Indeed, that is an option. :smile: However, speaking solely from the perspective of a project admin (that is, trying to maximize the utilization of resources within my own project), it would seem worthwhile to have GPU LLR as an option--so that if (say) you have a participant who wants to contribute with his GPU at NPLB but is not particularly interested in TPS, he can still have useful work to do. (Or vice versa.)

CRGreathouse 2010-12-10 14:20

[QUOTE=Oddball;241054]There's this:
[URL]http://www.mersenneforum.org/showpost.php?p=213089&postcount=152[/URL]
"487W for GTX 295 under full load!"

The Phenom II system I have right now draws ~150 watts at full load.[/QUOTE]

I'm seeing 181 watts for the i7 under load. So for your claim "Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs" to hold, the GTX 295 needs to be less than 2.7 times faster than the i7 -- or 10.8 times faster than a single (physical) core. Is that so?
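
To spell out that arithmetic, a quick Python sketch using the two wattage figures quoted in this thread (the factor of 4 assumes a quad-core i7):

[CODE]
# Break-even speedup for a GTX 295 to beat an i7 on energy per LLR test,
# using the wattage figures quoted in this thread.
gpu_watts = 487.0   # GTX 295 system at full load
cpu_watts = 181.0   # i7 system under load

breakeven_vs_i7 = gpu_watts / cpu_watts       # ~2.7x the whole i7
breakeven_vs_core = breakeven_vs_i7 * 4       # ~10.8x one physical core (quad-core assumed)

print(f"GPU must be more than {breakeven_vs_i7:.1f}x an i7 "
      f"(or {breakeven_vs_core:.1f}x a single core) to win on energy per test")
[/CODE]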

Oddball 2010-12-10 18:05

[QUOTE=CRGreathouse;241092]I'm seeing 181 watts for the i7 under load. So for your claim "Even when comparing power needed per primality test, they are less efficient than core i7's and other recent CPUs" to hold, the GTX 295 needs to be less than 2.7 times faster than the i7 -- or 10.8 times faster than a single (physical) core. Is that so?[/QUOTE]
Yes. See: [URL]http://mersenneforum.org/showpost.php?p=227433&postcount=293[/URL]

"in a worst-case (for the GPU) scenario, you still need all cores of your i7 working together to match its output! In a best-case scenario, it's closer to twice as fast as your CPU."

henryzz 2010-12-10 18:56

Please remember that your i7 CPU can keep running on most of its cores alongside the GPU app without much more power consumption.

Prime95 2010-12-10 19:16

[QUOTE=Oddball;241055]
[B]From a project admin's point of view, he'd rather GPUs did sieving than primality testing[/B] as it seems a GPU will greatly exceed (as opposed to modestly exceed) the throughput of an i7."[/QUOTE]

This argument holds less water with this new CUDA program. As expected, IBDWT has halved the iteration times.

A different conclusion is also possible: Perhaps prime95's TF code is in need of optimization.
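
For context on what those iterations are, here is a minimal (and deliberately slow) Lucas-Lehmer sketch in Python; it uses plain big-integer squaring where the CUDA program uses an IBDWT-based FFT, so it illustrates the recurrence rather than the implementation:

[CODE]
# Lucas-Lehmer test for M_p = 2^p - 1, p an odd prime.
# Real LL/LLR programs replace the big-integer squaring below with an
# IBDWT/FFT convolution; this only shows what each iteration computes.
def lucas_lehmer(p):
    m = (1 << p) - 1            # the Mersenne number 2^p - 1
    s = 4
    for _ in range(p - 2):      # p - 2 iterations of s <- s^2 - 2 (mod M_p)
        s = (s * s - 2) % m
    return s == 0               # zero residue  <=>  M_p is prime

print([p for p in (3, 5, 7, 11, 13, 17, 19, 31) if lucas_lehmer(p)])
# -> [3, 5, 7, 13, 17, 19, 31]
[/CODE]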

