mersenneforum.org  

Old 2020-09-02, 03:56   #1
Aramis Wyler
 
 
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

3·137 Posts
An obvious question (sorry).

So I'm pretty excited to have gotten colab to compile and execute gpuowl, and am happily chugging along at about 25% of the way through my first PRP (on gpuowl). Now, the output says it will take 20 hours to run this (111M range) and so I'm like, ok, that sounds pretty awesome, how many GHzD/Day is that? Some back-of-the-napkin math tells me that it is about 600.

Now that card does TF about 6 times that fast. Is it really that much slower to do PRPs on GPUs, or is it more likely that I've done a bad job compiling or configuring gpuowl?


EDIT: For reference, the card is a Tesla V100-SXM2

Last fiddled with by Aramis Wyler on 2020-09-02 at 03:59
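
(Editor's aside) The back-of-the-napkin figure in the first post can be reproduced with a short sketch. The credit value below is an assumption back-derived from the "about 600" estimate, not an official PrimeNet number, and the function name is made up for illustration.

```python
def ghz_days_per_day(credit_ghz_days: float, wall_clock_hours: float) -> float:
    """Throughput in GHz-days/day for one test with the given credit and duration."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    return credit_ghz_days * 24.0 / wall_clock_hours


# ASSUMPTION: roughly 500 GHz-days of credit for a PRP in the 111M range,
# inferred from the poster's ~600 GHzD/day estimate at 20 hours per test.
ASSUMED_PRP_CREDIT_GHZ_DAYS = 500.0

rate = ghz_days_per_day(ASSUMED_PRP_CREDIT_GHZ_DAYS, 20.0)
print(f"{rate:.0f} GHz-days/day")  # prints "600 GHz-days/day"
```

The real credit depends on the exponent and FFT size, so treat the constant as a placeholder to be replaced with the credit actually reported for the assignment.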
Old 2020-09-02, 04:41   #2
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

2²·1,873 Posts

You cannot use Primenet GHz-days to compare TF and PRP efficiency. The GHz-days formulas were set in stone based on how fast a 2008(?) Intel Core 2 CPU performed these calculations. That CPU was (relatively speaking) good at LL, bad at TF. Thus, when an architecture was developed that was good at TF, the GHz-days credited to that architecture were inflated (compared to actual wall clock time invested).

Hope that made sense :)
Old 2020-09-02, 04:59   #3
Aramis Wyler
 
 
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

633₈ Posts

That does make sense, thank you - though now I feel like I'm abusing the Top Producers ranking every time I TF something.


Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?
Old 2020-09-02, 11:56   #4
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

1CD₁₆ Posts

Quote:
Originally Posted by Aramis Wyler
That does make sense, thank you - though now I feel like I'm abusing the Top Producers ranking every time I TF something.


Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?
You can convert the GHz-D/D to FLOPS (FLoating-point OPerations per Second) by applying a simple formula: 500 GHz-D/D = 1 TFLOPS (10^12 FLOPS). So if you have a GPU with TF performance of, say, 2000 GHz-D/D, that's 4 TFLOPS in FP32 (single-precision floating-point operations).
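
(Editor's aside) The rule of thumb quoted in this post can be written as a tiny converter. This is only a sketch of the 500 GHz-D/D = 1 TFLOPS ratio as stated in the thread; the function names are invented for illustration, and since GHz-days credit is not comparable across work types, the result is a nominal figure rather than a measured FLOPS rate.

```python
# Ratio quoted in the post above: 500 GHz-days/day corresponds to 1 TFLOPS.
GHZ_DAYS_PER_DAY_PER_TFLOPS = 500.0


def ghzd_per_day_to_tflops(ghzd_per_day: float) -> float:
    """Convert a GHz-days/day throughput to nominal TFLOPS."""
    return ghzd_per_day / GHZ_DAYS_PER_DAY_PER_TFLOPS


def tflops_to_ghzd_per_day(tflops: float) -> float:
    """Inverse conversion: nominal TFLOPS to GHz-days/day."""
    return tflops * GHZ_DAYS_PER_DAY_PER_TFLOPS


print(ghzd_per_day_to_tflops(2000.0))  # the post's example: prints 4.0
```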
Old 2020-09-02, 14:35   #5
Aramis Wyler
 
 
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

3·137 Posts

Interesting. I think we've already established that GHzD/Day aren't comparable between TF and PRP, so is there a different calculation converting PRP GHzD/Days to Flops?
Old 2020-09-02, 15:57   #6
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

2²·1,873 Posts

Quote:
Originally Posted by Aramis Wyler
Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?
The unit is "wall clock time".

To decide the correct TF level we compare "how many exponents can this hardware eliminate per day by TFing to 2^N" to "how many exponents can this hardware eliminate per day by PRPing".

Since the above comparison is different for each piece of hardware we kind of guess as to the average piece of consumer hardware to determine our target TF levels.
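
(Editor's aside) The wall-clock comparison described in this post might be sketched as below. Everything numeric here is an assumption for illustration: the 1/bit_level factor probability is a rough heuristic, the TF timings are made up, and only the 20-hour PRP time comes from the thread. The function names are hypothetical and do not come from any GIMPS tool.

```python
def tf_cleared_per_day(tf_hours: float, bit_level: int) -> float:
    """Exponents eliminated per day by TFing one more bit level, up to 2^bit_level."""
    runs_per_day = 24.0 / tf_hours
    # ASSUMPTION: the chance of finding a factor in this bit level is roughly 1/bit_level.
    return runs_per_day / bit_level


def prp_cleared_per_day(prp_hours: float, tests_per_exponent: float = 1.0) -> float:
    """Exponents settled per day by primality testing.

    tests_per_exponent is 1.0 for a single test and 2.0 for a test plus a full double check.
    """
    return 24.0 / (prp_hours * tests_per_exponent)


def worth_trial_factoring(tf_hours: float, bit_level: int, prp_hours: float,
                          tests_per_exponent: float = 1.0) -> bool:
    """True if one more TF bit level eliminates exponents faster than testing does."""
    return (tf_cleared_per_day(tf_hours, bit_level)
            > prp_cleared_per_day(prp_hours, tests_per_exponent))


# Hypothetical TF timings (hours per exponent) on one GPU; each bit level takes about twice as long.
print(worth_trial_factoring(tf_hours=0.2, bit_level=77, prp_hours=20.0))  # True
print(worth_trial_factoring(tf_hours=0.4, bit_level=78, prp_hours=20.0))  # False
```

With these made-up timings the crossover falls at 77 bits. Raising tests_per_exponent toward 2.0 lowers the testing rate, so deeper TF becomes worthwhile, which is why the target level depends on both the hardware and the verification scheme.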
Old 2020-09-02, 16:07   #7
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

461 Posts

Quote:
Originally Posted by Aramis Wyler
Interesting. I think we've already established that GHzD/Day aren't comparable with TF vs PRP, so is there a different calculation converting PRP GHzD/Days to Flops?
Well, it's all the same, at least that's how the server treats those numbers when displaying TFLOPS throughput (here).
Old 2020-09-02, 16:25   #8
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

10010101101000₂ Posts

Quote:
Originally Posted by Prime95
Since the above comparison is different for each piece of hardware we kind of guess as to the average piece of consumer hardware to determine our target TF levels.
Actually, it's a bit more than guesses... James stepped up many years ago to answer the question "Just where should we TF to?"

Before this we /were/ just guessing, with really no idea what was optimal. Please see the charts shown on each drill-down page from his GPU Lucas-Lehmer performance comparison chart. For example, for a Tesla V100 it ***used*** to be economically optimal to go to 77 "bits" at 92M or so.

One of the exciting things about the project is that development is ongoing. So the economically optimal cross-over points have changed several times over the years.

Now that the Proof Mechanism has been introduced, DCs will soon (read: in a few years) be obsolete, so the cross-over analysis will once again have to be revisited.

We live in very interesting times!!!

P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is different people like to do different things. Their kit, time, and electrons...

P.P.S. Perfect is the enemy of good.
Old 2020-09-02, 18:00   #9
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

61·157 Posts

Quote:
Originally Posted by chalsall
P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is different people like to do different things. Their kit, time, and electrons...
And throw in the volume of available work force.
Old 2020-09-03, 05:10   #10
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3·47·67 Posts

Yep. On James' graphic, the "PRP Line" has to be somewhere in the middle between "First LL Line" and "DC Line". The reason is that in the future, we will mostly do PRP+CERT, which is a bit more than a single LL, but less than two LLs. So, click on your hardware (GPU) and see where you are, and decide how high you have to go with TF with your hardware, to eliminate the exponents faster (wall clock time).



On the other hand, James, your filters are missing the newest cards (RTX30xx).

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.