#1 |
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
19B₁₆ Posts |
So I'm pretty excited to have gotten Colab to compile and execute gpuowl, and am happily chugging along at about 25% of the way through my first PRP (on gpuowl). Now, the output says it will take 20 hours to run this (111M range) and so I'm like, ok, that sounds pretty awesome, how many GHzD/Day is that? Some back of the napkin math tells me that it is about 600.
Now that card does TF at about 6 times that fast. Is it really that much slower to do PRPs on GPUs, or is it more likely that I've done a bad job compiling or configuring gpuowl?
EDIT: For reference, the card is a Tesla V100-SXM2.
Last fiddled with by Aramis Wyler on 2020-09-02 at 03:59 |
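For anyone who wants to redo the napkin math above, here is a minimal Python sketch. The function name `ghzd_per_day` is made up for illustration, and the 500 GHz-days credit is an assumption: it is just the value implied by the post's own figures (about 600 GHzD/Day over a 20 hour run), not an official PrimeNet number for a 111M exponent.

```python
def ghzd_per_day(credit_ghz_days: float, runtime_hours: float) -> float:
    """Throughput in GHz-days per day for one test that takes runtime_hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    # Credit earned per hour, scaled up to a 24 hour day.
    return credit_ghz_days * 24.0 / runtime_hours


# Assumed credit: 500 GHz-days is back-solved from the post (600 * 20 / 24),
# not looked up from PrimeNet.
rate = ghzd_per_day(500.0, 20.0)
print(f"{rate:.0f} GHzD/Day")
```

Swap in the real credit for your exponent to get your own rate.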
#2 |
P90 years forever!
Aug 2002
Yeehaw, FL
1110011101100₂ Posts |
You cannot use Primenet GHzDays to compare TF and PRP efficiency. The GHz days formulas were set in stone based on how fast a 2008(?) Core2 Intel CPU performed these calculations. That CPU was (relatively speaking) good at LL, bad at TF. Thus, when an architecture was developed that was good at TF, the GHz-days credited to that architecture were inflated (compared to actual wall clock time invested).
Hope that made sense :) |
#3 |
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
3·137 Posts |
That does make sense, thank you - though now I feel like I'm abusing the Top Producers ranking every time I TF something.
Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops? |
#4 |
"Viliam Furík"
Jul 2018
Martin, Slovakia
2·13·17 Posts |
You can convert GHz-D/D to FLOPS (FLoating Point OPerations per Second) by applying a simple formula: 500 GHz-D/D = 1 TFLOPS (10^12 FLOPS). So if you have a GPU with TF performance of, say, 2000 GHz-D/D, that's 4 TFLOPS in FP32 (single-precision floating-point operations).
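As a quick sketch of that rule of thumb in Python (the constant and function names are invented for illustration, and the 500:1 ratio is taken from this post as-is and applies to TF credit only):

```python
# Rule of thumb quoted in the post: 500 GHz-D/D of TF work ~ 1 TFLOPS (FP32).
GHZD_PER_DAY_PER_TFLOPS = 500.0


def tf_ghzd_to_tflops(ghzd_per_day: float) -> float:
    """Convert a TF throughput in GHz-D/D to approximate FP32 TFLOPS."""
    return ghzd_per_day / GHZD_PER_DAY_PER_TFLOPS


print(tf_ghzd_to_tflops(2000.0))  # the post's example: 2000 GHz-D/D
```

For the 2000 GHz-D/D example this gives 4 TFLOPS, matching the figure in the post.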
#5 |
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
110011011₂ Posts |
Interesting. I think we've already established that GHzD/Day aren't comparable with TF vs PRP, so is there a different calculation converting PRP GHzD/Days to Flops?
#6 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2²×3×617 Posts |
Quote:
To decide the correct TF level we compare "how many exponents can this hardware eliminate per day by TFing to 2^N" to "how many exponents can this hardware eliminate per day by PRPing". Since the above comparison is different for each piece of hardware we kind of guess as to the average piece of consumer hardware to determine our target TF levels. |
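The comparison described above can be sketched roughly as follows. Everything here is an assumption for illustration, not how the project's targets are actually computed: the function names are made up, the chance of finding a factor between 2^(n-1) and 2^n is taken as about 1/n (a commonly used heuristic), each extra bit level is assumed to cost twice the previous one, and `tests_saved` stands for how many primality tests a found factor avoids.

```python
def worth_trial_factoring(bit_level: int, tf_hours: float,
                          prp_hours: float, tests_saved: float = 1.0) -> bool:
    """True if TF to 2^bit_level eliminates exponents faster than PRP does.

    TF eliminates about (1/bit_level) exponents per tf_hours; PRP eliminates
    one exponent per (tests_saved * prp_hours). TF wins while its rate is higher.
    """
    p_factor = 1.0 / bit_level  # heuristic assumption, not from the thread
    return tf_hours < p_factor * tests_saved * prp_hours


def optimal_tf_level(done_bit: int, next_tf_hours: float, prp_hours: float,
                     tests_saved: float = 1.0, max_bit: int = 100) -> int:
    """Highest bit level worth factoring to, starting from done_bit."""
    bit, tf_hours = done_bit, next_tf_hours
    while bit < max_bit and worth_trial_factoring(bit + 1, tf_hours,
                                                  prp_hours, tests_saved):
        bit += 1
        tf_hours *= 2.0  # assumed: each bit level takes twice as long
    return bit


# Hypothetical card: 20 hour PRP (as in post #1), and a made-up 0.01 hours
# to go from 70 to 71 bits.
print(optimal_tf_level(70, 0.01, 20.0))
```

Since the TF and PRP timings differ for every card, the answer moves with the hardware, which is why the target levels end up being a guess at the average machine.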
#7 | |
"Viliam Furík"
Jul 2018
Martin, Slovakia
1BA₁₆ Posts |
Quote:
#8 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
22465₈ Posts |
Quote:
Before this we /were/ just guessing, with absolutely no idea what was optimal. Please see the charts shown on each drill-down page from his GPU Lucas-Lehmer performance comparison chart. For example, for a Tesla V100 it ***used*** to be economically optimal to go to 77 "bits" at 92M or so.
One of the exciting things about the project is that development is ongoing, so the economically optimal cross-over points have changed several times over the years. Now that the Proof Mechanism has been introduced, DCs will soon (read: in a few years) be obsolete, so the cross-over analysis will once again have to be revisited. We live in very interesting times!!!
P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is that different people like to do different things. Their kit, time, and electrons...
P.P.S. Perfect is the enemy of good. |
#9 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
19·499 Posts |
#10 |
Romulan Interpreter
Jun 2011
Thailand
3³×347 Posts |
Yep. On James' graphic, the "PRP Line" has to be somewhere in the middle between "First LL Line" and "DC Line". The reason is that in the future, we will mostly do PRP+CERT, which is a bit more than a single LL, but less than two LLs. So, click on your hardware (GPU) and see where you are, and decide how high you have to go with TF with your hardware, to eliminate the exponents faster (wall clock time).
On the other hand, James, your filters are missing the newest cards (RTX30xx). |