2020-09-02, 03:56  #1
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
19B_{16} Posts 
An obvious question (sorry).
So I'm pretty excited to have gotten Colab to compile and execute gpuowl, and am happily chugging along at about 25% of the way through my first PRP (on gpuowl). Now, the output says it will take 20 hours to run this (111M range) and so I'm like, ok, that sounds pretty awesome, how many GHzD/Day is that? Some back-of-the-napkin math tells me that it is about 600.
Now, that card does TF about 6 times that fast. Is it really that much slower to do PRPs on GPUs, or is it more likely that I've done a bad job compiling or configuring gpuowl? EDIT: For reference, the card is a Tesla V100-SXM2.
Last fiddled with by Aramis Wyler on 2020-09-02 at 03:59
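The napkin math in this post can be sketched as follows. This is only a back-calculation from the two figures quoted above (20 hours per test, about 600 GHzD/Day); the function names are made up for illustration, and the implied credit per test is derived from those two numbers rather than taken from PrimeNet.

```python
def ghz_days_per_day(credit_ghz_days: float, hours_per_test: float) -> float:
    """Throughput in GHz-days per day for a test that takes hours_per_test."""
    return credit_ghz_days * 24.0 / hours_per_test


def implied_credit(throughput_ghzd_per_day: float, hours_per_test: float) -> float:
    """Back out the credit per test implied by a throughput figure."""
    return throughput_ghzd_per_day * hours_per_test / 24.0


# Figures from the post: about 600 GHzD/Day at 20 hours per PRP test.
credit = implied_credit(600.0, 20.0)
print(credit)                          # 500.0
print(ghz_days_per_day(credit, 20.0))  # 600.0
```

In other words, the 600 GHzD/Day estimate corresponds to a credit of roughly 500 GHz-days for one test in this range.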
2020-09-02, 04:41  #2
P90 years forever!
Aug 2002
Yeehaw, FL
1110011101100_{2} Posts 
You cannot use PrimeNet GHz-days to compare TF and PRP efficiency. The GHz-days formulas were set in stone based on how fast a 2008(?) Intel Core 2 CPU performed these calculations. That CPU was (relatively speaking) good at LL and bad at TF. Thus, when an architecture was developed that was good at TF, the GHz-days credited to that architecture were inflated (compared to actual wall-clock time invested).
Hope that made sense :) 
2020-09-02, 04:59  #3
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
3·137 Posts 
That does make sense, thank you, though now I feel like I'm abusing the Top Producers ranking every time I TF something.
Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?
2020-09-02, 11:56  #4
"Viliam Furík"
Jul 2018
Martin, Slovakia
2·13·17 Posts 
You can convert GHzD/D to FLOPS (floating-point operations per second) by applying a simple formula: 500 GHzD/D = 1 TFLOPS (10^{12} FLOPS). So if you have a GPU with TF performance of, say, 2000 GHzD/D, that's 4 TFLOPS in FP32 (single-precision floating-point operations).
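As a quick illustration, the rule of thumb in this post can be written as a pair of one-line conversions. The constant is the poster's 500 GHzD/D per TFLOPS figure, not an official PrimeNet value, and the function names are hypothetical.

```python
# Rule of thumb quoted in the post: 500 GHzD/D of TF throughput ~ 1 TFLOPS (FP32).
# This constant is the poster's figure, not an official one.
GHZD_PER_DAY_PER_TFLOPS = 500.0


def ghzd_to_tflops(ghzd_per_day: float) -> float:
    """Convert a TF throughput in GHzD/D to approximate FP32 TFLOPS."""
    return ghzd_per_day / GHZD_PER_DAY_PER_TFLOPS


def tflops_to_ghzd(tflops: float) -> float:
    """Convert FP32 TFLOPS back to an approximate TF throughput in GHzD/D."""
    return tflops * GHZD_PER_DAY_PER_TFLOPS


print(ghzd_to_tflops(2000.0))  # 4.0
print(tflops_to_ghzd(4.0))     # 2000.0
```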

2020-09-02, 14:35  #5
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
110011011_{2} Posts 
Interesting. I think we've already established that GHzD/Day figures aren't comparable between TF and PRP, so is there a different calculation for converting PRP GHzD/Day to Flops?

2020-09-02, 15:57  #6
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}×3×617 Posts 
Quote:
To decide the correct TF level we compare "how many exponents can this hardware eliminate per day by TFing to 2^N" to "how many exponents can this hardware eliminate per day by PRPing". Since that comparison is different for each piece of hardware, we kind of guess at the average piece of consumer hardware to determine our target TF levels.
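The comparison described here can be sketched roughly as below. This is a hedged illustration, not the project's actual procedure: the function names and all timing figures are hypothetical, and the chance of finding a factor at bit level N is approximated as 1/N, a common rule of thumb that is an assumption and does not come from this thread.

```python
def eliminated_per_day_tf(tf_days_for_level: float, bit_level: int) -> float:
    """Expected exponents eliminated per day by TF at one bit level.

    Assumption (not from the thread): the chance of a factor between
    2^(N-1) and 2^N is roughly 1/N.
    """
    return (1.0 / bit_level) / tf_days_for_level


def eliminated_per_day_prp(prp_days_per_test: float,
                           tests_per_exponent: float = 1.0) -> float:
    """Exponents settled per day by primality testing on the same hardware.

    tests_per_exponent is a hypothetical knob: about 2 for LL plus a
    double-check, a bit more than 1 for PRP plus certification.
    """
    return 1.0 / (prp_days_per_test * tests_per_exponent)


def worth_trial_factoring(tf_days_for_level: float, bit_level: int,
                          prp_days_per_test: float,
                          tests_per_exponent: float = 1.0) -> bool:
    """True if TF at this level eliminates exponents faster than testing."""
    tf_rate = eliminated_per_day_tf(tf_days_for_level, bit_level)
    prp_rate = eliminated_per_day_prp(prp_days_per_test, tests_per_exponent)
    return tf_rate > prp_rate


# Hypothetical timings: a 20-hour PRP test, and two made-up TF costs for 77 bits.
print(worth_trial_factoring(0.005, 77, 20.0 / 24.0))  # True
print(worth_trial_factoring(0.02, 77, 20.0 / 24.0))   # False
```

Because both rates depend on the hardware, the crossover bit level moves from card to card, which is why a single target TF level can only be an estimate.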

2020-09-02, 16:07  #7
"Viliam Furík"
Jul 2018
Martin, Slovakia
1BA_{16} Posts 
Quote:


2020-09-02, 16:25  #8
If I May
"Chris Halsall"
Sep 2002
Barbados
22465_{8} Posts 
Quote:
Before this we /were/ just guessing, with really no idea what was optimal. Please see the charts shown on each drill-down page from his GPU Lucas-Lehmer performance comparison chart. For example, for a Tesla V100 it ***used*** to be economically optimal to go to 77 "bits" at 92M or so.
One of the exciting things about the project is that development is ongoing, so the economically optimal crossover points have changed several times over the years. Now that the Proof Mechanism has been introduced, DCs will soon (read: in a few years) be obsolete, so the crossover analysis will once again have to be revisited. We live in very interesting times!!!
P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is that different people like to do different things. Their kit, time, and electrons...
P.P.S. Perfect is the enemy of good.

2020-09-02, 18:00  #9
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
19·499 Posts 

2020-09-03, 05:10  #10
Romulan Interpreter
Jun 2011
Thailand
3^{3}×347 Posts 
Yep. On James' graphic, the "PRP Line" has to be somewhere in the middle between the "First LL Line" and the "DC Line". The reason is that in the future, we will mostly do PRP+CERT, which is a bit more than a single LL, but less than two LLs. So, click on your hardware (GPU), see where you are, and decide how high you have to go with TF on your hardware to eliminate the exponents faster (wall-clock time).
On the other hand, James, your filters are missing the newest cards (RTX30xx). 