2018-12-09, 19:54   #1
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2·5·439 Posts
Ruminations about comparisons of "sizes of kits", conveniently rehashed once again (from Lucky13)

Quote:
 Originally Posted by Prime95 On Dec 7th, a new Mersenne prime was reported. I emailed the discoverer, obtained the last save file, and reran the last 100,000 iterations. Sure enough, we have a new prime! Aaron has started a full test using prime95, but that doesn't count as an official double-check. We need a volunteer to run CudaLucas and/or mlucas. As usual, the automatic email notification to myself and past Mersenne prime discoverers failed. Yes, the automatic email was debugged and tested after its last failure. Apparently bit rot set in. Fortunately the backup notification plan worked.
I just installed cudaLucas … just because.

I have an RTX-2080Ti. Is that good for cudaLucas or are other cards better?
Unless a good config file is important, it was NOT impressive.

Code:
>CUDALucas-2.03-cuda4.2-sm_30-x86-64.exe -t 86243

Warning: No ini file detected. Using defaults for non-specified options.
Starting M86243 fft length = 4608
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4608, CUDALucas v2.03 err = 0.0131 (0:04 real, 0.4476 ms/iter, ETA 0:31)
Iteration 20000 M( 86243 )C, 0x89c58d63ebee7ad1, n = 4608, CUDALucas v2.03 err = 0.0144 (0:05 real, 0.4472 ms/iter, ETA 0:26)
Iteration 30000 M( 86243 )C, 0x8ad9ad7af5b51d09, n = 4608, CUDALucas v2.03 err = 0.0144 (0:04 real, 0.4472 ms/iter, ETA 0:22)
Iteration 40000 M( 86243 )C, 0xeed70124ff3b4f5a, n = 4608, CUDALucas v2.03 err = 0.0144 (0:05 real, 0.4482 ms/iter, ETA 0:17)
Iteration 50000 M( 86243 )C, 0x6ef44d2b23c538e1, n = 4608, CUDALucas v2.03 err = 0.0144 (0:04 real, 0.4410 ms/iter, ETA 0:13)
Iteration 60000 M( 86243 )C, 0x76f20516c9858691, n = 4608, CUDALucas v2.03 err = 0.0149 (0:04 real, 0.4298 ms/iter, ETA 0:08)
Iteration 70000 M( 86243 )C, 0x1c98576ef37a22df, n = 4608, CUDALucas v2.03 err = 0.0149 (0:05 real, 0.4433 ms/iter, ETA 0:04)
Iteration 80000 M( 86243 )C, 0x6809f7b9c9f1e33d, n = 4608, CUDALucas v2.03 err = 0.0149 (0:04 real, 0.4352 ms/iter, ETA 0:00)
M( 86243 )P, n = 4608, CUDALucas v2.03
Took 31 seconds... my CPU takes well under 1 second.

I'm willing to try if this COULD BE a good card for this.
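For reference, the test that the log above runs is the Lucas-Lehmer (LL) test. The following is only a minimal Python sketch of the textbook recurrence, not how CUDALucas implements it: the function name `lucas_lehmer` is made up for illustration, and real programs do the squaring with floating-point FFTs (hence the "fft length" and round-off "err" columns in the log) rather than with plain big integers.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p itself is prime; the test is only meaningful in that case.
    """
    if p == 2:
        return True  # M2 = 3 is prime; the loop below needs p >= 3
    m = (1 << p) - 1  # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):  # p - 2 squarings, one per "iteration" in the log
        s = (s * s - 2) % m
    return s == 0


# Small exponents only; pure-Python big integers get slow quickly.
for p in (3, 5, 7, 11, 13, 127, 521):
    print(p, lucas_lehmer(p))
```

Each pass through the loop corresponds to one iteration in the CUDALucas output, so an exponent p needs roughly p iterations regardless of the hardware.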

2018-12-11, 00:02   #2
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2·5·439 Posts

Quote:
 Originally Posted by kriesel If it still matters, you have my standing offer from some time ago. Current fastest hardware is GTX1080Ti ready with CUDALucas v2.06 May 5 2017 beta, capable of knocking out a ~82M exponent in about 3 days, backed up by a GTX1080, and a 16-core e5-2670 system with prime95 V29.4b8. The GTX1080Ti at 84+GhzD/day for 85M is ~10% faster than a 2080 Ti for LL per https://www.mersenne.ca/cudalucas.php (14th fastest model in the list, and that includes combined throughputs of some duals and a quad; 10th fastest single). And no, that 82M or 85M is not a clue to the nature of Mp51; it's just what I checked the 1080Ti's run time on recently. It would take me some time to put up mlucas on the system.
Any idea why the 2080Ti would be 3 times faster than the 1080Ti at TF but slower at LL/DC?

2018-12-11, 00:16   #3
chalsall
If I May

"Chris Halsall"
Sep 2002

10010001100010₂ Posts

Quote:
 Originally Posted by petrw1 Any idea why the 2080Ti would be 3 times faster than the 1080Ti at TF but slower at LL/DC?
You're joking. Correct?

2018-12-11, 00:36   #4
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

1000100100110₂ Posts

Quote:
 Originally Posted by chalsall You're joking. Correct?
No, really.

2018-12-11, 00:51   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·7·53 Posts

Quote:
 Originally Posted by petrw1 Any idea why the 2080Ti would be 3 times faster than the 1080Ti at TF but slower at LL/DC?
Basic design differences, described somewhere in the RTX2080 thread (https://www.mersenneforum.org/showthread.php?t=23592) far better than I could manage from memory.

2018-12-11, 01:44   #6
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2³·3²·139 Posts

Quote:
 Originally Posted by petrw1 Any idea why the 2080Ti would be 3 times faster than the 1080Ti at TF but slower at LL/DC?
Look at the ratios of DP floating point performance. I would guess that the 2080Ti has stunning integer performance, but much more throttled DP than the 1080Ti.

2018-12-11, 02:54   #7
Madpoo
Serpentine Vermin Jar

Jul 2014

CCA₁₆ Posts

Quote:
 Originally Posted by kladner Look at the ratios of DP floating point performance. I would guess that the 2080Ti has stunning integer performance, but much more throttled DP than the 1080Ti.
All I know is, if you want top-notch LL speed, that Tesla V100 is amazing. When ATH completed the verification run in an amazingly short time, my lust for a V100 was pretty high. I think (correct me if I'm mistaken) he used one on an Amazon EC2 instance, so that's definitely more reasonable for short term projects like testing a single exponent...

On my i7-7700K running Prime95 I think the estimate (running a single worker on all 4 cores) was about 6.5 days although I'm probably off by a couple hours one way or another... I don't remember the estimate it showed when it started. I still have over 3 days to go, and although it's been verified once already, I figure I'll finish it off, just to say I did.

2018-12-11, 04:04   #8
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×3×7×53 Posts

Quote:
 Originally Posted by Madpoo On my i7-7700K running Prime95 I think the estimate (running a single worker on all 4 cores) was about 6.5 days although I'm probably off by a couple hours one way or another... I don't remember the estimate it showed when it started. I still have over 3 days to go, and although it's been verified once already, I figure I'll finish it off, just to say I did.
For comparison, CUDALucas 2.06 LL on a GTX1080Ti takes 3 days for an 82M exponent; gpuowl V5 PRP on an RX480 is an estimated 5.6+ days for an 85M exponent.
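To put those run times on the same footing as the ms/iter figures that the programs print, here is a rough Python sketch of the conversion. It assumes "82M" and "85M" mean exactly 82,000,000 and 85,000,000, and that a test needs about one iteration per bit of the exponent; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_for_test(exponent: int, ms_per_iter: float) -> float:
    """Estimated wall-clock days, assuming about (exponent - 2) iterations."""
    return (exponent - 2) * ms_per_iter / 1000 / SECONDS_PER_DAY


def implied_ms_per_iter(exponent: int, days: float) -> float:
    """Per-iteration time implied by a total run time in days."""
    return days * SECONDS_PER_DAY * 1000 / (exponent - 2)


# GTX1080Ti: 3 days at an assumed exponent of exactly 82,000,000
print(round(implied_ms_per_iter(82_000_000, 3.0), 2))  # 3.16

# RX480: 5.6 days at an assumed exponent of exactly 85,000,000
print(round(implied_ms_per_iter(85_000_000, 5.6), 2))  # 5.69
```

The estimate ignores checkpointing, error checks, and throttling, so it is a lower bound on wall-clock time rather than a prediction.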

2018-12-11, 05:45   #9
ATH
Einyen

Dec 2003
Denmark

2²·3·5·7² Posts

Quote:
 Originally Posted by Madpoo All I know is, if you want top-notch LL speed, that Tesla V100 is amazing. When ATH completed the verification run in an amazingly short time, my lust for a V100 was pretty high. I think (correct me if I'm mistaken) he used one on an Amazon EC2 instance, so that's definitely more reasonable for short term projects like testing a single exponent...
Yes, the Tesla V100 has the fastest benchmarks we have seen for single exponents, and yes, it was running on EC2. Remember this is a $10K card and not for mere mortals, and the throughput per dollar sucks compared to "George's dream build". I do not think we have seen benchmarks from the latest generation(s) of crazy Xeon Platinum processors with 6-channel RAM. They probably beat a single V100 in total throughput on all cores, but I am not sure if they beat it on a single exponent. But those are way more expensive than $10K; those are in the "World Domination" territory.

Not far behind the Tesla V100 is the most expensive "consumer" card, the $3K Titan V, with 6900 GFLOPS FP64 performance vs the Tesla V100's 7450 GFLOPS. I do not think we have seen a Titan V benchmark here on the forum. Regarding trial factoring, the Tesla V100 is doing 4200-5000 GHz-d/day depending on exponent size and bit depth; it has ~14,000 GFLOPS FP32 performance vs Titan V 13,800 GFLOPS vs 2080Ti 11,750 GFLOPS. Trial factoring just got its own "Titan V" card with the $2.5K "Titan RTX". It has 16,300 GFLOPS FP32, so it should beat even the Tesla V100 in trial factoring (but NOT in LL speed, with only 510 GFLOPS FP64).

2018-12-11, 06:12   #10
ATH
Einyen

Dec 2003
Denmark

2²·3·5·7² Posts

Quote:
 Originally Posted by petrw1 Any idea why the 2080Ti would be 3 times faster than the 1080Ti at TF but slower at LL/DC?
According to these benchmarks the 2080Ti is faster:
https://mersenneforum.org/showpost.p...postcount=2690
https://mersenneforum.org/showpost.p...postcount=2572

For example at 4608K: 2.6288 ms/iter for 2080Ti vs 3.1588 ms/iter for 1080Ti Founders edition.

What surprises me (and saddens me a bit) is that they are both faster than my 8 core Haswell-E and faster than my old Titan Black even though they are so limited in DP performance.

I do not really understand it, the specs say they are supposed to be ~440 GFLOPS and ~350 GFLOPS of DP performance, and my old Titan Black is supposed to be ~1700 GFLOPS:
https://www.anandtech.com/show/13668...500-top-turing
https://en.wikipedia.org/wiki/GeForce_10_series
https://en.wikipedia.org/wiki/GeForce_700_series

2018-12-11, 07:30   #11
axn

Jun 2003

4698₁₀ Posts

Quote:
 Originally Posted by ATH I do not really understand it, the specs say they are supposed to be ~440 GFLOPS and ~350 GFLOPS of DP performance, and my old Titan Black is supposed to be ~1700 GFLOPS: https://www.anandtech.com/show/13668...500-top-turing https://en.wikipedia.org/wiki/GeForce_10_series https://en.wikipedia.org/wiki/GeForce_700_series
LL is memory-bandwidth limited on these cards: 2080 Ti 616 GB/s, 1080 Ti 484 GB/s, Titan Black 336 GB/s.
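A quick way to see this is to rank the three cards by each spec and compare against the observed ordering. The sketch below uses only figures quoted in this thread; it assumes the "~440 and ~350 GFLOPS" DP numbers refer to the 2080 Ti and 1080 Ti in that order, and the dictionary layout and names are purely illustrative.

```python
# Specs as quoted in the thread (DP figure order for the two Ti cards is assumed).
cards = {
    "2080 Ti": {"fp64_gflops": 440, "bandwidth_gbs": 616},
    "1080 Ti": {"fp64_gflops": 350, "bandwidth_gbs": 484},
    "Titan Black": {"fp64_gflops": 1700, "bandwidth_gbs": 336},
}


def rank_by(key: str) -> list[str]:
    """Card names sorted from highest to lowest value of the given spec."""
    return sorted(cards, key=lambda name: cards[name][key], reverse=True)


by_fp64 = rank_by("fp64_gflops")         # Titan Black comes out on top
by_bandwidth = rank_by("bandwidth_gbs")  # 2080 Ti comes out on top

# Quoted 4608K timings: 3.1588 ms/iter (1080 Ti) vs 2.6288 ms/iter (2080 Ti)
measured_speedup = 3.1588 / 2.6288
bandwidth_ratio = 616 / 484

print(by_fp64)
print(by_bandwidth)
print(round(measured_speedup, 2), round(bandwidth_ratio, 2))
```

Ranking by bandwidth reproduces the order reported above (2080 Ti, then 1080 Ti, then Titan Black), while ranking by DP throughput puts the Titan Black first, which is not what was observed.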

