mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2017-02-11, 16:11   #2564
Lorenzo
Aug 2010
Republic of Belarus

2×5×17 Posts

Oliver, could you please run a test at 332M?
2017-02-12, 15:59   #2565
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Quote:
Originally Posted by ATH
Now there is a Nvidia Quadro GP100 coming in March:
http://www.anandtech.com/show/11102/...s-quadro-gp100

Unfortunately they only state FP64 = 1/2 FP32, not the actual numbers.
It's the same chip, even with the same number of cores enabled, and I'm pretty sure there are no other differences, so just scale based on clock rates. Keep in mind that the Quadro has a lower TDP and thus a lower probability of running at max boost clock.
For CUDALucas it is safe to assume that the TDP won't limit the clock rates (I will post some numbers later).

Oliver
2017-02-12, 16:23   #2566
TheJudger
"Oliver"
Mar 2005
Germany

2126₈ Posts

Benchmark for FFT lengths from 8M to 32M:
Code:
Device              Tesla P100-PCIE-16GB
Compatibility       6.0
clockRate (MHz)     1328
memClockRate (MHz)  715

  fft    max exp  ms/iter
 8192  149447533   2.2734
 8232  150161509   2.7919
 8748  159365399   2.9310
 9000  163856051   2.9859
 9072  165138601   3.0350
 9216  167703023   3.0529
10976  198980129   3.1378
11200  202952693   3.8478
11664  211176269   3.9333
11907  215480183   4.0265
12000  217126817   4.0990
12250  221551991   4.1330
12348  223286171   4.1555
12500  225975263   4.1805
12544  226753511   4.1911
12800  231280639   4.2456
13122  236972111   4.3220
15552  279831199   4.3367
15680  282084599   5.2484
16000  287716357   5.2611
16384  294471259   5.2623
16807  301908293   5.6589
17496  314013451   5.7786
18000  322861793   5.9534
21952  392070229   6.0496
22400  399897793   7.5535
23328  416101459   7.9283
23814  424581893   7.9459
24500  436545821   8.0674
25088  446794913   8.1354
25600  455715121   8.4951
26244  466929581   8.5585
27000  480086839   8.8713
27648  491358173   9.0432
27783  493705637   9.4553
28672  509158127   9.5131
28800  511382147   9.7871
30375  538730923  10.2642
31104  551379091  10.3052
31752  562616531  10.3843
32000  566915989  10.4485
32768  580225813  10.5754
And as requested (I won't continue this one: it's a known composite, since a factor has been found):
Code:
Starting M332192879 fft length = 21952K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Feb 12  17:18:57  | M332192879     10000  0xa19043095e213f4c  | 21952K  0.00684   6.0656   60.65s  |  23:07:41:48   0.00%  |
|  Feb 12  17:19:58  | M332192879     20000  0xcb7bc66ac81b24be  | 21952K  0.00635   6.0665   60.66s  |  23:07:43:22   0.00%  |
|  Feb 12  17:20:59  | M332192879     30000  0x38e4cc517de8fda3  | 21952K  0.00647   6.0663   60.66s  |  23:07:42:50   0.00%  |
Power consumption (board power as reported by nvidia-smi) is around 180W-185W.

Oliver

Last fiddled with by TheJudger on 2017-02-12 at 16:24
2017-02-12, 19:24   #2567
Lorenzo
Aug 2010
Republic of Belarus

2·5·17 Posts

Cool! Thank you, Oliver. Looks like an absolute record: ~6.0665 ms/iter and ETA ~23 days and 7 hours.
2017-02-12, 21:38   #2568
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Quote:
Originally Posted by Lorenzo
Cool! Thank you, Oliver. Looks like an absolute record: ~6.0665 ms/iter and ETA ~23 days and 7 hours.
I guess performance per watt isn't bad either.
The Quadro GP100 should beat the absolute performance of this baby.
The Tesla P100-SXM2 will be even faster (same chip, higher clock rate and 300W TDP).

On the other hand, performance per money spent (hardware purchase) is somewhat on the low end.

Oliver

P.S. 1000th post for me!

Last fiddled with by TheJudger on 2017-02-12 at 21:39
2017-03-08, 18:17   #2569
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts
Need moar baaaaaaandwidth!

Linux, CUDALucas 2.05.1, CUDA 8.0 running './CUDALucas -cufftbench 2048 32768 20'
Code:
Device              Tesla P100-PCIE-12GB
Compatibility       6.0
clockRate (MHz)     1328
memClockRate (MHz)  715

  fft    max exp  ms/iter
 2048   38492887   0.8290
 2058   38676779   1.0388
 2592   48471289   1.0416
 2744   51250889   1.1145
 3136   58404433   1.2523
 3200   59570449   1.5077
 3240   60298969   1.5151
 3888   72075517   1.5259
 4096   75846319   1.5597
 5184   95507747   2.0062
 5488  100984691   2.3054
 5600  103000823   2.6145
 5832  107174381   2.6678
 6075  111541967   2.8229
 6125  112440191   2.8590
 6272  115080019   2.8595
 6400  117377567   2.8738
 7776  142017539   2.9085
 8000  146019329   3.0031
 8192  149447533   3.0191
 8640  157439981   3.9448
 8748  159365399   4.0645
 9072  165138601   4.1045
 9604  174608443   4.2650
 9800  178094491   4.3199
10976  198980129   4.4162
11200  202952693   5.1175
11664  211176269   5.2472
11907  215480183   5.3872
12150  219782179   5.4020
12250  221551991   5.4762
12544  226753511   5.4953
12800  231280639   5.6827
15552  279831199   5.7517
15625  281116351   7.0191
15876  285534331   7.0333
16384  294471259   7.0384
16807  301908293   7.7869
17150  307935821   7.8517
17280  310219633   7.8554
17496  314013451   8.0196
18144  325388893   8.2523
21952  392070229   8.3905
22400  399897793  10.1928
23328  416101459  10.6708
23814  424581893  10.6803
24300  433058579  10.7332
24500  436545821  10.7743
25088  446794913  10.9927
25600  455715121  11.3715
25920  461288279  11.7309
26244  466929581  12.0685
27216  483844577  12.3786
27648  491358173  12.5897
28224  501372343  12.7468
28672  509158127  13.1385
28800  511382147  13.2269
30375  538730923  13.6812
31104  551379091  13.8575
32000  566915989  13.9916
32768  580225813  14.1780
Code:
Starting M332192879 fft length = 21952K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Mar 08  19:12:13  | M332192879     10000  0xa19043095e213f4c  | 21952K  0.00658   8.4420   84.42s  |  32:10:58:17   0.00%  |
|  Mar 08  19:13:37  | M332192879     20000  0xcb7bc66ac81b24be  | 21952K  0.00684   8.4417   84.41s  |  32:10:56:14   0.00%  |
|  Mar 08  19:15:02  | M332192879     30000  0x38e4cc517de8fda3  | 21952K  0.00696   8.4417   84.41s  |  32:10:54:27   0.00%  |
The Tesla P100-PCIE-12GB is identical to the P100-PCIE-16GB except that memory capacity and bandwidth are only 3/4 as large: 732 GB/s (16GB) vs. 549 GB/s (12GB). It seems CUDALucas is memory-bandwidth-bound on that P100...

Again, no special tuning: just check out the source code, compile, and run the benchmark.

Oliver
2017-03-09, 01:36   #2570
flashjh
"Jerry"
Nov 2011
Vancouver, WA

2143₈ Posts

Nice
2017-03-09, 09:09   #2571
Lorenzo
Aug 2010
Republic of Belarus

2·5·17 Posts

Very sensitive to bandwidth. And the 332M test results differ by more than the 3/4 ratio alone would explain (i.e. by more than 25%).
2017-03-23, 22:42   #2572
TheJudger
"Oliver"
Mar 2005
Germany

2·3·5·37 Posts
stock 1080 Ti "Founders Edition"

Again Linux, CUDALucas 2.05.1, CUDA 8.0 running './CUDALucas -cufftbench 2048 32768 20'
Code:
Device              Graphics Device
Compatibility       6.1
clockRate (MHz)     1582
memClockRate (MHz)  5505

  fft    max exp  ms/iter
 2048   38492887   1.3294
 2160   40551479   1.5075
 2304   43194913   1.5292
 2592   48471289   1.6809
 2625   49075057   1.9591
 2700   50446621   1.9755
 2800   52274087   2.0140
 2916   54392209   2.0207
 3136   58404433   2.0756
 3240   60298969   2.2794
 3402   63247511   2.3988
 3584   66556463   2.4230
 3600   66847171   2.5569
 4096   75846319   2.6252
 4608   85111207   3.1588
 5184   95507747   3.5201
 5292   97454309   3.9470
 5600  103000823   4.0726
 5832  107174381   4.1691
 6144  112781477   4.3976
 6272  115080019   4.4866
 6480  118813021   4.7310
 6912  126558077   4.7951
 7168  131142761   5.0873
 7200  131715607   5.2188
 8192  149447533   5.4126
 8640  157439981   6.3776
 9216  167703023   6.4584
 9408  171120919   7.1689
 9600  174537299   7.2167
 9720  176671801   7.3573
10080  183071879   7.6530
10240  185914837   7.8567
10368  188188471   8.0238
10584  192023851   8.1263
10935  198252811   8.2703
11200  202952693   8.3223
11664  211176269   8.4596
12096  218826341   8.9538
12544  226753511   9.3923
12960  234109067   9.6682
13824  249369863  10.0077
14336  258403573  10.2316
14400  259532291  10.6340
15552  279831199  11.1631
16384  294471259  11.7147
18432  330441847  13.0869
18816  337176443  14.4504
19440  348113921  14.9802
20480  366326371  14.9975
20736  370806323  15.9969
21168  378363589  16.3615
23040  411074273  16.5980
23328  416101459  17.3808
24192  431175197  19.0066
25088  446794913  19.2028
25600  455715121  19.9979
27648  491358173  20.5194
28672  509158127  21.1422
28800  511382147  21.5948
32256  571353353  23.3608
32768  580225813  23.7932
Funny fact: driver version 378.13 doesn't know the name of the card...

Code:
Starting M332192879 fft length = 18432K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Mar 23  23:37:01  | M332192879     10000  0xa19043095e213f4c  | 18432K  0.25000  12.5995  125.99s  |  48:10:35:53   0.00%  |
|  Mar 23  23:39:12  | M332192879     20000  0xcb7bc66ac81b24be  | 18432K  0.25000  13.0765  130.76s  |  49:08:34:04   0.00%  |
|  Mar 23  23:41:23  | M332192879     30000  0x38e4cc517de8fda3  | 18432K  0.25781  13.0962  130.96s  |  49:16:28:27   0.00%  |
Power consumption during this test run hovers around 180W (board power as reported by the nvidia driver).

Oliver
2017-03-30, 14:30   #2573
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·11·107 Posts

Quote:
Originally Posted by LaurV
... For mfaktc I solve this with batches, anyhow I should create some batches for CL, just in case

Code:
copy /b allresults.txt+cl0\result.txt
del cl0\results.txt
copy /b allresults.txt+cl1\result.txt
del cl1\results.txt
copy /b allresults.txt+cl2\result.txt
del cl2\results.txt
etc
and launch it from time to time...

That doesn't look very safe to me. Does this work, or do you not mind losing some results? (The copy is assumed to succeed. Note result.txt in the copy versus results.txt in the del.) (A little late nitpicking of my own ;) on too little sleep to code it myself.)
2017-03-30, 21:11   #2574
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×11×107 Posts
Adding logging

Quote:
Originally Posted by Dubslow
Sigh... it had been on my personal todo list to clean up a lot of the functions, which would be necessary to produce logging functionality... it'd probably take me a couple of days.

I wouldn't think adding the logging itself would take long. A few decades ago (pre-386!), when I was writing LL code the hard, slow way in C, before I had heard of Woltman, Crandall, prime95, etc., I was in deep before deciding to add logging. One day I bit the bullet: I replaced all my printf calls with dprintf and wrote a little routine called dprintf that printed whatever it was fed to both stdout and a log file.

All times are UTC.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.