#1794
"Mihai Preda"
Apr 2015
2·23·29 Posts
Yes, as I said, I tested (i.e. measured) with ROCm 2.10. It should not regress on other platforms, but I'm looking for feedback on this. If a regression is detected (e.g. on Nvidia), I'll switch the change on or off as appropriate.

#1795
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
37·131 Posts
Quote:
This is timely, as I was just considering rolling through a slew of gpu models with gpuowl minor updates and -use options timing script updates on PRP.

#1796
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
37·131 Posts
Only ran -h so far, but here it is. This is the latest commit at the moment, which includes Preda's P-1 stage 2 tweak.

#1797
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
37·131 Posts
Code:
gpuowl v6.11-134 RX550 4GB Win7 x64
exponent 92400689 PRP 5M fft -iters 10000 -time
NO_ASM 14491
NO_ASM,UNROLL_ALL 14492
NO_ASM,UNROLL_NONE 14364
NO_ASM,UNROLL_WIDTH 14363
NO_ASM,UNROLL_HEIGHT 14360 *
NO_ASM,UNROLL_MIDDLEMUL1 14412
NO_ASM,UNROLL_MIDDLEMUL2 14363
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 14369
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 14363
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14361
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14362
NO_ASM,MERGED_MIDDLE,WORKINGIN 19729
NO_ASM,MERGED_MIDDLE,WORKINGIN 19730
NO_ASM,MERGED_MIDDLE,WORKINGIN1 14683
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 14573
NO_ASM,MERGED_MIDDLE,WORKINGIN2 14849
NO_ASM,MERGED_MIDDLE,WORKINGIN3 15175
NO_ASM,MERGED_MIDDLE,WORKINGIN4 19404
NO_ASM,MERGED_MIDDLE,WORKINGIN5 14487 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT 32143
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 17920
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 14866
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 14825
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 14395 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 14496
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 15450
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 15736
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 14554
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 14319 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 14364
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 14394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 14483
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 14309 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 18362
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32 14326 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE,CARRY64 14965
%allotheroptions%,FANCY_MIDDLEMUL1 14320
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 14398
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA EE on load
%allotheroptions%,ORIGINAL_METHOD 14318 *
%allotheroptions%,ORIGINAL_TWEAKED 14321
%allotheroptions%,ORIG_MIDDLEMUL2 14315
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 14309 *
%allotheroptions%,ORIG_SLOWTRIG 14772
%allotheroptions%,NEW_SLOWTRIG 14306
%allotheroptions%,MORE_ACCURATE 14309
%allotheroptions%,LESS_ACCURATE 14184 *
NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE 14152 *
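For anyone wanting to post-process listings like the one above, here is a minimal Python sketch that picks the lowest timing out of such a listing. It is not part of the timing script used for these posts; the function name `parse_timings` and the assumed line shape (an option string, a single integer, an optional `*`) are illustrative assumptions. Lines that do not fit that shape, such as `EE on load`, are simply skipped.

```python
import re

def parse_timings(listing):
    """Return (options, value) pairs for lines shaped like '<options> <integer> [*]'.

    Hypothetical helper: lines with errors or several values are ignored.
    """
    results = []
    for line in listing.splitlines():
        m = re.match(r"^(\S+)\s+(\d+)\s*\*?\s*$", line.strip())
        if m:
            results.append((m.group(1), int(m.group(2))))
    return results

# A few lines copied from the RX550 listing.
sample = """\
NO_ASM 14491
NO_ASM,UNROLL_NONE 14364
NO_ASM,UNROLL_HEIGHT 14360 *
%allotheroptions%,CHEBYSHEV_METHOD EE on load
NO_ASM,UNROLL_MIDDLEMUL1 14412
"""

timings = parse_timings(sample)
# Lower is better, so the fastest option set is the minimum by value.
best_options, best_value = min(timings, key=lambda pair: pair[1])
print(best_options, best_value)
```

On the sample above this selects `NO_ASM,UNROLL_HEIGHT`, the same line that carries the `*` marker in the listing.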

#1798
Sep 2002
Database er0rr
3,533 Posts
I have just downloaded a bunch of world record PRPs and some end in 0 and others end in 2. Will "program":{"name":"gpuowl", "version":"v6.11-124-g267cc60"} perform P-1 automatically on those ending with "2" or do I need to upgrade gpuOwl?

#1799
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
37·131 Posts
Code:
gpuowl v6.11-134-g1e0ce1d RX480 8GB Win7 x64
exponent 92162731 PRP 5M fft -iters 10000 -time
NO_ASM 3372, 3374
NO_ASM,UNROLL_ALL 3375
NO_ASM,UNROLL_NONE 3349
NO_ASM,UNROLL_WIDTH 3351
NO_ASM,UNROLL_HEIGHT 3344 *
NO_ASM,UNROLL_MIDDLEMUL1 3352
NO_ASM,UNROLL_MIDDLEMUL2 3373
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 3337 *
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 3374
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3365
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3370
NO_ASM,MERGED_MIDDLE,WORKINGIN 5991
NO_ASM,MERGED_MIDDLE,WORKINGIN 6011
NO_ASM,MERGED_MIDDLE,WORKINGIN1 3397 *
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 3426
NO_ASM,MERGED_MIDDLE,WORKINGIN2 3478
NO_ASM,MERGED_MIDDLE,WORKINGIN3 3473
NO_ASM,MERGED_MIDDLE,WORKINGIN4 3821
NO_ASM,MERGED_MIDDLE,WORKINGIN5 3365
NO_ASM,MERGED_MIDDLE,WORKINGOUT 5835
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4543
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 3352 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 3384
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 3739
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 3365
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 3468
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 3427
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 3383 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 3394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 3390
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 3395
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 3436
set allotheroptions=NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 3341 *
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 3353
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 3351
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 3356 *
%allotheroptions%,CARRY64 3479
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 3349
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 3341 *
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA 3350
%allotheroptions%,ORIGINAL_METHOD 3356
%allotheroptions%,ORIGINAL_TWEAKED 3348
%allotheroptions%,ORIG_MIDDLEMUL2 3434
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 3357 *
%allotheroptions%,ORIG_SLOWTRIG EE
%allotheroptions%,NEW_SLOWTRIG 3360 *
%allotheroptions%,MORE_ACCURATE 3362
%allotheroptions%,LESS_ACCURATE EE
NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG

#1800
Feb 2005
Colorado
577 Posts
Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.

#1801
Sep 2002
Database er0rr
3,533 Posts

#1802
"Mihai Preda"
Apr 2015
536₁₆ Posts
Correction: it's a 33% speed-up of one kernel (tailFusedMulDelta) that was taking up 45% of stage-2 time before. So it's more like a 12% speed-up of stage 2 overall (I hope).
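The 12% estimate can be checked with Amdahl's law. A minimal sketch in Python, assuming "33% speed-up" means the kernel runs 1.33 times faster and the remaining 55% of stage-2 time is unchanged; the function name `overall_speedup` is only illustrative:

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: overall speed-up when `fraction` of the runtime
    becomes `kernel_speedup` times faster and the rest is unchanged."""
    new_time = (1.0 - fraction) + fraction / kernel_speedup
    return 1.0 / new_time

# 45% of stage-2 time in the kernel, kernel assumed 1.33x faster.
s = overall_speedup(0.45, 1.33)
print(f"overall stage-2 speed-up: {(s - 1) * 100:.1f}%")
```

Under that reading the result comes out between 12% and 13%, in line with the figure quoted. If "33%" instead meant the kernel's run time dropped by a third, the overall gain would be somewhat larger.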

#1803
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001011101111₂ Posts
Code:
gpuowl v6.11-134-g1e0ce1d GTX1080 8GB Win7 x64
exponent 91996859 PRP 5M fft -iters 10000 -time
NO_ASM 4541, 4560
NO_ASM,UNROLL_ALL 4542
NO_ASM,UNROLL_NONE BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL1 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 4554
NO_ASM,MERGED_MIDDLE,WORKINGIN 4590
NO_ASM,MERGED_MIDDLE,WORKINGIN 5006
NO_ASM,MERGED_MIDDLE,WORKINGIN1 4574
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 4666
NO_ASM,MERGED_MIDDLE,WORKINGIN2 4541
NO_ASM,MERGED_MIDDLE,WORKINGIN3 4548
NO_ASM,MERGED_MIDDLE,WORKINGIN4 4539 *
NO_ASM,MERGED_MIDDLE,WORKINGIN5 4594
NO_ASM,MERGED_MIDDLE,WORKINGOUT 4615
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4622
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 4654
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 4614
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 4555 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 4602
NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4 4533 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 4599
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 4646
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 4591
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 4605
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 4548
set allotheroptions=NO_ASM,UNROLL_ALL,WORKINGIN4,WORKINGOUT4
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 4537
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 4558
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 4517 *
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_ALL,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE
%allotheroptions%,CARRY32 4698
%allotheroptions%,CARRY64 4559 *
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 4531
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 4542
%allotheroptions%,CHEBYSHEV_METHOD 4492
%allotheroptions%,CHEBYSHEV_METHOD_FMA 4483 *
%allotheroptions%,ORIGINAL_METHOD 4546
%allotheroptions%,ORIGINAL_TWEAKED 4579
%allotheroptions%,ORIG_MIDDLEMUL2 4518
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 4445 *
%allotheroptions%,ORIG_SLOWTRIG 4552
%allotheroptions%,NEW_SLOWTRIG 4447
%allotheroptions%,MORE_ACCURATE 4438
%allotheroptions%,LESS_ACCURATE 4428 *
-use NO_ASM,UNROLL_ALL,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY64,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
4550/4428 =~ 1.0276

#1804
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
37×131 Posts
Substantial tuning gained about 3% above program defaults.
Code:
gpuowl v6.11-134-g1e0ce1d Radeon VII 16GB at 1244Mhz gpu clock, 880Mhz memory clock Win10 x64
exponent 92561231 PRP 5M fft -iters 10000 -time
NO_ASM 1017
NO_ASM 1014
NO_ASM,UNROLL_ALL 1015
NO_ASM,UNROLL_NONE 1001
NO_ASM,UNROLL_WIDTH 1002
NO_ASM,UNROLL_HEIGHT 1002
NO_ASM,UNROLL_MIDDLEMUL1 1013
NO_ASM,UNROLL_MIDDLEMUL2 989 *
NO_ASM,MERGED_MIDDLE,WORKINGIN 1393
NO_ASM,MERGED_MIDDLE,WORKINGIN 1391
NO_ASM,MERGED_MIDDLE,WORKINGIN1 1035
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 1032
NO_ASM,MERGED_MIDDLE,WORKINGIN2 1038
NO_ASM,MERGED_MIDDLE,WORKINGIN3 1023
NO_ASM,MERGED_MIDDLE,WORKINGIN4 1081
NO_ASM,MERGED_MIDDLE,WORKINGIN5 1010 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT 1177
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 1133
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 1028
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 1058
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 1117
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 1011 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 1042
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 1026
set wkgin=WORKINGIN5
set wkgout=WORKINGOUT3
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 1000
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 987
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 989
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 986 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 1017
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 1022
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 1012
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 1011 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 1029
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN5,WORKINGOUT3
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT 1014 *
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_WIDTH 1016
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1018
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE 1020
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1028
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 989 *
%allotheroptions%,CARRY64 1022
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 1011
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 991
%allotheroptions%,CHEBYSHEV_METHOD 989 *
%allotheroptions%,CHEBYSHEV_METHOD_FMA 989 *
%allotheroptions%,ORIGINAL_METHOD 991
%allotheroptions%,ORIGINAL_TWEAKED 990
%allotheroptions%,ORIG_MIDDLEMUL2 987 *
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 988
%allotheroptions%,ORIG_SLOWTRIG 1022
%allotheroptions%,NEW_SLOWTRIG 988
%allotheroptions%,MORE_ACCURATE 988
%allotheroptions%,LESS_ACCURATE 986 *
NO_ASM,UNROLL_MIDDLEMUL2,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT,CARRY32,CHEBYSHEV_METHOD,ORIG_MIDDLEMUL2,LESS_ACCURATE
repeatability +-1.5/1015.5 = 0.148%
base 1015.5 final 986 ratio 1015.5/986 = 1.030
timing overhead ~986/974-1 =~ .0123
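The summary figures at the end of the Radeon VII listing can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and the value 974 is taken from the post as-is, on the assumption that it is the corresponding timing measured without `-time`.

```python
# Two baseline NO_ASM runs and the final tuned timing from the listing.
baseline_runs = [1017, 1014]
tuned = 986
untimed = 974  # from the post; assumed to be the run without -time

baseline = sum(baseline_runs) / len(baseline_runs)            # mean of the two runs
half_spread = (max(baseline_runs) - min(baseline_runs)) / 2   # the "+-" value

repeatability = half_spread / baseline   # run-to-run noise as a fraction
gain = baseline / tuned                  # tuned vs. default ratio
overhead = tuned / untimed - 1           # cost of the timing option

print(f"baseline {baseline}, repeatability {repeatability:.3%}")
print(f"gain {gain:.3f}, timing overhead {overhead:.4f}")
```

The tuning gain of about 3% is roughly twenty times the measured run-to-run spread, which is why the difference can be treated as real rather than noise.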