mersenneforum.org  

2020-10-25, 06:34   #144
preda
"Mihai Preda"
Apr 2015
2⁴·83 Posts

Quote:
Originally Posted by kriesel
It's repeatable in V7.0-35, on both error looping and normally running worktodo lines.

Thanks, I'll need to look into why STATS fails for large exponents.

2020-10-25, 07:04   #145
preda
"Mihai Preda"
Apr 2015
2⁴·83 Posts
Proof validation

For those who can, it may be a good idea to use -proof 9, which enables validation of the proof. The validation costs 0.2%, which is small (on the order of 2-3 minutes on R7), and it makes sure that the proof is good before beaming it up to the server.

2020-10-25, 14:29   #146
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,703 Posts
FYI

Quote:
Originally Posted by preda
Thanks, I'll need to look into why STATS fails for large exponents.

Quick -use STATS results from existing runs, supplemented by a rough binary search with new worktodo lines:
123M PRP, V7.0-40, ok
150M PRP, V6.11-364, ok
177.8M LL, V6.11-364, ok
181M LL, V6.11-380, ok
190M LL, V6.11-364, ok
320M PRP, V7.0-40, ok
480M PRP, V7.1-1, ok
554M PRP, V7.1-1, ok
558M PRP/P-1, V7.1-1, ok
560M PRP/P-1, V7.1-1, ok

561M PRP/P-1, V7.1-1, out of resources error
562.6M PRP/P-1, V7.1-1, out of resources error
600M PRP/P-1, V7.1-1, out of resources error
642M PRP, V6.11-364, out of resources error
764M PRP, V6.11-364, out of resources error
(previously reported below:)
843M PRP/P-1, V7.0-35, out of resources error
957M PRP/P-1, V7.0-35, out of resources error
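
The binary-search part of the approach above can be sketched as follows. This is only an illustration: `mock_stats_run_ok` is a hypothetical stand-in for actually launching gpuowl with -use STATS on a new worktodo line, and `ASSUMED_CUTOFF` is an invented value placed between the 560M pass and the 561M failure, not a measured limit.

```python
def find_threshold(lo, hi, passes):
    """Return the largest value in [lo, hi) for which passes() is True.

    Assumes passes(lo) is True, passes(hi) is False, and that the
    outcome flips exactly once between the two.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            lo = mid   # mid still works, so the threshold is at or above it
        else:
            hi = mid   # mid fails, so the threshold is below it
    return lo


# Hypothetical cutoff, chosen only so the mock agrees with the list above.
ASSUMED_CUTOFF = 560_500_000


def mock_stats_run_ok(exponent):
    """Stand-in for a real gpuowl run; True means no error was raised."""
    return exponent <= ASSUMED_CUTOFF


last_ok = find_threshold(480_000_000, 600_000_000, mock_stats_run_ok)
print(last_ok)
```

Each step halves the interval, so roughly 27 trial runs would narrow a 120M-wide range to a single value. In practice each trial would need a valid (prime) exponent near the midpoint, which this sketch ignores.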

Last fiddled with by Prime95 on 2020-10-25 at 16:54

2020-10-26, 09:19   #147
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11137₈ Posts

Gpuowl FFT limits from the help.txt files

v7.1-1:
FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K

V6.11-364:
FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K

560.63M B1=3200000,B2=140000000;PRP=0,1,2,560630051,-1,83,2 STATS ok at 30M fft, maxalloc 14G, Radeon VII
560.65M B1=3200000,B2=140000000;PRP=0,1,2,560650067,-1,83,2 STATS at 32M fail with OUT_OF_RESOURCES error
560.63M forced to 32M with -fft 32M in config.txt: fail with OUT_OF_RESOURCES error

-use STATS OUT_OF_RESOURCES fatal error appears to relate to fft size 32M or larger
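
As a rough illustration of why the two nearby exponents above ended up on different FFT sizes, here is a sketch that picks the first FFT size whose listed upper limit covers the exponent. The selection rule is an assumption made for illustration, not gpuowl's actual code, and the limits are the rounded help.txt figures quoted above.

```python
# Upper exponent limits per FFT size, in units of 1e6, as printed in help.txt.
FFT_MAX_EXPONENT_M = [
    ("30M", 560.64),
    ("32M", 599.62),
]


def default_fft(exponent):
    """Return the first listed FFT size whose upper limit covers the exponent.

    Assumed rule for illustration only. Returns None if no listed size fits.
    """
    for size, max_m in FFT_MAX_EXPONENT_M:
        if exponent <= max_m * 1e6:
            return size
    return None


for p in (560_630_051, 560_650_067):
    print(p, default_fft(p))
```

With these figures 560630051 stays on 30M while 560650067 crosses the 560.64M limit and moves to 32M, matching the two runs reported above.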

2020-10-26, 17:04   #148
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,703 Posts

... and that behavior transition coincides with the change to 4K head, since I was testing with default fft selection.
Code:
FFT   30M [ 47.19M -  560.64M]  1K:15:1K 4K:15:256

FFT   32M [ 50.33M -  599.62M]  4K:8:512 4K:4:1K
FFT   36M [ 56.62M -  671.04M]  4K:9:512
FFT   40M [ 62.91M -  743.74M]  4K:10:512 4K:5:1K
FFT   44M [ 69.21M -  816.39M]  4K:11:512
FFT   48M [ 75.50M -  889.11M]  4K:12:512 4K:6:1K
FFT   52M [ 81.79M -  961.97M]  4K:13:512
FFT   56M [ 88.08M - 1033.20M]  4K:14:512 4K:7:1K
FFT   60M [ 94.37M - 1103.74M]  4K:15:512
FFT   64M [100.66M - 1177.31M]  4K:8:1K
FFT   72M [113.25M - 1321.02M]  4K:9:1K
FFT   80M [125.83M - 1464.31M]  4K:10:1K
FFT   88M [138.41M - 1607.03M]  4K:11:1K
FFT   96M [150.99M - 1751.79M]  4K:12:1K
FFT  104M [163.58M - 1893.52M]  4K:13:1K
FFT  112M [176.16M - 2035.14M]  4K:14:1K
FFT  120M [188.74M - 2172.36M]  4K:15:1K
Looking at the fft descriptors tested for https://mersenneforum.org/showpost.p...&postcount=146
123M 1K:13:256
150M 1K:8:512
177M, 181M, 190M 1K:10:512
320M 1K:9:1K
480M 1K:13:1K
554M, 556M, 560M 1K:15:1K
1K head; 8, 9, 10, 13, 15 middle; 256, 512, or 1K tail combinations tried ok;

561M, 562.6M 4K:8:512
600M, 642M 4K:9:512
764M 4K:11:512
843M 4K:12:512
957M 4K:13:512
4K head; 8, 9, 11, 12 or 13 middle; 512 tail combinations tried failed.
The common factor seems to be related to the 4K head. Some middles fall in both lists, as does the 512 tail. That would mean all fft lengths from 32M to 120M could have the -use STATS OUT_OF_RESOURCES issue.
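
The comparison above can be reproduced with a few lines of set arithmetic. This is a minimal sketch over the descriptors listed in this post, reading the 480M entry's tail as 1K (an assumption, in line with the 256, 512, or 1K tails in the summary line); `parse_fft` and `fields` are names made up for the example.

```python
def parse_fft(spec):
    """Split 'head:middle:tail' (e.g. '4K:8:512') into three integers."""
    def num(s):
        return int(s[:-1]) * 1024 if s.endswith("K") else int(s)
    head, middle, tail = spec.split(":")
    return num(head), int(middle), num(tail)


def fields(specs):
    """Return the sets of heads, middles and tails seen in a list of specs."""
    heads, middles, tails = zip(*(parse_fft(s) for s in specs))
    return set(heads), set(middles), set(tails)


ok = ["1K:13:256", "1K:8:512", "1K:10:512", "1K:9:1K", "1K:13:1K", "1K:15:1K"]
failed = ["4K:8:512", "4K:9:512", "4K:11:512", "4K:12:512", "4K:13:512"]

ok_heads, ok_middles, ok_tails = fields(ok)
bad_heads, bad_middles, bad_tails = fields(failed)

print("heads in both:  ", sorted(ok_heads & bad_heads))
print("middles in both:", sorted(ok_middles & bad_middles))
print("tails in both:  ", sorted(ok_tails & bad_tails))
```

Only the head separates the two groups cleanly: no head value appears in both lists, while middles 8, 9 and 13 and the 512 tail appear in both.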


(edit)
But it also means some lower than 32M could have it. And this is confirmed by running a single test on the same exponent that was ok with the default fft, with 4K:15:256:
Code:
2020-10-26 12:16:31 gpuowl v7.1-1-g0f73d04
2020-10-26 12:16:31 config: -user kriesel -cpu asr2/radeonvii0 -d 1 -maxAlloc 14G -proof 9 -log 100000 -use NO_ASM,STATS -fft 4K:15:256
2020-10-26 12:16:31 device 1, unique id ''
2020-10-26 12:16:31 asr2/radeonvii0 560630051 FFT: 30M 4K:15:256 (17.82 bpw)
2020-10-26 12:16:36 asr2/radeonvii0 560630051 OpenCL args "-DEXP=560630051u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=3u -DMM2_CHAIN=3u -DMAX_ACCURACY=1 -DULTRA_TRIG=1 -DWEIGHT_STEP_MINUS_1=0x8.681b5a84b24dp-6 -DIWEIGHT_STEP_MINUS_1=-0xe.dc7abc0d7b388p-7 -DNO_ASM=1 -DSTATS=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-26 12:16:41 asr2/radeonvii0 560630051 OpenCL compilation in 4.72 s
2020-10-26 12:16:43 asr2/radeonvii0 560630051 maxAlloc: 14.0 GB
2020-10-26 12:16:43 asr2/radeonvii0 560630051 P1(3.2M) 4617012 bits
2020-10-26 12:16:44 asr2/radeonvii0 560630051 Acquired memory lock 'memlock-1'
2020-10-26 12:16:44 asr2/radeonvii0 560630051 P1(3.2M) using 100 buffers
2020-10-26 12:16:46 asr2/radeonvii0 560630051 P1(3.2M) releasing 100 buffers
2020-10-26 12:16:46 asr2/radeonvii0 560630051 Released memory lock 'memlock-1'
2020-10-26 12:16:47 asr2/radeonvii0 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:325 run
2020-10-26 12:16:47 asr2/radeonvii0 Bye
(end edit)


It appears from extrapolation of FFT lengths' exponent limits that a modified gpuowl to attack P-1 or testing of F33 would require at least 480M length (perhaps as 4K:15:4K), and that would benefit from -use STATS checks being available both in development and end use.
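
The arithmetic behind that estimate can be sketched as below. It assumes that "M" in FFT sizes means 2**20 words, that a descriptor's length is 2 x head x middle x tail (inferred from the table above, where 4K:15:256 is listed as 30M, not taken from gpuowl's source), and that the bits-per-word figure for a Fermat number would be judged like that of a Mersenne exponent of the same size.

```python
M = 2 ** 20                  # "M" in FFT sizes read as 2**20 words
F33_EXPONENT = 2 ** 33       # F33 = 2**(2**33) + 1


def bits_per_word(exponent, fft_words):
    """Average number of bits stored per FFT word."""
    return exponent / fft_words


# 4K:15:4K under the assumed length rule 2 * head * middle * tail.
fft_4k_15_4k = 2 * 4096 * 15 * 4096

# Bits per word at the upper limits of the smallest and largest listed sizes.
bpw_limit_30m = bits_per_word(560.64e6, 30 * M)
bpw_limit_120m = bits_per_word(2172.36e6, 120 * M)

print(fft_4k_15_4k // M)                                    # 480
print(f"{bits_per_word(F33_EXPONENT, fft_4k_15_4k):.2f}")   # 17.07
print(f"{bpw_limit_30m:.2f} {bpw_limit_120m:.2f}")          # 17.82 17.26
```

On these figures F33 at 480M would sit at about 17.07 bits per word, below the 17.26 reached at the 120M limit. Since the limit falls as the FFT grows, this arithmetic alone does not show that 480M is enough, which fits the "at least" above.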

Last fiddled with by kriesel on 2020-10-26 at 17:27

2020-10-28, 17:21   #149
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,703 Posts
Assertion failed on large P-1

Same asr2 system as in some previous reports, gpuowl-win 7.1-1
Code:
2020-10-28 10:15:50 asr2/radeonvii0 937156667 P1 Jacobi OK @ 12451200 d8514e45158c1e7d
2020-10-28 10:16:00 asr2/radeonvii0 937156667 OK  12500800   1.33% 9f520cdb7c997859 16645 us/it + check 6.66s + save 2.89s; ETA 178d 03:15
2020-10-28 10:16:00 asr2/radeonvii0 937156667 P2(8630000,258.9M) D=210, nBuf=22
Assertion failed: nBuf >= minBufsFor(D), file Pm1Plan.cpp, line 154
Issue was repeatable on application restart, not surprising. Looks like it occurred at the onset of P2.

Resolved by changing to -maxAlloc 15G from 14G at least for now. Larger exponents are likely to run into trouble.
The "Assertion failed" line is not present in gpuowl.log. It was captured from the console window.
If practical it would be useful to have minBufs a user option. Or switch to the next worktodo line when a roadblock is hit. Or both.
Attached thumbnail: assertion failed.png (77.9 KB)

2020-10-28, 19:05   #150
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4703₁₀ Posts
v7.1-11

How does one actually attempt to use the 2xSP?
There's nothing in the help output about it, or readme.
(It seems a bit early to expect it to be automatically selecting depending on gpu model...)

Last fiddled with by kriesel on 2020-10-28 at 19:07

2020-10-28, 19:49   #151
preda
"Mihai Preda"
Apr 2015
2⁴·83 Posts

Quote:
Originally Posted by kriesel
Same asr2 system as in some previous reports, gpuowl-win 7.1-1
Code:
2020-10-28 10:15:50 asr2/radeonvii0 937156667 P1 Jacobi OK @ 12451200 d8514e45158c1e7d
2020-10-28 10:16:00 asr2/radeonvii0 937156667 OK  12500800   1.33% 9f520cdb7c997859 16645 us/it + check 6.66s + save 2.89s; ETA 178d 03:15
2020-10-28 10:16:00 asr2/radeonvii0 937156667 P2(8630000,258.9M) D=210, nBuf=22
Assertion failed: nBuf >= minBufsFor(D), file Pm1Plan.cpp, line 154
Issue was repeatable on application restart, not surprising. Looks like it occurred at the onset of P2.

Resolved by changing to -maxAlloc 15G from 14G at least for now. Larger exponents are likely to run into trouble.
The "Assertion failed" line is not present in gpuowl.log. It was captured from the console window.
If practical it would be useful to have minBufs a user option. Or switch to the next worktodo line when a roadblock is hit. Or both.

P2 needs at least 24 buffers. As the exponent grows, the buffer size grows, and this minimum may not be met depending on the -maxAlloc allowed. I do not plan to fix this ATM; let's simply say that if not enough GPU memory is available, huge exponents can't do P2.

2020-10-28, 19:50   #152
preda
"Mihai Preda"
Apr 2015
2⁴×83 Posts

Quote:
Originally Posted by kriesel
How does one actually attempt to use the 2xSP?
There's nothing in the help output about it, or readme.
(It seems a bit early to expect it to be automatically selecting depending on gpu model...)

No, 2xSP is an experiment and can't be used for anything yet. Still a long way to go. (I was just measuring the precision that could be achieved *if* it were implemented.)

Last fiddled with by preda on 2020-10-28 at 19:51

2020-10-30, 18:30   #153
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001001011111₂ Posts

Two things:

V7.1-11 hit a P-1 hiccup and quit with "Bye". (It sure would be nice if it progressed to the next worktodo entry instead of quitting.)
Code:
2020-10-30 13:18:44 GpuOwl VERSION v7.1-11-g97cfbd2
2020-10-30 13:18:44 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -maxAlloc 15G -proof 9 -log 100000 -use NO_ASM
2020-10-30 13:18:44 device 2, unique id ''
2020-10-30 13:18:44 asr2/radeonvii2 153021377 FFT: 8M 1K:8:512 (18.24 bpw)
2020-10-30 13:18:46 asr2/radeonvii2 153021377 OpenCL args "-DEXP=153021377u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xb.10feac5431868p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.156361ac01fe8p-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 13:18:53 asr2/radeonvii2 153021377 OpenCL compilation in 7.71 s
2020-10-30 13:18:54 asr2/radeonvii2 153021377 maxAlloc: 15.0 GB
2020-10-30 13:18:54 asr2/radeonvii2 153021377 P1(1M) 1442134 bits
2020-10-30 13:18:54 asr2/radeonvii2 153021377 PRP starting from beginning
2020-10-30 13:18:54 asr2/radeonvii2 153021377 Acquired memory lock 'memlock-2'
2020-10-30 13:18:54 asr2/radeonvii2 153021377 P1(1M) using 460 buffers
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [0] 2d87ce26 != fffffffb
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [1] 7581b6da != 00000019
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [2] efca9779 != ffffff83
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [3] 03fe031c != 00000271
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [4] 21d014f0 != fffff3cb
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [5] 2100996a != 00003d09
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [6] 18280ed1 != fffeced3
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [7] fdd2f6a2 != 0005f5e1
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [8] 563a16a6 != ffe2329b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [9] 1c97ee6a != 009502f9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [10] 44fb8d20 != fd16f123
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [11] 0de14ac0 != 0e8d4a51
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [12] fd818931 != b73d8c6b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [13] 058c6909 != 6bcc41e9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [14] c4dfa66e != e502b673
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [15] 8739ef6f != 86f26fc1
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [16] 3f8bbf6e != 5d43d13b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [17] 19ad23d9 != 2dace9d9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [18] 6b9d9ac6 != 1b9f6ec3
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [19] 729bd6ae != 75e2d631
2020-10-30 13:18:58 asr2/radeonvii2 153021377 fold() does not roundtrip
2020-10-30 13:18:58 asr2/radeonvii2 153021377 P1(1M) releasing 460 buffers
2020-10-30 13:18:59 asr2/radeonvii2 153021377 Released memory lock 'memlock-2'
2020-10-30 13:18:59 asr2/radeonvii2 Exiting because "fold roundtrip"
2020-10-30 13:18:59 asr2/radeonvii2 Bye
A 623M assignment had no such issue at 623466917 FFT: 36M 4K:9:512 (16.52 bpw). Maybe 8M fft on 153M P-1&PRP is a bit optimistic for a default, at 18.24 bits/word? It launches ok if forced to 9M, 16.21 bpw.
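
The bits-per-word figures quoted here can be recomputed from the exponent and the FFT descriptor. A minimal sketch, assuming the FFT length is 2 x head x middle x tail words (an inference from log lines such as "FFT: 8M 1K:8:512", not taken from gpuowl's source); `fft_length` and `bits_per_word` are names made up for the example.

```python
def fft_length(spec):
    """Words in an FFT given as 'head:middle:tail', e.g. '1K:8:512'.

    The factor of 2 is inferred from the logs, where 1K:8:512 is
    reported as 8M.
    """
    def num(s):
        return int(s[:-1]) * 1024 if s.endswith("K") else int(s)
    head, middle, tail = spec.split(":")
    return 2 * num(head) * int(middle) * num(tail)


def bits_per_word(exponent, words):
    """Average number of exponent bits carried per FFT word."""
    return exponent / words


print(f"{bits_per_word(153021377, fft_length('1K:8:512')):.2f}")   # 18.24
print(f"{bits_per_word(623466917, fft_length('4K:9:512')):.2f}")   # 16.52
# The 9M descriptor is not given in the post, so the length is used directly.
print(f"{bits_per_word(153021377, 9 * 2**20):.2f}")                # 16.21
```

The three results agree with the 18.24, 16.52 and 16.21 bpw values in the text, so the failing default is the one packing the most bits into each word.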


A quick test confirms that the -use STATS issue with the 4K fft head extends down to the minimum length that offers it, 6M:
Code:
2020-10-30 12:52:38 gpuowl v6.11-364-g36f4e2a
2020-10-30 12:52:38 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000 -use NO_ASM,STATS -fft 4K:3:256
2020-10-30 12:52:38 device 1, unique id ''
2020-10-30 12:52:38 asr2/radeonvii 100759339 FFT: 6M 4K:3:256 (16.02 bpw)
2020-10-30 12:52:38 asr2/radeonvii Expected maximum carry32: 12AD0000
2020-10-30 12:52:39 asr2/radeonvii OpenCL args "-DEXP=100759339u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=3u -DPM1=1 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.a9c658667f95p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d46dc4b3339dp-5 -DNO_ASM=1 -DSTATS=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 12:52:44 asr2/radeonvii OpenCL compilation in 4.50 s
2020-10-30 12:52:45 asr2/radeonvii 100759339 P1 B1=1000000, B2=30000000; 1442134 bits; starting at 1001
2020-10-30 12:52:45 asr2/radeonvii Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:325 run
2020-10-30 12:52:45 asr2/radeonvii Bye

2020-10-30 12:53:07 gpuowl v6.11-364-g36f4e2a
2020-10-30 12:53:07 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000 -use NO_ASM -fft 4K:3:256
2020-10-30 12:53:07 device 1, unique id ''
2020-10-30 12:53:07 asr2/radeonvii 100759339 FFT: 6M 4K:3:256 (16.02 bpw)
2020-10-30 12:53:07 asr2/radeonvii Expected maximum carry32: 12AD0000
2020-10-30 12:53:09 asr2/radeonvii OpenCL args "-DEXP=100759339u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=3u -DPM1=1 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.a9c658667f95p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d46dc4b3339dp-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 12:53:13 asr2/radeonvii OpenCL compilation in 4.26 s
2020-10-30 12:53:14 asr2/radeonvii 100759339 P1 B1=1000000, B2=30000000; 1442134 bits; starting at 1001
2020-10-30 12:53:24 asr2/radeonvii 100759339 P1    10000   0.69%; 1161 us/it; ETA 0d 00:28; 849bb19b9a9f4ce3
2020-10-30 12:53:36 asr2/radeonvii 100759339 P1    20000   1.39%; 1158 us/it; ETA 0d 00:27; 556f71d2a8cf201c

Last fiddled with by kriesel on 2020-10-30 at 18:42

2020-11-01, 06:42   #154
preda
"Mihai Preda"
Apr 2015
2⁴·83 Posts
v7.2

Please upgrade to v7.2 which fixes a proof generation bug.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.