mersenneforum.org  

Old 2020-10-23, 23:11   #122
moebius
 
Jul 2009
Germany

461 Posts

Quote:
Originally Posted by preda View Post
..
The new version gpuowl-win 7.1-1 has a very big slowdown (GPU Fans at 100%) with Vega 10.

Attempt 1
Code:
2020-10-24 00:41:45 gpuowl v7.1-1-g0f73d04
2020-10-24 00:41:45 Note: not found 'config.txt'
2020-10-24 00:41:45 config: -prp 77936867
2020-10-24 00:41:45 device 0, unique id ''
2020-10-24 00:41:46 gfx900-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-10-24 00:41:46 gfx900-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-24 00:41:46 gfx900-0 77936867 ASM compilation failed, retrying compilation using NO_ASM
2020-10-24 00:41:48 gfx900-0 77936867 OpenCL compilation in 2.42 s
2020-10-24 00:41:48 gfx900-0 77936867 maxAlloc: 0.0 GB
2020-10-24 00:41:48 gfx900-0 77936867 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-10-24 00:41:48 gfx900-0 77936867 P1(0) 0 bits
2020-10-24 00:41:49 gfx900-0 77936867 OK       800 loaded: blockSize 400, 1579c241dc63eca6
2020-10-24 00:41:49 gfx900-0 77936867 validating proof residues for power 8
2020-10-24 00:41:49 gfx900-0 77936867 Proof using power 8
2020-10-24 00:41:51 gfx900-0 77936867 OK      1600   0.00% 0f62a1fcc1c78fe9 1225 us/it + check 0.54s + save 0.10s; ETA 1d 02:32
2020-10-24 00:42:01 gfx900-0 77936867        10000   0.01% fc4f135f7cf4ad29 1227 us/it
2020-10-24 00:42:16 gfx900-0 77936867        20000   0.03% 3cd1bd9d5e09cbc5 1499 us/it
Attempt 2 (GPU Fans at 80%)
Code:
gpuowl-v7.1-1-g0f73d04>gpuowl-win -prp 77936867
2020-10-24 00:58:14 gpuowl v7.1-1-g0f73d04
2020-10-24 00:58:14 Note: not found 'config.txt'
2020-10-24 00:58:14 config: -prp 77936867
2020-10-24 00:58:14 device 0, unique id ''
2020-10-24 00:58:14 gfx900-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-10-24 00:58:14 gfx900-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-24 00:58:14 gfx900-0 77936867 ASM compilation failed, retrying compilation using NO_ASM
2020-10-24 00:58:16 gfx900-0 77936867 OpenCL compilation in 2.41 s
2020-10-24 00:58:16 gfx900-0 77936867 maxAlloc: 0.0 GB
2020-10-24 00:58:16 gfx900-0 77936867 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-10-24 00:58:16 gfx900-0 77936867 P1(0) 0 bits
2020-10-24 00:58:17 gfx900-0 77936867 OK      1600 loaded: blockSize 400, 0f62a1fcc1c78fe9
2020-10-24 00:58:17 gfx900-0 77936867 validating proof residues for power 8
2020-10-24 00:58:17 gfx900-0 77936867 Proof using power 8
2020-10-24 00:58:19 gfx900-0 77936867 OK      2400   0.00% 37da222eb3acf668 1222 us/it + check 0.53s + save 0.10s; ETA 1d 02:27
2020-10-24 00:58:28 gfx900-0 77936867        10000   0.01% fc4f135f7cf4ad29 1224 us/it
2020-10-24 00:58:40 gfx900-0 77936867        20000   0.03% 3cd1bd9d5e09cbc5 1225 us/it
2020-10-24 00:58:53 gfx900-0 77936867        30000   0.04% c4e0ff35e3290d98 1229 us/it
2020-10-24 00:59:05 gfx900-0 77936867        40000   0.05% dffe1b1b0d748128 1232 us/it
2020-10-24 00:59:17 gfx900-0 77936867        50000   0.06% 52e286945371ed29 1235 us/it
2020-10-24 00:59:30 gfx900-0 77936867        60000   0.08% 0945da4dc08bdd95 1237 us/it
2020-10-24 00:59:42 gfx900-0 77936867        70000   0.09% 7131fa4eb77f4bb2 1239 us/it
2020-10-24 00:59:54 gfx900-0 77936867        80000   0.10% 8d76071d27ee4221 1240 us/it
2020-10-24 01:00:07 gfx900-0 77936867        90000   0.12% 0bacff453b2f470e 1241 us/it
2020-10-24 01:00:19 gfx900-0 77936867       100000   0.13% 6d7296b9e2830f50 1242 us/it
2020-10-24 01:00:32 gfx900-0 77936867       110000   0.14% 8cbfd4435622bda7 1242 us/it
2020-10-24 01:00:44 gfx900-0 77936867       120000   0.15% 79ae5dad855057ad 1243 us/it
2020-10-24 01:00:57 gfx900-0 77936867       130000   0.17% 50c97bcbf876231f 1244 us/it
2020-10-24 01:01:09 gfx900-0 77936867       140000   0.18% e1db15f897271496 1244 us/it
2020-10-24 01:01:21 gfx900-0 77936867       150000   0.19% 127631386c6a9b17 1245 us/it
2020-10-24 01:01:34 gfx900-0 77936867       160000   0.21% 25b7b6206fc6f085 1247 us/it
2020-10-24 01:01:46 gfx900-0 77936867       170000   0.22% 416816b0d9f4bba8 1245 us/it
2020-10-24 01:01:59 gfx900-0 77936867       180000   0.23% 6bee5d054f770861 1246 us/it
2020-10-24 01:02:11 gfx900-0 77936867       190000   0.24% f37f068f014b18a0 1252 us/it
2020-10-24 01:02:24 gfx900-0 77936867 OK    200000   0.26% f0b04b45b0855bd2 1250 us/it + check 0.54s + save 0.10s; ETA 1d 03:00
2020-10-24 01:02:37 gfx900-0 77936867       210000   0.27% 43eb2fc2424d8aac 1251 us/it
2020-10-24 01:02:49 gfx900-0 77936867       220000   0.28% a1081c6dc6a7689f 1248 us/it
2020-10-24 01:03:02 gfx900-0 77936867       230000   0.30% 2387818d3d3d0d01 1249 us/it
2020-10-24 01:03:14 gfx900-0 77936867       240000   0.31% a9deae45055e5216 1252 us/it
2020-10-24 01:03:27 gfx900-0 77936867       250000   0.32% 89fcab15218f7cac 1252 us/it
2020-10-24 01:03:39 gfx900-0 77936867       260000   0.33% 55da428da4cf928a 1254 us/it
2020-10-24 01:03:52 gfx900-0 77936867       270000   0.35% dc349756c5f05abf 1253 us/it
2020-10-24 01:04:05 gfx900-0 77936867       280000   0.36% 3564af24488443f4 1251 us/it
The old version gpuowl-win 6.11-364 (GPU Fans at 80%) runs more stably.
Code:
2020-09-28 11:54:26 gfx900 RX Vega AMD OpenCL compilation in 2.52 s
2020-09-28 11:54:27 gfx900 RX Vega AMD 77936867 OK        0 loaded: blockSize 400, 0000000000000003
2020-09-28 11:54:28 gfx900 RX Vega AMD 77936867 OK      800   0.00%; 1206 us/it; ETA 1d 02:07; 1579c241dc63eca6 (check 0.53s)
2020-09-28 11:58:31 gfx900 RX Vega AMD 77936867 OK   200000   0.26%; 1216 us/it; ETA 1d 02:16; f0b04b45b0855bd2 (check 0.54s)
2020-09-28 12:02:35 gfx900 RX Vega AMD 77936867 OK   400000   0.51%; 1219 us/it; ETA 1d 02:15; c03f94396a5aa29e (check 0.54s)
2020-09-28 12:06:40 gfx900 RX Vega AMD 77936867 OK   600000   0.77%; 1219 us/it; ETA 1d 02:11; b9decd65ca71b629 (check 0.54s)
What might that be? The GPU and RAM temperatures are approx. 80 °C for all 3 measurements, which is normal for the card and cannot be the reason. It is very sad.

Last fiddled with by moebius on 2020-10-23 at 23:14
Old 2020-10-23, 23:19   #123
preda
 
"Mihai Preda"
Apr 2015

2461₈ Posts

Quote:
Originally Posted by moebius View Post
The new version gpuowl-win 7.1-1 has a very big slowdown (GPU Fans at 100%) with Vega 10.
Would you quantify the "very big slowdown" as a percentage?

You should include the initial part of the log for the v6.x run, which includes the OpenCL defines, which may be the explanation for the difference you see.

Also, I would recommend trying a run with P1 (just start a new exponent with P1) to check whether that works at all.

Last fiddled with by preda on 2020-10-23 at 23:23
Old 2020-10-23, 23:39   #124
moebius
 
Jul 2009
Germany

1CD₁₆ Posts

Quote:
Originally Posted by preda View Post
Would you quantify the "very big slowdown" as a percentage?
you should include the initial part of the log for the v6.x run, which includes the OpenCL defines, which may be the explanation for the difference you see.
I mainly mean this mysterious drop from 1227 to 1499 us/it, which is 18%; otherwise it has only been running with 4% less performance since the improvement.
The log for the 6.11 run is no longer available to me.
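For reference, the two common ways of expressing such a slowdown can be worked out as below. This is only a sketch: the function name `slowdown` is made up for illustration, and the inputs are the us/it figures from the logs above.

```python
def slowdown(old_us_per_it, new_us_per_it):
    """Return (extra time per iteration in %, lost throughput in %)."""
    extra_time = (new_us_per_it - old_us_per_it) / old_us_per_it * 100
    lost_throughput = (1 - old_us_per_it / new_us_per_it) * 100
    return extra_time, lost_throughput


# The one-off spike in attempt 1: 1227 -> 1499 us/it
extra, lost = slowdown(1227, 1499)
print(f"{extra:.1f}% more time per iteration, {lost:.1f}% less throughput")
# prints "22.2% more time per iteration, 18.1% less throughput"
```

So the 18% figure corresponds to lost throughput; expressed as time per iteration, the same spike is about 22%.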
Old 2020-10-23, 23:43   #125
preda
 
"Mihai Preda"
Apr 2015

3·443 Posts

Quote:
Originally Posted by moebius View Post
I mainly mean this mysterious drop from 1227 to 1499 us/it, which is 18%; otherwise it has only been running with 4% less performance since the improvement.
The log for the 6.11 run is no longer available to me.
I see 1250 in your logs with the new version. I only see one instance of 1499, which is not meaningful by itself (some variation can happen; the GPU hiccuped). Keep measuring some more and see how often you get that, what you see on average, etc.
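One way to "keep measuring" is to pull the us/it values out of the log and summarize them. The sketch below is not part of gpuowl; the helper names and the 10% spike threshold are arbitrary assumptions, and the regular expression only relies on the "NNNN us/it" text visible in the logs above.

```python
import re
import statistics

# Matches the per-iteration time in lines such as "... fc4f135f7cf4ad29 1227 us/it"
US_PER_IT = re.compile(r"(\d+) us/it")


def iteration_times(log_lines):
    """Collect every us/it value found in the given log lines."""
    times = []
    for line in log_lines:
        match = US_PER_IT.search(line)
        if match:
            times.append(int(match.group(1)))
    return times


def summarize(times, spike_factor=1.1):
    """Mean, median, and the samples more than spike_factor times the median."""
    median = statistics.median(times)
    return {
        "samples": len(times),
        "mean": statistics.mean(times),
        "median": median,
        "spikes": [t for t in times if t > spike_factor * median],
    }


attempt_1 = [
    "2020-10-24 00:41:51 gfx900-0 77936867 OK      1600   0.00% 0f62a1fcc1c78fe9 1225 us/it + check 0.54s + save 0.10s; ETA 1d 02:32",
    "2020-10-24 00:42:01 gfx900-0 77936867        10000   0.01% fc4f135f7cf4ad29 1227 us/it",
    "2020-10-24 00:42:16 gfx900-0 77936867        20000   0.03% 3cd1bd9d5e09cbc5 1499 us/it",
]
print(summarize(iteration_times(attempt_1)))
```

Run over a longer log, the length of the `spikes` list relative to `samples` shows how often such hiccups occur.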
Old 2020-10-24, 00:03   #126
M344587487
 
"Composite as Heck"
Oct 2017

3·233 Posts

Quote:
Originally Posted by Viliam Furik View Post
...
I am pretty sharp and I know my way around Python, but I haven't learned the C yet. (I know it's a different kind of language, but still, it is a language for a computer, so the main structure - how does it do its thing, not how is it called, or how is it "translated" to the 1s and 0s - is always almost the same.) It is not exactly clear to me how the FFT process works, but I've got a rough idea. Same for the random shift. I must admit, FP numbers are a bit hazy for me, but as I said, I can learn quickly.
...
C is trivial, and OpenCL is quite different from normal programming but doable. The hard parts are the theory/algorithms, and translating an algorithm into efficient code is an art in itself. I mean no offense, which is why I'm using myself as an example: it would take literally years of effort for me to get proficient enough to confidently maintain something like gpuowl.

It sounds like you have a lot of basic programming to learn let alone OpenCL, but if and when you want to try OpenCL mfakto is a good example to play around with IMO. It's simple, concise, and the algorithms aren't too crazy.
Old 2020-10-24, 03:27   #127
preda
 
"Mihai Preda"
Apr 2015

3×443 Posts
Looking for P-1 missed factors

I'm very much looking for bugs in P-1, manifested in missed factors -- either in stage1 or stage2. Would be grateful to the person that can trigger such bugs.

How: choose some known factors, identify the bounds needed to detect them, run P-1 and if the factor isn't found: Bingo! a bug.

PS: I did the above myself too, I'm not outsourcing 100% of testing :). But maybe somebody is more inspired or simply luckier in hitting the bugs.

Please report back. Bugs appreciated :)
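The "identify the bounds needed" step can be sketched as follows. This is textbook P-1 reasoning rather than gpuowl's own code: it assumes stage 1 covers every prime power up to B1, stage 2 covers exactly one additional prime in (B1, B2], and the 2p part of q-1 comes for free, as is usual for Mersenne numbers. The function names are hypothetical, and a real run should use bounds at or above the values returned.

```python
def factorize(n):
    """Trial-division factorization: returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def minimal_pm1_bounds(p, q):
    """Smallest (B1, B2) with which textbook P-1 finds the factor q of 2**p - 1."""
    if pow(2, p, q) != 1:
        raise ValueError("q does not divide 2**p - 1")
    k, remainder = divmod(q - 1, 2 * p)
    if remainder != 0:
        raise ValueError("q is not of the form 2*k*p + 1")

    # Prime powers of k, largest first, each paired with its exponent.
    powers = sorted(((r ** e, e) for r, e in factorize(k).items()), reverse=True)
    if not powers:  # k == 1: the factor falls out with trivial bounds
        return 1, 1

    top_power, top_exponent = powers[0]
    if top_exponent == 1:
        # The largest prime can be left to stage 2; stage 1 must cover the rest.
        b1 = powers[1][0] if len(powers) > 1 else 1
        return b1, top_power
    # The largest prime power has exponent > 1, so stage 2 cannot help.
    return top_power, top_power


# M67 = 193707721 * 761838257287; for the smaller factor k = 1445580 = 2^2 * 3^3 * 5 * 2677
print(minimal_pm1_bounds(67, 193707721))  # prints "(27, 2677)"
```

Running P-1 on the exponent with bounds somewhat above the returned pair should find the factor; if it does not, that is a candidate bug report.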
Old 2020-10-24, 08:51   #128
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4,751 Posts
Error loop

An example of gpuowl getting stuck in an error loop until manual intervention. This was gpuowl-win v7.0-35-gf06bc5b on Radeon VII gpu, Celeron G1840 cpu, Windows 10 x64 Pro, system ram 16 GB. Gpu was switched manually to a different assignment.
This progress loss of most of a day until manual intervention is an example of why I've requested conversion of an assignment to a comment as a way of bypassing but preserving the entry, and progressing to a subsequent worktodo line.
(Jacobi check duration is ~14 minutes each with this large exponent, slow cpu combination.)
Code:
2020-10-21 08:28:38 asr2/radeonvii1 957156667 OK   5600000   0.59% 7236c2983efa4706 14281 us/it; ETA 157d 06:54 | P1(8310000) 46.7%
2020-10-21 08:31:00 asr2/radeonvii1 957156667      5610000   0.59% b6637d62ffbb3b70
2020-10-21 08:33:22 asr2/radeonvii1 957156667      5620000   0.59% 3d2bc926989a4724
2020-10-21 08:35:44 asr2/radeonvii1 957156667      5630000   0.59% f937236f45165651
2020-10-21 08:38:06 asr2/radeonvii1 957156667      5640000   0.59% d559e92ddb700589
2020-10-21 08:40:28 asr2/radeonvii1 957156667      5650000   0.59% 195b235c73ce9ab6
2020-10-21 08:42:50 asr2/radeonvii1 957156667      5660000   0.59% a98f286467d3b204
2020-10-21 08:45:12 asr2/radeonvii1 957156667      5670000   0.59% 01af810688424480
2020-10-21 08:47:34 asr2/radeonvii1 957156667      5680000   0.59% a69fe66e72a901e6
2020-10-21 08:49:56 asr2/radeonvii1 957156667      5690000   0.59% e579cb9f8075a999
2020-10-21 08:52:26 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 14276 us/it; ETA 157d 05:10
2020-10-21 08:52:42 asr2/radeonvii1 957156667 OK   5600000 loaded: blockSize 500, 7236c2983efa4706
2020-10-21 08:52:45 asr2/radeonvii1 957156667 P1   5600000 starting on-load Jacobi check
2020-10-21 08:53:18 asr2/radeonvii1 957156667 OK   5601000   0.59% 0403831ae3cb03c2 24576 us/it; ETA 270d 16:00 1 errors | P1(8310000) 46.7%
2020-10-21 08:55:47 asr2/radeonvii1 957156667      5610000   0.59% b6637d62ffbb3b70
2020-10-21 08:58:31 asr2/radeonvii1 957156667      5620000   0.59% 3d2bc926989a4724
2020-10-21 09:01:17 asr2/radeonvii1 957156667      5630000   0.59% f937236f45165651
2020-10-21 09:04:01 asr2/radeonvii1 957156667      5640000   0.59% d559e92ddb700589
2020-10-21 09:06:45 asr2/radeonvii1 957156667      5650000   0.59% 195b235c73ce9ab6
2020-10-21 09:07:05 asr2/radeonvii1 957156667 P1 Jacobi check OK
2020-10-21 09:09:10 asr2/radeonvii1 957156667      5660000   0.59% a98f286467d3b204
2020-10-21 09:11:32 asr2/radeonvii1 957156667      5670000   0.59% 01af810688424480
2020-10-21 09:13:55 asr2/radeonvii1 957156667      5680000   0.59% a69fe66e72a901e6
2020-10-21 09:16:16 asr2/radeonvii1 957156667      5690000   0.59% e579cb9f8075a999
2020-10-21 09:18:46 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 15433 us/it; ETA 169d 22:52 1 errors
2020-10-21 09:19:02 asr2/radeonvii1 957156667 OK   5601000 loaded: blockSize 500, 0403831ae3cb03c2
2020-10-21 09:19:06 asr2/radeonvii1 957156667 P1   5601000 starting on-load Jacobi check
2020-10-21 09:21:34 asr2/radeonvii1 957156667      5610000   0.59% b6637d62ffbb3b70
2020-10-21 09:24:19 asr2/radeonvii1 957156667      5620000   0.59% 3d2bc926989a4724
2020-10-21 09:27:03 asr2/radeonvii1 957156667      5630000   0.59% f937236f45165651
2020-10-21 09:29:48 asr2/radeonvii1 957156667      5640000   0.59% d559e92ddb700589
2020-10-21 09:32:33 asr2/radeonvii1 957156667      5650000   0.59% 195b235c73ce9ab6
2020-10-21 09:33:24 asr2/radeonvii1 957156667 P1 Jacobi check OK
2020-10-21 09:35:01 asr2/radeonvii1 957156667      5660000   0.59% a98f286467d3b204
2020-10-21 09:37:23 asr2/radeonvii1 957156667      5670000   0.59% 01af810688424480
2020-10-21 09:39:46 asr2/radeonvii1 957156667      5680000   0.59% a69fe66e72a901e6
2020-10-21 09:42:08 asr2/radeonvii1 957156667      5690000   0.59% e579cb9f8075a999
2020-10-21 09:44:37 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 15318 us/it; ETA 168d 16:30 2 errors
...(ad nauseam)...
Code:
2020-10-22 01:42:26 asr2/radeonvii1 957156667 OK   5601000 loaded: blockSize 500, 0403831ae3cb03c2
2020-10-22 01:42:29 asr2/radeonvii1 957156667 P1   5601000 starting on-load Jacobi check
2020-10-22 01:44:58 asr2/radeonvii1 957156667      5610000   0.59% b6637d62ffbb3b70
2020-10-22 01:47:43 asr2/radeonvii1 957156667      5620000   0.59% 3d2bc926989a4724
2020-10-22 01:50:29 asr2/radeonvii1 957156667      5630000   0.59% f937236f45165651
2020-10-22 01:53:14 asr2/radeonvii1 957156667      5640000   0.59% d559e92ddb700589
2020-10-22 01:56:00 asr2/radeonvii1 957156667      5650000   0.59% 195b235c73ce9ab6
2020-10-22 01:56:54 asr2/radeonvii1 957156667 P1 Jacobi check OK
2020-10-22 01:58:30 asr2/radeonvii1 957156667      5660000   0.59% a98f286467d3b204
2020-10-22 02:00:52 asr2/radeonvii1 957156667      5670000   0.59% 01af810688424480
2020-10-22 02:03:14 asr2/radeonvii1 957156667      5680000   0.59% a69fe66e72a901e6
2020-10-22 02:05:36 asr2/radeonvii1 957156667      5690000   0.59% e579cb9f8075a999
2020-10-22 02:08:06 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 15366 us/it; ETA 169d 05:09 40 errors
2020-10-22 02:08:22 asr2/radeonvii1 957156667 OK   5601000 loaded: blockSize 500, 0403831ae3cb03c2
2020-10-22 02:08:25 asr2/radeonvii1 957156667 P1   5601000 starting on-load Jacobi check
2020-10-22 02:10:54 asr2/radeonvii1 957156667      5610000   0.59% b6637d62ffbb3b70
2020-10-22 02:13:39 asr2/radeonvii1 957156667      5620000   0.59% 3d2bc926989a4724
2020-10-22 02:16:25 asr2/radeonvii1 957156667      5630000   0.59% f937236f45165651
2020-10-22 02:19:10 asr2/radeonvii1 957156667      5640000   0.59% d559e92ddb700589
2020-10-22 02:19:18 asr2/radeonvii1 957156667 Stopping, please wait..
2020-10-22 02:19:36 asr2/radeonvii1 957156667 OK   5640500   0.59% a5e93f63e0eb2bcf 16339 us/it; ETA 179d 22:35 41 errors | P1(8310000) 47.0%
2020-10-22 02:22:56 asr2/radeonvii1 957156667 P1(8310000) releasing 49 buffers
2020-10-22 02:22:57 asr2/radeonvii1 957156667 Released memory lock 'memlock-1'
2020-10-22 02:22:57 asr2/radeonvii1 Exiting because "stop requested"
2020-10-22 02:22:57 asr2/radeonvii1 Bye
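A loop like the one in the log above can be spotted by counting failed checks ("EE" lines) that recur at the same iteration with the same residue. The following is a rough sketch of an external log watcher, not anything gpuowl provides; the function name and the threshold of 3 are assumptions, and the field positions assume the v7 log layout shown above with a device name that contains no spaces.

```python
from collections import Counter


def repeated_errors(log_lines, threshold=3):
    """Return {(exponent, iteration, residue): count} for EE lines seen at least threshold times."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        # Expected fields: date, time, device, exponent, status, iteration, percent, residue, ...
        if len(fields) >= 8 and fields[4] == "EE":
            counts[(fields[3], fields[5], fields[7])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


log = [
    "2020-10-21 08:52:26 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 14276 us/it; ETA 157d 05:10",
    "2020-10-21 09:18:46 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 15433 us/it; ETA 169d 22:52 1 errors",
    "2020-10-21 09:44:37 asr2/radeonvii1 957156667 EE   5700000   0.60% 13177fff3371962d 15318 us/it; ETA 168d 16:30 2 errors",
]
print(repeated_errors(log))
```

An identical residue on every failure, as here, suggests a reproducible problem rather than a random hardware glitch.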

Last fiddled with by kriesel on 2020-10-24 at 08:58
Old 2020-10-24, 08:56   #129
preda
 
"Mihai Preda"
Apr 2015

531₁₆ Posts

Quote:
Originally Posted by kriesel View Post
An example of gpuowl getting stuck in an error loop until manual intervention. This was gpuowl-win v7.0-35-gf06bc5b on Radeon VII gpu, Celeron G1840 cpu, Windows 10 x64 Pro, system ram 16 GB. Gpu was switched manually to a different assignment.
Yes, pretty serious. I'll try to address this (I didn't hit this myself yet, thus didn't realize the regression).

You should be able to continue the exponent, just bump up the FFT-size or some other precision-related defines.

BTW, you should run that exponent with -use STATS and see what that reports, whether the observed round-off errors are excessive/dangerous (I expect that to be the case).

Also, you may run somewhat lower exponents -- otherwise you're in the not-tested/not-tuned area.
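To see why a larger FFT helps, it may be useful to look at bits per word (bpw), the figure gpuowl prints next to the FFT size: the exponent divided by the number of FFT words. Fewer bits per word leave more headroom against round-off error. The sketch below only does that division; the function name is made up, and the candidate sizes for the large exponent are illustrative values, not necessarily sizes gpuowl offers.

```python
def bits_per_word(exponent, fft_words):
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_words


M = 1 << 20  # "4M" in the log means 4 * 2**20 words

# Cross-check against the log line "FFT: 4M 1K:8:256 (18.58 bpw)"
print(f"{bits_per_word(77936867, 4 * M):.2f}")  # prints "18.58"

# Illustrative only: how bpw would fall for the looping exponent as the FFT grows
for size_m in (48, 52, 56):
    print(f"{size_m}M -> {bits_per_word(957156667, size_m * M):.2f} bpw")
```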

Last fiddled with by preda on 2020-10-24 at 09:00
Old 2020-10-24, 09:42   #130
preda
 
"Mihai Preda"
Apr 2015

3×443 Posts

Quote:
Originally Posted by kriesel View Post
An example of gpuowl getting stuck in an error loop
Were you using -log 100000 by any chance in your config?
Old 2020-10-24, 10:08   #131
preda
 
"Mihai Preda"
Apr 2015

3·443 Posts

Quote:
Originally Posted by kriesel View Post
An example of gpuowl getting stuck in an error loop
Ken, I merged an attempted fix; maybe you could try it on the looping exponent and check the behavior.
Old 2020-10-24, 11:10   #132
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4,751 Posts

Quote:
Originally Posted by preda View Post
Ken, I merged an attempted fix; maybe you could try it on the looping exponent and check the behavior.
Thanks for the prompt action.
From scanning the commit, I see you've changed it from looping on repeated error to stopping the program on repeated error. Either of those results in up to a lost GPU day if checked daily. (Saves a little electricity, loses a lot of time.) That does not strike me as a solution to the lost GPU time problem.
Please implement the following under program control. (If the possibility of it rolling through all assignments troubles you, consider making stop or continue on repeated error a user selectable config file option.)
Code:
when repeated error detected:
   stop processing of current worktodo line A
   log error
   modify first active worktodo line A to a comment by prepending # to the troublemaker line A
   reread modified worktodo file for first available worktodo line B
   crunch away on worktodo line B (until done, repeated error detected, or manual stop)
Example worktodo file before repeated error trouble hits
Code:
B1=8310000,B2=249300000;PRP=0,1,2,957156667,-1,86,2 
B1=7520000,B2=225600000;PRP=0,1,2,843112609,-1,85,2
PRP=AID,1,2,182585281,-1,79,2
Example after repeated error detected; work occurs on second line after first is deactivated:
Code:
# B1=8310000,B2=249300000;PRP=0,1,2,957156667,-1,86,2
B1=7520000,B2=225600000;PRP=0,1,2,843112609,-1,85,2
PRP=AID,1,2,182585281,-1,79,2
Or even include a bit of explanation:
Code:
# B1=8310000,B2=249300000;PRP=0,1,2,957156667,-1,86,2 repeated error detected, see gpuowl.log for details around 22 Oct 2020 02:36:45
B1=7520000,B2=225600000;PRP=0,1,2,843112609,-1,85,2
PRP=AID,1,2,182585281,-1,79,2
Then productive work can continue on a later assignment that's safe, until the owner sees the situation during a routine daily or weekly check, and can manually intervene, such as by forcing an fft length, uncommenting the problem line, and trying again. Safe production work types can then backstop the more adventurous or QA lines, ensuring productive use ~24/7 for the many hours or multiple days that would otherwise be lost. (Use thoughtfully; run times of large exponents can exceed the expiration time of a wavefront exponent.)
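The worktodo handling requested above could look roughly like this. It is a sketch of the idea only, written in Python for brevity, whereas gpuowl itself is C++; the function names are hypothetical, and it assumes that lines starting with # are skipped when the worktodo file is read, which is part of the request rather than confirmed behavior.

```python
import os


def is_active(line):
    """A worktodo line counts as active if it is non-blank and not a # comment."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def skip_first_active_line(path, note=""):
    """Comment out the first active line in the file; return the next active line or None."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    active = [i for i, line in enumerate(lines) if is_active(line)]
    if not active:
        return None

    first = active[0]
    suffix = f" {note}" if note else ""
    lines[first] = f"# {lines[first].strip()}{suffix}"

    # Write a temporary file and swap it in, so a crash cannot leave a half-written worktodo.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp_path, path)

    return lines[active[1]] if len(active) > 1 else None
```

Called as `skip_first_active_line("worktodo.txt", "repeated error detected, see gpuowl.log")`, it would turn the first example file above into the annotated version and return the 843112609 line as the next assignment.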
Consider that not everyone is running multiple instances per gpu, especially with memory-hungry work, and things can also go awry with two instances on the same gpu.
FYI, on an LL of 182585281 in gpuowl v6.11-384, I did not find a way to get past a recurrent Jacobi error. Will post that separately later, probably in the "Things that make you go hmm, concerning gpuowl runs" thread.

Last fiddled with by kriesel on 2020-10-24 at 11:54

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.