mersenneforum.org  

Old 2020-10-14, 10:48   #89
UBR47K
 
Aug 2015

40₁₆ Posts
another proof failure

https://www.mersenne.org/report_expo...8980089&full=1

This AMD Fury X card consistently fails to generate PRP proofs (but it seems to produce valid RES64).

Self verification fails:
Code:
$ ./gpuowl -verify proof/108980089-9.proof                                                                                                      
2020-10-14 11:21:16 gpuowl v7.0-25-g1cbd87d-dirty                                                                                                                               
2020-10-14 11:21:16 config: -proof 9                                                                                                                                            
2020-10-14 11:21:16 config: -maxAlloc 3584                                                                                                                                      
2020-10-14 11:21:16 config: -verify proof/108980089-9.proof                                                                                                                     
2020-10-14 11:21:16 device 0, unique id ''                                                                                                                                      
2020-10-14 11:21:16 gfx803-0 0 FFT: 6M 1K:12:256 (17.32 bpw)                                                                                                                    
2020-10-14 11:21:17 gfx803-0 0 OpenCL args "-DEXP=108980089u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x1.333492ce02374p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.800112b07bd55p-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-14 11:21:20 gfx803-0 0 /tmp/comgr-8d0411/input/CompileSource:50:9: warning: GpuOwl requires OpenCL 200, found 120                                                       
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)                                                                                                    
        ^                                                                                                                                                                       
1 warning generated.                                                                                                                                                            
                                                                                                                                                                                
2020-10-14 11:21:20 gfx803-0 0 OpenCL compilation in 2.78 s                                                                                                                     
2020-10-14 11:21:20 gfx803-0 0 proof: doing 136 iterations                                                                                                                      
2020-10-14 11:21:29 gfx803-0 0 proof verification: doing 212852 iterations                                                                                                      
2020-10-14 11:22:18 gfx803-0 0 20000 / 212852, 2414 us/it                                                                                                                       
2020-10-14 11:23:06 gfx803-0 0 40000 / 212852, 2414 us/it                                                                                                                       
2020-10-14 11:23:54 gfx803-0 0 60000 / 212852, 2425 us/it                                                                                                                       
2020-10-14 11:24:43 gfx803-0 0 80000 / 212852, 2414 us/it                                                                                                                       
2020-10-14 11:25:31 gfx803-0 0 100000 / 212852, 2414 us/it                                                                                                                      
2020-10-14 11:26:19 gfx803-0 0 120000 / 212852, 2414 us/it                                                                                                                      
2020-10-14 11:27:08 gfx803-0 0 140000 / 212852, 2419 us/it                                                                                                                      
2020-10-14 11:27:56 gfx803-0 0 160000 / 212852, 2414 us/it                                                                                                                      
2020-10-14 11:28:44 gfx803-0 0 180000 / 212852, 2414 us/it                                                                                                                      
2020-10-14 11:29:33 gfx803-0 0 200000 / 212852, 2414 us/it                                                                                                                      
2020-10-14 11:30:04 gfx803-0 0 proof: invalid (364e0402bdbXXXXX expected ebe899f33efXXXXX)                                                                                      
2020-10-14 11:30:04 gfx803-0 0 proof 'proof/108980089-9.proof' failed                                                                                                           
2020-10-14 11:30:04 gfx803-0 Bye

Last fiddled with by UBR47K on 2020-10-14 at 10:49
Old 2020-10-14, 11:28   #90
preda
 
"Mihai Preda"
Apr 2015

10100110000₂ Posts

Quote:
Originally Posted by UBR47K View Post
https://www.mersenne.org/report_expo...8980089&full=1

This AMD Fury X card consistently fails to generate PRP proofs (but it seems to produce valid RES64).

Self verification fails: [..]
Thanks. I don't yet know the reason for this, and I can't seem to reproduce it.
Does that GPU report any errors (EE) during the PRP? That is, is the PRP 100% reliable, or are there sometimes errors and retries?

One approach I'm thinking of to tackle this is:
- for power >= 10, do automatic local proof verification before upload (because at power 10 it becomes cheap enough to not matter) -- this would make sure that the server no longer sees invalid proofs in this situation.
- if you have enough free disk space, enable power=10.
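
To get a feel for why local verification becomes cheap at higher power, here is a rough Python estimate. It assumes the verification cost is about ceil(exponent / 2^power) iterations; that formula is an assumption, though it does reproduce the 212852 iterations logged for power 9 in the first post. The 2414 us/it figure is taken from that same log, and the function name is made up for illustration.

```python
def verify_iterations(exponent: int, power: int) -> int:
    """Assumed cost model: ceil(exponent / 2**power) iterations per verification."""
    return -(-exponent // 2 ** power)  # integer ceiling division

EXPONENT = 108980089   # exponent from the failing proof in the log
US_PER_IT = 2414       # per-iteration time reported in the same log

for power in (9, 10):
    its = verify_iterations(EXPONENT, power)
    seconds = its * US_PER_IT / 1e6
    share = 100 * its / EXPONENT
    print(f"power {power}: {its} iterations, ~{seconds:.0f} s, {share:.2f}% of the PRP")
```

Under this model, going from power 9 to power 10 halves the verification work, from roughly 0.2% to roughly 0.1% of the full PRP.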

How often do you get such invalid proofs -- 100%? sometimes?
Old 2020-10-14, 11:36   #91
preda
 
"Mihai Preda"
Apr 2015

2⁴×83 Posts
Worktype for PRP + P-1 in primenet.py : PRP_P1

The script gpuowl/tools/primenet.py has a new work-type for "PRP that needs P-1", which is what you want for the merged PRP + P-1; it's called "PRP_P1":

primenet.py -w PRP_P1 etc.

If you run it with only -w PRP (the old way) you may get assignments that already had P-1 done. That's fine as long as you don't trigger another P-1 on them, which would be mostly a waste (because the "additional" P-1 has a low probability of finding a factor).

PS: I don't think this worktype is available on the manual assignment web page yet.

Last fiddled with by preda on 2020-10-14 at 11:37
Old 2020-10-14, 11:46   #92
UBR47K
 
Aug 2015

2⁶ Posts

Quote:
Originally Posted by preda View Post
Thanks. I don't yet know the reason for this, and I can't seem to reproduce it.
Does that GPU report any errors (EE) during the PRP? That is, is the PRP 100% reliable, or are there sometimes errors and retries?
No errors or retries during PRP.
My Radeon VII seems to generate correct PRP proofs, only this particular card is strange.

Quote:
Originally Posted by preda View Post
One approach I'm thinking of to tackle this is:
- for power >= 10, do automatic local proof verification before upload (because at power 10 it becomes cheap enough to not matter) -- this would make sure that the server no longer sees invalid proofs in this situation.
- if you have enough free disk space, enable power=10.

How often do you get such invalid proofs -- 100%? sometimes?
All proofs made with this Fury X are invalid.
Old 2020-10-16, 04:48   #93
preda
 
"Mihai Preda"
Apr 2015

2⁴·83 Posts

Quote:
Originally Posted by kriesel View Post
The % complete is inconsistent between a stop and a resume.
The P2 iterations apparently don't all get saved at a stop and resume.
Some of these should be fixed now:
- consistent % progress across restart.

Ctrl-C during P2 has this behavior:
- first Ctrl-C triggers GCD, which is followed by a save and exit 30s later
- second Ctrl-C abruptly exits, and you lose the progress since the last GCD
(As you can see, a P2 save only takes place after a GCD. The savefile basically records "GCD was done up to this point".)
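
The two-stage Ctrl-C behavior can be sketched in Python as follows. This is only an illustration of the pattern, not gpuowl's code: the class and function names are hypothetical, the block processing is a placeholder, and the 30 s delay between GCD and exit is not modeled.

```python
import signal

class StopController:
    """First interrupt requests a graceful stop; a second one aborts at once."""

    def __init__(self):
        self.stop_requested = False

    def on_sigint(self, signum=None, frame=None):
        if self.stop_requested:
            # Second Ctrl-C: leave immediately, nothing since the last GCD is saved.
            raise SystemExit("second Ctrl-C: exiting without saving")
        self.stop_requested = True

def install(controller):
    """Route Ctrl-C (SIGINT) to the controller; call this from the main thread."""
    signal.signal(signal.SIGINT, controller.on_sigint)

def run_p2(blocks, controller, gcd_every=4):
    """Toy P2 loop; returns the last block covered by a GCD and save."""
    saved_upto = 0
    for b in range(1, blocks + 1):
        # ... process block b here (placeholder) ...
        if controller.stop_requested or b % gcd_every == 0 or b == blocks:
            saved_upto = b  # GCD, then save "GCD was done up to this point"
            if controller.stop_requested:
                break
    return saved_upto
```

Because progress is only recorded at a GCD, the graceful path forces one GCD before saving, while the abrupt path falls back to whatever the last GCD recorded.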
Old 2020-10-16, 04:52   #94
preda
 
"Mihai Preda"
Apr 2015

2⁴×83 Posts
P2 changes

A new prime-pairing algorithm for P2 has been implemented; P2 is now (significantly) faster.

This required a bump of the P2 savefile version, so don't switch versions in the middle of P2; wait until P2 finishes.

Also because of the changes it's no longer possible to change the B2 bound during P2.

PS: the default ratio B2/B1 is now 30, because of the lower relative cost of P2. This may bite you when upgrading in the middle of a PRP test (after P2 is complete), as you'd see the B2 bound change because of this ratio. In that case, simply force the old B2 manually until the end of the PRP.

Last fiddled with by preda on 2020-10-16 at 04:59
Old 2020-10-16, 05:58   #95
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2·7·11·29 Posts

Quote:
Originally Posted by preda View Post
A new prime-pairing algorithm for P2 has been implemented; P2 is now (significantly) faster.
Probably a dumb question but is there any way to get this enhancement in the previous stand-alone P-1 version?

Thanks
Old 2020-10-16, 06:56   #96
preda
 
"Mihai Preda"
Apr 2015

2⁴×83 Posts

Quote:
Originally Posted by petrw1 View Post
Probably a dumb question but is there any way to get this enhancement in the previous stand-alone P-1 version?

Thanks
Should be possible to port it over, yes. Simply take the output of stage1 and plug it into the new stage2. Of course things are a bit hairy, so that would still require some significant work I assume.

It's not on my personal to-do list though. I also don't see the reason for the old-style P-1, so I'm not particularly motivated in that direction.

So, unless somebody steps up, I guess unfortunately the answer is "theoretically yes, practically no".

PS: I don't want this to sound like I'm against it, which I'm not. It's just that I already have a ton of things lined up on an imaginary list, and I'm working through them.

Last fiddled with by preda on 2020-10-16 at 07:00
Old 2020-10-16, 06:59   #97
preda
 
"Mihai Preda"
Apr 2015

2460₈ Posts

Quote:
Originally Posted by kriesel View Post
Note, one of the awkward things about/during P2 is there is no ETA (for the more likely NF case, or the less likely F case).
P2 ETA added. It should be quite accurate, and the P2 progress % is very accurate too.
Old 2020-10-16, 12:42   #98
preda
 
"Mihai Preda"
Apr 2015

2⁴×83 Posts
P2 hardening

Let me start with a brief description of how P2 works:

There is a large set of precomputed buffers, as many as the GPU memory allows (a 16GB GPU at the wavefront accommodates a bit over 300 buffers). These buffers are computed once, at the start of P2, and then never change (but they are read regularly).

There is also a set of 3 walking big-step buffers that are updated ("stepped") once per block. (The range of primes to be covered is split into blocks "D" in size, where D=330 usually.)

P2 repeatedly selects one of the precomputed buffers, subtracts it from the big-step buffer, and multiplies the result into an accumulator "Acc".
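
As a rough illustration of this loop, here is a toy P-1 stage 2 in Python over a small composite N rather than a Mersenne number. It is a sketch under stated assumptions, not gpuowl's algorithm: it uses D=30 instead of 330, a single big-step value instead of three walking buffers, no prime pairing, one GCD at the very end, and it requires B1 to be at least the largest prime factor of D. All names are hypothetical.

```python
from math import gcd

def primes_between(lo, hi):
    """Primes q with lo < q <= hi, by trial division (fine for a toy range)."""
    return [q for q in range(max(lo + 1, 2), hi + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]

def p2_toy(H, N, B1, B2, D=30):
    """Toy stage 2: H is the stage-1 result, primes in (B1, B2] are covered."""
    # Precomputed "buffers": H^j for each j coprime to D; read-only afterwards.
    small = {j: pow(H, j, N) for j in range(1, D) if gcd(j, D) == 1}
    step = pow(H, D, N)
    big, k = step, 1                 # big-step value H^(k*D), stepped once per block
    acc = 1
    for q in primes_between(B1, B2):
        while k * D <= q:            # advance to the block with (k-1)*D < q < k*D
            big = big * step % N
            k += 1
        j = k * D - q                # q = k*D - j, so big - small[j] = H^j * (H^q - 1)
        acc = acc * (big - small[j]) % N
    return gcd(acc, N)

# Toy target: 21211 - 1 = 2*3*5*7*101, so stage 1 with B1=10 leaves only q=101.
N = 21211 * 10007
H = pow(3, 2520, N)                  # stage-1 exponent lcm(1..10) = 2520
print(p2_toy(H, N, B1=10, B2=150))   # 21211
```

Each term is a subtraction of a precomputed value from the big-step value, multiplied into the accumulator, which mirrors the description above; the factor only shows up when the GCD is finally taken.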

Errors and checks.

Errors can happen in these places:
a) error in the initial computation of the precomputed buffers
b) precomputed buffers get mutated in GPU memory (bit-flip)
c) error in the big-step buffer initial computation or increment
d) error in the "Acc" multiplication

Checks:

d) I don't have a check on the Acc MUL; but the accumulator has a self-healing behavior, where an error in the accumulator multiply affects only the primes since the last GCD up to the error location, but not afterwards. So the effect of an error in the Acc MUL is self-contained.

a) For point "a", the initial buffers computation is done twice and the results compared. This does not protect from programmer error, so one final value of the precomputed buffers is also computed in a different way -- this gives a degree of confidence that the algorithm is good if the values coincide (which would be an unlikely coincidence otherwise).

b) Once we are confident in the values of the precomputed buffers, they are snapshotted with a very simple checksum (just a sum64 of each buffer). Before each GCD, we recompute and compare the checksums of the buffers to verify that they didn't mutate under our feet.

c) The big-step value can also be independently computed at any point with a simple exponentiation. We do this before GCD, and compare the values.

So the checks for b) and c) are run before each P2 GCD -- they are very fast, under 1s in total.
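
The two pre-GCD checks can be sketched in Python like this. It is a toy model under assumptions: buffers are short lists of integers rather than GPU memory, "sum64" is taken to mean the sum of the words wrapped to 64 bits, the big-step value is modeled as H^(k*D) mod N, and every name below is hypothetical.

```python
MASK64 = (1 << 64) - 1

def sum64(words):
    """Checksum of one buffer: the sum of its words, wrapped to 64 bits."""
    return sum(words) & MASK64

def mutated_buffers(buffers, snapshot):
    """Check b): indices of buffers whose checksum no longer matches the snapshot."""
    return [i for i, buf in enumerate(buffers) if sum64(buf) != snapshot[i]]

def big_step_ok(big, H, N, D, k):
    """Check c): recompute H^(k*D) mod N from scratch and compare."""
    return big == pow(H, k * D, N)

def pre_gcd_checks(buffers, snapshot, big, H, N, D, k):
    """True when both checks pass and the GCD can go ahead."""
    return not mutated_buffers(buffers, snapshot) and big_step_ok(big, H, N, D, k)

# Toy data standing in for the precomputed buffers.
N, H, D = 21211 * 10007, 3, 30
buffers = [[pow(H, j + w, N) for w in range(4)] for j in (1, 7, 11, 13)]
snapshot = [sum64(b) for b in buffers]   # taken once the buffers are trusted

big, k = pow(H, D, N), 1
for _ in range(9):                       # walk the big step through 9 more blocks
    big = big * pow(H, D, N) % N
    k += 1
print(pre_gcd_checks(buffers, snapshot, big, H, N, D, k))   # True

buffers[2][1] ^= 1 << 17                 # simulate a bit-flip in GPU memory
print(mutated_buffers(buffers, snapshot))                   # [2]
```

A plain sum is a weak checksum in general, but a single flipped bit always changes it, which is the failure mode named in point b).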

Barring programmer error, this set of P2 error checks makes it possible to run large (huge) P2 with a lower risk of "doing useless work" because an early hardware error ruined everything after it.
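
The self-healing behavior from point d) can be illustrated with a toy accumulator in Python. The reading used here is an assumption, not something the post spells out: a bad multiply replaces the running product with garbage, so terms multiplied in before the error are lost, while terms multiplied in afterwards still carry their factors to the GCD. The function name and the tiny modulus are made up for the demonstration.

```python
from math import gcd

def accumulate(terms, N, corrupt_at=None, garbage=2):
    """Multiply terms into an accumulator mod N, optionally corrupting one step."""
    acc = 1
    for i, t in enumerate(terms):
        acc = acc * t % N
        if i == corrupt_at:
            acc = garbage % N   # simulated bad multiply: product so far is lost
    return gcd(acc, N)

N = 101 * 103
terms = [5, 202, 7, 9]          # only the term 202 shares a factor (101) with N

print(accumulate(terms, N))                  # 101: no error, factor found
print(accumulate(terms, N, corrupt_at=0))    # 101: error before the hit, still found
print(accumulate(terms, N, corrupt_at=2))    # 1: error after the hit, factor lost
```

Since the accumulator is consumed at each GCD, the damage from such an error is bounded by the interval between two GCDs.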

Last fiddled with by preda on 2020-10-16 at 12:43
Old 2020-10-18, 08:35   #99
frmky
 
Jul 2003
So Cal

2²×3³×19 Posts

Thought I'd try this on an NVIDIA Tesla K20. I compiled using clang++-9 on Ubuntu 18.04 LTS since g++-8 failed.

Using ./gpuowl -prp 6972649 -b1 1500000 -b2 10000000 -maxAlloc 4.5G
I got
Code:
2020-10-18 01:23:20 Tesla K20c-0 6972649      2150000  30.83% 3fa14f3f8a7af59c
2020-10-18 01:23:24 Tesla K20c-0 6972649      2160000  30.98% 352eddd00f6ce8b0
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1(1.5M) releasing 3050 buffers
2020-10-18 01:23:29 Tesla K20c-0 6972649 Released memory lock 'memlock-0'
2020-10-18 01:23:29 Tesla K20c-0 6972649 OK   2164500  31.04% b7b43bcc9edb8fcf  385 us/it; ETA 00:31
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1   2164500 starting Jacobi check
2020-10-18 01:23:31 Tesla K20c-0 6972649 P1 Jacobi check OK
2020-10-18 01:23:31 Tesla K20c-0 6972649 OK   2168500  31.10% a2dae35546023a65  396 us/it; ETA 00:32
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) D=330, nBuf=1522
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) Generating P2 plan, please wait..
gpuowl: Pm1Plan.cpp:212: void Pm1Plan::scan(const vector<bool> &, u32, vector<Pm1Plan::BitBlock> &, Fun) [Fun = (lambda at Pm1Plan.cpp:269:40)]: Assertion `!blockBits[pos]' failed.
Aborted (core dumped)

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.