mersenneforum.org  

Old 2020-09-07, 11:28   #23
preda
 
"Mihai Preda"
Apr 2015

1,327 Posts

Quote:
Originally Posted by kriesel
2020-09-04 05:23:39 asr2/radeonvii0 checksum de9da1a4 (expected 75062f10) in '.\63000061\proof\14519546'
Interesting. It looks as if that file was not written correctly, or was corrupted on disk. The checksum mismatch was discovered when trying to use the file for proof generation. On restart, gpuowl first tries the requested proof power (8), and if that doesn't work, tries every power from highest to lowest to see if any is feasible given the residues that are present. This order is indeed not minimal (e.g. 8 is checked twice), but it is an exceptional case, so it's not worth "fixing" IMO.
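The retry order described above can be sketched as follows. This is a hypothetical illustration, not gpuowl's source: the name `fallback_powers` is made up, and the bounds (highest power 9, lowest power 6) are assumptions read off the restart log posted later in this thread.

```python
def fallback_powers(requested, max_power=9, min_power=6):
    """Yield proof powers in the order the restart appears to try them.

    Assumed bounds: max_power=9 and min_power=6, taken from the log in
    this thread rather than from gpuowl's code.
    """
    # First attempt: the power the user asked for.
    yield requested
    # Then every power from highest to lowest, without skipping the
    # one already tried (hence the duplicate check).
    for power in range(max_power, min_power - 1, -1):
        yield power


order = list(fallback_powers(8))
print(order)
```

For a requested power of 8 this gives 8, 9, 8, 7, 6, which is the sequence of "validating proof residues for power N" lines in that log, with 8 visited twice.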
Old 2020-09-07, 23:26   #24
storm5510
Random Account
 
Aug 2009
U.S.A.

2×839 Posts

@preda:

Quote:
Originally Posted by Prime95
Storm your gpuowl proof was no good. You might want to contact Mihai with the details.
George asked me to contact you with more detail about this. It is for M10496897. I do not have many details to provide. My log file does not go back that far. I keep all my results for such instances:

Quote:
{"status":"C", "exponent":"10496897", "worktype":"PRP-3", "res64":"c25cf3c76e1acf6f", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"655360", "proof":{"version":"1", "power":"8", "hashsize":"64", "md5":"7e64c95c13fcad24f5d666ca0e366e27"}, "program":{"name":"gpuowl", "version":"v6.11-364-g36f4e2a"}, "user":"storm5510", "computer":"7700_Kaby_Lake", "aid":"766017F5351C87AEFDFB0D475734D58A", "timestamp":"2020-08-26 15:29:16 UTC"}
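A result line like the one above is plain JSON, so whether a run produced a proof can be checked mechanically. The sketch below is only an illustration: `proof_summary` is a hypothetical helper, and the sample line is a shortened copy of the quoted result with the user, computer and AID fields left out.

```python
import json


def proof_summary(result_line):
    """Pull the exponent, residue and proof details out of one result line."""
    result = json.loads(result_line)
    # The "proof" object is missing when proof generation was disabled.
    proof = result.get("proof")
    return {
        "exponent": int(result["exponent"]),
        "res64": result["res64"],
        "gerbicz_errors": int(result["errors"]["gerbicz"]),
        "proof_power": int(proof["power"]) if proof else None,
        "proof_md5": proof["md5"] if proof else None,
    }


# Shortened copy of the result quoted above (identity fields omitted).
line = (
    '{"status":"C", "exponent":"10496897", "worktype":"PRP-3", '
    '"res64":"c25cf3c76e1acf6f", "residue-type":"1", '
    '"errors":{"gerbicz":"0"}, "fft-length":"655360", '
    '"proof":{"version":"1", "power":"8", "hashsize":"64", '
    '"md5":"7e64c95c13fcad24f5d666ca0e366e27"}}'
)
summary = proof_summary(line)
print(summary)
```

A result written after "Proof disabled because of missing checkpoints" has no "proof" object at all, in which case both proof fields come back as `None`.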
Old 2020-09-08, 00:04   #25
preda
 
"Mihai Preda"
Apr 2015

1,327 Posts

Quote:
Originally Posted by storm5510
@preda:



George asked me to contact you with more detail about this. It is for M10496897. I do not have many details to provide. My log file does not go back that far. I keep all my results for such instances:
Do you have the proof file? Would be in a folder named "uploaded", in pool/ if you use -pool or in the run directory of gpuowl otherwise, and is named something like 10496897-8.proof . If you have it, please upload it somewhere (Drive, Box, etc) and send me the link, thanks.
Old 2020-09-08, 00:43   #26
storm5510
Random Account
 
Aug 2009
U.S.A.

11010001110₂ Posts

Quote:
Originally Posted by preda
Do you have the proof file? Would be in a folder named "uploaded", in pool/ if you use -pool or in the run directory of gpuowl otherwise, and is named something like 10496897-8.proof . If you have it, please upload it somewhere (Drive, Box, etc) and send me the link, thanks.
I didn't think about this. Yes, I still have it. Allow me to arrange something...

Edit: The proof is here: https://www.adrive.com/public/9Uj3T4/10496897-8.proof

Last fiddled with by storm5510 on 2020-09-08 at 00:51
Old 2020-09-08, 01:42   #27
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37×127 Posts
Improving fallback to lower proof power etc.

This was a power 8 proof PRP run on Radeon 5700XT and Windows 10 that went awry. Aside from the reported checksum error, I see a few additional issues with this run sequence.
Code:
2020-09-07 11:48:16 asr2/5700xt 99356317 OK 98400000  99.04%; 2212 us/it; ETA 0d 00:35; ed3540b5aa993e56 (check 1.11s)
2020-09-07 11:55:39 asr2/5700xt 99356317 OK 98600000  99.24%; 2213 us/it; ETA 0d 00:28; 4569ca1b42f97bf5 (check 1.11s)
2020-09-07 12:03:03 asr2/5700xt 99356317 OK 98800000  99.44%; 2212 us/it; ETA 0d 00:21; 52d0f07278cb3ea0 (check 1.10s)
2020-09-07 12:10:27 asr2/5700xt 99356317 OK 99000000  99.64%; 2213 us/it; ETA 0d 00:13; 214b5e72adcb0097 (check 1.10s)
2020-09-07 12:17:50 asr2/5700xt 99356317 OK 99200000  99.84%; 2212 us/it; ETA 0d 00:06; 8dc4afa02db98b6e (check 1.10s)
2020-09-07 12:23:36 asr2/5700xt CC 99356317 / 99356317, af767eb4030a____
2020-09-07 12:23:38 asr2/5700xt 99356317 OK 99356800 100.00%; 2215 us/it; ETA 0d 00:00; 5a424b4dc57d3ccf (check 1.07s)
2020-09-07 12:23:39 asr2/5700xt proof: building level 1, hash dc19c1ed5074bfed
2020-09-07 12:23:39 asr2/5700xt proof: building level 2, hash e1c39c39ef8fec8c
2020-09-07 12:23:40 asr2/5700xt proof: building level 3, hash 4dff4687239f51cc
2020-09-07 12:23:42 asr2/5700xt proof: building level 4, hash 7803518131602fc6
2020-09-07 12:23:45 asr2/5700xt proof: building level 5, hash e6e93fce0591589a
2020-09-07 12:23:51 asr2/5700xt proof: building level 6, hash 486bd862e2a3633f
2020-09-07 12:23:53 asr2/5700xt checksum 78d6fc30 (expected 9be1d4ca) in '.\99356317\proof\23286660'
2020-09-07 12:23:53 asr2/5700xt Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: checksum mismatch: No error
2020-09-07 12:23:53 asr2/5700xt Bye

>gpuowl-win
2020-09-07 17:44:04 gpuowl v6.11-364-g36f4e2a
2020-09-07 17:44:04 config: -user kriesel -cpu asr2/5700xt -d 2 -use NO_ASM -maxAlloc 7500
2020-09-07 17:44:04 device 2, unique id ''
2020-09-07 17:44:04 asr2/5700xt 99356317 FFT: 5.50M 1K:11:256 (17.23 bpw)
2020-09-07 17:44:04 asr2/5700xt Expected maximum carry32: 293D0000
2020-09-07 17:44:05 asr2/5700xt OpenCL args "-DEXP=99356317u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xb.52db15a632b98p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.42fc054606498p-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-07 17:44:13 asr2/5700xt OpenCL compilation in 8.11 s
2020-09-07 17:44:15 asr2/5700xt 99356317 OK 99200000 loaded: blockSize 400, 8dc4afa02db98b6e
2020-09-07 17:44:15 asr2/5700xt validating proof residues for power 8
2020-09-07 17:44:22 asr2/5700xt checksum 78d6fc30 (expected 9be1d4ca) in '.\99356317\proof\23286660'
2020-09-07 17:44:22 asr2/5700xt validating proof residues for power 9
2020-09-07 17:44:22 asr2/5700xt Can't open '.\99356317\proof\194056' (mode 'rb')
2020-09-07 17:44:22 asr2/5700xt validating proof residues for power 8
2020-09-07 17:44:27 asr2/5700xt checksum 78d6fc30 (expected 9be1d4ca) in '.\99356317\proof\23286660'
2020-09-07 17:44:27 asr2/5700xt validating proof residues for power 7
2020-09-07 17:44:30 asr2/5700xt checksum 78d6fc30 (expected 9be1d4ca) in '.\99356317\proof\23286660'
2020-09-07 17:44:30 asr2/5700xt validating proof residues for power 6
2020-09-07 17:44:30 asr2/5700xt Can't open '.\99356317\proof\1552443' (mode 'rb')
2020-09-07 17:44:30 asr2/5700xt Proof disabled because of missing checkpoints
2020-09-07 17:44:33 asr2/5700xt 99356317 OK 99200800  99.84%; 2199 us/it; ETA 0d 00:06; 924e6946b4f9fde2 (check 1.37s)
2020-09-07 17:50:17 asr2/5700xt CC 99356317 / 99356317, af767eb4030a5338
2020-09-07 17:50:18 asr2/5700xt 99356317 OK 99356400 100.00%; 2212 us/it; ETA 0d 00:00; abffad0e796314d6 (check 1.08s)
2020-09-07 17:50:18 asr2/5700xt {"status":"C", "exponent":"99356317", "worktype":"PRP-3", "res64":"af767eb4030a____", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"5767168", "program":{"name":"gpuowl", "version":"v6.11-364-g36f4e2a"}, "user":"kriesel", "computer":"asr2/5700xt", "aid":"(redacted)", "timestamp":"2020-09-07 22:50:18 UTC"}
1) Gpuowl gives up, abandoning the run. It could skip to the next worktodo entry instead, putting hours or days of gpu time to productive use rather than leaving it idle until the user finds gpuowl halted.
2) There is a 1552444 iteration residue file, while in the restart it's looking for 1552443 at power 6. It seems there was a slight difference in computing how many iterations between the initial run and the restart or the original power and the fallback power.
3) It had already computed to 100% in the first run. And it recomputes from an indicated 99.84% to 100% in the restart. This is a minor production loss at 5 minutes 44 seconds.
4) The off-by-1, 1552444 vs. 1552443 prevents a power 6 proof from being generated in the restart.
5) Power 5, which would still save ~96% of a PRP DC, is not attempted in the restart, or is not supported. (It might have the off-by-one, or more, issue too.) Admittedly this should be a rare case. Even power 4 would represent an occasional substantial saving over a complete DC as a result of an error.
6) For power 8, topk would be the next multiple of 256 above p which is 99356416 for p~99356317. Topk/256 for power 8 would be 388111. Saved residues would be at iterations that are multiples of that. Four times 388111=1552444, the first saved for power 6. The initial run goes past 99356416 to 99356800, presumably because of block size 400. But the restart computes only to 99356400, one less block for some reason. 99356400/256 = 388110.9375. Four times that is 1552443.75 which apparently got truncated to 1552443 for the power 6 restart. Or the restart proof attempts compute iteration count independently for each power ignoring the history of the exponent's run, or any need to ensure powers of 2 between iterations for different power proofs. If the restart omits the ceiling function, 99356317/256=388110.61328125; 4 times that is 1552442.453125, unlikely to produce 1552443 for power 6.
So I suspect there's no way currently to save the proof. I still have all the files generated, and have not yet reported the PRP result.
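The arithmetic in item 6 can be replayed with integer math. This is just a check of the numbers quoted above under the post's own hypotheses; the variable names are made up for the illustration and nothing here is taken from gpuowl's code.

```python
P = 99356317   # exponent from the log
POWER = 8      # proof power of the original run


def ceil_to_multiple(n, m):
    """Smallest multiple of m that is >= n."""
    return -(-n // m) * m


# Hypothesis in item 6: topk is the next multiple of 2^8 above p.
topk = ceil_to_multiple(P, 1 << POWER)
step8 = topk >> POWER            # spacing of saved residues at power 8
saved_for_power6 = 4 * step8     # power-8 residue usable as the first power-6 one

# Alternative: the restart ends at 99356400 and truncates the quotient.
restart_end = 99356400
truncated_from_restart = (4 * restart_end) // 256

# Alternative: no ceiling at all, straight from p.
truncated_from_p = (4 * P) // 256

print(topk, step8, saved_for_power6, truncated_from_restart, truncated_from_p)
```

This gives topk = 99356416, a power-8 spacing of 388111, a saved residue at 1552444, and the two candidate values 1552443 and 1552442, matching the figures in item 6.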

If in a future version, gpuowl computed topk for its maximum supported power (currently 9), then derived the specified power's iteration count for residues saved from multiples of that, some iteration multiples would be more reliably interchangeable among powers, improving fallback to lower powers upon an error. As is, topk/2^power for p=99356317 =
power   first residue save   proposed   nearest multiple from current power-8 default
9       194056               194056     n/a
8       388111               388112     388111
7       776222               776224     776222
6       1552443              1552448    1552444
5       3104885              3104896    3104888
4       6209770              6209792    6209776
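The table can be regenerated with a few lines of integer arithmetic. One assumption is flagged loudly: the "first residue save" column is reproduced here as ceil(p / 2^power), which happens to match every row above but is an inference from the numbers, not something confirmed from gpuowl's source. The "proposed" column follows the suggestion in the post (round p up to a multiple of 2^9, then divide), and all names are hypothetical.

```python
P = 99356317
MAX_POWER = 9   # highest proof power supported, per the post


def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


# Proposed scheme: one topk, a multiple of 2^MAX_POWER, shared by all powers.
topk9 = ceil_div(P, 1 << MAX_POWER) << MAX_POWER
# Spacing actually used by the power-8 run.
step8 = ceil_div(P, 1 << 8)

rows = []
for power in range(9, 3, -1):
    current = ceil_div(P, 1 << power)   # assumed formula, see lead-in
    proposed = topk9 >> power           # always an exact division
    # Residues saved by a power-8 run sit at multiples of step8.
    from_power8 = step8 << (8 - power) if power <= 8 else None
    rows.append((power, current, proposed, from_power8))

for row in rows:
    print(row)
```

Under the proposed scheme every lower power's first iteration is an exact multiple of the higher power's spacing, so a power-8 run would already hold the residues a power-6 or power-5 fallback needs.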

Last fiddled with by kriesel on 2020-09-08 at 01:54
Old 2020-09-08, 12:10   #28
storm5510
Random Account
 
Aug 2009
U.S.A.

2×839 Posts

It seems I have run into another oddity. gpuOwl is refusing to run PRP-CF tests:

Quote:
1 FFT: 128K 256:1:256 (0.00 bpw)
FFT size too large for exponent (0.00 bits/word)
Exiting because "FFT size is too large"
This is on version 6.11-380-g79ea0cc. I never had a problem before, so something has changed. My GPU is an Nvidia GTX 1080. There are two lines in the "use-flags" text file which somewhat resemble the first line in my quoted section. They mention 256,4,1. I do not know if this is relevant or not.
Old 2020-09-08, 12:41   #29
ATH
Einyen
 
Dec 2003
Denmark

2979₁₀ Posts

Gpuowl can only do PRP tests, not PRP-CF tests, where one or more factors are known and the remaining cofactor is tested.
Old 2020-09-08, 15:28   #30
storm5510
Random Account
 
Aug 2009
U.S.A.

2×839 Posts

Quote:
Originally Posted by ATH
Gpuowl can only do PRP tests, not PRP-CF tests, where one or more factors are known and the remaining cofactor is tested.
The assignment lines in worktodo begin with "PRP" only. It does not know the difference, big or small. I have been running CFs, with known factor(s), since George began the certification process with Prime95 v30.x. The problem must be something else...
Old 2020-09-08, 17:00   #31
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37·127 Posts

Gpuowl not running PRP-CF, which it does not support at all, is not an oddity, especially when the worktodo line says PRP. PRP-CF is not an implemented work type in gpuowl. It's likely that giving it known factors in the worktodo line confuses the input parser, which was not written to handle such data. Other types of input error will also give the reported error message or a similar one, such as omitting the AID, or using a placeholder 0 instead of an AID; this shifts k,b,n,c, etc. into unintended non-matching variables. Worktodo entry formats are described in https://www.mersenneforum.org/showpo...8&postcount=22

There is no utility to running a PRP test in gpuowl for something with a known factor. Nor in attempting to run a computation type in gpuowl that gpuowl does not implement, such as PRP-CF.
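To make the point about trailing known factors concrete, here is a minimal sketch of a stricter reader for a `PRP=` line. It is not gpuowl's parser and does not describe its behaviour: `parse_prp_line` is a hypothetical helper, the field order AID,k,b,n,c is taken from the posts in this thread, and any fields between c and the quoted factor list are kept uninterpreted.

```python
import csv


def parse_prp_line(line):
    """Split a PRP= worktodo line and separate out a quoted factor list."""
    kind, sep, rest = line.strip().partition("=")
    if kind != "PRP" or not sep:
        raise ValueError("not a PRP= worktodo line")
    # csv handles the quoted, comma-containing factor list as one field.
    fields = next(csv.reader([rest]))
    known_factors = []
    if '"' in rest:
        # A quoted trailing field is treated as the known-factor list.
        known_factors = [int(f) for f in fields.pop().split(",")]
    if len(fields) < 5:
        raise ValueError("expected at least AID,k,b,n,c")
    k, b, n, c = (int(x) for x in fields[1:5])
    return {
        "aid": fields[0],
        "k": k, "b": b, "n": n, "c": c,
        "extras": fields[5:],            # left uninterpreted on purpose
        "known_factors": known_factors,  # non-empty means a cofactor assignment
    }


sample = 'PRP=AID,1,2,7877777,-1,99,0,3,5,"47266663,9172685364795810287,125872567825872611377"'
entry = parse_prp_line(sample)
print(entry["n"], len(entry["known_factors"]))
```

A front end like this could refuse, or at least warn about, any entry whose `known_factors` list is non-empty before handing it to a program that only implements plain PRP.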

Last fiddled with by kriesel on 2020-09-08 at 17:04
Old 2020-09-08, 17:49   #32
storm5510
Random Account
 
Aug 2009
U.S.A.

3216₈ Posts

Quote:
Originally Posted by kriesel
Gpuowl not running PRP-CF which it does not support at all, is not an oddity. Especially when the worktodo line says PRP. PRP-CF is not an implemented work type in gpuowl. It's likely that giving it known factors in the worktodo line is confusing the input parser that was not written to handle such data. Other types of input error will also give the error message reported or similar, such as omitting AID or a placeholder 0 instead of AID; it shifts k,b,n,c, etc. into unintended non-matching variables. Worktodo entry formats are described in https://www.mersenneforum.org/showpo...8&postcount=22

There is no utility to running a PRP test in gpuowl for something with a known factor. Nor in attempting to run a computation type it does not implement.
With all due respect, you are telling me I cannot run something which I am running now.

Quote:
PRP=<AID Removed>,1,2,7877777,-1,99,0,3,5,"47266663,9172685364795810287,125872567825872611377"
In the attached image, I have highlighted the version number at the top. This is the predecessor to the current version, which would not run CFs. I believe this is the same version about which George sent me a PM regarding a proof which was no good. I will only run this one assignment, and have George check it. If this proof is also bad, then I will stop. I have 5 other proofs from runs with gpuOwl. I received no notifications about problems with those proofs. I looked at my results on mersenne.org. I have 8 "verified" results and 1 "suspect", the suspect being the one I received the message about. preda requested the proof, so I put it where he could download it.

In Prime95, I have my work type set to "First time PRP on Mersenne cofactors." If these are first time tests, then why do all the assignments I get have factors? Manual reservations are the same.
[Attached image: owl.JPG (105.8 KB)]
Old 2020-09-08, 21:17   #33
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37×127 Posts

Quote:
Originally Posted by storm5510
With all due respect, you are telling me I cannot run something which I am running now.
...
PRP proofs getting certified do not prove a completed PRP test is a valid PRP-CF test. Check your gpuowl result lines for your PRP-CF attempts via PRP= worktodo lines with extra ignored parameters tacked on the end (residue-type,base,"factor1,factor2"). Residue type 1 result is a PRP test in current gpuowl. Check the PrimeNet status pages of exponents you've run PRP tests on gpuowl (including the ones you intended as PRP-CF); PRP test is what the software did, and PrimeNet displays, residue type 1.

A first time PRP-CF test of Mwhatever/"previously known factors, new-factor" would tell whether the cofactor is still composite, or a prime. PRP of composite Mwhatever will return composite no matter how many factors are known or unknown. And a type 1 PRP test's res64 will match a type 5 PRP-CF run if both are correct, independently of how many factors are input into the PRP-CF run done by software that supports it, because that's a property of type 5 PRP residue tests.
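The relationship between the two residue types can be tried on tiny Mersenne numbers. The sketch assumes the usual type 5 definition as I understand it from Prime95-style software: with N = 2^p - 1 and KF the product of the known factors, the cofactor N/KF is called a probable prime when 3^(N-1) mod N agrees with 3^(KF-1) modulo the cofactor. Under that assumption the number actually computed is the same as the type 1 residue, and only the final comparison differs. Function names are hypothetical.

```python
def type1_residue(p, base=3):
    """Type 1 Fermat residue base^(N-1) mod N for N = 2^p - 1."""
    n = (1 << p) - 1
    return pow(base, n - 1, n)


def cofactor_is_prp(p, known_factors, base=3):
    """Type 5 style check of N / product(known_factors), N = 2^p - 1.

    Assumed definition: the cofactor passes when base^(N-1) mod N is
    congruent to base^(KF-1) modulo the cofactor.
    """
    n = (1 << p) - 1
    kf = 1
    for f in known_factors:
        if n % f != 0:
            raise ValueError(f"{f} does not divide 2^{p} - 1")
        kf *= f
    cofactor = n // kf
    residue = type1_residue(p, base)   # identical to the plain PRP residue
    return residue % cofactor == pow(base, kf - 1, cofactor)


# M11 = 2047 = 23 * 89: the full number is composite, the cofactor 89 is prime.
print(type1_residue(11) == 1, cofactor_is_prp(11, [23]))
```

For M11 the plain type 1 test reports composite while the cofactor check on 89 passes. The plain test therefore says nothing about the cofactor, even though both start from the same residue.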

Last fiddled with by kriesel on 2020-09-08 at 21:47

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.