mersenneforum.org  


Old 2020-09-11, 21:26   #56
moebius
 
Jul 2009
Germany

2·229 Posts

Quote:
Originally Posted by Prime95
I believed those early July gpuowl executable runs were all completed.
For the information of PrimeNet users: in my opinion, gpuowl v6.11-364 works fine with proofs.
104984237

Am I right? Does "recently" mean yesterday? My gpuowl run of a 104M exponent is at 59%.
Example from results.txt:
{"status":"C", "exponent":"104984237", "worktype":"PRP-3", "res64":"2d72f5c9f11f87f8", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"5767168", "proof":{"version":"1", "power":"8", "hashsize":"64", "md5":"ec16efafc1ec6c3bfc96727d0d4ea2b0"}, "program":{"name":"gpuowl", "version":"v6.11-364-g36f4e2a"}, "user":"geschwen", "computer":"AMD_RXVega64", "aid":"65AD3C59D05B3FDE860AF7768217E66E", "timestamp":"2020-09-10 15:16:42 UTC"}

Last fiddled with by moebius on 2020-09-11 at 22:07
Old 2020-09-12, 02:06   #57
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

15766₈ Posts

Quote:
Originally Posted by moebius
For the information of PrimeNet users: in my opinion, gpuowl v6.11-364 works fine with proofs.
104984237

Am I right? Does "recently" mean yesterday? My gpuowl run of a 104M exponent is at 59%.
Example from results.txt:
{"status":"C", "exponent":"104984237", "worktype":"PRP-3", "res64":"2d72f5c9f11f87f8", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"5767168", "proof":{"version":"1", "power":"8", "hashsize":"64", "md5":"ec16efafc1ec6c3bfc96727d0d4ea2b0"}, "program":{"name":"gpuowl", "version":"v6.11-364-g36f4e2a"}, "user":"geschwen", "computer":"AMD_RXVega64", "aid":"65AD3C59D05B3FDE860AF7768217E66E", "timestamp":"2020-09-10 15:16:42 UTC"}
All gpuowl executables that produced proofs work just fine. Some of the early gpuowls either did not output the proof section in the JSON results or omitted the md5 value in the proof section of the JSON results.

The proof file MD5 hash is an important security feature during upload. Without it, a bad actor could submit a bogus proof for your PRP run. For these old gpuowls, the server was accepting proof uploads without requiring an MD5 match. The server no longer accepts uploads without an MD5 match.
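
For anyone who wants to check a proof file locally before uploading, here is a minimal sketch. It reads one JSON line from results.txt, takes the "md5" value from its "proof" section, and compares it with the MD5 of a file on disk. It assumes the hash covers the whole .proof file byte for byte; the function name and the file path argument are made up for illustration, and this is not how the server or uploader.exe is known to do it.

```python
import hashlib
import json


def proof_md5_matches(result_line, proof_path):
    """Return True if the md5 in a results.txt JSON line matches the file.

    Assumption: the "md5" value is the MD5 of the entire proof file.
    Returns False when the line has no proof section or no md5 value,
    which is the situation described for some early gpuowl builds.
    """
    result = json.loads(result_line)
    expected = result.get("proof", {}).get("md5")
    if expected is None:
        return False

    digest = hashlib.md5()
    with open(proof_path, "rb") as fh:
        # Read in 1 MiB chunks so large proof files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

A line without a "proof" section simply yields False here, so the same check also flags results written by the older builds.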
Old 2020-09-12, 02:27   #58
moebius
 
Jul 2009
Germany

2·229 Posts

Quote:
Originally Posted by Prime95
All gpuowl executables that produced proofs work just fine. Some of the early gpuowls either did not output the proof section in the JSON results or omitted the md5 value in the proof section of the JSON results.
I just asked because it seems the output in results.txt changed from v6.11-364 to v6.11-380. Thanks for your answer.

Last fiddled with by moebius on 2020-09-12 at 02:28
Old 2020-09-12, 18:37   #59
moebius
 
Jul 2009
Germany

2×229 Posts

Quote:
Originally Posted by moebius
I just asked because it seems the output in results.txt changed
I made a mistake: the output of gpuowl v6.11-364 is identical, and the .proof file can be uploaded with uploader.exe without problems.
104984251

Last fiddled with by moebius on 2020-09-12 at 18:38
Old 2020-09-28, 09:57   #60
aheeffer
 
Aug 2020

2⁵ Posts
GNU MP: Cannot allocate memory

I got a crash this morning on a Radeon VII card running gpuowl v6.11-380 on Windows when starting the P2 stage:

GNU MP: Cannot allocate memory (size=1404976)

This error did not show up in the log file. The last line was:

2020-09-28 06:42:32 Rig-RadeonVII-03 205131791 P2 using 155 buffers of 88.0 MB each

Restarting the program produced:

2020-09-28 09:44:43 Rig-RadeonVII-03 205131791 P2 using 162 buffers of 88.0 MB each

and then it continued without problems.

Is this a problem with the card's memory? It ran fine for several months, except a few days ago when it stalled without crashing, dropping down to 25 Watts.
Old 2020-09-28, 11:04   #61
paulunderwood
 
Sep 2002
Database er0rr

6655₈ Posts

Maybe too many GCDs were happening at once on your multi-card system. See this post on how to limit the RAM requirements for multi-card systems, particularly the section "Use -maxAlloc to avoid out-of-memory with multi-jobs per card:".
Old 2020-09-28, 12:07   #62
aheeffer
 
Aug 2020

2⁵ Posts

Quote:
Originally Posted by paulunderwood
Maybe too many GCDs were happening at once on your multi-card system. See this post on how to limit the RAM requirements for multi-card systems, particularly the section "Use -maxAlloc to avoid out-of-memory with multi-jobs per card:".
I am running only one instance per card, so I did not use -maxAlloc.

In the meantime P2 has finished without problems, but the card stalled during PRP without any information in the log file. Another card also stalled and gave this when I stopped gpuowl:

Code:
2020-09-28 09:27:33 Rig-RadeonVII-01 332292019 OK 115400000  34.73%; 3918 us/it; ETA 9d 20:02; 5eb5fa5c480d09ec (check 2.67s)
2020-09-28 09:40:40 Rig-RadeonVII-01 332292019 OK 115600000  34.79%; 3921 us/it; ETA 9d 20:03; 482edd18e909798d (check 2.73s)
2020-09-28 13:46:08 Rig-RadeonVII-01 Stopping, please wait..
2020-09-28 13:46:12 Rig-RadeonVII-01 332292019 EE 115711600  34.82%; 131987 us/it; ETA 330d 20:33; 8a914c104bc101b5 (check 2.01s)
2020-09-28 13:46:14 Rig-RadeonVII-01 332292019 EE 115600000 loaded: blockSize 400, 973f740ba9290ed0 (expected 482edd18e909798d)
2020-09-28 13:46:14 Rig-RadeonVII-01 Exiting because "error on load"
2020-09-28 13:46:14 Rig-RadeonVII-01 waiting for background GCDs..
2020-09-28 13:46:14 Rig-RadeonVII-01 Bye
Is it normal to have background GCDs during PRP testing?

Last fiddled with by aheeffer on 2020-09-28 at 12:14 Reason: additional data
Old 2020-09-28, 14:09   #63
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11166₈ Posts

Quote:
Originally Posted by aheeffer
I am running only one instance per card, so I did not use -maxAlloc.

In the meantime P2 has finished without problems, but the card stalled during PRP without any information in the log file. Another card also stalled and gave this when I stopped gpuowl:
Is it normal to have background GCDs during PRP testing?
-maxAlloc is necessary on NVIDIA GPUs when running P-1, even with a single instance, and is probably a good idea on AMD as well. The value to use depends on the number of instances running on the GPU and on the amount of GPU RAM; it should leave at least 5% of GPU RAM unallocated, and leaving more may be necessary.
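
As a rough worked example of that rule of thumb, the sketch below computes a starting value per instance. It assumes the value is given in megabytes, which this thread does not state outright; that reading is inferred from the "-maxAlloc 15000" used later in the thread on a Radeon VII, a 16 GB card. The function name and the even split across instances are illustrative choices, not anything taken from gpuowl.

```python
def suggest_max_alloc_mb(gpu_ram_mb, instances=1, reserve_fraction=0.05):
    """Suggest a per-instance allocation cap in MB (assumed unit).

    gpu_ram_mb:       total GPU RAM in megabytes
    instances:        number of gpuowl instances sharing the GPU
    reserve_fraction: share of GPU RAM to leave unallocated (at least 5%
                      per the advice above; more may be needed)
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_mb = gpu_ram_mb * (1 - reserve_fraction)
    # Split the usable RAM evenly and round down to whole megabytes.
    return int(usable_mb // instances)


# 16 GB card, one instance, 5% held back.
print(suggest_max_alloc_mb(16384))      # 15564
# Same card shared by two instances.
print(suggest_max_alloc_mb(16384, 2))   # 7782
```

Treat the result as an upper bound to start from; if allocation failures continue, raise reserve_fraction.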

Problems can also occur when the gpu ram amount is fine, but the system cpu-side ram is small. If it is spilling to virtual memory and paging file, it will likely appear to have hung, or actually fail. I've had issues with low-system-ram (4GB) runs that disappeared after installing more system ram (to 10GB) that is supporting multiple gpus.

Background GCD of a P-1 stage at the beginning or resumption of PRP, LL, or P-1 is normal. Separate threads keep the GPU busy while GCDs run on the CPU at the same time, which increases GPU utilization compared with leaving the GPU idle while the GCD runs on the CPU. Several cases may occur:

P-1 stage 1 gcd on cpu, P-1 speculative stage 2 computation of same exponent on gpu
P-1 stage 2 gcd on cpu, P-1 stage 1 computation of next worktodo entry on gpu
P-1 stage 2 gcd on cpu, PRP next worktodo entry on gpu
P-1 stage 2 gcd on cpu, LLDC next worktodo entry on gpu
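
The overlap pattern in the cases above can be illustrated with a toy sketch. It is not gpuowl's code: the "GPU" part is an ordinary Python function, the exponents and the B1 bound are tiny, and every name here is invented for the illustration. The only point is the ordering, where the GCD for one entry is handed to a background thread while the main loop moves on to the next entry.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def p1_stage1_residue(p, b1):
    """Stand-in for the GPU work: 3^E mod M_p with E = 2*p*lcm(1..b1)."""
    m = (1 << p) - 1
    e = 2 * p
    for k in range(2, b1 + 1):
        e = e * k // math.gcd(e, k)  # running least common multiple
    return pow(3, e, m)


def p1_gcd(p, residue):
    """CPU work: gcd(residue - 1, M_p). A value strictly between 1 and M_p
    is a nontrivial factor of M_p."""
    m = (1 << p) - 1
    return math.gcd(residue - 1, m)


def run_queue(exponents, b1):
    """Process entries in order, overlapping each GCD with the next entry."""
    results = {}
    pending = None  # (exponent, future) for the GCD still in the background
    with ThreadPoolExecutor(max_workers=1) as cpu:
        for p in exponents:
            # Main thread ("GPU") computes this entry while the previous
            # entry's GCD may still be running on the worker thread.
            residue = p1_stage1_residue(p, b1)
            if pending is not None:
                results[pending[0]] = pending[1].result()
            pending = (p, cpu.submit(p1_gcd, p, residue))
        if pending is not None:
            # Equivalent of "waiting for background GCDs.." at exit.
            results[pending[0]] = pending[1].result()
    return results


print(run_queue([29, 37], b1=8))
```

The final wait at the end of run_queue corresponds to the "waiting for background GCDs.." line that shows up in the logs when gpuowl exits.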

Error on load is a separate issue. Usually a retry helps. I've had to move assignments or change fft selection sometimes.

Last fiddled with by kriesel on 2020-09-28 at 14:18
Old 2020-10-08, 13:41   #64
aheeffer
 
Aug 2020

40₈ Posts

Quote:
Originally Posted by kriesel
-maxAlloc is necessary on NVIDIA GPUs when running P-1, even with a single instance, and is probably a good idea on AMD as well. The value to use depends on the number of instances running on the GPU and on the amount of GPU RAM; it should leave at least 5% of GPU RAM unallocated, and leaving more may be necessary.
The same Radeon VII card finished the job and then repeatedly failed to start the P2 stage for the next exponent while running a single instance. I added -maxAlloc 15000 and the problem disappeared. I do not have that problem with the other cards, though.
Old 2020-10-24, 12:18   #65
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001001110110₂ Posts

On gpuowl-win v6.11-364-g36f4e2a, Windows 10 Pro x64, Radeon VII GPU, Celeron G1840 CPU, 16 GB system RAM:
LL 182585281 hit a reproducible jacobi == 1 error on 2020-09-29 between iterations 151148000 and 151500000, using the default FFT 10M 1K:10:512.
Ratcheting up manually with stop and start, to 151,148,000 and then to 151,304,000, gave the error.
Also an error at 151,210,000.
Also an error at 151,167,000.
Also an error at 151,155,000, at 151,150,000, and at 151,149,000, so it is narrowed down to a range of 1000 iterations. Or maybe not, see the last attempts.
Options considered to get through: a -block smaller than the 500000 default; a small -log interval; a non-default -fft. 1K:5:1K, 512:10:1K, and 4K:5:256 all failed.
Increasing to an 11M FFT (1K:11:512, 512:11:1K) or a 12M FFT (1K:12:512) also failed.
FFT lengths 1K:6:1K, 512:12:1K, 4K:6:256, and 4K:3:512 were not tried.
Retreat to an older save file and try with -log 10000 -fft 1K:6:1K -jacobi 100000; there was also an error with 12M 1K:12:512 (14.51 bpw).
The run seems unrecoverable at ~83% complete.
Code:
2020-09-29 07:46:29 asr2/radeonvii0 182585281 LL 150000000  82.15%; 1562 us/it; ETA 0d 14:08; e82c599e9d12e4a1
2020-09-29 07:49:06 asr2/radeonvii0 182585281 LL 150100000  82.21%; 1567 us/it; ETA 0d 14:08; 9145f592abe47a18
2020-09-29 07:49:06 asr2/radeonvii0 182585281 OK 150000000 (jacobi == -1)
2020-09-29 08:11:12 asr2/radeonvii0 182585281 LL 150200000  82.26%; 13267 us/it; ETA 4d 23:21; e6ed5af5361619e1
2020-09-29 08:13:49 asr2/radeonvii0 182585281 LL 150300000  82.32%; 1565 us/it; ETA 0d 14:02; 84e6b7ffe3b0b099
2020-09-29 08:16:25 asr2/radeonvii0 182585281 LL 150400000  82.37%; 1563 us/it; ETA 0d 13:59; d8567e98d1cfcfad
2020-09-29 08:19:02 asr2/radeonvii0 182585281 LL 150500000  82.43%; 1566 us/it; ETA 0d 13:57; 8766d826a2bf2722
2020-09-29 08:21:38 asr2/radeonvii0 182585281 LL 150600000  82.48%; 1564 us/it; ETA 0d 13:54; 7065b4e2ee7c5af6
2020-09-29 08:21:38 asr2/radeonvii0 182585281 EE 150500000 (jacobi == 1)
2020-09-29 08:21:39 asr2/radeonvii0 182585281 LL 150000000 loaded: e82c599e9d12e4a1
2020-09-29 08:24:15 asr2/radeonvii0 182585281 LL 150100000  82.21%; 1562 us/it; ETA 0d 14:06; 72dc26939901d8ff
2020-09-29 08:26:51 asr2/radeonvii0 182585281 LL 150200000  82.26%; 1562 us/it; ETA 0d 14:03; fce6b27abfc95071
2020-09-29 08:29:27 asr2/radeonvii0 182585281 LL 150300000  82.32%; 1562 us/it; ETA 0d 14:00; a604948ed76eb561
2020-09-29 08:32:03 asr2/radeonvii0 182585281 LL 150400000  82.37%; 1562 us/it; ETA 0d 13:58; 1204fb6405f65036
2020-09-29 08:34:40 asr2/radeonvii0 182585281 LL 150500000  82.43%; 1562 us/it; ETA 0d 13:55; 8db5d4ec89674b4c
2020-09-29 08:37:16 asr2/radeonvii0 182585281 LL 150600000  82.48%; 1564 us/it; ETA 0d 13:54; c8c68013c89e07e0
2020-09-29 08:37:16 asr2/radeonvii0 182585281 OK 150500000 (jacobi == -1)
2020-09-29 08:39:52 asr2/radeonvii0 182585281 LL 150700000  82.54%; 1564 us/it; ETA 0d 13:51; 0c27956a56f9471e
2020-09-29 08:42:29 asr2/radeonvii0 182585281 LL 150800000  82.59%; 1562 us/it; ETA 0d 13:47; 7b07715af7efde9f
2020-09-29 08:45:05 asr2/radeonvii0 182585281 LL 150900000  82.65%; 1562 us/it; ETA 0d 13:45; 27eba22000d74e46
2020-09-29 08:47:41 asr2/radeonvii0 182585281 LL 151000000  82.70%; 1562 us/it; ETA 0d 13:42; 01ffc10aaaedc8c7
2020-09-29 08:50:17 asr2/radeonvii0 182585281 LL 151100000  82.76%; 1564 us/it; ETA 0d 13:41; 08d976f434305199
2020-09-29 08:50:17 asr2/radeonvii0 182585281 OK 151000000 (jacobi == -1)
2020-09-29 08:52:54 asr2/radeonvii0 182585281 LL 151200000  82.81%; 1563 us/it; ETA 0d 13:38; f892d0f8a8b7a98e
2020-09-29 08:55:30 asr2/radeonvii0 182585281 LL 151300000  82.87%; 1562 us/it; ETA 0d 13:34; f46d2fafce514a27
2020-09-29 08:58:06 asr2/radeonvii0 182585281 LL 151400000  82.92%; 1562 us/it; ETA 0d 13:32; fd9f1af203376cfd
2020-09-29 09:00:42 asr2/radeonvii0 182585281 LL 151500000  82.97%; 1562 us/it; ETA 0d 13:29; e9e300c3bcda75bd
2020-09-29 09:03:19 asr2/radeonvii0 182585281 LL 151600000  83.03%; 1566 us/it; ETA 0d 13:29; 85443e4f78aa1351
2020-09-29 09:03:19 asr2/radeonvii0 182585281 EE 151500000 (jacobi == 1)
2020-09-29 09:03:19 asr2/radeonvii0 182585281 LL 151000000 loaded: 01ffc10aaaedc8c7
2020-09-29 09:05:55 asr2/radeonvii0 182585281 LL 151100000  82.76%; 1562 us/it; ETA 0d 13:40; 08d976f434305199
2020-09-29 09:08:29 asr2/radeonvii0 182585281 LL 151200000  82.81%; 1562 us/it; ETA 0d 13:37; f892d0f8a8b7a98e
2020-09-29 09:11:05 asr2/radeonvii0 182585281 LL 151300000  82.87%; 1562 us/it; ETA 0d 13:35; f46d2fafce514a27
2020-09-29 09:13:42 asr2/radeonvii0 182585281 LL 151400000  82.92%; 1562 us/it; ETA 0d 13:32; fd9f1af203376cfd
2020-09-29 09:16:18 asr2/radeonvii0 182585281 LL 151500000  82.97%; 1562 us/it; ETA 0d 13:29; e9e300c3bcda75bd
2020-09-29 09:18:54 asr2/radeonvii0 182585281 LL 151600000  83.03%; 1565 us/it; ETA 0d 13:28; 85443e4f78aa1351
2020-09-29 09:18:54 asr2/radeonvii0 182585281 EE 151500000 (jacobi == 1)
2020-09-29 09:18:55 asr2/radeonvii0 182585281 LL 151000000 loaded: 01ffc10aaaedc8c7
2020-09-29 09:21:31 asr2/radeonvii0 182585281 LL 151100000  82.76%; 1562 us/it; ETA 0d 13:40; 08d976f434305199
2020-09-29 09:24:07 asr2/radeonvii0 182585281 LL 151200000  82.81%; 1562 us/it; ETA 0d 13:37; f892d0f8a8b7a98e
2020-09-29 09:26:43 asr2/radeonvii0 182585281 LL 151300000  82.87%; 1562 us/it; ETA 0d 13:35; f46d2fafce514a27
2020-09-29 09:29:19 asr2/radeonvii0 182585281 LL 151400000  82.92%; 1562 us/it; ETA 0d 13:32; fd9f1af203376cfd
2020-09-29 09:31:56 asr2/radeonvii0 182585281 LL 151500000  82.97%; 1562 us/it; ETA 0d 13:29; e9e300c3bcda75bd
2020-09-29 09:34:32 asr2/radeonvii0 182585281 LL 151600000  83.03%; 1565 us/it; ETA 0d 13:28; 85443e4f78aa1351
2020-09-29 09:34:32 asr2/radeonvii0 182585281 EE 151500000 (jacobi == 1)
2020-09-29 09:34:33 asr2/radeonvii0 182585281 LL 151000000 loaded: 01ffc10aaaedc8c7
...
Code:
2020-09-29 12:10:50 asr2/radeonvii0 182585281 LL 151600000  83.03%; 1564 us/it; ETA 0d 13:28; 85443e4f78aa1351
2020-09-29 12:10:50 asr2/radeonvii0 182585281 EE 151500000 (jacobi == 1)
2020-09-29 12:10:50 asr2/radeonvii0 182585281 LL 151000000 loaded: 01ffc10aaaedc8c7
2020-09-29 12:13:27 asr2/radeonvii0 182585281 LL 151100000  82.76%; 1562 us/it; ETA 0d 13:40; 08d976f434305199
2020-09-29 12:14:41 asr2/radeonvii0 Stopping, please wait..
2020-09-29 12:14:42 asr2/radeonvii0 182585281 LL 151148000  82.78%; 1566 us/it; ETA 0d 13:40; 524d711bc2b63b53
2020-09-29 12:14:42 asr2/radeonvii0 waiting for the Jacobi check to finish..
2020-09-29 12:16:38 asr2/radeonvii0 182585281 OK 151148000 (jacobi == -1)
2020-09-29 12:16:38 asr2/radeonvii0 Exiting because "stop requested"
2020-09-29 12:16:38 asr2/radeonvii0 waiting for background GCDs..
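
For readers wondering why -1 is the "OK" value in the log lines above, here is a small sketch under stated assumptions. With the LL sequence s_0 = 4, s_{i+1} = s_i^2 - 2 (mod M_p), the Jacobi symbol of (s_i - 2) over M_p is -1 for every i >= 1 as long as no earlier term is 0 mod M_p, so a value of +1 signals a corrupted residue. That gpuowl checks exactly this quantity is an assumption, since the thread only shows the resulting "jacobi == -1" and "jacobi == 1" lines. The exponent p = 13 and the function names are chosen for illustration.

```python
def jacobi(a, n):
    """Jacobi symbol (a|n) for odd n > 0, by the standard binary algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):      # (2|n) = -1 when n = 3 or 5 mod 8
                result = -result
        a, n = n, a                  # quadratic reciprocity swap
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_jacobi_values(p):
    """Run the p-2 LL iterations for M_p and record jacobi(s_i - 2, M_p)."""
    m = (1 << p) - 1
    s = 4
    values = []
    for _ in range(p - 2):
        s = (s * s - 2) % m
        values.append(jacobi(s - 2, m))
    return s, values


final_residue, values = ll_jacobi_values(13)
print(final_residue == 0, set(values))   # True {-1}
```

A single check only catches about half of random corruptions, since a bad residue still has roughly even odds of giving -1. That is consistent with an error sometimes surfacing only at a later check.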
Old 2020-10-31, 14:01   #66
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×17×139 Posts

On gpuowl-win v7.0-40-gb62d4fd, Windows 10 Pro x64, Radeon VII gpu, Celeron G1840 cpu, 16 GB system ram:

worktodo line: B1=1900000,B2=80000000;PRP=(aid),1,2,310347613,-1,80,2
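
A minimal sketch of how a line in that shape can be pulled apart follows. It assumes the layout seen here: optional "key=value" bounds before a semicolon, then a "PRP=" task whose fields are the assignment ID, k, b, n, c, and further integers, for a number of the form k·b^n+c. Reading the trailing 80 and 2 as "trial-factored to 2^80" and "tests saved" follows the usual PrimeNet worktodo convention, but the thread does not confirm it, so the code just keeps them as a list. The function name is hypothetical and this is not gpuowl's own parser.

```python
def parse_worktodo(line):
    """Split a 'B1=...,B2=...;PRP=aid,k,b,n,c,...' line into its parts."""
    prefix, _, task = line.strip().rpartition(";")

    # Optional bounds before the semicolon, e.g. B1=1900000,B2=80000000.
    bounds = {}
    if prefix:
        for item in prefix.split(","):
            key, _, value = item.partition("=")
            bounds[key.strip()] = int(value)

    kind, _, args = task.partition("=")
    fields = [f.strip() for f in args.split(",")]
    aid = fields[0]
    k, b, n, c = (int(x) for x in fields[1:5])
    extra = [int(x) for x in fields[5:]]  # left uninterpreted on purpose
    return {"bounds": bounds, "kind": kind, "aid": aid,
            "k": k, "b": b, "n": n, "c": c, "extra": extra}


entry = parse_worktodo("B1=1900000,B2=80000000;PRP=(aid),1,2,310347613,-1,80,2")
print(entry["bounds"], entry["n"], entry["extra"])
```

For the line above, k=1, b=2, n=310347613, c=-1 describes 2^310347613 - 1, which matches the exponent in the log that follows.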

Code:
2020-10-31 08:23:15 asr2/radeonvii0 310347613     95990000  30.93% ff2b7527b4cb7321
2020-10-31 08:23:55 asr2/radeonvii0 310347613 EE  96000000  30.93% 53889815f700a921 3766 us/it; ETA 9d 08:13 3 errors
2020-10-31 08:23:58 asr2/radeonvii0 310347613 OK  95900000 loaded: blockSize 500, 51fbdc6128554059
2020-10-31 08:24:04 asr2/radeonvii0 310347613 OK  95900500  30.90% 3766a119900725db    1 us/it; ETA 0d 00:03 4 errors
2020-10-31 08:24:39 asr2/radeonvii0 310347613     95910000  30.90% ed0497e6de8357fd
Note the reported timing just after the reload after error, 1 us/it. Looks like the number of iterations and time interval for that line don't match.
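
One way to sanity-check a reported us/it figure is to recompute it from the timestamps and iteration counts of two consecutive log lines. The sketch below does that for the three lines after the reload in the log above. The regular expression is an assumption fitted to the lines shown in this thread, not a documented gpuowl log format, and the function names are made up.

```python
import re
from datetime import datetime

# timestamp, host, exponent, optional OK/EE marker, iteration count
LINE_RE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \S+ (\d+)\s+(?:(?:OK|EE)\s+)?(\d+)\s"
)


def parse_progress(line):
    """Return (timestamp, iteration) for a progress line, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return when, int(match.group(3))


def implied_us_per_it(prev_line, cur_line):
    """Wall-clock microseconds per iteration between two log lines."""
    prev, cur = parse_progress(prev_line), parse_progress(cur_line)
    if prev is None or cur is None:
        return None
    iterations = cur[1] - prev[1]
    if iterations <= 0:          # e.g. a rollback after an error
        return None
    seconds = (cur[0] - prev[0]).total_seconds()
    return seconds * 1e6 / iterations


loaded = "2020-10-31 08:23:58 asr2/radeonvii0 310347613 OK  95900000 loaded: blockSize 500, 51fbdc6128554059"
first = "2020-10-31 08:24:04 asr2/radeonvii0 310347613 OK  95900500  30.90% 3766a119900725db    1 us/it; ETA 0d 00:03 4 errors"
second = "2020-10-31 08:24:39 asr2/radeonvii0 310347613     95910000  30.90% ed0497e6de8357fd"

print(implied_us_per_it(loaded, first))          # 12000.0
print(round(implied_us_per_it(first, second)))   # 3684
```

By wall clock, the 500 iterations after the reload took 6 s (12000 us/it, with whatever load or check overhead falls inside that interval), and the next 9500 took 35 s (about 3684 us/it), close to the 3766 us/it reported before the error. Neither figure is anywhere near the logged 1 us/it.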

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.