mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
2020-08-07, 18:21   #1
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

Things that make you go hmm, concerning gpuowl runs

This thread is intended for reports of operating oddities, which might include outright bugs, but definitely include mysteries, puzzles, and head-scratchers.


Seen on a system running Windows 10 Pro, on a Radeon VII that had previously run P-1 successfully with -maxAlloc 15000, before a motherboard failure and replacement. (Actually two failures; the working theory at the moment is that prime95 and mfakto together on an i7-4790 are too much for the cpu VRMs/traces on this motherboard design; the exact same device/location flamed out twice.) The replacement MB has only 4GB of ram and a Celeron G1840, while the original had 16GB and an i7-4790. All boards used on this build are ASRock H81 BTC Pro 2.0, BIOS version 1.20, configured as identically as I could manage. (Another system running 5 TF gpus was built first with the same board model and cpu type and is reliable so far, without running mfakto on it.)

The -maxAlloc value needed to be reduced; previous experience indicated 12000 was too much but 8000 ran P-1 ok. Now even that has failed, as follows, and when it fails it takes the other gpu's run down with it.

Code:
2020-08-07 12:10:03 asr2/radeonvii4 99873313 P1  1450000  96.66%;  851 us/it; ETA 0d 00:01; d8b47a4c90ea4a28
2020-08-07 12:10:12 asr2/radeonvii4 99873313 P1  1460000  97.32%;  852 us/it; ETA 0d 00:01; e599ab00cd9196f2
2020-08-07 12:10:20 asr2/radeonvii4 99873313 P1  1470000  97.99%;  852 us/it; ETA 0d 00:00; 7017d97a0c0522b1
2020-08-07 12:10:29 asr2/radeonvii4 99873313 P1  1480000  98.66%;  852 us/it; ETA 0d 00:00; 75aea69ae7650b45
2020-08-07 12:10:37 asr2/radeonvii4 99873313 P1  1490000  99.32%;  852 us/it; ETA 0d 00:00; cd3cbe8e6a7b07a9
2020-08-07 12:10:46 asr2/radeonvii4 99873313 P1  1500000  99.99%;  852 us/it; ETA 0d 00:00; 63073d1296f10b36
2020-08-07 12:10:46 asr2/radeonvii4 saved
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P1  1500153 100.00%; 3878 us/it; ETA 0d 00:00; 598a6b10499394d7
2020-08-07 12:10:47 asr2/radeonvii4 P-1 (B1=1040000, B2=28080000, D=30030): primes 1664694, expanded 1752244, doubles 281399 (left 1126189), singles 1101896, total 1383295 (83%)
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P2 using blocks [35 - 935] to cover 1383295 primes
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P2 using 163 buffers of 44.0 MB each
GNU MP: Cannot allocate memory (size=430096)

>gpuowl-win
2020-08-07 12:40:14 gpuowl v6.11-364-g36f4e2a
2020-08-07 12:40:14 config: -user kriesel -cpu asr2/radeonvii4 -d 1 -use NO_ASM -maxAlloc 8000
2020-08-07 12:40:14 device 1, unique id ''
2020-08-07 12:40:14 asr2/radeonvii4 99873313 FFT: 5.50M 1K:11:256 (17.32 bpw)
2020-08-07 12:40:14 asr2/radeonvii4 Expected maximum carry32: 2BE10000
2020-08-07 12:40:15 asr2/radeonvii4 OpenCL args "-DEXP=99873313u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=1 -DAMDGPU=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x9.ad71e29311eb8p-4 -DIWEIGHT_STEP_MINUS_1=-0xc.0f74fd9784338p-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-07 12:40:20 asr2/radeonvii4 OpenCL compilation in 4.08 s
2020-08-07 12:40:20 asr2/radeonvii4 99873313 P1 B1=1040000, B2=28080000; 1500153 bits; starting at 1500152
2020-08-07 12:40:20 asr2/radeonvii4 99873313 P1  1500153 100.00%; 205759 us/it; ETA 0d 00:00; 598a6b10499394d7
2020-08-07 12:40:21 asr2/radeonvii4 P-1 (B1=1040000, B2=28080000, D=30030): primes 1664694, expanded 1752244, doubles 281399 (left 1126189), singles 1101896, total 1383295 (83%)
2020-08-07 12:40:21 asr2/radeonvii4 99873313 P2 using blocks [35 - 935] to cover 1383295 primes
2020-08-07 12:40:21 asr2/radeonvii4 99873313 P2 using 163 buffers of 44.0 MB each
2020-08-07 12:41:13 asr2/radeonvii4 99873313 P1 GCD: no factor
2020-08-07 12:41:48 asr2/radeonvii4 99873313 P2  163/2880: 79045 primes; setup  0.88 s,   1.093 ms/prime
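For what it's worth, the stage 2 figures in the log are internally consistent with the -maxAlloc cap on the GPU side, which points the finger at host memory for the GNU MP failure. A quick cross-check (my arithmetic, not gpuowl output):

```shell
#!/bin/sh
# Cross-check of the P2 buffer figures in the log above:
# 163 buffers of 44.0 MB each, versus the -maxAlloc 8000 (MB) cap.
BUFS=163
BUF_MB=44
TOTAL=$((BUFS * BUF_MB))
echo "P2 buffers: $TOTAL MB of GPU memory"   # 7172 MB, just under the cap
```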
Example of the other Radeon VII gpu on the system getting derailed:

Code:
2020-08-07 12:05:31 asr2/radeonvii0 162221989 OK 73400000  45.25%; 1416 us/it; ETA 1d 10:56; 55d5d6561a915117 (check 0.95s)
2020-08-07 12:10:15 asr2/radeonvii0 162221989 OK 73600000  45.37%; 1418 us/it; ETA 1d 10:55; 569862b34f411d19 (check 0.94s)
2020-08-07 12:14:59 asr2/radeonvii0 Exception St9bad_alloc: std::bad_alloc
2020-08-07 12:14:59 asr2/radeonvii0 Bye
>gpuowl-win
2020-08-07 12:40:42 gpuowl v6.11-364-g36f4e2a
2020-08-07 12:40:42 config: -user kriesel -cpu asr2/radeonvii0 -d 0 -maxAlloc 8000 -proof 8
2020-08-07 12:40:42 device 0, unique id ''
2020-08-07 12:40:42 asr2/radeonvii0 worktodo.txt line ignored: ";B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2"
2020-08-07 12:40:42 asr2/radeonvii0 162221989 FFT: 9M 1K:9:512 (17.19 bpw)
2020-08-07 12:40:42 asr2/radeonvii0 Expected maximum carry32: 34DC0000
2020-08-07 12:40:43 asr2/radeonvii0 OpenCL args "-DEXP=162221989u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xc.0ed81c3e4fa9p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.c0880129bf2c8p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-07 12:40:43 asr2/radeonvii0 ASM compilation failed, retrying compilation using NO_ASM
2020-08-07 12:40:49 asr2/radeonvii0 OpenCL compilation in 5.94 s
2020-08-07 12:40:51 asr2/radeonvii0 162221989 OK 73600000 loaded: blockSize 400, 569862b34f411d19
2020-08-07 12:40:51 asr2/radeonvii0 validating proof residues for power 8
2020-08-07 12:41:36 asr2/radeonvii0 Proof using power 8
2020-08-07 12:41:38 asr2/radeonvii0 162221989 OK 73600800  45.37%; 1423 us/it; ETA 1d 11:02; 9ed406ed9cdd5c24 (check 0.97s)
2020-08-07 12:46:21 asr2/radeonvii0 162221989 OK 73800000  45.49%; 1418 us/it; ETA 1d 10:49; 1428cb07c24353c7 (check 0.94s)
Both could be restarted and appear to be working. In some previous instances I've seen it necessary to reboot the system (and maybe even do a total shutdown / cold start, if I recall correctly).
Code:
>ver

Microsoft Windows [Version 10.0.18363.959]

Last fiddled with by kriesel on 2020-08-07 at 19:13
2020-08-08, 14:25   #2
kriesel

B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2 failed repeatably on the 4GB Win10 system on a Radeon VII, near the P1 GCD / beginning of P2. Transplanted to a Windows 7 Pro system with 12GB of system ram and an RX480 running gpuowl 6.11-340, it completed without complaint.

The same 4GB system involved in post 1 completed a 166M P-1 without incident on the same Radeon VII as above, while another Radeon VII on the same system repeatedly choked on 167M P-1. (The 167M is being transplanted to the RX480 also.)

The same system & Radeon VII that choked on 167M had issues with 2 out of 15 P-1 exponents in 99.8M; restarts finished those 2. It's always a failure to allocate ram. It may be that 4GB of system ram is too little to do the GCD.
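For scale, the stage 1 GCD operand itself is not huge; this back-of-envelope sketch (my own arithmetic; the 10x working-space multiplier is a guess, not a measured GMP figure) suggests the trouble is total pressure on 4GB rather than the GCD operand size alone:

```shell
#!/bin/sh
# Back-of-envelope size of one P-1 residue at a ~100M exponent.
# The working-space multiplier for GMP's GCD is a rough guess.
EXP=99873313
OP_MB=$((EXP / 8 / 1048576))          # ~11 MB per operand
echo "operand: $OP_MB MB, guessed GCD working set: $((OP_MB * 10)) MB"
```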

Last fiddled with by kriesel on 2020-08-08 at 14:25
2020-08-24, 17:03   #3
kriesel

Quote:
Originally Posted by kriesel
B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2 failed on the 4GB Win10 system on a Radeon VII, repeatably, near P1 GCD/beginning of P2. Transplanted to a Windows 7 Pro system with 12GB of system ram and an RX480 running gpuowl 6.11-340, it completed without complaint.

The same 4GB system involved in post 1 completed a 166M P-1 without incident on the same Radeon VII as above, while another Radeon VII on the same system repeatedly choked on 167M P-1. (The 167M is being transplanted to the RX480 also.)

The same system & Radeon VII that choked on 167M had issues with 2 out of 15 P-1 exponents in 99.8M; restarts finished those 2. It's always a failure to allocate ram. It may be that 4GB of system ram is too little to do the GCD.
After increasing the system ram from 4GB to 10GB and setting -maxAlloc back to 15000, the problems with P-1 completion went away. However, connecting to its Windows Remote Desktop service then began to crash the system! Switched to TightVNC; no problem.
2020-08-24, 19:10   #4
paulunderwood (Sep 2002, Database er0rr)

My 8GB system with a Radeon VII went to a crawl while running two P-1 stage 2 tasks and uploading a certificate. Eventually it aborted one of the stage 2 tasks and became responsive again. I now have the two instances of gpuOwl running out of sync w.r.t. finishing.
2020-08-24, 20:29   #5
ewmayer (Sep 2002, República de California)

Quote:
Originally Posted by paulunderwood
My 8GB system with a Radeon VII went to a crawl while running two P-1 stage 2 tasks and uploading a certificate. Eventually it aborted one of the stage 2 tasks and became responsive again. I now have the two instances of gpuOwl running out of sync w.r.t. finishing.
Had you used e.g. ' -maxAlloc 7500' to restrict the mem-usage of the 2 jobs?
2020-08-24, 22:17   #6
paulunderwood

Quote:
Originally Posted by ewmayer
Had you used e.g. ' -maxAlloc 7500' to restrict the mem-usage of the 2 jobs?
No, not yet. Thanks for the tip. But I think staggered work will do for now, with no certificate uploading while two stage 2 runs are in progress.
2020-08-24, 22:40   #7
ewmayer

Quote:
Originally Posted by paulunderwood
No, not yet. Thanks for the tip. But I think staggered work will do for now, with no certificate uploading while two stage 2 runs are in progress.
Man, I put in all that work documenting stuff like this in my howto-run-gpuowl-under-Linux post/thread, and it just gets the tl;dr treatment. :)

In my case, with 2 jobs running on each of 6 R7s, worrying about the relative synchronicity of the jobs in each pair is impractical. AFAICT there is no performance hit from P-1 stage 2s being limited to 7.5GB, so my run scripts use that as a hard upper limit. The only tweaking I still need to pay heed to relates to my having grabbed the last ~400 PRP assignments with p < 107M a couple of months ago: the default changeover to the 6M FFT is slightly below that, and I'm forcing 5.5M on all those tests, both to save cycles and to accuracy-test George's recent FFT-twiddles improvements. But I need to pay attention to how many such exponents remain in each of my 12 worktodo.txt files (2 per GPU); as soon as a GPU is working on its last assignment with p < 107M, I need to fiddle the corresponding task entry in my bash runscript to remove the '-fft 5.5M'.
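One way to keep an eye on how many sub-107M entries remain in each worktodo file: a sketch, assuming the standard "PRP=AID,k,b,exponent,c,..." line layout (the sample file here is fabricated for illustration):

```shell
#!/bin/sh
# Count worktodo entries whose exponent is below the ~107M FFT
# changeover. Assumes the usual "PRP=AID,k,b,exponent,c,..." line
# layout; the sample file is fabricated for illustration.
cat > worktodo.sample <<'EOF'
PRP=0123456789ABCDEF,1,2,104987213,-1,77,0
PRP=0123456789ABCDEF,1,2,108123457,-1,77,0
PFactor=N/A,1,2,159000187,-1,78,2
EOF
count=$(awk -F, '/^PRP=/ && $4 + 0 < 107000000 { n++ } END { print n + 0 }' worktodo.sample)
echo "$count PRP entry(ies) below 107M still queued"
```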
2020-08-24, 22:54   #8
paulunderwood

Quote:
Originally Posted by ewmayer
Man, I put in all that work documenting stuff like this in my howto-run-gpuowl-under-Linux post/thread, and it just gets the tl;dr treatment. :)

In my case, with 2 jobs running on each of 6 R7s, worrying about the relative synchronicity of the jobs in each pair is impractical. AFAICT there is no performance hit from P-1 stage 2s being limited to 7.5GB, so my run scripts use that as a hard upper limit. The only tweaking I still need to pay heed to relates to my having grabbed the last ~400 PRP assignments with p < 107M a couple of months ago: the default changeover to the 6M FFT is slightly below that, and I'm forcing 5.5M on all those tests, both to save cycles and to accuracy-test George's recent FFT-twiddles improvements. But I need to pay attention to how many such exponents remain in each of my 12 worktodo.txt files (2 per GPU); as soon as a GPU is working on its last assignment with p < 107M, I need to fiddle the corresponding task entry in my bash runscript to remove the '-fft 5.5M'.
Think of Preda and how much effort he has put into writing a pooling facility. Jeez, 12 worktodo's! I have 2.
Furthermore, I guess you only need one copy of gpuOwl to maintain, and that should reside in a local bin directory. Although when it is updated, that might mean stopping all running instances.

Last fiddled with by paulunderwood on 2020-08-24 at 23:22
2020-08-25, 00:16   #9
ewmayer

Quote:
Originally Posted by paulunderwood
Think of Preda and how much effort he has put into writing a pooling facility. Jeez, 12 worktodo's! I have 2.
When I read his how-to re. the pooling, it struck me as logistically more complex than multiple subdirs ... to each his or her own.

Quote:
Furthermore, I guess you only need one copy of gpuOwl to maintain, and that should reside in a local bin directory. Although when it is updated, that might mean stopping all running instances.
Yes, the single exe resides in the dir above the various run0,1,2,... subdirs, which have their own worktodo files; a single runscript simply cd's into each in turn and fires up an instance. I don't do a pull-and-rebuild every single time there is an update; I prefer major updates, and in cases of major new functionality that is not yet fully debugged, as at present w.r.t. PRP-certs, I prefer to wait and let the dust settle. I figure there are enough old clients of the various codes running, many of which will not be updated anytime soon for one reason or another, that a couple hundred more first-time PRPs in place of ones-with-cert won't kill us.
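A minimal sketch of that layout's runscript (a dry run that just echoes the launch lines rather than executing them; directory names, device numbers, and flags are illustrative, not my actual script):

```shell
#!/bin/sh
# Dry-run sketch: one gpuowl binary in the parent dir, one run<i>
# subdir per instance, each with its own worktodo.txt. Echoes the
# launch lines instead of executing them.
for i in 0 1; do
  cmd="(cd run$i && ../gpuowl -d $i -maxAlloc 7500 >> gpuowl.log 2>&1 &)"
  echo "$cmd"
done
```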

On the few occasions I've needed to kill-and-restart in between boots, 'pidof' has come in handy.
2020-08-25, 01:04   #10
storm5510 (Random Account, Aug 2009, U.S.A.)

I created my own rule of thumb: not to allow more than 75% of the GPU's RAM to be used. Before this, I had some difficulties. I got up one morning to find it had stopped running six hours before. The GPU RAM was still allocated, and I could still scroll the command window up and down. Pressing Ctrl-C produced another output line; a second Ctrl-C stopped it properly. Since I restricted the RAM usage, I have had no further problems.
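Applying that rule of thumb as a -maxAlloc value, assuming -maxAlloc is given in MB as in the launch lines earlier in the thread:

```shell
#!/bin/sh
# The 75%-of-VRAM rule of thumb, expressed as a -maxAlloc value (MB).
VRAM_MB=8192                          # e.g. an 8 GB card
MAXALLOC=$((VRAM_MB * 75 / 100))
echo "-maxAlloc $MAXALLOC"            # prints "-maxAlloc 6144"
```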

My two GPUs, both Nvidia, are probably not very well suited for this kind of work (P-1). I do not much care for TF anymore; there is too much of that being done as it is. I run one instance of gpuOwl on each machine. I feel the screen updates need to be more frequent, but that is a personal preference. Preda has done a good job with this and is to be commended.
2020-09-01, 14:46   #11
kriesel

An example of gpuowl stalling: other gpus on the system continued normally while this one was stalled for ~10 hours, without Select involved. This is at the stage 1 / stage 2 P-1 transition.
Code:
2020-08-31 22:47:07 asr2/radeonvii 554207281 P1  7320000  99.87%; 7036 us/it; ETA 0d 00:01; a42c68110a8e9a52
2020-08-31 22:48:18 asr2/radeonvii saved
2020-08-31 22:48:19 asr2/radeonvii 554207281 P1  7329779 100.00%; 7310 us/it; ETA 0d 00:00; 10294edab635e665
2020-08-31 22:48:21 asr2/radeonvii P-1 (B1=5080000, B2=152400000, D=30030): primes 8217848, expanded 8762377, doubles 1280242 (left 5815206), singles 5657364, total 6937606 (84%)
2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using blocks [169 - 5075] to cover 6937606 primes
2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using 44 buffers of 240.0 MB each
(no further output through 8:33am 2020-09-01)
The first Ctrl-c readily terminated the task. A relaunch (same everything) at 8:34am 9/1 went normally, quickly reaching the stage 1 gcd no-factor result and making progress in stage 2.
Gpuowl initially used a core of cpu, presumably for the stage 1 gcd that had not occurred in the previous 10 hours of elapsed time.
It's now running normally on the same system/gpu: stage 1 gcd completed quickly, P2 requiring ~16.5 minutes per interim status line, with a manually estimated stage 2 completion in 17 hours.
Code:
2020-09-01 08:34:23 gpuowl v6.11-364-g36f4e2a
2020-09-01 08:34:23 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000
2020-09-01 08:34:23 device 1, unique id ''
2020-09-01 08:34:23 asr2/radeonvii 554207281 FFT: 30M 1K:15:1K (17.62 bpw)
2020-09-01 08:34:23 asr2/radeonvii Expected maximum carry32: 8B3E0000
2020-09-01 08:34:27 asr2/radeonvii OpenCL args "-DEXP=554207281u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=15u -DPM1=1 -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x9.b50be91942698p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.e55217d2c5318p-6  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-01 08:34:27 asr2/radeonvii ASM compilation failed, retrying compilation using NO_ASM
2020-09-01 08:34:32 asr2/radeonvii OpenCL compilation in 4.76 s
2020-09-01 08:34:35 asr2/radeonvii 554207281 P1 B1=5080000, B2=152400000; 7329779 bits; starting at 7329778
2020-09-01 08:34:36 asr2/radeonvii 554207281 P1  7329779 100.00%; 1092815 us/it; ETA 0d 00:00; 10294edab635e665
2020-09-01 08:34:39 asr2/radeonvii P-1 (B1=5080000, B2=152400000, D=30030): primes 8217848, expanded 8762377, doubles 1280242 (left 5815206), singles 5657364, total 6937606 (84%)
2020-09-01 08:34:39 asr2/radeonvii 554207281 P2 using blocks [169 - 5075] to cover 6937606 primes
2020-09-01 08:34:39 asr2/radeonvii 554207281 P2 using 44 buffers of 240.0 MB each
2020-09-01 08:41:16 asr2/radeonvii 554207281 P1 GCD: no factor
2020-09-01 08:51:29 asr2/radeonvii 554207281 P2   44/2880: 107208 primes; setup  1.51 s,   9.406 ms/prime
2020-09-01 09:08:01 asr2/radeonvii 554207281 P2   88/2880: 107177 primes; setup  1.42 s,   9.236 ms/prime
2020-09-01 09:24:33 asr2/radeonvii 554207281 P2  132/2880: 107337 primes; setup  1.44 s,   9.234 ms/prime
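Since the other gpus keep producing output while one is stalled, a log-staleness check is one way to catch this sooner than 10 hours. A minimal sketch (uses GNU stat, so a Linux box rather than the Windows system above; the path and threshold are illustrative):

```shell
#!/bin/sh
# Report how long ago a gpuowl log was last written; treat a long
# silence as a probable stall. Uses GNU stat (Linux).
log_age() {
  now=$(date +%s)
  mtime=$(stat -c %Y "$1" 2>/dev/null || echo 0)   # epoch 0 if missing
  echo $((now - mtime))
}
STALE_SECS=3600
age=$(log_age gpuowl.log)
if [ "$age" -gt "$STALE_SECS" ]; then
  echo "probable stall: gpuowl.log last written ${age}s ago"
fi
```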