mersenneforum.org

kriesel 2020-08-07 18:21

Things that make you go hmm, concerning gpuowl runs
 
This thread is intended for operating oddities reports, which might include outright bugs, but definitely includes mysteries, puzzles, head scratchers.


Seen on a system running Windows 10 Pro, on a Radeon VII that had previously run P-1 successfully with -maxAlloc 15000, before a motherboard failure and replacement. (Actually two failures; the working theory at the moment is that prime95 and mfakto together on an i7-4790 are too much for the cpu VRMs/traces on this motherboard design; the exact same device/location flamed out twice.) The replacement MB has only 4GB of ram and a Celeron G1840, while the original had 16GB and an i7-4790. All boards used on this build are Asrock H81 BTC Pro 2.0, BIOS version 1.20, configured as identically as I could manage. (Another system running 5 TF gpus was built first with the same board model and cpu type and has been reliable so far, without running mfakto on it.)

The -maxAlloc value needed to be reduced; previous experience indicated 12000 was too much, but 8000 ran P-1 ok. Now even that has failed, as follows. And when it fails, it takes the other gpu's run down too.

[CODE]2020-08-07 12:10:03 asr2/radeonvii4 99873313 P1 1450000 96.66%; 851 us/it; ETA 0d 00:01; d8b47a4c90ea4a28
2020-08-07 12:10:12 asr2/radeonvii4 99873313 P1 1460000 97.32%; 852 us/it; ETA 0d 00:01; e599ab00cd9196f2
2020-08-07 12:10:20 asr2/radeonvii4 99873313 P1 1470000 97.99%; 852 us/it; ETA 0d 00:00; 7017d97a0c0522b1
2020-08-07 12:10:29 asr2/radeonvii4 99873313 P1 1480000 98.66%; 852 us/it; ETA 0d 00:00; 75aea69ae7650b45
2020-08-07 12:10:37 asr2/radeonvii4 99873313 P1 1490000 99.32%; 852 us/it; ETA 0d 00:00; cd3cbe8e6a7b07a9
2020-08-07 12:10:46 asr2/radeonvii4 99873313 P1 1500000 99.99%; 852 us/it; ETA 0d 00:00; 63073d1296f10b36
2020-08-07 12:10:46 asr2/radeonvii4 saved
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P1 1500153 100.00%; 3878 us/it; ETA 0d 00:00; 598a6b10499394d7
2020-08-07 12:10:47 asr2/radeonvii4 P-1 (B1=1040000, B2=28080000, D=30030): primes 1664694, expanded 1752244, doubles 281399 (left 1126189), singles 1101896, total 1383295 (83%)
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P2 using blocks [35 - 935] to cover 1383295 primes
2020-08-07 12:10:47 asr2/radeonvii4 99873313 P2 using 163 buffers of 44.0 MB each
GNU MP: Cannot allocate memory (size=430096)

>gpuowl-win
2020-08-07 12:40:14 gpuowl v6.11-364-g36f4e2a
2020-08-07 12:40:14 config: -user kriesel -cpu asr2/radeonvii4 -d 1 -use NO_ASM -maxAlloc 8000
2020-08-07 12:40:14 device 1, unique id ''
2020-08-07 12:40:14 asr2/radeonvii4 99873313 FFT: 5.50M 1K:11:256 (17.32 bpw)
2020-08-07 12:40:14 asr2/radeonvii4 Expected maximum carry32: 2BE10000
2020-08-07 12:40:15 asr2/radeonvii4 OpenCL args "-DEXP=99873313u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=1 -DAMDGPU=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x9.ad71e29311eb8p-4 -DIWEIGHT_STEP_MINUS_1=-0xc.0f74fd9784338p-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-07 12:40:20 asr2/radeonvii4 OpenCL compilation in 4.08 s
2020-08-07 12:40:20 asr2/radeonvii4 99873313 P1 B1=1040000, B2=28080000; 1500153 bits; starting at 1500152
2020-08-07 12:40:20 asr2/radeonvii4 99873313 P1 1500153 100.00%; 205759 us/it; ETA 0d 00:00; 598a6b10499394d7
2020-08-07 12:40:21 asr2/radeonvii4 P-1 (B1=1040000, B2=28080000, D=30030): primes 1664694, expanded 1752244, doubles 281399 (left 1126189), singles 1101896, total 1383295 (83%)
2020-08-07 12:40:21 asr2/radeonvii4 99873313 P2 using blocks [35 - 935] to cover 1383295 primes
2020-08-07 12:40:21 asr2/radeonvii4 99873313 P2 using 163 buffers of 44.0 MB each
2020-08-07 12:41:13 asr2/radeonvii4 99873313 P1 GCD: no factor
2020-08-07 12:41:48 asr2/radeonvii4 99873313 P2 163/2880: 79045 primes; setup 0.88 s, 1.093 ms/prime[/CODE]
Example of the other Radeon VII gpu on the system getting derailed:

[CODE]2020-08-07 12:05:31 asr2/radeonvii0 162221989 OK 73400000 45.25%; 1416 us/it; ETA 1d 10:56; 55d5d6561a915117 (check 0.95s)
2020-08-07 12:10:15 asr2/radeonvii0 162221989 OK 73600000 45.37%; 1418 us/it; ETA 1d 10:55; 569862b34f411d19 (check 0.94s)
2020-08-07 12:14:59 asr2/radeonvii0 Exception St9bad_alloc: std::bad_alloc
2020-08-07 12:14:59 asr2/radeonvii0 Bye
>gpuowl-win
2020-08-07 12:40:42 gpuowl v6.11-364-g36f4e2a
2020-08-07 12:40:42 config: -user kriesel -cpu asr2/radeonvii0 -d 0 -maxAlloc 8000 -proof 8
2020-08-07 12:40:42 device 0, unique id ''
2020-08-07 12:40:42 asr2/radeonvii0 worktodo.txt line ignored: ";B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2"
2020-08-07 12:40:42 asr2/radeonvii0 162221989 FFT: 9M 1K:9:512 (17.19 bpw)
2020-08-07 12:40:42 asr2/radeonvii0 Expected maximum carry32: 34DC0000
2020-08-07 12:40:43 asr2/radeonvii0 OpenCL args "-DEXP=162221989u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xc.0ed81c3e4fa9p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.c0880129bf2c8p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-07 12:40:43 asr2/radeonvii0 ASM compilation failed, retrying compilation using NO_ASM
2020-08-07 12:40:49 asr2/radeonvii0 OpenCL compilation in 5.94 s
2020-08-07 12:40:51 asr2/radeonvii0 162221989 OK 73600000 loaded: blockSize 400, 569862b34f411d19
2020-08-07 12:40:51 asr2/radeonvii0 validating proof residues for power 8
2020-08-07 12:41:36 asr2/radeonvii0 Proof using power 8
2020-08-07 12:41:38 asr2/radeonvii0 162221989 OK 73600800 45.37%; 1423 us/it; ETA 1d 11:02; 9ed406ed9cdd5c24 (check 0.97s)
2020-08-07 12:46:21 asr2/radeonvii0 162221989 OK 73800000 45.49%; 1418 us/it; ETA 1d 10:49; 1428cb07c24353c7 (check 0.94s)[/CODE]
Both could be restarted and appear to be working. In some previous instances I've found it necessary to reboot the system (and maybe even do a total shutdown / cold start, if I recall correctly).
[CODE]>ver

Microsoft Windows [Version 10.0.18363.959][/CODE]

kriesel 2020-08-08 14:25

B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2 failed on the 4GB Win10 system on a Radeon VII, repeatably, near P1 GCD/beginning of P2. Transplanted to a Windows 7 Pro system with 12GB of system ram and an RX480 running gpuowl 6.11-340, it completed without complaint.

The same 4GB system involved in post 1 completed a 166M P-1 without incident on the same Radeon VII as above, while another Radeon VII on the same system repeatedly choked on 167M P-1. (The 167M is being transplanted to the RX480 also.)

The same system and Radeon VII that choked on 167M had issues with 2 out of 15 P-1 exponents at 99.8M, and restarts finished those 2. It's always a failure to allocate ram. It may be that 4GB of system ram is too little to do the GCD.

kriesel 2020-08-24 17:03

[QUOTE=kriesel;552912]B1=1570000,B2=43960000;PFactor=0,1,2,159000187,-1,78,2 failed on the 4GB Win10 system on a Radeon VII, repeatably, near P1 GCD/beginning of P2. Transplanted to a Windows 7 Pro system with 12GB of system ram and an RX480 running gpuowl 6.11-340, it completed without complaint.

The same 4GB system involved in post 1 completed a 166M P-1 without incident on the same Radeon VII as above, while another Radeon VII on the same system repeatedly choked on 167M P-1. (The 167M is being transplanted to the RX480 also.)

The same system and Radeon VII that choked on 167M had issues with 2 out of 15 P-1 exponents at 99.8M, and restarts finished those 2. It's always a failure to allocate ram. It may be that 4GB of system ram is too little to do the GCD.[/QUOTE]
Increased the system ram from 4GB to 10GB and set -maxAlloc back to 15000; the problems with P-1 completion went away. However, connecting to its Windows Remote Desktop service began to crash the system! Switched to TightVNC; no problem.

paulunderwood 2020-08-24 19:10

My 8GB system with a Radeon VII slowed to a crawl while running two P-1 stage 2 jobs and uploading a certificate. Eventually it aborted one of the stage 2 tasks and became responsive again. I now have the two instances of gpuOwl running out of sync w.r.t. finishing.

ewmayer 2020-08-24 20:29

[QUOTE=paulunderwood;554848]My 8GB system with a Radeon VII went to a crawl while running two stage 2 P-1 and uploading a certificate. Eventually it aborted one of the stage 2 tasks and became responsive again. I now have the two instances of gpuOwl running out of sync w.r.t. finishing.[/QUOTE]

Had you used e.g. ' -maxAlloc 7500' to restrict the mem-usage of the 2 jobs?

paulunderwood 2020-08-24 22:17

[QUOTE=ewmayer;554857]Had you used e.g. ' -maxAlloc 7500' to restrict the mem-usage of the 2 jobs?[/QUOTE]

No, not yet. Thanks for the tip. But I think staggered work will do for now, with no certificate uploading while two stage 2 jobs are running.

ewmayer 2020-08-24 22:40

[QUOTE=paulunderwood;554859]No, not yet. Thanks for the tip. But I think staggered work will do for now, with no certificate uploading while two stage 2 jobs are running.[/QUOTE]

Man, I put in all that work to document stuff like this on my howto-run-gpuowl-under-Linux post/thread, and it just gets the tl;dr treatment. :)

In my case, with 2 jobs running on each of 6 R7s, worrying about the relative synchronicity of the jobs in each pair is impractical. AFAICT there is no performance hit from p-1 stage 2s being limited to 7.5GB, so my run scripts use that as a hard upper limit. The only tweaking I still need to pay heed to is related to my having grabbed the last ~400 PRP assignments with p < 107m a couple of months ago ... the default changeover to 6M FFT is slightly below that; I'm forcing 5.5M on all those tests, both to save cycles and to accuracy-test George's recent FFT-twiddles accuracy improvements. But I need to pay attention to how many such expos remain in each of my 12 worktodo.txt files (2 per GPU): as soon as each GPU is working on its last assignment with p < 107m, I need to fiddle the corresponding task entry in my bash runscript to remove the '-fft 5.5M'.

paulunderwood 2020-08-24 22:54

[QUOTE=ewmayer;554863]Man, I put in all that work to document stuff like this on my howto-run-gpuowl-under-Linux post/thread, and it just gets the tl;dr treatment. :)

In my case, with 2 jobs running on each of 6 R7s, worrying about the relative synchronicity of the jobs in each pair is impractical. AFAICT there is no performance hit from p-1 stage 2s being limited to 7.5GB, so my run scripts use that as a hard upper limit. The only tweaking I still need to pay heed to is related to my having grabbed the last ~400 PRP assignments with p < 107m a couple of months ago ... the default changeover to 6M FFT is slightly below that; I'm forcing 5.5M on all those tests, both to save cycles and to accuracy-test George's recent FFT-twiddles accuracy improvements. But I need to pay attention to how many such expos remain in each of my 12 worktodo.txt files (2 per GPU): as soon as each GPU is working on its last assignment with p < 107m, I need to fiddle the corresponding task entry in my bash runscript to remove the '-fft 5.5M'.[/QUOTE]
Think of Preda and how much effort he has put into writing a pooling facility. Jeez, 12 worktodo's :smile: I have 2.
Furthermore, I guess you only need one copy of gpuOwl to maintain and that should reside in a local bin file. Although when it is updated it might mean stopping all running instances.

ewmayer 2020-08-25 00:16

[QUOTE=paulunderwood;554864]Think of Preda and how much effort he has put into writing a pooling facility. Jeez, 12 worktodo's :smile: I have 2.[/QUOTE]
When I read his how-to re. the pooling, it struck me as logistically more complex than multiple subdirs ... to each his or her own.

[QUOTE]Furthermore, I guess you only need one copy of gpuOwl to maintain and that should reside in a local bin file. Although when it is updated it might mean stopping all running instances.[/QUOTE]

Yes, the single exe resides in the dir above the various run0,1,2,... subdirs, which have their own worktodo files; a single runscript simply cd's into each in turn and fires up an instance. I don't do pull-and-rebuild every single time there is an update; I prefer major updates, and in major-new-functionality-not-yet-fully-debugged cases, as at present w.r.t. PRP certs, prefer to wait and let the dust settle. I figure there are enough old clients of the various codes running, many of which will not be updated anytime soon for one reason or another, that a couple hundred more first-time PRPs in place of ones-with-cert won't kill us.
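
For anyone wanting to try the same layout, here is a rough Python sketch of what such a runscript does (the real one is a bash script and is not shown in this thread). The flags -d, -maxAlloc and -fft are the ones that appear in this thread; the executable name `gpuowl`, the `run0`, `run1`, ... directory names, and the mapping of run i to device i // 2 are assumptions for illustration only.

```python
import subprocess
from pathlib import Path


def build_command(exe, device, max_alloc_mb=7500, fft=None):
    """Build the argument list for one instance (flags as seen in this thread)."""
    cmd = [str(exe), "-d", str(device), "-maxAlloc", str(max_alloc_mb)]
    if fft is not None:
        cmd += ["-fft", fft]  # e.g. "5.5M" to force the smaller FFT
    return cmd


def launch_all(base, n_gpus, jobs_per_gpu=2, fft_by_run=None, dry_run=True):
    """Start one instance per runN subdir; each subdir has its own worktodo.txt.

    Assumption: run i uses device i // jobs_per_gpu. With dry_run=True nothing
    is started and the planned (directory, command) pairs are just returned.
    """
    base = Path(base)
    exe = base / "gpuowl"  # hypothetical name of the single shared executable
    fft_by_run = fft_by_run or {}
    planned = []
    for i in range(n_gpus * jobs_per_gpu):
        run_dir = base / f"run{i}"
        cmd = build_command(exe, i // jobs_per_gpu, fft=fft_by_run.get(i))
        if not dry_run:
            subprocess.Popen(cmd, cwd=run_dir)  # run from inside the subdir
        planned.append((run_dir, cmd))
    return planned


if __name__ == "__main__":
    for run_dir, cmd in launch_all(".", n_gpus=6, fft_by_run={0: "5.5M"}):
        print(run_dir, " ".join(cmd))
```

Dropping the '-fft 5.5M' for a given run then amounts to removing that run's entry from `fft_by_run`.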

On the few occasions I've needed to kill-and-restart in between boots, 'pidof' has come in handy.

storm5510 2020-08-25 01:04

I created my own rule-of-thumb not to allow more than 75% of the GPU's RAM to be used. Before this, I had some difficulties. I got up one morning to find it had stopped running six hours before. The GPU RAM was still allocated, and I could scroll the command window up and down. Pressing Ctrl-C produced another output line. A second Ctrl-C stopped it properly. Since I restricted the RAM usage, I have had no further problems.

My two GPUs, both Nvidia, are probably not very well suited for this kind of work (P-1). I do not much care for TF anymore. There is too much of that being done as it is. I run one instance of [I]gpuOwl[/I] on each machine. I feel the screen updates need to be more frequent. That is a personal preference. [I]Preda[/I] has done a good job with this and is to be commended.
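
As a small worked example of that rule of thumb, the sketch below turns a card's RAM size into a -maxAlloc value. It assumes -maxAlloc is given in MB (consistent with '-maxAlloc 7500' being described as a 7.5GB limit earlier in this thread) and treats 1 GB as 1024 MB; the function name and the card sizes are only illustrative.

```python
def max_alloc_mb(gpu_ram_gb, fraction=0.75):
    """Return a -maxAlloc value in MB that caps usage at `fraction` of GPU RAM.

    Assumption: 1 GB is counted as 1024 MB; adjust if you prefer 1000.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(gpu_ram_gb * 1024 * fraction)


print(max_alloc_mb(16))  # 12288, e.g. for a 16 GB card
print(max_alloc_mb(8))   # 6144
```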

kriesel 2020-09-01 14:46

An example of gpuowl stalling; other gpus on the system are continuing normally while this one has been stalled for ~10 hours, without Select involved. This is at the stage 1 / stage 2 P-1 transition.
[CODE]2020-08-31 22:47:07 asr2/radeonvii 554207281 P1 7320000 99.87%; 7036 us/it; ETA 0d 00:01; a42c68110a8e9a52
2020-08-31 22:48:18 asr2/radeonvii saved
2020-08-31 22:48:19 asr2/radeonvii 554207281 P1 7329779 100.00%; 7310 us/it; ETA 0d 00:00; 10294edab635e665
2020-08-31 22:48:21 asr2/radeonvii P-1 (B1=5080000, B2=152400000, D=30030): primes 8217848, expanded 8762377, doubles 1280242 (left 5815206), singles 5657364, total 6937606 (84%)
2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using blocks [169 - 5075] to cover 6937606 primes
2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using 44 buffers of 240.0 MB each
[/CODE](no further output through 8:33am 2020-09-01)
The first Ctrl-C readily terminated the task. Relaunch (same everything) at 8:34am 9/1 went normally and quickly reached the stage 1 gcd "no factor" result, followed by progress in stage 2.
Gpuowl is initially using a core of cpu, presumably for the stage 1 gcd that did not occur previously in 10 hours of elapsed time.
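
A stall like this could be caught automatically by watching the timestamps at the start of each log line. The following is only a sketch under stated assumptions: it relies on the `YYYY-MM-DD HH:MM:SS` prefix seen in the output above, and the 30-minute limit is an arbitrary choice that would need to exceed the normal gap between status lines for the exponent being run.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # prefix of each gpuowl output line above


def last_timestamp(lines):
    """Return the latest parseable timestamp, or None if no line has one."""
    latest = None
    for line in lines:
        try:
            latest = datetime.strptime(line[:19], STAMP_FORMAT)
        except ValueError:
            continue  # e.g. "GNU MP: Cannot allocate memory" has no timestamp
    return latest


def is_stalled(lines, now, limit=timedelta(minutes=30)):
    """True if the most recent output is older than `limit` (arbitrary default)."""
    last = last_timestamp(lines)
    return last is not None and now - last > limit


# Last two lines printed before the stall reported above.
stalled_log = [
    "2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using blocks [169 - 5075] to cover 6937606 primes",
    "2020-08-31 22:48:22 asr2/radeonvii 554207281 P2 using 44 buffers of 240.0 MB each",
]
print(is_stalled(stalled_log, now=datetime(2020, 9, 1, 8, 33)))    # True
print(is_stalled(stalled_log, now=datetime(2020, 8, 31, 22, 50)))  # False
```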
It's now running normally on same system/gpu, stage 1 gcd completed quickly, P2 requiring ~16.5 minutes per interim status line, manually estimated stage 2 completion 17 hours.[CODE]2020-09-01 08:34:23 gpuowl v6.11-364-g36f4e2a
2020-09-01 08:34:23 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000
2020-09-01 08:34:23 device 1, unique id ''
2020-09-01 08:34:23 asr2/radeonvii 554207281 FFT: 30M 1K:15:1K (17.62 bpw)
2020-09-01 08:34:23 asr2/radeonvii Expected maximum carry32: 8B3E0000
2020-09-01 08:34:27 asr2/radeonvii OpenCL args "-DEXP=554207281u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=15u -DPM1=1 -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x9.b50be91942698p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.e55217d2c5318p-6 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-01 08:34:27 asr2/radeonvii ASM compilation failed, retrying compilation using NO_ASM
2020-09-01 08:34:32 asr2/radeonvii OpenCL compilation in 4.76 s
2020-09-01 08:34:35 asr2/radeonvii 554207281 P1 B1=5080000, B2=152400000; 7329779 bits; starting at 7329778
2020-09-01 08:34:36 asr2/radeonvii 554207281 P1 7329779 100.00%; 1092815 us/it; ETA 0d 00:00; 10294edab635e665
2020-09-01 08:34:39 asr2/radeonvii P-1 (B1=5080000, B2=152400000, D=30030): primes 8217848, expanded 8762377, doubles 1280242 (left 5815206), singles 5657364, total 6937606 (84%)
2020-09-01 08:34:39 asr2/radeonvii 554207281 P2 using blocks [169 - 5075] to cover 6937606 primes
2020-09-01 08:34:39 asr2/radeonvii 554207281 P2 using 44 buffers of 240.0 MB each
2020-09-01 08:41:16 asr2/radeonvii 554207281 P1 GCD: no factor
2020-09-01 08:51:29 asr2/radeonvii 554207281 P2 44/2880: 107208 primes; setup 1.51 s, 9.406 ms/prime
2020-09-01 09:08:01 asr2/radeonvii 554207281 P2 88/2880: 107177 primes; setup 1.42 s, 9.236 ms/prime
2020-09-01 09:24:33 asr2/radeonvii 554207281 P2 132/2880: 107337 primes; setup 1.44 s, 9.234 ms/prime[/CODE]
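
A quick arithmetic check on the "P2 using N buffers of X MB each" lines in this thread. The sketch totals the stage 2 buffers for the two runs shown and compares each total with the -maxAlloc value from the corresponding config line. It also notes that the reported buffer sizes match the FFT size times 8 bytes; that is only an observation from these two logs, not a statement about how gpuowl sizes its buffers, and the function names are hypothetical.

```python
def p2_total_mb(n_buffers, buffer_mb):
    """Total size in MB of the stage 2 buffers reported in the log line."""
    return n_buffers * buffer_mb


def buffer_mb_from_fft(fft_m, bytes_per_word=8):
    """Buffer size in MB if each FFT word takes 8 bytes (assumption from the logs)."""
    return fft_m * bytes_per_word


# (exponent, FFT size in M, buffers, MB per buffer, -maxAlloc) from the logs above
runs = [
    (99873313, 5.5, 163, 44.0, 8000),
    (554207281, 30, 44, 240.0, 15000),
]
for exponent, fft_m, n_buffers, buffer_mb, max_alloc in runs:
    total = p2_total_mb(n_buffers, buffer_mb)
    print(exponent, total, "MB of", max_alloc, "allowed;",
          "buffer matches FFT*8:", buffer_mb_from_fft(fft_m) == buffer_mb)
```

Both totals (7172 MB and 10560 MB) sit under the configured -maxAlloc.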

