mersenneforum.org  

Old 2020-02-11, 23:11   #1838
kriesel
 
P-1 needs more error checks built in

All-zero residues in P-1 stage 1, and it marches blindly on.
Worktodo line:
Code:
B1=1040000,B2=28080000;PFactor=0,1,2,99998441,-1,77,2
Config.txt:
Code:
-device 0 -user kriesel -cpu condorella/rx480 -use NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
Code:
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-02-11 16:35:15 gpuowl v6.11-134-g1e0ce1d
2020-02-11 16:35:15 config: -device 0 -user kriesel -cpu condorella/rx480 -use NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-02-11 16:35:15 config:
2020-02-11 16:35:15 config: 4.5m fft NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
2020-02-11 16:35:15 config: :5m fft  NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-02-11 16:35:15 device 0, unique id ''
2020-02-11 16:35:15 condorella/rx480 99998441 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2020-02-11 16:35:17 condorella/rx480 OpenCL args "-DEXP=99998441u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a5a9d5baf7a18p-3 -DIWEIGHT_STEP=0xa.1ef1eeb123f08p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DCARRY32=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNEW_SLOWTRIG=1 -DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_WIDTH=1 -DUNROLL_HEIGHT=1 -DUNROLL_WIDTH=1 -DWORKINGIN1=1 -DWORKINGOUT1=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-11 16:35:20 condorella/rx480 OpenCL compilation in 3.11 s
2020-02-11 16:35:20 condorella/rx480 99998441 P1 B1=1040000, B2=28080000; 1500153 bits; starting at 0
2020-02-11 16:35:58 condorella/rx480 99998441 P1    10000   0.67%; 3761 us/it; ETA 0d 01:33; 0000000000000000
2020-02-11 16:36:36 condorella/rx480 99998441 P1    20000   1.33%; 3768 us/it; ETA 0d 01:33; 0000000000000000
2020-02-11 16:37:13 condorella/rx480 99998441 P1    30000   2.00%; 3767 us/it; ETA 0d 01:32; 0000000000000000
2020-02-11 16:37:51 condorella/rx480 99998441 P1    40000   2.67%; 3769 us/it; ETA 0d 01:32; 0000000000000000
2020-02-11 16:38:29 condorella/rx480 99998441 P1    50000   3.33%; 3767 us/it; ETA 0d 01:31; 0000000000000000
2020-02-11 16:39:06 condorella/rx480 99998441 P1    60000   4.00%; 3771 us/it; ETA 0d 01:31; 0000000000000000
2020-02-11 16:39:44 condorella/rx480 99998441 P1    70000   4.67%; 3764 us/it; ETA 0d 01:30; 0000000000000000
2020-02-11 16:40:21 condorella/rx480 saved
2020-02-11 16:40:22 condorella/rx480 99998441 P1    80000   5.33%; 3781 us/it; ETA 0d 01:30; 0000000000000000
2020-02-11 16:41:00 condorella/rx480 99998441 P1    90000   6.00%; 3768 us/it; ETA 0d 01:29; 0000000000000000
2020-02-11 16:41:37 condorella/rx480 99998441 P1   100000   6.67%; 3768 us/it; ETA 0d 01:28; 0000000000000000
2020-02-11 16:42:15 condorella/rx480 99998441 P1   110000   7.33%; 3765 us/it; ETA 0d 01:27; 0000000000000000
2020-02-11 16:42:53 condorella/rx480 99998441 P1   120000   8.00%; 3768 us/it; ETA 0d 01:27; 0000000000000000
2020-02-11 16:43:30 condorella/rx480 99998441 P1   130000   8.67%; 3763 us/it; ETA 0d 01:26; 0000000000000000
2020-02-11 16:44:08 condorella/rx480 99998441 P1   140000   9.33%; 3770 us/it; ETA 0d 01:25; 0000000000000000
2020-02-11 16:44:46 condorella/rx480 99998441 P1   150000  10.00%; 3766 us/it; ETA 0d 01:25; 0000000000000000
2020-02-11 16:45:21 condorella/rx480 saved
2020-02-11 16:45:23 condorella/rx480 99998441 P1   160000  10.67%; 3784 us/it; ETA 0d 01:25; 0000000000000000
2020-02-11 16:46:01 condorella/rx480 99998441 P1   170000  11.33%; 3769 us/it; ETA 0d 01:24; 0000000000000000
2020-02-11 16:46:39 condorella/rx480 99998441 P1   180000  12.00%; 3765 us/it; ETA 0d 01:23; 0000000000000000
2020-02-11 16:47:16 condorella/rx480 99998441 P1   190000  12.67%; 3771 us/it; ETA 0d 01:22; 0000000000000000
2020-02-11 16:47:54 condorella/rx480 99998441 P1   200000  13.33%; 3767 us/it; ETA 0d 01:22; 0000000000000000
2020-02-11 16:48:32 condorella/rx480 99998441 P1   210000  14.00%; 3769 us/it; ETA 0d 01:21; 0000000000000000
2020-02-11 16:49:10 condorella/rx480 99998441 P1   220000  14.67%; 3768 us/it; ETA 0d 01:20; 0000000000000000
2020-02-11 16:49:47 condorella/rx480 99998441 P1   230000  15.33%; 3769 us/it; ETA 0d 01:20; 0000000000000000
2020-02-11 16:50:22 condorella/rx480 saved
2020-02-11 16:50:25 condorella/rx480 99998441 P1   240000  16.00%; 3779 us/it; ETA 0d 01:19; 0000000000000000
2020-02-11 16:51:03 condorella/rx480 99998441 P1   250000  16.66%; 3764 us/it; ETA 0d 01:18; 0000000000000000
2020-02-11 16:51:40 condorella/rx480 99998441 P1   260000  17.33%; 3767 us/it; ETA 0d 01:18; 0000000000000000
2020-02-11 16:52:18 condorella/rx480 99998441 P1   270000  18.00%; 3766 us/it; ETA 0d 01:17; 0000000000000000
2020-02-11 16:52:56 condorella/rx480 99998441 P1   280000  18.66%; 3770 us/it; ETA 0d 01:17; 0000000000000000
2020-02-11 16:53:33 condorella/rx480 99998441 P1   290000  19.33%; 3769 us/it; ETA 0d 01:16; 0000000000000000
2020-02-11 16:54:11 condorella/rx480 99998441 P1   300000  20.00%; 3769 us/it; ETA 0d 01:15; 0000000000000000
2020-02-11 16:54:49 condorella/rx480 99998441 P1   310000  20.66%; 3764 us/it; ETA 0d 01:15; 0000000000000000
2020-02-11 16:55:22 condorella/rx480 saved
2020-02-11 16:55:27 condorella/rx480 99998441 P1   320000  21.33%; 3780 us/it; ETA 0d 01:14; 0000000000000000
2020-02-11 16:56:04 condorella/rx480 99998441 P1   330000  22.00%; 3765 us/it; ETA 0d 01:13; 0000000000000000
2020-02-11 16:56:42 condorella/rx480 99998441 P1   340000  22.66%; 3759 us/it; ETA 0d 01:13; 0000000000000000
2020-02-11 16:57:19 condorella/rx480 99998441 P1   350000  23.33%; 3770 us/it; ETA 0d 01:12; 0000000000000000
2020-02-11 16:57:57 condorella/rx480 99998441 P1   360000  24.00%; 3767 us/it; ETA 0d 01:12; 0000000000000000
2020-02-11 16:58:35 condorella/rx480 99998441 P1   370000  24.66%; 3767 us/it; ETA 0d 01:11; 0000000000000000
Stopped the run, renamed the intermediate files out of the way, reduced the -use option to just NO_ASM, and it now seems to be running OK.
Code:
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-02-11 17:07:30 gpuowl v6.11-134-g1e0ce1d
2020-02-11 17:07:30 config: -device 0 -user kriesel -cpu condorella/rx480 -use NO_ASM
2020-02-11 17:07:30 config:
2020-02-11 17:07:30 config: :,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-02-11 17:07:30 config:
2020-02-11 17:07:30 config: :4.5m fft NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
2020-02-11 17:07:30 config: :5m fft  NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-02-11 17:07:30 device 0, unique id ''
2020-02-11 17:07:30 condorella/rx480 99998441 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2020-02-11 17:07:32 condorella/rx480 OpenCL args "-DEXP=99998441u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a5a9d5baf7a18p-3 -DIWEIGHT_STEP=0xa.1ef1eeb123f08p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-11 17:07:35 condorella/rx480 OpenCL compilation in 3.20 s
2020-02-11 17:07:36 condorella/rx480 99998441 P1 B1=1040000, B2=28080000; 1500153 bits; starting at 0
2020-02-11 17:08:14 condorella/rx480 99998441 P1    10000   0.67%; 3887 us/it; ETA 0d 01:37; 5412ff3dd7337b62
2020-02-11 17:08:53 condorella/rx480 99998441 P1    20000   1.33%; 3897 us/it; ETA 0d 01:36; 67401cf04590fe9e
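For illustration, the kind of check being asked for is cheap: the 64-bit residue gpuowl already prints at each progress report is enough to notice this failure mode. A rough sketch (in Python rather than gpuowl's C++; the function name and threshold are illustrative, not gpuowl's actual code):
Code:
# Hypothetical sketch of a res64 sanity check; names and thresholds are illustrative.
def residue_looks_bad(res64_history, repeats=2):
    """Flag a run whose reported 64-bit residues are all zero or stuck."""
    if len(res64_history) < repeats:
        return False
    recent = res64_history[-repeats:]
    # An all-zero (or exactly repeated) residue is astronomically unlikely
    # in a healthy P-1 stage 1, so treat it as an error and stop or restart.
    return all(r == 0 for r in recent) or len(set(recent)) == 1

assert residue_looks_bad([0x0000000000000000, 0x0000000000000000])      # the bad run above
assert not residue_looks_bad([0x5412ff3dd7337b62, 0x67401cf04590fe9e])  # the healthy rerun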

Old 2020-02-12, 21:58   #1839
kriesel
 

The latest gpuowl commit is still missing the factor of the 15M exponent. This was on Google Colaboratory, for what is hopefully a very reliable GPU and underlying system.
Code:
{"exponent":"2000081", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:35:23 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":131072, "B1":15015, "factors":["2700109974025273"]}
{"exponent":"4444091", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:35:47 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":229376, "B1":15015, "factors":["1809798096458971047321927127"]}
{"exponent":"10000831", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:40:05 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":524288, "B1":120000, "B2":2200000, "factors":["646560662529991467527"]}
{"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:54:51 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":786432, "B1":180000, "B2":3780000}
{"exponent":"18000137", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:55:29 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":1048576, "B1":15015, "factors":["2479169845866581244380961527"]}
{"exponent":"19000013", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-12 20:56:14 UTC", "user":"kriesel", "computer":"colab3/TeslaP4", "fft-length":1048576, "B1":15015, "factors":["4674003199"]}
CUDAPm1 v0.20 finds it.
Code:
CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 1080
Compatibility       6.1
clockRate (MHz)     1797
memClockRate (MHz)  5005
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         2097152
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 20
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 7968M of 8192M GPU memory free.
Using threads: norm1 256, mult 256, norm2 1024.
No stage 2 checkpoint.
Using up to 4992M GPU memory.
Selected B1=2360000, B2=59590000, 3.83% chance of finding a factor
Using B1 = 2360000 from savefile.
Continuing stage 2 from a partial result of M282000073 fft length = 16384K
batch wrapper reports (re)launch at Wed 02/12/2020 15:13:40.06 reset count 0 of max 3 
CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 1080
Compatibility       6.1
clockRate (MHz)     1797
memClockRate (MHz)  5005
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         2097152
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 20
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 7968M of 8192M GPU memory free.
Index 25
Using threads: norm1 256, mult 32, norm2 64.
Using up to 4137M GPU memory.
Selected B1=275000, B2=8112500, 5.53% chance of finding a factor
Starting stage 1 P-1, M15000031, B1 = 275000, B2 = 8112500, fft length = 800K
Doing 396818 iterations
Iteration 100000 M15000031, 0x7a8e085ca931e223, n = 800K, CUDAPm1 v0.20 err = 0.14453 (1:19 real, 0.7873 ms/iter, ETA 3:53)
Iteration 200000 M15000031, 0xad072f2e5fc4eb76, n = 800K, CUDAPm1 v0.20 err = 0.14844 (1:19 real, 0.7901 ms/iter, ETA 2:35)
Iteration 300000 M15000031, 0x82162c462572c64d, n = 800K, CUDAPm1 v0.20 err = 0.14063 (1:19 real, 0.7920 ms/iter, ETA 1:16)
M15000031, 0xe80933bd37a9f9a9, n = 800K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 5:14
Starting stage 1 gcd.
M15000031 Stage 1 found no factor (P-1, B1=275000, B2=8112500, e=0, n=800K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 275000, b2 = 8112500, d = 2310, e = 12, nrp = 480
Zeros: 348644, Ones: 429436, Pairs: 93347
Processing 1 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 31221, err = 0.14453, (13.64 real, 0.4369 ms/tran,  ETA NA)
Transforms: 229552 M15000031, 0x6760f107920d3922, n = 800K, CUDAPm1 v0.20 err = 0.14844 (1:35 real, 0.4140 ms/tran, ETA 4:36)
Transforms: 218518 M15000031, 0x45ab02ca00c98138, n = 800K, CUDAPm1 v0.20 err = 0.14063 (1:31 real, 0.4142 ms/tran, ETA 3:06)
Transforms: 214190 M15000031, 0x1f7fed9dfd61de18, n = 800K, CUDAPm1 v0.20 err = 0.14844 (1:28 real, 0.4145 ms/tran, ETA 1:37)
Transforms: 235492 M15000031, 0xbff7fea3340e621f, n = 800K, CUDAPm1 v0.20 err = 0.15625 (1:38 real, 0.4160 ms/tran, ETA 0:00)

Stage 2 complete, 928973 transforms, estimated total time = 6:25
Starting stage 2 gcd.
M15000031 has a factor: 1178543237739460982839 (P-1, B1=275000, B2=8112500, e=12, n=800K CUDAPm1 v0.20)
And prime95 v29.8b does too:
Code:
[Wed Feb 12 15:47:41 2020]
P-1 found a factor in stage #2, B1=255000, B2=5737500, E=12.
UID: Kriesel/peregrine, M15000031 has a factor: 1178543237739460982839 (P-1, B1=255000, B2=5737500, E=12)
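For anyone who wants to double-check a reported P-1 factor against the bounds used, the arithmetic is quick: a factor f of M(p) satisfies 2^p ≡ 1 (mod f) and f = 2kp+1, and P-1 finds it when k is B1-smooth apart from at most one prime up to B2. A rough sketch (plain Python; the helper names are mine):
Code:
# Rough sketch for sanity-checking a reported P-1 factor; helper names are mine.
def mersenne_factor_k(p, f):
    """Verify f divides 2^p - 1 and return k, where f = 2*k*p + 1."""
    assert pow(2, p, f) == 1, "f does not divide 2^p - 1"
    assert (f - 1) % (2 * p) == 0, "Mersenne factors have the form 2*k*p + 1"
    return (f - 1) // (2 * p)

def trial_factor(n, limit=10**7):
    """Trial-divide n up to limit; return the found prime factors and the cofactor."""
    factors, d = [], 2
    while d <= limit and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors, n   # a cofactor > 1 is an unfactored (or prime) remainder

# M15000031's factor reported above by CUDAPm1 and prime95:
k = mersenne_factor_k(15000031, 1178543237739460982839)
print(k, trial_factor(k))
# Whether a given B1/B2 should have caught it is read off k's prime factors:
# every prime <= B1, except at most one <= B2.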
Old 2020-02-12, 22:04   #1840
kriesel
 
gpuowl-win build 6.11-145-g6146b6d

Here it is, tested only as far as the help output.
Attached Files
File Type: zip gpuowl-win-v6.11-145-g6146b6d.zip (642.6 KB, 69 views)
File Type: txt build-log.txt (6.1 KB, 56 views)
Old 2020-02-12, 22:21   #1841
kriesel
 
Latest gpuowl commit is crashing on Colab Tesla P4

It ran P-1 on exponents below 20M with -maxAlloc 7500, but crashes on a 20M exponent.
Code:
2020-02-12 22:15:08 gpuowl v6.11-145-g6146b6d-dirty
2020-02-12 22:15:09 config: -user kriesel -cpu colab3/TeslaP4 -yield -maxAlloc 7000 -use NO_ASM
2020-02-12 22:15:09 device 0, unique id ''
2020-02-12 22:15:09 colab3/TeslaP4 20000023 FFT 1152K: Width 8x8, Height 256x4, Middle 9; 16.95 bits/word
2020-02-12 22:15:09 colab3/TeslaP4 OpenCL args "-DEXP=20000023u -DWIDTH=64u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DWEIGHT_STEP=0x1.0840814dcafb8p+0 -DIWEIGHT_STEP=0x1.f002ed51e880ap-1 -DWEIGHT_BIGSTEP=0x1.172b83c7d517bp+0 -DIWEIGHT_BIGSTEP=0x1.d5818dcfba487p-1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-12 22:15:09 colab3/TeslaP4 

2020-02-12 22:15:09 colab3/TeslaP4 OpenCL compilation in 0.01 s
2020-02-12 22:15:09 colab3/TeslaP4 20000023 P1 B1=240000, B2=5760000; 346123 bits; starting at 346122
2020-02-12 22:15:09 colab3/TeslaP4 20000023 P1   346123 100.00%; 43626 us/it; ETA 0d 00:00; 6a2e08e14df5900e
2020-02-12 22:15:09 colab3/TeslaP4 P-1 (B1=240000, B2=5760000, D=30030): primes 376241, expanded 390008, doubles 69210 (left 241927), singles 237821, total 307031 (82%)
2020-02-12 22:15:09 colab3/TeslaP4 20000023 P2 using blocks [8 - 192] to cover 307031 primes
2020-02-12 22:15:09 colab3/TeslaP4 20000023 P2 using 759 buffers of 9.0 MB each
2020-02-12 22:15:21 colab3/TeslaP4 Exception gpu_error: MEM_OBJECT_ALLOCATION_FAILURE clEnqueueCopyBuffer(queue, src, dst, 0, 0, size, 0, NULL, NULL) at clwrap.cpp:344 copyBuf
2020-02-12 22:15:21 colab3/TeslaP4 Bye
That's an 8GB gpu. https://www.mersenneforum.org/showpo...5&postcount=15
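The buffer arithmetic by itself looks like it should fit: at FFT 1152K each stage 2 buffer is 1152*1024 words of 8 bytes, i.e. 9.0 MiB (matching the log), and 759 of them come to about 6.7 GiB, nominally under the 7000 cap. A quick back-of-envelope (Python; the 8-byte word size is an assumption that happens to match the logged 9.0 MB):
Code:
# Back-of-envelope for the failing P2 allocation above.
fft_words = 1152 * 1024              # FFT 1152K
buf_bytes = fft_words * 8            # one stage 2 buffer, assuming 8-byte words
buffers   = 759                      # "using 759 buffers" in the log
total_mib = buffers * buf_bytes / 2**20
print(f"buffer = {buf_bytes / 2**20:.1f} MiB; "
      f"{buffers} buffers = {total_mib:.0f} MiB (~{total_mib / 1024:.2f} GiB)")
# -> 9.0 MiB per buffer, 6831 MiB total: nominally under -maxAlloc 7000,
#    yet the clEnqueueCopyBuffer still failed with MEM_OBJECT_ALLOCATION_FAILURE.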
Old 2020-02-13, 07:21   #1842
preda
 

Try with a smaller -maxAlloc. Can you check the free memory on the GPU -- how much is free before and during the gpuowl run?

Quote:
Originally Posted by kriesel
It ran P-1 on exponents below 20M with -maxAlloc 7500, but crashes on a 20M exponent.
Old 2020-02-13, 07:33   #1843
preda
 

Did you try with -use ORIG_SLOWTRIG?

Quote:
Originally Posted by kriesel
Latest gpuowl commit is still missing the 15M

Old 2020-02-13, 08:23   #1844
kriesel
 

Quote:
Originally Posted by preda
Try with a smaller -maxAlloc. Can you check the free memory on the GPU -- how much is free before and during the gpuowl run?
Other exponents ran OK with -maxAlloc 7500, allocating as much as 7272MB into P2 buffers. The way I run Colab, I normally can't check the free GPU RAM during a run, and the time window for doing so during a run that crashes this quickly in P2 is small. It appears from nvidia-smi output at session start that, since Colab GPUs are on headless Linux VMs, the initially occupied GPU RAM is 0.
T4 0/15079 MiB
P100 0/16280 MiB
P4 ?
K80 ?
Getting those last two as well is a matter of waiting to hit them in the Colab GPU-model lottery. Models are listed above in probability order, most frequent recently first.
I've added logging of idle and active nvidia-smi output to Google Drive files in the Colab script. Colab "screen" output to the browser is lost when a new session is launched, the page is closed, or the data scrolls out of the 5000-line buffer. Based on my recent experience with my first Colab accounts, it could take ~5 weeks to get all four models allocated. Perhaps it will be quicker on this newer account.

Old 2020-02-13, 08:46   #1845
preda
 

OK, I checked myself: the problem with M15000031 is that by default it gets too small an FFT size for P-1 (it's at the border). If the FFT size is manually increased, the factor is found. I'll keep an eye on improving the default FFT size.
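For reference, the border is visible directly in the bits-per-word: gpuowl chose the 768K FFT for M15000031 (the "fft-length":786432 in the JSON above), while CUDAPm1 used 800K. A quick calculation (Python; the "near the limit" reading is a rule of thumb, not gpuowl's exact threshold):
Code:
# Bits per FFT word for M15000031 at the two FFT sizes seen above.
p = 15000031
for label, fft_len in [("768K (gpuowl default)", 768 * 1024),
                       ("800K (CUDAPm1)", 800 * 1024)]:
    print(f"{label}: {p / fft_len:.2f} bits/word")
# -> about 19.07 vs 18.31 bits/word; ~19 bits/word is right at the edge of what
#    a double-precision FFT of this size handles, which is the "border" above.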


Quote:
Originally Posted by kriesel
Latest gpuowl commit is still missing the 15M
Old 2020-02-13, 09:29   #1846
kriesel
 

Quote:
Originally Posted by preda
Try with a smaller -maxAlloc. Can you check the free memory on the GPU -- how much is free before and during the gpuowl run?
Other exponents ran OK with -maxAlloc 7500, allocating as much as 7272MB into P2 buffers. The way I run Colab, I normally can't check the free GPU RAM during a gpuowl run, and the time window for doing so during a run that crashes this quickly in P2 is small. It appears from nvidia-smi output at session start that, since Colab GPUs are on headless Linux VMs, the initially occupied GPU RAM is 0.

GPU model   Idle            Active
T4          0/15079 MiB     5939 MiB (gpuowl)
P100        0/16280 MiB     293 MiB (mfaktc)
P4          ?               ?
K80         ?               ?

Getting those last two as well is a matter of waiting to hit them in the Colab GPU-model lottery. Models are listed above in probability order, most frequent recently first.
I've added logging of idle and active nvidia-smi output to Google Drive files in the Colab script. Colab "screen" output to the browser is lost when a new session is launched, the page is closed, or the data scrolls out of the 5000-line buffer. Based on my recent experience with my first Colab accounts, it could take ~5 weeks to get all four models allocated. Perhaps it will be quicker on this newer account.
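The logging cell itself is nothing more than a periodic nvidia-smi query appended to a Drive file; a minimal sketch of the idea (Python; the Drive path and interval are placeholders, not the actual script):
Code:
# Minimal sketch: periodically append nvidia-smi memory figures to a Drive file.
import datetime, subprocess, time

LOG = "/content/drive/MyDrive/gpu_mem_log.txt"   # placeholder path

while True:
    snap = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True).stdout.strip()
    with open(LOG, "a") as f:
        f.write(f"{datetime.datetime.utcnow().isoformat()} {snap}\n")
    time.sleep(60)   # once a minute while the Colab cell stays alive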


(Moderator please delete my previous similar post and this line; this post replaces the previous post.)

Old 2020-02-13, 10:20   #1847
preda
 

Ken, I understand it's not easy to get all this information, and maybe it's not even needed. The situation is that I'm not yet convinced that there is a problem with the way GpuOwl handles maxAlloc or buffer allocation. I'm not convinced because I can imagine alternative explanations for the observed behavior: maybe the GPU, even if it is reporting 8GB, does not actually have all of that available. Maybe it has less than 7.5GB actually available to be allocated in contiguous blocks of 9MB (for some reason). Thus, GpuOwl will fail if run with maxAlloc 7.5G, but that's not necessarily a bug in the program.

(all that because OpenCL does not offer a normal/reliable way to query actual free GPU memory)
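What OpenCL does expose is the device's total global memory and the largest single allocation it will accept, not how much is free right now; the "free" number has to come from a vendor tool like nvidia-smi. A small sketch of the distinction (Python, assuming pyopencl is installed):
Code:
# What OpenCL can report about GPU memory (and what it cannot).
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices(device_type=cl.device_type.GPU):
        total     = dev.global_mem_size       # CL_DEVICE_GLOBAL_MEM_SIZE
        max_alloc = dev.max_mem_alloc_size    # CL_DEVICE_MAX_MEM_ALLOC_SIZE
        print(f"{dev.name}: total {total / 2**20:.0f} MiB, "
              f"largest single allocation {max_alloc / 2**20:.0f} MiB")
        # Neither value says how much is actually free at the moment,
        # which is why GpuOwl has to rely on a user-supplied -maxAlloc.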

Quote:
Originally Posted by kriesel
Other exponents ran OK with -maxAlloc 7500, allocating as much as 7272MB into P2 buffers. The way I run Colab, I normally can't check the free GPU RAM during a gpuowl run, and the time window for doing so during a run that crashes this quickly in P2 is small. It appears from nvidia-smi output at session start that, since Colab GPUs are on headless Linux VMs, the initially occupied GPU RAM is 0.

Old 2020-02-13, 11:04   #1848
kriesel
 

Quote:
Originally Posted by preda
Ken, I understand it's not easy to get all this information, and maybe it's not even needed. The situation is that I'm not yet convinced that there is a problem with the way GpuOwl handles maxAlloc or buffer allocation. I'm not convinced because I can imagine alternative explanations for the observed behavior: maybe the GPU, even if it is reporting 8GB, does not actually have all of that available. Maybe it has less than 7.5GB actually available to be allocated in contiguous blocks of 9MB (for some reason). Thus, GpuOwl will fail if run with maxAlloc 7.5G, but that's not necessarily a bug in the program.

(all that because OpenCL does not offer a normal/reliable way to query actual free GPU memory)
Some of the GPU models with ECC have odd actual usable RAM amounts, because ECC is implemented with part of the power-of-2 total RAM complement; a Tesla C2075's nominal 6GB is 5.25GB (5376MB) net, for example.
I have two Colab accounts "instrumented" now to catch idle and active nvidia-smi output, which includes allocated & total MiB of GPU RAM, if the script additions aren't too buggy.
I agree that it seems unlikely to be a maxAlloc problem at a 20M test exponent; a smaller exponent succeeded on the P4 with 7272MB allocated and 7500 maxAlloc, while the 20M failed with maxAlloc set at 7500, 7300, and 7000. But we'll see.

FYI, gtx1080, gpuowl v6.11-134 P-1, stage 2, 443M exponent, 18 buffers x 224MB, nvidia-smi shows 7527/8192MiB active, vs 107 idle.
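That snapshot makes the same point from the other side: the explicit P2 buffers account for roughly 4 GiB, yet the process holds over 7.4 GiB, so several GiB of FFT working buffers and other allocations also count against whatever -maxAlloc leaves room for. Rough numbers (Python; treats the logged MB figures as MiB):
Code:
# Rough accounting for the gtx1080 / 443M stage 2 snapshot above.
buffers_mib = 18 * 224          # explicit P2 buffers reported by gpuowl
active_mib  = 7527 - 107        # nvidia-smi active usage minus the idle baseline
other_mib   = active_mib - buffers_mib
print(f"P2 buffers {buffers_mib} MiB, in use {active_mib} MiB, "
      f"other allocations ~{other_mib} MiB")
# -> ~4032 MiB of P2 buffers vs ~7420 MiB in use: roughly 3.3 GiB besides the
#    P2 buffers (FFT working set, precomputed data, driver overhead).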
