mersenneforum.org  

Old 2019-11-17, 06:16   #45
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16473₈ Posts

Thanks all, I can now build an executable. -static was the key.

Debugging has been difficult. The Windows driver seems to have a problem with atomic operations or global memory fences.
Old 2019-11-17, 17:00   #46
M344587487
 
"Composite as Heck"
Oct 2017

13·61 Posts

Quote:
Originally Posted by kriesel View Post
What VM software and version provides gpu passthrough? Last time I checked, VirtualBox did not.
If the host OS was Windows, the display TDR issue seems likely to also impact such a guest VM regardless of OS.
I've done a bit of research and it seems GPU passthrough is not easily available on a Windows host. It's available on a Linux host in several ways (and no doubt on BSD), and on VMware's ESXi through vSphere, which again is not Windows. It's on modern Hyper-V with Windows Server but not Windows 10. Shame.
Old 2019-11-17, 17:27   #47
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7·1,069 Posts

Please try this version without using long carry. It seems to work for me. I'll forward the source changes to Mihai for his approval.
Attached Files
File Type: zip gpuowl-win.exe.zip (1.05 MB, 191 views)
Old 2019-11-17, 17:56   #48
xx005fs
 
"Eric"
Jan 2018
USA

2²·53 Posts

Quote:
Originally Posted by Prime95 View Post
Please try this version without using long carry. It seems to work for me. I'll forward the source changes to Mihai for his approval.
Confirmed that short carry works for me too. The speed-up is also present (though long carry has a slight performance regression of about 50us in this version compared to 6.11, which is what I am using). With short carry it is 50us faster than 6.11 long carry, so I guess it's still a plus!
Old 2019-11-17, 18:23   #49
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16473₈ Posts

Quote:
Originally Posted by xx005fs View Post
Confirmed that short carry works for me too. The speed-up is also present (though long carry has a slight performance regression of about 50us in this version compared to 6.11, which is what I am using). With short carry it is 50us faster than 6.11 long carry, so I guess it's still a plus!
Strange. I dropped 150us using short carry. I'm getting 990us on a 5M FFT. XFX radeon vii, 1550 MHz, 950mV, 1100 MHz mclk, 189 watts, temp 73, junction 93.

This was built using the latest source with one change to gpuowl.cl. Apparently I did something non-standard (I copied sources from somewhere rather than using git clone) as the .exe I uploaded does not have version info.

Fixed gpuowl.cl: https://www.dropbox.com/s/bin8vkcthu...gpuowl.cl?dl=0
Awaiting Mihai's review.
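
To compare the timings quoted in this thread, a microseconds-per-iteration drop can be turned into a relative speed-up. A minimal Python sketch, assuming the 150us drop is measured against the long-carry time on the same card (so the long-carry baseline would be 990 + 150 = 1140us); the function name `relative_speedup` is made up for illustration:

```python
def relative_speedup(before_us: float, after_us: float) -> float:
    """Fraction of per-iteration time saved going from before_us to after_us."""
    if before_us <= 0:
        raise ValueError("before_us must be positive")
    return (before_us - after_us) / before_us


# Figures from the post above: 990us with short carry, 150us less than before.
short_carry_us = 990.0
long_carry_us = short_carry_us + 150.0  # assumed long-carry baseline on the same card

saved = relative_speedup(long_carry_us, short_carry_us)
print(f"time saved per iteration: {saved:.1%}")
```

Under that assumption the saving works out to roughly 13% per iteration.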
Old 2019-11-17, 18:35   #50
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·1,697 Posts

Quote:
Originally Posted by Prime95 View Post
Please try this version without using long carry. It seems to work for me.
For PRP, P-1, or both? I've seen both have issues, and P-1 stage 2 seems the more severe.
Old 2019-11-17, 18:58   #51
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·1,697 Posts
Progress, sort of, on XFX Radeon VII

On my XFX Radeon VII on Win10 Pro, multiple gpuowl versions succeed in running a primality check, though with detected errors. But P-1 stage 2 kills v6.11-9 early.
I haven't tried George's patched version yet.


gpuowl V0.5 runs with errors but produces a matching LL DC res64.
Code:
03340000 / 50240549 [6.65%], ms/iter: 1.571, ETA: 0d 20:28; f8cd01b74ea441df error 4.57764e-005 (max 4.57764e-005)
03360000 / 50240549 [6.69%], ms/iter: 1.575, ETA: 0d 20:31; 3b0cf8e4e4ac5ba3 error 0.5 (max 0.5)
Error is too large; retrying
03360000 / 50240549 [6.69%], ms/iter: 1.564, ETA: 0d 20:22; 6f99d09d989db017 error 4.19617e-005 (max 4.57764e-005)

10020000 / 50240549 [19.94%], ms/iter: 1.551, ETA: 0d 17:19; 737cbb7601f4f31c error 0.5 (max 0.5)
Error is too large; retrying
10020000 / 50240549 [19.94%], ms/iter: 1.566, ETA: 0d 17:30; 934860171a3ce275 error 4.19617e-005 (max 4.57764e-005)

12780000 / 50240549 [25.44%], ms/iter: 1.582, ETA: 0d 16:28; 72b7c7384b821d8a error 0.5 (max 0.5)
Error is too large; retrying
12780000 / 50240549 [25.44%], ms/iter: 1.582, ETA: 0d 16:28; 76f11a6325918f85 error 4.19617e-005 (max 4.57764e-005)

12940000 / 50240549 [25.76%], ms/iter: 1.571, ETA: 0d 16:17; 3142cfaa94ace2c4 error 4.19617e-005 (max 4.57764e-005)
12960000 / 50240549 [25.80%], ms/iter: 1.568, ETA: 0d 16:14; a25bf07ea45c4185 error 0.5 (max 0.5)
Error is too large; retrying
12960000 / 50240549 [25.80%], ms/iter: 1.586, ETA: 0d 16:25; 9084b83ebd4baf0a error 4.19617e-005 (max 4.57764e-005)

23920000 / 50240549 [47.61%], ms/iter: 1.569, ETA: 0d 11:28; 97723077c12a402b error 0.5 (max 0.5)
Error is too large; retrying
23920000 / 50240549 [47.61%], ms/iter: 1.585, ETA: 0d 11:35; 670902199a743007 error 4.57764e-005 (max 4.57764e-005)

24100000 / 50240549 [47.97%], ms/iter: 1.580, ETA: 0d 11:28; 440e09ad44656de6 error 0.5 (max 0.5)
Error is too large; retrying
24100000 / 50240549 [47.97%], ms/iter: 1.578, ETA: 0d 11:27; b83eca8f811eba37 error 4.19617e-005 (max 4.57764e-005)

24340000 / 50240549 [48.45%], ms/iter: 1.566, ETA: 0d 11:16; d926485817437b11 error 0.5 (max 0.5)
Error is too large; retrying
24340000 / 50240549 [48.45%], ms/iter: 1.584, ETA: 0d 11:24; d058b40d338a65ff error 4.19617e-005 (max 4.57764e-005)

29040000 / 50240549 [57.80%], ms/iter: 1.578, ETA: 0d 09:17; edbfb2989f739a15 error 0.5 (max 0.5)
Error is too large; retrying
29040000 / 50240549 [57.80%], ms/iter: 1.571, ETA: 0d 09:15; eef8fe6ae92878a3 error 4.19617e-005 (max 4.57764e-005)

31160000 / 50240549 [62.02%], ms/iter: 1.581, ETA: 0d 08:23; 4da16390f5451420 error 0.5 (max 0.5)
Error is too large; retrying
31160000 / 50240549 [62.02%], ms/iter: 1.578, ETA: 0d 08:22; 633691483196cbc1 error 4.19617e-005 (max 4.57764e-005)

36560000 / 50240549 [72.77%], ms/iter: 1.582, ETA: 0d 06:01; 1e2f4254fd266b67 error 0.5 (max 0.5)
Error is too large; retrying
36560000 / 50240549 [72.77%], ms/iter: 1.574, ETA: 0d 05:59; c6254b151b53ab62 error 4.19617e-005 (max 4.95911e-005)

39980000 / 50240549 [79.58%], ms/iter: 2.381, ETA: 0d 06:47; fffffffffffffffe error 0.5 (max 0.5)
Error is too large; retrying
39980000 / 50240549 [79.58%], ms/iter: 2.405, ETA: 0d 06:51; e66878a4d76f79cc error 3.8147e-005 (max 4.95911e-005)

43760000 / 50240549 [87.10%], ms/iter: 1.611, ETA: 0d 02:54; 84d54e2f7e4abae7 error 0.5 (max 0.5)
Error is too large; retrying
43760000 / 50240549 [87.10%], ms/iter: 1.609, ETA: 0d 02:54; 68c1b96e06cdea1a error 4.19617e-005 (max 4.57764e-005)

48200000 / 50240549 [95.94%], ms/iter: 1.538, ETA: 0d 00:52; 8c672fa7465df027 error 4.19617e-005 (max 4.57764e-005)
48220000 / 50240549 [95.98%], ms/iter: 1.533, ETA: 0d 00:52; fffffffffffffffe error 0.382529 (max 0.382529)
Error jump by 835546.38%, doing a consistency check.
48220000 / 50240549 [95.98%], ms/iter: 1.531, ETA: 0d 00:52; 619709852a967226 error 4.19617e-005 (max 4.57764e-005)
Consistency check FAILED, stopping.

12 roundoff error too large; retrying
2 consistency check
res64 matched. https://www.mersenne.org/report_expo...exp_hi=&full=1
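
The tallies above can be recomputed from a saved log rather than counted by hand. A minimal Python sketch, assuming the V0.5 line layout shown in the excerpt; the names `summarize`, `RETRY`, `CHECK_FAILED` and `PROGRESS` are made up for illustration, and treating a logged roundoff error of 0.5 or more as the "bad" case is an assumption read off the log, not taken from the gpuowl source:

```python
import re

RETRY = "Error is too large; retrying"
CHECK_FAILED = "Consistency check FAILED"
# Progress line as printed above, e.g.
# "03360000 / 50240549 [6.69%], ms/iter: 1.575, ETA: 0d 20:31; <res64> error 0.5 (max 0.5)"
PROGRESS = re.compile(
    r"^(\d+) / (\d+) \[.*?\], ms/iter: [\d.]+, ETA: .*?; ([0-9a-f]{16}) error (\S+)"
)


def summarize(log_text: str) -> dict:
    """Count retry and failed-check messages and list iterations with error >= 0.5."""
    retries = 0
    failed_checks = 0
    bad_iterations = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(RETRY):
            retries += 1
        elif line.startswith(CHECK_FAILED):
            failed_checks += 1
        else:
            m = PROGRESS.match(line)
            # Assumption: 0.5 is the roundoff value that triggers a retry.
            if m and float(m.group(4)) >= 0.5:
                bad_iterations.append(int(m.group(1)))
    return {
        "retries": retries,
        "failed_checks": failed_checks,
        "bad_iterations": bad_iterations,
    }


sample = """\
03340000 / 50240549 [6.65%], ms/iter: 1.571, ETA: 0d 20:28; f8cd01b74ea441df error 4.57764e-005 (max 4.57764e-005)
03360000 / 50240549 [6.69%], ms/iter: 1.575, ETA: 0d 20:31; 3b0cf8e4e4ac5ba3 error 0.5 (max 0.5)
Error is too large; retrying
03360000 / 50240549 [6.69%], ms/iter: 1.564, ETA: 0d 20:22; 6f99d09d989db017 error 4.19617e-005 (max 4.57764e-005)
"""
print(summarize(sample))
```

Run over the whole excerpt, the retry count should agree with the 12 in the summary line.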


V0.6 LL DC:
I tried a strategic triple check in gpuowl V0.6, which does LL with the Jacobi check on AMD GPUs in a 4M FFT, so it's good for LL DC up to ~77M, or several years yet. A Radeon VII can knock these out in under a day.
The result I got apparently confirms Ernst's result on 55473541.
(And the server refused it, or rather did not understand it. Presumably because its result output included ,AID: 0 for a TC I could not get an assignment for from the server. James H has been notified and has responded already.)
Code:
02140000 / 55473541 [3.86%], ms/iter: 2.920, ETA: 1d 19:15; 15d065d8c8729d15 roundoff 0.000244141 (max 0.000274658)
02160000 / 55473541 6b207006b08df182 Retry : roundoff 0.5 is too large
02160000 / 55473541 [3.89%], ms/iter: 5.851, ETA: 3d 14:39; 0f92dd48c1784517 roundoff 0.000244141 (max 0.000274658)

02880000 / 55473541 [5.19%], ms/iter: 2.634, ETA: 1d 14:29; 8b67007254d501ef roundoff 0.000244141 (max 0.000274658)
02900000 / 55473541 205b4d16b929d367 Retry : roundoff 0.5 is too large
02900000 / 55473541 [5.23%], ms/iter: 5.285, ETA: 3d 05:11; 623d19d986f3dd05 roundoff 0.000244141 (max 0.000274658)

20380000 / 55473541 [36.74%], ms/iter: 0.873, ETA: 0d 08:31; 0139edea50787064 roundoff 0.000244141 (max 0.000274658)
20400000 / 55473541 0000000000000002 Retry : loop
20400000 / 55473541 [36.77%], ms/iter: 1.775, ETA: 0d 17:17; 28ee13a6ef2ccaf4 roundoff 0.000244141 (max 0.000274658)
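
The ETA column in these lines looks like remaining iterations times ms/iter. A minimal Python sketch of that arithmetic; this is inferred from the log output rather than from the gpuowl source, and `eta_seconds` and `format_eta` are made-up helper names:

```python
def eta_seconds(iteration: int, exponent: int, ms_per_iter: float) -> float:
    """Remaining time, assuming one iteration per unit of the exponent."""
    return (exponent - iteration) * ms_per_iter / 1000.0


def format_eta(seconds: float) -> str:
    """Format as 'Nd HH:MM', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


# Retry line above: iteration 02160000 at 5.851 ms/iter, logged ETA 3d 14:39.
print(format_eta(eta_seconds(2_160_000, 55_473_541, 5.851)))

# Whole test at the faster 0.873 ms/iter seen later in the log.
full_run_hours = eta_seconds(0, 55_473_541, 0.873) / 3600
print(f"{full_run_hours:.1f} hours")
```

Because the logged ms/iter is itself rounded, the recomputed ETA can differ from the logged one by a minute. At 0.873 ms/iter the full run comes to about 13.5 hours, consistent with "under a day".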
V6.11-9 PRP DC:
Caught and corrected two GEC errors and correctly completed a rerun of PRP3 on 82589933.

Code:
V6.11-9 82589933 PRP3 errors on Radeon VII

2019-11-16 01:45:13 82589933    55850000  67.62%;  941 us/sq; ETA 0d 06:59; 0a348af3703b3a13
2019-11-16 01:46:00 82589933    55900000  67.68%;  939 us/sq; ETA 0d 06:58; 0000000000000000
2019-11-16 01:46:46 82589933    55950000  67.74%;  933 us/sq; ETA 0d 06:54; 0000000000000000
2019-11-16 01:47:34 82589933 EE 56000000  67.80%;  934 us/sq; ETA 0d 06:54; 0000000000000000 (check 0.77s)
2019-11-16 01:48:22 82589933    55800000  67.56%;  953 us/sq; ETA 0d 07:06; 8f077051d6fdb58f
2019-11-16 01:49:09 82589933    55850000  67.62%;  941 us/sq; ETA 0d 07:00; 0a348af3703b3a13

2019-11-16 03:44:21 82589933 OK 60500000  73.25%; 1955 us/sq; ETA 0d 12:00; 023994d6bd9e58a8 (check 1.27s) 1 errors
2019-11-16 03:45:59 82589933    60550000  73.31%; 1953 us/sq; ETA 0d 11:57; 88169fb1f9b4b21c
2019-11-16 03:47:36 82589933    60600000  73.37%; 1943 us/sq; ETA 0d 11:52; f2993d91a0316a39
2019-11-16 03:49:13 82589933    60650000  73.44%; 1935 us/sq; ETA 0d 11:48; ab213f1673967b49
2019-11-16 03:50:50 82589933    60700000  73.50%; 1950 us/sq; ETA 0d 11:52; 443b9385d91ecbc4
2019-11-16 03:52:29 82589933 EE 60750000  73.56%; 1954 us/sq; ETA 0d 11:51; 7d72d5ceb8bc4d90 (check 1.29s) 1 errors
2019-11-16 03:54:08 82589933    60550000  73.31%; 1978 us/sq; ETA 0d 12:07; 88169fb1f9b4b21c
2019-11-16 03:55:46 82589933    60600000  73.37%; 1945 us/sq; ETA 0d 11:53; f2993d91a0316a39
2019-11-16 03:57:23 82589933    60650000  73.44%; 1942 us/sq; ETA 0d 11:50; ab213f1673967b49
2019-11-16 03:59:00 82589933    60700000  73.50%; 1953 us/sq; ETA 0d 11:52; 443b9385d91ecbc4
2019-11-16 04:00:39 82589933 OK 60750000  73.56%; 1950 us/sq; ETA 0d 11:50; d3ebe76a016e73a5 (check 1.32s) 2 errors
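
In the excerpt above, each EE line (a failed Gerbicz error check) is followed by progress lines at a lower iteration count, presumably a restart from the last verified state. A minimal Python sketch for pulling those events out of a saved log, assuming the v6.11-9 line layout shown here; `LINE` and `find_rollbacks` are made-up names, not part of gpuowl:

```python
import re

# Layout assumed from the excerpt:
# "<date> <time> <exponent> [OK|EE] <iteration> <pct>%; <us> us/sq; ETA <eta>; <res64> ..."
LINE = re.compile(
    r"^\S+ \S+ (?P<exp>\d+)\s+(?:(?P<status>OK|EE)\s+)?(?P<iter>\d+)\s+[\d.]+%;"
    r"\s+(?P<us>\d+) us/sq; ETA [^;]*; (?P<res64>[0-9a-f]{16})"
)


def find_rollbacks(log_text: str) -> list:
    """Return (iteration_of_EE_line, iteration_of_next_progress_line) pairs."""
    events = []
    pending_ee = None
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        iteration = int(m.group("iter"))
        if pending_ee is not None:
            events.append((pending_ee, iteration))
            pending_ee = None
        if m.group("status") == "EE":
            pending_ee = iteration
    return events


sample = """\
2019-11-16 01:46:46 82589933    55950000  67.74%;  933 us/sq; ETA 0d 06:54; 0000000000000000
2019-11-16 01:47:34 82589933 EE 56000000  67.80%;  934 us/sq; ETA 0d 06:54; 0000000000000000 (check 0.77s)
2019-11-16 01:48:22 82589933    55800000  67.56%;  953 us/sq; ETA 0d 07:06; 8f077051d6fdb58f
"""
print(find_rollbacks(sample))
```

For the three sample lines this pairs the EE at 56000000 with the next logged iteration, 55800000.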
V6.11-9 P-1 stage 2 fails:
Code:
gpuowl v6.11-9 P-1 stage 2 fatal error

2019-11-16 17:44:37 100003037 P1  1190000  99.36%; 3871 us/sq; ETA 0d 00:00; 46553feda7b08932
2019-11-16 17:45:07 100003037 P1  1197722 100.00%; 3893 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-16 17:45:07 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-16 17:45:07 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-16 17:45:08 100003037 P2 using 344 buffers of 44.0 MB each
(crash; restart)
2019-11-16 20:29:43 Note: no config.txt file found
2019-11-16 20:29:43 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long 
2019-11-16 20:29:43 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-16 20:29:43 using long carry kernels
2019-11-16 20:29:43 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-16 20:29:49 OpenCL compilation in 5216 ms
2019-11-16 20:29:51 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-16 20:29:51 100003037 P1  1197722 100.00%; 12091 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-16 20:29:51 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-16 20:29:51 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-16 20:29:52 100003037 P2 using 345 buffers of 44.0 MB each
(crash again)
2019-11-17 00:29:45 Note: no config.txt file found
2019-11-17 00:29:45 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long -maxAlloc 15000 
2019-11-17 00:29:45 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-17 00:29:45 using long carry kernels
2019-11-17 00:29:46 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-17 00:29:51 OpenCL compilation in 5326 ms
2019-11-17 00:29:53 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-17 00:29:53 100003037 P1  1197722 100.00%; 10500 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-17 00:29:54 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-17 00:29:54 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-17 00:29:54 100003037 P2 using 324 buffers of 44.0 MB each
(crash)
2019-11-17 07:47:50 Note: no config.txt file found
2019-11-17 07:47:50 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long -maxAlloc 8000 
2019-11-17 07:47:50 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-17 07:47:50 using long carry kernels
2019-11-17 07:47:50 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-17 07:47:56 OpenCL compilation in 5375 ms
2019-11-17 07:47:57 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-17 07:47:58 100003037 P1  1197722 100.00%; 10455 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-17 07:47:58 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-17 07:47:58 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-17 07:47:58 100003037 P2 using 165 buffers of 44.0 MB each
(crash; 2 Windows TDR events, id 4101 in Windows system event log 07:48:09 and 07:48:34)
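
The "17.34 bits/word" and "44.0 MB" figures in the log can be reproduced from the exponent and FFT size. A minimal Python sketch, assuming each FFT word is stored as an 8-byte double and that "MB" in the log means 2^20 bytes; both are assumptions chosen because they match the logged numbers, and `bits_per_word` and `buffer_mib` are made-up names:

```python
def bits_per_word(exponent: int, fft_kwords: int) -> float:
    """Average exponent bits carried per FFT word (FFT size given in K words)."""
    return exponent / (fft_kwords * 1024)


def buffer_mib(fft_kwords: int, bytes_per_word: int = 8) -> float:
    """Size of one full-length buffer in MiB, assuming 8-byte words."""
    return fft_kwords * 1024 * bytes_per_word / 2**20


print(f"{bits_per_word(100003037, 5632):.2f} bits/word")
print(f"{buffer_mib(5632):.1f} MB per buffer")

# Total stage 2 buffer memory for the buffer counts logged above.
for count in (344, 345, 324, 165):
    print(f"{count} buffers: {count * buffer_mib(5632):.0f} MB")
```

On these assumptions the first two runs ask for about 15,100 MB of buffers on a 16 GB card, and the runs with -maxAlloc 15000 and -maxAlloc 8000 ask for about 14,300 MB and 7,300 MB.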
Old 2019-11-17, 19:04   #52
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7×1,069 Posts

Quote:
Originally Posted by kriesel View Post
For PRP, P-1, or both? I've seen both have issues, and P-1 stage 2 seems the more severe.
I think P-1 uses the same code. If I'm right, P-1 should work too.
Old 2019-11-17, 19:04   #53
xx005fs
 
"Eric"
Jan 2018
USA

2²×53 Posts

Quote:
Originally Posted by Prime95 View Post
Strange. I dropped 150us using short carry. I'm getting 990us on a 5M FFT. XFX radeon vii, 1550 MHz, 950mV, 1100 MHz mclk, 189 watts, temp 73, junction 93.

This was built using the latest source with one change to gpuowl.cl. Apparently I did something non-standard (I copied sources from somewhere rather than using git clone) as the .exe I uploaded does not have version info.

Fixed gpuowl.cl: https://www.dropbox.com/s/bin8vkcthu...gpuowl.cl?dl=0
Awaiting Mihai's review.
I am using a Vega 64, where the speed-up of around 4% amounts to around 100us. Not bad at all.

Another thing worth experimenting with on Windows is tweaking the HBM2 timings. I have found significant uplifts going from looser to tighter timings in mining such as XMR. I am trying to figure out a better timing table for PRP workloads that's hopefully stable. So far I have just applied my XMR mining timings, and I see a 4-5% increase in throughput, similar to switching from long carry to short carry.

Last fiddled with by xx005fs on 2019-11-17 at 19:10
Old 2019-11-17, 19:39   #54
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7·1,069 Posts

Quote:
Originally Posted by xx005fs View Post
Another thing worth experimenting with on Windows is tweaking the HBM2 timings. I have found significant uplifts going from looser to tighter timings in mining such as XMR. I am trying to figure out a better timing table for PRP workloads that's hopefully stable. So far I have just applied my XMR mining timings, and I see a 4-5% increase in throughput, similar to switching from long carry to short carry.
How do you do that? Using Wattman all I can see to change is the memory clock.
Old 2019-11-17, 20:16   #55
philbo0042
 
Oct 2019

1110₂ Posts

Quote:
Originally Posted by Prime95 View Post
Please try this version without using long carry. It seems to work for me. I'll forward the source changes to Mihai for his approval.
Thank you so much for your work in creating this executable. I have spent a lot of time this weekend reading threads and trying to get gpuOwL to work on my new R7, but had no luck. That is really strange since, after some advice from kriesel, I was able to make it work on my laptop fairly easily.

After I used your executable my new setup works great! I wanted to say thanks before I forgot.

All times are UTC. The time now is 23:18.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.