mersenneforum.org  

Old 2020-02-06, 19:28   #1827
ewmayer
2ω=0
 
Sep 2002
República de California

5×7×331 Posts

Quote:
Originally Posted by kriesel View Post
CUDALucas still has its place;
faster on a few gpu models than gpuowl;
will run on older NVIDIA gpus that are entirely incapable of running gpuowl because they don't support the OpenCL level gpuowl requires;
relatively current gpuowl versions don't do LL so can't do LLDC (although v0.5 and v0.6 gpuowl can with 4M fft)
It would be great if CUDALucas had the Jacobi check.
Ha, ha, re. your note about running on older nVidia GPUs -- in preparation for my recent upgrade of my deskside Haswell system to put a Radeon 7 in the PCI3 slot, I first removed an ancient gtx430 card from the PCI2 slot. Mike/Xyzzy had gifted me that ~5 years ago to play with CUDA development work. I actually got as far as working TF code, but never did get the sieving stuff optimized for the nVidia architecture, so it was spending way more time there than it should; overall speed was about 1/2 that of mfaktc.

Anyhow, I still have the card and could re-install it in PCI2. That would leave a mere 1" gap between it and the underbelly fan array of the R7, so I would need to make sure it wasn't significantly impeding airflow to the latter. Just curious - do you have any sense how fast - and I use the term very loosely :) - this card would be at LL and TF? Probably not worth it on a work-per-watt basis, but the experiment might be useful in terms of seeing whether *some* kind of GPU - e.g. a newer used model of one of the ones known to be good choices for GIMPS work - could go into PCI2 without hurting the R7 throughput.

(A second R7 is not an option: even if it could be adapted to go into PCI2, my power supply has only two 8-pin power connectors, and the current R7 uses them both.)
Old 2020-02-06, 19:36   #1828
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001011101111₂ Posts

Quote:
Originally Posted by ewmayer View Post
I first removed an ancient gtx430 ... do you have any sense how fast - and I use the term very loosely :) - this card would be at LL and TF? Probably not worth it on a work-per-watt basis
Probably slower than any of these:
https://www.mersenne.ca/mfaktc.php (enter gtx 4 in the model search box)
https://www.mersenne.ca/cudalucas.php (ditto)

If you want a firmer answer, find the NVIDIA spec pages for the GTX430 and one or more of the models listed on Heinrich's pages, and compute an estimate by proportion. Since my test GTX480 could not run gpuowl, the GTX430's chances are slim. It would look pretty weak compared to a Radeon VII. Or in mfaktc, compared to a GTX 1xxx or RTX.

Try it on. Send James some benchmarks.

Understandable if you don't regard its space-heater value as sufficient to run for long, in California.

Last fiddled with by kriesel on 2020-02-06 at 19:43
Old 2020-02-09, 21:56   #1829
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37·131 Posts
gpuowl-win build 6.11-142-gf54af2e

Stumbled across the new commit while bringing up a new Colab account. Haven't run it other than -h. I defer to Mihai as to what this offers beyond v6.11-134.
It still eats a whole cpu core on Windows 10 during P-1 with the -yield option included, at least for a while after startup.
Attached Files
File Type: zip gpu-win-v6.11-142-gf54af2e.zip (642.5 KB, 63 views)
File Type: txt gpuowl-build-log.txt (6.1 KB, 59 views)

Last fiddled with by kriesel on 2020-02-09 at 22:06
Old 2020-02-10, 00:10   #1830
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001011101111₂ Posts
10M test p-1 reliably missed known factor

The first run of the 10M exponent was with optimized -use flags on gpuowl-win v6.11-134. After that it was just -use NO_ASM. Different -use, different bounds, never found the known factor although all the bounds tried should have been adequate. The 50M test p-1 would not run with the numerous -use options previously in place for a different fft length, so those options were temporarily removed.
The same Radeon VII GPU has run error-free in PRP GEC since a clock reduction on Dec 15 2019 (1250 MHz GPU, 880 MHz memory; runs P-1 at 70-100 W; hot spot currently 71 °C).
It also missed the known factor of 4444091, the known factors of 2000081, and that of 15000031.
Everything I tried below 20M failed.

Code:
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:17:56 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000}
{"exponent":"50001781", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:31:40 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":2883584, "B1":100000, "B2":5000000, "factors":["4392938042637898431087689"]}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:33:03 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000}
{"exponent":"24000577", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:43:41 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1310720, "B1":300000, "factors":["13504596665207"]}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:46:42 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":40000, "B2":600000}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:48:38 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":120000, "B2":2200000}
{"exponent":"4444091", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:02:00 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":229376, "B1":15015, "B2":90000}
{"exponent":"61012769", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:05:19 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":3670016, "B1":20000, "B2":2000000, "factors":["2018028590362685212673"]}
{"exponent":"2000081", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:19:58 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "B2":300300}
{"exponent":"2000081", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:24:03 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "B2":30030}
{"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:55:47 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}
{"exponent":"20000023", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 01:05:52 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1179648, "B1":240000, "factors":["60100040564410724460091241"]}
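A list like this gets hard to eyeball once there are several runs per exponent. As a rough sketch (not part of gpuowl), the result lines can be grouped by exponent with a few lines of Python. The helper names `summarize` and `never_found` are made up for illustration, and the sample records are trimmed copies of lines from the log above, keeping only the fields the sketch reads.

```python
import json
from collections import defaultdict

def summarize(lines):
    """Group gpuowl result lines by exponent and collect the statuses seen."""
    by_exponent = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        rec = json.loads(line)
        by_exponent[int(rec["exponent"])].append(rec["status"])
    return dict(by_exponent)

def never_found(summary):
    """Exponents for which no run reported status 'F' (factor found)."""
    return sorted(p for p, statuses in summary.items() if "F" not in statuses)

# Trimmed copies of records from the log above (assumption: other fields are not needed here).
sample = [
    '{"exponent":"10000831", "worktype":"PM1", "status":"NF", "B1":30000, "B2":500000}',
    '{"exponent":"50001781", "worktype":"PM1", "status":"F", "B1":100000, "B2":5000000}',
    '{"exponent":"10000831", "worktype":"PM1", "status":"NF", "B1":40000, "B2":600000}',
    '{"exponent":"4444091", "worktype":"PM1", "status":"NF", "B1":15015, "B2":90000}',
]

print(never_found(summarize(sample)))
```

Feeding it the full results file instead of `sample` would list every exponent that never produced a factor in any run.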

Last fiddled with by kriesel on 2020-02-10 at 01:08
Old 2020-02-10, 08:22   #1831
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37×131 Posts

Quote:
Originally Posted by preda View Post
ROCm exposes a per-GPU unique_id, e.g.:

Code:
cat /sys/class/drm/card0/device/unique_id 
3044212172dc768c
This id is a property of the GPU itself, and does not depend on the system or PCIe slot. So moving a GPU to a different slot, or to a different system, preserves the UID.

I added a way to specify the GPU to run on by using this unique id:
./gpuowl -uid 3044212172dc768c

This can be used instead of -device (-d), which specifies the device by position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping PCIe slots.

Combining -uid with -cpu allows associating a stable symbolic name with an actual GPU.
The Windows driver does not support this, yielding an empty id:
Code:
2020-02-09 19:00:02 roa/radeonvii Bye
2020-02-09 19:00:06 config: -device 1 -user kriesel -cpu roa/radeonvii -yield -maxAlloc 16000 -use NO_ASM
2020-02-09 19:00:06 device 1, unique id ''
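For the Linux/ROCm side described in the quote, here is a minimal sketch of listing the unique ids of all cards so they can be matched to -uid values. It only assumes the sysfs path shown in the quote (/sys/class/drm/cardN/device/unique_id); the function name `list_unique_ids` is made up for illustration, and the sketch does nothing useful on Windows, where the id comes back empty as shown above.

```python
from pathlib import Path

def list_unique_ids(drm_root="/sys/class/drm"):
    """Map card name -> unique_id string, or None if the card exposes no such file."""
    result = {}
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        if "-" in card.name:
            continue  # skip connector entries such as card0-DP-1
        id_file = card / "device" / "unique_id"
        try:
            result[card.name] = id_file.read_text().strip()
        except OSError:
            result[card.name] = None  # file missing or unreadable
    return result
```

Calling `list_unique_ids()` with no arguments reads the real sysfs tree; passing another directory is only there to make the sketch testable.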
Old 2020-02-10, 09:50   #1832
preda
 
"Mihai Preda"
Apr 2015

2·23·29 Posts

Thank you for the bug report! Investigating...

The first failed case, 10000831, is fixed by using "-use ORIG_SLOWTRIG". Could you please check whether there are any failures with -use ORIG_SLOWTRIG?

In parallel we'll be looking for a better fix.

A faster way to repro the problem is e.g. "gpuowl -prp 10000831", which fails GEC.

Note: if re-running the P-1s, be sure to delete the savefiles from the previous runs (or run in a new location).

Last fiddled with by preda on 2020-02-10 at 10:24
Old 2020-02-10, 11:08   #1833
preda
 
"Mihai Preda"
Apr 2015

2·23·29 Posts

@Ken An attempted fix is in the most recent commit https://github.com/preda/gpuowl/comm...653c12ef4f417c, which is supposed to fix the issues without requiring -use ORIG_SLOWTRIG.

Last fiddled with by preda on 2020-02-10 at 11:09
Old 2020-02-10, 19:23   #1834
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37·131 Posts

Quote:
Originally Posted by preda View Post
@Ken An attempted fix is in the most recent commit https://github.com/preda/gpuowl/comm...653c12ef4f417c, which is supposed to fix the issues without requiring -use ORIG_SLOWTRIG.
Thanks for the quick actions on this.
The issue extends to at least 19M.

Code:
{"exponent":"18000137", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 17:58:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":220000, "B2":5060000}
{"exponent":"19000013", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 19:03:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":230000, "B2":9520000}
Will try a new commit later.
Old 2020-02-10, 20:32   #1835
preda
 
"Mihai Preda"
Apr 2015

2·23·29 Posts

Hi Ken, running with -use ORIG_SLOWTRIG I find factors (in stage1 already) for both 18000137 and 19000013. Do you still see a problem with -use ORIG_SLOWTRIG?
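Whether a factor should turn up in stage 1 or only in stage 2 can be estimated from the factor itself. A factor f of 2^p - 1 has the form f = 2kp + 1, and in the usual simplified model of P-1, stage 1 finds f when every prime power in k is at most B1, while stage 2 can pick up one leftover prime up to B2. The sketch below implements that textbook model only; it is not gpuowl's code, and the names `prime_factors` and `pm1_stage` are made up. For the 19000013 factor 4674003199 reported elsewhere in this thread, k = 123 = 3·41, which is consistent with a stage 1 hit.

```python
from collections import Counter

def prime_factors(n):
    """Trial-division factorization; fine for small k, slow for large ones."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def pm1_stage(p, f, b1, b2):
    """Return 1 or 2 for the stage expected to find f, or 0 if the bounds look too small.

    Simplified model (assumption): stage 1 covers prime powers <= B1,
    stage 2 covers a single extra prime in (B1, B2].
    """
    if (f - 1) % (2 * p) != 0:
        raise ValueError("f is not of the form 2kp+1")
    k = (f - 1) // (2 * p)
    counts = Counter(prime_factors(k))
    too_big = [(q, e) for q, e in counts.items() if q ** e > b1]
    if not too_big:
        return 1
    if len(too_big) == 1:
        q, e = too_big[0]
        if e == 1 and q <= b2:
            return 2
    return 0

print(pm1_stage(19000013, 4674003199, 230000, 9520000))
```

Trial division is only practical while k is small or smooth; for the larger factors in this thread a proper factoring tool would be the better choice.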

Old 2020-02-10, 21:43   #1836
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001011101111₂ Posts

Quote:
Originally Posted by preda View Post
Hi Ken, running with -use ORIG_SLOWTRIG I find factors (in stage1 already) for both 18000137 and 19000013. Do you still see a problem with -use ORIG_SLOWTRIG?
Win10 x64, gpuowl v6.11-134
In config.txt:
Code:
-device 1 -user kriesel -cpu roa/radeonvii -yield -maxAlloc 16000 -use NO_ASM,ORIG_SLOWTRIG
In worktodo:
Code:
B1=15015,B2=300300;PFactor=0,1,2,2000081,-1,61,2
B1=3000;PFactor=0,1,2,4444091,-1,64,2
B1=30000,B2=500000;PFactor=0,1,2,10000831,-1,70,2
B1=180000,B2=3780000;PFactor=0,1,2,15000031,-1,66,2
B1=220000,B2=5060000;PFactor=0,1,2,18000137,-1,35,2
B1=230000,B2=9520000;PFactor=0,1,2,19000013,-1,53,2
Results (15M missed factor):
Code:
{"exponent":"2000081", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:25:34 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "factors":["2700109974025273"]}
{"exponent":"4444091", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:25:47 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":229376, "B1":15015, "factors":["1809798096458971047321927127"]}
{"exponent":"10000831", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:26:15 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000, "factors":["646560662529991467527"]}
{"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:30:07 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}
{"exponent":"18000137", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:32:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":220000, "factors":["2479169845866581244380961527"]}
{"exponent":"19000013", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:35:57 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":230000, "factors":["4674003199"]}
I don't know why, but the 768k fft for 15M took 444us/it in P1 while the 1024k for 18M, 19M took 310us/it in P1. I reran the 15M with a 3980000 B2 but it still missed.
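Any reported factor can be sanity-checked independently of gpuowl: f divides 2^p - 1 exactly when 2^p ≡ 1 (mod f), which Python's built-in three-argument `pow` computes quickly. This is a minimal sketch; the helper name `divides_mersenne` is made up, and the exponent/factor pairs are copied from the results above.

```python
def divides_mersenne(p, f):
    """True if f divides 2**p - 1, i.e. 2**p is congruent to 1 modulo f."""
    return pow(2, p, f) == 1

# Exponent -> factor pairs copied from the result lines above.
reported = {
    2000081: 2700109974025273,
    10000831: 646560662529991467527,
    19000013: 4674003199,
}

for p, f in reported.items():
    print(p, f, divides_mersenne(p, f))
```

The check says nothing about why a run missed a factor; it only confirms that a number printed in a result line really is a divisor.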

Last fiddled with by kriesel on 2020-02-10 at 21:52
Old 2020-02-11, 14:14   #1837
mrh
 
"mrh"
Oct 2018
Temecula, ca

3²·7 Posts

I ran

Code:
B1=180000,B2=3780000;PFactor=0,1,2,15000031,-1,66,2
with a freshly checked-out master, with and without ORIG_SLOWTRIG, as well as with an old version (pre new sin/cos code), and none of those found a factor. So there may be some other issue. I'll keep checking.
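When re-running a batch of these, it can help to pull the bounds and exponent out of each worktodo line programmatically. The sketch below is based only on the line shapes seen in this thread (an optional "B1=...,B2=...;" prefix followed by "PFactor=" and comma-separated fields). Treating the fourth field as the exponent is an inference from matching those lines against the results, not a statement of the official format, and `parse_pfactor` is a made-up name.

```python
import re

# Optional "B1=..,B2=..;" prefix, then "PFactor=" and the comma-separated fields.
LINE_RE = re.compile(r"^(?:(?P<bounds>[^;]*);)?PFactor=(?P<fields>.+)$")

def parse_pfactor(line):
    """Extract exponent and optional B1/B2 from a PFactor worktodo line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("not a PFactor line: %r" % line)
    bounds = {}
    if m.group("bounds"):
        for item in m.group("bounds").split(","):
            key, value = item.split("=")
            bounds[key.strip()] = int(value)
    fields = m.group("fields").split(",")
    return {
        "exponent": int(fields[3]),  # assumption: fourth field is the exponent
        "B1": bounds.get("B1"),
        "B2": bounds.get("B2"),
        "raw_fields": fields,
    }

print(parse_pfactor("B1=180000,B2=3780000;PFactor=0,1,2,15000031,-1,66,2"))
```

A line without a B2 entry simply yields `None` for that bound, leaving the program's own default behaviour unspecified here.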

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.