Old 2020-10-29, 16:49   #2553
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×17×139 Posts

Quote:
Originally Posted by moebius View Post
Good idea, but now it's a new link. The table is now in ODF (.ods) format. I hope you can read it well in the browser regarding the resolution.
https://drive.google.com/file/d/1Tim...l57X_RWO0/view
Please use wrap text for the column headings and make the columns narrower to the extent the wrap allows, so it can all be viewed at once without tiny font.
It probably also ought to indicate which version of gpuowl was used for each timing.
Finally, please sort by model.
Old 2020-10-30, 15:45   #2554
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001001110110₂ Posts
AMD Windows driver speed influence

Windows AMD Adrenalin driver difference, Windows 10 Pro x64, XFX Radeon VII and XFX 5700XT
Code:
Radeon VII (power limited to ~1670 MHz GPU clock for temperature control):
Exponent    FFT length       Gpuowl version        us/it PRP          delta, 20.4.2 to 20.10.1
(Mersenne)  (M words)                              20.4.2   20.10.1   us/it     %
642589933   36M 4K:9:512     v6.11-364-g36f4e2a     6864      6944     +80    +1.17
843112609   48M 4K:12:512    v7.0-35-gf06bc5b      10063     10433    +370    +3.68

5700XT (free-running, not power limited):
852348659   48M 4K:12:512    v6.11-364-g36f4e2a    21829     21319    -510    -2.34
This suggests segregating GPU models onto separate systems, so that the Radeon VIIs can stay on the older, faster driver.
I was contemplating that anyway, since with the April driver, running the 5700XT caused enough driver and system instability to deter running it.
I've previously seen as much as a 5% speed penalty from a newer major driver version on older AMD GPUs.

The % deltas are given with excess digits to avoid adding rounding error; they are perhaps significant to one full decimal digit.
Early indications after ~12 hours are that stability is better with 20.10.1; no issues yet.
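
For anyone who wants to redo the delta columns with their own timings, here is a minimal Python sketch. The function name `driver_delta` and the tuple layout are made up for illustration; the numbers are the us/it figures from the table above.

```python
def driver_delta(old_us, new_us):
    """Return (absolute delta in us/it, percent delta relative to the old driver)."""
    delta = new_us - old_us
    return delta, 100.0 * delta / old_us


# (GPU, exponent, us/it on 20.4.2, us/it on 20.10.1), copied from the table
TIMINGS = [
    ("Radeon VII", 642589933, 6864, 6944),
    ("Radeon VII", 843112609, 10063, 10433),
    ("5700XT", 852348659, 21829, 21319),
]

for gpu, exponent, old_us, new_us in TIMINGS:
    delta, percent = driver_delta(old_us, new_us)
    print(f"{gpu:10s} {exponent}  {delta:+5d} us/it  {percent:+.2f} %")
```

The percentage is taken relative to the older driver's timing, so a positive value means the newer driver is slower.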

Last fiddled with by kriesel on 2020-10-30 at 15:57
Old 2020-10-30, 16:57   #2555
moebius
 
 
Jul 2009
Germany

1CA₁₆ Posts

I still use Adrenalin 19.11.3, Windows 10 Pro x64 1909, and v6.11-364-g36f4e2a with an RX Vega 64.
107868373 FFT: 6M 1K:12:256 1775 us/it PRP

Why update if everything is stable?
Old 2020-10-30, 17:14   #2556
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001001110110₂ Posts

Quote:
Originally Posted by moebius View Post
I still use Adrenalin 19.11.3, Windows 10 Pro x64 1909, and v6.11-364-g36f4e2a with an RX Vega 64.
107868373 FFT: 6M 1K:12:256 1775 us/it PRP

Why update if everything is stable?
Back at gpuowl V1.9, a driver update was necessary to get V2.0 to work at all, and as I recall it cost 5.1% in V1.9 performance on the RX480 and RX550 (driver v19.x vs. v18.y).
Old 2020-10-30, 21:50   #2557
moebius
 
 
Jul 2009
Germany

2×229 Posts

Does anyone have a Radeon RX 590 or below to compare? This card should perform reasonably well for a consumer card, as it can do 0.445 TFLOPS FP64 and 7.119 TFLOPS FP32.

Last fiddled with by moebius on 2020-10-30 at 21:56
Old 2020-11-01, 04:42   #2558
Ethan (EO)
 
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2×3²×5 Posts

Quote:
Originally Posted by kracker View Post
As requested... instructions on how to compile on windows (I use msys2.. and also there are probably better ways to do it but it's just how I did it)

1) Download, install and follow the instructions for updating MSYS2 here: https://www.msys2.org/
2) Download and install the AMD APP SDK (make sure you use the 64-bit version) for Windows: https://developer.amd.com/amd-accele...ssing-app-sdk/
3) Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64 to C:\msys64\mingw64\lib and C:\Program Files (x86)\AMD APP SDK\3.0\include to C:\msys64\mingw64\include
4) Install gcc (pacman -S mingw-w64-x86_64-gcc)
5) Download the gpuowl sources and drop them somewhere (/home/username/ is probably easiest)
6) Run MSYS2 from mingw64.exe and cd to the directory you extracted the source to
7) Compile by:
g++ -c gpuowl.cpp
g++ -o gpuowl.exe gpuowl.o -lOpenCL -static
strip gpuowl.exe
I just tried out a few alternative Windows OpenCL SDKs, since AMD's isn't supported anymore, and they all worked great as drop-in replacements with no changes needed to the source or makefile. This was all tested against commit 30b0117f5829ac0b3782e613bad62a88c3a0ea03 of GpuOwl.

1) GPUOpen OpenCL SDK : https://github.com/GPUOpen-Libraries...L-SDK/releases (3.0 tested)

2) Intel OpenCL SDK https://software.intel.com/content/w...pencl-sdk.html (2020 Update 3 tested)

3) NVIDIA OpenCL from the CUDA Toolkit: https://developer.nvidia.com/cuda-do...et_arch=x86_64 (11.1 Update 1 tested)

I ran -prp 1000003 with each build on a 1080 Ti as a quick check, and the residue looked fine. I'm attaching binaries in case anyone wants to test these more thoroughly or on different hardware.
Attached Files
File Type: 7z gpuowl-win-v7.1-7-g30b0117-AltOCLLibs.7z (590.8 KB, 6 views)
Old 2020-11-01, 05:09   #2559
moebius
 
 
Jul 2009
Germany

2·229 Posts

Please read this thread regarding v7.1
https://mersenneforum.org/showthread.php?t=26152
Old 2020-11-01, 09:59   #2560
moebius
 
 
Jul 2009
Germany

458₁₀ Posts
Nvidia GeForce RTX 3080 from a forum user

gpuowl-win.exe -iters 200000 -prp 77936867
2020-11-01 01:30:36 gpuowl v6.11-364-g36f4e2a
2020-11-01 01:30:36 Note: not found 'config.txt'
2020-11-01 01:30:36 config: -iters 200000 -prp 77936867
2020-11-01 01:30:36 device 0, unique id ''
2020-11-01 01:30:36 GeForce RTX 3080-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-11-01 01:30:36 GeForce RTX 3080-0 Expected maximum carry32: 583B0000
2020-11-01 01:30:36 GeForce RTX 3080-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-11-01 01:30:36 GeForce RTX 3080-0

2020-11-01 01:30:36 GeForce RTX 3080-0 OpenCL compilation in 0.01 s
2020-11-01 01:30:37 GeForce RTX 3080-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-11-01 01:30:37 GeForce RTX 3080-0 validating proof residues for power 8
2020-11-01 01:30:37 GeForce RTX 3080-0 Proof using power 8
2020-11-01 01:30:40 GeForce RTX 3080-0 77936867 OK 800 0.00%; 1948 us/it; ETA 1d 18:11; 1579c241dc63eca6 (check 0.84s)
2020-11-01 01:37:16 GeForce RTX 3080-0 Stopping, please wait..
2020-11-01 01:37:17 GeForce RTX 3080-0 77936867 OK 200000 0.26%; 1991 us/it; ETA 1d 18:59; f0b04b45b0855bd2 (check 0.86s)
2020-11-01 01:37:17 GeForce RTX 3080-0 Exiting because "stop requested"
2020-11-01 01:37:17 GeForce RTX 3080-0 Bye
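
As a rough cross-check of the ETA column, the remaining time can be estimated as (exponent minus current iteration) times us/it, since a PRP test needs about as many squarings as the exponent. The sketch below is only that back-of-the-envelope estimate in Python; the helper names are hypothetical and this is not gpuowl's own ETA code, which may smooth or round differently.

```python
def eta_seconds(exponent, iteration, us_per_it):
    """Estimated seconds left, assuming one squaring per remaining iteration."""
    return (exponent - iteration) * us_per_it / 1_000_000


def format_eta(seconds):
    """Render seconds in the 'Nd HH:MM' style used in the log, truncating to minutes."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


# Values taken from the last progress line of the RTX 3080 log above
print(format_eta(eta_seconds(77936867, 200000, 1991)))
```

With the values from the log's progress lines, this estimate lands within a minute or so of the printed ETAs.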
Old 2020-11-01, 13:37   #2561
TheJudger
 
 
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

some Quick&Dirty benchmarks:
  1. A100 PCIe, reported clock rate and power consumption during run: 1215 MHz, 250W:
    Code:
    # ./gpuowl.exe -iters 200000 -prp 77936867
    2020-11-01 14:30:43 gpuowl v6.11-380-g79ea0cc
    2020-11-01 14:30:43 Note: not found 'config.txt'
    2020-11-01 14:30:43 config: -iters 200000 -prp 77936867
    2020-11-01 14:30:43 device 0, unique id ''
    2020-11-01 14:30:43 A100-PCIE-40GB-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
    2020-11-01 14:30:43 A100-PCIE-40GB-0 Expected maximum carry32: 583B0000
    2020-11-01 14:30:44 A100-PCIE-40GB-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.5885a1af9d807p-2 -DIWEIGHT_STEP_MINUS_1=-0x1.01ca19102fbbfp-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
    2020-11-01 14:30:48 A100-PCIE-40GB-0
    
    2020-11-01 14:30:48 A100-PCIE-40GB-0 OpenCL compilation in 3.98 s
    2020-11-01 14:30:49 A100-PCIE-40GB-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
    2020-11-01 14:30:49 A100-PCIE-40GB-0 validating proof residues for power 8
    2020-11-01 14:30:49 A100-PCIE-40GB-0 Proof using power 8
    2020-11-01 14:30:49 A100-PCIE-40GB-0 77936867 OK      800   0.00%;  291 us/it; ETA 0d 06:18; 1579c241dc63eca6 (check 0.22s)
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Stopping, please wait..
    2020-11-01 14:31:49 A100-PCIE-40GB-0 77936867 OK   200000   0.26%;  301 us/it; ETA 0d 06:31; f0b04b45b0855bd2 (check 0.19s)
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Exiting because "stop requested"
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Bye
  2. Quadro RTX 8000, reported clock rate and power consumption during run: 1920 MHz, 200W:
    Code:
    # ./gpuowl.exe -iters 200000 -prp 77936867
    2020-11-01 14:21:08 gpuowl v6.11-380-g79ea0cc
    2020-11-01 14:21:08 Note: not found 'config.txt'
    2020-11-01 14:21:08 config: -iters 200000 -prp 77936867
    2020-11-01 14:21:08 device 0, unique id ''
    2020-11-01 14:21:08 Quadro RTX 8000-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
    2020-11-01 14:21:08 Quadro RTX 8000-0 Expected maximum carry32: 583B0000
    2020-11-01 14:21:09 Quadro RTX 8000-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.5885a1af9d807p-2 -DIWEIGHT_STEP_MINUS_1=-0x1.01ca19102fbbfp-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
    2020-11-01 14:21:11 Quadro RTX 8000-0
    
    2020-11-01 14:21:11 Quadro RTX 8000-0 OpenCL compilation in 1.63 s
    2020-11-01 14:21:11 Quadro RTX 8000-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
    2020-11-01 14:21:11 Quadro RTX 8000-0 validating proof residues for power 8
    2020-11-01 14:21:11 Quadro RTX 8000-0 Proof using power 8
    2020-11-01 14:21:14 Quadro RTX 8000-0 77936867 OK      800   0.00%; 1812 us/it; ETA 1d 15:14; 1579c241dc63eca6 (check 0.77s)
    2020-11-01 14:27:25 Quadro RTX 8000-0 Stopping, please wait..
    2020-11-01 14:27:26 Quadro RTX 8000-0 77936867 OK   200000   0.26%; 1864 us/it; ETA 1d 16:15; f0b04b45b0855bd2 (check 0.80s)
    2020-11-01 14:27:26 Quadro RTX 8000-0 Exiting because "stop requested"
    2020-11-01 14:27:26 Quadro RTX 8000-0 Bye
  3. Geforce RTX 3090, reported clock rate and power consumption during run: 1935 MHz, 320W:
    Code:
    # ./gpuowl.exe -iters 200000 -prp 77936867
    2020-11-01 14:30:27 gpuowl v6.11-380-g79ea0cc
    2020-11-01 14:30:27 Note: not found 'config.txt'
    2020-11-01 14:30:27 config: -iters 200000 -prp 77936867
    2020-11-01 14:30:27 device 0, unique id ''
    2020-11-01 14:30:27 GeForce RTX 3090-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
    2020-11-01 14:30:27 GeForce RTX 3090-0 Expected maximum carry32: 583B0000
    2020-11-01 14:30:27 GeForce RTX 3090-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.5885a1af9d807p-2 -DIWEIGHT_STEP_MINUS_1=-0x1.01ca19102fbbfp-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
    2020-11-01 14:30:29 GeForce RTX 3090-0
    
    2020-11-01 14:30:29 GeForce RTX 3090-0 OpenCL compilation in 1.78 s
    2020-11-01 14:30:30 GeForce RTX 3090-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
    2020-11-01 14:30:30 GeForce RTX 3090-0 validating proof residues for power 8
    2020-11-01 14:30:30 GeForce RTX 3090-0 Proof using power 8
    2020-11-01 14:30:32 GeForce RTX 3090-0 77936867 OK      800   0.00%; 1527 us/it; ETA 1d 09:03; 1579c241dc63eca6 (check 0.66s)
    2020-11-01 14:35:44 GeForce RTX 3090-0 Stopping, please wait..
    2020-11-01 14:35:45 GeForce RTX 3090-0 77936867 OK   200000   0.26%; 1572 us/it; ETA 1d 09:56; f0b04b45b0855bd2 (check 0.68s)
    2020-11-01 14:35:45 GeForce RTX 3090-0 Exiting because "stop requested"
    2020-11-01 14:35:45 GeForce RTX 3090-0 Bye
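
To compare the three runs at a glance, the us/it figure can be pulled out of each final progress line and expressed relative to one card. The Python sketch below does that; the regular expression is an assumption inferred from the log lines in this post rather than a documented gpuowl format, and `parse_progress` is a hypothetical helper name.

```python
import re

# Assumed shape of a progress line, inferred from the logs above:
# "<date> <time> <device label> <exponent> OK <iteration> <percent>%; <n> us/it; ..."
PROGRESS = re.compile(
    r"^(?P<stamp>\S+ \S+) (?P<device>.+?) (?P<exponent>\d+) OK\s+"
    r"(?P<iteration>\d+)\s+(?P<percent>[\d.]+)%;\s+(?P<us_per_it>\d+) us/it;"
)


def parse_progress(line):
    """Return (device, iteration, us/it) for a progress line, or None for other lines."""
    m = PROGRESS.match(line.strip())
    if m is None:
        return None
    return m.group("device"), int(m.group("iteration")), int(m.group("us_per_it"))


# Final progress line of each run, copied from the logs above
LINES = [
    "2020-11-01 14:31:49 A100-PCIE-40GB-0 77936867 OK   200000   0.26%;  301 us/it; ETA 0d 06:31; f0b04b45b0855bd2 (check 0.19s)",
    "2020-11-01 14:27:26 Quadro RTX 8000-0 77936867 OK   200000   0.26%; 1864 us/it; ETA 1d 16:15; f0b04b45b0855bd2 (check 0.80s)",
    "2020-11-01 14:35:45 GeForce RTX 3090-0 77936867 OK   200000   0.26%; 1572 us/it; ETA 1d 09:56; f0b04b45b0855bd2 (check 0.68s)",
]

timings = {}
for line in LINES:
    parsed = parse_progress(line)
    if parsed is not None:
        device, _iteration, us_per_it = parsed
        timings[device] = us_per_it

# Throughput relative to the RTX 3090 (higher means faster)
baseline = timings["GeForce RTX 3090-0"]
relative = {device: baseline / us for device, us in timings.items()}
for device, ratio in relative.items():
    print(f"{device:20s} {timings[device]:5d} us/it  {ratio:.2f}x")
```

Since all three runs use the same exponent and FFT length, the ratio of us/it values is a fair like-for-like comparison for this one workload only.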
Old 2020-11-01, 17:41   #2562
xx005fs
 
"Eric"
Jan 2018
USA

211 Posts

Thank you for the benchmark numbers. Very impressive performance from the A100; it scales almost 1:1 with Volta when comparing their memory bandwidth. I can't imagine the performance if the memory were overclocked.

OTOH the 3090 is honestly a big disappointment: it's slower than a tuned Vega 64 (which draws a lot less power) and not much faster than the Turing-based Quadro RTX 8000. Looking forward to the performance of the 6900 XT for sure, but I highly doubt it'll best the Radeon VII.

Last fiddled with by xx005fs on 2020-11-01 at 17:43
Old 2020-11-01, 18:19   #2563
axn
 
 
Jun 2003

12AC₁₆ Posts

Quote:
Originally Posted by TheJudger View Post
some Quick&Dirty benchmarks:
  1. A100 PCIe, reported clock rate and power consumption during run: 1215 MHz, 250W:
    Code:
    # ./gpuowl.exe -iters 200000 -prp 77936867
    2020-11-01 14:30:43 gpuowl v6.11-380-g79ea0cc
    2020-11-01 14:30:43 Note: not found 'config.txt'
    2020-11-01 14:30:43 config: -iters 200000 -prp 77936867
    2020-11-01 14:30:43 device 0, unique id ''
    2020-11-01 14:30:43 A100-PCIE-40GB-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
    2020-11-01 14:30:43 A100-PCIE-40GB-0 Expected maximum carry32: 583B0000
    2020-11-01 14:30:44 A100-PCIE-40GB-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.5885a1af9d807p-2 -DIWEIGHT_STEP_MINUS_1=-0x1.01ca19102fbbfp-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
    2020-11-01 14:30:48 A100-PCIE-40GB-0
    
    2020-11-01 14:30:48 A100-PCIE-40GB-0 OpenCL compilation in 3.98 s
    2020-11-01 14:30:49 A100-PCIE-40GB-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
    2020-11-01 14:30:49 A100-PCIE-40GB-0 validating proof residues for power 8
    2020-11-01 14:30:49 A100-PCIE-40GB-0 Proof using power 8
    2020-11-01 14:30:49 A100-PCIE-40GB-0 77936867 OK      800   0.00%;  291 us/it; ETA 0d 06:18; 1579c241dc63eca6 (check 0.22s)
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Stopping, please wait..
    2020-11-01 14:31:49 A100-PCIE-40GB-0 77936867 OK   200000   0.26%;  301 us/it; ETA 0d 06:31; f0b04b45b0855bd2 (check 0.19s)
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Exiting because "stop requested"
    2020-11-01 14:31:49 A100-PCIE-40GB-0 Bye
Holy ****. That's fast. That's like, what, 750 GHz-days/day? EDIT: Nope, more like 900!


Quote:
Originally Posted by xx005fs View Post
Looking forward to the performance of 6900xt for sure but I highly doubt it'll best the Radeon VII.
I'm hoping that it will achieve 90%+ of the Radeon VII's performance. Of course, at $999 it is still too expensive, but the 6800 and 6800XT might be good value. All pure speculation currently, obviously.

Last fiddled with by axn on 2020-11-01 at 18:22

All times are UTC. The time now is 08:51.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.