mersenneforum.org  

Old 2022-04-19, 15:56   #1
Magellan3s
 
Mar 2022

61 Posts
Default GPU Owl 3080ti Benchmarks

GPU is an EVGA 3080ti FTW3
OS is Linux 20.04.4

Slight GPU overclock @
+200 MHz, +1000 MHz memory




Code:
jesus@Magellan:~/gpuowl-6$ ./gpuowl -prp 57885161 -iters 30000
2022-04-19 10:51:02 gpuowl 
2022-04-19 10:51:02 config: -user Magallan3s -cpu Magellan -maxAlloc 10500M -yield
2022-04-19 10:51:02 config: -prp 57885161 -iters 30000 
2022-04-19 10:51:02 device 0, unique id ''
2022-04-19 10:51:02 Magellan 57885161 FFT: 3M 1K:6:256 (18.40 bpw)
2022-04-19 10:51:02 Magellan Expected maximum carry32: 42500000
2022-04-19 10:51:02 Magellan OpenCL args "-DEXP=57885161u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.07673850f37p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.5bd9e39e14a3dp-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-04-19 10:51:02 Magellan 

2022-04-19 10:51:02 Magellan OpenCL compilation in 0.00 s
2022-04-19 10:51:03 Magellan 57885161 OK    30000 loaded: blockSize 400, fe1565094c7f7b47
2022-04-19 10:51:03 Magellan validating proof residues for power 8
2022-04-19 10:51:03 Magellan Proof using power 8
2022-04-19 10:51:04 Magellan 57885161 OK    30800   0.05%; 1194 us/it; ETA 0d 19:11; 4f153add2832ca8a (check 0.50s)
2022-04-19 10:51:38 Magellan Stopping, please wait..
2022-04-19 10:51:38 Magellan 57885161 OK    60000   0.10%; 1148 us/it; ETA 0d 18:27; 175901ec29adfa87 (check 0.47s)
2022-04-19 10:51:39 Magellan Exiting because "stop requested"
2022-04-19 10:51:39 Magellan Bye
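
As a quick sanity check on the "(18.40 bpw)" figure in the log above, here is a minimal sketch of the bits-per-word arithmetic. It assumes that "3M" means 3 × 2^20 FFT words and that bpw is simply the exponent divided by that word count; the function name is made up for illustration and is not part of gpuowl.

```python
def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return exponent / fft_words


M = 1024 * 1024  # assumption: "3M" in the log means 3 * 2**20 words

# Exponent and FFT size taken from the log line "FFT: 3M 1K:6:256".
print(f"{bits_per_word(57885161, 3 * M):.2f} bpw")
```

Under those assumptions the result agrees with the value gpuowl printed.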



For a wavefront exponent:

Code:
jesus@Magellan:~/gpuowl-6$ ./gpuowl -prp 113613007 -iters 30000
2022-04-19 10:55:04 gpuowl 
2022-04-19 10:55:04 config: -user Magallan3s -cpu Magellan -maxAlloc 10500M -yield
2022-04-19 10:55:04 config: -prp 113613007 -iters 30000 
2022-04-19 10:55:04 device 0, unique id ''
2022-04-19 10:55:04 Magellan 113613007 FFT: 6M 1K:12:256 (18.06 bpw)
2022-04-19 10:55:04 Magellan Expected maximum carry32: 4CFA0000
2022-04-19 10:55:05 Magellan OpenCL args "-DEXP=113613007u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.d7719ff404155p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.eae2bbc5c8218p-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-04-19 10:55:05 Magellan 

2022-04-19 10:55:05 Magellan OpenCL compilation in 0.83 s
2022-04-19 10:55:06 Magellan 113613007 OK        0 loaded: blockSize 400, 0000000000000003
2022-04-19 10:55:06 Magellan validating proof residues for power 8
2022-04-19 10:55:06 Magellan Proof using power 8
2022-04-19 10:55:09 Magellan 113613007 OK      800   0.00%; 2132 us/it; ETA 2d 19:18; 420cf6918603e7e1 (check 0.91s)
2022-04-19 10:56:12 Magellan Stopping, please wait..
2022-04-19 10:56:13 Magellan 113613007 OK    30000   0.03%; 2158 us/it; ETA 2d 20:06; 32d4895e2a4b9a36 (check 0.91s)
2022-04-19 10:56:13 Magellan Exiting because "stop requested"
2022-04-19 10:56:13 Magellan Bye

The same wavefront exponent tested on an Intel 12900K (AVX-512 clock @ 4.8 GHz) with Corsair DDR5 RAM @ 5400 MHz:

Code:
jesus@Magellan:~/Prime95 (copy)$ ./mprime -m 
[Main thread Apr 19 11:02] Mersenne number primality test program version 30.7
[Main thread Apr 19 11:02] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 8x1280 KB, L3 cache size: 30 MB
	    
Your choice: [Main thread Apr 19 11:13] Starting worker.
[Work thread Apr 19 11:13] Worker starting
[Work thread Apr 19 11:13] Setting affinity to run worker on CPU core #1
[Work thread Apr 19 11:13] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Apr 19 11:13] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Apr 19 11:13] Setting affinity to run helper thread 6 on CPU core #7
[Work thread Apr 19 11:13] Setting affinity to run helper thread 7 on CPU core #8
[Work thread Apr 19 11:13] Setting affinity to run helper thread 5 on CPU core #6
[Work thread Apr 19 11:13] Setting affinity to run helper thread 3 on CPU core #4
[Work thread Apr 19 11:13] Setting affinity to run helper thread 4 on CPU core #5
[Work thread Apr 19 11:13] Starting Gerbicz error-checking PRP test of M113613007 using AVX-512 FFT length 6048K, Pass1=1152, Pass2=5376, clm=1, 8 threads
[Work thread Apr 19 11:13] Preallocating disk space for the proof interim residues file p113613007.residues
[Work thread Apr 19 11:13] PRP proof using power=10 and 64-bit hash size.
[Work thread Apr 19 11:13] Proof requires 14.5GB of temporary disk space and uploading a 156MB proof file.
[Work thread Apr 19 11:13] Iteration: 10000 / 113613007 [0.00%], ms/iter:  2.032, ETA: 64:08:00
[Work thread Apr 19 11:13] Iteration: 20000 / 113613007 [0.01%], ms/iter:  1.970, ETA: 62:09:09
[Work thread Apr 19 11:14] Iteration: 30000 / 113613007 [0.02%], ms/iter:  1.962, ETA: 61:54:44
[Work thread Apr 19 11:14] Iteration: 40000 / 113613007 [0.03%], ms/iter:  1.961, ETA: 61:52:20
[Main thread Apr 19 11:14] Stopping all worker windows.
[Work thread Apr 19 11:14] Stopping PRP test of M113613007 at iteration 40747 [0.03%]
[Work thread Apr 19 11:14] Worker stopped.
[Main thread Apr 19 11:14] Execution halted.
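
The ETAs in the logs above can be roughly reproduced from the per-iteration times. This is only a sketch: it assumes a PRP test needs about one iteration per bit of the exponent and that the rate stays constant, and the helper names are hypothetical rather than anything gpuowl or mprime exposes.

```python
def eta_seconds(exponent, iterations_done, us_per_iter):
    """Remaining time, assuming one iteration per exponent bit at a constant rate."""
    return (exponent - iterations_done) * us_per_iter / 1_000_000


def format_eta(seconds):
    """Render seconds as 'Dd HH:MM', truncating to whole minutes."""
    total_minutes = int(seconds // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}"


# Figures copied from the last progress line of each M113613007 run above.
print("gpuowl:", format_eta(eta_seconds(113613007, 30000, 2158)))  # 2158 us/it
print("mprime:", format_eta(eta_seconds(113613007, 40000, 1961)))  # 1.961 ms/iter
```

Both results land within a minute or so of the logged ETAs (2d 20:06 for gpuowl, 61:52:20 for mprime), which suggests the logged rates and ETAs are consistent with each other.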

Last fiddled with by Magellan3s on 2022-04-19 at 16:15
Old 2022-04-19, 16:55   #2
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1100110100111₂ Posts
Default

That GPU model family is well suited to TF (trial factoring), and not well suited to gpuowl or other DP (double-precision) work.
A Radeon VII GPU, which costs less, is more than double the speed in DP (6M FFT, under 890 µs/iter).

Also, please do not proliferate threads when an existing thread will serve. There are existing benchmark threads, for example.
Old 2022-04-19, 22:30   #3
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

7,901 Posts
Default

Quote:
Originally Posted by Magellan3s View Post
GPU is an EVGA 3080ti FTW3
OS is Linux 20.04.4

Slight GPU overclock @
+200 MHz, +1000 MHz memory
Thanks for the info!
Old 2022-04-20, 15:45   #4
Magellan3s
 
Mar 2022

61₁₀ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Thanks for the info!
You are very welcome George!

Here are the same exponents tested with an NVIDIA A100 (a top-of-the-line $12,000 NVIDIA compute card).

The A100 compute accelerator is performing worse than a ~$1,000 Radeon VII.



Quote:
Originally Posted by kriesel View Post
That GPU model family is well suited to TF (trial factoring), and not well suited to gpuowl or other DP (double-precision) work.
A Radeon VII GPU, which costs less, is more than double the speed in DP (6M FFT, under 890 µs/iter).
Apparently not even the A100 is suited to gpuowl or other DP work, because your Radeon VII is performing a 6M FFT at a faster rate than the 1050 µs I got with an A100.
Attached Thumbnails
57885161 bench.jpg
a100 bench.jpg
a100 gpu owl wavefront.jpg

Last fiddled with by Magellan3s on 2022-04-20 at 15:47
Old 2022-04-20, 15:55   #5
ATH
Einyen
Dec 2003
Denmark

2²·7²·17 Posts
Default

Something must be wrong then, because on the rare occasions when I get an A100 on Google Colab Pro+, gpuowl gives me 389 µs/iter on a ~109.8M exponent with a 6M FFT (and 600 µs/iter on a V100).
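
To put the figures quoted so far in the thread side by side, here is a small sketch that ranks them and scales each to a rough time per test. The comparison is loose by construction: the runs used different exponents, FFT lengths, and software versions, kriesel's Radeon VII number is an upper bound ("under 890"), and the "days per test" column assumes one iteration per bit of a 113613007 exponent at a constant rate.

```python
# Per-iteration times in microseconds, as reported by each poster.
timings_us = {
    "RTX 3080 Ti (gpuowl, 6M FFT)": 2158,
    "i9-12900K (mprime, 6048K FFT)": 1961,
    "A100 (VM, Magellan3s)": 1050,
    "Radeon VII (kriesel, upper bound)": 890,
    "V100 (Colab, ATH)": 600,
    "A100 (Colab, ATH)": 389,
}

EXPONENT = 113613007  # wavefront exponent used in the first post


def days_per_test(us_per_iter, exponent=EXPONENT):
    """Rough wall time for a full PRP test, one iteration per exponent bit."""
    return exponent * us_per_iter / 1_000_000 / 86_400


ranked = sorted(timings_us.items(), key=lambda item: item[1])
fastest_name, fastest_us = ranked[0]

for name, us in ranked:
    print(f"{name:36s} {us:5d} us/it  {us / fastest_us:4.2f}x  "
          f"{days_per_test(us):4.2f} days")
```

The ratio column is relative to the fastest entry, so the gap between the two A100 results stands out immediately.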
Old 2022-04-20, 16:09   #6
Magellan3s
 
Mar 2022

61₁₀ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Thanks for the info!
Quote:
Originally Posted by ATH View Post
Something must be wrong then, because on the rare occasions when I get an A100 on Google Colab Pro+, gpuowl gives me 389 µs/iter on a ~109.8M exponent with a 6M FFT (and 600 µs/iter on a V100).
This was on an Ubuntu 20.04 virtual machine!
Old 2022-04-20, 16:22   #7
storm5510
Random Account
Aug 2009
Not U. + S.A.

2149₁₀ Posts
Default

Quote:
Originally Posted by Magellan3s View Post
GPU is an EVGA 3080ti FTW3
OS is Linux 20.04.4

Slight GPU overclock @
+200 MHz, +1000 MHz memory
I read this as +200 MHz on the GPU core clock and +1,000 MHz on the memory clock. Is this an addition beyond the default settings?
Old 2022-04-20, 16:36   #8
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·11·199 Posts
Default

Quote:
Originally Posted by ATH View Post
Something must be wrong then
Perhaps the benchmark instance was not the only one running on his Colab session's A100? That is one more reason to run GPU and CPU GIMPS apps in the background and top -d 180 in the foreground of Colab, to possibly spot such errors.
Also, I wonder what version of gpuowl was used for each; the version output is blank in his screenshots, and no version was stated with your timings.

Quote:
Originally Posted by storm5510 View Post
I read this as +200 MHz on the GPU core clock and +1,000 MHz on the memory clock. Is this an addition beyond the default settings?
+1 GHz seems a rather ambitious amount of GPU RAM overclock. On the Radeon VII, AMD's tools only allow up to +200 MHz atop the stock 1 GHz. I haven't messed with NVIDIA clocks on RTXxxxx myself, though.
GPU specs here.

Last fiddled with by kriesel on 2022-04-20 at 16:48
Old 2022-04-20, 17:13   #9
M344587487
"Composite as Heck"
Oct 2017

29·31 Posts
Default

Quote:
Originally Posted by kriesel View Post
...
+1 GHz seems a rather ambitious amount of GPU RAM overclock. On the Radeon VII, AMD's tools only allow up to +200 MHz atop the stock 1 GHz. I haven't messed with NVIDIA clocks on RTXxxxx myself, though.
GPU specs here.
HBM2 and GDDR6X are very different. +1 GHz does seem ambitious (like you, I have no practical experience with it), but the nomenclature may simply be different. If the speed rating of GDDR6X is anything like DDR, a +1 GHz memory OC may translate to an actual module OC of +250 MHz (GDDR6X uses DDR and also PAM4 encoding to double the rate again), whereas an HBM OC of +200 MHz will (I think) correspond 1:1 to a +200 MHz module OC. I could be entirely wrong.
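
The arithmetic behind that guess can be sketched as follows. It simply encodes the assumption stated in the post (a reported offset expressed in effective-rate units, with 4 transfers per module clock for GDDR6X and 1:1 for HBM2); whether overclocking tools actually report offsets this way is not confirmed anywhere in the thread, and the function name is hypothetical.

```python
def module_offset_mhz(reported_offset_mhz, transfers_per_clock):
    """Module clock change implied by a reported offset, under the post's assumption."""
    return reported_offset_mhz / transfers_per_clock


# Assumed multipliers: DDR (x2) times PAM4 (x2) for GDDR6X, 1:1 for HBM2.
print(module_offset_mhz(1000, 4))  # GDDR6X case from the post
print(module_offset_mhz(200, 1))   # HBM2 case from the post
```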
Old 2022-04-20, 17:32   #10
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

14647₈ Posts
Default

Quote:
Originally Posted by M344587487 View Post
HBM2 and GDDR6X are very different
Yes. The spec sheet linked earlier gave a 1188 MHz base clock and 1800 MHz boost, both for the memory on the EVGA FTW3 GPU. Going +1000 MHz from base would be 2188 MHz. Boost tends to be rated more for transient use than steady state, so 2188 MHz for long GIMPS runs would be quite ambitious. It would probably require extra cooling, voltage curve reduction, or something similar, or an insensitivity to errors (either not detecting them, in applications where an occasional error has little consequence, such as high-refresh-rate graphics display, or detecting and correcting them in demanding, error-intolerant applications like GIMPS). At some point, rising power consumption and clock may cause damage.
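
For scale, the clock figures in that post work out as below. This is plain arithmetic on the numbers as quoted (1188 MHz base, 1800 MHz boost, +1000 MHz offset) and takes the post's reading of the spec sheet at face value.

```python
BASE_MHZ = 1188    # base clock as quoted from the spec sheet
BOOST_MHZ = 1800   # boost clock as quoted from the spec sheet
OFFSET_MHZ = 1000  # overclock offset reported in the first post

target_mhz = BASE_MHZ + OFFSET_MHZ
over_base_pct = 100 * OFFSET_MHZ / BASE_MHZ
over_boost_pct = 100 * (target_mhz - BOOST_MHZ) / BOOST_MHZ

print(f"target: {target_mhz} MHz")
print(f"above base:  {over_base_pct:.1f}%")
print(f"above boost: {over_boost_pct:.1f}%")
```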
Old 2022-04-21, 14:46   #11
tdulcet
"Teal Dulcet"
Jun 2018

66₁₀ Posts
Default

Quote:
Originally Posted by ATH View Post
Something must be wrong then, because on the rare occasions when I get an A100 on Google Colab Pro+, gpuowl gives me 389 µs/iter on a ~109.8M exponent with a 6M FFT (and 600 µs/iter on a V100).
Quote:
Originally Posted by Magellan3s View Post
This was on an Ubuntu 20.04 virtual machine!
Colab currently uses Ubuntu 18.04, but that should not matter. I would try removing the -yield argument from your config file, as that will slow things down and should not be needed for headless GPUs like the A100.
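
One way to make that change without hand-editing is sketched below. It works on the options string exactly as echoed in the "config:" log line of the first post; the helper name is hypothetical, and it only handles flags that take no argument (such as -yield). Reading the config file and writing the result back is left out, since the file's name and location are not stated in the thread.

```python
import shlex


def drop_flag(config_line: str, flag: str) -> str:
    """Return the option string with every standalone occurrence of `flag` removed."""
    tokens = shlex.split(config_line)
    return " ".join(token for token in tokens if token != flag)


# Options as echoed by gpuowl in the first post's log.
line = "-user Magallan3s -cpu Magellan -maxAlloc 10500M -yield"
print(drop_flag(line, "-yield"))
```

Re-running the same benchmark command afterwards would show whether -yield accounts for any of the difference.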

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
