mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2022-11-20, 16:08   #2850
Magellan3s

Mar 2022
Earth

5·23 Posts

That Radeon Instinct MI250X (Aldebaran XT) is stupid fast compared to anything else on the market!

I hope the 7900 XTX offers performance that is at least somewhat close to it on GPUOWL.

Last fiddled with by Magellan3s on 2022-11-20 at 16:09
Old 2022-11-20, 16:45   #2851
axn

Jun 2003

2·2,719 Posts

Quote:
Originally Posted by Magellan3s
That Radeon Instinct MI250X (Aldebaran XT) is stupid fast compared to anything else on the market!
I believe those numbers are estimated based on FLOPS scaling, which is not realistic. They will probably be 20-30% slower in real life.

Quote:
Originally Posted by Magellan3s
I hope the 7900 XTX offers performance that is at least somewhat close to it on GPUOWL.
Assuming a 2.7x (SWAG) improvement over the 6950, it would be about 193 microseconds per iteration, which would be significantly faster than an A100.
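The arithmetic behind that estimate can be sketched in a few lines. Note the ~521 µs/iter 6950 XT baseline here is only back-derived from the quoted figures (193 × 2.7), not a measurement:

```python
# Back-of-the-envelope FLOPS scaling behind the 7900 XTX estimate.
# The 6950 XT baseline is inferred from the post, not measured here.
baseline_us = 521.0   # assumed 6950 XT per-iteration time
speedup = 2.7         # assumed ("SWAG") 7900 XTX improvement
estimate_us = baseline_us / speedup
print(round(estimate_us))  # -> 193
```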
Old 2022-11-20, 18:18   #2852
frmky

Jul 2003
So Cal

101000101010₂ Posts

Also the MI250X (and MI250) are really two GPUs packaged together, like the nVidia K80. AMD hides this a bit in their marketing. So expect only half of that performance in a single gpuOwl run. And AMD is a bit behind nVidia in getting data from GPU memory to the cores, so the actual performance is closer to the A100 than the raw FLOPS numbers would suggest for most workloads.
Old 2022-11-20, 19:10   #2853
moebius

Jul 2009
Germany

11·61 Posts

Quote:
Originally Posted by axn
I believe those numbers are estimated based on FLOPS scaling, which is not realistic. They will probably be 20-30% slower in real life.

Assuming a 2.7x (SWAG) improvement over the 6950, it would be about 193 microseconds per iteration, which would be significantly faster than an A100.
The yellow values are really only rough extrapolations based on the MI100, but with the 6800 XT I came very close to the real value at the time, extrapolating from a 5700 XT. https://mersenneforum.org/showpost.p...5&postcount=24

And then there's probably the famous bottleneck for the MI50 and MI60 too.

Last fiddled with by moebius on 2022-11-20 at 19:13
Old 2022-11-20, 20:05   #2854
moebius

Jul 2009
Germany

11×61 Posts

Quote:
Originally Posted by Magellan3s
That Radeon Instinct MI250X (Aldebaran XT) is stupid fast compared to anything else on the market! I hope the 7900 XTX offers performance that is at least somewhat close to it on GPUOWL.
The 7900 XTX should land rather close to the Instinct MI210. I've entered my predictions for the 7900 graphics cards into the list too.

Last fiddled with by moebius on 2022-11-20 at 20:16
Old 2022-11-20, 20:15   #2855
frmky

Jul 2003
So Cal

5052₈ Posts

A MI250 is really two MI210s in the same package. Since gpuOwl doesn't support multiple GPUs, a single run on a MI250 will run on one of them and give the same times as a MI210. You can just run two simultaneously, one on each GPU. Same for the MI250X.
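A minimal sketch of that setup, assuming gpuOwl's usual `-d <index>` device-selection flag and hypothetical per-die working directories (so the two runs keep separate checkpoint files):

```python
# Sketch: one gpuOwl instance per die of an MI250(X), which shows up as
# two devices. Directory names run0/run1 are illustrative only.
def launch_commands(num_dies=2):
    return [f"cd run{i} && ../gpuowl -d {i}" for i in range(num_dies)]

for cmd in launch_commands():
    print(cmd)
```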
Old 2022-11-20, 20:54   #2856
moebius

Jul 2009
Germany

11·61 Posts

Quote:
Originally Posted by frmky
A MI250 is really two MI210s in the same package. Since gpuOwl doesn't support multiple GPUs, a single run on a MI250 will run on one of them and give the same times as a MI210. You can just run two simultaneously, one on each GPU. Same for the MI250X.
The technical basis of the Instinct MI250(X), codename Aldebaran, is two dies linked by a fast 400 GByte/s Infinity Fabric (via an Elevated Fanout Bridge, EFB). They are manufactured on TSMC's N6 process, i.e. 6 nm EUV, and have 29.1 billion transistors. The two chips are called GCDs (Graphics Compute Dies), each of which has four shader arrays for 112 compute units with a total of 7,168 ALUs at 1.7 GHz when fully enabled. The L2 cache still holds 8 MB, but its bandwidth has been doubled, for good reason: the matrix cores, similar to Nvidia's tensor cores, deliver double or quadruple the rate per clock. The Instinct MI250X as the top model achieves around 96 teraflops in double precision (FP64); without the matrix cores it is still 48 teraflops.
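Those figures are self-consistent; a quick sanity check of the quoted FP64 number, assuming the usual 2 FP64 FLOPs per ALU per clock (one FMA):

```python
# Sanity check of the quoted MI250X FP64 rate (without matrix cores),
# using only the figures from the post above.
gcds = 2                  # two Graphics Compute Dies per package
alus_per_gcd = 7168       # 112 CUs x 64 ALUs, fully enabled
clock_ghz = 1.7
flops_per_alu = 2         # one FMA = 2 FLOPs per clock (assumed)
tflops = gcds * alus_per_gcd * flops_per_alu * clock_ghz / 1000
print(f"{tflops:.1f} TFLOPS FP64")  # -> 48.7, close to the quoted 48
```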

I conclude that mfakto will scale just about as well as on an MI100, but gpuowl much better.

But I'm only human, I can also be wrong.
Old 2022-11-20, 21:17   #2857
frmky

Jul 2003
So Cal

2×1,301 Posts

But to the user they appear as two distinct GPUs with separate memory spaces. Data transfer between the two over Infinity Fabric is much slower than HBM2, and applications must be coded to support multiple GPUs to use both.
https://chipsandcheese.com/2022/09/1...-architecture/ (5th paragraph)
https://twitter.com/projectphysx/sta...623746?lang=en
Old 2022-11-20, 21:54   #2858
moebius

Jul 2009
Germany

11·61 Posts

Quote:
Originally Posted by frmky
But to the user they appear as two distinct GPUs with separate memory spaces...
In any case, it would be interesting to know whether these two discrete GPUs are as fast in a wavefront PRP test as a card with only one discrete GPU; if so, they would be at least twice as good for the project.
Old 2022-11-21, 01:39   #2859
axn

Jun 2003

2×2,719 Posts

Quote:
Originally Posted by moebius
In any case, it would be interesting to know whether these two discrete GPUs are as fast in a wavefront PRP test as a card with only one discrete GPU; if so, they would be at least twice as good for the project.
These beasts (Instincts, Teslas, Quadros, etc.) are only of theoretical interest to the project. They are f***ing expensive! You're better off building one or more multi-GPU PCs with that kind of money.
Old 2022-11-21, 02:20   #2860
frmky

Jul 2003
So Cal

A2A₁₆ Posts

32 GB MI60s are available for under $1000. But they are passively cooled, so you'd have to deal with rigging up a cooler.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
