mersenneforum.org  

Old 2022-11-06, 11:02   #1
axn
 
Jun 2003

2×2,719 Posts
Question RDNA 3?

Does anyone have any inside info on the upcoming RDNA 3 (aka RX 7000 series)? If the all-knowing wiki is to be believed, the latest and greatest should be 2.5-3x faster than the 6950 XT, which would make it the absolute fastest PRP cruncher.

Last fiddled with by axn on 2022-11-06 at 11:02
Old 2022-11-06, 11:30   #2
preda
 
"Mihai Preda"
Apr 2015

5A516 Posts
Default

Quote:
Originally Posted by axn
Does anyone have any inside info on the upcoming RDNA 3 (aka RX 7000 series)? If the all-knowing wiki is to be believed, the latest and greatest should be 2.5-3x faster than the 6950 XT, which would make it the absolute fastest PRP cruncher.
Hopefully faster than the 4y-old RadeonVII.
Old 2022-11-06, 11:40   #3
PhilF
 
"6800 descendent"
Feb 2005
Colorado

13428 Posts
Default

Quote:
Originally Posted by preda
Hopefully faster than the 4y-old RadeonVII.
This is the first time I have ever heard anyone insinuate that the Radeon VII is slow!
Old 2022-11-06, 13:54   #4
axn
 
Jun 2003

2×2,719 Posts
Default

Quote:
Originally Posted by preda
Hopefully faster than the 4y-old RadeonVII.
According to the gpuowl benchmarks compiled by moebius here (https://docs.google.com/spreadsheets...u7PgIrITgItkC/), 6950 XT and VII are neck-and-neck. So 7900 XTX should come in at the top of that list, and 7900 XT should be at #2 or #3.

*fingers crossed*
Old 2022-11-06, 17:06   #5
M344587487
 
"Composite as Heck"
Oct 2017

3×311 Posts
Default

I was surprised at how good RDNA2 remained relative to the Radeon VII; I expected a bigger divergence between CDNA and RDNA as generations rolled by. The divergence probably did happen, but it manifests in AI/ML/whatever rather than in more traditional compute. With luck, the characteristics that have allowed RDNA to remain very viable for our niche will stay intact. It would suck if the only reason RDNA has been good to date is that AMD didn't have the resources to optimise further for gaming by gutting compute.

For gaming it's unclear which of the XT/XTX is better bang for buck (well, IMO gaming on anything beyond midrange is a waste, but YMMV). For gpuowl the XTX is almost certainly the one to go for. Smaller but faster cache is an interesting wrinkle: it's the only metric (that we know of) which isn't strictly an upgrade over the 6950 XT. That the 80/96 MiB cache matches up with the midrange 6700/6700 XT is interesting, but it may be down to them not double-stacking cache on the 7900 XT/XTX. Double-stacking was rumoured, and I'm guessing it might be reserved for a refresh or pro cards down the line; it might also just be that a smaller, faster cache performed better on average for gaming.
Old 2022-11-06, 19:04   #6
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

816510 Posts
Default

Quote:
Originally Posted by axn
According to the gpuowl benchmarks compiled by moebius here (https://docs.google.com/spreadsheets...u7PgIrITgItkC/), 6950 XT and VII are neck-and-neck. So 7900 XTX should come in at the top of that list, and 7900 XT should be at #2 or #3.
Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?

BTW, I just tuned some of my Radeon VIIs for maximum energy efficiency. Typical figures for 111M exponents: 813 us/it at 150W (assuming a 91% efficient power supply).
My goal is to add a used Radeon VII for just over $300 and run all Radeon VIIs at peak energy efficiency, getting slightly more throughput using less power with a roughly two-year breakeven on the used Radeon VII.
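
As a rough worked example of what those figures imply: assuming a PRP test needs about one iteration per bit of the exponent (so roughly 111 million iterations here) and that the 150W figure holds for the whole run, the time and energy per test can be estimated as below. The function name prp_cost is only illustrative.

```shell
# Rough estimate of wall time and energy for one PRP test.
# Assumption: iterations ~= exponent; power is constant at the quoted figure.
prp_cost() {
    # $1 = exponent (iterations), $2 = microseconds per iteration, $3 = watts
    awk -v p="$1" -v us="$2" -v w="$3" 'BEGIN {
        hours = p * us / 1e6 / 3600      # total seconds converted to hours
        kwh   = hours * w / 1000         # energy in kilowatt-hours
        printf "%.1f hours, %.2f kWh\n", hours, kwh
    }'
}

prp_cost 111000000 813 150   # prints: 25.1 hours, 3.76 kWh
```

So at these settings one card would finish a 111M test in about a day for a little under 4 kWh, under the assumptions stated.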
Old 2022-11-07, 01:51   #7
axn
 
Jun 2003

2·2,719 Posts
Default

Quote:
Originally Posted by Prime95
Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?
Infinity Cache is big enough to run the entire FFT out of it, and that has much higher bandwidth. I think Radeon VIIs were severely bottlenecked on memory for PRP tests, so their increased FLOPS were ineffective.

The 7900s have similar TFLOPS to the VII, but they have a matching bandwidth increase for the cache, so I am expecting to see a proportional improvement -- assuming the wiki numbers are in the right ballpark.
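
To put rough numbers on the cache argument: assuming a 111M exponent is run with an FFT of about 6M double-precision words (an assumption for illustration, not something stated in this thread), the main FFT buffer comes to about 48 MiB. That is below the 128 MB of Infinity Cache on the 6950 XT and also below the 80/96 MiB mentioned earlier for the 7900 XT/XTX. Any extra working buffers would add to this, so treat it as a lower bound.

```shell
# Size of an FFT buffer of N double-precision (8-byte) words, in MiB.
# fft_mib is an illustrative helper; the 6M length is an assumed example.
fft_mib() {
    echo $(( $1 * 8 / 1024 / 1024 ))
}

fft_mib $(( 6 * 1024 * 1024 ))   # prints: 48
```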
Old 2022-11-07, 02:21   #8
mrh
 
"mrh"
Oct 2018
Temecula, ca

9010 Posts
Default

Quote:
Originally Posted by Prime95
Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?

BTW, I just tuned some of my Radeon VIIs for maximum energy efficiency. Typical figures for 111M exponents: 813 us/it at 150W (assuming a 91% efficient power supply).
My goal is to add a used Radeon VII for just over $300 and run all Radeon VIIs at peak energy efficiency, getting slightly more throughput using less power with a roughly two-year breakeven on the used Radeon VII.
I'm doing that as well. Running like that I can put two VIIs in each system and not worry about heat.
Old 2022-11-07, 03:24   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·23·71 Posts
Default

Quote:
Originally Posted by axn
Infinity Cache is big enough to run the entire FFT out of it, and that has much higher bandwidth.
Thanks for the insight. Yes, gpuowl on Radeon VII is near or at max memory bandwidth, which is why Radeon VII Pro benchmarks are not much faster than a Radeon VII.

Quote:
Originally Posted by mrh
I'm doing that as well. Running like that I can put two VIIs in each system and not worry about heat.
This probably belongs in a different thread. My first observation was that sclk=2 gives the maximum iterations/watt. One can fine tune the clock speed using 'echo "s 1 XXXX" >/sys/class/drm/card2/device/pp_od_clk_voltage' where XXXX is between 1500 and 2200. Even though there is a big gap in clock speeds between sclk=1 and sclk=3, peak efficiency is near sclk=2.

Then I thought, let's maximize the clock speed for the sclk=2 voltage, which uses 725mV. So I worked on the voltage curve, working up from 760mV until I found a voltage that did not produce errors.
echo "vc 1 1304 760" >/sys/class/drm/card2/device/pp_od_clk_voltage
I already had set the upper end of the voltage curve with
echo "vc 2 1801 1030" >/sys/class/drm/card2/device/pp_od_clk_voltage
Finally, find the largest XXXX value that chooses 725mV with sclk=2.

GPU example 1:
echo "vc 1 1304 770" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "vc 2 1801 1030" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 1 1958" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 1 --setsclk 2 --setfan 160

GPU example 2 (one of my better cards):
echo "vc 1 1304 760" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "vc 2 1801 1030" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "s 1 2085" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card1/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 1 --setsclk 2 --setfan 160

GPU  Temp   AvgPwr  SCLK     MCLK     Fan     Perf    PwrCap  VRAM%  GPU%
0    69.0c  138.0W  1186Mhz  1201Mhz  82.75%  manual  250.0W  N/A    93%
1    66.0c  139.0W  1228Mhz  1201Mhz  80.78%  manual  250.0W  N/A    75%
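
The per-card sequences above follow one pattern, so they can be generated from three parameters. The sketch below only prints the commands rather than writing to sysfs. Note that tune_cmds is a hypothetical helper, the fixed curve points (1304 and 1801 with 1030mV) are copied from the examples above, and suitable values for any other card would have to be found by testing.

```shell
# Dry-run sketch: print (do not execute) the sysfs writes for one card.
# tune_cmds is a hypothetical helper, not part of ROCm or gpuowl.
tune_cmds() {
    card=$1      # DRM card index, e.g. 0 for /sys/class/drm/card0
    vc1_mv=$2    # voltage in mV for voltage-curve point 1
    s1_mhz=$3    # the XXXX value passed to "s 1"
    f="/sys/class/drm/card${card}/device/pp_od_clk_voltage"
    printf 'echo "vc 1 1304 %s" >%s\n' "$vc1_mv" "$f"
    printf 'echo "vc 2 1801 1030" >%s\n' "$f"
    printf 'echo "s 1 %s" >%s\n' "$s1_mhz" "$f"
    printf 'echo "c" >%s\n' "$f"    # "c" commits the pending settings
}

tune_cmds 0 770 1958
```

Once the printed lines look right they could be run as root, followed by the rocm-smi call shown in the examples.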
Old 2022-11-15, 01:06   #10
Magellan3s
 
Mar 2022
Earth

5×23 Posts
Default

Seeing as I have been unable to get my hands on a 4090 at MSRP.... The 7900xtx will be on my list!
Old 2022-11-15, 03:00   #11
moebius
 
Jul 2009
Germany

11×61 Posts
Default

Quote:
Originally Posted by Prime95
Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?
First of all, I would like to say that it is extremely important to me that the values that I enter in the table are realistic, so I check them for plausibility where possible (e.g. by extrapolating from the values of a reference card, in this case the RX 6800 XT).

For the 6900 XT, for example, the following values are available:

DrDerpenberg

JCoveiro

I always enter the best value for a single instance, as well as for the Radeon VII, to ensure a relatively fair comparison. It is quite possible that with 2 instances at the same time the Radeon VII will show up better than with one instance.
I started the list because I am often suspicious of the benchmarks on mersenne.ca, where an RX 5700 XT performs better than an RX 6800 XT, which I consider almost impossible.

Last fiddled with by moebius on 2022-11-15 at 03:40
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
