mersenneforum.org RDNA 3?

2022-11-06, 11:02   #1
axn

Jun 2003

2×2,719 Posts

RDNA 3?

Does anyone have any inside info on the upcoming RDNA 3 (aka RX 7000 series)? If the all-knowing wiki is to be believed, the latest and greatest should be 2.5-3x faster than the 6950 XT, which would make it the absolute fastest PRP cruncher.

Last fiddled with by axn on 2022-11-06 at 11:02

2022-11-06, 11:30   #2
preda

"Mihai Preda"
Apr 2015

5×17² Posts

Quote:
 Originally Posted by axn Does anyone have any inside info on the upcoming RDNA 3 (aka RX 7000 series)? If the all-knowing wiki is to be believed, the latest and greatest should be 2.5-3x faster than the 6950 XT, which would make it the absolute fastest PRP cruncher.
Hopefully faster than the 4y-old RadeonVII.

2022-11-06, 11:40   #3
PhilF

"6800 descendent"
Feb 2005

1011100010₂ Posts

Quote:
 Originally Posted by preda Hopefully faster than the 4y-old RadeonVII.
This is the first time I have ever heard anyone insinuate that the Radeon VII is slow!

2022-11-06, 13:54   #4
axn

Jun 2003

2·2,719 Posts

Quote:
 Originally Posted by preda Hopefully faster than the 4y-old RadeonVII.
According to the gpuowl benchmarks compiled by moebius here (https://docs.google.com/spreadsheets...u7PgIrITgItkC/), 6950 XT and VII are neck-and-neck. So 7900 XTX should come in at top of that list, and 7900 XT should be at #2 or #3.

*fingers crossed*

2022-11-06, 17:06   #5
M344587487

"Composite as Heck"
Oct 2017

3×311 Posts

I was surprised at how good RDNA2 remained relative to the Radeon VII; I expected a bigger divergence between CDNA and RDNA as generations rolled by. The divergence probably did happen but manifests in AI/ML/whatever instead of more traditional compute. With luck the characteristics that have allowed RDNA to remain very viable for our niche remain intact. It would suck if the only reason RDNA has been good to date is that AMD didn't have the resources to optimise further for gaming by gutting compute.

For gaming it's unclear which of the XT/XTX is better bang for buck (well, IMO gaming on anything beyond midrange is a waste, but YMMV). For gpuowl the XTX is almost certainly the one to go for.

Less but faster cache is an interesting wrinkle; it's the only metric (that we know of) which isn't strictly an upgrade over the 6950 XT. That the 80/96 MiB cache matches up with the midrange 6700/6700 XT is interesting, but it may be down to them not double-stacking cache on the 7900 XT/XTX. Double-stacking was rumoured, and I'm guessing it might be reserved for a refresh or pro cards down the line. It might also just be that a smaller, faster cache performed better on average for gaming.

2022-11-06, 19:04   #6
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

17750₈ Posts

Quote:
 Originally Posted by axn According to the gpuowl benchmarks compiled by moebius here (https://docs.google.com/spreadsheets...u7PgIrITgItkC/), 6950 XT and VII are neck-and-neck. So 7900 XTX should come in at top of that list, and 7900 XT should be at #2 or #3.
Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?

BTW, I just tuned some of my Radeon VIIs for maximum energy efficiency. Typical for 111M exponents, 150W (assuming 91% efficient power supply), I get 813 us/it.
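
As a rough sanity check on those figures, the sketch below turns 150 W and 813 us/it into energy per iteration, hours per test, and kWh per test. It assumes a PRP test of exponent p takes about p iterations and uses 111,000,000 as a stand-in for "111M". The function names are made up for illustration, and the 150 W is simply taken at face value as the power drawn.

```shell
#!/bin/sh
# Back-of-envelope efficiency figures from the numbers quoted in the post.
# Assumption: one PRP test of exponent p needs roughly p iterations.

# millijoules_per_iter WATTS US_PER_IT -> energy per iteration in mJ
millijoules_per_iter() {
    awk -v w="$1" -v us="$2" 'BEGIN { printf "%.2f\n", w * us / 1000 }'
}

# hours_per_test EXPONENT US_PER_IT -> wall-clock hours for one test
hours_per_test() {
    awk -v p="$1" -v us="$2" 'BEGIN { printf "%.1f\n", p * us / 1e6 / 3600 }'
}

# kwh_per_test WATTS EXPONENT US_PER_IT -> energy for one test in kWh
kwh_per_test() {
    awk -v w="$1" -v p="$2" -v us="$3" \
        'BEGIN { printf "%.2f\n", w * p * us / 1e6 / 3.6e6 }'
}

millijoules_per_iter 150 813        # about 122 mJ per iteration
hours_per_test 111000000 813        # about 25 hours per test
kwh_per_test 150 111000000 813      # about 3.8 kWh per test
```

Plugging in another card's wattage and us/it gives a like-for-like comparison of energy per test, which is the quantity that matters when tuning for efficiency instead of raw speed.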
My goal is to add a used Radeon VII for just over $300 and run all Radeon VIIs at peak energy efficiency, getting slightly more throughput using less power with a roughly two-year breakeven on the used Radeon VII.

2022-11-07, 01:51   #7
axn

Jun 2003

2×2,719 Posts

Quote:
 Originally Posted by Prime95 Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?
Infinity Cache is big enough to run the entire FFT out of it, and that has much higher bandwidth. I think Radeon VIIs were severely bottlenecked on memory for PRP tests, so their increased FLOPS were ineffective. The 7900s have similar TFLOPS to the VII, but have a matching bandwidth increase for the cache, so I am expecting to see proportional improvement -- assuming the wiki numbers are in the right ballpark.

2022-11-07, 02:21   #8
mrh

"mrh"
Oct 2018
Temecula, ca

5A₁₆ Posts

Quote:
 Originally Posted by Prime95 Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII? BTW, I just tuned some of my Radeon VIIs for maximum energy efficiency. Typical for 111M exponents, 150W (assuming 91% efficient power supply), I get 813 us/it. My goal is to add a used Radeon VII for just over $300 and run all Radeon VIIs at peak energy efficiency, getting slightly more throughput using less power with a roughly two-year breakeven on the used Radeon VII.
I'm doing that as well. Running like that I can put two VIIs in each system and not worry about heat.

2022-11-07, 03:24   #9
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

17750₈ Posts

Quote:
 Originally Posted by axn Infinity Cache is big enough to run the entire FFT out of it, and that has much higher bandwidth.
Thanks for the insight. Yes, gpuowl on Radeon VII is near or at max memory bandwidth which is why Radeon VII Pro benchmarks are not much faster than a Radeon VII.
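
To put a number on the cache argument, here is a minimal sketch comparing the size of one FFT buffer against the cache sizes mentioned in this thread. The 6M-word FFT length for exponents near 111M and the 8 bytes per double-precision word are assumptions, not figures from the posts. The 128 MiB value is the 6950 XT's Infinity Cache, and 96 and 80 MiB are the 7900 XTX and 7900 XT values from post #5. It counts a single buffer only; the real working set is larger.

```shell
#!/bin/sh
# Compare the size of one FFT buffer against a cache size in MiB.
# Assumptions: double-precision words (8 bytes each) and a 6M-word FFT
# for exponents around 111M. Function names are illustrative only.

# fft_mib WORDS -> size in MiB of WORDS double-precision words
fft_mib() {
    echo $(( $1 * 8 / 1048576 ))
}

# fits_in_cache WORDS CACHE_MIB -> prints "fits" or "spills"
fits_in_cache() {
    if [ "$(fft_mib "$1")" -le "$2" ]; then
        echo fits
    else
        echo spills
    fi
}

FFT_WORDS=$(( 6 * 1024 * 1024 ))    # assumed 6M-word FFT

fft_mib "$FFT_WORDS"                # buffer size in MiB
fits_in_cache "$FFT_WORDS" 128      # 6950 XT
fits_in_cache "$FFT_WORDS" 96       # 7900 XTX
fits_in_cache "$FFT_WORDS" 80       # 7900 XT
```

Under these assumptions a single buffer is 48 MiB, so it fits in all three cache sizes, which is consistent with the idea that the smaller 7900-series cache need not hurt at this FFT length.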

Quote:
 Originally Posted by mrh I'm doing that as well. Running like that I can put two VIIs in each system and not worry about heat.
This probably belongs in a different thread. My first observation was that sclk=2 gives the maximum iterations/watt. One can fine tune the clock speed using 'echo "s 1 XXXX" >/sys/class/drm/card2/device/pp_od_clk_voltage' where XXXX is between 1500 and 2200. Even though there is a big gap in clock speeds between sclk=1 and sclk=3, peak efficiency is near sclk=2.

Then I thought, let's maximize the clock speed for the sclk=2 voltage which uses 725mV. So then I worked on the voltage curve working up from 760mV until I found the voltage that did not produce errors.
echo "vc 1 1304 760" >/sys/class/drm/card2/device/pp_od_clk_voltage
I already had set the upper end of the voltage curve with
echo "vc 2 1801 1030" >/sys/class/drm/card2/device/pp_od_clk_voltage
Finally, find the largest XXXX value that chooses 725mV with sclk=2.

GPU example 1:
echo "vc 1 1304 770" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "vc 2 1801 1030" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 1 1958" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 1 --setsclk 2 --setfan 160

GPU example 2 (one of my better cards):
echo "vc 1 1304 760" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "vc 2 1801 1030" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "s 1 2085" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card1/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 1 --setsclk 2 --setfan 160

GPU  Temp   AvgPwr  SCLK     MCLK     Fan     Perf    PwrCap  VRAM%  GPU%
0    69.0c  138.0W  1186Mhz  1201Mhz  82.75%  manual  250.0W  N/A    93%
1    66.0c  139.0W  1228Mhz  1201Mhz  80.78%  manual  250.0W  N/A    75%
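
The two per-card command lists above differ only in the card index, the low-end curve voltage, and the "s 1" clock, so they can be wrapped in a small helper. This is a sketch, not a tested tuning tool: `tune_card` and `DRY_RUN` are made-up names, the 1304 MHz and 1801 MHz / 1030 mV curve points are copied from the examples, and by default it only prints the commands instead of writing to sysfs. The `rocm-smi --setsclk 2 --setfan 160` step is left out and still has to be run separately.

```shell
#!/bin/sh
# Hypothetical helper that reproduces the per-card tuning steps above.
# DRY_RUN=1 (the default) only prints the commands. Set DRY_RUN=0 and
# run as root to actually write to pp_od_clk_voltage.

# tune_card CARD VC1_MV SCLK1_MHZ
tune_card() {
    card="$1"       # DRM card index, e.g. 0 for /sys/class/drm/card0
    vc1_mv="$2"     # voltage in mV for curve point 1 (at 1304 MHz)
    sclk_mhz="$3"   # clock in MHz used for "s 1"
    node="/sys/class/drm/card${card}/device/pp_od_clk_voltage"

    # Same order as in the post: both curve points, the clock, then commit.
    for setting in \
        "vc 1 1304 ${vc1_mv}" \
        "vc 2 1801 1030" \
        "s 1 ${sclk_mhz}" \
        "c"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'echo "%s" >%s\n' "$setting" "$node"
        else
            echo "$setting" >"$node"
        fi
    done
}

tune_card 0 770 1958    # GPU example 1
tune_card 1 760 2085    # GPU example 2
```

Keeping the dry run as the default makes it easy to eyeball the exact sysfs writes before committing to a voltage that may be unstable on a given card.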

2022-11-15, 01:06   #10
Magellan3s

Mar 2022
Earth

5·23 Posts

Seeing as I have been unable to get my hands on a 4090 at MSRP.... The 7900 XTX will be on my list!

2022-11-15, 03:00   #11
moebius

Jul 2009
Germany

2⁵·3·7 Posts

Quote:
 Originally Posted by Prime95 Something weird with the chart. The 6950 has half the memory bandwidth and half the FP64 throughput but comes out faster than the VII?
First of all, I would like to say that it is extremely important to me that the values I enter in the table are realistic, so I check them for plausibility where possible (e.g. by extrapolating from the values of a reference card, in this case the RX 6800 XT).

For the 6900 XT, for example, the following values are available.

DrDerpenberg

JCoveiro

I always enter the best value for a single instance, for the Radeon VII as well, to ensure a relatively fair comparison. It is quite possible that the Radeon VII would do better with 2 instances running at the same time than with one.
I started the list because I am often suspicious of the benchmarks on mersenne.ca, where an RX 5700 XT performs better than an RX 6800 XT, which I consider almost impossible.

Last fiddled with by moebius on 2022-11-15 at 03:40

All times are UTC.
