mersenneforum.org RDNA2 / Big Navi

2021-01-10, 01:56   #111
Xyzzy

"Mike"
Aug 2002

1111110000011₂ Posts

Possibly interesting:
gpuowl = 172W
mfakto = 202W
2021-01-10, 02:51   #112
tServo

"Marv"
May 2009
near the Tannhäuser Gate

1001110011₂ Posts

Quote:
 Originally Posted by Xyzzy Possibly interesting: gpuowl = 172W mfakto = 202W
Since gpuowl uses FP64 computations, many threads will often be paused waiting for this resource, consuming no power while they wait.

mfakto primarily uses INT computations, so its threads tend to run at full blast, consuming more power.

2021-01-10, 10:46   #113
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

3·149 Posts

It seems that mfakto only uses half of the compute units, 1920 of the total 3840. I've tried to find information on the INT32 performance ratio, but it is hard to come by.
2021-01-10, 14:40   #114
tServo

"Marv"
May 2009
near the Tannhäuser Gate

3×11×19 Posts

Quote:
 Originally Posted by Viliam Furik It seems that mfakto only uses half of the compute units, 1920 of the total 3840. I've tried to find information on INT32 performance ratio, but it is hard.
Are you using a performance monitor to measure compute units in use?

mfakto was written 9 years ago, and perhaps some of the kernels' launch parameters could use re-tuning, since newer GPUs have a different architecture.

Also, using half the compute units could be due to any GPU's Achilles heel: memory access stalls.

2021-01-10, 16:05   #115
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

1BF₁₆ Posts

Quote:
 Originally Posted by Xyzzy
Code:
...
number of multiprocessors 30 (1920 compute elements)
clock rate 1815 MHz
...
The RX6800 has 60 CUs and a total of 3840 compute cores.

2021-01-10, 17:27   #116
axn

Jun 2003

134B₁₆ Posts

It is what OpenCL reports, not what the program "uses". A power consumption of 200W is consistent with all the CUs being used, since the TDP is 250W.
2021-01-10, 22:27   #117
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

3·149 Posts

Quote:
 Originally Posted by axn It is what OpenCL reports, not what the program "uses". Power consumption of 200W is consistent with all the CUs being used since the TDP is 250W.
Oh, ok then. Thanks.

In that case, why is that so low? I have read that it could have a 1:1 ratio of FP32 to INT32 operations per second, and it should have 16 TFLOPS of FP32.
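To put the numbers in this exchange side by side, here is a rough back-of-the-envelope sketch of theoretical peak FP32 throughput. It assumes 64 lanes per compute unit (which matches both 1920/30 and 3840/60 above) and the usual convention that one fused multiply-add per lane per clock counts as 2 FLOPs. The 1815 MHz figure is the clock OpenCL reported in Xyzzy's output; real boost clocks vary. The helper names `lanes` and `peak_fp32_tflops` are hypothetical, and this says nothing about INT32 rates or memory stalls, only the theoretical ceiling.

```python
# Back-of-the-envelope peak FP32 arithmetic (a sketch, not a benchmark).
# Assumptions: 64 lanes per compute unit; one FMA per lane per clock = 2 FLOPs.

LANES_PER_CU = 64  # assumed; consistent with 1920/30 and 3840/60


def lanes(compute_units: int) -> int:
    """Shader lanes ("compute elements") for a given number of CUs."""
    return compute_units * LANES_PER_CU


def peak_fp32_tflops(lane_count: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 in TFLOPS: lanes x 2 FLOPs (FMA) x clock."""
    return lane_count * 2 * clock_mhz * 1e6 / 1e12


if __name__ == "__main__":
    # 30 = "multiprocessors" reported by OpenCL, 60 = CUs on the RX 6800.
    for cus in (30, 60):
        n = lanes(cus)
        print(f"{cus} CUs -> {n} lanes -> "
              f"{peak_fp32_tflops(n, 1815):.2f} TFLOPS at 1815 MHz")
    # Clock that all 3840 lanes would need to reach 16 TFLOPS.
    needed_mhz = 16e12 / (lanes(60) * 2) / 1e6
    print(f"16 TFLOPS needs about {needed_mhz:.0f} MHz")
```

Under these assumptions, the full 3840 lanes at 1815 MHz give about 13.94 TFLOPS, half of them give about 6.97, and the quoted 16 TFLOPS corresponds to a clock of roughly 2083 MHz.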

2021-01-12, 16:28   #118
Xyzzy

"Mike"
Aug 2002

1F83₁₆ Posts

Run alone on an open-air test bench, each card runs at default speed with no errors. Put them in a case, close to each other, and the top card gets errors. We are experimenting with reducing the power draw to get everything stable.

Unfortunately, there is no memory temperature reading. The errors occur (we think) when the junction temp gets to around 90°C. The junction is rated for 110°C, so we assume the memory is the culprit. Probably most people will never notice an occasional video error in games, but it is obviously an issue for compute tasks.

We put this system together with what we had lying around, so our cost ended up being pretty low.

CPU: Intel Celeron G5900 3.4 GHz Dual-Core Processor
CPU Cooler: Noctua NH-U12S 55 CFM CPU Cooler
Motherboard: Asus ROG STRIX Z490-F GAMING ATX LGA1200 Motherboard
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL16 Memory
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL16 Memory
Storage: Seagate Barracuda Compute 256 GB M.2-2280 NVME Solid State Drive
Video Card: Gigabyte Radeon RX 6800 16 GB Video Card
Video Card: Gigabyte Radeon RX 6800 16 GB Video Card
Case: Fractal Design Meshify C ATX Mid Tower Case
Power Supply: SeaSonic FOCUS Plus Platinum 750 W 80+ Platinum Certified Fully Modular ATX Power Supply
2021-01-14, 21:30   #119
Xyzzy

"Mike"
Aug 2002

3·2,689 Posts

Over the past few days we have been dealing with numerous issues with the 6800 cards. We have isolated the problem to the memory on both cards. We have tested each card individually and in different systems.

The memory runs at ~2,000MT/s. There is no way to tell the memory to run slower. Once either card heats up into the high 70s/low 80s, it is only a matter of time before it either starts to throw errors or completely borks the system. When we say bork the system, it borks it so hard it resets the system's BIOS to defaults. (!)

We can clock them down so they run in the 60s, but what is the point? They run slow at that speed, and even then they will sometimes hang. It just takes several hours to a day.

We explored modding the card's BIOS to lower the memory clock speed, but the BIOS is digitally signed, so it can't be modified.

If it were just one card, we would suspect that it was defective and we would warranty it. Both cards acting weird in multiple systems means, we think, that there is a driver or design problem. If we give the cards many work units, like 10K iterations of an FFT, then another 10K of a different FFT, etc. (using a batch file), the cards will "get confused" eventually and hang.

There are only two available drivers for them. We tried both. We used DDU to purge the systems of all video/audio drivers prior to installing them. We never got to test them in Linux. We never got past the initial "easy pointy-and-clicky Windows" stage.

Putting them in a case is a disaster. They fail on a test bench, but in a case they fail even faster.

We cannot recommend the 6800 (reference style) at this point. We don't know if the drivers are not right or if the cards are not suitable for compute work or what. Note that in games and synthetic benchmarks, the cards work fine. They never reset or bork the system even though they are running very hot at "stock" speeds.

Above 50% or so, the fans on the cards stop providing additional cooling and just get louder.

Our time is worth about a dollar an hour, so we decided to cut our losses and got rid of them. Life is too short to spend that much time on unreliable hardware. We now have an RTX 3070 installed, which "just works". It is "slow" for gpuowl, but it is great for our games.
2021-01-15, 00:43   #120
M344587487

"Composite as Heck"
Oct 2017

7×113 Posts

That's a shame. I really hope that poor Windows drivers are to blame and not the hardware. Gamers tend to be able to overclock much higher than compute users can and still be considered "stable", so I wouldn't be surprised if RDNA2 needs to be dialed back a bit for stability now that they've optimised for gaming. gpuowl does hammer the memory in a way gaming doesn't, so there's a chance the memory cooling solution is not fit for purpose. Judging by this video, the reference cooling is not ideal, although the die and RAM contact looks fine: https://www.youtube.com/watch?v=0s7bOaa6X9E
2021-01-15, 02:50   #121
LaurV
Romulan Interpreter

Jun 2011
Thailand

2×13×19² Posts

Quote:
 Originally Posted by Xyzzy Above 50% or so the fans on the cards stop providing additional cooling and just get louder.
This! You put the dot on the ı (Unicode: Latin small letter dotless i).
Is the card blowing air in both directions (i.e., is the hot air blown back into the case)?
Is mfakto different from gpuowl? You said it works well in games even though it gets hotter. This may mean that the card has issues with the memory, either supply or bus/impedance: when you use more memory, it cannot sustain it. In that case mfakto should behave "gaming style" and be more reliable. Maybe you can use them for TF? Is the memory cooled by the same metal block, or a separate one? Did you open it to see if the pads for the memory are thicker, or of a different material/type/color/dryness/etc.? It may be that the memory chips sit lower than the GPU chip and the pads are thicker; like M34 said, it can be a memory-cooling issue.
Are there any water blocks available for it?

Last fiddled with by LaurV on 2021-01-15 at 02:55

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.