mersenneforum.org  

Old 2021-03-14, 06:42   #364
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6,841 Posts

Quote:
Originally Posted by ewmayer View Post
a strange yen
Is that what you get when the Japanese mint is having a bad day? Good for numismatists though.
https://upload.wikimedia.org/wikiped...PY_coins_2.png

Last fiddled with by kriesel on 2021-03-14 at 06:43
Old 2021-03-15, 20:21   #365
clowns789
Jun 2003
The Computer

2⁴×5² Posts

If anyone is in the market for the pro version, there is one for $2,300 on eBay and ending in less than three days. It was sold already, but apparently canceled by the winning bidder and relisted. While it's a good chunk of change, it's not as marked up percentage-wise as the consumer version. I have been told by Ken Kriesel that the pro version is not expected to be faster for many of our tasks due to memory constraints, but it might be nice to have nonetheless.

https://www.ebay.com/itm/AMD-Radeon-...oAAOSwZt1gQP1c
Old 2021-03-15, 20:39   #366
paulunderwood
Sep 2002
Database er0rr

1000010111011₂ Posts

Quote:
Originally Posted by clowns789 View Post
If anyone is in the market for the pro version, there is one for $2,300 on eBay and ending in less than three days. It was sold already, but apparently canceled by the winning bidder and relisted. While it's a good chunk of change, it's not as marked up percentage-wise as the consumer version. I have been told by Ken Kriesel that the pro version is not expected to be faster for many of our tasks due to memory constraints, but it might be nice to have nonetheless.

https://www.ebay.com/itm/AMD-Radeon-...oAAOSwZt1gQP1c
You should be able to afford it if you find the next Mersenne prime
Old 2022-04-09, 20:16   #367
ewmayer
2ω=0
Sep 2002
República de California

11,743 Posts

2 of the 3 R7s in my 3-GPU open-frame desktop rig have been nonfunctional since last fall due to the issue detailed in the copied-below e-mail thread between myself and Mike/Xyzzy.

I am trying to find out how one might diagnose whether the actual microelectronics are still functional and maybe just need reflashing, or whether the whole shebang is shorted due to the PSU-plug issue. Do any of our readers have experience with this sort of thing?

The PSU itself is fine aside from the one ruined plug (pic attached), but obviously contains a lot less sensitive electronics than the GPU.

Quote:
29 Sep 2021 EWM: My 3-GPU rig has been crashing after 12-24hrs uptime ...
today when I shut off and power back up, the lights/fans on just 1 GPU come on. Suspect the PSU
(a Corsair, identical to the model that flaked out after ~9 months - so much for "ultra
reliable") has gone tilt. It's been drawing 750-800W-at-wall, thus putting out maybe 85% of that
on the machine side, which seems reasonably within the 850W rated limit, given the brand's
honest-rating reputation. More later.
Quote:
01 Oct 2021, EWM: Update on this - The system is able to run stably with just
the one "light still on" GPU, figured maybe that one just happened to have the lowest-Ohmage
connection to the PSU.

Next did detailed unplug-pull-out and trace-wires inspection of the 3-GPU rig yesterday - I had
foolishly used a single PCIe plug and splitter to power 2 of the 3, because back in the
wild-n-crazy days when I was trying to run a 4th GPU - all underclocked, but still - off the
same RM850 PSU that proved to be the only cabling solution that seemed to work semi-stably.
After selling GPU #4 last Fall I left remaining cables as-is. Figured since one of the same-PCIe
GPU pair was plugged into full-width mobo PCI slot and both were underclocked, the single PCIe
ribbon cable could handle the wattage. Wrong! But since the PSU-side was the most deeply-buried
(i.e. hard to see) plug of the set, didn't realize until I dug down in there ... plug partially
melted, vinyl sheathing crumbled in my hand on removal.

So figured, ok, maybe it's not the PSU after all - even with the one PCIe socket rendered
unusable by melted plastic crud in half the holes, still 3 or 4 unused ones left, plus the one
driving the dedicated GPU which still operates fine. So plugged 2 fresh PCIe cables into
remaining PSU-side sockets, ran one to each of GPU 2 and 3, tried power-up. Lights and fans on
all 3 GPUs came on, OK, but fans on 2 and 3 were running full blast, which was weird.
'/opt/rocm/bin/rocm-smi' showed just a single device numbered 0, same as with just GPU 1 plugged
in. That GPU used to be at dev #1, but apparently if just 1 GPU is detected, rocm calls that dev
0. Powered back down, unplugged 2 and 3 - no point running fans full blast if GPU not seen by
system - was able to run gpuowl on GPU 1 overnight again. Today powered down and switched the
PCIe plug from 1 to 2, on powerup, rocm-smi gave "WARNING: No AMD GPUs specified". So the
plug-meltage incident appears to have borked 2 and 3.

Do you know if there is still a strong market (e.g. professional refurb-folks) for "lights come
on, fans spin, but no other signs of life" R7s?

Lesson learned: never skimp on the power cabling, and for high-draw components spread the load
over as many power plugs as reasonably possible, for redundancy.
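
To put rough numbers on that lesson, here is a minimal sketch of the load on one PSU-side plug feeding two cards through a splitter. Everything numeric is an assumption for illustration: 150 W is the nominal rating of one 8-pin PCIe connector and 75 W the nominal x16 slot allowance, while the 200 W per underclocked card is a hypothetical figure, not a measurement from this rig.

```python
# Rough load on one PSU-side PCIe plug; all wattages are illustrative assumptions.
PCIE_8PIN_RATED_W = 150.0  # nominal rating of a single 8-pin PCIe connector
SLOT_MAX_W = 75.0          # nominal power available from a x16 slot
RAIL_VOLTS = 12.0          # GPU auxiliary power is delivered at 12 V


def cable_load_w(cards):
    """Watts pulled through one cable.

    cards -- list of (board_watts, in_powered_slot) tuples, hypothetical values.
    A card in a powered slot is assumed to take up to SLOT_MAX_W from the slot,
    which is the most optimistic case for the cable.
    """
    total = 0.0
    for board_watts, in_powered_slot in cards:
        from_slot = min(SLOT_MAX_W, board_watts) if in_powered_slot else 0.0
        total += board_watts - from_slot
    return total


def cable_amps(watts):
    """Current through the cable at 12 V."""
    return watts / RAIL_VOLTS


# Two underclocked cards on one plug, only one of them in a full-width slot.
load = cable_load_w([(200.0, True), (200.0, False)])
print(f"{load:.0f} W, {cable_amps(load):.1f} A, "
      f"{load / PCIE_8PIN_RATED_W:.1f}x the single-connector rating")
```

With these made-up inputs the single plug carries 325 W, roughly 27 A, which is more than twice what one 8-pin connector is nominally meant to handle even in the optimistic case.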
Quote:
02 Oct 2021, Xyzzy: have you tested each one alone? in the first slot?
Quote:
Do you know if there is still a strong market (e.g. professional refurb-folks) for "lights come
on, fans spin, but no other signs of life" R7s?
laurv sometimes buys broken cards to fix
[EWM: LaurV says his card-repair skills are mostly on the nVidia side, and more the power side of things, replacing blown MOSFETs and the components around them.]

[Long hiatus - too busy with other things, and can't afford the electricity bill with more than the remaining 3 R7s running, anyway.]

Quote:
07 Apr 2022, EWM: I finally got around to trying one of the suspected-to-be-b0rked R7s in the 1st PCI slot
on the mobo of my open-test-frame desktop rig, which I'd been using for the remaining good GPU -
simply swapped the 2 GPUs on the mobo. Plugged PCIe power-cabling into the suspected-bad one, on
powerup same symptoms as described below, lights come on, fan runs full-blast, no device found.
Shutdown, swapped power-cabling back into working GPU (now in 2nd PCI slot), it works fine.

So need a way to find out if the affected GPU itself is OK and just needs reflashing or
whatever, or whether it's a total loss.
Quote:
07 Apr 2022, Xyzzy: just so i understand it, you had only 1 gpu plugged into
the mobo for this test?
Quote:
7 Apr 2022 EWM: No - swapped PCI slots of
working gpu with 1 of the 2 borked ones. I've only been using one 2x8-plug PCIe cable since
borkage to minimize cabling mess, hooked that up to the borked gpu this time and booted up - no
gpu detected, though power was clearly getting to it based on lights/fans. Shut down and moved
power cable back to working gpu, now housed in its new PCI slot, booted back up, it runs fine as
expected. So the problem is clearly not with the mobo, nor the PSU.
[EWM: LaurV notes: "Power getting to lights (probably 5V or 3.3V) and fans (probably 12V) doesn't mean that the power gets to the GPU itself (probably 1.8V, separate MOSFETs that could be burned, on a different power branch)."]
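
One cheap check that might help separate "card is dead on the bus" from "card is on the bus but the driver gave up" is to look at what the kernel enumerated over PCI, independently of ROCm. The sketch below assumes a Linux box with the usual sysfs layout; 0x1002 is AMD's PCI vendor ID and base class 0x03 means display controller. The function name is made up for this example.

```python
from pathlib import Path

AMD_VENDOR_ID = 0x1002  # PCI vendor ID assigned to AMD/ATI
DISPLAY_CLASS = 0x03    # PCI base class for display controllers


def amd_display_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses of AMD display controllers the kernel enumerated."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            vendor = int((dev / "vendor").read_text().strip(), 16)
            pci_class = int((dev / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        # The class file holds a 24-bit value; the top byte is the base class.
        if vendor == AMD_VENDOR_ID and (pci_class >> 16) == DISPLAY_CLASS:
            found.append(dev.name)
    return found
```

If a suspect card shows up in `amd_display_devices()` but not in rocm-smi, the PCIe interface is at least alive and the kernel log is the next place to look. If it is absent here too, the card is not enumerating at all, which would fit LaurV's point about a dead power branch.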
Attachment: pci_plug_melted.jpg (369.2 KB)
Old 2022-04-10, 03:08   #368
paulunderwood
Sep 2002
Database er0rr

4,283 Posts

Try only one card in pci-e slot 1 with 2 feeds from the PSU. If it fails to show try it in pci-e slot 2. If it still fails try slot 3 etc. If and when it boots and you get a video out, you know the card is good. And you now know which slots are good. So try all slots with known good card.

Using the same cables, next try only card 2 in a known good slot. Ditto card 3 by itself.

Using the same cables again verify each of the PSU's cable slots.

Next verify the pci-cables, by cycling them in a good card and slot.

I know, it's a lot of testing!
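
The procedure above amounts to changing one component at a time against a reference setup. A minimal sketch of generating that checklist follows; the labels are hypothetical, the first entry of each list is assumed to be the reference part, and the "2 feeds from the PSU" are simplified to a single cable label.

```python
def isolation_plan(cards, slots, psu_ports, cables):
    """Return test steps that change only one component at a time.

    The first entry of each list is treated as the reference part. Each step
    is a (card, slot, psu_port, cable) tuple to boot and check for detection.
    """
    ref_card, ref_slot = cards[0], slots[0]
    ref_port, ref_cable = psu_ports[0], cables[0]
    steps = []
    # 1. One card alone, walked through every slot.
    steps += [(ref_card, slot, ref_port, ref_cable) for slot in slots]
    # 2. Each remaining card alone in the reference slot, same cabling.
    steps += [(card, ref_slot, ref_port, ref_cable) for card in cards[1:]]
    # 3. Each remaining PSU-side socket, same cable.
    steps += [(ref_card, ref_slot, port, ref_cable) for port in psu_ports[1:]]
    # 4. Each remaining cable in a known-good card and slot.
    steps += [(ref_card, ref_slot, ref_port, cable) for cable in cables[1:]]
    return steps


plan = isolation_plan(
    cards=["gpu1", "gpu2", "gpu3"],  # hypothetical labels
    slots=["slot1", "slot2", "slot3"],
    psu_ports=["psuA", "psuB", "psuC", "psuD"],
    cables=["cable1", "cable2", "cable3"],
)
for number, step in enumerate(plan, start=1):
    print(number, *step)
```

For 3 cards, 3 slots, 4 PSU sockets and 3 cables this gives 10 boots instead of the 108 a full combination matrix would need, provided the reference parts really are good.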

Using an 850w PSU with insufficient cabling is asking for trouble. At least get some more cables from eBay. I also recommend a beefier PSU so that your rig draws near 50%.
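
As a rough sizing sketch for the 50% suggestion: the 750-800 W at-wall draw and the 85% efficiency guess are the figures quoted earlier in the thread, while the function names are made up for this example.

```python
def dc_load_w(wall_watts, efficiency=0.85):
    """DC-side load given at-the-wall draw; 0.85 is the guess quoted upthread."""
    return wall_watts * efficiency


def load_fraction(dc_watts, psu_rated_w):
    """Fraction of the PSU's rated output in use."""
    return dc_watts / psu_rated_w


def rating_for_target(dc_watts, target_fraction=0.5):
    """PSU rating at which dc_watts is the target fraction of capacity."""
    return dc_watts / target_fraction


for wall in (750.0, 800.0):
    dc = dc_load_w(wall)
    print(f"{wall:.0f} W at wall -> {dc:.0f} W DC, "
          f"{load_fraction(dc, 850.0):.0%} of 850 W, "
          f"~{rating_for_target(dc):.0f} W PSU for 50% load")
```

Under those assumptions the 850 W unit was running at about 75 to 80 percent of its rating, and a PSU somewhere around 1300 W or more would be needed to bring the same rig down to half load.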

ps. I might have some spare unused pci-e cables from a Corsair AX860.

Last fiddled with by paulunderwood on 2022-04-10 at 03:57
Old 2022-04-10, 08:21   #369
preda
"Mihai Preda"
Apr 2015

10101110010₂ Posts

I had one R7 GPU die on me following something PSU related, not sure exactly what happened there. But the symptoms were similar, the lights/fans were ON but the GPU was not detected. I looked in the kernel log ("sudo dmesg"), and I saw the kernel reporting some errors for the affected GPU (that was thus not initialized properly, and not appearing in the list of "initialized" GPUs later on).

I contacted the manufacturer (XFX) as the GPU was still in warranty period, and they obliged. I shipped the GPU overseas to their factory somewhere in Asia, and about a month later I received back a working GPU from them. Unfortunately it seemed that, at least in my case, the GPU was hardware-affected and a simple BIOS re-flash would not have fixed it.
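
For anyone repeating the kernel-log check, a minimal sketch of filtering dmesg output down to the suspicious lines is below. It assumes the Linux driver tags its messages with "amdgpu"; the keyword list is only a guess at what failures look like, and the function name is made up for this example.

```python
import re

# Keywords that often accompany failures; this list is an assumption.
ERROR_HINTS = re.compile(r"error|fail|timeout|timed out", re.IGNORECASE)


def amdgpu_problem_lines(dmesg_text):
    """Return the lines that mention amdgpu together with an error-like keyword."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "amdgpu" in line and ERROR_HINTS.search(line)
    ]
```

One way to feed it is `amdgpu_problem_lines(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)`, run with enough privileges to read the kernel log. An empty result does not prove the card is healthy, only that nothing matched the keywords.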
Old 2022-04-10, 09:34   #370
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6,841 Posts

Quote:
Originally Posted by preda View Post
I shipped the GPU overseas to their factory somewhere in Asia, and about a month later I received back a working GPU from them.
When was that? No such luck here, working through XFX California. They verified what I already had, the cards had failed. They did not repair, they did not replace, they only offered exchange for an RX 5700XT, or payment of original purchase price without sales tax or shipping reimbursement, and that was only as original purchaser with full documentation. What I was paid for 2 corpses was about enough to buy one used RadeonVII replacement at the time. A motherboard's power handling component failing took out 3 of 5 GPUs installed on it at the time. Out of luck on warranty claim for the third since I had not saved or printed its original purchase documentation, and my Best Buy login stopped working by the time the GPUs did. An RX 5700 XT was not an acceptable exchange to me, because:
half the GPU ram (limits max exponent in gpuowl P-1)
less than half the GPU ram bandwidth
less than half the PRP performance (iterations/sec) in gpuowl
~18% the DP performance
the XFX 5700 XT sample I had bought is terribly unreliable
even an RX 6900 XT is not the equal of a Radeon VII, which they would not consider providing
Overall, it was a disappointing warranty claim experience with XFX.
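
The ratios in that list can be roughly reproduced from commonly quoted spec-sheet figures. The numbers in this sketch (memory size, peak bandwidth, peak FP64 throughput) are assumptions taken from published specs, not measurements from this thread, and real gpuowl throughput will differ.

```python
# Spec-sheet figures as commonly quoted; treat them as assumptions, not measurements.
RADEON_VII = {"vram_gb": 16, "bandwidth_gbs": 1024, "fp64_tflops": 3.46}
RX_5700_XT = {"vram_gb": 8, "bandwidth_gbs": 448, "fp64_tflops": 0.61}


def ratios(candidate, baseline):
    """Each candidate figure as a fraction of the baseline's."""
    return {key: candidate[key] / baseline[key] for key in baseline}


for key, value in ratios(RX_5700_XT, RADEON_VII).items():
    print(f"{key}: {value:.1%} of Radeon VII")
```

With these inputs the RX 5700 XT comes out at half the memory, about 44% of the bandwidth and about 18% of the double-precision rate, in line with the list above.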

I suppose from their perspective, paying full original price for something near end of warranty might be thought of as generous.
But at the time, working used Radeon VIIs were selling for up to 4 times original sale price.
It's too bad they are no longer being made.

On that particular motherboard, PCIe power is fed from connectors from both ends of the board.
A component near the CPU socket failed spectacularly, with arcing and a little flame.
Running mfakto on the IGP, along with prime95 on the CPU and gpuowl on all GPUs, is what sent it over the edge. CPU installed was i7-4790 (110 watt TDP). Issue was reproduced on a replacement motherboard. (Definitely a destructive test. I don't run mfakto on IGP on such GPU-heavy systems any more.)
GPUs affected were alternate; of ABCDE, A C E failed, B and D positions survived.
Whatever the damage was, they produced Code 43 errors in Windows afterward; not usable on other systems in Windows or Linux.

Last fiddled with by kriesel on 2022-04-10 at 09:45
Old 2022-04-10, 17:22   #371
preda
"Mihai Preda"
Apr 2015

2·17·41 Posts

Quote:
Originally Posted by kriesel View Post
When was that?
It was in December 2020.

I imagine it must have been painful to lose so many R7s, sorry for that. Indeed a pity they're not made anymore; in retrospect I should've bought a few more just for my personal use.
Old 2022-04-10, 19:43   #372
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE

1,151 Posts

Not directly related to Radeon VII's, but I had major problems with gpuowl when I was remoting into my machine with a graphical session. After avoiding it, everything went fine. But this was on Windows… I hope to give you some motivation to look at "weird" possibilities.
Old 2022-04-10, 20:06   #373
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6,841 Posts

Quote:
Originally Posted by kruoli View Post
Not directly related to Radeon VII's, but I had major problems with gpuowl when I was remoting into my machine with a graphical session.
That's my SOP for gpuowl or other GIMPS apps on Windows. GPU-Z doesn't work correctly with AMD drivers, Win7, and remote desktop, but GIMPS apps do.
Old 2022-05-07, 21:15   #374
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia

19·41 Posts

I'd like to offer a Radeon VII for sale for $750, including shipping to pretty much anywhere.
Its fans were replaced recently, so they should hopefully last a long time.

Please, PM me if you are interested.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
