mersenneforum.org  

Old 2020-01-11, 20:45   #1
Ozymandias
 
Jan 2020

5 Posts
Prime95 throws up two errors.

Hello Everyone, looking for some help with my recent-ish build.

Built the system in late December and have been having regular stability issues since then.

I ran Prime95 which threw up two Errors:

FATAL ERROR: Final result 00000000, expected: 25DE6210.
Hardware failure detected, consult stress.txt file.
Torture Test completed 21 tests in 48 minutes - 1 errors, 0 warnings.
Worker stopped

FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
Torture Test completed 19 tests in 46 minutes - 1 errors, 0 warnings.
Worker stopped


Would the failure of P95 indicate a hardware problem with the CPU?


SYSTEM SPECS:
AMD Ryzen 5 3600X
Gigabyte X570 Aorus Elite motherboard
EVGA Nvidia RTX 2070 Super XC Gaming
Corsair Vengeance LPX DDR4 16 GB 3200 MHz
Corsair Force LE 240gb
Western Digital Black 1tb SN750
Dlink Wireless networking card
Corsair TX650M 80+ Bronze (New)

Other Relevant Information:
Memtest86 passed twice
Not overclocking anything (including the Memory)
BIOS, Chipset, GPU, other relevant drivers are up to date

I would greatly appreciate any help getting to the bottom of this. I'm here to answer any questions.
Old 2020-01-11, 22:23   #2
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3·19·83 Posts

It's pretty tough to determine the source of errors like this. Even when the memory passes memtest, you could still have a bad stick. Since memory is also the easiest thing to physically remove and replace, I suggest you start there first.
Install just one stick of memory and run the P95 test again. If it passes, swap to the other single stick and repeat.

If both those tests pass (or both fail), then it's likely not the memory; it would be well and truly rare to get *two* bad memory sticks!

It's also possible the board is bad; for instance, there may be a pin on the memory bus that's loose, or fails occasionally. If your board will boot with a single stick in any old location, you could try the test with one stick, in each DIMM slot.
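The one-stick-at-a-time procedure above can be written out as a checklist. This is only a sketch: the stick labels and slot names are placeholders (a four-slot board is assumed, so check the manual for the real slot names), and `diagnose` just encodes the rule of thumb from this post, not a definitive test.

```python
from itertools import product


def isolation_plan(sticks, slots):
    """Return every (stick, slot) pairing to test with one stick installed at a time."""
    return list(product(sticks, slots))


def diagnose(results):
    """Rough reading of single-stick results, following the reasoning in the post.

    results maps a stick label to True (torture test passed) or False (failed).
    """
    outcomes = set(results.values())
    if len(outcomes) == 1:
        # Both pass or both fail: two bad sticks would be rare, so look elsewhere.
        return "memory unlikely"
    failed = [stick for stick, ok in results.items() if not ok]
    return "suspect stick: " + ", ".join(failed)


if __name__ == "__main__":
    # Hypothetical labels; replace with whatever identifies your hardware.
    sticks = ["stick A", "stick B"]
    slots = ["slot 1", "slot 2", "slot 3", "slot 4"]
    for stick, slot in isolation_plan(sticks, slots):
        print(f"Install {stick} alone in {slot}, run the torture test, note pass/fail")
```

Writing the results down per combination makes it easier to tell a bad stick (failures follow the stick) from a bad slot (failures follow the slot).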

Good luck!
Old 2020-01-11, 23:15   #3
paulunderwood
 
Sep 2002
Database er0rr

E21₁₆ Posts

Is heat a problem? Install some temperature monitoring software to see if it is a problem.

If the temperature looks okay, strip the machine down to one memory stick and the disk.

If that still fails, the BIOS might need a slight adjustment to voltage and/or timing.

If the CPU temperature is high, the heatsink might need refitting.

Did you build this computer yourself or did you buy it ready-built?
Old 2020-01-12, 21:01   #4
Ozymandias
 
Jan 2020

5 Posts

Thanks for the advice. I'll experiment with the memory and post a reply with where that gets me.
Old 2020-01-13, 21:07   #5
Ozymandias
 
Jan 2020

5₁₀ Posts

I ran some more tests and think I made some progress. Also, I built the system myself.

First off, I swapped the existing memory in the system for an older 8 GB set. I ran P95 with everything else unchanged and got two errors during this run:

FATAL ERROR: Rounding was 0.484375, expected less than 0.4
Hardware failure detected, consult stress.txt file.
Torture Test completed 87 tests in 3 hours 30 minutes - 1 errors, 0 warnings
Worker Stopped

FATAL ERROR: Resulting sum was 699302658479884, expected: 732984828215990.4
Hardware failure detected, consult stress.txt file.
Torture Test completed 93 tests in 3 hours 29 minutes - 1 errors, 0 warnings
Worker Stopped

I then moved the two sticks of memory into the two other memory slots and got the following error:

FATAL ERROR: Rounding was 4.624687723e+011, expected less than 0.4
Hardware failure detected, consult stress.txt file.
Torture Test completed 60 tests in 2 hours 22 minutes - 1 errors, 0 warnings
Worker Stopped

With this information, is it safe to say there is a hardware problem with the CPU? Or does the problem lie elsewhere?
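For anyone comparing several runs like these, a small parser can pull the numbers out of the pasted output. This is a minimal sketch, not part of Prime95: the regular expressions are assumptions fitted only to the lines quoted in this thread. As I understand it, the rounding figure measures how far an intermediate value landed from a whole number, and that distance cannot mathematically exceed 0.5, so `looks_corrupt` (a hypothetical helper) flags larger values as garbage data rather than a near miss.

```python
import re

# Patterns fitted to the lines quoted in this thread (an assumption, not a spec).
ROUNDING = re.compile(
    r"Rounding was ([0-9.]+(?:[eE][+-]?[0-9]+)?), "
    r"expected less than ([0-9]+(?:\.[0-9]+)?)"
)
SUMMARY = re.compile(r"completed (\d+) tests in (.+?) - (\d+) errors, (\d+) warnings")


def rounding_errors(log_text):
    """Return (observed, limit) pairs for each rounding failure in the text."""
    return [(float(obs), float(lim)) for obs, lim in ROUNDING.findall(log_text)]


def run_summaries(log_text):
    """Return (tests, duration, errors, warnings) for each torture test summary."""
    return [
        (int(tests), duration, int(errors), int(warnings))
        for tests, duration, errors, warnings in SUMMARY.findall(log_text)
    ]


def looks_corrupt(observed):
    """Distance to the nearest integer is at most 0.5; anything larger is garbage."""
    return observed > 0.5


if __name__ == "__main__":
    sample = (
        "FATAL ERROR: Rounding was 0.484375, expected less than 0.4\n"
        "Torture Test completed 87 tests in 3 hours 30 minutes - 1 errors, 0 warnings\n"
        "FATAL ERROR: Rounding was 4.624687723e+011, expected less than 0.4\n"
        "Torture Test completed 60 tests in 2 hours 22 minutes - 1 errors, 0 warnings\n"
    )
    for observed, limit in rounding_errors(sample):
        note = "corrupt data" if looks_corrupt(observed) else "marginal"
        print(f"rounding {observed} (limit {limit}): {note}")
    for tests, duration, errors, _ in run_summaries(sample):
        print(f"{tests} tests in {duration}, {errors} error(s)")
```

The parser only organizes what the log already says; it does not tell you which component is at fault.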

Last fiddled with by Ozymandias on 2020-01-13 at 21:16
Old 2020-01-13, 21:29   #6
paulunderwood
 
Sep 2002
Database er0rr

3,617 Posts

Have you measured the temperatures through the various sensors with temperature monitoring software? Note down what they are at idle and under load with Prime95.

Re the CPU: it could be that the pads are not quite right, or the heatsink is not screwed down enough (but it's an AMD clip, so no problem), or there might be a hotspot where thermal paste was applied improperly. Or did it have a pad preapplied? It might need a very slight increase in voltage.

I'd say check out the temperature software first and get back to us...

Last fiddled with by paulunderwood on 2020-01-13 at 21:52
Old 2020-01-14, 02:19   #7
Ozymandias
 
Jan 2020

5 Posts

I used Core Temp to record the temperatures and found that the CPU was idling at 50-60°C, which seemed... high, to me at least.

Under a P95 run it held pretty steady at 90°C.

In terms of my cooling setup, I'm running an AMD Wraith Max cooler. Additionally, when I was first having issues I completely re-seated everything in the PC, including the CPU and cooler. It looked like not all of the pre-applied thermal paste had come into contact with the processor. Photo (second in the gallery): https://imgur.com/a/XSwHtrY

While re-assembling the system, I made sure to follow AMD's instructions on how to mount the cooler, and I noted that it took somewhat more pressure to mount than the first time.

Last fiddled with by Ozymandias on 2020-01-14 at 02:19
Old 2020-01-14, 06:07   #8
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

101001010₂ Posts

I think there's something not quite right with the cooling. Granted, my R5 3600 is slower, and I run it with a cheap ($25) tower cooler instead of the box cooler, but it idles at around 35C, and running mprime raises that to 68C. (Also running mfaktc in the same computer case raises the CPU to 74C because of the extra GPU heat generated within the box).

I seem to get about 3.9 GHz at that temperature. I haven't done any overclocking or used any overdrive settings in the BIOS; it's just running at whatever frequency the internal boost algorithm wants to use. These Ryzen chips seem to boost themselves up in frequency until they reach power or thermal limits. If your processor reaches 90C, it is likely to be throttling the clock down already. So, watch the clock frequency along with the temperature when starting the test: how much does the frequency drop as the CPU heats up?
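If you write down a few temperature and clock readings during the run, the drop can be summarized like this. A minimal sketch: the readings in the example are made-up placeholders, not measurements from anyone's system, and `throttle_report` is a hypothetical helper that works on numbers you have noted by hand from your monitoring software.

```python
def throttle_report(samples):
    """Summarize (temp_c, mhz) readings taken while the torture test warms up.

    Returns the peak clock seen, the hottest reading, the clock at that
    hottest reading, and the difference between peak and hot clock.
    """
    if not samples:
        raise ValueError("need at least one sample")
    peak_mhz = max(mhz for _, mhz in samples)
    hottest_c, mhz_at_hottest = max(samples, key=lambda s: s[0])
    return {
        "peak_mhz": peak_mhz,
        "hottest_c": hottest_c,
        "mhz_at_hottest": mhz_at_hottest,
        "drop_mhz": peak_mhz - mhz_at_hottest,
    }


if __name__ == "__main__":
    # Placeholder readings for illustration only: (temperature in C, clock in MHz).
    readings = [(45, 4200), (70, 4050), (90, 3700)]
    report = throttle_report(readings)
    print(f"Clock fell {report['drop_mhz']} MHz by the time the CPU hit {report['hottest_c']} C")
```

A large drop at the hottest reading would be consistent with the chip backing off because of temperature, though it can also back off at a power limit.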

That preapplied paste is usually quite dry (well it has to be), and in my experience only seems to work once. So after lifting the cooler off the CPU, I'd recommend cleaning the old paste off and using new paste. Why it took more pressure to mount... simple, the bits of paste that were left on the CPU IHS and the bits that are on the cooler rarely match up again. So now there are portions with no paste and places where there are two layers of paste. It will flow a bit when heated up, but maybe not enough to make the thermal contact good enough again.
Old 2020-01-14, 07:43   #9
paulunderwood
 
Sep 2002
Database er0rr

3617₁₀ Posts

Yup. Clean the paste off the chip and off the heatsink with "remover" and prepare with "preparer". Then apply some new thermal paste: not too thickly, which hinders heat conduction, but enough to avoid any air pockets, which heat up. (I always use Arctic Silver 5 and a debit card as a spreader. YMMV.)

Last fiddled with by paulunderwood on 2020-01-14 at 07:46
Old 2020-01-16, 21:17   #10
Ozymandias
 
Jan 2020

101₂ Posts

So I've had the chance to re-apply some new thermal paste and think I've made some (limited) progress at least. In order to bring temps down I:
  • Cleaned the existing thermal paste and added new paste (Arctic Silver 5)
  • Increased the CPU fan profile to run at a slightly higher speed
  • Changed the CPU settings from "Default" to "Eco-Mode" in Ryzen Master

The final result was the CPU staying rock steady at 50°C at idle and steady at 85°C while running P95. Lastly, I think I've managed to get the system somewhat stable: it was able to run P95 for around 12 hours yesterday with no errors.

With all that said, the temps do still seem high, especially at idle. Any thoughts?

Last fiddled with by Ozymandias on 2020-01-16 at 21:18
Old 2020-01-16, 22:10   #11
paulunderwood
 
Sep 2002
Database er0rr

3617₁₀ Posts

That does not sound at all surprising: 85C is to be expected for a big chip like that running flat out with Prime95. Keep it free of dust and it should be okay during the summer months. If you can, you might then want to add more case cooling. I am glad you got it stable.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.