mersenneforum.org


MarkVanCoutren 2021-05-29 22:22

Help: Hardware errors / 1 Gerbicz/double check error
 
I'm new to Prime95 and GIMPS, and I've been getting this message on my Intel Core i5-9600K (3.7 GHz, 6-core) running Windows 10:

Iteration: 8280000 / 108671053 [7.61%], ms/iter: 10.106, ETA: 11d 17:48
Hardware errors have occurred during the test!
1 Gerbicz/double-check error.
Confidence in final result is excellent

I've been getting this same message every iteration, and I think I got it on my last number as well. I've tried stopping it for a few hours to let it cool, but it keeps giving me the error.
CPU temperature is 51/52 °C and the cores are around 61 °C each (from Open Hardware Monitor).
I haven't tried to overclock this at all; I just left it running nonstop for a few days. Have I broken my computer?
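For anyone new to the status line above: the percentage and ETA follow from the iteration counter and the time per iteration. A minimal Python sketch, assuming the ETA is simply (remaining iterations) × (displayed ms/iter); the function name `progress_and_eta` is hypothetical and not part of Prime95:

```python
def progress_and_eta(iteration: int, total: int, ms_per_iter: float) -> tuple[float, str]:
    """Return (percent done, ETA as 'Dd HH:MM') from a PRP status line."""
    pct = 100.0 * iteration / total
    # Remaining work in seconds, assuming a constant time per iteration.
    remaining_s = (total - iteration) * ms_per_iter / 1000.0
    days, rem = divmod(int(remaining_s), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return pct, f"{days}d {hours:02d}:{minutes:02d}"


pct, eta = progress_and_eta(8280000, 108671053, 10.106)
print(f"{pct:.3f}%  ETA {eta}")
```

With the numbers from the post this gives about 7.619% and 11d 17:49, about a minute away from Prime95's own 11d 17:48. The displayed ms/iter is rounded, and Prime95 may average its timings differently, so small differences like this are expected.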

moebius 2021-05-29 23:05

Your (PRP) result will probably be right, so let it run to the end.

Some memory modules can't be made stable, or the processor core temperatures are too high.
Example of a worst-case error ratio:
https://mersenneforum.org/showpost.php?p=474348&postcount=159

tuckerkao 2021-05-29 23:06

The situation should be okay; it just indicates that one Gerbicz error check has failed. I've run into this before, and my final result was still verified as accurate by the PRP certification from another user.

Once an error has occurred, it will be shown in every status message. Just let the machine finish the PRP test. Unless you get 2 failed checks for the same block, it should be fine in the end.

VBCurtis 2021-05-29 23:07

You haven't broken it. You may have uncovered that it doesn't handle full-power heat generation anymore (say, from dust accumulation), or you may have uncovered a bad memory stick.

If it's a desktop, open it up and blow out the dust. Check to make sure all fans still turn when the machine is powered on. You might also choose to run Prime95 on fewer cores so that it generates less heat; if single-threaded operation still produces these errors, then it is more likely you have a failing memory stick and less likely that dust or heat management is the culprit.

You might look into memtest86, or another memory-testing program, to try to narrow down what might be causing the hardware errors.

EDIT: In your post in another thread, you mentioned overclocking. Getting an error like this means you went too far, and need to back off the overclock for stability.
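The triage suggested in this post can be summed up as a small decision rule. A hedged Python sketch: the enum and function names are hypothetical, and the rule is only the heuristic described above (overclock first, then single-thread errors pointing to memory, all-core-only errors pointing to heat), not a real diagnosis tool:

```python
from enum import Enum


class Suspect(Enum):
    OVERCLOCK = "back off the overclock first"
    MEMORY = "memory: run memtest86 or another memory tester"
    HEAT = "heat: blow out dust, check that all fans spin"
    INCONCLUSIVE = "no errors reproduced yet: keep testing"


def likely_suspect(overclocked: bool, errors_all_cores: bool,
                   errors_single_thread: bool) -> Suspect:
    """Heuristic from the thread, applied in order of what to fix first."""
    if overclocked and (errors_all_cores or errors_single_thread):
        return Suspect.OVERCLOCK
    # Errors even with one worker (little heat) point away from cooling.
    if errors_single_thread:
        return Suspect.MEMORY
    # Errors only under full load point toward heat management.
    if errors_all_cores:
        return Suspect.HEAT
    return Suspect.INCONCLUSIVE


print(likely_suspect(overclocked=False, errors_all_cores=True,
                     errors_single_thread=False).value)
```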

Aramis Wyler 2021-05-29 23:07

As long as it says confidence is excellent, you'll be OK. I get some numbers where I see that every time, and other numbers where I don't see it at all. Must be edge cases.

LaurV 2021-05-30 10:08

If an error happens, such as a failed Gerbicz check (GC) or a too-high roundoff error, P95 tries to redo the iteration using a different (slower) method. Sometimes, as with some rounding errors at borderline FFT sizes, as mentioned above, there is no real problem, and the slower method gets the same result. Then it will say that the "result is reproducible", so there was no hardware error. Sometimes the slower calculation gets a different result, and in that case it will redo the GC or resume from an earlier checkpoint, depending on the situation. Even then, your result is still OK: the error does not end up in the result, and confidence that the final result is correct is very high. However, the app lets you know that an error happened so you can take measures in the future, such as dusting, reducing clocks, or re-seating the CPU (not resetting it; re-seating means taking the CPU out, cleaning it, applying new thermal paste, etc.), whatever is up to you or your IT people.
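The reason a failed Gerbicz check can be caught and rolled back is an algebraic identity between checkpoints. Below is a toy Python sketch of that idea, not Prime95's code: it squares x_i = 3^(2^i) mod N with N = 2^p - 1, keeps a running product d of every L-th residue, and verifies d_t = 3 · d_{t-1}^(2^L) (mod N) every few blocks, rolling back to the last verified state on a mismatch. The function name `gerbicz_prp`, the block size `L`, the interval `check_every` and the simulated fault (multiplying the residue by 2 once) are all illustrative assumptions; real implementations use FFT-based arithmetic on numbers with millions of bits and handle the final iterations their own way.

```python
from typing import Optional, Tuple


def gerbicz_prp(p: int, L: int = 8, check_every: int = 4,
                fault_at: Optional[int] = None) -> Tuple[int, int]:
    """Toy squaring chain mod N = 2^p - 1 with a Gerbicz-style check.

    Returns (3^(2^p) mod N, number of failed checks).
    fault_at: iteration at which the residue is corrupted once,
    standing in for a hardware error (hypothetical parameter).
    """
    N = (1 << p) - 1
    x0 = 3
    i, x, d = 0, x0, x0            # d = x_0 * x_L * x_2L * ... (mod N)
    good = (i, x, d)               # last state that passed a check
    blocks, failures = 0, 0
    fault_pending = fault_at is not None

    while i + L <= p:              # only full blocks of L squarings are checked
        for _ in range(L):
            x = x * x % N
            i += 1
            if fault_pending and i == fault_at:
                x = 2 * x % N      # corrupt the residue; N is odd, so it stays nonzero
                fault_pending = False
        prev_d, d = d, d * x % N   # d_t = d_{t-1} * x_{tL}
        blocks += 1
        if blocks % check_every == 0 or i + L > p:
            # Gerbicz identity: d_t == x_0 * d_{t-1}^(2^L) (mod N)
            if d != x0 * pow(prev_d, 1 << L, N) % N:
                failures += 1
                i, x, d = good     # roll back to the last verified state
                blocks = i // L
                continue
            good = (i, x, d)

    for _ in range(p - i):         # fewer than L leftover squarings, unchecked here
        x = x * x % N
    return x, failures


print(gerbicz_prp(127))               # M127 is prime: residue 9, no failures
print(gerbicz_prp(127, fault_at=50))  # one simulated error, caught and redone
```

Since 2^127 - 1 is prime, Fermat's little theorem gives 3^(2^127) = 3^(N+1) ≡ 9 (mod N), so both calls should return a residue of 9; the second one reports a single failed check. That loosely mirrors a status line showing "1 Gerbicz/double-check error" together with excellent confidence in the result.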

If you see 1 error, 1 error, 1 error, 1 error, 1 error at every iteration/checkpoint/printout on the screen, there is no problem. These are NOT new errors; it is the same error that happened in the past, and the program keeps telling you so you can decide what to do. Errors happen to all of us now and then. I see one or two a month, or every two months, when I overclock. They are harmless.

If you see 1 error, 2 errors, 3 errors, 5 errors, 77 errors at every iteration/checkpoint/printout on the screen, or if you see 1 error in every test, or often (the counter is reset with each new assignment), then you are in deep sh.. and need to take action: a bad system will keep producing errors. In that case, do the dusting, reduce clocks, re-seat, whatever the other guys said.
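The rule of thumb in this post (a counter that stays at 1 is one old error; a counter that keeps climbing, or errors in most tests, means the hardware keeps making new ones) can be written as a small check. A hedged Python sketch: the function names and the thresholds `max_errors_per_test`, `share_threshold` and `min_tests` are hypothetical choices, since the post gives no exact numbers:

```python
from typing import Sequence


def errors_in_test(status_counts: Sequence[int]) -> int:
    """The counter is cumulative and resets with each new assignment,
    so the last value seen is the number of errors in that test."""
    return status_counts[-1] if status_counts else 0


def should_take_action(tests: Sequence[Sequence[int]],
                       max_errors_per_test: int = 1,
                       share_threshold: float = 0.5,
                       min_tests: int = 2) -> bool:
    """tests: for each assignment, the error counts read from its status lines."""
    per_test = [errors_in_test(t) for t in tests]
    # A counter that keeps climbing within one test means new errors keep happening.
    if any(n > max_errors_per_test for n in per_test):
        return True
    # Judging "errors in every test, or often" needs more than one test.
    if len(per_test) < min_tests:
        return False
    with_errors = sum(1 for n in per_test if n > 0)
    return with_errors / len(per_test) > share_threshold


print(should_take_action([[1, 1, 1, 1, 1]]))   # same old error repeated
print(should_take_action([[1, 2, 3, 5, 77]]))  # counter climbing
```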


All times are UTC.
