2021-05-30, 10:08   #6
Romulan Interpreter
Jun 2011

2·3·7·233 Posts

If an error happens, such as a failed Gerbicz check (GC) or a rounding error that is too high, P95 tries to redo the iteration using a different, slower method. Sometimes there is no real problem, as with some rounding errors at borderline FFT sizes (mentioned above), and the slower method gets the same result. Then it will say that the "result is reproducible", meaning there was no hardware error. Other times the slower calculation gets a different result. In that case it will redo the GC or resume from an earlier checkpoint, depending on the situation.

Either way, your result is still OK. The bad work is redone, so the error does not end up in your result, and confidence that the final result is correct stays very high. However, the app will let you know that an error happened, so you can take measures in the future: dusting, reducing clocks, re-seating the CPU, whatever is up to you or your IT guys. Note: re-seat, not reset. Re-seating means taking the CPU out, cleaning it, applying new thermal paste, etc.
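A minimal Python sketch of the recovery decision described above, for illustration only. This is not Prime95's code. The function `recover`, its parameters, and the rule that only a mismatching re-check increments the error counter are simplifying assumptions of this sketch. `slow_step` stands in for whatever slower, more reliable method the program really uses.

```python
def recover(state, suspect, slow_step, last_good, errors):
    """Hypothetical sketch: re-check one suspicious iteration.

    state:     value before the iteration that raised the warning
    suspect:   result the fast method produced for that iteration
    slow_step: slower, more reliable way to compute one iteration
    last_good: last verified state (e.g. last passed GC or checkpoint)
    errors:    running error count shown to the user
    Returns (state_to_continue_from, errors, message).
    """
    checked = slow_step(state)
    if checked == suspect:
        # Same answer both ways: a false alarm (e.g. borderline FFT size),
        # not a hardware error, so this sketch leaves the counter alone.
        return checked, errors, "result is reproducible"
    # Different answer: a real error. Discard the work since the last
    # verified point, resume from there, and count the error.
    return last_good, errors + 1, "rolled back to last good state"
```

For example, with `square = lambda x: x * x % 97`, `recover(5, 25, square, 3, 0)` returns `(25, 0, "result is reproducible")`, while `recover(5, 26, square, 3, 0)` returns `(3, 1, "rolled back to last good state")`.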

If you see "1 error, 1 error, 1 error, 1 error, 1 error" at every iteration/checkpoint/status line on the screen, there is no problem. These are NOT new errors. It is the same error that happened in the past, and the system keeps telling you about it so you can decide what to do. Errors happen to all of us now and then. When I overclock, I see one or two every month or two. They are harmless.

If you see "1 error, 2 errors, 3 errors, 5 errors, 77 errors" at every iteration/checkpoint/status line on the screen, or if you see 1 error in every test or in most tests (the counter is reset with each new assignment), then you are in deep shh... and you need to take action. A bad system will keep producing errors. So: dusting, reducing clocks, re-seating, whatever the other guys said.
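To make the rule of thumb concrete, here is a small Python sketch that reads the cumulative error counts printed on successive status lines. It is not part of Prime95. The function `verdict` and its cut-offs (`max_per_run`, `max_fraction`, `min_runs`) are hypothetical values chosen only to mirror the advice above: a repeated count is one old error, while a climbing count, or errors in most assignments, means trouble.

```python
def verdict(runs, max_per_run=1, max_fraction=0.5, min_runs=3):
    """Rough reading of cumulative error counters, one list per assignment.

    Each inner list holds the error count printed on successive status
    lines; the counter restarts with every new assignment.
    The cut-offs are hypothetical, not Prime95 settings.
    """
    # The counter is cumulative, so the last value is that assignment's total.
    finals = [run[-1] for run in runs if run]
    if not finals:
        return "no data"
    # Counter kept climbing within one assignment (1, 2, 3, 5, 77...):
    # new errors keep happening.
    if any(total > max_per_run for total in finals):
        return "take action"
    # Errors in most recent assignments are also a bad sign, judged only
    # once there are enough assignments to compare.
    with_errors = sum(total > 0 for total in finals)
    if len(finals) >= min_runs and with_errors / len(finals) > max_fraction:
        return "take action"
    # No errors, or one old error repeated on every line (1, 1, 1...): harmless.
    return "harmless"
```

For example, `verdict([[1, 1, 1, 1, 1]])` returns `"harmless"`, `verdict([[1, 2, 3, 5, 77]])` returns `"take action"`, and `verdict([[1, 1], [1, 1], [1, 1]])` (one error in every test) also returns `"take action"`.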

Last fiddled with by LaurV on 2021-05-30 at 10:10