
Unregistered 2012-10-31 15:29

Is there an FAQ for Error and Warning messages?
 
I have been running the P95 stress test on a new system, and one of the eight cores stops during the test with the following message:

"0 errors, 100 warnings"

I can't find any reference to this warning in numerous searches or in the many threads here about error messages. Is there an FAQ that lists all of the program's error/warning messages and what they are likely to mean? If there is one, I could not find it.

Since all of the other cores run just fine and no errors were recorded in 45 minutes of torture testing, I'm speculating that the one core may be running a little hotter than the others, which are below the maximum but approaching it.

Any information on an FAQ or likely reason for the above warning, not error message, is appreciated.

Dubslow 2012-10-31 18:42

Without seeing the actual warning, it's hard to say. Could you copy and paste [U]all[/U] output from the torture test? (It should be one of the menu options. 45 minutes of testing shouldn't produce too much text.)

Alternately, if that doesn't help, there are utilities out there that can measure your CPU temperatures on a per core basis, so you can test your very reasonable hypothesis. In the past I've used [URL="http://www.cpuid.com/softwares/hwmonitor.html"]HWMonitor[/URL], but others here may have their own recommendations.

Unregistered 2012-10-31 18:55

Thanks for the reply.

I'm already monitoring the core temps and, as I indicated, they are approaching the recommended limit but are not there yet. Even though the specific core doesn't appear to be excessively hot, if the warning is temperature-related, could the temp be close enough to the threshold to trigger it?

Dubslow 2012-10-31 21:32

Perhaps. Prime95 does not directly monitor the temperatures, only the math. Without more detail, I can't help much more.

Unregistered 2013-01-27 00:25

Here's a strange one...
 
I have been running P95 as a stress test on a new PC and it was running just fine, no errors, all was good. At 47 minutes of run time all of the cores finished test 9 and the message "Self-test complete 720K- PASSED" posted for ALL 8 cores. Then a few seconds later core #7 listed a strange message when the cores were all preparing to run the next test: "Torture test completed 32 tests in 47 minutes - 0 errors, 100 warnings". Then seconds later core #5 which had also stated "Passed", listed the almost exact same message: "Torture test completed 32 tests in 48 minutes - 0 errors, 100 warnings".

The test is still running on the other 6 cores without issue. Below is a copy of the test log and, as you can see, there are NO errors listed until AFTER the 18:47 mark, when the software indicated all tests passed and it prepared to start a new test.

Please advise, as this looks like a software issue, not a hardware issue. The errors occurred after the test had completed and was preparing to run again, not while actually testing.

The PC has an ASRock motherboard with an AMD FX-8350 CPU. It runs flawlessly in the OCCT test for hours on end. The RAM has passed over 96 hours of memtest with no errors. The PC is 100% stable in all tests including P95 until the software attempts to start a new test after ~47 minutes.

[CODE][Jan 26 18:00] Worker starting
[Jan 26 18:00] Setting affinity to run worker on logical CPU #7
[Jan 26 18:00] Beginning a continuous self-test to check your computer.
[Jan 26 18:00] Please read stress.txt. Choose Test/Stop to end this test.
[Jan 26 18:00] Test 1, 6500 Lucas-Lehmer iterations of M12451841 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:02] Test 2, 6500 Lucas-Lehmer iterations of M12451839 using AMD K10 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:03] Test 3, 6500 Lucas-Lehmer iterations of M12196481 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:05] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using AMD K10 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:06] Test 5, 6500 Lucas-Lehmer iterations of M11796479 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:08] Test 6, 6500 Lucas-Lehmer iterations of M11596479 using AMD K10 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:09] Test 7, 6500 Lucas-Lehmer iterations of M11285761 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:11] Test 8, 6500 Lucas-Lehmer iterations of M10885759 using AMD K10 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:12] Test 9, 6500 Lucas-Lehmer iterations of M10485761 using AMD K10 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:14] Test 10, 6500 Lucas-Lehmer iterations of M10485759 using AMD K10 FFT length 640K, Pass1=640, Pass2=1K.
[Jan 26 18:15] Self-test 640K passed!
[Jan 26 18:15] Test 1, 800000 Lucas-Lehmer iterations of M172031 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:17] Test 2, 800000 Lucas-Lehmer iterations of M163839 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:18] Test 3, 800000 Lucas-Lehmer iterations of M159745 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:19] Test 4, 800000 Lucas-Lehmer iterations of M157695 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:21] Test 5, 800000 Lucas-Lehmer iterations of M155649 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:22] Test 6, 800000 Lucas-Lehmer iterations of M153599 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:23] Test 7, 800000 Lucas-Lehmer iterations of M147455 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:25] Test 8, 800000 Lucas-Lehmer iterations of M143361 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:26] Test 9, 800000 Lucas-Lehmer iterations of M141311 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:27] Test 10, 800000 Lucas-Lehmer iterations of M135169 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:29] Test 11, 800000 Lucas-Lehmer iterations of M172031 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:30] Test 12, 800000 Lucas-Lehmer iterations of M163839 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 26 18:31] Self-test 8K passed!
[Jan 26 18:31] Test 1, 5300 Lucas-Lehmer iterations of M14155777 using AMD K10 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:33] Test 2, 5300 Lucas-Lehmer iterations of M14155775 using AMD K10 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:34] Test 3, 5300 Lucas-Lehmer iterations of M13969343 using AMD K10 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:36] Test 4, 5300 Lucas-Lehmer iterations of M13669345 using AMD K10 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:37] Test 5, 5300 Lucas-Lehmer iterations of M13369345 using AMD K10 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:39] Test 6, 5300 Lucas-Lehmer iterations of M13369343 using AMD K10 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:40] Test 7, 5300 Lucas-Lehmer iterations of M13069345 using AMD K10 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:42] Test 8, 5300 Lucas-Lehmer iterations of M12969343 using AMD K10 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:43] Test 9, 6500 Lucas-Lehmer iterations of M12451841 using AMD K10 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:45] Test 10, 6500 Lucas-Lehmer iterations of M12451839 using AMD K10 FFT length 720K, Pass1=320, Pass2=2304.
[Jan 26 18:47] Self-test 720K passed!
[Jan 26 18:47] Test 1, 460000 Lucas-Lehmer iterations of M250519 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.[/CODE]
(Duplicate error messages snipped, as the forum won't allow over 10,000 words.)

[CODE][Jan 26 18:47] Maximum number of warnings exceeded.
[Jan 26 18:47] Torture Test completed 32 tests in 47 minutes - 0 errors, 100 warnings.
[Jan 26 18:47] Worker stopped.[/CODE]

Core #5 test results are identical other than 1 minute later. Neither core would continue testing after the false error messages.
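
For anyone comparing logs like the one above, a minimal Python sketch (not part of Prime95) can tally the ILLEGAL SUMOUT lines by the FFT length of the test that was running at the time. The function name `count_sumout_by_fft` and the regular expression are assumptions based only on the log lines shown in this thread.

```python
import re
from collections import Counter

# Matches lines such as "Test 1, ... FFT length 12K, Pass1=48, Pass2=256."
# The pattern is inferred from the log excerpts in this thread.
TEST_RE = re.compile(r"Test \d+, .* FFT length (\d+K)")

def count_sumout_by_fft(log_text):
    """Count ILLEGAL SUMOUT lines, attributed to the most recent FFT length."""
    counts = Counter()
    current_fft = None
    for line in log_text.splitlines():
        match = TEST_RE.search(line)
        if match:
            current_fft = match.group(1)
        elif "ERROR: ILLEGAL SUMOUT" in line:
            counts[current_fft] += 1
    return counts

sample = """\
[Jan 26 18:47] Self-test 720K passed!
[Jan 26 18:47] Test 1, 460000 Lucas-Lehmer iterations of M250519 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
"""
print(dict(count_sumout_by_fft(sample)))
```

Run against a full worker log, this shows at a glance whether the warnings cluster on the small FFT sizes.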

Dubslow 2013-01-27 04:14

It looks like it started the 12K test after 720K, which then failed.

Could you redo the test, except change your time stamp to include seconds (or ms)? See "undoc.txt" for details on how to do that.

Jorge 2013-01-27 04:24

I'll give it a try, but I'm concerned that the test history clearly shows a problem with starting a new test. That makes me think there is a scheduling issue in P95 with high core count CPUs, since OCCT runs for hours without issue, as does every other stress test that I can find.

Some folks have expressed concern that P95 does not in fact run without issue on AMD FX series CPUs. I don't want to be chasing a P95 issue when I have no means to know what the issue is or any means to resolve it if it is in fact a P95 issue.

Prime95 2013-01-27 05:14

[QUOTE=Unregistered;326038][CODE]
[Jan 26 18:47] Test 1, 460000 Lucas-Lehmer iterations of M250519 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 26 18:47] ERROR: ILLEGAL SUMOUT
[Jan 26 18:47] Possible hardware failure, consult readme.txt file, restarting test.[/CODE][/QUOTE]

Prime95 started the 12K FFT test and ran into a hardware problem. ILLEGAL SUMOUT is generally a floating point operation returning an invalid floating point value.
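
As a rough illustration of "an invalid floating point value", the sketch below checks whether a running sum has become NaN or infinite. It is not Prime95's actual check (which lives in its own FFT code); `is_legal_sum` is a hypothetical name used only for this example.

```python
import math

def is_legal_sum(values):
    """Return True if the sum of values is a finite float (not NaN or infinity)."""
    return math.isfinite(sum(values))

print(is_legal_sum([1.0, 2.5]))            # ordinary values
print(is_legal_sum([1.0, float("nan")]))   # a NaN poisons the whole sum
print(is_legal_sum([1e308, 1e308]))        # overflow to infinity
```

On healthy hardware the arithmetic in a stress test should never produce such a value, which is why its appearance is treated as a sign of trouble.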

The 12K FFT (and 8K FFT) can put more stress on the FPU: the FFT data fits in the caches, which reduces stress on main memory but increases stress on the FPU and caches.
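
A back-of-the-envelope sketch of why the small FFTs stay in cache, assuming one 8-byte double per FFT element and a 2 MiB cache (the cache size is an assumption for illustration; the real working set also includes lookup tables, so these are lower bounds):

```python
BYTES_PER_DOUBLE = 8

# Assumed cache size for illustration only; substitute your CPU's figure.
ASSUMED_CACHE_BYTES = 2 * 1024 * 1024

def fft_data_bytes(fft_length_k):
    """Approximate size of the FFT data for a length given in K elements."""
    return fft_length_k * 1024 * BYTES_PER_DOUBLE

for k in (8, 12, 640, 720):
    size = fft_data_bytes(k)
    fits = size <= ASSUMED_CACHE_BYTES
    print(f"{k}K FFT: {size // 1024} KiB, fits in assumed cache: {fits}")
```

Under these assumptions the 8K and 12K tests need well under 1 MiB, while the 640K and 720K tests need several MiB and spill to main memory.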

Rerun the stress test, but choose small in-place FFT tests. You may be real close to 100% stable, but not quite.

It is possible you found a rare prime95 bug, but not real likely.

Jorge 2013-01-27 07:14

I find it implausible that the system runs every other stress test for hours and even runs P95 for ~47 minutes with zero errors and then all of a sudden has two cores that error out within seconds of each other. While it may be possible, it's highly unlikely. In the past I have used P95 on quad core CPUs and it's been flawless, but I'm wondering if it has an issue with 8-core CPUs?

I'll try running the test again with small in-place FFTs. If I recall, this raises CPU temperatures more, though that has not been an issue in the prior tests.

NBtarheel_33 2013-01-27 12:35

[QUOTE=Jorge;326097]I find it implausible that the system runs every other stress test for hours and even runs P95 for ~47 minutes with zero errors and then all of a sudden has two cores that error out within seconds of each other. While it may be possible, it's highly unlikely. In the past I have used P95 on quad core CPUs and it's been flawless, but I'm wondering if it has an issue with 8-core CPUs?[/QUOTE]

IIRC, Prime95 has been run on 32- (and possibly 48- ) core systems without any issues. Note that Prime95 is an incredibly rigorous standard for stress testing (hence its popularity); many systems only ever achieve 98-99% stability (i.e. there is some test that fails, or the test fails after a long period of time).

What if you tried running a small LL test that only requires the 8K or 12K FFTs? See if you get ILLEGAL SUMOUT there, as well.
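
For reference, the Lucas-Lehmer (LL) test itself is short. The sketch below uses Python's exact integers, so it only illustrates the arithmetic; Prime95 performs the squaring step with floating-point FFTs, which is where a hardware fault can show up.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4
    for _ in range(p - 2):    # p - 2 squarings modulo m
        s = (s * s - 2) % m
    return s == 0

for p in (3, 5, 7, 11, 13):
    print(p, lucas_lehmer(p))
```

Each "Lucas-Lehmer iteration" in the logs above corresponds to one pass through this loop, just with a far larger modulus.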

Jorge 2013-01-27 19:17

I've used P95 for years on many PCs along with OCCT and I understand that both of these test programs are very severe. That is precisely why I use them and make sure all of my PCs will run for 24 hours without any errors.

IME every PC that P95 would error on, would also error on OCCT. This PC does not error on any stress test but P95. I'm not convinced it's the PC, especially with quite a few people reporting the same situation with AMD FX processors - even at stock CPU settings. There may be some monitoring issue with AMD FX processors, I don't know. That's why I'm providing test logs to try and resolve the issue.

The run log below shows this system ran perfectly with small FFTs for 9 hours and 50 minutes, then suddenly halted based on "100 warnings", not based on an actual error. No errors are reported. That's 424 tests and almost 10 solid hours of P95 with no errors, and then all of a sudden it stops working based on (100) possible hardware "warnings".

Notice the BLUE highlighted message at the end... [B][COLOR=blue]ZERO ERRORS, 100 Warnings[/COLOR][/B]. If there are zero errors, then the test should not have stopped. If there was an error it would be displayed. The test stopped based on "100 warnings". The hardware warnings might be an issue with the AMD FX processors since there are zero errors for the tests?

During the 9 hours and 50 minutes of testing, the system had already run the same string several times without fail. With the FX processors being a totally new x86 CPU design, there may be some monitoring issues that create "warnings"? I don't know, but I know that OCCT uses a very similar stress testing methodology and appears to be as severe, yet this PC runs OCCT for hours without errors. Even in this P95 test it never showed an error after almost 10 hours of testing; it showed "100 warnings", but no errors. The warnings only occur at the beginning of a new test string, just as in the previous P95 test, after ~47 minutes. P95 completed almost 10 hours of tests on this PC without a single error. [B][U]The only time it shows an indicated issue is at the beginning of a test when launching a new string.[/U][/B]

Whoever is responsible for the P95 stress testing program might want to take a look and/or conduct their own tests with an AMD FX processor to see if they can determine why there are (100) warnings, but no errors. If this is a signal ringing issue, maybe the warning threshold needs to be raised to (1000) or something to compensate for false warnings and improper stoppage of the test?

[code][Jan 27 02:44] Worker starting
[Jan 27 02:44] Setting affinity to run worker on logical CPU #7
[Jan 27 02:44] Beginning a continuous self-test to check your computer.
[Jan 27 02:44] Please read stress.txt. Choose Test/Stop to end this test.
[Jan 27 02:44] Test 1, 180000 Lucas-Lehmer iterations of M580673 using AMD K10 type-1 FFT length 28K, Pass1=112, Pass2=256.
[Jan 27 02:45] Test 2, 180000 Lucas-Lehmer iterations of M573441 using AMD K10 type-1 FFT length 28K, Pass1=112, Pass2=256.
[Jan 27 02:46] Test 3, 180000 Lucas-Lehmer iterations of M565247 using AMD K10 type-1 FFT length 28K, Pass1=112, Pass2=256.

SNIP

[Jan 27 03:00] Self-test 28K passed!
[Jan 27 03:00] Test 1, 800000 Lucas-Lehmer iterations of M172031 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 03:01] Test 2, 800000 Lucas-Lehmer iterations of M163839 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[B][COLOR=blue][Jan 27 03:02] Test 3, 800000 Lucas-Lehmer iterations of M159745[/COLOR][/B] using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 03:03] Test 4, 800000 Lucas-Lehmer iterations of M157695 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 03:15] Self-test 8K passed!

SNIP

[B][COLOR=blue][Jan 27 07:41] Test 1, 800000 Lucas-Lehmer iterations of M159745[/COLOR][/B] using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 07:43] Test 2, 800000 Lucas-Lehmer iterations of M157695 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 07:45] Test 3, 800000 Lucas-Lehmer iterations of M155649 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 07:47] Test 4, 800000 Lucas-Lehmer iterations of M153599 using
[Jan 27 07:56] Test 9, 560000 Lucas-Lehmer iterations of M212991 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 07:57] Self-test 10K passed!

SNIP

[B][COLOR=blue][Jan 27 09:48] Test 1, 800000 Lucas-Lehmer iterations of M159745[/COLOR][/B] using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 27 09:50] Test 2, 800000 Lucas-Lehmer iterations of M157695 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 27 09:52] Test 3, 800000 Lucas-Lehmer iterations of M155649 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 27 09:55] Test 4, 800000 Lucas-Lehmer iterations of M153599 using
[Jan 27 10:03] Test 8, 800000 Lucas-Lehmer iterations of M135169 using AMD K10 type-0 FFT length 12K, Pass1=48, Pass2=256.
[Jan 27 10:05] Self-test 12K passed!

SNIP

[Jan 27 10:05] Test 1, 120000 Lucas-Lehmer iterations of M778241 using AMD K10 type-1 FFT length 56K, Pass1=224, Pass2=256.
[Jan 27 10:07] Test 2, 120000 Lucas-Lehmer iterations of M753663 using AMD K10 type-1 FFT length 56K, Pass1=224, Pass2=256.
[Jan 27 10:08] Test 3, 120000 Lucas-Lehmer iterations of M745473 using AMD K10 type-1 FFT length 56K, Pass1=224, Pass2=256.
[Jan 27 10:10] Test 4, 120000 Lucas-Lehmer iterations of M737279 using
[Jan 27 10:19] Test 9, 160000 Lucas-Lehmer iterations of M644399 using AMD K10 type-1 FFT length 56K, Pass1=224, Pass2=256.
[Jan 27 10:21] Self-test 56K passed!

SNIP

[Jan 27 10:21] Test 1, 380000 Lucas-Lehmer iterations of M278527 using AMD K10 type-1 FFT length 20K, Pass1=80, Pass2=256.
[Jan 27 10:23] Test 2, 380000 Lucas-Lehmer iterations of M274335 using AMD K10 type-1 FFT length 20K, Pass1=80, Pass2=256.
[Jan 27 10:24] Test 3, 380000 Lucas-Lehmer iterations of M270335 using AMD K10 type-1 FFT length 20K, Pass1=80, Pass2=256.
[Jan 27 10:26] Test 4, 380000 Lucas-Lehmer iterations of M266241 using
[Jan 27 10:34] Test 9, 460000 Lucas-Lehmer iterations of M245281 using AMD K10 type-1 FFT length 20K, Pass1=80, Pass2=256.
[Jan 27 10:36] Self-test 20K passed!

SNIP

[Jan 27 10:36] Test 1, 210000 Lucas-Lehmer iterations of M442369 using AMD K10 type-2 FFT length 32K, Pass1=128, Pass2=256.
[Jan 27 10:38] Test 2, 210000 Lucas-Lehmer iterations of M441041 using AMD K10 FFT length 32K, Pass1=128, Pass2=256.
[Jan 27 10:39] Test 3, 210000 Lucas-Lehmer iterations of M436943 using AMD K10 type-2 FFT length 32K, Pass1=128, Pass2=256.
[Jan 27 10:40] Test 4, 270000 Lucas-Lehmer iterations of M420217 using
[Jan 27 10:51] Test 10, 270000 Lucas-Lehmer iterations of M376833 using AMD K10 FFT length 32K, Pass1=128, Pass2=256.
[Jan 27 10:52] Self-test 32K passed!

SNIP

[Jan 27 10:52] Test 1, 560000 Lucas-Lehmer iterations of M210415 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 10:54] Test 2, 560000 Lucas-Lehmer iterations of M208897 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 10:55] Test 3, 560000 Lucas-Lehmer iterations of M204799 using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 10:56] Test 4, 560000 Lucas-Lehmer iterations of M200705 using
[B][COLOR=blue][Jan 27 11:07] Test 12, 800000 Lucas-Lehmer iterations of M159745[/COLOR][/B] using AMD K10 type-0 FFT length 10K, Pass1=40, Pass2=256.
[Jan 27 11:09] Self-test 10K passed!

SNIP

[Jan 27 12:30] Test 1, 800000 Lucas-Lehmer iterations of M135169 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 12:31] Test 2, 800000 Lucas-Lehmer iterations of M172031 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 12:32] Test 3, 800000 Lucas-Lehmer iterations of M163839 using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[B][COLOR=blue][Jan 27 12:33] Test 4, 800000 Lucas-Lehmer iterations of M159745[/COLOR][/B] using AMD K10 type-1 FFT length 8K, Pass1=32, Pass2=256.
[Jan 27 12:34] ERROR: ILLEGAL SUMOUT
[Jan 27 12:34] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 27 12:34] ERROR: ILLEGAL SUMOUT
[Jan 27 12:34] Possible hardware failure, consult readme.txt file, restarting test.

*SNIP*

[Jan 27 12:34] Possible hardware failure, consult readme.txt file, restarting test.
[Jan 27 12:34] ERROR: ILLEGAL SUMOUT
[Jan 27 12:34] Maximum number of warnings exceeded.
[Jan 27 12:34] Torture Test completed 424 tests in 9 hours, 50 minutes - [B][COLOR=blue]0 errors, 100 warnings[/COLOR][/B].
[Jan 27 12:34] Worker stopped.[/code]

Prime95 2013-01-27 22:26

[QUOTE=Jorge;326178]Whoever is responsible for the P95 stress testing program might want to take a look and/or conduct their own tests with an AMD FX processor to see if they can determine why there are (100) warnings, but no errors. If this is a signal ringing issue, maybe the warning threshold needs to be raised to (1000) or something to compensate for false warnings and improper stoppage of the test?[/QUOTE]


I think you are confused by the word "warning" -- and that's due to poor documentation. When an ILLEGAL SUMOUT occurs, the hardware produced an invalid floating point value -- something is wrong with the system. The reason prime95 does not call this a hardware error is that back in the early days of Windows 95 and Windows 98, poorly written device drivers would fail to save and restore the floating point state, resulting in floating point errors. The hardware was fine, but the device driver software was bad. Thus prime95 called this condition a "warning" or "possible hardware failure". I don't know if something similar can happen in newer operating systems.

In short, even though prime95 uses the word warning, something is wrong -- we just don't know what that something is.
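
To make the "0 errors, 100 warnings" summary concrete, here is a hypothetical sketch of a worker loop that keeps separate error and warning counters and stops once a warning cap is reached. It is not Prime95's source code; the cap of 100 is taken from the logs in this thread, and `run_worker` and the outcome strings are invented for illustration.

```python
MAX_WARNINGS = 100  # cutoff inferred from "Maximum number of warnings exceeded"

def run_worker(outcomes, max_warnings=MAX_WARNINGS):
    """Tally a stream of outcomes: "ok", "warning", or "error".

    Returns (tests_completed, errors, warnings). Assumed behaviour: a
    warning restarts the current test; an error or too many warnings
    stops the worker.
    """
    tests = errors = warnings = 0
    for outcome in outcomes:
        if outcome == "error":
            errors += 1
            break
        if outcome == "warning":
            warnings += 1
            if warnings >= max_warnings:
                break
            continue  # test is restarted, so it does not count as completed
        tests += 1
    return tests, errors, warnings

# 32 clean tests followed by a test that keeps producing warnings.
print(run_worker(["ok"] * 32 + ["warning"] * 150))
```

With a test that fails on every restart, the cap is hit within seconds, which matches the burst of identical messages in the logs.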

I don't have an FX processor handy; someone on the forum may, and can run a stress test. I have access to a Bulldozer-based Opteron - is that similar to your FX?

sdbardwick 2013-01-27 23:05

Jorge, what version of Prime95 are you running?
mprime64 v26.6 small FFTs on Opteron 4280 (bulldozer) uses the Core2 code path rather than the K10 path in your results.

Jorge 2013-01-28 02:26

[QUOTE=Prime95;326215]

SNIP

In short, even though prime95 uses the word warning, something is wrong -- we just don't know what that something is.

I don't have an FX processor handy; someone on the forum may, and can run a stress test. I have access to a Bulldozer-based Opteron - is that similar to your FX?[/QUOTE]

Yes Bulldozer based Opterons use the same architecture so that should be a good test. I agree we don't know what is wrong. That is why I'm posting logs and hoping those who are involved with the development of the P95 stress test software, can look into the issue.

With the warnings happening after the PC has run for 9 hours and 50 minutes and having run the exact same string previously without issue, I'm still thinking this might be a P95 issue. When I have tested with extreme overclocking on this and other systems, and there really was an error, it would list the error, i.e. ".5 returned instead of >.4", as a typical example. There have been none of these with this FX system after many hours of stress testing.

I did read the notes about 2000/XP/Vista protecting P95 from driver issues. I don't know if Win 7/8 function the same. I'm running Win 7 64-bit, on the test PC.

I think most folks would conclude that 100% load on all 8 cores for 9 hours and 50 minutes is a stable PC but with many other FX/Bulldozer/Vishera PC owners not being able to run P95 for more than a few minutes at the default CPU frequency with no overclocking, it makes you wonder what exactly is happening. While I fully understand that some systems may use borderline quality components or be configured poorly, there are a lot of experienced enthusiasts who've never had P95 issues on the other PCs they have built over the years, me included.

sdbardwick - I'm running V27.7 (64-bit), last updated May 15, 2012 from what I see. I do not know if this could be an issue, but it's supposed to be OK for Win 7 64-bit.

Anyone willing to run P95 tests on Bulldozer/Vishera model AMD FX/Opteron processors may be able to help resolve this issue.

After some searches here I found the V27.7 P95 thread and noticed that there is a V27.9 and that some folks had issues with HT on V27.7. The suggestion was to run one thread per core, but that defeats the point of stress testing, IMO. I don't know if this is a possible issue with the Bulldozer/Vishera architecture FX/Opteron CPUs, but it might be?

Prime95 2013-01-28 05:01

[QUOTE=Jorge;326260]Yes Bulldozer based Opterons use the same architecture so that should be a good test.[/quote]

I'll run one overnight.

[quote]With the warnings happening after the PC has run for 9 hours and 50 minutes and having run the exact same string previously without issue, I'm still thinking this might be a P95 issue.[/quote]

Nope. There have been numerous reports of systems that last far longer before spitting out an error. All it means is that the system is really, really close to being prime95 stable.
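
One way to see why a long clean run does not rule out a marginal system is a simple model that assumes faults arrive independently at a constant average rate (an assumption made for illustration, not a measured property of any CPU). Taking roughly one fault per 10 hours, in line with the 9 hour 50 minute run above:

```python
import math

def prob_clean_run(run_hours, mean_hours_between_faults):
    """Probability of seeing no fault in run_hours under a constant-rate model."""
    return math.exp(-run_hours / mean_hours_between_faults)

for hours in (47 / 60, 10, 24):
    print(f"{hours:5.2f} h clean run: {prob_clean_run(hours, 10):.3f}")
```

Under this model a 47-minute run passes most of the time and even a 10-hour run passes fairly often, so passing shorter tests says little about a fault this rare.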

[quote]When I have tested with extreme overclocking on this and other systems, and there really was an error, it would list the error, i.e. ".5 returned instead of >.4", as a typical example.[/quote]

Yes, the ILLEGAL SUMOUT failure mode is rare.
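
The rounding-style error quoted above is a different kind of check: after a floating-point multiplication, each result should be very close to a whole number. A minimal sketch of that idea follows, with the 0.4 limit taken from the quoted message and the function names invented for this example (this is not Prime95's code):

```python
def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)

def check_roundoff(values, limit=0.4):
    """Return (ok, error) where ok is True if the roundoff stays under limit."""
    err = max_roundoff(values)
    return err < limit, err

print(check_roundoff([3.0001, 7.9998, 12.0]))  # small roundoff
print(check_roundoff([3.0, 4.5]))              # 0.5 away from an integer
```

A value that lands halfway between two integers cannot be rounded with confidence, so it is reported as an error rather than a warning.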


[quote]I think most folks would conclude that 100% load on all 8 cores for 9 hours and 50 minutes is a stable PC but with many other FX/Bulldozer/Vishera PC owners not being able to run P95 for more than a few minutes at the default CPU frequency with no overclocking, it makes you wonder what exactly is happening.[/quote]

10 hours without error is a stable PC for most everyday tasks. Although, I wouldn't do serious distributed computing work on such a machine.

Reports of stress test failures at stock speed are not at all uncommon. Usually it's a memory problem, but ever since AMD put their memory controller on chip many of their CPUs fail at stock speed. IMO, AMD quality control was not very good a few years ago. Maybe it's better these days, I don't know.

[quote]
After some searches here I found the V27.7 P95 thread and noticed that there is a V27.9 and that some folks had issues with HT on V27.7. The suggestion was to run one thread per core, but that defeats the point of stress testing, IMO.[/QUOTE]

For stress testing purposes, 27.7 and 27.9 are equivalent. You are correctly running 8 stress test threads.

sdbardwick 2013-01-28 05:12

FWIW, the 26.6 test has been running for 6+ hours without error using 8 threads (1 socket).

Jorge 2013-01-28 23:55

[QUOTE=Prime95;326280]I'll run one overnight.[/QUOTE]

[B][COLOR=blue]Excellent, all information is appreciated![/COLOR][/B]

[QUOTE=Prime95;326280]Reports of stress test failures at stock speed are not at all uncommon. Usually it's a memory problem, but ever since AMD put their memory controller on chip many of their CPUs fail at stock speed. IMO, AMD quality control was not very good a few years ago. Maybe it's better these days, I don't know.[/QUOTE]

[COLOR=blue]With all due respect, I totally disagree with you about AMD CPU quality. Over the past 20 years I have built a lot of PCs professionally and most were AMD. I have never had an AMD PC that had any IMC or CPU issues, ever. AMD's IMC may not be able to run RAM at as high a frequency as Intel IMCs when overclocked, but that doesn't mean they don't work just fine and reliably at the AMD specified frequencies. In comparison it is documented that Intel has shipped millions of defective CPUs, chipsets, mobos and SSDs. AMD has not shipped any defective products that I am aware of, ever. Some 40 years ago AMD also manufactured Intel's CPUs for them...[/COLOR] :wink:

[COLOR=blue]I hope that with more P95 testing we can determine if there is or is not an issue running P95 on the Bulldozer/Vishera architecture CPUs. I noticed that the Opteron models are pretty low frequency so they are likely to run fewer tests in a given period of time than the FX processors, so longer run times may be required?[/COLOR]

Prime95 2013-01-29 02:46

I'm at 20 hours of Bulldozer stress testing. 16 threads, small in-place FFTs. Linux version 27.7. No problems.

At this point, I'd say that there isn't a software problem with the prime95 stress test. The major difference between the two setups is the hardware and the OS. If it is a hardware problem, then the error should go away if you drop the CPU and memory clock rate significantly. If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!). Alternatively, you could call one-prime95-error-every-10-hours stable enough.

I'll continue the Bulldozer stress test for another day. Unfortunately, I can't run it under Windows.


[QUOTE=Jorge;326407]With all due respect, I totally disagree with you about AMD CPU quality. [/QUOTE]

That's OK. Your opinion and mine are both subjective and based on rather small data sets. I've not built any AMD systems (owned two). My opinion is based on a surge of I'm-running-at-stock-speed-your-program-must-be-broken complaints when hypertransport first came out.

LaurV 2013-01-29 03:15

[edit: obviously my post is not addressed to George, he posted in between, as it took me a while to finish this, between job tasks].

Opinions vary. You are entitled to yours. We hope you are not one of those AMD trolls; we had many here in the past. We hope you are able to think objectively, and are not influenced by the heart.

This is not meant to be an insult. I have very good friends whom I would classify as "AMD trolls". We meet around a beer bottle and talk technical things sometimes. They used AMD in the past (me too), at the time we were students, when AMD was cheaper for about the same performance. They fell in love with it, and later stuck to their first love.

Things have changed; for many years now Intel has outperformed AMD on every point: quality, performance, performance/watt/buck, reliability, IPC (instructions per clock cycle), whatever. But for all those features, you have to pay more money.

You should not compare Bulldozer with i7 on double-precision floating-point calculations and this kind of stuff where AMD sucks. They are targeting different markets.

If AMD did not recall CPUs, it does not mean they don't have defects; it may be that they care less about the customers, or have a different policy. The defect rate is exactly the same for both Intel and AMD; silicon chips are quite a mature and stable medium, they all run at a 100 ppm to 200 ppm (parts per million) defect rate, etc., and for the number of transistors they have, about one in 150 CPUs is defective. They (both!) still sell those as lower-end parts, either with a core cut out, with some memory speed locked, bla bla. Trust me, I work in an electronics factory (some people here know me). There is no "better"; it only depends on your preference, target applications, and budget.

For LL testing, well, Intel is better. It took a while to convince my friends. Guess what, they are now convinced that Intel is better, but they still use AMD, because "Intel need competition" :smile:

That I would call an "AMD troll". My friends know that I call them such. It is a "friendly" call :razz: (and don't ask what they call me, or what we call each other sometimes, that is what friends are for, isn't it?)...

[edit2, after reading George's post: from Prime95 (the program) stress.txt file, last paragraph, last FAQ:

[CODE]Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!. What do you make of that?"

or

"We had a server at work that ran for 2 MONTHS straight, without a reboot
I installed Prime95 on it and ran it - a couple minutes later I get an error.
You are going to tell me that the server wasn't stable?"

A) These users obviously do not subscribe to the 100% rock solid
school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS.
But since they are not presently running any programs that reveal
the hardware problem, the machines are quite stable. As long as
these machines never run a program that uncovers the hardware problem,
then the machines will continue to be stable.
[/CODE]

end of edit2]

Jorge 2013-01-29 03:24

[QUOTE=Prime95;326433]I'm at 20 hours of Bulldozer stress testing. 16 threads, small in-place FFTs. Linux version 27.7. No problems.

At this point, I'd say that there isn't a software problem with the prime95 stress test. The major difference between the two setups is the hardware and the OS. If it is a hardware problem, then the error should go away if you drop the CPU and memory clock rate significantly. If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!). Alternatively, you could call one-prime95-error-every-10-hours stable enough.

I'll continue the Bulldozer stress test for another day. Unfortunately, I can't run it under Windows. [/QUOTE]

I think we need to run the Bulldozer/Vishera architecture CPUs under Windows to get a more realistic understanding of whether there is an issue, as that's what I and most other enthusiasts are using (even though I'd much prefer not to be using Windoze). Obviously a sample run on only 1-2 CPUs may not turn up any issues, but if it does then we know to investigate further.

Xyzzy 2013-01-29 03:37

[QUOTE]If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!).[/QUOTE]No pain if you use a Linux LiveCD or USB dealio. Just wget the client and rock and roll.

[url]http://www.ubuntu.com/download/help/try-ubuntu-before-you-install[/url]

[code]# Download the 64-bit Linux client
wget http://www.mersenneforum.org/gimps/p95v279.linux64.tar.gz
# Decompress and unpack the archive in one step
tar -xzvf p95v279.linux64.tar.gz
# Start mprime and show its menu
./mprime -m[/code]
The only thing scary about trying a different operating system is the possibility that it gives the same error and proves that your hardware is not 100% reliable.

But, if it passes, that narrows down the possible issues. For example, what happens when you run the torture test in safe mode in Windows?

[url]http://windows.microsoft.com/en-US/windows7/Start-your-computer-in-safe-mode[/url]

By eliminating a slew of drivers and programs you can eliminate variables.

FWIW, any error or warning is unacceptable to us, so we would not rest until the issue was resolved. And we would explore every possible angle to simplify the challenge.

:mike:

Jorge 2013-01-29 03:38

[QUOTE=LaurV;326434]Opinions vary. You are entitled to yours.

BIG SNIP

That I would call an "AMD troll". My friends knows that I call them such. It is a "friendly" call :razz: (and don't ask how they call me, or how we call each-other sometimes, [B]that is what friends are for, isn't it?[/B])...[/QUOTE]


NO that's not what friends are for.

BTW, I find your post TOTALLY INAPPROPRIATE, condescending, insulting, ignorant and technically incorrect in so many ways I won't even waste my time responding to such fanbois foolishness - and your post is completely OFF TOPIC.

Catch my drift?

The only reason I replied to Prime95's AMD comment was because his perception is completely inaccurate - as is 90% of what you stated as "facts", when it's your subjective opinion, unlike the Intel shipments of defective products, which are documented. Concluding that the issue I am seeing is likely a result of AMD's perceived QC issues would be wrong as there is no basis for this belief.

Since AMD hasn't shipped any defective products that I am aware of, there is no reason for them to have a recall. You can be damn certain that if they did ship defective products, they would be forced to recall them as Intel was, but since AMD didn't ship any, there were no recalls. There does not appear to be any objective statistical or scientific basis for this myth about AMD quality issues.

Your post was of absolutely NO VALUE to this thread or the testing of V27.7 on Bulldozer/Vishera architecture CPUs running under Windoze.

If you have nothing constructive to contribute to this thread, please stay out of the thread. There are other areas of the forum if you want to talk crap over a few beers with your Bros.

My post is meant to be a constructive response to your inappropriate comment/trolling, so I hope you take it in that spirit.

Jorge 2013-01-29 03:41

[QUOTE=Xyzzy;326438]

SNIP


FWIW, any error or warning is unacceptable to us, so we would not rest until the issue was resolved. And we would explore every possible angle to simplify the challenge.

:mike:[/QUOTE]

My goal is to find an answer, but introducing an O/S that I won't be using isn't the best means to see if the issue exists under Windoze. I will continue until I do find the answer as I haven't had a PC that wouldn't Prime for 24 hours straight in the 20 years I have been building AMD and Intel powered PCs professionally.

Xyzzy 2013-01-29 03:56

Two other thoughts:

1 - Remove all items from the computer that are not necessary. Disable all on-board devices. All you need is video output of some sort. Start simple, hopefully pass the test and then add things in one by one. If you can get by with one stick of memory, run that. Simplify.

2 - Also, we have built countless computers. We have used nearly every manufacturer out there. We have spent thousands of hours diagnosing systems with weird problems. But then, we started using only well-documented Intel processors on thoroughly-tested Asus motherboards, all at stock speeds. (We also use Asus video cards.)
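
The "start simple, then add things in one by one" approach from point 1 can be sketched in a few lines. This is a minimal illustration, not a tool: `passes_stress_test` is a hypothetical stand-in for actually running the torture test on a given hardware configuration, and the component names are invented.

```python
from typing import Callable, List, Optional


def first_failing_addition(
    base: List[str],
    extras: List[str],
    passes_stress_test: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add extras to a known-good base one at a time.

    Returns the first component whose addition makes the test fail,
    or None if every configuration passes.
    """
    config = list(base)
    if not passes_stress_test(config):
        raise ValueError("the minimal configuration already fails")
    for part in extras:
        config.append(part)
        if not passes_stress_test(config):
            return part
    return None


# Hypothetical example: pretend the second memory stick is the culprit.
def fake_test(config: List[str]) -> bool:
    return "ram stick 2" not in config


culprit = first_failing_addition(
    ["cpu", "ram stick 1", "video"],
    ["onboard audio", "ram stick 2", "onboard lan"],
    fake_test,
)
print(culprit)  # ram stick 2
```

With an intermittent error (say, one every several hours), each step needs a long enough run to mean anything, which is why starting from the smallest possible configuration saves time.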

Twenty or thirty years ago our time was not worth much, but now it is worth much more to us, because it is a finite resource, and we are running out of it. So we use top tier products coupled with our experience and our stuff just works.

We are not saying an Asrock motherboard with an AMD processor will not work, but in our expert opinion, it is less likely to work easily.

Here is one of our recent build threads:

[url]http://www.mersenneforum.org/showthread.php?t=16871[/url]

FWIW, we built the Opteron system that GIMPS uses. It was a nightmare to build compared to our most recent plug and play adventure. Building computers today is almost boring because quality, well documented and tested components just seem to work.

But, we could be full of it, so YMMV.

:mike:

LaurV 2013-01-29 03:58

[thinking]Clearly AMD troll, as I assumed. He has no real problem with his system, just wanna prove AMD is better... This is how you catch them...[/thinking]

Again, in case you did not read the later edit of my previous post:

From Prime95 (the program) stress.txt file, last paragraph, last FAQ:

[CODE]Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!. What do you make of that?"

or

"We had a server at work that ran for 2 MONTHS straight, without a reboot
I installed Prime95 on it and ran it - a couple minutes later I get an error.
You are going to tell me that the server wasn't stable?"

A) These users obviously do not subscribe to the 100% rock solid
school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS.
But since they are not presently running any programs that reveal
the hardware problem, the machines are quite stable. As long as
these machines never run a program that uncovers the hardware problem,
then the machines will continue to be stable.
[/CODE]

Regarding the part about "AMD-the processor which never has bugs", try googling DragonFly and Matt Dillon as a starting point. [URL="http://www.pcpro.co.uk/news/373348/amd-confirms-cpu-bug"]Here[/URL] is a good one.

Prime95 2013-01-29 04:52

[QUOTE=Jorge;326439]The only reason I replied to Prime95's AMD comment was because his perception is completely inaccurate..., unlike the Intel shipments of defective products, which are documented. Concluding that the issue I am seeing is likely a result of AMD"s perceived QC issues would be wrong as there is no basis for this belief.[/QUOTE]

To clarify, I don't believe AMD products are defective as with the infamous Intel FDIV bug. I suspect there was a problem in the past, hopefully fixed, dealing with "binning". That is, selling a product rated to run at x GHz, but under some stress test scenarios it can't quite get to x GHz. Intel had this problem once, selling a 1.3GHz Pentium 3, which it later had to pull from the market. The only reason I mentioned all this is to shoot down the "I'm running at stock speed therefore there must be a software problem" argument.

The most common stress test failure I see today is memory sold as safe at X-Y-Z CAS/RAS/whatever-the-other-latency-setting-is. Unfortunately, under stress they are not completely stable at the X-Y-Z settings. This is due to either an inadequate binning process, or the pressures in the ultra-competitive memory market.

Now, let's get back on track. I like Xyzzy's idea of running a stress test in safe mode.

science_man_88 2013-01-29 14:19

[QUOTE=Jorge;326439]Since AMD hasn't shipped any defective products that [COLOR=Red]I am aware of,[/COLOR] there is no reason for them to have a recall. [/QUOTE]

An opinion statement, and though I suck at computers, it is easy to find things to counter it with:

[QUOTE="http://en.wikipedia.org/wiki/Opteron#Opteron_recall"]AMD has [COLOR=Red]recalled[/COLOR] some E4 stepping-revision single-core Opteron processors, including x52 (2.6 GHz) and x54 (2.8 GHz) models which use DDR memory. The following table describes affected processors, as they are listed in AMD Opteron x52 and x54 Production Notice.[/QUOTE]

Jorge 2013-01-29 17:22

[B]This thread has turned into a pissfest over Intel vs. AMD when that has absolutely nothing to do with the issue of whether P95, V27.7 runs without issue on AMD Bulldozer/Vishera based CPUs [U]under Windoze[/U].[/B]

PLEASE people stop the B.S. I didn't ask for anyone's views on Intel vs. AMD processors and these pissfests are completely OFF-TOPIC and a disservice to those interested in determining if there is some issue with P95, V27.7 running on Bulldozer/Vishera CPUs in Windoze.

It's always the SOS, with fanbois and their need to convince the world that their POV is correct - even when inappropriate and typically inaccurate subjective beliefs and conclusions, not based in reality or in science or in statistics.

If you want to have a pissfest on AMD vs. Intel - [B][U]start your own thread and stop posting crap in this thread which is inappropriate.[/U][/B] Had Prime 95 NOT stated that in his opinion that AMD IMC quality was not as good as Intel, none of this chicken shit fanbois crap would have been posted in this thread. Since there is no merit to this belief, please do not post this crap as it always turns into a pointless pissfest.

It's just ignorant for people to be posting this foolishness in this thread. The thread is about P95, not Intel vs. AMD. The technically and socially challenged people who go around calling other folks "trolls", because they don't participate in the fanboism debates, demonstrates their personal issues. They should take those personal issues elsewhere as they don't belong in this thread.

I follow a lot of PC hardware forums and I have yet to see any AMD CPU that would not run without error at its stated frequency. If there is a rare exception, then it would be warrantied but this is not a known or documented issue with AMD IMCs. As noted previously, there are many people who are not qualified to be building PCs but as enthusiasts, they do so anyway.

Many people buy AMD CPUs because they cost less and provide a better value than Intel. Thus you are likely to find a higher percentage of AMD enthusiasts who are unable to sort thru their hardware issues compared to Intel enthusiasts, so the perception regarding AMD CPU/IMC quality can be skewed and not a true reflection of AMD quality at all.

In 20 years of building many AMD (and a few Intel), PC's, I have never had an AMD powered PC that wouldn't run P95 for 24 hours without issue. This PC with the FX processor and P95, V27.7 is the first time I have ever experienced an issue with my PC builds and P95 - yet this system runs bulletproof under OCCT.

By many accounts from respected PC builders, PC hardware review sites, etc. AMD CPUs and IMCs are every bit as good of quality as Intel products at their rated frequencies. I have yet to see any reputable PC hardware review website state that AMD has IMC quality issues at their rated frequency.

I have already ordered new RAM and I had already tested in Safe Mode, so I am doing all the logical steps to try and determine if this is a hardware or P95 issue. Until we get more results from people running P95 on Bulldozer/Vishera under Windoze, we won't be able to reach an informed conclusion.

[B][U]PLEASE[/U][/B]: If you have nothing specific to add to this thread regarding testing of P95, V27.7 under Windoze on a Bulldozer/Vishera CPU - [B]DO NOT POST IN THIS THREAD[/B]. Start your own thread in the appropriate section of the forum.

Furthermore DO NOT POST ANY CRAP ABOUT AMD vs. INTEL IN THIS THREAD as it is OFF-TOPIC and INAPPROPRIATE!

rajula 2013-01-29 18:17

[QUOTE=Jorge;326440]My goal is to find an answer, but introducing an O/S that I won't be using isn't the best means to see if the issue exists under Windoze.[/QUOTE]

Clearly the issue exists under Windows, but wouldn't it be nice to know how related it is to having Windows as the OS?

In that regard I find xyzzy's suggestion to run P95 on Linux from a liveCD the best suggestion so far. This would not interfere with your current install and it is a (comparably) fast way of giving more hints about the role of the OS. But, it would take you a few minutes to download and write the image and boot up. Plus of course the time that it takes to run the stress test again.

As the second option I would underclock the CPU and memory and run the stress test. This should give strong hints on the reliability of the hardware.

[SIZE="1"]As a side note, I have roughly 50/50 balance in AMD and intel CPUs that I am using and have used in the past. Although I find them all reliable I would never be so naive as to think that they could not have any hardware faults. In fact, that would be my first suspicion if there were an error/warning with P95.[/SIZE]

science_man_88 2013-01-29 20:05

[QUOTE=Jorge;326494]Futhermore DO NOT POST ANY CRAP ABOUT AMD vs. INTEL IN THIS THREAD as it is OFF-TOPIC and INAPPROPRIATE![/QUOTE]

Really, unless you do tests to prove it's the software and not the hardware (in other words, either exchanging hardware or disabling software), no conclusion can be drawn that is not opinion one way or the other. You claim it has to be the software, but it may just be that the prime95 software detects this specific kind of hardware mishap more often.

Prime95 2013-01-30 19:43

I just terminated the 16 thread, small FFT, Bulldozer torture test after 61 hours. No errors, no warnings.
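
As a rough sanity check on what 61 clean hours says: if the problem were in the software and showed up on this machine at the rate mentioned earlier in the thread (about one error every 10 hours), a clean 61-hour run would be quite unlikely. The sketch below assumes errors arrive independently at a constant average rate (a Poisson model). That model is an assumption made here for illustration, not something measured, and the function name is hypothetical.

```python
import math


def prob_no_errors(hours: float, mean_hours_between_errors: float) -> float:
    """Probability of seeing zero errors in `hours` under a Poisson model."""
    expected_errors = hours / mean_hours_between_errors
    return math.exp(-expected_errors)


p = prob_no_errors(61, 10)
print(f"{p:.4f}")  # 0.0022
```

Under that assumption the chance is about 0.2%, which argues against a software bug that hits every Bulldozer system equally, though it says nothing about Windows-specific or machine-specific causes.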

bcp19 2013-01-31 06:13

[QUOTE=Jorge;326494]In 20 years of building many AMD (and a few Intel), PC's, I have never had an AMD powered PC that wouldn't run P95 for 24 hours without issue [B](UNTIL THIS ONE--->>>)[/B]. This PC with the FX processor and P95, V27.7 is the first time I have ever experienced an issue with my PC builds and P95 - yet this system runs bulletproof under OCCT.[/QUOTE]

First, let me state I couldn't care less about AMD vs Intel, but you sound like the fanboi you accuse others of being. There has NEVER been an AMD vs Intel flame war on this forum since I joined, until now, and who is the person up in arms and calling names?

To me, your statement above sums it all up. You talk about LaurV being condescending and conceited, but maybe you should look in the mirror since your statement comes across like you are saying "Since *I* have built X computers with no problems, this *CANNOT* be a hardware problem, the software *MUST* be bad".

Having spent 20 years as an electronic technician in the Navy (as well as over 35 years building computers), I can say with certainty that you cannot assume that because the parts are new that they are fault-free. Unless you can reproduce this same set of errors on [B]several[/B] similar machines, I fail to see how you can blame the software. Simple logic should make this clear. If 100 computers run the program fault-free and 1 fails (even after several hours) then logic dictates there is a problem with the 1.

Please also remember this, while you may feel you deserve respect because of your past and experience (inferred from your posts), you are new here and the lack of respect you have shown others is most likely the reason you are not shown the respect you feel you deserve.


All times are UTC.