mersenneforum.org Is there an FAQ for Error and Warning messages?

2013-01-27, 22:26   #12
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×17×109 Posts

Quote:
 Originally Posted by Jorge Whoever is responsible for the P95 stress testing program might want to take a look and/or conduct their own tests with an AMD FX processor to see if they can determine why there are (100) warnings, but no errors. If this is a signal ringing issue, maybe the warning threshold needs to be raised to (1000) or something to compensate for false warnings and improper stoppage of the test?

I think you are confused by the word "warning" -- and that's due to poor documentation. When an ILLEGAL SUMOUT occurs, the hardware has produced an invalid floating-point value; something is wrong with the system. The reason prime95 does not call this a hardware error is that back in the early days of Windows 95 and Windows 98, poorly written device drivers would fail to save and restore the floating-point state, resulting in floating-point errors. The hardware was fine, but the device driver software was bad. Thus prime95 called this condition a "warning" or "possible hardware failure". I don't know if something similar can happen in newer operating systems.

In short, even though prime95 uses the word warning, something is wrong; we just don't know what that something is.

I don't have an FX processor handy, but someone on the forum may have one and can run a stress test. I have access to a Bulldozer-based Opteron; is that similar to your FX?

2013-01-27, 23:05   #13
sdbardwick

Aug 2002
North San Diego County

2×11×31 Posts

Jorge, what version of Prime95 are you running? mprime64 v26.6 small FFTs on an Opteron 4280 (Bulldozer) uses the Core2 code path rather than the K10 path shown in your results.
2013-01-28, 02:26   #14
Jorge

Jan 2013

3² Posts

Quote:
 Originally Posted by Prime95 SNIP In short, even though prime95 uses the word warning, something is wrong; we just don't know what that something is. I don't have an FX processor handy, but someone on the forum may have one and can run a stress test. I have access to a Bulldozer-based Opteron; is that similar to your FX?
Yes, Bulldozer-based Opterons use the same architecture, so that should be a good test. I agree we don't know what is wrong. That is why I'm posting logs and hoping those who are involved with the development of the P95 stress test software can look into the issue.

With the warnings happening after the PC had run for 9 hours and 50 minutes, and having run the exact same string previously without issue, I'm still thinking this might be a P95 issue. When I have tested with extreme overclocking on this and other systems and there really was an error, it would list the error, e.g. ".5 returned instead of >.4". There have been none of these with this FX system after many hours of stress testing.

I did read the notes about Windows 2000/XP/Vista protecting P95 from driver issues. I don't know if Win 7/8 function the same way. I'm running Win 7 64-bit on the test PC.

I think most folks would conclude that 100% load on all 8 cores for 9 hours and 50 minutes means a stable PC. But with many other FX/Bulldozer/Vishera PC owners unable to run P95 for more than a few minutes at the default CPU frequency with no overclocking, it makes you wonder what exactly is happening. While I fully understand that some systems may use borderline-quality components or be configured poorly, there are a lot of experienced enthusiasts who've never had P95 issues on the other PCs they have built over the years, me included.

sdbardwick: I'm running v27.7 (64-bit), last updated May 15, 2012 from what I see. I don't know if this could be an issue, but it's supposed to be OK for Win 7 64-bit.

Anyone willing to run P95 tests on Bulldozer/Vishera model AMD FX/Opteron processors may be able to help resolve this issue.

After some searching here I found the v27.7 P95 thread and noticed that there is a v27.9, and that some folks had issues with HT (Hyper-Threading) on v27.7. The suggestion was to run one thread per core, but that defeats the point of stress testing, IMO. I don't know if this is a possible issue with the Bulldozer/Vishera architecture FX/Opteron CPUs, but it might be.

Last fiddled with by Jorge on 2013-01-28 at 02:52

2013-01-28, 05:01   #15
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7412₁₀ Posts

Quote:
 Originally Posted by Jorge Yes, Bulldozer-based Opterons use the same architecture, so that should be a good test.
I'll run one overnight.

Quote:
 With the warnings happening after the PC has run for 9 hours and 50 minutes and having run the exact same string previously without issue, I'm still thinking this might be a P95 issue.
Nope. There have been numerous reports of systems that last far longer before spitting out an error. All it means is that the system is really, really close to being prime95 stable.

Quote:
 When I have tested with extreme overclocking on this and other systems, and there really was an error, it would list the error, e.g. ".5 returned instead of >.4".
Yes, the ILLEGAL SUMOUT failure mode is rare.

Quote:
 I think most folks would conclude that 100% load on all 8 cores for 9 hours and 50 minutes is a stable PC but with many other FX/Bulldozer/Vishera PC owners not being able to run P95 for more than a few minutes at the default CPU frequency with no overclocking, it makes you wonder what exactly is happening.
10 hours without error is a stable PC for most everyday tasks. However, I wouldn't do serious distributed computing work on such a machine.

Reports of stress test failures at stock speed are not at all uncommon. Usually it's a memory problem, but ever since AMD put their memory controller on chip, many of their CPUs fail at stock speed. IMO, AMD quality control was not very good a few years ago. Maybe it's better these days; I don't know.

Quote:
 After some searches here I found the V27.7 P95 thread and noticed that there is a V27.9 and that some folks had issues with HT on V27.7. The suggestion was to run one thread per core, but that defeats the point of stress testing, IMO.
For stress testing purposes, 27.7 and 27.9 are equivalent. You are correctly running 8 stress test threads.
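For anyone wanting to double-check the thread count on a Linux box (for example the Bulldozer Opteron or a LiveCD session on the FX), here is a small sketch that reports how many logical CPUs the OS exposes and the CPU model string. It assumes, as is typical, that the torture test is configured with one thread per logical CPU; the helper name `logical_cpus` is hypothetical, and `nproc` (GNU coreutils) with a `getconf _NPROCESSORS_ONLN` fallback are the only tools used.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPUs Linux reports.
# nproc comes from GNU coreutils; getconf _NPROCESSORS_ONLN is a common
# fallback on systems where nproc is not installed.
logical_cpus() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

echo "logical CPUs: $(logical_cpus)"
# Show the CPU model string (x86 Linux lists it as "model name" in /proc/cpuinfo),
# e.g. to confirm an FX or Opteron part; fall back to a message if it is absent.
grep -m1 'model name' /proc/cpuinfo || echo "model name not listed in /proc/cpuinfo"
```

On the 8-core FX discussed here, one would expect the reported count to match the 8 torture-test threads Jorge is running.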

2013-01-28, 05:12   #16
sdbardwick

Aug 2002
North San Diego County

682₁₀ Posts

FWIW, the 26.6 test has been running for 6+ hours without error using 8 threads (1 socket).
2013-01-28, 23:55   #17
Jorge

Jan 2013

11₈ Posts

Quote:
 Originally Posted by Prime95 I'll run one overnight.
Excellent, all information is appreciated!

Quote:
 Originally Posted by Prime95 Reports of stress test failures at stock speed are not at all uncommon. Usually it's a memory problem, but ever since AMD put their memory controller on chip, many of their CPUs fail at stock speed. IMO, AMD quality control was not very good a few years ago. Maybe it's better these days; I don't know.
With all due respect, I totally disagree with you about AMD CPU quality. Over the past 20 years I have built a lot of PCs professionally, and most were AMD. I have never had an AMD PC that had any IMC (integrated memory controller) or CPU issues, ever. AMD's IMC may not be able to run RAM at as high a frequency as Intel's IMCs when overclocked, but that doesn't mean it doesn't work just fine and reliably at the AMD-specified frequencies. In comparison, it is documented that Intel has shipped millions of defective CPUs, chipsets, mobos and SSDs. AMD has not shipped any defective products that I am aware of, ever. Some 40 years ago AMD also manufactured Intel's CPUs for them...

I hope that with more P95 testing we can determine whether or not there is an issue running P95 on the Bulldozer/Vishera architecture CPUs. I noticed that the Opteron models run at fairly low frequencies, so they are likely to complete fewer tests in a given period of time than the FX processors; perhaps longer run times are required?

2013-01-29, 02:46   #18
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·17·109 Posts

I'm at 20 hours of Bulldozer stress testing. 16 threads, small in-place FFTs. Linux version 27.7. No problems.

At this point, I'd say that there isn't a software problem with the prime95 stress test. The major difference between the two setups is the hardware and the OS. If it is a hardware problem, then the error should go away if you drop the CPU and memory clock rate significantly. If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!). Alternatively, you could call one-prime95-error-every-10-hours stable enough.
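To put "one prime95 error every 10 hours" into numbers, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that errors arrive independently at a constant average rate (a Poisson process) of λ = 0.1 per hour; real hardware errors depend on temperature, load and other factors, so the figures are indicative only.

```latex
\begin{aligned}
P(\text{no error in } t\ \text{hours}) &= e^{-\lambda t}, \qquad \lambda = 0.1\ \text{h}^{-1} \\
P(\text{no error in } 10\ \text{h}) &= e^{-1} \approx 0.37 \\
1 - e^{-\lambda t} = 0.95 \;\Rightarrow\; t &= \frac{\ln 20}{\lambda} \approx 30\ \text{h}
\end{aligned}
```

Under this assumption, a machine averaging one error every 10 hours would still pass a single 10-hour run about a third of the time, which is one way to see why an earlier clean run of the same test does not by itself rule out a hardware problem, and why roughly 30 hours of testing would be needed to see such an error with 95% probability.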

I'll continue the Bulldozer stress test for another day. Unfortunately, I can't run it under Windows.

Quote:
 Originally Posted by Jorge With all due respect, I totally disagree with you about AMD CPU quality.
That's OK. Your opinion and mine are both subjective and based on rather small data sets. I've not built any AMD systems (I've owned two). My opinion is based on a surge of I'm-running-at-stock-speed-your-program-must-be-broken complaints when HyperTransport first came out.

Last fiddled with by Prime95 on 2013-01-29 at 03:21

2013-01-29, 03:15   #19
LaurV
Romulan Interpreter

Jun 2011
Thailand

11×853 Posts

[edit: obviously my post is not addressed to George; he posted in between, as it took me a while to finish this between job tasks.]

Opinions vary. You are entitled to yours. We hope you are not one of those AMD trolls; we have had many here in the past. We hope you are able to think objectively and are not influenced by your heart. This is not meant to be an insult. I have very good friends whom I would classify as "AMD trolls". We meet around a bottle of beer and sometimes talk technical things. They used AMD in the past (me too), back when we were students and AMD was cheaper for about the same performance. They fell in love with it, and later stuck with their first love.

Things have changed. For many years now Intel has outperformed AMD at every point: quality, performance, performance/watt/buck, reliability, IPC (instructions per clock cycle), whatever. But for all those features, you have to pay more money. You should not compare Bulldozer with an i7 on double-precision (DP) floating-point math and this kind of stuff, where AMD sucks. They are targeting different markets.

If AMD did not recall CPUs, it does not mean they don't have defects; it may be that they care less about their customers, or have a different policy. The defect rate is exactly the same for both Intel and AMD. Silicon chips are a quite mature and stable medium; they all have a 100 to 200 ppm (parts per million) defect rate, etc., and for the number of transistors they have, about one in 150 CPUs is defective. They (both!) still sell those as lower-end parts, either with a core cut out, with some memory speed locked, bla bla. Trust me, I work in an electronics factory (some people here know me). There is no "better"; it only depends on your preference, target applications, and budget. For LL testing, well, Intel is better.

It took a while to convince my friends. Guess what, they are now convinced that Intel is better, but they still use AMD, because "Intel needs competition". That I would call an "AMD troll". My friends know that I call them that. It is a "friendly" name (and don't ask what they call me, or what we call each other sometimes; that is what friends are for, isn't it?)...

[edit2, after reading George's post: from the Prime95 (the program) stress.txt file, last paragraph, last FAQ:

Code:
Q) A forum member said "Don't bother with prime95, it always pukes on me, and my system is stable!. What do you make of that?" or "We had a server at work that ran for 2 MONTHS straight, without a reboot I installed Prime95 on it and ran it - a couple minutes later I get an error. You are going to tell me that the server wasn't stable?"
A) These users obviously do not subscribe to the 100% rock solid school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS. But since they are not presently running any programs that reveal the hardware problem, the machines are quite stable. As long as these machines never run a program that uncovers the hardware problem, then the machines will continue to be stable.

end of edit2]

Last fiddled with by LaurV on 2013-01-29 at 03:45
2013-01-29, 03:24   #20
Jorge

Jan 2013

3² Posts

Quote:
 Originally Posted by Prime95 I'm at 20 hours of Bulldozer stress testing. 16 threads, small in-place FFTs. Linux version 27.7. No problems. At this point, I'd say that there isn't a software problem with the prime95 stress test. The major difference between the two setups is the hardware and the OS. If it is a hardware problem, then the error should go away if you drop the CPU and memory clock rate significantly. If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!). Alternatively, you could call one-prime95-error-every-10-hours stable enough. I'll continue the Bulldozer stress test for another day. Unfortunately, I can't run it under Windows.
I think we need to run the Bulldozer/Vishera architecture CPUs under Windows to get a more realistic picture of whether there is an issue, since that's what I and most other enthusiasts are using (even though I'd much prefer not to be using Windoze). Obviously a sample run on only 1-2 CPUs may not turn up any issues, but if it does, then we know to investigate further.

2013-01-29, 03:37   #21
Xyzzy

"Mike"
Aug 2002

8,053 Posts

Quote:
 If it is an OS issue, you might try installing linux and running the stress test (yes, I know that would be a pain!).
No pain if you use a Linux LiveCD or USB dealio. Just wget the client and rock and roll.

Code:
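# download the 64-bit Linux build of prime95 (mprime), version 27.9
wget http://www.mersenneforum.org/gimps/p95v279.linux64.tar.gz
# decompress the archive, then unpack it into the current directory
gzip -d p95v279.linux64.tar.gz
tar -xvf p95v279.linux64.tar
# start mprime with its text menu, from which the torture test can be chosen
./mprime -m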
The only thing scary about trying a different operating system is the possibility that it gives the same error and proves that your hardware is not 100% reliable.

But, if it passes, that narrows down the possible issues. For example, what happens when you run the torture test in safe mode in Windows?

http://windows.microsoft.com/en-US/w...r-in-safe-mode

By eliminating a slew of drivers and programs you can eliminate variables.

FWIW, any error or warning is unacceptable to us, so we would not rest until the issue was resolved. And we would explore every possible angle to simplify the challenge.
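Since any error or warning matters, a quick way to check a long unattended Linux run is to search mprime's results file instead of scrolling back through the console. This is a sketch under stated assumptions: that mprime appends its torture-test messages to `results.txt` in its working directory, and that problem lines contain one of the phrases in `PATTERN` (taken from the messages mentioned in this thread). The function name `count_stress_problems` is hypothetical; adjust the pattern to what your version actually prints.

```shell
#!/bin/sh
# Hypothetical helper: count torture-test problem lines in an mprime results file.
# ASSUMPTIONS: mprime writes torture-test messages to results.txt, and problem
# lines contain one of these phrases. "ERROR" is matched case-sensitively so a
# summary such as "0 errors" is not counted.
PATTERN='ILLEGAL SUMOUT|[Hh]ardware failure|ERROR'

count_stress_problems() {
    results_file="${1:-results.txt}"
    if [ ! -f "$results_file" ]; then
        echo "no results file: $results_file" >&2
        return 2
    fi
    # -c prints the number of matching lines; -E enables the | alternation.
    # grep exits with status 1 when nothing matches, so "|| true" keeps that
    # from looking like a failure; the printed count is then 0.
    grep -cE "$PATTERN" "$results_file" || true
}
```

Run it as `count_stress_problems results.txt` from the mprime directory; any non-zero count means at least one line matched and deserves a closer look.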

2013-01-29, 03:38   #22
Jorge

Jan 2013

3² Posts

Quote:
 Originally Posted by LaurV Opinions vary. You are entitled to yours. BIG SNIP That I would call an "AMD troll". My friends know that I call them such. It is a "friendly" call (and don't ask how they call me, or how we call each-other sometimes, that is what friends are for, isn't it?)...

NO that's not what friends are for.

BTW, I find your post TOTALLY INAPPROPRIATE, condescending, insulting, ignorant and technically incorrect in so many ways that I won't even waste my time responding to such fanbois foolishness, and your post is completely OFF TOPIC.

Catch my drift?

The only reason I replied to Prime95's AMD comment was because his perception is completely inaccurate, as is 90% of what you stated as "facts" when it's your subjective opinion, unlike the Intel shipments of defective products, which are documented. Concluding that the issue I am seeing is likely a result of AMD's perceived QC issues would be wrong, as there is no basis for this belief.

Since AMD hasn't shipped any defective products that I am aware of, there is no reason for them to have a recall. You can be damn certain that if they did ship defective products, they would be forced to recall them as Intel was, but since AMD didn't ship any, there were no recalls. There does not appear to be any objective statistical or scientific basis for this myth about AMD quality issues.

Your post was of absolutely NO VALUE to this thread or the testing of V27.7 on Bulldozer/Vishera architecture CPUs running under Windoze.

If you have nothing constructive to contribute to this thread, please stay out of the thread. There are other areas of the forum if you want to talk crap over a few beers with your Bros.

My post is meant to be a constructive response to your inappropriate comment/trolling, so I hope you take it in that spirit.

Last fiddled with by Jorge on 2013-01-29 at 03:52


All times are UTC.
