mersenneforum.org

nomadicus 2005-11-07 03:50

remote chance of a problem?
 
Is there even the remotest chance that v24.14, running on a dual-core AMD X2 (I have a 4400+) that passed several 12-hour torture tests with ease, can have an illegal sumout problem?

I switched back to v23.8.1 and cannot reproduce the problem.

When I switch back to v24.14, I can reproduce the problem fairly easily: it takes 3 or 4 start/stops to get it to fail.

Thoughts?


edit: clarification

JHagerson 2005-11-07 05:13

You might have found a bug in Prime95. Could you please post more details? What exponent are you testing? Are you performing an LL test? How many iterations occur before failure? Specifically what chip are you using? We need gory details like code name, speed, size of caches, and all of that.

With some additional information, Dr. Woltman (Prime95) can try to reproduce the problem.

Mystwalker 2005-11-07 11:00

It could also be the case that 24.14 produces significantly more heat and thus increases the chance of a failure in borderline cases.

[QUOTE=JHagerson]Dr. Woltman (Prime95)[/QUOTE]

AFAIK, it's "only" Mr. - but what's in a name? :wink:

garo 2005-11-07 11:04

v24.14 is more efficient and thus stresses your computer more. So it is entirely likely that a borderline stable system works with v23.8 but not with v24.14.

ewmayer 2005-11-07 19:19

What exponent are you testing, Nomadicus?

nomadicus 2005-11-07 21:40

The rig:
DFI LP nF4 SLI-DR
AMD 64 X2 4400+ (2.2GHz dual-core, L2 cache 1MB each core)
2x1GB OCZ 2-3-2-5 Titanium
eVGA 7800GTX
RAID1+0 4x74GB Raptors
Maxtor 250GB (16MB cache)
NEC 3540A CD/DVD burner
Enermax 600Watt Noisetaker
Lian-Li PC-V1000B case

Although this is a benchmarking (hence the RAID1+0) and gaming rig, I don't have it overclocked; I had a mild overclock going at one time but have since returned everything to its stock settings. At first I thought the video drivers might be the problem, but I have since confirmed they are not. All BIOS, drivers, etc. are up to date and are stable as far as reports on the web are concerned.

I do not run prime95 while gaming or benchmarking. After testing hardware/playing, I restart both instances of prime95 and get the error right away. That is how I noticed it.

This is what I see. This happens only when I stop/continue. I allow a few minutes between each stop/continue cycle so the CPU is cool when I continue. I've never seen it happen several minutes or hours after I continue. Once it restarts successfully, it can run for days without an error.

This rig has passed several (three or four, I forget) 12-hour torture tests at both OC and stock settings.

Test=30322213,68,1 has affinity set to CPU 0.
Test=30322363,68,1 has affinity set to CPU 1.

I am presently checking out the memory more closely and will let you know if I find a problem with it . . . but I have a feeling the memory is okay. We'll see.

nomadicus 2005-11-07 22:19

Results.txt for CPU0

[Sun Nov 06 18:02:29 2005]
Iteration: 2/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:46:54 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:52:02 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:57:09 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.



---------------------
Results.txt for CPU1

[Sun Nov 06 18:02:41 2005]
Iteration: 2/30322363, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
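
For anyone who wants to tally these errors rather than read results.txt by eye, here is a minimal Python sketch. It assumes every error line has exactly the form shown in the logs above; the names `ERROR_LINE`, `count_sumout_errors`, and `SAMPLE` are made up for illustration and are not part of prime95.

```python
import re
from collections import Counter

# Matches lines such as "Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT".
# Assumption: the format is exactly what the results.txt excerpts show.
ERROR_LINE = re.compile(r"Iteration: (\d+)/(\d+), ERROR: ILLEGAL SUMOUT")


def count_sumout_errors(text):
    """Count ILLEGAL SUMOUT errors, keyed by (iteration, exponent)."""
    counts = Counter()
    for line in text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            iteration, exponent = int(match.group(1)), int(match.group(2))
            counts[(iteration, exponent)] += 1
    return counts


# The CPU0 excerpt quoted above, used as sample input.
SAMPLE = """\
[Sun Nov 06 18:02:29 2005]
Iteration: 2/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:46:54 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:52:02 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Nov 06 22:57:09 2005]
Iteration: 53720/30322213, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
"""

if __name__ == "__main__":
    for (iteration, exponent), n in sorted(count_sumout_errors(SAMPLE).items()):
        print(f"exponent {exponent}, iteration {iteration}: {n} error(s)")
```

Grouping by iteration makes the pattern in the CPU0 log easy to see: the error repeats at the same iteration, which fits failures that happen right at restart from the save file rather than partway through a run.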

Mystwalker 2005-11-08 00:13

Could you check the CPU temperature while running both versions?

nomadicus 2005-11-08 04:05

[QUOTE=Mystwalker]Could you check the CPU temperature while running both versions?[/QUOTE]
Idle 33-35C; 43-44C with both torture tests running.

nomadicus 2005-11-11 13:46

Update:
It fails immediately roughly 1 out of 15 times when I restart (after stopping a few minutes earlier).
I am trying 24.13. I'll see if it creates the same symptoms.
I am going to do a 48-hour torture test this weekend using 24.14.

Suggestions welcome.
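
With a failure that shows up only about 1 restart in 15, a handful of clean restarts on another version proves little. The sketch below estimates how many consecutive clean stop/continue cycles would be needed before "never fails" means much. It assumes each restart fails independently with the same probability, which nothing in this thread establishes, and the function name `clean_restarts_needed` is made up for illustration.

```python
import math


def clean_restarts_needed(failure_rate, confidence=0.95):
    """Smallest n for which n clean restarts in a row would happen with
    probability at most (1 - confidence), if every restart failed
    independently with probability failure_rate (an assumption)."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Roughly 1 failure in 15 restarts, as observed with v24.14.
    print(clean_restarts_needed(1 / 15))
```

Under those assumptions, a failure rate of 1 in 15 calls for 44 clean restarts in a row to reach 95% confidence, so a comparison between versions needs dozens of stop/continue cycles on each.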

garo 2005-11-11 17:15

Did you run memtest86?

nomadicus 2005-11-17 05:04

The images below are 1600x1200. Didn't think about the size until it was too late; shrinking them makes the text unreadable.

This link shows the error as repeatable by just doing a start/stop.
http://www.eqsrecording.com/rig/prime95err.jpg

This is a two day torture test.
http://www.eqsrecording.com/rig/prime95tst1.jpg

I'll be doing a memtest86 run over the coming weekend. If that doesn't turn anything up, I'll do another torture test with the FSB upped from 200MHz to 209MHz.

Very strange.

nomadicus 2005-12-03 15:35

Two-day memtest86 run with the FSB slightly overclocked to better flush out weaknesses.
http://www.eqsrecording.com/rig/x2-memtst-01.jpg
memtest86 passed after 59 hours.

So here is the deal.

I stop then start prime95 v24.14 and it fails once in a while. The same goes for v24.13 (all hardware settings are at stock).
I stop then start prime95 v23.8 and it has never failed.

Two dual torture tests, each over 48 hours, passed. One was "Blend"; the second, "Small FFTs", ran for 58 hours.

memtest86 passed.

I don't feel good about running v24.14, so I am running 23.8.

Any other thoughts on this matter?
Is anyone else with an X2 willing to try this? I have nearly 200 hours in testing, so this is not for the faint of heart.

nomadicus 2005-12-16 05:40

Solved.

The problem is not in prime95; the fix is to install the AMD dual-core driver.

Description:
Stopping prime95 manually (Test > Stop) and then restarting it (Test > Continue) would cause an "ERROR: ILLEGAL SUMOUT" once in a while (perhaps 1 out of 10 times).

Solution:
Download the latest version of "AMD Athlon™ 64 X2 Dual Core Processor Driver for Windows XP and Windows Server 2003 Version (exe)" from here: http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html
This driver does more than manage Cool'n'Quiet, so it matters even if you have disabled Cool'n'Quiet in your motherboard BIOS.

Make sure affinity is set for each instance of prime95; after that, stopping and restarting prime95 should be fine.

