2004-10-29, 14:51   #1
S00113
Dec 2003

2³·3³ Posts
More RHEL WS 3.0 bugs?

On many (more than 10) previously error-free machines running Red Hat Enterprise Linux WS 3.0, I have started seeing ROUND OFF and SUM(INPUTS) != SUM(OUTPUTS) errors. Sometimes they even start to loop forever, like this:
[pre]
[Sat Oct 23 22:22:24 2004]
Iteration: 3705014/12654503, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
[Sat Oct 23 22:45:15 2004]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 3705014/12654503, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 2.948298207980173e+17 != -455.9915635528858
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[...]
[Fri Oct 29 16:05:00 2004]
Iteration: 3705014/12654503, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 3705014/12654503, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 2.948298207980173e+17 != -455.9915635528858
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Fri Oct 29 16:10:12 2004]
Iteration: 3705014/12654503, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 3705014/12654503, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 2.948298207980173e+17 != -455.9915635528858
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[/pre]
I have some save files that reproduce the loop, in case anyone is interested.

This problem has been occurring a lot lately. I cannot reproduce the errors with mprime -t (the torture test) when the machines are idle, so I suspect a faulty driver that does not properly restore the floating-point (FP) context. Does anyone else have this problem on RHEL?
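For anyone unfamiliar with the two checks in the log, here is a minimal Python sketch of the general idea. It assumes a plain, unweighted floating-point cyclic convolution; mprime's real transform is more elaborate, so this is not its actual code, and the names `square_by_convolution`, `roundoff_error`, `sum_check_ok` and the `clobber` parameter are hypothetical. After an FFT-based squaring every output should land very close to an integer, so the largest distance from the nearest integer serves as the round-off error (the log compares it against 0.40). Independently, the outputs of a cyclic convolution must sum to the square of the sum of the inputs. The `clobber` option perturbs one frequency-domain value to mimic an FP register that was not restored correctly.

```python
import cmath

# Threshold as printed in the log; assumed here, not taken from mprime's source.
ROUNDOFF_LIMIT = 0.40


def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform: sign=-1 forward, +1 inverse (unscaled)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def square_by_convolution(digits, clobber=None):
    """Square a digit vector with a floating-point cyclic convolution.

    `clobber` is a hypothetical (index, delta) pair that adds `delta` to one
    frequency-domain value, mimicking a register corrupted mid-computation.
    """
    n = len(digits)
    freq = [f * f for f in dft(digits, -1)]  # pointwise squaring in frequency space
    if clobber is not None:
        index, delta = clobber
        freq[index] += delta
    return [v.real / n for v in dft(freq, +1)]  # scaled inverse transform


def roundoff_error(outputs):
    """Largest distance of any output value from its nearest integer."""
    return max(abs(v - round(v)) for v in outputs)


def sum_check_ok(inputs, outputs, rel_tol=1e-9):
    """For a cyclic convolution, sum(outputs) must equal sum(inputs) ** 2."""
    expected = sum(inputs) ** 2
    return abs(sum(outputs) - expected) <= rel_tol * max(1.0, abs(expected))


if __name__ == "__main__":
    digits = [3, 1, 4, 1]
    clean = square_by_convolution(digits)
    print([round(v) for v in clean],
          roundoff_error(clean) < ROUNDOFF_LIMIT,
          sum_check_ok(digits, clean))  # [27, 14, 26, 14] True True
    bad = square_by_convolution(digits, clobber=(0, 2.0))
    print(roundoff_error(bad) > ROUNDOFF_LIMIT,
          sum_check_ok(digits, bad))  # True False
```

In this toy, clobbering frequency index 0 shifts every output by 0.5, so both checks fire at once, much like the paired ROUND OFF and SUM(INPUTS) != SUM(OUTPUTS) errors above. A perturbation at another index would leave the sum unchanged and be caught only by the round-off check.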