mersenneforum.org  

2007-04-26, 14:54   #1
R.D. Silverman
Nov 2003
2²×5×373 Posts
LA Failure

The linear algebra for 5,423+ finished last night.

It failed. The result was "too many orthogonal vectors".

I need to rebuild the matrix and try again.

Meanwhile 5,423- is about 2/3 sieved.

2007-05-23, 13:01   #2
R.D. Silverman
Nov 2003
2²·5·373 Posts

Quote:
Originally Posted by R.D. Silverman
The linear algebra for 5,423+ finished last night.

It failed. The result was "too many orthogonal vectors".

I need to rebuild the matrix and try again.

Meanwhile 5,423- is about 2/3 sieved.

The LA for 5,423+ just failed again for the 3rd time.

I had rebuilt the matrix. This one was only 3.6M rows.

I am going to try recompiling all the code and try again.

I suspect memory problems, but diags turned up nothing.
I am going to try a different diagnostic suite. I am also going to re-seat the memory DIMMs.

5,423- has finished sieving, and 2,1794M is in progress.

2007-05-23, 13:37   #3
R.D. Silverman
Nov 2003
1D24₁₆ Posts

Quote:
Originally Posted by R.D. Silverman
The LA for 5,423+ just failed again for the 3rd time.

I had rebuilt the matrix. This one was only 3.6M rows.

I am going to try recompiling all the code and try again.

I suspect memory problems, but diags turned up nothing.
I am going to try a different diagnostic suite. I am also going to re-seat the memory DIMMs.

5,423- has finished sieving, and 2,1794M is in progress.

Although it is almost surely futile, I am also going to try building the
matrix on a different machine. Even if the matrix somehow has the
"wrong" bits lit, it still has a null space and hence the LA should still
produce a solution, even if it is wrong.
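
The null-space point can be illustrated at toy scale. Any matrix over GF(2) with more columns than rows has a nonzero kernel vector, whichever bits happen to be set, so a single corrupted bit should change the answer rather than remove it. The sketch below uses plain Gaussian elimination, not the block Lanczos iteration a real NFS solver runs, and the function names are made up for the example.

```python
import random

def nullspace_vector_gf2(rows, ncols):
    """Return a nonzero x (int bitmask) with A*x = 0 over GF(2), or None.

    rows: list of ints; bit j of rows[i] is the matrix entry A[i][j].
    """
    rows = list(rows)
    pivots = {}  # pivot column -> index of the row holding that pivot
    r = 0
    for col in range(ncols):
        # Look for a row at or below r with a 1 in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i] >> col & 1), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column from every other row (reduced echelon form).
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]
        pivots[col] = r
        r += 1
    free = next((c for c in range(ncols) if c not in pivots), None)
    if free is None:
        return None  # full column rank: only the zero vector works
    # Set the free variable to 1 and solve for each pivot variable.
    x = 1 << free
    for col, i in pivots.items():
        if rows[i] >> free & 1:
            x |= 1 << col
    return x

def matvec_gf2(rows, x):
    """Multiply A by x over GF(2); entry i is the parity of rows[i] & x."""
    return [bin(row & x).count("1") & 1 for row in rows]

random.seed(1)
A = [random.getrandbits(12) for _ in range(8)]   # 8 rows, 12 columns
A_bad = list(A)
A_bad[3] ^= 1 << 5                               # simulate one flipped bit
for M in (A, A_bad):
    x = nullspace_vector_gf2(M, 12)
    assert x and not any(matvec_gf2(M, x))
```

Both the clean and the corrupted matrix yield a kernel vector here, which is consistent with the expectation that wrong bits alone would give a wrong dependency rather than a failed run.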

If anyone else has some ideas, I would love to hear them.

2007-05-23, 15:36   #4
wblipp
"William"
May 2003
New Haven
2³·103 Posts

Quote:
Originally Posted by R.D. Silverman
I suspect memory problems, but diags turned up nothing. I am going to try a different diagnostic suite. I am also going to re-seat the memory DIMMs.

Many people have found memory problems using the Prime95 torture test that did not show up on any diagnostic suite.

2007-05-23, 16:41   #5
ewmayer
2ω=0
Sep 2002
República de California
3²·1,297 Posts

Bob, if your code doesn't already have such an expedient, is there any kind of simple checksum you can incorporate into the matrix-building step, to make sure that all the 1s and 0s that go in come out right in the final matrix? I know diddly-squat about NFS LA, but have found before-and-after checksums invaluable in my own work when it comes to tracking down weird memory-corruption and compiler bugs.
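
A minimal sketch of what such a before-and-after checksum could look like, assuming purely for illustration that each matrix row fits in one 64-bit word (a real NFS matrix is sparse and stored differently). The function names are hypothetical and not taken from any existing solver.

```python
import hashlib
import struct

def matrix_digest(rows):
    """SHA-256 over the rows in order, each packed as a little-endian 64-bit word."""
    h = hashlib.sha256()
    for row in rows:
        h.update(struct.pack("<Q", row))
    return h.hexdigest()

def total_weight(rows):
    """Total number of set bits; a cheap second check that is easy to compare by eye."""
    return sum(bin(row).count("1") for row in rows)

# Record both values at the moment the matrix is built.
built = [0x0000000000000003, 0x0000000000000009, 0x8000000000000001]
expected = (matrix_digest(built), total_weight(built))

# Later, after the matrix has been written out and read back (simulated here).
reloaded = list(built)
assert (matrix_digest(reloaded), total_weight(reloaded)) == expected

# A single flipped bit changes both values.
reloaded[1] ^= 1 << 17
assert (matrix_digest(reloaded), total_weight(reloaded)) != expected
```

Recomputing the pair at each checkpoint would narrow down when a corruption first appears, at the cost of one extra pass over the data.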

Also, is there anything further that can be gleaned from the "too many orthogonal vectors" diagnostic? That would seem (assuming the diagnostic itself is working as intended) to say something about the actual contents (viewed at large) of the final matrix, wouldn't it? Could diagnosing which vectors are orthogonal to which -- or adding a diagnostic to flag ones which are orthogonal to some greater-than-expected fraction of their brethren (if there is some applicable threshold to be applied in this regard) -- be helpful?

2007-05-23, 16:48   #6
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
73·151 Posts

Quote:
Originally Posted by R.D. Silverman
If anyone else has some ideas, I would love to hear them.

If all else fails, ship the data to me. Either I can build the matrix from your relations or I can run the matrix you build, whichever you prefer.

I realise that this only gives you the factors and does not solve your underlying problem, but at least you get the factors.

Paul

2007-05-23, 17:20   #7
R.D. Silverman
Nov 2003
2²·5·373 Posts

Quote:
Originally Posted by xilman
If all else fails, ship the data to me. Either I can build the matrix from your relations or I can run the matrix you build, whichever you prefer.

I realise that this only gives you the factors and does not solve your underlying problem, but at least you get the factors.

Paul

During one LA run, I also encountered the following failure:

It reported that the 'left' matrix somehow contained a row of all 0's.
(this is within a 64x64 block )

A second try with the same exact matrix did not yield this error!!!!!
(which, more than anything else, is what makes me suspect a memory problem)

running along OK, then, WHAM:

Smask[64400] = FFFFFFFFFFFFFFFF (64 entries out of 64).
Writing checkpoint 322 to checkfile.even after 64400 iterations.
Checkpoint successfully written at Tue Apr 24 08:52:36 2007
Smask[64429] = FFFFFFFFFFFFFFF2 (61 entries out of 64).
Smask[64434] = FFFFFFFFFFFFFFD5 (61 entries out of 64).
Smask[64445] = FFFFFFFFFFFFFF79 (61 entries out of 64).
Smask[64472] = FFFFFFFFFFFFFFEC (61 entries out of 64).
leftmat row 0 = 0000000000000000
(0000000000000000000000000000000000000000000000000000000000000000)
leftmat row 1 = 0000000000000003
(1100000000000000000000000000000000000000000000000000000000000000)
leftmat row 2 = 0000000000000004
(0010000000000000000000000000000000000000000000000000000000000000)
leftmat row 3 = 0000000000000009
(1001000000000000000000000000000000000000000000000000000000000000)
leftmat row 4 = 0000000000000010
(0000100000000000000000000000000000000000000000000000000000000000)
leftmat row 5 = 0000000000000021
(1000010000000000000000000000000000000000000000000000000000000000)
leftmat row 6 = 0000000000000040
(0000001000000000000000000000000000000000000000000000000000000000)
leftmat row 7 = 0000000000000080
(0000000100000000000000000000000000000000000000000000000000000000)
leftmat row 8 = 0000000000000101
(1000000010000000000000000000000000000000000000000000000000000000)
leftmat row 9 = 0000000000000200
(0000000001000000000000000000000000000000000000000000000000000000)
leftmat row 10 = 0000000000000400
(0000000000100000000000000000000000000000000000000000000000000000)
leftmat row 11 = 0000000000000801
(1000000000010000000000000000000000000000000000000000000000000000)
leftmat row 12 = 0000000000001001
(1000000000001000000000000000000000000000000000000000000000000000)
leftmat row 13 = 0000000000002000
(0000000000000100000000000000000000000000000000000000000000000000)
leftmat row 14 = 0000000000004000
(0000000000000010000000000000000000000000000000000000000000000000)
leftmat row 15 = 0000000000008001
(1000000000000001000000000000000000000000000000000000000000000000)
leftmat row 16 = 0000000000010000
(0000000000000000100000000000000000000000000000000000000000000000)
leftmat row 17 = 0000000000020001
(1000000000000000010000000000000000000000000000000000000000000000)
leftmat row 18 = 0000000000040001
(1000000000000000001000000000000000000000000000000000000000000000)
leftmat row 19 = 0000000000080001
(1000000000000000000100000000000000000000000000000000000000000000)
leftmat row 20 = 0000000000100001
(1000000000000000000010000000000000000000000000000000000000000000)
leftmat row 21 = 0000000000200001
(1000000000000000000001000000000000000000000000000000000000000000)
leftmat row 22 = 0000000000400001
(1000000000000000000000100000000000000000000000000000000000000000)
leftmat row 23 = 0000000000800001
(1000000000000000000000010000000000000000000000000000000000000000)
leftmat row 24 = 0000000001000000
(0000000000000000000000001000000000000000000000000000000000000000)
leftmat row 25 = 0000000002000000
(0000000000000000000000000100000000000000000000000000000000000000)
leftmat row 26 = 0000000004000000
(0000000000000000000000000010000000000000000000000000000000000000)
leftmat row 27 = 0000000008000001
(1000000000000000000000000001000000000000000000000000000000000000)
leftmat row 28 = 0000000010000000
(0000000000000000000000000000100000000000000000000000000000000000)
leftmat row 29 = 0000000020000001
(1000000000000000000000000000010000000000000000000000000000000000)
leftmat row 30 = 0000000040000001
(1000000000000000000000000000001000000000000000000000000000000000)
leftmat row 31 = 0000000080000000
(0000000000000000000000000000000100000000000000000000000000000000)
leftmat row 32 = 0000000100000000
(0000000000000000000000000000000010000000000000000000000000000000)
leftmat row 33 = 0000000200000001
(1000000000000000000000000000000001000000000000000000000000000000)
leftmat row 34 = 0000000400000000
(0000000000000000000000000000000000100000000000000000000000000000)
leftmat row 35 = 0000000800000001
(1000000000000000000000000000000000010000000000000000000000000000)
leftmat row 36 = 0000001000000001
(1000000000000000000000000000000000001000000000000000000000000000)
leftmat row 37 = 0000002000000001
(1000000000000000000000000000000000000100000000000000000000000000)
leftmat row 38 = 0000004000000001
(1000000000000000000000000000000000000010000000000000000000000000)
leftmat row 39 = 0000008000000000
(0000000000000000000000000000000000000001000000000000000000000000)
leftmat row 40 = 0000010000000001
(1000000000000000000000000000000000000000100000000000000000000000)
leftmat row 41 = 0000020000000001
(1000000000000000000000000000000000000000010000000000000000000000)
leftmat row 42 = 0000040000000001
(1000000000000000000000000000000000000000001000000000000000000000)
leftmat row 43 = 0000080000000000
(0000000000000000000000000000000000000000000100000000000000000000)
leftmat row 44 = 0000100000000001
(1000000000000000000000000000000000000000000010000000000000000000)
leftmat row 45 = 0000200000000000
(0000000000000000000000000000000000000000000001000000000000000000)
leftmat row 46 = 0000400000000000
(0000000000000000000000000000000000000000000000100000000000000000)
leftmat row 47 = 0000800000000000
(0000000000000000000000000000000000000000000000010000000000000000)
leftmat row 48 = 0001000000000001
(1000000000000000000000000000000000000000000000001000000000000000)
leftmat row 49 = 0002000000000001
(1000000000000000000000000000000000000000000000000100000000000000)
leftmat row 50 = 0004000000000000
(0000000000000000000000000000000000000000000000000010000000000000)
leftmat row 51 = 0008000000000001
(1000000000000000000000000000000000000000000000000001000000000000)
leftmat row 52 = 0010000000000000
(0000000000000000000000000000000000000000000000000000100000000000)
leftmat row 53 = 0020000000000001
(1000000000000000000000000000000000000000000000000000010000000000)
leftmat row 54 = 0040000000000001
(1000000000000000000000000000000000000000000000000000001000000000)
leftmat row 55 = 0080000000000000
(0000000000000000000000000000000000000000000000000000000100000000)
leftmat row 56 = 0100000000000000
(0000000000000000000000000000000000000000000000000000000010000000)
leftmat row 57 = 0200000000000000
(0000000000000000000000000000000000000000000000000000000001000000)
leftmat row 58 = 0400000000000000
(0000000000000000000000000000000000000000000000000000000000100000)
leftmat row 59 = 0800000000000000
(0000000000000000000000000000000000000000000000000000000000010000)
leftmat row 60 = 1000000000000000
(0000000000000000000000000000000000000000000000000000000000001000)
leftmat row 61 = 2000000000000001
(1000000000000000000000000000000000000000000000000000000000000100)
leftmat row 62 = 4000000000000000
(0000000000000000000000000000000000000000000000000000000000000010)
leftmat row 63 = 8000000000000001
(1000000000000000000000000000000000000000000000000000000000000001)
rightmat row 0 = B551DFBCC1FC724E
(0111001001001110001111111000001100111101111110111000101010101101)
rightmat row 1 = 2E86BDF45F01DFE2
(0100011111111011100000001111101000101111101111010110000101110100)
rightmat row 2 = 4A00E8614598E587
(1110000110100111000110011010001010000110000101110000000001010010)
rightmat row 3 = E73FC9093A15F99A
(0101100110011111101010000101110010010000100100111111110011100111)
rightmat row 4 = 4EAEF0619ED59A43
(1100001001011001101010110111100110000110000011110111010101110010)
rightmat row 5 = D090E6779FD0A416
(0110100000100101000010111111100111101110011001110000100100001011)
rightmat row 6 = 70F689B23572365A
(0101101001101100010011101010110001001101100100010110111100001110)
rightmat row 7 = F01EA383991F50C0
(0000001100001010111110001001100111000001110001010111100000001111)
rightmat row 8 = 3854053E0E0F5E7F
(1111111001111010111100000111000001111100101000000010101000011100)
rightmat row 9 = 496B69E987F563BF
(1111110111000110101011111110000110010111100101101101011010010010)
rightmat row 10 = 748F98575E5DC603
(1100000001100011101110100111101011101010000110011111000100101110)
rightmat row 11 = A17B78D36ED29C40
(0000001000111001010010110111011011001011000111101101111010000101)
rightmat row 12 = 268C210564FF386D
(1011011000011100111111110010011010100000100001000011000101100100)
rightmat row 13 = 78F2818FE5464FAA
(0101010111110010011000101010011111110001100000010100111100011110)
rightmat row 14 = C5D138FD1D93D12D
(1011010010001011110010011011100010111111000111001000101110100011)
rightmat row 15 = 143C239A8079704F
(1111001000001110100111100000000101011001110001000011110000101000)
rightmat row 16 = 3F08E64960EDD79A
(0101100111101011101101110000011010010010011001110001000011111100)
rightmat row 17 = DC867D3BFE6E37FF
(1111111111101100011101100111111111011100101111100110000100111011)
rightmat row 18 = 00547643F5CBA461
(1000011000100101110100111010111111000010011011100010101000000000)
rightmat row 19 = 077021E4A5D19F56
(0110101011111001100010111010010100100111100001000000111011100000)
rightmat row 20 = 2EFDE2104EAC3F4B
(1101001011111100001101010111001000001000010001111011111101110100)
rightmat row 21 = 55EA41179DD59892
(0100100100011001101010111011100111101000100000100101011110101010)
rightmat row 22 = 0FAB496F4369B4A2
(0100010100101101100101101100001011110110100100101101010111110000)
rightmat row 23 = F3EFEA60533B50E6
(0110011100001010110111001100101000000110010101111111011111001111)
rightmat row 24 = FCEBB778B702260B
(1101000001100100010000001110110100011110111011011101011100111111)
rightmat row 25 = 86FD3C5DA02E7D74
(0010111010111110011101000000010110111010001111001011111101100001)
rightmat row 26 = 3518376EF0C20D38
(0001110010110000010000110000111101110110111011000001100010101100)
rightmat row 27 = CB8167E6F82671CB
(1101001110001110011001000001111101100111111001101000000111010011)
rightmat row 28 = 785F28F2B558DDD1
(1000101110111011000110101010110101001111000101001111101000011110)
rightmat row 29 = 6343D68EEEE77677
(1110111001101110111001110111011101110001011010111100001011000110)
rightmat row 30 = AA69B789732FAFFF
(1111111111110101111101001100111010010001111011011001011001010101)
rightmat row 31 = 2022FFA859C0E65D
(1011101001100111000000111001101000010101111111110100010000000100)
rightmat row 32 = 6C52693A42677EBC
(0011110101111110111001100100001001011100100101100100101000110110)
rightmat row 33 = 6921EA456C729191
(1000100110001001010011100011011010100010010101111000010010010110)
rightmat row 34 = 026ED92E488631CF
(1111001110001100011000010001001001110100100110110111011001000000)
rightmat row 35 = F7CF70711045E9DA
(0101101110010111101000100000100010001110000011101111001111101111)
rightmat row 36 = 6E4AF86DE434C7B0
(0000110111100011001011000010011110110110000111110101001001110110)
rightmat row 37 = 74ACE4C30232D08F
(1111000100001011010011000100000011000011001001110011010100101110)
rightmat row 38 = CAE48D5CE6DB9922
(0100010010011001110110110110011100111010101100010010011101010011)
rightmat row 39 = 4240B4E45F1ADC61
(1000011000111011010110001111101000100111001011010000001001000010)
rightmat row 40 = 3CA6B8DDFB90CB56
(0110101011010011000010011101111110111011000111010110010100111100)
rightmat row 41 = 43F3945C726D1359
(1001101011001000101101100100111000111010001010011100111111000010)
rightmat row 42 = 23477D02B103E095
(1010100100000111110000001000110101000000101111101110001011000100)
rightmat row 43 = 0C7CF01335D078ED
(1011011100011110000010111010110011001000000011110011111000110000)
rightmat row 44 = 8B4B1CB8100046C0
(0000001101100010000000000000100000011101001110001101001011010001)
rightmat row 45 = A628DAC1B761439D
(1011100111000010100001101110110110000011010110110001010001100101)
rightmat row 46 = 7EC876C7CE194ED1
(1000101101110010100110000111001111100011011011100001001101111110)
rightmat row 47 = B1986EB24683125D
(1011101001001000110000010110001001001101011101100001100110001101)
rightmat row 48 = C06BBDE825D4AFBF
(1111110111110101001010111010010000010111101111011101011000000011)
rightmat row 49 = 0159EEC1200A60E5
(1010011100000110010100000000010010000011011101111001101010000000)
rightmat row 50 = DFF2C5AABB947EBF
(1111110101111110001010011101110101010101101000110100111111111011)
rightmat row 51 = 4E0C169AAEE7498C
(0011000110010010111001110111010101011001011010000011000001110010)
rightmat row 52 = A6E30783180C342E
(0111010000101100001100000001100011000001111000001100011101100101)
rightmat row 53 = F7B2C5C03AEE7D4C
(0011001010111110011101110101110000000011101000110100110111101111)
rightmat row 54 = A1553DF984BA6192
(0100100110000110010111010010000110011111101111001010101010000101)
rightmat row 55 = 79E51CD4CA0E063C
(0011110001100000011100000101001100101011001110001010011110011110)
rightmat row 56 = 52080CF24A070EE5
(1010011101110000111000000101001001001111001100000001000001001010)
rightmat row 57 = E33C76DC6AD9101E
(0111100000001000100110110101011000111011011011100011110011000111)
rightmat row 58 = 7DE83B07C8977B92
(0100100111011110111010010001001111100000110111000001011110111110)
rightmat row 59 = 61B69195F051C873
(1100111000010011100010100000111110101001100010010110110110000110)
rightmat row 60 = AC1B846CB2B19343
(1100001011001001100011010100110100110110001000011101100000110101)
rightmat row 61 = 1B1991656A69AE33
(1100110001110101100101100101011010100110100010011001100011011000)
rightmat row 62 = 1A978A3D90A08D99
(1001100110110001000001010000100110111100010100011110100101011000)
rightmat row 63 = 179CC416D47ADB51
(1000101011011011010111100010101101101000001000110011100111101000)
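
For anyone checking the dump by hand: each 64-bit word is printed in hex and then, in parentheses, as a bit string with the least significant bit on the left. The sketch below reproduces that rendering, recounts the set bits reported on the Smask lines, and flags all-zero rows. The helper names are made up for this example and are not from the solver.

```python
def bits_lsb_first(word, width=64):
    """Render a word as a bit string with bit 0 on the left, as in the dump."""
    return "".join("1" if word >> i & 1 else "0" for i in range(width))

def popcount(word):
    """Number of set bits in the word."""
    return bin(word).count("1")

def zero_rows(rows):
    """Indices of rows that are entirely zero."""
    return [i for i, row in enumerate(rows) if row == 0]

# The four Smask values after the checkpoint each have 61 of 64 bits set.
for mask in (0xFFFFFFFFFFFFFFF2, 0xFFFFFFFFFFFFFFD5,
             0xFFFFFFFFFFFFFF79, 0xFFFFFFFFFFFFFFEC):
    assert popcount(mask) == 61

# leftmat row 3 = 0000000000000009 is printed as 1001 followed by zeros.
assert bits_lsb_first(0x9) == "1001" + "0" * 60

# First few leftmat rows from the dump: row 0 is the all-zero row.
leftmat = [0x0, 0x3, 0x4, 0x9, 0x10, 0x21]
assert zero_rows(leftmat) == [0]
```

A 64x64 block with an all-zero row cannot be invertible, which is presumably why the run stopped at that point.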

2007-05-23, 18:00   #8
ewmayer
2ω=0
Sep 2002
República de California
3²·1,297 Posts

So the question is, is this a hardware or software memory corruption problem?

I'd suggest running some program like Purify on your code, but if the problem is not in any way reproducible from one run to the next, this could be a waste of time.

I assume you have carefully scanned your build-time output for compile warnings about uninitialized memory? Have you built on multiple platforms/compilers? Some are better at catching uninitialized-memory issues at build time (obviously preferable to doing a time-consuming Purify run with a nonpredictably reproducible issue like you seem to have) than others.

But wait - your second post of the thread says it failed 3 times in a row -- so is the error reproducible in this case, or not?

2007-05-23, 19:43   #9
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
73×151 Posts

Quote:
Originally Posted by R.D. Silverman
During one LA run, I also encountered the following failure:

It reported that the 'left' matrix somehow contained a row of all 0's.
(this is within a 64x64 block)

A second try with the same exact matrix did not yield this error!!!!!
(which, more than anything else, is what makes me suspect a memory problem)

Such errors have also been seen with overheated and/or broken CPUs or motherboards.

Is the temperature ok? Are all the fans spinning properly? Remember to check the chipset fan(s) if any.

Can you pull the memory from that system and put it into another one so that the latter has enough to be able to complete the matrix?

Sometimes re-seating any add-on cards cures strange problems. For example, my Athlon Linux box hangs every now and again. It's not done so for weeks but previously it would sometimes hang several times a day. I've long suspected the ethernet card but have never been able to prove it. On the other hand, no hangs have occurred after reseating all the cards.


Paul

2007-05-23, 20:59   #10
Xyzzy
Aug 2002
10000010011011₂ Posts

Quote:
I need to rebuild the matrix and try again.
Quote:
The first matrix I designed was quite naturally perfect. It was a work of art. Flawless. Sublime. A triumph only equaled by its monumental failure.

2007-05-23, 21:53   #11
paulunderwood
Sep 2002
Database er0rr
3²×19×23 Posts

Basic hardware checks:

BIOS: all voltage readings
BIOS: all temperature readings (idle CPU at 45C and no hotter than 15C over the motherboard)
BIOS: CPU timings
BIOS: all memory timings
BIOS: miscellaneous settings such as "spread spectrum"
H/W: Clean fans, heatsink
H/W: connections for mainboard, cards, disks and memory

Software checks:
memtest86+
prime95 torture tests

Operating System checks:
Filing system: "chkdsk" (Win); or "fsck" and "badblocks" (Linux; RTFM)
Defrag (Win)
Update operating system
Virus
Firewall
Spyware
Other malware
Network device security

Application checks:
Re-install
Re-compile
Try on another box

Try to shield from cosmic rays

Last fiddled with by paulunderwood on 2007-05-23 at 22:23
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.