Xyzzy 2020-10-17 21:21

lanczos error: submatrix is not invertible
[CODE]Sat Oct 17 12:07:31 2020 lanczos error: submatrix is not invertible
Sat Oct 17 12:07:31 2020 lanczos halted after 180668 iterations (dim = 11424815)
Sat Oct 17 12:07:31 2020 linear algebra failed, aborting[/CODE]
We had this error today after we paused (CTRL+Z) the job and then resumed it. The error caused the job to stop.

We were able to restart the job so hopefully the work is intact?


jasonp 2020-10-20 18:21

How long between stop and resume? Also is VBITS set to > 64?

The error check in the linear algebra will always fail if you create checkpoints within three iterations of each other. I wonder if fast-forwarding the wall-clock time caused a checkpoint too early...
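
To illustrate that hypothesis, here is a minimal sketch. It assumes a purely time-based checkpoint trigger; the function name, the 3600-second interval, and the "more than 3 iterations" guard are all hypothetical and are not taken from msieve's actual code. A checkpoint is written at iteration 100, the job is then suspended for about an hour, and the wall clock has jumped by the time iteration 101 finishes.

```python
def checkpoint_iterations(timeline, interval_s, min_gap=0):
    """Return the iterations at which a checkpoint would be written.

    timeline:   ordered list of (iteration, wall_clock_seconds) pairs; the
                first entry is treated as the previous checkpoint.
    interval_s: wall-clock seconds between checkpoints (illustrative value).
    min_gap:    require more than this many iterations since the last
                checkpoint; 0 disables the guard (hypothetical safeguard).
    """
    written = []
    last_iter, last_time = timeline[0]
    for iteration, now in timeline[1:]:
        due = now - last_time >= interval_s          # enough wall-clock time passed
        far_enough = iteration - last_iter > min_gap  # enough iterations passed
        if due and far_enough:
            written.append(iteration)
            last_iter, last_time = iteration, now
    return written


# Checkpoint at iteration 100 (t = 3600 s), then a suspend of about an hour:
# iteration 101 completes at t = 7201 s, later iterations take 1 s each.
timeline = [(100, 3600), (101, 7201), (102, 7202),
            (103, 7203), (104, 7204), (105, 7205)]

naive = checkpoint_iterations(timeline, interval_s=3600)
guarded = checkpoint_iterations(timeline, interval_s=3600, min_gap=3)
print(naive, guarded)
```

With the time-only trigger the next checkpoint lands one iteration after the previous one, which is the situation described above. The guarded variant defers it until the iteration count has also advanced.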

Xyzzy 2020-10-21 13:03

[QUOTE=jasonp;560445]How long between stop and resume?[/QUOTE]Maybe an hour?

[QUOTE=jasonp;560445]Also is VBITS set to > 64?[/QUOTE]It is whatever the default is.

Do you think the job is corrupted? We are at 44% with ~350 hours to go.


jasonp 2020-10-21 17:13

If you restarted from checkpoint and it got past the failure point then you should hopefully be able to finish. Maybe the failure wasn't related to getting suspended. Cosmic ray? Memory corruption?

RichD 2020-10-21 18:46

I've had that error once or twice. Restarting from a checkpoint completed with no further problems. I wrote it off as old hardware or a thermal issue.

