mersenneforum.org


lavalamp 2019-12-07 18:48

Help resuming linear algebra step
 
So, my machine had been running linear algebra for six and a half hours with only 5 minutes to go, and I think, "It's close enough, I'll play some Counter-Strike while it finishes." I load the game and ... of course I get a blue screen.

No worries, there are checkpoints, right? So I run msieve -s test.dat -l test.log -i test.ini -nf test.fb -t 4 -nc2 again and get:[CODE]commencing linear algebra
read 3880716 cycles
cycles contain 12919660 unique relations
read 0 relations
error: cannot locate relation 44290431[/CODE]Oh.

And it wipes the test.dat.mat file for good measure. Luckily I'd made a backup of everything first.

Then I see there are TWO checkpoint files, test.dat.chk and test.dat.bak.chk. So I try using the second checkpoint file and get the same error.

Is there anything I can do to rescue the process at this point without running the entire linear algebra step again?

EdH 2019-12-07 19:09

If you have a backup of everything, try running with -ncr instead of -nc2. -ncr resumes an interrupted linear algebra run from its checkpoint; -nc2 tells msieve to start the linear algebra over from the beginning.
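For anyone landing here later, a minimal Python sketch of the "back up everything, then resume with -ncr" routine. It assumes the file names from the first post and that msieve is on your PATH; the helper names `backup_work_files` and `resume_command` are hypothetical and not part of msieve. It only builds the command, it does not launch it.

```python
import shutil
import time
from pathlib import Path

# File names taken from the first post; adjust them for your own job.
WORK_FILES = ["test.dat", "test.dat.mat", "test.dat.chk", "test.dat.bak.chk",
              "test.log", "test.ini", "test.fb"]


def backup_work_files(workdir: Path, names=WORK_FILES) -> Path:
    """Copy every existing work file into a new timestamped subdirectory."""
    dest = workdir / time.strftime("backup-%Y%m%d-%H%M%S")
    dest.mkdir()  # fails if a backup with the same timestamp already exists
    for name in names:
        src = workdir / name
        if src.exists():
            shutil.copy2(src, dest / name)  # copy2 also preserves timestamps
    return dest


def resume_command(threads: int = 4) -> list[str]:
    """Same invocation as in the first post, but with -ncr (resume) instead of -nc2."""
    return ["msieve", "-s", "test.dat", "-l", "test.log", "-i", "test.ini",
            "-nf", "test.fb", "-t", str(threads), "-ncr"]
```

Usage: call `backup_work_files(Path("."))` in the job directory, check the backup looks complete, then run `" ".join(resume_command())` by hand.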

lavalamp 2019-12-07 19:13

Ah yes, that did the trick, thank you.

To be honest, it seems a bit dangerous that -nc2 deletes past progress and then fails with an error instead of checking first. I'm very glad I made a manual backup first.

EdH 2019-12-07 19:30

Glad to hear it worked. Yeah, other programs like ggnfs and YAFU refuse to run if they find partial results. I have suffered the loss you (temporarily) experienced. I try to remember the "r" these days.
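If you script your runs, a wrapper can do the "refuse to run if partial results exist" check before starting a fresh -nc2. This is a minimal Python sketch, assuming the checkpoint names seen in this thread (test.dat.chk and test.dat.bak.chk); `check_fresh_start` and `PartialResultsFound` are hypothetical names, and msieve itself does not perform this check.

```python
from pathlib import Path

# Checkpoint suffixes as they appear in this thread (test.dat.chk, test.dat.bak.chk).
CHECKPOINTS = (".chk", ".bak.chk")


class PartialResultsFound(RuntimeError):
    """Raised when a fresh run would clobber an interrupted one."""


def check_fresh_start(dat_file: Path, force: bool = False) -> None:
    """Refuse to start linear algebra from scratch if checkpoint files exist.

    Hypothetical wrapper logic, modelled on the behaviour described for
    ggnfs and YAFU; call it before launching msieve with -nc2.
    """
    candidates = [dat_file.with_name(dat_file.name + s) for s in CHECKPOINTS]
    found = [p for p in candidates if p.exists()]
    if found and not force:
        names = ", ".join(p.name for p in found)
        raise PartialResultsFound(
            f"found {names}; use -ncr to resume, or pass force=True to start over")
```

Passing `force=True` is the explicit "yes, I really want to throw away the old checkpoint" escape hatch.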

jasonp 2019-12-09 04:23

The code doesn't know the difference between running in a directory with checkpoint files you don't want versus continuing a job with checkpoint files you do want. It would be a nice feature to encode a fingerprint of the factorization into the temporary files generated.
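One way the fingerprint idea could look, as a minimal Python sketch: each temporary file starts with a short header holding a hash of the number being factored, and a reader rejects files written for a different number. The header layout and the `MAGIC` tag are assumptions for illustration only; this is not msieve's checkpoint format. Note that it tells "belongs to this job" apart from "leftover from another job"; deciding whether to resume or restart a matching job is still up to the user.

```python
import hashlib
from pathlib import Path

MAGIC = b"FP1:"  # hypothetical header tag, not part of any msieve file format


def fingerprint(n: int) -> bytes:
    """Short, stable fingerprint of the number being factored."""
    return hashlib.sha256(str(n).encode()).hexdigest()[:16].encode()


def write_tagged(path: Path, n: int, payload: bytes) -> None:
    """Write payload preceded by a one-line header holding the fingerprint."""
    path.write_bytes(MAGIC + fingerprint(n) + b"\n" + payload)


def read_tagged(path: Path, n: int) -> bytes:
    """Return the payload, or raise if the file belongs to a different factorization."""
    header, _, payload = path.read_bytes().partition(b"\n")
    if header != MAGIC + fingerprint(n):
        raise ValueError(f"{path.name} was written for a different factorization")
    return payload
```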


All times are UTC.