mersenneforum.org Let's fix the elusive siever bug
 User Name Remember Me? Password
 Register FAQ Search Today's Posts Mark Forums Read

 2008-10-17, 20:41 #1 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 32×17×61 Posts Let's fix the elusive siever bug Well, many of you heard of this bug. A few probably had it happen to them. It happens rarely but deadly. The siever enters infinite loop (doesn't crash) and can stay in it for hours or days. I've now caught and singled out a particular case. It is very small and that may help debugging. Description: gnfs-lasieve4INNe enters infinite loop. Debug case: bug.poly: Code: n: 75229857442823839063327954585610815916921620664868103784648473223197175129674845956583458783493459542969045721078101701111280306855293361344591 m: 1000000000000000000000000000000000 c5: 1300 c0: 23 skew: 0.45 type: snfs rlim: 3050000 alim: 5500000 lpbr: 27 lpba: 27 mfbr: 54 mfba: 54 rlambda: 2.6 alambda: 2.6 # # here's the command-line to get into infinite loop # # gnfs-lasieve4I13e -r bug.poly -f 3089717 -c 100 -o test.out # gnfs-lasieve4I12e -r bug.poly -f 3089717 -c 100 -o test.out # I will debug with my meager tools, too. Just wanted to share the debug case. Greg, Paul, guys? --Serge
 2008-10-17, 22:57 #2 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 2·3,191 Posts It looks as if the problem's in the 'reduce2' routine; just waiting for it to freeze and breaking in with gdb always gives me a spot in reduce2. Specifically, it looks as if calling reduce2 with a0=3089729 b0=0 a1=1550102 b1=1 sigma=0.2025 hangs; it gets into a cycle Code: s=-28413.312500 a0sq=34032648.000000 a1sq=56826.562500 s=28413.312500 a0sq=34032648.000000 a1sq=56826.562500 because s/a1sq is calculated as 0.500001 and rounded up to +1, so it takes one step in that direction, at which point s/a1sq becomes -0.500001 and is rounded down to -1, so it takes one step back again. This may be very dependent on the exact compile options used; the binary I'm working with uses SSE instructions for all the floating-point arithmetic. reduce2 is not a routine called very often (once per special-Q) so the right answer is probably to do it in exact mpz_t/mpz_t rational arithmetic; but replacing 'float' with 'double' throughout (except in the function signature) at least gives a version which runs through your bugged case. Thanks very much for the test case! Last fiddled with by fivemack on 2008-10-17 at 22:59
2008-10-18, 00:54   #3
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

32·17·61 Posts

P.S. Remarkably, just now, this otherwise uneventful factorization ended with a fiasco (51 failed sqrts), and I am now (short of any other reasons) thinking that any relations that were produced while (or just before) the siever hung were somehow poisonous to the matrix. BL finished in due time with 51 nontrivial dependencies that turned out to be rather trivial (~160 cycles in each). See the log, attached.

Interesting! (maybe a coincidence of course)

I am sieving some more and will report if/when I will get to understand it better. Maybe the factors will shed the light on why it is unstable in reduce2 and in later in algebra.
Attached Files
 ggnfs14447_167.zip (15.2 KB, 86 views)

2008-10-18, 02:59   #4
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

100100011101012 Posts

Quote:
 Originally Posted by fivemack because s/a1sq is calculated as 0.500001 and rounded up to +1, so it takes one step in that direction, at which point s/a1sq becomes -0.500001 and is rounded down to -1, so it takes one step back again.
Right. But even using double will not fix the problem (with s/a1sq = 0.5 precisely it will cycle forever). Without even trying to understand what reduce2() does, I simply added a reasonable cycle bailout value; the usual reports from the siever show "NNN Special q, MMMM reduction iterations" where MMMM/NNN is never more than 10; so...
Code:
//replace
for(;;) {
n_iter++;
}
//by
i32_t iter;
for(iter=0; iter<1024; iter++) {
}
n_iter += iter;
I'll commit this to GGNFS SVN for the sake of sanity.
This can most possibly be improved (but not with mpz's - because sigma=skew² is not integer anyway).

--Serge

 2008-10-18, 12:43 #5 joral     Mar 2008 5·11 Posts I've been doing some experimenting with this since last night, and it strikes me that it is basically the extended euclidean algorithm used in some way to find a short vector pair for the lattice. I know when doing this other ways, I've normally rounded down rather than to nearest. Using rint() causes the calculation to cycle between these two points. Code: iter a0 b0 a1 b1 k ... 114126, -3149, 10913, -198, -295, -1 114127, -3347, 10618, -198, -295, 1 ... using floor(), it terminates here Code: iter a0 b0 a1 b1 k ... 12, 3347, -10618, 198, 295, 1 ... If that iteration count means something, this may be another way to keep it from getting stuck, but still keep a fully meaningful iteration count.

 Thread Tools

 Similar Threads Thread Thread Starter Forum Replies Last Post R.D. Silverman Factoring 31 2018-10-08 17:17 chris2be8 Factoring 6 2018-02-06 17:22 VBCurtis Factoring 6 2018-01-24 11:06 pepi37 Software 7 2015-07-10 04:42 schickel Factoring 12 2010-09-30 08:44

All times are UTC. The time now is 03:27.

Sun Mar 7 03:27:47 UTC 2021 up 93 days, 23:39, 0 users, load averages: 2.91, 1.81, 1.55

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.