mersenneforum.org ggnfs sieving yield wierdness
 Register FAQ Search Today's Posts Mark Forums Read

 2009-05-07, 06:50 #1 jrk     May 2008 3×5×73 Posts ggnfs sieving yield wierdness Has anyone noticed this? Using gnfs-lasieve4I13e siever*, sieving smaller ranges in separate instances causes the siever to find more relations than if one big range is sieved. This is confirmed by re-sieving over a small range at the end of the original range and observing that new relations are found that are missing in the original output. In the case shown in the graph below, this increase is about 25%, but the increase seems to be related to the length of the original range. So what's happening? Is this expected? Does it have anything to do with duplicates being found? * tested with both the regular and "experimental" versions. Attached Thumbnails   Last fiddled with by jrk on 2009-05-07 at 07:03
 2009-05-07, 07:08 #2 smh     "Sander" Oct 2002 52.345322,5.52471 29·41 Posts I think this is because you're probably sieving below the factor base. The siever automatically lowers the factor base for you. If you're resieving the first part of that range, you probably won't notice any difference.
2009-05-07, 07:11   #3
Andi47

Oct 2004
Austria

2×17×73 Posts

Quote:
 Originally Posted by jrk Has anyone noticed this? Using gnfs-lasieve4I13e siever*, sieving smaller ranges in separate instances causes the siever to find more relations than if one big range is sieved. This is confirmed by re-sieving over a small range at the end of the original range and observing that new relations are found that are missing in the original output. In the case shown in the graph below, this increase is about 25%, but the increase seems to be related to the length of the original range. So what's happening? Is this expected? Does it have anything to do with duplicates being found? * tested with both the regular and "experimental" versions.
Is this from the c138 of the 4th aliquot team sieve? When you start sieving, alim (algebraic factor base bound, IIRC, which is 9M in the input file) will be set to (start of the range -1). With alim as low as e.g. 3.2M, the yield will drop quickly with growing special Q.
When you do everything in one big range, you will still sieve with alim=3.2M for Q=5M. When you do many small ranges, alim will always be "just below Q", thus yielding more relations.

2009-05-07, 07:16   #4
jrk

May 2008

3×5×73 Posts

Quote:
 Originally Posted by smh I think this is because you're probably sieving below the factor base. The siever automatically lowers the factor base for you.
You're right, the limit is automatically lowered. Shouldn't the limit then be kept in step with the current Q being tested? Maybe that can be a RFE for the maintainers?

Over a big range like that above, the difference can be significant!

Quote:
 If you're resieving the first part of that range, you probably won't notice any difference.
You are right again. Like I mentioned, retesting smaller ranges shows that the difference increases as the original range increases. The difference increases gradually and is largest at the end of the original range tested.

2009-05-07, 07:21   #5
jrk

May 2008

3×5×73 Posts

Quote:
 Originally Posted by Andi47 Is this from the c138 of the 4th aliquot team sieve? When you start sieving, alim (algebraic factor base bound, IIRC, which is 9M in the input file) will be set to (start of the range -1). With alim as low as e.g. 3.2M, the yield will drop quickly with growing special Q. When you do everything in one big range, you will still sieve with alim=3.2M for Q=5M. When you do many small ranges, alim will always be "just below Q", thus yielding more relations.
You are exactly right.

I think the alim/rlim should be increased continually with the Q being tested, so that relations are not "lost." Is that possible to do in the program? Serge?

A workaround would obviously be to split up all ranges below alim/rlim into smaller chunks and merging them afterward.

 2009-05-07, 07:47 #6 smh     "Sander" Oct 2002 52.345322,5.52471 22458 Posts Just wondering. How did you create the graph?
2009-05-07, 08:09   #7
jrk

May 2008

3·5·73 Posts

Quote:
 Originally Posted by smh Just wondering. How did you create the graph?
cat relations.out | sed -e 's/.*,//g' | ./graph1.pl

graph11.pl is just a hack-ish perl script I put together:

Code:
#!/usr/bin/perl

$start = 3200000;$end = 5000000;
$inc = 5000;$steps = ($end -$start) / $inc; @y = (); while(<STDIN>) { chomp; #input is represented in hexadecimal$x = hex;
if($x >=$start && $x <=$end)
{
$x = int(($x - $start) /$inc);

$y[$x]++;
}
else
{
}
}
for($i = 0;$i < $steps;$i++)
{
$p =$start + $i *$inc;
print $p . ' ' .$y[\$i] . "\n";
}
then put the output in openoffice calc to make the graph (overkill...).

I made the assumption that the Q value is at the end of the relation line. But actually it is not always. So I throw away the lines where the last value is out of the range, and if the value is in range I assume it is a Q value. Figured it was close enough for my purposes anyway.

 2009-05-07, 09:00 #8 smh     "Sander" Oct 2002 52.345322,5.52471 29×41 Posts Thanks. I'm new to Calc and i can't seem to get a nice looking graph in calc. What type of graph do you use? Also, the x and y axes are always the other way around.
 2009-05-07, 09:15 #9 jrk     May 2008 3×5×73 Posts I used an "area" graph. And you need to check "first column as label" under "data range".
2009-05-07, 09:54   #10
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

59·163 Posts

Quote:
 Originally Posted by jrk You are exactly right. I think the alim/rlim should be increased continually with the Q being tested, so that relations are not "lost." Is that possible to do in the program? Serge? A workaround would obviously be to split up all ranges below alim/rlim into smaller chunks and merging them afterward.
Well, this thing is rooted deep. Changing alim (or rlim) doens't merely change a number or two, -- no, it affects many (practically all?) core structures of the program, so for the purposes of the siever (as it is now) the FB_limits (also known as rlim/alim) are constants and set up once. Re-writing around this paradigm is hard and prone to error. Not just a few lines. (There are many places in the code where no asserts are made, for the speed sake. It is brittle in this sense.)

The workaround is indeed what everyone is doing: small ranges. but not too small, because when the siever starts, it spends some time preparing all the above mentioned housekeeping items. This overhead will negate all potential gains, if the intended region is cut too fine. Cut in 10 ranges or so and you will do well.

2009-05-07, 17:41   #11
jrk

May 2008

3×5×73 Posts

Quote:
 Originally Posted by Batalov Well, this thing is rooted deep. [...]
I suspected as much. Thanks for the info.

 Similar Threads Thread Thread Starter Forum Replies Last Post ray10may Msieve 1 2017-04-14 14:19 ryanp Factoring 9 2013-04-03 06:33 R.D. Silverman Cunningham Tables 44 2010-10-21 14:32 nuggetprime Factoring 5 2007-06-04 14:42 delta_t Hardware 3 2004-06-24 13:35

All times are UTC. The time now is 15:35.

Fri Dec 3 15:35:04 UTC 2021 up 133 days, 10:04, 0 users, load averages: 0.88, 1.13, 1.25