mersenneforum.org More relations mean many more relations wanted

2007-07-29, 19:14   #1
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·3,191 Posts
More relations mean many more relations wanted

I'm sieving F1009 (211-digit SNFS, sextic polynomial), and running msieve-1.24 roughly daily over the assembled relations to see how well the combining is going.

Code:
Fri Jul 27 18:54:31 2007  restarting with 32493323 relations
Fri Jul 27 19:05:24 2007  found 4771661 duplicates and 27764624 unique relations
Fri Jul 27 20:06:12 2007  begin with 12233478 relations and 11229792 unique ideals
Fri Jul 27 20:06:31 2007  reduce to 6552267 relations and 5186708 ideals in 28 passes
Fri Jul 27 20:06:31 2007  max relations containing the same ideal: 22
Fri Jul 27 20:06:32 2007  filtering wants 10465698 more relations
Code:
Sat Jul 28 19:25:30 2007  restarting with 34035873 relations
Sat Jul 28 19:30:53 2007  found 5159303 duplicates and 28919535 unique relations
Sat Jul 28 19:34:53 2007  28919535 relations and about 27621466 large ideals
Sat Jul 28 19:47:55 2007  begin with 13632186 relations and 12318283 unique ideals
Sat Jul 28 19:48:16 2007  reduce to 8092201 relations and 6448331 ideals in 24 passes
Sat Jul 28 19:48:16 2007  max relations containing the same ideal: 26
Sat Jul 28 19:48:17 2007  filtering wants 9652074 more relations
so it looked as if the relation-explosion was imminent.

But on Sunday, I have
Code:
Sun Jul 29 11:42:26 2007  restarting with 34819039 relations
Sun Jul 29 11:47:48 2007  found 5359550 duplicates and 29502453 unique relations
Sun Jul 29 11:51:47 2007  29502453 relations and about 26566521 large ideals
Sun Jul 29 12:02:48 2007  begin with 9635998 relations and 10746313 unique ideals
Sun Jul 29 12:02:54 2007  reduce to 3411684 relations and 2494517 ideals in 15 passes
Sun Jul 29 12:02:54 2007  max relations containing the same ideal: 21
Sun Jul 29 12:02:54 2007  filtering wants 11843580 more relations
which looks as if the filtering has taken a very large step backwards. I've attached the whole msieve.log file; the automatically-determined parameters don't seem to have changed much between Saturday and Sunday. Any idea what's going on here?

I'm falling back on my usual reaction to odd things happening in the filtering, namely sieving more; I'll try reprocessing on Tuesday morning with another few million relations.
Attached Files
 msieve-f1009.log.txt (65.2 KB, 177 views)

2007-07-31, 01:09   #2
jasonp
Tribal Bullet

Oct 2004

6716₈ Posts

Quote:
 Originally Posted by fivemack which looks as if the filtering has taken a very large step backwards. I've attached the whole msieve.log file; the automatically-determined parameters don't seem to have changed much between Saturday and Sunday. Any idea what's going on here?
Greg Childers needed 45M relations to complete a C211 SNFS without incident, but that figure includes a fair amount of oversieving, so you should be close.

This behavior is very strange; between the previous filtering run and the current one, adding 700k relations causes a huge number of relations to get pruned as singletons but doesn't cause a corresponding drop in the number of large ideals. The number of pruning passes has dropped a lot as well, so I wonder if you caught the dataset at the point of the explosion. What this seems to mean is that there's a huge bubble of singleton ideals concentrated in a very few relations, so as more relations get added the bubble dissipates and you get a surge in the amount of excess for the matrix. I vaguely remember my experiments with 3LP QS behaving this way, though in that case the drop in excess was very mild compared to this.

Last fiddled with by jasonp on 2007-07-31 at 01:10

2007-08-03, 19:48   #3
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

14356₈ Posts

This is very odd indeed: it's happened again.

With 41.3M relations, msieve built a matrix (OK, a large matrix: 3817432 x 3822767 of weight 333867858). I was still sieving, so I didn't let msieve run. With 42.6M relations, msieve reduces to 6311580 relations and 4771775 ideals in 20 passes, and asks for another 10081533 relations.

It's 5GB of relations, so it will fit compressed on one DVD if you want to have a look yourself; I'm going to try to write my own code for uniqify and singleton-removal, and see what msieve makes of the results of that.
2007-08-03, 20:11   #4
jasonp
Tribal Bullet

Oct 2004

2·3·19·31 Posts

Quote:
 Originally Posted by fivemack This is very odd indeed: it's happened again. With 41.3M relations, msieve built a matrix (OK, a large matrix, 3817432 x 3822767 of weight 333867858). I was still sieving so didn't let msieve run. With 42.6M relations, msieve reduces to 6311580 relations and 4771775 ideals in 20 passes, and asks for another 10081533 relations. It's 5GB of relations so will fit compressed on one DVD if you want to have a look yourself; I'm going to try to write my own code for uniqify and singleton-removal, and see what msieve makes of the results of that.
Go ahead and mail the DVD, I'll take a look at the dataset. Maybe you can get a quicker second opinion by running procrels and matbuild from GGNFS on the set of relations. I think you'd have to run matbuild to execute the singleton removal, but it would be interesting to see how many duplicates and singletons other tools find.

Once you complete the factorization, it would also be interesting to modify the msieve code to stop reading after X relations (the changes are easy, I can send you a patch), then run a script that looks at the amount of excess over time, maybe every 200k relations. More likely, though, there's a bug in the filtering.

2007-08-03, 21:53   #5
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

6382₁₀ Posts

I'm not very keen on running procrels/matbuild on sets of data this large, since the performance of procrels is roughly quadratic (read N processed files, process file M, output N+1 data files, read N+1 data files, process file M+1, ...) and it has almost always crashed, corrupting the data files, after twenty million relations.

I've now written and run my own uniqing and singleton removal. I'm lucky enough to have 4G of memory and plenty of disc space, so I can do uniqing simply by setting up a C++ map<pair<long,long>, int>, reading through the file compiling a count of occurrences of each [A,B] pair, then reading through it again, decreasing the counter and outputting the line to another file only if the counter is 1. This got from 42.6M relations down to 32.427M, in about 2GB of memory, because C++ maps are scarcely memory-efficient.

For singleton removal, I'm not bothering with the ideal theory: I just read the file once to set up another map<> of counters for the occurrences of each prime on the rational and algebraic sides, then read it through again and output lines to another file only if all the primes on both sides occurred at least twice in the first file. This takes about 1.5G of memory.

The 32.427M relations have 18.209M different primes on the rational side and 8.043M different primes on the algebraic side; one pass of my removal gets it to 25.576M relations (10.781M/6.228M); a second gets to 24.283M, and at this point I hand over to msieve.

msieve-1.26 finds no duplicates (fortunate!), and is happily building a matrix as I type, so I suspect there's something screwy in the singleton-removal pass. Do you still want the data and my tiny tools for dealing with it?

Last fiddled with by fivemack on 2007-08-03 at 21:56
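[The two-pass scheme described above can be sketched roughly as follows. The relation-line format assumed here ("a,b:rational primes:algebraic primes", comma-separated within each side) and all names are illustrative assumptions, not fivemack's actual tools or the siever's exact output format.]

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed line format: "a,b:<rational primes>:<algebraic primes>",
// primes comma-separated within each side.
using Key = std::pair<long long, long long>;

// Pull the [A,B] pair off the front of a relation line.
Key ab_key(const std::string& line) {
    long long a, b;
    char comma;
    std::istringstream in(line);
    in >> a >> comma >> b;   // integer parsing stops at the ':'
    return {a, b};
}

// Pass 1 counts each [A,B] pair; pass 2 decrements the counter and emits
// a line only when it is the last surviving copy, so each distinct
// relation is written exactly once.
std::vector<std::string> uniquify(const std::vector<std::string>& lines) {
    std::map<Key, int> count;
    for (const auto& l : lines) ++count[ab_key(l)];
    std::vector<std::string> out;
    for (const auto& l : lines)
        if (--count[ab_key(l)] == 0) out.push_back(l);
    return out;
}

// Primes tagged with their side (0 = rational, 1 = algebraic), since a
// prime on one side is unrelated to the same prime on the other.
std::vector<std::pair<int, long long>> primes_of(const std::string& line) {
    std::vector<std::pair<int, long long>> ps;
    std::istringstream sides(line.substr(line.find(':') + 1));
    std::string side_str, tok;
    int side = 0;
    while (std::getline(sides, side_str, ':')) {
        std::istringstream in(side_str);
        while (std::getline(in, tok, ','))
            if (!tok.empty()) ps.push_back({side, std::stoll(tok)});
        ++side;
    }
    return ps;
}

// One pass of prime-based singleton removal: count every (side, prime)
// occurrence in the file, then keep a relation only if all of its primes
// occurred at least twice.
std::vector<std::string> remove_singletons(const std::vector<std::string>& lines) {
    std::map<std::pair<int, long long>, int> count;
    for (const auto& l : lines)
        for (auto p : primes_of(l)) ++count[p];
    std::vector<std::string> out;
    for (const auto& l : lines) {
        bool keep = true;
        for (auto p : primes_of(l))
            if (count[p] < 2) { keep = false; break; }
        if (keep) out.push_back(l);
    }
    return out;
}
```

[One detail glossed over: a prime dividing the same relation's norm twice counts twice here, which matches "occurred at least twice in the first file" literally but differs slightly from counting distinct relations per ideal.]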
2007-08-03, 23:27   #6
Wacky

Jun 2003
The Texas Hill Country

3²·11² Posts

Tom,

If you would like some assistance in tracking down "the problem", you could send me a copy of your relations and I will see what I can learn by converting them to CWI format and processing them with a different set of tools. Unfortunately, I will not be able to do any such processing until September, because I will not have access to the Internet for most of the month of August.

I will make a few "observations":

Eliminating "obvious" singletons on the basis of primes, rather than ideals, is something that we routinely do in processing NFSNet data sets once we have the "final" set of raw relations. In fact, I often perform one such pass before I even worry about eliminating duplicates from line-sieving relations. It is easy to prove that such an elimination is not a "lossy" simplification step.

If we can transform the relation data into a format that creates one canonical text line for each relation, we can utilize standard Unix utilities such as "sort" and "uniq" to simplify the process of removing duplicates. This allows the relations to be processed in batches and then efficiently merged during the creation of a file of unique relations.
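[The canonical-line idea above can be sketched as follows; the field layout is illustrative, not NFSNet's actual CWI format. The point is that with both prime lists sorted, duplicate relations found at different special-q become byte-identical lines, so `sort -u` across all batch files removes them in one external merge.]

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Emit one side's primes in sorted order, comma-separated.
std::string join_sorted(std::vector<long long> v) {
    std::sort(v.begin(), v.end());
    std::ostringstream out;
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) out << ',';
        out << v[i];
    }
    return out.str();
}

// One canonical text line per relation: duplicates become byte-identical,
// so standard "sort" and "uniq" (or "sort -u") can do the deduplication,
// batch by batch, without any in-memory table.
std::string canonical_line(long long a, long long b,
                           const std::vector<long long>& rational,
                           const std::vector<long long>& algebraic) {
    std::ostringstream out;
    out << a << ',' << b << ':' << join_sorted(rational)
        << ':' << join_sorted(algebraic);
    return out.str();
}
```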
2007-08-04, 15:59   #7
jasonp
Tribal Bullet

Oct 2004

2×3×19×31 Posts

Quote:
 Originally Posted by fivemack I've now written and run my own uniqing and singleton removal - I'm lucky enough to have 4G of memory and plenty of disc space, so can do uniqing simply by setting up a C++ map<pair<long,long>, int>, reading through the file compiling a count of occurrences of each [A,B] pair, then read through it again decreasing the counter and outputting the line to another file only if the counter is 1.
Wow, that was fast!
Quote:
 msieve-1.26 finds no duplicates (fortunate!), and is happily building a matrix as I type, so I suspect there's something screwy in the singleton-removal pass. Do you still want the data and my tiny tools for dealing with it?
 Rats! Yes, please send the data and your tools; this sounds like something I have to resolve. Do you end up with a huge amount of excess that indicates a lot of oversieving? I'm reasonably confident there's nothing wrong with the duplicate removal; it's pretty simple compared to the singleton removal.

I thought about singleton removal starting with the primes only and then switching to removal by ideal, but decided against it because it was yet another singleton removal pass and it can't be as memory-efficient as using a non-perfect-hashtable on the collection of ideals.

2007-08-04, 17:32   #8
Wacky

Jun 2003
The Texas Hill Country

3²×11² Posts

Quote:
 Originally Posted by jasonp I thought about singleton removal starting with the primes only and then switching to removal by ideal, but decided against it because it was yet another singleton removal pass and it can't be as memory-efficient as using a non-perfect-hashtable on the collection of ideals.
There are two advantages in doing some initial singleton removal on primes rather than ideals.

The candidate elements can be determined much more rapidly since we do not need to compute the ideals. Since we end up discarding a significant fraction of the relations, this can make a noticeable difference in the execution speed of that first pass.

Although it is becoming less important as more of us gain access to systems with a large amount of RAM, the storage of the primes takes less space than a comparable storage of ideals.

Since singleton removal needs to be done repeatedly (removing a batch of singletons is likely to reduce some additional ideals to singleton status), I find that using an initial pass based on just the primes is an effective way to quickly reduce the working set of relations.

Last fiddled with by Wacky on 2007-08-04 at 17:33
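[The cascading effect behind those repeated passes can be seen in a toy model where a relation is just the list of its prime labels; this abstraction and the function name are for illustration only, not NFSNet or msieve code.]

```cpp
#include <cassert>
#include <map>
#include <vector>

using Rel = std::vector<int>;   // a relation, modelled as its prime labels

// Repeat single-pass singleton removal until nothing more is pruned:
// deleting a relation can drop one of its other primes to a single
// occurrence, creating fresh singletons for the next pass.
std::vector<Rel> prune_to_fixed_point(std::vector<Rel> rels) {
    for (;;) {
        std::map<int, int> count;
        for (const auto& r : rels)
            for (int p : r) ++count[p];
        std::vector<Rel> kept;
        for (const auto& r : rels) {
            bool ok = true;
            for (int p : r)
                if (count[p] < 2) { ok = false; break; }
            if (ok) kept.push_back(r);
        }
        if (kept.size() == rels.size()) return kept;   // nothing pruned
        rels.swap(kept);
    }
}
```

[A chain like {2,3}, {3,5}, {5,7} collapses completely: 2 is a singleton, removing {2,3} makes 3 a singleton, and so on, so a single pass would stop far short of the fixed point.]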

