20070729, 19:14  #1 
(loop (#_fork))
Feb 2006
Cambridge, England
2·3,191 Posts 
More relations mean many more relations wanted
I'm sieving F1009 (211digit SNFS, sextic polynomial), and running msieve1.24 roughly daily over the assembled relations to see how well the combining is going.
On Friday, I had Code:
Fri Jul 27 18:54:31 2007 restarting with 32493323 relations Fri Jul 27 19:05:24 2007 found 4771661 duplicates and 27764624 unique relations Fri Jul 27 20:06:12 2007 begin with 12233478 relations and 11229792 unique ideals Fri Jul 27 20:06:31 2007 reduce to 6552267 relations and 5186708 ideals in 28 passes Fri Jul 27 20:06:31 2007 max relations containing the same ideal: 22 Fri Jul 27 20:06:32 2007 filtering wants 10465698 more relations Code:
Sat Jul 28 19:25:30 2007 restarting with 34035873 relations Sat Jul 28 19:30:53 2007 found 5159303 duplicates and 28919535 unique relations Sat Jul 28 19:34:53 2007 28919535 relations and about 27621466 large ideals Sat Jul 28 19:47:55 2007 begin with 13632186 relations and 12318283 unique ideals Sat Jul 28 19:48:16 2007 reduce to 8092201 relations and 6448331 ideals in 24 passes Sat Jul 28 19:48:16 2007 max relations containing the same ideal: 26 Sat Jul 28 19:48:17 2007 filtering wants 9652074 more relations But on Sunday, I have Code:
Sun Jul 29 11:42:26 2007 restarting with 34819039 relations Sun Jul 29 11:47:48 2007 found 5359550 duplicates and 29502453 unique relations Sun Jul 29 11:51:47 2007 29502453 relations and about 26566521 large ideals Sun Jul 29 12:02:48 2007 begin with 9635998 relations and 10746313 unique ideals Sun Jul 29 12:02:54 2007 reduce to 3411684 relations and 2494517 ideals in 15 passes Sun Jul 29 12:02:54 2007 max relations containing the same ideal: 21 Sun Jul 29 12:02:54 2007 filtering wants 11843580 more relations I'm using my usual reaction to odd things happening in the filtering, namely sieving more; I'll try reprocessing Tuesday morning with another few million relations. 
20070731, 01:09  #2  
Tribal Bullet
Oct 2004
6716_{8} Posts 
Quote:
This behavior is very strange; between the current filtering run and the previous, adding 700k relations causes a huge number of relations to get pruned as singletons but doesn't cause a corresponding drop in the number if large ideals. The number of pruning passes has dropped by a lot as well, so I wonder if you caught the dataset at the point of the explosion. What this seems to mean is that there's a huge bubble of singleton ideals that are concentrated in a very few relations, so as more relations get added the bubble dissipates and you get a surge in the amount of excess for the matrix. I vaguely remember my experiments with 3LP QS behaved this way, though in that case the drop in excess was very mild compared to this. Last fiddled with by jasonp on 20070731 at 01:10 

20070803, 19:48  #3 
(loop (#_fork))
Feb 2006
Cambridge, England
14356_{8} Posts 
This is very odd indeed: it's happened again. With 41.3M relations, msieve built a matrix (OK, a large matrix, 3817432 x 3822767 of weight 333867858). I was still sieving so didn't let msieve run.
With 42.6M relations, msieve reduces to 6311580 relations and 4771775 ideals in 20 passes, and asks for another 10081533 relations. It's 5GB of relations so will fit compressed on one DVD if you want to have a look yourself; I'm going to try to write my own code for uniqify and singletonremoval, and see what msieve makes of the results of that. 
20070803, 20:11  #4  
Tribal Bullet
Oct 2004
2·3·19·31 Posts 
Quote:
Once you complete the factorization, it would also be interesting to modify the msieve code to stop reading after X relations (the changes are easy, I can send you a patch), then run a script that looks at the amount of excess over time, maybe every 200k relations. It's more likely there's a bug in the filtering. 

20070803, 21:53  #5 
(loop (#_fork))
Feb 2006
Cambridge, England
6382_{10} Posts 
I'm not very keen on running procrels/matbuild on sets of data this large, since the performance of procrels is roughly quadratic (read N processed files, process file M, output N+1 data files, read N+1 data files, process file M+1, ...) and it's almost always crashed, corrupting the data files, after twenty million relations.
I've now written and run my own uniqing and singleton removal  I'm lucky enough to have 4G of memory and plenty of disc space, so can do uniqing simply by setting up a C++ map<pair<int,int>,int>, reading through the file compiling a count of occurences of each [A,B] pair, then read through it again decreasing the counter and outputting the line to another file only if the counter is 1. This got from 42.6M relations down to 32.427M, in about 2GB of memory because C++ maps are scarcely memoryefficient. For singleton removal, I'm not bothering with the idealtheory, just reading the file once to set up another map<> of counters for the occurrence of each prime on the rational and algebraic sides, then reading it through again and outputting lines to another file only if all the primes on both sides occurred at least twice in the first file. This takes about 1.5G memory. The 32.427M relations have 18.209M different primes on the rational side and 8.043M different primes on the algebraic side; one pass of my removal gets it to 25.576M relations, 10.781M/6.228M; a second gets to 24.283M and at this point I hand over to msieve. msieve1.26 finds no duplicates (fortunate!), and is happily building a matrix as I type, so I suspect there's something screwy in the singletonremoval pass. Do you still want the data and my tiny tools for dealing with it? Last fiddled with by fivemack on 20070803 at 21:56 
20070803, 23:27  #6 
Jun 2003
The Texas Hill Country
3^{2}·11^{2} Posts 
Tom,
If you would like some assistance in tracking down "the problem", you could send me a copy on your relations and I will see what I can learn by converting the relations to CWI format and processing them with a different set of tools. Unfortunately, I will not be able to do any such processing until September because I will not have access to the Internet for most of the month of August. I will make a few "observations": Eliminating "obvious" singletons on the basis of primes, rather than ideals is something that we routinely do in processing NFSNet data sets once we have the "final" set of raw relations. In fact, I often perform one such pass before I even worry about eliminating duplicates from line sieving relations. It is easy to prove that such an elimination is not a "lossy" simplification step. If we can transform the relation data into a format the creates one canonical text line for each relation, we can utilize standard Unix utilities such as "sort" and "uniq" to simplify the process of removing duplicates. This allows the relations to be processed in batches and then efficiently merged during the process of creating a file of unique relations. 
20070804, 15:59  #7  
Tribal Bullet
Oct 2004
2×3×19×31 Posts 
Quote:
Quote:
I thought about singleton removal starting with the primes only and then switching to removal by ideal, but decided against it because it was yet another singleton removal pass and it can't be as memoryefficient as using a nonperfecthashtable on the collection of ideals. 

20070804, 17:32  #8  
Jun 2003
The Texas Hill Country
3^{2}×11^{2} Posts 
Quote:
The candidate elements can be determined much more rapidly since we do not need to compute the ideals. Since we end up discarding a significant fraction of the relations, this can make a noticeable difference in the execution speed of that first pass. Although it is becoming less important as more of us gain access to systems with a large amount of RAM, the storage of the primes takes less space than a comparable storage of ideals. Since singleton removal needs to be done repeatedly (removing a batch of singletons is likely to reduce some additional ideals to singleton status), I find that using an initial pass based on just the primes is an effective way to quickly reduce the working set of relations. Last fiddled with by Wacky on 20070804 at 17:33 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Congruence relations  Lee Yiyuan  Miscellaneous Math  7  20120508 12:55 
Keeping relations  fivemack  Factoring  1  20090126 17:49 
So, HOW many more relations does it want?  Andi47  Msieve  6  20081211 12:39 
Compressing relations  fivemack  Factoring  5  20071016 15:50 
How many relations do we need for 2 ^ 713 1 ?  thomasn  NFSNET Discussion  10  20030711 19:26 