2007-10-16, 12:52  #1
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts 
Compressing relations
I think I can get lattice-siever output compressed to four bytes per relation without inordinate difficulty. You need only represent A/B pairs: figure out by LLL what lattice the first six relations lie on, then compress them to coordinates with respect to that basis; by construction these will normally be in the [-16384,16384] x [0,16384] range. When the coordinates with respect to that lattice stop being integral, go and check for a different lattice.
A little extra generality does the same for line-siever output [sort by B then A, Huffman-compress the differences between As, and have a token for 'go back to start of previous line']. Would anyone be interested in an actual implementation of this? The compressor gets me to implement LLL, and the decompressor is an interesting exercise in batch factorisation, but it's enough work that I'd quite like to know whether anyone would actually use it. I suppose it saves only bandwidth and disc space, neither of which is expensive nowadays.
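As a rough sketch of the lattice idea (not the proposed implementation): in two dimensions, Lagrange-Gauss reduction does what LLL would do, and expressing an (a, b) pair in the reduced basis is just solving a 2x2 integer system. The function names here are hypothetical.

```python
def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a 2D integer lattice basis
    (equivalent to LLL in two dimensions)."""
    u, v = list(u), list(v)
    while True:
        if u[0] * u[0] + u[1] * u[1] > v[0] * v[0] + v[1] * v[1]:
            u, v = v, u  # keep u the shorter vector
        n = u[0] * u[0] + u[1] * u[1]
        # nearest-integer multiple of u to subtract from v
        m = round((v[0] * u[0] + v[1] * u[1]) / n)
        if m == 0:
            return (tuple(u), tuple(v))
        v = [v[0] - m * u[0], v[1] - m * u[1]]

def coords(ab, basis):
    """Express (a, b) in the reduced basis; None means the point is
    not on this lattice, i.e. time to look for a different lattice."""
    (u0, u1), (v0, v1) = basis
    det = u0 * v1 - u1 * v0
    a, b = ab
    x_num = a * v1 - b * v0
    y_num = u0 * b - u1 * a
    if det == 0 or x_num % det or y_num % det:
        return None
    return (x_num // det, y_num // det)
```

The compressor would store the two coordinates returned by `coords` (two small integers, fitting in four bytes per relation in the stated range) instead of the full A/B pair.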
2007-10-16, 13:08  #2
"Bob Silverman"
Nov 2003
North of Boston
7506_{10} Posts 
An interesting idea, but probably not worth doing. If you want to save space, then output the data in binary, not ASCII. 

2007-10-16, 13:21  #3
(loop (#_fork))
Feb 2006
Cambridge, England
2·7·461 Posts 
This would save another factor of 4 over the naive way of outputting the data in binary (64-bit A and B values), though even a very basic Huffman-type encoding gets you nearly a factor of two there.
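To illustrate how delta-plus-variable-length coding beats raw 64-bit output (this is a generic varint/zigzag scheme standing in for the "basic Huffman-type encoding" mentioned above, not the author's code):

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def varint(n):
    """Variable-length (LEB128-style) encoding of a non-negative integer."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def encode_pairs(pairs):
    """Sort by B then A; emit B-deltas and zigzagged A-deltas as varints."""
    out = bytearray()
    prev_a = prev_b = 0
    for a, b in sorted(pairs, key=lambda p: (p[1], p[0])):
        out += varint(b - prev_b)          # non-negative after the sort
        out += varint(zigzag(a - prev_a))  # A-delta may be negative on a new line
        prev_a, prev_b = a, b
    return bytes(out)
```

Small deltas fit in one or two bytes each, whereas the naive binary format spends 16 bytes per pair regardless.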

2007-10-16, 14:27  #4
"Sander"
Oct 2002
52.345322,5.52471
29·41 Posts 
GGNFS needs 115 bytes per relation.
At the moment I have about 25M relations for 5,447 which takes almost 3 GB. Not really a problem, but smaller files are a lot easier to manage.
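Taking the figures in this post at face value, the back-of-the-envelope comparison between the three formats discussed in the thread looks like this (the 4-byte figure is the proposal from post #1, assumed rather than measured):

```python
relations = 25_000_000

ascii_size  = relations * 115  # GGNFS text format, per this post
binary_size = relations * 16   # naive binary: 64-bit A and 64-bit B
packed_size = relations * 4    # proposed lattice-coordinate packing

for name, size in [("ASCII", ascii_size),
                   ("binary", binary_size),
                   ("packed", packed_size)]:
    print(f"{name:>6}: {size / 10**9:.2f} GB")
```

The ASCII figure works out to about 2.9 GB, consistent with the "almost 3 GB" above; the packed format would be around 100 MB.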
2007-10-16, 14:49  #5
Jun 2003
The Texas Hill Country
3^{2}·11^{2} Posts 
On the other hand, it is "expensive" to reconstruct the factors for each relation. Other than eliminating duplicates, you cannot do any real filtering without the factors.
I observe that with lattice sieving, we get about 1/3 duplicates, and about 1/3 of the unique relations are singletons. By keeping the largest primes, you can not only eliminate almost half of the (unneeded) relations, but also significantly reduce the effort required to refactor those that remain.
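The singleton filtering described here can be sketched as an iterative pruning pass over the stored large primes (a simplified model; real filters track prime ideals per side):

```python
from collections import Counter

def prune_singletons(relations):
    """Iteratively drop relations containing a large prime that occurs
    in only one relation; each relation is a frozenset of its large primes.
    Removing one singleton can create new ones, hence the loop."""
    rels = list(relations)
    while True:
        counts = Counter(p for r in rels for p in r)
        keep = [r for r in rels if all(counts[p] > 1 for p in r)]
        if len(keep) == len(rels):
            return keep
        rels = keep
```

This is exactly the filtering step that needs the large primes but not the small factors, which is why keeping only the largest primes is enough at this stage.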
2007-10-16, 15:50  #6
Tribal Bullet
Oct 2004
3^{2}·5·79 Posts 
As others have mentioned, skipping factors below a few million seems to be the best combination of saving space and still making filtering convenient. Adding the missing factors back takes much longer but that only needs to be done once, when the relations have made it to your RAID array :) 
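Adding the missing small factors back is straightforward trial division of the polynomial value, once the stored large factors have been divided out. A minimal sketch, assuming `poly` holds the coefficients c_i of F(a,b) = sum c_i * a^i * b^(d-i):

```python
def homog_eval(poly, a, b):
    """Evaluate the homogenised polynomial F(a, b) = sum c_i * a^i * b^(d-i)."""
    d = len(poly) - 1
    return sum(c * a**i * b**(d - i) for i, c in enumerate(poly))

def refactor(poly, a, b, small_primes, large_factors):
    """Recover the small factors omitted from a stored relation by trial
    division, given the large factors that were kept with it."""
    n = abs(homog_eval(poly, a, b))
    for p in large_factors:
        assert n % p == 0, "stored large factor does not divide F(a,b)"
        while n % p == 0:
            n //= p
    recovered = []
    for p in small_primes:
        while n % p == 0:
            recovered.append(p)
            n //= p
    assert n == 1, "leftover cofactor: small-prime bound too low or bad relation"
    return recovered
```

Done in bulk (batch factorisation over many relations), this is slow but, as noted above, only needs to happen once.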
