mersenneforum.org Compressing relations

2007-10-16, 12:52   #1
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2×7×461 Posts

Compressing relations

I think I can get lattice-siever output compressed to four bytes per relation without inordinate difficulty: you need only represent (A, B) pairs. Figure out by LLL what lattice the first six relations lie on, and compress them to coordinates with respect to that basis; by construction these will normally be in the [-16384, 16384] × [0, 16384] range. When the coordinates with respect to that lattice stop being integral, go and check for a different lattice.

A little extra generality does the same for line-siever output: sort by B then A, Huffman-compress the differences between As, and have a token for 'go back to start of previous line'.

Would anyone be interested in an actual implementation of this? The compressor gets me to implement LLL, and the decompressor is an interesting exercise in batch factorisation, but it's enough work that I'd quite like to know whether anyone would actually use it. I suppose it saves only bandwidth and disc space, neither of which is expensive nowadays.
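[The coordinate-change step described above can be sketched as follows. This is a minimal illustration assuming the 2-D basis (u, v) is already known; the LLL reduction that would find it from the first few relations is not shown, and all names are illustrative.]

```python
# Sketch: compress an (a, b) pair to small integer coordinates with
# respect to a known 2-D lattice basis (u, v), and decompress again.
# Hypothetical code; in practice the basis would come from LLL-reducing
# the lattice spanned by the first six relations.

def to_coords(a, b, u, v):
    """Solve (a, b) = x*u + y*v for integers x, y; None if not integral."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        return None
    # Cramer's rule: the pair lies on the lattice iff both divisions are exact.
    x_num = a * v[1] - b * v[0]
    y_num = u[0] * b - u[1] * a
    if x_num % det or y_num % det:
        return None  # coordinates not integral: time to look for a new lattice
    return (x_num // det, y_num // det)

def from_coords(x, y, u, v):
    """Decompress: recover the original (a, b) pair from its coordinates."""
    return (x * u[0] + y * v[0], x * u[1] + y * v[1])
```

[With a reduced basis the coordinates stay small, so each fits comfortably in the [-16384, 16384] range mentioned above, i.e. two bytes per coordinate and four bytes per relation.]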
2007-10-16, 13:08   #2
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

7506₁₀ Posts

Quote:
 Originally Posted by fivemack
I think I can get lattice-siever output compressed to four bytes per relation without inordinate difficulty: you need only represent (A, B) pairs. Figure out by LLL what lattice the first six relations lie on, and compress them to coordinates with respect to that basis; by construction these will normally be in the [-16384, 16384] × [0, 16384] range. When the coordinates with respect to that lattice stop being integral, go and check for a different lattice.

A little extra generality does the same for line-siever output: sort by B then A, Huffman-compress the differences between As, and have a token for 'go back to start of previous line'.

Would anyone be interested in an actual implementation of this? The compressor gets me to implement LLL, and the decompressor is an interesting exercise in batch factorisation, but it's enough work that I'd quite like to know whether anyone would actually use it. I suppose it saves only bandwidth and disc space, neither of which is expensive nowadays.

An interesting idea, but probably not worth doing. If you want to save
space, then output the data in binary, not ASCII.

2007-10-16, 13:21   #3
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·7·461 Posts

This would save another factor of 4 over the naive way of outputting the data in binary (64-bit A and B values), though a very basic Huffman-type encoding already gets you nearly a factor of two there.
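[The kind of saving described here can be illustrated with a simple delta-plus-varint scheme over the sorted A values. This is a hypothetical encoding for illustration, not any siever's actual output format, and it stands in for the Huffman-type coding mentioned above.]

```python
# Sketch: encode a sorted list of non-negative A values as deltas in
# LEB128-style varints (7 payload bits per byte, high bit = "more bytes").
# Small deltas between consecutive sorted values take one byte instead of eight.

def encode_deltas(sorted_vals):
    out = bytearray()
    prev = 0
    for v in sorted_vals:
        d = v - prev   # deltas are non-negative because the input is sorted
        prev = v
        while True:
            byte = d & 0x7F
            d >>= 7
            if d:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)         # last byte of this delta
                break
    return bytes(out)

def decode_deltas(data):
    vals, cur, shift, prev = [], 0, 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:      # end of one varint: rebuild the value
            prev += cur
            vals.append(prev)
            cur, shift = 0, 0
    return vals
```

[A Huffman code tuned to the actual delta distribution would do somewhat better still, which is roughly the "nearly a factor of two" figure above.]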
2007-10-16, 14:27   #4
smh
"Sander"

Oct 2002
52.345322,5.52471

29·41 Posts

GGNFS needs 115 bytes per relation. At the moment I have about 25M relations for 5,447- which take almost 3 GB. Not really a problem, but smaller files are a lot easier to manage.
2007-10-16, 14:49   #5
Wacky

Jun 2003
The Texas Hill Country

3²·11² Posts

On the other hand, it is "expensive" to reconstruct the factors for each relation. Other than eliminating duplicates, you cannot do any real filtering without the factors.

I observe that with lattice sieving, we get about 1/3 duplicates, and about 1/3 of the unique relations are singletons. By keeping the largest primes, you are not only able to eliminate almost half of the (unneeded) relations, but you significantly reduce the effort required to re-factor those that remain.
2007-10-16, 15:50   #6
jasonp
Tribal Bullet

Oct 2004

3²·5·79 Posts

Quote:
 Originally Posted by fivemack Would anyone be interested in an actual implementation of this? The compressor gets me to implement LLL, the decompressor is an interesting exercise in batch factorisation, but it's enough work that I'd quite like to know if anyone would actually use it. I suppose it saves only bandwidth and disc space, neither of which are expensive nowadays.
You can drastically reduce the disk space taken by GGNFS relations already: modify the siever to not output duplicate factors (procrels and msieve both discover multiplicities anyway) and to skip factors below 256 or so. I would think just doing that can cut the size of the files by 30-50%.

As others have mentioned, skipping factors below a few million seems to be the best combination of saving space and still making filtering convenient. Adding the missing factors back takes much longer but that only needs to be done once, when the relations have made it to your RAID array :)
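[The trimming suggested in the two paragraphs above can be sketched like this. It is illustrative only, not GGNFS's or msieve's actual relation handling; the cutoff of 256 is taken from the post, and the helper name is made up.]

```python
# Sketch: trim a relation's factor list before writing it out: drop
# repeated primes (the filtering stage rediscovers multiplicities anyway)
# and drop primes below a cutoff (cheap to recover later by trial division).

SMALL_PRIME_CUTOFF = 256  # "skip factors below 256 or so", per the post

def trim_factors(factors, cutoff=SMALL_PRIME_CUTOFF):
    seen = set()
    kept = []
    for p in factors:
        if p < cutoff or p in seen:
            continue  # small or duplicate prime: recoverable when needed
        seen.add(p)
        kept.append(p)
    return kept
```

[Raising the cutoff to a few million, as suggested above, saves more space at the cost of a longer (but one-time) re-factoring pass.]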

