#1
May 2011
10111₂ Posts
Hi,
I have a question to ask. In the filtering step, is the first maximum bound 2^32, since the bucket range is [0, 2^32] and uint32 is used for filtmin_a and filtmin_r? If so, isn't that bound a little low if larger numbers are used? Or did I misunderstand the code? Thanks!
#2
(loop (#_fork))
Feb 2006
Cambridge, England
3³·239 Posts
The filtmin parameter serves to divide ideals into 'small' and 'large'; these roughly correspond to the small- and large-prime bounds in sieving, though it's not an exact match. Even for RSA768 the sieving was done with 'small'=2^30. So this particular limit is not a serious restriction at present.
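For illustration only, a minimal sketch of what such a bound amounts to; the struct and helper below are invented, and only the uint32 type of filtmin_r / filtmin_a comes from the question in post #1:

```c
#include <stdint.h>

/* Hypothetical illustration, not Msieve's actual declarations: because the
 * bounds are 32-bit, the largest possible 'small ideal' cutoff is 2^32 - 1.
 * Even RSA768-scale filtering used a cutoff near 2^30, well inside that. */
typedef struct {
    uint32_t filtmin_r;   /* rational-side bound  */
    uint32_t filtmin_a;   /* algebraic-side bound */
} filter_bounds_t;

/* ideals above the bound count as 'large' and are tracked explicitly */
static inline int is_large_ideal(uint32_t prime, uint32_t filtmin)
{
    return prime > filtmin;
}
```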
#3
Tribal Bullet
Oct 2004
3545₁₀ Posts
This is one of several limitations in the filtering that prevent it from being realistically used for extremely large problems. A more pressing limitation is that relation numbers max out at 2^32, so a factorization must have fewer than 4 billion relations. We're already 25% of the way to that limit for the largest Msieve jobs, but getting further will require an update to the sieving tools that people use.
#4
May 2011
23 Posts
I see. Other than the filtering bound and the limit on the number of relations, are there other limitations that prevent the filtering step from being realistically used for extremely large problems?
#7
Tribal Bullet
Oct 2004
5×709 Posts
Besides the limit on the number of relations, there are only a few things stopping, say, filtering on RSA768-size numbers, i.e. many billions of relations:
- clique removal needs to be possible from disk files
- duplicate and singleton removal needs to be MPI-enabled, otherwise the size of internal hashtables will exceed the memory of the machine
- in-memory clique removal has an internal 16GB limit on the size of an intermediate structure

With all of these done and a large-memory machine for the merge phase, we'd be good to go. There's a parallel problem that the sieving tools would need to be comfortable with large primes > 2^33.
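To make the first two bullets concrete, here is a hedged sketch of one generic way to keep such passes out of core; it is not Msieve's code and every name in it is invented. Relations are scattered into temporary bucket files by a hash of their (a,b) coordinates, so that each bucket can later be processed with an in-memory table sized to that bucket alone.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BUCKETS 256   /* number of temporary bucket files (illustrative) */

/* 64-bit mix of the (a,b) pair that identifies a relation */
static uint64_t relation_hash(int64_t a, uint64_t b)
{
    uint64_t h = (uint64_t)a * 0x9e3779b97f4a7c15ULL;
    h ^= b + (h << 6) + (h >> 2);
    return h;
}

/* pass 1: scatter each relation to one of NUM_BUCKETS files by hash;
 * pass 2 (not shown) would load one bucket at a time and deduplicate or
 * count ideals in memory, so no single pass needs the whole dataset */
static void route_relation(FILE **bucket_files, int64_t a, uint64_t b)
{
    FILE *f = bucket_files[relation_hash(a, b) % NUM_BUCKETS];
    fwrite(&a, sizeof a, 1, f);
    fwrite(&b, sizeof b, 1, f);
}
```

The same scatter/gather shape works whether the buckets are files on one machine or ranks in an MPI job, which is essentially the trade-off discussed in the next posts.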
#8
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
3³×419 Posts
Back in the days of NFSNET we (Don Leclair and I, IIRC) wrote duplicate and singleton removal software for a single-threaded, single-process environment. It wasn't perhaps as fast as it would have been had more memory been available, but it worked well. Perhaps I should dig up the remdup.c software and then take this discussion off-line.

Paul

Last fiddled with by xilman on 2011-07-12 at 14:34 Reason: Fix /tag]
#9
Tribal Bullet
Oct 2004
5×709 Posts
MPI isn't strictly needed, but when there's a terabyte of relations you have to account somehow for intermediate lists of things (primes, ideals, relation numbers or coordinates, etc.) exceeding the memory of the filtering machine. That becomes much more manageable given a cluster of average-size machines all pulling information out of the same dataset and boiling it down independently, with a global pass at the end to reconcile all the results.

For singleton removal, you at least have to have a hashtable that maps primes or ideals to a unique number, so that relations can be assigned an ideal list. Right now the size of that hashtable is the limiting factor in the memory use of the filtering. (Yes, I know there are loads of space-efficient hashtable implementations out there.)
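For what it's worth, here is a minimal sketch of that kind of table, assuming a simple open-addressing layout and invented names; the real code also assigns each ideal a unique index, which is omitted here.

```c
#include <stdint.h>
#include <stdlib.h>

/* one slot per distinct ideal; the table must be allocated with more slots
 * than there are distinct ideals, or the probing loop will not terminate */
typedef struct {
    uint64_t key;      /* packed (prime, side) identifying the ideal */
    uint32_t count;    /* how many relations contain it              */
} ideal_entry_t;

typedef struct {
    ideal_entry_t *slots;
    size_t num_slots;  /* power of two */
} ideal_table_t;

static int ideal_table_init(ideal_table_t *t, size_t num_slots)
{
    t->num_slots = num_slots;
    t->slots = calloc(num_slots, sizeof(ideal_entry_t));
    return t->slots != NULL;
}

static size_t find_slot(const ideal_table_t *t, uint64_t key)
{
    size_t mask = t->num_slots - 1;
    size_t i = (key * 0x9e3779b97f4a7c15ULL) & mask;   /* cheap hash */
    while (t->slots[i].count != 0 && t->slots[i].key != key)
        i = (i + 1) & mask;                            /* linear probing */
    return i;
}

/* pass 1: count every ideal appearing in every relation */
static void count_ideal(ideal_table_t *t, uint64_t ideal)
{
    size_t i = find_slot(t, ideal);
    t->slots[i].key = ideal;
    t->slots[i].count++;
}

/* pass 2: a relation is a singleton if any of its ideals occurs exactly once */
static int relation_is_singleton(const ideal_table_t *t,
                                 const uint64_t *ideals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (t->slots[find_slot(t, ideals[i])].count == 1)
            return 1;
    return 0;
}
```

Each slot costs on the order of 12-16 bytes, so with billions of distinct ideals the table alone reaches tens of gigabytes, which matches the memory concern above. Singleton removal is also iterated in practice, since removing a relation can create new singletons.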
#10
May 2011
27₈ Posts
I see, okay! So upgrading the filtering step should not be that complex, but because of the current sieving limitations there is not much incentive to cater for factorizations larger than 768 bits yet, right?

Sorry, one more question: I am stuck on understanding how the first bound is chosen. What is the role of bin_max, bin_min and hits_per_prime in the duplicate.c source code? Anyway, thanks for all the help you guys provided! Really appreciate it! =)
#11
Tribal Bullet
Oct 2004
5·709 Posts
We have a histogram of the number of times each prime occurs, which was computed essentially for free when the first duplicate removal pass ran. Every prime < 2^32 encountered in every relation (including the duplicates :)) increments the count in one 128k-size bucket. The average number of times a prime appears in bucket i is then approximately (count of hits in bucket i) / (number of primes in bucket i), and the desired filtering bound is the middle of the first bucket (working from 2^32 backwards) where this average exceeds 40.0.

We don't need to count the exact number of primes in bucket i; instead we compute (number of primes less than (start of bucket i+1)) - (number of primes less than (start of bucket i)), using the approximation that the number of primes less than x is about x / log(x). These are what bin_min and bin_max are used for, and hits_per_prime is the computed average.

If you haven't seen it already, read this for more detail on Msieve's NFS filtering. Most of it is still accurate today.

Last fiddled with by jasonp on 2011-07-13 at 11:29
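A hedged sketch of that calculation, shaped after the description above; the bucket width and threshold come from the post, but the code itself is an illustration rather than a copy of duplicate.c:

```c
#include <stdint.h>
#include <math.h>

#define BIN_SIZE  (128 * 1024)                        /* 128k-wide buckets        */
#define NUM_BINS  (((uint64_t)1 << 32) / BIN_SIZE)    /* buckets covering [0,2^32) */
#define TARGET_HITS_PER_PRIME 40.0                    /* threshold from the post  */

/* pi(x) approximated as x / log(x) */
static double approx_num_primes_below(double x)
{
    return (x < 3.0) ? 0.0 : x / log(x);
}

/* hits[i] = number of times any prime in bucket i appeared in the relation
 * files (duplicates included), gathered during the duplicate-removal pass */
static uint32_t choose_filter_bound(const uint64_t *hits)
{
    for (int64_t i = (int64_t)NUM_BINS - 1; i >= 0; i--) {
        double bin_min = (double)i * BIN_SIZE;        /* start of bucket i   */
        double bin_max = bin_min + BIN_SIZE;          /* start of bucket i+1 */
        double num_primes = approx_num_primes_below(bin_max) -
                            approx_num_primes_below(bin_min);
        double hits_per_prime = (double)hits[i] / num_primes;

        /* first bucket, scanning down from 2^32, whose average exceeds
         * the target: the bound is the middle of that bucket */
        if (hits_per_prime > TARGET_HITS_PER_PRIME)
            return (uint32_t)(bin_min + BIN_SIZE / 2);
    }
    return 0;   /* no bucket was dense enough */
}
```

With 2^32 / 128k = 32768 buckets of (say) 8-byte counters, the histogram itself needs only 256 KB, which is why it comes essentially for free during the duplicate pass.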