mersenneforum.org > Math > Trial Factoring Sieve?

 2009-04-06, 16:08 #1 lfm     Jul 2006 Calgary 1A9₁₆ Posts Trial Factoring Sieve? Can someone elaborate somewhat on the trial factoring sieve used to check possible factors for primality? It is mentioned in the GIMPS "The Math" page, but not in enough detail for me to implement it for myself. I understand a regular prime sieve, but I can't figure out how to apply the idea to this problem. Can you give a specific example of a particular sieve size, and exactly what range of primes can be tested for, and how?
 2009-04-06, 16:15 #2 philmoore     "Phil" Sep 2002 Tracktown, U.S.A. 2⁵·5·7 Posts GIMPS does not actually use a sieve to check for possible factors of Mersenne numbers, for the simple reason that any prime number can divide at most one Mersenne number with a prime exponent. Instead, GIMPS does trial factoring of 2^p - 1 by all possible prime factors of the form 2kp+1, where the factors are further restricted to numbers which give a remainder of 1 or 7 when divided by 8. Restricting to just the prime possible factors is done by sieving, however, with the understanding that testing an occasional composite may be cheaper than sieving all the way until only primes remain.
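[The candidate form philmoore describes can be sketched as follows. This is an illustrative sketch, not GIMPS code; the helper names are made up, and `pow(2, p, q) == 1` is the standard test for whether q divides 2^p - 1.]

```python
# Sketch of trial factoring 2^p - 1: candidates have the form q = 2kp + 1
# and must be congruent to 1 or 7 mod 8.  (Helper names are illustrative.)
def candidate_factors(p, k_max):
    """Yield candidates q = 2kp + 1 with q = 1 or 7 (mod 8)."""
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 in (1, 7):
            yield q

def find_factor(p, k_max):
    """Return the first candidate dividing 2^p - 1, or None."""
    for q in candidate_factors(p, k_max):
        if pow(2, p, q) == 1:      # q divides 2^p - 1
            return q
    return None

# 2^11 - 1 = 2047 = 23 * 89, and 23 = 2*1*11 + 1 (k = 1)
print(find_factor(11, 10))        # → 23
```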
 2009-04-06, 16:36 #3 lfm     Jul 2006 Calgary 5²·17 Posts

Quote:
 Originally Posted by philmoore Restricting to just the prime possible factors is done by sieving, however, with the understanding that testing an occasional composite may be cheaper than sieving all the way until only primes remain.
Yes, this is what I am asking: how is this done? It eliminates many possible factors before the power/modulus test needs to be done. I am just missing some detail of how this test is implemented, so I can implement it myself.

 2009-04-06, 20:07 #4 wblipp     "William" May 2003 Near Grandkid 2374₁₀ Posts I don't know the actual implementation, but the straightforward, obvious implementation would be two stages of sieving. The first stage would be a Sieve of Eratosthenes to get small primes. The second stage would apply these small primes to an array where the k'th bit represents "2kp+1". One of the first 3 values of "2kp+1" is divisible by 3. Mark that and every 3rd one after it as composite. Likewise for 5, 7, etc. At the end of the second stage, the remaining "k's" represent values of 2kp+1 that might be prime (they might also be composite with smallest divisor bigger than you tested). Trial divide with these by calculating 2^p mod (2kp+1).
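[The two-stage scheme wblipp describes might look like this in outline. A sketch under stated assumptions: the small-prime bound and helper names are illustrative, and the first k divisible by each small prime s is found by solving 2kp + 1 ≡ 0 (mod s) rather than by scanning.]

```python
def small_primes(limit):
    """Stage 1: ordinary Sieve of Eratosthenes up to `limit`."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b'\x00\x00'
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(sieve[i*i::i]))
    return [i for i in range(limit) if sieve[i]]

def sieve_candidates(p, k_max, prime_limit=1000):
    """Stage 2: slot k represents q = 2kp + 1 (k = 1..k_max).
    For each small prime s, find the first k with s | 2kp+1, then
    strike every s-th slot after it, as in wblipp's description."""
    alive = bytearray([1]) * (k_max + 1)    # slot 0 unused
    for s in small_primes(prime_limit):
        two_p = (2 * p) % s
        if two_p == 0:
            continue                        # s divides 2p: no q is hit
        k0 = (-pow(two_p, -1, s)) % s       # smallest k with s | 2kp+1
        if k0 == 0:
            k0 = s
        for k in range(k0, k_max + 1, s):
            if 2 * k * p + 1 != s:          # don't strike the prime s itself
                alive[k] = 0
    return [k for k in range(1, k_max + 1) if alive[k]]
```

The surviving k's are exactly those for which 2kp+1 has no prime factor below the sieve limit, matching "might be prime" above.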
 2009-04-06, 20:54 #5 ewmayer 2ω=0     Sep 2002 República de California 11755₁₀ Posts

Quote:
 Originally Posted by wblipp I don't know the actual implementation, but the straightforward, obvious implementation would be two stages of sieving. [...] Trial divide with these by calculating 2^p mod (2kp+1).
Exactly - one needs to tune the small-prime limit of the sieve to try to catch the optimum, beyond which the work needed to process more small primes exceeds the cost of the added "trial divides" it saves. A typical small-prime limit here is on the order of a million. This is of course hardware- and implementation-dependent, but the runtime around the local optimum is pretty flat, so the precise small-prime limit isn't hugely important, as long as one is in the right ballpark.

And of course priorities change, depending on where the code spends most of its time ... this past weekend I finished the first working version of an SSE2-based TF modular powering routine for candidate factors up to 78 bits - it runs 2-3x faster on 32-bit x86 than the pure-integer analog does, so now the sieving work (which was between 20-50% of the total runtime) has suddenly become an important further-optimization candidate again.

Thoughts on further speeding the sieve: The main problem with the sieve bit-clearing code at present is that I process one small prime at a time, clear all the bits corresponding to qs divisible by that prime, then proceed to the next small prime. For a sieve consisting of all primes less than (say) 1 million that means around 70000 sweeps through the sieve bit-array. Even if that is designed to fit in L1 cache (it is) that's a lot of memory traffic, and there is (at present) no a priori way to prevent a bit from getting "cleared" multiple times. I've been contemplating how to address these issues, and am thinking that perhaps some kind of heap-based approach might work: As one computes the bits hit by each multiple of the current sieving prime, push an integer corresponding to the bit location into a heap, which is designed to only accept "new" values, i.e. to ignore values already in the heap. At the end of small-primes processing, just pop one value at a time off the heap, and clear (just once now) the corresponding bit of the sieve.

But it seems that is just robbing Peter to pay Paul, as one has simply exchanged redundant bit-clears with redundant-heap-element-push-ignores.

Last fiddled with by ewmayer on 2009-04-06 at 20:55

 2009-04-06, 22:13 #6 cheesehead     "Richard B. Woods" Aug 2002 Wisconsin USA 2²·3·641 Posts IIRC, the current GIMPS software's TF potential-factor sieving consists of testing only the non-guaranteed-composite residue classes mod 210 (in addition to the modulo 8 requirements). I.e., 1 mod 210, 11 mod 210, 13 mod 210, ..., 199 mod 210, 209 mod 210. Last fiddled with by cheesehead on 2009-04-06 at 22:18
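[The class list cheesehead gives can be reproduced by keeping only residues coprime to 210. A sketch of the idea, not the actual GIMPS table:]

```python
from math import gcd

# Any q > 7 lying in a residue class sharing a factor with 210 = 2*3*5*7
# is composite, so only the classes coprime to 210 can contain prime
# factors - these are the "non-guaranteed-composite" classes.
classes = [r for r in range(210) if gcd(r, 210) == 1]
print(len(classes))     # φ(210) = 48 classes survive out of 210
print(classes[:3])      # [1, 11, 13]
print(classes[-2:])     # [199, 209]
```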
 2009-04-06, 22:19 #7 xilman Bamboozled!     "𒉺𒌌𒇷𒆷𒀭" May 2003 Down not across 2×73×17 Posts


Quote:
 Originally Posted by ewmayer [...] Thoughts on further speeding the sieve: The main problem with the sieve bit-clearing code at present is that I process one small prime at a time, clear all the bits corresponding to q's divisible by that prime, then proceed to the next small prime. [...] At the end of small-primes processing, just pop one value at a time off the heap, and clear (just once now) the corresponding bit of the sieve.
But it seems that is just robbing Peter to pay Paul, as one has simply exchanged redundant bit-clears with redundant-heap-element-push-ignores.
Does this idea help?

Observe that the bit pattern for a given prime p repeats with periodicity p (obviously!) and that on an n-bit machine, the word pattern repeats every p words. So, sieve p words with prime p, then logical-AND that word pattern into the remainder of the sieve. The idea is to replace bit addressing, which most machines are bad at, with (possibly vectorizable) word operations.

Note that exactly the same approach applies when p is the product of small primes, which may be useful for the smallest primes - the ones that take most of the time anyway. As long as p is within a reasonable range, p words is a relatively small amount of storage.
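[The word-pattern idea above might be sketched like this. Assumptions: 64-bit words, Python big-ints standing in for machine words, and function names invented for illustration.]

```python
WORD = 64

def word_mask(p, offset=0):
    """Build p words in which every p-th bit, starting at `offset`
    (0 <= offset < p), is cleared; because p*WORD bits span exactly
    p words, this pattern repeats every p words."""
    words = [(1 << WORD) - 1] * p
    for bit in range(offset, p * WORD, p):
        words[bit // WORD] &= ~(1 << (bit % WORD))
    return words

def apply_mask(sieve_words, p, offset=0):
    """AND the repeating p-word pattern into the whole sieve,
    replacing per-bit addressing with word-at-a-time operations."""
    pat = word_mask(p, offset)
    for i in range(len(sieve_words)):
        sieve_words[i] &= pat[i % p]
```

Since i·WORD ≡ (i mod p)·WORD (mod p), word i of the sieve can safely reuse pattern word i mod p, which is what makes the batch AND valid.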

Who is this Peter guy anyway?

Paul

 2009-04-06, 22:47 #8 ewmayer 2ω=0     Sep 2002 República de California 5×2,351 Posts


Quote:
 Originally Posted by xilman Observe that the bit pattern for a given prime p repeats with periodicity p (obviously!) and that on an n-bit machine, the word pattern repeats every p words. [...] The idea is to replace bit addressing, which most machines are bad at, with (possibly vectorizable) word operations. [...]
The problem is this: the repeating-pattern stuff is easy to take advantage of for the small primes, but the larger primes typically will hit just a handful of bits in the sieve, if they hit any at all on the given pass through the sieve. (We have a separate "bit offset" array which tells us if a given prime will hit any bits in the current pass through the sieve - if (current_offset[p] > num_sieve_bits) we simply decrement current_offset[p] by num_sieve_bits and continue on to the next small prime). Because the primes thin out logarithmically slowly we need lots of the larger ones to clear out an appreciable fraction of the bits, but the larger primes don't lend themselves to batch-mode sieve clearing.
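[The offset bookkeeping described above might look like this in outline. An illustrative sketch: the window layout (one bit per candidate, byte-addressed) and names are assumptions, loosely following the current_offset description.]

```python
def clear_pass(sieve, num_bits, primes, offset):
    """One pass over a sieve window of num_bits bits (one bit per
    candidate).  offset[p] holds the distance in bits from the start
    of this window to the next candidate hit by p; a large prime whose
    next hit lies beyond the window just carries its offset forward."""
    for p in primes:
        pos = offset[p]
        if pos >= num_bits:
            offset[p] = pos - num_bits       # no hit in this window
            continue
        while pos < num_bits:
            sieve[pos >> 3] &= ~(1 << (pos & 7))   # clear bit `pos`
            pos += p
        offset[p] = pos - num_bits           # carry into next window
```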

I think it really boils down to making the rote bit-clearing as cycle-efficient as absolutely possible ... this coming weekend I will look at using SSE2 not just (in the floating-point sense) for the modular powering but (in 128-bit integer ALU mode) for the bit-clearing.

Quote:
 Who is this Peter guy anyway?
Paul's unwilling beneficiary - but perhaps the UK Ministry of Finance will allow you to claim a tax deduction for your "donation".

Last fiddled with by ewmayer on 2009-04-06 at 22:48

 2009-04-06, 23:17 #9 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 1FE5₁₆ Posts


Quote:
 Originally Posted by cheesehead IIRC, the current GIMPS software's TF potential-factor sieving consists of testing only the non-guaranteed-composite residue classes mod 210.
Try mod 120 = 3 * 5 * 8. It eliminates sieving for candidates divisible by 3 or by 5, and for candidates congruent to 3 or 5 mod 8.
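[The mod 120 scheme can be checked directly; a sketch, combining the ±1 mod 8 restriction with coprimality to 15:]

```python
from math import gcd

# Candidate factors q = 2kp+1 must satisfy q = 1 or 7 (mod 8); folding in
# divisibility by 3 and 5 gives the admissible classes mod 120 = 3*5*8.
# By CRT: 2 choices mod 8, 2 mod 3, 4 mod 5 -> 2*2*4 = 16 classes.
classes = [r for r in range(120)
           if r % 8 in (1, 7) and gcd(r, 15) == 1]
print(len(classes))   # 16 admissible classes out of 120
```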

 2009-04-06, 23:24 #10 wblipp     "William" May 2003 Near Grandkid 946₁₆ Posts


Quote:
 Originally Posted by ewmayer but the larger primes typically will hit just a handful of bits in the sieve
 Most of the "double markings" are going to be numbers that were previously marked with tiny primes. I was going to suggest a 3-spoke wheel per prime, so that when marking multiples of q you could avoid marking multiples of 3q, 5q, and 7q. But the handful of hits suggests that at most a 1- or 2-spoke wheel is useful. Note that "residues mod 210" is an alternative implementation of the same idea - avoiding duplicate marking for multiples of 3, 5, and 7.

Last fiddled with by wblipp on 2009-04-06 at 23:24

 2009-04-06, 23:57 #11 ewmayer 2ω=0     Sep 2002 República de California 5×2,351 Posts


Quote:
 Originally Posted by wblipp Most of the "double markings" are going to be numbers that were previously marked with tiny primes.
True, but merely examining a given bit for is-equal-to-zero needs about as much time as clearing it, so I don't see an obvious speedup to be had here.

George, what is a typical small-prime limit used by your code, and roughly what fraction of the TF time does it spend in sieve-related operations?
