mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Math

Reply
 
Thread Tools
Old 2009-04-06, 16:08   #1
lfm
 
lfm's Avatar
 
Jul 2006
Calgary

1A916 Posts
Default Trial Factoring Sieve?

Can someone elaborate somewhat on the trial factoring sieve used to check possible factors for primeness? It is mentioned in the GIMPS "The Math" page but not with enough details for me to implement it for myself.

I understand a regular prime sieve but I can't figure out how to apply the idea to this problem. Can you give a specific example of a particular size sieve and exactly what range of primes can be tested for and how?
lfm is offline   Reply With Quote
Old 2009-04-06, 16:15   #2
philmoore
 
philmoore's Avatar
 
"Phil"
Sep 2002
Tracktown, U.S.A.

25·5·7 Posts
Default

GIMPS does not actually use a sieve to check for possible factors of Mersenne numbers, for the simple reason that any prime number can divide at most one Mersenne number with a prime exponent. Instead, GIMPS does trial factoring of 2^p - 1 by all possible prime factors of the form 2kp+1, where the factors are further restricted to numbers which give a remainder of 1 or 7 when divided by 8. Restricting to just the prime possible factors is done by sieving, however, with the understanding that testing an occasional composite may be cheaper than sieving all the way until only primes remain.
philmoore is offline   Reply With Quote
Old 2009-04-06, 16:36   #3
lfm
 
lfm's Avatar
 
Jul 2006
Calgary

52·17 Posts
Default

Quote:
Originally Posted by philmoore View Post
Restricting to just the prime possible factors is done by sieving, however, with the understanding that testing an occasional composite may be cheaper than sieving all the way until only primes remain.
Yes, this is what I am asking, how is this done? It eliminates many possible factors before the power/modulus test needs to be done. I am just missing some detail for how this test is implemented, so I can implement it myself.
lfm is offline   Reply With Quote
Old 2009-04-06, 20:07   #4
wblipp
 
wblipp's Avatar
 
"William"
May 2003
Near Grandkid

237410 Posts
Default

I don't know the actual implementation, but the straight forward obvious implementation would be two stages of sieving.

The first stage would be a Sieve of Eratosthenes to get small primes.

The second stage would apply these small primes to an array where the k'th bit represents "2kp+1". One of the first 3 values of "2kp+1" is divisible by 3. Mark that and every 3rd one after it as composite. Likewise for 5, 7, etc.

At the end of the second stage, the remaining "k's" represent values of 2kp+1 that might be prime (they might also be composite with smallest divisor bigger than you tested). Trial divide with these by calculating 2^n mod 2kp+1.
wblipp is offline   Reply With Quote
Old 2009-04-06, 20:54   #5
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Repรบblica de California

1175510 Posts
Default

Quote:
Originally Posted by wblipp View Post
I don't know the actual implementation, but the straight forward obvious implementation would be two stages of sieving.

The first stage would be a Sieve of Eratosthenes to get small primes.

The second stage would apply these small primes to an array where the k'th bit represents "2kp+1". One of the first 3 values of "2kp+1" is divisible by 3. Mark that and every 3rd one after it as composite. Likewise for 5, 7, etc.

At the end of the second stage, the remaining "k's" represent values of 2kp+1 that might be prime (they might also be composite with smallest divisor bigger than you tested). Trial divide with these by calculating 2^n mod 2kp+1.
Exactly - one needs to tune the small-prime limit of the sieve to try the catch the optimum, beyond which the work needed to process more small primes is more than the added "trial divides" it saves. A typical small-prime limit here is on the order of a million. This is of course hardware and implementation-dependent, but the runtime around the local optimum is pretty flat so the precise small-prime limit isn't hugely important, as long as one is in the right ballpark.

And of course priorities change, depending on where the code spends most of its time ... this past weekend I finished the first working version an SSE2-based TF modular powering routine for candidate factors up to 78 bits - that runs 2-3x faster on 32-bit x86 than the pure-integer analog does, so now the sieving work (which was between 20-50% of the total runtime) has suddenly become an important further-optimization candidate again.

Thoughts on further speeding the sieve: The main problem with the sieve bit-clearing code at present is that I process one small prime at a time, clear all the bits corresponding to q`s divisible by that prime, then proceed to the next small prime. For a sieve consisting of all primes less than (say) 1 million that means around 70000 sweeps through the sieve bit-array. Even if that is designed to fit in L1 cache (it is) that's a lot of memory traffic, and there is (at present) no a priori way to prevent a bit from getting "cleared" multiple times. I've been contemplating how to address these issues, and am thinking that perhaps some kind of heap-based approach might work: As one computes the bits hit by each multiple of the current sieving prime, push an integer corresponding to the bit location into a heap, which is designed to only accept "new" values, i.e. to ignore values already in the heap. At the end of small-primes processing, just pop one value at a time off the heap, and clear (just once now) the corresponding bit of the sieve.

But it seems that is just robbing Peter to pay Paul, as one has simply exchanged redundant bit-clears with redundant-heap-element-push-ignores.

Last fiddled with by ewmayer on 2009-04-06 at 20:55
ewmayer is offline   Reply With Quote
Old 2009-04-06, 22:13   #6
cheesehead
 
cheesehead's Avatar
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

22·3·641 Posts
Default

IIRC, the current GIMPS software's TF potential-factor sieving consists of testing only the non-guaranteed-composite residue classes mod 210 (in addition to the modulo 8 requirements). I.e., 1 mod 210, 11 mod 210, 13 mod 210, ..., 199 mod 210, 209 mod 210.

Last fiddled with by cheesehead on 2009-04-06 at 22:18
cheesehead is offline   Reply With Quote
Old 2009-04-06, 22:19   #7
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

2×73×17 Posts
Default

Quote:
Originally Posted by ewmayer View Post
Exactly - one needs to tune the small-prime limit of the sieve to try the catch the optimum, beyond which the work needed to process more small primes is more than the added "trial divides" it saves. A typical small-prime limit here is on the order of a million. This is of course hardware and implementation-dependent, but the runtime around the local optimum is pretty flat so the precise small-prime limit isn't hugely important, as long as one is in the right ballpark.

And of course priorities change, depending on where the code spends most of its time ... this past weekend I finished the first working version an SSE2-based TF modular powering routine for candidate factors up to 78 bits - that runs 2-3x faster on 32-bit x86 than the pure-integer analog does, so now the sieving work (which was between 20-50% of the total runtime) has suddenly become an important further-optimization candidate again.

Thoughts on further speeding the sieve: The main problem with the sieve bit-clearing code at present is that I process one small prime at a time, clear all the bits corresponding to q`s divisible by that prime, then proceed to the next small prime. For a sieve consisting of all primes less than (say) 1 million that means around 70000 sweeps through the sieve bit-array. Even if that is designed to fit in L1 cache (it is) that's a lot of memory traffic, and there is (at present) no a priori way to prevent a bit from getting "cleared" multiple times. I've been contemplating how to address these issues, and am thinking that perhaps some kind of heap-based approach might work: As one computes the bits hit by each multiple of the current sieving prime, push an integer corresponding to the bit location into a heap, which is designed to only accept "new" values, i.e. to ignore values already in the heap. At the end of small-primes processing, just pop one value at a time off the heap, and clear (just once now) the corresponding bit of the sieve.

But it seems that is just robbing Peter to pay Paul, as one has simply exchanged redundant bit-clears with redundant-heap-element-push-ignores.
Does this idea help?

Observe that the bit pattern for a given prime p repeats with periodicity p (obviously!) and that on a n-bit machine, the word pattern repeats every p words. So, sieve p words with prime p then logical-and that word pattern into the remainder of the sieve. The idea is to replace bit addressing, which most machines are bad at, with (possibly vectorizable) word operations.

Note that exactly the same approach applies when p is the product of small primes, which may be useful for the smallest primes, which are the ones that take most of the time anyway. As long as p is within a reasonable range, p words is relatively small amount of storage.


Who is this Peter guy anyway?


Paul
xilman is offline   Reply With Quote
Old 2009-04-06, 22:47   #8
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Repรบblica de California

5×2,351 Posts
Default

Quote:
Originally Posted by xilman View Post
Does this idea help?

Observe that the bit pattern for a given prime p repeats with periodicity p (obviously!) and that on a n-bit machine, the word pattern repeats every p words. So, sieve p words with prime p then logical-and that word pattern into the remainder of the sieve. The idea is to replace bit addressing, which most machines are bad at, with (possibly vectorizable) word operations.

Note that exactly the same approach applies when p is the product of small primes, which may be useful for the smallest primes, which are the ones that take most of the time anyway. As long as p is within a reasonable range, p words is relatively small amount of storage.
The problem is this: the repeating-pattern stuff is easy to take advantage of for the small primes, but the larger primes typically will hit just a handful of bits in the sieve, if they hit any at all on the given pass through the sieve. (We have a separate "bit offset" array which tells us if a given prime will hit any bits in the current pass through the sieve - if (current_offset[p] > num_sieve_bits) we simply decrement current_offset[p] by num_sieve_bits and continue on to the next small prime). Because the primes thin out logarithmically slowly we need lots of the larger ones to clear out an appreciable fraction of the bits, but the larger primes don't lend themselves to batch-mode sieve clearing.

I think it really boils down to making the rote bit-clearing as cycle-efficient as absolutely possible ... this coming weekend I will look at using SSE2 not just (in the floating-point sense) for the modular powering but (in 128-bit integer ALU mode) for the bit-clearing.

Quote:
Who is this Peter guy anyway?
Paul's unwilling beneficiary - but perhaps the UK Ministry of Finance will allow you to claim a tax deduction for your "donation".

Last fiddled with by ewmayer on 2009-04-06 at 22:48
ewmayer is offline   Reply With Quote
Old 2009-04-06, 23:17   #9
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

1FE516 Posts
Default

Quote:
Originally Posted by cheesehead View Post
IIRC, the current GIMPS software's TF potential-factor sieving consists of testing only the non-guaranteed-composite residue classes mod 210.
Try mod 120 = 3 * 5 * 8. It eliminates sieving for candidates divisible by 3, 5, and 3-or-5 mod 8.
Prime95 is online now   Reply With Quote
Old 2009-04-06, 23:24   #10
wblipp
 
wblipp's Avatar
 
"William"
May 2003
Near Grandkid

94616 Posts
Default

Quote:
Originally Posted by ewmayer View Post
but the larger primes typically will hit just a handful of bits in the sieve
Most of the "double markings" are going to be numbers that were previously marked with tiny primes. I was going to suggest a 3 spoke wheel-per-prime so that when marking multiples of q you could avoid marking multiples of 3q, 5q, and 7q. But the handful of hits suggests that at most a 1 or 2 spoke wheel is useful. Note that "residues mod 210" is an alternative implementation of the same idea - avoiding duplicate marking for multiples of 3, 5, and 7.

Last fiddled with by wblipp on 2009-04-06 at 23:24
wblipp is offline   Reply With Quote
Old 2009-04-06, 23:57   #11
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Repรบblica de California

5×2,351 Posts
Default

Quote:
Originally Posted by wblipp View Post
Most of the "double markings" are going to be numbers that were previously marked with tiny primes.
True, but merely examining a given bit for is-equal-to-zero needs about as much time as clearing it, so I don't see an obvious speedup to be had here.

George, what is a typical small-prime limit used by your code, and roughly what fraction of the TF time does it spend in sieve-related operations?
ewmayer is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Constructing a sieve for trial factors davieddy Math 48 2009-07-07 19:42
How far to do trial factoring S485122 PrimeNet 1 2007-09-06 00:52
trial factoring and P-1 jocelynl Math 8 2006-02-01 14:12
How to only do Trial Factoring? michael Software 23 2004-01-06 08:54
About trial factoring gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 22:17.


Mon Feb 6 22:17:44 UTC 2023 up 172 days, 19:46, 1 user, load averages: 0.94, 1.03, 1.13

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

โ‰  ยฑ โˆ“ รท ร— ยท โˆ’ โˆš โ€ฐ โŠ— โŠ• โŠ– โŠ˜ โŠ™ โ‰ค โ‰ฅ โ‰ฆ โ‰ง โ‰จ โ‰ฉ โ‰บ โ‰ป โ‰ผ โ‰ฝ โŠ โŠ โŠ‘ โŠ’ ยฒ ยณ ยฐ
โˆ  โˆŸ ยฐ โ‰… ~ โ€– โŸ‚ โซ›
โ‰ก โ‰œ โ‰ˆ โˆ โˆž โ‰ช โ‰ซ โŒŠโŒ‹ โŒˆโŒ‰ โˆ˜ โˆ โˆ โˆ‘ โˆง โˆจ โˆฉ โˆช โจ€ โŠ• โŠ— ๐–• ๐–– ๐–— โŠฒ โŠณ
โˆ… โˆ– โˆ โ†ฆ โ†ฃ โˆฉ โˆช โŠ† โŠ‚ โŠ„ โŠŠ โŠ‡ โŠƒ โŠ… โŠ‹ โŠ– โˆˆ โˆ‰ โˆ‹ โˆŒ โ„• โ„ค โ„š โ„ โ„‚ โ„ต โ„ถ โ„ท โ„ธ ๐“Ÿ
ยฌ โˆจ โˆง โŠ• โ†’ โ† โ‡’ โ‡ โ‡” โˆ€ โˆƒ โˆ„ โˆด โˆต โŠค โŠฅ โŠข โŠจ โซค โŠฃ โ€ฆ โ‹ฏ โ‹ฎ โ‹ฐ โ‹ฑ
โˆซ โˆฌ โˆญ โˆฎ โˆฏ โˆฐ โˆ‡ โˆ† ฮด โˆ‚ โ„ฑ โ„’ โ„“
๐›ข๐›ผ ๐›ฃ๐›ฝ ๐›ค๐›พ ๐›ฅ๐›ฟ ๐›ฆ๐œ€๐œ– ๐›ง๐œ ๐›จ๐œ‚ ๐›ฉ๐œƒ๐œ— ๐›ช๐œ„ ๐›ซ๐œ… ๐›ฌ๐œ† ๐›ญ๐œ‡ ๐›ฎ๐œˆ ๐›ฏ๐œ‰ ๐›ฐ๐œŠ ๐›ฑ๐œ‹ ๐›ฒ๐œŒ ๐›ด๐œŽ๐œ ๐›ต๐œ ๐›ถ๐œ ๐›ท๐œ™๐œ‘ ๐›ธ๐œ’ ๐›น๐œ“ ๐›บ๐œ”