2005-03-28, 17:24  #1 
Oct 2004
23^{2} Posts 
Question about factoring code
Hi,
Extract from the help for Prime95: "The GIMPS factoring code creates a modified sieve of Eratosthenes with each bit representing a potential 2kp+1 factor. The sieve then eliminates any potential factors that are divisible by prime numbers below 40,000 or so. Also, bits representing potential factors of 3 or 5 mod 8 are cleared. This process eliminates roughly 95% of potential factors. The remaining potential factors are tested using the powering algorithm above."

I understand the 2kp+1 and 3 or 5 mod 8 tests and can find them in the sieve algorithm in the source code. I also know that only prime candidates need to be tested as factors of a Mersenne number, so "eliminates any potential factors that are divisible by prime numbers" makes sense. However, I'm a little confused by the "below 40,000 or so", as I can't see where this occurs in the factor64.asm source code.

How precise is this "40,000 OR SO"? What is the actual figure? And why was 40,000 selected, as opposed to higher or lower? (I assume it DOESN'T mean testing against the first 40,000 primes, because that's not what it says.) Although there are limits on calculations caused by word sizes etc., 40,000 does not fall on such a boundary. There are 4203 primes from 1 to 40,000, the largest of which is 39989. I assume these are built on the fly by a sieve algorithm for later use in eliminating potential factors in the assembly-code sieve (as it has been stated elsewhere that Prime95 does not contain a list of primes, other than the already-found Mersenne primes).

I would appreciate it if someone could tell me EXACTLY which primes are used to eliminate potential factors in the CURRENT algorithm (appreciating that the documentation may not have kept up, and 40,000 could now be out of date). A pointer to where this happens in the source code files would also be helpful. 
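As an aside, the counts quoted above are easy to cross-check with a plain sieve of Eratosthenes. This is just a minimal standalone sketch (nothing to do with Prime95's own code):

```python
def primes_up_to(n):
    """Return all primes <= n via a simple boolean sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # mark all further multiples of i as composite
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

ps = primes_up_to(40000)
print(len(ps), ps[-1])   # 4203 39989
```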
What I'm trying to do is see how much variation there is from the claimed "about 95% of potential factors eliminated", depending on the exponent size (from current exponents right up through billion-digit ones, for which I can't use P95 but will replicate the algorithm using more precision) and on the bit-depth ranges. For example, changing the 40,000 might have an effect too. The code is slightly obfuscated by having versions and optimisations for different processors (SSE2 etc.), but my long-term objective is to implement this or a similar factorisation algorithm in custom hardware. The sieve may remain in software to save me logic gates, so I want to see the trade-off between elimination and actual trial factoring. 
2005-03-28, 19:00  #2 
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 
The small primes are built on the fly from the sivinfo table. This table gives the gap between consecutive primes, divided by two.
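A hedged sketch of how a sivinfo-style table can be decoded: each entry is the gap to the next prime divided by two, and a zero byte terminates the list. The table below is a made-up nine-entry example starting from 7, not the actual sivinfo data:

```python
# Hypothetical gap/2 table: (11-7)/2, (13-11)/2, (17-13)/2, ..., then 0 to stop.
SIVINFO = [2, 1, 2, 1, 2, 3, 1, 3, 0]

def decode_primes(table, first=7):
    """Rebuild the list of small primes from a gap/2 difference table."""
    primes = [first]
    p = first
    for half_gap in table:
        if half_gap == 0:      # zero byte marks the end of the table
            break
        p += 2 * half_gap      # gaps between odd primes are even, so gap/2 fits a byte
        primes.append(p)
    return primes

print(decode_primes(SIVINFO))  # [7, 11, 13, 17, 19, 23, 29, 31, 37]
```

Storing half-gaps rather than the primes themselves is what lets the whole table fit in one byte per prime.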
There is nothing magical about 40,000. This was chosen many, many years ago by timing several possible cutoff points. Basically, you are looking for the spot where the cost of eliminating possible factors equals the cost of testing a possible factor.

For example, consider the prime 39989. Sieving out one possible factor costs on the order of 10-20 clocks (read the bit offset, clear the sieve bit, compute the next bit offset, write the bit offset). Testing a possible factor has different costs depending on the architecture and the size of the possible factor. For argument's sake, it is roughly 20 squaring loops at 50 clocks each, or 1000 clocks. Since sieving all the small primes below 39989 eliminated roughly 95% of the trial factors, every 20 sieving attempts (at 10-20 clocks each, so 200-400 clocks) will save one trial-factoring test at a cost of 1000 clocks, indicating that a little deeper sieving would be beneficial.

Now, I didn't do the clock-counting exercise above. I just tried different sieve endpoints and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents and bigger factors on different machines. The current sieve endpoint is likely non-optimal. 
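For what it's worth, the break-even argument above can be written down as a toy cost model. The clock counts here (15 per bit-clear, 1000 per trial-factor test) are illustrative numbers picked from the ranges in the post, not measurements:

```python
def worthwhile(survivor_fraction, sieve_cost=15, tf_cost=1000):
    """Is it worth sieving one prime deeper?

    Each bit-clear costs ~sieve_cost clocks; it only saves a tf_cost-clock
    trial-factor test if the bit was still set, which happens with
    probability survivor_fraction. Deeper sieving pays off while the
    expected saving per clear exceeds the cost of the clear.
    """
    return survivor_fraction * tf_cost > sieve_cost

print(worthwhile(0.05))   # 95% already eliminated: 0.05*1000 = 50 > 15 -> True
print(worthwhile(0.01))   # 99% already eliminated: 0.01*1000 = 10 < 15 -> False
```

Under these made-up numbers the break-even point is a survivor fraction of 15/1000 = 1.5%, which matches the post's conclusion that at 95% elimination a little deeper sieving would still be beneficial.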
2005-03-28, 19:32  #3 
∂^{2}ω=0
Sep 2002
República de California
2·3^{2}·653 Posts 
I can't help you with the where-to-find-these-parameters-in-the-source-code issue, but I can help with the primes-below-40,000 (or whatever) issue, as I've dealt with that optimization in my non-x86 factoring code. (Which hasn't yet been released; it still needs some platform-specific assembly code for fast integer modmul on a couple of platforms I want to add support for.)
Here's what happens: we have a really fast bitwise sieve that takes very little time to eliminate all q's that are not == ±1 mod 8, and all multiples of a fixed set of small primes (my code uses 3, 5, 7, 11, 13, 17; I expect George's is the same or similar). The resulting small-primes sieve will have length (natural integer word size) * (product of small primes), where the word length is typically the largest one the platform has good hardware support for, i.e. can do integer ALU operations like mask and shift on in a single cycle, so that'll be 32 or 64 bits.

Now in practice there are additional optimizations one can use, such as exploiting the fact that for any odd Mersenne exponent p there are only 16 values of the factor index (my term for it) k mod 60 that can possibly yield prime factor candidates q = 2*k*p+1 (which 16 k mod 60 values they are depends on the value of p mod 60). One typically creates a separate sieve (which has 1/60th the size of the full-length sieve) for each of those, and then runs through each of the 16 sub-sieves separately. For efficiency reasons we want each of the sieves to fit entirely in cache (definitely in L2, and preferably in L1), so that limits how many small primes we use to set up these basic sieves.

Once we have our basic sieve(s) set up and copied, every run through the sieve corresponds to processing a set of k-values. Even if (say) 90-95% of our sieve bits are already zero at this point due to the mod-8 and mod-3,5,7,11,13,17 steps, testing whether a factor candidate q divides M(p) is quite expensive relative to testing whether a small prime (< sqrt(q), obviously) divides q, so before doing the M(p) mod q check, we want to do a number of q mod (small prime) checks beyond those already built into the sieve. 
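The 16-residues-mod-60 observation is easy to reproduce. The conditions are that q = 2*k*p+1 must not be divisible by 3 or 5 and must be ±1 mod 8; the first two fix one excluded residue of k mod 3 and mod 5 each, the last fixes k mod 4, and lcm(3, 5, 4) = 60. A sketch (function name is mine):

```python
def allowed_k_mod_60(p):
    """Residues of k mod 60 for which q = 2*k*p + 1 can possibly be a
    prime factor of M(p): q not divisible by 3 or 5, and q == +-1 mod 8."""
    return [k for k in range(60)
            if (2 * k * p + 1) % 3 != 0
            and (2 * k * p + 1) % 5 != 0
            and (2 * k * p + 1) % 8 in (1, 7)]

# For any odd exponent p coprime to 15, exactly 16 of the 60 residues survive
# (2 of 3 mod 3, 4 of 5 mod 5, 2 of 4 mod 4); which 16 depends on p:
print(len(allowed_k_mod_60(11)))   # 16
print(allowed_k_mod_60(11))
```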
Note that once we know the smallest q that a given small prime divides, we can store the corresponding k-value and use that to quickly knock all additional multiples of the small prime out of the current range of k's, rather than doing explicit q % (small prime) tests, which are very expensive. Since the sieve length won't be a multiple of these added small primes, we need to do a small amount of bookkeeping every time we get to the end of the current copy of the sieve, to prepare for the next set of k's: every time we get a new set of k's (i.e. a fresh copy of the original sieve with the primes <= 17 already knocked out), we do a preprocessing step to knock out the bits corresponding to the k's of the q's divided by the additional small primes below our small-prime sieving limit.

The aim is (as always) to minimize our compute time. If our small-prime sieving limit is too low, we test more q's than we should, whereas if it's too high, we wind up spending more time in the sieve-preprocessing step than we would have spent just testing the small number of q's the added sieving wound up eliminating (since we get diminishing returns as our sieve limit gets larger). The optimal small-prime sieving limit thus depends on the hardware, the algorithm, and the range of q's being tested; since e.g. testing q > 2^64 is much more expensive than q < 2^64, it makes sense to raise the small-prime sieving limit for q > 2^64. But note that the runtime around the optimum typically has the shape of a broad inverted cup, i.e. if one is within a factor of 2 of the optimal small-prime sieving limit, the runtime will vary by only a few percent from one end of the range to the other. In my experience, with a fast algorithmic implementation of the critical modmul operation (i.e. generating modular products of pairs of size-q operands, whose raw products are q^2-sized), for factoring up to 2^64, a small-prime sieving limit of between 20,000 and 100,000 is typical. 
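The knock-out-by-stepping idea works because q grows by 2*s*p (a multiple of s) every time k grows by s, so once the smallest k0 with s | 2*k0*p+1 is known, no further divisions are needed. A hedged sketch with my own names (not Prime95's), using a plain list of booleans as the "sieve":

```python
def clear_multiples(sieve, k_start, p, s, k0):
    """Clear every bit in the window (indexed by k - k_start) whose k gives
    s | 2*k*p + 1, given the smallest such k0 seen so far. Returns the
    advanced k0 so the next window's bookkeeping can pick up from it."""
    if k0 < k_start:   # advance k0 into the current window, stepping by s
        k0 += ((k_start - k0 + s - 1) // s) * s
    for k in range(k0, k_start + len(sieve), s):
        sieve[k - k_start] = False
    return k0

p, s = 11, 7
# smallest k with 7 | 2*k*11 + 1: solve 22k == -1 (mod 7), i.e. k == 6 (mod 7)
sieve = [True] * 30
clear_multiples(sieve, 0, p, s, 6)
cleared = [k for k in range(30) if not sieve[k]]
print(cleared)   # [6, 13, 20, 27]
```

Each cleared k can be checked directly: 2*6*11+1 = 133 = 7*19, 2*13*11+1 = 287 = 7*41, and so on.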
I don't know if/how George modifies that value for q > 2^64 (or whatever hardwaredependent breakover points Prime95 has to deal with). 
2005-03-28, 19:32  #4  
"Juan Tutors"
Mar 2004
571 Posts 
Quote:
2005-03-28, 20:32  #5 
Oct 2004
23^{2} Posts 
Thanks guys for very high quality answers.
I think the idea of the sieve cutoff being nonoptimal on current machines (or on very different size exponents) was partly where I was headed with my thinking too. 
2005-03-28, 22:24  #6  
"Sander"
Oct 2002
52.345322,5.52471
2245_{8} Posts 
Quote:
2005-03-30, 03:39  #7 
Oct 2004
23^{2} Posts 
Primearray values
Having now examined factor64.asm further, I see the table of small primes (primearray) that gets built from the byte table of differences in sivinfo.
It does this until reaching the zero difference byte, which indicates the end of the sequence. From the source I have, primearray gets populated with the 9208 small primes (above 5) from 7 through 95549 (so, indirectly, Prime95 does contain a list of primes, in a compactly encoded form). I believe these are the primes used for the attempted elimination of factor candidates.

If so, then this is quite different from the "40,000 or so" in the documentation, and it may be worth updating the help file(s) and/or reviewing whether this is the best value. In any case, I agree with George that the value might now be suboptimal for current work and hardware. Ideally it would be possible to benchmark at runtime and adaptively change the array size, although an easier approach would be to select a fixed size based on the architecture, or even just make a single update to the source code based on current benchmarking.

I may be wrong, in that although the table gets built to this size, perhaps it is not actually used to this depth because of the 4K sieve size or some other limiting variable. George, am I right that all 9208 primes (up to 95549) are actually being used to quickly eliminate potential factors, or is it only some of them (the documented 40,000 would correspond to about half this number)? 
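As a plausibility check on the figure above (my own, independent of the sivinfo data itself, so it only confirms the ballpark rather than the exact table contents): counting primes above 5 up to 95549 with a plain sieve should land close to 9208.

```python
def count_sieving_primes(limit=95549):
    """Count primes p with 5 < p <= limit via a simple sieve of Eratosthenes.
    2, 3 and 5 are excluded, since the 2kp+1 form and the mod-8/mod-3/mod-5
    logic already account for them."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return sum(is_prime[7:])

print(count_sieving_primes())
```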
2005-03-30, 15:31  #8 
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}×43×47 Posts 
Look for a line containing "DB 0" buried in the middle of the sivinfo array. This effectively truncates the table.

2005-03-30, 17:51  #9 
Oct 2004
23^{2} Posts 
Yes, I had found the zero byte that terminates reading of the data, and used the table of entries in Visual Basic to build the same table of 9208 primes from it that the assembly code does. There are more figures after it, in the ifdef foo section, but these are not used.

However, the location of the zero suggests that primes up to 95,000-ish are used, rather than circa 40,000. As the table is this size, my question is whether the code that steps through it actually uses all the entries in the table or not.

Last fiddled with by Peter Nelson on 2005-03-30 at 17:55 
2005-03-30, 18:28  #10 
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}×43×47 Posts 
Assuming your calculations are correct, the cutoff is at 95,000, not 40,000.
For some reason, I thought the sivinfo data only built primes to 64K. But that was so long ago, I apparently misremembered. 