Register FAQ Search Today's Posts Mark Forums Read

 2005-03-28, 17:24 #1 Peter Nelson     Oct 2004 232 Posts Question about factoring code Hi, Extract from the help for prime95.... "The GIMPS factoring code creates a modified sieve of Eratosthenes with each bit representing a potential 2kp+1 factor. The sieve then eliminates any potential factors that are divisible by prime numbers below 40,000 or so. Also, bits representing potential factors of 3 or 5 mod 8 are cleared. This process eliminates roughly 95% of potential factors. The remaining potential factors are tested using the powering algorithm above." I understand the 2kp+1 and 3 or 5 mod 8 tests and find these in the sieve algorithm in the source code. I also know that any factor of a Mersenne prime must also be itself prime. So the "eliminates any potential factors that are divisible by prime numbers" would make sense. However, I'm a little confused by the "below 40,000 or so" as I can't see where this occurs in the factor64.asm source code. How precise is this "40,000 OR SO". What is the actual figure? And why was 40,000 selected as opposed to higher or lower? (I assume it DOESN'T mean test against the first 40,000 primes because that's not what it says). Although there are limits on calculations caused by word sizes etc, 40,000 does not fall on such a boundary. There are 4203 primes from 1 to 40,000. The largest of these would be 39989. I assume these are built "on the fly" by a sieve algorithm for later use to eliminate potential factors in the assembly code sieve. (as it has been stated elsewhere that prime95 does not contain a list of primes other than the already found mersenne primes). :Question: I would appreciate it if someone could tell me EXACTLY which primes are used to eliminate potential factors in the CURRENT algorithm (appreciating the documentation may not have kept up and 40,000 could now be out of date). And a pointer to where I may find this going on in the source code files would also be helpful. What I'm trying to do is to see how much variation there is from the claimed "about 95% of potential factors eliminated" depending on the exponent size (from current right up through billion digits for which I can't use P95 but will replicate the algorithm using more precision) and bit depth ranges. For example changing the 40,000 might have an effect too. The code is slightly obfiscated by having versions and optimisations for different processors SSE2 etc, but my long term objective is to implement this or a similar factorisation algorithm in custom hardware. The sieve may remain in software to save me logic gates, thus I want to see the tradeoff between elimination and actually trial factoring.
 2005-03-28, 19:00 #2 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 22·43·47 Posts The small primes are built on-the-fly from the sivinfo table. This table gives the distance between primes divided by two. There is nothing magical about 40,000. This was chosen many, many years ago by timing several possible cutoff points. Basically, you are looking for the spot where the rate at which the cost of eliminating possible factors is equal to the cost of testing a possible factor. For example, consider the prime 39989. Sieving possible factors costs on the order of 10-20 clocks (read the bit offset, clear sieve bit, compute next bit offset, write bit offset). Testing a possible factor has different costs depending on the architecture and size of the possible factor. For arguments sake, it is roughly 20 squaring loops at 50 clocks each, or 1000 clocks. Since sieving all the small primes below 39989 eliminated roughly 95% of the trial factors, every 20 sieving attempts at a cost of 10-20 clocks (200-400) clocks will save one trial factoring at a cost of 1000 clocks -- indicating a little deeper sieving would be beneficial. Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
2005-03-28, 19:32   #4
JuanTutors

"Juan Tutors"
Mar 2004

571 Posts

Quote:
 Originally Posted by Prime95 Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
What are the chances of you or someone else trying to figure out a better if not optimal sieve endpoint? I personally would be afraid to go in myself and try to modify the current sieve endpoint and try different ones...

 2005-03-28, 20:32 #5 Peter Nelson     Oct 2004 232 Posts Thanks guys for very high quality answers. I think the idea of the sieve cutoff being non-optimal on current machines (or on very different size exponents) was partly where I was headed with my thinking too.
2005-03-28, 22:24   #6
smh

"Sander"
Oct 2002
52.345322,5.52471

22458 Posts

Quote:
 Originally Posted by Prime95 Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
So this might be an easy (small) speed gain in the factoring code. Or would the speed gain be to small to go through the trouble of trying different cut off points?

 2005-03-30, 03:39 #7 Peter Nelson     Oct 2004 232 Posts Primearray values Having now examined factor64.asm further I see the table of small primes (primearray) that gets built from the byte table of differences in sivinfo. It does this until reaching the zero difference byte which indicates the end of the sequence. From the source I have, the primearray gets populated with the 9208 small primes (over 5) from 7 through to 95549 (so, indirectly prime95 does contain a list of primes in a compact encoded form). I believe these are the primes used for attempted elimination of factor candidates. If so then this is quite different from the 40,000 or so in the documentation and may benefit from updating the help file(s) and/or possible review of whether this is the best value. In any case I agree with George that the value might now be sub-optimal on current work and hardware. Ideally it would be possible to benchmark at runtime and adaptively change the array size, although an easier approach would be to just select a fixed size based on the architecture, or even just a single update to the source code based on current benchmarking. I may be wrong in that although the table gets built to this size, perhaps it is not actually used to this depth because of the 4K sieve size or some other limiting variable. George, am I right that all 9208 primes (to 95549) are actually getting used to quickly eliminate potential factors, or is it only some of them? (about half this number)
 2005-03-30, 15:31 #8 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 22×43×47 Posts Look for a line containing "DB 0" buried in the middle of the sivinfo array. This effectively truncates the table.
 2005-03-30, 17:51 #9 Peter Nelson     Oct 2004 232 Posts Yes, I had found the zero terminating reading of the data, and used the table of entries in Visual Basic to build the same table of 9208 primes from it that the assembly code does. There are more figures after in the ifdef foo section but these are not used. However, the location of the zero suggests that primes up to 95000ish are used rather than circa 40000. As the table is this size, my question is whether the code that steps through it actually uses all entries in the table or not. Last fiddled with by Peter Nelson on 2005-03-30 at 17:55
 2005-03-30, 18:28 #10 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 22×43×47 Posts Assuming your calculations are correct, then the cutoff is at 95000 not 40000. For some reason, I thought the sivinfo data only built primes to 64K. But that was so long ago, I apparently mis-remembered.

 Similar Threads Thread Thread Starter Forum Replies Last Post arbooker Factoring 219 2022-10-28 13:48 Rde Software 12 2009-06-12 22:38 schickel Forum Feedback 4 2009-04-01 03:27 philmoore Factoring 8 2005-06-14 22:13 Joe O Software 2 2002-09-13 23:39

All times are UTC. The time now is 20:22.

Fri Dec 2 20:22:20 UTC 2022 up 106 days, 17:50, 0 users, load averages: 0.78, 1.17, 1.24