mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2005-03-28, 17:24   #1
Peter Nelson
 
Peter Nelson's Avatar
 
Oct 2004

232 Posts
Default Question about factoring code

Hi,

Extract from the help for prime95....

"The GIMPS factoring code creates a modified sieve of Eratosthenes with each bit representing a potential 2kp+1 factor. The sieve then eliminates any potential factors that are divisible by prime numbers below 40,000 or so. Also, bits representing potential factors of 3 or 5 mod 8 are cleared. This process eliminates roughly 95% of potential factors. The remaining potential factors are tested using the powering algorithm above."

I understand the 2kp+1 and 3 or 5 mod 8 tests and find these in the sieve algorithm in the source code.

I also know that any factor of a Mersenne prime must also be itself prime.

So the "eliminates any potential factors that are divisible by prime numbers" would make sense.

However, I'm a little confused by the "below 40,000 or so" as I can't see where this occurs in the factor64.asm source code.

How precise is this "40,000 OR SO". What is the actual figure? And why was 40,000 selected as opposed to higher or lower?

(I assume it DOESN'T mean test against the first 40,000 primes because that's not what it says).

Although there are limits on calculations caused by word sizes etc, 40,000 does not fall on such a boundary.

There are 4203 primes from 1 to 40,000. The largest of these would be 39989.

I assume these are built "on the fly" by a sieve algorithm for later use to eliminate potential factors in the assembly code sieve. (as it has been stated elsewhere that prime95 does not contain a list of primes other than the already found mersenne primes).

:Question:

I would appreciate it if someone could tell me EXACTLY which primes are used to eliminate potential factors in the CURRENT algorithm (appreciating the documentation may not have kept up and 40,000 could now be out of date).

And a pointer to where I may find this going on in the source code files would also be helpful.

What I'm trying to do is to see how much variation there is from the claimed "about 95% of potential factors eliminated" depending on the exponent size (from current right up through billion digits for which I can't use P95 but will replicate the algorithm using more precision) and bit depth ranges. For example changing the 40,000 might have an effect too.

The code is slightly obfiscated by having versions and optimisations for different processors SSE2 etc, but my long term objective is to implement this or a similar factorisation algorithm in custom hardware. The sieve may remain in software to save me logic gates, thus I want to see the tradeoff between elimination and actually trial factoring.
Peter Nelson is offline   Reply With Quote
Old 2005-03-28, 19:00   #2
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

22·43·47 Posts
Default

The small primes are built on-the-fly from the sivinfo table. This table gives the distance between primes divided by two.

There is nothing magical about 40,000. This was chosen many, many years ago by timing several possible cutoff points. Basically, you are looking for the spot where the rate at which the cost of eliminating possible factors is equal to the cost of testing a possible factor.

For example, consider the prime 39989. Sieving possible factors costs on the order of 10-20 clocks (read the bit offset, clear sieve bit, compute next bit offset, write bit offset). Testing a possible factor has different costs depending on the architecture and size of the possible factor. For arguments sake, it is roughly 20 squaring loops at 50 clocks each, or 1000 clocks.

Since sieving all the small primes below 39989 eliminated roughly 95% of the trial factors, every 20 sieving attempts at a cost of 10-20 clocks (200-400) clocks will save one trial factoring at a cost of 1000 clocks -- indicating a little deeper sieving would be beneficial.

Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
Prime95 is offline   Reply With Quote
Old 2005-03-28, 19:32   #3
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

2·32·653 Posts
Default

I can't help you with the where-to-find-these-parameters-in-the-source-code issue, but can help with the primes-below-40000 (or whatever) issue, as I've dealt with that optimization in my non-x86 factoring code. (Which hasn't yet been released - still needs some platform-specific assembly code for fast integer modmul on a couple of platforms I want to add support for).

Here's what happens: we have a really fast bitwise sieve that takes very little time at all to eliminate all q's that == +1 mod 8 and all multiples of a fixed set of small primes (my code uses 3,5,7,11,13,17, I expect George's is the same or similar). The resulting small-primes sieve will have length (natural integer wordsize)*(product of small primes), where the wordlength is typically the largest one the platform has good hardware support for, i.e. can do integer ALU operations like mask and shift on in a single cycle - so that'll be 32 or 64 bits. Now in practice there are additional optimizations one can use, such as exploiting the fact that for any odd Mersenne exponent p there are only 16 values of the factor index (my term for it) k mod 60 that can possibly yield prime factor candidates q = 2*k*p+1 (which 16 k mod 60 values depends on the value of p mod 60) - one typically creates a separate sieve (which has 1/60th the size of the full-length sieve) for each of those, and then runs through each of the 16 subsieves separately. For efficiency reasons we want each of the sieves to fit entirely in cache (definitely in L2, and preferably in L1), so that limits how many small primes we use to set up these basic sieves.

Now once we have our basic sieve(s) set up and copied, every run through the sieve corresponds to processing a set of k-values. Even if (say) 90-95% of our sieve bits are already zero at this point due to the mod-8 and mod-3,5,7,11,13,17 steps, testing a whether a factor candidate q divides M(p) is quite expensive relative to testing whether a small prime (< sqrt(q), obviously) divides q, so before doing the M(p) mod q check, we want to do a number of q mod (small prime) checks beyond those already built into the sieve. Note that once we know the smallest q that a given small prime divides, we can store the corresponding k-value and use that to quickly knock all additional multiples of the small prime out of the current range of k's, rather than doing explicit q%(small prime) tests, which are very expensive. Since these added small primes won't be a multiple of the sieve length, we need to do a small amount of bookkeeping every time we get to the end of the current copy of the sieve, to prepare for the next set of k's - every time we get a new set of k's (i.e. a fresh copy of the original (primes <= 17)-already-knocked-out sieve), we do a preprocessing step to knock out the bits coresponding to the k's of the q's divided by the additional small primes less than our small-prime sieving limit.

The aim is (as always) to minimize our compute time - if our small-prime sieving limit is too low we test more q's than we should, whereas if it's too high we wind up spending more time in the sieve-preprocessing step than we would have spent just testing the small number of q's the added sieving wound up eliminating (since we get diminishing returns as our sieve limit gets larger.) The optimal small-prime sieving limit thus depends on the hardware, the algorithm and the range of q's being tested - since e.g. testing q > 2^64 is much more expensive than q < 2^64, it makes sense to raise the small-prime sieving limit for q > 2^64. But note that the runtime around the optimum is typically in the shape of a broad inverted cup, i.e. if one is within a factor of 2 of the optimal small-prime sieving limit, the runtime will vary by only a few percent from one end of the range to the other. In my experience, with a fast algorithmic implementation of the critical q^2-sized modmul operation (i.e. generate modular products of pairs of size-q operands), for factoring up to 2^64, a small-prime sieving limit of between 20000 and 100000 is typical. I don't know if/how George modifies that value for q > 2^64 (or whatever hardware-dependent breakover points Prime95 has to deal with).
ewmayer is offline   Reply With Quote
Old 2005-03-28, 19:32   #4
JuanTutors
 
JuanTutors's Avatar
 
"Juan Tutors"
Mar 2004

571 Posts
Default

Quote:
Originally Posted by Prime95
Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
What are the chances of you or someone else trying to figure out a better if not optimal sieve endpoint? I personally would be afraid to go in myself and try to modify the current sieve endpoint and try different ones...
JuanTutors is offline   Reply With Quote
Old 2005-03-28, 20:32   #5
Peter Nelson
 
Peter Nelson's Avatar
 
Oct 2004

232 Posts
Default

Thanks guys for very high quality answers.

I think the idea of the sieve cutoff being non-optimal on current machines (or on very different size exponents) was partly where I was headed with my thinking too.
Peter Nelson is offline   Reply With Quote
Old 2005-03-28, 22:24   #6
smh
 
smh's Avatar
 
"Sander"
Oct 2002
52.345322,5.52471

22458 Posts
Default

Quote:
Originally Posted by Prime95
Now, I didn't do the clock counting technique above. I just tried different sieve end points and chose the best one for the machine I was using at that time. Years later we are testing bigger exponents, bigger factors, on different machines. The current sieve endpoint is likely non-optimal.
So this might be an easy (small) speed gain in the factoring code. Or would the speed gain be to small to go through the trouble of trying different cut off points?
smh is offline   Reply With Quote
Old 2005-03-30, 03:39   #7
Peter Nelson
 
Peter Nelson's Avatar
 
Oct 2004

232 Posts
Default Primearray values

Having now examined factor64.asm further I see the table of small primes (primearray) that gets built from the byte table of differences in sivinfo.

It does this until reaching the zero difference byte which indicates the end of the sequence.

From the source I have, the primearray gets populated with the 9208 small primes (over 5) from 7 through to 95549 (so, indirectly prime95 does contain a list of primes in a compact encoded form).

I believe these are the primes used for attempted elimination of factor candidates.

If so then this is quite different from the 40,000 or so in the documentation and may benefit from updating the help file(s) and/or possible review of whether this is the best value.

In any case I agree with George that the value might now be sub-optimal on current work and hardware. Ideally it would be possible to benchmark at runtime and adaptively change the array size, although an easier approach would be to just select a fixed size based on the architecture, or even just a single update to the source code based on current benchmarking.

I may be wrong in that although the table gets built to this size, perhaps it is not actually used to this depth because of the 4K sieve size or some other limiting variable.

George, am I right that all 9208 primes (to 95549) are actually getting used to quickly eliminate potential factors, or is it only some of them? (about half this number)
Peter Nelson is offline   Reply With Quote
Old 2005-03-30, 15:31   #8
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

22×43×47 Posts
Default

Look for a line containing "DB 0" buried in the middle of the sivinfo array. This effectively truncates the table.
Prime95 is offline   Reply With Quote
Old 2005-03-30, 17:51   #9
Peter Nelson
 
Peter Nelson's Avatar
 
Oct 2004

232 Posts
Default

Yes, I had found the zero terminating reading of the data, and used the table of entries in Visual Basic to build the same table of 9208 primes from it that the assembly code does. There are more figures after in the ifdef foo section but these are not used.

However, the location of the zero suggests that primes up to 95000ish are used rather than circa 40000.

As the table is this size, my question is whether the code that steps through it actually uses all entries in the table or not.

Last fiddled with by Peter Nelson on 2005-03-30 at 17:55
Peter Nelson is offline   Reply With Quote
Old 2005-03-30, 18:28   #10
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

22×43×47 Posts
Default

Assuming your calculations are correct, then the cutoff is at 95000 not 40000.

For some reason, I thought the sivinfo data only built primes to 64K. But that was so long ago, I apparently mis-remembered.
Prime95 is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Some code for factoring numbers up to 2^63 arbooker Factoring 219 2022-10-28 13:48
Factoring Question Rde Software 12 2009-06-12 22:38
Question about triming [code] blocks schickel Forum Feedback 4 2009-04-01 03:27
factoring question philmoore Factoring 8 2005-06-14 22:13
Questions about SSE2 code and Factoring Joe O Software 2 2002-09-13 23:39

All times are UTC. The time now is 20:22.


Fri Dec 2 20:22:20 UTC 2022 up 106 days, 17:50, 0 users, load averages: 0.78, 1.17, 1.24

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔