mersenneforum.org P1 Questions

2016-04-23, 20:18   #1
potonono

Jun 2005
USA, IL

11000001₂ Posts

P1 Questions

1) How would P1 have found this factor? http://www.mersenne.ca/exponent/75751771 Isn't it outside the bounds used?
2) If P1 eliminates a certain portion of factor search space, could trial factoring save any time by skipping that same search space?

Last fiddled with by potonono on 2016-04-23 at 20:29
 2016-04-23, 20:41 #2 Dubslow Basketry That Evening!     "Bunslow the Bold" Jun 2011 40
2016-04-24, 06:17   #3
axn

Jun 2003

1001101101010₂ Posts

Quote:
 Originally Posted by potonono 2) If P1 eliminates a certain portion of factor search space, could trial factoring save any time by skipping that same search space?
Yes. Well, maybe. There is about a 30% overlap between TF & P-1 at the levels GIMPS is doing, which is to say, you could skip about 30% of TF candidate factors if you could identify them efficiently.

The idea is to skip factors f where f-1 is smooth to a given bound (B1, B2). This can be done relatively efficiently using sieve techniques (cf. QS, NFS). I don't know if this would be a net win (especially on GPUs). But supposing that we pull this off, here are a couple of reasons not to do it.

1) We'll have to have two code paths for TF, one that skips smooth factors, and one normal. The program will have to decide which one to use based on whether P-1 has been run or not. There will also be bookkeeping overhead to keep track of which version was used.
2) What if there was an error during P-1 and a factor was missed? If we skip those factors in TF as well, we lose our chance to eliminate that test. This, I would say, is a deal breaker.
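To make the "f-1 is smooth to (B1, B2)" criterion concrete, here is a small Python sketch (my own illustration, not GIMPS code; the function name and the naive trial division are assumptions, and stage 1's cap on prime powers is glossed over):

```python
def would_p1_find(f, B1, B2):
    """Would P-1 with bounds (B1, B2) find the factor f?  True when
    f - 1 is B1-smooth apart from at most one extra prime <= B2
    (assuming B2 < B1**2, as with typical GIMPS bounds)."""
    m = f - 1
    d = 2
    while d <= B1 and d * d <= m:
        while m % d == 0:   # strip every power of d (stage 1 actually
            m //= d         # caps prime powers at B1; glossed over here)
        d += 1
    # leftover cofactor: 1 means f-1 is B1-smooth; a single leftover
    # prime <= B2 is covered by stage 2
    return m == 1 or m <= B2
```

For example, with B1 = 10 and B2 = 200 this flags f = 12121 (f - 1 = 2³·3·5·101) as findable, since the cofactor 101 falls within stage 2; with B2 = 50 it does not.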

2016-04-25, 03:12   #4
potonono

Jun 2005
USA, IL

11000001₂ Posts

Quote:
 Originally Posted by axn 1) We'll have to have two code paths for TF, one that skips smooth factors, and one normal. The program will have to decide which one to use based on whether P-1 has been run or not. There will also be bookkeeping overhead to keep track of which version was used.
If P-1 were a requirement, you'd need just one code path. Isn't it a requirement that, after a certain amount of TF, P-1 is run before LL?

Quote:
 Originally Posted by axn 2) What if there was an error during P-1 and a factor was missed? If we skip those factors in TF as well, we lose our chance to eliminate that test. This, I would say, is a deal breaker.
What if there is an error in TF and a factor was missed? Unless double-checking is being done (for both TF and P-1) with some kind of residue like the LL tests, we can't be sure, and the potential 30% savings seems worthwhile.

All that aside, I agree something like that heavily depends on being able to identify the overlap efficiently, which more than likely wouldn't be easy... Thanks for the info Dubslow and axn!

2016-04-25, 04:53   #5
axn

Jun 2003

2×5×7×71 Posts

Quote:
 Originally Posted by potonono If P-1 were a requirement, you'd need just one code path. Isn't it a requirement that, after a certain amount of TF, P-1 is run before LL?
TF run before P-1 mustn't skip the overlapping factors.

Only TF run after P-1 should skip, because now you know that such smooth factors cannot divide the number. If you skip the smooth factors before P-1 has been run, then you could end up running unnecessary TFs (at higher bit levels, when a smaller level smooth factor could've been found) and unnecessary P-1.

Hence we should have two code paths for TF. Which one will be run will depend on whether P-1 has been run or not.

Currently, all of TF is performed before P-1, so as per my scheme, we'll not do the skipping of overlapping factors. However, there is an idea of running the highest bit depth of TF after running P-1. If such a scheme were to be implemented, it could usefully skip factors for the final bit depth.

Quote:
 Originally Posted by potonono What if there is an error in TF and a factor was missed? Unless double-checking is being done (for both TF and P-1) with some kind of residue like the LL tests, we can't be sure, and the potential 30% savings seems worthwhile.
It is complicated. If we have a low error rate on P-1, then it might not be a big concern. But in P-1, a single error could potentially screw up the whole calculation, causing us to miss a factor, whereas in TF the impact of any error is localized. So TF should be more reliable than P-1, and the redundancy provides a little bit of extra reliability to the overall process.

Another factor to consider is that higher TF causes P-1 to pick lower bounds. But if we skip the smooth factors during TF, we cannot consider that TF level as "done" for the purpose of picking P-1 bounds, and so we'll end up spending more time on P-1. This would reduce the net savings to the project.

It is kinda complicated to model all this and see if we'll have a net win. And we don't even know if the theoretical 30% savings can be realized.

2016-04-25, 04:54   #6
LaurV
Romulan Interpreter

Jun 2011
Thailand

9452₁₀ Posts

Quote:
 Originally Posted by potonono All that aside, I agree something like that heavily depends on being able to identify the overlap efficiently, which more than likely wouldn't be easy...
And it would not be worth it, as already pointed out. P-1 and TF search for factors with different properties, and their overlapping segment is quite small. You would need to do P-1 before TF (and not the other way around), but right now, with GPUs, the TF step that covers the same segment as P-1 is much faster, about as fast as a "sieving" task would be.

2016-04-25, 05:05   #7
LaurV
Romulan Interpreter

Jun 2011
Thailand

2²·17·139 Posts

Quote:
 Originally Posted by axn If such a scheme were to be implemented, it could usefully skip factors for the final bit depth.
No, it could not (and sorry for the cross post). That would need factoring each k to see which ones are smooth, or, at best, sieving, which would take about the same amount of time as doing the full TF on the GPU.

Final bit of TF after the P-1 makes sense (sometimes even the final two bits!), because a P-1 with a 3-4% chance of finding a factor (one in 25-35 trials gives a factor) can be run about three times over (on a 580 or Titan) in the time of a TF to 75 or so (which has about a one-in-75 chance of finding a factor); therefore you can find factors faster running P-1 than running TF to the last bit. Of course, this calculation is coarse, and it does not really account for the fact that if a factor is not found, the TF to the last bit will be done anyhow.
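As a back-of-envelope check of that comparison, using the post's own numbers (the odds and the run-time ratio are era- and hardware-specific assumptions from the post, not measured values):

```python
# Numbers taken from the post above; all approximate and GPU-dependent.
p1_odds = 1 / 30    # P-1 finds a factor roughly once in 25-35 attempts
tf_odds = 1 / 75    # last-bit TF to ~2^75 finds one roughly once in 75
p1_cost = 1 / 3     # ~3 P-1 runs fit in the time of one last-bit TF

print(p1_odds / p1_cost)  # factors per unit GPU time via P-1 (about 0.1)
print(tf_odds / 1.0)      # factors per unit GPU time via TF (about 0.013)
```

On these assumptions P-1 yields several times more factors per unit of GPU time, which is the point being made, subject to the caveat that an unsuccessful P-1 still leaves the last-bit TF to be done.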

Last fiddled with by LaurV on 2016-04-25 at 05:07 Reason: s/than/then/ and back

2016-04-25, 05:24   #8
axn

Jun 2003

2×5×7×71 Posts

Quote:
 Originally Posted by LaurV No, it could not (and sorry for the cross post). That would need factoring each k to see which ones are smooth, or, at best, sieving, which would take about the same amount of time as doing the full TF on the GPU.
It has to be done by sieving -- factoring would be too costly. Could be done on the GPU itself. But I don't understand why it would take the same (or more) time as doing full TF. Unless someone has actually tried it out, and has some hard numbers, I prefer to think of it as an open question.

Rough outline of how it works. B1 and B2 parameters control the sieving.

You sieve with all primes and prime powers < B1. Instead of clearing a bitmap entry, you keep a byte-map, and you add log(p) (scaled up to be an integer) to the byte-map entry of every k the prime power divides. Finally, you add log(B2) to that (or rather, every entry in the byte-map is initialized to log(B2), to account for the single stage-2 prime), and check whether the total is > log(k). All of this can be done with approximate logs.
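That outline can be rendered directly in Python (a sketch of my own; the function name, the fixed-point scale, and the tiny Eratosthenes prime generator are all illustrative assumptions, and the approximate logs mean the flags can slightly over-approximate what P-1 would actually find):

```python
import math

def smooth_k_flags(k_lo, k_hi, B1, B2):
    """Flag k in [k_lo, k_hi), k_lo >= 1, whose factors are plausibly
    covered by P-1 with bounds (B1, B2): sieve by primes and prime
    powers <= B1, accumulating scaled logs, with a log(B2) allowance
    for the single stage-2 prime."""
    SCALE = 32                       # fixed-point scale for the logs
    n = k_hi - k_lo
    acc = [int(SCALE * math.log(B2))] * n   # stage-2 allowance

    # primes up to B1 via a tiny sieve of Eratosthenes
    is_prime = bytearray([1]) * (B1 + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(B1 ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(is_prime[i * i :: i]))

    for p in (i for i in range(2, B1 + 1) if is_prime[i]):
        q = p
        while q <= B1:               # credit p, p^2, ... <= B1, as in stage 1
            logp = int(SCALE * math.log(p))
            for idx in range((-k_lo) % q, n, q):
                acc[idx] += logp
            q *= p

    # k counts as smooth enough when the accumulated logs cover log(k)
    return [acc[i] >= int(SCALE * math.log(k_lo + i)) for i in range(n)]
```

For example, with B1 = 5 and B2 = 7, the value k = 6 = 2·3 is flagged, while k = 11 (a prime above B2) is not.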

2016-04-25, 09:39   #9
LaurV
Romulan Interpreter

Jun 2011
Thailand

2²×17×139 Posts

MfaktX splits the k's into a number of modularity classes (either 420 or 4620, depending on settings), from which, depending on p's modularity, most are eliminated, because the factor q=2*k*p+1 of m=2^p-1, with p prime, would either be non-prime, or it would be 3 or 5 (mod 8). The remaining classes (96 or 960 respectively, for the two cases) are sieved with a number of small primes, between a few thousand and a million primes, or more, depending on settings. I mean, the q's are sieved. This way the exponentiation is only done for q's which have a chance of being prime. From this, we know how fast the sieve is on a GPU.

What is proposed here is to repeat the same sieve for the k's themselves, in each class, with primes smaller than B1, and only do the exponentiation for what's left. I mean, avoid doing the exponentiation for smooth k's.

Technically, the sieve computes, for each small prime s, the amount x=-1/(2p) (mod s) and eliminates all k which are x (mod s). This way, all q=2kp+1=0 (mod s) are eliminated [from the factor-candidates list; they can't be factors]. One additional test could check if k itself is 0 (mod s), and in this case the values of s for which such a thing happens would have to be multiplied (if s
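The x = -1/(2p) (mod s) step translates directly; here is a small Python sketch of the existing candidate sieve (my own illustration of the idea, not actual mfaktX code; it ignores the 420/4620 class machinery and just sieves a plain range of k):

```python
def sieve_q_candidates(p, k_lo, k_hi, sieve_primes):
    """Keep only k in [k_lo, k_hi) for which q = 2*k*p + 1 is not
    divisible by any of the small sieve primes: for each s, strike
    every k congruent to x = -1/(2p) (mod s)."""
    alive = [True] * (k_hi - k_lo)
    for s in sieve_primes:
        two_p = (2 * p) % s
        if two_p == 0:
            continue                  # s | 2p, so q = 1 (mod s): never divisible
        x = (-pow(two_p, -1, s)) % s  # x = -1/(2p) mod s (Python 3.8+)
        for idx in range((x - k_lo) % s, k_hi - k_lo, s):
            alive[idx] = False
    return [k_lo + i for i, ok in enumerate(alive) if ok]
```

For p = 23, the candidate k = 1 (q = 47, the smallest factor of 2^23-1) survives sieving by the first few odd primes, while k = 2 (q = 93 = 3·31) is struck by s = 3. The proposal in the thread is to add a second, analogous pass that strikes k itself when it is smooth.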

