2016-04-23, 20:18  #1 
Jun 2005
USA, IL
193 Posts 
P1 Questions
1) How would P-1 have found this factor? http://www.mersenne.ca/exponent/75751771 Isn't it outside the bounds used?
2) If P-1 eliminates a certain portion of the factor search space, could trial factoring save any time by skipping that same search space? Last fiddled with by potonono on 2016-04-23 at 20:29 
2016-04-23, 20:41  #2 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts 
1) The Brent-Suyama extension *sometimes* allows P-1 to find factors where B2 is otherwise too small. The way it works is a bit luck-based, though, so it's not the same as "a larger B2". Note that in the "P-1 results:" section of your link, where it has the userid listed, the column to the right of B2 is labeled "e". This column indicates whether or not the Brent-Suyama extension was in use. e=1 means it was not in use, e=2 or e=6 indicates it was in use though with lesser ability to find factors, and e=12 is the full strength of Prime95's Brent-Suyama implementation. For this factor, e was indeed 12.
2) The search spaces are not compatible. There are factors that P-1 can find in 1 GHz-day that would take TF 1000, and conversely factors that TF can find in 1 GHz-day that would take P-1 1000. However, running one method does decrease the odds that the other will find a factor. Notice that running P-1 on the same (or nearly the same) exponents will show drastically different odds of finding a factor listed by Prime95, depending on how much TF was done before the P-1. 
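For concreteness, here is the basic criterion the whole thread hinges on: P-1 finds a prime factor q whenever q-1 is built from small enough prime powers (up to B1, plus one extra prime up to B2 in stage 2). A minimal stage-1 sketch in Python; this is illustrative only (the toy modulus below is not a Mersenne number, and real implementations like Prime95's are vastly more elaborate):

```python
from math import gcd

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def pminus1_stage1(N, B1, base=3):
    # Raise `base` to every prime power <= B1, working mod N.
    # This finds a prime factor q of N whenever q - 1 is B1-powersmooth,
    # i.e. every prime-power divisor of q - 1 is itself <= B1.
    a = base
    for q in range(2, B1 + 1):
        if is_prime(q):
            e = 1
            while q ** (e + 1) <= B1:   # largest e with q^e <= B1
                e += 1
            a = pow(a, q ** e, N)
    return gcd(a - 1, N)

# Toy example: 472291 = 233 * 2027.  233 - 1 = 232 = 2^3 * 29 is
# 29-powersmooth, while 2027 - 1 = 2 * 1013 is not, so stage 1 with
# B1 = 29 pulls out exactly the factor 233.
print(pminus1_stage1(472291, 29))   # -> 233
```

Dropping B1 to 23 misses the prime 29 in 232, and the same call returns 1: the bounds, not luck, decide which factors are reachable.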
2016-04-24, 06:17  #3  
Jun 2003
2^{2}×5^{2}×7^{2} Posts 
Quote:
The idea is to skip candidate factors f where f-1 is smooth to the given bounds (B1, B2). This can be done relatively efficiently using sieve techniques (cf. QS, NFS). I don't know if this would be a net win (especially on GPUs). But supposing that we pull this off, here are a couple of reasons not to do it. 1) We'll have to have two code paths for TF: one that skips smooth factors, and one normal. The program will have to decide which one to use based on whether P-1 has been run or not. There will also be bookkeeping overhead to keep track of which version was used. 2) What if there was an error during P-1 and a factor was missed? If we skip those factors in TF as well, we lose our chance to eliminate that test. This, I would say, is a deal breaker. 
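A back-of-the-envelope version of the condition being discussed (a naive per-candidate check, not the sieve axn has in mind; function names are my own, and Brent-Suyama and other stage-2 refinements are ignored). For a candidate factor q = 2kp+1 of 2^p-1, with the exponent p always included in stage 1, the simplified rule is that P-1 reaches q when k is B1-powersmooth, optionally times one extra prime in (B1, B2]:

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def pm1_would_find(k, B1, B2):
    # Simplified: would P-1 with bounds (B1, B2) find q = 2*k*p + 1?
    # Rule used here: k must be B1-powersmooth, optionally times one
    # extra prime in (B1, B2] covered by stage 2.
    m = k
    for s in range(2, B1 + 1):
        if m % s == 0:
            power = 1
            while m % s == 0:
                m //= s
                power *= s
            if power > B1:
                return False      # a prime-power part of k exceeds B1
    # whatever is left is 1 or a product of primes > B1;
    # stage 2 can absorb exactly one such prime, if it is <= B2
    return m == 1 or (m <= B2 and is_prime(m))

print(pm1_would_find(232, 29, 1000))   # -> True  (232 = 2^3 * 29)
print(pm1_would_find(2026, 29, 1000))  # -> False (2026 = 2 * 1013)
print(pm1_would_find(62, 29, 1000))    # -> True  (62 = 2 * 31, via stage 2)
```

These are exactly the k values a TF run could in principle skip after a successful P-1, which is the overlap whose size is in question.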

2016-04-25, 03:12  #4  
Jun 2005
USA, IL
C1_{16} Posts 
Quote:
Quote:
All that aside, I agree that something like this heavily depends on being able to identify the overlap efficiently, which more than likely wouldn't be easy... Thanks for the info, Dubslow and axn! 

2016-04-25, 04:53  #5  
Jun 2003
2^{2}·5^{2}·7^{2} Posts 
Quote:
Only TF run after P-1 should skip, because by then you know that such smooth factors cannot divide the number. If you skip the smooth factors before P-1 has been run, then you could end up running unnecessary TF (at higher bit levels, when a smaller-level smooth factor could have been found) and unnecessary P-1. Hence we would need two code paths for TF, and which one is run will depend on whether P-1 has been run or not. Currently, all of TF is performed before P-1, so as per my scheme, we would not skip the overlapping factors at all. However, there is an idea of running the highest bit depth of TF after running P-1. If such a scheme were implemented, it could usefully skip factors for the final bit depth. Quote:
Another factor to consider is that deeper TF causes P-1 to pick lower bounds. But if we skip the smooth factors during TF, we cannot consider that TF level as "done" for the purpose of picking P-1 bounds, and so we'll end up spending more time on P-1. This would reduce the net savings to the project. It is kinda complicated to model all this and see if we'll have a net win. And we don't even know if the theoretical 30% savings can be realized. 

2016-04-25, 04:54  #6 
Romulan Interpreter
Jun 2011
Thailand
2^{2}×2,341 Posts 
And it would not be worth it, as already pointed out. P-1 and TF search for factors with different properties, and their overlapping segment is quite small. You would need to do P-1 before TF (and not the other way around), but right now, with the GPUs, the TF step that covers the same segment as P-1 is much faster, about as fast as a "sieving" task would be.

2016-04-25, 05:05  #7  
Romulan Interpreter
Jun 2011
Thailand
2^{2}·2,341 Posts 
Quote:
A final bit of TF after the P-1 makes sense (sometimes even the final two bits!) because a P-1 with a 3-4% chance of finding a factor (one in 25-35 trials gives a factor) runs about 3 times as long (on a 580 or Titan) as a TF to 75 bits or so (which has a chance of one in 75 of finding a factor); therefore you can find factors faster running P-1 than running TF to the last bit. Of course, this calculation is coarse, and it does not fully account for the fact that if a factor is not found, the TF to the last bit will be done anyhow. Last fiddled with by LaurV on 2016-04-25 at 05:07 Reason: s/than/then/ and back 

2016-04-25, 05:24  #8  
Jun 2003
1324_{16} Posts 
Quote:
Rough outline of how it works. The B1 and B2 parameters control the sieving. You sieve with all primes and prime powers < B1. Instead of clearing a bitmap entry, you have a bytemap, and you add log(p) (scaled up to be an integer) to the bytemap entry. Finally, you add log(B2) to that (or rather, every entry in the bytemap is initialized to log(B2)), and check whether the total exceeds log(k). All of this can be done with approximate logs. 
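A sketch of that outline in Python (the function name and the fixed-point scale are my own; a real implementation would use a byte array and vectorized sieving). Each k accumulates the scaled log of every prime power <= B1 dividing it, starts with a log(B2) credit standing in for the one stage-2 prime, and "looks smooth" if the total reaches log(k). Since the logs are approximate, borderline survivors would need an exact recheck:

```python
from math import log

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def smooth_k_sieve(kmax, B1, B2):
    SCALE = 32                            # fixed-point scale for the logs
    lg = lambda x: int(SCALE * log(x))
    # every entry starts with the log(B2) credit for one stage-2 prime
    score = [lg(B2)] * (kmax + 1)
    for p in range(2, B1 + 1):
        if is_prime(p):
            pe = p
            while pe <= B1:               # primes and prime powers <= B1
                for k in range(pe, kmax + 1, pe):
                    score[k] += lg(p)     # k divisible by p^e gets e*log(p)
                pe *= p
    # k "looks smooth" when the accumulated logs cover log(k)
    return [k for k in range(2, kmax + 1) if score[k] >= lg(k)]

hits = smooth_k_sieve(3000, 29, 100)
print(232 in hits, 2026 in hits, 62 in hits)   # -> True False True
```

Note that, unlike clearing bits, the additions here touch every multiple of every prime power, which is exactly the cost trade-off being debated for GPUs.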

2016-04-25, 09:39  #9 
Romulan Interpreter
Jun 2011
Thailand
2^{2}×2,341 Posts 
MfaktX splits the k's into a number of modularity classes (either 420 or 4620, depending on settings), from which, depending on p's residue, most are eliminated, because the factor q=2*k*p+1 of m=2^p-1, with p prime, would either be non-prime or would be 3 or 5 (mod 8). The remaining classes (96 or 960, respectively, for the two cases) are sieved with a number of small primes, between a few thousand and a million primes or more, depending on settings. I mean, the q's are sieved. This way the exponentiation is only done for q's which have a chance of being prime. From this, we know how fast the sieve is on a GPU. What is proposed here is to repeat the same sieve for the k's themselves, in each class, with primes smaller than B1, and only do the exponentiation for what's left. I mean, avoid doing the exponentiation for smooth k's. Technically, the sieve computes for each small prime s the value x=-1/(2p) (mod s) and eliminates all k which are x (mod s). This way, all q=2kp+1=0 (mod s) are eliminated [from the factor-candidate list; they can't be factors]. One additional test could check whether k itself is 0 (mod s), and in that case the values of s for which this happens would have to be multiplied together (if s<B1) in a separate variable, one for each k. At the end, it would need to test the accumulated products to see if they are equal to k (i.e. whether k was B1-smooth). 
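The elimination step described above, x = -1/(2p) (mod s), can be sketched as follows. This is an illustrative Python version, not mfaktc/mfakto's actual class machinery, and `pow(a, -1, s)` is the Python 3.8+ built-in modular inverse:

```python
def tf_sieve(p, kmin, kmax, small_primes):
    # Keep only k for which q = 2*k*p + 1 is not divisible by any of the
    # small sieve primes.  For each small prime s, the k to eliminate
    # satisfy k = -1/(2p) (mod s), since then 2*k*p + 1 = 0 (mod s).
    keep = [True] * (kmax - kmin + 1)
    for s in small_primes:
        if (2 * p) % s == 0:
            continue                      # 2p has no inverse mod s
        x = (-pow(2 * p, -1, s)) % s      # k = x (mod s)  =>  s | q
        for i in range((x - kmin) % s, len(keep), s):
            keep[i] = False               # q is composite, skip it
    return [kmin + i for i, ok in enumerate(keep) if ok]

# p = 29: the known factor 233 = 2*4*29 + 1 of 2^29 - 1 has k = 4,
# and k = 4 survives the sieve, as it must.
survivors = tf_sieve(29, 1, 100, [3, 5, 7, 11, 13])
print(4 in survivors)   # -> True
```

The proposed smoothness test would bolt an extra check for k = 0 (mod s) onto the same inner loop, which is where the memory and bookkeeping concerns below come from.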
I see two problems. One is related to memory and speed: this is not clearing bits, like normal sieving, but multiplying values as large as 2^50 (when we go to 75-bit factors and have a 25-bit exponent p). Well, we could compute logarithms and add them instead of multiplying, but it is still a pain in the ass, and it needs about 50 times more memory. The second is related to powers of the small primes s: additional tests need to be done to check whether k was power-smooth, otherwise we will miss factors. (The fact that, for the P-1 algorithm to find a factor, that factor needs to be B1-powersmooth, and not only smooth, is usually missed by bystanders talking about it, so skipping all smooth numbers in the TF step could miss factors.)
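The smooth vs. power-smooth distinction warned about above, in a few lines (illustrative helper names, not from any of the programs discussed):

```python
def prime_factorization(n):
    # trial-division factorization, fine for small illustrative n
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def is_smooth(n, B):
    # every prime factor of n is <= B
    return all(p <= B for p in prime_factorization(n))

def is_powersmooth(n, B):
    # every prime-POWER factor p^e of n is <= B -- what P-1 stage 1 needs
    return all(p ** e <= B for p, e in prime_factorization(n).items())

# 3072 = 2^10 * 3 is 3-smooth, but 2^10 = 1024 > 100, so a P-1 run with
# B1 = 100 would NOT collect enough powers of 2 to find such a factor:
print(is_smooth(3072, 100), is_powersmooth(3072, 100))   # -> True False
```

So a TF sieve that skips every smooth k skips some candidates, like this one, that P-1 could never have found, which is precisely how factors would be lost.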
Last fiddled with by LaurV on 2016-04-25 at 09:47 