2005-07-27, 11:32  #1
Aug 2002
Termonfeckin, IE
5×19×29 Posts 
New ECM Effort Tables for Cunningham Composites Ready
Since the previous thread at
http://www.mersenneforum.org/showthread.php?t=3998 meandered into a discussion of parallelization of the matrix step, I decided to start a new thread for this discussion. Those who get lost in my references to Bruce's posts are asked to refer to that thread.

I'm almost ready to release the latest tables of ECM effort expended on Cunningham composites. I've made a few assumptions, though, and I would like people to verify that these are correct before I post the tables on the forum.

1. I've decided to convert the reported curves from the actual number of curves to the proportion done. This eliminates the confusion between Prime95 curves, ECM 5.0.3 curves and ECM6 curves. So now you will see figures like 0.57 instead of, say, 3450.

2. In the tables I received from rogue, I have assumed that all base-2 curves are Prime95 curves, since rogue used George's tables for these and George is still using Prime95 curve equivalents. For all other bases I assumed the curve counts are ECM6, except for the few cases where Bruce's emails/posts make it clear they were ECM5 curves. The current tables use Prime95 counts, and there was a little mixing between Prime95 and ECM6 counts on the other bases, but I have sorted it out conservatively. I assumed that Bruce's figures that rogue added to the non-base-2 tables were ECM6 and converted them accordingly. There will nevertheless be some undercounting/overcounting, but with over a thousand composites it is a tedious task and just not worth anyone's time to strive for 100% accuracy. And as Bob Silverman reminded me here http://www.mersenneforum.org/showpos...3&postcount=32

Quote:
"... is only significant when large efforts such as those of Bruce are double-counted."

But I think rogue has done a good job of sorting that out!

3. My third point, a doubt that remains unresolved, is how Bruce Dodson has computed curve equivalences. His emails/posts led me to believe he was implying that x curves at B1=11e6 are equivalent to x*(11e6/44e6) = 0.25x curves at B1=44e6. First, was he really implying this? I doubt that this is correct. Secondly, he also seemed to say in one of his emails/posts that marking the p45 column as done caused him some confusion, as he assumed that 10600 curves had been run for p45, which is equivalent to 2650 curves at the p50 level.

I have two questions here. Should 2650 curves be added to the p50 column? I do not think so, since Paul Zimmermann's page clearly states that you do x curves at p40, then y curves at p45, etc. That is, optimal parameters require running the appropriate number of curves at EACH level, and curves at one level do not count against another level. There is a second possibility: Bruce may have been talking about curves in excess of those required for p45. That is, if p45 had 20,600 curves run instead of 10,600, we could accordingly reduce the number of curves required at the p50 level by 2650.

I realize there is one exception to the rule above: running some curves at a high B1 level makes the chance of finding a smaller factor much smaller, which means that, say, 400 P95 curves at p60 make running any more curves at p40 pointless. If my understanding of the situation is correct, I propose that we account for these curves in a conservative fashion. rogue has done an excellent job of figuring out Bruce's emails, which are sometimes confusing to me :). I propose that only if enough curves have been run at a higher B1 to make running curves at the lower level pointless do we mark the lower level as complete.
However, I would like to elicit opinions on whether this "pseudo-completion" should be signified with a 1.0/"done" or with "NR = Not Required". Similarly, curves at a lower level count towards a higher level only if we have CLEAR evidence that more than the required number of curves have been run; only then will the additional curves be translated and added to the next higher level. This should clear everything up. Please do post your comments, particularly Bob, Bruce, rogue and George.
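Point 1 above (reporting the proportion done instead of raw curve counts) amounts to a simple normalisation. A minimal sketch; the expected-curve counts here are illustrative placeholders, NOT the authoritative numbers from Paul Zimmermann's table:

```python
# Sketch: convert raw curve counts into a "proportion done" figure.
# The optimal curve counts below are placeholders, not authoritative values.
EXPECTED_CURVES = {
    40: 2350,  # placeholder: optimal curve count at the p40 level
    45: 4480,  # placeholder: optimal curve count at the p45 level
    50: 7553,  # placeholder: optimal curve count at the p50 level
}

def proportion_complete(level, curves_run):
    """Fraction of the optimal curve count done at this level, capped at 1.0."""
    return min(curves_run / EXPECTED_CURVES[level], 1.0)

print(proportion_complete(45, 2554))  # something like 0.57
```

This sidesteps the Prime95 vs. ECM5 vs. ECM6 confusion, provided each raw count is first converted to the curve flavour the table of expected counts assumes.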

2005-07-27, 11:57  #2
Nov 2003
2^{2}×5×373 Posts 
Quote:
"... Please do post your comments, particularly Bob ..."

BTW, not to criticize Bruce, but I also find his emails confusing. This was a source of dispute some time ago, when I may have misinterpreted (and probably did misinterpret) some of his data.

Another possible way to combine counts at multiple levels would be to report the "probability" of an undiscovered factor remaining at that level. Of course, this probability is really either 0 or 1; what it means in practice is the probability that ECM has failed to find such a factor, if it exists. Under this interpretation, a p50 curve WOULD also add to the p45 level. I like this interpretation best of all, but it does mean having software that can estimate Dickman's function fairly accurately. Note that if we run the optimal number of curves for a given level, the chance of failure is still 1/e.
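This "probability that ECM failed to find a factor" bookkeeping can be sketched as follows. The per-curve success probabilities are hypothetical placeholders; real values would come from a Dickman-rho computation such as GMP-ECM's rho code:

```python
import math

# Sketch of "probability that ECM missed a factor of a given size, if one
# exists". Per-curve success probabilities are placeholders.
def miss_probability(work):
    """work: list of (curves_run, per_curve_success_probability) pairs.
    Under this interpretation, curves at EVERY B1 level contribute."""
    log_miss = sum(n * math.log1p(-p) for n, p in work)
    return math.exp(log_miss)

# Running the expected number of curves (n = 1/p) leaves a miss probability
# of roughly 1/e ~= 0.368:
print(miss_probability([(4480, 1 / 4480)]))
```

The log1p form avoids losing precision when the per-curve probability is tiny, as it is at the larger factor levels.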

2005-07-27, 12:33  #3
Aug 2002
Termonfeckin, IE
AC3_{16} Posts 
I think that is an excellent idea. However, I wonder whether most readers of the forum would read the fine print. When they see 1 - 1/e in the conditional-probability-of-finding-a-factor column, they will most likely want to do further curves at that level.

I do not have a program to compute the Dickman function either, though I believe that ECM6 incorporates much of the code required. I wonder if akruppa would wander in and offer comments. I think we should go with the proportion-complete idea for this iteration and proceed to the conditional probability in the next one. Hopefully we'll have the Dickman function code by then. I just don't want to delay releasing the current set any more.
2005-07-27, 12:35  #4
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
When I proposed the "percent complete instead of curve count" idea, I thought of keeping track of the probability that a factor of a given size, assuming it exists, was missed by the ECM work done. I.e., "pxx n% complete" means that the probability of missing a pxx is exp(-n/100).

This means that, strictly speaking, all curves at all B1,B2 settings count towards all factor sizes. However, the effect of curves with low B1,B2 bounds on large factors is very small indeed, so for the sake of simplifying bookkeeping I'd suggest counting towards a factor size level only those curves that were run with bounds optimised for that level, the level 'one smaller', or higher levels.

Strictly speaking, we shouldn't stop at 100% either... doing more curves will lower the probability of missing a factor below exp(-1). However, if we use the work-done tables only to decide which bounds to choose next, stopping at 100% (or 1.0) is alright. It would be different if we wanted to use the data, e.g., to compute the exact probability that a curve with given parameters finds any factor, as that would require knowing the distribution of candidate factors over all the factor sizes.

The curve counts in PaulZ's table list only the expected number of curves at *one* B1 value to find a factor of the given size. They do not account for work that was (and maybe wasn't!) done at lower levels.

>I like this interpretation best of all, but it does mean having software that
>can estimate Dickman's function fairly accurately.

The code in GMP-ECM 6 is pretty accurate for the values of alpha we need. For larger alpha (>7 or so, afair) the results drift off due to accumulated rounding errors. For all probability estimates we need in practice, the contribution of these larger alpha is negligible.

Alex
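Combining work at several levels into one percent figure under this convention could look like the sketch below. The per-curve success probabilities are again placeholders (GMP-ECM's rho code would supply real ones):

```python
import math

# Sketch of the convention: "pXX n% complete" means the probability of
# missing a pXX factor is exp(-n/100). Per-curve probabilities are placeholders.
def percent_complete(work):
    """work: (curves, per-curve success probability) pairs, restricted per the
    bookkeeping rule to this level, the level 'one smaller', and higher."""
    log_miss = sum(n * math.log1p(-p) for n, p in work)
    return -100.0 * log_miss

# The optimal curve count alone gives ~100% (miss probability ~ exp(-1));
# extra curves push past 100%, illustrating why 100% isn't a hard ceiling.
print(round(percent_complete([(4480, 1 / 4480)])))
print(percent_complete([(4480, 1 / 4480), (2000, 1 / 4480)]))
```

Capping the displayed figure at 100% is then purely a presentation choice for deciding which bounds to run next.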
2005-07-27, 13:08  #5
Aug 2002
Termonfeckin, IE
5×19×29 Posts 
Thanks Alex. I am not familiar with the GMP-ECM code. Is there a ready interface for calculating these probabilities? If not, can you let me know where I can find the relevant code? A filename and/or function name would suffice; a line number would be even better.

Also, can you explain to me what alpha is?

PS: I still think I will go with "percent complete", since 100% is more "user-friendly" than 0.368. And you are right that to keep the bookkeeping simple we should ignore the impact of levels below 'one smaller'. I will probably write a script to compute probabilities across the board for the next release, but not this one.
2005-07-27, 13:27  #6
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
All the code for the probabilities is in rho.c. It is more or less a port of the rho.gp Pari/GP script. The C code is not documented at all, the script somewhat. It's not hard to use anyway: you call rho_init() with the step size and the maximal alpha value you want, then you can use ecmprob().

The Dickman rho(alpha) function is defined as the ratio of positive integers <= N that are N^(1/alpha)-smooth, as N -> oo. So rho(2) is the probability that an integer n chosen uniformly at random has no prime factor > sqrt(n).

Alex

Last fiddled with by akruppa on 2005-07-27 at 13:28
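For anyone wanting a self-contained starting point rather than rho.c, rho can be tabulated crudely from its delay differential equation u * rho'(u) = -rho(u - 1), with rho(u) = 1 on [0, 1]. A rough sketch, far less refined than the GMP-ECM code:

```python
import math

# Crude tabulation of Dickman's rho via the delay differential equation
#   u * rho'(u) = -rho(u - 1),   rho(u) = 1 for 0 <= u <= 1,
# integrated with the trapezoid rule on a uniform grid of ppu points per unit.
def dickman_rho_table(alpha_max, ppu=10000):
    """Return rho sampled at u = i/ppu for i = 0 .. alpha_max*ppu."""
    n = int(alpha_max * ppu)
    rho = [1.0] * (n + 1)            # rho = 1 on [0, 1]
    for i in range(ppu + 1, n + 1):
        u = i / ppu
        j = i - ppu                  # grid index of u - 1
        f_prev = rho[j - 1] / (u - 1 / ppu)
        f_cur = rho[j] / u
        rho[i] = rho[i - 1] - (f_prev + f_cur) / (2 * ppu)
    return rho

rho = dickman_rho_table(3)
# rho(2) should be 1 - ln 2 ~= 0.3069: the chance that a random integer n
# has no prime factor exceeding sqrt(n).
print(round(rho[2 * 10000], 4))
```

On [1, 2] the exact value is rho(u) = 1 - ln(u), which gives an easy sanity check on the integrator.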
2005-07-27, 14:42  #7
Nov 2003
2^{2}×5×373 Posts 
Quote:
the probability that a number has all of its factors less than B1, except perhaps for a single factor between B1 and B2. I used this function extensively in my paper, but I no longer have the code. 

2005-07-27, 14:58  #8
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
My code does that. It computes a table of the values of rho(alpha), then integrates rho(alpha)/p over the stage-2 primes (I forget the details by now, I could look them up; it's basically the mu(alpha,beta) function). It also computes the effect of the Brent-Suyama extension by integrating over primes p > B2, taking into account the probability that p divides f(a)-f(b) for k*dF^2 distinct (a,b) pairs, where f() is an S-th power or a degree-S Dickson polynomial. It assumes an "extra smoothness" of 23.4 (24 in version 6.0).

I believe all major effects that influence the ECM probability of success are covered, and the expected values for the number of curves needed to find factors of a given size matched experimental counts with sub-percent relative error. I'm pretty sure the values printed by GMP-ECM are considerably more accurate than any found in the existing literature.

Alex

Last fiddled with by akruppa on 2005-07-27 at 15:02
2005-08-05, 13:09  #9
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
To lend a bit of support to my claim:

I ran primes p in [N-1000000, N+1000000], N = floor(10^19.5), separated into a file with p==1 (mod 6) and one with p==5 (mod 6), through gmp-ecm 50 times. The 1 (mod 6) file contains 22386 primes, the 5 (mod 6) file contains 22310 primes. Using parameters B1=11000, B2=1580670, k=2, dF=288, Dickson_6, the 50 runs found 17074 of the p==1 (mod 6) and 15394 of the p==5 (mod 6) primes. This is a ratio of one prime found per 50*22386/17074 = 65.56 and 50*22310/15394 = 72.46 attempts, respectively. The ratio for primes of both residue classes combined is (50*22386+50*22310)/(17074+15394) = 68.83.

The values reported by rho.gp, using the average exponents of 2 and 3 in the group order given in Montgomery's dissertation for p==1 (mod 6) and p==5 (mod 6) for curves with torsion group of order 12, are 65.54 for primes p==1 (mod 6) and 72.15 for p==5 (mod 6). The combined value for primes of all residue classes is 68.60. The relative errors are 0.02%, 0.44% and 0.33%, respectively. The latter figure is well within the standard deviation of a Poisson process with 32468 events.

There is one curious thing I noticed along the way: Montgomery gives the avg. exponent (table 6.3.1, p. 76) of 2 as 3.68 and of 3 as 1.87 for the p==1 (mod 6) case. Now 2^3.68*3^1.87 = 100.0016 is almost exactly 100. For the p==5 (mod 6) case, we have avg. exponents 3.69 and 1.50 resp., and 2^3.69*3^1.50 = 67.06, which is close to 66.666... Are these values meant to be 100 and 200/3 exactly? If so, where do these constants come from?

Alex
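The arithmetic behind the "curious thing" is easy to double-check:

```python
# Double-checking the products of Montgomery's average exponents quoted above.
x1 = 2 ** 3.68 * 3 ** 1.87   # p == 1 (mod 6) case; strikingly close to 100
x5 = 2 ** 3.69 * 3 ** 1.50   # p == 5 (mod 6) case; not far from 200/3 = 66.67
print(x1)  # ~ 100.0016
print(x5)  # ~ 67.063
```

The p==1 (mod 6) product is within 0.002% of 100, while the p==5 (mod 6) product misses 200/3 by about 0.6%, so only the first coincidence looks exact to the quoted precision.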
2005-08-22, 03:09  #10
Apr 2004
9_{16} Posts 
Alex,

I'd be very interested to see some similar figures for P+1 and P-1 factoring using the default B2 values, say for B1=10^6 and 10^7. The implementation of these in the current program is a lot faster than any others I have seen before, so I'm giving P+1 a go over my n!+1 tables. Another user had some very impressive results with P-1!

Andrew
2005-08-23, 21:15  #11
"Nancy"
Aug 2002
Alexandria
9A3_{16} Posts 
I've prepared estimates for P-1, but the code is not finished yet and won't make it into 6.1. If the prime factors p of N are such that a small prime or prime power q^k divides p-1 with probability 1/((q-1)q^(k-1)) (that is to say, if p behaves as if chosen uniformly at random from the primes), the "extra smoothness" for P-1 is 3.41 (instead of 23.4 for ECM).

You could use the rho.gp script and substitute 3.41 for the 23.4 (or 24, which was the old value I erroneously used at first) in ecmprob(), and the results should be reasonably accurate. If the prime factors p have a known divisor in p-1, or there are known quadratic residues (mod p), the probabilities change a bit. That will be in the code later on, but as I said, it isn't quite ready yet.

Edit: I forgot about P+1. The problem with the P+1 algorithm is that you don't know whether your group order will be p-1 or p+1. If the order is p+1, the analysis is very similar to the P-1 case and you can assume an "extra smoothness" of 3.41. If you assume that you get order p+1 with 50% probability (a reasonable assumption for most numbers) and you did P-1 with at least as high bounds before, the success probability for P+1 is 50% that of P-1, because with 50% probability you merely repeat the P-1 attempt, which you know finds nothing.

On the whole, I think P+1 is only really useful for Fibonacci and Lucas numbers. For other numbers, the probability of success per unit of CPU time for P+1 is no better (or only marginally better) than for ECM with identical B1,B2 values.

Alex

Last fiddled with by akruppa on 2005-08-23 at 21:35
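The divisibility heuristic for random primes can be spot-checked numerically. By Dirichlet's theorem, q^k | p-1 exactly when p == 1 (mod q^k), which happens with density 1/phi(q^k) = 1/((q-1)q^(k-1)); a small sketch:

```python
# Spot check: among primes p, the density with q^k | p - 1 (i.e. p == 1
# mod q^k) should be 1/phi(q^k). Here q = 3, k = 2, so 1/phi(9) = 1/6.
def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

primes = primes_up_to(10 ** 6)
q, k = 3, 2
density = sum((p - 1) % q ** k == 0 for p in primes) / len(primes)
print(round(density, 3))  # close to 1/6 ~= 0.167
```

Note this is larger than the 1/q^k one gets for random integers; that excess divisibility of p-1 by small primes is exactly where the "extra smoothness" constant comes from.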