2005-07-27, 11:32   #1
garo

New ECM Effort Tables for Cunningham Composites Ready

Since the previous thread at http://www.mersenneforum.org/showthread.php?t=3998 meandered into a discussion of parallelization of the matrix step, I decided to start a new thread for this discussion. Those who get lost in my references to Bruce's posts should refer to that thread.

I'm almost ready to release the latest tables of ECM effort expended on Cunningham composites. I've made a few assumptions, though, and I would like people to verify that these are correct before I post the tables on the forum.

1. I've decided to convert the reported curves from actual curve counts to the proportion of the level completed. This was done to eliminate the confusion between Prime95 curves, ECM5.0.3 curves, and ECM6 curves. So now you will see figures like 0.57 instead of, say, 3450. (A small illustrative sketch follows point 2 below.)

2. In the tables I received from rogue, I have assumed that all base-2 curves are Prime95 curves, since rogue used George's tables for this and George is still using Prime95 curve equivalents. For all other bases I assumed the curve counts are ECM6, except for the few cases where Bruce's emails/posts make it clear these were ECM5 curves. The current tables use Prime95 counts, and there was a little bit of mixing between Prime95 and ECM6 counts on the other bases, but I have sorted it out conservatively. I assumed that Bruce's figures that rogue added to the non-base-2 tables were ECM6 and converted them accordingly. There will nevertheless be some undercounting/overcounting, but with over a thousand composites, striving for 100% accuracy is a tedious task and just not worth anyone's time.
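
To make points 1 and 2 concrete, here is a minimal sketch in Python of the bookkeeping I have in mind. It is not the script actually used to build the tables, and the expected-curve counts and client conversion factors in it are made-up placeholders, not the real values.

Code:
# A minimal sketch of the "proportion complete" bookkeeping.
# The expected-curve counts and the client conversion factors below are
# PLACEHOLDERS for illustration, not the values used for the real tables.

EXPECTED_CURVES = {"p45": 6050, "p50": 17000}                # hypothetical
CLIENT_FACTOR = {"ecm6": 1.0, "ecm5": 0.9, "prime95": 0.8}   # hypothetical

def proportion_complete(reports, level):
    """reports: list of (client, curves) pairs run at the given level."""
    done = sum(CLIENT_FACTOR[client] * curves for client, curves in reports)
    return min(done / EXPECTED_CURVES[level], 1.0)

# e.g. 3450 curves reported as GMP-ECM 6 curves at the p45 level -> 0.57
print(round(proportion_complete([("ecm6", 3450)], "p45"), 2))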

And as Bob Silverman reminded me here http://www.mersenneforum.org/showpos...3&postcount=32
Quote:
Originally Posted by Bob
If you read my paper with Sam Wagstaff Jr., you will discover that the response surface [probability of success with given limits and multiple curves] is VERY shallow in the neighborhood of the optimum.
So slight undercounts/overcounts do not matter. They are only significant when large efforts such as Bruce's are double-counted. But I think Rogue has done a good job of sorting that out!

3. My third point is a doubt that remains unresolved: how has Bruce Dodson computed curve equivalences? His emails/posts led me to believe he was implying that x curves at B1=11e6 are equivalent to x*(11e6/44e6) = 0.25x curves at B1=44e6. First, was he really implying this? I doubt that this is correct.

Secondly, he also seemed to say in one of his emails/posts that marking the p45 column as done caused him some confusion, as he assumed that 10,600 curves had been run for p45, which is equivalent to 2,650 curves at the p50 level. I have two questions here. Should 2,650 curves be added to the p50 column? I do not think so, as Paul Zimmermann's page clearly states that you do x curves at p40 and then y curves at p45, etc., which means that the optimal parameters require running the appropriate number of curves at EACH level, and curves at one level do not count against another level. There is a second possibility: Bruce was talking about curves in excess of those required for p45. That is, if p45 had 20,600 curves run instead of 10,600, then we could accordingly reduce the number of required curves at the p50 level by 2,650.

I realize that there is one exception to the rule above: running some curves at a high B1 greatly reduces the chance that a smaller factor remains undiscovered, which means that, say, 400 P95 curves at p60 make running any more curves at p40 pointless.

If my understanding of the situation is correct, I propose that we measure these curves in a conservative fashion. Rogue has done an excellent job figuring out Bruce's emails - which are sometimes confusing to me :). I propose that we mark a lower level as complete only if enough curves have been run at a higher B1 to make running curves at the lower level pointless. However, I would like to solicit opinions on whether this "pseudo-completion" should be signified with a 1.0 or "done", or with "NR = Not Required". Similarly, curves at a lower level count towards a higher level only if we have CLEAR evidence that more than the required number of curves have been run; only then will the additional curves be translated and added to the next higher level.

I hope this clears everything up. Please do post your comments, particularly Bob, Bruce, Rogue and George.

2005-07-27, 11:57   #2
R.D. Silverman

Quote:
Originally Posted by garo
Secondly, he also seemed to say in one of his emails/posts that marking the p45 column as done caused him some confusion, as he assumed that 10,600 curves had been run for p45, which is equivalent to 2,650 curves at the p50 level. I have two questions here. Should 2,650 curves be added to the p50 column? I do not think so, as Paul Zimmermann's page clearly states that you do x curves at p40 and then y curves at p45, etc., which means that the optimal parameters require running the appropriate number of curves at EACH level, and curves at one level do not count against another level.

Yes. I am in strong agreement with this. If one runs curves at p50, they should not be equated to an equivalent number of p45 curves and those counts then added to the p45 column as well.


Quote:
Originally Posted by garo
If my understanding of the situation is correct, I propose that we measure these curves in a conservative fashion. Rogue has done an excellent job figuring out Bruce's emails - which are sometimes confusing to me :). I propose that we mark a lower level as complete only if enough curves have been run at a higher B1 to make running curves at the lower level pointless. However, I would like to solicit opinions on whether this "pseudo-completion" should be signified with a 1.0 or "done", or with "NR = Not Required".

I like the "NR" option.

See above.

Bob

BTW, not to criticize Bruce, but I also find his emails confusing..
This was a source of dispute some time ago when I may (probably did)
have misinterpreted some of his data.

Another possible way to combine counts at multiple levels would be to report the "probability" of an undiscovered factor remaining at that level. Of course, this probability is either 0 or 1; what it really means is the probability that ECM has failed to find such a factor, if it exists. Under this interpretation, a p50 curve WOULD also add to the p45 level.

I like this interpretation best of all, but it does mean having software that can estimate Dickman's function fairly accurately.

Note that if we run the optimal number of curves for a given level, the chance of failure is still 1/e (i.e., the chance of success is only 1 - 1/e).

2005-07-27, 12:33   #3
garo

I think that is an excellent idea. However, I wonder if most readers of the forum would read the fine print: when they see (1-1/e) in the "conditional probability of finding a factor" column, they will most likely want to run further curves at that level.

I do not have a program to compute the Dickman function either, though I believe that ECM6 incorporates much of the code required. I wonder if akruppa would wander in and offer comments.

Let us go with the proportion-complete idea for this iteration and move to the conditional probability in the next one. Hopefully, we'll have the Dickman-function code by then. I just don't want to delay releasing the current set any further.

2005-07-27, 12:35   #4
akruppa

When I proposed the "percent complete instead of curve count" idea, I thought of keeping track of the probability that a factor of a given size, assuming it exists, was missed by the ECM work done. That is, "pxx n% complete" means that the probability of missing a pxx is exp(-n/100).
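
As a quick numerical illustration of this convention (the percentages are arbitrary examples):

Code:
import math

# "n% complete" at a level means the probability that a factor of that
# size, if it exists, was missed is exp(-n/100).
for pct in (25, 57, 100, 150):
    print(pct, "% complete -> P(miss) =", round(math.exp(-pct / 100.0), 3))

# At exactly 100% (the "expected" number of curves) the miss probability
# is exp(-1), about 0.368, i.e. the chance of success is only about 0.632.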

This means that, strictly speaking, all curves at all B1,B2 settings count towards all factor sizes. However, the effect of curves with low B1,B2 bounds on large factors is very small indeed, so for the sake of simpler bookkeeping I'd suggest counting towards a factor-size level only those curves that were run with bounds optimised for that level, for the level 'one smaller', or for higher levels.

Strictly speaking, we shouldn't stop at 100% either: doing more curves will lower the probability of missing a factor below exp(-1). However, if we use the work-done tables only to decide which bounds to choose next, stopping at 100% (or 1.0) is all right. It would be different if we wanted to use the data, e.g., to compute the exact probability that a curve with given parameters finds any factor, as that would require knowing the distribution of candidate factors over all the factor sizes.

The curve counts on PaulZ's table list only the expected number of curves at *one* B1 value to find a factor of the given size. They do not account for work that was (and maybe wasn't!) done at lower levels.

>I like this interpretation best of all, but it does mean having software that
>can estimate Dickman's function fairly accurately.

The code in GMP-ECM 6 is pretty accurate for the values of alpha we need. For larger alpha (>7 or so, as far as I recall) the results drift off due to accumulated rounding errors. For all probability estimates we need in practice, the contribution of these larger alpha values is negligible.

Alex

2005-07-27, 13:08   #5
garo

Thanks Alex. I am not familiar with the GMP-ECM code. Is there a ready interface for calculating these probabilities? If not, can you let me know where I can find the relevant code - a filename and/or function name would suffice; a line number would be even better.

Also, can you explain to me what alpha is?

PS: I still think I will go with "percent complete", since 100% is more "user-friendly" than 0.368. And you are right that, to keep the bookkeeping simple, we should ignore the impact of levels below "one smaller".

I will probably write a script to compute probabilities across the board for the next release, but not this one.

2005-07-27, 13:27   #6
akruppa

All the code for the probabilities is in rho.c. It is more or less a port of the rho.gp Pari/GP script. The C code is not documented at all, the script somewhat. It's not hard to use anyway: you call rho_init() with the step size and the maximum alpha value you want, then you can use ecmprob().

The Dickman rho(alpha) function is defined as the proportion of positive integers <= N that are N^(1/alpha)-smooth, as N -> oo. So rho(2) is the probability that an integer n chosen uniformly at random has no prime factor > sqrt(n).
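
For anyone curious, here is a minimal Python sketch of that function. It is not the rho.c or rho.gp code; it just tabulates rho by integrating the delay differential equation u*rho'(u) = -rho(u-1), with rho(u) = 1 for 0 <= u <= 1.

Code:
def dickman_rho(u_max, h=1e-4):
    """Tabulate Dickman's rho on a grid of step h up to u_max by
    integrating u * rho'(u) = -rho(u - 1), with rho = 1 on [0, 1]."""
    n = int(round(u_max / h))
    per_unit = int(round(1.0 / h))
    rho = [1.0] * (n + 1)
    for i in range(per_unit + 1, n + 1):
        u = i * h
        # trapezoid rule for rho'(t) = -rho(t - 1)/t over [u - h, u]
        f_prev = rho[i - 1 - per_unit] / (u - h)
        f_curr = rho[i - per_unit] / u
        rho[i] = rho[i - 1] - 0.5 * h * (f_prev + f_curr)
    return rho

h = 1e-4
rho = dickman_rho(10.0, h)
print(rho[int(round(2.0 / h))])   # ~0.3069, matching the exact value 1 - ln 2
print(rho[int(round(3.0 / h))])   # ~0.0486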

Alex


2005-07-27, 14:42   #7
R.D. Silverman

Quote:
Originally Posted by akruppa
All the code for the probabilities is in rho.c. It is more or less a port of the rho.gp Pari/GP script. The C code is not documented at all, the script somewhat. It's not hard to use anyway: you call rho_init() with the step size and the maximum alpha value you want, then you can use ecmprob().

The Dickman rho(alpha) function is defined as the proportion of positive integers <= N that are N^(1/alpha)-smooth, as N -> oo. So rho(2) is the probability that an integer n chosen uniformly at random has no prime factor > sqrt(n).

Alex
Except that for 2-step ECM, we want the rho-2 function. This is the probability that a number has all of its prime factors less than B1, except perhaps for a single factor between B1 and B2.

I used this function extensively in my paper, but I no longer have the code.
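
For illustration only (this is not Bob's code): under the usual random-integer heuristic, that function can be written as rho(u) + the integral from v to u of rho(u - u/t) dt/t, where the number is around x, B1 = x^(1/u) and B2 = x^(1/v) with 1 <= v < u. A sketch that reuses the rho table from the post above:

Code:
import math

def semismooth_prob(u, v, rho, h=1e-4, steps=2000):
    """P(all prime factors of a number near x are <= x^(1/u), except
    possibly one <= x^(1/v)), for 1 <= v < u, computed as
    rho(u) + integral_v^u rho(u - u/t) dt/t.
    `rho` is the step-h table from the dickman_rho() sketch above."""
    def rho_at(x):
        return rho[int(round(x / h))]
    dt = (u - v) / steps
    total = 0.0
    for i in range(steps + 1):
        t = v + i * dt
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid weights
        total += w * rho_at(u - u / t) / t
    return rho_at(u) + total * dt

# e.g. x ~ 10^40 with B1 = 3e6 and B2 = 3e8 (illustrative bounds only)
logx = 40 * math.log(10)
u, v = logx / math.log(3e6), logx / math.log(3e8)
print(semismooth_prob(u, v, rho))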

2005-07-27, 14:58   #8
akruppa

My code does that. It computes a table of the values of rho(alpha), then integrates rho(alpha)/p over the stage-2 primes (I have forgotten the details by now; I could look them up - it's basically the mu(alpha,beta) function). It also computes the effect of the Brent-Suyama extension by integrating over primes p > B2, taking into account the probability that p divides f(a)-f(b) for k*dF^2 distinct (a,b) pairs, where f() is an S-th power or a degree-S Dickson polynomial. It assumes an "extra smoothness" of 23.4 (24 in version 6.0).

I believe all the major effects on the ECM probability of success are covered, and the expected numbers of curves needed to find factors of a given size matched experimental counts with sub-percent relative error. I'm pretty sure the values printed by GMP-ECM are considerably more accurate than any found in the existing literature.

Alex


2005-08-05, 13:09   #9
akruppa

To lend a bit of support to my claim:

I ran the primes p in [N-1000000, N+1000000], N = floor(10^19.5), separated into a file with p==1 (mod 6) and one with p==5 (mod 6), through gmp-ecm 50 times. The 1 (mod 6) file contains 22386 primes, the 5 (mod 6) file contains 22310 primes.

Using parameters B1=11000, B2=1580670, k=2, dF=288, Dickson_6, the 50 runs found 17074 of the p==1 (mod 6) and 15394 of the p==5 (mod 6) primes. This is a ratio of one prime found in 50*22386/17074 = 65.56 and 50*22310/15394 = 72.46 attempts, respectively. The ratio for primes of both residue classes combined is (50*22386+50*22310)/(17074+15394) = 68.83.

The values predicted by rho.gp, using the average exponents of 2 and 3 in the group order given in Montgomery's dissertation for p==1 (mod 6) and p==5 (mod 6) for curves with torsion group of order 12, are 65.54 for primes p==1 (mod 6) and 72.15 for p==5 (mod 6). The combined value for primes of both residue classes is 68.60. The relative errors are -0.02%, -0.44% and -0.33%, respectively. The latter figure is well within the standard deviation of a Poisson process with 32468 events.
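
For reference, the observed ratios quoted above follow directly from the raw counts (a quick check in Python):

Code:
found_1mod6, total_1mod6 = 17074, 22386
found_5mod6, total_5mod6 = 15394, 22310
runs = 50

print(runs * total_1mod6 / found_1mod6)     # ~65.56 attempts per prime found
print(runs * total_5mod6 / found_5mod6)     # ~72.46
print(runs * (total_1mod6 + total_5mod6)
      / (found_1mod6 + found_5mod6))        # ~68.83 for both classes combined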

There is one curious thing I noticed along the way:

Montgomery gives the avg. exponent (table 6.3.1, p. 76) of 2 as 3.68 and of 3 as 1.87 for the p==1 (mod 6) case. Now 2^3.68 * 3^1.87 = 100.0016, which is almost exactly 100. For the p==5 (mod 6) case, we have avg. exponents of 3.69 and 1.50 respectively, and 2^3.69 * 3^1.50 = 67.06, which is close to 66.666...

Are these values meant to be 100 and 200/3 exactly? If so, where do these constants come from?


Alex

2005-08-22, 03:09   #10
AWalker

Alex,

I'd be very interested to see some similar figures for p+1 and p-1 factoring using the default B2 values, say for B1 = 10^6 and 10^7. The implementation of these in the current program is a lot faster than any others I have seen before, so I'm giving p+1 a go over my n!+-1 tables. Another user had some very impressive results with p-1!

Andrew

2005-08-23, 21:15   #11
akruppa

I've prepared estimates for P-1, but the code is not finished yet and won't make it into 6.1. If your prime factors p of N are such that a small prime or prime power q^k | p-1 with probability 1/((q-1)*q^(k-1)) (that is to say, if p behaves as if chosen uniformly at random from the primes), the "extra smoothness" for P-1 is 3.41 (instead of 23.4 for ECM).

You could use the rho.gp script and replace the 23.4 (or 24, which was the old value I erroneously used at first) in ecmprob() with 3.41, and the results should be relatively accurate.
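
As a very rough, stage-1-only illustration of how the "extra smoothness" constant enters (this is not what ecmprob() computes - that also models stage 2 and the Brent-Suyama extension - and it reuses the rho table from the Dickman-rho sketch earlier in the thread):

Code:
import math

def stage1_prob(digits, B1, extra_smoothness, rho, h=1e-4):
    """Crude stage-1-only estimate: treat the group order (ECM) or p-1
    (P-1) of a prime p with the given number of digits as a random
    integer of size p / extra_smoothness and ask how often it is
    B1-smooth.  `rho` is the table from the dickman_rho() sketch."""
    alpha = (digits * math.log(10) - math.log(extra_smoothness)) / math.log(B1)
    return rho[int(round(alpha / h))]

# a 35-digit factor with B1 = 1e6 (illustrative numbers only)
print(stage1_prob(35, 1e6, 23.4, rho))   # one ECM curve, stage 1 only
print(stage1_prob(35, 1e6, 3.41, rho))   # one P-1 attempt, stage 1 only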

If the prime factors p have a known divisor in p-1 or there are known quadratic residues (mod p), the probabilities change a bit. That will be in the code later on, but as I said it isn't quite ready yet.

Edit: I forgot about P+1. The problem with the P+1 algorithm is that you don't know whether your group order will be p-1 or p+1. If the order is p+1, the analysis is very similar to the P-1 case and you can assume an "extra smoothness" of 3.41. If you assume that you get order p+1 with 50% probability (a reasonable assumption for most numbers) and you did P-1 with at least as high bounds before, the probability for P+1 is 50% of that for P-1, because with 50% probability you'll merely repeat the P-1 run, which you already know finds nothing. On the whole I think P+1 is only really useful for Fibonacci and Lucas numbers. For other numbers, the probability per unit of CPU time for P+1 is no better (or only marginally better) than for ECM with identical B1,B2 values.

Alex
