20071127, 20:55  #1 
Nov 2007
Halifax, Nova Scotia
2^{3}×7 Posts 
ECM Runtime and F20
Hello everyone,
I am wondering how the runtime of a single curve in the ECM scales in practice with N, B1, and B2. I know from Silverman and Wagstaff's 1993 paper that the total runtime scales for constant factor size with the time for a single multiplication modulo N. For large N using FFT etc. multiplication I suppose this means the runtime of a single should go with log_2(N) for fixed B1 and B2. For fixed N, runtime of a single curve seems to scale linearly with B1, which seems reasonable from my understanding of the algorithm. Also, as long as a reasonable size is taken for B2 the runtime of 1 curve seems to be roughly independent of B2, with some optimal value of B2 as given in Zimmerman's table. Have I got these three scalings roughly correct? Finally one more question: what are some runtimes for single curves on the Fermat numbers? In particular I am wondering how long curves with B1=10^6 and B1=3*10^6 take on F20 (and what I should expect for 11*10^6). Answers to any of these questions would be greatly appreciated, Thanks, Doug 
20071128, 00:02  #2  
Nov 2007
Halifax, Nova Scotia
2^{3}·7 Posts 
I should have mentioned that the thread "Factoring F14" thread provides a partial answer to my performance on Fermat numbers question:
Quote:
Has anyone in particular ran curves on F20? 

20071128, 05:52  #3 
"Phil"
Sep 2002
Tracktown, U.S.A.
3·373 Posts 
[Fri Sep 29 10:45:27 2006]
2^1048576+1 completed 5 ECM curves, B1=1000000, B2=100000000 These 5 curves took about 10 days on an Athlon system, 1400 MHz. My guess is that they would have taken about 1/3 of the time on a 3000 MHz Pentium processor. Your scaling assumption sounds roughly correct, but from F14 to around F17, running time seems to more than double, sometimes tripling even, as you double the exponent. I don't know why, but I'm guessing that it may have something to do with cache utilization. Does anyone else have an opinion on this, or some hard data? Once we get to large numbers, the doubling approximation seems to hold, at least for stage 1, although stage 2 sometimes takes a hit because of main memory limitations. 
20071128, 13:18  #4 
Einyen
Dec 2003
Denmark
3313_{10} Posts 
I did a small speedtest 2 years ago: http://www.mersenneforum.org/showthread.php?t=4223
Original data: ECM.txt Data in Excel where I did regression analysis: ECM.xls Regression formulas: ECMregression.txt step1: When B1 is constant: time=constant*digits^{1.7} When n is constant: time=constant*B1 Step1 conclusion: time = constant*B1*digits^{1.7} step2: When B2 is constant: time=constant*digits^{1.2} When n is constant: time=constant*B2^{0.6} step2 conclusion: time=constant*B2^{0.6}*digits^{1.2} Offcourse this will hold less and less when you get to very high B1/B2 or n when memory becomes a problem. 
20071128, 14:09  #5  
Oct 2004
Austria
9B2_{16} Posts 
Quote:
Last fiddled with by Andi47 on 20071128 at 14:09 

20071128, 16:00  #6 
Nov 2003
2^{2}·5·373 Posts 

20071129, 01:29  #7 
Nov 2007
Halifax, Nova Scotia
111000_{2} Posts 
Thanks everyone for your replies and comments. This gives me a good feeling for the subject. I will run some of my own calculations on F20,
as well as (properly) (re)read Silverman and Wagstaff's paper, and share any insight I garnish, if it is anything beyond what you all have said here. 
20071202, 03:00  #8 
Einyen
Dec 2003
Denmark
110011110001_{2} Posts 
I did some new tests of ECM, p1 and p+1 with GMPECM 6.1.3 on GMP 4.2.2 on 3 different computers: p4 Northwood 2.4 Ghz 1Gb ram, p4 Prescott 3.4 Ghz 2Gb ram, Athlon XP 2200+ 1.80 Ghz 1Gb ram.
ECM: athlonECM.xls athlonECMregression.txt p4nECM.xls p4nECMregression.txt p4pECM.xls p4pECMregression.txt P1: athlonP1.xls athlonP1regression.txt p4nP1.xls p4nP1regression.txt p4pP1.xls p4pP1regression.txt P+1: athlonP+1.xls athlonP+1regression.txt p4nP+1.xls p4nP+1regression.txt p4pP+1.xls p4pP+1regression.txt ECM: Ath: step1: time=constant*B1*digits^{1.63} step2: time=constant*B2^{0.63}*digits^{1.3} P4N: step1: time=constant*B1*digits^{1.75} step2: time=constant*B2^{0.63}*digits^{1.3} P4P: step1: time=constant*B1*digits^{1.71} step2: time=constant*B2^{0.63}*digits^{1.4} P1: Ath: step1: time=constant*B1*digits^{1.6} step2: time=constant*B2^{0.6}*digits^{1.3} P4N: step1: time=constant*B1*digits^{1.7} step2: time=constant*B2^{0.6}*digits^{1.4} P4P: step1: time=constant*B1*digits^{1.7} step2: time=constant*B2^{0.62}*digits^{1.35} Alot bigger variation on the exponent on the digits for P1 than for ECM, ranging from 1.54 to 1.68 for athlon step1 and 1.29 to 1.45 for athlon step2, same kind of variation for the 2 other computers. Probably some of it caused by P1 being so much faster than ECM that many of the timings are so small that random fluxuation plays a bigger role. P+1: Ath: step1: time=constant*B1*digits^{1.59} step2: time=constant*B2^{0.64}*digits^{1.3} P4N: step1: time=constant*B1*digits^{1.70} step2: time=constant*B2^{0.64}*digits^{1.3} P4P: step1: time=constant*B1*digits^{1.67} step2: time=constant*B2^{0.64}*digits^{1.35} Almost as low variation on exponents in P+1 as on ECM in step1, but higher variation like in P1 in step2. Looking at the timings ECM is about 910 times slower than P1 at the same B1/B2 and 56 times slower than P+1, which fits with the advice in GMPECM readme to run P1 with 10*B1 and P+1 with 5*B1, where B1 is the B1value you use for ECM at each factor level. Last fiddled with by ATH on 20071202 at 03:42 
20071204, 05:25  #9 
"Jason Goatcher"
Mar 2005
3×7×167 Posts 
It might be a good idea to take a bit of post 8, and give it it's own Sickey thread in the GMPECM forum.
Just my opinion. :) 
20071209, 20:51  #10  
Nov 2007
Halifax, Nova Scotia
2^{3}·7 Posts 
Quote:
Code:
echo "2^(2^20)+1"  ./ecm power 1 maxmem 2048 v 1000000 Code:
GMPECM 6.1.3 [powered by GMP 4.2.2] [ECM] Input number is 2^(2^20)+1 (315653 digits) Using B1=1000000, B2=904700226, polynomial x^1, sigma=2813888680 dF=128, k=5933, d=1020, d2=7, i0=974 Expected number of curves to find a factor of n digits: 20 25 30 35 40 45 50 55 60 65 5 21 130 1014 9706 110526 1461187 2.2e+07 3.7e+08 7.2e+09 Step 1 took 273197586ms Estimated memory usage: 720M ... Lots of timing data removed here ... Step 2 took 432207265ms (a) the verbose option is causing gmpecm to spend a lot of time doing timings etc. (b) gmpecm is not nearly as fast as mprime for numbers of this size (c) I have somehow made a serious mistake with the software Does anyway have an explanation for what I have found? 

20071210, 14:46  #11  
Nov 2003
2^{2}×5×373 Posts 
Quote:
P95 will be a lot faster in stage 1 than GMPECM. GMPECM will be faster in stage 2. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
gpuowl: runtime error  SELROC  GpuOwl  69  20210929 10:07 
runtime question  yoyo  YAFU  1  20150108 07:07 
Predicting QS and NFS runtime  jasonp  Factoring  2  20110307 01:22 
gmpecm needed memory and runtime  yoyo  GMPECM  7  20100409 16:48 
runtime error when using redc  ltd  GMPECM  5  20091030 13:09 