2012-09-02, 23:39  #12  
Nov 2006
Terra
2·3·13 Posts 
Quote:
Dubslow and I have the same question. The table, back in message #6, comes from the GMP-ECM readme file and from: http://www.loria.fr/~zimmerma/records/ecm/params.html The two functions in question are a) the B1 column and b) the # of curves column. I think the easiest way to see the second derivatives is to look at the first derivatives, which are shown, essentially, in the two columns headed "ratio". They are not monotonic, but rise and fall, up and down, like roller coasters. The specific numbers you request can be seen in the numbers in the "ratio" columns. B1, e.g., goes up to 5, down to 3, up to 3.91, down to 3.26, up to 3.4, etc. I've not seen any function or expression which can account for this. I know that specific computer technology can produce sharp elbows. So, if it's out there, I'm saying "show me" the rationale for the roller coasters. 
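For anyone who wants to see the roller coaster directly, the ratio sequence can be reproduced from the B1 table. The digit levels and B1 values below are the commonly published GMP-ECM defaults, written in from memory rather than taken from this thread, so verify them against your own readme:

```python
# Commonly published GMP-ECM default B1 values per factor digit level
# (assumed from the GMP-ECM README; check against your version).
b1 = {20: 11e3, 25: 50e3, 30: 250e3, 35: 1e6, 40: 3e6,
      45: 11e6, 50: 43e6, 55: 110e6, 60: 260e6, 65: 850e6}

levels = sorted(b1)
# First "derivative": ratio of each B1 to the previous level's B1.
ratios = [b1[b] / b1[a] for a, b in zip(levels, levels[1:])]
for (a, b), r in zip(zip(levels, levels[1:]), ratios):
    print(f"B1({b}d)/B1({a}d) = {r:.2f}")
```

With those inputs the ratios come out 4.55, 5.00, 4.00, 3.00, 3.67, 3.91, 2.56, 2.36, 3.27: non-monotonic, as the post describes.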

2012-09-03, 04:38  #13  
Dec 2011
11×13 Posts 
Quote:
I don't actually know how the B1 values were chosen. Maybe they ran curves at various B1 values and looked at what size factors were found. The thread you reference includes a quote from A. Kruppa, who states "...early papers on ECM gave figures like B1=11000 for 20-digit factors, B1=50000 for 25 digits, etc. These values stuck and everyone keeps using them, mostly for the sake of easy bookkeeping..." The way I see it, all of the B1 values are "wrong" in the sense that they were not optimized. But they aren't far enough wrong that it makes much difference. Using standardized B1 values makes sharing curve counts a lot easier. If a particular B1 has been chosen a little bit too small, you can make up for it by doing more curves. If a particular B1 has been chosen a little bit too large, you can make up for it by doing fewer curves. If you run the GMP-ECM program with the -v option, it does some analysis of the curves for you. I ran a 155-digit number through GMP-ECM v6.4.2, with values of B1 ranging from 3e6 to 5e6. 
Here are the results it reported on my hardware, with my system load:
At B1=3000000, "Expected number of curves to find a factor of" 40 digits = 2351
At B1=3500000, "Expected number of curves to find a factor of" 40 digits = 2097
At B1=4000000, "Expected number of curves to find a factor of" 40 digits = 1755
At B1=4500000, "Expected number of curves to find a factor of" 40 digits = 1616
At B1=5000000, "Expected number of curves to find a factor of" 40 digits = 1386
Two runs, averaged, on a non-quiescent system reported:
At B1=3000000, "Expected time to find a factor of" 40 digits = 13.7h
At B1=3500000, "Expected time to find a factor of" 40 digits = 13.6h
At B1=4000000, "Expected time to find a factor of" 40 digits = 13.4h
At B1=4500000, "Expected time to find a factor of" 40 digits = 13.6h
At B1=5000000, "Expected time to find a factor of" 40 digits = 13.9h
I've never looked at the code to see how it generates these values, but this particular set of data seems consistent with Silverman's and Wagstaff's paper (S&W). In particular, the curve is rather flat in the vicinity of the optimal B1 value! I hate to be repetitive, but although my data is too noisy to assert the optimal B1 value is 4000000, Table 3 of the paper does show that 4000000 may have been a more optimal choice than 3000000 (for finding 40-digit factors). If you want to run ~1755 curves at B1=4000000, rather than ~2351 curves at B1=3000000, I think you might expect to see a few percent performance improvement. If you optimized all of your B1 values by using Dickman's function (and extensions), as described in S&W and elsewhere, you might find that the so-called "roller-coaster" was gone. But, repetitive as I am, it doesn't matter. I believe the differences will be in the single-digit percent range. [If someone has already done this optimization, perhaps somebody can post a pointer.] [See disclaimer above.] 
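The flatness can be sanity-checked with a toy model: assume per-curve time scales roughly linearly with B1 (a deliberate simplification; stage 2 does not scale this way) and multiply by the expected-curve counts reported above:

```python
# Crude model: total ECM effort ~ (expected curves) x B1, under the
# ASSUMPTION that per-curve time is roughly proportional to B1.
# The expected-curve counts are the ones reported in the post above.
expected = {3_000_000: 2351, 3_500_000: 2097, 4_000_000: 1755,
            4_500_000: 1616, 5_000_000: 1386}

effort = {b1: n * b1 for b1, n in expected.items()}
base = effort[3_000_000]
for b1 in sorted(effort):
    print(f"B1={b1}: relative effort {effort[b1] / base:.3f}")
```

Under this crude model, every B1 from 3e6 to 5e6 lands within a few percent of the same total effort, which is consistent with both the measured 13.4h-13.9h spread and the "single-digit percent range" remark.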

2012-09-03, 08:50  #14 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
3^{2}×7^{2}×13 Posts 
I ran some tests recently at the 35-digit level and the fastest result was B1=2e6 and B2=0.5*default with 1.24h. The fastest with B2=default was B1=15e5, which took 1.28h.
The standard B1=1e6 with B2=default took 1.39h. The fastest took 85% of the standard time, plus it stood more chance of finding lucky high factors. Just running the 35-digit level with 40-digit curves (B1=3e6, B2=default) only takes 1.31h, which is faster than the default for the 35-digit level. There are similar stories for 25 and 30 digits. Spreadsheets attached. The results aren't perfect, as they are the lowest of 5 curves run on an idle PC. At some point I will modify a binary to give parsable output for this sort of thing so I can do more curves/B2scales/B1s/numbers. The 25- and 30-digit times are with a different binary with a different version of mpir/gmp. All are run on a c90. 
2012-09-06, 00:39  #15  
Nov 2006
Terra
78_{10} Posts 
I think perhaps you gentlemen have explained the state of the art
of optimal B1's. Thank you. Quote:
the optimal B1 value, for the purpose of quickly finding a factor. But now the table has been _repurposed_ to indicate the optimal point to abandon ECM as unsuccessful, and to switch to SNFS. Somehow it has become received wisdom that the optimal amount of ECM is to 2/9 of the SNFS difficulty. This may be a theorem, but I have not yet encountered it. E.g., 225 * 2/9 = 50, so for SNFS difficulty = 225, ECM would be completed, seeking factors from 20 to 50 digits, in the manner described in the readme, using the # of curves given in the table in the readme for the various increasing levels, through 7553 curves @ 43e6, before switching to SNFS. Are the approximate values in the table as useful for this new purpose? 
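The arithmetic of the 2/9 rule of thumb is simple enough to write down; note it is community convention, not a theorem, exactly as the post says:

```python
# Rule of thumb: run ECM up to a digit level of 2/9 of the SNFS
# difficulty before switching to SNFS. Convention, not a theorem.
def ecm_digit_level(snfs_difficulty: float) -> int:
    return round(snfs_difficulty * 2 / 9)

print(ecm_digit_level(225))  # 225 * 2/9 = 50, i.e. ECM through T50
```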

2012-09-06, 01:47  #16 
"Curtis"
Feb 2005
Riverside, CA
2·3·733 Posts 
Walter
Yes, they are, because the number of curves compensates for the slightly incorrect choice of B1. In practice, any B1 and number-of-curves combination that completes a T50 (using the -v flag mentioned above to discover the expected number of curves, which is the number we use to declare a level "done") satisfies the "enough ECM" 2/9ths recommendation for SNFS. If you use a more optimal B1/number-of-curves combination for your system, you may save a few percentage points of ECM time, perhaps as much as 10 or 15 percent. This should not be done for projects that distribute a level among multiple testers, as it's rather difficult to convert curves from varying B1 bounds. For projects where you personally ECM until deemed ineffective and then proceed to GNFS or SNFS, by all means optimize to your heart's content. Curtis 
2012-09-06, 01:54  #17  
"Curtis"
Feb 2005
Riverside, CA
112E_{16} Posts 
Quote:
I did some tests like this on numbers varying from 120 to 2000 digits, and found two things you did not: the optimal B2scale setting rose as the size of the candidate rose, and the optimal B2scale is higher for higher digit levels. Optimal B2scale in my tests (on a laptop i7; architecture may matter, memory access and all) varied between 4 and 8 for T35 and T40 tests. I used only the default B1 settings, and most tests were for the Fibonacci tables or oddperfect.org. If you are still interested, I suggest you repeat those tests for a 140-digit number and a T30 followed by a T35. The T30 will allow us to compare to your present data on your CPU, and the T35 will corroborate (or not) my inkling that B2scale should rise a bit as we move to higher digit levels. Curtis 

2012-09-06, 10:20  #18  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
3^{2}·7^{2}·13 Posts 
Quote:
I thought digit size might make a difference. I will run my 35-digit test again on a c140. 

2012-09-06, 14:08  #19  
Nov 2006
Terra
2×3×13 Posts 
Quote:
a) providing the theoretical 1-to-1 ratio with step 1, or b) minimizing CPU or wall-clock time? Is a) valid? 

2012-09-06, 15:13  #20 
Nov 2006
Terra
2·3·13 Posts 
Here is some supplementary info for my next post:
In a run using "-I 1" from 26e6, after 2741 curves, B1 was just a hair over 52e6, a doubling. The first 5 curves above 52e6 took 91.8% longer to run than the first 5 curves, @ ~26e6. The 2741 curves averaged ~39e6. 2741 is nowhere near what the readme would say for 39e6, which would be something near 7000. I probably should have said this earlier, but I only referred to second derivatives and the changes of sign as a concise way of expressing my question, one which isolates my interest in the gross behavior as opposed to some quibble about the second significant digit. I'm really much more interested in the gross behavior of the first derivatives. As evidence of the grossity of my interest, in my post, I mistakenly wrote "down to 3.26" when I meant to write "down to 2.36". 
2012-09-06, 15:18  #21  
Nov 2006
Terra
2·3·13 Posts 
One of the things I've been looking for is the definition of T50, or
some background on it. It seems obvious that every time you run a curve, you learn something. There are 3 possibilities: a) you learn a prime factor, b) you learn a composite factor, or c) you learn only a tiny bit, that the B1 wasn't able to do better. Actually, a) 1) the cofactor = quotient is composite, or a) 2) the cofactor is prime, completing the factorization. Even in case c), your expectation about the size of the next factor to be found has changed. You now know that a smaller factor is slightly less likely and a larger factor is slightly more likely. Given that, it also seems obvious that it is suboptimal to keep hammering away at the same B1 again and again. Does it make sense to ignore that information for 10,000's of consecutive curves? Empirically, I've been told that factors are generally found earlier at a given level, rather than later at that level. In GMP-ECM, the "-I" option gradually increases B1 with each successive curve. Quote:
but after 1000's of curves, or 10,000's of curves, doesn't it have a much larger effect? Thanks. There's some supplementary info in the previous post. 
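The "every failed curve teaches you something" idea can be illustrated with a toy Bayesian update over two hypotheses about the smallest factor's size. The per-curve hit probabilities below are made up purely for illustration, not taken from GMP-ECM or from this thread:

```python
# Toy model: two hypotheses about the smallest factor, each with an
# ASSUMED per-curve probability that a curve at the current B1 finds it.
p_hit = {"~35-digit factor": 1 / 1000, "~45-digit factor": 1 / 20000}
prior = {"~35-digit factor": 0.5, "~45-digit factor": 0.5}

def posterior_after_misses(prior, p_hit, n_curves):
    # P(hypothesis | n misses) is proportional to
    # P(hypothesis) * (1 - p_hit)^n_curves, then normalized.
    w = {h: prior[h] * (1 - p_hit[h]) ** n_curves for h in prior}
    total = sum(w.values())
    return {h: v / total for h, v in w.items()}

for n in (0, 1000, 5000):
    print(n, posterior_after_misses(prior, p_hit, n))
```

As the post argues, each miss shifts belief away from the small-factor hypothesis; whether acting on that shift is worth the bookkeeping is exactly what the replies in this thread dispute.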

2012-09-07, 05:45  #22  
"Curtis"
Feb 2005
Riverside, CA
2×3×733 Posts 
Quote:
"Optimal" as I use it is a minimization of time to complete a level. "T50" is simply shorthand for a combination of B1 and # of curves such that the expected number of curves to find a 50-digit factor has been completed. Note this leaves a 1/e chance a factor is not found. I found that setting B2 to achieve (a) produced longer wall-clock time than setting B2 so that stage 2 time is 70-80% of stage 1 time. I believe this ratio changes across CPU architecture/memory speed/memory allocation. As for your query about using information gained from a curve run without a factor, at levels where one runs 5000 or 10000 curves, the additional information one gains from each curve is not meaningful. Perhaps one might run 1000 curves and then shift B1 by some amount, say, from 45-digit optimal to 46-digit optimal, but as mentioned many times in this thread, the time-to-test curves are very flat for such things. Witness the expected time from the -v flag to find a 35-digit factor when B1 = 1M vs B1 = 3M. I experimented a bit with incremental B1 settings when I did some oddperfect work around 120 digits; while it seemed more efficient in principle, I came to believe it made little difference, and chose to stick to the standard methods. I have no numeric evidence for that feeling; anything learned was not documented. Curtis 
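The 1/e figure follows from treating each curve as an independent trial: if the expected number of curves at a level is N, each succeeding with probability roughly 1/N, then the chance that all N fail is (1 - 1/N)^N, which approaches 1/e. A quick check, using the 7553-curve figure quoted earlier in the thread:

```python
import math

# If each curve independently finds the factor with probability 1/N,
# the chance of missing it after all N "expected" curves is
# (1 - 1/N)^N, which approaches 1/e ~ 0.368 for large N.
N = 7553  # curve count at B1=43e6 quoted earlier in the thread
p_miss = (1 - 1 / N) ** N
print(p_miss, 1 / math.e)
```

So even a "completed" T50 leaves roughly a 37% chance that a 50-digit factor was missed, which is why the level is a bookkeeping convention rather than a guarantee.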
