mersenneforum.org How many ecm curves per B1 is a good estimate?
 Register FAQ Search Today's Posts Mark Forums Read

 2013-10-15, 19:07 #1 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 47·79 Posts How many ecm curves per B1 is a good estimate? I have a script that tasks several machines to perform ecm on a given number, using the following table: Code: # 30 @ B1 = 2k # 74 @ B1 = 11k # 107 @ B1 = 50k # 432 @ B1 = 250k # 900 @ B1 = 1M # 2350 @ B1 = 3M # 3850 @ B1 = 11M # 2000 @ B1 = 43M # 2000 @ B1 = 110M # 5000 @ B1 = 200M What would be better values for a general table for composites between 80-160 digits?
2013-10-15, 19:26   #2
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by EdH I have a script that tasks several machines to perform ecm on a given number, using the following table: Code: # 30 @ B1 = 2k # 74 @ B1 = 11k # 107 @ B1 = 50k # 432 @ B1 = 250k # 900 @ B1 = 1M # 2350 @ B1 = 3M # 3850 @ B1 = 11M # 2000 @ B1 = 43M # 2000 @ B1 = 110M # 5000 @ B1 = 200M What would be better values for a general table for composites between 80-160 digits?
"wrong-headed". Let's take it a bit at a time.

(0) When you say "better", one must ask: Better for what? What is
the metric??

(1) What is your objective? [Don't say: to factor numbers]. What is your
ECM objective? Is it to maximize the probability of success given that you
are spending a **fixed** amount of time? i.e. you are selecting a priori the
amount of time you want to spend on each number.

Or is it to maximize the probability of success per unit time? Here, you allow
the amount of time you are willing to spend to vary, depending on the size
of the factor you are looking for.

(2) The number of curves and the B1 limit you select depends, not on the
size of the composite, but rather on the size of the factor(s) you are looking
for.

R. Silverman & S.S. Wagstaff Jr. A Practical Analysis of ECM
Mathematics of Computation.

It explains just about everything. BTW, an optimal strategy is NOT to
select a fixed B1 limit and a fixed number of curves. An optimal strategy
selects a limit and a number of curves, then uses the information gained
by a failure to select a new set of paramaters. The above paper shows
how.

 2013-10-15, 20:31 #3 VBCurtis     "Curtis" Feb 2005 Riverside, CA 52×191 Posts Mr Silverman has pointed the way here, rather politely too. I believe the question you need an answer to is "how much time should I spend on ECM for a given number before proceeding to NFS?" I believe the paper cited above will help with that also, but I can cite heuristics used in this forum: Run enough curves to find a factor of 2/9ths the number of digits of difficulty of an SNFS-able composite, and somewhere between 0.30 and 0.33 the number of digits of a GNFS composite. For a 160-digit number, I'd do a t50 at B1 = 43M (roughly 7500 curves), and a t50 worth of curves at B1 = 110M (roughly 3000 curves). I would not use a B1 higher than 110M for a number smaller than 170 digits. Your number of curves table is produced by the -v flag in ECM, and varies by version of the software; as stage 2 improvements have been found, the default B2 bound has risen, allowing fewer curves per level for never versions of ECM compared to older versions and the paper. Even after reading the paper, I am unclear why these specific B1 values are chosen to find factors of a specific size. For example, to find 45-digit factors I find B1 anywhere from 18M to 30M produces lower expected time to complete the T45 level than B1 = 11M. Perhaps the time estimates GMP-ECM produces are inaccurate, perhaps a software enhancement has made larger B1 values optimal, perhaps I just need to re-read the paper...
2013-10-15, 20:43   #4
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17×251 Posts

Quote:
 Originally Posted by VBCurtis Even after reading the paper, I am unclear why these specific B1 values are chosen to find factors of a specific size. For example, to find 45-digit factors I find B1 anywhere from 18M to 30M produces lower expected time to complete the T45 level than B1 = 11M. Perhaps the time estimates GMP-ECM produces are inaccurate, perhaps a software enhancement has made larger B1 values optimal, perhaps I just need to re-read the paper...
At the risk of being wrong, I'm saying it's the one I bolded: software enhancements have changed the optimal values. At the time of the paper, B2 ran 100 times as fast as B1 (which meant B2 = 40 * B1 was optimal, I think). It is now many times faster than that. This surely changes the optimal B1 and B2 values. The ideas behind the paper and the commonly-used B1 values are sound, I'm sure, but the figures are no longer up-to-date as the most efficient values.

Last fiddled with by Mini-Geek on 2013-10-15 at 20:51

2013-10-15, 20:50   #5
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by Mini-Geek At the risk of being wrong, I'm saying it's the one I bolded: software enhancements have changed the optimal values. At the time of the paper, B2 values were 100 times B1 values. They are now many times larger than that. This obviously changes the optimal B1 and B2 values.
As the paper points out (and derives), it is optimal to spend as
much TIME in step 2 as in step 1. The optimal B2/B1 ratio then
depends on how fast the implementation is while in Step 2.

The theoretical optimum for step 2 is B2 = B1^2, because a theoretically
perfect convolution implementation can run Step 2 to B1^2 in the
same time as B1. However, even the latest version of GMPECM does
not take B2 that high.

Note, however, that as the paper also points out the response surface is
very very very flat in the neighborhood of the optimal point. Thus, one
can change parameters by a fair bit without noticeably affecting
performance.

 2013-10-16, 01:51 #6 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 1110100000012 Posts Thank you all for the assistance. I realized the question I was asking was not quite "right," but what I'm trying to do is come up with a "general" table that will take ecm to certain levels across a spectrum of composite sizes. I know that it won't produce the same level for two different size composites, but I'd like to maximize the success when running several machines. I'd further like to set up the script such that at some point in the table, it will switch to gnfs. I've tried to make use of aliqueit and YAFU, but I'm having trouble getting the most out of multiple machines. I can run ecm.py (and do) and that is great for multiple cores on a single machine, but I haven't discovered how to incorporate the other machines, so even if I run the others based on the curves chosen by aliqueit or YAFU, I can't then tell aliqueit or YAFU what I've already done, so they can move on. I've looked a little into openmpi and may go that route, but I'm not comfortable yet with it. I know the table should be based on the size of the number to determine the proper level and time to devote, but I was kind of searching for a general case. I'll go away for now and come back after my reading assignment. Thanks again.
 2013-10-16, 03:16 #7 LaurV Romulan Interpreter     Jun 2011 Thailand 223408 Posts You may try to see how yafu is doing it. I love its way, in fact. I use yafu with custom plan and 0.33 ECM percent. The reason is quite superficial: trying to eliminate composites that "split in 3", before going to NFS. I am working under, and around, 120-130 digits. I find yafu a wonderful tool, ECM timing is ok, compared to NFS timing (I would say "optimal" if I would not be afraid that some guys with more knowledge will jump to my throat ), and I didn't see a "3 way" NFS split since ages. Last fiddled with by LaurV on 2013-10-16 at 03:18 Reason: the parenthesis
 2013-10-16, 03:23 #8 VBCurtis     "Curtis" Feb 2005 Riverside, CA 52·191 Posts Ed- This is just what I was hoping you were asking, and is why I gave the heuristics for SNFS and GNFS search depths for ECM as a fraction of the composite size. YAFU uses those to generate its ECM decisions, and even has a "pretest ratio" flag that can be adjusted if more or less ECM is desired for GNFS jobs. mklasson (the guy who wrote Aliqueit software) mentions somewhere that one should ECM for time roughly equal to 25% of the expected NFS time, but I have not done the footwork myself to see why that is optimal. I believe the 25% of time and the 33% of composite size heuristics are compatible.
2013-10-16, 03:51   #9
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

52×191 Posts

Quote:
 Originally Posted by R.D. Silverman As the paper points out (and derives), it is optimal to spend as much TIME in step 2 as in step 1. The optimal B2/B1 ratio then depends on how fast the implementation is while in Step 2. The theoretical optimum for step 2 is B2 = B1^2, because a theoretically perfect convolution implementation can run Step 2 to B1^2 in the same time as B1. However, even the latest version of GMPECM does not take B2 that high. Note, however, that as the paper also points out the response surface is very very very flat in the neighborhood of the optimal point. Thus, one can change parameters by a fair bit without noticeably affecting performance.
In practice with the current GMP-ECM implementation, setting B2 so that stage 2 time is roughly 1/3rd stage 1 time produces the most efficient results. The best results are usually found when k=2 or k=3. I have tried choosing B2 to be roughly the same amount of time-in-stage as B1, but results are noticeably worse. Does this suggest a flaw in the GMP-ECM time estimations?

To illustrate how flat the response curve is, I tested B1 values every 1M or 2M from 3M to 66M to look for optimal B1 for t40 and t45 runs. For t45, the 11M level by tradition, I found that expected time for a t45 is within 3% of the best B1 from 11M to 34M, with B1=19M and B2=96G the best at 3% faster than default 11M.

For t50, every B1 from 30M to >66M is within 2% of the best B1.

Henry and I read these numbers and conclude that since we only pay a small penalty in efficiency to use a B1 much larger than standard practice, we may as well double or triple the usual B1 values for a particular level in order to increase our chances to find a larger factor. I use B1 = 21M for my t45 level, which is both faster than 11M for a t45 and completes 15% of a t50 whilst doing a t45 (2368 curves).

For a t40, I find that B1= 11M is within 2% of the time of B1 = 3M. In my opinion, that means we can simply skip the 3M curves, and do 700 curves at 11M as a t40. But similar arguments can be made for smaller ECM levels...

2013-10-16, 11:33   #10
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by EdH Thank you all for the assistance. I realized the question I was asking was not quite "right," but what I'm trying to do is come up with a "general" table that will take ecm to certain levels across a spectrum of composite sizes. I know that it won't produce the same level for two different size composites,
Nothing gets me quite so pissed off as people who refuse to read
what I have written. Especially when I take the time to give (at least moderately)

The last sentence you wrote above shows clearly that you did not
read what I wrote. The ECM "level" has NOTHING TO DO WITH THE SIZE
OF THE COMPOSITES. It only involves the size of the FACTORS.

Quote:
 I'd further like to set up the script such that at some point in the table, it will switch to gnfs.
For numbers less than 100 digits you should be using QS, not gnfs.

Quote:
 I know the table should be based on the size of the number to determine the proper level and time to devote, but I was kind of searching for a general case.
General case FOR WHAT??? You still haven't stated what you are trying to
optimize!

"Proper time and level" depends on your OBJECTIVE FUNCTION. You have
not said what that is. Learn how to say exactly what it is that you want.

Others have given advice as well. You have been told, e.g. that a general
"rule of thumb" for running ECM before switching to SNFS is to run ECM
to look for a factor that is about 2/9 the size of the composite. This applies
somewhat to to decision to switch to QS as well. To run ECM before
switching to GNFS, run ECM to a level that is about 30% of the size of the
composite. Note that these are only "rough estimates". A true estimate
would require exact knowledge of the relative performance of ECM and
QS/NFS on the machines that you are using. It isn't just about CPU
speed. Cache sizes play a big role, as does DRAM latency and size.

2013-10-16, 13:17   #11
bsquared

"Ben"
Feb 2007

65548 Posts

Quote:
 Originally Posted by EdH T I'd further like to set up the script such that at some point in the table, it will switch to gnfs. I've tried to make use of aliqueit and YAFU, but I'm having trouble getting the most out of multiple machines. I can run ecm.py (and do) and that is great for multiple cores on a single machine, but I haven't discovered how to incorporate the other machines, so even if I run the others based on the curves chosen by aliqueit or YAFU, I can't then tell aliqueit or YAFU what I've already done, so they can move on.
YAFU can actually do this too, if I understand your problem correctly. Using the switch -work T will tell yafu that ECM to t-level T has already been performed, and it will pick up from that point. So if you run 1000 curves at B1=1e6 somewhere else, then start yafu with -work 35, it will start by running curves targeting 40 digit factors (B1=3e6).

 Similar Threads Thread Thread Starter Forum Replies Last Post RienS Hardware 17 2014-11-18 22:58 Primeinator Information & Answers 8 2009-06-11 23:39 petrw1 PrimeNet 5 2008-11-08 02:23 davieddy Lounge 34 2008-09-17 04:22 XYYXF Math 13 2006-09-01 15:44

All times are UTC. The time now is 20:48.

Sun May 9 20:48:18 UTC 2021 up 31 days, 15:29, 0 users, load averages: 3.68, 3.53, 3.58