mersenneforum.org (https://www.mersenneforum.org/index.php)
-   CADO-NFS (https://www.mersenneforum.org/forumdisplay.php?f=170)
-   -   Polynomial Selection Parameters Discussion (https://www.mersenneforum.org/showthread.php?t=25550)

EdH 2020-05-16 16:55

Polynomial Selection Parameters Discussion
 
[B]Note: [/B]This initial post is for reference only. The following posts are for discussion.

This thread is intended to allow discussion of the parameters used in Polynomial Selection for CADO-NFS. The following parameters are available ("tasks.polyselect." precedes each in the parameter files):
[code]
degree
threads
P
admin
admax
incr
nrkeep
adrange
nq
sopteffort
ropteffort
[/code]I have added the Polynomial Selection section of params.c90 for easy reference to each parameter's description and default.

Extraction from CADO-NFS params.c90, explaining the available parameters (a short worked example of the derived quantities follows the extract):
[code]
###########################################################################
# Polynomial selection task with Kleinjung's algorithm (2008)
###########################################################################

tasks.polyselect.degree = 4 # degree of the algebraic polynomial

#tasks.polyselect.threads = 2
# # How many threads to use per polyselect process

tasks.polyselect.P = 10000 # choose lc(g) with two prime factors in [P,2P]
# Setting a large value will most likely find better polynomials,
# but the number of found polynomials will be smaller.
# As a rule of thumb, we want at least 100 found polynomials in total
# (without norm limitation, see below).
# We must have P larger than primes in the SPECIAL_Q[] list.

tasks.polyselect.admin = 0 # min value for leading coefficient of f(x)
# If not set, the default is 0.
# Note: admin is not necessarily a multiple of incr (defined below).

tasks.polyselect.admax = 100e3 # max value for leading coefficient of f(x)
# In fact the maximal value is admax-1, so that we can run for example:
# polyselect -admin 1000 -admax 2000 and polyselect -admin 2000 -admax 3000
# without duplicating the value 2000.

tasks.polyselect.incr = 60 # increment for leading coefficient of f(x)
# This factor is usually a smooth number, which forces projective roots in
# the algebraic polynomial. 60 is a good start, 210 is popular as well.
# Note: admin and admax are not necessarily multiples of incr, for
# example polyselect -incr 60 -admin 0 -admax 200 will deal with 60, 120,
# and 180, while polyselect -incr 60 -admin 200 -admax 400 will deal with
# 240, 300, and 360.

# The polynomial selection search time is proportional to the
# length of the search interval, i.e., (admax-admin)/incr.

tasks.polyselect.nrkeep = 40 # number of polynomials kept in stage 1 of
# polynomial selection (size optimization)

tasks.polyselect.adrange = 5000 # size of individual tasks
# Polynomial selection is split into several individual tasks. The
# complete range from admin to admax has to be covered for the polynomial
# selection to complete. The number of individual tasks is
# (polsel_admax-polsel_admin)/polsel_adrange. Each such task is issued as
# a workunit to a slave for computation.

tasks.polyselect.nq = 256
# Number of small prime combinations in the leading coefficient of the
# linear polynomial. Safe to leave at the default value.
# Recommended values are powers of the degree, e.g., 625 for degree 5,
# or 1296 for degree 6. Here 256 = 4^4 thus the leading coefficient of
# the linear polynomial will be q1*q2*q3*q4*p1*p2 where q1,q2,q3,q4 are
# small primes, and P <= p1, p2 < 2*P. For a given pair (p1,p2), 256
# different combinations of q1,q2,q3,q4 will be tried.

tasks.polyselect.sopteffort = 0
# Indicates how much effort is spent in stage 1 (size optimization) of the
# polynomial selection. The default value is 0. The size optimization time
# is roughly proportional to 1+sopteffort.

tasks.polyselect.ropteffort = 0.2
# Indicates how much effort is spent in stage 2 (rootsieve) of the
# polynomial selection. The number of sublattices considered is
# proportional to ropteffort, and the rootsieve time is roughly
# proportional to ropteffort. This is a floating-point value.
[/code]
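
To make the derived quantities in those comments concrete, here's a small worked sketch (plain Python, my own, using the default values above):
[code]
# Worked example using the defaults from the params.c90 extract above.
admin, admax = 0, 100_000     # range for the leading coefficient of f(x)
incr = 60                     # step between leading coefficients
adrange = 5_000               # size of one workunit
P, nq, degree = 10_000, 256, 4

# Search time is proportional to the length of the search interval:
print((admax - admin) // incr)     # 1666 -- interval length in incr steps

# Number of individual workunits issued to slaves:
print((admax - admin) // adrange)  # 20 workunits

# lc(g) = q1*q2*q3*q4 * p1*p2 with P <= p1, p2 < 2P; nq counts the
# combinations of the four small primes tried per (p1, p2) pair:
assert nq == degree ** 4           # 256 = 4^4 (625 for deg 5, 1296 for deg 6)
[/code]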

EdH 2020-05-16 16:55

Now, for my current questions:

- (degree) Is there a "rule-of-thumb" for determining what degree?

- (admin) Other than to eliminate duplication, is there a reason not to start searching at admin = 0?

- (nrkeep) I see 200 for the default number to keep at 190-digit sizes. If polynomials are ranked to determine which 200 to keep, is it really possible that one far down the initial ranking would turn out better than the rest later on?

EdH 2020-05-16 18:17

[QUOTE=EdH;545537]. . .
- (nrkeep) I see 200 for the default number to keep at 190-digit sizes. If polynomials are ranked to determine which 200 to keep, is it really possible that one far down the initial ranking would turn out better than the rest later on?[/QUOTE]
Perhaps I have found the answer, in that the "raw exp_E" score range appears to be rather narrow, resulting in a large number of candidates with identical scores.

VBCurtis 2020-05-16 19:03

[QUOTE=EdH;545537]- (degree) Is there a "rule-of-thumb" for determining what degree?

- (admin) Other than to eliminate duplication, is there a reason not to start searching at admin = 0?

- (nrkeep) I see 200 for the default number to keep at 190-digit sizes. If polynomials are ranked to determine which 200 to keep, is it really possible that one far down the initial ranking would turn out better than the rest later on?[/QUOTE]

There is a theoretical result in NFS for determining the optimal degree. However, the poly-search capabilities of the software influence this too: deg 6 is simply harder to search than deg 5.
While using CADO, I've found the deg 4/5 cutoff to be just under 100 digits, and deg 5/6 to be around 215.
For SNFS, poly search obviously doesn't matter, so theory predicts optimal degree more accurately. If I recall, 4/5 is around 110 digits, 5/6 around 205-210, and 6/7 around 340 or 350. Sorry that I'm too lazy to locate the formula for you!
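
Edit: the usual asymptotic estimate is degree ≈ (3 ln N / ln ln N)^(1/3); take it as a rough guide only, since the practical cutoffs above differ a bit. A quick sketch in plain Python:
[code]
# Standard asymptotic optimal-degree estimate for NFS; a rough guide,
# not the practical cutoff (poly-search difficulty shifts it a little).
import math

def optimal_degree(digits):
    ln_n = digits * math.log(10)  # ln N for an n-digit number
    return (3 * ln_n / math.log(ln_n)) ** (1 / 3)

for d in (100, 120, 205, 340):
    print(d, round(optimal_degree(d), 2))
# 100 -> 5.03, 120 -> 5.28, 205 -> 6.13, 340 -> 7.06
[/code]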

There are two reasons to set admin:
1. As you've discovered quite a few times, CADO sometimes gets "stuck" on a small c5 value, delaying poly select quite a lot.
2. As the size of the input rises, it becomes less and less likely that a very small c5 value will produce the best poly. Generally, this is because skew gets too high (at least, that's my layman's grasp of what goes on).

For 120-140 digit inputs, #1 has me setting admin in the 2000-10000 range to try to avoid those software hiccups. For 180-210 digits, I set admin higher because of #2.

How many polys to root-opt is a matter of time spent vs. chance of reward. CADO has a better size-opt ranking algo than msieve, so it's not necessary to root-opt hundreds of polys in CADO like we used to in msieve. That said, at large inputs like 190 digits the root-opt phase is a tiny fraction of size-opt search time, so you may as well root-opt a few hundred in case something unusual happens.
On small inputs, say <130 digits, the root-opt step is a large fraction of size-opt time; I've found I get better results by adding extra size-opt effort and reducing nrkeep to 48 or lower. The time is better spent generating more candidates than hoping a 50th-place poly ends up best after root-opt. You can see this clearly if you mess with the settings on a C100; root-opt takes longer than size-opt with default settings (if you change to deg 5, at least)! Doubling size-opt time and halving nrkeep usually yields a better poly. CADO root-opts in blocks of 6 polys, so I nearly always make nrkeep a multiple of 6.
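
Here's that trade-off as a toy sketch (plain Python; the numbers are illustrative, not exact settings). Per params.c90, size-opt time scales like 1+sopteffort, so sopteffort = 1 roughly doubles it:
[code]
# Toy sketch of the small-input trade-off: halve nrkeep (kept a multiple
# of 6, since CADO root-opts polys in blocks of 6) and double size-opt
# time (which scales like 1 + sopteffort; default sopteffort is 0).
def retune(nrkeep, sopteffort):
    nrkeep = max(6, nrkeep // 2 // 6 * 6)  # halve, round down to a multiple of 6
    sopteffort = sopteffort + 1            # 1+sopteffort goes from 1 to 2: ~2x time
    return nrkeep, sopteffort

print(retune(96, 0))  # -> (48, 1)
[/code]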

swellman 2020-05-16 19:08

[QUOTE=EdH;545540]Perhaps I have found the answer, in that the "raw exp_E" score range appears to be rather narrow, resulting in a large number of candidates with identical scores.[/QUOTE]

I’ve struggled with this parameter: why hang on to hundreds of likely subpar polys that seemingly take forever to complete root-opt, instead of just keeping the “cream of the crop”? But then, if one cuts nrkeep to a much lower value (say 40), has the best poly potentially been filtered out before root-opt? Still, 200 seems like an awful lot of interim polys to carry along.

swellman 2020-05-16 19:27

[QUOTE=VBCurtis;545546]There is a theoretical result in NFS for determining the optimal degree. However, the poly-search capabilities of the software influence this too: deg 6 is simply harder to search than deg 5.
While using CADO, I've found the deg 4/5 cutoff to be just under 100 digits, and deg 5/6 to be around 215.[/quote]

In past GNFS poly searches, a deg 6 has never been used for anything under C206 (per the limited data in the [url=https://www.mersenneforum.org/showpost.php?p=539610&postcount=161]best scores table[/url] and the [url=https://www.mersenneforum.org/showthread.php?t=24054]C190+ NFS jobs thread[/url]). Parallel searches in deg 5 and 6, as well as lots of test sieving, were involved for C206-215.

EdH 2020-05-16 22:59

The description of P says that larger values normally find better, but fewer, polynomials. If this is the case, would raising P and letting the number of found polynomials fall below nrkeep make sense?

Is a test of yield a possible strategy, e.g., test a small min/max region for yield and then raise P and adjust the search region to produce the optimum number of polynomials? (The description suggests at least 100.) Is the returned count of found polynomials consistent, or does it increase/decrease across a range?

If you ran two searches across the same region with vastly different P, would they find the same polynomials, with the larger P finding fewer of them? Or would the found polynomials be different?

EdH 2020-05-16 23:04

[QUOTE=VBCurtis;545546]. . .
Sorry that I'm too lazy to locate the formula for you!
. . .[/QUOTE]This is not at all a problem. Please don't go to that bother. I would merely forget it very soon and would not actually put it to use, at least not in the near future.

VBCurtis 2020-05-17 02:33

[QUOTE=EdH;545559] If you ran two searches across the same region with vastly different P, would they find the same polynomials, with the larger P finding fewer of them? Or, would the found polynomials be different?[/QUOTE]

From params.c90:
choose lc(g) with two prime factors in [P,2P]

So, if you double P (call it Q=2P), it's highly unlikely a polynomial will have this property in both [P,Q] and [Q,2Q]. I've changed P by 25% and had the same best polynomial pop out of the two runs; I've changed P by 2x and never (in just a handful of trials) had the same poly come out of both runs.
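
That matches the arithmetic: the prime pools [P,2P] and [2P,4P] are disjoint, so an lc(g) built from two primes in one pool can't also qualify in the other. A quick check (my own sketch, plain Python, nothing CADO-specific):
[code]
# The prime pools for -P 10000 and -P 20000 don't overlap, so the same
# lc(g) = p1*p2 can't have "two prime factors in [P,2P]" for both runs.
def primes_in(lo, hi):
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return {n for n in range(lo, hi + 1) if is_prime(n)}

pool1 = primes_in(10_000, 20_000)
pool2 = primes_in(20_000, 40_000)
print(len(pool1), len(pool2), len(pool1 & pool2))  # -> 1033 1941 0
[/code]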

Each adrange is producing hundreds of candidate polys; over the course of a poly select run you're finding thousands and keeping the best 60 or 90 or whatever. That doesn't mean you should inflate P so high that you only find a few hundred polys; things don't work that way.

I've been selecting P values such that the last-place exp_E value (the one printed during size-opt) is minimized. I figure that's the P that finds the most "good" poly candidates, while trying to optimize for 5th place or 1st place is more a matter of luck.
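
In other words, an argmin over short test runs (these exp_E numbers are made up purely for illustration):
[code]
# Pick the P whose last-place (worst kept) exp_E from a short test run
# is lowest; the scores below are hypothetical.
last_place_exp_E = {50_000: 56.41, 100_000: 55.87, 200_000: 56.12}
best_P = min(last_place_exp_E, key=last_place_exp_E.get)
print(best_P)  # -> 100000
[/code]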

EdH 2020-05-17 13:26

Some Levity for a Sunday Morning
 
Apparently, I have some parameters a bit off. Actually, I overran a bunch of timed-out WUs and am restarting a snapshot, but the ETAs are scary!
[code]
Info:Polynomial Selection (size optimized): Parsed 378 polynomials, added 0 to priority queue (has 200)
Info:Polynomial Selection (size optimized): Marking workunit c190_polyselect1_456960-458880#2 as ok
(92.9% => [B]ETA Wed Apr 19 00:12:46 4237[/B])
[/code]Fortunately, after a few minutes, things looked a bit better:
[code]
Info:Polynomial Selection (size optimized): Parsed 370 polynomials, added 1 to priority queue (has 200)
Info:Polynomial Selection (size optimized): Worst polynomial in queue now has exp_E 53.980000
Info:Polynomial Selection (size optimized): Marking workunit c190_polyselect1_443520-445440#2 as ok
(97.9% => [B]ETA Mon Dec 13 04:11:14 2066[/B])
[/code]I guess the larger the P, the longer the processing. I had anticipated such, but not to this extent.

On a more serious note, I do see that even with a doubling of P, there are still more poly candidates found in one WU than the nrkeep value of 200.

EdH 2020-05-17 14:54

Sorry to be so inquisitive, but I'm lost on something else:
[code]Parsed 401 polynomials, added 327 to priority queue (has 200)
Worst polynomial in queue now has exp_E 56.820000
. . .
Parsed 367 polynomials, added 45 to priority queue (has 200)
Worst polynomial in queue now has exp_E 55.920000
. . .
Parsed 376 polynomials, added 9 to priority queue (has 200)
Worst polynomial in queue now has exp_E 55.320000
[/code]To my ignorant eye, there appear to be polynomials being pushed into a full queue, which are pushing out the better ones. Is a lower exp_E better than a higher one?
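
If I model it as a bounded "keep the N best" queue and assume a lower exp_E is better (the falling worst-in-queue value suggests so), the log makes sense: each new poly that beats the current worst evicts it. A minimal sketch (plain Python, my own, not CADO's code):
[code]
# Minimal sketch of a bounded "keep the N best" queue, assuming a lower
# exp_E is better. heapq is a min-heap, so scores are negated to let us
# evict the worst (largest) exp_E in O(log n).
import heapq

def push_keep_best(queue, exp_e, size=200):
    if len(queue) < size:
        heapq.heappush(queue, -exp_e)
    elif exp_e < -queue[0]:               # beats the current worst...
        heapq.heapreplace(queue, -exp_e)  # ...so the worst is evicted
    return -queue[0]                      # worst exp_E still in the queue

queue = []
for score in (56.82, 56.10, 55.92, 55.32):
    print("worst in queue:", push_keep_best(queue, score, size=2))
# worst in queue: 56.82, then 56.82, then 56.1, then 55.92
[/code]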

