![]() |
![]() |
#23 | ||
"Bob Silverman"
Nov 2003
North of Boston
11101010101002 Posts |
![]() Quote:
Quote:
|
||
![]() |
![]() |
![]() |
#24 |
Jul 2003
So Cal
50538 Posts |
![]()
We only perform root optimization and get an e-score on the best size-optimized hits. That said, I still have ~8.1 million unique size-optimized hits from my msieve run on 2,1109+. But I did not save the data on prior searches.
|
![]() |
![]() |
![]() |
#25 |
"Curtis"
Feb 2005
Riverside, CA
130138 Posts |
![]()
Forgive my ignorance, but is RDS asking for the mean score of, say, all GNFS-155 jobs we've run, and look at the distribution of poly scores for various jobs run at some particular size? That appears to be Sean's interpretation of the question. I can address this, in that if I don't score within 5% of the trendline of the record poly scores I keep on poly-searching. The outliers for records aren't THAT great an outlier; only one or two of the record entries are notably lucky / above trend across the entire table.
Greg's interpretation is similar to mine, that RDS was asking how big an outlier the record scores are compared to the typical polynomial score found during a search. However, I don't see how one would measure that mean, since the results depend so heavily on the filters applied at each phase of poly select. I mean, if I root-opt only my best 100 hits while Gimarel root-opts his top 1000, clearly we're going to have different concepts of "mean poly". While these objections are pretty obvious, I don't see a way to pick a standard for either one to yield meaningful distribution data. |
![]() |
![]() |
![]() |
#26 | |
"Bob Silverman"
Nov 2003
North of Boston
22×1,877 Posts |
![]() Quote:
for each one, but it does exist. I am asking how the e-scores are distributed across all of the polynomials. What is their mean, how far from the mean (in terms of the standard deviation) is the best score? This is Stat-101. |
|
![]() |
![]() |
![]() |
#27 |
"Curtis"
Feb 2005
Riverside, CA
33·11·19 Posts |
![]()
Each polynomial... from what set? From stage 1, or after size opt, or after root opt?
What settings do you use for stage 1 norm, or stage 2 norm, or for P or nq (in CADO)? Each setting changes both the number of polys generated per unit of effort as well as their average quality. You seem to be treating the poly search as a black box that is the same for every search, while it's far from that. I don't think it's stats 101, despite your condescension- rather, I think it reflects that you don't know much about how poly select is actually run. I suppose rather than argue with you, I should just suggest you gather your own data, so you get exactly what you're looking for. Last fiddled with by VBCurtis on 2022-10-02 at 21:56 |
![]() |
![]() |
![]() |
#28 | |
"Bob Silverman"
Nov 2003
North of Boston
1D5416 Posts |
![]() Quote:
We seem to be talking cross purposes. You are focussing on the specifics of how the search is conducted, and on the specifics of your implementation of that search. I am asking about the mathematics. How the selection is run does not matter. And your accusation that I was being condescending was totally uncalled for. Let N be an odd composite whose factorization is sought. Polynomial selection finds a (set of) polynomials F_i, roots of those polynomials M_i such that F_i(M_i) = 0 mod N. The second polynomial for NFS is then G_i(x) = x -M_i. (or x+M_i depending on choice of sign). For each such pair of polynomials we seek to maximize integral integral Prob(norm(G(p)) * Prob(norm(F(p)) where the norms are the norms of the polynomials taken at a lattice point p, and the double integral is taken over the entire lattice and Prob is the probability that the norms are smooth over the factor base. The likelihood that the polys are jointly smooth is measured by the e-score. Each such pair of polynomials has a e-score. The selection process chooses a final pair of polynomials with maximal e-score. The set of e-scores has a distribution. I am asking how the set of e-scores is distributed. The exact search process and the inputs that guide that search process do not matter. The question is: For each pair of polynomials (F,G) generated by the search process, [regardless of the specifics of how that search is conducted] there is a e-score. It exists even if the search process does not compute it for every pair. What is is the distribution of the e-scores that are computed? By how much does the maximal e-score exceed the mean? One can also ask: If you were to compute an e-score for every pair of polynomials, what would its distribution be? I think this is worth a paper. Last fiddled with by R.D. Silverman on 2022-10-02 at 22:59 |
|
![]() |
![]() |
![]() |
#29 | |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
136208 Posts |
![]() Quote:
Sounds like your next research project. The CPU version of msieve sounds like it would be fine for this study although it would be worth considering how the distribution parameters vary with more/less searching. |
|
![]() |
![]() |
![]() |
#30 |
Jul 2003
So Cal
19·137 Posts |
![]()
I had a bit over 3 million unique size-optimized hits from 2,1109+. I initially forgot to sort them before using uniq. A quick histogram of their norms is attached.
I then took the 10000 polys with the lowest (best) norms and ran root optimization with msieve. msieve outputs 200 root-optimized polynomials for each input, so this resulted in 2M polynomials. A quick histogram of their e-scores is also attached. |
![]() |
![]() |
![]() |
#31 | |
"Bob Silverman"
Nov 2003
North of Boston
22·1,877 Posts |
![]() Quote:
theorem effect happening here? |
|
![]() |
![]() |
![]() |
#32 |
Sep 2009
97C16 Posts |
![]()
I've got a fair amount of similar data for GGNFS poly selection for various numbers (most from the Brent tables) ranging from 130 to 176 digits.
My scripts usually kept the best 200-300 polys from msieve -np1 -nps (size optimized) and put them through msieve -npr (root optimized) which generated several times as many polys. I then used the one with the best e-score. The scores from -nps seem to be on a different scale to the scores from -npr though. 6.400158e+19 vs 9.418e-13 for the first entry in both files (this is for a c161). I could send you the data if you want it. I don't know enough statistics to do a useful analysis myself. If you want it send me a PM with details of what you want (do you want the polys or just the e-scores?) and how to send it (email or I could post it on a web site for you to download. And how many numbers would you want it for? |
![]() |
![]() |
![]() |
#33 |
Jul 2003
So Cal
19×137 Posts |
![]()
Likely since the e-score is an integral combining contributions from both size and root optimization. msieve reports both the final norm characterizing the size and alpha characterizing the root properties of the polynomials. Attached are histograms of each of these separately for the same 2 million polynomials.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Polynomial selection | Max0526 | NFS@Home | 9 | 2017-05-20 08:57 |
Updated polynomial selection | jasonp | Msieve | 65 | 2011-05-01 19:06 |
GNFS polynomial selection | Unregistered | Information & Answers | 3 | 2011-04-16 14:24 |
2^877-1 polynomial selection | fivemack | Factoring | 47 | 2009-06-16 00:24 |
Polynomial selection | CRGreathouse | Factoring | 2 | 2009-05-25 07:55 |