mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Cunningham Tables

Reply
 
Thread Tools
Old 2022-10-02, 12:36   #23
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts
Default

Quote:
Originally Posted by swellman View Post
Kamada’s site has 20+ years of data, some displayed graphically but no e-score histogram. He may have such data in his archives.

There is data in the NFS@Home log files for the various GNFS factorizations over the years but that would require a bit of data mining. I believe @VBCurtis did some digging into this a few years ago but the thread eludes me. Greg Childers may have such data as a matter of course (though I doubt it). But some data was likely never captured,
This is my expectation; below optimal e-scores were not captured. It would be a lot of data.
Quote:

But I love the idea of a study and I hope someone here takes the idea and runs with it.
R.D. Silverman is offline   Reply With Quote
Old 2022-10-02, 17:59   #24
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

47458 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
This is my expectation; below optimal e-scores were not captured. It would be a lot of data.
We only perform root optimization and get an e-score on the best size-optimized hits. That said, I still have ~8.1 million unique size-optimized hits from my msieve run on 2,1109+. But I did not save the data on prior searches.
frmky is offline   Reply With Quote
Old 2022-10-02, 20:14   #25
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

126648 Posts
Default

Forgive my ignorance, but is RDS asking for the mean score of, say, all GNFS-155 jobs we've run, and look at the distribution of poly scores for various jobs run at some particular size? That appears to be Sean's interpretation of the question. I can address this, in that if I don't score within 5% of the trendline of the record poly scores I keep on poly-searching. The outliers for records aren't THAT great an outlier; only one or two of the record entries are notably lucky / above trend across the entire table.

Greg's interpretation is similar to mine, that RDS was asking how big an outlier the record scores are compared to the typical polynomial score found during a search. However, I don't see how one would measure that mean, since the results depend so heavily on the filters applied at each phase of poly select. I mean, if I root-opt only my best 100 hits while Gimarel root-opts his top 1000, clearly we're going to have different concepts of "mean poly".

While these objections are pretty obvious, I don't see a way to pick a standard for either one to yield meaningful distribution data.
VBCurtis is offline   Reply With Quote
Old 2022-10-02, 20:23   #26
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

1D4816 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Forgive my ignorance, but is RDS asking for the mean score of, say, all GNFS-155 jobs we've run, and look at the distribution of poly scores for various jobs run at some particular size? That appears to be Sean's interpretation of the question. I can address this, in that if I don't score within 5% of the trendline of the record poly scores I keep on poly-searching. The outliers for records aren't THAT great an outlier; only one or two of the record entries are notably lucky / above trend across the entire table.

Greg's interpretation is similar to mine, that RDS was asking how big an outlier the record scores are compared to the typical polynomial score found during a search. However, I don't see how one would measure that mean, since the results depend so heavily on the filters applied at each phase of poly select. I mean, if I root-opt only my best 100 hits while Gimarel root-opts his top 1000, clearly we're going to have different concepts of "mean poly".

While these objections are pretty obvious, I don't see a way to pick a standard for either one to yield meaningful distribution data.
For each candidate composite, each polynomial yields an e-score. You may not compute an e-score
for each one, but it does exist. I am asking how the e-scores are distributed across all of the polynomials.
What is their mean, how far from the mean (in terms of the standard deviation) is the best score? This is Stat-101.
R.D. Silverman is offline   Reply With Quote
Old 2022-10-02, 21:53   #27
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

10101101101002 Posts
Default

Each polynomial... from what set? From stage 1, or after size opt, or after root opt?

What settings do you use for stage 1 norm, or stage 2 norm, or for P or nq (in CADO)? Each setting changes both the number of polys generated per unit of effort as well as their average quality.

You seem to be treating the poly search as a black box that is the same for every search, while it's far from that.

I don't think it's stats 101, despite your condescension- rather, I think it reflects that you don't know much about how poly select is actually run. I suppose rather than argue with you, I should just suggest you gather your own data, so you get exactly what you're looking for.

Last fiddled with by VBCurtis on 2022-10-02 at 21:56
VBCurtis is offline   Reply With Quote
Old 2022-10-02, 22:32   #28
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Each polynomial... from what set? From stage 1, or after size opt, or after root opt?

What settings do you use for stage 1 norm, or stage 2 norm, or for P or nq (in CADO)? Each setting changes both the number of polys generated per unit of effort as well as their average quality.

You seem to be treating the poly search as a black box that is the same for every search, while it's far from that.

I don't think it's stats 101, despite your condescension- rather, I think it reflects that you don't know much about how poly select is actually run. I suppose rather than argue with you, I should just suggest you gather your own data, so you get exactly what you're looking for.
If I had the hardware, I would. I do not have a GPU. I only have a single PC.

We seem to be talking cross purposes. You are focussing on the specifics of how the search is conducted,
and on the specifics of your implementation of that search. I am asking about the mathematics.
How the selection is run does not matter.


And your accusation that I was being condescending was totally uncalled for.

Let N be an odd composite whose factorization is sought. Polynomial selection finds a (set of) polynomials F_i,
roots of those polynomials M_i such that F_i(M_i) = 0 mod N. The second polynomial for NFS is then
G_i(x) = x -M_i. (or x+M_i depending on choice of sign). For each such pair of polynomials we seek to
maximize integral integral Prob(norm(G(p)) * Prob(norm(F(p)) where the norms are the norms of the
polynomials taken at a lattice point p, and the double integral is taken over the entire lattice and Prob is
the probability that the norms are smooth over the factor base. The likelihood that the polys are jointly smooth
is measured by the e-score. Each such pair of polynomials has a e-score. The selection process chooses
a final pair of polynomials with maximal e-score.

The set of e-scores has a distribution. I am asking how the set of e-scores is distributed. The exact search process
and the inputs that guide that search process do not matter. The question is:

For each pair of polynomials (F,G) generated by the search process, [regardless of the specifics of how that search is conducted] there is a e-score. It exists even if the search process does not compute it for every pair. What is is the distribution of the e-scores that are computed? By how much does the maximal e-score exceed the mean?

One can also ask: If you were to compute an e-score for every pair of polynomials, what would its distribution be? I think this is worth a paper.

Last fiddled with by R.D. Silverman on 2022-10-02 at 22:59
R.D. Silverman is offline   Reply With Quote
Old 2022-10-03, 10:01   #29
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

3·5·401 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
One can also ask: If you were to compute an e-score for every pair of polynomials, what would its distribution be? I think this is worth a paper.

Sounds like your next research project.
The CPU version of msieve sounds like it would be fine for this study although it would be worth considering how the distribution parameters vary with more/less searching.
henryzz is online now   Reply With Quote
Old 2022-10-06, 05:46   #30
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

1001111001012 Posts
Default

I had a bit over 3 million unique size-optimized hits from 2,1109+. I initially forgot to sort them before using uniq. A quick histogram of their norms is attached.

I then took the 10000 polys with the lowest (best) norms and ran root optimization with msieve. msieve outputs 200 root-optimized polynomials for each input, so this resulted in 2M polynomials. A quick histogram of their e-scores is also attached.
Attached Thumbnails
Click image for larger version

Name:	sizeoptnorm.png
Views:	67
Size:	8.5 KB
ID:	27420   Click image for larger version

Name:	escore.png
Views:	75
Size:	8.3 KB
ID:	27421  
frmky is offline   Reply With Quote
Old 2022-10-06, 12:29   #31
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

23·937 Posts
Default

Quote:
Originally Posted by frmky View Post
I had a bit over 3 million unique size-optimized hits from 2,1109+. I initially forgot to sort them before using uniq. A quick histogram of their norms is attached.

I then took the 10000 polys with the lowest (best) norms and ran root optimization with msieve. msieve outputs 200 root-optimized polynomials for each input, so this resulted in 2M polynomials. A quick histogram of their e-scores is also attached.
It seems that my guess was wrong. This does not appear to be similar to a Gamma pdf. Is there a central limit
theorem effect happening here?
R.D. Silverman is offline   Reply With Quote
Old 2022-10-06, 16:18   #32
chris2be8
 
chris2be8's Avatar
 
Sep 2009

23·7·43 Posts
Default

I've got a fair amount of similar data for GGNFS poly selection for various numbers (most from the Brent tables) ranging from 130 to 176 digits.

My scripts usually kept the best 200-300 polys from msieve -np1 -nps (size optimized) and put them through msieve -npr (root optimized) which generated several times as many polys. I then used the one with the best e-score.

The scores from -nps seem to be on a different scale to the scores from -npr though. 6.400158e+19 vs 9.418e-13 for the first entry in both files (this is for a c161).

I could send you the data if you want it. I don't know enough statistics to do a useful analysis myself. If you want it send me a PM with details of what you want (do you want the polys or just the e-scores?) and how to send it (email or I could post it on a web site for you to download. And how many numbers would you want it for?
chris2be8 is offline   Reply With Quote
Old 2022-10-06, 18:01   #33
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

253310 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
Is there a central limit theorem effect happening here?
Likely since the e-score is an integral combining contributions from both size and root optimization. msieve reports both the final norm characterizing the size and alpha characterizing the root properties of the polynomials. Attached are histograms of each of these separately for the same 2 million polynomials.
Attached Thumbnails
Click image for larger version

Name:	norm.png
Views:	54
Size:	10.4 KB
ID:	27422   Click image for larger version

Name:	alpha.png
Views:	60
Size:	10.0 KB
ID:	27423  
frmky is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Polynomial selection Max0526 NFS@Home 9 2017-05-20 08:57
Updated polynomial selection jasonp Msieve 65 2011-05-01 19:06
GNFS polynomial selection Unregistered Information & Answers 3 2011-04-16 14:24
2^877-1 polynomial selection fivemack Factoring 47 2009-06-16 00:24
Polynomial selection CRGreathouse Factoring 2 2009-05-25 07:55

All times are UTC. The time now is 11:31.


Mon Dec 5 11:31:36 UTC 2022 up 109 days, 9 hrs, 0 users, load averages: 0.71, 0.91, 1.08

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔