20201021, 19:07  #1 
Jan 2012
Toronto, Canada
59 Posts 
Estimating number of relations needed
I am trying to run a 200 digit SNFS job for the first time and after sieving from 7.5M to ~15M I've collected about 33.5M relations vs the estimated minimum of about 28.6M but it isn't getting past the filtering step. It reduces to just 3499 relations and 0 ideals after some number of passes and the number of duplicate relations is about 10% the number of relations which is similar to that of some smaller jobs I've finished.
Anyway, I would have expected the number of relations and ideals after filtering to be much greater than what is indicated here. I'm not sure if the issue can be resolved just with more sieving, but if that is the case is there a way to reliably estimate how many more relations I might need? Full log is also attached. Code:
Wed Oct 21 14:44:04 2020 Msieve v. 1.53 (SVN unknown) Wed Oct 21 14:44:04 2020 random seeds: bd75a574 74f633da Wed Oct 21 14:44:04 2020 factoring 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001023 (200 digits) Wed Oct 21 14:44:05 2020 searching for 15digit factors Wed Oct 21 14:44:05 2020 commencing number field sieve (200digit input) Wed Oct 21 14:44:05 2020 R0: 5000000000000000000000000000000000000000 Wed Oct 21 14:44:05 2020 R1: 1 Wed Oct 21 14:44:05 2020 A0: 5115 Wed Oct 21 14:44:05 2020 A1: 0 Wed Oct 21 14:44:05 2020 A2: 0 Wed Oct 21 14:44:05 2020 A3: 0 Wed Oct 21 14:44:05 2020 A4: 0 Wed Oct 21 14:44:05 2020 A5: 16 Wed Oct 21 14:44:05 2020 skew 3.17, size 3.898e14, alpha 0.504, combined = 1.155e11 rroots = 1 Wed Oct 21 14:44:05 2020 Wed Oct 21 14:44:05 2020 commencing relation filtering Wed Oct 21 14:44:05 2020 estimated available RAM is 7542.8 MB Wed Oct 21 14:44:05 2020 commencing duplicate removal, pass 1 Wed Oct 21 14:47:13 2020 error 15 reading relation 30460330 Wed Oct 21 14:47:24 2020 error 15 reading relation 32331885 Wed Oct 21 14:47:25 2020 error 15 reading relation 32392947 Wed Oct 21 14:47:25 2020 error 11 reading relation 32423445 Wed Oct 21 14:47:30 2020 error 9 reading relation 33359915 Wed Oct 21 14:47:31 2020 skipped 5 relations with composite factors Wed Oct 21 14:47:31 2020 found 3606403 hash collisions in 33483146 relations Wed Oct 21 14:47:49 2020 added 75 free relations Wed Oct 21 14:47:49 2020 commencing duplicate removal, pass 2 Wed Oct 21 14:48:00 2020 found 3078858 duplicates and 30404363 unique relations Wed Oct 21 14:48:00 2020 memory use: 138.6 MB Wed Oct 21 14:48:00 2020 reading ideals above 720000 Wed Oct 21 14:48:00 2020 commencing singleton removal, initial pass Wed Oct 21 14:50:59 2020 memory use: 753.0 MB Wed Oct 21 14:50:59 2020 reading all ideals from disk Wed Oct 21 14:51:00 2020 memory use: 1037.9 MB Wed Oct 21 14:51:02 2020 keeping 38289591 ideals with weight <= 200, target excess is 159943 Wed Oct 21 14:51:04 2020 commencing inmemory singleton removal Wed Oct 21 14:51:06 2020 begin with 30404363 relations and 38289591 unique ideals Wed Oct 21 14:51:12 2020 reduce to 3499 relations and 0 ideals in 17 passes Wed Oct 21 14:51:12 2020 max relations containing the same ideal: 0 Wed Oct 21 14:51:12 2020 filtering wants 1000000 more relations Last fiddled with by LaurV on 20201022 at 07:52 Reason: replaced quote tag with code tag due to aesthetics (very long number, difficult to read) 
20201021, 19:35  #2 
Dec 2017
77_{8} Posts 
I think you've just crossed a boundary into needing the 14e siever, which is why it seems to be longer.
It just needs more relations  probably 4045 million, based on similar numbers that I've factored. 
20201021, 20:54  #3 
"Curtis"
Feb 2005
Riverside, CA
2^{4}×13×23 Posts 
If the largeprime bound jumped up 1 from your previous jobs, you'll need about 70% more relations than you did for the previous jobs, but not all that much more overall wallclock time; more relations are found per unit time with a larger largeprime setting.
That's the lpbr and lpba setting in the .poly file. SNFS has a bunch of moving parts; experience will show you the way. If you're using the factmsieve.py script, the target relations numbers are not very accurate. You can edit the script as you gain experience, to have it not bother trying to filter until it's in the ballpark of your previous jobs of that size. There is quite a bit of speed to be found in tweaking the settings in that script; YAFU has incorporated lots of such tweaks, and if you're in Linux then CADO is faster still but takes some experience to run for SNFS jobs. 
20201021, 21:43  #4 
Jan 2012
Toronto, Canada
59 Posts 
Thanks for the explanations everyone. This job did switch from the 13e to the 14e siever compared to the last one (156 digits) and the large prime bounds actually jumped twice from 2^27 to 2^29. I think I will just manually modify the relations estimate in the factmsieve.py script for now.

20201022, 07:53  #5 
Romulan Interpreter
Jun 2011
Thailand
10010011100111_{2} Posts 
I edited the OP, see there the reason. Please mind for the future.

20201022, 16:17  #6  
Sep 2009
2·1,021 Posts 
Quote:
If the number of relations, X, and the number of ideals, Y, are both reasonably large (at least 4 digits) and X < Y then add 10%+(YX)*4 else add 5%. That's an empirical relationship (ie I don't know why it works). But it's useful. Chris 

20201023, 01:30  #7  
Jan 2012
Toronto, Canada
111011_{2} Posts 
Quote:
Interesting. I've just completed this factorization and my numbers were a little less than what your formula would have suggested. Ended up needing 42.2M raw relations (about 26% more than in my first post); 35.9M was where it first filtered down to a nontrivial number of relations with 4.5M relations and 5.6M ideals, which would have meant 8M relations or ~44M total. 

20201023, 16:10  #8 
Sep 2009
7FA_{16} Posts 
I worked out my guidelines doing rather smaller SNFS jobs than your job. So I'm not too surprised you managed with fewer relations. A better guess would take the duplicate rate into account as well. It's only a rough guide but a lot better than repeatedly trying to build a matrix until we have enough relations.
Chris 
20210331, 18:52  #9 
Jan 2012
Toronto, Canada
111011_{2} Posts 
I'm running two similar SNFS jobs as in the OP concurrently, and I'm noticing a rather massive discrepancy in yields from these:
Poly 1: Code:
n: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040039 type: snfs c5: 16 c0: 200195 Y1: 1 Y0: 5000000000000000000000000000000000000000 q0: 3300000 Code:
n: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000039669 type: snfs c5: 16 c0: 198345 Y1: 1 Y0: 5000000000000000000000000000000000000000 q0: 3300000 Am I correct in understanding from this is that this is due to smooth relations being harder to find in the algebraic ring extension of Z that include a root of poly 2 compared to that for poly 1, and if so is there a good way to estimate the "difficulty" of these polynomials without actually doing the sieving? Last fiddled with by swishzzz on 20210331 at 18:53 Reason: quote > code 
20210331, 19:00  #10 
"Curtis"
Feb 2005
Riverside, CA
2^{4}×13×23 Posts 
You didn't mention how much time the sieving takes yield isn't really related to how fast relations are found. Comparing 290k relations from a Qrange to 430k relations doesn't tell you anything about difficulty.
Compare sec/rel in a variety of ranges, or compare total sieving time; yield is a red herring. 
20210331, 19:45  #11  
Jan 2012
Toronto, Canada
59 Posts 
I ran the two jobs on different boxes (8 cores for poly 1 and 6 cores for poly 2) but here are some stats from various Q ranges which shows that poly 2 is quite a bit slower:
Quote:


Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Estimating the number of primes in a partiallyfactored number  CRGreathouse  Probability & Probabilistic Number Theory  15  20140813 18:46 
Estimating time needed for GNFS  CRGreathouse  Factoring  16  20140310 03:40 
Estimating time needed for GNFS  CRGreathouse  Factoring  0  20140302 04:18 
Estimating the number of prime factors a number has  henryzz  Math  7  20120523 01:13 
Estimating minimum relations  bchaffin  Factoring  24  20120324 18:37 