Estimating number of relations needed
I am trying to run a 200 digit SNFS job for the first time. After sieving from 7.5M to ~15M I've collected about 33.5M relations against the estimated minimum of about 28.6M, but it isn't getting past the filtering step: it reduces to just 3499 relations and 0 ideals after some number of passes. The number of duplicate relations is about 10% of the total, which is similar to some smaller jobs I've finished.
Anyway, I would have expected the number of relations and ideals after filtering to be much greater than what is indicated here. I'm not sure whether the issue can be resolved just with more sieving, but if that is the case, is there a way to reliably estimate how many more relations I might need? Full log is also attached.

[code]
Wed Oct 21 14:44:04 2020  Msieve v. 1.53 (SVN unknown)
Wed Oct 21 14:44:04 2020  random seeds: bd75a574 74f633da
Wed Oct 21 14:44:04 2020  factoring 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001023 (200 digits)
Wed Oct 21 14:44:05 2020  searching for 15-digit factors
Wed Oct 21 14:44:05 2020  commencing number field sieve (200-digit input)
Wed Oct 21 14:44:05 2020  R0: 5000000000000000000000000000000000000000
Wed Oct 21 14:44:05 2020  R1: 1
Wed Oct 21 14:44:05 2020  A0: 5115
Wed Oct 21 14:44:05 2020  A1: 0
Wed Oct 21 14:44:05 2020  A2: 0
Wed Oct 21 14:44:05 2020  A3: 0
Wed Oct 21 14:44:05 2020  A4: 0
Wed Oct 21 14:44:05 2020  A5: 16
Wed Oct 21 14:44:05 2020  skew 3.17, size 3.898e-14, alpha 0.504, combined = 1.155e-11 rroots = 1
Wed Oct 21 14:44:05 2020
Wed Oct 21 14:44:05 2020  commencing relation filtering
Wed Oct 21 14:44:05 2020  estimated available RAM is 7542.8 MB
Wed Oct 21 14:44:05 2020  commencing duplicate removal, pass 1
Wed Oct 21 14:47:13 2020  error 15 reading relation 30460330
Wed Oct 21 14:47:24 2020  error 15 reading relation 32331885
Wed Oct 21 14:47:25 2020  error 15 reading relation 32392947
Wed Oct 21 14:47:25 2020  error 11 reading relation 32423445
Wed Oct 21 14:47:30 2020  error 9 reading relation 33359915
Wed Oct 21 14:47:31 2020  skipped 5 relations with composite factors
Wed Oct 21 14:47:31 2020  found 3606403 hash collisions in 33483146 relations
Wed Oct 21 14:47:49 2020  added 75 free relations
Wed Oct 21 14:47:49 2020  commencing duplicate removal, pass 2
Wed Oct 21 14:48:00 2020  found 3078858 duplicates and 30404363 unique relations
Wed Oct 21 14:48:00 2020  memory use: 138.6 MB
Wed Oct 21 14:48:00 2020  reading ideals above 720000
Wed Oct 21 14:48:00 2020  commencing singleton removal, initial pass
Wed Oct 21 14:50:59 2020  memory use: 753.0 MB
Wed Oct 21 14:50:59 2020  reading all ideals from disk
Wed Oct 21 14:51:00 2020  memory use: 1037.9 MB
Wed Oct 21 14:51:02 2020  keeping 38289591 ideals with weight <= 200, target excess is 159943
Wed Oct 21 14:51:04 2020  commencing in-memory singleton removal
Wed Oct 21 14:51:06 2020  begin with 30404363 relations and 38289591 unique ideals
Wed Oct 21 14:51:12 2020  reduce to 3499 relations and 0 ideals in 17 passes
Wed Oct 21 14:51:12 2020  max relations containing the same ideal: 0
Wed Oct 21 14:51:12 2020  filtering wants 1000000 more relations
[/code]
I think you've just crossed a boundary into needing the 14e siever, which is why it seems to be taking longer.
It just needs more relations - probably 40-45 million, based on similar numbers that I've factored.
If the large-prime bound jumped up by 1 bit from your previous jobs, you'll need about 70% more relations than you did for those jobs, but not all that much more overall wall-clock time; more relations are found per unit time with a larger large-prime setting.
Those are the lpbr and lpba settings in the .poly file. SNFS has a bunch of moving parts; experience will show you the way. If you're using the factmsieve.py script, the target relation counts are not very accurate. You can edit the script as you gain experience, so it doesn't bother trying to filter until it's in the ballpark of your previous jobs of that size. There is quite a bit of speed to be found in tweaking the settings in that script; YAFU has incorporated lots of such tweaks, and if you're on Linux then CADO is faster still, but it takes some experience to run for SNFS jobs.
Thanks for the explanations, everyone. This job did switch from the 13e to the 14e siever compared to the last one (156 digits), and the large prime bounds actually jumped twice, from 2^27 to 2^29. I think I will just manually modify the relations estimate in the factmsieve.py script for now.

I edited the OP; see there for the reason. Please keep this in mind for the future.

[QUOTE=swishzzz;560589]
[code] Wed Oct 21 14:51:12 2020 reduce to 3499 relations and 0 ideals in 17 passes [/code][/QUOTE]

That line implies you need at least 30% more relations. You might need a lot more, but you won't need significantly less than that.

If the number of relations, X, and the number of ideals, Y, are both reasonably large (at least 4 digits) and X < Y, then add 10% + (Y-X)*4 relations; else add 5%. That's an empirical relationship (i.e. I don't know why it works), but it's useful.

Chris
[QUOTE=chris2be8;560730]That line implies you need at least 30% more relations. You might need a lot more but won't need significantly less than that.
If the number of relations, X, and the number of ideals, Y, are both reasonably large (at least 4 digits) and X < Y then add 10% + (Y-X)*4, else add 5%. That's an empirical relationship (ie I don't know why it works). But it's useful. Chris[/QUOTE]

Interesting. I've just completed this factorization and my numbers were a little less than what your formula would have suggested. I ended up needing 42.2M raw relations (about 26% more than in my first post); 35.9M was where it first filtered down to a non-trivial number of relations, with 4.5M relations and 5.6M ideals, which would have meant 8M more relations or ~44M total.
I worked out my guidelines on rather smaller SNFS jobs than yours, so I'm not too surprised you managed with fewer relations. A better guess would take the duplicate rate into account as well. It's only a rough guide, but a lot better than repeatedly trying to build a matrix until we have enough relations.
Chris 
I'm concurrently running two SNFS jobs similar to the one in the OP, and I'm noticing a rather massive discrepancy in yields between them:
Poly 1:
[CODE]
n: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040039
type: snfs
c5: 16
c0: 200195
Y1: 1
Y0: 5000000000000000000000000000000000000000
q0: 3300000
[/CODE]

Poly 2:
[CODE]
n: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000039669
type: snfs
c5: 16
c0: 198345
Y1: 1
Y0: 5000000000000000000000000000000000000000
q0: 3300000
[/CODE]

So same size and basically the same skew, as the two polys differ only slightly in the constant term. Sieving from q=3300k to 3400k yielded about 430k relations for poly 1 but only 290k for poly 2. Poly 1 built a 20% smaller-than-usual matrix with ~37.4M unique relations (so perhaps it would have been possible to build one with 35-36M), but poly 2 barely built a matrix with 38.5M uniques. With slower sieving plus more relations needed, I'd estimate poly 2 to be at least 4 digits more difficult than poly 1, which is somewhat surprising. Am I correct in understanding that this is due to smooth relations being harder to find in the algebraic ring extension of Z that includes a root of poly 2 compared to that for poly 1, and if so, is there a good way to estimate the "difficulty" of these polynomials without actually doing the sieving?
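As a quick sanity check that both .poly files are valid (assuming, as the digit strings suggest, that the two n values equal 10^199 + 40039 and 10^199 + 39669): the algebraic polynomial f(x) = 16x^5 + c0 evaluated at the shared rational root Y0 comes out to exactly 5n in each case, so both polynomials vanish at Y0 mod n:

```python
# Sanity check of the two SNFS polynomials above. Assumption: the
# 200-digit n values equal 10^199 + 40039 and 10^199 + 39669,
# matching the digit strings in the post.
Y0 = 5 * 10**39  # shared rational-side root (Y1 = 1)

for n, c0 in [(10**199 + 40039, 200195),
              (10**199 + 39669, 198345)]:
    f_at_root = 16 * Y0**5 + c0  # algebraic poly f(x) = 16x^5 + c0
    assert f_at_root == 5 * n    # hence f(Y0) == 0 (mod n)
    assert f_at_root % n == 0

print("both polynomials share the root Y0 modulo their n")
```

This only checks correctness of the setup, not sieving difficulty; the two number fields differ even though the polynomials look nearly identical.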
You didn't mention how much time the sieving takes - yield isn't really related to how fast relations are found. Comparing 290k relations from a Q-range to 430k relations doesn't tell you anything about difficulty.
Compare sec/rel in a variety of ranges, or compare total sieving time; yield is a red herring. 
I ran the two jobs on different boxes (8 cores for poly 1 and 6 cores for poly 2), but here are some stats from various Q ranges which show that poly 2 is quite a bit slower:
[QUOTE]
Poly 1 sieved from Q = 3300k - 12400k
Poly 2 sieved from Q = 3300k - 18100k

Q = 3300k - 3400k
Poly 1: 430k rel, 46.1 rel/sec/core
Poly 2: 290k rel, 36.7 rel/sec/core

Q = 7000k - 7100k
Poly 1: 465k rel, 50.1 rel/sec/core
Poly 2: 312k rel, 38.3 rel/sec/core

Q = 11000k - 11100k
Poly 1: 456k rel, 47.2 rel/sec/core
Poly 2: 304k rel, 35.1 rel/sec/core

Q = 16000k - 16100k
Poly 2: 286k rel, 30.9 rel/sec/core
[/QUOTE]
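Putting a rough number on that difference: averaging the sampled rates and dividing them into the unique-relation counts gives a crude core-time ratio. This is a back-of-the-envelope sketch using only the figures quoted above; it ignores the differing duplicate rates and the fact that yield drifts across the Q range:

```python
# Crude total-effort comparison from the sampled rates in this thread.
poly1_rates = [46.1, 50.1, 47.2]        # rel/sec/core samples, poly 1
poly2_rates = [36.7, 38.3, 35.1, 30.9]  # rel/sec/core samples, poly 2
poly1_uniques = 37.4e6                  # unique relations when matrix built
poly2_uniques = 38.5e6

avg1 = sum(poly1_rates) / len(poly1_rates)
avg2 = sum(poly2_rates) / len(poly2_rates)

# Estimated core-seconds = relations needed / relations per core-second.
ratio = (poly2_uniques / avg2) / (poly1_uniques / avg1)
print(f"poly 2 cost roughly {ratio:.2f}x the core-seconds of poly 1")
```

The ratio comes out around 1.4x, which is consistent with the "several digits more difficult" impression from the sieving.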