2021-09-23, 18:19  #67 
Aug 2020
79*6581e4;3*2539e3
3·239 Posts 
I did the matrix building to see how many uniques were required, but yes, it's faster to just do that once and otherwise only use remdups. So I'll do just that? Or is there some merit in using msieve instead of remdups?
Would it be helpful to sieve at q > 108M? A range of 1M takes slightly more than 3 hours, so a few M can quickly be added overnight. 
2021-09-23, 18:24  #68 
"Curtis"
Feb 2005
Riverside, CA
1011001010111_{2} Posts 
I don't think we can answer that until we see the data from 15 vs 12 vs 10. I'm expecting 50-60% duplicates in 12-15M, and worse in 10-12M.
If you do wish to take more data, I think Q=8-10M would tell us more about the optimal starting Q than 108M+. Let's see what we learn with the data you have, first. 
2021-10-07, 16:52  #69 
Aug 2020
79*6581e4;3*2539e3
3×239 Posts 
Range    Total        Uniques      Duplicates  Ratio
10-100M  214,512,998  153,030,781  61,482,217  71.34%
11-105M  219,448,738  158,516,513  60,932,225  72.23%
12-110M  224,369,197  163,859,554  60,509,643  73.03%
13-110M  220,590,893  162,609,603  57,981,290  73.72%

I also did several steps in between, but ran into a problem. Is it possible that the dim parameter for remdups4 strongly influences the number of uniques found? Initially I did the remdup manually with dim close to the required maximum value. For the batch file I just used 650 because it worked for all ranges. But I found that with dim=350 I got 85,319,847 uniques for 10-50M, while with dim=650 it was only 83,992,074. Was that just coincidence and something else went wrong (the total number of rels was the same, though), or does the dimension influence it? I can't really imagine it, but who knows.

What I was planning to do is find the number of uniques for various ranges in 5M or 1M steps. That way it'd be possible to find the ideal qmin that reaches the required 152M uniques within the shortest range. And we'd see if that happens when qmax/qmin = 8.

I don't know if sieving smaller q's makes sense; already at 10M it's much less efficient than at the larger qmins. Or do you want to see something specific from it?

Last fiddled with by bur on 2021-10-07 at 17:01 
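For reference, the ratio column is just uniques divided by total raw relations; a quick Python sanity check of the table above (all numbers copied from the table):

```python
# Uniques ratio per filtered q-range: (total raw relations, uniques),
# values taken from the table above.
ranges = {
    "10-100M": (214_512_998, 153_030_781),
    "11-105M": (219_448_738, 158_516_513),
    "12-110M": (224_369_197, 163_859_554),
    "13-110M": (220_590_893, 162_609_603),
}

for name, (total, uniques) in ranges.items():
    dups = total - uniques
    print(f"{name}: {uniques / total:.2%} unique, {dups:,} duplicates")
```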
2021-10-07, 17:38  #70 
"Curtis"
Feb 2005
Riverside, CA
7·19·43 Posts 
I believe a too-small dimension setting will let some duplicate relations sneak through; a reasonable price to pay to control memory use on really big problems. For these normal-sized jobs, I set dim to 3000 so that it is not a factor.
As for Qmin selection, it's a more complicated problem than you think it is. Sec/rel at small Q is typically 50-60% of the time at the ending Q. So, we can tolerate a quite-large duplicate ratio at small Q because the relations are found so quickly. Q=10M might be 75% faster than Q=110M, but yield 45% duplicates vs 15% at high Q. I made up those numbers, but they're typical in the data I gathered. If that's the data, is Q=10-11M worth sieving?

So, if you want to more accurately solve for the best minQ on this job, you'd need:
- relations per second at Q=10M
- the duplicate rate for 10-11M, found by filtering 10-110M and 11-110M to determine how many uniques are added by sieving 10-11M, and then dividing by the total raw relations count for 10-11M
- relations per second at 110M
- the duplicate rate for 109-110M or 110-111M

Even then, you'll have data for just one job, and this data varies from job to job. I suggest that Qmax/Qmin = 8 is "good enough" for our purposes.

Edit: Let's look at your last two lines of data, since they have the same ending Q: 12-13M has 3,778,304 total relations, 1,249,951 unique. That's a duplicate ratio of 67%, higher than I expected. If you have data for the duplicate ratio above 105M, we could then convert the sieve speeds of each range into a "uniques sec/rel" speed and presto! An answer for 12-13M vs 108-110M.

Last fiddled with by VBCurtis on 2021-10-07 at 17:45 
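To make the edit concrete, the marginal duplicate rate of the 12-13M slice falls out of the two filtered runs that share the same ending Q (table values from post #69; the sec/rel figure at the end is a placeholder, not a measurement):

```python
# Raw totals and uniques for 12-110M and 13-110M (from post #69).
total_12_110, uniq_12_110 = 224_369_197, 163_859_554
total_13_110, uniq_13_110 = 220_590_893, 162_609_603

# The 12-13M slice is the difference between the two filtered runs.
raw_slice = total_12_110 - total_13_110   # raw relations found in 12-13M
uniq_slice = uniq_12_110 - uniq_13_110    # uniques contributed by 12-13M
dup_ratio = 1 - uniq_slice / raw_slice
print(f"12-13M: {raw_slice:,} raw, {uniq_slice:,} unique, "
      f"{dup_ratio:.0%} duplicates")

# Converting a sieve speed into a "uniques sec/rel" speed:
sec_per_rel = 0.5                          # placeholder value
uniques_sec_per_rel = sec_per_rel / (1 - dup_ratio)
```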
2021-10-07, 17:56  #71 
Apr 2020
947_{10} Posts 
The way I tested this was to look at the CPU-time stats in the logfile ('stats_total_cpu_time') to find ranges that took almost exactly the same length of time to sieve, and then see which of these ranges produced the most unique relations.
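That selection procedure can be sketched as follows; the ranges, times, and counts below are made up for illustration (only 'stats_total_cpu_time' is the real CADO log field):

```python
# Hypothetical per-range results: q-range -> (CPU seconds from
# 'stats_total_cpu_time', unique relations after deduplication).
results = {
    "10-20M": (100_000, 40_000_000),   # made-up numbers
    "15-26M": (101_200, 43_000_000),
    "20-33M": (99_400, 42_500_000),
}

# Keep only ranges whose sieve time is within ~2% of a baseline,
# then pick the one yielding the most unique relations.
base_time = results["10-20M"][0]
comparable = {r: u for r, (t, u) in results.items()
              if abs(t - base_time) / base_time < 0.02}
best = max(comparable, key=comparable.get)
print(best)  # -> 15-26M
```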

2021-10-22, 17:17  #72 
Aug 2020
79*6581e4;3*2539e3
717_{10} Posts 
I finished the uniques ratio for a large set of different qmin/qmax settings. The data is attached as an xlsx file, zipped since for some reason the forum software didn't like the file. Explanations are as comments in the file.
A brief summary: I sieved a C167 (AL1992:1644) in the q-range of 5M to 125M. Then I used remdups4 to determine the uniques for different q-ranges with qmin of 5M to 20M in 1M steps and qmax of 50M to 125M (step size 1M or 5M). msieve required about 152M uniques to build a matrix.

As expected, depending on the choice of qmin this took a larger or smaller q-range. The minimum q-range to yield 152M uniques was qmin = 16M, qmax = 105M. What I found interesting is that the uniques ratio at first decreased with increasing q-range (with fixed qmin), then began to increase again, and after a while decreased again. It seems to go up and down.

This is just one job, but here the optimal qmin/qmax ratio was 6.6. It might be interesting to perform a similar series of tests for a different number of similar size to see if it will be that low as well. Anyway, qmin = 16M is very close to your initial choice of 17M, so whatever that was based on, it was a good choice. 
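The scan described above can be sketched like this; uniques(qmin, qmax) stands in for the remdups4 counts, and the toy model at the bottom is purely illustrative (q values in units of M):

```python
# Find the (qmin, qmax) pair that reaches the uniques target with the
# shortest q-range. `uniques` is a stand-in for the per-range remdups4
# unique counts; q values are in units of M.
TARGET = 152_000_000

def minimal_range(uniques, qmins, qmaxs, target=TARGET):
    best = None
    for qmin in qmins:
        for qmax in qmaxs:               # qmaxs assumed ascending
            if qmax <= qmin:
                continue
            if uniques(qmin, qmax) >= target:
                # smallest qmax reaching the target for this qmin
                if best is None or qmax - qmin < best[0]:
                    best = (qmax - qmin, qmin, qmax)
                break
    return best

# Toy model: ~1.7M uniques per 1M of q-range, ignoring how the
# duplicate ratio depends on qmin (a real run uses measured counts).
span, qmin, qmax = minimal_range(
    lambda a, b: int((b - a) * 1.7e6),
    qmins=range(5, 21), qmaxs=range(50, 126))
```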
2021-10-22, 23:33  #73  
Apr 2020
947_{10} Posts 
Quote:
There are probably some cases where the optimal Q "range" leaves a gap below lim1. This adds yet another variable to the testing, and of course CADO doesn't (yet) let you automate this without manual intervention.

Last fiddled with by charybdis on 2021-10-22 at 23:34 

2022-01-05, 15:16  #74 
Aug 2020
79*6581e4;3*2539e3
3×239 Posts 
Sieving of the C170 is proceeding fine; I'm at 121M rels, q = 46M. The yield per 5000 q varies quite strongly: just the last three workunits had 20,273, 15,446 and 17,053 relations, respectively.
I noticed that the C165 params had a qmin of 17M, whereas for the C170 you suggested a qmin of 15M. I'd have thought qmin would always increase with increasing n. Is there a specific reason? The C167 I ran some tests on recently gave the best result with a qmin of 16M.

Last fiddled with by bur on 2022-01-05 at 15:16 
2022-01-05, 17:01  #75 
"Curtis"
Feb 2005
Riverside, CA
1011001010111_{2} Posts 
165 digits uses I=14, while 170 is using A=28. With a larger siever, we expect to need a shorter Q-range.
I set Qmin as low as possible such that we expect the Qmax to Qmin ratio to be between 7 and 8. 
2022-01-05, 18:17  #76 
Aug 2020
79*6581e4;3*2539e3
3·239 Posts 
Ok, thanks. For the C167, the qmin which led to the smallest required q-range had a ratio of about 6. Maybe it'll be interesting to do a similar test on this C170, or is there already enough evidence that in most cases 7-8 is ideal?

2022-01-05, 19:16  #77 
"Curtis"
Feb 2005
Riverside, CA
1011001010111_{2} Posts 
Definitely *not* enough evidence! In fact, my target was 8 until your analysis of that C167. Now I say "7 to 8", and maybe I should say "7" pending further testing like the happily detailed one you did already.
