Old 2021-09-23, 16:40   #66
VBCurtis
Feb 2005
Riverside, CA

3×1,667 Posts

I'd like to see 10-100M and 10-105M. The CADO group reports that duplicate ratio rises meaningfully when q-max is beyond 8 * q-min. Your data is all within that ratio, but you started sieving at 10M so you have a chance to measure a range outside that ratio to see how duplicates and matrix building behave.

A faster test would be to try remdups on 15-108, 12-108, and 10-108; we can do some subtraction to see what the duplicate ratio is on specifically 12-15 and 10-12 within the data set of "sieved to 108M". Those Q-ranges look much faster than higher ranges, but if there are over 50% duplicates down there (when filtered with the entire dataset) then the faster sieving is an illusion. We don't need full filtering / matrix generation runs there, just the part until "xxx raw relations, yyy unique".
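To make the subtraction concrete, here is a rough sketch in Python. The counts are made-up placeholders, not measurements from this job, and the function name is just something I picked for illustration; plug in the "raw relations" and "unique" figures that remdups reports for each run. The idea is that all three runs share the same upper end (108M), so the difference between two runs isolates what the low Q-range adds to the full dataset.

```python
def marginal_duplicate_ratio(wide, narrow):
    """Duplicate ratio of the Q-range present in `wide` but not in `narrow`.

    Each argument is a (raw, unique) pair of relation counts for a run
    that ends at the same q-max. Hypothetical helper, not part of any tool.
    """
    extra_raw = wide[0] - narrow[0]        # raw relations sieved in the low range
    extra_unique = wide[1] - narrow[1]     # of those, how many survive as new uniques
    if extra_raw <= 0:
        raise ValueError("wide range must contain more raw relations than narrow")
    return 1.0 - extra_unique / extra_raw


# Placeholder (raw, unique) counts; replace with real remdups output.
counts = {
    "15-108": (200_000_000, 150_000_000),
    "12-108": (212_000_000, 156_000_000),
    "10-108": (221_000_000, 160_000_000),
}

ratio_12_15 = marginal_duplicate_ratio(counts["12-108"], counts["15-108"])
ratio_10_12 = marginal_duplicate_ratio(counts["10-108"], counts["12-108"])

print(f"12-15M marginal duplicate ratio: {ratio_12_15:.1%}")
print(f"10-12M marginal duplicate ratio: {ratio_10_12:.1%}")
```

If either ratio comes out above about 50%, the apparent speed advantage of sieving that low range is mostly spent producing relations the rest of the dataset already has.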