mersenneforum.org Some CADO-NFS Work At Around 175-180 Decimal Digits

2020-04-08, 19:31   #12
charybdis

Apr 2020

17×29 Posts

Thank you so much for this! I'll go for 285M relations and see what happens, and I'll keep you updated on the filtering/matrix steps.

2020-04-08, 20:45   #13
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2⁷×3×13 Posts

A couple little things: I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter? Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving.

I *think* that you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size; that is, if the matrix looks big, you can edit the snapshot file to add a higher rels_wanted setting, and restart CADO. I believe CADO will look to see if relations count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan.

tl;dr: 285M good. More might be better. :)

2020-04-08, 20:50   #14
EdH

"Ed Hall"
Dec 2009

2×1,999 Posts

Quote:
 Originally Posted by VBCurtis
1. This only matters if we plan to iterate multiple factorizations of similar size to compare various params settings; otherwise, your timing data doesn't tell us much since there is little to compare to. If you have some elapsed (e.g. wall clock) time for the C168ish you did with the default CADO file, we can see if my C175 file did better than the observed double-every-5.5-digits typical on CADO. So, I wouldn't bother letting it finish, but I would try to record CADO's claim of sieve time from right before it enters filtering.
2. I believe you need to give it the poly also; either the .poly file in the same folder (which the snapshot should reference), or by explicitly declaring the poly the same way you did for the SNFS poly (tasks.poly = {polyfilename}, if I recall). Either way, you'll need to copy the poly file to the colab instance.
3. Far beyond my pay-grade in networking nor CADO knowledge, sorry.
1. Do you know whether "tasks.filter.run = false" provides the poly/sieve timings or just stops?

2. I had thought the poly values were also in the snapshot, but I see they aren't, so I'll be sure to copy that file as well. I will have to modify the snapshot file for other things such as build path, too.

One thing I hope to try in the next few days is to run a semi-clone, server-only instance of the current local server on Colab, sieving 100k-500k*. I'd like to see if the local instance would recognize the relations from the Colab instance if it had no record of them being assigned. If not, I'm wondering if including the .stderr file would tell the local instance that they had already been accepted.

*Would running this area throw off your "rels_wanted" value, since these would be more prone to duplicates, or am I off course?

2020-04-08, 20:58   #15
EdH

"Ed Hall"
Dec 2009

2·1,999 Posts

Quote:
 Originally Posted by VBCurtis A couple little things: I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter? Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving. I *think* that you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size; that is, if the matrix looks big, you can edit the snapshot file to add a higher rels_wanted setting, and restart CADO. I believe CADO will look to see if relations count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan. tl;dr: 285M good. More might be better. :)
On previous occasions, I have restarted CADO-NFS after it was already performing krylov, to add more relations. As you posted, I changed the rels_wanted value and it did go back to sieving until the new value was met. I mainly did this when msieve wouldn't build a matrix but CADO-NFS did, but I also did it for the recent testing of msieve matrices.
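
The restart procedure described here comes down to changing one value in the snapshot parameter file before relaunching CADO-NFS. A minimal sketch, assuming the snapshot is a plain `key = value` text file and that the setting lives under the key `tasks.sieve.rels_wanted` (the posts only say "rels_wanted"; the full key name and the helper `bump_rels_wanted` are assumptions for illustration, not something confirmed in this thread):

```python
def bump_rels_wanted(snapshot_text: str, new_target: int,
                     key: str = "tasks.sieve.rels_wanted") -> str:
    """Return the snapshot text with `key` set to `new_target`.

    If the key is already present its value is replaced; otherwise a new
    line is appended. Every other line is passed through unchanged.
    """
    out = []
    found = False
    for line in snapshot_text.splitlines():
        # Compare only the part before the first '=' against the key name.
        name = line.split("=", 1)[0].strip()
        if name == key:
            out.append(f"{key} = {new_target}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key} = {new_target}")
    return "\n".join(out) + "\n"


# Hypothetical snapshot fragment, not taken from a real run.
example = "tasks.lpb0 = 31\ntasks.sieve.rels_wanted = 285000000\n"
print(bump_rels_wanted(example, 300_000_000))
```

Working on a copy of the snapshot and comparing it against the original before restarting would be a sensible precaution, since the same file also holds paths and other state.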

2020-04-08, 21:50   #16
charybdis

Apr 2020

17×29 Posts

Quote:
 Originally Posted by VBCurtis A couple little things: I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter? Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving. I *think* that you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size; that is, if the matrix looks big, you can edit the snapshot file to add a higher rels_wanted setting, and restart CADO. I believe CADO will look to see if relations count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan. tl;dr: 285M good. More might be better. :)
I'll be using msieve - it's faster, and the machine I was using ran out of memory during the "replay" stage of CADO filtering for the c172 I've just finished. Filtering is obviously a while away, but what matrix size would you consider "too big" here (say with target density 100)?

2020-04-08, 22:10   #17
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

11600₈ Posts

Quote:
 Originally Posted by EdH
1. Do you know whether "tasks.filter.run = false" provides the poly/sieve timings or just stops?
*Would running this area throw off your "rels_wanted" value, since these would be more prone to duplicates, or am I off course?
1. I'm pretty sure the timing is listed before filtering begins, so CADO should show the time-to-sieve just before it exits.

No big deal on the tweak to relations from starting at lower Q. 500k is already quite low and prone to extra duplicates; going down to 100k or 150k won't change those numbers enough to matter.

2020-04-08, 22:19   #18
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

1001110000000₂ Posts

Quote:
 Originally Posted by charybdis I'll be using msieve - it's faster, and the machine I was using ran out of memory during the "replay" stage of CADO filtering for the c172 I've just finished. Filtering is obviously a while away, but what matrix size would you consider "too big" here (say with target density 100)?
I glanced through the NFS@home 15e results page to have a look at matrix sizes, but I forgot that most numbers don't have the difficulty listed on the results (one has to open the log to find that info). So, I'm taking a guess: 20M matrix is too big for GNFS177. Using msieve, filtering and the matrix will fit in a 16GB machine easily; the limit is around 25-26M matrix size on 16GB.

... Aha! https://mersenneforum.org/showpost.p...&postcount=217 is a good data point: C184ish, 32/32LP, 366M raw relations was enough to build a 17.7M matrix. 32/32 needs about 30% more relations than our choice of 31/32, so 280M would be equivalent if we were on ggnfs and starting at Q=20M. We're taking advantage of CADO's super fast speeds at low Q, at the cost of extra duplicates; I estimate you'll need 190M unique relations, and 285-290M raw relations is still a good guess (based on that one data point from the linked post). C177 is markedly easier than C184, so I won't be surprised to see a 15-16M matrix from your dataset.
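
The conversion from the 32/32 data point to the 31/32 job is plain scaling arithmetic. A small sketch, taking the "about 30% more relations" factor and the 366M figure from the post at face value (the function name is made up for illustration):

```python
def equivalent_relations(raw_3232: float, extra_fraction: float = 0.30) -> float:
    """Raw relations needed at 31/32 given a 32/32 data point,
    assuming 32/32 needs (1 + extra_fraction) times as many relations."""
    return raw_3232 / (1.0 + extra_fraction)


# 366M raw relations at 32/32, expressed in millions at 31/32.
print(round(equivalent_relations(366e6) / 1e6, 1))
```

This lands a little above 280M, in line with the rounded figure in the post; the 30% factor is itself only a rule of thumb.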

Last fiddled with by VBCurtis on 2020-04-08 at 22:20

2020-04-13, 23:59   #19
charybdis

Apr 2020

17×29 Posts

So I decided to do an early filtering run (with the default target_density of 90) for the c177 to see how things were going, and was surprised to get a rather friendly matrix (edit - this was after sieving Q up to 190M):

Code:
Mon Apr 13 23:19:04 2020 commencing relation filtering
Mon Apr 13 23:19:04 2020 estimated available RAM is 15845.6 MB
Mon Apr 13 23:19:04 2020 commencing duplicate removal, pass 1
...relation errors...
Mon Apr 13 23:45:59 2020 found 81454230 hash collisions in 266531234 relations
Mon Apr 13 23:46:21 2020 added 122209 free relations
Mon Apr 13 23:46:21 2020 commencing duplicate removal, pass 2
Mon Apr 13 23:51:30 2020 found 110111310 duplicates and 156542133 unique relations
Mon Apr 13 23:51:30 2020 memory use: 1449.5 MB
Mon Apr 13 23:51:30 2020 reading ideals above 189857792
Mon Apr 13 23:51:30 2020 commencing singleton removal, initial pass
Tue Apr 14 00:03:26 2020 memory use: 3012.0 MB
Tue Apr 14 00:03:27 2020 reading all ideals from disk
Tue Apr 14 00:03:43 2020 memory use: 2357.2 MB
Tue Apr 14 00:03:46 2020 commencing in-memory singleton removal
Tue Apr 14 00:03:49 2020 begin with 156542133 relations and 141217020 unique ideals
Tue Apr 14 00:04:16 2020 reduce to 69079091 relations and 41834367 ideals in 15 passes
Tue Apr 14 00:04:16 2020 max relations containing the same ideal: 24
Tue Apr 14 00:04:20 2020 reading ideals above 720000
Tue Apr 14 00:04:20 2020 commencing singleton removal, initial pass
Tue Apr 14 00:13:17 2020 memory use: 1506.0 MB
Tue Apr 14 00:13:17 2020 reading all ideals from disk
Tue Apr 14 00:13:38 2020 memory use: 2750.9 MB
Tue Apr 14 00:13:43 2020 keeping 62470184 ideals with weight <= 200, target excess is 377745
Tue Apr 14 00:13:48 2020 commencing in-memory singleton removal
Tue Apr 14 00:13:52 2020 begin with 69079091 relations and 62470184 unique ideals
Tue Apr 14 00:14:38 2020 reduce to 68407306 relations and 61797325 ideals in 11 passes
Tue Apr 14 00:14:38 2020 max relations containing the same ideal: 200
Tue Apr 14 00:15:02 2020 removing 6152193 relations and 5152193 ideals in 1000000 cliques
Tue Apr 14 00:15:04 2020 commencing in-memory singleton removal
Tue Apr 14 00:15:08 2020 begin with 62255113 relations and 61797325 unique ideals
Tue Apr 14 00:15:42 2020 reduce to 61908990 relations and 56293091 ideals in 9 passes
Tue Apr 14 00:15:42 2020 max relations containing the same ideal: 193
Tue Apr 14 00:16:03 2020 removing 4732119 relations and 3732119 ideals in 1000000 cliques
Tue Apr 14 00:16:05 2020 commencing in-memory singleton removal
Tue Apr 14 00:16:08 2020 begin with 57176871 relations and 56293091 unique ideals
Tue Apr 14 00:16:33 2020 reduce to 56939736 relations and 52320100 ideals in 7 passes
Tue Apr 14 00:16:33 2020 max relations containing the same ideal: 185
Tue Apr 14 00:16:52 2020 removing 4296409 relations and 3296409 ideals in 1000000 cliques
Tue Apr 14 00:16:54 2020 commencing in-memory singleton removal
Tue Apr 14 00:16:57 2020 begin with 52643327 relations and 52320100 unique ideals
Tue Apr 14 00:17:19 2020 reduce to 52427201 relations and 48804144 ideals in 7 passes
Tue Apr 14 00:17:19 2020 max relations containing the same ideal: 176
Tue Apr 14 00:17:38 2020 removing 4072883 relations and 3072883 ideals in 1000000 cliques
Tue Apr 14 00:17:39 2020 commencing in-memory singleton removal
Tue Apr 14 00:17:42 2020 begin with 48354318 relations and 48804144 unique ideals
Tue Apr 14 00:18:05 2020 reduce to 48141563 relations and 45515043 ideals in 8 passes
Tue Apr 14 00:18:05 2020 max relations containing the same ideal: 166
Tue Apr 14 00:18:22 2020 removing 3937420 relations and 2937420 ideals in 1000000 cliques
Tue Apr 14 00:18:23 2020 commencing in-memory singleton removal
Tue Apr 14 00:18:26 2020 begin with 44204143 relations and 45515043 unique ideals
Tue Apr 14 00:18:47 2020 reduce to 43985166 relations and 42354678 ideals in 8 passes
Tue Apr 14 00:18:47 2020 max relations containing the same ideal: 159
Tue Apr 14 00:19:02 2020 removing 3855855 relations and 2855855 ideals in 1000000 cliques
Tue Apr 14 00:19:04 2020 commencing in-memory singleton removal
Tue Apr 14 00:19:06 2020 begin with 40129311 relations and 42354678 unique ideals
Tue Apr 14 00:19:23 2020 reduce to 39897778 relations and 39262821 ideals in 7 passes
Tue Apr 14 00:19:23 2020 max relations containing the same ideal: 146
Tue Apr 14 00:19:37 2020 removing 1008731 relations and 811959 ideals in 196772 cliques
Tue Apr 14 00:19:38 2020 commencing in-memory singleton removal
Tue Apr 14 00:19:40 2020 begin with 38889047 relations and 39262821 unique ideals
Tue Apr 14 00:19:54 2020 reduce to 38873526 relations and 38435281 ideals in 6 passes
Tue Apr 14 00:19:54 2020 max relations containing the same ideal: 145
Tue Apr 14 00:20:01 2020 relations with 0 large ideals: 1351
Tue Apr 14 00:20:01 2020 relations with 1 large ideals: 2032
Tue Apr 14 00:20:01 2020 relations with 2 large ideals: 31561
Tue Apr 14 00:20:01 2020 relations with 3 large ideals: 276178
Tue Apr 14 00:20:01 2020 relations with 4 large ideals: 1367390
Tue Apr 14 00:20:01 2020 relations with 5 large ideals: 4161767
Tue Apr 14 00:20:01 2020 relations with 6 large ideals: 8220245
Tue Apr 14 00:20:01 2020 relations with 7+ large ideals: 24813002
Tue Apr 14 00:20:01 2020 commencing 2-way merge
Tue Apr 14 00:20:20 2020 reduce to 25269597 relation sets and 24831352 unique ideals
Tue Apr 14 00:20:20 2020 commencing full merge
Tue Apr 14 00:25:34 2020 memory use: 3048.5 MB
Tue Apr 14 00:25:36 2020 found 12788387 cycles, need 12729552
Tue Apr 14 00:25:38 2020 weight of 12729552 cycles is about 1145831261 (90.01/cycle)
Tue Apr 14 00:25:39 2020 distribution of cycle lengths:
Tue Apr 14 00:25:39 2020 1 relations: 936274
Tue Apr 14 00:25:39 2020 2 relations: 1233061
Tue Apr 14 00:25:39 2020 3 relations: 1366643
Tue Apr 14 00:25:39 2020 4 relations: 1339097
Tue Apr 14 00:25:39 2020 5 relations: 1277647
Tue Apr 14 00:25:39 2020 6 relations: 1185428
Tue Apr 14 00:25:39 2020 7 relations: 1063351
Tue Apr 14 00:25:39 2020 8 relations: 927894
Tue Apr 14 00:25:39 2020 9 relations: 789959
Tue Apr 14 00:25:39 2020 10+ relations: 2610198
Tue Apr 14 00:25:39 2020 heaviest cycle: 23 relations
Tue Apr 14 00:25:41 2020 commencing cycle optimization
Tue Apr 14 00:25:55 2020 start with 80743841 relations
Tue Apr 14 00:27:24 2020 pruned 2673306 relations
Tue Apr 14 00:27:25 2020 memory use: 2459.5 MB
Tue Apr 14 00:27:25 2020 distribution of cycle lengths:
Tue Apr 14 00:27:25 2020 1 relations: 936274
Tue Apr 14 00:27:25 2020 2 relations: 1265314
Tue Apr 14 00:27:25 2020 3 relations: 1424607
Tue Apr 14 00:27:25 2020 4 relations: 1385641
Tue Apr 14 00:27:25 2020 5 relations: 1326920
Tue Apr 14 00:27:25 2020 6 relations: 1219435
Tue Apr 14 00:27:25 2020 7 relations: 1089468
Tue Apr 14 00:27:25 2020 8 relations: 940531
Tue Apr 14 00:27:25 2020 9 relations: 790962
Tue Apr 14 00:27:25 2020 10+ relations: 2350400
Tue Apr 14 00:27:25 2020 heaviest cycle: 22 relations
Tue Apr 14 00:27:42 2020 RelProcTime: 4118
Tue Apr 14 00:27:46 2020
Tue Apr 14 00:27:46 2020 commencing linear algebra
Tue Apr 14 00:27:47 2020 read 12729552 cycles
Tue Apr 14 00:28:04 2020 cycles contain 38650274 unique relations
Tue Apr 14 00:33:26 2020 read 38650274 relations
Tue Apr 14 00:34:10 2020 using 20 quadratic characters above 4294917295
Tue Apr 14 00:36:38 2020 building initial matrix
Tue Apr 14 00:42:43 2020 memory use: 5483.9 MB
Tue Apr 14 00:42:53 2020 read 12729552 cycles
Tue Apr 14 00:42:55 2020 matrix is 12729375 x 12729552 (4750.3 MB) with weight 1481221875 (116.36/col)
Tue Apr 14 00:42:55 2020 sparse part has weight 1092512135 (85.82/col)
Tue Apr 14 00:44:26 2020 filtering completed in 2 passes
Tue Apr 14 00:44:29 2020 matrix is 12728889 x 12729066 (4750.3 MB) with weight 1481204265 (116.36/col)
Tue Apr 14 00:44:29 2020 sparse part has weight 1092509145 (85.83/col)
Tue Apr 14 00:45:25 2020 matrix starts at (0, 0)
Tue Apr 14 00:45:27 2020 matrix is 12728889 x 12729066 (4750.3 MB) with weight 1481204265 (116.36/col)
Tue Apr 14 00:45:27 2020 sparse part has weight 1092509145 (85.83/col)
Tue Apr 14 00:45:27 2020 saving the first 112 matrix rows for later
Tue Apr 14 00:45:29 2020 matrix includes 128 packed rows
Tue Apr 14 00:45:32 2020 matrix is 12728777 x 12729066 (4413.9 MB) with weight 1088568275 (85.52/col)
Tue Apr 14 00:45:32 2020 sparse part has weight 1004323212 (78.90/col)
Tue Apr 14 00:45:32 2020 using block size 8192 and superblock size 442368 for processor cache size 9216 kB
Tue Apr 14 00:45:56 2020 commencing Lanczos iteration (6 threads)
Tue Apr 14 00:45:56 2020 memory use: 5060.6 MB
Tue Apr 14 00:46:26 2020 linear algebra at 0.0%, ETA 64h47m

Looks like 285M relations was a big overestimate - maybe something to do with the double large prime bounds being slightly less than double the single large prime bounds? These were the parameters I ended up using:

Code:
tasks.A = 28
tasks.qmin = 500000
tasks.lim0 = 115000000
tasks.lim1 = 175000000
tasks.lpb0 = 31
tasks.lpb1 = 32
tasks.sieve.lambda0 = 1.855
tasks.sieve.lambda1 = 1.85
tasks.sieve.mfb0 = 58
tasks.sieve.mfb1 = 60
tasks.sieve.ncurves0 = 20
tasks.sieve.ncurves1 = 25

I'll do another c177 next; Curtis, what parameters do you think I should try this time (bearing in mind Ed's c176 will also be a useful comparison)? I guess from a data-collection point of view it would also be useful to try some more filtering runs on the first c177 with a smaller number of relations to see how many are actually needed?

(Also shouldn't this really be in a separate thread?)

Last fiddled with by charybdis on 2020-04-14 at 00:10

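
For anyone comparing several filtering runs, the raw/unique counts can be pulled out of the duplicate-removal lines of the msieve log in post #19 mechanically. A minimal sketch: the three sample lines are copied from that log, while the function name and the returned dictionary layout are made up for illustration. In this log the duplicate and unique counts add up to the raw relations plus the free relations, so the unique fraction is taken over that combined total.

```python
import re

# Three lines copied from the filtering log in this post.
LOG = """\
Mon Apr 13 23:45:59 2020 found 81454230 hash collisions in 266531234 relations
Mon Apr 13 23:46:21 2020 added 122209 free relations
Mon Apr 13 23:51:30 2020 found 110111310 duplicates and 156542133 unique relations
"""


def duplicate_stats(log_text: str) -> dict:
    """Extract raw, free, duplicate and unique relation counts from log text."""
    raw = int(re.search(r"hash collisions in (\d+) relations", log_text).group(1))
    free = int(re.search(r"added (\d+) free relations", log_text).group(1))
    m = re.search(r"found (\d+) duplicates and (\d+) unique relations", log_text)
    dups, unique = int(m.group(1)), int(m.group(2))
    return {
        "raw": raw,
        "free": free,
        "duplicates": dups,
        "unique": unique,
        # Fraction of all relations (raw plus free) that survive as unique.
        "unique_fraction": unique / (raw + free),
    }


stats = duplicate_stats(LOG)
print(stats["unique"], round(stats["unique_fraction"], 3))
```

The sketch raises an `AttributeError` if any of the three lines is missing, which is acceptable for a quick comparison script but would need handling in anything longer-lived.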
2020-04-14, 01:40   #20
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

1380₁₆ Posts

Quote:
 Originally Posted by charybdis
Looks like 285M relations was a big overestimate - maybe something to do with the double large prime bounds being slightly less than double the single large prime bounds? These were the parameters I ended up using:

Code:
tasks.A = 28
tasks.qmin = 500000
tasks.lim0 = 115000000
tasks.lim1 = 175000000
tasks.lpb0 = 31
tasks.lpb1 = 32
tasks.sieve.lambda0 = 1.855
tasks.sieve.lambda1 = 1.85
tasks.sieve.mfb0 = 58
tasks.sieve.mfb1 = 60
tasks.sieve.ncurves0 = 20
tasks.sieve.ncurves1 = 25

I'll do another c177 next; Curtis, what parameters do you think I should try this time (bearing in mind Ed's c176 will also be a useful comparison)? I guess from a data-collection point of view it would also be useful to try some more filtering runs on the first c177 with a smaller number of relations to see how many are actually needed? (Also shouldn't this really be in a separate thread?)

Wow! Sorry about the mistake on relations-estimate. So 266M raw relations, 156M unique (not an unusually-good ratio, meaning this is likely not an exceptional poly), built a 12.7M matrix. That's quite small for this size, meaning more relations are not a good idea. If you're willing to do a few filtering runs, please use msieve's filter_maxrels flag (find the exact invocation via msieve -h) to test 260M and 250M relations? Looks like 250M should be the target number for our future c175 file (since this is a c177, at the high end of what this file will cover). Maybe 240M is even enough...

Since you aborted the run, I suppose you don't have the CADO-generated summary of sieving thread-time? Bummer.

Your final Q of 190M means yield wasn't terrific; that suggests we might benefit from increasing the lim's a bit for your next run. How about lim0=130M and lim1=180M? Those are kind of big by the norms of ggnfs/15e, but you're sieving on 14.5e (A=28). Yield for your job wasn't great, Q:1-190M producing 266M relations is just under yield of 1.5. I think I'd boost either the siever (to I=15, in which case don't bother changing lim's, or even reduce them a bit) or LP.
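
The yield figure quoted here is just raw relations divided by the special-q range covered. A quick sketch using the numbers from this thread (266,531,234 raw relations, qmin = 500000, final Q of about 190M); the function name is made up for illustration:

```python
def average_yield(raw_relations: float, q_min: float, q_max: float) -> float:
    """Average number of raw relations found per unit of special-q range."""
    return raw_relations / (q_max - q_min)


print(round(average_yield(266_531_234, 500_000, 190_000_000), 2))
```

This comes out at roughly 1.4, consistent with "just under 1.5". It is an average over the whole range; the yield at the top of the range is what matters when deciding whether to keep sieving, and that would need per-range relation counts, which are not given in the thread.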

Alternative: Go for 32LP on both sides, rather than 31/32. That would add 30% to relations needed, 325M rather than 250M. lim's should be less unbalanced in this case, e.g. 140M and 175M. I think I like this plan better, as a less-massive change to the params. EDIT: Also, Ed is doing I=15 on his run, so we'll get some sort of comparison there.

Do you have any sense for whether the job ran more quickly than you expected, or less? GNFS jobs seem to double in length every 5.5 digits with CADO, if that helps you make a comparison to previous work. In my experience, the poly select tweak of nq=15625, starting Q really small, and tight lambda/low mfb settings together take ~2 digits off the job time compared to CADO defaults.
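
The "double every 5.5 digits" rule of thumb can be turned into a ratio for comparing jobs of different sizes. A sketch under that rule only, so an approximation rather than a measured model; the function name is made up for illustration:

```python
def relative_time(digit_difference: float, doubling_digits: float = 5.5) -> float:
    """Expected job-time ratio for a size change of `digit_difference` digits,
    assuming time doubles every `doubling_digits` digits."""
    return 2.0 ** (digit_difference / doubling_digits)


# Going from a c172 to a c177: a bit under twice the time.
print(round(relative_time(177 - 172), 2))
# Saving "~2 digits" of job time: roughly a 22% reduction.
print(round(relative_time(-2), 2))
```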
As for breaking off a new thread for this interesting 175-params discussion, that's Ed's call- it's his subforum, after all!

Last fiddled with by VBCurtis on 2020-04-14 at 01:45

2020-04-14, 11:53   #21
charybdis

Apr 2020

17×29 Posts

Quote:
 Originally Posted by VBCurtis Since you aborted the run, I suppose you don't have the CADO-generated summary of sieving thread-time? Bummer.
I don't have the full summary, but there are the "combined stats" lines in the log, which give 'stats_total_cpu_time': '67364457.04999968', or ~2.13 CPU-years for sieving (most of the CPUs are i5-4xxx and i5-6xxx).
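
The conversion from the logged CPU-seconds to CPU-years can be checked directly. A small sketch, assuming a 365.25-day year (the post does not say which year length was used, and the result is the same to two decimals with 365 days rounded down):

```python
# Seconds in a 365.25-day year; the choice of year length is an assumption.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def cpu_years(cpu_seconds: float) -> float:
    """Convert total CPU-seconds into CPU-years."""
    return cpu_seconds / SECONDS_PER_YEAR


# Value of 'stats_total_cpu_time' quoted from the log above.
print(round(cpu_years(67364457.04999968), 2))
```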

Quote:
 Alternative: Go for 32LP on both sides, rather than 31/32. That would add 30% to relations needed, 325M rather than 250M. lim's should be less unbalanced in this case, e.g. 140M and 175M. I think I like this plan better, as a less-massive change to the params. EDIT: Also, Ed is doing I=15 on his run, so we'll get some sort of comparison there.
Yes, even if I=15 turns out to be faster I guess 32/32 will give a better data point. I suppose the double large prime bounds should go up too?

Quote:
 Do you have any sense for whether the job ran more quickly than you expected, or less? GNFS jobs seem to double in length every 5.5 digits with CADO, if that helps you make a comparison to previous work. In my experience, the poly select tweak of nq=15625, starting Q really small, and tight lambda/low mfb settings together take ~2 digits off the job time compared to CADO defaults.
I think it took a little longer than I expected, but some of this will be down to the oversieving at high Q with low yield; we'll get a better idea once I've done some filtering runs with fewer relations. I'd already been trying to fudge the parameters for my jobs using some of your guesses.

2020-04-14, 12:39   #22
EdH

"Ed Hall"
Dec 2009

2·1,999 Posts

Quote:
 Originally Posted by charybdis . . . (Also shouldn't this really be in a separate thread?)
Quote:
 Originally Posted by VBCurtis . . . As for breaking off a new thread for this interesting 175-params discussion, that's Ed's call- it's his subforum, after all!
A separate thread would be fine, but there is intermix within some of the posts. The thread might gain more interest from others if it was located outside the blog area. And, this portion doesn't match the original post.

It's OK either way. If you'd like to, Curtis, you can grab the relevant posts and move them to a more appropriate location.

All times are UTC.