mersenneforum.org (https://www.mersenneforum.org/index.php)
-   CADO-NFS (https://www.mersenneforum.org/forumdisplay.php?f=170)
-   -   Some CADO-NFS Work At Around 175-180 Decimal Digits (https://www.mersenneforum.org/showthread.php?t=25479)

charybdis 2020-04-08 19:31

Thank you so much for this! I'll go for 285M relations and see what happens, and I'll keep you updated on the filtering/matrix steps.

VBCurtis 2020-04-08 20:45

A couple little things:

I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter?

Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving. I *think* you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size; that is, if the matrix looks big, you can edit the snapshot file to set a higher rels_wanted value and restart CADO. I believe CADO will check whether the relation count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan.

tl;dr: 285M good. More might be better. :)
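For readers wanting to try the snapshot edit described above, here is a minimal Python sketch. The key name `tasks.sieve.rels_wanted` and the plain-text `key = value` layout are assumptions about the snapshot format; inspect your own snapshot file before editing it.

```python
import re

def bump_rels_wanted(snapshot_text, new_target):
    """Return snapshot text with the rels_wanted value replaced.

    Assumes the snapshot stores a plain-text line of the form
    'tasks.sieve.rels_wanted = <N>' (key name is an assumption;
    check your own snapshot before relying on this).
    """
    new_text, n_subs = re.subn(
        r"(tasks\.sieve\.rels_wanted\s*=\s*)\d+",
        lambda m: m.group(1) + str(new_target),
        snapshot_text,
    )
    if n_subs == 0:
        raise ValueError("no rels_wanted line found in snapshot")
    return new_text
```

One would read the snapshot file, pass its contents through this function with the new target (e.g. 300000000), write it back, and restart CADO against the same workdir.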

EdH 2020-04-08 20:50

[QUOTE=VBCurtis;542116]1. This only matters if we plan to iterate multiple factorizations of similar size to compare various params settings; otherwise, your timing data doesn't tell us much since there is little to compare to. If you have some elapsed (e.g. wall clock) time for the C168ish you did with the default CADO file, we can see if my C175 file did better than the observed double-every-5.5-digits typical on CADO. So, I wouldn't bother letting it finish, but I would try to record CADO's claim of sieve time from right before it enters filtering.

2. I believe you need to give it the poly also; either the .poly file in the same folder (which the snapshot should reference), or by explicitly declaring the poly the same way you did for the SNFS poly (tasks.poly = {polyfilename}, if I recall). Either way, you'll need to copy the poly file to the colab instance.

3. Far beyond my pay-grade in networking nor CADO knowledge, sorry.[/QUOTE]1. Do you know whether "tasks.filter.run = false" provides the poly/sieve timings or just stops?

2. I had thought the poly values were also in the snapshot, but I see they aren't, so I'll be sure to copy that file as well. I will have to modify the snapshot file for other things such as build path, too.

One thing I hope to try in the next few days is to run a semi-clone, server-only instance of the current local server on Colab, sieving Q=100k-500k*. I'd like to see if the local instance would recognize the relations from the Colab instance if it had no record of them being assigned. If not, I'm wondering if including the .stderr file would tell the local instance that they had already been accepted.

*Would running this area throw off your "rels_wanted" value, since these would be more prone to duplicates, or am I off course?

EdH 2020-04-08 20:58

[QUOTE=VBCurtis;542130]A couple little things:

I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter?

Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving. I *think* that you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size- that is, if the matrix looks big, you can edit the snapshot file to add a higher rels_wanted setting, and restart CADO. I believe CADO will look to see if relations count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan.

tl;dr: 285M good. More might be better. :)[/QUOTE]
On previous occasions, I have restarted CADO-NFS after it was already performing krylov, to add more relations. As you posted, I changed the rels_wanted value and it did go back to sieving until the new value was met. I mainly did this when msieve wouldn't build a matrix but CADO-NFS did, but I also did it for the recent testing of msieve matrices.

charybdis 2020-04-08 21:50

[QUOTE=VBCurtis;542130]A couple little things:

I didn't account for needing more relations for C177 vs C175, but this is all guesswork anyway... 290M might be smarter?

Also, Ed uses msieve to solve the matrix because it's faster than CADO. If you're going to use CADO start-to-finish, then the matrix step is relatively slower, which again argues for more sieving. I *think* that you can use a snapshot file to retroactively do more sieving if a matrix doesn't meet your sensibilities for size- that is, if the matrix looks big, you can edit the snapshot file to add a higher rels_wanted setting, and restart CADO. I believe CADO will look to see if relations count matches that number, even if filtering is already complete (I'd like confirmation of this, actually!). If I'm right, starting with 285M with a plan to bump to 300M if the matrix comes out big is maybe the best plan.

tl;dr: 285M good. More might be better. :)[/QUOTE]

I'll be using msieve - it's faster, and the machine I was using ran out of memory during the "replay" stage of CADO filtering for the c172 I've just finished. Filtering is obviously a while away, but what matrix size would you consider "too big" here (say with target density 100)?

VBCurtis 2020-04-08 22:10

[QUOTE=EdH;542132]1. Do you know whether "tasks.filter.run = false" provides the poly/sieve timings or just stops?

*Would running this area throw off your "rels_wanted" value, since these would be more prone to duplicates, or am I off course?[/QUOTE]

1. I'm pretty sure the timing is listed before filtering begins, so CADO should show the time-to-sieve just before it exits.

No big deal on the tweak to relations from starting at lower Q. 500k is already quite low and prone to extra duplicates; going down to 100k or 150k won't change those numbers enough to matter.

VBCurtis 2020-04-08 22:19

[QUOTE=charybdis;542144]I'll be using msieve - it's faster, and the machine I was using ran out of memory during the "replay" stage of CADO filtering for the c172 I've just finished. Filtering is obviously a while away, but what matrix size would you consider "too big" here (say with target density 100)?[/QUOTE]

I glanced through the NFS@home 15e results page to have a look at matrix sizes, but I forgot that most numbers don't have the difficulty listed on the results (one has to open the log to find that info). So, I'm taking a guess: 20M matrix is too big for GNFS177. Using msieve, filtering and the matrix will fit in a 16GB machine easily; the limit is around 25-26M matrix size on 16GB.

... Aha! [url]https://mersenneforum.org/showpost.php?p=533106&postcount=217[/url] is a good data point: C184ish, 32/32LP, 366M raw relations was enough to build a 17.7M matrix. 32/32 needs about 30% more relations than our choice of 31/32, so 280M would be equivalent if we were on ggnfs and starting at Q=20M. We're taking advantage of CADO's super fast speeds at low Q, at the cost of extra duplicates; I estimate you'll need 190M unique relations, and 285-290M raw relations is still a good guess (based on that one data point from the linked post). C177 is markedly easier than C184, so I won't be surprised to see a 15-16M matrix from your dataset.
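The relation-count reasoning above can be checked with some back-of-the-envelope arithmetic. Note the 30% factor is the rule of thumb quoted in the post, not a derived constant:

```python
# Back-of-the-envelope check of the relation estimate above.
# The 1.3 factor (32/32LP needing ~30% more relations than 31/32LP)
# is the rule of thumb quoted in the thread, not a derived constant.
raw_32_32 = 366e6              # raw relations for the C184 job at 32/32LP
equiv_31_32 = raw_32_32 / 1.3  # equivalent raw-relation count at 31/32LP
print(round(equiv_31_32 / 1e6))  # ~282M, in line with the 280M quoted
```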

charybdis 2020-04-13 23:59

So I decided to do an early filtering run (with the default target_density of 90) for the c177 to see how things were going, and was surprised to get a rather friendly matrix (edit - this was after sieving Q up to 190M):

[code]Mon Apr 13 23:19:04 2020 commencing relation filtering
Mon Apr 13 23:19:04 2020 estimated available RAM is 15845.6 MB
Mon Apr 13 23:19:04 2020 commencing duplicate removal, pass 1
...relation errors...
Mon Apr 13 23:45:59 2020 found 81454230 hash collisions in 266531234 relations
Mon Apr 13 23:46:21 2020 added 122209 free relations
Mon Apr 13 23:46:21 2020 commencing duplicate removal, pass 2
Mon Apr 13 23:51:30 2020 found 110111310 duplicates and 156542133 unique relations
Mon Apr 13 23:51:30 2020 memory use: 1449.5 MB
Mon Apr 13 23:51:30 2020 reading ideals above 189857792
Mon Apr 13 23:51:30 2020 commencing singleton removal, initial pass
Tue Apr 14 00:03:26 2020 memory use: 3012.0 MB
Tue Apr 14 00:03:27 2020 reading all ideals from disk
Tue Apr 14 00:03:43 2020 memory use: 2357.2 MB
Tue Apr 14 00:03:46 2020 commencing in-memory singleton removal
Tue Apr 14 00:03:49 2020 begin with 156542133 relations and 141217020 unique ideals
Tue Apr 14 00:04:16 2020 reduce to 69079091 relations and 41834367 ideals in 15 passes
Tue Apr 14 00:04:16 2020 max relations containing the same ideal: 24
Tue Apr 14 00:04:20 2020 reading ideals above 720000
Tue Apr 14 00:04:20 2020 commencing singleton removal, initial pass
Tue Apr 14 00:13:17 2020 memory use: 1506.0 MB
Tue Apr 14 00:13:17 2020 reading all ideals from disk
Tue Apr 14 00:13:38 2020 memory use: 2750.9 MB
Tue Apr 14 00:13:43 2020 keeping 62470184 ideals with weight <= 200, target excess is 377745
Tue Apr 14 00:13:48 2020 commencing in-memory singleton removal
Tue Apr 14 00:13:52 2020 begin with 69079091 relations and 62470184 unique ideals
Tue Apr 14 00:14:38 2020 reduce to 68407306 relations and 61797325 ideals in 11 passes
Tue Apr 14 00:14:38 2020 max relations containing the same ideal: 200
Tue Apr 14 00:15:02 2020 removing 6152193 relations and 5152193 ideals in 1000000 cliques
Tue Apr 14 00:15:04 2020 commencing in-memory singleton removal
Tue Apr 14 00:15:08 2020 begin with 62255113 relations and 61797325 unique ideals
Tue Apr 14 00:15:42 2020 reduce to 61908990 relations and 56293091 ideals in 9 passes
Tue Apr 14 00:15:42 2020 max relations containing the same ideal: 193
Tue Apr 14 00:16:03 2020 removing 4732119 relations and 3732119 ideals in 1000000 cliques
Tue Apr 14 00:16:05 2020 commencing in-memory singleton removal
Tue Apr 14 00:16:08 2020 begin with 57176871 relations and 56293091 unique ideals
Tue Apr 14 00:16:33 2020 reduce to 56939736 relations and 52320100 ideals in 7 passes
Tue Apr 14 00:16:33 2020 max relations containing the same ideal: 185
Tue Apr 14 00:16:52 2020 removing 4296409 relations and 3296409 ideals in 1000000 cliques
Tue Apr 14 00:16:54 2020 commencing in-memory singleton removal
Tue Apr 14 00:16:57 2020 begin with 52643327 relations and 52320100 unique ideals
Tue Apr 14 00:17:19 2020 reduce to 52427201 relations and 48804144 ideals in 7 passes
Tue Apr 14 00:17:19 2020 max relations containing the same ideal: 176
Tue Apr 14 00:17:38 2020 removing 4072883 relations and 3072883 ideals in 1000000 cliques
Tue Apr 14 00:17:39 2020 commencing in-memory singleton removal
Tue Apr 14 00:17:42 2020 begin with 48354318 relations and 48804144 unique ideals
Tue Apr 14 00:18:05 2020 reduce to 48141563 relations and 45515043 ideals in 8 passes
Tue Apr 14 00:18:05 2020 max relations containing the same ideal: 166
Tue Apr 14 00:18:22 2020 removing 3937420 relations and 2937420 ideals in 1000000 cliques
Tue Apr 14 00:18:23 2020 commencing in-memory singleton removal
Tue Apr 14 00:18:26 2020 begin with 44204143 relations and 45515043 unique ideals
Tue Apr 14 00:18:47 2020 reduce to 43985166 relations and 42354678 ideals in 8 passes
Tue Apr 14 00:18:47 2020 max relations containing the same ideal: 159
Tue Apr 14 00:19:02 2020 removing 3855855 relations and 2855855 ideals in 1000000 cliques
Tue Apr 14 00:19:04 2020 commencing in-memory singleton removal
Tue Apr 14 00:19:06 2020 begin with 40129311 relations and 42354678 unique ideals
Tue Apr 14 00:19:23 2020 reduce to 39897778 relations and 39262821 ideals in 7 passes
Tue Apr 14 00:19:23 2020 max relations containing the same ideal: 146
Tue Apr 14 00:19:37 2020 removing 1008731 relations and 811959 ideals in 196772 cliques
Tue Apr 14 00:19:38 2020 commencing in-memory singleton removal
Tue Apr 14 00:19:40 2020 begin with 38889047 relations and 39262821 unique ideals
Tue Apr 14 00:19:54 2020 reduce to 38873526 relations and 38435281 ideals in 6 passes
Tue Apr 14 00:19:54 2020 max relations containing the same ideal: 145
Tue Apr 14 00:20:01 2020 relations with 0 large ideals: 1351
Tue Apr 14 00:20:01 2020 relations with 1 large ideals: 2032
Tue Apr 14 00:20:01 2020 relations with 2 large ideals: 31561
Tue Apr 14 00:20:01 2020 relations with 3 large ideals: 276178
Tue Apr 14 00:20:01 2020 relations with 4 large ideals: 1367390
Tue Apr 14 00:20:01 2020 relations with 5 large ideals: 4161767
Tue Apr 14 00:20:01 2020 relations with 6 large ideals: 8220245
Tue Apr 14 00:20:01 2020 relations with 7+ large ideals: 24813002
Tue Apr 14 00:20:01 2020 commencing 2-way merge
Tue Apr 14 00:20:20 2020 reduce to 25269597 relation sets and 24831352 unique ideals
Tue Apr 14 00:20:20 2020 commencing full merge
Tue Apr 14 00:25:34 2020 memory use: 3048.5 MB
Tue Apr 14 00:25:36 2020 found 12788387 cycles, need 12729552
Tue Apr 14 00:25:38 2020 weight of 12729552 cycles is about 1145831261 (90.01/cycle)
Tue Apr 14 00:25:39 2020 distribution of cycle lengths:
Tue Apr 14 00:25:39 2020 1 relations: 936274
Tue Apr 14 00:25:39 2020 2 relations: 1233061
Tue Apr 14 00:25:39 2020 3 relations: 1366643
Tue Apr 14 00:25:39 2020 4 relations: 1339097
Tue Apr 14 00:25:39 2020 5 relations: 1277647
Tue Apr 14 00:25:39 2020 6 relations: 1185428
Tue Apr 14 00:25:39 2020 7 relations: 1063351
Tue Apr 14 00:25:39 2020 8 relations: 927894
Tue Apr 14 00:25:39 2020 9 relations: 789959
Tue Apr 14 00:25:39 2020 10+ relations: 2610198
Tue Apr 14 00:25:39 2020 heaviest cycle: 23 relations
Tue Apr 14 00:25:41 2020 commencing cycle optimization
Tue Apr 14 00:25:55 2020 start with 80743841 relations
Tue Apr 14 00:27:24 2020 pruned 2673306 relations
Tue Apr 14 00:27:25 2020 memory use: 2459.5 MB
Tue Apr 14 00:27:25 2020 distribution of cycle lengths:
Tue Apr 14 00:27:25 2020 1 relations: 936274
Tue Apr 14 00:27:25 2020 2 relations: 1265314
Tue Apr 14 00:27:25 2020 3 relations: 1424607
Tue Apr 14 00:27:25 2020 4 relations: 1385641
Tue Apr 14 00:27:25 2020 5 relations: 1326920
Tue Apr 14 00:27:25 2020 6 relations: 1219435
Tue Apr 14 00:27:25 2020 7 relations: 1089468
Tue Apr 14 00:27:25 2020 8 relations: 940531
Tue Apr 14 00:27:25 2020 9 relations: 790962
Tue Apr 14 00:27:25 2020 10+ relations: 2350400
Tue Apr 14 00:27:25 2020 heaviest cycle: 22 relations
Tue Apr 14 00:27:42 2020 RelProcTime: 4118
Tue Apr 14 00:27:46 2020
Tue Apr 14 00:27:46 2020 commencing linear algebra
Tue Apr 14 00:27:47 2020 read 12729552 cycles
Tue Apr 14 00:28:04 2020 cycles contain 38650274 unique relations
Tue Apr 14 00:33:26 2020 read 38650274 relations
Tue Apr 14 00:34:10 2020 using 20 quadratic characters above 4294917295
Tue Apr 14 00:36:38 2020 building initial matrix
Tue Apr 14 00:42:43 2020 memory use: 5483.9 MB
Tue Apr 14 00:42:53 2020 read 12729552 cycles
Tue Apr 14 00:42:55 2020 matrix is 12729375 x 12729552 (4750.3 MB) with weight 1481221875 (116.36/col)
Tue Apr 14 00:42:55 2020 sparse part has weight 1092512135 (85.82/col)
Tue Apr 14 00:44:26 2020 filtering completed in 2 passes
Tue Apr 14 00:44:29 2020 matrix is 12728889 x 12729066 (4750.3 MB) with weight 1481204265 (116.36/col)
Tue Apr 14 00:44:29 2020 sparse part has weight 1092509145 (85.83/col)
Tue Apr 14 00:45:25 2020 matrix starts at (0, 0)
Tue Apr 14 00:45:27 2020 matrix is 12728889 x 12729066 (4750.3 MB) with weight 1481204265 (116.36/col)
Tue Apr 14 00:45:27 2020 sparse part has weight 1092509145 (85.83/col)
Tue Apr 14 00:45:27 2020 saving the first 112 matrix rows for later
Tue Apr 14 00:45:29 2020 matrix includes 128 packed rows
Tue Apr 14 00:45:32 2020 matrix is 12728777 x 12729066 (4413.9 MB) with weight 1088568275 (85.52/col)
Tue Apr 14 00:45:32 2020 sparse part has weight 1004323212 (78.90/col)
Tue Apr 14 00:45:32 2020 using block size 8192 and superblock size 442368 for processor cache size 9216 kB
Tue Apr 14 00:45:56 2020 commencing Lanczos iteration (6 threads)
Tue Apr 14 00:45:56 2020 memory use: 5060.6 MB
Tue Apr 14 00:46:26 2020 linear algebra at 0.0%, ETA 64h47m[/code]

Looks like 285M relations was a big overestimate - maybe something to do with the double large prime bounds being slightly less than double the single large prime bounds? These were the parameters I ended up using:

[code]tasks.A = 28
tasks.qmin = 500000
tasks.lim0 = 115000000
tasks.lim1 = 175000000
tasks.lpb0 = 31
tasks.lpb1 = 32
tasks.sieve.lambda0 = 1.855
tasks.sieve.lambda1 = 1.85
tasks.sieve.mfb0 = 58
tasks.sieve.mfb1 = 60
tasks.sieve.ncurves0 = 20
tasks.sieve.ncurves1 = 25[/code]

I'll do another c177 next; Curtis, what parameters do you think I should try this time (bearing in mind Ed's c176 will also be a useful comparison)? I guess from a data-collection point of view it would also be useful to try some more filtering runs on the first c177 with a smaller number of relations to see how many are actually needed?

(Also shouldn't this really be in a separate thread?)

VBCurtis 2020-04-14 01:40

[QUOTE=charybdis;542584]Looks like 285M relations was a big overestimate - maybe something to do with the double large prime bounds being slightly less than double the single large prime bounds? These were the parameters I ended up using:

[code]tasks.A = 28
tasks.qmin = 500000
tasks.lim0 = 115000000
tasks.lim1 = 175000000
tasks.lpb0 = 31
tasks.lpb1 = 32
tasks.sieve.lambda0 = 1.855
tasks.sieve.lambda1 = 1.85
tasks.sieve.mfb0 = 58
tasks.sieve.mfb1 = 60
tasks.sieve.ncurves0 = 20
tasks.sieve.ncurves1 = 25[/code]

I'll do another c177 next; Curtis, what parameters do you think I should try this time (bearing in mind Ed's c176 will also be a useful comparison)? I guess from a data-collection point of view it would also be useful to try some more filtering runs on the first c177 with a smaller number of relations to see how many are actually needed?

(Also shouldn't this really be in a separate thread?)[/QUOTE]

Wow! Sorry about the mistake on the relations estimate. So 266M raw relations, 156M unique (not an unusually good ratio, meaning this is likely not an exceptional poly), built a 12.7M matrix. That's quite small for this size, meaning more relations are not a good idea. If you're willing to do a few filtering runs, could you use msieve's filter_maxrels flag (find the exact invocation via msieve -h) to test 260M and 250M relations? Looks like 250M should be the target number for our future c175 file (since this is a c177, at the high end of what this file will cover). Maybe 240M is even enough...

Since you aborted the run, I suppose you don't have the CADO-generated summary of sieving thread-time? Bummer.

Your final Q of 190M means yield wasn't terrific; that suggests we might benefit from increasing the lim's a bit for your next run. How about lim0=130M and lim1=180M? Those are kind of big by the norms of ggnfs/15e, but you're sieving on 14.5e (A=28). Yield for your job wasn't great: Q=1-190M producing 266M relations is a yield just under 1.5. I think I'd boost either the siever (to I=15, in which case don't bother changing lim's, or even reduce them a bit) or the LP bounds.
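For readers new to the term: yield here is relations found per unit of special-q range sieved, and the quoted figure follows directly from the numbers in the post (Q range approximated as 1M-190M, as written above):

```python
# Yield = relations found per unit of special-q range sieved.
# Figures from the post: 266M raw relations over roughly Q=1M..190M.
relations = 266e6
q_range = 190e6 - 1e6
q_yield = relations / q_range
print(round(q_yield, 2))  # 1.41, i.e. just under 1.5
```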

Alternative: Go for 32LP on both sides, rather than 31/32. That would add 30% to relations needed, 325M rather than 250M. lim's should be less unbalanced in this case, e.g. 140M and 175M. I think I like this plan better, as a less-massive change to the params. EDIT: Also, Ed is doing I=15 on his run, so we'll get some sort of comparison there.

Do you have any sense for whether the job ran more quickly than you expected, or less? GNFS jobs seem to double in length every 5.5 digits with CADO, if that helps you make a comparison to previous work. In my experience, the poly select tweak of nq=15625, starting Q really small, and tight lambda/low mfb settings generally take ~2 digits off the job time compared to CADO defaults.
As for breaking off a new thread for this interesting 175-params discussion, that's Ed's call- it's his subforum, after all!

charybdis 2020-04-14 11:53

[quote=VBCurtis;542587]Since you aborted the run, I suppose you don't have the CADO-generated summary of sieving thread-time? Bummer.[/quote]

I don't have the full summary, but there are the "combined stats" lines in the log, which give 'stats_total_cpu_time': '67364457.04999968', or ~2.13 CPU-years for sieving (most of the CPUs are i5-4xxx and i5-6xxx).
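The ~2.13 CPU-years figure is just a unit conversion of the logged CPU-seconds, e.g.:

```python
# Converting CADO's stats_total_cpu_time (CPU-seconds) to CPU-years.
cpu_seconds = 67364457.05
seconds_per_year = 3600 * 24 * 365   # 31,536,000
cpu_years = cpu_seconds / seconds_per_year
print(round(cpu_years, 3))  # 2.136, i.e. the ~2.13 CPU-years quoted
```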

[quote]Alternative: Go for 32LP on both sides, rather than 31/32. That would add 30% to relations needed, 325M rather than 250M. lim's should be less unbalanced in this case, e.g. 140M and 175M. I think I like this plan better, as a less-massive change to the params. EDIT: Also, Ed is doing I=15 on his run, so we'll get some sort of comparison there.[/quote]

Yes, even if I=15 turns out to be faster I guess 32/32 will give a better data point. I suppose the double large prime bounds should go up too?

[quote]Do you have any sense for whether the job ran more quickly than you expected, or less? GNFS jobs seem to double in length every 5.5 digits with CADO, if that helps you make a comparison to previous work. My experience with poly select tweak of nq=15625, starting Q really small, and tight lambda/low mfb settings generally seem to effectively take ~2 digits off the job time compared to CADO defaults.[/quote]

I think it took a little longer than I expected, but some of this will be down to the oversieving at high Q with low yield; we'll get a better idea once I've done some filtering runs with fewer relations. I'd already been trying to fudge the parameters for my jobs using some of your guesses.

EdH 2020-04-14 12:39

[QUOTE=charybdis;542584]. . .
(Also shouldn't this really be in a separate thread?)[/QUOTE]
[QUOTE=VBCurtis;542587]. . .
As for breaking off a new thread for this interesting 175-params discussion, that's Ed's call- it's his subforum, after all![/QUOTE]A separate thread would be fine, but the topics are intermixed within some of the posts. The thread might gain more interest from others if it were located outside the blog area. Also, this portion doesn't match the original post.

It's OK either way. If you'd like to, Curtis, you can grab the relevant posts and move them to a more appropriate location.


All times are UTC.
