mersenneforum.org Some CADO-NFS Work At Around 175-180 Decimal Digits

2020-04-30, 18:25   #89
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

105628 Posts

Quote:
 Originally Posted by charybdis I'll sieve a bit more to try and get a matrix at TD 120. Curtis - I'll do a c178 with 3LP and I=15 next as you suggested; please could you give me some parameters to try?
Let's just trim lim's a little bit- how about 90M and 125M? There shouldn't be much difference in settings between A=28 and A=29 (aka I=15); I have more things I want to try, but if we try them all at once we won't know which change found speed.

You needed 353M relations with A=28 to build a decent matrix; I estimate I=15 to need 4-5% fewer, so target 340M? We're seeing duplicate rates all over the place, so the target is more of a "try msieve here while it keeps sieving" for the way you guys have things set up?

2020-04-30, 20:10   #90
EdH

"Ed Hall"
Dec 2009

65658 Posts

Quote:
 Originally Posted by VBCurtis Let's just trim lim's a little bit- how about 90M and 125M? There shouldn't be much difference in settings between A=28 and A=29 (aka I=15); I have more things I want to try, but if we try them all at once we won't know which change found speed. You needed 353M relations with A=28 to build a decent matrix; I estimate I=15 to need 4-5% fewer, so target 340M? We're seeing duplicate rates all over the place, so the target is more of a "try msieve here while it keeps sieving" for the way you guys have things set up?
I'm still trying to decide where the balance point would be for my setup. If a day of extra sieving (or maybe even two) only saves a day of LA, it's probably a loss, in that I could have started sieving the next composite or, as now, worked on a team project while LA completes.

I am thinking along this line, though:

I plan to use msieve for LA on a single machine. I'm thinking that I should oversieve on purpose on the CADO-NFS setup and periodically check whether msieve can build a matrix, instead of expecting CADO-NFS to build the matrix. In that vein, I think, if 270M is required, I should just start with 300M as wanted relations and then let msieve test starting at 270M. That way, CADO-NFS isn't spending time trying to build a matrix and then deciding to go for more relations.

Of course, the duplication rate is an issue. Maybe I should use remdups4 and shoot for a unique-relations target rather than raw, and use that to adjust the CADO-NFS wanted value of raw. Then again, the duplication rate isn't linear...
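The unique-relations target could be scripted. A minimal sketch, assuming remdups4 prints a "Found N unique, ..." summary line in the format quoted later in this thread; the sample line and the 270M target here are illustrative, not from a real run (in practice the summary would come from something like `zcat c178.upload/*.gz | ./remdups4 100 > uniques.txt`):

```shell
# Parse a remdups4-style summary and compare the unique count to a target.
# The echoed line is a stand-in for remdups4's real output.
echo "Found 231892349 unique, 143912162 duplicate, and 0 bad relations." |
awk -v target=270000000 '{
  if ($2 + 0 >= target) print "enough uniques: " $2
  else                  print "keep sieving: " $2 " of " target
}'
```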

2020-04-30, 21:11   #91
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·7·11·29 Posts

I agree with this entirely- 300M target and 270M is a spot to start msieve (assuming 2LP settings). If you use the recent 3LP params, you'd add ~50M relations to both numbers; 3LP is so useful that it's still faster!

I am interested to see a C174-176 with 3LP; I might have to try that myself when the Kosta C198 mini-project is over. I really appreciate your contribution there; you've already nearly guaranteed that we won't spend a month to get to Q=80M.
2020-04-30, 21:37   #92
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5×13×53 Posts

Glad I can be helpful. I'll stick with the 198 team effort for now. I should be able to add the msieve machine next Tuesday evening. I'm using this to refine some scripts that take care of gracefully dropping some of my machines out of the workforce when they near their bedtime, instead of causing WU timeouts.
2020-05-01, 00:13   #93
charybdis

Apr 2020

113 Posts

Good to go this time:
Code:
Thu Apr 30 22:39:41 2020  commencing relation filtering
Thu Apr 30 22:39:41 2020  setting target matrix density to 120.0
Thu Apr 30 22:39:41 2020  estimated available RAM is 15845.4 MB
Thu Apr 30 22:39:42 2020  commencing duplicate removal, pass 1
...
Thu Apr 30 23:16:49 2020  found 110859058 hash collisions in 375804511 relations
Thu Apr 30 23:17:10 2020  commencing duplicate removal, pass 2
Thu Apr 30 23:24:24 2020  found 143912162 duplicates and 231892349 unique relations
...
Fri May  1 00:51:57 2020  matrix is 13966788 x 13967013 (6423.2 MB) with weight 1706963165 (122.21/col)
Fri May  1 00:51:57 2020  sparse part has weight 1544139266 (110.56/col)
Fri May  1 00:51:57 2020  using block size 8192 and superblock size 884736 for processor cache size 9216 kB
Fri May  1 00:52:37 2020  commencing Lanczos iteration (6 threads)
Fri May  1 00:52:37 2020  memory use: 6066.3 MB
Fri May  1 00:53:12 2020  linear algebra at 0.0%, ETA 84h 3m
Quote:
 Originally Posted by VBCurtis Let's just trim lim's a little bit- how about 90M and 125M? There shouldn't be much difference in settings between A=28 and A=29 (aka I=15); I have more things I want to try, but if we try them all at once we won't know which change found speed. You needed 353M relations with A=28 to build a decent matrix; I estimate I=15 to need 4-5% fewer, so target 340M? We're seeing duplicate rates all over the place, so the target is more of a "try msieve here while it keeps sieving" for the way you guys have things set up?
Yes, I figured it was better to put an artificially large rels_wanted in the params file and run msieve when I get the chance. It messes up the ETA, but the yield changes so much through the job that the ETA wasn't all that useful anyway. Thanks once again for the parameters.

2020-05-05, 15:55   #94
charybdis

Apr 2020

113 Posts

57.6M CPU-seconds of sieving with 3LP at I=15 gave this:
Code:
Tue May  5 13:11:32 2020  commencing relation filtering
Tue May  5 13:11:32 2020  setting target matrix density to 110.0
Tue May  5 13:11:32 2020  estimated available RAM is 15845.4 MB
Tue May  5 13:11:32 2020  commencing duplicate removal, pass 1
...
Tue May  5 13:47:36 2020  found 115022527 hash collisions in 367310494 relations
Tue May  5 13:47:58 2020  commencing duplicate removal, pass 2
Tue May  5 13:55:03 2020  found 154133454 duplicates and 213177040 unique relations
...
Tue May  5 15:20:11 2020  matrix is 15594597 x 15594821 (6670.9 MB) with weight 1766271532 (113.26/col)
Tue May  5 15:20:11 2020  sparse part has weight 1592785088 (102.14/col)
Tue May  5 15:20:11 2020  using block size 8192 and superblock size 884736 for processor cache size 9216 kB
Tue May  5 15:20:56 2020  commencing Lanczos iteration (6 threads)
Tue May  5 15:20:56 2020  memory use: 6331.9 MB
Tue May  5 15:21:37 2020  linear algebra at 0.0%, ETA 110h56m

I'll sieve a bit more, but the matrix is similar to the one I obtained after 60M CPU-seconds of sieving at A=28 in the previous job. Cownoise poly scores for the two jobs were very similar, so assuming they actually do sieve similarly, I=15 looks like it gives a speedup of a few percent over A=28 at c178.

I notice once again I=15 is giving more duplicates, perhaps because of the enormous yields at low q values?
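For reference, the "few percent" speedup works out directly from the two sieve times quoted here (57.6M CPU-seconds at I=15 versus 60M at A=28):

```shell
# Relative sieve-time saving of the I=15 3LP job (57.6M CPU-s)
# versus the earlier A=28 job (60M CPU-s).
awk 'BEGIN { printf "%.1f%% less sieve time\n", 100 * (60 - 57.6) / 60 }'
```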
2020-05-05, 17:37   #95
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·7·11·29 Posts

Great comparison- same size matrix, 4% less sieve time. That's a win for I=15. Do you know what the Q-range sieved was?

You may be right about the duplicate rate being related to starting sieving at such low Q. It's fast down there, but the duplicate rate makes some of that speed an illusion. I guess that means we don't reduce rels_wanted for I=15 compared to A=28. You could try a Q-initial of 5M and see how it affects elapsed time and duplicates.

Ideas for future tests:
- Should we try 31LP on both sides? Since the 3LP side isn't lambda-restricted, it may make sense to ditch the tight lambda setting on the 2LP side. 31/31 should need 75% of the relations of 31/32; ditching lambda on the 2LP side might need 10% more relations (this number is a guess).
- mfb=58 and mfb=59 on the 2LP side are worth trying if the lambda setting is removed.
- Finding the optimal ncurves settings could find us another 5% of sieve speed, with no change to matrix size.
2020-05-05, 18:36   #96
charybdis

Apr 2020

113 Posts

Quote:
 Originally Posted by VBCurtis Great comparison- same size matrix, 4% less sieve time. That's a win for I=15. Do you know what the Q-range sieved was?
500k to 90.1M.

Quote:
 You may be right about the duplicate rate being related to starting sieving at such low Q. It's fast down there, but the duplicate rate makes some of that speed an illusion. I guess that means we don't reduce rels_wanted for I=15 compared to A=28. You could try Q-initial of 5M, see how it affects elapsed time and duplicates.
This is something I can test without having to run another job - hopefully should have some data later today.

Quote:
 Should we try 31LP on both sides? Since the 3LP side isn't lambda-restricted, it may make sense to ditch the tight lambda setting on the 2LP side. 31/31 should need 75% of the relations of 31/32. Ditching lambda on the 2LP side might need 10% more relations (this number is a guess). mfb=58 and mfb=59 on the 2LP side are worth trying if the lambda setting is removed.
What lims would you suggest using if I try 31/31 for a c178? It might just be best to give all the params so I don't make any mistakes.

At some point I suppose I could help out with finding the 2LP/3LP crossover, but I presume it makes sense to optimise 3LP first.

Quote:
 Finding the optimal ncurves settings could find us another 5% of sieve speed, with no change to matrix size.
This could be figured out by test-sieving a small range, right?

2020-05-05, 19:31   #97
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5×13×53 Posts

I have started a study of duplication in reference to my recent c178 and found something interesting and disappointing. This may be due to my "farm" setup, but I actually have duplicated (or worse) WUs:
Code:
$ ls c180.500000-*.gz
c180.500000-510000.k1_rwswi.gz  c180.500000-510000.ylvkcw5x.gz
triplicated:
Code:
$ ls c180.550000-*.gz
c180.550000-560000.bbfy0nc6.gz  c180.550000-560000.w31ocae5.gz
c180.550000-560000.by9pspsu.gz
and, even more:
Code:
$ ls c180.570000-*.gz
c180.570000-580000.3w5sj28b.gz  c180.570000-580000.5tugazh8.gz
c180.570000-580000.46ui0z36.gz  c180.570000-580000.a4r_0xaq.gz
c180.570000-580000.58h92gje.gz
which, of course, greatly increased my duplication rate:
Code:
$ zcat c180.500000-*.gz | ./remdups4 100 >test
Found 42946 unique, 43346 duplicate, and 0 bad relations.
Code:
$ zcat c180.550000-*.gz | ./remdups4 100 >test
Found 41950 unique, 84389 duplicate, and 0 bad relations.
Code:
$ zcat c180.570000-*.gz | ./remdups4 100 >test
Found 42875 unique, 172369 duplicate, and 0 bad relations.
For the range 500000-600000, I actually had a 129% duplication rate:
Code:
$ zcat c180.5?0000-*.gz | ./remdups4 100 >test
Found 414110 unique, 532836 duplicate, and 0 bad relations.
I'm wondering if others experience this, or whether it is, in fact, due to something in my "farm" setup.
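A range check like Ed's can be automated by stripping the per-WU random suffix and counting result files per Q-range. This sketch uses the duplicated filenames from this post plus one hypothetical non-duplicated range for contrast; in a real run you would feed it `ls c180.*.gz` from the upload directory instead of the printf:

```shell
# List Q-ranges that have more than one uploaded WU result file.
# The third filename is a made-up singleton for illustration.
printf '%s\n' \
  c180.500000-510000.k1_rwswi.gz \
  c180.500000-510000.ylvkcw5x.gz \
  c180.510000-520000.q7xm2abc.gz |
sed 's/\.[^.]*\.gz$//' |                       # drop the per-WU suffix
sort | uniq -c |                               # count files per Q-range
awk '$1 > 1 { print $2 ": " $1 " copies" }'    # report only duplicates
```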
2020-05-05, 21:03   #98
charybdis

Apr 2020

113 Posts

Quote:
 Originally Posted by EdH I'm wondering if others experience this, or it is, in fact, due to something in my "farm" setup.
I've had a few duplicate WUs, and they all seem to be caused by a WU timing out and being resubmitted to another client but then finishing on the original client anyway. I removed all such files before running any filtering on my current job. The expired WUs don't get registered in the logfile when they finish, so they don't appear to affect timing data.

Now for the promised duplication rate check:

57.6M CPU-seconds of sieving starting at Q=500k gave
Code:
Tue May  5 13:55:03 2020  found 154133454 duplicates and 213177040 unique relations
57.6M CPU-seconds of sieving starting at Q=5M gives
Code:
Tue May  5 21:03:43 2020  found 121345816 duplicates and 217856601 unique relations
The second run has more unique relations despite having 28M fewer raw relations! So 500k is definitely too low to start.
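The arithmetic behind this comparison, using the duplicate and unique counts from the two filtering runs quoted above:

```shell
# Duplicate fractions for the two runs (same 57.6M CPU-s of sieving each).
awk 'BEGIN {
  d1 = 154133454; u1 = 213177040;   # start at Q=500k
  d2 = 121345816; u2 = 217856601;   # start at Q=5M
  printf "Q0=500k: %d raw, %.1f%% duplicates\n", d1 + u1, 100 * d1 / (d1 + u1)
  printf "Q0=5M:   %d raw, %.1f%% duplicates\n", d2 + u2, 100 * d2 / (d2 + u2)
  printf "raw difference: %d\n", (d1 + u1) - (d2 + u2)
}'
```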

Last fiddled with by charybdis on 2020-05-05 at 21:06

2020-05-05, 21:32   #99
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·7·11·29 Posts

I see no reason for lim's to change when LP changes. Let's leave lim's alone for now.

Your test for Q=500k vs 5M is awesome, and conclusive! 2% more uniques for the same sieve time. Let's use Q=5M for this size, pending further investigation (say, 4M or 7M or 10M may be yet better).

I agree that test-sieving on ncurves will work- but not a test-sieve at a single Q value. If I did it, I'd want 3 Q's spread early-mid-late through the job; in your case, 20-50-80M would convince me. Chances are that what's fastest at one Q will be fastest at another, but I don't think it's obvious that it has to happen that way.

I'll post a new params.c180 with 31/31LP this afternoon. Progress!

As for finding the 2LP/3LP crossover, we only need to test every 5 digits, since that's how the params files are organized. The work we're doing could be used to generate e.g. params.c178, which CADO would recognise and use for specifically 178-digit inputs, but the maintainers have already told me not to submit any such params files. Seems like overkill, even for us.
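The three-Q ncurves test could be driven by a small loop. This is only a sketch: the las options shown (-poly, -q0, -q1, -ncurves1, -t) are assumptions about the local CADO-NFS build, and the polynomial filename, range width, and candidate ncurves values are placeholders. The echo makes it a dry run that just prints the commands; drop it to actually test-sieve.

```shell
# Print test-sieve commands at three Q points (early, mid, late in the job)
# for a few candidate ncurves1 values. All names/values are illustrative.
for q0 in 20000000 50000000 80000000; do
  for nc in 13 16 19; do
    echo las -poly c178.poly -q0 "$q0" -q1 "$((q0 + 20000))" \
         -ncurves1 "$nc" -t 4
  done
done
```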
