mersenneforum.org Advice for large GNFS jobs?

2013-07-09, 14:13   #34
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

11×521 Posts

Quote:
 Originally Posted by lorgix I think I have a rough grasp on how lpb works now. You seem to be making a case for higher lpb, but you agree that lpb can be set too high, right? I don't quite get the bold part. I don't understand how mfb works. Are you saying it increases complexity, and that I can sometimes get the benefits of a higher lpb without paying the price of a higher mfb? (Which would be higher complexity, and I don't know what that is in this context. Harder filtering? Is that what your other post was saying?) I feel that I'm missing a few pieces, but I'm still learning. Hopefully others will benefit from these discussions. Thank you all for your patience.
mfb is the bound at which composites left after sieving (and, I think, trial factoring and ECM if used) are passed to the quadratic sieve routine. If a composite is larger than the bound, the chance of the relation being useful is reckoned to be below the effort required to split it.

If you raise mfb then effectively you start looking for larger large primes. This takes effort, and means that a large portion of the relations will include a larger prime.

By raising the lpb you are basically saying that if you find relations with larger large primes then you may as well keep them. The chance of any one of them being useful isn't very large, as there aren't that many of them, but between them there is a chance that some will be useful.

The challenge is making sure the extra effort (filtering) costs less than the worth of the relations that make it into the matrix.
This trick hasn't been tested much, especially with large factorizations, but I think we tend to push the balance the wrong way: we ignore relations that could be worth more than the time taken to discard most of them.
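The role of the two bounds described above can be sketched as a simple filter. This is illustrative pseudologic, not actual siever code: the function names are made up, and real sievers use QS/ECM rather than trial division to split cofactors.

```python
def trial_factor(n):
    """Naive trial-division factorization (a stand-in for the QS/ECM
    routines a real siever would use on the leftover cofactor)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def keep_relation(cofactor, lpb, mfb):
    """Decide whether a relation survives, given the cofactor left
    after dividing out factor-base primes.

    lpb - large prime bound, as a bit length
    mfb - maximum bit length of a cofactor worth trying to split
    """
    if cofactor == 1:
        return True                   # fully factored over the factor base
    if cofactor.bit_length() > mfb:
        return False                  # too big: not worth the splitting effort
    # every large prime in the cofactor must fit under the lpb bound
    return all(p.bit_length() <= lpb for p in trial_factor(cofactor))
```

For example, with lpb=10 bits and mfb=20 bits, a cofactor that is a product of two 10-bit primes (e.g. 521*523) is kept, while a single 11-bit prime is rejected by lpb, and a product of three 10-bit primes is rejected by mfb before any splitting is attempted. Raising mfb admits larger cofactors to the splitting step; raising lpb keeps more of what the splitting finds.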

2013-07-17, 13:02   #35
lorgix

Sep 2010
Scandinavia

267₁₆ Posts

I'm hoping frmky, jasonp & fivemack will have something to contribute here.

The msieve readme says that "the best decomposition for P MPI processes will reduce the solve time by a factor of about P^0.6 using gigabit ethernet, while an infiniband interconnect scales the solve time by around P^0.71". IIRC, the 0.71 was higher in the old readme; why is that? What kind of Infiniband was this?

I think gigabit ethernet has a latency on the order of 20 microseconds, 10GbE is around 5, and Infiniband approaches 1. How important is latency vs. bandwidth?

4X DDR Infiniband is reasonably cheap; is it worth spending the extra money to double the bandwidth and get up to 32Gbit/s? I'm assuming even the "slower" one will absolutely demolish 10GbE.

Would it be possible and worthwhile to connect three nodes using three dual-port HCAs? That would mean no switch.
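The readme's scaling claim quoted above can be sanity-checked with a few lines of arithmetic. The exponents come straight from the quote; the function name is mine, and this is purely illustrative.

```python
def solve_speedup(procs, exponent):
    """Approximate solve-time speedup for `procs` MPI processes,
    per the msieve readme's empirical scaling exponents."""
    return procs ** exponent

# 0.60 ~ gigabit ethernet, 0.71 ~ infiniband (per the readme quote)
for p in (2, 4, 8, 16):
    print(f"P={p:2d}: ethernet {solve_speedup(p, 0.60):5.2f}x,"
          f" infiniband {solve_speedup(p, 0.71):5.2f}x")
```

At 8 processes this works out to roughly 3.5x over ethernet versus 4.4x over infiniband, so the gap between the interconnects widens as the node count grows.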
2013-07-17, 16:58   #36
jasonp
Tribal Bullet

Oct 2004

DA0₁₆ Posts

I can only attempt an answer to the first question, and even then frmky has the actual numbers. The Teragrid nodes we've been using have been getting larger and faster continuously, and at the same time the code has been picking up patches that reduce the cost of communications, so the scaling is expected to change under those circumstances. We've now run much larger jobs than when the first version of the readme was written, so that gets the behavior more into asymptotic territory. With the recent overhaul of the threaded version, we'll need to re-tune things, because there's evidence that multithreading now delivers much better performance out of a single node than one MPI process per core.

I suspect the code is now more bandwidth-bound than latency-bound; patches from Ilya ('poily' here) have changed the algorithm from sending a few huge messages to many simultaneous medium-size messages, and Greg reports this made cluster runs 25% faster. Switching from Gbit ethernet to infiniband halved the solve time on Fullerton's 8-node cluster. In fact we haven't run timing tests on Gbit ethernet in years.

Last fiddled with by jasonp on 2013-07-17 at 16:59
2013-07-17, 18:53   #37
firejuggler

Apr 2010
Over the rainbow

2×1,217 Posts

I have been asked to post the parameters for this poly
Code:
2203286292154236920662074580008136560385550762038571072069284129582298550469011615783387269827436918721335468107066200517568432204890391543672088684775850203864157356993 (169 digits)
R0: -344517009720320657345668021520609
R1: 870535396513771
A0: 252681583117408750059938913868667397960
A1: 716807602129660819233465802155626
A2: 1024911528196794072738767285
A3: 155515608837130473266
A4: -37852190172270
A5: 453960
skew 5721717.05, size 1.841e-016, alpha -6.366, combined = 4.109e-013 rroots = 3
The score is in the expected range (3.91e-013 to 4.5e-013) for a C169, but others didn't find anything approaching it. I think that this hit is purely luck.

Usually, I run my poly selection with a max norm of 5 or 6 times the stage 2 limit, from 0 to 600,000, then slowly decrease it as the leading coefficient rises. However, I got too many hits, so I reduced my max norm to 21e23.
Code:
msieve151_gpu -np1 "stage1_norm=21e23 0,600000"
poly: 453960 870535396513771 344517009705803223426223720724056
developed poly: 453960 -34621935651270 -50766770753153193334 1241978336814028904403119205 706646633645376920708483366483406 -5882326745223555951249963188015466218525 870535396513771 -344517009719081759248796430890814 -2.03 4.358210e+021
The last number is usually in the range of X.xxxxxxe+023.

Last fiddled with by firejuggler on 2013-07-17 at 19:04
2013-07-17, 23:24   #38
swellman

Jun 2012

2×5×7×41 Posts

Thanks firejuggler (and VBCurtis et al), appreciate your running the search for this poly.

I've added the other parameters and done some test sieving, but the performance is not yet what it needs to be. Using lpbr/a=30, mfbr/a=61, r/alim=111M and r/alambda=2.7 gives a sieving rate of 1.05 rel/sec and 0.86 rels/spec_q on my i7 using 8 threads (4+4 HT) with Yafu on Win 7 (64 bit). While I ultimately plan to factor this composite on a faster i7 in Linux, these benchmarks tell me the parameters are suboptimal.

I did try lpbr/a of 31 and mfbr/a of 62. While of course the speed and yield increased, the net time estimate to sieve increased by over 10%. Plus the memory requirements start climbing as well.

Suggestions?

Last fiddled with by swellman on 2013-07-17 at 23:26 Reason: Crediting other poly searchers
2013-07-19, 18:18   #39
lorgix

Sep 2010
Scandinavia

3·5·41 Posts

Quote:
 Originally Posted by swellman Thanks firejuggler (and VBCurtis et al), appreciate your running the search for this poly. I've added the other parameters and done some test sieving, but the performance is not yet what it needs to be. Using lpbr/a=30, mfbr/a=61, r/alim=111M and r/alambda=2.7 gives a sieving rate of 1.05 rel/sec and 0.86 rels/spec_q on my i7 using 8 threads (4+4 HT) with Yafu on Win 7 (64 bit). While I ultimately plan to factor this composite on a faster i7 in Linux, these benchmarks tell me the parameters are suboptimal. I did try lpbr/a of 31 and mfbr/a of 62. While of course the speed and yield increased, the net time estimate to sieve increased by over 10%. Plus the memory requirements start climbing as well. Suggestions?
Are you using the 15e siever?

I'm no expert, but my experiments indicate that you could get away with using r/alim of 2^26-2 and
Code:
lpbr: 30
lpba: 30
mfbr: 60
mfba: 60
rlambda: 2.7
alambda: 2.7
if you sieve over alim/3 through alim.

If that is not a good idea, for whatever reason, let me know.

2013-07-19, 22:11   #40
swellman

Jun 2012

2·5·7·41 Posts

No, I was using 14e when I posted my benchmarks, but have since done some more test sieving with 15e. Much better. I'm away from that machine for the weekend, but I will post my results when I return.

Appreciate your thoughts. Looks like I need to vary r/alim more - I've been using values in the range of 100-111M.
2013-07-22, 15:37   #41
WraithX

Mar 2006

23·59 Posts

Hi everyone, I've done a lot more test sieving and am basically ready to move on to actual sieving. However, I was wondering: after scaling the relations to compare to each other, how do I compare the times to each other? Here are my test sieve results plus their scaled total relations; which one of these would work the best? Should I multiply the sec/rel by the scaling factor too, and then maybe add the times up to compare all of them against each other? Or should I not scale the times and just add them up as they are?

Code:
t01: ### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
     lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1029, q=100001029 (1.68794 sec/rel) * 1.4142 = 1455
     total yield: 1188, q=300001001 (1.97009 sec/rel) * 1.4142 = 1680
     total yield:  862, q=500001001 (2.38819 sec/rel) * 1.4142 = 1219
t02: ### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
     lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1092, q=100001029 (3.22398 sec/rel) * 1.4142 = 1544
     total yield: 1271, q=300001001 (3.73349 sec/rel) * 1.4142 = 1797
     total yield:  926, q=500001001 (4.27995 sec/rel) * 1.4142 = 1309
t03: ### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
     lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1430, q=100001029 (1.23638 sec/rel) * 1.0000 = 1430
     total yield: 1664, q=300001001 (1.42994 sec/rel) * 1.0000 = 1664
     total yield: 1244, q=500001001 (1.67699 sec/rel) * 1.0000 = 1244
t04: ### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
     lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1581, q=100001029 (2.24839 sec/rel) * 1.0000 = 1581
     total yield: 1824, q=300001001 (2.62574 sec/rel) * 1.0000 = 1824
     total yield: 1380, q=500001001 (2.89555 sec/rel) * 1.0000 = 1380
t05: ### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
     lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield:  986, q=100001029 (1.74083 sec/rel) * 1.4142 = 1394
     total yield:  852, q=300001001 (2.15629 sec/rel) * 1.4142 = 1204
     total yield: 1162, q=500001001 (2.57390 sec/rel) * 1.4142 = 1643
t06: ### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
     lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1061, q=100001029 (3.33601 sec/rel) * 1.4142 = 1500
     total yield:  917, q=300001001 (3.96387 sec/rel) * 1.4142 = 1296
     total yield: 1245, q=500001001 (4.46854 sec/rel) * 1.4142 = 1760
t07: ### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
     lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1383, q=100001029 (1.26526 sec/rel) * 1.0000 = 1383
     total yield: 1235, q=300001001 (1.51253 sec/rel) * 1.0000 = 1235
     total yield: 1596, q=500001001 (1.90064 sec/rel) * 1.0000 = 1596
t08: ### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
     lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1537, q=100001029 (2.32150 sec/rel) * 1.0000 = 1537
     total yield: 1395, q=300001001 (2.62871 sec/rel) * 1.0000 = 1395
     total yield: 1786, q=500001001 (3.14061 sec/rel) * 1.0000 = 1786
t09: ### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
     lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield:  914, q=100001029 (1.86595 sec/rel) * 1.4142 = 1292
     total yield: 1039, q=300001001 (2.08668 sec/rel) * 1.4142 = 1469
     total yield: 1130, q=500001001 (2.32058 sec/rel) * 1.4142 = 1598
t10: ### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
     lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield:  958, q=100001029 (3.31313 sec/rel) * 1.4142 = 1354
     total yield: 1087, q=300001001 (3.93585 sec/rel) * 1.4142 = 1537
     total yield: 1218, q=500001001 (4.08910 sec/rel) * 1.4142 = 1722
t11: ### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
     lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1256, q=100001029 (1.38039 sec/rel) * 1.0000 = 1256
     total yield: 1466, q=300001001 (1.50356 sec/rel) * 1.0000 = 1466
     total yield: 1579, q=500001001 (1.68606 sec/rel) * 1.0000 = 1579
t12: ### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
     lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1379, q=100001029 (2.31964 sec/rel) * 1.0000 = 1379
     total yield: 1602, q=300001001 (2.69361 sec/rel) * 1.0000 = 1602
     total yield: 1765, q=500001001 (2.84677 sec/rel) * 1.0000 = 1765
t13: ### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
     lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield:  960, q=100001029 (1.73365 sec/rel) * 1.4142 = 1357
     total yield: 1013, q=300001001 (1.94229 sec/rel) * 1.4142 = 1432
     total yield: 1108, q=500001001 (2.25318 sec/rel) * 1.4142 = 1566
t14: ### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
     lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1287, q=100001029 (1.31478 sec/rel) * 1.0000 = 1287
     total yield: 1423, q=300001001 (1.40586 sec/rel) * 1.0000 = 1423
     total yield: 1549, q=500001001 (1.63787 sec/rel) * 1.0000 = 1549
t15: ### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
     lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
     total yield: 1429, q=100001029 (2.32081 sec/rel) * 1.0000 = 1429
     total yield: 1580, q=300001001 (2.50181 sec/rel) * 1.0000 = 1580
     total yield: 1706, q=500001001 (2.82340 sec/rel) * 1.0000 = 1706
t16: ### norm 3.235054e-15 alpha -8.969838 e 1.013e-15 rroots 4 skew: 2580939.03 deg6 p07
     lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
     total yield: 1142, q=100001029 (1.75196 sec/rel) * 1.4142 = 1615
     total yield:  807, q=300001001 (2.31809 sec/rel) * 1.4142 = 1141
     total yield:  832, q=500001001 (2.82525 sec/rel) * 1.4142 = 1176

These weren't the only tests that I ran, but these were pretty much the best of all of them. The parameters I varied were:
lpbr: 32, 33 (I also tried 34, but the binaries I have don't support that)
lpba: 32, 33 (I also tried 34, but the binaries I have don't support that)
mfbr/mfba: 2*lpbr/96, 96/2*lpba, 96/96
rlambda/alambda: 2.7/3.7, 3.7/2.7, 3.7/3.7
I sieved 1000 Q at 100e6, 300e6, and 500e6. I also sieved on both the rational and algebraic sides. The algebraic side was the big winner. All the results above came from algebraic side sieving.
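The scaling applied to the raw yields above can be reproduced exactly: lpbr=32 yields are multiplied by sqrt(2) ≈ 1.4142 before being compared against lpbr=33 yields (which get a factor of 1.0000). The function name here is mine; the sqrt(2)-per-lpb-bit rule is the one used in the table.

```python
import math

def scaled_yield(raw_yield, lpbr, baseline_lpbr=33):
    """Credit relations from a smaller lpbr with a factor of sqrt(2)
    per bit of difference from the baseline, then round."""
    return round(raw_yield * math.sqrt(2) ** (baseline_lpbr - lpbr))

# t01 at q=100001029: 1029 raw relations at lpbr=32
print(scaled_yield(1029, 32))   # 1455, matching the table above
```

Whether the sec/rel figures should be scaled the same way is the open question in this post; swellman's answer further down compares total estimated sieving time instead.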
2013-07-22, 19:23   #42
lorgix

Sep 2010
Scandinavia

3×5×41 Posts

Quote:
 Originally Posted by jasonp I can only attempt an answer to the first question, and even then frmky has the actual numbers. The Teragrid nodes we've been using have been getting larger and faster continuously, and at the same time the code has been picking up patches that reduce the cost of communications, so the scaling is expected to change under those circumstances. We've now run much larger jobs than when the first version of the readme was written, so that gets the behavior more into asymptotic territory. With the recent overhaul of the threaded version, we'll need to re-tune things, because there's evidence that multithreading now delivers much better performance out of a single node than one MPI process per core. I suspect the code is now more bandwidth-bound than latency-bound; patches from Ilya ('poily' here) have changed the algorithm from sending a few huge messages to many simultaneous medium-size messages, and Greg reports this made cluster runs 25% faster. Switching from Gbit ethernet to infiniband halved the solve time on Fullerton's 8-node cluster. In fact we haven't run timing tests on Gbit ethernet in years.
Some interesting info. Thanks!
Quote:
 Originally Posted by WraithX Hi everyone, I've done a lot more test sieving and am basically ready to move on to actual sieving. *snip* The parameters I varied were: lpbr 32, 33 (I also tried 34, but the binaries I have don't support that) lpba 32, 33 (I also tried 34, but the binaries I have don't support that) mfbr/mfba: 2*lpbr/96, 96/2*lpba, 96/96 rlambda/alambda: 2.7/3.7, 3.7/2.7, 3.7/3.7 I sieved 1000 Q at 100e6, 300e6, and 500e6. I also sieved on both the rational and algebraic sides. The algebraic side was the big winner. All the results above came from the algebraic side sieving.
I think you'll want to recompile so that you can use lpb>33.

You may want to try larger lambda, but I'm not too sure about that.

2013-07-23, 00:25   #43
WraithX

Mar 2006

23·59 Posts

Quote:
 Originally Posted by lorgix I think you'll want to recompile so that you can use lpb>33. You may want to try larger lambda, but I'm not too sure about that.
Hmmm, I've never tried to compile ggnfs before. Can it be compiled in MinGW64? I'm using the binaries that had the Windows ASM improvements provided by Dan Ee over in this thread. Actually, looking at it again, he used MinGW64 and gives instructions on how to compile it. What needs to be changed to allow lpbr/lpba > 33?

Also, do you have any thoughts on my question of how to compare the timing (sec/rel) between all my test sieves? Or anyone else, what is the best way to compare these timings? Should I scale the timings the same way I scaled the relations and then perhaps add them up to do a comparison? Or should I just add up the numbers as they are to compare them?

2013-07-23, 01:47   #44
swellman

Jun 2012

2·5·7·41 Posts

Quote:
 Originally Posted by WraithX Also, do you have any thoughts on my question of how to compare the timing (sec/rel) between all my test sieves? Or anyone else, what is the best way to compare these timings? Should I scale the timings the same way I scaled the relations and then perhaps add them up to do a comparison? Or should I just add up the numbers as they are to compare them?
I look to minimize total estimated sieving time, provided the resulting LA fits within my memory limitations. I use the formula

Sieving_time = (total_rels_req) * (sec/rel) / (#_threads)

where in your case total_rels_req is about 400M (for lpba and lpbr of 32) times the scaling factor.
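swellman's estimate above is simple enough to put into a helper. The ~400M baseline for lpb=32 and the scaling factor are rules of thumb from this thread, not exact requirements, and the function name is mine.

```python
def sieving_time_hours(sec_per_rel, threads, scaling=1.0,
                       base_rels=400e6):
    """Sieving_time = total_rels_req * (sec/rel) / #_threads,
    converted to hours. base_rels is the ~400M relations needed
    at lpbr/lpba=32; scaling adjusts for other parameter choices."""
    total_rels_req = base_rels * scaling
    return total_rels_req * sec_per_rel / threads / 3600.0

# e.g. 1.05 sec/rel on 8 threads at the lpb=32 baseline
print(f"{sieving_time_hours(1.05, 8):.0f} hours")
```

Comparing parameter sets by this total, rather than by raw sec/rel, automatically answers the earlier question of whether to scale the timings: the scaling enters through the relation requirement, not the per-relation time.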

