2013-07-09, 14:13  #34
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
11×521 Posts 
Quote:
If you raise mfb, you effectively start looking for larger large primes. This takes effort and means that a large portion of the relations will contain a larger prime. By raising lpb, you are basically saying that if you find relations with larger large primes, you may as well keep them. The chance of any one of them being useful isn't very large, as there aren't that many of them, but between them there is a chance that some will be useful. The challenge is making sure that the only extra effort (filtering) costs less than the relations that make it into the matrix are worth. This trick hasn't been tested much, especially with large factorizations, but I think we tend to push the balance the wrong way: we ignore relations that could be more helpful than the time taken to discard most of them.

2013-07-17, 13:02  #35
Sep 2010
Scandinavia
267_{16} Posts 
I'm hoping frmky, jasonp & fivemack will have something to contribute here;
The msieve readme says that "the best decomposition for P MPI processes will reduce the solve time by a factor of about P^0.6 using gigabit ethernet, while an infiniband interconnect scales the solve time by around P^0.71". IIRC, the 0.71 was higher in the old readme; why is that? What kind of Infiniband was this? I think gigabit ethernet has a latency on the order of 20 microseconds, 10GbE is around 5, and Infiniband approaches 1. How important is latency vs. bandwidth? 4X DDR Infiniband is reasonably cheap; is it worth spending the extra money to double the bandwidth and get up to 32Gbit/s? I'm assuming even the "slower" one will absolutely demolish 10GbE. Would it be possible and worthwhile to connect three nodes using three dual-port HCAs? That would mean no switch.
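To put the two exponents quoted from the readme side by side, here is a minimal sketch that simply evaluates P^0.6 and P^0.71 for a few process counts. The function name and the chosen values of P are illustrative assumptions; nothing here is measured data.

```python
def solve_time_factor(p: int, exponent: float) -> float:
    """Factor by which the solve time is reduced with p MPI processes,
    using the readme's rule of thumb p**exponent."""
    return p ** exponent


# Exponents quoted from the msieve readme: 0.6 (gigabit ethernet), 0.71 (infiniband).
# The process counts below are arbitrary examples.
for p in (2, 3, 4, 8, 16):
    gige = solve_time_factor(p, 0.6)
    ib = solve_time_factor(p, 0.71)
    print(f"P={p:2d}  GigE: {gige:5.2f}x  InfiniBand: {ib:5.2f}x  "
          f"parallel efficiency: {gige / p:.0%} vs {ib / p:.0%}")
```

Both exponents are well below 1, so each added process buys less than the previous one; the gap between the two interconnects widens as P grows.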
2013-07-17, 16:58  #36
Tribal Bullet
Oct 2004
DA0_{16} Posts 
I can only attempt an answer to the first question, and even then frmky has the actual numbers. The Teragrid nodes we've been using have been getting larger and faster continuously, and at the same time the code has been picking up patches that reduce the cost of communications, so the scaling is expected to change under those circumstances. We've now run much larger jobs than when the first version of the readme was written, which moves the behavior further into asymptotic territory. With the recent overhaul of the threaded version, we'll need to retune things, because there's evidence that multithreading now delivers much better performance out of a single node than one MPI process per core.
I suspect the code is now more bandwidth-bound than latency-bound; patches from Ilya ('poily' here) have changed the algorithm from sending a few huge messages to sending many simultaneous medium-size messages, and Greg reports this made cluster runs 25% faster. Switching from Gbit ethernet to infiniband has halved the solve time on Fullerton's 8-node cluster. In fact we haven't run timing tests on Gbit ethernet in years. Last fiddled with by jasonp on 2013-07-17 at 16:59
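For the latency vs. bandwidth question, a rough way to see which one dominates is the usual first-order model: time per message = latency + size / bandwidth. The sketch below is only that model, not anything taken from msieve. The latencies are the ballpark figures given in the question; the 16 Gbit/s figure for 4X DDR is inferred from "double the bandwidth and get up to 32Gbit/s"; the message sizes are hypothetical.

```python
def transfer_time(message_bytes: float, latency_s: float, bandwidth_bits_per_s: float) -> float:
    """First-order cost of one message: fixed latency plus serialization time."""
    return latency_s + 8 * message_bytes / bandwidth_bits_per_s


# (latency in seconds, bandwidth in bits/s); rough figures, see the assumptions above
LINKS = {
    "GigE":              (20e-6, 1e9),
    "10GbE":             (5e-6, 10e9),
    "IB 4X DDR":         (1e-6, 16e9),
    "IB at 32 Gbit/s":   (1e-6, 32e9),
}

# Hypothetical message sizes in bytes: small, medium, huge
for size in (1_000, 1_000_000, 100_000_000):
    print(f"message of {size} bytes")
    for name, (latency, bandwidth) in LINKS.items():
        t = transfer_time(size, latency, bandwidth)
        share = latency / t  # fraction of the time that is pure latency
        print(f"  {name:16s} {t * 1e3:10.4f} ms  (latency share {share:.1%})")
```

Under this model latency only matters for small messages; once messages reach the megabyte range the bandwidth term dominates, which is consistent with the code being more bandwidth-bound than latency-bound.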
2013-07-17, 18:53  #37
Apr 2010
Over the rainbow
2×1,217 Posts 
I have been asked to post the parameters for this poly
Code:
2203286292154236920662074580008136560385550762038571072069284129582298550469011615783387269827436918721335468107066200517568432204890391543672088684775850203864157356993 (169 digits)
R0: 344517009720320657345668021520609
R1: 870535396513771
A0: 252681583117408750059938913868667397960
A1: 716807602129660819233465802155626
A2: 1024911528196794072738767285
A3: 155515608837130473266
A4: 37852190172270
A5: 453960
skew 5721717.05, size 1.841e016, alpha 6.366, combined = 4.109e013 rroots = 3
I think that this hit is purely luck. Usually, I run my poly selection with a max norm of 5 or 6 times the stage 2 limit from 0 to 600 000, then slowly decrease it as the leading coefficient rises. However, I got too many hits, so I reduced my max norm to 21e23:
msieve151_gpu np1 "stage1_norm=21e23 0,600000"
poly : 453960 870535396513771 344517009705803223426223720724056
developed poly : 453960 34621935651270 50766770753153193334 1241978336814028904403119205 706646633645376920708483366483406 5882326745223555951249963188015466218525 870535396513771 344517009719081759248796430890814 2.03 4.358210e+021
The last number is usually in the range of X.xxxxxxe+023.
Last fiddled with by firejuggler on 2013-07-17 at 19:04
2013-07-17, 23:24  #38
Jun 2012
2×5×7×41 Posts 
Thanks firejuggler (and VBCurtis et al), appreciate your running the search for this poly. I've added the other parameters and done some test sieving, but the performance is not yet what it needs to be.
Using lpbr/a=30, mfbr/a=61, r/alim=111M and r/alambda=2.7 gives a sieving rate of 1.05 rel/sec and 0.86 rels/spec_q on my i7 using 8 threads (4+4 HT) with Yafu on Win 7 (64 bit). While I ultimately plan to factor this composite on a faster i7 in Linux, these benchmarks tell me the parameters are suboptimal. I did try lpbr/a of 31 and mfbr/a of 62. While of course the speed and yield increased, the net time estimate to sieve increased by over 10%. Plus the memory requirements start climbing as well. Suggestions? Last fiddled with by swellman on 2013-07-17 at 23:26 Reason: Crediting other poly searchers
2013-07-19, 18:18  #39
Sep 2010
Scandinavia
3·5·41 Posts 
Quote:
I'm no expert, but my experiments indicate that you could get away with using r/alim of 2^262 and
Code:
lpbr: 30
lpba: 30
mfbr: 60
mfba: 60
rlambda: 2.7
alambda: 2.7
If that is not a good idea, for whatever reason, let me know.

2013-07-19, 22:11  #40
Jun 2012
2·5·7·41 Posts 
No, I was using 14e when I posted my benchmarks, but have since done some more test sieving with 15e. Much better.
I'm away from that machine for the weekend, but I will post my results when I return. Appreciate your thoughts. Looks like I need to vary r/alim more; so far I've only been using values in the range 100-111M.
2013-07-22, 15:37  #41
Mar 2006
2^{3}×59 Posts 
Hi everyone, I've done a lot more test sieving and am basically ready to move on to actual sieving. However, I was wondering, after scaling the relations to compare to each other, how do I compare the times to each other? Here are my test sieve results plus their scaled total relations, which one of these would work the best? Should I multiply the sec/rel by the scaling factor too, and then maybe add the times up to compare all of them against each other? Or should I not scale the times and just add them up as they are?
Code:
t01: ### norm 1.270759e20 alpha 7.678108 e 9.961e16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1029, q=100001029 (1.68794 sec/rel) * 1.4142 = 1455
total yield: 1188, q=300001001 (1.97009 sec/rel) * 1.4142 = 1680
total yield: 862, q=500001001 (2.38819 sec/rel) * 1.4142 = 1219

t02: ### norm 1.270759e20 alpha 7.678108 e 9.961e16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1092, q=100001029 (3.22398 sec/rel) * 1.4142 = 1544
total yield: 1271, q=300001001 (3.73349 sec/rel) * 1.4142 = 1797
total yield: 926, q=500001001 (4.27995 sec/rel) * 1.4142 = 1309

t03: ### norm 1.270759e20 alpha 7.678108 e 9.961e16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1430, q=100001029 (1.23638 sec/rel) * 1.0000 = 1430
total yield: 1664, q=300001001 (1.42994 sec/rel) * 1.0000 = 1664
total yield: 1244, q=500001001 (1.67699 sec/rel) * 1.0000 = 1244

t04: ### norm 1.270759e20 alpha 7.678108 e 9.961e16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1581, q=100001029 (2.24839 sec/rel) * 1.0000 = 1581
total yield: 1824, q=300001001 (2.62574 sec/rel) * 1.0000 = 1824
total yield: 1380, q=500001001 (2.89555 sec/rel) * 1.0000 = 1380

t05: ### norm 1.358701e20 alpha 6.910857 e 1.038e15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 986, q=100001029 (1.74083 sec/rel) * 1.4142 = 1394
total yield: 852, q=300001001 (2.15629 sec/rel) * 1.4142 = 1204
total yield: 1162, q=500001001 (2.57390 sec/rel) * 1.4142 = 1643

t06: ### norm 1.358701e20 alpha 6.910857 e 1.038e15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1061, q=100001029 (3.33601 sec/rel) * 1.4142 = 1500
total yield: 917, q=300001001 (3.96387 sec/rel) * 1.4142 = 1296
total yield: 1245, q=500001001 (4.46854 sec/rel) * 1.4142 = 1760

t07: ### norm 1.358701e20 alpha 6.910857 e 1.038e15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1383, q=100001029 (1.26526 sec/rel) * 1.0000 = 1383
total yield: 1235, q=300001001 (1.51253 sec/rel) * 1.0000 = 1235
total yield: 1596, q=500001001 (1.90064 sec/rel) * 1.0000 = 1596

t08: ### norm 1.358701e20 alpha 6.910857 e 1.038e15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1537, q=100001029 (2.32150 sec/rel) * 1.0000 = 1537
total yield: 1395, q=300001001 (2.62871 sec/rel) * 1.0000 = 1395
total yield: 1786, q=500001001 (3.14061 sec/rel) * 1.0000 = 1786

t09: ### norm 1.161363e20 alpha 9.018629 e 9.234e16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 914, q=100001029 (1.86595 sec/rel) * 1.4142 = 1292
total yield: 1039, q=300001001 (2.08668 sec/rel) * 1.4142 = 1469
total yield: 1130, q=500001001 (2.32058 sec/rel) * 1.4142 = 1598

t10: ### norm 1.161363e20 alpha 9.018629 e 9.234e16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 958, q=100001029 (3.31313 sec/rel) * 1.4142 = 1354
total yield: 1087, q=300001001 (3.93585 sec/rel) * 1.4142 = 1537
total yield: 1218, q=500001001 (4.08910 sec/rel) * 1.4142 = 1722

t11: ### norm 1.161363e20 alpha 9.018629 e 9.234e16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1256, q=100001029 (1.38039 sec/rel) * 1.0000 = 1256
total yield: 1466, q=300001001 (1.50356 sec/rel) * 1.0000 = 1466
total yield: 1579, q=500001001 (1.68606 sec/rel) * 1.0000 = 1579

t12: ### norm 1.161363e20 alpha 9.018629 e 9.234e16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1379, q=100001029 (2.31964 sec/rel) * 1.0000 = 1379
total yield: 1602, q=300001001 (2.69361 sec/rel) * 1.0000 = 1602
total yield: 1765, q=500001001 (2.84677 sec/rel) * 1.0000 = 1765

t13: ### norm 1.276810e020 alpha 9.238464 e 9.932e016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 960, q=100001029 (1.73365 sec/rel) * 1.4142 = 1357
total yield: 1013, q=300001001 (1.94229 sec/rel) * 1.4142 = 1432
total yield: 1108, q=500001001 (2.25318 sec/rel) * 1.4142 = 1566

t14: ### norm 1.276810e020 alpha 9.238464 e 9.932e016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1287, q=100001029 (1.31478 sec/rel) * 1.0000 = 1287
total yield: 1423, q=300001001 (1.40586 sec/rel) * 1.0000 = 1423
total yield: 1549, q=500001001 (1.63787 sec/rel) * 1.0000 = 1549

t15: ### norm 1.276810e020 alpha 9.238464 e 9.932e016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1429, q=100001029 (2.32081 sec/rel) * 1.0000 = 1429
total yield: 1580, q=300001001 (2.50181 sec/rel) * 1.0000 = 1580
total yield: 1706, q=500001001 (2.82340 sec/rel) * 1.0000 = 1706

t16: ### norm 3.235054e15 alpha 8.969838 e 1.013e15 rroots 4 skew: 2580939.03 deg6 p07
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1142, q=100001029 (1.75196 sec/rel) * 1.4142 = 1615
total yield: 807, q=300001001 (2.31809 sec/rel) * 1.4142 = 1141
total yield: 832, q=500001001 (2.82525 sec/rel) * 1.4142 = 1176

lpbr 32, 33 (I also tried 34, but the binaries I have don't support that)
lpba 32, 33 (I also tried 34, but the binaries I have don't support that)
mfbr/mfba: 2*lpbr/96, 96/2*lpba, 96/96
rlambda/alambda: 2.7/3.7, 3.7/2.7, 3.7/3.7
I sieved 1000 Q at 100e6, 300e6, and 500e6. I also sieved on both the rational and algebraic sides. The algebraic side was the big winner. All the results above came from the algebraic side sieving.
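The scaled totals in the listing can be reproduced with a few lines, and the same data gives one possible way to put the timings on a common footing: add up the wall time actually spent (raw yield times sec/rel) and divide by the scaled yield. That convention is an assumption for illustration, not something the thread has settled on; the numbers for t01 and t03 are copied from the listing.

```python
SCALE_LPBR_32 = 1.4142  # factor applied in the listing to the lpbr=32 runs
SCALE_LPBR_33 = 1.0000  # factor applied to the lpbr=33 runs


def scaled_yield(raw_yield: int, scale: float) -> int:
    """Raw relation count multiplied by the scaling factor, rounded as in the listing."""
    return round(raw_yield * scale)


def summarize(samples, scale):
    """Return (total scaled yield, seconds per scaled relation).

    samples is a list of (raw_yield, sec_per_rel) pairs, one per test range.
    The time spent on a range is raw_yield * sec_per_rel; dividing the summed
    time by the summed scaled yield is an assumed convention, see the lead-in.
    """
    total_scaled = sum(scaled_yield(y, scale) for y, _ in samples)
    total_seconds = sum(y * s for y, s in samples)
    return total_scaled, total_seconds / total_scaled


# (raw yield, sec/rel) at q = 100e6, 300e6, 500e6
t01 = [(1029, 1.68794), (1188, 1.97009), (862, 2.38819)]
t03 = [(1430, 1.23638), (1664, 1.42994), (1244, 1.67699)]

for name, samples, scale in (("t01", t01, SCALE_LPBR_32), ("t03", t03, SCALE_LPBR_33)):
    total, sec_per_scaled = summarize(samples, scale)
    print(f"{name}: scaled yield {total}, {sec_per_scaled:.3f} s per scaled relation")
```

Dividing by the scaled yield is equivalent to dividing each sec/rel by the scaling factor, so under this convention the timings are scaled in the opposite direction to the yields.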
2013-07-22, 19:23  #42
Sep 2010
Scandinavia
3×5×41 Posts 
Quote:
Quote:
You may want to try larger lambda, but I'm not too sure about that. 

2013-07-23, 00:25  #43
Mar 2006
2^{3}·59 Posts 
Quote:
Also, do you have any thoughts on my question of how to compare the timing (sec/rel) between all my test sieves? Or anyone else, what is the best way to compare these timings? Should I scale the timings the same way I scaled the relations and then perhaps add them up to do a comparison? Or should I just add up the numbers as they are to compare them? 

2013-07-23, 01:47  #44
Jun 2012
2·5·7·41 Posts 
Quote:
Sieving_time = (total_rels_req) * (sec/rel) / (#_threads)
where in your case total_rels_req is about 400M (for lpba and lpbr of 32) times the scaling factor.
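As a worked instance of that formula, here is a minimal sketch. The 400M figure comes from the post; the thread count of 8 and the sec/rel value (taken from the first t01 line) are just example inputs, and the `scale` parameter is a hypothetical stand-in for "the scaling factor", whose exact value for a given lpb choice is not specified here.

```python
def sieving_time_seconds(total_rels_req: float, sec_per_rel: float,
                         threads: int, scale: float = 1.0) -> float:
    """Sieving_time = (total_rels_req * scale) * (sec/rel) / (#_threads)."""
    return total_rels_req * scale * sec_per_rel / threads


# Example inputs: 400M relations, 1.68794 sec/rel, 8 threads (thread count is assumed)
seconds = sieving_time_seconds(400e6, 1.68794, threads=8)
print(f"estimated sieving time: {seconds / 86400:.0f} days")
```

Because sec/rel drifts upward with q in the test results, plugging in an average over the planned q range should give a more realistic estimate than a single test point.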
