Old 2013-07-09, 14:13   #34
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Cambridge (GMT/BST)

5,743 Posts

Quote:
Originally Posted by lorgix View Post
I think I have a rough grasp on how lpb works now. You seem to be making a case for higher lpb, but you agree that lpb can be set too high, right?
I don't quite get the bold part. I don't understand how mfb works. Are you saying it increases complexity, and that I can sometimes get the benefits of a higher lpb without paying the price of a higher mfb? (Which would be higher complexity, and I don't know what that means in this context. Harder filtering? Is that what your other post was saying?)

I feel that I'm missing a few pieces, but I'm still learning. Hopefully others will benefit from these discussions.

Thank you all for your patience.
mfb is the bound below which composites left after sieving (and, I think, after trial factoring and ECM if used) are passed to the quadratic sieve routine. If a composite is larger than the bound, the chance of the relation being useful is reckoned to be below the effort required.

If you raise mfb then effectively you start looking for larger large primes. This takes effort and means that a larger portion of the relations will have a large prime as part of them.

By raising the lpb you are basically saying that if you find relations with larger large primes, you may as well keep them. The chance of any one of them being useful isn't very large, as there aren't that many of them, but between them there is a chance that some will be useful.

The challenge is making sure the extra effort (filtering) costs less than the worth of the relations that make it into the matrix.
This trick hasn't been tested much, especially with large factorizations, but I think we tend to push the balance the wrong way: we ignore relations that could be more helpful than the time taken to discard most of them.
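To make that concrete, here is a toy sketch in Python of the decision being described (my own illustration with made-up bounds, not actual siever code; sympy's factorint stands in for the QS/ECM cofactor split, and a real siever applies separate bounds per side):
Code:
import sympy

LPB = 30   # large prime bound: keep large primes up to 2**LPB
MFB = 60   # cofactor bound: only composites up to 2**MFB are worth splitting

def keep_relation(cofactor):
    """Decide whether a relation's leftover cofactor makes it usable."""
    if cofactor == 1:
        return True                      # fully split over the factor base
    if cofactor > 2**MFB:
        return False                     # too big: splitting costs more than it is worth
    primes = sympy.factorint(cofactor)   # stands in for the QS/ECM split
    return all(p < 2**LPB for p in primes)
In this picture, raising MFB lets bigger cofactors into the expensive splitting step, while raising LPB keeps more of the primes that the split actually finds.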
Old 2013-07-17, 13:02   #35
lorgix
 
 
Sep 2010
Scandinavia

3×5×41 Posts

I'm hoping frmky, jasonp & fivemack will have something to contribute here.

The msieve readme says that "the best decomposition for P MPI processes will reduce the solve time by a factor of about P^0.6 using gigabit ethernet, while an infiniband interconnect scales the solve time by around P^0.71".

IIRC, the 0.71 was higher in the old readme; why is that?

What kind of Infiniband was this?

I think gigabit ethernet has a latency on the order of 20 microseconds, 10GbE is around 5, and Infiniband approaches 1.

How important is latency vs. bandwidth? 4X DDR InfiniBand is reasonably cheap; is it worth spending the extra money to double the bandwidth and get up to 32 Gbit/s? I'm assuming even the "slower" one will absolutely demolish 10GbE.
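To put my own question in perspective, a back-of-envelope model (the specs are round numbers I'm assuming, e.g. ~16 Gbit/s effective for 4X DDR; transfer time = latency + size/bandwidth):
Code:
links = {                      # (latency in s, bandwidth in bytes/s)
    "GigE":      (20e-6,  1e9 / 8),
    "10GbE":     ( 5e-6, 10e9 / 8),
    "IB 4X DDR": ( 1e-6, 16e9 / 8),   # assumed ~16 Gbit/s effective
}
for size in (4e3, 4e6):        # a 4 KB and a 4 MB message
    for name, (lat, bw) in links.items():
        t = lat + size / bw
        print(f"{name:10s} {size/1e3:6.0f} KB: {t*1e6:9.1f} us")
Small messages are latency-dominated and big ones bandwidth-dominated, so which matters presumably depends on how the solver chunks its communication.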

Would it be possible and worthwhile to connect three nodes using three dual-port HCAs? That would mean no switch.
Old 2013-07-17, 16:58   #36
jasonp
Tribal Bullet
 
 
Oct 2004

3·1,163 Posts

I can only attempt an answer to the first question, and even then frmky has the actual numbers. The Teragrid nodes we've been using have been getting larger and faster continuously; at the same time, the code has been picking up patches that reduce the cost of communications, so the scaling is expected to change under those circumstances. We've now run much larger jobs than when the first version of the readme was written, which gets the behavior more into asymptotic territory. With the recent overhaul of the threaded version, we'll need to re-tune things, because there's evidence that multithreading now delivers much better performance out of a single node than one MPI process per core.

I suspect the code is now more bandwidth-bound than latency-bound; patches from Ilya ('poily' here) have changed the algorithm from sending a few huge messages to many simultaneous medium-size messages, and Greg reports this made cluster runs 25% faster. Switching from Gbit ethernet to InfiniBand halved the solve time on Fullerton's 8-node cluster. In fact, we haven't run timing tests on Gbit ethernet in years.
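To illustrate the general pattern those patches moved to (many in-flight medium-size transfers instead of one monolithic send), here is a toy mpi4py sketch, my own illustration and not msieve's actual code; run under mpiexec -n 2:
Code:
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
data = np.arange(1_000_000, dtype=np.float64)
CHUNK = 65536                      # medium-size pieces instead of one huge message

if comm.Get_rank() == 0:
    reqs = [comm.Isend(data[i:i + CHUNK], dest=1, tag=t)
            for t, i in enumerate(range(0, len(data), CHUNK))]
    MPI.Request.Waitall(reqs)      # all chunks in flight simultaneously
elif comm.Get_rank() == 1:
    buf = np.empty_like(data)
    reqs = [comm.Irecv(buf[i:i + CHUNK], source=0, tag=t)
            for t, i in enumerate(range(0, len(buf), CHUNK))]
    MPI.Request.Waitall(reqs)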

Last fiddled with by jasonp on 2013-07-17 at 16:59
Old 2013-07-17, 18:53   #37
firejuggler
 
 
Apr 2010
Over the rainbow

46448 Posts

I have been asked to post the parameters for this poly:
Code:
2203286292154236920662074580008136560385550762038571072069284129582298550469011615783387269827436918721335468107066200517568432204890391543672088684775850203864157356993 (169 digits)
R0: -344517009720320657345668021520609
R1: 870535396513771
A0: 252681583117408750059938913868667397960
A1: 716807602129660819233465802155626
A2: 1024911528196794072738767285
A3: 155515608837130473266
A4: -37852190172270
A5: 453960
skew 5721717.05, size 1.841e-016, alpha -6.366, combined = 4.109e-013 rroots = 3
The score is in the expected range (3.91e-013 to 4.5e-013) for a C169, but others didn't find anything approaching it.
I think this hit was pure luck. Usually I run my poly selection with a max norm of 5 or 6 times the stage 2 limit from 0 to 600,000, then slowly decrease it as the leading coefficient rises.
However, I got too many hits, so I reduced my max norm to 21e23:
msieve151_gpu -np1 "stage1_norm=21e23 0,600000"

poly:
453960 870535396513771 344517009705803223426223720724056
developed poly:
453960 -34621935651270 -50766770753153193334 1241978336814028904403119205 706646633645376920708483366483406 -5882326745223555951249963188015466218525 870535396513771 -344517009719081759248796430890814 -2.03 4.358210e+021

The last number is usually in the range of X.xxxxxxe+023
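For anyone who wants to double-check a poly like this one, here is a quick consistency test (my own sketch, assuming the usual msieve convention that the rational side is R1*x + R0 with common root m = -R0/R1 mod n; needs Python 3.8+ for the modular inverse):
Code:
n  = 2203286292154236920662074580008136560385550762038571072069284129582298550469011615783387269827436918721335468107066200517568432204890391543672088684775850203864157356993
R0 = -344517009720320657345668021520609
R1 = 870535396513771
A  = [252681583117408750059938913868667397960,   # A0
      716807602129660819233465802155626,         # A1
      1024911528196794072738767285,              # A2
      155515608837130473266,                     # A3
      -37852190172270,                           # A4
      453960]                                    # A5

m = (-R0) * pow(R1, -1, n) % n                   # shared root of both polys
f_m = sum(a * pow(m, i, n) for i, a in enumerate(A)) % n
print("consistent" if f_m == 0 else "inconsistent")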

Last fiddled with by firejuggler on 2013-07-17 at 19:04
Old 2013-07-17, 23:24   #38
swellman
 
 
Jun 2012

2·5·172 Posts

Thanks firejuggler (and VBCurtis et al); I appreciate your running the search for this poly. I've added the other parameters and done some test sieving, but the performance is not yet what it needs to be.

Using lpbr/a=30, mfbr/a=61, r/alim=111M and r/alambda=2.7 gives a sieving rate of 1.05 rel/sec and 0.86 rels/spec_q on my i7 using 8 threads (4+4 HT) with Yafu on Win 7 (64 bit). While I ultimately plan to factor this composite on a faster i7 in Linux, these benchmarks tell me the parameters are suboptimal.

I did try lpbr/a of 31 and mfbr/a of 62. While the speed and yield of course increased, the net time estimate to sieve increased by over 10%, and the memory requirements start climbing as well.

Suggestions?

Last fiddled with by swellman on 2013-07-17 at 23:26 Reason: Crediting other poly searchers
Old 2013-07-19, 18:18   #39
lorgix
 
 
Sep 2010
Scandinavia

3·5·41 Posts

Quote:
Originally Posted by swellman View Post
Thanks firejuggler (and VBCurtis et al); I appreciate your running the search for this poly. I've added the other parameters and done some test sieving, but the performance is not yet what it needs to be.

Using lpbr/a=30, mfbr/a=61, r/alim=111M and r/alambda=2.7 gives a sieving rate of 1.05 rel/sec and 0.86 rels/spec_q on my i7 using 8 threads (4+4 HT) with Yafu on Win 7 (64 bit). While I ultimately plan to factor this composite on a faster i7 in Linux, these benchmarks tell me the parameters are suboptimal.

I did try lpbr/a of 31 and mfbr/a of 62. While the speed and yield of course increased, the net time estimate to sieve increased by over 10%, and the memory requirements start climbing as well.

Suggestions?
Are you using the 15e siever?

I'm no expert, but my experiments indicate that you could get away with using r/alim of 2^26-2 and
Code:
lpbr: 30
lpba: 30
mfbr: 60
mfba: 60
rlambda: 2.7
alambda: 2.7
if you sieve over alim/3 through alim.

If that is not a good idea, for whatever reason, let me know.
Old 2013-07-19, 22:11   #40
swellman
 
 
Jun 2012

2·5·172 Posts

No, I was using 14e when I posted my benchmarks, but have since done some more test sieving with 15e. Much better.

I'm away from that machine for the weekend, but I will post my results when I return. I appreciate your thoughts. It looks like I need to vary r/alim more; I've been using values in the range of 100-111M.
Old 2013-07-22, 15:37   #41
WraithX
 
 
Mar 2006

23×59 Posts

Hi everyone, I've done a lot more test sieving and am basically ready to move on to actual sieving. However, I was wondering: after scaling the relation counts to compare them to each other, how do I compare the times? Here are my test-sieve results with their scaled total relations; which of these would work best? Should I multiply the sec/rel by the scaling factor too, and then add the times up to compare them all against each other? Or should I not scale the times and just add them up as they are?
Code:
t01:
### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1029, q=100001029 (1.68794 sec/rel) * 1.4142 = 1455
total yield: 1188, q=300001001 (1.97009 sec/rel) * 1.4142 = 1680
total yield:  862, q=500001001 (2.38819 sec/rel) * 1.4142 = 1219

t02:
### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1092, q=100001029 (3.22398 sec/rel) * 1.4142 = 1544
total yield: 1271, q=300001001 (3.73349 sec/rel) * 1.4142 = 1797
total yield:  926, q=500001001 (4.27995 sec/rel) * 1.4142 = 1309

t03:
### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1430, q=100001029 (1.23638 sec/rel) * 1.0000 = 1430
total yield: 1664, q=300001001 (1.42994 sec/rel) * 1.0000 = 1664
total yield: 1244, q=500001001 (1.67699 sec/rel) * 1.0000 = 1244

t04:
### norm 1.270759e-20 alpha -7.678108 e 9.961e-16 rroots 3 skew: 291528643.68 deg5 p01
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1581, q=100001029 (2.24839 sec/rel) * 1.0000 = 1581
total yield: 1824, q=300001001 (2.62574 sec/rel) * 1.0000 = 1824
total yield: 1380, q=500001001 (2.89555 sec/rel) * 1.0000 = 1380

t05:
### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield:  986, q=100001029 (1.74083 sec/rel) * 1.4142 = 1394
total yield:  852, q=300001001 (2.15629 sec/rel) * 1.4142 = 1204
total yield: 1162, q=500001001 (2.57390 sec/rel) * 1.4142 = 1643

t06:
### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1061, q=100001029 (3.33601 sec/rel) * 1.4142 = 1500
total yield:  917, q=300001001 (3.96387 sec/rel) * 1.4142 = 1296
total yield: 1245, q=500001001 (4.46854 sec/rel) * 1.4142 = 1760

t07:
### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1383, q=100001029 (1.26526 sec/rel) * 1.0000 = 1383
total yield: 1235, q=300001001 (1.51253 sec/rel) * 1.0000 = 1235
total yield: 1596, q=500001001 (1.90064 sec/rel) * 1.0000 = 1596

t08:
### norm 1.358701e-20 alpha -6.910857 e 1.038e-15 rroots 5 skew: 104279094.33 deg5 p02
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1537, q=100001029 (2.32150 sec/rel) * 1.0000 = 1537
total yield: 1395, q=300001001 (2.62871 sec/rel) * 1.0000 = 1395
total yield: 1786, q=500001001 (3.14061 sec/rel) * 1.0000 = 1786

t09:
### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield:  914, q=100001029 (1.86595 sec/rel) * 1.4142 = 1292
total yield: 1039, q=300001001 (2.08668 sec/rel) * 1.4142 = 1469
total yield: 1130, q=500001001 (2.32058 sec/rel) * 1.4142 = 1598

t10:
### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 32 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield:  958, q=100001029 (3.31313 sec/rel) * 1.4142 = 1354
total yield: 1087, q=300001001 (3.93585 sec/rel) * 1.4142 = 1537
total yield: 1218, q=500001001 (4.08910 sec/rel) * 1.4142 = 1722

t11:
### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1256, q=100001029 (1.38039 sec/rel) * 1.0000 = 1256
total yield: 1466, q=300001001 (1.50356 sec/rel) * 1.0000 = 1466
total yield: 1579, q=500001001 (1.68606 sec/rel) * 1.0000 = 1579

t12:
### norm 1.161363e-20 alpha -9.018629 e 9.234e-16 rroots 5 skew: 609572156.15 deg5 p03
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1379, q=100001029 (2.31964 sec/rel) * 1.0000 = 1379
total yield: 1602, q=300001001 (2.69361 sec/rel) * 1.0000 = 1602
total yield: 1765, q=500001001 (2.84677 sec/rel) * 1.0000 = 1765

t13:
### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield:  960, q=100001029 (1.73365 sec/rel) * 1.4142 = 1357
total yield: 1013, q=300001001 (1.94229 sec/rel) * 1.4142 = 1432
total yield: 1108, q=500001001 (2.25318 sec/rel) * 1.4142 = 1566

t14:
### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 33 lpba: 33 mfbr: 66 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1287, q=100001029 (1.31478 sec/rel) * 1.0000 = 1287
total yield: 1423, q=300001001 (1.40586 sec/rel) * 1.0000 = 1423
total yield: 1549, q=500001001 (1.63787 sec/rel) * 1.0000 = 1549

t15:
### norm 1.276810e-020 alpha -9.238464 e 9.932e-016 rroots 5 skew: 380780879.24 deg5 p04
lpbr: 33 lpba: 33 mfbr: 96 mfba: 96 rlambda: 3.7 alambda: 3.7
total yield: 1429, q=100001029 (2.32081 sec/rel) * 1.0000 = 1429
total yield: 1580, q=300001001 (2.50181 sec/rel) * 1.0000 = 1580
total yield: 1706, q=500001001 (2.82340 sec/rel) * 1.0000 = 1706

t16:
### norm 3.235054e-15 alpha -8.969838 e 1.013e-15 rroots 4 skew: 2580939.03 deg6 p07
lpbr: 32 lpba: 33 mfbr: 64 mfba: 96 rlambda: 2.7 alambda: 3.7
total yield: 1142, q=100001029 (1.75196 sec/rel) * 1.4142 = 1615
total yield:  807, q=300001001 (2.31809 sec/rel) * 1.4142 = 1141
total yield:  832, q=500001001 (2.82525 sec/rel) * 1.4142 = 1176
These weren't the only tests I ran, but they were pretty much the best of all of them. The parameters I varied were:
lpbr 32, 33 (I also tried 34, but the binaries I have don't support that)
lpba 32, 33 (I also tried 34, but the binaries I have don't support that)
mfbr/mfba: 2*lpbr/96, 96/2*lpba, 96/96
rlambda/alambda: 2.7/3.7, 3.7/2.7, 3.7/3.7
I sieved 1000 Q at 100e6, 300e6, and 500e6, on both the rational and algebraic sides. The algebraic side was the big winner; all the results above came from algebraic-side sieving.
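One possible figure of merit, purely my own suggestion: credit each run's rate with the same scaling factor used for the yields, i.e. effective rate = scale / (sec/rel), averaged over the three q ranges. A sketch with numbers copied from t01, t03 and t07 above:
Code:
tests = {
    # name: (scale, [sec/rel at q = 100M, 300M, 500M])
    "t01": (1.4142, [1.68794, 1.97009, 2.38819]),
    "t03": (1.0000, [1.23638, 1.42994, 1.67699]),
    "t07": (1.0000, [1.26526, 1.51253, 1.90064]),
}
for name, (scale, secs) in sorted(tests.items()):
    rate = sum(scale / s for s in secs) / len(secs)   # scaled rels per second
    print(f"{name}: {rate:.3f} scaled rels/sec")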
Old 2013-07-22, 19:23   #42
lorgix
 
 
Sep 2010
Scandinavia

3×5×41 Posts

Quote:
Originally Posted by jasonp View Post
I can only attempt an answer to the first question, and even then frmky has the actual numbers. The Teragrid nodes we've been using have been getting larger and faster continuously; at the same time, the code has been picking up patches that reduce the cost of communications, so the scaling is expected to change under those circumstances. We've now run much larger jobs than when the first version of the readme was written, which gets the behavior more into asymptotic territory. With the recent overhaul of the threaded version, we'll need to re-tune things, because there's evidence that multithreading now delivers much better performance out of a single node than one MPI process per core.

I suspect the code is now more bandwidth-bound than latency-bound; patches from Ilya ('poily' here) have changed the algorithm from sending a few huge messages to many simultaneous medium-size messages, and Greg reports this made cluster runs 25% faster. Switching from Gbit ethernet to InfiniBand halved the solve time on Fullerton's 8-node cluster. In fact, we haven't run timing tests on Gbit ethernet in years.
Some interesting info. Thanks!
Quote:
Originally Posted by WraithX View Post
Hi everyone, I've done a lot more test sieving and am basically ready to move on to actual sieving.

*snip*

The parameters I varied were:
lpbr 32, 33 (I also tried 34, but the binaries I have don't support that)
lpba 32, 33 (I also tried 34, but the binaries I have don't support that)
mfbr/mfba: 2*lpbr/96, 96/2*lpba, 96/96
rlambda/alambda: 2.7/3.7, 3.7/2.7, 3.7/3.7
I sieved 1000 Q at 100e6, 300e6, and 500e6. I also sieved on both the rational and algebraic sides. The algebraic side was the big winner. All the results above came from the algebraic side sieving.
I think you'll want to recompile so that you can use lpb>33.

You may want to try larger lambda, but I'm not too sure about that.
Old 2013-07-23, 00:25   #43
WraithX
 
 
Mar 2006

47210 Posts

Quote:
Originally Posted by lorgix View Post
I think you'll want to recompile so that you can use lpb>33.

You may want to try larger lambda, but I'm not too sure about that.
Hmmm, I've never tried to compile GGNFS before. Can it be compiled with MinGW64? I'm using the binaries with the Windows ASM improvements provided by Dan Ee over in this thread. Actually, looking at it again, he used MinGW64 and gives instructions on how to compile it. What needs to be changed to allow lpbr/lpba > 33?

Also, do you have any thoughts on my question of how to compare the timings (sec/rel) between all my test sieves? Or, anyone else: what is the best way to compare these timings? Should I scale the timings the same way I scaled the relations, and then perhaps add them up for comparison? Or should I just add up the numbers as they are?
Old 2013-07-23, 01:47   #44
swellman
 
 
Jun 2012

2×5×172 Posts

Quote:
Originally Posted by WraithX View Post

Also, do you have any thoughts on my question of how to compare the timings (sec/rel) between all my test sieves? Or, anyone else: what is the best way to compare these timings? Should I scale the timings the same way I scaled the relations, and then perhaps add them up for comparison? Or should I just add up the numbers as they are?
I look to minimize total estimated sieving time, provided the resulting LA fits within my memory limitations. I use the formula

Sieving_time = (total_rels_req) * (sec/rel) / (#_threads)

where in your case total_rels_req is about 400M (for lpba and lpbr of 32) times the scaling factor.
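For example, a worked instance of the formula (every plug-in number below is a placeholder, not a measurement from this thread):
Code:
total_rels_req = 400e6 * 1.4142   # ~400M rels for lpb 32, times the scaling factor
sec_per_rel    = 1.5              # placeholder from a hypothetical test sieve
threads        = 8                # placeholder; real jobs spread over many cores

seconds = total_rels_req * sec_per_rel / threads
print(f"estimated sieving time: {seconds / 86400:.0f} days")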