mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > YAFU

Reply
 
Thread Tools
Old 2020-11-01, 17:02   #56
wombatman
I moo ablest echo power!
 
wombatman's Avatar
 
May 2013

1,741 Posts
Default

Quote:
Originally Posted by bsquared View Post
Looks like you just need to use more makefile options. You need at least NFS=1 to bring in the nfs stuff and SKYLAKEX=1 for the avx512 vector arithmetic stuff. And if you use SKYLAKEX=1 then you also need USE_AVX2=1. My typical build line looks like this:

make NFS=1 USE_AVX2=1 USE_BMI2=1 SKYLAKEX=1

also, ECM=1 is a msieve thing, not a yafu thing, so you can leave that off.
Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding).
wombatman is offline   Reply With Quote
Old 2020-11-01, 17:06   #57
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

3,371 Posts
Default

Quote:
Originally Posted by wombatman View Post
Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding).
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
bsquared is offline   Reply With Quote
Old 2020-11-01, 17:18   #58
wombatman
I moo ablest echo power!
 
wombatman's Avatar
 
May 2013

1,741 Posts
Default

Quote:
Originally Posted by bsquared View Post
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
No worries! If it's just a simple makefile change, I'm good with doing that.
wombatman is offline   Reply With Quote
Old 2020-11-01, 17:26   #59
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

337110 Posts
Default

Quote:
Originally Posted by bsquared View Post
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
Ok, try r394.
bsquared is offline   Reply With Quote
Old 2020-11-01, 17:33   #60
wombatman
I moo ablest echo power!
 
wombatman's Avatar
 
May 2013

1,741 Posts
Default

Quote:
Originally Posted by bsquared View Post
Ok, try r394.
Builds successfully and seems to be working well using factor(rsa(300)) to check both ECM and SIQS (my primary usage). Thanks very much for the quick response!
wombatman is offline   Reply With Quote
Old 2020-11-02, 18:49   #61
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

337110 Posts
Default

addmod and submod can also be simplified for Mersenne inputs and this has a measurable improvement... about 3% faster in stage 1 and 7% in stage 2.

Both yafu and avx-ecm repos have been updated.

Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large.

I've done some testing, but if anyone is able to test either package on finding known Mersenne factors, that'd be great.

Last fiddled with by bsquared on 2020-11-02 at 18:50
bsquared is offline   Reply With Quote
Old 2020-11-02, 19:34   #62
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

3,371 Posts
Default

Quote:
Originally Posted by bsquared View Post
Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large.
Ok, most of that is explained by different compiler options. avx-ecm was using -O3 and yafu -O2. With -O3, yafu is much better, but avx-ecm is still 5% faster or so.

Given all of the above, for 2^1277-1, AVX-ECM is 6.2 times faster than GMP-ECM now. I hope I have GMP-ECM configured optimally for this system for a fair comparison... here is what it shows for a run with -v:

Code:
 echo "2^1277-1" | ~/ecm-704-linux/ecm -v 7000000
GMP-ECM 7.0.4 [configured with GMP 6.2.0, --enable-asm-redc] [ECM]
Tuned for x86_64/k8/params.h
Running on cpu
Input number is 2^1277-1 (385 digits)
Using special division for factor of 2^1277-1
Using B1=7000000, B2=17125579390, polynomial Dickson(12), sigma=0:1088740413510573297
dF=16384, k=6, d=158340, d2=11, i0=34
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65      70      75      80
167     1024    7351    60402   558826  5744532 6.5e+07 7.8e+08 1.1e+10 2.8e+11
Step 1 took 60676ms
B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1)
bsquared is offline   Reply With Quote
Old 2020-11-02, 22:15   #63
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

52·11·17 Posts
Default

Quote:
Originally Posted by bsquared View Post
B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1)
Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level?

Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2.

Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more?

My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2.
VBCurtis is offline   Reply With Quote
Old 2020-11-02, 22:36   #64
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

3,371 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level?

Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2.

Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more?

My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2.
gmp-ecm ran its larger stage 2 in 32 seconds while avx-ecm ran its smaller stage 2 in 55 seconds, or 6.8 sec/curve.
gmp-ecm B2 was B2=17125579390
avx-ecm B2 was 700000000 (24.4 times smaller)

This B2 size disparity will grow the larger B1 gets.
Also complicating matters is that gmp-ecm uses the Brent-Suyama extension for stage 2, which increases the odds of finding a factor slightly. avx-ecm doesn't do that.

So unfortunately it is a complicated tradeoff that has to consider factor-finding probabilities as well as speed. I don't think there is enough data for avx-ecm yet to compare probability of finding a factor per unit time (or per curve), compared to gmp-ecm. Not sure I would even know how to do that analysis if I had the data, although maybe I could figure it out.

ATH's data set here did show avx-ecm finding *more* factors per curve than gmp-ecm did, but stuff like that will happen for randomized algorithms and small data sets.

After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1.

Last fiddled with by bsquared on 2020-11-02 at 22:37
bsquared is offline   Reply With Quote
Old 2020-11-02, 22:47   #65
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

64538 Posts
Default

Quote:
Originally Posted by bsquared View Post
After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1.
Here's what gmp-ecm spit out:

Input number is (2^1277-1) (385 digits)
Using special division for factor of 2^1277-1
Using B1=10000000000, B2=637998273712822, polynomial Dickson(30), sigma=0:6343133439254916928
dF=4194304, k=3, d=48498450, d2=23, i0=184
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
7 19 52 161 545 1993 7815 32642 144421 673682


Input number is (2^1277-1) (385 digits)
Using special division for factor of 2^1277-1
Using B1=10000000000, B2=1000000000000, polynomial x^1, sigma=0:11834000506611778745
dF=131072, k=6, d=1345890, d2=11, i0=7420
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
16 44 134 442 1578 6046 24696 106998 489087 2349846

The 75-digit number of curves ratio is 3.38, which is essentially a multiplier in favor of gmp-ecm curves.

So if avx-ecm is 6.2 times higher throughput, but 3.38 times less likely to find a factor at t75, then the net gain for avx-ecm is about 1.83 for 2^1277-1.

Similar calculations would have to be applied to generic inputs at particular B1's, I think. gmp-ecm's advantage is its vastly superior stage 2. So the avx-ecm stage 1 followed by gmp-ecm stage 2 recipe is probably a winner in a lot of cases. But for 2^1277-1 it looks like pure avx-ecm wins.

Thoughts?
bsquared is offline   Reply With Quote
Old 2020-11-03, 01:47   #66
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

52·11·17 Posts
Default

When GMP-ECM is faster for Stage 2 with a higher B2, there is only advantage from using it. Save time, get more work- what's not to like?

The difficult tradeoff would be on larger numbers, when the avx-ecm stage 2 is likely faster than GMP-ECM but a smaller B2. That leads to some need for study.
VBCurtis is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 08:56.

Tue Mar 2 08:56:36 UTC 2021 up 89 days, 5:07, 0 users, load averages: 0.80, 0.91, 1.02

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.