![]() |
![]() |
#56 | |
I moo ablest echo power!
May 2013
6CD16 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#57 |
"Ben"
Feb 2007
D2416 Posts |
![]()
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
|
![]() |
![]() |
![]() |
#58 |
I moo ablest echo power!
May 2013
1,741 Posts |
![]() |
![]() |
![]() |
![]() |
#59 |
"Ben"
Feb 2007
22·292 Posts |
![]() |
![]() |
![]() |
![]() |
#60 |
I moo ablest echo power!
May 2013
6CD16 Posts |
![]() |
![]() |
![]() |
![]() |
#61 |
"Ben"
Feb 2007
336410 Posts |
![]()
addmod and submod can also be simplified for Mersenne inputs and this has a measurable improvement... about 3% faster in stage 1 and 7% in stage 2.
Both yafu and avx-ecm repos have been updated. Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large. I've done some testing, but if anyone is able to test either package on finding known Mersenne factors, that'd be great. Last fiddled with by bsquared on 2020-11-02 at 18:50 |
![]() |
![]() |
![]() |
#62 | |
"Ben"
Feb 2007
336410 Posts |
![]() Quote:
Given all of the above, for 2^1277-1, AVX-ECM is 6.2 times faster than GMP-ECM now. I hope I have GMP-ECM configured optimally for this system for a fair comparison... here is what it shows for a run with -v: Code:
echo "2^1277-1" | ~/ecm-704-linux/ecm -v 7000000 GMP-ECM 7.0.4 [configured with GMP 6.2.0, --enable-asm-redc] [ECM] Tuned for x86_64/k8/params.h Running on cpu Input number is 2^1277-1 (385 digits) Using special division for factor of 2^1277-1 Using B1=7000000, B2=17125579390, polynomial Dickson(12), sigma=0:1088740413510573297 dF=16384, k=6, d=158340, d2=11, i0=34 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 167 1024 7351 60402 558826 5744532 6.5e+07 7.8e+08 1.1e+10 2.8e+11 Step 1 took 60676ms |
|
![]() |
![]() |
![]() |
#63 |
"Curtis"
Feb 2005
Riverside, CA
23×3×193 Posts |
![]()
Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level?
Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2. Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more? My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2. |
![]() |
![]() |
![]() |
#64 | |
"Ben"
Feb 2007
22×292 Posts |
![]() Quote:
gmp-ecm B2 was B2=17125579390 avx-ecm B2 was 700000000 (24.4 times smaller) This B2 size disparity will grow the larger B1 gets. Also complicating matters is that gmp-ecm uses the Brent-Suyama extension for stage 2, which increases the odds of finding a factor slightly. avx-ecm doesn't do that. So unfortunately it is a complicated tradeoff that has to consider factor-finding probabilities as well as speed. I don't think there is enough data for avx-ecm yet to compare probability of finding a factor per unit time (or per curve), compared to gmp-ecm. Not sure I would even know how to do that analysis if I had the data, although maybe I could figure it out. ATH's data set here did show avx-ecm finding *more* factors per curve than gmp-ecm did, but stuff like that will happen for randomized algorithms and small data sets. After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1. Last fiddled with by bsquared on 2020-11-02 at 22:37 |
|
![]() |
![]() |
![]() |
#65 | |
"Ben"
Feb 2007
22×292 Posts |
![]() Quote:
Input number is (2^1277-1) (385 digits) Using special division for factor of 2^1277-1 Using B1=10000000000, B2=637998273712822, polynomial Dickson(30), sigma=0:6343133439254916928 dF=4194304, k=3, d=48498450, d2=23, i0=184 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 7 19 52 161 545 1993 7815 32642 144421 673682 Input number is (2^1277-1) (385 digits) Using special division for factor of 2^1277-1 Using B1=10000000000, B2=1000000000000, polynomial x^1, sigma=0:11834000506611778745 dF=131072, k=6, d=1345890, d2=11, i0=7420 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 16 44 134 442 1578 6046 24696 106998 489087 2349846 The 75-digit number of curves ratio is 3.38, which is essentially a multiplier in favor of gmp-ecm curves. So if avx-ecm is 6.2 times higher throughput, but 3.38 times less likely to find a factor at t75, then the net gain for avx-ecm is about 1.83 for 2^1277-1. Similar calculations would have to be applied to generic inputs at particular B1's, I think. gmp-ecm's advantage is its vastly superior stage 2. So the avx-ecm stage 1 followed by gmp-ecm stage 2 recipe is probably a winner in a lot of cases. But for 2^1277-1 it looks like pure avx-ecm wins. Thoughts? |
|
![]() |
![]() |
![]() |
#66 |
"Curtis"
Feb 2005
Riverside, CA
121816 Posts |
![]()
When GMP-ECM is faster for Stage 2 with a higher B2, there is only advantage from using it. Save time, get more work- what's not to like?
The difficult tradeoff would be on larger numbers, when the avx-ecm stage 2 is likely faster than GMP-ECM but a smaller B2. That leads to some need for study. |
![]() |
![]() |