20201101, 17:02  #56  
I moo ablest echo power!
May 2013
5·349 Posts 
Quote:


20201101, 17:06  #57 
"Ben"
Feb 2007
2^{2}×859 Posts 
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.

20201101, 17:18  #58 
I moo ablest echo power!
May 2013
5×349 Posts 

20201101, 17:26  #59 
"Ben"
Feb 2007
2^{2}·859 Posts 

20201101, 17:33  #60 
I moo ablest echo power!
May 2013
5·349 Posts 

20201102, 18:49  #61 
"Ben"
Feb 2007
2^{2}×859 Posts 
addmod and submod can also be simplified for Mersenne inputs and this has a measurable improvement... about 3% faster in stage 1 and 7% in stage 2.
Both yafu and avxecm repos have been updated. Also, I've seen that avxecm is about 1020% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large. I've done some testing, but if anyone is able to test either package on finding known Mersenne factors, that'd be great. Last fiddled with by bsquared on 20201102 at 18:50 
20201102, 19:34  #62  
"Ben"
Feb 2007
2^{2}×859 Posts 
Quote:
Given all of the above, for 2^12771, AVXECM is 6.2 times faster than GMPECM now. I hope I have GMPECM configured optimally for this system for a fair comparison... here is what it shows for a run with v: Code:
echo "2^12771"  ~/ecm704linux/ecm v 7000000 GMPECM 7.0.4 [configured with GMP 6.2.0, enableasmredc] [ECM] Tuned for x86_64/k8/params.h Running on cpu Input number is 2^12771 (385 digits) Using special division for factor of 2^12771 Using B1=7000000, B2=17125579390, polynomial Dickson(12), sigma=0:1088740413510573297 dF=16384, k=6, d=158340, d2=11, i0=34 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 167 1024 7351 60402 558826 5744532 6.5e+07 7.8e+08 1.1e+10 2.8e+11 Step 1 took 60676ms 

20201102, 22:15  #63 
"Curtis"
Feb 2005
Riverside, CA
5^{2}×191 Posts 
Which one is which for this comparison? Are both GMPECM and avxecm running the same B2, or is the longer time related to GMPECM running its standard B2 level?
Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMPECM at all; I haven't quite decided which way to go for stage 2. Also, if I'm reading the start of this thread correctly, the avxECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMPECM stage 1 speed, while on special mersenne numbers it's 5x or more? My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMPECM stage 2, while production work may call for using the stage1save option and GMPECM for stage 2. 
20201102, 22:36  #64  
"Ben"
Feb 2007
2^{2}×859 Posts 
Quote:
gmpecm B2 was B2=17125579390 avxecm B2 was 700000000 (24.4 times smaller) This B2 size disparity will grow the larger B1 gets. Also complicating matters is that gmpecm uses the BrentSuyama extension for stage 2, which increases the odds of finding a factor slightly. avxecm doesn't do that. So unfortunately it is a complicated tradeoff that has to consider factorfinding probabilities as well as speed. I don't think there is enough data for avxecm yet to compare probability of finding a factor per unit time (or per curve), compared to gmpecm. Not sure I would even know how to do that analysis if I had the data, although maybe I could figure it out. ATH's data set here did show avxecm finding *more* factors per curve than gmpecm did, but stuff like that will happen for randomized algorithms and small data sets. After thinking about it a bit, I think we could extract the probabilities from gmpecm v, while running with power 1 (no brentsuyama) and forcing B2=100B1. Last fiddled with by bsquared on 20201102 at 22:37 

20201102, 22:47  #65  
"Ben"
Feb 2007
110101101100_{2} Posts 
Quote:
Input number is (2^12771) (385 digits) Using special division for factor of 2^12771 Using B1=10000000000, B2=637998273712822, polynomial Dickson(30), sigma=0:6343133439254916928 dF=4194304, k=3, d=48498450, d2=23, i0=184 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 7 19 52 161 545 1993 7815 32642 144421 673682 Input number is (2^12771) (385 digits) Using special division for factor of 2^12771 Using B1=10000000000, B2=1000000000000, polynomial x^1, sigma=0:11834000506611778745 dF=131072, k=6, d=1345890, d2=11, i0=7420 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 16 44 134 442 1578 6046 24696 106998 489087 2349846 The 75digit number of curves ratio is 3.38, which is essentially a multiplier in favor of gmpecm curves. So if avxecm is 6.2 times higher throughput, but 3.38 times less likely to find a factor at t75, then the net gain for avxecm is about 1.83 for 2^12771. Similar calculations would have to be applied to generic inputs at particular B1's, I think. gmpecm's advantage is its vastly superior stage 2. So the avxecm stage 1 followed by gmpecm stage 2 recipe is probably a winner in a lot of cases. But for 2^12771 it looks like pure avxecm wins. Thoughts? 

20201103, 01:47  #66 
"Curtis"
Feb 2005
Riverside, CA
4775_{10} Posts 
When GMPECM is faster for Stage 2 with a higher B2, there is only advantage from using it. Save time, get more work what's not to like?
The difficult tradeoff would be on larger numbers, when the avxecm stage 2 is likely faster than GMPECM but a smaller B2. That leads to some need for study. 