- **YAFU**
(*https://www.mersenneforum.org/forumdisplay.php?f=96*)

- - **AVX-ECM**
(*https://www.mersenneforum.org/showthread.php?t=25056*)

[QUOTE=bsquared;561807]Looks like you just need to use more makefile options. You need at least NFS=1 to bring in the nfs stuff and SKYLAKEX=1 for the avx512 vector arithmetic stuff. And if you use SKYLAKEX=1 then you also need USE_AVX2=1. My typical build line looks like this:
make NFS=1 USE_AVX2=1 USE_BMI2=1 SKYLAKEX=1 also, ECM=1 is a msieve thing, not a yafu thing, so you can leave that off.[/QUOTE] Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding). |

[QUOTE=wombatman;561828]Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding).[/QUOTE]
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned. |

[QUOTE=bsquared;561829]Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.[/QUOTE]
No worries! If it's just a simple makefile change, I'm good with doing that. :smile: |

[QUOTE=bsquared;561829]Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.[/QUOTE]
Ok, try r394. |

[QUOTE=bsquared;561832]Ok, try r394.[/QUOTE]
Builds successfully and seems to be working well using factor(rsa(300)) to check both ECM and SIQS (my primary usage). Thanks very much for the quick response! |

addmod and submod can also be simplified for Mersenne inputs and this has a measurable improvement... about 3% faster in stage 1 and 7% in stage 2.
Both yafu and avx-ecm repos have been updated. Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large. I've done some testing, but if anyone is able to test either package on finding [URL="https://homes.cerias.purdue.edu/~ssw/cun/pmain1020.txt"]known Mersenne factors[/URL], that'd be great. |

[QUOTE=bsquared;561976]
Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large. [/QUOTE] Ok, most of that is explained by different compiler options. avx-ecm was using -O3 and yafu -O2. With -O3, yafu is much better, but avx-ecm is still 5% faster or so. Given all of the above, for 2^1277-1, AVX-ECM is 6.2 times faster than GMP-ECM now. I hope I have GMP-ECM configured optimally for this system for a fair comparison... here is what it shows for a run with -v: [CODE] echo "2^1277-1" | ~/ecm-704-linux/ecm -v 7000000 GMP-ECM 7.0.4 [configured with GMP 6.2.0, --enable-asm-redc] [ECM] Tuned for x86_64/k8/params.h Running on cpu Input number is 2^1277-1 (385 digits) Using special division for factor of 2^1277-1 Using B1=7000000, B2=17125579390, polynomial Dickson(12), sigma=0:1088740413510573297 dF=16384, k=6, d=158340, d2=11, i0=34 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 167 1024 7351 60402 558826 5744532 6.5e+07 7.8e+08 1.1e+10 2.8e+11 Step 1 took 60676ms[/CODE] B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1) |

[QUOTE=bsquared;561986]
B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1)[/QUOTE] Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level? Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2. Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more? My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2. |

[QUOTE=VBCurtis;562006]Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level?
Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2. Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more? My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2.[/QUOTE] gmp-ecm ran its larger stage 2 in 32 seconds while avx-ecm ran its smaller stage 2 in 55 seconds, or 6.8 sec/curve. gmp-ecm B2 was B2=17125579390 avx-ecm B2 was 700000000 (24.4 times smaller) This B2 size disparity will grow the larger B1 gets. Also complicating matters is that gmp-ecm uses the Brent-Suyama extension for stage 2, which increases the odds of finding a factor slightly. avx-ecm doesn't do that. So unfortunately it is a complicated tradeoff that has to consider factor-finding probabilities as well as speed. I don't think there is enough data for avx-ecm yet to compare probability of finding a factor per unit time (or per curve), compared to gmp-ecm. Not sure I would even know how to do that analysis if I had the data, although maybe I could figure it out. ATH's data set [URL="https://www.mersenneforum.org/showpost.php?p=534182&postcount=31"]here [/URL]did show avx-ecm finding *more* factors per curve than gmp-ecm did, but stuff like that will happen for randomized algorithms and small data sets. After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1. |

[QUOTE=bsquared;562007]
After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1.[/QUOTE] Here's what gmp-ecm spit out: Input number is (2^1277-1) (385 digits) Using special division for factor of 2^1277-1 Using B1=10000000000, B2=637998273712822, polynomial Dickson(30), sigma=0:6343133439254916928 dF=4194304, k=3, d=48498450, d2=23, i0=184 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 7 19 52 161 545 1993 7815 32642 144421 673682 Input number is (2^1277-1) (385 digits) Using special division for factor of 2^1277-1 Using B1=10000000000, B2=1000000000000, polynomial x^1, sigma=0:11834000506611778745 dF=131072, k=6, d=1345890, d2=11, i0=7420 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 16 44 134 442 1578 6046 24696 106998 489087 2349846 The 75-digit number of curves ratio is 3.38, which is essentially a multiplier in favor of gmp-ecm curves. So if avx-ecm is 6.2 times higher throughput, but 3.38 times less likely to find a factor at t75, then the net gain for avx-ecm is about 1.83 for 2^1277-1. Similar calculations would have to be applied to generic inputs at particular B1's, I think. gmp-ecm's advantage is its vastly superior stage 2. So the avx-ecm stage 1 followed by gmp-ecm stage 2 recipe is probably a winner in a lot of cases. But for 2^1277-1 it looks like pure avx-ecm wins. Thoughts? |

When GMP-ECM is faster for Stage 2 with a higher B2, there is only advantage from using it. Save time, get more work- what's not to like?
The difficult tradeoff would be on larger numbers, when the avx-ecm stage 2 is likely faster than GMP-ECM but a smaller B2. That leads to some need for study. |

All times are UTC. The time now is 20:26. |

Powered by vBulletin® Version 3.8.11

Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.