Here are a couple of other things to keep in mind:
1) For lower-end Skylake processors the speedups are not as dramatic, with or without special Mersenne math.
I ran on a 7800X and only saw a speedup of about 2.1 times for 2^1277-1.
More testing/benchmarking is needed on a variety of AVX-512 capable processors.
The good news is that this situation will only improve for avx-ecm as time goes on. I plan to implement enhancements as soon as I can get my hands on an Ice Lake CPU or whatever CPU supports AVX-512IFMA.
2) avx-ecm uses fixed-time arithmetic in steps of 208 bits. So a curve at, say, 416 bits takes just as long as a curve at 623 bits.
gmp-ecm, on the other hand, adjusts the size of its arithmetic in 64-bit steps. So any crossover point between the two programs may need fine-tuning for a given input size.
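
To make the step sizes in point 2 concrete, here is a rough sketch in Python. It only models the size rounding, not the actual timing. The rounding rules are my assumptions: for avx-ecm, the smallest multiple of 208 strictly greater than the input size (which is what the 416/623 example implies), and for gmp-ecm, a plain round-up to whole 64-bit limbs. The function names are made up for illustration and are not taken from either program.

```python
AVX_ECM_STEP_BITS = 208  # step size stated above for avx-ecm
GMP_ECM_STEP_BITS = 64   # step size stated above for gmp-ecm


def avx_ecm_working_bits(input_bits: int) -> int:
    """Assumed rule: smallest multiple of 208 strictly greater than input_bits."""
    return (input_bits // AVX_ECM_STEP_BITS + 1) * AVX_ECM_STEP_BITS


def gmp_ecm_working_bits(input_bits: int) -> int:
    """Assumed rule: round input_bits up to a whole number of 64-bit limbs."""
    return -(-input_bits // GMP_ECM_STEP_BITS) * GMP_ECM_STEP_BITS


if __name__ == "__main__":
    # Print input size, modeled avx-ecm size, modeled gmp-ecm size.
    for bits in (416, 500, 623, 624):
        print(bits, avx_ecm_working_bits(bits), gmp_ecm_working_bits(bits))
```

Under these assumptions, inputs of 416 and 623 bits both land on the same avx-ecm working size, while gmp-ecm uses a different size for each, which is why the relative speedup can shift with input size even inside one 208-bit step.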