Looking back over some historical benchmarks (times in seconds for a sequence of inputs), v2.06 is a pretty big speedup on AVX2 systems, for both MSVC Windows and Linux builds. I'm still hoping it is also more stable across compilers/CPUs.
AVX2 Linux, icc (Xeon E5-2697 v3):
version c60 c65 c70 c75 c80 c85 c90 c95
v1.34.5 2.35 6.9 14.8 52.4 104.8 294 3132
v1.35.0 2.2 6.4 12.9 43.6 89.8 262 1037 2912
v2.05 1.83 5.66 11 37 76.6 229
v2.06 1.69 5.48 11.2 35.7 69.5 208 794 2210
AVX2 Windows, MSVC19 (Xeon 6134):
version c60 c65 c70 c75 c80 c85
v1.34.5 2.54 7.42 15.1 53.8 111.4 311
v2.05 1.93 6.98 14.3 51.7 100.6 314.2
v2.06 1.89 6.11 12.2 41.5 82.1 240.5
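To put a number on the speedup, here is a small sketch that divides the older times by the v2.06 times for the Windows/MSVC table above. The times are copied from that table. The variable and function names are just made up for illustration.

```python
# Times in seconds, copied from the AVX2 Windows/MSVC table above.
sizes = ["c60", "c65", "c70", "c75", "c80", "c85"]
msvc_times = {
    "v1.34.5": [2.54, 7.42, 15.1, 53.8, 111.4, 311],
    "v2.05": [1.93, 6.98, 14.3, 51.7, 100.6, 314.2],
    "v2.06": [1.89, 6.11, 12.2, 41.5, 82.1, 240.5],
}


def speedup(old, new):
    """Return old/new per input size, rounded to 2 places (>1 means new is faster)."""
    return [round(o / n, 2) for o, n in zip(old, new)]


# Hypothetical helper names; the ratios are plain arithmetic on the table.
vs_v1345 = speedup(msvc_times["v1.34.5"], msvc_times["v2.06"])
vs_v205 = speedup(msvc_times["v2.05"], msvc_times["v2.06"])

for size, a, b in zip(sizes, vs_v1345, vs_v205):
    print(f"{size}: {a}x vs v1.34.5, {b}x vs v2.05")
```

A ratio above 1 means v2.06 finished faster than the version it is compared against for that input size.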
And for comparison, with all available instruction set enhancements (AVX512BW, AVX512VL, AVX512F):
version c60 c65 c70 c75 c80 c85 c90 c95
v2.06 1.16 3.44 6.64 21.9 44.5 135.1 516 1516