Thread: mtsieve
View Single Post
Old 2022-06-06, 15:39   #650
rogue's Avatar
Apr 2003
Between here and the

668110 Posts

Originally Posted by Happy5214 View Post
I assume that's both an algorithmic and processor unit advantage (Montgomery arithmetic with integers on the faster ALU vs. a more straightforward implementation with floats on the slower FPU, or whatever unit handles SSE). I compiled a few of the Montgomery functions to ASM on x86 just to see how much optimization GCC does on them with -O2, and at least on REDC and mul I couldn't find anywhere to hand-optimize. I imagine ARM would be similar, though I'd have to get my ODROID set up again (we switched to a new ISP and that messed up the wiring setup) to know for sure.
If you want to do some comparisons on your own, take a look at the MpArith.h class and it uses. MpArithVec.h is the same, but does 4 at a time. I expect -O2/-O3 to help greatly when using that.
rogue is offline   Reply With Quote