2009-08-24, 23:26   #9

Originally Posted by ewmayer
Note that you should be able to get at least a 2-3x speedup on 64-bit x86-style (and most RISC) architectures (that is, ones running under a 64-bit OS as well) by using the full 64x64 -> 128-bit hardware integer MUL instruction. That "wastes" some bits for the high parts of the multiword products (e.g. the 32x64-bit subproducts), but fullness-of-bitfield aesthetics is not the name of the game here.
Well, the fun part is that I'm doing this on a new architecture: a GPU! And there are arguments for using a 24-bit word size: there's hardware support for using the FPU to do integer math on 24-bit words (and you can pull out the full 48-bit result). 32-bit and 64-bit words are also supported in hardware, but each is slower, as expected. It's hard to judge in advance which method of doing bigint math is fastest, so my approach is to try them all and measure!
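The 24-bit trick above can be emulated in plain C for reference: only the low 24 bits of each operand participate, and the full 48-bit product fits in a 64-bit integer. This is a sketch of the semantics (modeled on CUDA-style 24-bit multiply intrinsics such as `__umul24`; the function name is mine), not the GPU code itself:

```c
#include <stdint.h>

/* 24x24 -> 48-bit multiply: mask each operand to its low 24 bits,
   then multiply. The result is at most 48 bits, so it fits in a
   uint64_t with room to spare for carry accumulation. */
static uint64_t umul24_wide(uint32_t a, uint32_t b)
{
    uint64_t a24 = a & 0xFFFFFFu;  /* keep low 24 bits of a */
    uint64_t b24 = b & 0xFFFFFFu;  /* keep low 24 bits of b */
    return a24 * b24;              /* full 48-bit product    */
}
```

The spare 16 bits above the 48-bit product are exactly what makes this word size attractive for bigint limbs: partial products can be summed for a while before any carry propagation is needed.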
SPWorley