Old 2009-03-23, 15:42   #7
(loop (#_fork))
fivemack
Feb 2006
Cambridge, England

18EE₁₆ Posts

One problem is that PSLLDQ takes its shift count as an immediate operand, so the amount has to be hard-coded at compile time; if you want a variable shift you have to use something like the PSHUFB code in my example (see the 'daft SSE restrictions' thread elsewhere).
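To make the PSHUFB trick concrete, here is a minimal portable sketch in plain C. It is a scalar model of the instruction, not the code from my earlier example, and the function names (pshufb_model, make_shift_mask, shift_left_bytes) are made up for illustration. The idea is that the shift amount moves out of the instruction and into the control mask, which is ordinary data and can be chosen at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on one 16-byte register: a control byte with its
   top bit set produces zero, otherwise its low 4 bits select a source byte. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t ctl[16])
{
    uint8_t out[16];                    /* temporary, so dst may alias src */
    for (int i = 0; i < 16; i++)
        out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, out, 16);
}

/* Hypothetical helper: build the control mask that makes PSHUFB act like
   PSLLDQ by n bytes, i.e. dst[i] = src[i - n] for i >= n, zero below that. */
void make_shift_mask(uint8_t ctl[16], int n)
{
    assert(n >= 0 && n <= 16);
    for (int i = 0; i < 16; i++)
        ctl[i] = (i < n) ? 0x80 : (uint8_t)(i - n);
}

/* Variable byte shift: n is a run-time value, unlike PSLLDQ's immediate. */
void shift_left_bytes(uint8_t dst[16], const uint8_t src[16], int n)
{
    uint8_t ctl[16];
    make_shift_mask(ctl, n);
    pshufb_model(dst, src, ctl);
}
```

In real SSSE3 code you would presumably precompute the 17 possible masks in a table and feed the chosen one to _mm_shuffle_epi8 rather than rebuilding it each time.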

Decoupling the shift and the conversion seems like a good idea, and the extra parallelism from doing four conversions at once in floats seems useful; I'm just a little concerned that no power of 10 above 10^10 can be stored exactly in a float (10^22 is the largest that fits exactly in a double), so I'd want to do the multiplication by the larger powers of ten once I'm working in doubles.
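The 10^10 and 10^22 limits can be checked with a few lines of C. This is only a sketch with made-up function names, and it assumes IEEE-style binary floats (FLT_RADIX of 2, so a 24-bit significand for float and 53 bits for double); it ignores exponent range, which is not the limiting factor here.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* 10^n = 2^n * 5^n, and the 2^n factor only adjusts the exponent, so 10^n
   is exact in a binary format iff the odd part 5^n fits in the significand. */
int pow10_is_exact(int n, int significand_bits)
{
    assert(n >= 0 && n <= 27);          /* 5^27 is the last power below 2^64 */
    assert(significand_bits > 0 && significand_bits < 64);
    uint64_t odd_part = 1;
    for (int i = 0; i < n; i++)
        odd_part *= 5;
    return odd_part < (UINT64_C(1) << significand_bits);
}

/* Convenience wrappers using the significand widths from <float.h>. */
int pow10_exact_in_float(int n)  { return pow10_is_exact(n, FLT_MANT_DIG); }
int pow10_exact_in_double(int n) { return pow10_is_exact(n, DBL_MANT_DIG); }
```

The cutoffs fall where they do because 5^10 = 9765625 is just under 2^24 while 5^11 is over it, and likewise 5^22 is under 2^53 while 5^23 is not.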

It's not relevant in this case because the conversion probably takes longer than determining the index, but I'm reminded of some nice Nehalem string-instruction demo code from Intel which features the absurd line

if (A==16) u+=16; else u+=A;

Not too absurd if your strings are very long: both arms add the same amount, but the A==16 branch is predicted taken, so u no longer has to wait for A, and that lets the OOO machinery run several iterations in parallel.

Last fiddled with by fivemack on 2009-03-23 at 15:44