mersenneforum.org Fast Approximate Ceiling Function
 Register FAQ Search Today's Posts Mark Forums Read

 2010-10-27, 13:02 #1 R.D. Silverman     Nov 2003 22·5·373 Posts Fast Approximate Ceiling Function A challenge: a is double. The generic ceil() function in the C math library is slow. Horribly slow. The following code is much faster, but does not work correctly for positive exact integers: #define iceil(a) (a <= 0.0? (int)a : (int)a + 1) For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1. Can anyone find a faster way? Is there anyway to do this without the branch? (this macro gets called many,many,many times)
2010-10-27, 15:14   #2
R.D. Silverman

Nov 2003

1D2416 Posts

Quote:
 Originally Posted by R.D. Silverman A challenge: a is double. The generic ceil() function in the C math library is slow. Horribly slow. The following code is much faster, but does not work correctly for positive exact integers: #define iceil(a) (a <= 0.0? (int)a : (int)a + 1) For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1. Can anyone find a faster way? Is there anyway to do this without the branch? (this macro gets called many,many,many times)
I may just choose to accept a small number of additional inaccuracies
by just computing e.g. (int)(a + .999999)

2010-10-27, 17:08   #3
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2×5×1,103 Posts

Quote:
 Originally Posted by R.D. Silverman A challenge: a is double. The generic ceil() function in the C math library is slow. Horribly slow. The following code is much faster, but does not work correctly for positive exact integers: #define iceil(a) (a <= 0.0? (int)a : (int)a + 1) For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1. Can anyone find a faster way? Is there anyway to do this without the branch? (this macro gets called many,many,many times)
I'm assuming your variable a is a 64-bit double. If it's a 32-bit float you need to change the shift count in the macro below to 31. I'm also assuming IEEE floating point representation, so the sign bit is stored in the MSB of a floating point variable.
Code:
#define iceil(a) ((int)a + !(*(unsigned* ) &a) >> 63)))
Note that this will not work if a is anything but a variable. Whether it is faster than the conditional expression is open to experiment.

This variant allows a to be an arbitrary expression but may not be faster because it has an implicit branch.
Code:
#define iceil(a) ((int)(a) + (int (a) < 0.0))
Your macro doesn't work if a is an expression but is easily fixed with the addition of some parentheses.

Paul

Last fiddled with by xilman on 2010-10-27 at 17:09 Reason: Remove a spurious '\'' character.

 2010-10-29, 18:10 #4 jasonp Tribal Bullet     Oct 2004 3·1,181 Posts If you needed a round-to-nearest and not round-to-next-larger, and knew the bit size M of a floating point mantissa, then a standard trick for rounding to integer computes ((a + c) - c), where c is 3*2^(M-1). The addition will right-justify and round the mantissa in an FPU register, the subtraction replaces bits to the right of the 'binary point' with zeros. There are a few caveats though: - you have to make sure the compiler executes the addition and then subtraction, instead of optimizing them away - you have to know the mantissa size. For x86 machines this could be 53 bits or 64 bits, depending on the OS, compiler, whatever else your app is doing, whether the operation uses SSE or x87 floating point, etc. To get around the first problem you can initialize two global variables to c and copy them to stack variables when you need rounding. You can get around the second problem by forcing the x87 precision to 53 bits, so that SSE and non-SSE code will work the same. There is a similar trick for converting floating point to integer: adding 2^M will right-justify the mantissa without rounding, so you can store to memory and then read the low 32 bits as an integer. The latter is not as necessary now as it was back in the Pentium days because integer conversion is a lot faster on modern x86 machines than it used to be; in the Pentium days the FPU would stall for 6 clocks on an integer conversion, and that was too much for a unit that could churn out an add or multiply every clock.
2010-10-29, 23:18   #5
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by jasonp If you needed a round-to-nearest and not round-to-next-larger, and knew the bit size M of a floating point mantissa, then a standard trick for rounding to integer computes ((a + c) - c), where c is 3*2^(M-1). The addition will right-justify and round the mantissa in an FPU register, the subtraction replaces bits to the right of the 'binary point' with zeros. There are a few caveats though: - you have to make sure the compiler executes the addition and then subtraction, instead of optimizing them away - you have to know the mantissa size. For x86 machines this could be 53 bits or 64 bits, depending on the OS, compiler, whatever else your app is doing, whether the operation uses SSE or x87 floating point, etc. To get around the first problem you can initialize two global variables to c and copy them to stack variables when you need rounding. You can get around the second problem by forcing the x87 precision to 53 bits, so that SSE and non-SSE code will work the same. There is a similar trick for converting floating point to integer: adding 2^M will right-justify the mantissa without rounding, so you can store to memory and then read the low 32 bits as an integer. The latter is not as necessary now as it was back in the Pentium days because integer conversion is a lot faster on modern x86 machines than it used to be; in the Pentium days the FPU would stall for 6 clocks on an integer conversion, and that was too much for a unit that could churn out an add or multiply every clock.
The Pentium/Core-2 also has an instruction that allows the FPU to set
the rounding mode to "nearest". (one can set other modes as well)

However, I actually need "ceiling", not "nearest"

2010-10-30, 00:30   #6
Robert Holmes

Oct 2007

2×53 Posts

Quote:
 Originally Posted by R.D. Silverman The Pentium/Core-2 also has an instruction that allows the FPU to set the rounding mode to "nearest". (one can set other modes as well) However, I actually need "ceiling", not "nearest"
Wouldn't nearest(x + 0.4999...) do the ceiling?

2010-10-30, 01:53   #7
R.D. Silverman

Nov 2003

22·5·373 Posts

Quote:
 Originally Posted by Robert Holmes Wouldn't nearest(x + 0.4999...) do the ceiling?
Yes, but there is a latency associated with setting the rounding mode.
And if you want to intermingle other FP computations you constantly
set/reset the rounding mode. I've tried it. It is slow.

 2010-10-30, 05:41 #8 WMHalsdorf     Feb 2005 Bristol, CT 51310 Posts This should work #define iceil(a) (a <= 0.0? (int)a : a > (int)a? (int)a+1:(int)a) Last fiddled with by WMHalsdorf on 2010-10-30 at 06:36
 2010-10-30, 11:47 #9 science_man_88     "Forget I exist" Jul 2009 Dumbassville 203008 Posts why not floor() + 1 is that any faster ?
 2010-10-30, 12:19 #10 jasonp Tribal Bullet     Oct 2004 3×1,181 Posts The problem is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Maybe they also have to check for extreme floating point values like inifinity and NaN. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(), you can do it once and then do the same arithmetic they do for many inputs at once. Getting rid of those functions and converting 'a' to an integer won't help you either, because - the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode - the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency. Maybe the most efficient choice is to figure out how to use round-to-nearest and try to correct it afterwards... Last fiddled with by jasonp on 2010-10-30 at 12:32
2010-10-30, 12:28   #11
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

100000110000002 Posts

Quote:
 Originally Posted by jasonp The problem with all of those choices is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(), you can do it once and then do the same arithmetic they do for many inputs at once. Getting rid of those functions and converting 'a' to an integer won't help you either, because - the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode - the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency.

TRUNCATE NUMBER IS LIKE FLOOR THEN JUST +1 LOL sorry caps lock was on.

Last fiddled with by jasonp on 2010-10-30 at 12:34 Reason: umm, lol?

 Similar Threads Thread Thread Starter Forum Replies Last Post danaj Computer Science & Computational Number Theory 9 2018-03-31 14:59 JM Montolio A Miscellaneous Math 28 2018-03-08 14:29 Batalov And now for something completely different 24 2018-02-27 17:03 rula Homework Help 3 2017-01-18 01:41 jasong jasong 35 2016-12-11 00:57

All times are UTC. The time now is 02:52.

Mon Dec 6 02:52:43 UTC 2021 up 135 days, 21:21, 0 users, load averages: 2.04, 1.43, 1.31