20101027, 13:02  #1 
Nov 2003
2^{2}·5·373 Posts 
Fast Approximate Ceiling Function
A challenge: a is double.
The generic ceil() function in the C math library is slow. Horribly slow. The following code is much faster, but does not work correctly for positive exact integers: #define iceil(a) (a <= 0.0? (int)a : (int)a + 1) For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1. Can anyone find a faster way? Is there anyway to do this without the branch? (this macro gets called many,many,many times) 
20101027, 15:14  #2  
Nov 2003
1D24_{16} Posts 
Quote:
by just computing e.g. (int)(a + .999999) 

20101027, 17:08  #3  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2×5×1,103 Posts 
Quote:
Code:
#define iceil(a) ((int)a + !(*(unsigned* ) &a) >> 63))) This variant allows a to be an arbitrary expression but may not be faster because it has an implicit branch. Code:
#define iceil(a) ((int)(a) + (int (a) < 0.0)) Paul Last fiddled with by xilman on 20101027 at 17:09 Reason: Remove a spurious '\'' character. 

20101029, 18:10  #4 
Tribal Bullet
Oct 2004
3·1,181 Posts 
If you needed a roundtonearest and not roundtonextlarger, and knew the bit size M of a floating point mantissa, then a standard trick for rounding to integer computes ((a + c)  c), where c is 3*2^(M1). The addition will rightjustify and round the mantissa in an FPU register, the subtraction replaces bits to the right of the 'binary point' with zeros. There are a few caveats though:
 you have to make sure the compiler executes the addition and then subtraction, instead of optimizing them away  you have to know the mantissa size. For x86 machines this could be 53 bits or 64 bits, depending on the OS, compiler, whatever else your app is doing, whether the operation uses SSE or x87 floating point, etc. To get around the first problem you can initialize two global variables to c and copy them to stack variables when you need rounding. You can get around the second problem by forcing the x87 precision to 53 bits, so that SSE and nonSSE code will work the same. There is a similar trick for converting floating point to integer: adding 2^M will rightjustify the mantissa without rounding, so you can store to memory and then read the low 32 bits as an integer. The latter is not as necessary now as it was back in the Pentium days because integer conversion is a lot faster on modern x86 machines than it used to be; in the Pentium days the FPU would stall for 6 clocks on an integer conversion, and that was too much for a unit that could churn out an add or multiply every clock. 
20101029, 23:18  #5  
Nov 2003
2^{2}×5×373 Posts 
Quote:
the rounding mode to "nearest". (one can set other modes as well) However, I actually need "ceiling", not "nearest" 

20101030, 00:30  #6 
Oct 2007
2×53 Posts 

20101030, 01:53  #7 
Nov 2003
2^{2}·5·373 Posts 

20101030, 05:41  #8 
Feb 2005
Bristol, CT
513_{10} Posts 
This should work
#define iceil(a) (a <= 0.0? (int)a : a > (int)a? (int)a+1:(int)a) Last fiddled with by WMHalsdorf on 20101030 at 06:36 
20101030, 11:47  #9 
"Forget I exist"
Jul 2009
Dumbassville
20300_{8} Posts 
why not floor() + 1 is that any faster ?

20101030, 12:19  #10 
Tribal Bullet
Oct 2004
3×1,181 Posts 
The problem is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Maybe they also have to check for extreme floating point values like inifinity and NaN. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(), you can do it once and then do the same arithmetic they do for many inputs at once.
Getting rid of those functions and converting 'a' to an integer won't help you either, because  the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode  the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency. Maybe the most efficient choice is to figure out how to use roundtonearest and try to correct it afterwards... Last fiddled with by jasonp on 20101030 at 12:32 
20101030, 12:28  #11  
"Forget I exist"
Jul 2009
Dumbassville
10000011000000_{2} Posts 
Quote:
TRUNCATE NUMBER IS LIKE FLOOR THEN JUST +1 LOL sorry caps lock was on. Last fiddled with by jasonp on 20101030 at 12:34 Reason: umm, lol? 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Requestion for fast chebyshev theta function  danaj  Computer Science & Computational Number Theory  9  20180331 14:59 
A useful function.  JM Montolio A  Miscellaneous Math  28  20180308 14:29 
Fun with partition function  Batalov  And now for something completely different  24  20180227 17:03 
phi function  rula  Homework Help  3  20170118 01:41 
Do normal adults give themselves an allowance? (...to fast or not to fast  there is no question!)  jasong  jasong  35  20161211 00:57 