mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2005-07-25, 17:10   #1
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Question Another challenge

The following macros are (on the Pentium) 9 times faster than
calling the library ceil() and floor() functions. a is a double.
Note also that |a| < 16K.

#define iceil(a) ((a) <= 0.0 ? (int)(a) : (int)(a) + 1)
#define ifloor(a) ((a) >= 0.0 ? (int)(a) : (int)(a) - 1)

Can anyone do better, by perhaps eliminating the branches?
Note that these are very slightly wrong. If a > 0 and a is an exact
integer, iceil returns a+1, instead of a. But this is OK, as long as the
macro is fast.

R.D. Silverman is offline   Reply With Quote
Old 2005-07-25, 17:40   #2
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

1001101000112 Posts
Default

The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)!

Try

static inline int
dtoi (double d)
{
int i;
__asm__ ("fistpl %0"
: "=m" (i) : "t" (d) : "st");
return i;
}

which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction.

Alex

Last fiddled with by akruppa on 2005-07-25 at 17:44
akruppa is offline   Reply With Quote
Old 2005-07-25, 17:58   #3
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by akruppa
The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)!

Try

static inline int
dtoi (double d)
{
int i;
__asm__ ("fistpl %0"
: "=m" (i) : "t" (d) : "st");
return i;
}

which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction.

Alex
I presume that %0 is the first arg in the subroutine argument list.
But I am not familiar with this _asm syntax. What are the items in
double quotes? And the colons?

I will try this using VC++ _asm
R.D. Silverman is offline   Reply With Quote
Old 2005-07-25, 18:15   #4
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Thumbs up

Quote:
Originally Posted by akruppa
The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)!

Try

static inline int
dtoi (double d)
{
int i;
__asm__ ("fistpl %0"
: "=m" (i) : "t" (d) : "st");
return i;
}

which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction.

Alex

For VC++ I need to move d into a floating point register. Do you know
which register fistl acts upon? I guess I'll have to check my P IV
instruction set documentation.

BTW, this isn't very portable. But you know that.
R.D. Silverman is offline   Reply With Quote
Old 2005-07-25, 18:30   #5
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

1001101000112 Posts
Default

>Do you know which register fistl acts upon?

It pops a value from the fpu stack. My inline function is inteded for gcc, I don't know if VC will like it. I have no way of testing, though.

>BTW, this isn't very portable. But you know that.

True, it isn't. I've tinkered with this problem for the interface between GMP-ECM and GWNUM a little, this was the fastest I solution found.

There is an lrint() function defined in C99 which converts a double to an int, using the currently set rounding direction. When setting the proper #defines, GNU's math.h defines lrint() as a macro that uses fistp again. Without the proper #defines, it uses a library call - faster than the typecast with rounding-mode-change, but of course much slower than the fistp opcode.

In the end, I just used my own fistp wrapper directly instead of fiddling with headers - the GWNUM interface code will only run on x86 so portability is not an issue for me. If it is for you, it may be worthwhile to check what it takes to make lrint() use the fistp opcode.

Alex
akruppa is offline   Reply With Quote
Old 2005-07-25, 18:37   #6
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

46438 Posts
Default

>I presume that %0 is the first arg in the subroutine argument list.
>But I am not familiar with this _asm syntax. What are the items in
>double quotes? And the colons?

The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory, it cannot write to registers.

The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the fpu stack.

The "st" tells that the top value of the fpu stack will be popped.

The colons separate the output : input : flags sections of the operands.

There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value.

Alex

Last fiddled with by akruppa on 2005-07-25 at 18:38
akruppa is offline   Reply With Quote
Old 2005-07-25, 19:41   #7
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by akruppa
>I presume that %0 is the first arg in the subroutine argument list.
>But I am not familiar with this _asm syntax. What are the items in
>double quotes? And the colons?

The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory, it cannot write to registers.

The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the fpu stack.

The "st" tells that the top value of the fpu stack will be popped.

The colons separate the output : input : flags sections of the operands.

There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value.

Alex

I have tried the following:

__inline iceil(a,d)
double a;
int d[];

{
_asm
{
mov edi, d
mov ST0, a
FIST [edi]
}
}

FIST supposedly only outputs its result to memory. [edi] is the
address pointed to by register edi. But the compiler gives "invalid argument"
Nor does it like an explicit specification that EDI is a DWORD PTR

FIST DWORD PTR[edi] does not work.

note that mov eax, DWORD PTR[edi] is perfectly valid.

How does one get FIST to return its output as the function argument?
eax is the register used by VC++ for integer returns, but

FIST eax is invalid since eax is a register, not an address.
R.D. Silverman is offline   Reply With Quote
Old 2005-07-25, 19:56   #8
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e.

mov edi, x
FLD [edi]
mov edi, d
FIST [edi]

where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expect a double value on the fpu stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that, I have never used VC.

Alex

Edit: by far the cleanest and probably easiest solution if you want portability across different cpu types and compilers will be using lrint().

Last fiddled with by akruppa on 2005-07-25 at 19:59
akruppa is offline   Reply With Quote
Old 2005-07-25, 20:09   #9
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

746010 Posts
Default

Quote:
Originally Posted by akruppa
It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e.

mov edi, x
FLD [edi]
mov edi, d
FIST [edi]

where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expect a double value on the fpu stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that, I have never used VC.

Alex

Edit: by far the cleanest and probably easiest solution if you want portability across different cpu types and compilers will be using lrint().

Actually you were right. I was getting a report of "invalid argument".
When I clicked on the report it is *supposed* to point to the offending
code. It *was* pointing at the FISTP line, but the error was on the
mov ST0, a line.

I changed it to

mov edi, d
FLD a
FISTP [edi]

and now it seems fine.
R.D. Silverman is offline   Reply With Quote
Old 2005-07-25, 20:48   #10
dsouza123
 
dsouza123's Avatar
 
Sep 2002

2·331 Posts
Default

Do you need to pass an integer array pointer to iceil ?
What about only send a double and get back an int ?

__inline int iceil(double a)
{
int d;
const double h = 0.5000001f;
_asm
{
finit
fld a
fadd h
fist d
};
return d;
}

There may an issue with rounding modes.
Check if int is OK versus long int.
dsouza123 is offline   Reply With Quote
Old 2005-07-26, 11:55   #11
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22·5·373 Posts
Default

Quote:
Originally Posted by dsouza123
Do you need to pass an integer array pointer to iceil ?
What about only send a double and get back an int ?

__inline int iceil(double a)
{
int d;
const double h = 0.5000001f;
_asm
{
finit
fld a
fadd h
fist d
};
return d;
}

There may an issue with rounding modes.
Check if int is OK versus long int.
No, I clearly don't need to pass a pointer. But your suggestion creates
two temporaries on the stack because the float instructions need memory
addresses as destinations. I am not sure if yours is slower or faster than mine
Also, the fadd is not needed in the routine. If the routine only does 'fist'
You can call it with (a + .5) to get "ceil" or (a-.5) to get "floor".

Is the finit instruction really needed??

Thanks for the advice!!!
R.D. Silverman is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
A Challenge on the net devarajkandadai Miscellaneous Math 0 2012-05-31 05:17
When I was your age.....CHALLENGE petrw1 Lounge 14 2009-11-23 02:18
Challenge science_man_88 Miscellaneous Math 229 2009-09-07 08:08
rsa-640 challenge ValerieVonck Factoring 58 2005-10-24 15:54
Who is Challenge? JuanTutors PrimeNet 2 2004-07-22 12:56

All times are UTC. The time now is 00:05.

Wed May 19 00:05:13 UTC 2021 up 40 days, 18:46, 0 users, load averages: 1.78, 1.56, 1.48

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.