mersenneforum.org  

Old 2020-01-12, 05:47   #23
Prime95

Quote:
Originally Posted by axn View Post
Does the modified code sequence actually result in any reduction in time or is it DOA?

How does one go about doing error analysis? In theory, the code is computing the same polynomial as the previous one, with just fewer multiplications. In practice, I don't know how much additional error (if any) that the new code introduces. I suppose, one could empirically compare all the interesting sin/cos values of all three (native / poly / Knuth).

Also, is it worth combining both sin & cos into a single routine (for either version)? Would that give more options for compiler / scheduler to do a better job?
I did not try your code. If I counted right, the op count is unchanged. A mul became an add.

I've not yet run across code that produces both sin and cos at a reduced cost.
Old 2020-01-12, 06:24   #24
CRGreathouse

Quote:
Originally Posted by Prime95 View Post
I've not yet run across code that produces both sin and cos at a reduced cost.
Perhaps I don't understand, but...
https://pari.math.u-bordeaux.fr/cgi-...022d739b#l3498
Old 2020-01-12, 19:36   #25
ewmayer

Quote:
Originally Posted by axn View Post
How does one go about doing error analysis? In theory, the code is computing the same polynomial as the previous one, with just fewer multiplications. In practice, I don't know how much additional error (if any) that the new code introduces. I suppose, one could empirically compare all the interesting sin/cos values of all three (native / poly / Knuth).
One could, say, loop over a bunch of evenly-spaced points in the interval of interest, use the various formulae to compute sincos in DP, and compute the errors against an expensive extended-precision sincos reference.
Old 2020-01-12, 20:50   #26
Prime95

Quote:
Originally Posted by CRGreathouse View Post
Looks like that code does a sqrt to get the sin value from the cos value. On my GPU that is more costly than the polynomial calculation.
Old 2020-01-16, 21:42   #27
ewmayer

I've been playing with a Chebyshev-polynomial basis expansion for approximating e.g. sin and cos - this gives Padé-like accuracy but is much more computationally convenient. Longer mathematical background (starting with the basics of approximation in Hilbert spaces via orthogonal polynomials) here, but the upshot is: to generate an approximation of a given polynomial order n we simply evaluate our target function f(x) at the n zeros of the first neglected Chebyshev basis function T_n(x) (which are given by -cos((k+1/2)*Pi/n) for k = 0,...,n-1) and multiply-accumulate f(x) at each such x against the values of the n lower-order bases T_0,...,T_{n-1} there. For target functions which are even or odd we need only the even or odd-order T_j, respectively, and need only do the above multiply-accumulate over the zeros of T_n lying in the half-interval [0,+1]. Here are the results for f(x) = sin(x) using approximations up to the same order as in the Sun code George posted. I did all the computations in quad-float - that's why we can have errors which are smaller than double-precision epsilon:

Unsurprisingly we get better approximational accuracy for smaller domains due to the closer-to-linear sin(x) snippets those contain.

MaxErr for Chebyshev approximation to sin(y), y in [-Pi/4,+Pi/4] for odd-order basis set of degree n. y = x*Pi/4:
n = 1: maxerr = 1.076412114012678e-02 at x = 0.406840175630695
n = 3: maxerr = 2.211828604797517e-04 at x = 0.744423941213003
n = 5: maxerr = 5.841972262455021e-07 at x = 0.536379036312490
n = 7: maxerr = 1.719787772866942e-09 at x = 0.717635580655843
n = 9: maxerr = 2.461209496415740e-12 at x = 0.785398163397448
n = 11: maxerr = 2.382614064390771e-15 at x = 0.711909242646512
n = 13: maxerr = 1.932929161164053e-18 at x = 0.784331592691555

MaxErr for Chebyshev approximation to sin(y), y in [-1,+1] for odd-order basis set of degree n. y = x:
n = 1: maxerr = 7.725438505767193e-02 at x = 1.000000000000000
n = 3: maxerr = 9.930425666556403e-04 at x = 1.000000000000000
n = 5: maxerr = 5.988354112364584e-06 at x = 1.000000000000000
n = 7: maxerr = 2.094915738264925e-08 at x = 1.000000000000000
n = 9: maxerr = 4.784333699825444e-11 at x = 1.000000000000000
n = 11: maxerr = 7.693285387103897e-14 at x = 1.000000000000000
n = 13: maxerr = 9.181670816793705e-17 at x = 1.000000000000000

MaxErr for Chebyshev approximation to sin(y), y in [-Pi/2,+Pi/2] for odd-order basis set of degree n. y = x*Pi/2:
n = 1: maxerr = 8.092495638969945e-02 at x = 0.402502421574251
n = 3: maxerr = 6.605511643131922e-03 at x = 0.742805235598241
n = 5: maxerr = 7.140718504766345e-05 at x = 0.535902299627308
n = 7: maxerr = 8.439303067887709e-07 at x = 0.717458080670915
n = 9: maxerr = 4.857121351026179e-09 at x = 0.785398163397448
n = 11: maxerr = 1.892336918249797e-11 at x = 0.711866831145689
n = 13: maxerr = 6.158693885456859e-14 at x = 0.784310386941143

I'll be happy to post the expansion coefficients if desired - they appear more or less the same as for the Sun code, at least when we consider the same x-domain. Unsurprisingly they converge to the Taylor-series coefficients as we take n larger; the key lies in the differences between the 2 sets of expansion coefficients (Chebyshev vs Taylor) at any given order n.

The nice thing is that, with the sin(x) case coded up and debugged, it's easy to do this for more or less any desired target function, and we have some control over the polynomial order - for example for sin(y) with y in [-Pi/4,+Pi/4] we could probably use n = 11 rather than n = 13 when working in double precision.
Old 2020-01-17, 02:06   #28
Prime95

Good work, Ernst. When you've polished up the code, could you upload it for all to enjoy?
Old 2020-01-17, 19:24   #29
ewmayer

Bzipped code attached. Note that in the meanwhile I got a PM from Dr Sardonicus about mapping a target interval [0,pi/2] to [-pi/4,+pi/4]. The code George posted in #14 of the thread was for sin and cos, each over [-pi/4,+pi/4], so those are the domains I targeted. At the bottom of the code snippet George has a "We use the following trig identities to convert our [0, pi/2] range to [-pi/4, pi/4] range" bit, which uses the identities:

cos(x + pi/4) = cos(x) * SQRTHALF - sin(x) * SQRTHALF
sin(x + pi/4) = sin(x) * SQRTHALF + cos(x) * SQRTHALF

Those are computationally efficient: in an FFT context we always need both sin and cos, so mapping [0,pi/2] to [-pi/4,pi/4] and computing sin(x) and cos(x) in the latter interval, the mapping-back needs just one add and one mul for each of sin(x + pi/4) and cos(x + pi/4). But Dr Sardonicus notes that if we, e.g., desire cos(x) for x close to pi/2 (i.e. a near-zero result), then using the first formula above we have x near pi/4, cos(x) nearly equal to sin(x), and we end up subtracting two nearly-equal numbers, a known accuracy-killer. That needs further discussion. To avoid this we could use the identity cos(x) = sin(pi/2-x) for arguments in [pi/4,pi/2], but how to do this sort of thing without the resulting conditionals killing the performance?

==================

To build, drop the unzipped qfcheb.c source file into the Mlucas /src directory, and keep this set of object files away from any resulting from a normal Mlucas build, to prevent collision of multiple main() functions on linking. Compile this subset of files:

gcc -c -O3 get_cpuid.c get_fp_rnd_const.c imul_macro.c mi64.c qfcheb.c qfloat.c rng_isaac.c twopmodq.c types.c util.c
Assuming no errors, link with
gcc -o qfcheb *.o -lm

Note the code spends nearly all of its time in the max-error-checking loops, which compute the abserr at 1 million evenly-spaced points in the upper half of the Chebyshev interval, x in [0,1] ... the computation of the approximational coefficients is more or less instantaneous, relatively speaking.

I've added minimal sanity checking on the input order n ... for dumb reasons (a legacy of my initial sine-only implementation), n must be even. Taking n > 12 or so takes the precision beyond that of IEEE64 doubles, but you'll see rapidly diminishing returns beyond n ~= 20, when the approximational accuracy approaches that of my custom qfloat type, which is implemented as a pair of uint64s, the .hi one of which maps bitwise to the IEEE64 double format (including the hidden bit), the .lo one adding 64 more significand bits. If you wanted to use the code to craft custom polys beyond IEEE64 precision you'd need to fiddle the coefficient-output prints to, say, print hex-integer-form bitfields of the desired length and with the correct bitlengths for the exponent and significand of the target type. E.g. if I wanted outputs in x86 register-double format, I'd craft 80-bit versions of the qfdbl_as_uint64 and qfdbl functions in qfloat.c to properly map the 128-bit qfloat format to the x86 reg-double format.
Attached Files
File Type: bz2 qfcheb.c.bz2 (3.6 KB, 214 views)

Last fiddled with by ewmayer on 2020-01-19 at 19:32 Reason: Fixed missing-reinit-of-maxerr-prior to-sine-error-computation bug
Old 2020-01-18, 02:01   #30
Dr Sardonicus

I sent a PM to Ernst about this. He suggested I post it to the thread, so here it is:

Quote:
Originally Posted by ewmayer View Post
The nice thing is that, with the sin(x) case coded up and debugged, it's easy to do this for more or less any desired target function, and we have some control over the polynomial order - for example for sin(y) with y in [-Pi/4,+Pi/4] we could probably use n = 11 rather than n = 13 when working in double precision.
In the present case, the argument is guaranteed to be in [0, pi/2]. Obviously, you can obtain an argument in [0, pi/4] by replacing the argument x with pi/2 - x if x is greater than pi/4. However, that would appear to require an IF statement; and possibly another would be needed to take care of the reversal of the roles of sine or cosine if you do replace x with pi/2 - x. That may more than offset the reduction in the number of terms required, I don't know. I suppose the subtraction needed for the reduction might also introduce a small error in the "reduced" argument.

Two other ways that are fine in theory are

(1) shifting the interval to [-pi/4, pi/4] by subtracting pi/4 from the argument, compute the sine and cosine for the shifted argument, then use the addition formulas to recover the original sine and cosine. However, this is numerically very bad if the given argument is near either 0 or pi/2. (The near-zero sine or cosine is obtained by subtracting two nearly equal values.)

(2) Divide the argument by 2, then use the double-angle formulas to recover the original sine and cosine. This is numerically bad if the given angle is near pi/2.
Old 2020-01-18, 16:06   #31
xilman

Quote:
Originally Posted by Dr Sardonicus View Post
In the present case, the argument is guaranteed to be in [0, pi/2]. Obviously, you can obtain an argument in [0, pi/4] by replacing the argument x with pi/2 - x if x is greater than pi/4. However, that would appear to require an IF statement;
Try pulling the standard trick for constant-time arithmetic, i.e. compute (x - pi/4) and mask off all but the sign bit:



/* Assume earlier code has declared x to be double and given it a value in [0, pi/2] */

#define P4 (3.14159265358979323846 / 4.0)
union du { double d; uint64_t u; } mask;

mask.d = x - P4;
mask.u = (mask.u & (1ULL << 63)) | 0x3FF0000000000000ULL;

/* Now mask.d is -1.0 or +1.0 according to the sign of the subtraction */

x = x * mask.d + P4 - (P4 * mask.d);  /* = P4 + |x - P4|, landing in [pi/4, pi/2];
                                         flip the subtraction to (P4 - x) above
                                         to land in [0, pi/4] instead */

If I didn't make any typos, the rather disgusting type-punning code above should perform the range change without any conditional instruction. A union type isn't strictly necessary, as all the punning can be done with casts at the cost of even more proof-reading of your code. Hint: declare mask to be of type double and perform the bit-twiddling on *(uint64_t *) &mask (though note such pointer casts run afoul of C's strict-aliasing rules; memcpy is the fully portable route). I would expect the compiler to generate the same instruction stream, but check this claim!

Please check the above very carefully!

Last fiddled with by xilman on 2020-01-18 at 16:31 Reason: Convert variable "double p4" to a constant macro P4
Old 2020-01-18, 23:26   #32
ewmayer

@Paul: I consider type-punning a necessary evil, e.g. my quad-float code relies on it. There's also the "when you need to be 100% sure the double constant you are using is translated faithfully to IEEE64 bit format, with no funny business done by the compiler to the low bits" issue that may arise with decimal-format doubles. The code I posted above does both cos(x) and sin(x) for x in [-pi/4,pi/4], and outputs the resulting approximation-polynomial coefficients in both decimal-float and uint64-bitfield format, e.g. for a 12th-order approximation:
Code:
MacBook:obj_qfcheb ewmayer$ ./qfcheb -n 12
Using n = 12
Computing Chebyshev-approximation to Cos(y) for |y| <= 0.7853981633974483, using first 12 T_j(x):
Chebyshev basis function coefficients, as double and [exact uint64 bitfield]:
c[ 0] =     8.516319137048081e-01 [0x3feb40919232f208]
c[ 2] =    -1.464366443908369e-01 [0xbfc2be6f9b4c1cb6]
c[ 4] =     1.921449311814647e-03 [0x3f5f7b247d216ef5]
c[ 6] =    -9.964968489829300e-06 [0xbee4e5e6d7331ba2]
c[ 8] =     2.757659560715710e-08 [0x3e5d9c3266cd5a3e]
c[10] =    -4.739945098459938e-11 [0xbdca0ee132c60c1f]
Raw polynomial coefficients, as double and [exact uint64 bitfield]:
d[ 0] =     9.999999999999444e-01 [0x3feffffffffffe0b]
d[ 2] =    -4.999999999935111e-01 [0xbfdffffffffe3763]
d[ 4] =     4.166666654391119e-02 [0x3fa5555554476425]
d[ 6] =    -1.388888039377642e-03 [0xbf56c16b2d3e61f5]
d[ 8] =     2.479892875532367e-05 [0x3efa00e9685da9c4]
d[10] =    -2.717342984102282e-07 [0xbe923c5c15441a00]
maxerr =     5.558974014151506e-14 at x =    0.000000000000000
Computing Chebyshev-approximation to Sin(y) for |y| <= 0.7853981633974483, using first 12 T_j(x):
Chebyshev basis function coefficients, as double and [exact uint64 bitfield]:
c[ 1] =     7.263756766937347e-01 [0x3fe73e7834004d7f]
c[ 3] =    -1.942002905320151e-02 [0xbf93e2d816aef8a1]
c[ 5] =     1.516929228510740e-04 [0x3f23e1f8490d5ad3]
c[ 7] =    -5.605804684120018e-07 [0xbea2cf597c94d8d6]
c[ 9] =     1.205324169088147e-09 [0x3e14b5125ea41a49]
c[11] =    -1.695817118027107e-12 [0xbd7dd54805f3f706]
Raw polynomial coefficients, as double and [exact uint64 bitfield]:
d[ 1] =     9.999999999999957e-01 [0x3fefffffffffffd9]
d[ 3] =    -1.666666666661669e-01 [0xbfc5555555550efe]
d[ 5] =     8.333333323878150e-03 [0x3f81111110bde5e3]
d[ 7] =    -1.984126329839051e-04 [0xbf2a019f8a207d3f]
d[ 9] =     2.755527189338982e-06 [0x3ec71d7317b8ee33]
d[11] =    -2.475655877198039e-08 [0xbe5a9507f3711e2d]
maxerr =     2.382614064390771e-15 at x =    0.711909242646512

Last fiddled with by ewmayer on 2020-01-19 at 19:33
Old 2020-01-19, 01:40   #33
xilman

I did make at least 2 typos, but the compiler would have picked them up instantly. The corrected line of code is

union du { double d; uint64_t u; } mask;

Note curly brackets, not round ones (and semicolons, not commas, between the members).
