 mersenneforum.org Algorithm wanted - sine and/or cosine
 Register FAQ Search Today's Posts Mark Forums Read  2020-01-09, 04:58 #1 Prime95 P90 years forever!   Aug 2002 Yeehaw, FL 2·5·23·31 Posts Algorithm wanted - sine and/or cosine I cannot locate my Knuth book. I need the fastest possible algorithm to compute sine(pi * a/b) and/or cosine(pi * a/b) on a GPU where a and b are positive 32-bit integers and a is less than half of b. The result is a 64-bit double. Accuracy is important. GPU coding: Loops very bad, table lookup is bad, numerous if statements are bad. To get an idea of the kind of code I'm looking for. Two related samples are given. Here is sample 32-bit float code: http://mooooo.ooo/chebyshev-sine-approximation/ Here is code that has too much precision from a double-double library: Code: doubledouble sin(const doubledouble& x) { static const doubledouble tab={ // tab[b] := sin(b*Pi/16)... 0.0, "0.1950903220161282678482848684770222409277", "0.3826834323650897717284599840303988667613", "0.5555702330196022247428308139485328743749", "0.7071067811865475244008443621048490392850", "0.8314696123025452370787883776179057567386", "0.9238795325112867561281831893967882868225", "0.9807852804032304491261822361342390369739", 1.0 }; static const doubledouble sinsTab = { // Chebyshev coefficients "0.9999999999999999999999999999993767021096", "-0.1666666666666666666666666602899977158461", "8333333333333333333322459353395394180616.0e-42", "-1984126984126984056685882073709830240680.0e-43", "2755731922396443936999523827282063607870.0e-45", "-2505210805220830174499424295197047025509.0e-47", "1605649194713006247696761143618673476113.0e-49" }; if (fabs(x.h())<1.0e-7) return x*(1.0-sqr(x)/3); int a,b; doubledouble sins, coss, k1, k3, t2, s, s2, sinb, cosb; // reduce x: -Pi < x <= Pi k1=x/doubledouble::TwoPi; k3=k1-rint(k1); // determine integers a and b to minimize |s|, where s=x-a*Pi/2-b*Pi/16 t2=4*k3; a=int(rint(t2)); b=int(rint((8*(t2-a)))); // must have |a|<=2 and |b|<=7 now s=doubledouble::Pi*(k3+k3-(8*a+b)/16.0); // s is now reduced argument. -Pi/32 < s < Pi/32 s2=sqr(s); // Chebyshev series on -Pi/32..Pi/32, max abs error 2^-98=3.16e-30, whereas // power series has error 6e-28 at Pi/32 with terms up to x^13 included. sins=s*(sinsTab+(sinsTab+(sinsTab+(sinsTab+(sinsTab+ (sinsTab+sinsTab*s2)*s2)*s2)*s2)*s2)*s2); coss=sqrt(1.0-sqr(sins)); // ok as -Pi/32 < s < Pi/32 // here sinb=sin(b*Pi/16) etc. sinb= (b>=0) ? tab[b] : -tab[-b]; cosb=tab[8-abs(b)]; if (0==a) { return sins*cosb+coss*sinb; } else if (1==a) { return -sins*sinb+coss*cosb; } else if (-1==a) { return sins*sinb-coss*cosb; } else { // |a|=2 return -sins*cosb-coss*sinb; } // i.e. return sins*(cosa*cosb-sina*sinb)+coss*(sina*cosb+cosa*sinb); }   2020-01-09, 05:28 #2 Prime95 P90 years forever!   Aug 2002 Yeehaw, FL 2·5·23·31 Posts Just found this stackoverflow page which has some promising leads for me to investigate: https://stackoverflow.com/questions/...math-functions   2020-01-09, 09:48 #3 Nick   Dec 2012 The Netherlands 101101010012 Posts Stepping back from the immediate problem for a moment, it speaks volumes about current GPU design that sine & cosine are not available as fast accurate primitive instructions!   2020-01-09, 09:48 #4 henryzz Just call me Henry   "David" Sep 2007 Cambridge (GMT/BST) 10110011000102 Posts I am fairly sure I have seen the Knuth books online somewhere.   2020-01-09, 13:39 #5 axn   Jun 2003 22·11·107 Posts I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.   2020-01-09, 15:18 #6 tServo   "Marv" May 2009 near the Tannhäuser Gate 10178 Posts I was a bit disappointed that "Hacker's Delight" didn't have any discussion of this. However, HAKMEM has an algorithm for the PDP-10 in about a dozen lines. It uses push & pop, however and, of course, has a limited word length. Still, it might give you some ideas. It's on page 75 https://w3.pppl.gov/~hammett/work/2009/AIM-239-ocr.pdf I googled pdp 10 emulator and found a ton of references.   2020-01-09, 15:48   #7
R.D. Silverman

Nov 2003

26×113 Posts Quote:
 Originally Posted by axn I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
You may want to look up 'Pade Approximation", Wiki has a short article
including an expression for sin()   2020-01-09, 19:17   #8
De Wandelaar

"Yves"
Jul 2017
Belgium

2×23 Posts The only reference to Sin/Cos I found :
Vol 2 of TAoCP- Seminumerical Algorithms
4.6.4 Evaluation of polynomials - Adaptation of the coefficients
page 432 - second printing
Quote:
 Let us now return to our original problem of evaluating a given polynomialu(x) as rapidly as possible, for "random" values of x. The importance of this problem is due partly to the fact that standard functions such as sin x, cos x, ex, etc., are usually computed by subroutines which rely on evaluation of certain polynomials; these polynomials are evaluated so often, it is desirable to find the fatest possible way to do the computation.
I didn't found a more specific discussion about sin and cos.   2020-01-09, 21:19 #9 bsquared   "Ben" Feb 2007 2×33×61 Posts fastapprox: https://github.com/pmineiro/fastappr...src/fasttrig.h Contains no loops, conditionals, or lookup tables, but accuracy probably not what you want. Maybe something in the approach could be useful though.   2020-01-09, 22:10   #10
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·5·23·31 Posts From the stackexchage post, these 2 links are just what the doctor ordered:

Quote:
 One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: https://sourceware.org/git/?p=glibc....c;hb=HEAD#l194, look for __sin (double x).
Quote:
 fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: http://www.netlib.org/fdlibm/s_sin.c and http://www.netlib.org/fdlibm/k_sin.c
It is not too hard to convert my [0,pi/2] range to [-pi/4,pi/4]

Last fiddled with by Prime95 on 2020-01-09 at 22:10   2020-01-10, 01:43   #11
Dr Sardonicus

Feb 2017
Nowhere

DDE16 Posts I dug up an old (1966) IBM System 360 manual here. I thought of it because my dad programmed IBM 360's and kept some of the old manuals. Perhaps the old algorithm can be adapted. I had to do some fiddling -- copy-paste did not convert to text very well. While I was doing this, I see some more modern renditions have been found.

Quote:
 xxxlSCN Subprogram (DSIN and DCOS) Algorithm 1. Divide |x| by pi/4 and separate the quotient (z) into its integer part (q) and its fraction part (r). Then, z = |x|*4/pi = q + r, where q is an integer and r is within the range, 0 <= r < 1. 2. If the cosine is desired, add 2 to q. If the sine is desired and if x is negative, add 4 to q. This adjustment of q reduces the general case to the computation of sin (x) for x > 0, because cos ( ± x) = sin (|x| + pi/2 ) , and sin (-x) = sin (|x| + pi). 3. Let q0 == q mod 8. Then, for q0 = 0,sin (x) = sin(pi/4* r) q0 = 1, sin (x) = cos(pi/4*(1 - r)) q0 = 2, sin (x) = cos(pi/4*r) q0 = 3, sin (x) = sin(pi/4*(1 - r)) q0 = 4, sin (x) = -sin(pi/4*r) q0 = 5, sin (x) = - cos(pi/4*(1 - r)) q0 = 6, sin (x) = -cos(pi/4*r) q0 = 7, sin (x) = -sin(pi/4*(1 - r)) These formulas reduce each case to the computation of either sin(pi/4*r1) or cos(pi/4*r1); where r1 is either r or (1 - r), and is within the range, 0 <= r1 <= 1. 4. Finally, either sin (pi/4*r1) or cos ( pi/4*r1) is computed, using the Chebyshev interpolation of degree 6 in r1^2 for the sine, and of degree 7 in r1^2 for the cosine. The maximum relative error of the sine polynomial is 2-58 and that of the cosine polynomial is 2-64.3. Effect of an Argument Error As the value of the argument increases, inceases.Because the function value diminishes periodically, no consistent relative error control can be maintained outside of the principal range, - pi/2 <= x <= + pi/2.

Last fiddled with by Dr Sardonicus on 2020-01-10 at 01:53 Reason: Fixing transcription errors   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post neomacdev Computer Science & Computational Number Theory 1 2019-07-30 01:40 EdH Math 56 2010-04-11 19:47 jinydu Homework Help 2 2008-04-29 01:22 ShiningArcanine Miscellaneous Math 2 2006-08-06 21:55 tinhnho Miscellaneous Math 6 2005-01-17 05:42

All times are UTC. The time now is 15:01.

Tue Oct 20 15:01:50 UTC 2020 up 40 days, 12:12, 1 user, load averages: 2.70, 2.80, 2.75