Go Back > Extra Stuff > Programming

Thread Tools
Old 2020-01-09, 04:58   #1
P90 years forever!
Prime95's Avatar
Aug 2002
Yeehaw, FL

2×11×349 Posts
Default Algorithm wanted - sine and/or cosine

I cannot locate my Knuth book. I need the fastest possible algorithm to compute sine(pi * a/b) and/or cosine(pi * a/b) on a GPU where a and b are positive 32-bit integers and a is less than half of b. The result is a 64-bit double. Accuracy is important.

GPU coding: Loops very bad, table lookup is bad, numerous if statements are bad.

To get an idea of the kind of code I'm looking for. Two related samples are given.

Here is sample 32-bit float code:

Here is code that has too much precision from a double-double library:
doubledouble sin(const doubledouble& x) {
  static const doubledouble tab[9]={ // tab[b] := sin(b*Pi/16)...
  static const doubledouble sinsTab[7] = { // Chebyshev coefficients
  if (fabs(x.h())<1.0e-7) return x*(1.0-sqr(x)/3);
  int a,b; doubledouble sins, coss, k1, k3, t2, s, s2, sinb, cosb;
  // reduce x: -Pi < x <= Pi
  k1=x/doubledouble::TwoPi; k3=k1-rint(k1);
  // determine integers a and b to minimize |s|, where  s=x-a*Pi/2-b*Pi/16
  b=int(rint((8*(t2-a))));   // must have |a|<=2 and |b|<=7 now
  s=doubledouble::Pi*(k3+k3-(8*a+b)/16.0); // s is now reduced argument. -Pi/32 < s < Pi/32
  // Chebyshev series on -Pi/32..Pi/32, max abs error 2^-98=3.16e-30, whereas
  // power series has error 6e-28 at Pi/32 with terms up to x^13 included.
  coss=sqrt(1.0-sqr(sins));  // ok as -Pi/32 < s < Pi/32
  // here sinb=sin(b*Pi/16) etc.
  sinb= (b>=0) ? tab[b] : -tab[-b]; cosb=tab[8-abs(b)];
  if (0==a) {
    return  sins*cosb+coss*sinb;
  } else if (1==a) {
    return -sins*sinb+coss*cosb;
  } else if (-1==a) {
    return  sins*sinb-coss*cosb;
  } else  { // |a|=2
    return -sins*cosb-coss*sinb;
  // i.e. return sins*(cosa*cosb-sina*sinb)+coss*(sina*cosb+cosa*sinb);
Prime95 is offline   Reply With Quote
Old 2020-01-09, 05:28   #2
P90 years forever!
Prime95's Avatar
Aug 2002
Yeehaw, FL

2·11·349 Posts

Just found this stackoverflow page which has some promising leads for me to investigate:
Prime95 is offline   Reply With Quote
Old 2020-01-09, 09:48   #3
Nick's Avatar
Dec 2012
The Netherlands

2×3×293 Posts

Stepping back from the immediate problem for a moment, it speaks volumes about current GPU design that sine & cosine are not available as fast accurate primitive instructions!
Nick is offline   Reply With Quote
Old 2020-01-09, 09:48   #4
Just call me Henry
henryzz's Avatar
Sep 2007
Cambridge (GMT/BST)

173116 Posts

I am fairly sure I have seen the Knuth books online somewhere.
henryzz is offline   Reply With Quote
Old 2020-01-09, 13:39   #5
axn's Avatar
Jun 2003

3·11·157 Posts

I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
axn is offline   Reply With Quote
Old 2020-01-09, 15:18   #6
tServo's Avatar
May 2009
near the Tannhäuser Gate

2·3·113 Posts

I was a bit disappointed that "Hacker's Delight" didn't have any discussion of this.

However, HAKMEM has an algorithm for the PDP-10 in about a dozen lines.
It uses push & pop, however and, of course, has a limited word length.

Still, it might give you some ideas. It's on page 75

I googled pdp 10 emulator and found a ton of references.
tServo is offline   Reply With Quote
Old 2020-01-09, 15:48   #7
R.D. Silverman
R.D. Silverman's Avatar
Nov 2003

22×5×373 Posts

Originally Posted by axn View Post
I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
You may want to look up 'Pade Approximation", Wiki has a short article
including an expression for sin()
R.D. Silverman is offline   Reply With Quote
Old 2020-01-09, 19:17   #8
De Wandelaar
De Wandelaar's Avatar
Jul 2017

2×3×13 Posts

The only reference to Sin/Cos I found :
Vol 2 of TAoCP- Seminumerical Algorithms
4.6.4 Evaluation of polynomials - Adaptation of the coefficients
page 432 - second printing
Let us now return to our original problem of evaluating a given polynomialu(x) as rapidly as possible, for "random" values of x. The importance of this problem is due partly to the fact that standard functions such as sin x, cos x, ex, etc., are usually computed by subroutines which rely on evaluation of certain polynomials; these polynomials are evaluated so often, it is desirable to find the fatest possible way to do the computation.
I didn't found a more specific discussion about sin and cos.
De Wandelaar is offline   Reply With Quote
Old 2020-01-09, 21:19   #9
bsquared's Avatar
Feb 2007

3×1,193 Posts

Contains no loops, conditionals, or lookup tables, but accuracy probably not what you want. Maybe something in the approach could be useful though.
bsquared is offline   Reply With Quote
Old 2020-01-09, 22:10   #10
P90 years forever!
Prime95's Avatar
Aug 2002
Yeehaw, FL

2×11×349 Posts

From the stackexchage post, these 2 links are just what the doctor ordered:

One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code:;hb=HEAD#l194, look for __sin (double x).
fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: and
It is not too hard to convert my [0,pi/2] range to [-pi/4,pi/4]

Last fiddled with by Prime95 on 2020-01-09 at 22:10
Prime95 is offline   Reply With Quote
Old 2020-01-10, 01:43   #11
Dr Sardonicus
Dr Sardonicus's Avatar
Feb 2017

23·223 Posts

I dug up an old (1966) IBM System 360 manual here. I thought of it because my dad programmed IBM 360's and kept some of the old manuals. Perhaps the old algorithm can be adapted. I had to do some fiddling -- copy-paste did not convert to text very well. While I was doing this, I see some more modern renditions have been found.

xxxlSCN Subprogram (DSIN and DCOS)


1. Divide |x| by pi/4 and separate the quotient (z) into its integer part (q) and
its fraction part (r). Then, z = |x|*4/pi = q + r, where q is an integer and r is within the range, 0 <= r < 1.

2. If the cosine is desired, add 2 to q. If the sine is desired and if x is negative, add 4 to q. This adjustment of q reduces the general case to the computation of
sin (x) for x > 0, because

cos ( ± x) = sin (|x| + pi/2 ) , and

sin (-x) = sin (|x| + pi).

3. Let q0 == q mod 8.

Then, for

q0 = 0,sin (x) = sin(pi/4* r)

q0 = 1, sin (x) = cos(pi/4*(1 - r))

q0 = 2, sin (x) = cos(pi/4*r)

q0 = 3, sin (x) = sin(pi/4*(1 - r))

q0 = 4, sin (x) = -sin(pi/4*r)

q0 = 5, sin (x) = - cos(pi/4*(1 - r))

q0 = 6, sin (x) = -cos(pi/4*r)

q0 = 7, sin (x) = -sin(pi/4*(1 - r))

These formulas reduce each case to the computation of either sin(pi/4*r1) or cos(pi/4*r1); where r1 is either r or (1 - r), and is within the range, 0 <= r1 <= 1.

4. Finally, either sin (pi/4*r1) or cos ( pi/4*r1) is computed, using the Chebyshev interpolation of degree 6 in r1^2 for the sine, and of degree 7 in r1^2 for the cosine.

The maximum relative error of the sine polynomial is 2-58 and that of the cosine polynomial is 2-64.3.

Effect of an Argument Error

E\; \sim \; \Delta As the value of the argument increases, \Delta inceases.Because the function value diminishes periodically, no consistent relative error control can be maintained outside of the principal range, - pi/2 <= x <= + pi/2.

Last fiddled with by Dr Sardonicus on 2020-01-10 at 01:53 Reason: Fixing transcription errors
Dr Sardonicus is offline   Reply With Quote

Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Karatsuba algorithm relation to Matrix Multiplication (Strassen Algorithm)? neomacdev Computer Science & Computational Number Theory 1 2019-07-30 01:40
A Study of Primes Through Non-Resonant Sine Waves EdH Math 56 2010-04-11 19:47
Signs of Remainders of Cosine and Sine Series jinydu Homework Help 2 2008-04-29 01:22
Calculating the number of intersections between sine and cosine curves ShiningArcanine Miscellaneous Math 2 2006-08-06 21:55
Hard proplem for finding sine function tinhnho Miscellaneous Math 6 2005-01-17 05:42

All times are UTC. The time now is 21:01.

Wed Dec 1 21:01:43 UTC 2021 up 131 days, 15:30, 1 user, load averages: 1.51, 1.41, 1.41

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.