20200111, 20:45  #1 
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}·3^{2}·53 Posts 
Sin/cos algorithm question #2
Given cos(a) and sin(a), we want to compute cos(a*n) and sin(a*n) for n=2...10
My first implementation was a simple loop doing a complex multiply: Code:
base = step; for (i32 i = 1; i < MIDDLE; ++i) { u[i] = mul(u[i], base); // Apply twiddle base = mul(base, step); // Next twiddle } Each cos / sin value can be computed with just one FMA instruction: Code:
steps[1] = step; u[1] = mul(u[1], steps[1]); // apply twiddle step1costimes2 = steps[1].cos * 2.0; steps[2].cos = step1xtimes2 * steps[1].cos  1.0; steps[2].sin = step1xtimes2 * steps[1].sin; u[2] = mul(u[2], steps[2]); // apply twiddle for (i32 i = 3; i < MIDDLE; i++) { steps[i].cos = fma(step1costimes2, steps[i1].cos, steps[i2].cos); steps[i].sin = fma(step1costimes2, steps[i1].sin, steps[i2].sin); u[i] = mul(u[i], steps[i]); // apply twiddle } 
20200111, 21:20  #2 
Dec 2012
The Netherlands
1740_{10} Posts 
Unless I am missing the point here, the site you link to appears simply to be applying the standard formulae
cos(A+B)=cos(A)cos(B)sin(A)sin(B) and sin(A+B)=sin(A)cos(B)+cos(A)sin(B) and then iterating. So if the error for each trigonometric function is at most some epsilon then we can easily see how big the error gets on iteration. If you only need n up to 10, I would simply compute rational functions approximating sin(nx) and cos(nx) in the same way as for sin() and cos(). 
20200111, 22:25  #3  
Feb 2017
Nowhere
3·1,657 Posts 
Quote:
The use of n iterations would be bad for accuracy if n is large. However, the number of iterations can be reduced if n is factored. Since T_{n}(2*cos(x)) = 2*cos(n*x), if n = a*b we have 2*cos(n*x) = T_{a}(T_{b}(2*cos(x))) The composition could be extended to any number of factors. This might significantly reduce the number of iterations if n is the product of many small factors. However, the cost of making this determination would probably make it useless in the present application. I think using a polynomial or rational function approximation over a limited range is the way to go. 

20200112, 13:51  #4  
"Robert Gerbicz"
Oct 2005
Hungary
1,493 Posts 
Quote:
Get sin(10*n) from the previously computed sin(5*n),cos(5*n) sin(9*n) from sin(4*n),cos(4*n) and sin(5*n),cos(5*n) etc. 

20200112, 16:16  #5 
Feb 2017
Nowhere
4971_{10} Posts 
Using the addition formulas might be numerically bad if k*a is at or near an integer multiple of pi/2, since either sin(k*a) or cos(k*a) will be very near 0, meaning there is nearly complete cancelation of the terms being added or subtracted in the addition formula, with a possible loss of sig figs.

20200112, 16:48  #6  
P90 years forever!
Aug 2002
Yeehaw, FL
1DD0_{16} Posts 
Quote:
Your suggested algorithm generates the most temporaries, which can be an issue on a GPU. On AMD, with an GPU optimizer that sometimes serves as a random execution time generator, the straightforward compute of n+1 with a complex mul from cos(a*n) and sin(a*n) is fastest. On nVidia, the fastest (other than the broken Chebyshev) computes the 3n and 5n values using 4*n and n as inputs (a complex mul and a complex mul by conjugate). Last fiddled with by Prime95 on 20200112 at 16:49 

20200715, 06:09  #7 
Jul 2020
100_{2} Posts 
I looked into a very similar question a couple of years ago and wrote up my conclusions in a question / answer pair on Stackoverflow:
https://stackoverflow.com/questions/...llyspacedang I compared four different rotation formulas to account for tradeoffs between speed and accuracy. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Karatsuba algorithm relation to Matrix Multiplication (Strassen Algorithm)?  neomacdev  Computer Science & Computational Number Theory  1  20190730 01:40 
Doubt about P1 algorithm  ET_  Software  20  20111118 11:19 
TF algorithm(s)  davieddy  Miscellaneous Math  61  20110706 20:13 
Sieve algorithm  geoff  Twin Prime Search  5  20101107 17:02 
Prime95's FFT Algorithm  Angular  Math  6  20020926 00:13 