20080730, 04:02  #1 
Jul 2008
13 Posts 
interested in theory+implementation: FFT multipl
Greetings all,
I assume I am like many others on this forum in that I am implementing my own arbitraryprecision integer arithmetic library for the purpose of conducting LucasLehmer tests. I understand that if I am to have any hope of testing multimillion exponents, I will need to implement an FFT squaring algorithm. Unfortunately, the academic papers on this subject are hopelessly obscure for me, as I lack the necessary mathematical background (ring theory? what?). If someone could please offer me a little direction in this, I would be grateful. Also, what is the difference b/t the SchonhageStrassen algorithm and other FFT's? Is the new Furer algorithm (2007) also considered an FFT? Can anyone share their implementations? 
20080730, 14:44  #2  
Tribal Bullet
Oct 2004
3537_{10} Posts 
Quote:
K. Jayasuriya, "Multiprecision Arithmetic Using Fast Hartley Transforms", Appl. Math. and Comp. 75:239251 (1996) Quote:


20080801, 19:33  #3 
Jul 2008
13 Posts 
Thanks a lot, I will certainly give the material a look. It certainly seems a lot more accessible.

20080805, 03:01  #4 
Jul 2008
13 Posts 
i must say i am pretty disappointed with the pdf book by fftguru.com
though the author may have a strong command of the subject (i cannot say otherwise), i certainly do not think he has a strong command of how to teach others. he spends very little time explaining anything. amazingly, he finds that it is somehow interesting or even significant to digress to his personal experience of trying to find a way to beat the odds at roulette. while i realize that he was probably trying to make a point of how fft could be applied in this situation, i felt it was extremely inappropriate (i downloaded this to learn fft, not to read about his boring escapades). another chapter is devoted (and very poorly, i believe), to complex numbers. anybody interested in learning fft should already have had experience with complex numbers, including Euler's formula in complex analysis. i dont believe he even bothered to mention this. i could not understand the later chapters, because i lacked understanding of the foundations that he attempted to explain in the first chapter. the main thing i got out of this was what it meant when one states that the Fourier Transform transforms a function from its representation in the time domain into a representation in the frequency domain. he even had a simple picture, and for that i am grateful. but i expected so much more. i decided to just buy a book on the subject. im sure his sourcecode will be a lot more interesting and intelligible when i have actually learned the basics. 
20080808, 00:15  #5 
Jul 2008
13 Posts 
question: which is faster: number theoretic transform, or transform using complex nth roots of unity?

20080809, 22:19  #6  
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts 
Quote:
NTT has a guarantee of correctness which is perhaps more comfortable than the floatingpoint methods. 

20080811, 19:33  #7  
∂^{2}ω=0
Sep 2002
República de California
3×5^{3}×31 Posts 
Quote:
And both floatingpoint and integer *hardware* units are subject to similar kinds of hardwarelevel glitches, whether they be due to 3rdparty device drivers, overheating, bad memory, voltage glitches or "cosmic rays". As I've recounted on several occasions, during the allinteger parallel "wavefront" verification of the compositeness of F24, on at least 3 occasions we had a residue mismatch between a pureinteger partialinterval "triple check" run and the [matching] independentfloatingpointFFT wavefront results. In every such case, rerun of the pureinteger code for the given iteration range on an alternate PC gave a result that matched the floatingpoint ones, which were run on workstationclass hardware [a Sparc 1 and an SGI Octane]. So marginalquality PC hardware is not rendered less marginal simply by running in pureinteger mode. Dual floatingpoint runs with aggressively set transform lengths and independent computation data [as is done by GIMPS] is a far better way to go in the sense of establishing a result "beyond reasonable doubt", especially when one is testing thousands of numbers per week, which would make the orderofmagnitude speed penalty associated with a pureinteger run unacceptable. [In the case of F24 we had just *one* result to verify, and dualFPresult verification was not as widely accepted yet, especially by the academic community.] 

20080817, 20:17  #8  
Jul 2008
13 Posts 
Thanks for your responses.
Some additional questions, if I may. For a NTT I would have to reduce modulo a 32bit prime number to even hope for acceptable speed b/c doing so by anything larger would require that I implement the reduction in software, correct? Quote:
The point you're making here is that the transform size would have to be incredibly large (and perhaps larger than is feasible) to ensure no overflows? Quote:
The main reason why I was considering NTT was the "guarantee" of correctness. But I guess I should stick with what others have already proven to be effective. Where may I find the mathematics associated with ensuring that doubleprecision FP rounding generates a correct result? I briefly glanced through volume 2 of Knuth's Art of Computer Programming: Seminumerical Algorithms, and there is a section on applying FFT to multiplication, and I thought there may have been something there related to it. For now, I will be content to bury my head in a DFT book, and then an FFT book. 

20080818, 18:47  #9  
∂^{2}ω=0
Sep 2002
República de California
3×5^{3}×31 Posts 
Quote:
Quote:
Quote:
Quote:
Quote:


20080818, 22:05  #10 
"Richard B. Woods"
Aug 2002
Wisconsin USA
2^{2}×3×641 Posts 
antlu,
Don't be misled by Ernst's quotation marks there. Cosmic rays (highenergy particles from outer space) really are more likely to have an effect on the smallerandsmaller circuitry we're getting. A particle hit that disrupts three or four atoms on a chip is more likely to cause an error on today's narrow circuitry than on the slowerbutwider circuits of yesteryear. Last fiddled with by cheesehead on 20080818 at 22:05 
20080818, 22:17  #11 
Jul 2008
13 Posts 
I see. I really appreciate your time, btw.
Could you please elaborate a bit on the error detection methods employed by GIMPS? I understand that an LL residue for a particular exponent must be doublechecked by a different individual before results are accepted. But what is the powerof2 shift of the initial residue you mention? Is this just that instead of beginning the LL sequence with 4, you begin with a different power of 2 (or 4 shifted by a power of 2)? It would seem that hardware reliability is a huge factor in testing large exponents. >32 million iterations, with idunnohowmany machine operations per iteration. Would there be any advantage to saving intermediate results? Thus, instead of performing an LL test in its entirety, we perform some fraction of it, and then repeat this part to verify that the result up to some number of iterations is correct before proceeding? Just a thought. I am convinced that I should proceed with FP. I have heard mentions of a Discrete Weighted Transform (DWT) and my very shallow understanding (please correct me on my errors) of its advantage is that one does not need to pad the input with zeros as one would need to in a regular FFT (padding with zeros to extend the input to a powerof2 length). The tradeoff being that one must perform two convolutions (cyclic and negacyclic?) instead of one to arrive at the final result. I don't even understand most of what I said yet. Perhaps it will make more sense after some reading. In your opinion, would there be any benefit to including a hybridsquaring routine where Karatsuba and Toom3 are used for smaller operands? When L^2  2 (mod Mersenne) is relatively small, the iteration may be faster with these routines than with an FFT. I look forward to Intel's new Nehalem microarchitecture. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
SQUFOF implementation  alpertron  Factoring  15  20100412 19:16 
Interested in C/C++ libraries for factoring  PythonPower  Factoring  27  20090528 17:08 
need C implementation of divb(r,k)  Leith  Miscellaneous Math  4  20050118 23:14 
RSA implementation  flava  Programming  12  20041026 03:51 
Is anybody interested in this?  ET_  ElevenSmooth  2  20040326 16:29 