#1
Apr 2003
Berlin, Germany
192 Posts
During my studies of FFTs and NTTs I got the following idea:

In the FFT case we have to provide enough bits per word to safely hold the squares of the values in frequency space without picking up rounding errors during the inverse transform. But we don't actually need that many bits during the forward transform. That opens up the possibility of using a floating-point format with a smaller mantissa. I assume the 24-bit mantissa of single precision is unfortunately just a few bits too short to fully utilize the 53-bit double-precision mantissa on the way back. However, there may still be a gain from using SP for the forward and DP for the inverse transform, thanks to the roughly doubled throughput of single-precision SIMD instructions on some architectures.

The memory traffic wouldn't decrease much, if at all, because IMO it would be better to keep the 32-bit values at nearly the same locations as the later doubles, to avoid the additional TLB/cacheline issues that an out-of-place transform style could cause. One enhancement is possible: group the SP values of two cachelines together into one cacheline to reduce memory traffic. The freed cacheline-sized space would be used for the doubles, and grouping the SP values is also necessary to be able to load them efficiently with SIMD instructions.

Another question is whether this scheme could be applied to NTTs by using primes of different sizes. Unfortunately, the values we get after a forward transform are repeatedly multiplied by powers of roots and added - always modulo the chosen prime. So I imagine it will be difficult or even impossible to change these values in a way that a different prime (with more bits) can be used for the inverse transform.

Any ideas?
#2
P90 years forever!
Aug 2002
Yeehaw, FL
1111111010110₂ Posts
The very first FFT I wrote for prime95 (way back in 1995) used a Nussbaumer FFT where the forward FFT used 112-bit integers and the inverse FFT used 224-bit integers. Then I read up on the Crandall/Fagin DWT and it was much faster.
Not sure if that's at all relevant to this thread...
#3
Apr 2003
Berlin, Germany
361₁₀ Posts
Quote:
The floating-point version of this idea could be tested easily. The DWT shouldn't cause trouble, because the weights are just weights and don't change the number of relevant bits per word that much. And the mantissa size for the forward transform should be at least half of the mantissa size needed for the squared values: all bits of the product below its least significant mantissa bit get lost anyway, so we don't need to bring them into play. OTOH, the multiplication pyramid shows that even the 50th bit of a factor would still have some influence on a double-precision product.

BTW, how do the prospects for this 112/224-bit Nussbaumer look on 64-bit hardware with a fast 64-bit multiplier? It won't beat the floating-point FFT, but maybe there are other uses (for arbitrarily large integer multiplication).
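The multiplication-pyramid point can be checked numerically. A small sketch (the constants are arbitrary): zero all mantissa bits of a factor below bit 24 and compare the squares.

```python
import math
import struct

def truncate_mantissa(x, keep):
    """Zero all but the top `keep` explicit mantissa bits of a double."""
    raw = struct.unpack('<Q', struct.pack('<d', x))[0]
    mask = ~((1 << (52 - keep)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack('<d', struct.pack('<Q', raw & mask))[0]

x = math.pi * 1e5               # an arbitrary full-precision factor
x24 = truncate_mantissa(x, 24)  # the same factor with only 24 mantissa bits kept

full = x * x                    # product using all 53 significant bits
trunc = x24 * x24               # product using the truncated factor
print(f"relative difference: {abs(full - trunc) / full:.3e}")
# the difference lands near 2**-24, far above the 2**-53 resolution of the
# double product, so the dropped low bits of the factor do carry into it
```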