mersenneforum.org > Math Bits in an FFT word

 2007-10-17, 19:33   #1
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1111101011010₂ Posts

Bits in an FFT word

A recent bug report has revealed an error in my calculation of the expected number of bits in an FFT result word prior to the carry-propagation step.

When working on Mersenne numbers, 2^p-1, we put n = p/FFTlen bits in each FFT input word. Since we use balanced representation, each result word is the sum of many (n-1)-bit times (n-1)-bit products. Some of these cancel out, as the quantities are both positive and negative. Lots of data has shown that I can expect the result words to have at most (n-1) + (n-1) + 0.6 * log2(FFTlen) bits.

When I test k*2^p-1 numbers, there are an extra log2(k)/2 bits in each input word. This adds an extra log2(k) bits to each result word. Tests have shown this formula works quite well.

The problem arises when I test numbers 2^p-c. In the Mersenne case the weights for each input word are between 1 and 2 (or is it 0.5 and 1.0 - I forget). In the 2^p-c case the input words are weighted with multipliers between 1 and c. In the Mersenne case every input word has roughly the same number of bits. In the 2^p-c case, the input words have anywhere between n and n+log2(c) bits.

Any ideas on what my formula should be in this case? Ernst, help!
 2007-10-18, 19:24   #2
ewmayer
∂2ω=0

Sep 2002
República de California

11,743 Posts

Well, assuming that the input-word multipliers in the mod-(2^n-c) DWT are approximately randomly [or uniformly] distributed in the interval [1,c], this leads to a multiplier (1+c)/2 on the average input wordsize W. It seems like you should just be able to rework formulae 5-8 in the F24 paper by replacing W with W*(1+c)/2 [which reduces to W in the Mersenne-mod case, by way of a basic sanity check] to get the generalized error estimate you need.

I have a simple C routine which I've been using [starting with the Mlucas 3.0 beta] that implements formula (8) in the above paper, and which has saved me much grief in terms of calculating FFT-length breakover points. Here it is - feel free to use and customize as desired. Let me know if this, together with the above W-fiddling, works for you:

Code:
/* For a given FFT length, estimate maximum exponent that can be tested.

   This implements formula (8) in the F24 paper (Math Comp. 72 (243),
   pp.1555-1572, December 2002) in order to estimate the maximum average
   wordsize for a given FFT length. For roughly IEEE64-compliant arithmetic,
   an asymptotic constant of 0.6 (log2(C) in the paper, which recommends
   something around unity) seems to fit the observed data best.
*/
uint32 given_N_get_maxP(uint32 N)
{
	const double Bmant = 53;
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N;
	double Wbits, maxExp2;

	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;

	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;

	/* 3/10/05: Future versions will need to loosen this p < 2^32 restriction: */
	ASSERT(HERE, maxExp2 <= 1.0*0xffffffff, "given_N_get_maxP: maxExp2 <= 1.0*0xffffffff");
	return (uint32)maxExp2;
}

p.s.: Even though you are targeting x86-style CPUs with your code, the constant Bmant above should remain 53 in all cases, since that [to leading order] reflects the size of stored intermediates, not the register precision. The only added tweak you might want would be a slightly higher value of AsympConst for non-SSE2 CPUs than for those supporting SSE2. But the above should be a pretty good generic value for initial experiments; it can always be fine-tuned later.
 2007-10-18, 20:00   #3
ewmayer
∂2ω=0

Sep 2002
República de California

11,743 Posts

It seems the suitably modified inequality (8) leads to a generalized version of the above function of this form. I've also added a super-basic test utility; you should be able to copy this into a .c file and build:

Code:
#include <stdio.h>
#include <math.h>

typedef unsigned int uint32;

/* For a given FFT length N [= number of floats in the input data array],
   estimate maximum exponent P that can be tested in a DWT modulo (2^n - C). */
uint32 given_N_and_C__get_maxP(uint32 N, uint32 C)
{
	const double Bmant = 53;
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N, l2cplus1;
	double Wbits, maxExp2;

	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;
	l2cplus1 = ln2inv*log(1.0 + C);

	Wbits = 1.0 - l2cplus1 + 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;
	return (uint32)maxExp2;
}

int main()
{
	uint32 c, i, n;
	printf("Enter subtractive constant C in modulus (2^n - C) >\n");
	scanf("%u", &c);
	printf("Max p for various power-of-2 FFT lengths:\n");
	for(i = 10; i <= 25; i++) {
		n = 1 << i;
		printf("N = 2^%2d, Max p = %10u\n", i, given_N_and_C__get_maxP(n, c));
	}
	return 0;
}
2007-10-19, 13:25   #4
davieddy

"Lucan"
Dec 2006
England

6474₁₀ Posts

Quote:
 Originally Posted by Prime95 When working on Mersenne numbers, 2^p-1, we put n = p/FFTlen bits in each FFT input word. Since we use balanced representation, each result word is the sum of many (n-1)-bit times (n-1)-bit products. Some of these cancel out, as the quantities are both positive and negative. Lots of data has shown that I can expect the result words to have at most (n-1) + (n-1) + 0.6 * log2(FFTlen) bits.
"Expect to have" presumably means "almost no exceptions" in ~40M*FFTsize trials.
Doesn't a simple application of the normal (Gaussian) distribution
inform the argument a bit?

David

Last fiddled with by davieddy on 2007-10-19 at 13:29

2007-10-19, 16:27   #5
ewmayer
∂2ω=0

Sep 2002
República de California

2DDF₁₆ Posts

Quote:
 Originally Posted by davieddy "Expect to have" presumably means "almost no exceptions" in ~40M*FFTsize trials. Doesn't a simple application of the normal (Gaussian) distribution inform the argument a bit?
Gaussian distribution of what, exactly? Due to the balanced-digit representation of the inputs and what we [think we] know about the properties of these kinds of deterministic or pseudoprime tests [basically a bunch of successive modmuls with certain deeper properties, but seemingly yielding random intermediates in the range [-q/2, +q/2], with q the modulus in question], we are convolving a set of N inputs which [sans any DWT weighting] are randomly distributed in [-W/2, +W/2]. So what part of the calculation leads to a Gaussian distribution? [I'm not saying there isn't one in there, but one needs to be more careful.]

The convolution output-size and roundoff-error heuristic I devised in the above paper starts with the above assumptions, and one quickly sees that random walks in N-dimensional space are key. Floating-point roundoff errors in the convolution outputs [assuming one is doing a floating-point-based convolution] appear to follow a quasi-Gaussian trend, but this is actually an area that could really benefit from further study.

[I wish I had the time, but as my primes-related free time is very scarce, I must choose my battles very carefully. "Double the speed of the code by SSE-ifying the FFT, or gain perhaps 1% in the sharpness of the FFT-length-versus-exponent cutoffs via cleverer ROE analysis?" That kind of thing.]

Sans DWT weighting - or at least with the simpler kinds of DWT weights we use to effect negacyclics as for Fermat numbers [weights are the set of Nth complex roots of unity] and Mersennes [weights are fractional powers of 2, roughly randomly distributed in [1, 2]] - things seem to follow the predictions of the random-walk heuristic very well. What George is saying is that in his more-general case something seems to go awry with the above approach, which may require deeper analysis to understand.

2007-10-19, 22:20   #6
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1111101011010₂ Posts

Quote:
 Originally Posted by ewmayer What George is saying is that in his more-general case something seems to go awry with the above approach, which may require deeper analysis to understand.
Actually, I haven't studied the non-Mersenne case to come up with any conclusions. I had hoped Ernst had, or had a good theory to try.

I'm not sure I agree that plugging the (1+c)/2 average weighting into your formula is OK. Just doing a little thought experiment:

Say I have a length-1024 FFT, and each FFT word has 10 bits after weighting in a Mersenne-mod scenario. Then each output word is the sum of 1024 20-bit products. We add 0.6 * log2(1024) bits to hold carry bits, as the 1024 values are both positive and negative. Thus, we feel safe with 26 bits of precision.

Now let's say c is such that the DWT weights add 0, 1, ..., 8 bits to each input word (so the input words are now 10-18 bits). If we use the average bit length of the inputs (14 bits), you'd think we'd need 14*2 + 6 = 34 bits. However, occasionally we are adding 18-bit times 18-bit products, so this is clearly insufficient!

 2007-10-19, 23:28   #7
ewmayer
∂2ω=0

Sep 2002
República de California

11,743 Posts

Hmm, yes, perhaps the dynamic range of the input multipliers is key here, as you say - for Fermat-mod the DWT weights are all of complex norm 1, so as uniform as can be, and for Mersenne-mod the dynamic range is from 1 to 2, which is also very small, so perhaps neglecting it [or treating it by way of an average] doesn't hurt us - at least a whole lot of computation tells us that seems to be the case.

You could always try C rather than (1+C)/2 as the multiplier and see which of the two models gives a more-reasonable ROE size - the "right" formula should give a nearly-constant max. ROE with increasing FFT length, for moduli around the max-p at each length.

