mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2008-01-13, 13:36   #1
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

3·1,423 Posts
Default Variable FFT sizes

Why are the FFT sizes only a few fixed sizes? Why not have them variable, only being as large as they need to to do the number without errors. As mentioned in The Happy Me(al) thread, I barely got one that was just outside of an FFT size to use the smaller one, and I have one that's on the lower end of its FFT size. Surely it doesn't need all 2560K FFT size precision to do something 39.78M? Even if 2048K is indeed too inaccurate for the number, there must be something in between those that will give sufficient accuracy and higher speed.
To see what FFT size it needs, perhaps a ~1 hour self-test running the number at different FFT sizes, picking the smallest one that's within acceptable accuracy limits.
If not variable, perhaps many sizes, such as 1 per ~1-2M instead of 1 per ~5-10M.

In something only somewhat-related, is there a way I can get Prime95 to check if my 39.78M will run within acceptable limits on 2048K? It did this automatically, and passed, with my 39.509M number, and I just wanted to make sure I'm not running my other one at a larger FFT size if it's not needed.
Mini-Geek is offline   Reply With Quote
Old 2008-01-13, 15:08   #2
davieddy
 
davieddy's Avatar
 
"Lucan"
Dec 2006
England

2·3·13·83 Posts
Default

If you ask Primenet for an exponent NOW, you should
get one <39.5M.
A cursory study of how a FFT works should explain why the
optimum size is a power of 2, as will a glance at the iteration
times for more awkward sizes. I assume that this trend continues
for sizes less smooth than (2^19)*5.
Not to mention that I suspect the programming of each size has been
customized by George!

David

It's similar to the reason why tennis tournaments start with
64 or 128 players.

Last fiddled with by davieddy on 2008-01-13 at 15:34
davieddy is offline   Reply With Quote
Old 2008-01-13, 18:44   #3
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

3·1,423 Posts
Default

Quote:
Originally Posted by davieddy View Post
If you ask Primenet for an exponent NOW, you should
get one <39.5M.
Yeah, but I have no reason to just pass it on instead of finishing it (unless it's likely an SSE will take it up instead of another SSE2), I just wanted to make sure it's not wasting time by using a higher FFT size.
Quote:
Originally Posted by davieddy View Post
A cursory study of how a FFT works should explain why the
optimum size is a power of 2, as will a glance at the iteration
times for more awkward sizes. I assume that this trend continues
for sizes less smooth than (2^19)*5.
Not to mention that I suspect the programming of each size has been
customized by George!

David

It's similar to the reason why tennis tournaments start with
64 or 128 players.
I've tried to look up how FFT works, but it's pretty complicated. Over my head.
I've never noticed how the iteration times are with powers of two before, but now I see that. I guess variable sizes really would be worse.

Anyone know about forcing the self-test to see if 2048K can run the number?
Mini-Geek is offline   Reply With Quote
Old 2008-01-14, 12:51   #4
davieddy
 
davieddy's Avatar
 
"Lucan"
Dec 2006
England

11001010010102 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
I've tried to look up how FFT works, but it's pretty complicated. Over my head.
I was referring to "butterfly" diagrams which show where the
"Fast" enters into the Fast (Discrete) Fourier Transform. These
diagrams are what resemble the draw for a tennis tournament.
I agree that a full understanding of the subject cannot be gained
by "cursory study".
davieddy is offline   Reply With Quote
Old 2008-01-14, 16:37   #5
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

2×13×449 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
Why are the FFT sizes only a few fixed sizes? Why not have them variable, only being as large as they need to to do the number without errors. As mentioned in The Happy Me(al) thread, I barely got one that was just outside of an FFT size to use the smaller one, and I have one that's on the lower end of its FFT size. Surely it doesn't need all 2560K FFT size precision to do something 39.78M? Even if 2048K is indeed too inaccurate for the number, there must be something in between those that will give sufficient accuracy and higher speed.
There is - 2304K. However, that [and analogously with all such ever-finer-gradations of length] needs something called a "radix-9 DFT" routine. [Factorize the FFT length and you'll see where that arises.] Similarly, the other 3 "in-between" lengths in going to 4096K need radices 11,13 and 15 respectively, in addition to the usual powers-of-2. By way of example, my Mlucas code supports all those, but the only one of them which consistently gives decent bang-for-the-coding buck is radix 9. Here's why:

1) The relative efficiency of an FFT is [roughly] related to the smoothness of the factorization of its length - thus power-of-2 radices are most efficient, and large-odd-prime radices are worst;

2) Larger radices lead to higher register pressure in-hardware, thus coding them to run efficiently becomes quite difficult.

The above is just a bare outline: feel free to do some homework on the DFT if you wish to understand this more fully. For example, a good advanced-topics question is: "For odd composite radices of similar size, why might one with no repeated factors be preferable to one with repeated factors? [E.g. radix 21 vs 25]."

Last fiddled with by ewmayer on 2008-01-14 at 16:42
ewmayer is offline   Reply With Quote
Old 2008-01-14, 17:16   #6
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

22·32·179 Posts
Default

Quote:
"For odd composite radices of similar size, why might one with no repeated factors be preferable to one with repeated factors? [E.g. radix 21 vs 25]."
Because radix-21 has 12 roots of unity and radix-25 has 20? (OK, six and ten once you've dealt with negation) - but on an x86-like platform I'd have thought you handled the roots of unity with load-execute operations, and wouldn't the radix-21 DFT be done as 3x7 anyway?

Is there some obvious reason that I'm missing why fast FFT code tends to be {small}*2^N rather than {small}*2^N*3^M? The three-way decimation-in-time stage isn't too hideous, the cube roots of unity are conveniently 1 and +/- omega.
fivemack is offline   Reply With Quote
Old 2008-01-14, 17:33   #7
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10000101011012 Posts
Default

Thanks for all the explanations, I understand now that it takes a very smooth FFT size to be efficient.

I got this error on my 39.509M number and was wondering if it's serious or a software bug or anything, and if this significantly raises the chance that this will return an incorrect result (copied from results.txt, with the self-test checking if 2048K is acceptable):
Code:
Trying 1000 iterations for exponent 39509597 using 2048K FFT.
If average roundoff error is above 0.2425, then a larger FFT will be used.
Final average roundoff error is 0.24118, using 2048K FFT for exponent 39509597.
[Mon Jan 14 01:17:53 2008]
Iteration: 1420185/39509597, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
[Mon Jan 14 01:42:17 2008]
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.

Last fiddled with by Mini-Geek on 2008-01-14 at 17:43 Reason: clarifying part of second sentence
Mini-Geek is offline   Reply With Quote
Old 2008-01-14, 17:33   #8
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

2×13×449 Posts
Default

Quote:
Originally Posted by fivemack View Post
Because radix-21 has 12 roots of unity and radix-25 has 20? (OK, six and ten once you've dealt with negation) - but on an x86-like platform I'd have thought you handled the roots of unity with load-execute operations, and wouldn't the radix-21 DFT be done as 3x7 anyway?
"Why twiddle when you can permute?"

Quote:
Is there some obvious reason that I'm missing why fast FFT code tends to be {small}*2^N rather than {small}*2^N*3^M? The three-way decimation-in-time stage isn't too hideous, the cube roots of unity are conveniently 1 and +/- omega.
Might be worth investigating, but I expect a highly optimized power-of-2 combined with a decent variety of odd leading radices gives something approaching an optimum in the performance/implementation-difficulty space. At large FFT lengths, one would have the option of perhaps-inefficient single leading radix [pay me now] followed by many really fast powers-of-two, or multiple powers of three, all slightly less efficient than their smoother power-of-2 brethren [pay me later].

EDIT: By way of example, I can do a complex radix-8 DFT [sans any multipass twiddles] with just 52 [scalar] add and 4 mul.

A radix-9 DFT, on the other hand, needs 84 add and 20 mul. That is significantly more ops-per-input than the power of 2. OTOOH, if I follow that radix-9 with a large power of 2, the extra work done in doing the one radix-9 pass gets offset [and more] by the shorter vector length compared to [say] going to the next-larger power of 2, or even to 10*[power of 2]. The radix-10 DFT needs 88 add and 20 mul and thus is slightly better-per-input, but that work gets multiplied by [10/9] in comparing vs 9*[same power of 2] to reflect the overall larger vector length.

Of course in the real world there is much more to it than mere arithmetic opcount - cache misses and memory bandwidth are much more important at larger vector lengyhs - but all other things being equal [which the same large-power-of-2 ensures] using a shorter vector length will usually give better cache behavior, as well, since proportionally more of the data array can fit in cache.

Last fiddled with by ewmayer on 2008-01-14 at 19:08
ewmayer is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Let y^2=xz-x^2+1, and if value of x is known, can y and z be directly calculated? Given all variable MARTHA Number Theory Discussion Group 11 2018-03-02 15:55
Exponentiation w/ independent variable Unregistered Homework Help 4 2010-08-04 06:38
FFT sizes Cruelty Riesel Prime Search 3 2006-07-12 15:15
How Much Memory at Various Sizes? wblipp GMP-ECM 5 2005-04-24 20:04
Cache Sizes Unregistered Hardware 4 2003-12-06 13:43

All times are UTC. The time now is 07:32.


Thu Dec 2 07:32:37 UTC 2021 up 132 days, 2:01, 0 users, load averages: 1.27, 1.28, 1.22

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.