From fairly extensive testing on my own systems, I get a power regression:
fft = 0.03665 * exponent^1.12217 or exponent = 21.852 * fft^0.9783. Only one FFT I've tested so far falls more than 1% below this bound: 3136k.
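As a rough illustration, the second form of the fit (exponent as a function of FFT length) can be turned into a quick lookup. This is only a sketch: it assumes the FFT length is the raw length (3276800, not 3200k), the function names are made up for this example, and the coefficients come from one person's systems, so treat the result as a starting point rather than a guarantee.

```python
from typing import Optional, Sequence


def max_exponent_for_fft(fft_length: int) -> float:
    """Exponent bound suggested by the fit: exponent = 21.852 * fft^0.9783.

    Assumption: fft_length is the raw FFT length, not a value in K.
    """
    if fft_length <= 0:
        raise ValueError("fft_length must be positive")
    return 21.852 * fft_length ** 0.9783


def smallest_fft_for_exponent(exponent: int,
                              candidates: Sequence[int]) -> Optional[int]:
    """Return the smallest candidate FFT length whose fitted bound covers
    the exponent, or None if no candidate is large enough."""
    for n in sorted(candidates):
        if max_exponent_for_fft(n) >= exponent:
            return n
    return None
```

Calling smallest_fft_for_exponent(exponent, candidates) with your own list of FFT lengths gives a first guess, which should still be checked against the roundoff error the program reports.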
Drivers, CUDALucas version and settings were all the same. The only exception in the tests was power-of-two FFTs, and only on the minimum band. The first approach works for any GPU, both current and future, so it should be used.

FFT Bench Titan and GTX 590
This weekend was the first since mid-August where I was able to fiddle with my platform, and oh boy, did I learn new things.
* The new Nvidia driver from mid-September has the memory fix; before it, we had to take memory down to 2500 MHz. I can now run the stock 3004 MHz memory clock on the Titans.
* I researched FFTs and discovered a new FFT length for the exponents I am testing, which saved a few hours.
* With the FFT research, I confirmed that the FFT length for my 590s was already the optimal one.
I also looked into interesting FFT sizes for both platforms, the Titan and the 590 board. This overview is in the attached Excel file. All FFT lengths marked in red are interesting, as they are large and quick compared to the others. The FFT lengths I tested ranged from 3M to 5M, which is suitable for most people. I would love to do this on a 690, 680, 780, etc., but I simply do not have these cards.
At least I found the best FFT length for the exponents I am testing. For the Titans, the weekend's fiddling took my exponents from 84h per test to 60h per test, simply through a combination of the new driver, adjusted GPU clock and memory speeds, and an adjusted FFT length.
The attached overview also shows the difference between the platforms. The Titan has one 4M FFT length with a great result, while the 590 has a different 4M FFT length that is quick. This easily explains why the 590s are now 2.5 times slower than a Titan. But since the 590s can do 2 exponents simultaneously, they still hold up against the Titan. Also, 590s are cheap these days.
One of the Titans is right now computing a very detailed FFT length overview from 3.5M to 4.5M, as this is the range of FFTs I am looking at for the exponents I am testing. It will take time to get this list, so I am waiting patiently for the Titan to process it. I will post my findings after some tests. I am of course hoping for a new FFT length that gives me even better performance. We'll see.
Last fiddled with by Manpowre on 2013-10-20 at 12:25
I checked 20k FFT lengths, and the quickest one between 3.5M and 4.5M is 4096000, at 0.888446 msec on the Titan. (The 590 has a different quickest FFT length in this range.)
This FFT length of 4096000 also shows up when benchmarking with CUDALucas using a step of 1024 between FFT lengths. Nice confirmation...
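For anyone who wants to narrow such a scan before timing anything, here is a small sketch that lists candidate lengths in a range. It assumes candidates are multiples of 1024 (the step mentioned above) and keeps only lengths whose prime factors are 2, 3, 5 and 7, the sizes the cuFFT documentation describes as its best-optimized case. The function names and the exact range bounds are my own choices for illustration. The list says nothing about which length is actually fastest on a given card; that still has to be measured.

```python
from typing import List


def is_7_smooth(n: int) -> bool:
    """True if n has no prime factors other than 2, 3, 5 and 7."""
    if n < 1:
        return False
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1


def candidate_fft_lengths(low: int, high: int, step: int = 1024) -> List[int]:
    """List multiples of `step` in [low, high] that are 7-smooth."""
    first = -(-low // step) * step  # round `low` up to a multiple of step
    return [n for n in range(first, high + 1, step) if is_7_smooth(n)]
```

For example, candidate_fft_lengths(3_500_000, 4_500_000) includes 4096000 (2^15 * 5^3), so the length reported above would be among the candidates to time.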
Why did you start the FFT search from a value close to the original one? I am asking because: could one find the best FFT by scanning a whole range, say from 1024 to 3276800, and then pick the best one (defined as the fastest one that also throws no error, or an error below 0.12)? This has certainly happened in my own runs: the error was below 0.1 with fft_picked smaller, or much smaller, than fft_automatic.

If you have a bag full of 10 cm nails and want to put them into a fence, you may try to do this very fast with a 100-kiloton pneumatic hammer, be done in five minutes, and at the end be unable to find the fence in the rubble you created. Or you can try what is written on the bag: "use a half-kilo manual hammer to insert these nails into the fence". Sometimes the half kilo can be too heavy for you: if you hammer all day with it, your hand will be in terrible pain. Or, if you are a strong guy, it can be too light and hurt your productivity. So you can "tune" the hammer between, say, 300 grams and 800 grams, to be more productive and get less pain in your muscles. For sure you won't use a 5-gram jeweler's hammer, otherwise you may need to hammer all your life for that bag of nails. Nor will a one-ton hammer be any good, if you still need the fence at the end... So, stay around the prescription on the bag. Why? Well... to understand why, you need to learn the subtleties of FFT multiplication...
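In code, "staying around the prescription on the bag" could look like the sketch below: take benchmark rows, keep only FFT lengths within a window around the default length, drop anything whose roundoff error is too high, and pick the fastest of what is left. The row layout, the function name and the 10% window are assumptions made for this example. The 0.12 error limit is simply the figure from the question above, not a recommended value.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical row layout: (fft_length, msec_per_iteration, max_roundoff_error)
Row = Tuple[int, float, float]


def pick_fft(rows: Iterable[Row],
             default_fft: int,
             max_error: float = 0.12,
             window: float = 0.10) -> Optional[int]:
    """Return the fastest FFT length near default_fft with acceptable error.

    Only lengths within +/- `window` (as a fraction) of default_fft are
    considered, so a tiny but "fast" length far from the default is never
    picked. Returns None if no row qualifies.
    """
    lo = default_fft * (1 - window)
    hi = default_fft * (1 + window)
    usable = [r for r in rows if lo <= r[0] <= hi and r[2] < max_error]
    if not usable:
        return None
    # Fastest remaining row, judged by time per iteration
    return min(usable, key=lambda r: r[1])[0]
```

The window is the hammer tuning from the analogy: a little lighter or heavier than the default is fine, a jeweler's hammer is not.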
 
