Jan 2020
Finland
I just recently upgraded my computer to a 9900K.
I ran a couple of tests on wattage versus throughput. I only tested the 2880K FFT because that's the size I am double-checking at the moment.
Results for the 9900K @ 3.6 GHz, all cores:
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (1 core, 1 worker): 11.45 ms. Throughput: 87.36 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (2 cores, 1 worker): 5.98 ms. Throughput: 167.18 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (2 cores, 2 workers): 11.71, 11.60 ms. Throughput: 171.63 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (4 cores, 1 worker): 3.22 ms. Throughput: 310.47 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (4 cores, 2 workers): 7.12, 7.06 ms. Throughput: 281.97 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (4 cores, 4 workers): 14.38, 14.23, 14.26, 14.28 ms. Throughput: 279.98 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (6 cores, 1 worker): 2.73 ms. Throughput: 366.08 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (6 cores, 2 workers): 6.80, 6.79 ms. Throughput: 294.25 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (6 cores, 4 workers): 21.49, 21.41, 10.72, 10.71 ms. Throughput: 279.89 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (8 cores, 1 worker): 2.76 ms. Throughput: 362.61 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (8 cores, 2 workers): 7.24, 7.23 ms. Throughput: 276.43 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (8 cores, 4 workers): 14.98, 15.23, 15.15, 15.02 ms. Throughput: 265.06 iter/sec.
FFTlen=2880K, Type=3, Arch=4, Pass1=320, Pass2=9216, clm=4 (8 cores, 8 workers): 30.88, 30.40, 31.11, 30.18, 30.40, 30.64, 30.48, 30.48 ms. Throughput: 261.71 iter/sec.
Double-checking work with 6 cores, 1 worker draws 62 W, as reported by HWMonitor.
Same test on the 9900K with a 4.7 GHz boost on all cores: 6 cores, 1 worker was still the fastest at 380 iter/sec, and power usage was 125 W.
Conclusion: doubling the power consumption only results in about a 4% performance increase in Prime95 (366 to 380 iter/sec).
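To put numbers on that conclusion, here is a minimal Python sketch using only the two 6-core, 1-worker figures quoted above. The variable and function names are made up for illustration, and the 380 iter/sec value is the rounded figure from the post, so the percentages are approximate.

```python
# Figures quoted in the post (9900K, 2880K FFT, 6 cores, 1 worker).
stock = {"iter_per_sec": 366.08, "watts": 62.0}   # 3.6 GHz all cores
boost = {"iter_per_sec": 380.0, "watts": 125.0}   # 4.7 GHz all cores (rounded)


def relative_change(new, old):
    """Fractional change from old to new, e.g. 0.05 means +5%."""
    return new / old - 1.0


throughput_gain = relative_change(boost["iter_per_sec"], stock["iter_per_sec"])
power_increase = relative_change(boost["watts"], stock["watts"])

print(f"throughput:    {throughput_gain:+.1%}")
print(f"package power: {power_increase:+.1%}")
```

With the rounded inputs the throughput gain comes out a little under 4% for roughly twice the package power.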
Feb 2005
Conclusion: Your CPU is waiting on your memory to provide data.
A faster-clocked CPU, in this case, just hurries up and waits. You could try *under*clocking the CPU if you are looking for peak efficiency; I imagine you could drop wattage by 10% or more while still waiting on memory a bit.
Dec 2018
I have an i5-8400 with crappy slow OEM memory at work. It's a 6-core processor, but I'm actually running Prime95 on just 4 cores because the throughput was best at that setting. So it is starved for memory bandwidth even sooner.

Jan 2020
Finland
I did some more testing on the 9900K.
I overclocked the memory to 3600 MHz at 1.38 V and got it stable. I tested different CPU speeds from 800 MHz to 4000 MHz. There is no need to go faster because memory starts to bottleneck. The fastest result was 455.32 iter/sec with 6 cores @ 4000 MHz, consuming 84.5 W. That works out to 5.39 iters/watt. For peak efficiency per watt, 1500 MHz managed 283.27 iter/sec at 25 W. That's 11.33 iters/watt! For anyone interested, I've attached all the data that I collected in a spreadsheet.
Note that if you compute iterations/sec per Watt then the output unit is iterations/Joule (because 1 Watt = 1 Joule/sec). 
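As a rough illustration of the unit, the sketch below recomputes iterations/Joule for the two 9900K data points quoted above and converts that into energy for a fixed amount of work. The iteration count is a hypothetical round number chosen only for the example, not a figure from the thread, and the function names are illustrative.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def iters_per_joule(iter_per_sec, watts):
    """(iter/s) / (J/s) = iter/J."""
    return iter_per_sec / watts


def kwh_for(iterations, iter_per_sec, watts):
    """Energy in kWh to run a given number of iterations at steady state."""
    seconds = iterations / iter_per_sec
    return seconds * watts / JOULES_PER_KWH


fastest = (455.32, 84.5)         # 6 cores @ 4000 MHz, from the post
most_efficient = (283.27, 25.0)  # 1500 MHz, from the post

# Hypothetical workload size, used only to make the comparison concrete.
HYPOTHETICAL_ITERATIONS = 50_000_000

for label, point in (("fastest", fastest), ("most efficient", most_efficient)):
    print(f"{label}: {iters_per_joule(*point):.2f} iter/J, "
          f"{kwh_for(HYPOTHETICAL_ITERATIONS, *point):.2f} kWh")
```

This only counts CPU package power as reported in the post; it says nothing about the rest of the system.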

iterations/Joule is an interesting measure, but a great deal depends on the fft length.
And on whether the system's many auxiliary loads are fed by those Joules, or only the cpu. Where is the power consumption measured, at the wall plug, the cpu's sensors, or elsewhere? Computational effort per iteration is O(n log n log log n), not constant. 
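To see why iterations/Joule at one FFT length cannot be compared directly with another, here is a small sketch of the O(n log n log log n) growth. It treats n as the FFT length and ignores all constant factors and memory effects, which is an assumption made purely for illustration; real Prime95 timings will not follow it exactly.

```python
import math


def relative_effort(n):
    """Asymptotic cost model n * log n * log log n, constants ignored."""
    return n * math.log2(n) * math.log2(math.log2(n))


fft_2880k = 2880 * 1024
fft_5760k = 5760 * 1024  # hypothetical comparison point: double the length

ratio = relative_effort(fft_5760k) / relative_effort(fft_2880k)
print(f"modelled effort per iteration grows by a factor of {ratio:.2f}")
```

Under this model, doubling the FFT length slightly more than doubles the work per iteration, so iterations/Joule should drop by a bit more than half even if nothing else changes.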
Jan 2020
Finland
Power usage was measured from the CPU's sensors (package power) with HWMonitor while the first FFT implementation was running in the Prime95 benchmark, and I averaged it by eye. Say I measured the wattage to be 25 W; in that case it was actually fluctuating between 24.7 and 25.3. Random spikes were ignored; I figured they were probably background processes occasionally consuming CPU cycles. There was little difference in consumption between the different FFT implementations, but I didn't bother taking measurements for every implementation. It would have been too time-consuming.
The whole point of the iters/joule measure was to find the most power-efficient speed for the CPU. I would have guessed it to be at the slowest speed and lowest core voltage, but that was not the case. At first I changed CPU speeds in the BIOS with voltages on auto, but for some reason the voltages didn't go lower than 0.9 V. Then I found a program called ThrottleStop, which lets you change the CPU speed on the fly from Windows. The voltages at the different speeds were the stock voltages the CPU asked for.
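If the sensor readings were logged instead of averaged by eye, a median would be one simple way to ignore occasional spikes. The sketch below is only an illustration: the sample readings are invented for the example (they are not measurements from this thread), and how the readings get logged is left open.

```python
import statistics

# Invented package-power readings in watts, including one background spike.
hypothetical_readings = [24.7, 25.3, 25.0, 24.9, 31.2, 25.1, 24.8]

mean_watts = statistics.mean(hypothetical_readings)
median_watts = statistics.median(hypothetical_readings)

# The mean is pulled upward by the spike; the median is not.
print(f"mean:   {mean_watts:.2f} W")
print(f"median: {median_watts:.2f} W")
```

The median here does roughly what "ignoring random spikes" does by eye, as long as spikes make up a minority of the samples.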

Stock Ryzen 9 3900X with dual DDR4-3200 (dual rank).
Code:
Prime95 64-bit version 29.8, RdtscTiming=1
Timings for 2880K FFT length (12 cores, 1 worker): 1.33 ms. Throughput: 750.80 iter/sec.
Timings for 2880K FFT length (12 cores, 2 workers): 2.10, 2.10 ms. Throughput: 954.10 iter/sec.
Oliver
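For anyone reading these benchmark lines, the throughput figure appears to be the sum over workers of 1000 divided by each worker's ms per iteration. That relationship is inferred from the numbers posted in this thread rather than taken from the Prime95 documentation, and the printed timings are rounded, so a recomputed value only matches approximately. A minimal sketch:

```python
def throughput(ms_per_iter_by_worker):
    """Combined iter/sec, assuming each worker runs independently."""
    return sum(1000.0 / ms for ms in ms_per_iter_by_worker)


# Timings and reported throughput quoted from the 3900X output above.
one_worker = throughput([1.33])          # reported: 750.80 iter/sec
two_workers = throughput([2.10, 2.10])   # reported: 954.10 iter/sec

print(f"12 cores, 1 worker:  {one_worker:.2f} iter/sec (from rounded timing)")
print(f"12 cores, 2 workers: {two_workers:.2f} iter/sec (from rounded timings)")

# Gain from splitting into two workers, using the reported figures.
gain = 954.10 / 750.80 - 1.0
print(f"2 workers vs 1 worker: {gain:+.1%}")
```

On the reported figures, two workers give about 27% more throughput than one on this chip, which is the opposite of what the 9900K numbers earlier in the thread show.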
I decided that my computers were using too much power, so I have started tuning them for efficiency. My first set of results is for an i3-9100.
With the setting I chose to use, mprime speed dropped by less than 10% with almost a 46% drop in power use. In addition to the significant drop in power use, the computer fans spin much slower now and make significantly less noise. The one thing I should have measured during the testing but didn't was the CPU temperature. I will look into adding that in the future. Most likely the temperature is much lower with the reduced power use. See the attached picture for the details. 
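As a back-of-the-envelope check on what that trade buys, the sketch below treats the post's "less than 10%" and "almost 46%" as exactly 10% and 46%. Those rounded bounds are an assumption; the actual measurements are in the attached picture, which is not reproduced here.

```python
# Rounded bounds taken from the wording of the post, not exact measurements.
speed_drop = 0.10   # mprime speed dropped by less than 10%
power_drop = 0.46   # power use dropped by almost 46%

relative_speed = 1.0 - speed_drop
relative_power = 1.0 - power_drop

# Work done per joule relative to the original settings.
efficiency_gain = relative_speed / relative_power
# Energy needed for the same amount of work, relative to the original.
energy_per_work = relative_power / relative_speed

print(f"work per joule:  x{efficiency_gain:.2f}")
print(f"energy per unit of work: {energy_per_work:.0%} of original")
```

So even with the slower speed, the same amount of work would take roughly 60% of the original energy under these assumptions.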
