2019-09-08, 15:44  #1
"Sam"
Jun 2019
California, USA
30_{10} Posts 
CPU Energy Efficiency for Prime95
I just picked up an Intel Core i9-9900T CPU (8C/16T, 35W TDP), and in combination with 32GB of DDR4-3600 dual rank memory and an ASRock Z390 Phantom Gaming-ITX/ac motherboard it managed to achieve some decent throughput figures:
Timings for 2048K FFT length (8 cores, 1 worker): 1.11 ms. Throughput: 904.50 iter/sec.
Timings for 2304K FFT length (8 cores, 1 worker): 1.47 ms. Throughput: 680.55 iter/sec.
Timings for 2400K FFT length (8 cores, 1 worker): 1.83 ms. Throughput: 545.20 iter/sec.
Timings for 2560K FFT length (8 cores, 1 worker): 1.95 ms. Throughput: 512.50 iter/sec.
Timings for 2688K FFT length (8 cores, 1 worker): 2.03 ms. Throughput: 492.94 iter/sec.
Timings for 2880K FFT length (8 cores, 1 worker): 2.29 ms. Throughput: 437.28 iter/sec.
Timings for 3072K FFT length (8 cores, 1 worker): 2.42 ms. Throughput: 413.06 iter/sec.
Timings for 3200K FFT length (8 cores, 1 worker): 2.55 ms. Throughput: 391.66 iter/sec.
Timings for 3360K FFT length (8 cores, 1 worker): 2.82 ms. Throughput: 354.72 iter/sec.
Timings for 3456K FFT length (8 cores, 1 worker): 2.82 ms. Throughput: 353.99 iter/sec.
Timings for 3584K FFT length (8 cores, 1 worker): 2.95 ms. Throughput: 339.23 iter/sec.
Timings for 3840K FFT length (8 cores, 1 worker): 3.14 ms. Throughput: 318.70 iter/sec.
Timings for 4096K FFT length (8 cores, 1 worker): 3.47 ms. Throughput: 288.33 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker): 3.92 ms. Throughput: 255.15 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker): 3.88 ms. Throughput: 258.06 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker): 4.34 ms. Throughput: 230.54 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker): 4.53 ms. Throughput: 220.61 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker): 4.80 ms. Throughput: 208.33 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker): 5.39 ms. Throughput: 185.67 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker): 5.61 ms. Throughput: 178.22 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker): 5.98 ms. Throughput: 167.09 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker): 6.19 ms. Throughput: 161.55 iter/sec.
Timings for 6912K FFT length (8 cores, 1 worker): 6.55 ms. Throughput: 152.76 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker): 6.47 ms. Throughput: 154.53 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker): 7.02 ms. Throughput: 142.46 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker): 7.46 ms. Throughput: 134.00 iter/sec.
Timings for 8192K FFT length (8 cores, 1 worker): 7.51 ms. Throughput: 133.24 iter/sec.
This CPU strikes me as quite energy efficient at running Prime95. Its throughput is similar to that of a Core i7-8700K CPU (6C/12T, 95W TDP) but at nearly 1/3 the power dissipation. It would be interesting to determine which modern CPU can achieve the highest efficiency for running Prime95, essentially an iters/sec per watt metric. The CPUs delivering the highest throughput numbers tend not to be the most energy efficient. My i7-5960X rig with quad channel DDR4 memory is currently the fastest among my systems, but this CPU consumes almost 120W while running Prime95, and its throughput is nowhere close to 3X that of the i9-9900T.
Upcoming Intel Cascade Lake-X CPUs may provide an efficiency improvement over current gen Core X CPUs, so I'll be watching those closely. Mobile CPUs with high core counts may also be candidates for the energy efficiency crown.
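For anyone who wants to turn a benchmark dump like the one above into the proposed iters/sec per watt figure, here is a minimal Python sketch. It assumes each benchmark entry sits on one "Timings for ... Throughput: ..." line as posted, and it assumes steady-state package power equals the 35W TDP, which is an assumption until measured. The names `parse_benchmark` and `iters_per_sec_per_watt` are made up for this illustration and are not part of Prime95.

```python
import re

# Matches one benchmark entry in the format posted above.
ENTRY = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\):"
    r"\s*([\d.]+) ms\.\s+Throughput: ([\d.]+) iter/sec\."
)

def parse_benchmark(text):
    """Return a list of (fft_k, ms_per_iter, iters_per_sec) tuples."""
    return [
        (int(fft), float(ms), float(ips))
        for fft, _cores, _workers, ms, ips in ENTRY.findall(text)
    ]

def iters_per_sec_per_watt(iters_per_sec, package_watts):
    """Efficiency metric: throughput divided by package power."""
    return iters_per_sec / package_watts

# Two entries copied from the post; paste the full dump here instead.
SAMPLE = (
    "Timings for 2048K FFT length (8 cores, 1 worker): 1.11 ms. "
    "Throughput: 904.50 iter/sec. "
    "Timings for 5120K FFT length (8 cores, 1 worker): 4.53 ms. "
    "Throughput: 220.61 iter/sec."
)

# Assumption: steady-state package power equals the 35 W TDP.
ASSUMED_PACKAGE_WATTS = 35.0

rows = parse_benchmark(SAMPLE)
for fft_k, ms, ips in rows:
    eff = iters_per_sec_per_watt(ips, ASSUMED_PACKAGE_WATTS)
    print(f"{fft_k}K FFT: {eff:.2f} iter/sec per watt")
```

Swapping in a measured package power for `ASSUMED_PACKAGE_WATTS` would make the figure comparable across systems.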
2019-09-08, 16:31  #2
Jun 2003
2^{3}×607 Posts 
Keep in mind that 35W TDP != 35W max power consumption. Have you tried to measure the actual power at the wall? Obviously, the total system power will be much higher, but the CPU itself might draw much more than 35W at full load, especially when using AVX.

2019-09-08, 16:44  #3
"Sam"
Jun 2019
California, USA
2×3×5 Posts 
Yes, I'm fully aware of that.
When Prime95 is started from idle, the i9-9900T engages Turbo Boost and consumes ~70W package power for several seconds, as measured using HWMonitor. The package power then goes back down close to 35W and stays there. I did not override any default CPU settings such as AVX offset in the BIOS, so once the CPU runs out of power & thermal headroom with Turbo Boost, it reduces the frequencies of the cores and the package power returns to TDP level. Total system power is of course much higher than 35W, with memory, chipset, graphics, storage, VRs, etc. all consuming power in addition to the CPU itself, but this is true for any computer system.
Last fiddled with by scan80269 on 2019-09-08 at 16:50
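To put a number on how little the initial Turbo Boost burst matters over a long run, here is a rough Python sketch of the time-averaged package power. The 70W and 35W levels are from the post above; the 10 second burst length is only a placeholder for "several seconds", and the function name is made up for this example.

```python
def average_package_power(burst_watts, burst_seconds, steady_watts, total_seconds):
    """Time-weighted mean power for a run that starts with one turbo burst."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    # The burst cannot last longer than the run itself.
    burst = min(burst_seconds, total_seconds)
    energy_joules = burst_watts * burst + steady_watts * (total_seconds - burst)
    return energy_joules / total_seconds

# 70 W and 35 W are the reported levels; 10 s is an assumed burst length.
for run_seconds in (60, 3600, 86400):
    avg = average_package_power(70.0, 10.0, 35.0, run_seconds)
    print(f"{run_seconds:>6} s run: {avg:.2f} W average")
```

Under these assumptions the average converges on the steady-state level as the run gets longer, which is why dividing by steady-state package power is reasonable for multi-day exponent tests.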
2019-09-08, 16:48  #4
"Sam Laur"
Dec 2018
Turku, Finland
2·3·5·11 Posts 
Also see how the clock speed behaves during the run. As I recall, the 35W "TDP" limited parts run at full speed for some time (a few seconds up to tens of seconds at most, depending on the motherboard manufacturer's parameters in the BIOS) and then throttle the clock lower if CPU demand stays high. And if the cooling is designed for 35 watts, it will probably also hit some sort of thermal throttling during a longer run. So check with a full exponent test, not just a throughput benchmark.

2019-09-08, 17:11  #5
"Composite as Heck"
Oct 2017
1371_{8} Posts 
Probably the 3900X or 3950X is the most energy efficient consumer CPU, assuming the 64MB of cache is a big deal. Epyc Zen 2 will be the best once server CPUs are included, due to running in the sweet spot of the power curve and packing the compute power more densely, therefore having less overhead per iteration and probably better utilising the sweet spot of the PSU. But if we include GPUs, the Radeon VII beats all reasonable options.

2019-09-08, 17:17  #6
Apr 2019
5×41 Posts 
Quote:
I could definitely see the EPYC doing well though; 8 channels per socket, I think?

2019-09-08, 17:24  #7
"Sam"
Jun 2019
California, USA
1E_{16} Posts 
Your description is spot on.
Intel CPU Turbo Boost frequencies for all cores correspond to a power level way higher than TDP, especially when running AVX. The Turbo Boost duration for desktop Intel CPUs is typically no more than a few seconds by default, after which the CPU core frequencies go down to bring the steady state package power consumption in line with TDP. The attached screenshot is from my i7-8700T CPU running Prime95 on exponent 86622433. The cores are at 2.7GHz most of the time, which is a bit higher than the 2.4GHz "base frequency", but nowhere near the 4.0GHz "max Turbo frequency". Steady state package power fluctuates slightly but is always very close to the 35W TDP.
Thermal throttling is an entirely different thing, and occurs when the CPU internal temperature reaches "PROCHOT", typically 100C but possibly higher or lower depending on CPU model. As long as the thermal solution can move TDP level heat away from the CPU, the core/package temperatures should not reach PROCHOT. My i7-8700T has a giant NOFAN CR-95C passive cooler as its thermal solution, and the CPU package and core temperatures stay below 60C while running Prime95 24x7.
Last fiddled with by scan80269 on 2019-09-08 at 17:26
2019-09-08, 17:44  #8
"Sam"
Jun 2019
California, USA
1E_{16} Posts 
Quote:
I suspect the i9-9900T with high speed DDR4 memory (e.g. 3600 dual rank) may be hard to beat in efficiency, since the TDP is only 35W, so CPUs with higher TDP will need to deliver several times the throughput to come out on top. For example, even with AVX-512 and 6-channel memory, I doubt that a Xeon W-3175X platform can achieve >7.2X the throughput of what I posted for the i9-9900T/DDR4-3600. Wouldn't mind being proven wrong, though.
Perhaps the way to compare is to simply take the iters/sec figure for leading edge exponents (5120K FFT??) divided by the CPU steady state package power, with optimized thread and worker counts for each CPU.
Last fiddled with by scan80269 on 2019-09-08 at 17:45
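The "several times the throughput" argument is just a power ratio, which the following Python sketch spells out. The reference values (220.61 iter/sec at 5120K FFT, 35W) and the ~120W figure for the i7-5960X come from the first post; treating TDP as steady-state package power is an assumption, and the function names are invented for this example.

```python
def break_even_multiple(candidate_watts, reference_watts):
    """Throughput multiple a candidate needs just to tie on iters/sec per watt."""
    return candidate_watts / reference_watts

def is_more_efficient(cand_ips, cand_watts, ref_ips, ref_watts):
    """True when the candidate delivers more iters/sec per watt than the reference."""
    return cand_ips / cand_watts > ref_ips / ref_watts

# Reference point from the thread: 5120K FFT result at an assumed 35 W.
REF_IPS = 220.61
REF_WATTS = 35.0

# A ~120 W CPU (the figure quoted for the i7-5960X) has to clear this bar.
multiple = break_even_multiple(120.0, REF_WATTS)
print(f"120 W CPU needs > {multiple:.2f}x, i.e. > {multiple * REF_IPS:.1f} iter/sec")

# Package power at which a 7.2x throughput multiple is the break-even point.
print(f"7.2x corresponds to {7.2 * REF_WATTS:.0f} W")
```

A candidate only wins if its measured throughput multiple exceeds its measured power multiple, so both numbers need to come from the same steady-state run.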

2019-09-08, 17:53  #9
Feb 2016
UK
623_{8} Posts 
On Intel, TDP is the maximum power required to run at base clock at elevated but within-spec temperatures. Most enthusiast level systems with adequate cooling will boost and remain at PL2 indefinitely, which is somewhere above TDP. This is considered in-spec by Intel; it isn't overclocking. Where TDP really comes into play is in thermally limited systems, such as laptops and horrible systems from box shifters like Dell. Because of the limited cooling potential, they allow a short boost above TDP before pulling back to it. Clocks will probably be below max turbo, but don't necessarily have to drop all the way down to base.
As for efficiency, it mostly comes down to where on the efficiency curve you run a CPU. Lower clocks at lower voltage help a huge amount, if you can throw enough cores at it. So a direct comparison between Intel and AMD isn't trivial.
I'm doing the new Fermat divisor project on PrimeGrid at the moment. Based on CPU self-reported power, I can also look at average work unit time, and work out production over time. Combined, I can work out the number of tasks per kWh. Units at time of writing are quite small, 120k-128k FFT, running 1 task per core.
3600: 328 units/kWh
3700X: 430 units/kWh
6700K: 230 units/kWh
E5-2683 v3: 293 units/kWh
Mainstream consumer CPUs tend to be more biased towards clock than efficiency, which is in part why there are lower power versions available too. The 3600 and 3700X reported near enough the same power running 6 and 8 tasks respectively, and clocks only differed by about 100 MHz. I don't know if they used a better bin on the 3700X, but it certainly is more efficient.
It's been on my "to do" list for a while to see what sort of trade-offs can be had by essentially underclocking/undervolting. It might be simpler to run a lower power limit and let the CPU take care of it.
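For reference, the units/kWh arithmetic described above can be sketched in a few lines of Python. This is my reading of the method (tasks finished per hour divided by energy used per hour), not the poster's actual script, and the input numbers below are placeholders rather than measurements from the thread.

```python
def units_per_kwh(concurrent_tasks, avg_task_seconds, package_watts):
    """Work units completed per kWh of CPU package energy.

    concurrent_tasks: tasks running at once (e.g. 1 per core)
    avg_task_seconds: mean wall-clock time to finish one task
    package_watts:    self-reported CPU package power while loaded
    """
    units_per_hour = concurrent_tasks * 3600.0 / avg_task_seconds
    kwh_per_hour = package_watts / 1000.0
    return units_per_hour / kwh_per_hour

# Placeholder inputs for illustration only: 8 tasks, 600 s each, 90 W.
example = units_per_kwh(concurrent_tasks=8, avg_task_seconds=600.0, package_watts=90.0)
print(f"{example:.0f} units/kWh")
```

Note this only counts package power, so PSU losses, memory and the rest of the platform are not included, same as with the figures quoted above.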
2019-09-08, 19:23  #10
"Eric"
Jan 2018
USA
2^{2}×53 Posts 
Quote:


2019-09-08, 21:34  #11
Feb 2016
UK
13·31 Posts 
At the risk of taking this on a tangent, I presume that is for large Mersenne tasks. I know there have been attempts at implementing e.g. LLR on GPU with... performance that wasn't anything to talk about. I haven't kept up to date, but presume nothing has changed recently. I'm left wondering if this is a GPU limitation, a code limitation, or a math limitation? For now, CPUs are still optimal for many forms of prime finding.
