In my experience, Prime95 is usually limited by something other than the CPU (here, memory bandwidth),
so the CPU is not 100% busy and does not draw all the power it can.
Here is one example from my i5-3470, rated at 77 W TDP.
With 1 DIMM (single channel), the entire system draws 67 W and gets these benchmark results:
Timings for 4096K FFT length (1 cpu, 1 worker): 24.78 ms. Throughput: 40.36 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 31.09, 31.08 ms. Throughput: 64.34 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 41.77, 41.73, 41.81 ms. Throughput: 71.82 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 56.00, 57.24, 56.07, 56.25 ms. Throughput: 70.94 iter/sec.

Notice that 4 workers get no more work done (in iter/sec) than 3 workers; throughput actually drops slightly, from 71.82 to 70.94 iter/sec.
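The throughput figures can be rebuilt from the per-worker timings, assuming (as the numbers above suggest, not from Prime95 documentation) that the reported throughput is the sum of 1000 / (ms per iteration) over all workers. A minimal Python sketch; the names `single_dimm` and `throughput` are just illustrative:

```python
# Per-worker timings (ms per iteration) from the single-DIMM benchmark above.
single_dimm = {
    1: [24.78],
    2: [31.09, 31.08],
    3: [41.77, 41.73, 41.81],
    4: [56.00, 57.24, 56.07, 56.25],
}

def throughput(timings_ms):
    """Total iter/sec, assuming each worker completes 1000 / t iterations per second."""
    return sum(1000.0 / t for t in timings_ms)

for workers, timings in single_dimm.items():
    # Each worker gets slower as more workers compete for the single memory channel.
    print(f"{workers} workers: {throughput(timings):.2f} iter/sec")
```

Under that assumption the sums reproduce the reported 40.36, 64.34, 71.82 and 70.94 iter/sec, so the fourth worker's extra iterations are entirely cancelled by every worker slowing down.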
With 2 DIMMs in 2 channels (dual channel), the system draws 80 W and gets these benchmark results:
Timings for 4096K FFT length (1 cpu, 1 worker): 24.09 ms. Throughput: 41.51 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 25.11, 24.77 ms. Throughput: 80.20 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 27.38, 27.38, 27.50 ms. Throughput: 109.42 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 32.22, 32.11, 32.09, 32.08 ms. Throughput: 124.51 iter/sec.

If the CPU were the bottleneck, we would expect 4 workers to get 4x the
throughput of 1 worker, i.e. 4 x 41.51 = 166 iter/sec, instead of the 124.5 iter/sec
I am getting.
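The same comparison can be expressed as a scaling efficiency: actual throughput divided by (workers x single-worker throughput). This is a small sketch using the reported throughputs from both runs; the `reported` table and `scaling_efficiency` helper are hypothetical names for illustration:

```python
# Reported throughputs (iter/sec) from the two benchmark runs, indexed by worker count.
reported = {
    "1 DIMM": {1: 40.36, 2: 64.34, 3: 71.82, 4: 70.94},
    "2 DIMMs": {1: 41.51, 2: 80.20, 3: 109.42, 4: 124.51},
}

def scaling_efficiency(results, workers):
    """Fraction of ideal linear scaling: actual / (workers * 1-worker throughput)."""
    ideal = workers * results[1]
    return results[workers] / ideal

for config, results in reported.items():
    for n in sorted(results):
        ideal = n * results[1]
        eff = scaling_efficiency(results, n)
        print(f"{config}, {n} workers: ideal {ideal:.1f}, "
              f"actual {results[n]:.2f}, efficiency {eff:.0%}")
```

By this measure, 4 workers reach roughly 75% of linear scaling with 2 DIMMs but only about 44% with 1 DIMM, which is consistent with memory bandwidth, not the CPU, being the limit.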
The 4096K FFT length is used for exponents around 70,000,000.
Other types of work, such as double-checking, trial factoring, or P-1 factoring,
may give different results.
Hope this helps.