
Quote:
Originally Posted by vsuite
Are there any settings to make an i7 quad core with hyperthreading seem like an 8 core machine so I can benchmark with 5 or 6 LL threads please?

I asked this question because I wonder whether running 4 threads on a 4-core hyperthreaded i7 gives the maximum throughput.
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3990.82 MHz, 4 hyperthreaded cores
Timings for 1024K FFT length (4 cpus, 4 workers): 6.75, 6.77, 6.88, 6.78 ms. Throughput: 588.75 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers): 7.01, 6.99, 7.03, 7.01 ms. Throughput: 570.72 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 8.54, 8.51, 8.58, 8.54 ms. Throughput: 468.27 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers): 8.78, 8.76, 8.91, 8.76 ms. Throughput: 454.29 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 10.16, 10.20, 10.27, 10.23 ms. Throughput: 391.61 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 10.53, 10.52, 10.63, 10.50 ms. Throughput: 379.35 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 12.42, 12.43, 12.86, 12.34 ms. Throughput: 319.79 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 12.73, 13.51, 12.64, 14.93 ms. Throughput: 298.67 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 13.71, 13.86, 13.92, 13.72 ms. Throughput: 289.80 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 14.34, 14.27, 14.39, 14.22 ms. Throughput: 279.59 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 17.96, 17.96, 18.06, 17.86 ms. Throughput: 222.71 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 18.63, 18.62, 18.31, 18.17 ms. Throughput: 217.05 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 21.53, 21.79, 21.47, 21.45 ms. Throughput: 185.53 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 22.09, 22.24, 22.09, 22.43 ms. Throughput: 180.10 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 24.98, 25.53, 25.20, 25.22 ms. Throughput: 158.54 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 26.14, 25.63, 25.88, 25.93 ms. Throughput: 154.49 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 28.66, 28.68, 28.96, 28.73 ms. Throughput: 139.11 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 29.71, 29.33, 29.84, 29.37 ms. Throughput: 135.31 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 35.66, 35.97, 36.01, 35.79 ms. Throughput: 111.55 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 38.51, 38.96, 36.54, 38.47 ms. Throughput: 104.99 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 42.15, 42.54, 42.02, 41.96 ms. Throughput: 94.86 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 43.98, 43.97, 44.13, 43.62 ms. Throughput: 91.06 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 49.11, 49.92, 49.27, 49.16 ms. Throughput: 81.03 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 52.03, 51.71, 51.90, 51.76 ms. Throughput: 77.15 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 56.63, 56.62, 56.66, 56.55 ms. Throughput: 70.65 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 58.61, 57.98, 59.05, 58.59 ms. Throughput: 68.31 iter/sec.
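The throughput figures in these logs appear to be the sum of each worker's iteration rate, i.e. 1000 divided by its time in ms. That formula is an assumption rather than something taken from the Prime95 documentation, but it reproduces the reported values to within the rounding of the printed timings. A minimal sketch in Python (`throughput` is a made-up helper name):

```python
def throughput(timings_ms):
    """Sum of per-worker iteration rates, in iterations per second.

    Assumed formula: each worker contributes 1000 / (its time in ms).
    """
    return sum(1000.0 / t for t in timings_ms)


# 1024K FFT timings copied from the log above.
no_ht = throughput([6.75, 6.77, 6.88, 6.78])   # reported: 588.75 iter/sec
with_ht = throughput([7.01, 6.99, 7.03, 7.01])  # reported: 570.72 iter/sec

print(f"4 cpus:                {no_ht:.2f} iter/sec")
print(f"4 cpus hyperthreaded:  {with_ht:.2f} iter/sec")
```

The small differences from the logged numbers are expected, since the log only prints the timings to two decimals.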
Throughput is similar to the "4 cpus hyperthreaded" case if Prime95 is made to think it is running on an 8-core CPU:
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3979.14 MHz, 8 cores
Timings for 1024K FFT length (8 cpus, 8 workers): 14.20, 13.91, 14.07, 14.08, 14.35, 13.97, 14.38, 13.89 ms. Throughput: 567.23 iter/sec.
Timings for 1280K FFT length (8 cpus, 8 workers): 18.64, 17.00, 17.49, 17.46, 17.68, 17.52, 17.84, 17.46 ms. Throughput: 453.90 iter/sec.
Timings for 1536K FFT length (8 cpus, 8 workers): 21.18, 21.10, 21.17, 21.04, 21.33, 20.82, 21.49, 21.08 ms. Throughput: 378.25 iter/sec.
Timings for 1792K FFT length (8 cpus, 8 workers): 27.10, 24.77, 25.65, 25.48, 25.63, 25.29, 26.03, 25.21 ms. Throughput: 312.13 iter/sec.
Timings for 2048K FFT length (8 cpus, 8 workers): 30.11, 27.77, 27.79, 27.79, 28.52, 28.33, 29.09, 28.77 ms. Throughput: 280.67 iter/sec.
Timings for 2560K FFT length (8 cpus, 8 workers): 39.15, 36.08, 36.51, 36.54, 37.20, 36.74, 37.56, 37.24 ms. Throughput: 215.59 iter/sec.
[Mon Dec 05 23:57:12 2016]
Timings for 3072K FFT length (8 cpus, 8 workers): 44.09, 44.07, 43.48, 43.49, 44.80, 43.98, 45.17, 44.55 ms. Throughput: 181.01 iter/sec.
Timings for 3584K FFT length (8 cpus, 8 workers): 54.59, 50.56, 50.89, 50.90, 51.72, 50.77, 51.83, 51.47 ms. Throughput: 155.15 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 58.57, 58.75, 58.54, 58.74, 60.00, 58.68, 59.38, 59.15 ms. Throughput: 135.65 iter/sec.
Timings for 5120K FFT length (8 cpus, 8 workers): 78.79, 73.09, 73.40, 73.41, 74.53, 73.62, 74.60, 74.00 ms. Throughput: 107.54 iter/sec.
Timings for 6144K FFT length (8 cpus, 8 workers): 90.20, 83.53, 88.52, 88.50, 86.48, 85.48, 86.64, 85.88 ms. Throughput: 92.10 iter/sec.
Timings for 7168K FFT length (8 cpus, 8 workers): 118.17, 93.58, 104.24, 104.27, 100.25, 99.53, 100.65, 99.77 ms. Throughput: 78.31 iter/sec.
Timings for 8192K FFT length (8 cpus, 8 workers): 119.92, 112.98, 119.11, 119.10, 114.97, 113.93, 115.30, 114.60 ms. Throughput: 68.86 iter/sec.
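To put "similar" into numbers, the reported throughputs can be compared against the plain 4-worker run. The sketch below uses only figures copied from the logs above; `relative_change` is a hypothetical helper name and the choice of the 1024K and 8192K rows is arbitrary.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value / baseline - 1.0) * 100.0


# (4 cpus, 4 cpus hyperthreaded, 8 cpus / 8 workers) in iter/sec, from the logs.
reported = {
    "1024K": (588.75, 570.72, 567.23),
    "8192K": (70.65, 68.31, 68.86),
}

for fft, (base, ht4, w8) in reported.items():
    print(f"{fft}: 4 workers HT {relative_change(base, ht4):+.2f}%, "
          f"8 workers {relative_change(base, w8):+.2f}%")
```

Both alternative configurations come out a few percent below the plain 4-worker run for these two FFT lengths.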
Throughput improves slightly when NUMCPUs is set to 7, but then drops again at 6, 5 and even 4. All else being equal, throughput should improve rather than drop.
Prime95 is not optimized to run 6, 5 or 4 threads on a 4-core hyperthreaded 4790 when the chip is treated as a 6-, 5- or 4-core chip.
In the notation below, each pair is one physical core's two hardware threads and "o" marks an idle one.
There is only one way to use 7 threads, which is to release one thread of the 8: AB CD EF Go. Arrangements such as AB Co DE FG and oA BC DE FG are considered equivalent.
There are 2 ways to use 6 threads: AB CD EF oo and AE BF Co Do. The latter is better. Threads E and F are free to be assigned to any core, but each of A, B, C and D is assigned to a specific core.
There are 2 ways to use 5 threads: AB CD Eo oo and AE Bo Co Do. The latter is better. Thread E is free to be assigned to any core, but each of A, B, C and D is assigned to a specific core.
There are 3 ways to use 4 threads: AB CD oo oo, AB Co Do oo, and Ao Bo Co Do. The last is the best. Each of A, B, C and D is assigned to a specific core.
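The counts above can be checked by enumerating how many threads land on each core, treating cores as interchangeable. This is only an illustrative sketch: `placements` and `most_spread` are made-up names, and the rule "prefer the layout with the fewest doubly loaded cores" is the assumption argued for above, not something Prime95 is known to do.

```python
from itertools import combinations_with_replacement


def placements(threads, cores=4, slots_per_core=2):
    """Distinct ways to spread threads over cores, ignoring core order.

    Each result is a tuple of per-core thread counts in non-increasing order.
    """
    loads = range(slots_per_core, -1, -1)  # 2, 1, 0
    return [combo
            for combo in combinations_with_replacement(loads, cores)
            if sum(combo) == threads]


def most_spread(threads):
    """Layout with the fewest cores running two threads (assumed best)."""
    return min(placements(threads), key=lambda p: p.count(2))


for n in (7, 6, 5, 4):
    print(n, placements(n), "-> prefer", most_spread(n))
```

For 6 threads, for example, (2, 2, 2, 0) corresponds to AB CD EF oo and (2, 2, 1, 1) to AE BF Co Do.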
I guess Prime95 is already optimized to choose Ao Bo Co Do whenever 4 threads are run simultaneously.
What I recommend is that processing be optimized to run 5 threads as above, or even 6 threads.
I postulate that there should be a slight increase in total throughput at 5 threads [possibly dependent on the memory system] and then a drop-off up to 8 threads. I don't think Prime95 allows benchmarking of this specific condition.
