mersenneforum.org  

Old 2020-02-09, 20:48   #34
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

13·487 Posts

I am a little confused by what's happening here:

Code:
Timing 20480K FFT, 12 cores, 2 workers.  Average times: 50.92, 50.64 ms.  Total throughput: 39.39 iter/sec.
[Worker #1 Feb 4 17:37] Timing 20480K FFT, 12 cores, 3 workers.  Average times: 103.31, 102.76, 50.81 ms.  Total throughput: 39.09 iter/sec.
where for what is presumably four cores per worker we get two slow jobs and one quick one, with the quick one being just as fast as the previous line's six cores per worker, and the slow ones being half that speed. Is there non-uniform memory access within each socket?
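For reference, the "Total throughput" figure in these benchmark lines appears to be the sum over workers of 1000 divided by each worker's average time in ms. A minimal Python sketch, assuming that relationship holds (the helper name `recompute_throughput` is made up for illustration and is not part of Prime95), checks it against the two lines above:

```python
import re

# The two benchmark lines quoted above (log prefix removed).
LINES = [
    "Timing 20480K FFT, 12 cores, 2 workers.  Average times: 50.92, 50.64 ms.  Total throughput: 39.39 iter/sec.",
    "Timing 20480K FFT, 12 cores, 3 workers.  Average times: 103.31, 102.76, 50.81 ms.  Total throughput: 39.09 iter/sec.",
]


def recompute_throughput(line: str) -> tuple[float, float]:
    """Return (reported, recomputed) throughput in iter/sec for one line.

    Assumption: throughput = sum over workers of 1000 / average_time_ms.
    """
    times_part = re.search(r"Average times:\s*([\d.,\s]+?)\s*ms", line).group(1)
    times_ms = [float(t) for t in times_part.split(",")]
    reported = float(re.search(r"Total throughput:\s*([\d.]+)", line).group(1))
    recomputed = sum(1000.0 / t for t in times_ms)
    return reported, recomputed


for line in LINES:
    reported, recomputed = recompute_throughput(line)
    print(f"reported {reported:.2f} iter/sec, recomputed {recomputed:.2f} iter/sec")
```

If the assumption is right, the one fast worker in the 3-worker run contributes about as much as the two slow workers combined, which is why the total barely changes between the two lines.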
Old 2020-02-09, 22:36   #35
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3,209 Posts

Quote:
Originally Posted by fivemack View Post
I am a little confused by what's happening here:

Code:
Timing 20480K FFT, 12 cores, 2 workers.  Average times: 50.92, 50.64 ms.  Total throughput: 39.39 iter/sec.
[Worker #1 Feb 4 17:37] Timing 20480K FFT, 12 cores, 3 workers.  Average times: 103.31, 102.76, 50.81 ms.  Total throughput: 39.09 iter/sec.
where for what is presumably four cores per worker we get two slow jobs and one quick one, with the quick one being just as fast as the previous line's six cores per worker, and the slow ones being half that speed. Is there non-uniform memory access within each socket?
Maybe the workers landed like this:
CPU1 4 cores worker 3; 1 core worker 2; 1 core worker 1
CPU2 3 cores worker 2; 3 cores worker 1

Worker 3 would have faster communication between all its cores.

Last fiddled with by kriesel on 2020-02-09 at 22:39
Old 2020-02-10, 02:16   #36
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1101000010011₂ Posts

Quote:
Originally Posted by fivemack View Post
I am a little confused by what's happening here:

Code:
Timing 20480K FFT, 12 cores, 2 workers.  Average times: 50.92, 50.64 ms.  Total throughput: 39.39 iter/sec.
[Worker #1 Feb 4 17:37] Timing 20480K FFT, 12 cores, 3 workers.  Average times: 103.31, 102.76, 50.81 ms.  Total throughput: 39.09 iter/sec.
where for what is presumably four cores per worker we get two slow jobs and one quick one, with the quick one being just as fast as the previous line's six cores per worker, and the slow ones being half that speed. Is there non-uniform memory access within each socket?
Prime95 won't split a worker across L3 caches or CPU sockets. I assume your 12 cores are really two 6-core chiplets. If so, then workers #1 and #2 would each get 3 cores on one chiplet while worker #3 would get 6 cores on the other chiplet.
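To make the arithmetic concrete, here is a rough Python sketch of an allocation rule that would produce that split. It is only an illustration of the explanation above, not Prime95's actual code: the function name `sketch_core_split` and the "spread workers over groups, then divide each group's cores" rule are assumptions.

```python
def sketch_core_split(cores_per_group: list[int], n_workers: int) -> list[int]:
    """Hypothetical rule: never split a worker across groups (L3 caches or sockets).

    Workers are spread over the groups as evenly as possible (earlier groups
    take the extras), then each group's cores are divided among its workers.
    Returns the number of cores given to each worker, in worker order.
    """
    n_groups = len(cores_per_group)
    base, extra = divmod(n_workers, n_groups)
    split = []
    for g, cores in enumerate(cores_per_group):
        workers_here = base + (1 if g < extra else 0)
        if workers_here == 0:
            continue  # no worker landed on this group
        per_worker, leftover = divmod(cores, workers_here)
        for w in range(workers_here):
            split.append(per_worker + (1 if w < leftover else 0))
    return split


# Two groups of 6 cores in use, as assumed in the post above.
print(sketch_core_split([6, 6], 2))
print(sketch_core_split([6, 6], 3))
```

Under this rule, 3 workers on two 6-core groups get 3, 3 and 6 cores, which would be consistent with two workers at roughly 103 ms and one at roughly 51 ms, the same time a 6-core worker took in the 2-worker run.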
Old 2020-02-10, 03:08   #37
axn
 
Jun 2003

2²·3·373 Posts

Quote:
Originally Posted by Prime95 View Post
Prime95 won't split a worker across L3 caches or CPU sockets. I assume your 12 cores are really two 6-core chiplets. If so, then workers #1 and #2 would each get 3 cores on one chiplet while worker #3 would get 6 cores on the other chiplet.
Close. They are dual socket Xeons, each with 12 cores. Presumably the same explanation still applies, i.e. Workers 1 & 2 on socket 1 with 3 cores each, and Worker 3 on socket 2 with 6 cores. Although, I suppose the question is, why didn't it do 4-4-4, since there were spare cores available in each socket.
Old 2020-02-10, 04:00   #38
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

110010001001₂ Posts

Quote:
Originally Posted by axn View Post
Close. They are dual socket Xeons, each with 12 cores. Presumably the same explanation still applies, i.e. Workers 1 & 2 on socket 1 with 3 cores each, and Worker 3 on socket 2 with 6 cores. Although, I suppose the question is, why didn't it do 4-4-4, since there were spare cores available in each socket.
pkg 1: 4 & 4; sharing memory bandwidth
pkg 2: 4; not sharing memory bandwidth
Old 2020-02-10, 06:08   #39
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

3·5²·89 Posts

Quote:
Originally Posted by axn View Post
Close. They are dual socket Xeons, each with 12 cores. Presumably the same explanation still applies, i.e. Workers 1 & 2 on socket 1 with 3 cores each, and Worker 3 on socket 2 with 6 cores. Although, I suppose the question is, why didn't it do 4-4-4, since there were spare cores available in each socket.
I'm not sure what the algorithm does when you use fewer than all the cores -- not a case I was really designing for.
Old 2020-02-13, 08:30   #40
jas
 
"Simon Josefsson"
Jan 2020
Stockholm

11 Posts

Quote:
Originally Posted by jas View Post
Thank you -- this makes sense, and I have ordered more memory from ebay now. Will do more testing once I get it.

I have now received 8x8GB DDR4-2133 and re-ran the 20480K FFT benchmarks. Performance now scales properly with the number of cores, up to 24 cores. The number of workers (2-24) does not seem to matter: 24 cores with 2 workers yields 115 iter/s, and 24 cores with 24 workers yields 117 iter/s. Is this normal?



I think I will go with 2 workers and 24 cores to get a faster turnaround of results.
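A quick back-of-the-envelope sketch of that trade-off in Python. It assumes total throughput is shared evenly between workers and uses a made-up, round iteration count per test (`ITERATIONS_PER_TEST` is hypothetical, not a real exponent); only the 115 and 117 iter/s figures come from the benchmark:

```python
SECONDS_PER_DAY = 86400.0
ITERATIONS_PER_TEST = 100_000_000  # hypothetical round number, for illustration only


def days_per_test(total_iter_per_sec: float, n_workers: int,
                  iterations: int = ITERATIONS_PER_TEST) -> float:
    """Wall-clock days for one worker to finish one test.

    Assumption: every worker gets an equal share of the total throughput.
    """
    per_worker_rate = total_iter_per_sec / n_workers
    return iterations / per_worker_rate / SECONDS_PER_DAY


few = days_per_test(115.0, 2)    # 24 cores, 2 workers
many = days_per_test(117.0, 24)  # 24 cores, 24 workers
print(f"2 workers:  {few:.1f} days per test")
print(f"24 workers: {many:.1f} days per test")
print(f"latency ratio: {many / few:.1f}x")
```

Total work done per day is nearly the same in both setups; what changes is how long each individual test takes, which is the argument for using fewer workers.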


/Simon

Code:
Timing 20480K FFT, 1 core, 1 worker.  Average times: 132.36 ms.  Total throughput:  7.55 iter/sec.
Timing 20480K FFT, 2 cores, 1 worker.  Average times: 69.19 ms.  Total throughput: 14.45 iter/sec.
Timing 20480K FFT, 2 cores, 2 workers.  Average times: 132.58, 132.45 ms.  Total throughput: 15.09 iter/sec.
Timing 20480K FFT, 3 cores, 1 worker.  Average times: 48.95 ms.  Total throughput: 20.43 iter/sec.
Timing 20480K FFT, 3 cores, 2 workers.  Average times: 67.98, 132.73 ms.  Total throughput: 22.24 iter/sec.
Timing 20480K FFT, 3 cores, 3 workers.  Average times: 133.80, 133.10, 137.11 ms.  Total throughput: 22.28 iter/sec.
Timing 20480K FFT, 4 cores, 1 worker.  Average times: 36.83 ms.  Total throughput: 27.15 iter/sec.
Timing 20480K FFT, 4 cores, 2 workers.  Average times: 69.86, 69.49 ms.  Total throughput: 28.71 iter/sec.
Timing 20480K FFT, 4 cores, 3 workers.  Average times: 134.28, 133.20, 67.70 ms.  Total throughput: 29.73 iter/sec.
Timing 20480K FFT, 4 cores, 4 workers.  Average times: 138.31, 133.19, 136.47, 133.65 ms.  Total throughput: 29.55 iter/sec.
Timing 20480K FFT, 5 cores, 1 worker.  Average times: 29.83 ms.  Total throughput: 33.53 iter/sec.
Timing 20480K FFT, 5 cores, 2 workers.  Average times: 48.41, 68.36 ms.  Total throughput: 35.29 iter/sec.
Timing 20480K FFT, 5 cores, 3 workers.  Average times: 142.63, 72.09, 68.86 ms.  Total throughput: 35.40 iter/sec.
Timing 20480K FFT, 5 cores, 4 workers.  Average times: 145.07, 73.36, 133.23, 133.19 ms.  Total throughput: 35.54 iter/sec.
Timing 20480K FFT, 5 cores, 5 workers.  Average times: 142.55, 141.35, 143.04, 135.20, 135.13 ms.  Total throughput: 35.88 iter/sec.
Timing 20480K FFT, 6 cores, 1 worker.  Average times: 25.94 ms.  Total throughput: 38.55 iter/sec.
Timing 20480K FFT, 6 cores, 2 workers.  Average times: 48.68, 48.69 ms.  Total throughput: 41.08 iter/sec.
Timing 20480K FFT, 6 cores, 3 workers.  Average times: 142.82, 72.64, 48.95 ms.  Total throughput: 41.20 iter/sec.
Timing 20480K FFT, 6 cores, 4 workers.  Average times: 143.11, 71.86, 142.39, 72.03 ms.  Total throughput: 41.81 iter/sec.
Timing 20480K FFT, 6 cores, 5 workers.  Average times: 143.06, 141.80, 143.11, 143.67, 72.75 ms.  Total throughput: 41.74 iter/sec.
Timing 20480K FFT, 6 cores, 6 workers.  Average times: 144.38, 141.73, 142.95, 146.63, 146.15, 146.02 ms.  Total throughput: 41.49 iter/sec.
Timing 20480K FFT, 7 cores, 1 worker.  Average times: 22.81 ms.  Total throughput: 43.84 iter/sec.
Timing 20480K FFT, 7 cores, 2 workers.  Average times: 36.59, 48.21 ms.  Total throughput: 48.07 iter/sec.
Timing 20480K FFT, 7 cores, 3 workers.  Average times: 74.03, 73.91, 49.93 ms.  Total throughput: 47.07 iter/sec.
Timing 20480K FFT, 7 cores, 4 workers.  Average times: 74.00, 73.84, 145.59, 74.23 ms.  Total throughput: 47.39 iter/sec.
Timing 20480K FFT, 7 cores, 5 workers.  Average times: 145.49, 143.13, 72.93, 143.32, 72.59 ms.  Total throughput: 48.32 iter/sec.
Timing 20480K FFT, 7 cores, 6 workers.  Average times: 144.42, 143.04, 73.28, 143.75, 142.97, 143.26 ms.  Total throughput: 48.49 iter/sec.
Timing 20480K FFT, 7 cores, 7 workers.  Average times: 144.21, 143.15, 143.06, 144.38, 143.58, 143.14, 142.93 ms.  Total throughput: 48.78 iter/sec.
Timing 20480K FFT, 8 cores, 1 worker.  Average times: 20.51 ms.  Total throughput: 48.76 iter/sec.
Timing 20480K FFT, 8 cores, 2 workers.  Average times: 36.86, 36.86 ms.  Total throughput: 54.26 iter/sec.
Timing 20480K FFT, 8 cores, 3 workers.  Average times: 73.06, 72.95, 36.88 ms.  Total throughput: 54.51 iter/sec.
Timing 20480K FFT, 8 cores, 4 workers.  Average times: 73.28, 72.94, 73.56, 73.51 ms.  Total throughput: 54.55 iter/sec.
Timing 20480K FFT, 8 cores, 5 workers.  Average times: 144.73, 144.33, 73.04, 73.47, 73.43 ms.  Total throughput: 54.76 iter/sec.
Timing 20480K FFT, 8 cores, 6 workers.  Average times: 145.45, 142.96, 73.26, 144.39, 144.45, 73.51 ms.  Total throughput: 54.97 iter/sec.
Timing 20480K FFT, 8 cores, 7 workers.  Average times: 145.92, 143.29, 143.21, 144.83, 145.18, 144.54, 73.57 ms.  Total throughput: 55.12 iter/sec.
Timing 20480K FFT, 8 cores, 8 workers.  Average times: 149.16, 143.05, 143.03, 144.44, 147.75, 147.55, 147.36, 147.92 ms.  Total throughput: 54.70 iter/sec.
Timing 20480K FFT, 9 cores, 1 worker.  Average times: 18.65 ms.  Total throughput: 53.62 iter/sec.
Timing 20480K FFT, 9 cores, 2 workers.  Average times: 30.36, 37.43 ms.  Total throughput: 59.66 iter/sec.
Timing 20480K FFT, 9 cores, 3 workers.  Average times: 75.58, 50.70, 38.01 ms.  Total throughput: 59.27 iter/sec.
Timing 20480K FFT, 9 cores, 4 workers.  Average times: 74.76, 49.79, 73.61, 73.74 ms.  Total throughput: 60.61 iter/sec.
Timing 20480K FFT, 9 cores, 5 workers.  Average times: 147.22, 74.28, 73.90, 73.41, 73.38 ms.  Total throughput: 61.04 iter/sec.
Timing 20480K FFT, 9 cores, 6 workers.  Average times: 148.75, 73.94, 74.21, 144.92, 144.51, 73.55 ms.  Total throughput: 61.14 iter/sec.
Timing 20480K FFT, 9 cores, 7 workers.  Average times: 147.33, 145.59, 145.78, 74.12, 144.48, 144.58, 73.87 ms.  Total throughput: 61.38 iter/sec.
Timing 20480K FFT, 9 cores, 8 workers.  Average times: 148.60, 145.69, 145.63, 74.84, 147.36, 146.53, 146.55, 146.97 ms.  Total throughput: 61.06 iter/sec.
Timing 20480K FFT, 9 cores, 9 workers.  Average times: 148.51, 145.62, 146.64, 147.24, 144.51, 144.70, 144.37, 145.60, 144.62 ms.  Total throughput: 61.75 iter/sec.
Timing 20480K FFT, 10 cores, 1 worker.  Average times: 18.23 ms.  Total throughput: 54.87 iter/sec.
Timing 20480K FFT, 10 cores, 2 workers.  Average times: 30.52, 30.57 ms.  Total throughput: 65.48 iter/sec.
Timing 20480K FFT, 10 cores, 3 workers.  Average times: 75.03, 49.79, 30.11 ms.  Total throughput: 66.63 iter/sec.
Timing 20480K FFT, 10 cores, 4 workers.  Average times: 75.61, 50.63, 75.48, 50.59 ms.  Total throughput: 65.99 iter/sec.
Timing 20480K FFT, 10 cores, 5 workers.  Average times: 147.14, 75.07, 74.91, 76.38, 51.24 ms.  Total throughput: 66.07 iter/sec.
Timing 20480K FFT, 10 cores, 6 workers.  Average times: 147.91, 74.01, 74.51, 147.48, 74.14, 74.02 ms.  Total throughput: 67.47 iter/sec.
Timing 20480K FFT, 10 cores, 7 workers.  Average times: 147.26, 145.79, 146.46, 75.03, 149.65, 76.49, 75.99 ms.  Total throughput: 66.72 iter/sec.
Timing 20480K FFT, 10 cores, 8 workers.  Average times: 146.90, 145.39, 146.41, 73.96, 146.72, 146.65, 146.65, 74.02 ms.  Total throughput: 68.00 iter/sec.
Timing 20480K FFT, 10 cores, 9 workers.  Average times: 146.88, 146.07, 146.14, 147.32, 144.23, 146.74, 146.56, 147.22, 74.35 ms.  Total throughput: 68.10 iter/sec.
Timing 20480K FFT, 10 cores, 10 workers.  Average times: 146.92, 146.68, 145.74, 146.68, 144.42, 147.39, 146.53, 146.74, 147.11, 145.91 ms.  Total throughput: 68.30 iter/sec.
Timing 20480K FFT, 11 cores, 1 worker.  Average times: 17.49 ms.  Total throughput: 57.17 iter/sec.
Timing 20480K FFT, 11 cores, 2 workers.  Average times: 25.37, 29.77 ms.  Total throughput: 73.00 iter/sec.
Timing 20480K FFT, 11 cores, 3 workers.  Average times: 51.58, 51.01, 30.12 ms.  Total throughput: 72.19 iter/sec.
Timing 20480K FFT, 11 cores, 4 workers.  Average times: 52.15, 51.69, 76.27, 51.15 ms.  Total throughput: 71.18 iter/sec.
Timing 20480K FFT, 11 cores, 5 workers.  Average times: 77.93, 77.32, 76.51, 76.71, 51.25 ms.  Total throughput: 71.38 iter/sec.
Timing 20480K FFT, 11 cores, 6 workers.  Average times: 77.55, 77.13, 76.10, 149.33, 75.75, 75.32 ms.  Total throughput: 72.17 iter/sec.
Timing 20480K FFT, 11 cores, 7 workers.  Average times: 152.02, 149.30, 77.24, 76.04, 149.39, 75.72, 75.33 ms.  Total throughput: 72.55 iter/sec.
Timing 20480K FFT, 11 cores, 8 workers.  Average times: 151.78, 149.02, 77.10, 76.49, 149.24, 148.89, 149.03, 75.91 ms.  Total throughput: 72.64 iter/sec.
Timing 20480K FFT, 11 cores, 9 workers.  Average times: 150.95, 149.29, 149.44, 149.67, 76.46, 149.67, 149.58, 150.46, 76.33 ms.  Total throughput: 72.89 iter/sec.
Timing 20480K FFT, 11 cores, 10 workers.  Average times: 150.57, 149.09, 149.35, 150.24, 76.26, 149.50, 149.93, 149.55, 149.77, 149.12 ms.  Total throughput: 73.24 iter/sec.
Timing 20480K FFT, 12 cores, 1 worker.  Average times: 17.01 ms.  Total throughput: 58.79 iter/sec.
Timing 20480K FFT, 12 cores, 2 workers.  Average times: 26.20, 26.22 ms.  Total throughput: 76.31 iter/sec.
Timing 20480K FFT, 12 cores, 3 workers.  Average times: 51.60, 50.97, 25.61 ms.  Total throughput: 78.05 iter/sec.
Timing 20480K FFT, 12 cores, 4 workers.  Average times: 52.66, 51.83, 51.88, 51.59 ms.  Total throughput: 76.94 iter/sec.
Timing 20480K FFT, 12 cores, 5 workers.  Average times: 76.59, 76.23, 75.83, 50.89, 50.72 ms.  Total throughput: 78.73 iter/sec.
Timing 20480K FFT, 12 cores, 6 workers.  Average times: 77.09, 76.84, 75.93, 76.61, 76.65, 75.83 ms.  Total throughput: 78.44 iter/sec.
Timing 20480K FFT, 12 cores, 7 workers.  Average times: 151.09, 149.59, 76.60, 75.65, 76.82, 76.51, 75.68 ms.  Total throughput: 78.88 iter/sec.
Timing 20480K FFT, 12 cores, 8 workers.  Average times: 152.80, 149.68, 76.65, 75.80, 151.35, 150.42, 77.12, 76.06 ms.  Total throughput: 78.83 iter/sec.
Timing 20480K FFT, 12 cores, 9 workers.  Average times: 151.51, 150.18, 150.37, 151.59, 75.64, 151.19, 150.68, 76.79, 75.90 ms.  Total throughput: 79.17 iter/sec.
Timing 20480K FFT, 12 cores, 10 workers.  Average times: 151.53, 150.43, 150.27, 150.73, 75.60, 150.43, 150.29, 150.99, 150.42, 75.99 ms.  Total throughput: 79.50 iter/sec.
Timing 20480K FFT, 12 cores, 12 workers.  Average times: 150.76, 149.40, 149.95, 153.83, 149.97, 149.19, 150.02, 153.54, 150.14, 153.05, 149.12, 151.95 ms.  Total throughput: 79.53 iter/sec.
Timing 20480K FFT, 13 cores, 1 worker.  Average times: 15.58 ms.  Total throughput: 64.19 iter/sec.
Timing 20480K FFT, 13 cores, 2 workers.  Average times: 22.61, 25.69 ms.  Total throughput: 83.15 iter/sec.
Timing 20480K FFT, 13 cores, 3 workers.  Average times: 53.55, 40.16, 26.20 ms.  Total throughput: 81.74 iter/sec.
Timing 20480K FFT, 13 cores, 4 workers.  Average times: 53.08, 39.50, 50.83, 50.70 ms.  Total throughput: 83.55 iter/sec.
Timing 20480K FFT, 13 cores, 5 workers.  Average times: 79.50, 79.00, 52.55, 51.15, 51.14 ms.  Total throughput: 83.38 iter/sec.
Timing 20480K FFT, 13 cores, 6 workers.  Average times: 79.23, 78.67, 52.60, 76.88, 77.02, 76.09 ms.  Total throughput: 83.48 iter/sec.
Timing 20480K FFT, 13 cores, 7 workers.  Average times: 156.57, 79.07, 78.80, 78.29, 76.80, 76.91, 76.00 ms.  Total throughput: 83.68 iter/sec.
Timing 20480K FFT, 13 cores, 8 workers.  Average times: 156.30, 78.53, 78.17, 77.96, 149.84, 149.99, 76.08, 75.38 ms.  Total throughput: 84.50 iter/sec.
Timing 20480K FFT, 13 cores, 9 workers.  Average times: 155.69, 154.14, 155.19, 79.63, 78.99, 153.83, 153.54, 78.81, 77.88 ms.  Total throughput: 83.12 iter/sec.
Timing 20480K FFT, 13 cores, 10 workers.  Average times: 155.67, 153.87, 153.94, 78.97, 78.06, 150.37, 150.46, 151.12, 150.62, 75.96 ms.  Total throughput: 84.61 iter/sec.
Timing 20480K FFT, 13 cores, 12 workers.  Average times: 157.13, 153.79, 154.22, 155.59, 154.30, 78.13, 150.61, 150.55, 151.48, 151.65, 149.38, 149.77 ms.  Total throughput: 84.91 iter/sec.
Timing 20480K FFT, 14 cores, 1 worker.  Average times: 14.81 ms.  Total throughput: 67.54 iter/sec.
Timing 20480K FFT, 14 cores, 2 workers.  Average times: 23.11, 23.02 ms.  Total throughput: 86.70 iter/sec.
Timing 20480K FFT, 14 cores, 3 workers.  Average times: 53.67, 40.10, 22.94 ms.  Total throughput: 87.15 iter/sec.
Timing 20480K FFT, 14 cores, 4 workers.  Average times: 52.84, 39.47, 52.94, 39.34 ms.  Total throughput: 88.57 iter/sec.
Timing 20480K FFT, 14 cores, 5 workers.  Average times: 79.53, 78.92, 52.55, 53.21, 39.78 ms.  Total throughput: 88.21 iter/sec.
Timing 20480K FFT, 14 cores, 6 workers.  Average times: 80.03, 79.74, 52.81, 79.43, 80.00, 52.97 ms.  Total throughput: 87.94 iter/sec.
Timing 20480K FFT, 14 cores, 7 workers.  Average times: 155.62, 78.94, 79.07, 77.99, 78.99, 79.15, 52.52 ms.  Total throughput: 88.90 iter/sec.
Timing 20480K FFT, 14 cores, 8 workers.  Average times: 155.59, 78.90, 78.52, 78.36, 155.77, 79.23, 78.69, 78.22 ms.  Total throughput: 89.13 iter/sec.
Timing 20480K FFT, 14 cores, 9 workers.  Average times: 155.62, 153.97, 155.82, 79.01, 78.38, 155.86, 78.98, 79.07, 78.41 ms.  Total throughput: 89.23 iter/sec.
Timing 20480K FFT, 14 cores, 10 workers.  Average times: 156.42, 156.07, 154.37, 78.95, 78.18, 154.87, 154.88, 155.99, 79.04, 78.43 ms.  Total throughput: 89.46 iter/sec.
Timing 20480K FFT, 14 cores, 12 workers.  Average times: 157.27, 155.25, 154.45, 155.31, 154.07, 78.25, 155.16, 154.55, 155.65, 156.00, 152.82, 78.48 ms.  Total throughput: 90.02 iter/sec.
Timing 20480K FFT, 14 cores, 14 workers.  Average times: 155.68, 154.81, 153.97, 155.60, 153.73, 153.76, 158.06, 158.83, 158.29, 159.32, 158.37, 157.50, 157.28, 156.81 ms.  Total throughput: 89.43 iter/sec.
Timing 20480K FFT, 15 cores, 1 worker.  Average times: 14.59 ms.  Total throughput: 68.52 iter/sec.
Timing 20480K FFT, 15 cores, 2 workers.  Average times: 20.47, 22.85 ms.  Total throughput: 92.62 iter/sec.
Timing 20480K FFT, 15 cores, 3 workers.  Average times: 41.90, 41.24, 22.65 ms.  Total throughput: 92.26 iter/sec.
Timing 20480K FFT, 15 cores, 4 workers.  Average times: 41.35, 40.98, 52.96, 39.53 ms.  Total throughput: 92.77 iter/sec.
Timing 20480K FFT, 15 cores, 5 workers.  Average times: 82.79, 55.40, 54.75, 53.25, 39.85 ms.  Total throughput: 92.27 iter/sec.
Timing 20480K FFT, 15 cores, 6 workers.  Average times: 82.25, 54.59, 54.45, 78.86, 79.09, 52.30 ms.  Total throughput: 93.28 iter/sec.
Timing 20480K FFT, 15 cores, 7 workers.  Average times: 81.82, 81.70, 81.18, 81.06, 79.20, 79.26, 52.70 ms.  Total throughput: 93.33 iter/sec.
Timing 20480K FFT, 15 cores, 8 workers.  Average times: 82.04, 81.90, 81.36, 81.32, 155.37, 79.17, 78.79, 78.27 ms.  Total throughput: 93.52 iter/sec.
Timing 20480K FFT, 15 cores, 9 workers.  Average times: 161.23, 159.46, 81.71, 81.06, 80.97, 154.82, 78.60, 78.66, 77.93 ms.  Total throughput: 94.13 iter/sec.
Timing 20480K FFT, 15 cores, 10 workers.  Average times: 161.20, 159.17, 81.99, 81.18, 81.76, 156.01, 154.65, 155.02, 78.56, 78.37 ms.  Total throughput: 94.05 iter/sec.
Timing 20480K FFT, 15 cores, 12 workers.  Average times: 164.26, 159.09, 159.32, 160.16, 81.82, 81.91, 159.11, 158.45, 157.80, 157.84, 156.80, 80.38 ms.  Total throughput: 93.41 iter/sec.
Timing 20480K FFT, 15 cores, 14 workers.  Average times: 161.56, 159.67, 159.89, 159.68, 158.57, 158.49, 80.98, 153.82, 155.36, 154.03, 154.08, 152.48, 152.35, 152.51 ms.  Total throughput: 95.53 iter/sec.
Timing 20480K FFT, 16 cores, 1 worker.  Average times: 14.38 ms.  Total throughput: 69.54 iter/sec.
Timing 20480K FFT, 16 cores, 2 workers.  Average times: 20.47, 20.48 ms.  Total throughput: 97.67 iter/sec.
Timing 20480K FFT, 16 cores, 3 workers.  Average times: 42.63, 41.78, 20.96 ms.  Total throughput: 95.11 iter/sec.
Timing 20480K FFT, 16 cores, 4 workers.  Average times: 42.45, 42.02, 42.43, 42.10 ms.  Total throughput: 94.67 iter/sec.
Timing 20480K FFT, 16 cores, 5 workers.  Average times: 82.62, 55.14, 54.78, 41.37, 41.24 ms.  Total throughput: 96.91 iter/sec.
Timing 20480K FFT, 16 cores, 6 workers.  Average times: 82.12, 55.03, 54.59, 81.88, 54.90, 54.54 ms.  Total throughput: 97.43 iter/sec.
Timing 20480K FFT, 16 cores, 7 workers.  Average times: 82.19, 81.61, 81.32, 81.37, 81.76, 54.67, 54.42 ms.  Total throughput: 97.91 iter/sec.
Timing 20480K FFT, 16 cores, 8 workers.  Average times: 82.62, 82.36, 81.51, 81.59, 82.36, 82.52, 82.03, 81.76 ms.  Total throughput: 97.45 iter/sec.
Timing 20480K FFT, 16 cores, 9 workers.  Average times: 161.10, 160.27, 82.18, 81.44, 81.61, 82.65, 82.37, 81.37, 81.47 ms.  Total throughput: 97.95 iter/sec.
Timing 20480K FFT, 16 cores, 10 workers.  Average times: 161.02, 159.63, 82.87, 82.15, 82.36, 162.18, 162.76, 83.35, 82.38, 82.63 ms.  Total throughput: 97.41 iter/sec.
Timing 20480K FFT, 16 cores, 12 workers.  Average times: 165.29, 159.29, 160.47, 163.86, 81.86, 83.10, 158.87, 158.77, 160.18, 162.96, 81.79, 81.75 ms.  Total throughput: 98.34 iter/sec.
Timing 20480K FFT, 16 cores, 14 workers.  Average times: 161.25, 159.60, 159.77, 159.99, 158.20, 158.03, 81.00, 159.85, 159.31, 159.25, 160.21, 157.96, 157.80, 80.91 ms.  Total throughput: 100.05 iter/sec.
Timing 20480K FFT, 16 cores, 16 workers.  Average times: 161.69, 160.43, 159.56, 159.41, 158.05, 158.74, 159.04, 157.99, 159.75, 160.68, 160.95, 159.76, 159.27, 158.05, 158.21, 158.22 ms.  Total throughput: 100.41 iter/sec.
Timing 20480K FFT, 17 cores, 1 worker.  Average times: 14.74 ms.  Total throughput: 67.85 iter/sec.
Timing 20480K FFT, 17 cores, 2 workers.  Average times: 19.44, 20.98 ms.  Total throughput: 99.11 iter/sec.
Timing 20480K FFT, 17 cores, 3 workers.  Average times: 44.37, 34.89, 20.69 ms.  Total throughput: 99.54 iter/sec.
Timing 20480K FFT, 17 cores, 4 workers.  Average times: 43.55, 34.62, 41.81, 41.28 ms.  Total throughput: 99.98 iter/sec.
Timing 20480K FFT, 17 cores, 5 workers.  Average times: 58.06, 57.13, 57.44, 41.59, 41.45 ms.  Total throughput: 100.30 iter/sec.
Timing 20480K FFT, 17 cores, 6 workers.  Average times: 58.16, 57.31, 57.19, 81.32, 54.88, 54.47 ms.  Total throughput: 101.01 iter/sec.
Timing 20480K FFT, 17 cores, 7 workers.  Average times: 85.28, 85.49, 85.21, 57.06, 81.77, 54.90, 54.39 ms.  Total throughput: 101.52 iter/sec.
Timing 20480K FFT, 17 cores, 8 workers.  Average times: 86.66, 86.44, 86.17, 58.17, 83.93, 84.20, 82.88, 83.01 ms.  Total throughput: 99.81 iter/sec.
Timing 20480K FFT, 17 cores, 9 workers.  Average times: 167.55, 86.34, 86.38, 85.53, 86.01, 83.31, 83.28, 82.99, 82.86 ms.  Total throughput: 100.57 iter/sec.
Timing 20480K FFT, 17 cores, 10 workers.  Average times: 168.94, 86.17, 85.61, 85.24, 86.10, 162.28, 163.52, 83.03, 82.20, 82.30 ms.  Total throughput: 101.19 iter/sec.
Timing 20480K FFT, 17 cores, 12 workers.  Average times: 168.41, 166.85, 167.53, 85.64, 85.50, 85.66, 160.84, 161.19, 159.75, 160.59, 81.51, 81.82 ms.  Total throughput: 102.35 iter/sec.
Timing 20480K FFT, 17 cores, 14 workers.  Average times: 169.13, 167.51, 166.87, 168.58, 167.30, 85.91, 86.46, 162.53, 163.06, 162.48, 163.28, 161.80, 161.28, 82.83 ms.  Total throughput: 102.01 iter/sec.
Timing 20480K FFT, 17 cores, 16 workers.  Average times: 169.56, 168.26, 167.66, 169.42, 167.31, 167.64, 168.48, 86.43, 159.97, 161.97, 160.19, 159.99, 160.82, 159.92, 159.63, 159.89 ms.  Total throughput: 103.06 iter/sec.
Timing 20480K FFT, 18 cores, 1 worker.  Average times: 14.66 ms.  Total throughput: 68.19 iter/sec.
Timing 20480K FFT, 18 cores, 2 workers.  Average times: 19.34, 19.26 ms.  Total throughput: 103.63 iter/sec.
Timing 20480K FFT, 18 cores, 3 workers.  Average times: 43.57, 34.75, 19.02 ms.  Total throughput: 104.30 iter/sec.
Timing 20480K FFT, 18 cores, 4 workers.  Average times: 44.03, 34.95, 43.28, 34.48 ms.  Total throughput: 103.44 iter/sec.
Timing 20480K FFT, 18 cores, 5 workers.  Average times: 58.12, 57.72, 57.54, 43.48, 34.57 ms.  Total throughput: 103.84 iter/sec.
Timing 20480K FFT, 18 cores, 6 workers.  Average times: 58.78, 58.25, 58.26, 58.52, 58.56, 58.33 ms.  Total throughput: 102.65 iter/sec.
Timing 20480K FFT, 18 cores, 7 workers.  Average times: 86.23, 86.09, 85.46, 58.11, 58.31, 57.85, 58.18 ms.  Total throughput: 103.75 iter/sec.
Timing 20480K FFT, 18 cores, 8 workers.  Average times: 86.93, 86.62, 85.77, 58.48, 87.25, 87.27, 86.61, 58.71 ms.  Total throughput: 103.31 iter/sec.
Timing 20480K FFT, 18 cores, 9 workers.  Average times: 168.62, 86.87, 86.16, 85.88, 86.08, 87.15, 87.16, 86.84, 58.45 ms.  Total throughput: 103.88 iter/sec.
Timing 20480K FFT, 18 cores, 10 workers.  Average times: 169.87, 85.93, 86.16, 84.85, 85.33, 167.83, 86.11, 85.52, 85.30, 85.46 ms.  Total throughput: 105.33 iter/sec.
Timing 20480K FFT, 18 cores, 12 workers.  Average times: 168.40, 167.13, 168.30, 86.16, 85.51, 86.14, 168.58, 168.59, 168.17, 86.45, 86.15, 86.49 ms.  Total throughput: 105.32 iter/sec.
Timing 20480K FFT, 18 cores, 14 workers.  Average times: 171.32, 168.02, 168.58, 170.28, 167.29, 85.61, 86.10, 168.05, 167.85, 168.71, 169.92, 165.83, 85.89, 86.32 ms.  Total throughput: 105.84 iter/sec.
Timing 20480K FFT, 18 cores, 16 workers.  Average times: 170.64, 168.72, 169.02, 168.53, 167.00, 166.65, 167.48, 85.94, 168.57, 167.56, 167.52, 167.89, 166.54, 166.56, 166.10, 85.98 ms.  Total throughput: 106.72 iter/sec.
Timing 20480K FFT, 18 cores, 18 workers.  Average times: 173.88, 170.74, 171.63, 170.91, 166.55, 168.84, 168.74, 168.81, 172.12, 169.01, 169.24, 168.62, 168.61, 166.00, 167.77, 166.76, 166.04, 169.98 ms.  Total throughput: 106.45 iter/sec.
Timing 20480K FFT, 19 cores, 1 worker.  Average times: 14.46 ms.  Total throughput: 69.18 iter/sec.
Timing 20480K FFT, 19 cores, 2 workers.  Average times: 18.57, 19.42 ms.  Total throughput: 105.32 iter/sec.
Timing 20480K FFT, 19 cores, 3 workers.  Average times: 37.03, 36.65, 19.16 ms.  Total throughput: 106.49 iter/sec.
Timing 20480K FFT, 19 cores, 4 workers.  Average times: 37.47, 37.01, 43.75, 34.85 ms.  Total throughput: 105.26 iter/sec.
Timing 20480K FFT, 19 cores, 5 workers.  Average times: 61.89, 61.30, 46.78, 44.49, 35.05 ms.  Total throughput: 104.86 iter/sec.
Timing 20480K FFT, 19 cores, 6 workers.  Average times: 61.40, 61.21, 46.28, 57.91, 57.81, 58.05 ms.  Total throughput: 106.03 iter/sec.
Timing 20480K FFT, 19 cores, 7 workers.  Average times: 91.96, 91.75, 60.72, 61.27, 57.71, 57.50, 57.75 ms.  Total throughput: 106.60 iter/sec.
Timing 20480K FFT, 19 cores, 8 workers.  Average times: 91.80, 91.44, 61.61, 61.80, 87.97, 87.92, 87.23, 59.16 ms.  Total throughput: 105.35 iter/sec.
Timing 20480K FFT, 19 cores, 9 workers.  Average times: 92.37, 91.99, 91.06, 90.73, 91.73, 87.46, 87.61, 86.95, 58.99 ms.  Total throughput: 105.90 iter/sec.
Timing 20480K FFT, 19 cores, 10 workers.  Average times: 92.11, 91.12, 90.86, 90.56, 91.41, 168.25, 85.99, 85.67, 86.36, 86.11 ms.  Total throughput: 107.26 iter/sec.
Timing 20480K FFT, 19 cores, 12 workers.  Average times: 179.30, 176.91, 91.18, 90.34, 90.78, 91.46, 168.15, 168.14, 169.33, 85.40, 85.43, 85.62 ms.  Total throughput: 108.11 iter/sec.
Timing 20480K FFT, 19 cores, 14 workers.  Average times: 180.69, 178.67, 178.41, 180.99, 91.38, 91.37, 92.73, 170.64, 170.09, 171.50, 170.01, 169.42, 86.99, 87.87 ms.  Total throughput: 107.17 iter/sec.
Timing 20480K FFT, 19 cores, 16 workers.  Average times: 181.46, 178.32, 179.32, 179.68, 176.91, 177.75, 91.44, 92.00, 170.16, 168.50, 168.75, 168.94, 167.46, 166.63, 166.76, 86.17 ms.  Total throughput: 108.58 iter/sec.
Timing 20480K FFT, 19 cores, 18 workers.  Average times: 181.75, 178.73, 179.32, 178.97, 177.60, 178.31, 178.70, 178.30, 92.50, 169.39, 169.15, 168.97, 169.80, 167.93, 167.53, 167.73, 167.29, 168.52 ms.  Total throughput: 108.93 iter/sec.
Timing 20480K FFT, 20 cores, 1 worker.  Average times: 14.33 ms.  Total throughput: 69.77 iter/sec.
Timing 20480K FFT, 20 cores, 2 workers.  Average times: 18.53, 18.22 ms.  Total throughput: 108.83 iter/sec.
Timing 20480K FFT, 20 cores, 3 workers.  Average times: 37.37, 36.98, 18.13 ms.  Total throughput: 108.97 iter/sec.
Timing 20480K FFT, 20 cores, 4 workers.  Average times: 37.01, 37.03, 37.21, 36.96 ms.  Total throughput: 107.95 iter/sec.
Timing 20480K FFT, 20 cores, 5 workers.  Average times: 61.87, 60.87, 46.40, 36.82, 36.81 ms.  Total throughput: 108.47 iter/sec.
Timing 20480K FFT, 20 cores, 6 workers.  Average times: 62.08, 61.33, 46.39, 61.07, 60.83, 46.13 ms.  Total throughput: 108.46 iter/sec.
Timing 20480K FFT, 20 cores, 7 workers.  Average times: 91.69, 91.68, 60.44, 61.00, 61.45, 61.07, 46.20 ms.  Total throughput: 109.05 iter/sec.
Timing 20480K FFT, 20 cores, 8 workers.  Average times: 92.44, 92.01, 61.47, 62.06, 92.45, 92.46, 61.70, 62.09 ms.  Total throughput: 108.01 iter/sec.
Timing 20480K FFT, 20 cores, 9 workers.  Average times: 91.65, 92.05, 90.75, 90.92, 91.67, 92.18, 91.61, 61.45, 61.62 ms.  Total throughput: 108.97 iter/sec.
Timing 20480K FFT, 20 cores, 10 workers.  Average times: 91.94, 92.25, 90.78, 91.27, 91.81, 91.25, 92.95, 90.68, 90.93, 92.95 ms.  Total throughput: 109.08 iter/sec.
Timing 20480K FFT, 20 cores, 12 workers.  Average times: 180.43, 178.40, 92.37, 91.60, 92.28, 92.71, 181.11, 180.68, 93.24, 92.15, 92.13, 93.32 ms.  Total throughput: 108.72 iter/sec.
Timing 20480K FFT, 20 cores, 14 workers.  Average times: 180.46, 177.33, 178.84, 178.34, 91.19, 91.21, 91.94, 178.69, 178.02, 178.15, 178.97, 91.48, 91.11, 92.09 ms.  Total throughput: 110.37 iter/sec.
Timing 20480K FFT, 20 cores, 16 workers.  Average times: 181.27, 178.44, 178.09, 179.65, 177.03, 177.42, 91.53, 92.17, 178.04, 177.42, 178.18, 179.06, 176.34, 176.53, 90.91, 92.34 ms.  Total throughput: 110.98 iter/sec.
Timing 20480K FFT, 20 cores, 18 workers.  Average times: 180.52, 178.45, 179.25, 179.46, 177.76, 177.93, 181.03, 178.40, 93.50, 181.01, 179.42, 180.35, 180.58, 179.06, 177.97, 178.08, 178.84, 92.99 ms.  Total throughput: 110.71 iter/sec.
Timing 20480K FFT, 20 cores, 20 workers.  Average times: 181.30, 180.31, 182.31, 180.67, 178.09, 179.82, 180.40, 178.19, 180.55, 180.69, 179.00, 179.90, 179.16, 180.70, 177.94, 177.51, 177.91, 177.36, 181.18, 179.11 ms.  Total throughput: 111.36 iter/sec.
Timing 20480K FFT, 21 cores, 1 worker.  Average times: 14.35 ms.  Total throughput: 69.71 iter/sec.
Timing 20480K FFT, 21 cores, 2 workers.  Average times: 17.74, 18.51 ms.  Total throughput: 110.37 iter/sec.
Timing 20480K FFT, 21 cores, 3 workers.  Average times: 39.95, 32.99, 18.59 ms.  Total throughput: 109.14 iter/sec.
Timing 20480K FFT, 21 cores, 4 workers.  Average times: 39.59, 32.88, 36.72, 36.90 ms.  Total throughput: 110.00 iter/sec.
Timing 20480K FFT, 21 cores, 5 workers.  Average times: 65.15, 48.84, 49.15, 36.93, 36.74 ms.  Total throughput: 110.47 iter/sec.
Timing 20480K FFT, 21 cores, 6 workers.  Average times: 66.09, 49.49, 50.00, 62.62, 62.10, 46.63 ms.  Total throughput: 108.85 iter/sec.
Timing 20480K FFT, 21 cores, 7 workers.  Average times: 97.86, 65.38, 64.89, 65.70, 61.68, 60.63, 46.10 ms.  Total throughput: 110.54 iter/sec.
Timing 20480K FFT, 21 cores, 8 workers.  Average times: 97.24, 65.60, 65.11, 65.97, 92.61, 92.76, 62.22, 62.84 ms.  Total throughput: 109.61 iter/sec.
Timing 20480K FFT, 21 cores, 9 workers.  Average times: 96.95, 96.82, 95.57, 96.09, 65.79, 92.78, 93.42, 61.95, 62.60 ms.  Total throughput: 110.31 iter/sec.
Timing 20480K FFT, 21 cores, 10 workers.  Average times: 96.96, 97.12, 96.18, 96.23, 65.53, 91.26, 90.64, 90.38, 90.31, 91.75 ms.  Total throughput: 111.69 iter/sec.
Timing 20480K FFT, 21 cores, 12 workers.  Average times: 190.41, 97.21, 97.73, 96.47, 97.35, 97.46, 179.36, 178.73, 92.99, 92.14, 92.25, 93.01 ms.  Total throughput: 111.04 iter/sec.
Timing 20480K FFT, 21 cores, 14 workers.  Average times: 190.98, 187.50, 188.09, 99.55, 97.02, 97.27, 98.11, 179.75, 179.81, 181.16, 180.67, 91.37, 92.28, 93.71 ms.  Total throughput: 111.34 iter/sec.
Timing 20480K FFT, 21 cores, 16 workers.  Average times: 191.49, 189.06, 189.27, 190.21, 186.14, 96.77, 97.51, 97.68, 177.95, 178.34, 178.25, 178.40, 177.14, 177.40, 90.96, 92.07 ms.  Total throughput: 112.83 iter/sec.
Timing 20480K FFT, 21 cores, 18 workers.  Average times: 192.17, 190.99, 190.39, 190.35, 187.80, 188.02, 188.34, 97.21, 97.94, 179.70, 177.59, 179.25, 179.78, 177.66, 178.80, 175.08, 177.70, 92.85 ms.  Total throughput: 113.06 iter/sec.
Timing 20480K FFT, 21 cores, 20 workers.  Average times: 193.19, 191.19, 191.17, 190.27, 188.04, 189.27, 188.60, 188.97, 189.88, 98.17, 178.72, 180.79, 177.60, 177.88, 177.22, 176.69, 177.12, 177.79, 179.09, 178.66 ms.  Total throughput: 113.67 iter/sec.
Timing 20480K FFT, 22 cores, 1 worker.  Average times: 14.78 ms.  Total throughput: 67.65 iter/sec.
Timing 20480K FFT, 22 cores, 2 workers.  Average times: 17.88, 17.87 ms.  Total throughput: 111.89 iter/sec.
Timing 20480K FFT, 22 cores, 3 workers.  Average times: 39.48, 32.90, 17.78 ms.  Total throughput: 111.96 iter/sec.
Timing 20480K FFT, 22 cores, 4 workers.  Average times: 39.99, 32.92, 39.57, 32.90 ms.  Total throughput: 111.05 iter/sec.
Timing 20480K FFT, 22 cores, 5 workers.  Average times: 65.21, 48.87, 49.10, 39.18, 32.42 ms.  Total throughput: 112.53 iter/sec.
Timing 20480K FFT, 22 cores, 6 workers.  Average times: 65.45, 48.77, 49.20, 65.47, 48.80, 49.21 ms.  Total throughput: 112.19 iter/sec.
Timing 20480K FFT, 22 cores, 7 workers.  Average times: 97.15, 65.69, 65.25, 65.58, 66.44, 49.43, 49.88 ms.  Total throughput: 111.42 iter/sec.
Timing 20480K FFT, 22 cores, 8 workers.  Average times: 98.22, 65.77, 65.22, 65.89, 97.06, 65.45, 64.36, 65.64 ms.  Total throughput: 112.25 iter/sec.
Timing 20480K FFT, 22 cores, 9 workers.  Average times: 97.91, 97.18, 96.11, 97.70, 65.99, 97.61, 65.80, 65.43, 66.58 ms.  Total throughput: 112.04 iter/sec.
Timing 20480K FFT, 22 cores, 10 workers.  Average times: 97.33, 97.55, 96.81, 97.17, 65.91, 98.28, 98.19, 97.53, 97.33, 66.02 ms.  Total throughput: 112.35 iter/sec.
Timing 20480K FFT, 22 cores, 12 workers.  Average times: 190.56, 96.52, 97.05, 95.97, 96.59, 97.37, 188.09, 96.70, 96.48, 95.80, 97.10, 96.78 ms.  Total throughput: 114.05 iter/sec.
Timing 20480K FFT, 22 cores, 14 workers.  Average times: 192.15, 188.71, 189.87, 97.44, 96.54, 97.38, 97.62, 188.72, 188.02, 189.87, 96.48, 95.89, 96.78, 96.79 ms.  Total throughput: 114.25 iter/sec.
Timing 20480K FFT, 22 cores, 16 workers.  Average times: 189.42, 189.08, 188.56, 190.61, 187.79, 97.10, 97.60, 97.33, 190.07, 189.08, 190.96, 189.30, 188.11, 96.15, 96.97, 97.63 ms.  Total throughput: 114.60 iter/sec.
Timing 20480K FFT, 22 cores, 18 workers.  Average times: 193.86, 190.18, 190.41, 190.56, 189.31, 187.77, 189.42, 97.48, 98.15, 189.66, 188.68, 190.68, 190.14, 188.06, 186.72, 187.90, 97.16, 97.54 ms.  Total throughput: 114.87 iter/sec.
Timing 20480K FFT, 22 cores, 20 workers.  Average times: 191.24, 190.99, 190.41, 192.42, 190.73, 189.40, 190.14, 190.07, 192.24, 98.84, 192.60, 190.80, 191.15, 191.10, 188.52, 188.55, 190.13, 189.10, 190.63, 97.01 ms.  Total throughput: 114.88 iter/sec.
Timing 20480K FFT, 22 cores, 22 workers.  Average times: 193.14, 192.92, 193.39, 192.15, 190.40, 190.75, 190.29, 191.70, 191.94, 192.37, 191.91, 189.43, 190.13, 190.40, 189.13, 188.51, 187.68, 187.53, 188.74, 189.37, 189.12, 191.91 ms.  Total throughput: 115.44 iter/sec.
Timing 20480K FFT, 23 cores, 1 worker.  Average times: 15.10 ms.  Total throughput: 66.20 iter/sec.
Timing 20480K FFT, 23 cores, 2 workers.  Average times: 17.42, 17.82 ms.  Total throughput: 113.53 iter/sec.
Timing 20480K FFT, 23 cores, 3 workers.  Average times: 34.42, 34.52, 17.70 ms.  Total throughput: 114.51 iter/sec.
Timing 20480K FFT, 23 cores, 4 workers.  Average times: 34.64, 34.63, 38.90, 32.44 ms.  Total throughput: 114.27 iter/sec.
Timing 20480K FFT, 23 cores, 5 workers.  Average times: 51.99, 51.50, 52.49, 39.27, 32.62 ms.  Total throughput: 113.83 iter/sec.
Timing 20480K FFT, 23 cores, 6 workers.  Average times: 52.23, 51.52, 52.19, 64.99, 48.69, 49.10 ms.  Total throughput: 114.01 iter/sec.
Timing 20480K FFT, 23 cores, 7 workers.  Average times: 68.93, 69.02, 69.17, 69.04, 64.55, 48.77, 49.07 ms.  Total throughput: 114.31 iter/sec.
Timing 20480K FFT, 23 cores, 8 workers.  Average times: 69.27, 68.93, 69.10, 69.06, 97.12, 65.32, 64.35, 65.17 ms.  Total throughput: 114.39 iter/sec.
Timing 20480K FFT, 23 cores, 9 workers.  Average times: 102.48, 102.94, 101.63, 69.00, 69.62, 98.58, 65.71, 65.65, 66.18 ms.  Total throughput: 113.87 iter/sec.
Timing 20480K FFT, 23 cores, 10 workers.  Average times: 103.19, 102.80, 102.06, 69.16, 69.60, 97.04, 97.17, 95.72, 95.71, 65.39 ms.  Total throughput: 114.83 iter/sec.
Timing 20480K FFT, 23 cores, 12 workers.  Average times: 103.92, 103.06, 101.51, 101.82, 102.97, 102.68, 190.08, 97.74, 96.77, 96.72, 98.02, 97.89 ms.  Total throughput: 115.03 iter/sec.
Timing 20480K FFT, 23 cores, 14 workers.  Average times: 203.38, 200.88, 103.98, 102.17, 102.22, 103.36, 103.50, 190.13, 188.45, 188.65, 96.85, 96.17, 97.08, 97.51 ms.  Total throughput: 115.57 iter/sec.
Timing 20480K FFT, 23 cores, 16 workers.  Average times: 203.91, 202.22, 201.25, 202.52, 102.79, 103.25, 103.98, 103.85, 189.33, 188.93, 189.79, 188.50, 187.50, 96.47, 96.39, 96.77 ms.  Total throughput: 115.97 iter/sec.
Timing 20480K FFT, 23 cores, 18 workers.  Average times: 201.46, 202.25, 200.97, 203.44, 197.94, 197.13, 103.52, 105.28, 104.11, 189.91, 190.35, 190.68, 192.55, 188.30, 191.04, 187.40, 97.84, 97.90 ms.  Total throughput: 115.96 iter/sec.
Timing 20480K FFT, 23 cores, 20 workers.  Average times: 203.42, 201.23, 204.95, 205.08, 201.24, 199.87, 199.54, 200.00, 104.92, 104.46, 193.29, 192.31, 192.57, 193.48, 190.11, 189.77, 191.37, 191.10, 192.83, 99.31 ms.  Total throughput: 115.71 iter/sec.
Timing 20480K FFT, 23 cores, 22 workers.  Average times: 208.68, 206.92, 206.34, 206.14, 201.64, 200.86, 200.84, 201.54, 202.53, 202.12, 105.61, 192.49, 191.58, 192.60, 192.33, 190.82, 190.57, 189.93, 190.43, 193.61, 191.30, 191.38 ms.  Total throughput: 115.98 iter/sec.
Timing 20480K FFT, 24 cores, 1 worker.  Average times: 13.03 ms.  Total throughput: 76.76 iter/sec.
Timing 20480K FFT, 24 cores, 2 workers.  Average times: 17.41, 17.15 ms.  Total throughput: 115.76 iter/sec.
Timing 20480K FFT, 24 cores, 3 workers.  Average times: 34.89, 34.72, 17.00 ms.  Total throughput: 116.29 iter/sec.
Timing 20480K FFT, 24 cores, 4 workers.  Average times: 34.62, 34.58, 34.09, 34.32 ms.  Total throughput: 116.27 iter/sec.
Timing 20480K FFT, 24 cores, 5 workers.  Average times: 52.35, 52.11, 52.30, 34.83, 34.88 ms.  Total throughput: 114.79 iter/sec.
Timing 20480K FFT, 24 cores, 6 workers.  Average times: 51.84, 51.59, 52.05, 51.82, 51.32, 51.98 ms.  Total throughput: 115.91 iter/sec.
Timing 20480K FFT, 24 cores, 7 workers.  Average times: 69.18, 68.77, 68.89, 69.64, 51.71, 51.01, 51.97 ms.  Total throughput: 116.06 iter/sec.
Timing 20480K FFT, 24 cores, 8 workers.  Average times: 69.21, 69.03, 68.68, 69.17, 69.56, 69.22, 69.68, 70.00 ms.  Total throughput: 115.41 iter/sec.
Timing 20480K FFT, 24 cores, 9 workers.  Average times: 103.40, 102.88, 101.65, 69.13, 69.68, 69.88, 69.34, 69.66, 69.65 ms.  Total throughput: 115.49 iter/sec.
Timing 20480K FFT, 24 cores, 10 workers.  Average times: 103.35, 103.55, 101.99, 69.48, 69.96, 103.45, 102.84, 102.47, 69.44, 69.75 ms.  Total throughput: 115.71 iter/sec.
Timing 20480K FFT, 24 cores, 12 workers.  Average times: 102.80, 102.81, 101.98, 102.57, 103.11, 103.10, 102.58, 102.97, 101.55, 102.18, 103.31, 103.22 ms.  Total throughput: 116.87 iter/sec.
Timing 20480K FFT, 24 cores, 14 workers.  Average times: 203.37, 200.08, 102.96, 101.62, 102.40, 103.53, 103.43, 199.81, 199.41, 102.90, 101.49, 101.78, 102.80, 103.45 ms.  Total throughput: 117.37 iter/sec.
Timing 20480K FFT, 24 cores, 16 workers.  Average times: 203.00, 201.21, 202.24, 201.52, 103.07, 102.63, 103.32, 104.19, 200.65, 199.85, 199.14, 199.08, 101.64, 102.27, 103.48, 103.51 ms.  Total throughput: 117.50 iter/sec.
Timing 20480K FFT, 24 cores, 18 workers.  Average times: 205.94, 200.88, 201.82, 203.74, 200.08, 199.77, 103.67, 104.79, 104.75, 201.66, 200.60, 201.52, 201.44, 201.26, 199.32, 103.41, 104.07, 104.14 ms.  Total throughput: 117.17 iter/sec.
Timing 20480K FFT, 24 cores, 20 workers.  Average times: 202.47, 201.40, 203.56, 203.82, 198.64, 199.01, 201.23, 200.29, 104.72, 104.58, 204.21, 202.93, 202.50, 202.24, 201.01, 200.99, 201.12, 201.27, 104.85, 104.77 ms.  Total throughput: 117.54 iter/sec.
Timing 20480K FFT, 24 cores, 22 workers.  Average times: 205.32, 203.54, 202.86, 202.82, 202.32, 201.86, 201.45, 202.20, 204.34, 203.72, 104.57, 201.89, 202.25, 203.05, 202.42, 200.99, 200.76, 200.71, 200.31, 203.17, 202.45, 104.58 ms.  Total throughput: 117.93 iter/sec.
Timing 20480K FFT, 24 cores, 24 workers.  Average times: 208.24, 204.87, 206.90, 203.85, 203.61, 203.74, 203.89, 203.37, 205.71, 206.17, 203.98, 200.84, 206.43, 207.21, 205.88, 205.75, 200.85, 205.02, 200.33, 200.59, 204.70, 203.13, 204.81, 203.15 ms.  Total throughput: 117.49 iter/sec.

Last fiddled with by Uncwilly on 2020-02-13 at 21:41 Reason: changed to code for better reading of text
Old 2020-02-13, 10:34   #41
axn
 
 
Jun 2003

2²×3×373 Posts

Quote:
Originally Posted by jas View Post
I received 8x8GB DDR4-2133 now, and re-ran 20480K FFT benchmarks. The performance now scale by number of cores up to 24 cores properly.
Cool!
Quote:
Originally Posted by jas View Post
The number of workers 2-24 does not seem to care, having 24 cores and 2 workers yield 115 iter/s and 24 cores with 24 workers yield 117 iter/s. Is this normal?
This is somewhat expected. The CPU is still bottlenecked by available memory bandwidth. A 20M FFT (which is 160MB in size) is much too big to fit in the 30MB L3 cache available. So even when you're running only 1 test per socket (i.e. 2 workers total), the memory is being hit pretty hard. Hence, almost all of the worker configurations end up giving the same throughput.
One thing I noticed is that when you had only 2 sticks, you were getting 38 iter/sec. With 4x the bandwidth now available, naive scaling would have predicted 38x4 = 152 iter/sec. You're getting about 75% of that. While that is not bad, I wonder whether some other system knobs can be turned to get some extra performance.
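The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not anything Prime95 computes: the 8 bytes per FFT element is an assumption (one double per element, which is consistent with the 160MB figure), and the 115 iter/sec is the number quoted from jas's post for 24 cores and 2 workers.

```python
# Back-of-the-envelope check of the figures discussed in this post.
FFT_LENGTH_K = 20480      # "20480K FFT" from the benchmark lines
BYTES_PER_ELEMENT = 8     # assumption: one double-precision value per element
L3_CACHE_MB = 30          # L3 size per socket, as stated in the post

# Working-set size of one FFT in MB, and whether it could live in L3.
fft_size_mb = FFT_LENGTH_K * 1024 * BYTES_PER_ELEMENT / 2**20
fits_in_l3 = fft_size_mb <= L3_CACHE_MB

OLD_THROUGHPUT = 38.0     # iter/sec with only 2 memory sticks
BANDWIDTH_FACTOR = 4      # 8 sticks instead of 2
OBSERVED = 115.0          # iter/sec reported for 24 cores, 2 workers

# Naive model: throughput scales linearly with memory bandwidth.
naive_prediction = OLD_THROUGHPUT * BANDWIDTH_FACTOR
efficiency = OBSERVED / naive_prediction

print(f"FFT data size: {fft_size_mb:.0f} MB, fits in L3: {fits_in_l3}")
print(f"Naive prediction: {naive_prediction:.0f} iter/sec, "
      f"achieved fraction: {efficiency:.0%}")
```

The linear model is deliberately crude; it ignores memory latency, channel interleaving and NUMA effects, any of which could explain part of the gap.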

Quote:
Originally Posted by jas View Post
I think I will go with 2 workers and 24 cores to get a faster churn of founds.
Yep. Given how little performance difference there is between this and the best case, this is definitely the best choice. The individual tests are going to take about 2 months, so it is probably better to do the PRP test (with Gerbicz Error Check) rather than LL.
Old 2020-02-13, 10:49   #42
S485122
 
 
Sep 2006
Brussels, Belgium

3×491 Posts

In some of the timed configurations, some workers have half the throughput of others. This already happens at low core counts, for instance 6, 9 or 12 cores with 3 workers, where the workers' timings differ widely.
On the other hand, the 24 cores, 24 workers scenario has fairly equal average iteration times.
The asymmetrical cores-per-worker configurations logically give asymmetrical timings.

Perhaps the explanation is in data missing from the benchmarks: what is the standard deviation of those average iteration times, and how are the cores spread over the CPUs?

Jacob
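The per-worker spread asked about above can at least be estimated from the benchmark lines themselves. The sketch below is a hypothetical helper, not part of Prime95: it parses one "Timing ..." line, recomputes total throughput as the sum of 1000/t over the workers, and reports the spread between workers. It cannot recover the standard deviation within each worker's average, since that data is not in the log.

```python
import re
import statistics

# Matches lines of the form shown in the benchmark output in this thread.
LINE_RE = re.compile(
    r"Timing (\d+)K FFT, (\d+) cores?, (\d+) workers?\.\s+"
    r"Average times: ([\d., ]+) ms\.\s+"
    r"Total throughput: ([\d.]+) iter/sec\."
)

def parse_line(line):
    """Parse one benchmark line into a dict; raise ValueError if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a benchmark line: " + line)
    times_ms = [float(t) for t in m.group(4).split(",")]
    return {
        "fft_k": int(m.group(1)),
        "cores": int(m.group(2)),
        "workers": int(m.group(3)),
        "times_ms": times_ms,
        "reported_total": float(m.group(5)),
    }

def summarize(entry):
    """Recompute throughput and the spread of iteration times across workers."""
    times = entry["times_ms"]
    return {
        "computed_total": sum(1000.0 / t for t in times),
        "mean_ms": statistics.mean(times),
        # Population std dev across workers; 0.0 when there is one worker.
        "stdev_ms": statistics.pstdev(times),
        "slowest_over_fastest": max(times) / min(times),
    }

sample = ("Timing 20480K FFT, 24 cores, 3 workers.  Average times: "
          "34.89, 34.72, 17.00 ms.  Total throughput: 116.29 iter/sec.")
entry = parse_line(sample)
stats = summarize(entry)
print(entry["workers"], round(stats["computed_total"], 2),
      round(stats["slowest_over_fastest"], 2))
```

For the sample line, the recomputed total agrees with the reported one to within rounding, which suggests the reported throughput is simply the sum of the per-worker rates. How the cores map onto sockets still has to come from the program's affinity output.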
Old 2020-02-13, 11:23   #43
axn
 
 
Jun 2003

10574₈ Posts

Quote:
Originally Posted by S485122 View Post
Perhaps the explanation is in data missing from the benchmarks : what is the standard deviation in those average iteration times, how are the cores spread over the CPU's... ?
See George's posts #36 and #39 upthread. Asymmetric = weirdness.
Old 2020-02-13, 11:26   #44
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3209₁₀ Posts

Quote:
Originally Posted by axn View Post
Given how little performance difference is there between this and the best case, this is definitely the best choice. The individual tests are going to take about 2 months, so it is probably better to do the PRP test (with Gerbicz Error Check) rather than LL.
For a couple percent more throughput, it's up to the user whether some more latency is acceptable. PRP/GEC ought to be a requirement, or at least the default, for the current first-test wavefront and up, on software/hardware combos that can run it. PRP/GEC for 100Mdigit is recommended, and LL for 100Mdigit is not recommended, by no less than Prime95.
For the really high worker counts and large exponents, one must consider the probability of assignment expiration, and even probable hardware lifetime.
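To put rough numbers on that latency trade-off, here is a minimal sketch. It assumes a hypothetical exponent of about 332,200,000 (roughly the smallest 100M-digit size), that a PRP or LL test takes about one iteration per bit of the exponent, and that the 20480K FFT timings from the benchmark apply to such an exponent. None of these figures comes from Prime95 itself.

```python
SECONDS_PER_DAY = 86_400
EXPONENT = 332_200_000   # hypothetical exponent near the 100M-digit threshold

def days_per_test(ms_per_iter, exponent=EXPONENT):
    """Wall-clock days for one test, assuming about `exponent` iterations."""
    return exponent * ms_per_iter / 1000 / SECONDS_PER_DAY

# 24 cores, 2 workers: mean of the two timings from the benchmark (17.41, 17.15 ms).
two_workers_days = days_per_test((17.41 + 17.15) / 2)

# 24 cores, 24 workers: roughly 204 ms per iteration for each worker.
per_core_days = days_per_test(204.0)

print(f"2 workers:  about {two_workers_days:.0f} days per test")
print(f"24 workers: about {per_core_days:.0f} days per test")
```

Under these assumptions the 2-worker setup lands near the "about 2 months" mentioned above, while one worker per core stretches each test to around two years, which is where assignment expiration and hardware lifetime start to matter.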

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.