I am a little confused by what's happening here:
Timing 20480K FFT, 12 cores, 2 workers. Average times: 50.92, 50.64 ms. Total throughput: 39.39 iter/sec.
[Worker #1 Feb 4 17:37] Timing 20480K FFT, 12 cores, 3 workers. Average times: 103.31, 102.76, 50.81 ms. Total throughput: 39.09 iter/sec.
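For reference, the "Total throughput" figure in these lines appears to be the sum of each worker's iterations per second, i.e. the sum of 1000 / (average time in ms). The sketch below checks that reading against the 3-worker line quoted above. The helper names `total_throughput` and `parse_timing` are made up for illustration and are not part of Prime95.

```python
import re


def total_throughput(avg_times_ms):
    """Sum of per-worker rates, where each worker does 1000 / t iterations per second."""
    return sum(1000.0 / t for t in avg_times_ms)


def parse_timing(entry):
    """Pull the per-worker average times and the reported throughput out of one benchmark entry."""
    m = re.search(
        r"Average times: ([\d., ]+) ms\. Total throughput: ([\d.]+) iter/sec", entry
    )
    if m is None:
        raise ValueError("not a benchmark timing entry")
    times = [float(x) for x in m.group(1).split(",")]
    return times, float(m.group(2))


entry = (
    "Timing 20480K FFT, 12 cores, 3 workers. "
    "Average times: 103.31, 102.76, 50.81 ms. Total throughput: 39.09 iter/sec."
)
times, reported = parse_timing(entry)
computed = total_throughput(times)
print(f"reported {reported:.2f} iter/sec, computed {computed:.2f} iter/sec")
```

Under this reading, one worker running at about 50 ms contributes roughly as much as two workers running at about 100 ms each, which would explain why the two totals are nearly identical even though the per-worker times look so different.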
CPU1: 4 cores worker 3; 1 core worker 2; 1 core worker 1
CPU2: 3 cores worker 2; 3 cores worker 1
Worker 3 would have faster communication between all its cores.

Close. They are dual-socket Xeons, each with 12 cores. Presumably the same explanation still applies, i.e. Workers 1 & 2 on socket 1 with 3 cores each, and Worker 3 on socket 2 with 6 cores. Although I suppose the question is why it didn't do 4/4/4, since there were spare cores available in each socket.
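As a rough sanity check on that explanation, here is a minimal sketch that compares the two candidate splits with the observed times. It assumes, purely for illustration, that a worker's iteration time is inversely proportional to its core count. Real scaling is not that clean, and `relative_times` is a hypothetical helper, not anything from Prime95.

```python
def relative_times(cores_per_worker):
    """Each worker's iteration time relative to the fastest worker, assuming time ~ 1/cores."""
    most = max(cores_per_worker)
    return [most / c for c in cores_per_worker]


# The two candidate allocations of 12 cores across 3 workers.
split_336 = relative_times([3, 3, 6])
split_444 = relative_times([4, 4, 4])

# Observed 3-worker average times (ms) from the benchmark quoted earlier in the thread.
observed = [103.31, 102.76, 50.81]
fastest = min(observed)
observed_ratios = [round(t / fastest, 2) for t in observed]

print("3/3/6 predicts:", split_336)
print("4/4/4 predicts:", split_444)
print("observed:      ", observed_ratios)
```

The observed times sit close to a 2:2:1 pattern, which fits a 3/3/6 split much better than an even 4/4/4 split, though this says nothing about why the program chose that layout.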

pkg 2: 4; not sharing memory bandwidth 

I have now received 8x8GB DDR4-2133 and reran the 20480K FFT benchmarks. Performance now scales properly with the number of cores, up to 24 cores. The number of workers (2-24) does not seem to matter: 24 cores with 2 workers yields 115 iter/s and 24 cores with 24 workers yields 117 iter/s. Is this normal? I think I will go with 2 workers and 24 cores to get a faster churn of finds.
/Simon
Code:
Timing 20480K FFT, 1 core, 1 worker. Average times: 132.36 ms. Total throughput: 7.55 iter/sec. Timing 20480K FFT, 2 cores, 1 worker. Average times: 69.19 ms. Total throughput: 14.45 iter/sec. Timing 20480K FFT, 2 cores, 2 workers. Average times: 132.58, 132.45 ms. Total throughput: 15.09 iter/sec. Timing 20480K FFT, 3 cores, 1 worker. Average times: 48.95 ms. Total throughput: 20.43 iter/sec. Timing 20480K FFT, 3 cores, 2 workers. Average times: 67.98, 132.73 ms. Total throughput: 22.24 iter/sec. Timing 20480K FFT, 3 cores, 3 workers. Average times: 133.80, 133.10, 137.11 ms. Total throughput: 22.28 iter/sec. Timing 20480K FFT, 4 cores, 1 worker. Average times: 36.83 ms. Total throughput: 27.15 iter/sec. Timing 20480K FFT, 4 cores, 2 workers. Average times: 69.86, 69.49 ms. Total throughput: 28.71 iter/sec. Timing 20480K FFT, 4 cores, 3 workers. Average times: 134.28, 133.20, 67.70 ms. Total throughput: 29.73 iter/sec. Timing 20480K FFT, 4 cores, 4 workers. Average times: 138.31, 133.19, 136.47, 133.65 ms. Total throughput: 29.55 iter/sec. Timing 20480K FFT, 5 cores, 1 worker. Average times: 29.83 ms. Total throughput: 33.53 iter/sec. Timing 20480K FFT, 5 cores, 2 workers. Average times: 48.41, 68.36 ms. Total throughput: 35.29 iter/sec. Timing 20480K FFT, 5 cores, 3 workers. Average times: 142.63, 72.09, 68.86 ms. Total throughput: 35.40 iter/sec. Timing 20480K FFT, 5 cores, 4 workers. Average times: 145.07, 73.36, 133.23, 133.19 ms. Total throughput: 35.54 iter/sec. Timing 20480K FFT, 5 cores, 5 workers. Average times: 142.55, 141.35, 143.04, 135.20, 135.13 ms. Total throughput: 35.88 iter/sec. Timing 20480K FFT, 6 cores, 1 worker. Average times: 25.94 ms. Total throughput: 38.55 iter/sec. Timing 20480K FFT, 6 cores, 2 workers. Average times: 48.68, 48.69 ms. Total throughput: 41.08 iter/sec. Timing 20480K FFT, 6 cores, 3 workers. Average times: 142.82, 72.64, 48.95 ms. Total throughput: 41.20 iter/sec. Timing 20480K FFT, 6 cores, 4 workers. 
Average times: 143.11, 71.86, 142.39, 72.03 ms. Total throughput: 41.81 iter/sec. Timing 20480K FFT, 6 cores, 5 workers. Average times: 143.06, 141.80, 143.11, 143.67, 72.75 ms. Total throughput: 41.74 iter/sec. Timing 20480K FFT, 6 cores, 6 workers. Average times: 144.38, 141.73, 142.95, 146.63, 146.15, 146.02 ms. Total throughput: 41.49 iter/sec. Timing 20480K FFT, 7 cores, 1 worker. Average times: 22.81 ms. Total throughput: 43.84 iter/sec. Timing 20480K FFT, 7 cores, 2 workers. Average times: 36.59, 48.21 ms. Total throughput: 48.07 iter/sec. Timing 20480K FFT, 7 cores, 3 workers. Average times: 74.03, 73.91, 49.93 ms. Total throughput: 47.07 iter/sec. Timing 20480K FFT, 7 cores, 4 workers. Average times: 74.00, 73.84, 145.59, 74.23 ms. Total throughput: 47.39 iter/sec. Timing 20480K FFT, 7 cores, 5 workers. Average times: 145.49, 143.13, 72.93, 143.32, 72.59 ms. Total throughput: 48.32 iter/sec. Timing 20480K FFT, 7 cores, 6 workers. Average times: 144.42, 143.04, 73.28, 143.75, 142.97, 143.26 ms. Total throughput: 48.49 iter/sec. Timing 20480K FFT, 7 cores, 7 workers. Average times: 144.21, 143.15, 143.06, 144.38, 143.58, 143.14, 142.93 ms. Total throughput: 48.78 iter/sec. Timing 20480K FFT, 8 cores, 1 worker. Average times: 20.51 ms. Total throughput: 48.76 iter/sec. Timing 20480K FFT, 8 cores, 2 workers. Average times: 36.86, 36.86 ms. Total throughput: 54.26 iter/sec. Timing 20480K FFT, 8 cores, 3 workers. Average times: 73.06, 72.95, 36.88 ms. Total throughput: 54.51 iter/sec. Timing 20480K FFT, 8 cores, 4 workers. Average times: 73.28, 72.94, 73.56, 73.51 ms. Total throughput: 54.55 iter/sec. Timing 20480K FFT, 8 cores, 5 workers. Average times: 144.73, 144.33, 73.04, 73.47, 73.43 ms. Total throughput: 54.76 iter/sec. Timing 20480K FFT, 8 cores, 6 workers. Average times: 145.45, 142.96, 73.26, 144.39, 144.45, 73.51 ms. Total throughput: 54.97 iter/sec. Timing 20480K FFT, 8 cores, 7 workers. 
Average times: 145.92, 143.29, 143.21, 144.83, 145.18, 144.54, 73.57 ms. Total throughput: 55.12 iter/sec. Timing 20480K FFT, 8 cores, 8 workers. Average times: 149.16, 143.05, 143.03, 144.44, 147.75, 147.55, 147.36, 147.92 ms. Total throughput: 54.70 iter/sec. Timing 20480K FFT, 9 cores, 1 worker. Average times: 18.65 ms. Total throughput: 53.62 iter/sec. Timing 20480K FFT, 9 cores, 2 workers. Average times: 30.36, 37.43 ms. Total throughput: 59.66 iter/sec. Timing 20480K FFT, 9 cores, 3 workers. Average times: 75.58, 50.70, 38.01 ms. Total throughput: 59.27 iter/sec. Timing 20480K FFT, 9 cores, 4 workers. Average times: 74.76, 49.79, 73.61, 73.74 ms. Total throughput: 60.61 iter/sec. Timing 20480K FFT, 9 cores, 5 workers. Average times: 147.22, 74.28, 73.90, 73.41, 73.38 ms. Total throughput: 61.04 iter/sec. Timing 20480K FFT, 9 cores, 6 workers. Average times: 148.75, 73.94, 74.21, 144.92, 144.51, 73.55 ms. Total throughput: 61.14 iter/sec. Timing 20480K FFT, 9 cores, 7 workers. Average times: 147.33, 145.59, 145.78, 74.12, 144.48, 144.58, 73.87 ms. Total throughput: 61.38 iter/sec. Timing 20480K FFT, 9 cores, 8 workers. Average times: 148.60, 145.69, 145.63, 74.84, 147.36, 146.53, 146.55, 146.97 ms. Total throughput: 61.06 iter/sec. Timing 20480K FFT, 9 cores, 9 workers. Average times: 148.51, 145.62, 146.64, 147.24, 144.51, 144.70, 144.37, 145.60, 144.62 ms. Total throughput: 61.75 iter/sec. Timing 20480K FFT, 10 cores, 1 worker. Average times: 18.23 ms. Total throughput: 54.87 iter/sec. Timing 20480K FFT, 10 cores, 2 workers. Average times: 30.52, 30.57 ms. Total throughput: 65.48 iter/sec. Timing 20480K FFT, 10 cores, 3 workers. Average times: 75.03, 49.79, 30.11 ms. Total throughput: 66.63 iter/sec. Timing 20480K FFT, 10 cores, 4 workers. Average times: 75.61, 50.63, 75.48, 50.59 ms. Total throughput: 65.99 iter/sec. Timing 20480K FFT, 10 cores, 5 workers. Average times: 147.14, 75.07, 74.91, 76.38, 51.24 ms. Total throughput: 66.07 iter/sec. 
Timing 20480K FFT, 10 cores, 6 workers. Average times: 147.91, 74.01, 74.51, 147.48, 74.14, 74.02 ms. Total throughput: 67.47 iter/sec. Timing 20480K FFT, 10 cores, 7 workers. Average times: 147.26, 145.79, 146.46, 75.03, 149.65, 76.49, 75.99 ms. Total throughput: 66.72 iter/sec. Timing 20480K FFT, 10 cores, 8 workers. Average times: 146.90, 145.39, 146.41, 73.96, 146.72, 146.65, 146.65, 74.02 ms. Total throughput: 68.00 iter/sec. Timing 20480K FFT, 10 cores, 9 workers. Average times: 146.88, 146.07, 146.14, 147.32, 144.23, 146.74, 146.56, 147.22, 74.35 ms. Total throughput: 68.10 iter/sec. Timing 20480K FFT, 10 cores, 10 workers. Average times: 146.92, 146.68, 145.74, 146.68, 144.42, 147.39, 146.53, 146.74, 147.11, 145.91 ms. Total throughput: 68.30 iter/sec. Timing 20480K FFT, 11 cores, 1 worker. Average times: 17.49 ms. Total throughput: 57.17 iter/sec. Timing 20480K FFT, 11 cores, 2 workers. Average times: 25.37, 29.77 ms. Total throughput: 73.00 iter/sec. Timing 20480K FFT, 11 cores, 3 workers. Average times: 51.58, 51.01, 30.12 ms. Total throughput: 72.19 iter/sec. Timing 20480K FFT, 11 cores, 4 workers. Average times: 52.15, 51.69, 76.27, 51.15 ms. Total throughput: 71.18 iter/sec. Timing 20480K FFT, 11 cores, 5 workers. Average times: 77.93, 77.32, 76.51, 76.71, 51.25 ms. Total throughput: 71.38 iter/sec. Timing 20480K FFT, 11 cores, 6 workers. Average times: 77.55, 77.13, 76.10, 149.33, 75.75, 75.32 ms. Total throughput: 72.17 iter/sec. Timing 20480K FFT, 11 cores, 7 workers. Average times: 152.02, 149.30, 77.24, 76.04, 149.39, 75.72, 75.33 ms. Total throughput: 72.55 iter/sec. Timing 20480K FFT, 11 cores, 8 workers. Average times: 151.78, 149.02, 77.10, 76.49, 149.24, 148.89, 149.03, 75.91 ms. Total throughput: 72.64 iter/sec. Timing 20480K FFT, 11 cores, 9 workers. Average times: 150.95, 149.29, 149.44, 149.67, 76.46, 149.67, 149.58, 150.46, 76.33 ms. Total throughput: 72.89 iter/sec. Timing 20480K FFT, 11 cores, 10 workers. 
Average times: 150.57, 149.09, 149.35, 150.24, 76.26, 149.50, 149.93, 149.55, 149.77, 149.12 ms. Total throughput: 73.24 iter/sec. Timing 20480K FFT, 12 cores, 1 worker. Average times: 17.01 ms. Total throughput: 58.79 iter/sec. Timing 20480K FFT, 12 cores, 2 workers. Average times: 26.20, 26.22 ms. Total throughput: 76.31 iter/sec. Timing 20480K FFT, 12 cores, 3 workers. Average times: 51.60, 50.97, 25.61 ms. Total throughput: 78.05 iter/sec. Timing 20480K FFT, 12 cores, 4 workers. Average times: 52.66, 51.83, 51.88, 51.59 ms. Total throughput: 76.94 iter/sec. Timing 20480K FFT, 12 cores, 5 workers. Average times: 76.59, 76.23, 75.83, 50.89, 50.72 ms. Total throughput: 78.73 iter/sec. Timing 20480K FFT, 12 cores, 6 workers. Average times: 77.09, 76.84, 75.93, 76.61, 76.65, 75.83 ms. Total throughput: 78.44 iter/sec. Timing 20480K FFT, 12 cores, 7 workers. Average times: 151.09, 149.59, 76.60, 75.65, 76.82, 76.51, 75.68 ms. Total throughput: 78.88 iter/sec. Timing 20480K FFT, 12 cores, 8 workers. Average times: 152.80, 149.68, 76.65, 75.80, 151.35, 150.42, 77.12, 76.06 ms. Total throughput: 78.83 iter/sec. Timing 20480K FFT, 12 cores, 9 workers. Average times: 151.51, 150.18, 150.37, 151.59, 75.64, 151.19, 150.68, 76.79, 75.90 ms. Total throughput: 79.17 iter/sec. Timing 20480K FFT, 12 cores, 10 workers. Average times: 151.53, 150.43, 150.27, 150.73, 75.60, 150.43, 150.29, 150.99, 150.42, 75.99 ms. Total throughput: 79.50 iter/sec. Timing 20480K FFT, 12 cores, 12 workers. Average times: 150.76, 149.40, 149.95, 153.83, 149.97, 149.19, 150.02, 153.54, 150.14, 153.05, 149.12, 151.95 ms. Total throughput: 79.53 iter/sec. Timing 20480K FFT, 13 cores, 1 worker. Average times: 15.58 ms. Total throughput: 64.19 iter/sec. Timing 20480K FFT, 13 cores, 2 workers. Average times: 22.61, 25.69 ms. Total throughput: 83.15 iter/sec. Timing 20480K FFT, 13 cores, 3 workers. Average times: 53.55, 40.16, 26.20 ms. Total throughput: 81.74 iter/sec. 
Timing 20480K FFT, 13 cores, 4 workers. Average times: 53.08, 39.50, 50.83, 50.70 ms. Total throughput: 83.55 iter/sec. Timing 20480K FFT, 13 cores, 5 workers. Average times: 79.50, 79.00, 52.55, 51.15, 51.14 ms. Total throughput: 83.38 iter/sec. Timing 20480K FFT, 13 cores, 6 workers. Average times: 79.23, 78.67, 52.60, 76.88, 77.02, 76.09 ms. Total throughput: 83.48 iter/sec. Timing 20480K FFT, 13 cores, 7 workers. Average times: 156.57, 79.07, 78.80, 78.29, 76.80, 76.91, 76.00 ms. Total throughput: 83.68 iter/sec. Timing 20480K FFT, 13 cores, 8 workers. Average times: 156.30, 78.53, 78.17, 77.96, 149.84, 149.99, 76.08, 75.38 ms. Total throughput: 84.50 iter/sec. Timing 20480K FFT, 13 cores, 9 workers. Average times: 155.69, 154.14, 155.19, 79.63, 78.99, 153.83, 153.54, 78.81, 77.88 ms. Total throughput: 83.12 iter/sec. Timing 20480K FFT, 13 cores, 10 workers. Average times: 155.67, 153.87, 153.94, 78.97, 78.06, 150.37, 150.46, 151.12, 150.62, 75.96 ms. Total throughput: 84.61 iter/sec. Timing 20480K FFT, 13 cores, 12 workers. Average times: 157.13, 153.79, 154.22, 155.59, 154.30, 78.13, 150.61, 150.55, 151.48, 151.65, 149.38, 149.77 ms. Total throughput: 84.91 iter/sec. Timing 20480K FFT, 14 cores, 1 worker. Average times: 14.81 ms. Total throughput: 67.54 iter/sec. Timing 20480K FFT, 14 cores, 2 workers. Average times: 23.11, 23.02 ms. Total throughput: 86.70 iter/sec. Timing 20480K FFT, 14 cores, 3 workers. Average times: 53.67, 40.10, 22.94 ms. Total throughput: 87.15 iter/sec. Timing 20480K FFT, 14 cores, 4 workers. Average times: 52.84, 39.47, 52.94, 39.34 ms. Total throughput: 88.57 iter/sec. Timing 20480K FFT, 14 cores, 5 workers. Average times: 79.53, 78.92, 52.55, 53.21, 39.78 ms. Total throughput: 88.21 iter/sec. Timing 20480K FFT, 14 cores, 6 workers. Average times: 80.03, 79.74, 52.81, 79.43, 80.00, 52.97 ms. Total throughput: 87.94 iter/sec. Timing 20480K FFT, 14 cores, 7 workers. Average times: 155.62, 78.94, 79.07, 77.99, 78.99, 79.15, 52.52 ms. 
Total throughput: 88.90 iter/sec. Timing 20480K FFT, 14 cores, 8 workers. Average times: 155.59, 78.90, 78.52, 78.36, 155.77, 79.23, 78.69, 78.22 ms. Total throughput: 89.13 iter/sec. Timing 20480K FFT, 14 cores, 9 workers. Average times: 155.62, 153.97, 155.82, 79.01, 78.38, 155.86, 78.98, 79.07, 78.41 ms. Total throughput: 89.23 iter/sec. Timing 20480K FFT, 14 cores, 10 workers. Average times: 156.42, 156.07, 154.37, 78.95, 78.18, 154.87, 154.88, 155.99, 79.04, 78.43 ms. Total throughput: 89.46 iter/sec. Timing 20480K FFT, 14 cores, 12 workers. Average times: 157.27, 155.25, 154.45, 155.31, 154.07, 78.25, 155.16, 154.55, 155.65, 156.00, 152.82, 78.48 ms. Total throughput: 90.02 iter/sec. Timing 20480K FFT, 14 cores, 14 workers. Average times: 155.68, 154.81, 153.97, 155.60, 153.73, 153.76, 158.06, 158.83, 158.29, 159.32, 158.37, 157.50, 157.28, 156.81 ms. Total throughput: 89.43 iter/sec. Timing 20480K FFT, 15 cores, 1 worker. Average times: 14.59 ms. Total throughput: 68.52 iter/sec. Timing 20480K FFT, 15 cores, 2 workers. Average times: 20.47, 22.85 ms. Total throughput: 92.62 iter/sec. Timing 20480K FFT, 15 cores, 3 workers. Average times: 41.90, 41.24, 22.65 ms. Total throughput: 92.26 iter/sec. Timing 20480K FFT, 15 cores, 4 workers. Average times: 41.35, 40.98, 52.96, 39.53 ms. Total throughput: 92.77 iter/sec. Timing 20480K FFT, 15 cores, 5 workers. Average times: 82.79, 55.40, 54.75, 53.25, 39.85 ms. Total throughput: 92.27 iter/sec. Timing 20480K FFT, 15 cores, 6 workers. Average times: 82.25, 54.59, 54.45, 78.86, 79.09, 52.30 ms. Total throughput: 93.28 iter/sec. Timing 20480K FFT, 15 cores, 7 workers. Average times: 81.82, 81.70, 81.18, 81.06, 79.20, 79.26, 52.70 ms. Total throughput: 93.33 iter/sec. Timing 20480K FFT, 15 cores, 8 workers. Average times: 82.04, 81.90, 81.36, 81.32, 155.37, 79.17, 78.79, 78.27 ms. Total throughput: 93.52 iter/sec. Timing 20480K FFT, 15 cores, 9 workers. 
Average times: 161.23, 159.46, 81.71, 81.06, 80.97, 154.82, 78.60, 78.66, 77.93 ms. Total throughput: 94.13 iter/sec. Timing 20480K FFT, 15 cores, 10 workers. Average times: 161.20, 159.17, 81.99, 81.18, 81.76, 156.01, 154.65, 155.02, 78.56, 78.37 ms. Total throughput: 94.05 iter/sec. Timing 20480K FFT, 15 cores, 12 workers. Average times: 164.26, 159.09, 159.32, 160.16, 81.82, 81.91, 159.11, 158.45, 157.80, 157.84, 156.80, 80.38 ms. Total throughput: 93.41 iter/sec. Timing 20480K FFT, 15 cores, 14 workers. Average times: 161.56, 159.67, 159.89, 159.68, 158.57, 158.49, 80.98, 153.82, 155.36, 154.03, 154.08, 152.48, 152.35, 152.51 ms. Total throughput: 95.53 iter/sec. Timing 20480K FFT, 16 cores, 1 worker. Average times: 14.38 ms. Total throughput: 69.54 iter/sec. Timing 20480K FFT, 16 cores, 2 workers. Average times: 20.47, 20.48 ms. Total throughput: 97.67 iter/sec. Timing 20480K FFT, 16 cores, 3 workers. Average times: 42.63, 41.78, 20.96 ms. Total throughput: 95.11 iter/sec. Timing 20480K FFT, 16 cores, 4 workers. Average times: 42.45, 42.02, 42.43, 42.10 ms. Total throughput: 94.67 iter/sec. Timing 20480K FFT, 16 cores, 5 workers. Average times: 82.62, 55.14, 54.78, 41.37, 41.24 ms. Total throughput: 96.91 iter/sec. Timing 20480K FFT, 16 cores, 6 workers. Average times: 82.12, 55.03, 54.59, 81.88, 54.90, 54.54 ms. Total throughput: 97.43 iter/sec. Timing 20480K FFT, 16 cores, 7 workers. Average times: 82.19, 81.61, 81.32, 81.37, 81.76, 54.67, 54.42 ms. Total throughput: 97.91 iter/sec. Timing 20480K FFT, 16 cores, 8 workers. Average times: 82.62, 82.36, 81.51, 81.59, 82.36, 82.52, 82.03, 81.76 ms. Total throughput: 97.45 iter/sec. Timing 20480K FFT, 16 cores, 9 workers. Average times: 161.10, 160.27, 82.18, 81.44, 81.61, 82.65, 82.37, 81.37, 81.47 ms. Total throughput: 97.95 iter/sec. Timing 20480K FFT, 16 cores, 10 workers. Average times: 161.02, 159.63, 82.87, 82.15, 82.36, 162.18, 162.76, 83.35, 82.38, 82.63 ms. Total throughput: 97.41 iter/sec. 
Timing 20480K FFT, 16 cores, 12 workers. Average times: 165.29, 159.29, 160.47, 163.86, 81.86, 83.10, 158.87, 158.77, 160.18, 162.96, 81.79, 81.75 ms. Total throughput: 98.34 iter/sec. Timing 20480K FFT, 16 cores, 14 workers. Average times: 161.25, 159.60, 159.77, 159.99, 158.20, 158.03, 81.00, 159.85, 159.31, 159.25, 160.21, 157.96, 157.80, 80.91 ms. Total throughput: 100.05 iter/sec. Timing 20480K FFT, 16 cores, 16 workers. Average times: 161.69, 160.43, 159.56, 159.41, 158.05, 158.74, 159.04, 157.99, 159.75, 160.68, 160.95, 159.76, 159.27, 158.05, 158.21, 158.22 ms. Total throughput: 100.41 iter/sec. Timing 20480K FFT, 17 cores, 1 worker. Average times: 14.74 ms. Total throughput: 67.85 iter/sec. Timing 20480K FFT, 17 cores, 2 workers. Average times: 19.44, 20.98 ms. Total throughput: 99.11 iter/sec. Timing 20480K FFT, 17 cores, 3 workers. Average times: 44.37, 34.89, 20.69 ms. Total throughput: 99.54 iter/sec. Timing 20480K FFT, 17 cores, 4 workers. Average times: 43.55, 34.62, 41.81, 41.28 ms. Total throughput: 99.98 iter/sec. Timing 20480K FFT, 17 cores, 5 workers. Average times: 58.06, 57.13, 57.44, 41.59, 41.45 ms. Total throughput: 100.30 iter/sec. Timing 20480K FFT, 17 cores, 6 workers. Average times: 58.16, 57.31, 57.19, 81.32, 54.88, 54.47 ms. Total throughput: 101.01 iter/sec. Timing 20480K FFT, 17 cores, 7 workers. Average times: 85.28, 85.49, 85.21, 57.06, 81.77, 54.90, 54.39 ms. Total throughput: 101.52 iter/sec. Timing 20480K FFT, 17 cores, 8 workers. Average times: 86.66, 86.44, 86.17, 58.17, 83.93, 84.20, 82.88, 83.01 ms. Total throughput: 99.81 iter/sec. Timing 20480K FFT, 17 cores, 9 workers. Average times: 167.55, 86.34, 86.38, 85.53, 86.01, 83.31, 83.28, 82.99, 82.86 ms. Total throughput: 100.57 iter/sec. Timing 20480K FFT, 17 cores, 10 workers. Average times: 168.94, 86.17, 85.61, 85.24, 86.10, 162.28, 163.52, 83.03, 82.20, 82.30 ms. Total throughput: 101.19 iter/sec. Timing 20480K FFT, 17 cores, 12 workers. 
Average times: 168.41, 166.85, 167.53, 85.64, 85.50, 85.66, 160.84, 161.19, 159.75, 160.59, 81.51, 81.82 ms. Total throughput: 102.35 iter/sec. Timing 20480K FFT, 17 cores, 14 workers. Average times: 169.13, 167.51, 166.87, 168.58, 167.30, 85.91, 86.46, 162.53, 163.06, 162.48, 163.28, 161.80, 161.28, 82.83 ms. Total throughput: 102.01 iter/sec. Timing 20480K FFT, 17 cores, 16 workers. Average times: 169.56, 168.26, 167.66, 169.42, 167.31, 167.64, 168.48, 86.43, 159.97, 161.97, 160.19, 159.99, 160.82, 159.92, 159.63, 159.89 ms. Total throughput: 103.06 iter/sec. Timing 20480K FFT, 18 cores, 1 worker. Average times: 14.66 ms. Total throughput: 68.19 iter/sec. Timing 20480K FFT, 18 cores, 2 workers. Average times: 19.34, 19.26 ms. Total throughput: 103.63 iter/sec. Timing 20480K FFT, 18 cores, 3 workers. Average times: 43.57, 34.75, 19.02 ms. Total throughput: 104.30 iter/sec. Timing 20480K FFT, 18 cores, 4 workers. Average times: 44.03, 34.95, 43.28, 34.48 ms. Total throughput: 103.44 iter/sec. Timing 20480K FFT, 18 cores, 5 workers. Average times: 58.12, 57.72, 57.54, 43.48, 34.57 ms. Total throughput: 103.84 iter/sec. Timing 20480K FFT, 18 cores, 6 workers. Average times: 58.78, 58.25, 58.26, 58.52, 58.56, 58.33 ms. Total throughput: 102.65 iter/sec. Timing 20480K FFT, 18 cores, 7 workers. Average times: 86.23, 86.09, 85.46, 58.11, 58.31, 57.85, 58.18 ms. Total throughput: 103.75 iter/sec. Timing 20480K FFT, 18 cores, 8 workers. Average times: 86.93, 86.62, 85.77, 58.48, 87.25, 87.27, 86.61, 58.71 ms. Total throughput: 103.31 iter/sec. Timing 20480K FFT, 18 cores, 9 workers. Average times: 168.62, 86.87, 86.16, 85.88, 86.08, 87.15, 87.16, 86.84, 58.45 ms. Total throughput: 103.88 iter/sec. Timing 20480K FFT, 18 cores, 10 workers. Average times: 169.87, 85.93, 86.16, 84.85, 85.33, 167.83, 86.11, 85.52, 85.30, 85.46 ms. Total throughput: 105.33 iter/sec. Timing 20480K FFT, 18 cores, 12 workers. 
Average times: 168.40, 167.13, 168.30, 86.16, 85.51, 86.14, 168.58, 168.59, 168.17, 86.45, 86.15, 86.49 ms. Total throughput: 105.32 iter/sec. Timing 20480K FFT, 18 cores, 14 workers. Average times: 171.32, 168.02, 168.58, 170.28, 167.29, 85.61, 86.10, 168.05, 167.85, 168.71, 169.92, 165.83, 85.89, 86.32 ms. Total throughput: 105.84 iter/sec. Timing 20480K FFT, 18 cores, 16 workers. Average times: 170.64, 168.72, 169.02, 168.53, 167.00, 166.65, 167.48, 85.94, 168.57, 167.56, 167.52, 167.89, 166.54, 166.56, 166.10, 85.98 ms. Total throughput: 106.72 iter/sec. Timing 20480K FFT, 18 cores, 18 workers. Average times: 173.88, 170.74, 171.63, 170.91, 166.55, 168.84, 168.74, 168.81, 172.12, 169.01, 169.24, 168.62, 168.61, 166.00, 167.77, 166.76, 166.04, 169.98 ms. Total throughput: 106.45 iter/sec. Timing 20480K FFT, 19 cores, 1 worker. Average times: 14.46 ms. Total throughput: 69.18 iter/sec. Timing 20480K FFT, 19 cores, 2 workers. Average times: 18.57, 19.42 ms. Total throughput: 105.32 iter/sec. Timing 20480K FFT, 19 cores, 3 workers. Average times: 37.03, 36.65, 19.16 ms. Total throughput: 106.49 iter/sec. Timing 20480K FFT, 19 cores, 4 workers. Average times: 37.47, 37.01, 43.75, 34.85 ms. Total throughput: 105.26 iter/sec. Timing 20480K FFT, 19 cores, 5 workers. Average times: 61.89, 61.30, 46.78, 44.49, 35.05 ms. Total throughput: 104.86 iter/sec. Timing 20480K FFT, 19 cores, 6 workers. Average times: 61.40, 61.21, 46.28, 57.91, 57.81, 58.05 ms. Total throughput: 106.03 iter/sec. Timing 20480K FFT, 19 cores, 7 workers. Average times: 91.96, 91.75, 60.72, 61.27, 57.71, 57.50, 57.75 ms. Total throughput: 106.60 iter/sec. Timing 20480K FFT, 19 cores, 8 workers. Average times: 91.80, 91.44, 61.61, 61.80, 87.97, 87.92, 87.23, 59.16 ms. Total throughput: 105.35 iter/sec. Timing 20480K FFT, 19 cores, 9 workers. Average times: 92.37, 91.99, 91.06, 90.73, 91.73, 87.46, 87.61, 86.95, 58.99 ms. Total throughput: 105.90 iter/sec. Timing 20480K FFT, 19 cores, 10 workers. 
Average times: 92.11, 91.12, 90.86, 90.56, 91.41, 168.25, 85.99, 85.67, 86.36, 86.11 ms. Total throughput: 107.26 iter/sec. Timing 20480K FFT, 19 cores, 12 workers. Average times: 179.30, 176.91, 91.18, 90.34, 90.78, 91.46, 168.15, 168.14, 169.33, 85.40, 85.43, 85.62 ms. Total throughput: 108.11 iter/sec. Timing 20480K FFT, 19 cores, 14 workers. Average times: 180.69, 178.67, 178.41, 180.99, 91.38, 91.37, 92.73, 170.64, 170.09, 171.50, 170.01, 169.42, 86.99, 87.87 ms. Total throughput: 107.17 iter/sec. Timing 20480K FFT, 19 cores, 16 workers. Average times: 181.46, 178.32, 179.32, 179.68, 176.91, 177.75, 91.44, 92.00, 170.16, 168.50, 168.75, 168.94, 167.46, 166.63, 166.76, 86.17 ms. Total throughput: 108.58 iter/sec. Timing 20480K FFT, 19 cores, 18 workers. Average times: 181.75, 178.73, 179.32, 178.97, 177.60, 178.31, 178.70, 178.30, 92.50, 169.39, 169.15, 168.97, 169.80, 167.93, 167.53, 167.73, 167.29, 168.52 ms. Total throughput: 108.93 iter/sec. Timing 20480K FFT, 20 cores, 1 worker. Average times: 14.33 ms. Total throughput: 69.77 iter/sec. Timing 20480K FFT, 20 cores, 2 workers. Average times: 18.53, 18.22 ms. Total throughput: 108.83 iter/sec. Timing 20480K FFT, 20 cores, 3 workers. Average times: 37.37, 36.98, 18.13 ms. Total throughput: 108.97 iter/sec. Timing 20480K FFT, 20 cores, 4 workers. Average times: 37.01, 37.03, 37.21, 36.96 ms. Total throughput: 107.95 iter/sec. Timing 20480K FFT, 20 cores, 5 workers. Average times: 61.87, 60.87, 46.40, 36.82, 36.81 ms. Total throughput: 108.47 iter/sec. Timing 20480K FFT, 20 cores, 6 workers. Average times: 62.08, 61.33, 46.39, 61.07, 60.83, 46.13 ms. Total throughput: 108.46 iter/sec. Timing 20480K FFT, 20 cores, 7 workers. Average times: 91.69, 91.68, 60.44, 61.00, 61.45, 61.07, 46.20 ms. Total throughput: 109.05 iter/sec. Timing 20480K FFT, 20 cores, 8 workers. Average times: 92.44, 92.01, 61.47, 62.06, 92.45, 92.46, 61.70, 62.09 ms. Total throughput: 108.01 iter/sec. Timing 20480K FFT, 20 cores, 9 workers. 
Average times: 91.65, 92.05, 90.75, 90.92, 91.67, 92.18, 91.61, 61.45, 61.62 ms. Total throughput: 108.97 iter/sec. Timing 20480K FFT, 20 cores, 10 workers. Average times: 91.94, 92.25, 90.78, 91.27, 91.81, 91.25, 92.95, 90.68, 90.93, 92.95 ms. Total throughput: 109.08 iter/sec. Timing 20480K FFT, 20 cores, 12 workers. Average times: 180.43, 178.40, 92.37, 91.60, 92.28, 92.71, 181.11, 180.68, 93.24, 92.15, 92.13, 93.32 ms. Total throughput: 108.72 iter/sec. Timing 20480K FFT, 20 cores, 14 workers. Average times: 180.46, 177.33, 178.84, 178.34, 91.19, 91.21, 91.94, 178.69, 178.02, 178.15, 178.97, 91.48, 91.11, 92.09 ms. Total throughput: 110.37 iter/sec. Timing 20480K FFT, 20 cores, 16 workers. Average times: 181.27, 178.44, 178.09, 179.65, 177.03, 177.42, 91.53, 92.17, 178.04, 177.42, 178.18, 179.06, 176.34, 176.53, 90.91, 92.34 ms. Total throughput: 110.98 iter/sec. Timing 20480K FFT, 20 cores, 18 workers. Average times: 180.52, 178.45, 179.25, 179.46, 177.76, 177.93, 181.03, 178.40, 93.50, 181.01, 179.42, 180.35, 180.58, 179.06, 177.97, 178.08, 178.84, 92.99 ms. Total throughput: 110.71 iter/sec. Timing 20480K FFT, 20 cores, 20 workers. Average times: 181.30, 180.31, 182.31, 180.67, 178.09, 179.82, 180.40, 178.19, 180.55, 180.69, 179.00, 179.90, 179.16, 180.70, 177.94, 177.51, 177.91, 177.36, 181.18, 179.11 ms. Total throughput: 111.36 iter/sec. Timing 20480K FFT, 21 cores, 1 worker. Average times: 14.35 ms. Total throughput: 69.71 iter/sec. Timing 20480K FFT, 21 cores, 2 workers. Average times: 17.74, 18.51 ms. Total throughput: 110.37 iter/sec. Timing 20480K FFT, 21 cores, 3 workers. Average times: 39.95, 32.99, 18.59 ms. Total throughput: 109.14 iter/sec. Timing 20480K FFT, 21 cores, 4 workers. Average times: 39.59, 32.88, 36.72, 36.90 ms. Total throughput: 110.00 iter/sec. Timing 20480K FFT, 21 cores, 5 workers. Average times: 65.15, 48.84, 49.15, 36.93, 36.74 ms. Total throughput: 110.47 iter/sec. Timing 20480K FFT, 21 cores, 6 workers. 
Average times: 66.09, 49.49, 50.00, 62.62, 62.10, 46.63 ms. Total throughput: 108.85 iter/sec. Timing 20480K FFT, 21 cores, 7 workers. Average times: 97.86, 65.38, 64.89, 65.70, 61.68, 60.63, 46.10 ms. Total throughput: 110.54 iter/sec. Timing 20480K FFT, 21 cores, 8 workers. Average times: 97.24, 65.60, 65.11, 65.97, 92.61, 92.76, 62.22, 62.84 ms. Total throughput: 109.61 iter/sec. Timing 20480K FFT, 21 cores, 9 workers. Average times: 96.95, 96.82, 95.57, 96.09, 65.79, 92.78, 93.42, 61.95, 62.60 ms. Total throughput: 110.31 iter/sec. Timing 20480K FFT, 21 cores, 10 workers. Average times: 96.96, 97.12, 96.18, 96.23, 65.53, 91.26, 90.64, 90.38, 90.31, 91.75 ms. Total throughput: 111.69 iter/sec. Timing 20480K FFT, 21 cores, 12 workers. Average times: 190.41, 97.21, 97.73, 96.47, 97.35, 97.46, 179.36, 178.73, 92.99, 92.14, 92.25, 93.01 ms. Total throughput: 111.04 iter/sec. Timing 20480K FFT, 21 cores, 14 workers. Average times: 190.98, 187.50, 188.09, 99.55, 97.02, 97.27, 98.11, 179.75, 179.81, 181.16, 180.67, 91.37, 92.28, 93.71 ms. Total throughput: 111.34 iter/sec. Timing 20480K FFT, 21 cores, 16 workers. Average times: 191.49, 189.06, 189.27, 190.21, 186.14, 96.77, 97.51, 97.68, 177.95, 178.34, 178.25, 178.40, 177.14, 177.40, 90.96, 92.07 ms. Total throughput: 112.83 iter/sec. Timing 20480K FFT, 21 cores, 18 workers. Average times: 192.17, 190.99, 190.39, 190.35, 187.80, 188.02, 188.34, 97.21, 97.94, 179.70, 177.59, 179.25, 179.78, 177.66, 178.80, 175.08, 177.70, 92.85 ms. Total throughput: 113.06 iter/sec. Timing 20480K FFT, 21 cores, 20 workers. Average times: 193.19, 191.19, 191.17, 190.27, 188.04, 189.27, 188.60, 188.97, 189.88, 98.17, 178.72, 180.79, 177.60, 177.88, 177.22, 176.69, 177.12, 177.79, 179.09, 178.66 ms. Total throughput: 113.67 iter/sec. Timing 20480K FFT, 22 cores, 1 worker. Average times: 14.78 ms. Total throughput: 67.65 iter/sec. Timing 20480K FFT, 22 cores, 2 workers. Average times: 17.88, 17.87 ms. Total throughput: 111.89 iter/sec. 
Timing 20480K FFT, 22 cores, 3 workers. Average times: 39.48, 32.90, 17.78 ms. Total throughput: 111.96 iter/sec. Timing 20480K FFT, 22 cores, 4 workers. Average times: 39.99, 32.92, 39.57, 32.90 ms. Total throughput: 111.05 iter/sec. Timing 20480K FFT, 22 cores, 5 workers. Average times: 65.21, 48.87, 49.10, 39.18, 32.42 ms. Total throughput: 112.53 iter/sec. Timing 20480K FFT, 22 cores, 6 workers. Average times: 65.45, 48.77, 49.20, 65.47, 48.80, 49.21 ms. Total throughput: 112.19 iter/sec. Timing 20480K FFT, 22 cores, 7 workers. Average times: 97.15, 65.69, 65.25, 65.58, 66.44, 49.43, 49.88 ms. Total throughput: 111.42 iter/sec. Timing 20480K FFT, 22 cores, 8 workers. Average times: 98.22, 65.77, 65.22, 65.89, 97.06, 65.45, 64.36, 65.64 ms. Total throughput: 112.25 iter/sec. Timing 20480K FFT, 22 cores, 9 workers. Average times: 97.91, 97.18, 96.11, 97.70, 65.99, 97.61, 65.80, 65.43, 66.58 ms. Total throughput: 112.04 iter/sec. Timing 20480K FFT, 22 cores, 10 workers. Average times: 97.33, 97.55, 96.81, 97.17, 65.91, 98.28, 98.19, 97.53, 97.33, 66.02 ms. Total throughput: 112.35 iter/sec. Timing 20480K FFT, 22 cores, 12 workers. Average times: 190.56, 96.52, 97.05, 95.97, 96.59, 97.37, 188.09, 96.70, 96.48, 95.80, 97.10, 96.78 ms. Total throughput: 114.05 iter/sec. Timing 20480K FFT, 22 cores, 14 workers. Average times: 192.15, 188.71, 189.87, 97.44, 96.54, 97.38, 97.62, 188.72, 188.02, 189.87, 96.48, 95.89, 96.78, 96.79 ms. Total throughput: 114.25 iter/sec. Timing 20480K FFT, 22 cores, 16 workers. Average times: 189.42, 189.08, 188.56, 190.61, 187.79, 97.10, 97.60, 97.33, 190.07, 189.08, 190.96, 189.30, 188.11, 96.15, 96.97, 97.63 ms. Total throughput: 114.60 iter/sec. Timing 20480K FFT, 22 cores, 18 workers. Average times: 193.86, 190.18, 190.41, 190.56, 189.31, 187.77, 189.42, 97.48, 98.15, 189.66, 188.68, 190.68, 190.14, 188.06, 186.72, 187.90, 97.16, 97.54 ms. Total throughput: 114.87 iter/sec. Timing 20480K FFT, 22 cores, 20 workers. 
Average times: 191.24, 190.99, 190.41, 192.42, 190.73, 189.40, 190.14, 190.07, 192.24, 98.84, 192.60, 190.80, 191.15, 191.10, 188.52, 188.55, 190.13, 189.10, 190.63, 97.01 ms. Total throughput: 114.88 iter/sec. Timing 20480K FFT, 22 cores, 22 workers. Average times: 193.14, 192.92, 193.39, 192.15, 190.40, 190.75, 190.29, 191.70, 191.94, 192.37, 191.91, 189.43, 190.13, 190.40, 189.13, 188.51, 187.68, 187.53, 188.74, 189.37, 189.12, 191.91 ms. Total throughput: 115.44 iter/sec. Timing 20480K FFT, 23 cores, 1 worker. Average times: 15.10 ms. Total throughput: 66.20 iter/sec. Timing 20480K FFT, 23 cores, 2 workers. Average times: 17.42, 17.82 ms. Total throughput: 113.53 iter/sec. Timing 20480K FFT, 23 cores, 3 workers. Average times: 34.42, 34.52, 17.70 ms. Total throughput: 114.51 iter/sec. Timing 20480K FFT, 23 cores, 4 workers. Average times: 34.64, 34.63, 38.90, 32.44 ms. Total throughput: 114.27 iter/sec. Timing 20480K FFT, 23 cores, 5 workers. Average times: 51.99, 51.50, 52.49, 39.27, 32.62 ms. Total throughput: 113.83 iter/sec. Timing 20480K FFT, 23 cores, 6 workers. Average times: 52.23, 51.52, 52.19, 64.99, 48.69, 49.10 ms. Total throughput: 114.01 iter/sec. Timing 20480K FFT, 23 cores, 7 workers. Average times: 68.93, 69.02, 69.17, 69.04, 64.55, 48.77, 49.07 ms. Total throughput: 114.31 iter/sec. Timing 20480K FFT, 23 cores, 8 workers. Average times: 69.27, 68.93, 69.10, 69.06, 97.12, 65.32, 64.35, 65.17 ms. Total throughput: 114.39 iter/sec. Timing 20480K FFT, 23 cores, 9 workers. Average times: 102.48, 102.94, 101.63, 69.00, 69.62, 98.58, 65.71, 65.65, 66.18 ms. Total throughput: 113.87 iter/sec. Timing 20480K FFT, 23 cores, 10 workers. Average times: 103.19, 102.80, 102.06, 69.16, 69.60, 97.04, 97.17, 95.72, 95.71, 65.39 ms. Total throughput: 114.83 iter/sec. Timing 20480K FFT, 23 cores, 12 workers. Average times: 103.92, 103.06, 101.51, 101.82, 102.97, 102.68, 190.08, 97.74, 96.77, 96.72, 98.02, 97.89 ms. Total throughput: 115.03 iter/sec. 
Timing 20480K FFT, 23 cores, 14 workers. Average times: 203.38, 200.88, 103.98, 102.17, 102.22, 103.36, 103.50, 190.13, 188.45, 188.65, 96.85, 96.17, 97.08, 97.51 ms. Total throughput: 115.57 iter/sec. Timing 20480K FFT, 23 cores, 16 workers. Average times: 203.91, 202.22, 201.25, 202.52, 102.79, 103.25, 103.98, 103.85, 189.33, 188.93, 189.79, 188.50, 187.50, 96.47, 96.39, 96.77 ms. Total throughput: 115.97 iter/sec. Timing 20480K FFT, 23 cores, 18 workers. Average times: 201.46, 202.25, 200.97, 203.44, 197.94, 197.13, 103.52, 105.28, 104.11, 189.91, 190.35, 190.68, 192.55, 188.30, 191.04, 187.40, 97.84, 97.90 ms. Total throughput: 115.96 iter/sec. Timing 20480K FFT, 23 cores, 20 workers. Average times: 203.42, 201.23, 204.95, 205.08, 201.24, 199.87, 199.54, 200.00, 104.92, 104.46, 193.29, 192.31, 192.57, 193.48, 190.11, 189.77, 191.37, 191.10, 192.83, 99.31 ms. Total throughput: 115.71 iter/sec. Timing 20480K FFT, 23 cores, 22 workers. Average times: 208.68, 206.92, 206.34, 206.14, 201.64, 200.86, 200.84, 201.54, 202.53, 202.12, 105.61, 192.49, 191.58, 192.60, 192.33, 190.82, 190.57, 189.93, 190.43, 193.61, 191.30, 191.38 ms. Total throughput: 115.98 iter/sec. Timing 20480K FFT, 24 cores, 1 worker. Average times: 13.03 ms. Total throughput: 76.76 iter/sec. Timing 20480K FFT, 24 cores, 2 workers. Average times: 17.41, 17.15 ms. Total throughput: 115.76 iter/sec. Timing 20480K FFT, 24 cores, 3 workers. Average times: 34.89, 34.72, 17.00 ms. Total throughput: 116.29 iter/sec. Timing 20480K FFT, 24 cores, 4 workers. Average times: 34.62, 34.58, 34.09, 34.32 ms. Total throughput: 116.27 iter/sec. Timing 20480K FFT, 24 cores, 5 workers. Average times: 52.35, 52.11, 52.30, 34.83, 34.88 ms. Total throughput: 114.79 iter/sec. Timing 20480K FFT, 24 cores, 6 workers. Average times: 51.84, 51.59, 52.05, 51.82, 51.32, 51.98 ms. Total throughput: 115.91 iter/sec. Timing 20480K FFT, 24 cores, 7 workers. Average times: 69.18, 68.77, 68.89, 69.64, 51.71, 51.01, 51.97 ms. 
Total throughput: 116.06 iter/sec. Timing 20480K FFT, 24 cores, 8 workers. Average times: 69.21, 69.03, 68.68, 69.17, 69.56, 69.22, 69.68, 70.00 ms. Total throughput: 115.41 iter/sec. Timing 20480K FFT, 24 cores, 9 workers. Average times: 103.40, 102.88, 101.65, 69.13, 69.68, 69.88, 69.34, 69.66, 69.65 ms. Total throughput: 115.49 iter/sec. Timing 20480K FFT, 24 cores, 10 workers. Average times: 103.35, 103.55, 101.99, 69.48, 69.96, 103.45, 102.84, 102.47, 69.44, 69.75 ms. Total throughput: 115.71 iter/sec. Timing 20480K FFT, 24 cores, 12 workers. Average times: 102.80, 102.81, 101.98, 102.57, 103.11, 103.10, 102.58, 102.97, 101.55, 102.18, 103.31, 103.22 ms. Total throughput: 116.87 iter/sec. Timing 20480K FFT, 24 cores, 14 workers. Average times: 203.37, 200.08, 102.96, 101.62, 102.40, 103.53, 103.43, 199.81, 199.41, 102.90, 101.49, 101.78, 102.80, 103.45 ms. Total throughput: 117.37 iter/sec. Timing 20480K FFT, 24 cores, 16 workers. Average times: 203.00, 201.21, 202.24, 201.52, 103.07, 102.63, 103.32, 104.19, 200.65, 199.85, 199.14, 199.08, 101.64, 102.27, 103.48, 103.51 ms. Total throughput: 117.50 iter/sec. Timing 20480K FFT, 24 cores, 18 workers. Average times: 205.94, 200.88, 201.82, 203.74, 200.08, 199.77, 103.67, 104.79, 104.75, 201.66, 200.60, 201.52, 201.44, 201.26, 199.32, 103.41, 104.07, 104.14 ms. Total throughput: 117.17 iter/sec. Timing 20480K FFT, 24 cores, 20 workers. Average times: 202.47, 201.40, 203.56, 203.82, 198.64, 199.01, 201.23, 200.29, 104.72, 104.58, 204.21, 202.93, 202.50, 202.24, 201.01, 200.99, 201.12, 201.27, 104.85, 104.77 ms. Total throughput: 117.54 iter/sec. Timing 20480K FFT, 24 cores, 22 workers. Average times: 205.32, 203.54, 202.86, 202.82, 202.32, 201.86, 201.45, 202.20, 204.34, 203.72, 104.57, 201.89, 202.25, 203.05, 202.42, 200.99, 200.76, 200.71, 200.31, 203.17, 202.45, 104.58 ms. Total throughput: 117.93 iter/sec. Timing 20480K FFT, 24 cores, 24 workers. 
Average times: 208.24, 204.87, 206.90, 203.85, 203.61, 203.74, 203.89, 203.37, 205.71, 206.17, 203.98, 200.84, 206.43, 207.21, 205.88, 205.75, 200.85, 205.02, 200.33, 200.59, 204.70, 203.13, 204.81, 203.15 ms. Total throughput: 117.49 iter/sec. 

One thing I noticed is that when you had only 2 sticks, you were getting 38 iter/sec. With 4x the bandwidth now available, naive scaling would have predicted 38 x 4 = 152 iter/sec. You're getting about 75% of that. While that is not bad, I wonder whether some other system knobs can be turned to get some extra performance.

Yep. Given how little performance difference there is between this and the best case, this is definitely the best choice. The individual tests are going to take about 2 months, so it is probably better to do the PRP test (with Gerbicz Error Check) rather than LL.
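For what it's worth, the scaling arithmetic and the rough test duration can be checked with a few lines of Python. This is only a sketch: the 38 iter/sec baseline and the 24-core figures are taken from the benchmarks in this thread, but `HYPOTHETICAL_EXPONENT` is a made-up illustrative value (the thread does not state the exponent), and the estimate assumes roughly one iteration per bit of the exponent, with each worker running at the 2-worker, 24-core speed.

```python
# Sketch only: checks the naive-scaling estimate and a rough test duration.

def scaling_efficiency(measured, baseline, factor):
    """Return measured throughput as a fraction of baseline * factor."""
    return measured / (baseline * factor)


def days_per_test(iterations, ms_per_iter):
    """Convert an iteration count and a per-iteration time into days."""
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0


BASELINE_ITER_PER_SEC = 38.0   # quoted for the 2-stick configuration
BANDWIDTH_FACTOR = 4           # "4x bandwidth" as stated in the post
MEASURED = {                   # 24-core totals from the benchmark above
    "2 workers": 115.76,
    "24 workers": 117.49,
}

# ASSUMPTION: not from the thread, chosen only to illustrate the formula.
HYPOTHETICAL_EXPONENT = 332_000_000
MS_PER_ITER_2_WORKERS = 17.41  # 24 cores, 2 workers, first worker

efficiency = {
    name: scaling_efficiency(value, BASELINE_ITER_PER_SEC, BANDWIDTH_FACTOR)
    for name, value in MEASURED.items()
}
estimated_days = days_per_test(HYPOTHETICAL_EXPONENT, MS_PER_ITER_2_WORKERS)

for name, eff in efficiency.items():
    print(f"{name}: {eff:.1%} of naive 4x scaling")
print(f"about {estimated_days:.0f} days per test at {MS_PER_ITER_2_WORKERS} ms/iter")
```

With a different exponent or worker layout the duration changes proportionally, so treat the result as an order-of-magnitude check rather than a prediction.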

In some of the timed configurations, some workers have half the throughput of others. It already happens at low core counts, for instance with 6, 9 or 12 cores and 3 workers, where each worker has a very different timing.
On the other hand, the 24 cores, 24 workers scenario has fairly equal average iteration times. Asymmetrical core-per-worker configurations logically give asymmetrical timings. Perhaps the explanation lies in data missing from the benchmarks: what is the standard deviation of those average iteration times, and how are the cores spread over the CPUs?
Jacob
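To put numbers on the asymmetry, here is a minimal Python sketch that parses benchmark lines in the format pasted above and reports, per configuration, the ratio between the slowest and fastest worker. It also recomputes the total throughput as the sum of 1000/t over the workers, which appears to match the reported totals for the sample lines. The regular expression and the names `parse` and `SAMPLE` are my own, not part of the benchmark tool.

```python
import re

# Matches one record of the benchmark output as pasted in this thread.
RECORD = re.compile(
    r"Timing (\d+)K FFT, (\d+) cores?, (\d+) workers?\. "
    r"Average times: ([\d., ]+) ms\. "
    r"Total throughput: ([\d.]+) iter/sec\."
)


def parse(text):
    """Yield one dict per benchmark record found in text."""
    for m in RECORD.finditer(text):
        times = [float(t) for t in m.group(4).split(",")]
        yield {
            "cores": int(m.group(2)),
            "workers": int(m.group(3)),
            "times_ms": times,
            "reported": float(m.group(5)),
            # Each worker contributes 1000 / (ms per iteration) iter/sec.
            "computed": sum(1000.0 / t for t in times),
            # Slowest worker relative to the fastest one.
            "spread": max(times) / min(times),
        }


# Two records copied from the 24-core results above.
SAMPLE = (
    "Timing 20480K FFT, 24 cores, 3 workers. "
    "Average times: 34.89, 34.72, 17.00 ms. "
    "Total throughput: 116.29 iter/sec. "
    "Timing 20480K FFT, 24 cores, 4 workers. "
    "Average times: 34.62, 34.58, 34.09, 34.32 ms. "
    "Total throughput: 116.27 iter/sec."
)

rows = list(parse(SAMPLE))
for row in rows:
    print(
        f"{row['cores']} cores, {row['workers']} workers: "
        f"spread {row['spread']:.2f}x, "
        f"computed {row['computed']:.2f} vs reported {row['reported']:.2f} iter/sec"
    )
```

Note that this only measures the spread between workers. It says nothing about the standard deviation within each average or about how cores are placed on the sockets, since neither is in the benchmark output.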
For the really high worker counts and large exponents, one must consider the probability of assignment expiration, and even probable hardware lifetime. 

