Interesting that 300K, which is so much worse with 8 cores / 1 worker, has much more consistent times with 8 / 8:
Timings for 280K FFT length (8 cores, 1 worker): 0.16 ms. Throughput: 6246.39 iter/sec.
Timings for 280K FFT length (8 cores, 2 workers): 0.23, 0.23 ms. Throughput: 8765.58 iter/sec.
Timings for 280K FFT length (8 cores, 4 workers): 0.39, 0.39, 0.39, 0.39 ms. Throughput: 10241.03 iter/sec.
Timings for 280K FFT length (8 cores, 8 workers): 0.67, 0.67, 0.66, 0.66, 0.67, 0.66, 0.66, 0.67 ms. Throughput: 12039.18 iter/sec.
Timings for 300K FFT length (8 cores, 1 worker): 0.36 ms. Throughput: 2795.59 iter/sec.
Timings for 300K FFT length (8 cores, 2 workers): 0.34, 0.34 ms. Throughput: 5864.26 iter/sec.
Timings for 300K FFT length (8 cores, 4 workers): 0.51, 0.52, 0.52, 0.50 ms. Throughput: 7814.25 iter/sec.
Timings for 300K FFT length (8 cores, 8 workers): 0.73, 0.72, 0.72, 0.70, 0.71, 0.71, 0.71, 0.71 ms. Throughput: 11245.56 iter/sec.
Timings for 320K FFT length (8 cores, 1 worker): 0.17 ms. Throughput: 5715.57 iter/sec.
Timings for 320K FFT length (8 cores, 2 workers): 0.26, 0.26 ms. Throughput: 7777.33 iter/sec.
Timings for 320K FFT length (8 cores, 4 workers): 0.45, 0.44, 0.45, 0.45 ms. Throughput: 8965.12 iter/sec.
Timings for 320K FFT length (8 cores, 8 workers): 1.00, 0.83, 0.85, 0.86, 0.84, 0.82, 0.84, 0.83 ms. Throughput: 9329.29 iter/sec.
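One way to put a number on "more consistent" is to parse the timing lines and compare the spread of the per-worker times. This is a minimal sketch assuming the exact line format shown above; the regex and the `parse` / `summarize` helpers are hypothetical, not part of Prime95. It also recomputes throughput as the sum of 1000 / t over the workers, which appears to be how the reported figure is derived.

```python
import re
from statistics import mean

# The two 8-worker lines quoted above.
LINES = [
    "Timings for 300K FFT length (8 cores, 8 workers): 0.73, 0.72, 0.72, 0.70, "
    "0.71, 0.71, 0.71, 0.71 ms. Throughput: 11245.56 iter/sec.",
    "Timings for 320K FFT length (8 cores, 8 workers): 1.00, 0.83, 0.85, 0.86, "
    "0.84, 0.82, 0.84, 0.83 ms. Throughput: 9329.29 iter/sec.",
]

# Matches "Timings for <N>K FFT length (<c> cores, <w> worker[s]): t1, t2, ... ms."
PATTERN = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores, (\d+) workers?\): ([\d., ]+) ms\."
)


def parse(line):
    """Return (fft_k, workers, per-worker ms/iter) from one timing line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    times = [float(t) for t in m.group(4).split(",")]
    return int(m.group(1)), int(m.group(3)), times


def summarize(times_ms):
    """Return (aggregate iter/sec, relative spread of per-worker times)."""
    # Each worker completes 1000 / t iterations per second; add them up.
    throughput = sum(1000.0 / t for t in times_ms)
    # (slowest - fastest) / mean: 0 means every worker ran at the same speed.
    spread = (max(times_ms) - min(times_ms)) / mean(times_ms)
    return throughput, spread


for line in LINES:
    fft_k, workers, times = parse(line)
    tput, spread = summarize(times)
    print(f"{fft_k}K x{workers}: {tput:.0f} iter/s, spread {spread:.0%}")
```

For the two 8-worker runs this gives a spread of about 4% at 300K versus about 21% at 320K, and recomputed totals within about half a percent of the reported 11245.56 and 9329.29 iter/sec (the small gap is presumably from the rounded per-worker times).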
5800X3D 1024K to 8192K throughput benchmark.
Non-optimized, non-OC'd CPU, 4000 MHz DDR4 RAM using AXMS stock settings.
Nothing unexpected or outstanding at first glance.
CPU speed: 3400.12 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 8x32 KB, L2 cache size: 8x512 KB, L3 cache size: 96 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
Machine#0 (total=13447548KB, Backend=Windows, OSName=Windows, WindowsBuildEnvironment=MinGW, OSRelease=10, OSVersion=10.0.18362, Hostname=5800X3D, Architecture=x86_64, hwlocVersion=2.4.1, ProcessName=prime95.exe)
  Package (total=13447548KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=25, CPUModelNumber=33, CPUModel="AMD Ryzen 7 5800X3D 8-Core Processor ", CPUStepping=2)
    L3 (size=98304KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000000c)
            PU#2 (cpuset: 0x00000004)
            PU#3 (cpuset: 0x00000008)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000030)
            PU#4 (cpuset: 0x00000010)
            PU#5 (cpuset: 0x00000020)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000300)
            PU#8 (cpuset: 0x00000100)
            PU#9 (cpuset: 0x00000200)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000c00)
            PU#10 (cpuset: 0x00000400)
            PU#11 (cpuset: 0x00000800)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00003000)
            PU#12 (cpuset: 0x00001000)
            PU#13 (cpuset: 0x00002000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000c000)
            PU#14 (cpuset: 0x00004000)
            PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 30.7, RdtscTiming=1
Timings for 1024K FFT length (8 cores, 1 worker): 0.55 ms. Throughput: 1821.51 iter/sec.
Timings for 1024K FFT length (8 cores, 2 workers): 1.04, 1.04 ms. Throughput: 1931.97 iter/sec.
Timings for 1024K FFT length (8 cores, 4 workers): 2.00, 1.99, 2.00, 2.00 ms. Throughput: 2001.40 iter/sec.
Timings for 1024K FFT length (8 cores, 8 workers): 3.82, 3.82, 3.82, 3.80, 3.82, 3.83, 3.84, 3.82 ms. Throughput: 2093.63 iter/sec.
Timings for 1120K FFT length (8 cores, 1 worker): 0.60 ms. Throughput: 1653.34 iter/sec.
Timings for 1120K FFT length (8 cores, 2 workers): 1.15, 1.15 ms. Throughput: 1735.32 iter/sec.
Timings for 1120K FFT length (8 cores, 4 workers): 2.22, 2.24, 2.21, 2.20 ms. Throughput: 1804.09 iter/sec.
Timings for 1120K FFT length (8 cores, 8 workers): 4.21, 4.23, 4.21, 4.21, 4.21, 4.23, 4.22, 4.21 ms. Throughput: 1897.55 iter/sec.
Timings for 1152K FFT length (8 cores, 1 worker): 0.59 ms. Throughput: 1691.24 iter/sec.
Timings for 1152K FFT length (8 cores, 2 workers): 1.13, 1.13 ms. Throughput: 1769.93 iter/sec.
Timings for 1152K FFT length (8 cores, 4 workers): 2.22, 2.20, 2.20, 2.20 ms. Throughput: 1814.40 iter/sec.
Timings for 1152K FFT length (8 cores, 8 workers): 4.29, 4.26, 4.26, 4.23, 4.22, 4.25, 4.24, 4.23 ms. Throughput: 1883.85 iter/sec.
[Thu May 5 22:25:18 2022]
Timings for 1280K FFT length (8 cores, 1 worker): 0.68 ms. Throughput: 1464.10 iter/sec.
Timings for 1280K FFT length (8 cores, 2 workers): 1.30, 1.30 ms. Throughput: 1535.99 iter/sec.
Timings for 1280K FFT length (8 cores, 4 workers): 2.52, 2.57, 2.51, 2.52 ms. Throughput: 1581.03 iter/sec.
Timings for 1280K FFT length (8 cores, 8 workers): 4.85, 4.90, 4.84, 4.86, 4.85, 4.88, 4.86, 4.86 ms. Throughput: 1645.09 iter/sec.
Timings for 1344K FFT length (8 cores, 1 worker): 0.71 ms. Throughput: 1401.51 iter/sec.
Timings for 1344K FFT length (8 cores, 2 workers): 1.37, 1.37 ms. Throughput: 1456.32 iter/sec.
Timings for 1344K FFT length (8 cores, 4 workers): 2.70, 2.68, 2.69, 2.69 ms. Throughput: 1486.34 iter/sec.
Timings for 1344K FFT length (8 cores, 8 workers): 5.22, 5.23, 5.22, 5.22, 5.22, 5.24, 5.27, 5.22 ms. Throughput: 1529.46 iter/sec.
Timings for 1440K FFT length (8 cores, 1 worker): 0.75 ms. Throughput: 1339.61 iter/sec.
Timings for 1440K FFT length (8 cores, 2 workers): 1.46, 1.44 ms. Throughput: 1376.79 iter/sec.
Timings for 1440K FFT length (8 cores, 4 workers): 2.83, 2.81, 2.86, 2.84 ms. Throughput: 1411.41 iter/sec.
Timings for 1440K FFT length (8 cores, 8 workers): 5.53, 5.56, 5.46, 5.47, 5.49, 5.53, 5.49, 5.48 ms. Throughput: 1454.27 iter/sec.
Timings for 1536K FFT length (8 cores, 1 worker): 0.84 ms. Throughput: 1189.28 iter/sec.
Timings for 1536K FFT length (8 cores, 2 workers): 1.59, 1.61 ms. Throughput: 1250.85 iter/sec.
Timings for 1536K FFT length (8 cores, 4 workers): 3.04, 3.03, 3.04, 3.05 ms. Throughput: 1315.83 iter/sec.
Timings for 1536K FFT length (8 cores, 8 workers): 5.86, 5.84, 5.81, 5.84, 5.84, 5.86, 5.83, 5.85 ms. Throughput: 1369.74 iter/sec.
Timings for 1600K FFT length (8 cores, 1 worker): 0.82 ms. Throughput: 1214.51 iter/sec.
Timings for 1600K FFT length (8 cores, 2 workers): 1.61, 1.59 ms. Throughput: 1250.17 iter/sec.
Timings for 1600K FFT length (8 cores, 4 workers): 3.11, 3.14, 3.12, 3.11 ms. Throughput: 1281.28 iter/sec.
Timings for 1600K FFT length (8 cores, 8 workers): 6.15, 6.13, 6.11, 6.16, 6.11, 6.15, 6.17, 6.13 ms. Throughput: 1303.17 iter/sec.
Timings for 1680K FFT length (8 cores, 1 worker): 0.91 ms. Throughput: 1104.91 iter/sec.
Timings for 1680K FFT length (8 cores, 2 workers): 1.77, 1.76 ms. Throughput: 1133.54 iter/sec.
Timings for 1680K FFT length (8 cores, 4 workers): 3.45, 3.44, 3.45, 3.45 ms. Throughput: 1160.00 iter/sec.
Timings for 1680K FFT length (8 cores, 8 workers): 6.85, 6.84, 6.83, 6.81, 6.82, 6.85, 6.84, 6.84 ms. Throughput: 1170.61 iter/sec.
Timings for 1792K FFT length (8 cores, 1 worker): 1.01 ms. Throughput: 991.35 iter/sec.
Timings for 1792K FFT length (8 cores, 2 workers): 1.90, 1.91 ms. Throughput: 1049.05 iter/sec.
Timings for 1792K FFT length (8 cores, 4 workers): 3.74, 3.70, 3.72, 3.71 ms. Throughput: 1076.18 iter/sec.
Timings for 1792K FFT length (8 cores, 8 workers): 7.31, 7.34, 7.28, 7.28, 7.29, 7.33, 7.29, 7.29 ms. Throughput: 1095.94 iter/sec.
Timings for 1920K FFT length (8 cores, 1 worker): 0.98 ms. Throughput: 1024.47 iter/sec.
Timings for 1920K FFT length (8 cores, 2 workers): 1.92, 1.89 ms. Throughput: 1049.13 iter/sec.
[Thu May 5 22:30:23 2022]
Timings for 1920K FFT length (8 cores, 4 workers): 3.74, 3.74, 3.73, 3.73 ms. Throughput: 1071.95 iter/sec.
Timings for 1920K FFT length (8 cores, 8 workers): 8.26, 8.22, 8.20, 8.18, 8.30, 8.15, 8.25, 8.29 ms. Throughput: 972.00 iter/sec.
Timings for 2048K FFT length (8 cores, 1 worker): 1.15 ms. Throughput: 866.43 iter/sec.
Timings for 2048K FFT length (8 cores, 2 workers): 2.18, 2.23 ms. Throughput: 906.57 iter/sec.
Timings for 2048K FFT length (8 cores, 4 workers): 4.27, 4.19, 4.27, 4.18 ms. Throughput: 945.90 iter/sec.
Timings for 2048K FFT length (8 cores, 8 workers): 9.26, 9.18, 9.12, 9.11, 9.12, 9.15, 9.11, 9.11 ms. Throughput: 874.90 iter/sec.
Timings for 2240K FFT length (8 cores, 1 worker): 1.20 ms. Throughput: 829.97 iter/sec.
Timings for 2240K FFT length (8 cores, 2 workers): 2.33, 2.35 ms. Throughput: 854.92 iter/sec.
Timings for 2240K FFT length (8 cores, 4 workers): 4.63, 4.60, 4.58, 4.59 ms. Throughput: 869.64 iter/sec.
Timings for 2240K FFT length (8 cores, 8 workers): 10.89, 11.01, 10.80, 10.97, 10.78, 10.83, 10.66, 10.81 ms. Throughput: 737.80 iter/sec.
Timings for 2304K FFT length (8 cores, 1 worker): 1.20 ms. Throughput: 832.67 iter/sec.
Timings for 2304K FFT length (8 cores, 2 workers): 2.32, 2.33 ms. Throughput: 861.15 iter/sec.
Timings for 2304K FFT length (8 cores, 4 workers): 4.58, 4.57, 4.64, 4.57 ms. Throughput: 871.49 iter/sec.
Timings for 2304K FFT length (8 cores, 8 workers): 11.58, 11.70, 11.63, 11.65, 11.44, 11.10, 11.56, 11.17 ms. Throughput: 697.18 iter/sec.
Timings for 2400K FFT length (8 cores, 1 worker): 1.27 ms. Throughput: 785.73 iter/sec.
Timings for 2400K FFT length (8 cores, 2 workers): 2.49, 2.52 ms. Throughput: 797.82 iter/sec.
Timings for 2400K FFT length (8 cores, 4 workers): 4.91, 4.90, 4.90, 4.91 ms. Throughput: 815.68 iter/sec.
Timings for 2400K FFT length (8 cores, 8 workers): 12.80, 12.72, 12.76, 12.91, 12.73, 12.74, 12.44, 12.75 ms. Throughput: 628.38 iter/sec.
Timings for 2560K FFT length (8 cores, 1 worker): 1.41 ms. Throughput: 707.11 iter/sec.
Timings for 2560K FFT length (8 cores, 2 workers): 2.76, 2.80 ms. Throughput: 719.15 iter/sec.
Timings for 2560K FFT length (8 cores, 4 workers): 5.47, 5.40, 5.40, 5.40 ms. Throughput: 738.28 iter/sec.
Timings for 2560K FFT length (8 cores, 8 workers): 13.82, 13.76, 13.51, 14.81, 13.80, 13.47, 13.88, 13.96 ms. Throughput: 576.91 iter/sec.
Timings for 2688K FFT length (8 cores, 1 worker): 1.46 ms. Throughput: 687.16 iter/sec.
Timings for 2688K FFT length (8 cores, 2 workers): 2.83, 2.83 ms. Throughput: 706.48 iter/sec.
Timings for 2688K FFT length (8 cores, 4 workers): 5.61, 5.61, 5.60, 5.61 ms. Throughput: 713.56 iter/sec.
Timings for 2688K FFT length (8 cores, 8 workers): 15.67, 15.25, 15.30, 15.49, 15.74, 14.73, 15.14, 15.62 ms. Throughput: 520.82 iter/sec.
Timings for 2800K FFT length (8 cores, 1 worker): 1.58 ms. Throughput: 632.00 iter/sec.
Timings for 2800K FFT length (8 cores, 2 workers): 3.08, 3.08 ms. Throughput: 649.40 iter/sec.
Timings for 2800K FFT length (8 cores, 4 workers): 6.01, 5.98, 5.99, 5.96 ms. Throughput: 668.72 iter/sec.
Timings for 2800K FFT length (8 cores, 8 workers): 16.97, 17.62, 16.81, 16.83, 16.77, 17.51, 16.85, 16.85 ms. Throughput: 470.01 iter/sec.
[Thu May 5 22:35:31 2022]
Timings for 2880K FFT length (8 cores, 1 worker): 1.52 ms. Throughput: 659.39 iter/sec.
Timings for 2880K FFT length (8 cores, 2 workers): 2.97, 3.01 ms. Throughput: 668.70 iter/sec.
Timings for 2880K FFT length (8 cores, 4 workers): 5.87, 5.85, 5.88, 5.87 ms. Throughput: 681.73 iter/sec.
Timings for 2880K FFT length (8 cores, 8 workers): 17.62, 17.43, 17.26, 17.36, 17.56, 16.80, 19.11, 17.55 ms. Throughput: 455.48 iter/sec.
Timings for 3072K FFT length (8 cores, 1 worker): 1.60 ms. Throughput: 626.87 iter/sec.
Timings for 3072K FFT length (8 cores, 2 workers): 3.13, 3.13 ms. Throughput: 639.33 iter/sec.
Timings for 3072K FFT length (8 cores, 4 workers): 6.21, 6.24, 6.20, 6.20 ms. Throughput: 643.85 iter/sec.
Timings for 3072K FFT length (8 cores, 8 workers): 19.10, 19.58, 19.56, 19.61, 19.72, 18.80, 19.31, 19.82 ms. Throughput: 411.66 iter/sec.
Timings for 3200K FFT length (8 cores, 1 worker): 1.74 ms. Throughput: 575.10 iter/sec.
Timings for 3200K FFT length (8 cores, 2 workers): 3.40, 3.38 ms. Throughput: 589.81 iter/sec.
Timings for 3200K FFT length (8 cores, 4 workers): 6.75, 6.74, 6.75, 6.69 ms. Throughput: 594.29 iter/sec.
Timings for 3200K FFT length (8 cores, 8 workers): 20.36, 19.59, 20.58, 20.57, 20.59, 21.98, 21.27, 21.03 ms. Throughput: 385.95 iter/sec.
Timings for 3360K FFT length (8 cores, 1 worker): 1.84 ms. Throughput: 543.23 iter/sec.
Timings for 3360K FFT length (8 cores, 2 workers): 3.61, 3.66 ms. Throughput: 550.28 iter/sec.
Timings for 3360K FFT length (8 cores, 4 workers): 7.23, 7.20, 7.20, 7.21 ms. Throughput: 554.67 iter/sec.
Timings for 3360K FFT length (8 cores, 8 workers): 23.46, 21.81, 22.16, 21.76, 24.26, 23.28, 22.52, 22.61 ms. Throughput: 352.37 iter/sec.
Timings for 3584K FFT length (8 cores, 1 worker): 1.96 ms. Throughput: 510.57 iter/sec.
Timings for 3584K FFT length (8 cores, 2 workers): 3.84, 3.83 ms. Throughput: 521.48 iter/sec.
Timings for 3584K FFT length (8 cores, 4 workers): 7.78, 7.69, 7.70, 7.72 ms. Throughput: 518.06 iter/sec.
Timings for 3584K FFT length (8 cores, 8 workers): 24.69, 25.31, 25.16, 25.43, 25.25, 24.63, 24.97, 26.01 ms. Throughput: 317.77 iter/sec.
Timings for 3840K FFT length (8 cores, 1 worker): 2.09 ms. Throughput: 477.87 iter/sec.
Timings for 3840K FFT length (8 cores, 2 workers): 4.18, 4.11 ms. Throughput: 482.30 iter/sec.
Timings for 3840K FFT length (8 cores, 4 workers): 8.51, 8.47, 8.50, 8.48 ms. Throughput: 471.03 iter/sec.
Timings for 3840K FFT length (8 cores, 8 workers): 28.31, 26.02, 26.73, 27.02, 29.30, 26.07, 27.45, 28.35 ms. Throughput: 292.38 iter/sec.
Timings for 4096K FFT length (8 cores, 1 worker): 2.26 ms. Throughput: 442.60 iter/sec.
Timings for 4096K FFT length (8 cores, 2 workers): 4.44, 4.44 ms. Throughput: 450.26 iter/sec.
Timings for 4096K FFT length (8 cores, 4 workers): 9.77, 9.66, 9.77, 9.65 ms. Throughput: 411.85 iter/sec.
Timings for 4096K FFT length (8 cores, 8 workers): 30.87, 31.18, 30.41, 31.13, 31.77, 29.69, 31.02, 30.04 ms. Throughput: 260.16 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker): 2.53 ms. Throughput: 395.06 iter/sec.
Timings for 4480K FFT length (8 cores, 2 workers): 4.97, 4.93 ms. Throughput: 403.79 iter/sec.
[Thu May 5 22:40:42 2022]
Timings for 4480K FFT length (8 cores, 4 workers): 13.07, 13.07, 13.07, 13.07 ms. Throughput: 305.98 iter/sec.
Timings for 4480K FFT length (8 cores, 8 workers): 40.23, 40.22, 40.21, 40.21, 40.22, 40.23, 40.22, 40.22 ms. Throughput: 198.91 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker): 2.53 ms. Throughput: 395.21 iter/sec.
Timings for 4608K FFT length (8 cores, 2 workers): 4.93, 4.93 ms. Throughput: 405.86 iter/sec.
Timings for 4608K FFT length (8 cores, 4 workers): 11.80, 11.61, 11.60, 11.85 ms. Throughput: 341.44 iter/sec.
Timings for 4608K FFT length (8 cores, 8 workers): 34.89, 35.49, 38.28, 36.01, 36.50, 34.16, 36.19, 36.05 ms. Throughput: 222.77 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker): 2.72 ms. Throughput: 368.13 iter/sec.
Timings for 4800K FFT length (8 cores, 2 workers): 5.35, 5.29 ms. Throughput: 376.09 iter/sec.
Timings for 4800K FFT length (8 cores, 4 workers): 12.64, 12.68, 12.87, 12.71 ms. Throughput: 314.33 iter/sec.
Timings for 4800K FFT length (8 cores, 8 workers): 38.85, 36.54, 38.12, 37.28, 37.95, 36.28, 38.16, 38.37 ms. Throughput: 212.35 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker): 2.86 ms. Throughput: 349.51 iter/sec.
Timings for 5120K FFT length (8 cores, 2 workers): 5.63, 5.59 ms. Throughput: 356.46 iter/sec.
Timings for 5120K FFT length (8 cores, 4 workers): 14.58, 14.30, 14.26, 14.32 ms. Throughput: 278.44 iter/sec.
Timings for 5120K FFT length (8 cores, 8 workers): 41.60, 40.58, 41.96, 41.19, 41.50, 39.44, 43.41, 41.01 ms. Throughput: 193.66 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker): 3.04 ms. Throughput: 328.71 iter/sec.
Timings for 5376K FFT length (8 cores, 2 workers): 6.00, 6.02 ms. Throughput: 332.87 iter/sec.
Timings for 5376K FFT length (8 cores, 4 workers): 15.87, 15.97, 16.18, 16.25 ms. Throughput: 248.94 iter/sec.
Timings for 5376K FFT length (8 cores, 8 workers): 44.97, 44.45, 45.45, 45.07, 44.54, 43.04, 45.48, 46.42 ms. Throughput: 178.14 iter/sec.
Timings for 5600K FFT length (8 cores, 1 worker): 3.12 ms. Throughput: 320.55 iter/sec.
Timings for 5600K FFT length (8 cores, 2 workers): 6.15, 6.15 ms. Throughput: 325.13 iter/sec.
Timings for 5600K FFT length (8 cores, 4 workers): 17.49, 17.93, 17.49, 17.80 ms. Throughput: 226.28 iter/sec.
Timings for 5600K FFT length (8 cores, 8 workers): 47.96, 48.18, 48.50, 47.76, 48.44, 47.26, 47.98, 48.89 ms. Throughput: 166.27 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker): 3.13 ms. Throughput: 319.52 iter/sec.
Timings for 5760K FFT length (8 cores, 2 workers): 6.24, 6.24 ms. Throughput: 320.58 iter/sec.
Timings for 5760K FFT length (8 cores, 4 workers): 21.12, 21.17, 21.27, 21.12 ms. Throughput: 188.92 iter/sec.
Timings for 5760K FFT length (8 cores, 8 workers): 56.75, 55.82, 55.68, 56.87, 56.59, 55.87, 56.83, 55.72 ms. Throughput: 142.19 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker): 3.40 ms. Throughput: 294.26 iter/sec.
Timings for 6144K FFT length (8 cores, 2 workers): 6.76, 6.85 ms. Throughput: 293.91 iter/sec.
Timings for 6144K FFT length (8 cores, 4 workers): 20.67, 19.86, 19.70, 19.92 ms. Throughput: 199.72 iter/sec.
[Thu May 5 22:45:47 2022]
Timings for 6144K FFT length (8 cores, 8 workers): 53.06, 52.40, 52.69, 52.98, 52.99, 50.68, 52.47, 53.26 ms. Throughput: 152.22 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker): 3.56 ms. Throughput: 281.20 iter/sec.
Timings for 6400K FFT length (8 cores, 2 workers): 7.04, 7.00 ms. Throughput: 284.79 iter/sec.
Timings for 6400K FFT length (8 cores, 4 workers): 21.02, 21.89, 20.41, 21.51 ms. Throughput: 188.73 iter/sec.
Timings for 6400K FFT length (8 cores, 8 workers): 55.57, 54.56, 57.04, 55.12, 55.78, 55.46, 55.44, 56.50 ms. Throughput: 143.69 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker): 3.71 ms. Throughput: 269.24 iter/sec.
Timings for 6720K FFT length (8 cores, 2 workers): 7.54, 7.54 ms. Throughput: 265.09 iter/sec.
Timings for 6720K FFT length (8 cores, 4 workers): 24.47, 24.38, 24.25, 24.64 ms. Throughput: 163.71 iter/sec.
Timings for 6720K FFT length (8 cores, 8 workers): 62.69, 61.28, 61.24, 62.39, 61.28, 60.91, 62.22, 62.84 ms. Throughput: 129.35 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker): 4.17 ms. Throughput: 239.98 iter/sec.
Timings for 7168K FFT length (8 cores, 2 workers): 8.40, 8.40 ms. Throughput: 238.18 iter/sec.
Timings for 7168K FFT length (8 cores, 4 workers): 26.18, 26.75, 25.78, 25.71 ms. Throughput: 153.26 iter/sec.
Timings for 7168K FFT length (8 cores, 8 workers): 63.65, 66.52, 64.48, 62.92, 66.19, 62.65, 65.37, 64.88 ms. Throughput: 123.92 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker): 4.34 ms. Throughput: 230.49 iter/sec.
Timings for 7680K FFT length (8 cores, 2 workers): 10.17, 10.23 ms. Throughput: 196.08 iter/sec.
Timings for 7680K FFT length (8 cores, 4 workers): 34.98, 35.15, 35.16, 35.32 ms. Throughput: 113.78 iter/sec.
Timings for 7680K FFT length (8 cores, 8 workers): 81.23, 80.78, 80.40, 80.60, 79.95, 81.31, 79.75, 81.04 ms. Throughput: 99.22 iter/sec.
Timings for 8000K FFT length (8 cores, 1 worker): 4.57 ms. Throughput: 219.06 iter/sec.
Timings for 8000K FFT length (8 cores, 2 workers): 9.63, 9.49 ms. Throughput: 209.18 iter/sec.
Timings for 8000K FFT length (8 cores, 4 workers): 30.01, 30.62, 31.70, 30.39 ms. Throughput: 130.44 iter/sec.
Timings for 8000K FFT length (8 cores, 8 workers): 74.16, 76.71, 73.04, 73.07, 74.12, 74.25, 74.86, 74.15 ms. Throughput: 107.70 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker): 4.64 ms. Throughput: 215.61 iter/sec.
Timings for 8064K FFT length (8 cores, 2 workers): 9.82, 9.79 ms. Throughput: 203.99 iter/sec.
Timings for 8064K FFT length (8 cores, 4 workers): 30.86, 30.88, 30.86, 32.85 ms. Throughput: 127.63 iter/sec.
Timings for 8064K FFT length (8 cores, 8 workers): 74.85, 73.24, 73.21, 73.78, 77.11, 71.62, 74.85, 76.06 ms. Throughput: 107.66 iter/sec.
Timings for 8192K FFT length (8 cores, 1 worker): 4.79 ms. Throughput: 208.95 iter/sec.
Timings for 8192K FFT length (8 cores, 2 workers): 10.10, 10.09 ms. Throughput: 198.12 iter/sec.
Timings for 8192K FFT length (8 cores, 4 workers): 31.64, 32.78, 32.29, 32.07 ms. Throughput: 124.26 iter/sec.
Timings for 8192K FFT length (8 cores, 8 workers): 78.81, 75.92, 78.91, 76.51, 77.04, 74.21, 77.16, 76.64 ms. Throughput: 104.07 iter/sec.
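A rough way to read the 5800X3D dump is to pick the best worker count per FFT length and compare the combined FFT data of all workers against the 96 MiB L3 from the hwloc output. This is a sketch under a loudly simplified assumption: each FFT element is one 8-byte double and Prime95's auxiliary tables are ignored, so the "working set" is only an estimate. `THROUGHPUT`, `working_set_mib` and `best_config` are hypothetical names; the numbers are copied from the dump.

```python
# Throughput (iter/sec) per worker count, copied from the 5800X3D dump above.
THROUGHPUT = {
    1792: {1: 991.35, 2: 1049.05, 4: 1076.18, 8: 1095.94},
    1920: {1: 1024.47, 2: 1049.13, 4: 1071.95, 8: 972.00},
    2048: {1: 866.43, 2: 906.57, 4: 945.90, 8: 874.90},
    4096: {1: 442.60, 2: 450.26, 4: 411.85, 8: 260.16},
    8192: {1: 208.95, 2: 198.12, 4: 124.26, 8: 104.07},
}

L3_MIB = 96            # hwloc dump: L3 size=98304KB
BYTES_PER_ELEMENT = 8  # assumption: one double per FFT element, no extra tables


def working_set_mib(fft_k, workers):
    """Estimated combined FFT data of all workers, in MiB."""
    return fft_k * 1024 * BYTES_PER_ELEMENT * workers / 2**20


def best_config(fft_k):
    """Return (worker count, iter/sec) with the highest measured throughput."""
    rates = THROUGHPUT[fft_k]
    workers = max(rates, key=rates.get)
    return workers, rates[workers]


for fft_k in sorted(THROUGHPUT):
    workers, rate = best_config(fft_k)
    ws = working_set_mib(fft_k, workers)
    where = "exceeds" if ws > L3_MIB else "fits in"
    print(f"{fft_k}K: best {workers} worker(s) at {rate:.2f} iter/s, "
          f"~{ws:.0f} MiB ({where} {L3_MIB} MiB L3)")
```

On this data the best worker count drops from 8 at 1792K to 4 at 1920K and 2048K, 2 at 4096K and 1 at 8192K. The estimate crosses 96 MiB around the same point, although 1792K with 8 workers (about 112 MiB by this estimate) still wins narrowly, so treat the model as approximate.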
Timings for 4096K FFT length (8 cores, 1 worker): 2.26 ms. Throughput: 442.60 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker): 2.53 ms. Throughput: 395.06 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker): 2.53 ms. Throughput: 395.21 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker): 2.72 ms. Throughput: 368.13 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker): 2.86 ms. Throughput: 349.51 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker): 3.04 ms. Throughput: 328.71 iter/sec.
Timings for 5600K FFT length (8 cores, 1 worker): 3.12 ms. Throughput: 320.55 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker): 3.13 ms. Throughput: 319.52 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker): 3.40 ms. Throughput: 294.26 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker): 3.56 ms. Throughput: 281.20 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker): 3.71 ms. Throughput: 269.24 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker): 4.17 ms. Throughput: 239.98 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker): 4.34 ms. Throughput: 230.49 iter/sec.
Timings for 8000K FFT length (8 cores, 1 worker): 4.57 ms. Throughput: 219.06 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker): 4.64 ms. Throughput: 215.61 iter/sec.
Timings for 8192K FFT length (8 cores, 1 worker): 4.79 ms. Throughput: 208.95 iter/sec.
Best time for 4096K FFT length: 1.936 ms., avg: 1.962 ms.
Best time for 4480K FFT length: 2.416 ms., avg: 2.520 ms.
Best time for 4608K FFT length: 2.298 ms., avg: 2.382 ms.
Best time for 4800K FFT length: 2.299 ms., avg: 2.351 ms.
Best time for 5120K FFT length: 2.536 ms., avg: 2.655 ms.
Best time for 5376K FFT length: 3.022 ms., avg: 3.265 ms.
Best time for 5600K FFT length: 3.337 ms., avg: 3.511 ms.
Best time for 5760K FFT length: 4.065 ms., avg: 4.131 ms.
Best time for 6144K FFT length: 3.502 ms., avg: 3.672 ms.
Best time for 6400K FFT length: 3.539 ms., avg: 3.646 ms.
Best time for 6720K FFT length: 4.854 ms., avg: 4.948 ms.
Best time for 7168K FFT length: 5.050 ms., avg: 5.149 ms.
Best time for 7680K FFT length: 6.219 ms., avg: 6.283 ms.
Best time for 8000K FFT length: 5.612 ms., avg: 5.763 ms.
Best time for 8064K FFT length: 6.222 ms., avg: 6.392 ms.
Best time for 8192K FFT length: 6.397 ms., avg: 6.476 ms.
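The "Best time" listing gives milliseconds per iteration, while the 1-worker listing before it gives iterations per second. A minimal sketch converting one into the other (iter/sec = 1000 / ms) puts a few sizes side by side. The two lists appear to come from different Prime95 benchmark modes, and it isn't stated whether they were run on the same machine, so the ratio is only indicative; the dictionary and function names are hypothetical.

```python
# A few FFT lengths (in K) from the two listings above.
# 1-worker throughput, iterations per second:
QUOTED_ITER_PER_SEC = {4096: 442.60, 5760: 319.52, 6144: 294.26, 8192: 208.95}
# "Best time" per iteration, milliseconds:
BEST_MS = {4096: 1.936, 5760: 4.065, 6144: 3.502, 8192: 6.397}


def iter_per_sec(ms_per_iter):
    """Convert a per-iteration time in ms to iterations per second."""
    return 1000.0 / ms_per_iter


for fft_k in sorted(QUOTED_ITER_PER_SEC):
    quoted = QUOTED_ITER_PER_SEC[fft_k]
    other = iter_per_sec(BEST_MS[fft_k])
    # ratio > 1 means the "Best time" figure corresponds to more iterations/sec.
    print(f"{fft_k}K: {quoted:.1f} vs {other:.1f} iter/s (ratio {other / quoted:.2f})")
```

At 4096K the "Best time" figure works out to roughly 17% more iterations per second, while at 5760K and 8192K it is roughly a quarter fewer, so the comparison flips depending on FFT length.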

The non-SMT results are in the same post you linked, above the SMT results.

I was thinking about memory bandwidth as a bottleneck for GPU or CPU performance, and tried to calculate how much bandwidth is needed to keep 1 TFLOPS of FP64 fully used. My result was about 130 GB/s, which seems too little given that the Radeon VII has roughly 3 TFLOPS of FP64 throughput, yet its actual PRP performance doesn't line up with that.
I used the conversion 1 TFLOPS = 500 GHzD/D. 500 GHzD is one PRP test of an exponent around 113,500,000, which needs a 6144K FFT, i.e. about 48 MiB of FFT data (6144K × 8 bytes). Moving 48 MiB for each of 113,500,000 iterations in one day is about 66 GB/s, or about 130 GB/s if the data is both read and written every iteration. Could someone explain how memory bandwidth affects performance, and what could be used as a rule-of-thumb conversion for the bandwidth required to fully use 1 TFLOPS of FP64?
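The arithmetic in the question can be written out as a small function so each assumption becomes an explicit parameter. This is a sketch, assuming one iteration per exponent bit (about 113,500,000 iterations), 8 bytes per FFT element, and a fixed number of full passes over the FFT data per iteration; `required_bandwidth_gb_s` and its parameters are hypothetical names chosen for illustration, not a model of any particular program.

```python
SECONDS_PER_DAY = 86_400


def required_bandwidth_gb_s(exponent, fft_k, passes_per_iter=2, bytes_per_element=8):
    """Estimate the memory bandwidth (GB/s) needed to finish one PRP test per day.

    Assumes `exponent` iterations, and that the whole FFT array
    (fft_k * 1024 elements of bytes_per_element bytes each) crosses the
    memory bus passes_per_iter times per iteration (2 = read once + write once).
    """
    fft_bytes = fft_k * 1024 * bytes_per_element  # 6144K -> 48 MiB
    total_bytes = exponent * fft_bytes * passes_per_iter
    return total_bytes / SECONDS_PER_DAY / 1e9


print(f"{required_bandwidth_gb_s(113_500_000, 6144):.0f} GB/s")
print(f"{required_bandwidth_gb_s(113_500_000, 6144, passes_per_iter=1):.0f} GB/s")
```

This prints 132 GB/s and 66 GB/s. The result scales linearly with `passes_per_iter`, so an implementation that streams the FFT data through memory more times per iteration needs proportionally more bandwidth for the same daily throughput.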
Does anyone have a Ryzen 5700G and would be willing to post benchmark results? I'd be curious to see what impact its 16 MB of L3 has on wavefront PRP throughput.
Thanks in advance 👍 
