mersenneforum.org  

Old 2022-04-20, 18:48   #870
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

12141₈ Posts

Interesting that 300K, which is so much worse with 8 cores / 1 worker,
has much more consistent times with 8 cores / 8 workers.

Timings for 280K FFT length (8 cores, 1 worker): 0.16 ms. Throughput: 6246.39 iter/sec.
Timings for 280K FFT length (8 cores, 2 workers): 0.23, 0.23 ms. Throughput: 8765.58 iter/sec.
Timings for 280K FFT length (8 cores, 4 workers): 0.39, 0.39, 0.39, 0.39 ms. Throughput: 10241.03 iter/sec.
Timings for 280K FFT length (8 cores, 8 workers): 0.67, 0.67, 0.66, 0.66, 0.67, 0.66, 0.66, 0.67 ms. Throughput: 12039.18 iter/sec.

Timings for 300K FFT length (8 cores, 1 worker): 0.36 ms. Throughput: 2795.59 iter/sec.
Timings for 300K FFT length (8 cores, 2 workers): 0.34, 0.34 ms. Throughput: 5864.26 iter/sec.
Timings for 300K FFT length (8 cores, 4 workers): 0.51, 0.52, 0.52, 0.50 ms. Throughput: 7814.25 iter/sec.
Timings for 300K FFT length (8 cores, 8 workers): 0.73, 0.72, 0.72, 0.70, 0.71, 0.71, 0.71, 0.71 ms. Throughput: 11245.56 iter/sec.

Timings for 320K FFT length (8 cores, 1 worker): 0.17 ms. Throughput: 5715.57 iter/sec.
Timings for 320K FFT length (8 cores, 2 workers): 0.26, 0.26 ms. Throughput: 7777.33 iter/sec.
Timings for 320K FFT length (8 cores, 4 workers): 0.45, 0.44, 0.45, 0.45 ms. Throughput: 8965.12 iter/sec.
Timings for 320K FFT length (8 cores, 8 workers): 1.00, 0.83, 0.85, 0.86, 0.84, 0.82, 0.84, 0.83 ms. Throughput: 9329.29 iter/sec.
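
For anyone who wants to compare these lines programmatically, here is a minimal sketch of a parser. The regular expression and the function names are my own assumptions based on the lines shown above, not anything taken from Prime95 itself. It also re-derives throughput as the sum of 1000/t over the per-worker times, which lands within about 1% of the printed figure because the times are rounded to two decimals.

```python
import re

# Pattern inferred from the benchmark lines quoted in this thread (an assumption,
# not an official Prime95 format specification).
LINE_RE = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\):\s*"
    r"([\d.,\s]+?) ms\.\s*Throughput: ([\d.]+) iter/sec\."
)


def parse_timing_line(line):
    """Return a dict for one 'Timings for ...' line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "fft_k": int(m.group(1)),
        "cores": int(m.group(2)),
        "workers": int(m.group(3)),
        "times_ms": [float(t) for t in m.group(4).split(",")],
        "throughput": float(m.group(5)),
    }


def estimated_throughput(times_ms):
    """Iterations per second summed over all workers, from per-iteration times in ms."""
    return sum(1000.0 / t for t in times_ms)


def relative_spread(times_ms):
    """(max - min) / mean of the per-worker times; smaller means more uniform."""
    return (max(times_ms) - min(times_ms)) / (sum(times_ms) / len(times_ms))


if __name__ == "__main__":
    line = ("Timings for 300K FFT length (8 cores, 8 workers): 0.73, 0.72, 0.72, "
            "0.70, 0.71, 0.71, 0.71, 0.71 ms. Throughput: 11245.56 iter/sec.")
    rec = parse_timing_line(line)
    print(rec["fft_k"], rec["workers"], round(estimated_throughput(rec["times_ms"]), 2))
```

Using the printed throughputs, 300K runs at roughly 45% of the 280K figure with 1 worker (2795.59 vs 6246.39) but at roughly 93% with 8 workers (11245.56 vs 12039.18).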
Old 2022-05-06, 06:06   #871
sdbardwick
 
Aug 2002
North San Diego County

743 Posts
5800X3D 1024K to 8192K throughput benchmark.

Non-optimized, non-overclocked CPU, 4000 MHz DDR4 RAM using A-XMS stock settings.
Nothing unexpected or outstanding at first glance.

Code:
CPU speed: 3400.12 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 8x32 KB, L2 cache size: 8x512 KB, L3 cache size: 96 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
 Machine#0 (total=13447548KB, Backend=Windows, OSName=Windows, WindowsBuildEnvironment=MinGW, OSRelease=10, OSVersion=10.0.18362, Hostname=5800X3D, Architecture=x86_64, hwlocVersion=2.4.1, ProcessName=prime95.exe)
  Package (total=13447548KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=25, CPUModelNumber=33, CPUModel="AMD Ryzen 7 5800X3D 8-Core Processor           ", CPUStepping=2)
    L3 (size=98304KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000000c)
            PU#2 (cpuset: 0x00000004)
            PU#3 (cpuset: 0x00000008)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000030)
            PU#4 (cpuset: 0x00000010)
            PU#5 (cpuset: 0x00000020)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000300)
            PU#8 (cpuset: 0x00000100)
            PU#9 (cpuset: 0x00000200)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000c00)
            PU#10 (cpuset: 0x00000400)
            PU#11 (cpuset: 0x00000800)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00003000)
            PU#12 (cpuset: 0x00001000)
            PU#13 (cpuset: 0x00002000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000c000)
            PU#14 (cpuset: 0x00004000)
            PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 30.7, RdtscTiming=1
Timings for 1024K FFT length (8 cores, 1 worker):  0.55 ms.  Throughput: 1821.51 iter/sec.
Timings for 1024K FFT length (8 cores, 2 workers):  1.04,  1.04 ms.  Throughput: 1931.97 iter/sec.
Timings for 1024K FFT length (8 cores, 4 workers):  2.00,  1.99,  2.00,  2.00 ms.  Throughput: 2001.40 iter/sec.
Timings for 1024K FFT length (8 cores, 8 workers):  3.82,  3.82,  3.82,  3.80,  3.82,  3.83,  3.84,  3.82 ms.  Throughput: 2093.63 iter/sec.
Timings for 1120K FFT length (8 cores, 1 worker):  0.60 ms.  Throughput: 1653.34 iter/sec.
Timings for 1120K FFT length (8 cores, 2 workers):  1.15,  1.15 ms.  Throughput: 1735.32 iter/sec.
Timings for 1120K FFT length (8 cores, 4 workers):  2.22,  2.24,  2.21,  2.20 ms.  Throughput: 1804.09 iter/sec.
Timings for 1120K FFT length (8 cores, 8 workers):  4.21,  4.23,  4.21,  4.21,  4.21,  4.23,  4.22,  4.21 ms.  Throughput: 1897.55 iter/sec.
Timings for 1152K FFT length (8 cores, 1 worker):  0.59 ms.  Throughput: 1691.24 iter/sec.
Timings for 1152K FFT length (8 cores, 2 workers):  1.13,  1.13 ms.  Throughput: 1769.93 iter/sec.
Timings for 1152K FFT length (8 cores, 4 workers):  2.22,  2.20,  2.20,  2.20 ms.  Throughput: 1814.40 iter/sec.
Timings for 1152K FFT length (8 cores, 8 workers):  4.29,  4.26,  4.26,  4.23,  4.22,  4.25,  4.24,  4.23 ms.  Throughput: 1883.85 iter/sec.
[Thu May  5 22:25:18 2022]
Timings for 1280K FFT length (8 cores, 1 worker):  0.68 ms.  Throughput: 1464.10 iter/sec.
Timings for 1280K FFT length (8 cores, 2 workers):  1.30,  1.30 ms.  Throughput: 1535.99 iter/sec.
Timings for 1280K FFT length (8 cores, 4 workers):  2.52,  2.57,  2.51,  2.52 ms.  Throughput: 1581.03 iter/sec.
Timings for 1280K FFT length (8 cores, 8 workers):  4.85,  4.90,  4.84,  4.86,  4.85,  4.88,  4.86,  4.86 ms.  Throughput: 1645.09 iter/sec.
Timings for 1344K FFT length (8 cores, 1 worker):  0.71 ms.  Throughput: 1401.51 iter/sec.
Timings for 1344K FFT length (8 cores, 2 workers):  1.37,  1.37 ms.  Throughput: 1456.32 iter/sec.
Timings for 1344K FFT length (8 cores, 4 workers):  2.70,  2.68,  2.69,  2.69 ms.  Throughput: 1486.34 iter/sec.
Timings for 1344K FFT length (8 cores, 8 workers):  5.22,  5.23,  5.22,  5.22,  5.22,  5.24,  5.27,  5.22 ms.  Throughput: 1529.46 iter/sec.
Timings for 1440K FFT length (8 cores, 1 worker):  0.75 ms.  Throughput: 1339.61 iter/sec.
Timings for 1440K FFT length (8 cores, 2 workers):  1.46,  1.44 ms.  Throughput: 1376.79 iter/sec.
Timings for 1440K FFT length (8 cores, 4 workers):  2.83,  2.81,  2.86,  2.84 ms.  Throughput: 1411.41 iter/sec.
Timings for 1440K FFT length (8 cores, 8 workers):  5.53,  5.56,  5.46,  5.47,  5.49,  5.53,  5.49,  5.48 ms.  Throughput: 1454.27 iter/sec.
Timings for 1536K FFT length (8 cores, 1 worker):  0.84 ms.  Throughput: 1189.28 iter/sec.
Timings for 1536K FFT length (8 cores, 2 workers):  1.59,  1.61 ms.  Throughput: 1250.85 iter/sec.
Timings for 1536K FFT length (8 cores, 4 workers):  3.04,  3.03,  3.04,  3.05 ms.  Throughput: 1315.83 iter/sec.
Timings for 1536K FFT length (8 cores, 8 workers):  5.86,  5.84,  5.81,  5.84,  5.84,  5.86,  5.83,  5.85 ms.  Throughput: 1369.74 iter/sec.
Timings for 1600K FFT length (8 cores, 1 worker):  0.82 ms.  Throughput: 1214.51 iter/sec.
Timings for 1600K FFT length (8 cores, 2 workers):  1.61,  1.59 ms.  Throughput: 1250.17 iter/sec.
Timings for 1600K FFT length (8 cores, 4 workers):  3.11,  3.14,  3.12,  3.11 ms.  Throughput: 1281.28 iter/sec.
Timings for 1600K FFT length (8 cores, 8 workers):  6.15,  6.13,  6.11,  6.16,  6.11,  6.15,  6.17,  6.13 ms.  Throughput: 1303.17 iter/sec.
Timings for 1680K FFT length (8 cores, 1 worker):  0.91 ms.  Throughput: 1104.91 iter/sec.
Timings for 1680K FFT length (8 cores, 2 workers):  1.77,  1.76 ms.  Throughput: 1133.54 iter/sec.
Timings for 1680K FFT length (8 cores, 4 workers):  3.45,  3.44,  3.45,  3.45 ms.  Throughput: 1160.00 iter/sec.
Timings for 1680K FFT length (8 cores, 8 workers):  6.85,  6.84,  6.83,  6.81,  6.82,  6.85,  6.84,  6.84 ms.  Throughput: 1170.61 iter/sec.
Timings for 1792K FFT length (8 cores, 1 worker):  1.01 ms.  Throughput: 991.35 iter/sec.
Timings for 1792K FFT length (8 cores, 2 workers):  1.90,  1.91 ms.  Throughput: 1049.05 iter/sec.
Timings for 1792K FFT length (8 cores, 4 workers):  3.74,  3.70,  3.72,  3.71 ms.  Throughput: 1076.18 iter/sec.
Timings for 1792K FFT length (8 cores, 8 workers):  7.31,  7.34,  7.28,  7.28,  7.29,  7.33,  7.29,  7.29 ms.  Throughput: 1095.94 iter/sec.
Timings for 1920K FFT length (8 cores, 1 worker):  0.98 ms.  Throughput: 1024.47 iter/sec.
Timings for 1920K FFT length (8 cores, 2 workers):  1.92,  1.89 ms.  Throughput: 1049.13 iter/sec.
[Thu May  5 22:30:23 2022]
Timings for 1920K FFT length (8 cores, 4 workers):  3.74,  3.74,  3.73,  3.73 ms.  Throughput: 1071.95 iter/sec.
Timings for 1920K FFT length (8 cores, 8 workers):  8.26,  8.22,  8.20,  8.18,  8.30,  8.15,  8.25,  8.29 ms.  Throughput: 972.00 iter/sec.
Timings for 2048K FFT length (8 cores, 1 worker):  1.15 ms.  Throughput: 866.43 iter/sec.
Timings for 2048K FFT length (8 cores, 2 workers):  2.18,  2.23 ms.  Throughput: 906.57 iter/sec.
Timings for 2048K FFT length (8 cores, 4 workers):  4.27,  4.19,  4.27,  4.18 ms.  Throughput: 945.90 iter/sec.
Timings for 2048K FFT length (8 cores, 8 workers):  9.26,  9.18,  9.12,  9.11,  9.12,  9.15,  9.11,  9.11 ms.  Throughput: 874.90 iter/sec.
Timings for 2240K FFT length (8 cores, 1 worker):  1.20 ms.  Throughput: 829.97 iter/sec.
Timings for 2240K FFT length (8 cores, 2 workers):  2.33,  2.35 ms.  Throughput: 854.92 iter/sec.
Timings for 2240K FFT length (8 cores, 4 workers):  4.63,  4.60,  4.58,  4.59 ms.  Throughput: 869.64 iter/sec.
Timings for 2240K FFT length (8 cores, 8 workers): 10.89, 11.01, 10.80, 10.97, 10.78, 10.83, 10.66, 10.81 ms.  Throughput: 737.80 iter/sec.
Timings for 2304K FFT length (8 cores, 1 worker):  1.20 ms.  Throughput: 832.67 iter/sec.
Timings for 2304K FFT length (8 cores, 2 workers):  2.32,  2.33 ms.  Throughput: 861.15 iter/sec.
Timings for 2304K FFT length (8 cores, 4 workers):  4.58,  4.57,  4.64,  4.57 ms.  Throughput: 871.49 iter/sec.
Timings for 2304K FFT length (8 cores, 8 workers): 11.58, 11.70, 11.63, 11.65, 11.44, 11.10, 11.56, 11.17 ms.  Throughput: 697.18 iter/sec.
Timings for 2400K FFT length (8 cores, 1 worker):  1.27 ms.  Throughput: 785.73 iter/sec.
Timings for 2400K FFT length (8 cores, 2 workers):  2.49,  2.52 ms.  Throughput: 797.82 iter/sec.
Timings for 2400K FFT length (8 cores, 4 workers):  4.91,  4.90,  4.90,  4.91 ms.  Throughput: 815.68 iter/sec.
Timings for 2400K FFT length (8 cores, 8 workers): 12.80, 12.72, 12.76, 12.91, 12.73, 12.74, 12.44, 12.75 ms.  Throughput: 628.38 iter/sec.
Timings for 2560K FFT length (8 cores, 1 worker):  1.41 ms.  Throughput: 707.11 iter/sec.
Timings for 2560K FFT length (8 cores, 2 workers):  2.76,  2.80 ms.  Throughput: 719.15 iter/sec.
Timings for 2560K FFT length (8 cores, 4 workers):  5.47,  5.40,  5.40,  5.40 ms.  Throughput: 738.28 iter/sec.
Timings for 2560K FFT length (8 cores, 8 workers): 13.82, 13.76, 13.51, 14.81, 13.80, 13.47, 13.88, 13.96 ms.  Throughput: 576.91 iter/sec.
Timings for 2688K FFT length (8 cores, 1 worker):  1.46 ms.  Throughput: 687.16 iter/sec.
Timings for 2688K FFT length (8 cores, 2 workers):  2.83,  2.83 ms.  Throughput: 706.48 iter/sec.
Timings for 2688K FFT length (8 cores, 4 workers):  5.61,  5.61,  5.60,  5.61 ms.  Throughput: 713.56 iter/sec.
Timings for 2688K FFT length (8 cores, 8 workers): 15.67, 15.25, 15.30, 15.49, 15.74, 14.73, 15.14, 15.62 ms.  Throughput: 520.82 iter/sec.
Timings for 2800K FFT length (8 cores, 1 worker):  1.58 ms.  Throughput: 632.00 iter/sec.
Timings for 2800K FFT length (8 cores, 2 workers):  3.08,  3.08 ms.  Throughput: 649.40 iter/sec.
Timings for 2800K FFT length (8 cores, 4 workers):  6.01,  5.98,  5.99,  5.96 ms.  Throughput: 668.72 iter/sec.
Timings for 2800K FFT length (8 cores, 8 workers): 16.97, 17.62, 16.81, 16.83, 16.77, 17.51, 16.85, 16.85 ms.  Throughput: 470.01 iter/sec.
[Thu May  5 22:35:31 2022]
Timings for 2880K FFT length (8 cores, 1 worker):  1.52 ms.  Throughput: 659.39 iter/sec.
Timings for 2880K FFT length (8 cores, 2 workers):  2.97,  3.01 ms.  Throughput: 668.70 iter/sec.
Timings for 2880K FFT length (8 cores, 4 workers):  5.87,  5.85,  5.88,  5.87 ms.  Throughput: 681.73 iter/sec.
Timings for 2880K FFT length (8 cores, 8 workers): 17.62, 17.43, 17.26, 17.36, 17.56, 16.80, 19.11, 17.55 ms.  Throughput: 455.48 iter/sec.
Timings for 3072K FFT length (8 cores, 1 worker):  1.60 ms.  Throughput: 626.87 iter/sec.
Timings for 3072K FFT length (8 cores, 2 workers):  3.13,  3.13 ms.  Throughput: 639.33 iter/sec.
Timings for 3072K FFT length (8 cores, 4 workers):  6.21,  6.24,  6.20,  6.20 ms.  Throughput: 643.85 iter/sec.
Timings for 3072K FFT length (8 cores, 8 workers): 19.10, 19.58, 19.56, 19.61, 19.72, 18.80, 19.31, 19.82 ms.  Throughput: 411.66 iter/sec.
Timings for 3200K FFT length (8 cores, 1 worker):  1.74 ms.  Throughput: 575.10 iter/sec.
Timings for 3200K FFT length (8 cores, 2 workers):  3.40,  3.38 ms.  Throughput: 589.81 iter/sec.
Timings for 3200K FFT length (8 cores, 4 workers):  6.75,  6.74,  6.75,  6.69 ms.  Throughput: 594.29 iter/sec.
Timings for 3200K FFT length (8 cores, 8 workers): 20.36, 19.59, 20.58, 20.57, 20.59, 21.98, 21.27, 21.03 ms.  Throughput: 385.95 iter/sec.
Timings for 3360K FFT length (8 cores, 1 worker):  1.84 ms.  Throughput: 543.23 iter/sec.
Timings for 3360K FFT length (8 cores, 2 workers):  3.61,  3.66 ms.  Throughput: 550.28 iter/sec.
Timings for 3360K FFT length (8 cores, 4 workers):  7.23,  7.20,  7.20,  7.21 ms.  Throughput: 554.67 iter/sec.
Timings for 3360K FFT length (8 cores, 8 workers): 23.46, 21.81, 22.16, 21.76, 24.26, 23.28, 22.52, 22.61 ms.  Throughput: 352.37 iter/sec.
Timings for 3584K FFT length (8 cores, 1 worker):  1.96 ms.  Throughput: 510.57 iter/sec.
Timings for 3584K FFT length (8 cores, 2 workers):  3.84,  3.83 ms.  Throughput: 521.48 iter/sec.
Timings for 3584K FFT length (8 cores, 4 workers):  7.78,  7.69,  7.70,  7.72 ms.  Throughput: 518.06 iter/sec.
Timings for 3584K FFT length (8 cores, 8 workers): 24.69, 25.31, 25.16, 25.43, 25.25, 24.63, 24.97, 26.01 ms.  Throughput: 317.77 iter/sec.
Timings for 3840K FFT length (8 cores, 1 worker):  2.09 ms.  Throughput: 477.87 iter/sec.
Timings for 3840K FFT length (8 cores, 2 workers):  4.18,  4.11 ms.  Throughput: 482.30 iter/sec.
Timings for 3840K FFT length (8 cores, 4 workers):  8.51,  8.47,  8.50,  8.48 ms.  Throughput: 471.03 iter/sec.
Timings for 3840K FFT length (8 cores, 8 workers): 28.31, 26.02, 26.73, 27.02, 29.30, 26.07, 27.45, 28.35 ms.  Throughput: 292.38 iter/sec.
Timings for 4096K FFT length (8 cores, 1 worker):  2.26 ms.  Throughput: 442.60 iter/sec.
Timings for 4096K FFT length (8 cores, 2 workers):  4.44,  4.44 ms.  Throughput: 450.26 iter/sec.
Timings for 4096K FFT length (8 cores, 4 workers):  9.77,  9.66,  9.77,  9.65 ms.  Throughput: 411.85 iter/sec.
Timings for 4096K FFT length (8 cores, 8 workers): 30.87, 31.18, 30.41, 31.13, 31.77, 29.69, 31.02, 30.04 ms.  Throughput: 260.16 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker):  2.53 ms.  Throughput: 395.06 iter/sec.
Timings for 4480K FFT length (8 cores, 2 workers):  4.97,  4.93 ms.  Throughput: 403.79 iter/sec.
[Thu May  5 22:40:42 2022]
Timings for 4480K FFT length (8 cores, 4 workers): 13.07, 13.07, 13.07, 13.07 ms.  Throughput: 305.98 iter/sec.
Timings for 4480K FFT length (8 cores, 8 workers): 40.23, 40.22, 40.21, 40.21, 40.22, 40.23, 40.22, 40.22 ms.  Throughput: 198.91 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker):  2.53 ms.  Throughput: 395.21 iter/sec.
Timings for 4608K FFT length (8 cores, 2 workers):  4.93,  4.93 ms.  Throughput: 405.86 iter/sec.
Timings for 4608K FFT length (8 cores, 4 workers): 11.80, 11.61, 11.60, 11.85 ms.  Throughput: 341.44 iter/sec.
Timings for 4608K FFT length (8 cores, 8 workers): 34.89, 35.49, 38.28, 36.01, 36.50, 34.16, 36.19, 36.05 ms.  Throughput: 222.77 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker):  2.72 ms.  Throughput: 368.13 iter/sec.
Timings for 4800K FFT length (8 cores, 2 workers):  5.35,  5.29 ms.  Throughput: 376.09 iter/sec.
Timings for 4800K FFT length (8 cores, 4 workers): 12.64, 12.68, 12.87, 12.71 ms.  Throughput: 314.33 iter/sec.
Timings for 4800K FFT length (8 cores, 8 workers): 38.85, 36.54, 38.12, 37.28, 37.95, 36.28, 38.16, 38.37 ms.  Throughput: 212.35 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker):  2.86 ms.  Throughput: 349.51 iter/sec.
Timings for 5120K FFT length (8 cores, 2 workers):  5.63,  5.59 ms.  Throughput: 356.46 iter/sec.
Timings for 5120K FFT length (8 cores, 4 workers): 14.58, 14.30, 14.26, 14.32 ms.  Throughput: 278.44 iter/sec.
Timings for 5120K FFT length (8 cores, 8 workers): 41.60, 40.58, 41.96, 41.19, 41.50, 39.44, 43.41, 41.01 ms.  Throughput: 193.66 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker):  3.04 ms.  Throughput: 328.71 iter/sec.
Timings for 5376K FFT length (8 cores, 2 workers):  6.00,  6.02 ms.  Throughput: 332.87 iter/sec.
Timings for 5376K FFT length (8 cores, 4 workers): 15.87, 15.97, 16.18, 16.25 ms.  Throughput: 248.94 iter/sec.
Timings for 5376K FFT length (8 cores, 8 workers): 44.97, 44.45, 45.45, 45.07, 44.54, 43.04, 45.48, 46.42 ms.  Throughput: 178.14 iter/sec.
Timings for 5600K FFT length (8 cores, 1 worker):  3.12 ms.  Throughput: 320.55 iter/sec.
Timings for 5600K FFT length (8 cores, 2 workers):  6.15,  6.15 ms.  Throughput: 325.13 iter/sec.
Timings for 5600K FFT length (8 cores, 4 workers): 17.49, 17.93, 17.49, 17.80 ms.  Throughput: 226.28 iter/sec.
Timings for 5600K FFT length (8 cores, 8 workers): 47.96, 48.18, 48.50, 47.76, 48.44, 47.26, 47.98, 48.89 ms.  Throughput: 166.27 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker):  3.13 ms.  Throughput: 319.52 iter/sec.
Timings for 5760K FFT length (8 cores, 2 workers):  6.24,  6.24 ms.  Throughput: 320.58 iter/sec.
Timings for 5760K FFT length (8 cores, 4 workers): 21.12, 21.17, 21.27, 21.12 ms.  Throughput: 188.92 iter/sec.
Timings for 5760K FFT length (8 cores, 8 workers): 56.75, 55.82, 55.68, 56.87, 56.59, 55.87, 56.83, 55.72 ms.  Throughput: 142.19 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker):  3.40 ms.  Throughput: 294.26 iter/sec.
Timings for 6144K FFT length (8 cores, 2 workers):  6.76,  6.85 ms.  Throughput: 293.91 iter/sec.
Timings for 6144K FFT length (8 cores, 4 workers): 20.67, 19.86, 19.70, 19.92 ms.  Throughput: 199.72 iter/sec.
[Thu May  5 22:45:47 2022]
Timings for 6144K FFT length (8 cores, 8 workers): 53.06, 52.40, 52.69, 52.98, 52.99, 50.68, 52.47, 53.26 ms.  Throughput: 152.22 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker):  3.56 ms.  Throughput: 281.20 iter/sec.
Timings for 6400K FFT length (8 cores, 2 workers):  7.04,  7.00 ms.  Throughput: 284.79 iter/sec.
Timings for 6400K FFT length (8 cores, 4 workers): 21.02, 21.89, 20.41, 21.51 ms.  Throughput: 188.73 iter/sec.
Timings for 6400K FFT length (8 cores, 8 workers): 55.57, 54.56, 57.04, 55.12, 55.78, 55.46, 55.44, 56.50 ms.  Throughput: 143.69 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker):  3.71 ms.  Throughput: 269.24 iter/sec.
Timings for 6720K FFT length (8 cores, 2 workers):  7.54,  7.54 ms.  Throughput: 265.09 iter/sec.
Timings for 6720K FFT length (8 cores, 4 workers): 24.47, 24.38, 24.25, 24.64 ms.  Throughput: 163.71 iter/sec.
Timings for 6720K FFT length (8 cores, 8 workers): 62.69, 61.28, 61.24, 62.39, 61.28, 60.91, 62.22, 62.84 ms.  Throughput: 129.35 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker):  4.17 ms.  Throughput: 239.98 iter/sec.
Timings for 7168K FFT length (8 cores, 2 workers):  8.40,  8.40 ms.  Throughput: 238.18 iter/sec.
Timings for 7168K FFT length (8 cores, 4 workers): 26.18, 26.75, 25.78, 25.71 ms.  Throughput: 153.26 iter/sec.
Timings for 7168K FFT length (8 cores, 8 workers): 63.65, 66.52, 64.48, 62.92, 66.19, 62.65, 65.37, 64.88 ms.  Throughput: 123.92 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker):  4.34 ms.  Throughput: 230.49 iter/sec.
Timings for 7680K FFT length (8 cores, 2 workers): 10.17, 10.23 ms.  Throughput: 196.08 iter/sec.
Timings for 7680K FFT length (8 cores, 4 workers): 34.98, 35.15, 35.16, 35.32 ms.  Throughput: 113.78 iter/sec.
Timings for 7680K FFT length (8 cores, 8 workers): 81.23, 80.78, 80.40, 80.60, 79.95, 81.31, 79.75, 81.04 ms.  Throughput: 99.22 iter/sec.
Timings for 8000K FFT length (8 cores, 1 worker):  4.57 ms.  Throughput: 219.06 iter/sec.
Timings for 8000K FFT length (8 cores, 2 workers):  9.63,  9.49 ms.  Throughput: 209.18 iter/sec.
Timings for 8000K FFT length (8 cores, 4 workers): 30.01, 30.62, 31.70, 30.39 ms.  Throughput: 130.44 iter/sec.
Timings for 8000K FFT length (8 cores, 8 workers): 74.16, 76.71, 73.04, 73.07, 74.12, 74.25, 74.86, 74.15 ms.  Throughput: 107.70 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker):  4.64 ms.  Throughput: 215.61 iter/sec.
Timings for 8064K FFT length (8 cores, 2 workers):  9.82,  9.79 ms.  Throughput: 203.99 iter/sec.
Timings for 8064K FFT length (8 cores, 4 workers): 30.86, 30.88, 30.86, 32.85 ms.  Throughput: 127.63 iter/sec.
Timings for 8064K FFT length (8 cores, 8 workers): 74.85, 73.24, 73.21, 73.78, 77.11, 71.62, 74.85, 76.06 ms.  Throughput: 107.66 iter/sec.
Timings for 8192K FFT length (8 cores, 1 worker):  4.79 ms.  Throughput: 208.95 iter/sec.
Timings for 8192K FFT length (8 cores, 2 workers): 10.10, 10.09 ms.  Throughput: 198.12 iter/sec.
Timings for 8192K FFT length (8 cores, 4 workers): 31.64, 32.78, 32.29, 32.07 ms.  Throughput: 124.26 iter/sec.
Timings for 8192K FFT length (8 cores, 8 workers): 78.81, 75.92, 78.91, 76.51, 77.04, 74.21, 77.16, 76.64 ms.  Throughput: 104.07 iter/sec.
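
One way to read the listing above is to ask which worker count gives the highest total throughput at each FFT length. The sketch below does that for a handful of rows copied from the benchmark; the table layout and function names are hypothetical, chosen only for illustration.

```python
# Throughput in iter/sec by worker count, copied from a few rows of the benchmark above.
THROUGHPUT = {
    1024: {1: 1821.51, 2: 1931.97, 4: 2001.40, 8: 2093.63},
    1792: {1: 991.35, 2: 1049.05, 4: 1076.18, 8: 1095.94},
    1920: {1: 1024.47, 2: 1049.13, 4: 1071.95, 8: 972.00},
    4096: {1: 442.60, 2: 450.26, 4: 411.85, 8: 260.16},
    8192: {1: 208.95, 2: 198.12, 4: 124.26, 8: 104.07},
}


def best_worker_count(by_workers):
    """Worker count with the highest total throughput for one FFT length."""
    return max(by_workers, key=by_workers.get)


def first_fft_where_not_best(table, workers):
    """Smallest FFT length (in K) at which `workers` is no longer the best choice."""
    for fft_k in sorted(table):
        if best_worker_count(table[fft_k]) != workers:
            return fft_k
    return None


if __name__ == "__main__":
    for fft_k in sorted(THROUGHPUT):
        print(f"{fft_k}K: best with {best_worker_count(THROUGHPUT[fft_k])} worker(s)")
    print("8 workers stop being best at", first_fft_where_not_best(THROUGHPUT, 8), "K")
```

On these rows, 8 workers win up to 1792K, 4 workers win at 1920K, 2 workers at 4096K, and a single worker at 8192K.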
Old 2022-05-06, 07:38   #872
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

13×227 Posts

Quote:
Originally Posted by sdbardwick View Post
Non-optimized, non-overclocked CPU, 4000 MHz DDR4 RAM using A-XMS stock settings.
Nothing unexpected or outstanding at first glance
Looks quite a bit faster than a 5800X at larger FFTs though!

Code:
Timings for 4096K FFT length (8 cores, 1 worker):  2.26 ms.  Throughput: 442.60 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker):  2.53 ms.  Throughput: 395.06 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker):  2.53 ms.  Throughput: 395.21 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker):  2.72 ms.  Throughput: 368.13 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker):  2.86 ms.  Throughput: 349.51 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker):  3.04 ms.  Throughput: 328.71 iter/sec.
Timings for 5600K FFT length (8 cores, 1 worker):  3.12 ms.  Throughput: 320.55 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker):  3.13 ms.  Throughput: 319.52 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker):  3.40 ms.  Throughput: 294.26 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker):  3.56 ms.  Throughput: 281.20 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker):  3.71 ms.  Throughput: 269.24 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker):  4.17 ms.  Throughput: 239.98 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker):  4.34 ms.  Throughput: 230.49 iter/sec.
Timings for 8000K FFT length (8 cores, 1 worker):  4.57 ms.  Throughput: 219.06 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker):  4.64 ms.  Throughput: 215.61 iter/sec.
Timings for 8192K FFT length (8 cores, 1 worker):  4.79 ms.  Throughput: 208.95 iter/sec.
vs

Code:
Best time for 4096K FFT length: 1.936 ms., avg: 1.962 ms.
Best time for 4480K FFT length: 2.416 ms., avg: 2.520 ms.
Best time for 4608K FFT length: 2.298 ms., avg: 2.382 ms.
Best time for 4800K FFT length: 2.299 ms., avg: 2.351 ms.
Best time for 5120K FFT length: 2.536 ms., avg: 2.655 ms.
Best time for 5376K FFT length: 3.022 ms., avg: 3.265 ms.
Best time for 5600K FFT length: 3.337 ms., avg: 3.511 ms.
Best time for 5760K FFT length: 4.065 ms., avg: 4.131 ms.
Best time for 6144K FFT length: 3.502 ms., avg: 3.672 ms.
Best time for 6400K FFT length: 3.539 ms., avg: 3.646 ms.
Best time for 6720K FFT length: 4.854 ms., avg: 4.948 ms.
Best time for 7168K FFT length: 5.050 ms., avg: 5.149 ms.
Best time for 7680K FFT length: 6.219 ms., avg: 6.283 ms.
Best time for 8000K FFT length: 5.612 ms., avg: 5.763 ms.
Best time for 8064K FFT length: 6.222 ms., avg: 6.392 ms.
Best time for 8192K FFT length: 6.397 ms., avg: 6.476 ms.
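
To put a number on the comparison, here is a small sketch that divides the 5800X average time by the 5800X3D 1-worker time for each FFT length. The values are copied from the two listings above; the variable and function names are hypothetical. The two listings come from different benchmark output formats, so the ratio should be taken as a rough indication only.

```python
# Per-iteration times in ms, keyed by FFT length in K, copied from the listings above.
TIMES_5800X3D_MS = {
    4096: 2.26, 4480: 2.53, 4608: 2.53, 4800: 2.72, 5120: 2.86, 5376: 3.04,
    5600: 3.12, 5760: 3.13, 6144: 3.40, 6400: 3.56, 6720: 3.71, 7168: 4.17,
    7680: 4.34, 8000: 4.57, 8064: 4.64, 8192: 4.79,
}
TIMES_5800X_AVG_MS = {
    4096: 1.962, 4480: 2.520, 4608: 2.382, 4800: 2.351, 5120: 2.655, 5376: 3.265,
    5600: 3.511, 5760: 4.131, 6144: 3.672, 6400: 3.646, 6720: 4.948, 7168: 5.149,
    7680: 6.283, 8000: 5.763, 8064: 6.392, 8192: 6.476,
}


def time_ratios(baseline_ms, candidate_ms):
    """baseline / candidate per FFT length; values above 1 mean the candidate is faster."""
    return {
        k: baseline_ms[k] / candidate_ms[k]
        for k in sorted(baseline_ms)
        if k in candidate_ms
    }


def first_faster(ratios):
    """Smallest FFT length (in K) where the candidate beats the baseline, else None."""
    for k in sorted(ratios):
        if ratios[k] > 1.0:
            return k
    return None


if __name__ == "__main__":
    ratios = time_ratios(TIMES_5800X_AVG_MS, TIMES_5800X3D_MS)
    for k, r in ratios.items():
        print(f"{k}K: {r:.2f}x")
    print("5800X3D first ahead at", first_faster(ratios), "K")
```

With these figures the 5800X3D is behind from 4096K through 5120K and ahead from 5376K upward, reaching roughly 1.35x at 8192K.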

Last fiddled with by Mark Rose on 2022-05-06 at 07:38
Old 2022-05-06, 14:37   #873
M344587487
 
"Composite as Heck"
Oct 2017

17·53 Posts

Quote:
Originally Posted by Mark Rose View Post
Looks quite a bit faster than a 5800X at larger FFTs though!
It looks like you've used the 5800X SMT results, which are much worse at ~8M; the non-SMT results seem to be only slightly below the 5800X3D. Scaling to very large FFTs might be where the 3D version pulls ahead. If the 18M result diverges drastically, it would be worth running the full range from 8M onwards to find where the inflection points are. If those are found, nerfing the RAM bandwidth and re-running the range would show how much bandwidth factors in, and at what FFT size RAM bandwidth comes back into play.
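
As a rough guide to where such inflection points might sit, here is a back-of-the-envelope sketch comparing FFT data size with L3 capacity. It assumes 8 bytes (one double) per FFT element and ignores any auxiliary tables, so real working sets will be larger. The 96 MiB figure is the L3 size from the hwloc output earlier in the thread; the 32 MiB figure for the 5800X is my own assumption and does not appear in the thread.

```python
BYTES_PER_ELEMENT = 8  # assumption: one double-precision value per FFT element

# L3 sizes in MiB. 96 comes from the hwloc output posted above (98304KB);
# 32 for the 5800X is an assumption not stated in this thread.
L3_MIB = {"5800X3D": 96, "5800X": 32}


def fft_footprint_mib(fft_k):
    """Size in MiB of the FFT data alone for an FFT length given in K (1024 elements)."""
    return fft_k * 1024 * BYTES_PER_ELEMENT / 2**20


def largest_fft_k_in_cache(cache_mib):
    """Largest FFT length in K whose data alone would fit in a cache of cache_mib MiB."""
    return int(cache_mib * 2**20 // (1024 * BYTES_PER_ELEMENT))


if __name__ == "__main__":
    for fft_k in (4096, 8192, 18432):  # 18432K = 18M
        print(f"{fft_k}K -> {fft_footprint_mib(fft_k):.0f} MiB")
    for cpu, mib in L3_MIB.items():
        print(f"{cpu}: data fits up to about {largest_fft_k_in_cache(mib)}K")
```

Under these assumptions an 8M FFT (64 MiB) still fits in 96 MiB of L3 while an 18M FFT (144 MiB) does not, so a single worker would be expected to hit RAM again somewhere past 12M.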
Old 2022-05-07, 19:17   #874
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

101110000111₂ Posts

Quote:
Originally Posted by M344587487 View Post
It looks like you've used the 5800X SMT results, which are much worse at ~8M; the non-SMT results seem to be only slightly below the 5800X3D. Scaling to very large FFTs might be where the 3D version pulls ahead. If the 18M result diverges drastically, it would be worth running the full range from 8M onwards to find where the inflection points are. If those are found, nerfing the RAM bandwidth and re-running the range would show how much bandwidth factors in, and at what FFT size RAM bandwidth comes back into play.
It was the only benchmark I was able to find.
Old 2022-05-08, 05:52   #875
M344587487
 
"Composite as Heck"
Oct 2017

385₁₆ Posts

The non-SMT results are in the same post you linked, above the SMT results.
Old 2022-05-11, 19:33   #876
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

1402₈ Posts

I was thinking about the effect of memory bandwidth as a bottleneck for the performance of a GPU or a CPU. I tried to calculate the bandwidth needed for 1 TFLOPS of FP64 to be fully used. My result was about 130 GB/s, which seems too little in the context of the Radeon VII, which has roughly 3 TFLOPS of FP64 throughput, yet its actual performance in PRP tests differs.

I used the conversion 1 TFLOPS = 500 GHz-D/D. 500 GHz-D is one test with an exponent around 113,500,000, which needs a 6144K FFT size, and that requires about 48 MiB of FFT data to be transferred per iteration. Thus 113,500,000 * 48 MiB in one day is about 130 GB/s.

Could someone explain to me how memory bandwidth affects performance, and what could be used as a rule-of-thumb conversion for the bandwidth required for 1 TFLOPS of FP64 to be fully used?
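
The arithmetic in the estimate above can be written out as a short sketch. The function names and the `passes` parameter are hypothetical. With a single pass over the 48 MiB per iteration the result is about 66 GB/s; the quoted figure of about 130 GB/s is matched if the data is both read and written once per iteration, which is my assumption about how that figure was reached.

```python
SECONDS_PER_DAY = 86_400


def fft_bytes(fft_k, bytes_per_element=8):
    """Bytes of FFT data for a length given in K, assuming one double per element."""
    return fft_k * 1024 * bytes_per_element


def required_bandwidth_gb_s(iterations, fft_k, passes=1, seconds=SECONDS_PER_DAY):
    """Average GB/s (10^9 bytes/s) needed to move the FFT data `passes` times per iteration."""
    return iterations * fft_bytes(fft_k) * passes / seconds / 1e9


if __name__ == "__main__":
    iterations = 113_500_000  # one iteration per bit of the exponent, as in the post
    print(f"data per iteration: {fft_bytes(6144) / 2**20:.0f} MiB")
    print(f"one pass:   {required_bandwidth_gb_s(iterations, 6144, passes=1):.1f} GB/s")
    print(f"read+write: {required_bandwidth_gb_s(iterations, 6144, passes=2):.1f} GB/s")
```

Real implementations make more than one pass over the data per iteration, so this is best treated as a lower bound rather than a rule of thumb.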
Old 2022-08-01, 14:16   #877
hj47
 
Oct 2008

2·3·11 Posts

Does anyone have a Ryzen 5700G and is willing to post benchmark results? I'd be curious to see what impact the 16 MB of L3 has on wavefront PRP throughput.


Thanks in advance 👍

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
