mersenneforum.org  

Old 2021-11-17, 13:05   #67
tdulcet
 

Quote:
Originally Posted by drkirkby
Why would 3 workers give the most throughput on a dual-socket computer?
I ran the throughput benchmark on a c5.metal instance and got different results. Specifically, two workers were faster at the higher FFT lengths. Here is the fastest number of workers for each FFT length benchmarked by default:
  • 6 workers: 2048K, 2100K, 2160K, 2240K, 2304K, 2400K
  • 4 workers: 2520K, 2560K, 2592K, 2688K, 2880K, 2940K, 3000K, 3072K, 3136K, 3200K, 3360K, 3456K, 3600K, 3840K, 3920K, 4200K, 4320K, 4480K, 4800K
  • 3 workers: 4032K
  • 2 workers: 4608K, 4704K, 5040K, 5120K, 5184K, 5376K, 5760K, 6048K, 6144K, 6272K, 6400K, 6720K, 7056K, 7168K, 7200K, 7680K, 8064K
Here are the actual results for one of the FFT lengths used for wavefront first-time tests:
Code:
Timings for 6144K FFT length (48 cores, 1 worker): 1.35 ms. Throughput: 740.08 iter/sec.
Timings for 6144K FFT length (48 cores, 2 workers): 1.16, 1.19 ms. Throughput: 1697.08 iter/sec.
Timings for 6144K FFT length (48 cores, 3 workers): 3.04, 3.07, 1.23 ms. Throughput: 1470.23 iter/sec.
Timings for 6144K FFT length (48 cores, 4 workers): 3.05, 3.02, 3.02, 3.00 ms. Throughput: 1322.79 iter/sec.
Timings for 6144K FFT length (48 cores, 6 workers): 5.47, 5.47, 5.48, 5.39, 5.37, 5.39 ms. Throughput: 1105.26 iter/sec.
Timings for 6144K FFT length (48 cores, 8 workers): 7.56, 7.54, 7.56, 7.55, 7.41, 7.50, 7.46, 7.44 ms. Throughput: 1066.38 iter/sec.
Timings for 6144K FFT length (48 cores, 12 workers): 11.56, 11.61, 11.62, 12.32, 11.57, 11.51, 11.54, 11.55, 11.29, 11.40, 11.43, 11.25 ms. Throughput: 1039.05 iter/sec.
Timings for 6144K FFT length (48 cores, 16 workers): 20.99, 20.72, 20.95, 20.82, 20.78, 21.04, 20.89, 20.78, 14.67, 13.45, 14.54, 14.61, 13.46, 13.71, 13.94, 14.91 ms. Throughput: 949.13 iter/sec.
Timings for 6144K FFT length (48 cores, 24 workers): 57.30, 56.56, 56.51, 56.29, 56.69, 56.99, 56.67, 56.94, 56.71, 56.65, 56.85, 56.70, 26.03, 30.28, 25.78, 27.11, 29.24, 29.75, 27.03, 29.71, 27.35, 28.07, 30.19, 28.15 ms. Throughput: 637.96 iter/sec.
Timings for 6144K FFT length (48 cores, 48 workers): 130.05, 132.14, 128.50, 128.65, 129.51, 129.92, 128.45, 129.71, 128.78, 129.41, 130.18, 128.95, 130.09, 129.10, 130.14, 129.61, 128.04, 130.51, 129.25, 129.42, 129.92, 130.49, 129.53, 131.25, 86.04, 102.15, 87.65, 103.38, 74.75, 91.32, 91.88, 76.62, 75.89, 103.66, 101.44, 101.42, 95.30, 93.57, 79.79, 102.96, 72.71, 95.29, 98.47, 87.29, 100.54, 87.55, 94.32, 102.50 ms. Throughput: 449.43 iter/sec.
MPrime by default wanted to use 12 workers, but using 2 workers is significantly faster.
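
A minimal awk sketch (assuming the lines in results.bench.txt look like the timings above; adjust the pattern if the file adds timestamp prefixes) to pull out the highest-throughput worker count for each FFT length:
Code:
awk '/Timings for .* FFT length/ {
        fft = $3                                  # e.g. "6144K"
        match($0, /[0-9]+ workers?/)              # worker count inside the parentheses
        w = substr($0, RSTART, RLENGTH) + 0
        match($0, /Throughput: [0-9.]+/)          # total iter/sec for this combination
        t = substr($0, RSTART + 12, RLENGTH - 12) + 0
        if (t > best[fft]) { best[fft] = t; workers[fft] = w }
     }
     END {
        # report the best combination per FFT length (awk prints these in arbitrary order)
        for (f in best) printf "%s: %d workers, %.2f iter/sec\n", f, workers[f], best[f]
     }' results.bench.txt
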
Old 2021-11-17, 15:06   #68
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

449 Posts
Default

Quote:
Originally Posted by tdulcet
I ran the throughput benchmark on a c5.metal instance and got different results.
Are you using the same CPUs as I had, 2 × Intel Xeon Platinum 8275CL @ 3.00 GHz? The c5.metal listing does not specify the CPU type.

Amazon AWS seems to use some odd CPUs, which makes them difficult to reuse in other machines. I had a couple of high-spec CPUs (I think 26-core, 2.6 GHz), but they would not run in my Dell 7920. Apparently they had been used by Amazon for AWS, but were not supported by many motherboards.
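
For anyone who has an instance up, the standard lscpu output shows the model and core/socket counts (nothing AWS-specific assumed here):
Code:
# CPU model plus socket and per-socket core counts, as reported by util-linux lscpu
lscpu | grep -E 'Model name|Socket\(s\)|Core\(s\) per socket'
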
Old 2021-11-17, 16:39   #69
tdulcet
 

Quote:
Originally Posted by drkirkby
Are you using the same CPUs as I had - 2 x Intel Xeon Platinum 8275CL CPU @ 3.00GHz? The c5.metal does not specify the CPU type.
Yes, the same CPU:
Code:
$ wget https://raw.github.com/tdulcet/Linux-System-Information/master/info.sh -qO - | bash -s

Linux Distribution: Ubuntu 20.04.3 LTS
Linux Kernel: 5.11.0-1020-aws
Computer Model: Amazon EC2 c5.metal 1.0
Processor (CPU): Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
CPU Cores/Threads: 48/96
Architecture: x86_64 (64-bit)
Total memory (RAM): 193053 MiB (189GiB) (202431 MB (203GB))
Total swap space: 0 MiB (0 MB)
Disk space: nvme0n1: 512000 MiB (500GiB) (536870 MB (537GB))
I was not sure how many workers would be used, so I ended up provisioning much more disk space than I now need for the PRP proof files, even at proof power 10...
Old 2022-09-14, 15:22   #70
tdulcet
 

AWS has new C6i instances, which have 33% more CPU cores and memory than the C5 instances and which Amazon claims provide up to 9% higher memory bandwidth. I ran the MPrime throughput benchmark on a c6i.metal instance and it is just over 3% faster once the extra CPU cores are accounted for (a quick per-core check follows the results below). As before, two workers were faster at the higher FFT lengths. Here is the fastest number of workers for each FFT length benchmarked by default:
  • 4 workers: 3072K, 3136K, 3200K, 3360K, 3456K, 3600K, 3840K, 3920K, 4032K, 4200K, 4320K, 4480K, 4608K, 4704K, 4800K, 5040K, 5120K, 5184K, 5760K
  • 2 workers: 5376K, 6048K, 6144K, 6272K, 6400K, 6720K, 7056K, 7168K, 7200K, 7680K, 8064K
Here are the actual results for one of the FFT lengths used for wavefront first-time tests:
Code:
Timings for 6144K FFT length (64 cores, 1 worker):  1.24 ms.  Throughput: 807.35 iter/sec.
Timings for 6144K FFT length (64 cores, 2 workers):  0.86,  0.87 ms.  Throughput: 2318.22 iter/sec.
Timings for 6144K FFT length (64 cores, 4 workers):  1.95,  1.91,  1.98,  1.94 ms.  Throughput: 2058.45 iter/sec.
Timings for 6144K FFT length (64 cores, 8 workers):  5.27,  5.25,  5.25,  5.27,  5.28,  5.23,  5.30,  5.26 ms.  Throughput: 1520.27 iter/sec.
Timings for 6144K FFT length (64 cores, 16 workers): 11.41, 11.86, 11.38, 11.40, 11.26, 11.44, 11.39, 11.43, 11.37, 11.28, 11.30, 11.13, 11.39, 11.33, 11.33, 11.42 ms.  Throughput: 1405.95 iter/sec.
Timings for 6144K FFT length (64 cores, 32 workers): 23.49, 23.06, 23.28, 23.11, 23.65, 23.80, 23.32, 23.34, 23.03, 23.23, 23.34, 23.20, 23.13, 23.23, 23.33, 23.40, 23.42, 23.30, 23.32, 23.12, 23.20, 23.21, 23.22, 23.36, 23.16, 23.22, 23.56, 23.30, 23.16, 23.24, 23.39, 23.52 ms.  Throughput: 1373.40 iter/sec.
Timings for 6144K FFT length (64 cores, 64 workers): 47.16, 47.00, 47.11, 47.05, 46.94, 47.28, 46.44, 46.86, 46.70, 47.37, 46.98, 46.54, 46.59, 47.13, 46.90, 47.48, 46.78, 47.04, 47.02, 46.70, 47.30, 46.80, 47.10, 46.58, 47.08, 46.86, 46.91, 47.04, 47.32, 46.99, 47.56, 47.58, 47.04, 47.14, 46.68, 47.16, 47.37, 46.90, 47.05, 46.90, 46.88, 46.86, 46.83, 47.05, 47.24, 46.82, 46.86, 46.95, 46.49, 47.07, 47.31, 46.97, 47.04, 47.16, 47.00, 46.72, 46.85, 47.16, 46.63, 46.64, 47.34, 47.38, 47.11, 47.50 ms.  Throughput: 1361.62 iter/sec.
I attached the full results.bench.txt file. MPrime by default wanted to use 16 workers.
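
As a rough per-core check at the 6144K length alone (the just-over-3% figure is presumably taken across all benchmarked FFT lengths, so it will not match a single length exactly):
Code:
# Per-core throughput from the best 2-worker 6144K results in this and the earlier c5.metal post
awk 'BEGIN {
    c5  = 1697.08 / 48     # c5.metal:  48 cores
    c6i = 2318.22 / 64     # c6i.metal: 64 cores
    printf "c5.metal:  %.2f iter/s per core\n", c5
    printf "c6i.metal: %.2f iter/s per core (+%.1f%%)\n", c6i, (c6i / c5 - 1) * 100
}'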

When using my Mlucas install script to run a throughput benchmark with Mlucas, either 32 or 64 workers were faster for most FFT lengths, but with only about half of MPrime's total throughput:
Code:
Benchmark Summary

        Adjusted msec/iter times (ms/iter) vs Actual iters/sec total throughput (iter/s) for each combination

FFT     #1                 #2                 #3                 #4                 #5                 #6                 #7                #8                 #9                 #10                #11                #12                #13                #14
length  ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s   ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s    ms/iter  iter/s
2048K   9.2      3877.988  9.9      4049.134  10.8     3677.103  12.16    4297.167  16.48    3212.725  41.28    1524.390  197.76   308.261  8.61     3204.588  9.04     3302.408  10.08    3315.682  12.08    4032.574  20       2565.038  90.88    1011.809  -        -
2304K   11.08    3387.758  11.48    3209.781  13.16    3084.030  15.36    3222.328  21.6     2330.516  47.68    1339.686  -        -        10.17    2816.611  10.92    2933.168  12.04    2714.630  14.64    3034.511  25.12    2059.512  -        -         -        -
2560K   11.98    3071.143  12.44    2906.295  13.56    2851.886  15.76    2918.760  21.92    2293.556  49.92    1284.611  -        -        11.07    2551.369  11.62    2421.313  13.04    2466.145  16.4     2814.493  26.24    1960.345  -        -         -        -
2816K   14.06    2743.639  14.88    2596.983  16.4     2348.945  18.32    2365.856  25.12    2038.319  54.08    1090.517  -        -        12.79    2233.350  13.78    2423.773  14.92    2132.498  18.56    2271.102  27.84    1774.640  -        -         -        -
3072K   14.2     2572.612  14.82    2587.992  15.96    2337.116  18.8     2569.891  24.8     2052.901  53.76    1078.431  -        -        13.28    2091.157  13.84    2085.101  14.88    1936.213  19.28    1794.908  29.76    1704.247  -        -         -        -
3328K   16.13    2347.267  16.78    2249.035  18.32    1831.000  20.88    1857.355  26.56    2006.182  55.68    1018.065  -        -        14.97    1908.566  15.8     2041.972  17       1875.230  20.32    1549.497  30.24    1655.682  -        -         -        -
3584K   16.74    2184.298  17.4     2071.261  18.84    2015.334  22.08    1701.082  26.24    2001.601  55.68    1008.492  -        -        15.67    1776.107  16.12    1700.491  17.76    1779.794  21.28    1468.572  29.92    1642.884  -        -         -        -
3840K   18.91    2007.035  19.2     1927.621  21.24    1824.570  22.8     1704.757  28       1827.642  56.64    1004.005  -        -        17.55    1642.385  18.74    1571.324  19.76    1593.616  22.4     1345.302  32.16    1562.182  -        -         -        -
4096K   19.24    1908.389  19.76    1829.629  20.84    1773.479  22.8     1648.035  29.12    1884.797  53.76    1008.081  222.72   275.938  18.56    1555.231  19.2     1493.142  19.88    1434.561  22.4     1261.338  32.48    1575.412  126.08   817.072   -        -
4608K   23.77    1425.222  24.22    1403.128  25.24    1370.652  26.88    1273.997  34.72    1353.899  63.36    838.937   242.56   252.525  21.44    1513.768  22.08    1453.030  23.12    1374.998  26.56    1078.660  38.4     1202.060  133.44   749.489   -        -
5120K   26       1299.523  26.62    1270.726  28.28    1259.305  29.28    1151.423  35.52    1273.578  68.48    929.660   257.28   240.442  23.68    1365.599  24.44    1319.097  25.04    1250.206  28.08    987.904   40.8     1073.267  138.56   654.660   -        -
5632K   31.53    1139.601  32.04    1121.889  33.48    1099.795  34.96    1011.509  43.84    1035.287  74.88    728.940   280.32   206.228  27.91    1228.281  28.54    1217.586  29.76    1130.747  33.04    887.546   44.16    864.658   159.36   593.240   -        -
6144K   34.24    1054.845  34.98    1069.896  36.48    1020.605  38.16    935.837   46.56    877.383   79.04    691.432   277.12   212.540  30.84    1117.139  31.42    1083.632  32.4     1018.938  35.68    826.421   48.48    737.946   158.72   577.898   -        -
6656K   36.88    978.016   38.14    967.198   39.28    947.125   40.64    864.520   49.76    742.998   86.08    643.316   288.64   210.393  32.93    1047.741  33.64    1004.116  34.84    956.815   39.28    774.321   52.48    636.751   161.6    568.586   -        -
7168K   40.28    909.258   41.96    898.860   43.24    894.615   44.24    826.761   53.6     676.205   86.4     637.452   288.64   215.889  35.86    969.420   36.72    959.746   37.4     878.828   40.8     724.044   54.08    579.357   170.88   546.273   -        -
7680K   45.68    827.973   47.08    823.036   48.76    807.410   50.48    740.681   58.4     607.710   93.12    611.168   309.76   189.394  40.41    895.935   41.44    863.145   42.16    820.114   46.32    657.822   60.64    536.359   179.84   404.555   -        -

Fastest combination
#  Workers/Runs  Threads  First -cpu argument
1  64            1        0

Mean ± σ std dev faster  #   Workers/Runs  Threads  First -cpu argument
1.023 ± 0.028 (2.3%)     2   32            2        0:1
1.079 ± 0.065 (7.9%)     3   16            4        0:3
1.119 ± 0.089 (11.9%)    4   8             8        0:7
1.211 ± 0.134 (21.1%)    5   4             16       0:15
1.951 ± 0.439 (95.1%)    6   2             32       0:31
6.030 ± 2.440 (503.0%)   7   1             64       0:63
1.097 ± 0.141 (9.7%)     8   64            2        0,64
1.107 ± 0.133 (10.7%)    9   32            4        0:1,64:65
1.158 ± 0.122 (15.8%)    10  16            8        0:3,64:67
1.300 ± 0.156 (30.0%)    11  8             16       0:7,64:71
1.426 ± 0.145 (42.6%)    12  4             32       0:15,64:79
2.137 ± 0.627 (113.7%)   13  2             64       0:31,64:95
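
The "First -cpu argument" column maps to Mlucas's -cpu core-affinity flag. As a sketch of reproducing one row by hand, assuming the usual -fftlen/-iters self-test options (check the Mlucas README for the exact syntax), the 2-run/32-thread combination would look something like:
Code:
# One 32-thread timing run pinned to cores 0-31 at the 6144K FFT length;
# a second copy started with -cpu 32:63 would complete the 2-run combination.
./Mlucas -fftlen 6144 -iters 1000 -cpu 0:31
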
I attached the full bench.txt file. For reference, here is the system information:
Code:
$ wget https://raw.github.com/tdulcet/Linux-System-Information/master/info.sh -qO - | bash -s

Linux Distribution:             Ubuntu 22.04 LTS
Linux Kernel:                   5.15.0-1011-aws
Computer Model:                 Amazon EC2 c6i.metal 110-003545-001
Processor (CPU):                Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
CPU Sockets/Cores/Threads:      2/64/128
Architecture:                   x86_64 (64-bit)
Total memory (RAM):             257746 MiB (252GiB) (270266 MB (271GB))
Total swap space:               0 MiB (0 MB)
Disk space:                     nvme0n1: 30720 MiB (30GiB) (32212 MB (33GB))
Attached Files:
results.bench.txt (62.3 KB)
bench.txt (53.6 KB)