mersenneforum.org > New To GIMPS? Start Here! > Information & Answers

2017-11-29, 17:45   #1
daxmick
Feb 2014
54₁₆ Posts

Number of workers vs. number of CPUs

I'm a little confused on how many workers I should spawn in prime95. Should I spawn 1 worker per CPU, per Core, or per Hyperthread "core"?

The prime95 program offered to run 6 workers on my WIN7 virtual machine that I've assigned 32GB RAM and 20vCPUs to. Is that a good ratio? It appears that the VM is running at 100% CPU.

Would it be better to run 1 worker and dedicate 20 CPUs to it?

2017-11-29, 18:41   #2
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
67² Posts

Boundaries that seem to apply across all CPU families:
More than one worker per physical core is not optimal (hyperthreaded "cores" should not be counted when choosing the number of workers).
Giving a single worker more threads than one physical socket holds is inefficient; each socket should get at least one worker of its own.

Within those two bounds, optimal production is determined by experimentation. The benchmark tools mostly automate this, but virtual machines are hard to pin down because thread assignments may land on HT cores at some times and not others.
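
To make the two bounds above concrete, here is a minimal Python sketch (purely illustrative, not part of Prime95). The function name and the restriction to even splits, where every worker gets the same number of cores, are assumptions of the sketch rather than rules stated in the post.

Code:
# Illustrative sketch only (not Prime95 code): list worker counts that
# satisfy the two bounds above for a hypothetical machine layout.

def candidate_worker_counts(sockets: int, cores_per_socket: int) -> list[int]:
    """Worker counts between one per socket and one per physical core.

    Hyperthreaded logical cores are deliberately ignored (first rule), and
    only even splits are kept so every worker gets the same number of cores
    (an assumption of this sketch, not a rule from the post).
    """
    physical_cores = sockets * cores_per_socket
    return [w for w in range(sockets, physical_cores + 1)
            if physical_cores % w == 0]

# Example: a 2-socket, 12-cores-per-socket machine like the one discussed below.
print(candidate_worker_counts(sockets=2, cores_per_socket=12))
# -> [2, 3, 4, 6, 8, 12, 24]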

2017-11-29, 18:49   #3
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
10565₈ Posts

Quote: Originally Posted by daxmick
I'm a little confused on how many workers I should spawn in prime95. Should I spawn 1 worker per CPU, per Core, or per Hyperthread "core"?

The prime95 program offered to run 6 workers on my WIN7 virtual machine that I've assigned 32GB RAM and 20vCPUs to. Is that a good ratio? It appears that the VM is running at 100% CPU.

Would it be better to run 1 worker and dedicate 20 CPUs to it?
Welcome.

First, 100% CPU usage is expected; Prime95 is very efficient and will keep every core it is given busy.

NEVER allocate more workers than physical cores. (There is the very odd exception to this rule, but not often enough to consider.)

I'm guessing Prime95 thinks you have 6 physical cores...

If you do indeed have 6 cores, the general rule is to run 6 workers with 1 core each.
Sometimes it is slightly more efficient to run fewer workers with more cores each: for example, 3 workers with 2 cores each, or 2 workers with 3 cores each.

If you want to complete a very large assignment quickly, allocate all 6 cores to 1 worker. However, the overall throughput will be up to 25% less than with 6 workers of 1 core each.
NOTE: a very large assignment is something like an LL test on an exponent over 100 million.

If you have more or fewer physical cores, adjust accordingly.
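
As a rough illustration of the trade-off described above, the following Python sketch compares total throughput against the speed of a single test on a 6-core machine. The per-core iteration rate is a made-up placeholder, and the 0.75 efficiency factor simply encodes the "up to 25% less throughput" rule of thumb quoted in the post; neither is a measured value.

Code:
# Rough illustration of the throughput-vs-latency trade-off described above.
# ITERS_PER_SEC_PER_CORE is a made-up placeholder; the 0.75 efficiency factor
# reflects the "up to 25% less throughput" rule of thumb from the post.

CORES = 6
ITERS_PER_SEC_PER_CORE = 30.0   # hypothetical single-core rate

def total_and_per_test(workers: int, efficiency: float):
    """Return (total iter/sec, iter/sec of one test) for an even core split."""
    cores_per_worker = CORES // workers
    per_test = cores_per_worker * ITERS_PER_SEC_PER_CORE * efficiency
    return workers * per_test, per_test

for workers, eff in [(6, 1.00), (1, 0.75)]:
    total, per_test = total_and_per_test(workers, eff)
    print(f"{workers} worker(s): {total:6.1f} iter/sec total, "
          f"{per_test:6.1f} iter/sec for each individual test")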

2017-11-29, 21:22   #4
daxmick
Feb 2014
2²×3×7 Posts

Quote: Originally Posted by petrw1
I'm guessing Prime95 thinks you have 6 Physical Cores...
First, thank you for the quick reply!
The odd thing is that I have 2 physical CPUs (sockets) and each has 12 cores. So, if my math is correct, I have 24 cores (48 with hyperthreading).

So, if I want to maximize the number of "things" I'm working on, I "could" have 12 workers; or, if I wanted to maximize the speed of completing a single "thing", I could have 1 worker. Is that how I should look at this?

2017-11-29, 21:35   #5
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
4469₁₀ Posts

Quote: Originally Posted by daxmick
First, thank you for the quick reply!
The odd thing is that I have 2 physical CPUs (sockets) and each have 12 cores. So, if my math is correct I have 24 cores (48 with hyperthreading).

So, if I want to maximize the number of "things" I'm working on I "could" have 12 workers, or if I wanted to maximize speed on completing a single "thing" I could have 1 worker. Is that how I should look at this?
If you really have 24 cores then you should have 24 workers.
Your limiting factor may be RAM.
With 32GB and 24 workers, definitely do NOT run P-1 tests.

Again, unless you are doing a REALLY big assignment, you would lose a fair amount of overall throughput by putting all 24 cores on 1 assignment.

As VBCurtis said, your best bet is to run the Benchmark tool.
In version 28.x on Windows it is: Options... Benchmark.
In version 29.x there are a few more options. I believe you want a "Throughput" benchmark. Maybe someone can correct me.

In the end it should point you to the best worker/core mix.
It will also indicate the number of physical cores.
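
A quick back-of-envelope calculation behind the P-1 warning above. The division itself is trivial; how much RAM P-1 stage 2 actually wants depends on the exponent and the Prime95 version, so treat the "too little" reading as the post's claim, not a measurement.

Code:
# Back-of-envelope arithmetic behind the P-1 warning: RAM available per worker
# if every worker ran a memory-hungry P-1 stage 2 at the same time.

total_gb = 32
workers = 24
print(f"{total_gb / workers:.2f} GB per worker")   # ~1.33 GB each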

2017-11-29, 21:38   #6
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1BF7₁₆ Posts

Options/Benchmark is your friend. Prime95 arbitrarily guessed 4 cores/worker would be pretty good.

Do a throughput benchmark using all 24 cores, a 4M FFT size, and 2,4,6,8,12 workers. Let us know what was best -- we are a curious bunch.

2017-11-29, 23:14   #7
daxmick
Feb 2014
2²×3×7 Posts

Quote: Originally Posted by petrw1
With 32GB and 24 workers definitely do NOT run P-1 tests.
So, RAM is included in the calculation? That adds to the question then... how much RAM per core should I account for? Or is it RAM per worker? I have up to 128GB of RAM available.

Quote: Originally Posted by Prime95
Options/Benchmark is your friend. Prime95 arbitrarily guessed 4 cores/worker would be pretty good.

Do a throughput benchmark using all 24 cores, a 4M FFT size, and 2,4,6,8,12 workers. Let us know what was best -- we are a curious bunch.
I wasn't able to adjust the workers for the Throughput benchmark. It used 1, 2, 6, and 20 workers (on 20 cores, currently with 32GB RAM). Unfortunately, I don't know how this program works well enough to really read the results (unless it is just the max iter/sec value). Maybe someone can decode/explain it? (See also the sketch after the listing below.) Here are the results:
<snip>
[Wed Nov 29 14:40:46 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
CPU speed: 1371.03 MHz, 20 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 15 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
Machine#0 (total=31082972KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=15302680KB, total=15302680KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7)
L3 (size=15360KB, linesize=64, ways=20, Inclusive=1)
L2 (size=256KB, linesize=64, ways=8, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Core (cpuset: 0x00000004)
PU#2 (cpuset: 0x00000004)
Core (cpuset: 0x00000008)
PU#3 (cpuset: 0x00000008)
Core (cpuset: 0x00000010)
PU#4 (cpuset: 0x00000010)
Core (cpuset: 0x00000020)
PU#5 (cpuset: 0x00000020)
Core (cpuset: 0x00000040)
PU#6 (cpuset: 0x00000040)
Core (cpuset: 0x00000080)
PU#7 (cpuset: 0x00000080)
Core (cpuset: 0x00000100)
PU#8 (cpuset: 0x00000100)
Core (cpuset: 0x00000200)
PU#9 (cpuset: 0x00000200)
NUMANode#1 (local=15780292KB, total=15780292KB)
Package#1 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7)
L3 (size=15360KB, linesize=64, ways=20, Inclusive=1)
L2 (size=256KB, linesize=64, ways=8, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000400)
PU#10 (cpuset: 0x00000400)
Core (cpuset: 0x00000800)
PU#11 (cpuset: 0x00000800)
Core (cpuset: 0x00001000)
PU#12 (cpuset: 0x00001000)
Core (cpuset: 0x00002000)
PU#13 (cpuset: 0x00002000)
Core (cpuset: 0x00004000)
PU#14 (cpuset: 0x00004000)
Core (cpuset: 0x00008000)
PU#15 (cpuset: 0x00008000)
Core (cpuset: 0x00010000)
PU#16 (cpuset: 0x00010000)
Core (cpuset: 0x00020000)
PU#17 (cpuset: 0x00020000)
Core (cpuset: 0x00040000)
PU#18 (cpuset: 0x00040000)
Core (cpuset: 0x00080000)
PU#19 (cpuset: 0x00080000)
Prime95 64-bit version 29.4, RdtscTiming=1
Timings for 2048K FFT length (20 cores, 1 worker): 2.40 ms. Throughput: 417.35 iter/sec.
Timings for 2048K FFT length (20 cores, 2 workers): 3.50, 3.51 ms. Throughput: 570.46 iter/sec.
Timings for 2048K FFT length (20 cores, 6 workers): 10.70, 10.69, 8.42, 9.42, 12.00, 8.57 ms. Throughput: 611.89 iter/sec.
Timings for 2048K FFT length (20 cores, 20 workers): 38.21, 38.33, 38.31, 38.31, 17.91, 35.94, 36.37, 18.15, 38.42, 38.39, 38.36, 38.32, 17.43, 38.42, 38.28, 38.19, 38.04, 38.25, 17.36, 38.39 ms. Throughput: 646.77 iter/sec.
Timings for 2304K FFT length (20 cores, 1 worker): 4.15 ms. Throughput: 241.13 iter/sec.
Timings for 2304K FFT length (20 cores, 2 workers): 3.85, 3.88 ms. Throughput: 517.34 iter/sec.
Timings for 2304K FFT length (20 cores, 6 workers): 11.53, 10.27, 9.75, 10.23, 10.69, 10.34 ms. Throughput: 574.78 iter/sec.
Timings for 2304K FFT length (20 cores, 20 workers): 40.52, 23.15, 35.27, 40.88, 40.39, 21.67, 40.95, 39.49, 31.57, 38.20, 39.93, 40.20, 19.38, 40.52, 40.27, 40.41, 19.36, 40.17, 40.23, 40.44 ms. Throughput: 601.11 iter/sec.
Timings for 2400K FFT length (20 cores, 1 worker): 3.42 ms. Throughput: 292.46 iter/sec.
Timings for 2400K FFT length (20 cores, 2 workers): 3.77, 3.77 ms. Throughput: 530.75 iter/sec.
Timings for 2400K FFT length (20 cores, 6 workers): 12.33, 12.89, 7.86, 11.96, 9.72, 9.84 ms. Throughput: 574.05 iter/sec.
Timings for 2400K FFT length (20 cores, 20 workers): 39.98, 40.55, 28.61, 40.29, 40.02, 39.59, 37.99, 24.52, 22.99, 40.43, 32.32, 40.09, 40.28, 40.18, 20.40, 40.19, 40.07, 40.01, 40.43, 23.68 ms. Throughput: 591.48 iter/sec.
Timings for 2560K FFT length (20 cores, 1 worker): 3.14 ms. Throughput: 318.65 iter/sec.
Timings for 2560K FFT length (20 cores, 2 workers): 4.35, 4.30 ms. Throughput: 462.82 iter/sec.
Timings for 2560K FFT length (20 cores, 6 workers): 11.99, 13.23, 11.09, 11.03, 14.05, 11.33 ms. Throughput: 499.19 iter/sec.
Timings for 2560K FFT length (20 cores, 20 workers): 39.06, 37.04, 47.32, 46.93, 22.23, 49.12, 51.32, 27.56, 48.94, 51.36, 48.66, 48.86, 48.21, 34.94, 49.27, 49.53, 49.32, 49.49, 25.27, 21.49 ms. Throughput: 513.50 iter/sec.
Timings for 2688K FFT length (20 cores, 1 worker): 3.28 ms. Throughput: 304.46 iter/sec.
Timings for 2688K FFT length (20 cores, 2 workers): 4.36, 4.35 ms. Throughput: 459.37 iter/sec.
[Wed Nov 29 14:45:54 2017]
Timings for 2688K FFT length (20 cores, 6 workers): 13.08, 13.23, 10.58, 12.76, 12.83, 11.04 ms. Throughput: 493.39 iter/sec.
Timings for 2688K FFT length (20 cores, 20 workers): 24.45, 47.69, 48.00, 47.75, 28.96, 38.76, 37.22, 48.61, 48.32, 48.57, 47.70, 47.90, 23.82, 23.24, 44.10, 47.90, 48.27, 48.27, 48.01, 47.68 ms. Throughput: 506.34 iter/sec.
Timings for 2880K FFT length (20 cores, 1 worker): 4.37 ms. Throughput: 228.88 iter/sec.
Timings for 2880K FFT length (20 cores, 2 workers): 4.73, 4.85 ms. Throughput: 417.48 iter/sec.
Timings for 2880K FFT length (20 cores, 6 workers): 16.77, 13.26, 10.49, 12.55, 14.41, 12.31 ms. Throughput: 460.71 iter/sec.
Timings for 2880K FFT length (20 cores, 20 workers): 37.25, 40.22, 48.85, 33.47, 48.82, 24.89, 48.70, 49.19, 48.90, 49.24, 25.41, 46.63, 45.70, 48.11, 48.66, 48.40, 25.95, 48.13, 49.43, 49.20 ms. Throughput: 488.89 iter/sec.
Timings for 3072K FFT length (20 cores, 1 worker): 3.55 ms. Throughput: 281.84 iter/sec.
Timings for 3072K FFT length (20 cores, 2 workers): 5.41, 5.41 ms. Throughput: 369.54 iter/sec.
Timings for 3072K FFT length (20 cores, 6 workers): 18.44, 17.31, 11.57, 11.79, 19.59, 16.38 ms. Throughput: 395.31 iter/sec.
Timings for 3072K FFT length (20 cores, 20 workers): 62.78, 26.01, 63.60, 56.58, 67.10, 68.04, 68.02, 26.69, 66.81, 68.02, 68.14, 49.76, 36.64, 62.25, 31.23, 46.76, 52.42, 46.79, 61.31, 68.11 ms. Throughput: 402.18 iter/sec.
Timings for 3200K FFT length (20 cores, 1 worker): 5.88 ms. Throughput: 169.94 iter/sec.
Timings for 3200K FFT length (20 cores, 2 workers): 5.41, 6.12 ms. Throughput: 348.38 iter/sec.
Timings for 3200K FFT length (20 cores, 6 workers): 14.67, 15.56, 13.00, 16.57, 14.94, 11.97 ms. Throughput: 420.22 iter/sec.
Timings for 3200K FFT length (20 cores, 20 workers): 30.51, 46.17, 54.46, 39.68, 38.83, 54.05, 54.58, 54.56, 54.89, 46.56, 55.20, 54.06, 55.14, 55.12, 52.44, 54.26, 53.96, 54.72, 27.76, 28.53 ms. Throughput: 436.89 iter/sec.
Timings for 3360K FFT length (20 cores, 1 worker): 3.65 ms. Throughput: 273.63 iter/sec.
Timings for 3360K FFT length (20 cores, 2 workers): 5.39, 5.39 ms. Throughput: 370.87 iter/sec.
Timings for 3360K FFT length (20 cores, 6 workers): 16.19, 15.84, 13.75, 14.42, 17.07, 14.15 ms. Throughput: 396.18 iter/sec.
Timings for 3360K FFT length (20 cores, 20 workers): 31.55, 58.57, 58.84, 55.16, 58.61, 59.27, 58.73, 29.12, 54.68, 59.17, 58.91, 47.68, 58.45, 58.94, 34.64, 53.75, 51.15, 58.31, 31.31, 59.16 ms. Throughput: 409.43 iter/sec.
Timings for 3456K FFT length (20 cores, 1 worker): 4.10 ms. Throughput: 244.11 iter/sec.
[Wed Nov 29 14:51:09 2017]
Timings for 3456K FFT length (20 cores, 2 workers): 5.92, 5.87 ms. Throughput: 339.44 iter/sec.
Timings for 3456K FFT length (20 cores, 6 workers): 19.57, 17.84, 13.56, 16.80, 18.63, 14.50 ms. Throughput: 363.07 iter/sec.
Timings for 3456K FFT length (20 cores, 20 workers): 65.46, 61.70, 63.60, 65.22, 66.48, 33.04, 52.40, 66.38, 34.28, 66.15, 66.54, 39.14, 64.78, 66.58, 64.80, 62.08, 64.60, 46.17, 65.94, 31.01 ms. Throughput: 373.41 iter/sec.
Timings for 3584K FFT length (20 cores, 1 worker): 4.16 ms. Throughput: 240.15 iter/sec.
Timings for 3584K FFT length (20 cores, 2 workers): 6.72, 6.71 ms. Throughput: 297.75 iter/sec.
Timings for 3584K FFT length (20 cores, 6 workers): 20.76, 22.58, 14.98, 16.59, 24.33, 16.51 ms. Throughput: 321.14 iter/sec.
Timings for 3584K FFT length (20 cores, 20 workers): 76.80, 75.28, 75.80, 80.42, 72.56, 81.19, 72.34, 33.61, 81.32, 32.69, 81.03, 80.52, 73.16, 77.12, 76.79, 33.11, 64.69, 79.31, 38.12, 75.66 ms. Throughput: 326.63 iter/sec.
Timings for 3840K FFT length (20 cores, 1 worker): 5.41 ms. Throughput: 185.01 iter/sec.
Timings for 3840K FFT length (20 cores, 2 workers): 6.59, 6.56 ms. Throughput: 304.39 iter/sec.
Timings for 3840K FFT length (20 cores, 6 workers): 17.96, 20.20, 16.72, 15.19, 23.88, 17.07 ms. Throughput: 331.30 iter/sec.
Timings for 3840K FFT length (20 cores, 20 workers): 71.68, 39.16, 53.96, 71.43, 70.50, 53.34, 71.57, 49.44, 71.82, 56.43, 68.67, 68.87, 40.21, 70.96, 71.39, 71.04, 40.10, 44.88, 71.68, 71.31 ms. Throughput: 342.12 iter/sec.
Timings for 4032K FFT length (20 cores, 1 worker): 4.73 ms. Throughput: 211.54 iter/sec.
Timings for 4032K FFT length (20 cores, 2 workers): 6.99, 6.98 ms. Throughput: 286.28 iter/sec.
Timings for 4032K FFT length (20 cores, 6 workers): 16.60, 25.97, 18.07, 19.64, 19.20, 19.79 ms. Throughput: 307.60 iter/sec.
Timings for 4032K FFT length (20 cores, 20 workers): 76.88, 79.40, 79.36, 36.92, 76.59, 60.98, 63.29, 47.68, 78.14, 78.19, 48.42, 57.99, 78.47, 61.37, 78.25, 62.23, 44.51, 79.94, 77.13, 78.79 ms. Throughput: 313.53 iter/sec.
Timings for 4096K FFT length (20 cores, 1 worker): 5.18 ms. Throughput: 193.18 iter/sec.
Timings for 4096K FFT length (20 cores, 2 workers): 7.31, 7.29 ms. Throughput: 274.03 iter/sec.
Timings for 4096K FFT length (20 cores, 6 workers): 22.95, 20.14, 18.22, 22.82, 22.11, 16.91 ms. Throughput: 296.26 iter/sec.
[Wed Nov 29 14:56:14 2017]
Timings for 4096K FFT length (20 cores, 20 workers): 79.73, 79.14, 77.49, 78.53, 79.39, 39.13, 39.36, 79.21, 79.83, 79.68, 71.10, 66.34, 59.75, 70.98, 79.29, 55.76, 79.08, 40.79, 79.47, 78.83 ms. Throughput: 305.01 iter/sec.
Timings for 4480K FFT length (20 cores, 1 worker): 5.38 ms. Throughput: 185.84 iter/sec.
Timings for 4480K FFT length (20 cores, 2 workers): 7.49, 7.47 ms. Throughput: 267.44 iter/sec.
Timings for 4480K FFT length (20 cores, 6 workers): 25.92, 18.35, 20.65, 20.54, 24.16, 19.28 ms. Throughput: 283.43 iter/sec.
Timings for 4480K FFT length (20 cores, 20 workers): 41.13, 83.66, 41.92, 82.51, 83.57, 81.01, 83.30, 83.00, 83.33, 83.33, 48.19, 83.46, 83.62, 82.28, 82.34, 83.66, 83.56, 69.19, 74.60, 40.70 ms. Throughput: 289.94 iter/sec.
Timings for 4608K FFT length (20 cores, 1 worker): 5.61 ms. Throughput: 178.17 iter/sec.
Timings for 4608K FFT length (20 cores, 2 workers): 7.90, 7.89 ms. Throughput: 253.43 iter/sec.
Timings for 4608K FFT length (20 cores, 6 workers): 25.14, 22.88, 19.34, 23.64, 25.61, 18.41 ms. Throughput: 270.86 iter/sec.
Timings for 4608K FFT length (20 cores, 20 workers): 86.80, 86.17, 87.27, 86.26, 88.85, 42.12, 86.63, 79.47, 44.13, 88.85, 88.15, 87.34, 87.18, 42.15, 86.41, 88.15, 41.88, 86.70, 86.29, 86.48 ms. Throughput: 278.69 iter/sec.
Timings for 4800K FFT length (20 cores, 1 worker): 5.66 ms. Throughput: 176.62 iter/sec.
Timings for 4800K FFT length (20 cores, 2 workers): 8.23, 8.19 ms. Throughput: 243.52 iter/sec.
Timings for 4800K FFT length (20 cores, 6 workers): 25.57, 26.91, 18.84, 23.79, 23.07, 22.78 ms. Throughput: 258.64 iter/sec.
Timings for 4800K FFT length (20 cores, 20 workers): 93.11, 90.29, 91.82, 47.44, 59.41, 94.25, 50.78, 92.35, 92.56, 85.95, 59.33, 90.55, 94.93, 42.52, 90.34, 92.09, 91.99, 94.98, 90.94, 54.14 ms. Throughput: 268.94 iter/sec.
Timings for 5120K FFT length (20 cores, 1 worker): 6.47 ms. Throughput: 154.51 iter/sec.
Timings for 5120K FFT length (20 cores, 2 workers): 9.21, 9.16 ms. Throughput: 217.75 iter/sec.
Timings for 5120K FFT length (20 cores, 6 workers): 30.00, 30.28, 20.17, 34.15, 22.34, 23.50 ms. Throughput: 232.52 iter/sec.
Timings for 5120K FFT length (20 cores, 20 workers): 49.59, 101.60, 101.01, 100.88, 101.62, 99.63, 98.00, 100.16, 48.96, 101.52, 99.51, 91.20, 49.11, 101.38, 96.45, 101.02, 100.16, 100.72, 50.93, 101.14 ms. Throughput: 241.10 iter/sec.
Timings for 5376K FFT length (20 cores, 1 worker): 6.53 ms. Throughput: 153.05 iter/sec.
[Wed Nov 29 15:01:24 2017]
Timings for 5376K FFT length (20 cores, 2 workers): 9.34, 9.31 ms. Throughput: 214.41 iter/sec.
Timings for 5376K FFT length (20 cores, 6 workers): 25.42, 30.07, 24.28, 21.27, 34.93, 25.79 ms. Throughput: 228.21 iter/sec.
Timings for 5376K FFT length (20 cores, 20 workers): 69.55, 77.28, 96.74, 104.01, 102.45, 58.99, 74.29, 102.78, 103.01, 94.24, 102.66, 101.69, 103.75, 71.43, 88.18, 57.02, 63.89, 103.12, 90.30, 103.89 ms. Throughput: 235.63 iter/sec.
Timings for 5760K FFT length (20 cores, 1 worker): 7.09 ms. Throughput: 141.02 iter/sec.
Timings for 5760K FFT length (20 cores, 2 workers): 9.66, 9.61 ms. Throughput: 207.63 iter/sec.
Timings for 5760K FFT length (20 cores, 6 workers): 31.77, 30.04, 21.98, 32.62, 23.01, 27.46 ms. Throughput: 220.78 iter/sec.
Timings for 5760K FFT length (20 cores, 20 workers): 63.67, 107.85, 107.65, 61.85, 106.82, 65.99, 100.42, 108.34, 108.39, 105.96, 106.88, 107.29, 107.29, 107.88, 86.25, 109.04, 107.83, 53.38, 109.05, 56.44 ms. Throughput: 225.73 iter/sec.
Timings for 6144K FFT length (20 cores, 1 worker): 7.70 ms. Throughput: 129.90 iter/sec.
Timings for 6144K FFT length (20 cores, 2 workers): 11.12, 11.12 ms. Throughput: 179.86 iter/sec.
Timings for 6144K FFT length (20 cores, 6 workers): 28.95, 40.99, 26.71, 41.50, 35.84, 22.80 ms. Throughput: 192.23 iter/sec.
Timings for 6144K FFT length (20 cores, 20 workers): 66.36, 125.75, 124.54, 83.76, 123.24, 123.94, 74.74, 124.56, 125.17, 100.65, 130.60, 125.03, 105.78, 123.31, 122.69, 65.99, 124.44, 130.60, 59.97, 110.87 ms. Throughput: 196.41 iter/sec.
Timings for 6400K FFT length (20 cores, 1 worker): 7.74 ms. Throughput: 129.23 iter/sec.
Timings for 6400K FFT length (20 cores, 2 workers): 11.50, 11.42 ms. Throughput: 174.47 iter/sec.
Timings for 6400K FFT length (20 cores, 6 workers): 43.84, 33.20, 25.16, 34.96, 33.42, 28.68 ms. Throughput: 186.05 iter/sec.
Timings for 6400K FFT length (20 cores, 20 workers): 67.83, 59.96, 129.29, 122.52, 133.75, 132.97, 126.52, 126.06, 133.96, 100.43, 127.68, 58.84, 135.45, 133.65, 130.27, 133.77, 135.44, 129.95, 128.17, 58.70 ms. Throughput: 190.33 iter/sec.
Timings for 6720K FFT length (20 cores, 1 worker): 8.31 ms. Throughput: 120.40 iter/sec.
Timings for 6720K FFT length (20 cores, 2 workers): 11.53, 11.37 ms. Throughput: 174.72 iter/sec.
[Wed Nov 29 15:06:26 2017]
Timings for 6720K FFT length (20 cores, 6 workers): 27.68, 43.63, 30.57, 42.24, 32.78, 25.91 ms. Throughput: 184.54 iter/sec.
Timings for 6720K FFT length (20 cores, 20 workers): 129.37, 129.04, 115.29, 130.64, 68.38, 129.43, 119.89, 62.04, 128.76, 127.72, 129.64, 126.78, 127.53, 128.82, 76.40, 61.43, 128.18, 129.49, 101.97, 111.66 ms. Throughput: 189.07 iter/sec.
Timings for 6912K FFT length (20 cores, 1 worker): 8.58 ms. Throughput: 116.49 iter/sec.
Timings for 6912K FFT length (20 cores, 2 workers): 13.05, 12.98 ms. Throughput: 153.71 iter/sec.
Timings for 6912K FFT length (20 cores, 6 workers): 35.86, 37.57, 37.19, 45.06, 46.26, 26.01 ms. Throughput: 163.65 iter/sec.
Timings for 6912K FFT length (20 cores, 20 workers): 155.31, 65.91, 158.20, 158.95, 160.21, 158.13, 160.28, 158.85, 155.31, 66.22, 73.09, 152.40, 150.07, 152.12, 155.50, 123.06, 150.83, 75.56, 116.95, 157.62 ms. Throughput: 163.66 iter/sec.
Timings for 7168K FFT length (20 cores, 1 worker): 8.95 ms. Throughput: 111.73 iter/sec.
Timings for 7168K FFT length (20 cores, 2 workers): 13.20, 13.23 ms. Throughput: 151.34 iter/sec.
Timings for 7168K FFT length (20 cores, 6 workers): 50.22, 34.11, 32.42, 37.41, 37.99, 36.57 ms. Throughput: 160.47 iter/sec.
Timings for 7168K FFT length (20 cores, 20 workers): 69.52, 151.76, 151.72, 151.10, 153.70, 152.53, 151.18, 69.46, 153.64, 152.80, 149.58, 147.73, 144.40, 148.19, 149.92, 149.94, 68.87, 72.03, 140.11, 148.58 ms. Throughput: 164.05 iter/sec.
Timings for 7680K FFT length (20 cores, 1 worker): 9.23 ms. Throughput: 108.37 iter/sec.
Timings for 7680K FFT length (20 cores, 2 workers): 14.34, 14.34 ms. Throughput: 139.47 iter/sec.
Timings for 7680K FFT length (20 cores, 6 workers): 50.29, 31.75, 43.11, 54.94, 33.58, 39.34 ms. Throughput: 147.98 iter/sec.
Timings for 7680K FFT length (20 cores, 20 workers): 167.94, 71.75, 93.29, 109.12, 179.30, 179.36, 176.81, 171.65, 166.21, 176.98, 91.27, 102.03, 179.94, 163.99, 167.54, 168.75, 168.41, 163.89, 152.53, 81.45 ms. Throughput: 149.26 iter/sec.
Timings for 8000K FFT length (20 cores, 1 worker): 9.82 ms. Throughput: 101.84 iter/sec.
Timings for 8000K FFT length (20 cores, 2 workers): 14.04, 13.93 ms. Throughput: 143.02 iter/sec.
Timings for 8000K FFT length (20 cores, 6 workers): 50.13, 49.67, 27.40, 49.86, 37.47, 34.30 ms. Throughput: 152.48 iter/sec.
[Wed Nov 29 15:11:36 2017]
Timings for 8000K FFT length (20 cores, 20 workers): 154.19, 72.59, 158.16, 163.38, 72.88, 155.24, 151.27, 148.87, 163.62, 159.39, 99.64, 155.69, 159.07, 98.74, 158.52, 163.94, 164.93, 164.83, 89.34, 113.56 ms. Throughput: 155.99 iter/sec.
Timings for 8192K FFT length (20 cores, 1 worker): 10.50 ms. Throughput: 95.27 iter/sec.
Timings for 8192K FFT length (20 cores, 2 workers): 15.86, 15.82 ms. Throughput: 126.27 iter/sec.
Timings for 8192K FFT length (20 cores, 6 workers): 46.48, 43.33, 46.01, 60.91, 42.41, 37.49 ms. Throughput: 133.00 iter/sec.
Timings for 8192K FFT length (20 cores, 20 workers): 184.80, 93.81, 182.43, 187.74, 145.93, 113.38, 189.01, 148.27, 187.70, 139.96, 187.41, 182.88, 189.76, 183.01, 187.30, 87.91, 166.80, 135.81, 96.36, 189.81 ms. Throughput: 134.32 iter/sec.
</snip>
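
To help decode the wall of numbers above: the only figure that matters for total production is the "Throughput: ... iter/sec" value at the end of each line, and you want the worker count that maximizes it for the FFT sizes you will actually run. Here is a small Python sketch (an editorial illustration; the file name results.bench.txt is an assumption for wherever the benchmark text was saved) that extracts those values and reports the best worker count per FFT length.

Code:
# Illustrative sketch: pull the "Throughput: X iter/sec" values out of a saved
# copy of the benchmark text above and report, per FFT length, which worker
# count gave the highest total throughput.

import re
from collections import defaultdict

LINE = re.compile(
    r"Timings for (\d+K) FFT length \(\d+ cores, (\d+) workers?\):"
    r".*Throughput: ([\d.]+) iter/sec"
)

best = defaultdict(lambda: (0.0, 0))   # fft -> (throughput, workers)

with open("results.bench.txt") as fh:   # assumed file name
    for line in fh:
        m = LINE.search(line)
        if not m:
            continue
        fft, workers, thru = m.group(1), int(m.group(2)), float(m.group(3))
        if thru > best[fft][0]:
            best[fft] = (thru, workers)

for fft, (thru, workers) in best.items():
    print(f"{fft:>6} FFT: best is {workers:2d} workers at {thru:.2f} iter/sec")

For the listing above, this reports 20 workers as the highest-throughput configuration at every FFT length shown.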

2017-11-30, 00:24   #8
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7159₁₀ Posts

Quote: Originally Posted by daxmick
So, RAM is included in the calculation? That adds to the question then... how much RAM per core should I account for? Or is it RAM per worker? I have up to 128GB of RAM available.

I wasn't able to adjust the workers for the Throughput benchmark. It was 1,2,6,20 cores (currently with 32GB RAM). Unfortunately I don't know how this program works well enough to really read the results (unless it is just the max iter/sec value). Maybe someone can decode/explain it?
RAM is irrelevant. Prime95 will use on the order of 50MB per worker.

Yes, it simply is a case of maximizing the throughput (iter/sec) value, which in your case seems heavily skewed toward one core per worker. I'd try benchmarking the 5-, 10-, and 20-worker cases just to be sure (I previously suggested 6 and 12 because I thought you had 24 cores).

Assuming the 20 worker benchmark maintains the best throughput, the only question remaining is "do you have the patience to wait for 20 workers to plod along at a slow pace before getting any results?". GIMPS is better off with 4 completed results after a week's time rather than 20 abandoned partially completed results in a week's time.
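
For a feel of what "plod along" means in wall-clock time, here is a rough worked example using the 4096K FFT timings from the benchmark above. The exponent (about 77 million, roughly what a 4M FFT handles) and the rounded per-iteration times are assumptions of the sketch; an LL test of exponent p needs about p - 2 iterations.

Code:
# Rough worked example (assumed numbers): wall-clock time for one LL test at
# the 4096K-FFT timings measured above. An LL test of exponent p needs about
# p - 2 squaring iterations; p ~= 77,000,000 is assumed as a typical exponent
# for a 4M FFT.

EXPONENT = 77_000_000
ITERATIONS = EXPONENT - 2
SECONDS_PER_DAY = 86_400

# Approximate ms/iteration read off the 4096K FFT benchmark lines above.
for label, ms_per_iter in [("1 worker on all 20 cores", 5.18),
                           ("each of 20 single-core workers", 79.0)]:
    days = ITERATIONS * ms_per_iter / 1000 / SECONDS_PER_DAY
    print(f"{label}: ~{days:.0f} days per exponent")

# The 20-worker setup still wins on total throughput (20 results arrive
# together), but each individual result takes far longer to appear.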

2017-11-30, 00:51   #9
daxmick
Feb 2014
84₁₀ Posts

Quote: Originally Posted by Prime95
Assuming the 20 worker benchmark maintains the best throughput, the only question remaining is "do you have the patience to wait for 20 workers to plod along at a slow pace before getting any results?". GIMPS is better off with 4 completed results after a week's time rather than 20 abandoned partially completed results in a week's time.
I'm not worried about how many or how long; I'm looking for the best overall performance in the long term. In other words, if it is more efficient to "plod through" 20 concurrent tests over several weeks vs. 4 concurrent tests in just a few days, then I'd do the 20. BUT if I can do multiple rounds of "4 concurrent tests" (in this case more than 5 in series) faster than the 20 concurrent, then I should choose 4 workers, yes?

Just trying to figure out how to read the results output and decide which is best to do.

2017-11-30, 00:54   #10
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1011011110111₂ Posts

Quote: Originally Posted by daxmick
I'm not worried about how many or how long. I'm looking for best overall performance in the long term. In other words, if it is more efficient to "plod through" 20 concurrent tests over several weeks vs. 4 concurrent tests in just a few days then I'd do the 20. BUT if I can do multiple "4 concurrent tests" (in this case more than 5 in series) faster than the 20 concurrent then I should choose to use 4 workers, Yes?

Just trying to figure out how to read the results output and decide which is best to do.
If your only goal is to maximise throughput over the long term then you only need to look at the value "iter/sec" and aim to maximise it.


2017-11-30, 00:56   #11
daxmick
Feb 2014
2²·3·7 Posts

Quote: Originally Posted by retina
If your only goal is to maximise throughput over the long term then you only need to look at the value "iter/sec" and aim to maximise it.
Which, from the above results output, appears to be 20 cores and 6 workers, yes? (Which happens to be the suggested number of workers when I first started the program.)