mersenneforum.org  

Old 2021-04-11, 18:05   #1
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

C0₁₆ Posts
Running both CPUs on dual CPU system is slower than using one CPU

I have a Dell 7920 with dual Intel Xeon 8167M CPUs (2.0 GHz, 24-core). Running one worker with 26 cores results in an iteration time of around 2.5 ms on a 110 million exponent. However, if I try to use the other 26 cores as well, the time increases to about 8 ms or so. I've tried running two completely independent copies of mprime in different directories. That still screws things up. I have tried forcing one copy of mprime to use a particular CPU, and the memory associated with that CPU, but it does not help.



The best throughput seems to be to run multiple workers, with the number of cores not exceeding 26.



There is something wrong with this Dell 7920. I believe there's a fault on the motherboard, but I can't get Dell to look at it, despite it being under warranty. The fault only shows up when the 4th DIMM slot on CPU0 is occupied. But I have 3rd party RAM. I only have two pieces of Dell RAM, so I am stuck. The memory configuration is:



CPU0 4 x 32 GB
CPU1 6 x 32 GB.



I'm guessing the memory problem might be why the two CPUs work slowly together, but any other suggestions that might improve things would be welcome.



Dave
Old 2021-04-11, 18:55   #2
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11773₈ Posts

https://ark.intel.com/content/www/us...-2-10-ghz.html 24 cores, hyperthreading won't help with P-1 or PRP or LL throughput. Cache is cpu-package-local. Communication between packages is slower than within a cpu package.
It's common knowledge to run at least as many prime95 / mprime workers as cpu packages, for performance.
I'd expect a dual cpu system with 24 cores per cpu to be max throughput at all cores running, no use of hyperthreading, and the number of workers optimal to be a function of fft length. See the benchmarks attached at https://www.mersenneforum.org/showpo...18&postcount=4 https://www.mersenneforum.org/showpo...19&postcount=5 https://www.mersenneforum.org/showpo...4&postcount=11
Do not overload the CPU cores: the system has 24x2, not the 26+26 you posted.
Madpoo has tested dual 14-core and found a fastest iteration timing around 20 cores there in one worker; that means 6 cores communicating with the rest over the connection between cpu sockets. Fastest iteration time, most energy-efficient, and maximum throughput for a system are different conditions.

Last fiddled with by kriesel on 2021-04-11 at 19:14
Old 2021-04-11, 18:58   #3
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2×1,009 Posts

This may or may not work, but I would set up a virtual machine and run the program in there. I assume you will be able to pool all the cores into a single virtual CPU that way.
Others might correct me on that.

ETA Never mind then, this was a cross-post.

ETA II


ETA III On my Ryzen 9 system with 16 cores and 32 hyper-threads the fastest total performance seems to be at just above 16 threads. Say, 19 threads is marginally faster than 29 threads or 16 threads, but the CPU runs significantly hotter than at 29 threads. As a result I prefer to run at a higher number of threads to keep the CPU at about 58° C. Total performance is not significantly reduced in my observation.
The caveats are that I don't run GIMPS and that I run everything on virtual machines.

Last fiddled with by a1call on 2021-04-11 at 19:34
Old 2021-04-11, 19:41   #4
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶·3 Posts

Quote:
Originally Posted by kriesel View Post
https://ark.intel.com/content/www/us...-2-10-ghz.html 24 cores, hyperthreading won't help with P-1 or PRP or LL throughput.
Hyperthreading was not being used.

Quote:
Originally Posted by kriesel View Post
Do not overload the CPU cores: the system has 24x2, not the 26+26 you posted.
Madpoo has tested dual 14-core and found a fastest iteration timing around 20 cores there in one worker; that means 6 cores communicating with the rest over the connection between cpu sockets. Fastest iteration time, most energy-efficient, and maximum throughput for a system are different conditions.
The 24 was a typo. The CPUs have 26 cores each.

https://www.cpubenchmark.net/cpu.php....00GHz&id=3389


This is a "secret" OEM CPU for which Intel will release no information. It might have some unusual characteristics. I believe some of the CPUs in this range have two links between them, and others three. I suspect this has two, but I don't know.

I really need to get the motherboard sorted out, but Dell charge a fortune for RAM. The cheapest way I could reproduce the problem with Dell RAM would be to buy two new 8 GB RAM modules. These are over £250 (around $300 USD) in the UK. Absolutely crazy for 8 GB modules. Dell support is absolutely ****.

Last fiddled with by drkirkby on 2021-04-11 at 19:42
Old 2021-04-11, 20:33   #5
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2·1,009 Posts

Could throttling due to excessive heat be the reason for the significantly increased time per iteration?
Do the CPUs maintain their temperature when all cores are utilized?

Last fiddled with by a1call on 2021-04-11 at 20:34
Old 2021-04-18, 18:19   #6
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶·3 Posts

Quote:
Originally Posted by a1call View Post
Could throttling due to excessive heat be the reason for the significantly increased time per iteration?
Do the CPUs maintain their temperature when all cores are utilized?
I have not checked the temperature, but I would doubt that to be an issue. The Dell 7920 is sold by Dell with CPUs up to 205 W, but the machines are often on eBay with 250 W CPUs. The CPUs I have are only 150 W. The machine is nowhere near fully loaded.

I seem to have sorted out an optimal strategy, at least for exponents around 110 million:

1) Run one copy of mprime.
2) Run two workers.

Each worker takes about two days to complete a PRP test of an exponent around 110 million, so I should be able to do about 1 per day on average.

I've currently got one worker doing 110904847 and the other doing 332646233, so very different sizes of FFTs. I have not tried benchmarking to see if there's any advantage in working any differently. Once the 332646233 is finished, it will be the last large exponent I attempt for a very long time. As exponents for testing get larger, it may be that a different strategy will work better.

I thought running two copies of mprime, and forcing one to use one CPU and the RAM from that CPU was logical. But it actually works quite poorly.
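For readers who want to try the same experiment, here is a minimal sketch of restricting a process to the logical CPUs of one NUMA node on Linux. It is not what was actually run in this thread; the helper names are hypothetical. It only sets CPU affinity: it does not bind memory to the node, which would need a tool such as numactl with its --membind option.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-25,52-77' into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node: int) -> set[int]:
    """Read the logical CPUs the kernel assigns to one NUMA node from sysfs."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def pin_to_node(node: int) -> None:
    """Restrict the calling process to one node's CPUs (Linux only).

    Child processes started afterwards inherit the same affinity mask.
    """
    os.sched_setaffinity(0, cpus_of_node(node))
```

Calling pin_to_node(0) before launching a copy of the program from the same script would keep that copy on the first package's cores, assuming node 0 corresponds to the first CPU package on the machine in question.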

Dave

Last fiddled with by drkirkby on 2021-04-18 at 18:20
Old 2021-04-20, 02:03   #7
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

5×1,889 Posts

Just to point out that "the amount of power it sucks" and "how hot it gets" are, in theory, and 99.9999 percent of the time in practice too, unrelated. I can put "one volt/one ampere" through a "one ohm/one watt" resistor and make it 2000°C "one red hot son of a pepper" in few minutes, with no fan blowing on it. How hot your CPU gets has nothing to do with the power it consumes, but with metal blocks, thermal paste, fans, air, water, radiators, thermal paste (did I say that?), dust clogs, dead cockroaches under the chipset carcass, etc., i.e. how able you are to remove the heat it produces, fast and efficient. Do you think the 5000 MEGA-Watts power generators at the power factories get so much hotter than your computer?
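A rough way to see the role of heat removal is the usual steady-state model, in which the temperature rise above ambient is the dissipated power multiplied by the thermal resistance of the cooling path. The sketch below uses the 150 W figure mentioned earlier in the thread; the ambient temperature and both thermal resistance values are made-up numbers for illustration, not measurements of this machine.

```python
def steady_state_temp_c(power_w: float,
                        thermal_resistance_k_per_w: float,
                        ambient_c: float = 25.0) -> float:
    """Steady-state temperature: ambient plus power times thermal resistance."""
    return ambient_c + power_w * thermal_resistance_k_per_w


if __name__ == "__main__":
    power = 150.0  # watts, the CPU rating quoted in the thread
    # Hypothetical thermal resistances (K/W) for a good and a poor cooling path.
    for label, r_theta in (("good cooling path", 0.2), ("poor cooling path", 0.4)):
        temp = steady_state_temp_c(power, r_theta)
        print(f"{label}: about {temp:.0f} °C")
```

With the power held fixed, doubling the thermal resistance doubles the temperature rise, so two identical CPUs under the same load can sit at quite different temperatures.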

Last fiddled with by LaurV on 2021-04-20 at 02:07
Old 2021-04-20, 13:06   #8
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶×3 Posts

Quote:
Originally Posted by LaurV View Post
Just to point out that "the amount of power it sucks" and "how hot it gets" are, in theory, and 99.9999 percent of the time in practice too, unrelated. I can put "one volt/one ampere" through a "one ohm/one watt" resistor and make it 2000°C "one red hot son of a pepper" in few minutes, with no fan blowing on it. How hot your CPU gets has nothing to do with the power it consumes, but with metal blocks, thermal paste, fans, air, water, radiators, thermal paste (did I say that?), dust clogs, dead cockroaches under the chipset carcass, etc., i.e. how able you are to remove the heat it produces, fast and efficient. Do you think the 5000 MEGA-Watts power generators at the power factories get so much hotter than your computer?
I will see if I can find a way of checking the temperature - I don't know of a Linux tool for this offhand, but I'm sure I will be able to find something.

I accept that fans can get blocked but there are several things that make me think it's not a thermal problem.

a) It's a Dell 7920 workstation in a very large case, that's designed to take a lot of parts - about 10 disks, 3 TB RAM, 2 CPUs, 24 RDIMMS, loads of PCI slots. A fully configured one of these is over $100,000. Mine is virtually empty.

b) There are around 10 fans, which speed up and get very noisy if the machine is pushed. I have only heard that when running diagnostics. It is not making much noise.

c) It's not overclocked.

FWIW, here's a video about it. There's also a rackmount version which has dual power supplies.

https://www.youtube.com/watch?v=jP65i_Iqml8


Dave
Old 2021-04-20, 13:30   #9
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

5×1,913 Posts

Quote:
Originally Posted by drkirkby View Post
b) There are around 10 fans, which speed up and get very noisy if the machine is pushed. I have only heard that when running diagnostics. It is not making much noise.
The CPU or GPU coolers might be more of an issue than the case fans.
Old 2021-04-20, 13:47   #10
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶·3 Posts

Quote:
Originally Posted by Uncwilly View Post
The CPU or GPU coolers might be more of an issue than the case fans.
Yes, something does seem to be amiss. I installed lm-sensors a minute ago, and see the 2nd CPU is significantly hotter than the first - see results at the bottom of the post. Although both are within acceptable limits, the 2nd CPU is around 30 deg C hotter than the first. However, there's something amiss with the program, as it's showing 29 cores for each CPU, but the CPUs only have 26 cores, not 29. Perhaps the other 3 cores are not really cores, but sensors in other parts of the CPU, such as the L2 and L3 cache.

There is something a bit weird about the heatsinks in this machine, as the arrows on them both show the airflow towards the centre of the chassis. However, my attempts to get from Dell the correct part number of the heatsink have failed.

a) They gave me a part number, but they did not have any. When I looked on eBay for the part, I saw they were all marked "CPU0" and not "CPU1" as the second one should be.

b) I found another part number on eBay, which looked as though it was right.

c) I went back to Dell. They said they did not have any of that part either, but that the two parts were equivalent.

However, I do note that the direction of airflow shown on the 2nd CPU appears to be from back to front, rather than front to back. But the heatsinks don't have fans on them - they are purely passive.



Code:
[dkirkby@jackdaw ~]$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +56.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +56.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +56.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +55.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +53.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +52.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +55.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +55.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +53.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +52.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +55.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +56.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +52.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +50.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +53.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +54.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +53.0°C  (high = +89.0°C, crit = +99.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +76.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +75.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +75.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +75.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +74.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +74.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +77.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +76.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +75.0°C  (high = +89.0°C, crit = +99.0°C)
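Counting the "Core" lines in the listing above gives 26 entries per package; the labels reach 29 only because IDs 7, 14, 15 and 23 do not appear. A likely explanation, offered as an assumption rather than something confirmed for this CPU, is that the core IDs are simply not contiguous. The sketch below counts and summarizes the per-core readings from text in this format; the function name and the returned structure are illustrative.

```python
import re
from statistics import mean

# A chip header line such as "coretemp-isa-0000" (a stray leading "#" is tolerated).
CHIP_RE = re.compile(r"^#?(coretemp-\S+)\s*$")
# A per-core line such as "Core 12:  +53.0°C  (high = ...)".
CORE_RE = re.compile(r"^Core\s+(\d+):\s+\+?(-?\d+(?:\.\d+)?)°C")


def summarize(sensors_text: str) -> dict[str, dict[str, float]]:
    """Return core count, min, max and mean temperature for each coretemp chip."""
    temps_by_chip: dict[str, list[float]] = {}
    current = None
    for raw_line in sensors_text.splitlines():
        line = raw_line.strip()
        chip = CHIP_RE.match(line)
        if chip:
            current = chip.group(1)
            temps_by_chip[current] = []
            continue
        core = CORE_RE.match(line)
        if core and current is not None:
            # "Package id" lines do not match CORE_RE, so only real core lines count.
            temps_by_chip[current].append(float(core.group(2)))
    return {
        chip: {"cores": len(t), "min": min(t), "max": max(t), "mean": mean(t)}
        for chip, t in temps_by_chip.items()
        if t
    }
```

The input could come from subprocess.run(["sensors"], capture_output=True, text=True).stdout, or from the listing above saved to a text file.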
Old 2021-04-20, 17:21   #11
Xyzzy
 
"Mike"
Aug 2002

17743₈ Posts

As root, did you run sensors-detect --auto?
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.