mersenneforum.org  

Old 2021-07-09, 09:26   #1
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

111000000₂ Posts
Increasing memory channels, but not RAM, slows mprime.

Due to a fault on my motherboard, my computer (Dell 7920) would not power on if RAM modules were inserted in certain sockets. The CPUs have 6 memory channels each, but the only way to use 6 DIMMs on the first CPU was to place them in sockets which resulted in only 4 memory channels being used. (There are 12 DIMM sockets per CPU, so I had the flexibility to use the wrong sockets.) The second CPU had no such issues, but the BIOS indicated only 4 of the 12 memory channels were being used, despite there being 12 DIMMs. (I would have expected 4+4=8 or 4+6=10 memory channels, but the BIOS said only 4.) I would expect this to be a non-optimal configuration. With this sub-optimal memory configuration, I ran the benchmarks on mprime and found that 2 workers gave the optimal throughput. With 2 workers, each iteration of a PRP test of a 104 million exponent was taking about 1.5 ms.

The Dell motherboard was changed for an identical model and the fault went away. With the same 12 DIMMs as before, the memory channels have been increased from 4 to 12. Much to my surprise, this increased the time per iteration of mprime 30.6b4 from around 1.5 to 2.0 ms for the same exponents! Essentially the throughput had decreased. Can anyone explain why this might happen?

I re-ran the benchmarks and found the configuration giving optimal throughput had changed from 2 workers to 4 workers. So I increased the number of workers from 2 to 4. Unsurprisingly, reducing the number of cores per worker from 26 to 13 increased the iteration time further. The iteration time is now 3 ms.

Since now 4 exponents are being tested simultaneously, at 3 ms/iteration, this is essentially the same total throughput as testing 2 exponents at 1.5 ms/iteration. So the increase in memory channels from 4 to 12 has not resulted in any change in total throughput, but has resulted in the optimal number of workers changing from 2 to 4.
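
To make the comparison above concrete, here is a minimal sketch of the arithmetic in Python. It assumes total throughput is simply the number of workers divided by the time per iteration, and it uses the approximate timings quoted in this post; the function name is made up for illustration.

```python
def throughput_iters_per_sec(workers: int, ms_per_iter: float) -> float:
    """Aggregate iterations per second summed over all workers."""
    return workers * 1000.0 / ms_per_iter


# Approximate figures quoted in this post (not fresh measurements).
configs = {
    "4 channels, 2 workers @ 1.5 ms": (2, 1.5),
    "12 channels, 2 workers @ 2.0 ms": (2, 2.0),
    "12 channels, 4 workers @ 3.0 ms": (4, 3.0),
}

for label, (workers, ms) in configs.items():
    rate = throughput_iters_per_sec(workers, ms)
    print(f"{label}: {rate:.0f} iter/s")
```

On these numbers the first and last configurations both come out at roughly 1333 iterations per second, while 2 workers at 2.0 ms drops to 1000, which is the 25% loss described above.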

The CPUs are only clocking the RAM at 2400 MHz, not the 2933 MHz that the motherboard is capable of, but that's due to the CPUs I have. I'm assuming this means that the performance of the computer is not set by memory bandwidth, but by CPU speed (only 2.0 GHz). But I'm still puzzled why changing the memory channels from 4 to 12 resulted in decreased throughput until I increased the number of workers from 2 to 4.
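
For scale, the theoretical peak bandwidth for each channel count can be sketched as below. This assumes the standard 64-bit (8-byte) DDR4 data bus per channel and treats the "2400 MHz" figure as 2400 MT/s. It gives an upper bound only, says nothing about what mprime actually achieves, and the 12-channel total is split across two sockets (6 local channels each).

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data bus


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_sec * 1e6 * BYTES_PER_TRANSFER / 1e9


for channels in (4, 6, 12):
    gb_s = peak_bandwidth_gb_s(channels, 2400)
    print(f"{channels:2d} channels @ 2400 MT/s: {gb_s:.1f} GB/s peak")
```

So going from 4 to 12 channels triples the theoretical peak (76.8 GB/s to 230.4 GB/s), which is why the slower 2-worker result was unexpected.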

Any thoughts?

Last fiddled with by drkirkby on 2021-07-09 at 09:36
Old 2021-07-09, 09:48   #2
paulunderwood
 
 
Sep 2002
Database er0rr

5·787 Posts

I doubt your Dell runs "12 channel" -- most likely it will be a quad channel box. Check your motherboard manual for optimal channel settings. (If you find a manual online please provide a link to it.)

Last fiddled with by paulunderwood on 2021-07-09 at 09:53
Old 2021-07-09, 10:21   #3
axn
 
 
Jun 2003

1446₁₆ Posts

How much L3 do your CPUs have? The more L3 you have, the less the impact of poor memory bandwidth.

Also, previously, was the memory running at 2400 only or was it faster?

Finally, if truly 12 ram channels are active, you might look at other configurations with more workers.
Old 2021-07-09, 10:22   #4
ATH
Einyen
 
 
Dec 2003
Denmark

2×3×13×41 Posts

I have only heard about up to 8-channel memory so far on the newest servers.
Searching for Dell 7920:
https://i.dell.com/sites/csdocuments...Spec-Sheet.pdf
Quote:
Memory Options: Six channel memory up to 1.5TB 2666MHz DDR4 ECC memory with dual CPUs, up to 3TB with select CPU SKUs. Up to 768GB of 2933MHz DDR4 ECC memory. 24 DIMM Slots (12 DIMMs per CPU).
Note: memory speed is dependent on specific Intel Xeon Scalable Processor CPU installed.
To get 6-channel memory, I think you need to install 6 DIMMs in every 2nd slot or all 12 DIMMs, and it should be the exact same type of RAM.


Try CPU-Z: https://www.cpuid.com/
You do not have to install anything, there is a zip file with CPU-Z:
https://download.cpuid.com/cpu-z/cpu-z_1.96-en.zip

Just run the "cpuz_x64.exe" and on the Memory tab it will show you how many memory channels you use, mine is running 4-channel memory (quad):
[Attached image: Memory.jpg]

Last fiddled with by ATH on 2021-07-09 at 10:23
Old 2021-07-09, 10:35   #5
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶×7 Posts

Quote:
Originally Posted by paulunderwood View Post
I doubt your Dell runs "12 channel" -- most likely it will be a quad channel box. Check your motherboard manual for optimal channel settings. (If you find a manual online please provide a link to it.)
Attached is a photograph I took of the BIOS, showing 12 memory channels in use. The particular CPU is not generally available, so there's no information on the Intel website, but other sources indicate the CPU has 6 memory channels.
https://www.cpu-world.com/CPUs/Xeon/...n%208167M.html

Here's the user manual.
https://dl.dell.com/topicspdf/precis...nual_en-us.pdf
Page 98 says up to 6 memory channels per CPU. Someone kindly sent me a technical reference on this workstation, which Dell don't make public. I can't send it at the moment, but I will send a link later. But I think the photograph shows it is using 12 memory channels.

Note there's a rackmount version of this, so if you Google it, ignore the rackmount version, although I think they are pretty similar. The rackmount version has dual PSUs and enhanced security features, whereas the tower does not.
[Attached image: unnamed.jpg]

Last fiddled with by drkirkby on 2021-07-09 at 10:36
Old 2021-07-09, 10:49   #6
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

448₁₀ Posts

Quote:
Originally Posted by axn View Post
How much L3 do your CPUs have? The more L3 you have, the less the impact of poor memory bandwidth.

Also, previously, was the memory running at 2400 only or was it faster?

Finally, if truly 12 ram channels are active, you might look at other configurations with more workers.
There's 35.75 MB of L3 cache per CPU. The 2400 MHz is a limitation of the CPU - other CPUs in the Xeon Gold or Platinum range run the RAM up to 2933 MHz, but they are quite expensive CPUs, whereas these CPUs are quite cheap. I've benchmarked more workers (I tried 1, 2, 3 ... 52), but 4 workers gives optimal throughput.
Old 2021-07-09, 10:55   #7
axn
 
 
Jun 2003

2·3·5·173 Posts

Quote:
Originally Posted by drkirkby View Post
There's 35.75 MB of L3 cache per CPU. The 2400 MHz is a limitation of the CPU - other CPUs in the Xeon Gold or Platinum range run the RAM up to 2933 MHz, but they are quite expensive CPUs, whereas these CPUs are quite cheap. I've benchmarked more workers (I tried 1, 2, 3 ... 52), but 4 workers gives optimal throughput.
I'm interested in (3 workers x 8 threads) x 2 config (yes, two cores will be idle in each CPU). How does it compare to the (2x13)x2?
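
As a quick sanity check of the two layouts being compared, here is a small sketch of the core accounting. It assumes 26 cores per CPU and 2 CPUs, as implied by the 52-core figure earlier in the thread; the function name is invented for illustration and is not an mprime setting.

```python
CORES_PER_CPU = 26  # assumed from the thread: 52 cores across 2 CPUs


def layout(workers_per_cpu: int, threads_per_worker: int, cpus: int = 2) -> dict:
    """Summarise how many cores a (workers x threads) x cpus layout uses."""
    used_per_cpu = workers_per_cpu * threads_per_worker
    if used_per_cpu > CORES_PER_CPU:
        raise ValueError("layout needs more cores than one CPU has")
    return {
        "workers": workers_per_cpu * cpus,
        "cores_used": used_per_cpu * cpus,
        "idle_per_cpu": CORES_PER_CPU - used_per_cpu,
    }


print("(3x8)x2 :", layout(3, 8))
print("(2x13)x2:", layout(2, 13))
```

The (3x8)x2 layout runs 6 workers on 48 cores and leaves 2 cores idle per CPU, whereas (2x13)x2 runs 4 workers on all 52 cores. Which one wins on throughput still has to be benchmarked.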
Old 2021-07-09, 10:59   #8
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶·7 Posts

Quote:
Originally Posted by ATH View Post
I have only heard about up to 8-channel memory so far on the newest servers.
Searching for Dell 7920:
https://i.dell.com/sites/csdocuments...Spec-Sheet.pdf

To get 6-channel memory, I think you need to install 6 DIMMs in every 2nd slot or all 12 DIMMs, and it should be the exact same type of RAM.

Try CPU-Z: https://www.cpuid.com/
You do not have to install anything, there is a zip file with CPU-Z:
https://download.cpuid.com/cpu-z/cpu-z_1.96-en.zip

Just run the "cpuz_x64.exe" and on the Memory tab it will show you how many memory channels you use, mine is running 4-channel memory (quad):
There are 12 of the 24 sockets occupied. One DIMM is Dell, the other 11 Kingston. All DIMMs are the same size. Perhaps I should swap the Dell DIMM for a Kingston one, but there are clearly 12 channels in use, as the photograph from the BIOS shows. Even the Kingston DIMMs, although cheaper than Dell, are not exactly cheap. If there were thought to be a benefit from changing it I would do so, but I doubt there is. But maybe others feel otherwise - I'm open to suggestions.

Edit - I will try those utilities later, but I need to reboot into Windows, and I have some real work to do just now.

Last fiddled with by drkirkby on 2021-07-09 at 11:06
Old 2021-07-09, 11:01   #9
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶×7 Posts

Quote:
Originally Posted by axn View Post
I'm interested in (3 workers x 8 threads) x 2 config (yes, two cores will be idle in each CPU). How does it compare to the (2x13)x2?
I'll test this out later - I have to do some real work now, that pays the bills.
Old 2021-07-09, 12:57   #10
paulunderwood
 
 
Sep 2002
Database er0rr

5×787 Posts

It has two hexa-channel (six-channel) chips. It looks like you have installed the DIMMs correctly. You can run mprime/Prime95's benchmarking to automatically get the maximum throughput based on the number of workers and the current wavefront FFT sizes.

Last fiddled with by paulunderwood on 2021-07-09 at 13:01
Old 2021-07-09, 13:46   #11
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×1,481 Posts

Quote:
Originally Posted by drkirkby View Post
I will try those utilities later, but I need to reboot into Windows, and I have some real work to do just now.
Surely Linux has the capability tucked away somewhere. (Quick DuckDuckGo search later...)
dmidecode appears to provide the necessary info, at least indirectly.
"Bank locator: CHAN A DIMM 0" https://www.cyberciti.biz/faq/check-ram-speed-linux/

Last fiddled with by kriesel on 2021-07-09 at 13:47
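
To go with the dmidecode suggestion in this post, here is a rough Python sketch that lists the populated DIMM slots from the output of `dmidecode --type memory` (which normally needs root). The sample text and its locator names are made up for illustration, field names can vary between dmidecode versions and vendors, and the output lists slots rather than channels, so the channel count still has to be inferred from the locator strings and the motherboard manual.

```python
import re
import subprocess


def read_dmidecode() -> str:
    """Run dmidecode for memory records (usually requires root)."""
    result = subprocess.run(
        ["dmidecode", "--type", "memory"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def populated_dimms(dmidecode_text: str) -> list[dict[str, str]]:
    """Return the key/value fields of every populated 'Memory Device' record."""
    dimms = []
    # Records are separated by blank lines.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip array summaries and other record types
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        size = fields.get("Size", "")
        if size and not size.startswith("No Module"):
            dimms.append(fields)
    return dimms


# Hypothetical sample, shaped like dmidecode output (not from the Dell 7920).
SAMPLE = """\
Handle 0x1100, DMI type 17, 40 bytes
Memory Device
\tSize: 16384 MB
\tLocator: DIMM1_CPU1
\tBank Locator: Not Specified
\tSpeed: 2400 MT/s

Handle 0x1101, DMI type 17, 40 bytes
Memory Device
\tSize: No Module Installed
\tLocator: DIMM2_CPU1
"""

for dimm in populated_dimms(SAMPLE):
    print(dimm.get("Locator"), dimm.get("Size"), dimm.get("Speed"))
```

On a real machine, pass `read_dmidecode()` instead of `SAMPLE` and compare the populated locators with the slot-to-channel map in the manual.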
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.