mersenneforum.org Increasing memory channels, but not RAM, slows mprime.

2021-07-15, 21:29   #34
sdbardwick

Aug 2002
North San Diego County

10101100012 Posts

A quick read of the MAN page for NUMACTL makes me think that

Code:
$ numactl --physcpubind=0 mprime0 -m
attempts to assign mprime to core 0, not socket 0. I'd investigate cpunodebind:

--cpunodebind=nodes, -N nodes
Only execute process on the CPUs of nodes. Note that nodes may consist of multiple CPUs.

Last fiddled with by sdbardwick on 2021-07-15 at 21:40 Reason: italics

2021-07-16, 06:51   #35
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

3·149 Posts

Quote:
 Originally Posted by sdbardwick A quick read of the MAN page for NUMACTL makes me think that Code: $ numactl --physcpubind=0 mprime0 -m attempts to assign mprime to core 0, not socket 0. I'd investigate cpunodebind: --cpunodebind=nodes, -N nodes Only execute process on the CPUs of nodes. Note that nodes may consist of multiple CPUs.

Thank you. I looked in the man page and see that the -H option shows the hardware. I see there are two nodes, and each node has 52 cpus, which I assume is due to the hyperthreading.

Code:
drkirkby@canary:~$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
node 0 size: 192100 MB
node 0 free: 167392 MB
node 1 cpus: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
node 1 size: 193498 MB
node 1 free: 175122 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
drkirkby@canary:~$
Maybe for physical CPU 0 I need to specify the cpus as 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 and 77. I wonder if 0 to 25 correspond to physical cores and 52-77 are due to the hyperthreading.
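
Whether 52-77 really are the hyperthread partners of 0-25 can be checked without timing experiments. The following is a minimal sketch, assuming a Linux kernel that exposes `topology/thread_siblings_list` under `/sys/devices/system/cpu`. The function name and its optional directory argument are illustrative only; the argument exists so the function can be pointed at a copy of the tree.

```shell
#!/bin/sh
# Print each distinct set of hyperthread siblings once, e.g. "0,52".
# Optional argument: directory to scan instead of the real sysfs tree
# (a convenience for this sketch, not a kernel feature).
list_thread_siblings() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/topology/thread_siblings_list; do
        # Skip silently if the file is missing (e.g. unmatched glob).
        if [ -r "$f" ]; then
            cat "$f"
        fi
    done | sort -u | sort -t, -k1,1n
}

list_thread_siblings
```

If the guess in this post is right, the output on this machine would start with lines pairing 0 with 52, 1 with 53, and so on.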

I assume the 192100 MB and 193498 MB correspond to the RAM connected to the nodes. I'm wondering why they are slightly different. I would expect 196,608 MB on each.
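
The size of the gap can be worked out directly. This is a small arithmetic sketch using only the figures above; the helper name `shortfall_mb` is made up for illustration. One plausible explanation, which is an assumption and not confirmed in this thread, is that the per-node size excludes memory reserved by firmware and the kernel, and that reservation need not be the same on both nodes.

```shell
#!/bin/sh
# Shortfall between the expected and the reported size of a NUMA node, in MB.
# The default of 192 GiB per node (196608 MB) is the poster's expectation.
shortfall_mb() {
    reported_mb="$1"
    expected_mb="${2:-$((192 * 1024))}"
    echo $((expected_mb - reported_mb))
}

for reported in 192100 193498; do
    echo "node size ${reported} MB: $(shortfall_mb "$reported") MB below 196608 MB"
done
```

Both nodes come out roughly 2% below 196,608 MB (4508 MB and 3110 MB respectively).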

Thank you for your help - I will look into this more today.
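
For reference, the difference between the two numactl options discussed above can be sketched as follows. `--physcpubind=0` restricts the process to CPU 0 only, while `--cpunodebind=0` allows every CPU of node 0. The `--membind` option is an addition not discussed in the thread; it asks that memory also come from the same node. The helper `node_bound_cmd` is hypothetical and only prints the command instead of running it.

```shell
#!/bin/sh
# Print (do not run) a numactl command that keeps a program on one NUMA node.
# --membind is an assumption added for this sketch, not taken from the thread.
node_bound_cmd() {
    node="$1"
    shift
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# The invocation from the thread, bound to node 0 instead of CPU 0:
node_bound_cmd 0 mprime0 -m
```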

2021-07-16, 09:29   #36
fivemack

(loop (#_fork))
Feb 2006
Cambridge, England

2×3×29×37 Posts

Yes, the normal Linux enumeration is 0 to 2N-1 as physical cores, N in each socket, and 2N to 4N-1 as the logical cores - I determined this rather tediously by running two copies of ECM at a time and seeing which taskset arrangements made them slower.

My 8167M cores have arrived; the machine uses 450W when they're installed and it's running 75 threads of lasieve (versus 250W for 40 threads of lasieve on 2x4114), and runs all six 12000rpm fans flat-out, and nonetheless the temperature sensors are mostly reading around 99C. I guess my first action should be to undo the eight Torx screws and reapply the thermal grease more carefully.

And once I'm confident the cores run reasonably I should probably get another 96GB of RAM so I can run 104 threads of lasieve ... I'm not sure I want to spend a thousand pounds on second-hand RAM to get to nearly-three rather than nearly-two gigabytes per thread, I'm reasonably sure nearly-one gigabyte per thread is not enough.

2021-07-18, 11:41   #37
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

3×149 Posts

Quote:
 Originally Posted by fivemack Yes, the normal Linux enumeration is 0 to 2N-1 as physical cores, N in each socket, and 2N to 4N-1 as the logical cores - I determined this rather tediously by running two copies of ECM at a time and seeing which taskset arrangements made them slower. My 8167M cores have arrived; the machine uses 450W when they're installed and it's running 75 threads of lasieve (versus 250W for 40 threads of lasieve on 2x4114), and runs all six 12000rpm fans flat-out, and nonetheless the temperature sensors are mostly reading around 99C. I guess my first action should be to undo the eight Torx screws and reapply the thermal grease more carefully. And once I'm confident the cores run reasonably I should probably get another 96GB of RAM so I can run 104 threads of lasieve ... I'm not sure I want to spend a thousand pounds on second-hand RAM to get to nearly-three rather than nearly-two gigabytes per thread, I'm reasonably sure nearly-one gigabyte per thread is not enough.
Welcome to the 8167M owners club! I wonder if there are any more than two of us? (One could probably find out from benchmarks submitted.)

I don't know what the fan speeds are on my machine. "sensors" only shows 4 fans, which range from 0 to 2235 rpm, but there are around 10 fans, so "sensors" can't be seeing them all. The machine is very quiet, even when running flat out, but if I run the diagnostics, which test all fans at full speed, it is noisy.

What is lasieve? A Google search could not find it for me.

The "sensors" program shows 30 cores for each CPU, but since there are not 30 cores, I assume some of those temperature sensors must be in places like the L2 and L3 cache. At the moment, the maximum temperature of CPU0 is 75°C and CPU1 85°C. Since the hot air from the first heatsink blows over the second heatsink, the temperature difference is hardly surprising.

It's probably worth having 1 DIMM/channel, even though it does not seem to be benefiting me for GIMPS. Testing 4 exponents at a time with 12 memory channels active is no faster or slower than testing two at a time with 4 memory channels active.

I've tried controlling which cores a process uses with Affinity in mprime's local.txt, as well as with taskset. Based on temperature measurements of the cores, I'm not convinced any cores are more or less active than others. A GUI-based system monitor shows the temperature of 108 CPUs (obviously the hyperthreading is taken into account). The load seems to switch from core to core, so I'm not convinced I am able to control this.
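
Temperatures are an indirect signal. A more direct check of whether an affinity setting took effect is to ask the kernel which CPUs a process is allowed to run on. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the function name and its optional second argument (an alternative proc directory) are illustrative only.

```shell
#!/bin/sh
# Print the CPUs a process may run on, e.g. "0-25,52-77".
# $1: process ID (defaults to the current shell)
# $2: proc directory to read from (a convenience for this sketch)
allowed_cpus() {
    pid="${1:-$$}"
    proc_root="${2:-/proc}"
    status_file="$proc_root/$pid/status"
    [ -r "$status_file" ] || return 1
    awk '/^Cpus_allowed_list:/ { print $2 }' "$status_file"
}

# Example usage, assuming a single running mprime process:
#   allowed_cpus "$(pgrep -o mprime)"
# To see which CPU each thread last ran on:
#   ps -L -o tid,psr,comm -p "$(pgrep -o mprime)"
```

If the allowed list still spans all CPUs, the affinity setting was not applied. If it is restricted but the load still moves around, it is moving only within the allowed set.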

RDIMMs are quite expensive, unfortunately. I have spent far more on RAM for this workstation than for anything else. It eclipses the cost of the CPUs, GPU and the complete workstation. But I do have a genuine business need for a lot of RAM, so I can justify buying it.

Code:
drkirkby@canary:~/gimps$ sensors
coretemp-isa-0001
Package id 1:  +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +84.0°C  (high = +89.0°C, crit = +99.0°C)

dell_smm-virtual-0
fan1:           0 RPM
fan2:        1726 RPM
fan3:         859 RPM

nvme-pci-10200
Composite:    +40.9°C  (low  = -20.1°C, high = +77.8°C)
                       (crit = +81.8°C)
Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)

nouveau-pci-7300
fan1:        2235 RPM
temp1:        +36.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

coretemp-isa-0000
Package id 0:  +75.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +70.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +70.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +67.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +75.0°C  (high = +89.0°C, crit = +99.0°C)

nvme-pci-0100
Composite:    +36.9°C  (low  =  -0.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)
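
The number of core sensors per package in output like the above can be counted rather than read off the labels. The following is a minimal sketch that assumes the coretemp layout shown here ("Package id N:" followed by "Core M:" lines); the function name is made up for illustration.

```shell
#!/bin/sh
# Read `sensors` output on stdin and report, per package, how many
# "Core" lines there are and the highest core label seen.
count_cores_per_package() {
    awk '
        /^Package id/ { pkg = $3; sub(/:/, "", pkg) }
        /^Core [0-9]+:/ {
            idx = $2 + 0          # "29:" converts to 29
            count[pkg]++
            if (!(pkg in top) || idx > top[pkg]) top[pkg] = idx
        }
        END {
            for (p in count)
                printf "package %s: %d Core lines, highest label %d\n", p, count[p], top[p]
        }
    ' | sort
}

# Example usage:
#   sensors | count_cores_per_package
```

Applied to the listing above, this reports 26 Core lines for each package with a highest label of 29, because the labels skip 7, 14, 15 and 23.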


All times are UTC.
