mersenneforum.org  

Old 2021-07-15, 21:29   #34
sdbardwick
 
 
Aug 2002
North San Diego County

1010110001₂ Posts

A quick read of the MAN page for NUMACTL makes me think that
Code:
$ numactl --physcpubind=0 mprime0 -m
attempts to assign mprime to core 0, not socket 0. I'd investigate cpunodebind:

--cpunodebind=nodes, -N nodes
Only execute process on the CPUs of nodes. Note that nodes may consist of multiple CPUs.
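To make the difference between the two options concrete, here is a minimal sketch that builds both command lines. The helper name `numactl_command` is hypothetical, and adding `--membind` alongside `--cpunodebind` is an assumption on my part (it is a real numactl option, but nothing in this thread shows that it helps mprime).

```python
import shlex


def numactl_command(program_args, node=None, cpus=None, bind_memory=False):
    """Build a numactl command line (as a list) for one binding style.

    node -> bind to every CPU of a NUMA node (--cpunodebind)
    cpus -> bind to specific CPU numbers     (--physcpubind)
    """
    if (node is None) == (cpus is None):
        raise ValueError("give exactly one of node= or cpus=")
    cmd = ["numactl"]
    if node is not None:
        cmd.append(f"--cpunodebind={node}")
        if bind_memory:
            # Assumption: also keep allocations on the same node.
            cmd.append(f"--membind={node}")
    else:
        cmd.append(f"--physcpubind={cpus}")
    return cmd + list(program_args)


# The command from the earlier post: one CPU only.
print(shlex.join(numactl_command(["mprime0", "-m"], cpus="0")))
# The node-wide alternative suggested here.
print(shlex.join(numactl_command(["mprime0", "-m"], node=0, bind_memory=True)))
```

The first line reproduces the original command, which restricts mprime to CPU 0; the second restricts it to all CPUs of node 0.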

Last fiddled with by sdbardwick on 2021-07-15 at 21:40 Reason: italics
Old 2021-07-16, 06:51   #35
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

3·149 Posts

Quote:
Originally Posted by sdbardwick
A quick read of the MAN page for NUMACTL makes me think that
Code:
$ numactl --physcpubind=0 mprime0 -m
attempts to assign mprime to core 0, not socket 0. I'd investigate cpunodebind:

--cpunodebind=nodes, -N nodes
Only execute process on the CPUs of nodes. Note that nodes may consist of multiple CPUs.

Thank you. I looked in the man page and see that the -H option shows the hardware. There are two nodes, and each node has 52 CPUs, which I assume is due to hyperthreading.

Code:
drkirkby@canary:~$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
node 0 size: 192100 MB
node 0 free: 167392 MB
node 1 cpus: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
node 1 size: 193498 MB
node 1 free: 175122 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
drkirkby@canary:~$
Maybe for physical CPU 0 I need to specify the CPUs as 0-25 and 52-77, i.e. the full list shown for node 0. I wonder if 0 to 25 correspond to physical cores and 52 to 77 are due to the hyperthreading.
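Rather than typing the list by hand, the per-node CPU lists can be pulled out of the `numactl -H` text and compressed into the range form that `--physcpubind` accepts. This is only a sketch: the function names are made up for illustration, and the sample text is rebuilt from the listing above rather than captured live.

```python
import re


def parse_node_cpus(numactl_h_output):
    """Map node number -> sorted CPU numbers from `numactl -H` text."""
    nodes = {}
    for line in numactl_h_output.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line)
        if m:
            nodes[int(m.group(1))] = sorted(int(c) for c in m.group(2).split())
    return nodes


def to_ranges(cpus):
    """Compress sorted CPU numbers into a string such as '0-25,52-77'."""
    spans = []
    for c in cpus:
        if spans and c == spans[-1][1] + 1:
            spans[-1][1] = c          # extend the current run
        else:
            spans.append([c, c])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


# Sample rebuilt from the listing above (same CPU numbers per node).
node0 = " ".join(map(str, [*range(0, 26), *range(52, 78)]))
node1 = " ".join(map(str, [*range(26, 52), *range(78, 104)]))
sample = "\n".join([
    "available: 2 nodes (0-1)",
    f"node 0 cpus: {node0}",
    "node 0 size: 192100 MB",
    f"node 1 cpus: {node1}",
    "node 1 size: 193498 MB",
])

for node, cpus in parse_node_cpus(sample).items():
    print(f"node {node}: --physcpubind={to_ranges(cpus)}")
```

For this machine that gives 0-25,52-77 for node 0 and 26-51,78-103 for node 1, which should select the same CPUs as `--cpunodebind=0` and `--cpunodebind=1` respectively.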

I assume the 192100 MB and 193498 MB correspond to the RAM connected to each node. I'm wondering why they are slightly different; I would expect 196,608 MB on each.
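For scale, here is the arithmetic behind that expectation as a small sketch. It assumes 192 GiB of DIMMs per socket (192 × 1024 = 196,608 MB) and uses the sizes from the listing above. One plausible cause of the gap, which nothing in this thread confirms, is memory reserved by the firmware or kernel before it is counted against a node.

```python
EXPECTED_MB = 192 * 1024  # 196,608 MB, assuming 192 GiB fitted per socket

# Sizes reported by `numactl -H` in the listing above.
reported_mb = {0: 192100, 1: 193498}


def shortfall(reported, expected=EXPECTED_MB):
    """Return (missing MB, missing percent rounded to 2 places)."""
    missing = expected - reported
    return missing, round(100 * missing / expected, 2)


for node, size in reported_mb.items():
    missing, pct = shortfall(size)
    print(f"node {node}: {missing} MB below {EXPECTED_MB} MB ({pct}%)")
```

Both nodes come out within a few percent of the expected figure, with node 0 showing the larger gap.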

Thank you for your help - I will look into this more today.
Old 2021-07-16, 09:29   #36
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2×3×29×37 Posts

Yes, the normal Linux enumeration is 0 to 2N-1 as physical cores, N in each socket, and 2N to 4N-1 as the logical cores - I determined this rather tediously by running two copies of ECM at a time and seeing which taskset arrangements made them slower.
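As a sketch of that numbering (two sockets, N cores each, hyperthread siblings offset by 2N), the mapping can be written down directly. The function names are illustrative only, and the layout is the one described above, not something every machine is guaranteed to follow.

```python
def locate(cpu, cores_per_socket, sockets=2):
    """Return (socket, core within socket, thread) for a Linux CPU number,
    assuming physical cores come first and sibling threads second."""
    physical = cores_per_socket * sockets
    if not 0 <= cpu < 2 * physical:
        raise ValueError(f"cpu {cpu} out of range")
    thread, core = divmod(cpu, physical)
    socket, core_in_socket = divmod(core, cores_per_socket)
    return socket, core_in_socket, thread


def sibling(cpu, cores_per_socket, sockets=2):
    """Return the other hardware thread on the same physical core."""
    physical = cores_per_socket * sockets
    if not 0 <= cpu < 2 * physical:
        raise ValueError(f"cpu {cpu} out of range")
    return cpu + physical if cpu < physical else cpu - physical


# With N = 26, as on the dual 26-core box earlier in the thread:
for cpu in (0, 25, 26, 52, 78, 103):
    print(cpu, locate(cpu, 26), "sibling:", sibling(cpu, 26))
```

On a live system the pairing can be checked without timing experiments by reading /sys/devices/system/cpu/cpu0/topology/thread_siblings_list (and likewise for the other CPUs).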

My 8167M cores have arrived; the machine uses 450W when they're installed and it's running 75 threads of lasieve (versus 250W for 40 threads of lasieve on 2x4114), and runs all six 12000rpm fans flat-out, and nonetheless the temperature sensors are mostly reading around 99C. I guess my first action should be to undo the eight Torx screws and reapply the thermal grease more carefully. And once I'm confident the cores run reasonably I should probably get another 96GB of RAM so I can run 104 threads of lasieve ... I'm not sure I want to spend a thousand pounds on second-hand RAM to get to nearly-three rather than nearly-two gigabytes per thread, I'm reasonably sure nearly-one gigabyte per thread is not enough.
fivemack is offline   Reply With Quote
Old 2021-07-18, 11:41   #37
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

3×149 Posts

Quote:
Originally Posted by fivemack
Yes, the normal Linux enumeration is 0 to 2N-1 as physical cores, N in each socket, and 2N to 4N-1 as the logical cores - I determined this rather tediously by running two copies of ECM at a time and seeing which taskset arrangements made them slower.

My 8167M cores have arrived; the machine uses 450W when they're installed and it's running 75 threads of lasieve (versus 250W for 40 threads of lasieve on 2x4114), and runs all six 12000rpm fans flat-out, and nonetheless the temperature sensors are mostly reading around 99C. I guess my first action should be to undo the eight Torx screws and reapply the thermal grease more carefully. And once I'm confident the cores run reasonably I should probably get another 96GB of RAM so I can run 104 threads of lasieve ... I'm not sure I want to spend a thousand pounds on second-hand RAM to get to nearly-three rather than nearly-two gigabytes per thread, I'm reasonably sure nearly-one gigabyte per thread is not enough.
Welcome to the 8167M owners club! I wonder if there are any more than two of us? (One could probably find out from the benchmarks submitted.)

I don't know what the fan speeds are on my machine. "sensors" only shows 4 fans, which range from 0 to 2235 rpm, but there are around 10 fans, so "sensors" can't be seeing them all. The machine is very quiet, even when running flat out, but if I run the diagnostics, which test all fans at full speed, it is noisy.

What is lasieve? A Google search could not find it for me.

The "sensors" program shows 30 cores for each CPU, but since there are not 30 cores, I assume some of those temperature sensors must be in places like the L2 and L3 cache. At the moment, the maximum temperature of CPU0 is 75°C and CPU1 85°C. Since the hot air from the first heatsink blows over the second heatsink, the temperature difference is hardly surprising.

It's probably worth having 1 DIMM/channel, even though it does not seem to be benefiting me for GIMPS. Testing 4 exponents at a time with 12 memory channels active is no faster or slower than testing two at a time with 4 memory channels active.

I've tried controlling which cores a process uses with Affinity in mprime's local.txt, as well as with taskset. Based on temperature measurements of the cores, I'm not convinced any cores are more or less active than others. A GUI-based system monitor shows the temperature of 108 CPUs (obviously the hyperthreading is taken into account). The load seems to switch from core to core, so I'm not convinced I am able to control this.
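For the taskset route, here is a sketch of how the CPUs could be split into one list per worker so that no worker straddles the two sockets. It assumes the numbering discussed earlier in the thread (physical cores 0-51, siblings 52-103) and a worker count that divides the core count evenly. The function name is hypothetical, and this says nothing about how mprime's own Affinity setting is interpreted.

```python
def _span(first, last):
    """Format an inclusive CPU range for `taskset -c`."""
    return str(first) if first == last else f"{first}-{last}"


def worker_cpu_lists(workers, cores_per_socket=26, sockets=2, use_siblings=True):
    """Return one `taskset -c` CPU list per worker, in core order."""
    physical = cores_per_socket * sockets
    if workers < 1 or physical % workers:
        raise ValueError("workers must divide the physical core count")
    per_worker = physical // workers
    lists = []
    for w in range(workers):
        first = w * per_worker
        last = first + per_worker - 1
        spec = _span(first, last)
        if use_siblings:
            # Add the hyperthread siblings of the same physical cores.
            spec += "," + _span(first + physical, last + physical)
        lists.append(spec)
    return lists


# Four workers, as when testing 4 exponents at a time.
for n, spec in enumerate(worker_cpu_lists(4)):
    print(f"worker {n}: taskset -c {spec}")
```

With four workers the first two lists fall entirely on node 0 and the last two on node 1. Running `taskset -cp <pid>` against a started process shows the affinity list it actually ended up with.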

RDIMMs are quite expensive, unfortunately. I have spent far more on RAM for this workstation than on anything else. It eclipses the cost of the CPUs, GPU and the complete workstation. But I do have a genuine business need for a lot of RAM, so I can justify buying it.

Code:
drkirkby@canary:~/gimps$ sensors
coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +85.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +84.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +84.0°C  (high = +89.0°C, crit = +99.0°C)

dell_smm-virtual-0
Adapter: Virtual device
fan1:           0 RPM
fan2:        1726 RPM
fan3:         859 RPM

nvme-pci-10200
Adapter: PCI adapter
Composite:    +40.9°C  (low  = -20.1°C, high = +77.8°C)
                       (crit = +81.8°C)
Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)

nouveau-pci-7300
Adapter: PCI adapter
fan1:        2235 RPM
temp1:        +36.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +75.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +70.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +70.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +67.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +75.0°C  (high = +89.0°C, crit = +99.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +36.9°C  (low  =  -0.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)
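Counting the "Core N" lines in the listing above is a quick way to check how many core sensors each package really reports. The sketch below does that. The function names are illustrative, and the sample reuses only the core numbers from the listing, with a placeholder temperature on every line.

```python
import re


def core_ids(sensors_text):
    """Return {coretemp chip name: core IDs in listed order} from `sensors` text."""
    chips, current = {}, None
    for line in sensors_text.splitlines():
        if line.startswith("coretemp-"):
            current = line.strip()
            chips[current] = []
        elif not line.strip():
            current = None            # a blank line ends the chip section
        elif current is not None:
            m = re.match(r"Core (\d+):", line)
            if m:
                chips[current].append(int(m.group(1)))
    return chips


def missing_ids(ids):
    """Core numbers skipped between 0 and the highest listed ID."""
    return sorted(set(range(max(ids) + 1)) - set(ids))


# Core numbers as they appear above; temperatures are placeholders.
listed = [*range(0, 7), *range(8, 14), *range(16, 23), *range(24, 30)]
sample = "coretemp-isa-0000\nAdapter: ISA adapter\n" + "".join(
    f"Core {i}:        +65.0°C  (high = +89.0°C, crit = +99.0°C)\n" for i in listed
)

for chip, ids in core_ids(sample).items():
    print(chip, "entries:", len(ids), "skipped IDs:", missing_ids(ids))
```

On this listing each package has 26 "Core" entries, with the numbers 7, 14, 15 and 23 skipped, so the labels run up to 29 even though there are only 26 readings per CPU.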
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.