mersenneforum.org mprime (Linux) doesn't do "affinity" correctly...

2014-01-02, 19:04   #1
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

10010101100100₂ Posts

mprime (Linux) doesn't do "affinity" correctly...

This has probably already been discussed at length. Please forgive me if I'm ignorant of previous discussions on this matter which have already covered this.

But... I have a few Dell R720 servers at my disposal, each with two (2#) Intel E5-2420s (8# real cores) with hyper-threading enabled (16# virtual cores).

I'm currently running DCs on these, and trusted mprime to do the "right thing"; the local.txt is configured for 2# workers, 8# threads each.

The output from mprime at start-up:

Code:
[Main thread Jan 2 14:41] Mersenne number primality test program version 27.9
[Main thread Jan 2 14:41] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 20 MB
[Main thread Jan 2 14:41] Logical CPUs 1,17 form one physical CPU.
[Main thread Jan 2 14:41] Logical CPUs 2,18 form one physical CPU.
...
[Worker #1 Jan 2 14:41] Setting affinity to run helper thread 2 on any logical CPU.
[Worker #1 Jan 2 14:41] Setting affinity to run helper thread 1 on any logical CPU.
...

So... mprime correctly detects the hyper-thread matches, but doesn't seem to set the affinity mask. So the threads bounce around all the available CPUs, as seen by:

Code:
watch "ps -mo pid,tid,fname,user,psr -p `pgrep mprime`"

OK, says I, I'll try "AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV" as an experiment, as suggested by undoc.txt.

Same thing; the threads bounce around -- it seems the affinity mask is not being set.

This can be demonstrated simply by the following results:

Code:
[chalsall@development prime]$ taskset -p -c 0 21060
pid 21060's current affinity list: 0-31
pid 21060's new affinity list: 0

So then I hacked together a simple Perl script in five minutes:

Code:
#!/usr/bin/perl
# Pin each mprime worker thread to its own core with taskset.
$Workers = 2;
$Threads = 8;
# List mprime's threads; the psr column shows the CPU each one last ran on.
$TIDs = `ps -mo pid,tid,fname,user,psr -p \`pgrep mprime\``;
$Cnt = 0;
$Worker = 0;
$Thread = 0;
@Lines = split /\n/, $TIDs;
foreach $TID (@Lines) {
  $Cnt++;
  # Skip the first four lines of ps output (presumably the header, the
  # process line, and two non-worker threads).
  if ($Cnt < 5) { next; }
  # Per-thread lines show "-" for the PID; keep only the thread ID after it.
  $TID =~ s/\s*-\s*(\d*).*/$1/;
  # Worker 0 gets the even-numbered cores, worker 1 the odd-numbered ones.
  $Core = $Thread*2 + $Worker;
  $Cmd = "taskset -p -c $Core $TID";
  print "$Cmd\n";
  `$Cmd`;
  $Thread++;
  if ($Thread >= $Threads) {
    $Thread = 0;
    $Worker++;
  }
}

...and my throughput has more than doubled.

Note that this seems to only be important with multi-CPU, multi-core machines. It had no upside for single-CPU, multi-core machines, even though under such systems the affinity does not seem to be set and the threads "bounce around".

I've now added this script as a "cronjob" running every five minutes, since when a candidate completes all workers are launched under new threads.

George et al, thoughts? Am I being silly somewhere in my mprime configuration files, or is this actually a real issue with multi-CPU systems?

Final note: all of these tests were under CentOS 6.4. There may be different behavior under other versions of Linux.

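For readers who want to see which cores the script's arithmetic hands out, here is a minimal Python sketch of the same rule (core = thread*2 + worker). The names `core_for`, `plan`, and `pin` are hypothetical and not part of mprime. `pin` uses `os.sched_setaffinity`, which is only available on Linux. Whether even- and odd-numbered logical CPUs actually sit on different sockets is an assumption about these particular machines; the thread does not confirm it.

```python
import os


def core_for(worker, thread):
    """Same arithmetic as the Perl script: core = thread*2 + worker."""
    return thread * 2 + worker


def plan(workers=2, threads=8):
    """Return {worker: [core, ...]} for every worker's threads."""
    return {w: [core_for(w, t) for t in range(threads)] for w in range(workers)}


def pin(tid, core):
    """Restrict one thread to one logical CPU (Linux only).

    On Linux the underlying system call accepts a thread ID here,
    which is what `taskset -p -c CORE TID` relies on as well.
    """
    os.sched_setaffinity(tid, {core})


if __name__ == "__main__":
    for worker, cores in plan().items():
        print("worker %d -> cores %s" % (worker, cores))
```

With the defaults of 2 workers and 8 threads, worker 0 is assigned cores 0, 2, ..., 14 and worker 1 is assigned cores 1, 3, ..., 15.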
2014-01-02, 19:34   #2
sdbardwick

Aug 2002
North San Diego County

2·11·31 Posts

Quick first glance: It appears that you have the worker threads set to "Run on any CPU" or possibly "Smart Assignment".

Under Windows with HT CPUs, I usually end up setting each worker thread to a specific CPU and use affinityscramble2 to make sure each thread lands on a unique (unshared via HT) physical core. On a quad core HT set up for 2 threads per test, this looks like:

Code:
affinityscramble2 = 02461357
[Worker #1]
Affinity=0
[Worker #2]
Affinity=2

Linux, IIRC, enumerates the logical cores differently - 01234567 gave the correct result on Ubuntu (I think; it was a while ago, but the enumeration was definitely different than Windows).

Last fiddled with by sdbardwick on 2014-01-02 at 19:36

2014-01-02, 19:41   #3
sdbardwick

Aug 2002
North San Diego County

682₁₀ Posts

Dredging my memory: My 2x Opteron 6128 did bounce threads around until I tied them to specific cores. Bouncing was really bad for performance; I suspect the NUMA added huge memory access delays.

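Tying threads to specific cores on a multi-socket box requires knowing which logical CPUs belong to which socket. On Linux this can be read from sysfs. The following is a minimal sketch, assuming the standard `/sys/devices/system/cpu/cpuN/topology/physical_package_id` files are present; the function name `cpus_by_socket` is hypothetical, and nothing here is taken from mprime itself.

```python
import glob
import os
from collections import defaultdict


def cpus_by_socket(sysfs_root="/sys/devices/system/cpu"):
    """Map each physical package (socket) ID to its logical CPU numbers."""
    sockets = defaultdict(list)
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for path in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        pkg_file = os.path.join(path, "topology", "physical_package_id")
        if not os.path.isfile(pkg_file):
            continue  # entries without a topology file are skipped
        cpu = int(os.path.basename(path)[3:])
        with open(pkg_file) as fh:
            sockets[int(fh.read().strip())].append(cpu)
    return {pkg: sorted(cpus) for pkg, cpus in sockets.items()}


if __name__ == "__main__":
    for pkg, cpus in sorted(cpus_by_socket().items()):
        print("socket %d: logical CPUs %s" % (pkg, cpus))
```

The resulting lists could then be used to keep all threads of one worker on a single socket, which is the effect the pinning discussed in this thread is after.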
2014-01-02, 19:50   #4
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

Or... can't you disable HT from BIOS? (Unless you need them for something else)

2014-01-02, 20:23   #5
chalsall
If I May

"Chris Halsall"
Sep 2002

2²×2,393 Posts

Quote:
 Originally Posted by sdbardwick On a quad core HT set up for 2 threads per test, this looks like:
 Code:
 affinityscramble2 = 02461357
 [Worker #1]
 Affinity=0
 [Worker #2]
 Affinity=2

OK, interesting... Thank you for that.

Adding the "Affinity=x" line under each worker line in the local.txt file causes mprime to set the affinity for each thread (so they don't "bounce around"), but it ignores the "Affinityscramble2" setting, and the results are sub-optimal (after the first four threads).

I'm going to stick with my Perl script. It at least works (at least, for my particular machines).

Last fiddled with by chalsall on 2014-01-02 at 20:24

2014-01-02, 20:34   #6
chalsall
If I May

"Chris Halsall"
Sep 2002

2²·2,393 Posts

Quote:
 Originally Posted by kracker Or... can't you disable HT from BIOS? (Unless you need them for something else)
Hyper-Threading (HT) can be useful. mprime doesn't gain anything from it (because it is so optimized), but other programs can.

This is the whole point of tying specific threads to specific processors.

Last fiddled with by chalsall on 2014-01-02 at 20:35 Reason: Had an extra ")". Compiler error.

2014-01-02, 20:53   #7
sdbardwick

Aug 2002
North San Diego County

1252₈ Posts

Quote:
 Originally Posted by chalsall OK, interesting... Thank you for that. Adding the "Affinity=x" line under each worker line in the local.txt file causes mprime to set the affinity for each thread (so they don't "bounce around"), but it ignores the "Affinityscramble2" setting, and the results are sub-optimal (after the first four threads). I'm going to stick with my Perl script. It at least works (at least, for my particular machines).
I haven't had it ignore the affinityscramble2 entry, but the translation between the affinityscramble2 list (0 is lowest), Affinity=x lines (0 is lowest), and the mprime CPU numbers "Setting affinity to run worker on logical CPU #n" (1 is lowest) can be (very) confusing - lots of opportunity for OBOEs.
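The numbering mismatch described above can be made concrete with a small Python sketch. It assumes that scramble characters map to 0-based indices in the order `0-9` then `A-Z`, which is inferred only from the 32-character string quoted from undoc.txt earlier in the thread; the names `ALPHABET`, `scramble_to_indices`, and `log_number` are hypothetical.

```python
# Assumed character order for scramble strings (an inference, not mprime's code).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def scramble_to_indices(scramble):
    """Decode each character into a 0-based logical CPU index.

    Raises ValueError for characters outside the assumed alphabet.
    """
    return [ALPHABET.index(ch) for ch in scramble]


def log_number(index):
    """Convert a 0-based index to the 1-based number used in log messages."""
    return index + 1


if __name__ == "__main__":
    for position, index in enumerate(scramble_to_indices("02461357")):
        print("entry %d: 0-based index %d, logged as CPU #%d"
              % (position, index, log_number(index)))
```

Keeping the conversion in one function, rather than adding or subtracting 1 by hand at each step, is one way to avoid the off-by-one errors mentioned above.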

2014-01-02, 21:05   #8
chalsall
If I May

"Chris Halsall"
Sep 2002

2²·2,393 Posts

Quote:
 Originally Posted by sdbardwick I haven't had it ignore the affinityscramble2 entry, but the translation between the affinityscramble2 list (0 is lowest), Affinity=x lines (0 is lowest), and the mprime CPU numbers "Setting affinity to run worker on logical CPU #n" (1 is lowest) can be (very) confusing - lots of opportunity for OBOEs.
OK.

I ran several experiments based on your observations.

Nothing I altered in the text files had any effect on the program's behavior with regard to task assignment to CPUs.

As an example, "affinityscramble2" was set to "012345679abcdedfhi".

Observe behavior.

Set affinityscramble2 to "000000000000000000"

Observe behavior. Same behavior.

Observe behavior. Same behavior...

At some point an observer must assume the code has a bug somewhere....

2014-01-02, 21:46   #9
ewmayer
∂²ω=0

Sep 2002
República de California

2²·3²·17·19 Posts

This thread appears to be discussing similar issues as this one in the Linux subforum.

2014-01-02, 21:58   #10
chalsall
If I May

"Chris Halsall"
Sep 2002

2²·2,393 Posts

Quote:
 Originally Posted by ewmayer This thread appears to be discussing similar issues as this one in the Linux subforum.

But we've already established that this is not an issue with mprime "not detecting which logical CPUs form the physical CPUs".

This is, instead, mprime making a much bigger mistake. Read: Not understanding how to deal with multi-socket-CPU environments.

Last fiddled with by chalsall on 2014-01-02 at 22:07

2014-01-02, 22:29   #11
sdbardwick

Aug 2002
North San Diego County

1252₈ Posts

Hmmm. I'll take a closer look tonight or tomorrow when I have physical access to a dual-socket box. AMD though, without HT (Opteron 6128 and 4280).

I know that last I checked Prime95 under Windows 7 and Server 2012 respects the scramble on both those boxes (although I will retest), so I'll scrounge up a live USB or CD of a Linux distro.

Although I note that the first example looks like the default assignment, and your second 2 examples involve impossible core configs such that mprime might be smart enough to ignore them and revert to the default. I haven't looked at the source code for error handling, so just speculation on my part.

Last fiddled with by sdbardwick on 2014-01-02 at 22:29 Reason: spelug error
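One reading of "impossible core configs" is that a scramble string presumably should name each logical CPU only once. Under that assumption (not confirmed against the mprime source), a quick Python check for repeated characters flags two of the strings tried in this thread. The function name `repeated_entries` is hypothetical.

```python
from collections import Counter


def repeated_entries(scramble):
    """Return the characters that occur more than once, in sorted order."""
    counts = Counter(scramble)
    return sorted(ch for ch, n in counts.items() if n > 1)


if __name__ == "__main__":
    for s in ("02461357", "012345679abcdedfhi", "000000000000000000"):
        dupes = repeated_entries(s)
        status = "no repeats" if not dupes else "repeats: " + ", ".join(dupes)
        print("%s -> %s" % (s, status))
```

Whether mprime itself rejects such strings, silently falls back to a default, or does something else is not established here; the check only shows that the test strings were not permutations.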


All times are UTC.
