mersenneforum.org  

2014-01-02, 19:04   #1
chalsall

mprime (Linux) doesn't do "affinity" correctly...

This has probably already been discussed at length. Please forgive me if I'm ignorant of previous discussions on this matter which have already covered this. But...

I have a few Dell R720 servers at my disposal, each with two Intel E5-2420s (8 real cores per CPU) with hyper-threading enabled (16 virtual cores per CPU).

I'm currently running DCs on these, and trusted mprime to do the "right thing"; the local.txt is configured for 2# workers, 8# threads each.

The output from mprime at start-up:
Code:
[Main thread Jan 2 14:41] Mersenne number primality test program version 27.9
[Main thread Jan 2 14:41] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 20 MB
[Main thread Jan 2 14:41] Logical CPUs 1,17 form one physical CPU.
[Main thread Jan 2 14:41] Logical CPUs 2,18 form one physical CPU.
...
[Worker #1 Jan 2 14:41] Setting affinity to run helper thread 2 on any logical CPU.
[Worker #1 Jan 2 14:41] Setting affinity to run helper thread 1 on any logical CPU.
...
So... mprime correctly detects the hyper-thread matches, but doesn't seem to set the affinity mask. So the threads bounce around all the available CPUs, as seen by:
Code:
watch "ps -mo pid,tid,fname,user,psr -p `pgrep mprime`"
OK, says I, I'll try "AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV" as an experiment, as suggested by undoc.txt.

Same thing; the threads bounce around -- it seems the affinity mask is not being set. This can be demonstrated simply by the following results:

Code:
[chalsall@development prime]$ taskset -p -c 0 21060
pid 21060's current affinity list: 0-31
pid 21060's new affinity list: 0
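
For reference, a rough sketch (in Python, not part of the original post) of how an affinity list in the format taskset prints, such as 0-31, expands into individual logical CPU numbers. The helper name parse_cpu_list is made up for illustration, and it only handles plain numbers and simple ranges. A list that covers every logical CPU means the thread is effectively unpinned.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-31" or "0,2,4-6" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


before = parse_cpu_list("0-31")  # list reported before pinning
after = parse_cpu_list("0")      # list reported after "taskset -p -c 0"
print(len(before), "CPUs allowed before;", len(after), "after")
```
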
So then I hacked together a simple Perl script in five minutes:

Code:
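#!/usr/bin/perl
use strict;
use warnings;

# These should match the worker/thread counts configured in local.txt.
my $Workers = 2;
my $Threads = 8;

# List every thread (TID) of the running mprime process.
my $TIDs = `ps -mo pid,tid,fname,user,psr -p \`pgrep mprime\``;

my $Cnt = 0;
my $Worker = 0;
my $Thread = 0;

my @Lines = split /\n/, $TIDs;

foreach my $TID (@Lines) {
   $Cnt++;

   # Skip the first four lines of the ps output; the rest are pinned.
   if ($Cnt < 5) { next; }

   # Thread rows start with "-" in the PID column; keep only the TID.
   $TID =~ s/\s*-\s*(\d*).*/$1/;

   # Worker 0 gets logical CPUs 0,2,4,...; worker 1 gets 1,3,5,...
   my $Core = $Thread*2 + $Worker;

   my $Cmd = "taskset -p -c $Core $TID";

   print "$Cmd\n";
   `$Cmd`;

   # After $Threads threads, move on to the next worker.
   $Thread++;
   if ($Thread >= $Threads) {
      $Thread = 0;
      $Worker++;
   }
}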
...and my throughput has more than doubled.

Note that this seems to only be important with multi-CPU, multi-core machines. It had no upside for single-CPU, multi-core machines, even though under such systems the affinity does not seem to be set and the threads "bounce around".

I've now added this script as a "cronjob" running every five minutes, since when a candidate completes all workers are launched under new threads.
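
The thread-to-CPU mapping used by the Perl script above can be sketched on its own, as a minimal illustration in Python. The function names core_for and pin_commands and the example thread IDs are hypothetical; the formula (thread*2 + worker) is the one from the script. The sketch only builds the taskset commands and does not run them.

```python
def core_for(index, threads_per_worker):
    """Logical CPU for the index-th compute thread (0-based), as in the Perl script."""
    worker, thread = divmod(index, threads_per_worker)
    return thread * 2 + worker


def pin_commands(tids, threads_per_worker=8):
    """Build (but do not run) one taskset command per thread ID."""
    return [
        f"taskset -p -c {core_for(i, threads_per_worker)} {tid}"
        for i, tid in enumerate(tids)
    ]


# Hypothetical thread IDs; on a real system they would come from ps.
example_tids = list(range(21062, 21078))
for cmd in pin_commands(example_tids):
    print(cmd)
```

With 2 workers of 8 threads each, the first worker lands on logical CPUs 0, 2, ..., 14 and the second on 1, 3, ..., 15. Whether that split is a good one depends on how a given machine numbers its logical CPUs.
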

George et al, thoughts?

Am I being silly somewhere in my mprime configuration files, or is this actually a real issue with multi-CPU systems?

Final note: all of these tests were under CentOS 6.4. There may be different behavior under other versions of Linux.

2014-01-02, 19:34   #2
sdbardwick

Quick first glance:
Appears that you have the worker threads set to "Run on any CPU" or possibly "Smart Assignment".

Under Windows with HT CPUs, I usually end up setting each worker thread to a specific CPU and use affinityscramble2 to make sure each thread lands on a unique (unshared via HT) physical core.

On a quad core HT set up for 2 threads per test, this looks like:
Code:
affinityscramble2 = 02461357
[Worker #1]
Affinity=0

[Worker #2]
Affinity=2
Linux, IIRC, enumerates the logical cores differently - 01234567 gave the correct result on Ubuntu (I think; it was a while ago, but the enumeration was definitely different than Windows).
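
A small sketch of where the two strings above could come from, under stated assumptions: that the scramble string lists logical CPUs in the order they should be used, and that the two operating systems number hyper-thread siblings as shown in the example data. Neither assumption is confirmed in this thread, and the function names are made up for illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def physical_first_order(sibling_groups):
    """Order logical CPUs so one CPU from every physical core comes first."""
    order = []
    depth = max(len(group) for group in sibling_groups)
    for position in range(depth):
        for group in sibling_groups:
            if position < len(group):
                order.append(group[position])
    return order


def to_scramble_string(order):
    """Encode CPU numbers with the digits 0-9, then A-Z."""
    return "".join(DIGITS[cpu] for cpu in order)


# Assumed enumerations for a quad-core HT CPU (illustrative only).
adjacent_siblings = [(0, 1), (2, 3), (4, 5), (6, 7)]
split_siblings = [(0, 4), (1, 5), (2, 6), (3, 7)]

print(to_scramble_string(physical_first_order(adjacent_siblings)))
print(to_scramble_string(physical_first_order(split_siblings)))
```

With siblings numbered next to each other this gives 02461357; with siblings numbered four apart it gives 01234567.
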

Last fiddled with by sdbardwick on 2014-01-02 at 19:36

2014-01-02, 19:41   #3
sdbardwick

Dredging my memory:
My 2x Opteron6128 did bounce threads around until I tied them to specific cores. Bouncing was really bad for performance; I suspect the NUMA added huge memory access delays.

2014-01-02, 19:50   #4
kracker

Or... can't you disable HT from BIOS? (Unless you need them for something else)

2014-01-02, 20:23   #5
chalsall

Quote:
Originally Posted by sdbardwick
On a quad core HT set up for 2 threads per test, this looks like:
Code:
affinityscramble2 = 02461357
[Worker #1]
Affinity=0

[Worker #2]
Affinity=2

OK, interesting... Thank you for that.

Adding the "Affinity=x" line under each worker line in the local.txt file causes mprime to set the affinity for each thread (so they don't "bounce around"), but it ignores the "Affinityscramble2" setting, and the results are sub-optimal (after the first four threads).

I'm going to stick with my Perl script. It at least works (at least, for my particular machines).

Last fiddled with by chalsall on 2014-01-02 at 20:24

2014-01-02, 20:34   #6
chalsall

Quote:
Originally Posted by kracker
Or... can't you disable HT from BIOS? (Unless you need them for something else)

Hyper-Threading (HT) can be useful. mprime doesn't gain anything from it (because it is so optimized), but other programs can.

This is the whole point of tying specific threads to specific processors.

Last fiddled with by chalsall on 2014-01-02 at 20:35 Reason: Had an extra ")". Compiler error.

2014-01-02, 20:53   #7
sdbardwick

Quote:
Originally Posted by chalsall
OK, interesting... Thank you for that.

Adding the "Affinity=x" line under each worker line in the local.txt file causes mprime to set the affinity for each thread (so they don't "bounce around"), but it ignores the "Affinityscramble2" setting, and the results are sub-optimal (after the first four threads).

I'm going to stick with my Perl script. It at least works (at least, for my particular machines).

I haven't had it ignore the affinityscramble2 entry, but the translation between the affinityscramble2 list (0 is lowest), Affinity=x lines (0 is lowest), and the mprime CPU numbers "Setting affinity to run worker on logical CPU #n" (1 is lowest) can be (very) confusing - lots of opportunity for OBOEs.
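
To make the off-by-one hazard concrete, here is a minimal sketch that converts the 1-based numbers in an mprime start-up line into the 0-based numbers that tools such as taskset expect. It assumes, as described above, that the numbers in mprime's log are 1-based; the regular expression and the function name siblings_zero_based are illustrative only.

```python
import re

PAIR_LINE = re.compile(r"Logical CPUs (\d+),(\d+) form one physical CPU")


def siblings_zero_based(log_line):
    """Return the hyper-thread pair from an mprime log line as 0-based CPU numbers."""
    match = PAIR_LINE.search(log_line)
    if match is None:
        return None
    return tuple(int(number) - 1 for number in match.groups())


line = "[Main thread Jan 2 14:41] Logical CPUs 1,17 form one physical CPU."
print(siblings_zero_based(line))
```

Under that assumption, the pair logged as 1,17 corresponds to logical CPUs 0 and 16 in taskset terms.
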

2014-01-02, 21:05   #8
chalsall

Quote:
Originally Posted by sdbardwick
I haven't had it ignore the affinityscramble2 entry, but the translation between the affinityscramble2 list (0 is lowest), Affinity=x lines (0 is lowest), and the mprime CPU numbers "Setting affinity to run worker on logical CPU #n" (1 is lowest) can be (very) confusing - lots of opportunity for OBOEs.

OK.

I ran several experiments based on your observations.

Nothing I altered in the text files had any effect on the program's behavior with regard to task assignment to CPUs.

As an example, "affinityscramble2" was set to "012345679abcdedfhi".

Observe behavior.

Set affinityscramble2 to "000000000000000000"

Observe behavior. Same behavior.

Set affinityscramble2 to "deadbeefdeadbeefdeadbeef"

Observe behavior. Same behavior...

At some point an observer must assume the code has a bug somewhere....

2014-01-02, 21:46   #9
ewmayer

This thread appears to be discussing similar issues as this one in the Linux subforum.

2014-01-02, 21:58   #10
chalsall

Quote:
Originally Posted by ewmayer
This thread appears to be discussing similar issues as this one in the Linux subforum.

Thanks for that link.

But we've already established that this is not an issue with mprime "not be detecting which logical CPUs form the physical CPUs".

This is, instead, mprime making a much bigger mistake. Read: Not understanding how to deal with multi-socket-CPU environments.

Last fiddled with by chalsall on 2014-01-02 at 22:07

2014-01-02, 22:29   #11
sdbardwick

Hmmm.
I'll take a closer look tonight or tomorrow when I have physical access to a dual-socket box. AMD though, without HT (Opteron 6128 and 4280). I know that last I checked Prime95 under Windows 7 and Server 2012 respects the scramble on both those boxes (although I will retest), so I'll scrounge up a live USB or CD of a linux distro.

Although I note that the first example looks like the default assignment, and your second 2 examples involve impossible core configs such that mprime might be smart enough to ignore them and revert to the default. I haven't looked at the source code for error handling, so just speculation on my part.
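
As a purely hypothetical illustration of the kind of sanity check speculated about here (it is not taken from mprime's source, and nothing in this thread confirms that mprime does anything similar), a scramble string could be rejected unless it names every logical CPU exactly once. The character-to-number mapping (0-9, then A-Z) is an assumption based on the example strings in this thread.

```python
def decode_scramble(text):
    """Map each character (0-9, then A-Z, case-insensitive) to a CPU number."""
    return [int(char, 36) for char in text]


def is_plausible_scramble(text, logical_cpus):
    """True if the string names every logical CPU 0..N-1 exactly once."""
    try:
        cpus = decode_scramble(text)
    except ValueError:
        return False
    return sorted(cpus) == list(range(logical_cpus))


for candidate in (
    "0123456789ABCDEFGHIJKLMNOPQRSTUV",
    "000000000000000000",
    "deadbeefdeadbeefdeadbeef",
):
    print(candidate, is_plausible_scramble(candidate, 32))
```

Under this check, only the first string would be accepted on a machine with 32 logical CPUs; the other two repeat CPU numbers and leave most CPUs unnamed.
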

Last fiddled with by sdbardwick on 2014-01-02 at 22:29 Reason: spelug error

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.