#1
"Ed Hall"
Dec 2009
Adirondack Mtns
7151₈ Posts
OK, I've been playing with MPI processing and msieve LA for a while now with some success, but this latest setup I'm trying to get running is just taxing my patience and level of knowledge.

On the success side, I have a setup with three dual-core machines connected via a gigabit switch that gives me the slightest advantage (time-wise) over a quad-core machine running alone. Now I am trying to link that quad core with a second quad core via MPI. I have everything working with the two machines connected directly via their gigabit NICs, and they claim 1-gigabit connectivity. Running msieve LA on the two machines via MPI works with no errors. But for a recent c129, the single quad-core machine did the LA in about 1h 45m, while the two machines via MPI can't do the same LA in under 4 hours!

I've tried all the various options of threads, grids, MPI processes, etc. Nothing comes back at less than 4 hours. I've also never gotten near, let alone over, 200% on either CPU as displayed by top. If I use 2 threads and 2 processes per machine, I get around 150% on each of the 2 processes per machine. If I go to 1 process and 4 threads per machine, I still only get around 150%. If I go to a full 4 processes per machine with 1 thread each, I get 4 processes at ~100%. But none of these calls gets me under 4 hours, and some are nearer to 5. All thoughts welcome.
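For concreteness, here is a sketch of the kind of invocation being tried; the host names and paths are placeholders, and the exact grid syntax for msieve's -nc2 depends on the build, so treat this as illustrative rather than canonical:

Code:
# hosts file (hypothetical machine names):
#   quad1 slots=4
#   quad2 slots=4
# 2 MPI processes per box (a 2x2 grid), 2 threads per process:
mpirun -np 4 --hostfile ./hosts ./msieve -nc2 2,2 -t 2 -v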
#2
(loop (#_fork))
Feb 2006
Cambridge, England
1100011101111₂ Posts
Quote:
I've had situations where adding more processors via MPI *on the same motherboard* slows the task down. Probably worth checking with iperf that you are actually getting 900 Mbps or so over the gigabit.

Last fiddled with by fivemack on 2016-12-09 at 11:54
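For the record, the classic iperf check looks like this, assuming iperf is installed on both ends (the hostname is a placeholder):

Code:
# On one machine, start the server:
iperf -s
# On the other, run a 10-second TCP test against it:
iperf -c quad2 -t 10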
#3
"Ed Hall"
Dec 2009
Adirondack Mtns
7151₈ Posts
Quote:
Sorry for the rant! On the brighter side, pings between the machines over the gigabit connection are much quicker than via the 10/100 switch/router connection. I also found one possible setting that may have been in error. After the current c127 finishes NFS, I'll test again.

Meanwhile, I think I can use a standalone version of iperf and see if that works. Otherwise, I'll beat on it a bit here and there to see if I can find out why it won't talk to the web. I might end up just moving to Ubuntu and seeing if that takes care of all the troubles. The only reason I'm using Debian on so many machines is that at the time I was installing the earlier systems, I couldn't get Ubuntu to run headless. Now I can.

Sorry I got long winded (fingered). Thanks for all the help...
#4
"Ed Hall"
Dec 2009
Adirondack Mtns
7×17×31 Posts
Just a short follow-on: iperf was a no-go so far, but there is a definite transfer difference for normal file movement. Between the two Debian machines across 10/100, I get 11.2 MB/s. Via the gigabit, I get 33.7 MB/s. But between the Ubuntu machines, via a switch, I get 45.2 MB/s. And there I have it...
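A quick way to separate the network from the disks, as a sketch (hostname is a placeholder): stream zeros over ssh and discard them at the far end, so neither machine's disk speed distorts the number. Note this still includes ssh's encryption overhead.

Code:
# Push 1 GiB of zeros across the link; dd reports the throughput:
dd if=/dev/zero bs=1M count=1024 | ssh quad2 'cat > /dev/null'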
#5
Jul 2003
So Cal
2,083 Posts
For msieve LA, gigabit Ethernet is slow and has very high latency, which will bottleneck the calculation. As Tom suggests, QDR Infiniband will work much better, but it's relatively expensive.
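A rough way to see the latency side of that claim (hostname is a placeholder) is to sample round-trip times with ping; gigabit Ethernet RTTs are typically in the low hundreds of microseconds, a couple of orders of magnitude above QDR Infiniband's.

Code:
# Average RTT over 100 pings; the last line summarizes min/avg/max:
ping -c 100 quad2 | tail -n 1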
#6
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2³×3²×5×17 Posts
I suspect this [the high latency] is the real killer.
To the OP: You could try removing the switch and directly connecting the two machines. You might need a crossover cable, but most Ethernet chips nowadays can auto-switch (Auto MDI-X), so a normal straight-through cable might also work. You might also need to assign static IPs, or run a DHCP server on one of the boxes.

Last fiddled with by retina on 2016-12-10 at 02:14
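On Debian-era systems, a minimal static-IP setup for such a direct link might look like this (the interface name and addresses are placeholders):

Code:
# Fragment of /etc/network/interfaces on the first machine;
# use 10.0.0.2 on the second. eth1 stands in for whichever
# NIC carries the direct link.
auto eth1
iface eth1 inet static
    address 10.0.0.1
    netmask 255.255.255.0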
#7
"Ed Hall"
Dec 2009
Adirondack Mtns
7×17×31 Posts
Quote:
I'm stumped about the one machine not talking to "mommy" anymore, so I can't get repository packages. Because of dependencies, I can't easily install iperf, and I haven't found a standalone version. But normal file transfer is slower across these two, so I don't know that I need to prove that with iperf. I think the above transfer at 33.7 MB/s is less than 1/3 of the gigabit rating, if my math is correct (33.7 MB/s × 8 ≈ 270 Mbit/s, about 27% of 1 Gbit/s).

Thanks for all suggestions.
#8
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts
Quote:
Code:
% scp wheat@wheat:pingle .
pingle                 100% 1024MB  85.3MB/s   00:12

[  4]  0.0-10.0 sec  1.09 GBytes   935 Mbits/sec

I too have a problem with machines not talking to the net-at-large; I ended up using 'apt-get download iperf' on one that worked and 'dpkg -i iperf_2.0.5-3_amd64.deb' after scping the file over. Wondering whether a domestic edge-router is unhappy to have twenty distinct machines behind it.

Last fiddled with by fivemack on 2016-12-10 at 18:39
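Spelled out, that offline-install workaround is (the .deb filename varies with the version in your repository; 'broken-box' is a placeholder):

Code:
# On a machine that can still reach the repositories:
apt-get download iperf
scp iperf_2.0.5-3_amd64.deb broken-box:

# Then on the machine that can't:
sudo dpkg -i iperf_2.0.5-3_amd64.deb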
#9
Bamboozled!
May 2003
Down not across
24626₈ Posts
Quote:
There are communication overheads such that you will never reach 1000 megabits per second for application data. Two of my systems connected through a gigabit switch usually transfer data at around 75 MB/s over sftp. That's 600 Mbit/s, or 60% of the notional 1 Gbps, and a fair indication of the sort of overhead you can expect. More efficient protocols than sftp will do better, but I doubt you'll ever get much more than 80% (100 MB/s) in practice.
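For comparison against a nearly overhead-free protocol, a raw netcat stream makes a quick test (port and hostname are placeholders; some netcat flavors want the -p dropped on the listening side):

Code:
# Receiver:
nc -l -p 5001 > /dev/null
# Sender: push 1 GiB of zeros; dd reports the rate at the end:
dd if=/dev/zero bs=1M count=1024 | nc quad2 5001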
#10
"Ed Hall"
Dec 2009
Adirondack Mtns
7×17×31 Posts
Quote:
iperf appears to show all is well with the connection. Via the 10/100 path:
Code:
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec   116 MBytes  96.0 Mbits/sec
And via the gigabit path (both directions):
Code:
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
Code:
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes   942 Mbits/sec

I do have one limiting issue I'm aware of, in that I haven't been able to figure out network sharing for the MPI drive without using sshfs, which I'm sure is digging into the bandwidth. But that's in use with the other, three-machine, setup too. Is there something about network drive sharing that I could maybe get working that would do better than my sshfs?

Thanks to all for the help. I'm learning more stuff...
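One common alternative is NFS, which avoids sshfs's ssh encryption overhead; a sketch assuming nfs-kernel-server is installed on the machine holding the files (paths and addresses are placeholders):

Code:
# On the machine holding the .dat files, add to /etc/exports:
#   /home/ed/msieve  10.0.0.0/24(rw,sync,no_subtree_check)
# then reload the export table:
sudo exportfs -ra

# On the other machine, mount it where the MPI job expects it:
sudo mount -t nfs 10.0.0.1:/home/ed/msieve /home/ed/msieve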