mersenneforum.org

hc_grove 2004-10-10 13:33

Running multiple copies of mprime on Linux
 
Hi

I'm most active in Seventeen or Bust, but as I have a problem with a test version of mprime (some new code for P-1 factoring), George suggested that I post here.

I use a bunch of different machines (home directory shared via NFS) to run a mixture of prp clients, sieve clients and P-1 clients. To make that a bit easier I have written some scripts to start any client on any machine. Basically I have a script called `on` that takes two arguments, a hostname and a job name. It logs in on the given host, changes to the directory given by the job name, and starts another script called `job.sh`. Apart from some commands that just help me find out what jobs run on what computer, the job.sh scripts I created for mprime look like this:
[CODE]
#! /bin/sh
cd ~/17orbust/p-1_1
./mprime -A1 &
[/CODE]
and
[CODE]
#! /bin/sh
cd ~/17orbust/p-1_2
./mprime -A2 &
[/CODE]
When I try to run these two scripts on the same machine (called shannon), the following happens:
[CODE]
grove@galois > ./on shannon p-1_1
grove@galois > ./on shannon p-1_2
grove@galois > Another mprime is already running!
[/CODE]

What am I doing wrong?

Xyzzy 2004-10-10 14:05

In local.ini there is a line like this:

Pid=10563

That is what causes the message to appear... I've never seen mprime report that unless there really was another mprime running, but it wouldn't hurt to check...

Maybe you could run "killall mprime" before you tried to start the new one? Or grep out the process id and kill it?

geoff 2004-10-10 14:33

The instance of mprime started as 'mprime -A1' writes its Pid in the file loca0001.ini, 'mprime -A2' writes it in loca0002.ini, etc. When they are stopped the Pid is set to zero.

If you created these files by copying the local.ini from an already running mprime then that could cause the message you see. If there really is no other mprime running then just deleting the Pid= line from the loca*.ini file should fix it.
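Before deleting the line, it may be worth confirming that the recorded Pid really is dead. A minimal sketch, assuming the file holds a line of the form `Pid=<number>` as quoted above; the function names `pid_from_ini` and `pid_is_stale` are made up for illustration and are not part of mprime. Run it on the machine where that mprime was started, since a Pid only means something on its own host:

```shell
#!/bin/sh
# Sketch only: check whether the Pid recorded in an mprime ini file is stale.
# Function names are hypothetical; the "Pid=<number>" format is taken from
# the line quoted earlier in this thread.

pid_from_ini() {
    # Print the number after "Pid=" (prints nothing if there is no such line).
    sed -n 's/^Pid=\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

pid_is_stale() {
    # Returns 0 when the file records a non-zero Pid that no longer exists.
    pid=$(pid_from_ini "$1")
    # No Pid line, or Pid=0 (set when mprime is stopped): nothing is stale.
    if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
        return 1
    fi
    # kill -0 sends no signal; it only tests whether the process exists.
    # Caveat: it also fails for a live process owned by another user, and
    # it cannot tell whether a live Pid actually belongs to mprime.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Example use (commented out so that sourcing this file changes nothing):
# pid_is_stale loca0002.ini && echo "Pid in loca0002.ini is stale"
```

If the check reports a stale Pid, removing the `Pid=` line by hand as described above should be safe.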

hc_grove 2004-10-10 15:34

Thank you.

I hadn't noticed that Pid= line, so I'd just made loca000N.ini a symlink (to be sure they all run with the same configuration, I prefer symlinks over copying the file around), so when the first copy started it put its pid in there, and then the others wouldn't start.

Now that I've replaced the symlinks with copies (which just means I have to do more work if I decide to change the configuration), it works perfectly.
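One way to cut down that extra work might be to keep a single master file and regenerate the per-instance copies from it, dropping any `Pid=` line on the way. This is only a sketch: the function name `make_instance_ini` and the file name `master.ini` are hypothetical, and it assumes nothing else in the file is specific to one instance, which I have not checked. It should only be run while the instances are stopped, since it overwrites the file a running mprime has written its Pid to.

```shell
#!/bin/sh
# Sketch only: regenerate a per-instance ini file from one shared master,
# leaving out any Pid= line so no instance inherits another one's Pid.
# "make_instance_ini" and "master.ini" are made-up names for illustration.

make_instance_ini() {
    # $1 = master config file, $2 = destination (e.g. p-1_2/loca0002.ini)
    sed '/^Pid=/d' "$1" > "$2"
}

# Example use for the two jobs in this thread (commented out on purpose;
# the location of the ini files inside each job directory is an assumption):
# make_instance_ini ~/17orbust/master.ini ~/17orbust/p-1_1/loca0001.ini
# make_instance_ini ~/17orbust/master.ini ~/17orbust/p-1_2/loca0002.ini
```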

