mersenneforum.org  

Old 2016-08-10, 12:42   #1
rudi_m
 
Jul 2005

2×7×13 Posts
mprime: signal handler, SIGHUP

Hi,

I've noticed on recent systemd-based Linux distros that mprime is not able to terminate cleanly on shutdown/reboot, i.e. it does not write its current working state to disk. This is really bad because every reboot throws away a lot of calculations (how much depends on DiskWriteTime). It may also leave corrupt state and config files behind.

The reason is that mprime does not handle SIGHUP, which systemd-logind sends on shutdown (in addition to SIGTERM).

The fix in prime.c would be either to handle SIGHUP the same way we already handle SIGINT and SIGTERM
+ (void)signal(SIGHUP, sigterm_handler);

or maybe just ignore SIGHUP
+ (void)signal(SIGHUP, SIG_IGN);

I would probably prefer to ignore it because people often run mprime in the background and don't want it to be terminated if the underlying terminal dies.
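For comparison, here is a minimal sketch of the two options using sigaction() instead of signal(). It is not mprime's code: on_stop_signal(), stop_requested, install_stop_handlers() and stop_was_requested() are hypothetical names made up for this illustration, and the real sigterm_handler() in prime.c does more than set a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stop flag; mprime's real sigterm_handler() is not shown here. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int signo)
{
    (void)signo;
    stop_requested = 1;   /* setting a sig_atomic_t flag is async-signal-safe */
}

/* SIGINT and SIGTERM always request a clean stop.
 * ignore_hup != 0: SIGHUP is ignored (the second option above).
 * ignore_hup == 0: SIGHUP is handled like SIGTERM (the first option).
 * Returns 0 on success, -1 on failure. */
int install_stop_handlers(int ignore_hup)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_stop_signal;

    if (sigaction(SIGINT, &sa, NULL) != 0 || sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    if (ignore_hup)
        sa.sa_handler = SIG_IGN;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Polled by the main loop to decide when to save state and exit. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

With install_stop_handlers(1) a SIGHUP leaves the flag untouched; with install_stop_handlers(0) it requests the same clean stop as SIGTERM.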


While investigating the SIGHUP issue I've found some more minor things which should be reviewed/fixed too:

1. Is our sigterm_handler() really thread-safe? I'm not sure.
sigterm_handler() may be invoked re-entrantly on arbitrary threads. It would probably be better if only the main thread handled signals
and all the other threads blocked them.

2. mprime -m does not terminate on any signal while sitting in the menu.

3. In menu.c, the get_line() function should check for a NULL return value from fgets() to detect EOF, and terminate in that case.
You can reproduce a silly endless loop like this:
mprime -m < /dev/zero
This endless loop may also happen in real life if the underlying terminal (stdin) dies.
(Note that "yes bla | mprime -m" is a _valid_ endless loop.)
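One common way to get the behaviour suggested in point 1 is to block the stop signals in the main thread before any worker is created (new threads inherit the signal mask) and then collect them synchronously with sigwait(). The sketch below only illustrates that pattern; block_stop_signals() and wait_for_stop_signal() are hypothetical names, not functions from prime.c. Depending on the libc it may need to be built with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* The signals that should trigger a clean shutdown. */
static void stop_signal_set(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGTERM);
    sigaddset(set, SIGHUP);
}

/* Call once in the main thread BEFORE creating worker threads.
 * Threads created afterwards inherit the mask, so no worker is ever
 * interrupted by these signals. Returns 0 on success. */
int block_stop_signals(void)
{
    sigset_t set;

    stop_signal_set(&set);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Call in the main thread (or one dedicated thread) to wait for a
 * stop signal. No asynchronous handler runs, so ordinary non
 * async-signal-safe code may follow. Returns the signal number,
 * or -1 on error. */
int wait_for_stop_signal(void)
{
    sigset_t set;
    int signo = 0;

    stop_signal_set(&set);
    if (sigwait(&set, &signo) != 0)
        return -1;
    return signo;
}
```

The main thread would call block_stop_signals(), start the workers, and then loop on wait_for_stop_signal(), so the "Stopping all worker threads" logic always runs on one known thread.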



Regarding 1: I've added some debug code to print the PID within the signal handler.
Code:
$ ./mprime -t &
$ rudi@zappa:~/MPrime/mprime-current> pstree -p  | grep mprime
           |-bash(18822)---mprime(21028)-+-{mprime}(21032)
                                                                |-{mprime}(21033)
                                                                |-{mprime}(21037)
                                                                |-{mprime}(21038)
                                                                `-{mprime}(21039)

$ PPP=21028
$ kill -TERM $PPP ; kill -INT  $PPP;kill -TERM $PPP ; kill -INT  $PPP;kill -TERM $PPP ; kill -INT  $PPP;kill -TERM $PPP


# This is the mprime output. It looks somewhat silly although it seems to work.
# The "Main thread" messages were not printed by the main thread.
........
catched A: 15 (21028)
[Main thread Aug 10 14:35] Stopping all worker threads.
catched A: 2 (21032)
[Main thread Aug 10 14:35] Stopping all worker threads.
catched A: 15 (21037)
[Main thread Aug 10 14:35] Stopping all worker threads.
[Worker #3 Aug 10 14:35] Torture Test completed 0 tests in 0 minutes - 0 errors, 0 warnings.
[Worker #3 Aug 10 14:35] Worker stopped.
[Worker #4 Aug 10 14:35] Torture Test completed 0 tests in 0 minutes - 0 errors, 0 warnings.
[Worker #4 Aug 10 14:35] Worker stopped.
[Worker #1 Aug 10 14:35] Torture Test completed 0 tests in 0 minutes - 0 errors, 0 warnings.
[Worker #1 Aug 10 14:35] Worker stopped.
catched B: 2
catched B: 15
catched A: 2 (21037)
[Main thread Aug 10 14:35] Stopping all worker threads.
catched B: 15
catched B: 2
[Worker #2 Aug 10 14:35] Torture Test completed 0 tests in 0 minutes - 0 errors, 0 warnings.
[Worker #2 Aug 10 14:35] Worker stopped.
[Main thread Aug 10 14:35] Execution halted.


cu,
Rudi

Last fiddled with by rudi_m on 2016-08-10 at 12:47
Old 2016-08-10, 19:14   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1111100001101₂ Posts

Thanks. I will include these fixes in the next release.
Old 2016-08-10, 21:33   #3
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

13·227 Posts

I seem to recall another thread where someone was having trouble keeping mprime running on logging out. I can't find the thread. This would probably fix that issue also.
Old 2016-08-10, 21:51   #4
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by Mark Rose View Post
I seem to recall another thread where someone was having trouble keeping mprime running on logging out. I can't find the thread. This would probably fix that issue also.
To be safe we could additionally add an option "--daemonize" to invoke daemon(1,1). This is the most portable way to detach a process from the terminal to run in the background forever.
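A rough sketch of what such an option could look like follows. The "--daemonize" flag is only the name proposed above, not an existing mprime option, and maybe_daemonize() is a hypothetical helper. Note that daemon() is a BSD-derived extension provided by glibc rather than part of POSIX, which is why the feature-test macro is needed.

```c
#define _DEFAULT_SOURCE   /* daemon() is a BSD/glibc extension, not POSIX */
#include <string.h>
#include <unistd.h>

/* Looks for the (hypothetical) "--daemonize" option and, if present,
 * detaches from the controlling terminal.
 * Returns 0 if the option is absent or daemon() succeeded (we are then
 * running in the detached child), -1 if daemon() failed. */
int maybe_daemonize(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--daemonize") == 0) {
            /* nochdir = 1: keep the current working directory, so
             *              relative file names keep working.
             * noclose = 1: leave stdin/stdout/stderr untouched. */
            return daemon(1, 1);
        }
    }
    return 0;
}
```

It would have to be called early in main(), before any worker threads are started, because daemon() forks and only the calling thread survives in the child.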
Old 2016-08-11, 00:58   #5
GP2
 
Sep 2003

13×199 Posts

Quote:
Originally Posted by Mark Rose View Post
I seem to recall another thread where someone was having trouble keeping mprime running on logging out. I can't find the thread. This would probably fix that issue also.
screen is a very useful utility under Linux for remote sessions

Code:
screen
./mprime -d
CTRL-A d          # detach from screen
screen -r         # restore screen
CTRL-A c          # create a second terminal window (bash command line)
CTRL-A p          # go back to previous window (with mprime)
CTRL-A n          # go back to next window
CTRL-A p          # go back to previous window
CTRL-A d          # detach from screen
exit              # log out

Log back in
screen -r         # mprime is still running
http://aperiodic.net/screen/man:start
Old 2016-08-11, 01:35   #6
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

2951₁₀ Posts

Quote:
Originally Posted by GP2 View Post
screen is a very useful utility under Linux for remote sessions
Indeed. I usually invoke it through crontab on startup, like `screen -S mprime -d -m /home/wherever/mprime -d`. On machines with desktop users, I also run `killall mprime` from cron to schedule when it runs.

`tmux` is more powerful but I've been a bit lazy in learning it.
Old 2016-09-07, 01:46   #7
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,949 Posts

Quote:
Originally Posted by rudi_m View Post

The reason is that mprime does not handle SIGHUP, which is sent by systemd-logind on shutdown (additionally to SIGTERM).
I would probably prefer to ignore it because people often run mprime in the background and don't want it to be terminated if the underlying terminal dies.

2. mprime -m does not terminate on any signal while sitting in the menu.

3. in menu.c the get_line() function should handle the NULL return value of fgets() to detect EOF and terminate in this case.
You can reproduce a stupid endless loop like this:
mprime -m < /dev/zero

OK, I added ignoring the SIGHUP signal. I hope that fixes your issue.

For 3, I check for NULL from fgets and exit. This works if I type ^D, but your /dev/zero case still loops.

For 2, I have no idea what to do. Linux programming is not my specialty. Your help would be appreciated.
Old 2016-09-07, 04:10   #8
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by Prime95 View Post
OK, I added ignoring the SIGHUP signal. I hope that fixes your issue.
Thanks!

Quote:
Originally Posted by Prime95 View Post
For 3, I check for NULL from fgets and exit. This works if I type ^D, but your /dev/zero case still loops.
Sorry, I've made a typo. I meant
mprime -m < /dev/null
This produces EOF and should exit.
/dev/zero produces an endless stream of zero bytes. It seems that fgets handles them like newlines. No need to handle this case.
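For reference, the EOF check could look roughly like this. read_menu_line() is a hypothetical stand-in for get_line() in menu.c, whose real signature may differ; the point is only that a NULL from fgets() is reported to the caller instead of being treated as an empty line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for menu.c's get_line().
 * Reads one line from 'in' into buf and strips the trailing newline.
 * Returns 0 on success, -1 on EOF or read error, so the caller can
 * leave the menu loop instead of spinning forever. */
int read_menu_line(FILE *in, char *buf, size_t size)
{
    if (size == 0)
        return -1;
    if (fgets(buf, (int)size, in) == NULL)
        return -1;                       /* EOF (e.g. ^D, /dev/null) or error */
    buf[strcspn(buf, "\r\n")] = '\0';    /* drop the line terminator, if any */
    return 0;
}
```

A menu loop would then exit as soon as read_menu_line(stdin, ...) returns -1, which covers both ^D and "mprime -m < /dev/null".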

Quote:
Originally Posted by Prime95 View Post
For 2, I have no idea what to do. Linux programming is not my specialty. Your help would be appreciated.
I'll have a look at it. Somehow the sigterm_handler should call exit() directly if we are in menu mode (-m) and no workers are running.

Are you using a VCS when coding? Would be nice to see what you have done so far. I have a git repository (just imported your official releases) but would be nice to have one which is used by upstream (you) ;)
https://github.com/rudimeier/mprime
Old 2016-09-07, 04:27   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,949 Posts

I use a local SVN repository.

Attached are the rather minor changes I made to prime.c and menu.c
Attached Files
File Type: gz foo.gz (13.1 KB, 118 views)
Old 2016-09-08, 02:17   #10
bgbeuning
 
Dec 2014

FF₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
For 2, I have no idea what to do. Linux programming is not my specialty. Your help would be appreciated.
On Windows a CTRL+C starts a new thread in your process that does the right
thing depending on whether the process is being debugged or has installed
a CTRL+C handler.

(Windows does not like to interrupt system calls.
That is why it has so many asynchronous APIs.)

On UNIX, blocking system calls can be interrupted (by CTRL+C or other signals).
When a signal handler runs, the blocked system call returns an error with errno = EINTR, unless the handler was installed with SA_RESTART, in which case the call is restarted.
If mprime -m is using stdio, then somewhere deep inside stdio is a read(2)
that will be blocked waiting for input. When a CTRL+C happens the read(2)
call will fail with EINTR. I am guessing stdio will set an error in the FILE so
using ferror(3) might tell you something happened when you get 0 bytes back.
Or checking for (errno == EINTR).
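A minimal sketch of that idea is below. One caveat: on Linux with glibc, handlers installed through signal() normally get restart semantics, so the blocked read() would be restarted rather than fail. The sketch therefore installs the handler with sigaction() and without SA_RESTART. All names (install_interrupting_handler(), read_line_status(), the line_status enum) are hypothetical and not taken from mprime.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_EOF, LINE_INTERRUPTED, LINE_ERROR };

/* An empty handler is enough: its only job is to make the blocked
 * read() inside stdio return instead of the process being killed. */
static void note_signal(int signo)
{
    (void)signo;
}

/* Install the handler WITHOUT SA_RESTART, so that a read() blocked in
 * fgets() fails with EINTR when the signal arrives. Returns 0 on success. */
int install_interrupting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = note_signal;
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    return sigaction(signo, &sa, NULL);
}

/* Read one line and tell the caller why fgets() stopped. */
enum line_status read_line_status(FILE *in, char *buf, int size)
{
    errno = 0;
    if (fgets(buf, size, in) != NULL)
        return LINE_OK;
    if (feof(in))
        return LINE_EOF;
    if (ferror(in) && errno == EINTR) {
        clearerr(in);                /* allow further reads on the stream */
        return LINE_INTERRUPTED;
    }
    return LINE_ERROR;
}
```

In menu mode, the input loop could treat LINE_INTERRUPTED and LINE_EOF as a request to leave the menu and exit.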
Old 2016-09-08, 07:28   #11
GP2
 
Sep 2003

A1B₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
OK, I added ignoring the SIGHUP signal. I hope that fixes your issue.
Wait, are we sure we want this?

In Linux the standard way to make a program immune to SIGHUP is to invoke it using the "nohup" command. In other words, "nohup mprime" instead of just "mprime".

So maybe the original suggestion of handling SIGHUP the same way as SIGINT and SIGTERM is a better idea.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
