Old 2010-06-06, 01:00   #12
AMDave
 
 
Jan 2006
deep in a while-loop

2×7×47 Posts

stats are up
Old 2010-06-06, 01:04   #13
Lennart
 
 
"Lennart"
Jun 2007

10001100000₂ Posts

Quote:
Originally Posted by AMDave
stats are up

Thanks, Dave


Lennart

Last fiddled with by Lennart on 2010-06-06 at 01:04
Old 2010-06-06, 01:52   #14
Mini-Geek
Account Deleted
 
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10253₈ Posts

Quote:
Originally Posted by gd_barnes
Max, should we be renaming these prpserver.log files every once in a while, or is there a way to make it quit writing out so many messages? These files are huge. I can't help but think writing so many messages to them actually caused the server to go down.

In prpserver.ini:
Code:
// Size limit in bytes for the prpclient.log file.
//    0 - no limit
//   -1 - no log
loglimit=0
You might prefer to use something else (e.g. a rename-off) instead of or in addition to this line.
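
If you go the rename route, something along these lines could do it. This is only a rough sketch in Python: the 100 MB threshold and the rotate_log helper are my own invention (not part of PRPnet), and the MMDD suffix just copies the prpserver-0605.log naming. I don't know whether prpserver keeps the log open between writes, so it's probably safest to run this while the server is stopped.

```python
import os
import time


def rotate_log(path="prpserver.log", max_bytes=100 * 1024 * 1024, now=None):
    """Rename `path` to e.g. prpserver-0605.log once it exceeds max_bytes.

    Hypothetical helper, not part of PRPnet.
    Returns the new file name, or None if nothing was renamed.
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return None
    # MMDD stamp, matching the prpserver-0518.log / prpserver-0605.log names.
    stamp = time.strftime("%m%d", now or time.localtime())
    root, ext = os.path.splitext(path)
    target = f"{root}-{stamp}{ext}"
    # Do not clobber an earlier rename made on the same day.
    n = 1
    while os.path.exists(target):
        target = f"{root}-{stamp}-{n}{ext}"
        n += 1
    os.rename(path, target)
    return target
```

Run from cron or by hand, it leaves the old log intact under its dated name, so nothing is lost for later debugging.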
Old 2010-06-06, 02:01   #15
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

79² Posts

Quote:
Originally Posted by gd_barnes
Port 9000 went down about 2-1/2 hours ago with a "Couldn't append to file (prpserver.log)" error message. It then reported a "Segmentation fault".

But here is the weird part: The prpserver program is completely missing. It no longer exists at all! It's like it got deleted on the fly or something. When I tried to execute ./prpserver, it said "No such file or directory".

Here is my solution:

1. It appears that the prpserver.log possibly got too large. So I'm renaming it like Max did with prpserver-0518.log. I will call it prpserver-0605.log. It was > 500 MB.

2. Copied the prpserver program over from another folder that contained PRPnet 3.2.6.

3. Restarted the server with ./prpserver.

Everything seems to be working now.

I'm sorry about the problems everyone.

Max, should we be renaming these prpserver.log files every once in a while, or is there a way to make it quit writing out so many messages? These files are huge. I can't help but think writing so many messages to them actually caused the server to go down.

Karsten, I think Max gave you remote access to my machine now. Have you tried getting into it yet? If so, could you babysit it for a little while?

I'm going out to eat and to a movie now. I hope to be able to check it again in 4-5 hours.


Gary

I just got back from a convention that took up most of yesterday and today, which is why I wasn't able to check the forum throughout the day, notice this problem earlier, and do something about it. Thanks, Gary, for taking care of that.

I must say, that problem is extremely weird and I don't have much of an idea what could have caused it. prpserver.log having gotten too large is a possible cause, but I don't really see why the size of the file matters when all the server's doing is blindly appending to it.

I currently have the logging set to debuglevel=1, which outputs an enormous amount of information and can make prpserver.log balloon rather quickly, especially under heavy load as it is now. (As we saw here, that log file which I hadn't touched for a few weeks was already half a gigabyte.) However, I have been bitten way too many times by rare bugs in the server that showed up once and were nearly impossible to trace because I wasn't logging in enough detail; then the bugs conveniently didn't show up for a good long time after that. Hence, I have been keeping all the servers on maximum logging, since there's plenty of disk space on the server.

The strangest thing is that the prpserver executable actually self-destructed after the segfault. I have never, ever seen that happen in all of my experience with computers. Fortunately it was easy to remedy since you could just copy the executable from another server, but still it's the craziest thing I have ever seen. I'll take a look at the tail end of the old prpserver.log either tonight or tomorrow and see if I can get anything of use to send to Mark. Unfortunately, when dealing with a segfault things often fall apart so quickly that the server doesn't even have a chance to record what happened to it in its dying words, so I may or may not have any luck with that. Usually the most helpful thing is a stack trace, but I'd have to have the server running under a debugger to catch that, which it wasn't since it was running stably besides this.
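
For pulling out just the tail end of a half-gigabyte log without loading the whole file, a small helper like the one below should be enough. It is a sketch, not anything that ships with PRPnet; tail_lines and its parameters are names made up for illustration.

```python
import os


def tail_lines(path, n=50, block_size=64 * 1024):
    """Return the last n lines of a file, reading backwards in blocks.

    Hypothetical helper for inspecting the end of a large log file.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep prepending blocks until we hold more than n newlines
        # (so the first kept line is complete) or reach the file start.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines[-n:]
```

Something like tail_lines("prpserver-0605.log", 200) would then give the last 200 lines, which is where any dying words from the server would be, if it managed to write them at all.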
Old 2010-06-06, 14:51   #16
gd_barnes
 
 
May 2007
Kansas; USA

3·7·487 Posts

I was thinking of something but I don't know if it is possible from a technical perspective.

I think the fairest thing to do would be to extend the rally time on the PRPnet server by 3 hours, since that is how long it was down. That would allow everyone a nearly equal amount of time for the rally. We don't have a way to keep people who were on LLRnet during the PRPnet outage from running PRPnet, so that will be on the honor system.

What does everyone think?

Max and Dave, is it possible to make the times for the rally different for the 2 servers? Would that overly complicate things? This would make it June 4th at 7 PM GMT to June 6th at 7 PM GMT for LLRnet port 3000 and June 4th at 7 PM GMT to June 6th at 10 PM GMT for PRPnet port 9000.


Gary
Old 2010-06-06, 15:58   #17
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

79² Posts

Quote:
Originally Posted by gd_barnes
I was thinking of something but I don't know if it is possible from a technical perspective.

I think the fairest thing to do would be to extend the rally time on the PRPnet server by 3 hours, since that is how long it was down. That would allow everyone a nearly equal amount of time for the rally. We don't have a way to keep people who were on LLRnet during the PRPnet outage from running PRPnet, so that will be on the honor system.

What does everyone think?

Max and Dave, is it possible to make the times for the rally different for the 2 servers? Would that overly complicate things? This would make it June 4th at 7 PM GMT to June 6th at 7 PM GMT for LLRnet port 3000 and June 4th at 7 PM GMT to June 6th at 10 PM GMT for PRPnet port 9000.


Gary
Yeah, that should be pretty easily possible. They both have separate start/end times defined anyway (since PRPnet reports its times in GMT and LLRnet in local time, i.e. GMT-5), so it should be easy enough to add a few hours to the PRPnet end time. Dave?
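
As a rough illustration of the two windows (Python, with a fixed GMT-5 offset assumed for the LLRnet server's local time; how the stats scripts actually store these times is not something I've checked, and the variable names are made up):

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
SERVER_LOCAL = timezone(timedelta(hours=-5))  # assumption: fixed GMT-5 offset

# Rally window as announced: June 4th 7 PM GMT to June 6th 7 PM GMT.
start = datetime(2010, 6, 4, 19, 0, tzinfo=GMT)
end = datetime(2010, 6, 6, 19, 0, tzinfo=GMT)

windows = {
    # LLRnet reports local time, so express its window in GMT-5.
    "LLRnet port 3000": (start.astimezone(SERVER_LOCAL), end.astimezone(SERVER_LOCAL)),
    # PRPnet reports GMT; add the 3 hours of downtime to its end time.
    "PRPnet port 9000": (start, end + timedelta(hours=3)),
}

for name, (begin, finish) in windows.items():
    print(name, begin.isoformat(), "to", finish.isoformat())
```

Only the PRPnet end time moves; the LLRnet window covers the same 48 hours as before, just written in local time.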

Meanwhile, it looks like it's going to be a somewhat tight race between ROLP and PrimeSearchTeam. Last night I calculated that ROLP was doing about 300 pairs/hour and PrimeSearchTeam was averaging 378 pairs/hour. Yet PrimeSearchTeam (i.e. Lennart) didn't join the rally until 5 hours into it (accounting correctly for time zone differences), and didn't really get up to ideal levels until 7 hours into the rally; thus, ROLP has been ahead of the game the whole while, with PrimeSearchTeam steadily gaining. The question is...will PrimeSearchTeam catch ROLP before the end of the rally (which is only 3 hours away)?
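
To put a number on it: if last night's rates still held (a big assumption, since both teams' output has been changing), the most ground PrimeSearchTeam could make up in the remaining time works out as below. The function name is made up for the example.

```python
def max_gap_closed(chaser_rate, leader_rate, hours_left):
    """Largest lead (in pairs) the chaser can erase at constant rates."""
    return max(0, (chaser_rate - leader_rate) * hours_left)


# 378 - 300 = 78 pairs/hour gained, over the 3 hours left in the rally.
print(max_gap_closed(378, 300, 3))  # 234
```

So under those assumptions, PrimeSearchTeam only catches ROLP if the current gap is about 234 pairs or less.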

Last fiddled with by mdettweiler on 2010-06-06 at 16:04
Old 2010-06-06, 16:06   #18
Lennart
 
 
"Lennart"
Jun 2007

2⁵·5·7 Posts

You don't need to make any changes for me. But there are more users on that server; you have to ask them.

Lennart
Old 2010-06-06, 16:08   #19
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

79² Posts

Quote:
Originally Posted by Lennart
You don't need to make any changes for me. But there are more users on that server; you have to ask them.

Lennart

I think we should go for it. As I recall, we did this once or twice in the past when we had servers go down; thus, the precedent would be to extend the rally on that server to account for the downtime.
Old 2010-06-06, 16:09   #20
Flatlander
I quite division it
 
 
"Chris"
Feb 2005
England

31·67 Posts

Quote:
Originally Posted by mdettweiler
The question is...will PrimeSearchTeam catch ROLP before the end of the rally (which is only 3 hours away)?

Then the last thing you should do is extend it!
Old 2010-06-06, 16:10   #21
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

6241₁₀ Posts

Quote:
Originally Posted by Flatlander
Then the last thing you should do is extend it!

Well...we can't let administrators' team affiliations get in the way of that. After all, that would be a conflict of interest.
Old 2010-06-06, 16:55   #22
kar_bon
 
 
Mar 2006
Germany

5442₈ Posts

@all participants using the new LLRnet script:

If you stop crunching the 5th Drive here, please stop the script after the rally is over and start it again with 'do -c' to cancel all pairs reserved by that client.
That way those pairs won't sit in the server's job list for 2 days before they are sent out to a client again!

Thanks.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.