 stats are up
2010-06-06, 01:04   #13
Lennart

Jun 2007

Thanks, Dave

Lennart

2010-06-06, 01:52   #14
Mini-Geek
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

Quote:
 Originally Posted by gd_barnes Max, should we be renaming these prpserver.logs every once in a while or is there a way to make it quit writing out so many messages. These files are huge. I can't help but think writing so many messages to them actually caused the server to go down.
In prpserver.ini:
Code:
// Size limit in bytes for the prpclient.log file.
//    0 - no limit
//   -1 - no log
loglimit=0
You might prefer to use something else (e.g. a rename-off) instead of or in addition to this line.
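For the rename approach Gary used (prpserver-0605.log), here is a minimal sketch of a size-triggered rename. The function name `rotate_log` and the 100 MB threshold are made up for illustration, and it is only an assumption that prpserver will start a fresh prpserver.log after the old one is renamed; the safe route is to run this while the server is stopped, as Gary did by hand.

```python
import os
import time


def rotate_log(path, max_bytes=100 * 1024 * 1024):
    """Rename `path` to <name>-MMDD<ext> once it grows past max_bytes.

    Returns the new file name, or None if nothing was renamed.
    Hypothetical helper; not part of PRPnet.
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return None

    root, ext = os.path.splitext(path)
    stamp = time.strftime("%m%d")  # e.g. "0605", matching prpserver-0605.log
    target = f"{root}-{stamp}{ext}"

    # Avoid clobbering an earlier rotation made on the same day.
    n = 1
    while os.path.exists(target):
        target = f"{root}-{stamp}-{n}{ext}"
        n += 1

    os.rename(path, target)
    return target


if __name__ == "__main__":
    renamed = rotate_log("prpserver.log")
    print(renamed or "prpserver.log is under the limit; nothing to do")
```

Run from cron or by hand, this keeps every old log around (useful if you are on debuglevel=1 and want the detail for bug hunting) while stopping any single file from growing without bound.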

2010-06-06, 02:01   #15
mdettweiler
Aug 2007
USA (GMT-5)

Quote:
 Originally Posted by gd_barnes Port 9000 went down about 2-1/2 hours ago with a "Couldn't append to file (prpserver.log)" error message. It then said it has a "Segmentation fault". But here is the weird part: The prpserver program is completely missing. It no longer exists at all! It's like it got deleted on the fly or something. When I tried to execute ./prpserver, it said "No such file or directory.". Here is my solution: 1. It appears that the prpserver.log possibly got too large. So I'm renaming it like Max did with prpserver-0518.log. I will call it prpserver-0605.log. It was > 500 MB. 2. Copied the prpserver program over from another folder that contained PRPnet 3.2.6. 3. Restarted the server with ./prpserver. Everything seems to be working now. I'm sorry about the problems everyone. Max, should we be renaming these prpserver.logs every once in a while or is there a way to make it quit writing out so many messages. These files are huge. I can't help but think writing so many messages to them actually caused the server to go down. Karsten, I think Max gave you remote access to my machine now. Have you tried getting into it yet? If so, could you babysit it for a little while? I'm going out to eat and to a movie now. I hope to be able to check it again in 4-5 hours. Gary
I just got back from a convention that took up most of yesterday and today--hence why I wasn't able to check the forum throughout the day, notice this problem earlier, and do something about it. Thanks Gary for taking care of that.

I must say, that problem is extremely weird and I don't have much of an idea what could have caused it. prpserver.log having gotten too large is a possible cause, but I don't really see why the size of the file matters when all the server's doing is blindly appending to it.

I currently have the logging set to debuglevel=1, which outputs an enormous amount of information and can make prpserver.log balloon rather quickly, especially under heavy load as it is now. (As we saw here, that log file which I hadn't touched for a few weeks was already half a gigabyte.) However, I have been bitten way too many times by rare bugs in the server that showed up once and were nearly impossible to trace because I wasn't logging in enough detail; then the bugs conveniently didn't show up for a good long time after that. Hence, I have been keeping all the servers on maximum logging, since there's plenty of disk space on the server.

The strangest thing is that the prpserver executable actually self-destructed after the segfault. I have never, ever seen that happen in all of my experience with computers. Fortunately it was easy to remedy since you could just copy the executable from another server, but still it's the craziest thing I have ever seen. I'll take a look at the tail end of the old prpserver.log either tonight or tomorrow and see if I can get anything of use to send to Mark. Unfortunately, when dealing with a segfault things often fall apart so quickly that the server doesn't even have a chance to record what happened to it in its dying words, so I may or may not have any luck with that. Usually the most helpful thing is a stack trace, but I'd have to have the server running under a debugger to catch that, which it wasn't since it was running stably besides this.

2010-06-06, 14:51   #16
gd_barnes
May 2007

I was thinking of something but I don't know if it is possible from a technical perspective. I think it would be the most fair thing to do to extend the rally time on the PRPnet server by 3 hours since that is how long it was down. That would allow everyone a nearly equal amount of time for the rally. We don't have a way to keep people from running PRPnet that were on LLRnet during the PRPnet outage so that will be on the honor system.

What does everyone think? Max and Dave, is it possible to make the times for the rally different for the 2 servers? Would that overly complicate things?

This would make it June 4th at 7 PM GMT to June 6th at 7 PM GMT for LLRnet port 3000 and June 4th at 7 PM GMT to June 6th at 10 PM GMT for PRPnet port 9000.

Gary

2010-06-06, 15:58   #17
mdettweiler
Aug 2007
USA (GMT-5)

Quote:
 Originally Posted by gd_barnes I was thinking of something but I don't know if it is possible from a technical perspective. I think it would be the most fair thing to do to extend the rally time on the PRPnet server by 3 hours since that is how long it was down. That would allow everyone a nearly equal amount of time for the rally. We don't have a way to keep people from running PRPnet that were on LLRnet during the PRPnet outage so that will be on the honor system. What does everyone think? Max and Dave, is it possible to make the times for the rally different for the 2 servers? Would that overly complicate things? This would make it June 4th at 7 PM GMT to June 6th at 7 PM GMT for LLRnet port 3000 and June 4th at 7 PM GMT to June 6th at 10 PM GMT for PRPnet port 9000. Gary
Yeah, that should be pretty easily possible. They both have separate start/end times defined anyway (since PRPnet reports its times in GMT and LLRnet in local time, i.e. GMT-5), so it should be easy enough to add a few hours to the PRPnet end time. Dave?
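To make the two windows concrete, here is a rough sketch of the arithmetic in Python. The names (`rally_windows`, `DOWNTIME`, and so on) are invented for illustration and have nothing to do with how the stats scripts actually store their start/end times; the fixed GMT-5 offset and the 3-hour extension are taken from the posts above.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
LOCAL = timezone(timedelta(hours=-5))  # server local time, fixed GMT-5 offset

# Rally window as announced, in GMT.
RALLY_START = datetime(2010, 6, 4, 19, 0, tzinfo=GMT)
RALLY_END = datetime(2010, 6, 6, 19, 0, tzinfo=GMT)

# Proposed extension for the PRPnet server to cover its outage.
DOWNTIME = timedelta(hours=3)


def rally_windows():
    """Return (start, end) per server, each in the zone that server reports in."""
    return {
        # LLRnet reports local time, so convert the GMT window to GMT-5.
        "LLRnet port 3000": (
            RALLY_START.astimezone(LOCAL),
            RALLY_END.astimezone(LOCAL),
        ),
        # PRPnet reports GMT; only its end time moves.
        "PRPnet port 9000": (RALLY_START, RALLY_END + DOWNTIME),
    }


if __name__ == "__main__":
    for server, (start, end) in rally_windows().items():
        print(server, start.isoformat(), "to", end.isoformat())
```

The LLRnet window stays the same instant in time, just expressed as 2 PM local; only the PRPnet end time actually changes.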

Meanwhile, it looks like it's going to be a somewhat tight race between ROLP and PrimeSearchTeam. Last night I calculated that ROLP was doing about 300 pairs/hour, and PrimeSearchTeam averaging 378 pairs/hour. Yet PrimeSearchTeam (i.e. Lennart) didn't join the rally until 5 hours into it (accounting correctly for time zone differences), and didn't really get up to ideal levels until 7 hours into the rally; thus, ROLP has been ahead of the game the whole while, with PrimeSearchTeam steadily gaining. The question is...will PrimeSearchTeam catch ROLP before the end of the rally (which is only 3 hours away)?

2010-06-06, 16:06   #18
Lennart
"Lennart"
Jun 2007

You don't need to make any changes for me. But there are more users on that server; you'll have to ask them.

Lennart

2010-06-06, 16:08   #19
mdettweiler
Aug 2007
USA (GMT-5)

Quote:
 Originally Posted by Lennart You don't need to make any changes for me. But there are more user on that server, you have to ask them Lennart
I think we should go for it. As I recall, we did this once or twice in the past when we had servers go down; thus, the precedent would be to extend the rally on that server to account for the downtime.

2010-06-06, 16:09   #20
Flatlander
"Chris"
Feb 2005
England

Quote:
 Originally Posted by mdettweiler The question is...will PrimeSearchTeam catch ROLP before the end of the rally (which is only 3 hours away)?
Then the last thing you should do is extend it!

2010-06-06, 16:10   #21
mdettweiler
Aug 2007
USA (GMT-5)

Quote:
 Originally Posted by Flatlander Then the last thing you should do is extend it!
Well...we can't let administrators' team affiliations get in the way of that. After all, that would be a conflict of interest.

2010-06-06, 16:55   #22
kar_bon
Mar 2006

@all participants using the new LLRnet script: If you stop crunching the 5th Drive here, please stop the script after the rally is over and then run it with 'do -c' to cancel all pairs reserved by that client. That way those pairs won't sit in the server's job list for 2 days before they are sent to a client again! Thanks.

