#1189 | |
May 2007
Kansas; USA
247648 Posts |
Quote:
You are bound and determined to gloss over this whole issue without taking a detailed look at the exact times and matching up when the rejected results were originally handed out. I took 2 hours last night to do that for you. How about looking into it this time please?

Please calculate today when the 26 rejected results were originally handed out. I saved them off under an obvious file name. Like I said, I only had time to look at the first 2-3 and those were handed out at 09:55-10:00 CDT on Aug. 18th. Simply take the time that the original result was returned and subtract the # of seconds that it took to return it.

I'm not going to back off on this until we nail it down. I nailed down 10 rejected results to the original power outage. The other 46 still have no explanation. How do we know that they were the result of yet another crash? We don't. We need to match up exact crash times with the times at which the original pairs were handed out. We seem to have gotten into this habit of glossing over these server problems and that habit needs to end.

I don't know if this will help but it can't hurt: On port G8000 only, please increase the JobMaxTime to 2 days. Please tell me how you safely stop the server to do this. If you can let me know how that is done, then I'll do it if it is needed in the future. I know how to change the JobMaxTime and to restart the server but don't want to create a problem when I stop it.

Karsten, can we talk you into returning pairs normally instead of ~100 at a time about twice a day? If you need to do so many at a time, how about you write a script to do ~20 each hour for 5 hours or something like that? That may help some.

Gary

Last fiddled with by gd_barnes on 2009-08-19 at 16:06
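The "return time minus seconds taken" calculation in the post above can be sketched in a few lines of Python. This is only an illustration: the function name `handout_time`, the timestamp format, and the sample values are assumptions, not anything taken from the actual results files.

```python
from datetime import datetime, timedelta

def handout_time(returned_at, elapsed_seconds):
    """Estimate when a pair was handed out: return time minus test duration.

    returned_at is assumed to look like "2009-08-18 11:30:00" (hypothetical
    format). A duration of zero or less carries no information, so None is
    returned instead of a misleading estimate.
    """
    if elapsed_seconds <= 0:
        return None
    returned = datetime.strptime(returned_at, "%Y-%m-%d %H:%M:%S")
    return returned - timedelta(seconds=elapsed_seconds)

# Made-up example values, for illustration only.
estimate = handout_time("2009-08-18 11:30:00", 1800.0)
print(estimate)
```

Running this over every original result and sorting the estimates would show whether the handout times bunch up around a few moments or are spread out.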
#1190 |
Mar 2006
Germany
2^3×3^2×41 Posts |
i've sent the 2 outstanding pairs at n=957k for GB8000 some time ago and they are in the "last copy off"-file, but the "First unprocessed k/n-pairs" still show one of them!
why? was the prune-time not 1 hour?

please edit the stats-page for the GB ports to show those settings like the IB ports!

PS: the stats updated 14:45 CDT with n=971k!

Last fiddled with by kar_bon on 2009-08-19 at 20:08
#1191 |
May 2008
Wilmington, DE
2^2×23×31 Posts |
What has this http://nplb.ironbits.net/ been replaced with? You know, the one with the IB port, all the primes for the day, the first n to process, links to the rejects, results for the day, etc; in the vertical format.
Last fiddled with by MyDogBuster on 2009-08-19 at 20:56 |
#1192 | |
"Lennart"
Jun 2007
2^5·5·7 Posts |
http://noprimeleftbehind.net/index.php
Lennart
#1193 |
Jan 2006
deep in a while-loop
29A_16 Posts |
#1194 |
Jan 2006
deep in a while-loop
2×3^2×37 Posts |
SNAP!
#1195 |
May 2008
Wilmington, DE
2^2×23×31 Posts |
try http://www.noprimeleftbehind.net
http://noprimeleftbehind.net/index.php

Not the ones I'm looking for. The one I had in mind did not show the hourly progress. It did show all the primes found for that day listed by each port. It is similar to http://nplb-gb1.no-ip.org/llrnet/ but instead for IB. It did have a link to http://www.noprimeleftbehind.net, but also had links to all current results for the day, rejects, etc.

Last fiddled with by MyDogBuster on 2009-08-19 at 21:50
#1196 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Quote:
As for the rejected results, here's a tally of how many were handed out when:
-10 rejected from marco.bs around 11:00 CDT, 8/18
-20 rejected from kar_bon around 15:39 CDT, 8/18
-21 rejected from kar_bon around 00:37 CDT, 8/19
-5 rejected from marco.bs around 2:23 CDT, 8/19

Note that we can't tell exactly when these were handed out because they were (like many rejected results) listed with a time of 0.0 sec. Correlating with times on the same k/n pairs in the main results files is not helpful in this case, since those could very well have been assigned at a different time.

Note that all of these rejected results are from G8000, a server which did not crash at all. Thus, we can't even circumstantially correlate these with any particular crashes. Even if it had been known to crash, we wouldn't be able to know when the crashes happened; the time and date of restarts aren't logged.

I know it may seem like I'm glossing over this stuff, but quite frankly, LLRnet doesn't let me do much more than that. It just plain doesn't log enough info. Yes, we redirect the screen output to a file, but that's essentially useless since there are no timestamps on it. Because of this, most server glitches simply have to be glossed over, because any further investigation is just going to waste a lot of time on something that there's not enough information to pinpoint. The glitches usually (as is the case now) will be handled by the server through its normal processes of expiry and reassignment; there's just nothing more we can do except let it run its course.

This is one of the reasons why PRPnet will be very, very nice when it's all ready for production use. It keeps very detailed logs that are of great help when tracking down problems of any sort.

As for the jobMaxTime, unfortunately it's a rather difficult process to stop the server once it's in the loop. I can do it, but in order to verify that the server's actually stopped correctly, I have to do a number of "geek things" that would be really, really hard to explain. Ditto for restarting with the whole loop thing. I've just now changed G8000 to 2 days jobMaxTime; if you need any such changes performed while the servers are in the loop thingy, let me know and I'll do it the absolute soonest that I can. I'd love to tell you how to do it so that it isn't dependent on my availability, but quite frankly, as I said, that may be a bit difficult.

Max
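One possible workaround for the missing timestamps on the redirected screen output, sketched here under assumptions: pipe the server's output through a small filter that prefixes each line with the current time before it reaches the file. The names `stamp` and `main` and the bracketed timestamp format are hypothetical; nothing here is part of LLRnet itself.

```python
import sys
from datetime import datetime

def stamp(line, now=None):
    """Prefix one line of output with a bracketed timestamp."""
    now = now or datetime.now()
    text = line.rstrip("\n")
    return f"[{now:%Y-%m-%d %H:%M:%S}] {text}"

def main():
    # Read the server's screen output from stdin and echo it with timestamps.
    # flush=True so lines reach the log file as they arrive, not in batches.
    for line in sys.stdin:
        print(stamp(line), flush=True)
```

To use it, the script would call `main()` and sit between the server command and the log file in the pipeline. Restart banners would then carry a time, which would at least show when restarts happened.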
#1197 |
May 2007
Kansas; USA
2^2×3×5×179 Posts |
I found calculating when the original pairs (that were later handed out a 2nd time) were handed out to be helpful, even though you couldn't determine when the duplicated pairs (that actually DID reject) were handed out. When the rejected results were returned doesn't help us much.

Here's why: Likely the originals and the duplicates were handed out at about the same time. As you could see from the calculated times above, those original pairs were all handed out at 2 distinct times, one of which I was able to correlate almost exactly to the power outage. In other words, this tells me that there was some distinct problem that occurred at those 2 times. Had the original pairs been handed out at more random times, we could not come to such a conclusion. Even if the duplicated pairs had been handed out at distinct times, we couldn't discern such because, as you said, the rejected results don't show how much time was taken.

Gleaning as much info as possible through calculations such as this will hopefully allow us to cut down on these problems in the future.

Anyway, I agree, it's not easy to glean much info from things on LLRnet. I guess we'll have to stop now.

One more question: Will getting David's code onto my servers mean that we can avoid the "loop thing" code to restart the servers? If so, that will prevent quite a bit of this "after outage" multiple crashes that we keep encountering.

Gary

Last fiddled with by gd_barnes on 2009-08-19 at 23:00
#1198 | ||||
May 2007
Kansas; USA
2^2·3·5·179 Posts |
Quote:
Quote:
Quote:
Quote:
Lennart and AMDave, both of these responses are incorrect and both link to the same incorrect page. Ian asked for the "noprimeleftbehind" link name version of http://nplb.ironbits.net/.

If everything is going to roll over to the new server, we need a new link name with "noprimeleftbehind" in it that specifically has this web page in it. I previously inquired to David about this.

David, are you just going to leave this one link on the old "ironbits" link name or can we expect a new link that has "noprimeleftbehind" in it? This is an important page that we don't want to lose.

I'll email David with a link to this posting.

Thanks,
Gary
#1199 | |
Mar 2006
Germany
101110001000_2 Posts |
Quote:
i've given those timestamps for the client-side to write this (still using this on my clients):
Code:
[2009-08-20 00:36:21] 2013*2^235548-1 is not prime. Res64: 0BAAB87826667E2E Time : 61.858 sec.
[2009-08-20 00:37:23] 2013*2^235595-1 is not prime. Res64: A6ED66F8AA9036F5 Time : 61.854 sec.
[2009-08-20 00:38:24] 2013*2^235640-1 is not prime. Res64: 8BCA8E2B12058E30 Time : 61.950 sec.
[2009-08-20 00:39:26]
so you have to read it as:
Code:
[2009-08-20 00:37:23] 2013*2^235548-1 is not prime. Res64: 0BAAB87826667E2E Time : 61.858 sec.
[2009-08-20 00:38:24] 2013*2^235595-1 is not prime. Res64: A6ED66F8AA9036F5 Time : 61.854 sec.
[2009-08-20 00:39:26] 2013*2^235640-1 is not prime. Res64: 8BCA8E2B12058E30 Time : 61.950 sec.
perhaps you can change the "server.lua" the same way.
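The shifted reading described above, where each result really belongs to the timestamp on the following line, could be applied mechanically with something like the Python sketch below. The function name `realign` and the regular expression are assumptions based only on the log lines shown in this post.

```python
import re

# A log line is assumed to be "[timestamp] text", where text may be empty.
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*(?P<body>.*)$")

def realign(lines):
    """Pair each result with the timestamp found on the NEXT log line."""
    parsed = [m.group("ts", "body") for m in map(LINE.match, lines) if m]
    shifted = []
    for (_, body), (next_ts, _) in zip(parsed, parsed[1:]):
        if body:
            shifted.append(f"[{next_ts}] {body}")
    return shifted

log = [
    "[2009-08-20 00:36:21] 2013*2^235548-1 is not prime. Res64: 0BAAB87826667E2E Time : 61.858 sec.",
    "[2009-08-20 00:37:23] 2013*2^235595-1 is not prime. Res64: A6ED66F8AA9036F5 Time : 61.854 sec.",
    "[2009-08-20 00:38:24] 2013*2^235640-1 is not prime. Res64: 8BCA8E2B12058E30 Time : 61.950 sec.",
    "[2009-08-20 00:39:26]",
]
for entry in realign(log):
    print(entry)
```

The last result in a file only gets a completion time if a trailing timestamp line follows it, as in the sample here; otherwise it is dropped rather than guessed at.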