mersenneforum.org LLRnet servers for NPLB

2009-08-17, 21:40   #1156
gd_barnes

May 2007
Kansas; USA

10525₁₀ Posts

Well, crap. I just connected to port G8000 10 minutes ago since David's servers weren't working on my end. I only got about 10-15 pairs, but several were older, i.e. about n=957K. But why would it have expired them and handed them to me? Max, any thoughts?

Karsten, since I'm able to connect to David's machines now, I'll stop my port G8000 connections and return those pairs to the server. You might try re-sending them back to the server after I edit this post to say that I have returned them.

What a friggin mess!

Gary

2009-08-17, 21:46   #1157
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
 Originally Posted by kar_bon GB7000 and GB8000 (only tested those) are online again. BUT: for port 7000 i got 2 rejected pairs (perhaps more the next 15 minutes) but this shouldn't happen, because those pairs were assigned max. 1 hour ago! the same for port 8000: and here it's quite heavier! many pairs at n=969k! so much time lost! could someone explain this?!
I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only look at the raw times and subtract them from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime.
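
To make that behavior concrete, here is a minimal sketch of an expiry check that only compares raw timestamps, next to a hypothetical variant that credits known downtime. This is not LLRnet's actual code: the function names, the outage bookkeeping, and the timestamps are all assumptions made up for illustration. Only the jobMaxTime setting comes from the discussion here.

```python
from datetime import datetime, timedelta

JOB_MAX_TIME = timedelta(days=1)  # stands in for the server's jobMaxTime setting


def is_expired_raw(assigned_at, now, job_max_time=JOB_MAX_TIME):
    """Expiry as described above: raw elapsed time, downtime ignored."""
    return now - assigned_at > job_max_time


def overlap(start_a, end_a, start_b, end_b):
    """Length of the overlap between two time intervals (zero if disjoint)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(earliest_end - latest_start, timedelta(0))


def is_expired_downtime_aware(assigned_at, now, outages, job_max_time=JOB_MAX_TIME):
    """Hypothetical variant: subtract server downtime from the elapsed time."""
    downtime = sum(
        (overlap(assigned_at, now, down, up) for down, up in outages),
        timedelta(0),
    )
    return (now - assigned_at) - downtime > job_max_time


# Made-up timestamps: a pair assigned 24.5 hours ago, with a 1-hour outage.
assigned = datetime(2009, 8, 16, 21, 0)
now = datetime(2009, 8, 17, 21, 30)
outages = [(datetime(2009, 8, 17, 20, 0), datetime(2009, 8, 17, 21, 0))]

print(is_expired_raw(assigned, now))                      # True
print(is_expired_downtime_aware(assigned, now, outages))  # False
```

In this toy case the raw check expires the pair, while the downtime-aware check would still accept it, which is the gap that workaround a) below papers over by lengthening jobMaxTime.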

Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large number of k/n pairs that need to be returned).

@Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you.

Last fiddled with by mdettweiler on 2009-08-17 at 21:46

2009-08-17, 21:47   #1158
kar_bon

Mar 2006
Germany

3²·5²·13 Posts

n=957k? On port GB8000? I processed that n-range on 2009-08-07! So those pairs must have been handed out several times before you got them now!

Another thing, from the rejected file:

Code:
user=kar_bon [2009-08-17 04:06:20] 327*2^969546-1 is not prime. Res64: 0D67CB813245264B Time : 66189.0 sec.

If the jobMaxTime is 1 day (86400 secs), why did I get this pair rejected after 66000 secs? Where are the gremlins in here?

Last fiddled with by kar_bon on 2009-08-17 at 21:49

2009-08-17, 21:52   #1159
gd_barnes

May 2007
Kansas; USA

10100100011101₂ Posts

Quote:
 Originally Posted by mdettweiler I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only look at the raw times and subtract them from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime. Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large number of k/n pairs that need to be returned). @Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you.

HUH??????????????????????????????

1. My servers have a JobMaxTime of 3 days.

2. My servers were offline for a grand total of ONE hour.

Another explanation is in order please. Thanks.

Gary

2009-08-17, 21:52   #1160
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by kar_bon n=957k? On port GB8000? I processed that n-range on 2009-08-07! So those pairs must have been handed out several times before you got them now! Another thing, from the rejected file: Code: user=kar_bon [2009-08-17 04:06:20] 327*2^969546-1 is not prime. Res64: 0D67CB813245264B Time : 66189.0 sec. If the jobMaxTime is 1 day (86400 secs), why did I get this pair rejected after 66000 secs? Where are the gremlins in here?
Uh...according to the status page on http://nplb-gb1.no-ip.org/llrnet/, G8000's lowest outstanding n is around 957K. That sounds about right given what you're describing.

2009-08-17, 21:54   #1161
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
 Originally Posted by gd_barnes HUH?????????????????????????????? 1. My servers have a JobMaxTime of 3 days. 2. My servers were offline for a grand total of ONE hour. Another explanation is in order please. Thanks. Gary
Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline.
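
As a rough check on the numbers in this thread, here is a small sketch of how little slack a 1-day limit leaves for the pair Karsten quoted (test time 66189.0 sec). The `slack_seconds` helper is made up for this illustration, and the calculation assumes the expiry clock starts at assignment, as described earlier in the thread.

```python
JOB_MAX_TIME_SEC = 86400   # 1 day, the jobMaxTime value under discussion
TEST_TIME_SEC = 66189.0    # test time from the rejected-result line


def slack_seconds(job_max_time_sec, test_time_sec):
    """Time left for cache waiting, transfer, and downtime before expiry."""
    return job_max_time_sec - test_time_sec


slack = slack_seconds(JOB_MAX_TIME_SEC, TEST_TIME_SEC)
print(f"{slack:.0f} sec = {slack / 3600:.2f} h")  # 20211 sec = 5.61 h
```

So a pair like that one has only about five and a half hours to spare in total. Any time it sat in the client's cache before the test started counts against that, and so does an hour of server downtime.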

2009-08-17, 21:57   #1162
gd_barnes

May 2007
Kansas; USA

5²·421 Posts

Karsten,

I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping.

But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now. If you can try re-returning your rejected results to the server, go for it.

I swear, I'm getting just "this" close to running this entire project with manual files.

Gary

Last fiddled with by gd_barnes on 2009-08-17 at 22:00

2009-08-17, 22:01   #1163
gd_barnes

May 2007
Kansas; USA

5²×421 Posts

Quote:
 Originally Posted by mdettweiler Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline.
You told me they were back at 3 days again quite a while ago after everyone had this big argument over that. The agreement was that David's would be 1 day and mine 3 days except for IB9000 that was put at 2 days. Oh well, never mind. In the future, I'll check the JobMaxTime myself and set it at whatever I deem appropriate and simply let everyone know what that is. The democratic process on that has not worked at all.

Last fiddled with by gd_barnes on 2009-08-17 at 22:05

2009-08-17, 22:04   #1164
kar_bon

Mar 2006
Germany

3²·5²·13 Posts

The pairs from GB7000 were assigned to me about 1 hour ago, just before the outage! Pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds, and my WUCacheSize = 3! So this shouldn't happen! If Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly.

And the pair (345 957466) from port GB8000: I don't know why this happens. This pair is not in the results file! I processed that n-range on 2009-08-06 (see the results file), so why wasn't this pair handed out earlier?

So please look in the joblist to see who has this pair assigned! It's still in the status report as the first unprocessed one!

Last fiddled with by kar_bon on 2009-08-17 at 22:07

2009-08-17, 22:04   #1165
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by gd_barnes Karsten, I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping. But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now. If you can try re-returning your rejected results to the server, go for it. I swear, I'm getting just "this" close to running this entire project with manual files. Gary
I think it had something to do with the fact that when MooMoo unreserved the various reservations he had at the tail end of the mini-drive, we loaded some stuff in a funny order. I don't exactly remember how we did it all at the time, but at any rate, it seems that's the explanation for it.

Nothing to worry about, everything should straighten itself out in the end.

2009-08-17, 22:08   #1166
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
 Originally Posted by kar_bon The pairs from GB7000 were assigned to me about 1 hour ago, just before the outage! Pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds, and my WUCacheSize = 3! So this shouldn't happen! If Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly. And the pair (345 957466) from port GB8000: I don't know why this happens. This pair is not in the results file! I processed that n-range on 2009-08-06 (see the results file), so why wasn't this pair handed out earlier?
Regarding the G7000 pairs, my guess is that due to the sudden interruption of power from the outage, the server didn't have the chance to update joblist.txt with the latest happenings, and thus it lost the last minute or two of data. That would quite believably cause a few rejected pairs. No big deal, they're small enough that hardly any work was wasted.
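
For anyone who wants to see that guess spelled out, here is a toy model of it. It is a sketch only: the `ToyServer` class, its methods, and the k/n pairs are invented for illustration and say nothing about how LLRnet actually stores joblist.txt. The assumption is simply that assignments live in memory and reach disk only on a periodic save.

```python
class ToyServer:
    """Toy model: assignments live in memory and are saved only periodically."""

    def __init__(self):
        self.assigned = {}  # in-memory state: pair -> user
        self.saved = {}     # what the on-disk joblist would contain

    def assign(self, pair, user):
        self.assigned[pair] = user

    def save(self):
        # Periodic save: copy the in-memory state to "disk".
        self.saved = dict(self.assigned)

    def crash_and_restart(self):
        # Power loss: anything assigned since the last save is forgotten.
        self.assigned = dict(self.saved)

    def accept_result(self, pair, user):
        # A result is accepted only if the server still thinks the user holds the pair.
        return self.assigned.get(pair) == user


server = ToyServer()
server.assign((101, 117001), "alice")  # made-up k/n pair and user
server.save()                          # joblist written to disk
server.assign((103, 117002), "alice")  # assigned after the last save
server.crash_and_restart()             # power outage

print(server.accept_result((101, 117001), "alice"))  # True
print(server.accept_result((103, 117002), "alice"))  # False
```

The second result is rejected because, after the restart, the server has no record of ever handing that pair out, and it is free to give it to someone else.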

As for G8000, see my last message for an explanation of that.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.