mersenneforum.org  

mersenneforum.org > Prime Search Projects > No Prime Left Behind

2008-05-23, 13:47   #34
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts

Quote:
Originally Posted by AES View Post
It must have been a connectivity issue on the T1. I don't see any problems with the llr server. I'm going to restart it just to make sure.
Hmm...I find it surprising that a connection hiccup could make the clients freeze. I've seen connection hiccups (mostly on my end) before--they never make my client freeze; it just gives an error.

Maybe my theory about the personal proxy is correct, but the server seemed back to normal again because it finally released all the sockets about the time Carlos reported that the server was back online?

Anyway, restarting the server should clear out any bugs left in the system. (Why do our servers always seem to go down right before a rally is to be held on them?)

2008-05-23, 15:05   #35
glennpat
May 2007
Minnesota USA
110001₂ Posts

Quote:
Originally Posted by gd_barnes View Post
Surely you jest.

5/19 Glennpat 1, me 1
5/20 none
5/21 Flatlander 1, me 1
5/22 MrOzzy 1


If we're averaging more than 1 a day, that's not too bad at our current n-levels. No other project can claim that they average 1 per day!

Regardless, from the rally this weekend, I expect us to get at least 4 or possibly 5-6 in 2 days. We had 3 in one day from the last rally.


Gary
Last night I printed out the stats from the server for the "primes by user" and today there are 2 more. Looks like those primes are not waiting for the rally.

2008-05-23, 15:12   #36
AES
Jul 2007
Tennessee
2⁵×19 Posts

I don't know what was going on. Everything seemed OK but something had obviously happened. Does anyone have the client console log from when this started?

It seems like the client should continue to process the WUCache even if the server goes completely offline.

Last fiddled with by AES on 2008-05-23 at 15:13

2008-05-23, 15:39   #37
em99010pepe
Sep 2004
2·5·283 Posts

Quote:
Originally Posted by AES View Post
I don't know what was going on. Everything seemed OK but something had obviously happened. Does anyone have the client console log from when this started?

I was lucky because I was using the machine when all my clients got stuck; please see the time of my post. I don't have the log.

Quote:
Originally Posted by AES View Post
It seems like the client should continue to process the WUCache even if the server goes completely offline.
Correct, but that doesn't happen...

Last fiddled with by em99010pepe on 2008-05-23 at 15:39

2008-05-23, 16:02   #38
Mini-Geek
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17·251 Posts

I've added one core. The other will go on in about half an hour, once it finishes balancing the time remaining between my two cores.

2008-05-23, 16:09   #39
Brucifer
Dec 2005
313₁₀ Posts

Not a good morning at all.......... :(

Twenty-nine llrnet instances locked up. One was running. The running one was one that I had set the cache to 50 on, and using the PG llrnet app. Two other PG llrnet app instances with the cache set to 15, and refill at 5 were locked up. And then all the rest were using the standard sr2 llrnet app and were locked up.

So that raises a question about the PG llrnet app: why did the small-cache instances lock up when the 50-cache instance didn't? I don't think the issue is the PG client failing to keep running when the "event" happened; I think the server or network clogged during the event, and then the clients hung because they couldn't get more work.

My two bits' worth is that the load on the server system/net just got too heavy?

I'm still not running as I haven't gone through and cleared everything.

2008-05-23, 17:05   #40
gd_barnes
May 2007
Kansas; USA
2·47·109 Posts

I haven't checked my main machines yet but at least 2 of my slower cores are sleeping on LLRnet right now. I'll make the rounds and see if others are having the same problem.

Unfortunately I have the cache set at 1 for all of them.


Gary

2008-05-23, 17:06   #41
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

It looks like the PrimeGrid client doesn't respond any differently than the stock client in regard to freezing up--whereas the "normal" Linux client will empty its workfile.txt cache before freezing up, the PrimeGrid version freezes up when it reaches its refill value--essentially, they freeze when they're supposed to get more work. (This is in contrast to the Windows client, which freezes with full cache and 99% progress.)

Going by gut feeling, I still think this might have something to do with the fact that personal proxies tend to lock up Windows LLRnet servers. After the rally, I think I'll try running a test on my local network to see if I can reproduce a situation like the freeze-up we had earlier today.
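
To make the refill behaviour discussed above concrete, here is a minimal sketch of a client loop that keeps working through its local cache when a refill attempt fails. This is not LLRnet's actual code: `run_client`, `fetch_work`, `test_candidate` and `ServerUnavailable` are all hypothetical names, and the default cache size of 15 with a refill threshold of 5 simply mirrors the settings mentioned earlier in the thread.

```python
from collections import deque


class ServerUnavailable(Exception):
    """Raised by the (hypothetical) fetch function when the server is unreachable."""


def run_client(cache, fetch_work, test_candidate,
               cache_size=15, refill_at=5, max_steps=100):
    """Process cached k/n pairs, trying to refill whenever the cache runs low.

    A failed refill is treated as non-fatal: the loop carries on with
    whatever is still cached instead of blocking on the server.
    """
    results = []
    for _ in range(max_steps):
        if len(cache) <= refill_at:
            try:
                # Ask only for enough pairs to top the cache back up.
                cache.extend(fetch_work(cache_size - len(cache)))
            except ServerUnavailable:
                pass  # server down: keep draining the remaining cache
        if not cache:
            break  # nothing left to test until the server returns
        pair = cache.popleft()
        results.append((pair, test_candidate(pair)))
    return results
```

The sketch only works if the network call fails (for example through a timeout) instead of blocking forever; a real client would presumably also wait and retry once the cache is empty rather than exit.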

2008-05-23, 17:25   #42
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
I haven't checked my main machines yet but at least 2 of my slower cores are sleeping on LLRnet right now. I'll make the rounds and see if others are having the same problem.

Unfortunately I have the cache set at 1 for all of them.


Gary
I would recommend having the cache set at 2, at the very least--otherwise you have about a half second to a second (depending on your connection latency and DNS server speed) of idle time between k/n pairs (since it can't start on the next one until it downloads it). It may not seem like much, but it adds up.

I generally, as a rule of thumb, keep a minimum cache of 5 for numbers this size on my Core 2 Duo. (That is, a cache of 5 on each core.) In fact, I usually use a cache size of 10 just to give myself some extra "padding" in case my internet connection skips out (which it does, for brief intervals, regularly; this is why I do only manual work while I'm on vacation). On slower machines such as my P3 1 GHz, I usually use a cache of 2 (about 40 minutes of work on that machine).
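
For a rough feel of how the idle time with a cache of 1 adds up, here is a small back-of-the-envelope sketch. The numbers are assumptions rather than measurements: roughly 20 minutes per pair (half of the "cache of 2 is about 40 minutes" figure for the P3) and a 1-second gap, the upper end of the range quoted above. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def idle_seconds_per_day(test_seconds, gap_seconds):
    """Seconds per day one core spends waiting for work with a cache of 1.

    Each cycle is one test followed by one download gap, so the core
    completes SECONDS_PER_DAY / (test_seconds + gap_seconds) cycles a day
    and sits idle for gap_seconds in each of them.
    """
    cycles_per_day = SECONDS_PER_DAY / (test_seconds + gap_seconds)
    return cycles_per_day * gap_seconds


# Assumed figures: ~20 minutes per pair on the P3, 1 second gap per pair.
p3_idle = idle_seconds_per_day(test_seconds=20 * 60, gap_seconds=1.0)
print(f"P3 1 GHz, cache of 1: about {p3_idle:.0f} s idle per day")
```

Faster cores finish each pair sooner and so hit the gap more often; passing a smaller `test_seconds` shows the daily loss growing, and it multiplies again across every core running with a cache of 1.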

2008-05-23, 17:38   #43
gd_barnes
May 2007
Kansas; USA
2·47·109 Posts

Bad...bad...bad...

All 18 of my LLRnet port 300 cores have been sleeping since 5:45 AM EDT this morning. (Ugh, HUGE loss of CPU cycles!) I've now 'killed' the LLRnet instances on all of them and tried to restart them. No luck...

We need to take some fast action with less than 1-1/2 hours before the rally.

David (IronBits) or Adam, we need to think about setting up another temporary server and loading it up with some work for the rally. Perhaps only Bruce and I should run on the temporary server, since we're the 2 heaviest hitters (I think) running the rally.

I'll be out for about a half hour here. If everyone can post their thoughts here, that would be great. I don't want a repeat of our 2nd rally.


Gary

2008-05-23, 17:46   #44
gd_barnes
May 2007
Kansas; USA
2×47×109 Posts

I've sent a note to IronBits to be on standby to set up a temporary LLRnet server for the rally.

Adam, I've killed all instances of the LLRnet server on my machines. Can you restart your server? I'm hoping that taking 18 instances of it away will help.


Gary

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.