mersenneforum.org Testing....

2010-02-23, 06:02   #56
gd_barnes

"Gary"
May 2007
Overland Park, KS

10111000011111₂ Posts

Just got home now. Looking at things now. Sorry you missed all the admins online when you were on a couple of hours ago Karsten. I was on for 3-1/2 hours this afternoon and will be on for another 3 hours now.

Max, this pruning thing is really starting to bother me but I think it's something that's existed in LLRnet for a long time. It seems that it takes far longer to prune pairs than it should. Anyway, here is what I'm not understanding:

1. The first few pairs of the file that were all small primes for k=3 (n=1K-10K primes) were shown as immediately rejected by the client.

2. When I look in the rejected file on the server for the rejected client results in #1, they aren't there.

3. When I look in the regular results on the server for the rejected client results in #1, 1 out of 5 of them ARE there.

4. When I look in joblist.txt for the rejected client results in #1, 4 out of 5 of them ARE there.

The rejected client pairs are:
3 1274
3 3276
3 4204
3 5134
3 7559

Pairs still in joblist.txt and knpairs.txt:
3 3276
3 4204
3 5134
3 7559

Pair in results.txt on the server:
3 1274

So for some reason, the server wouldn't "take" 4 out of the 5 small k=3 primes results. Please note that these are NOT In the rejected SERVER pairs. They only show as rejected on the client.

I think what I'm going to do is stop the server, clear everything out completely, and reload the server. Unfortunately I didn't save the pairs that I loaded. (Big mistake. I don't know what I was thinking.) I'll make sure I save them this time and possibly post them here. I'll also keep the files from this first big run. I'll put a file name extension of "-1st" on them.

I'm also changing that primes.txt file option. I'd like to see all of the primes from all 4 cores in one directory on each machine. That will be cool. :-)

Gary

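The manual cross-check in points 2 to 4 above could be scripted roughly as follows. This is only a sketch under assumptions not stated in the thread: it assumes each file lists a pair as k, whitespace, n on one line (possibly with other fields around it), and the name rejected.txt is a hypothetical stand-in for the server's rejected file. find_pairs and slurp are illustrative helper names, not part of LLRnet or do.pl.

```perl
use strict;
use warnings;

# Report which "k n" pairs occur in a block of text.
# Assumption: a pair appears on one line as k, spaces or tabs, then n.
sub find_pairs {
    my ($text, @pairs) = @_;
    my %found;
    for my $pair (@pairs) {
        my ($k, $n) = split ' ', $pair;
        # The lookarounds stop "3 1274" from matching inside "13 1274" or "3 12745".
        $found{$pair} = ($text =~ /(?<!\d)\Q$k\E[ \t]+\Q$n\E(?!\d)/) ? 1 : 0;
    }
    return %found;
}

# Read a whole file into a string; returns "" if the file cannot be opened.
sub slurp {
    my ($path) = @_;
    open(my $fh, "<", $path) or return "";
    local $/;                       # read everything in one go
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : "";
}

# The five pairs the client reported as rejected.
my @rejected = ("3 1274", "3 3276", "3 4204", "3 5134", "3 7559");

# rejected.txt is a hypothetical file name; adjust to the real server files.
for my $file ("joblist.txt", "knpairs.txt", "results.txt", "rejected.txt") {
    my %found = find_pairs(slurp($file), @rejected);
    my @hits  = grep { $found{$_} } @rejected;
    print "$file: ", (@hits ? join(", ", @hits) : "none"), "\n";
}
```

Run from the server directory, this prints one line per file listing which of the rejected pairs it contains, which makes it easier to repeat the comparison after the server is reloaded.
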
2010-02-23, 06:27   #57
kar_bon

Mar 2006
Germany

2,999 Posts

So it's a pruning error?

I thought of this: the knpairs file on the server contains a blank line or something else, because this error occurred almost instantly when k=209 was at n=250k (with your 31 cores grabbing pairs). My last result not sent was:

30000000000000:M:1:2:258 2009 249720 -2 0AE06F5C6CAB155A

I started the script for G4000 then and just got connection errors half an hour ago! Why? I can't connect to G4000!

Last fiddled with by kar_bon on 2010-02-23 at 06:29

2010-02-23, 06:46   #58
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

6249₁₀ Posts

Quote:
 Originally Posted by gd_barnes Just got home now. Looking at things now. Sorry you missed all the admins online when you were on a couple of hours ago Karsten. I was on for 3-1/2 hours this afternoon and will be on for another 3 hours now. Max, this pruning thing is really starting to bother me but I think it's something that's existed in LLRnet for a long time. It seems that it takes far longer to prune pairs than it should. Anyway, here is what I'm not understanding: 1. The first few pairs of the file that were all small primes for k=3 (n=1K-10K primes) were shown as immediately rejected by the client. 2. When I look in the rejected file on the server for the rejected client results in #1, they aren't there. 3. When I look in the regular results on the server for the rejected client results in #1, 1 out of 5 of them ARE there. 4. When I look in joblist.txt for the rejected client results in #1, 4 out of 5 of them ARE there. The rejected client pairs are: 3 1274 3 3276 3 4204 3 5134 3 7559 Pairs still in joblist.txt and knpairs.txt: 3 3276 3 4204 3 5134 3 7559 Pair in results.txt on the server: 3 1274 So for some reason, the server wouldn't "take" 4 out of the 5 small k=3 primes results. Please note that these are NOT In the rejected SERVER pairs. They only show as rejected on the client. I think what I'm going to do is stop the server, clear everything out completely, and reload the server. Unfortunately I didn't save the pairs that I loaded. (Big mistake. I don't know what I was thinking.) I'll make sure I save them this time and possibly post them here. I'll also keep the files from this first big run. I'll put a file name extension of "-1st" on them. I'm also changing that primes.txt file option. I'd like to see all of the primes from all 4 cores in one directory on each machine. That will be cool. :-) Gary
I'm not entirely sure what happened here so agreed, probably best to clean out and reload the server to make sure this wasn't a fluke from some boo-boo in one of the files or something like that. BTW, when you restart the server, try changing prunePeriod to 15 minutes in llr-serverconfig.txt. That should make the pruning less of an issue.

2010-02-23, 06:46   #59
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by kar_bon so it's a pruning error? i thought of this: the knpairs-file on the server contains a blank line or something else because this error occurs almost instantly when k=209 was at n=250k (with your 31 cores grabbing pairs my last results not sent was 30000000000000:M:1:2:258 2009 249720 -2 0AE06F5C6CAB155A). i started the script for G4000 then and just got connection errors half an hour ago! why? can't connect to G4000!
Man, you're right, that is weird...I can't connect to the server machine at all. I wonder if something went kapooey over on Gary's end?

2010-02-23, 06:52   #60
kar_bon

Mar 2006
Germany

2,999 Posts

I wanted to receive 100 WUs for an offline PC but got only 50 at once, plus an error that the pairs won't be accepted because someone else did them. And I could not send all my results! I've got to go to work and have no new pairs for my i7 and laptop! sh...

2010-02-23, 06:55   #61
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by kar_bon want receive 100 WU's for offline pc, got only 50 at once, and error that pairs won't accepted: someone others did them. and i could not send all results! got to go to work and no new pairs for my i7 and laptop! sh...
Eh? That's weird. I'm not even getting into the server, so I'm not sure how you even got 50 workunits, let alone 100.

2010-02-23, 07:11   #62
gd_barnes

"Gary"
May 2007
Overland Park, KS

11,807 Posts

OK, guys, you way jumped the gun on me. From my last post, I hadn't stated that I had cleared everything out and reloaded the server yet. I've been playing around with some things; starting and stopping the server a couple of times and re-clearing some things. I didn't think anyone was around. Sorry.

Anyway, port 9950 has now been officially loaded back up and will remain going now. Max, attached are the pairs that I loaded into it.

2 problems:

1. I changed the appropriate option to false in the do.pl program but the primes are still writing to primes.txt in the individual directories instead of one directory above. Can you run a specific test on that on your end?

2. I changed the iterations to 1000000 in do.pl yet it's still displaying every 10000 iterations. (This sure seems like a tough thing to get rid of! Why is the default so small?) The continual extra display is driving me batty. lol Anyway, I made sure there was no previously existing .ini file in each directory.

One more thing: Don't forget about the problem trying to quit out of the clients when they can't get pairs. It is a serious major hassle to stop them and is part of the reason it took me a while to stop-start all of my clients. What I finally had to do after hitting Ctl-C several times on each (which turned out to not be necessary) is go to the system manager and kill all 4 instances of do.pl followed by killing all 4 instances of llrnet. If I only killed do.pl, the clients would try to "come back". It was really weird.

Gary
Attached Files
 knpairs.txt.tar.gz (60.4 KB, 155 views)

Last fiddled with by gd_barnes on 2010-02-23 at 07:36

2010-02-23, 07:54   #63
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
 Originally Posted by gd_barnes OK, guys, you way jumped the gun on me. From my last post, I hadn't stated that I had cleared everything out and reloaded the server yet. I've been playing around with some things; starting and stopping the server a couple of times and re-clearing some things. I didn't think anyone was around. Sorry. Anyway, port 9950 has now been officially loaded back up and will remain going now. Max, attached are the pairs that I loaded into it. 2 problems: 1. I changed the appropriate option to false in the do.pl program but the primes are still writing to primes.txt in the individual directories instead of one directory above. Can you run a specific test on that on your end?
Oh! Duh, I see it now. You see this bit of code down in the checkForPrimes() subroutine?
Code:
# If individualPrimeLog is set to true, we put primes.txt in the working directory.
# Otherwise, we put it in the parent directory.
if(individualPrimeLog) { open(PRIMELOG, ">>primes.txt"); }
else { open(PRIMELOG, ">>", "../primes.txt"); }
print PRIMELOG $line . "\n";
close(PRIMELOG);

# If beepOnPrime is set to true, then beep (note: may not be supported on all configurations)
print "\a";

The part that I put in bold (the individualPrimeLog inside the if condition) needs to be $individualPrimeLog instead. Without the $ sigil, Perl treats it as a bareword string, which is always true, so primes.txt always ends up in the working directory. I had a brain fart and forgot I was programming in Perl for a moment. I'll upload corrected files shortly.
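For reference, a minimal sketch of what the corrected logic might look like. It is not the actual do.pl code: the helper names prime_log_path and log_prime are hypothetical, and the lexical filehandle and error check are additions for illustration. With use strict enabled, the original bareword would have been rejected at compile time ("Bareword not allowed while strict subs in use") instead of silently evaluating to true.

```perl
use strict;
use warnings;

# Hypothetical setting; the real do.pl defines its own configuration variables.
my $individualPrimeLog = 0;    # false: log to the parent directory

# Decide where primes.txt lives, based on the (now correctly sigiled) flag.
sub prime_log_path {
    my ($individual) = @_;
    return $individual ? "primes.txt" : "../primes.txt";
}

# Append one line to the chosen log, failing loudly if it cannot be opened.
sub log_prime {
    my ($line, $individual) = @_;
    my $path = prime_log_path($individual);
    open(my $fh, ">>", $path) or die "Cannot open $path: $!";
    print $fh $line . "\n";
    close($fh);
}

# Usage (commented out so the sketch has no side effects when loaded):
# log_prime("3*2^1274-1 is prime!", $individualPrimeLog);
```

Keeping the path decision in its own small subroutine makes the flag easy to check without touching the filesystem.
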

Quote:
 2. I changed the iterations to 1000000 in do.pl yet it's still displaying every 10000 iterations. (This sure seems like a tough thing to get rid of! Why is the default so small?) The continual extra display is driving me batty. lol Anyway, I made sure there was no previously existing .ini file in each directory.
Did you stop and restart do.pl after making the change? It won't take effect until you do so. Also, keep in mind that it won't take effect until the next k/n pair after the one currently in progress when you stopped the program to change it; the script only writes out llr.ini at the beginning of each batch (otherwise it would mess up processing of the batch).
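To illustrate the timing described above, here is a rough sketch of a script that writes llr.ini only once per batch. Everything in it is an assumption for illustration: the subroutine names are hypothetical, the ini key OutputIterations is assumed rather than confirmed by this thread, and the real do.pl may be organized quite differently.

```perl
use strict;
use warnings;

# Build the text of an ini file from key/value settings (keys sorted for stable output).
sub ini_text {
    my (%settings) = @_;
    return join("", map { "$_=$settings{$_}\n" } sort keys %settings);
}

# Write the ini file once, before a batch starts.
sub write_ini {
    my ($path, %settings) = @_;
    open(my $fh, ">", $path) or die "Cannot write $path: $!";
    print $fh ini_text(%settings);
    close($fh);
}

# Hypothetical batch loop: llr.ini is rewritten only between batches, so a
# changed value is picked up by the next batch, not the one in progress.
sub run_batches {
    my ($iterations, @batches) = @_;
    for my $batch (@batches) {
        write_ini("llr.ini", OutputIterations => $iterations);   # assumed key name
        # ... hand $batch to LLR here ...
    }
}
```

Under this structure, editing the iteration count while a batch is running changes nothing until the loop comes around again, which matches the behavior described.
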

Quote:
 One more thing: Don't forget about the problem trying to quit out of the clients when they can't get pairs. It is a serious major hassle to stop them and is part of the reason it took me a while to stop-start all of my clients. What I finally had to do after hitting Ctl-C several times on each (which turned out to not be necessary) is go to the system manager and kill all 4 instances of do.pl followed by killing all 4 instances of llrnet. If I only killed do.pl, the clients would try to "come back". It was really weird.
Yes, as I mentioned before, I'll look into that; I haven't had time just yet but hopefully can do it tomorrow.

BTW, on a completely different topic, I never did get the chance to load up G6000; can you do that? Thanks.

2010-02-23, 07:57   #64
gd_barnes

"Gary"
May 2007
Overland Park, KS

11,807 Posts

OK on #1. Glad that's an easy fix.

On #2, I've started-stopped clients many times in all of this. I changed the # of iterations way earlier in the evening. There is definitely an issue there.

I'll load port 6000 tomorrow. Vaughan has pulled most cores off of it also and without me on it until tomorrow, it can wait now.

Last fiddled with by gd_barnes on 2010-02-23 at 08:51

2010-02-23, 08:38   #65
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by gd_barnes OK on #1. Glad that's an easy fix. On #2, I've started-stopped clients many times in all of this. I changed the # of iterations way earlier in the evening. There is definitely an issue there. I'll load port 6000 tomorrow.
After some further discussion with Gary over chat, I was able to squash #2. Gary, as you saw I applied the fix to the clients on jeepford, but those don't have #1 fixed; I'd recommend downloading the latest do.pl (which I just uploaded) and swapping them out.

Last fiddled with by mdettweiler on 2010-02-23 at 08:38

2010-02-23, 08:50   #66
gd_barnes

"Gary"
May 2007
Overland Park, KS

11,807 Posts

Quote:
 Originally Posted by mdettweiler After some further discussion with Gary over chat, I was able to squash #2. Gary, as you saw I applied the fix to the clients on jeepford, but those don't have #1 fixed; I'd recommend downloading the latest do.pl (which I just uploaded) and swapping them out.
I thought you fixed #1 on Jeepford also.
