mersenneforum.org  

2010-02-24, 23:10   #100
gd_barnes
"Gary"
May 2007
Overland Park, KS

3×5×7×113 Posts

I now have a serious concern about the most recent stress test. With the 5 errors mentioned earlier fixed, there should be no problems. But the same thing has now happened 3 times in a row:

1. The server "loses" several pairs right at the very beginning. In this case, it is pairs numbered 6 thru 10. The first 5 went through OK. (BTW, my cache was set to 10 for this test.)

2. The server "loses" a large # of pairs at the very end. In this case, 42 of them, which is the fewest it's lost in any of the 3 stress tests I've run. (Likely because I was only running 4 cores vs. 31 cores.) Checking confirmed that it was the final 42 pairs.

They are just sitting in knpairs.txt and joblist.txt as though they were handed out and never processed. Yet checking my clients confirmed that they were processed.

I don't know if this is stress-related or related to problems in the Linux client/script. This occurred when running just one quad, which is effectively like 1000+ clients at n=~400K, so it makes a pretty decent stress test. I may need to run the Windows client to see if it has the same problem; I can simulate a similar load with 4 cores of my i7 and the same knpairs loaded in the server.

I initially thought that it might be related to the fact that all of the first few pairs are prime, but the same issue seems to be happening at the end of the file as well as at the beginning.

For reference, I'm attaching the final knpairs that didn't process and the joblist. See a few posts back where I posted the entire knpairs file. The prune period was set to 15 mins and the server ran dry some 9 hours ago, so these are not just straggling pairs that the server has yet to receive.
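To make the "lost pairs" check repeatable, the comparison described above (pairs still sitting in knpairs.txt/joblist.txt that the clients report as done) could be scripted. This is a minimal sketch in Lua, the language of llrnet.lua and llrserver.lua, written against the Lua 5.1+ standard library. It assumes both inputs have already been reduced to one "k n" pair per line; the real knpairs.txt, joblist.txt and client log formats may differ, and the names `ParsePairs` and `LostPairs` are hypothetical, not part of LLRnet.

```lua
-- Hypothetical diagnostic, not part of LLRnet: list the k/n pairs that the
-- server still holds as pending although a client log reports them done.
-- Assumes one "k n" pair per line; other lines (e.g. a header) are skipped.
function ParsePairs(text)
  local set, order = {}, {}
  for line in text:gmatch("[^\r\n]+") do
    local k, n = line:match("^%s*(%d+)%s+(%d+)%s*$")
    if k then
      local key = k .. " " .. n
      if not set[key] then
        set[key] = true
        order[#order + 1] = key
      end
    end
  end
  return set, order
end

-- Pairs present in both lists, kept in pending (file) order, so a run of
-- "lost" pairs at the start or the end of the file stands out.
function LostPairs(pendingText, clientDoneText)
  local _, pendingOrder = ParsePairs(pendingText)
  local doneSet = ParsePairs(clientDoneText)
  local lost = {}
  for _, key in ipairs(pendingOrder) do
    if doneSet[key] then
      lost[#lost + 1] = key
    end
  end
  return lost
end
```

Keeping the output in file order makes it easy to see whether the lost pairs cluster at the start (pairs 6 thru 10 here) or at the end of the file.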


Gary
Attached Files
File Type: gz joblist-knpairs.tar.gz (909 Bytes, 113 views)

Last fiddled with by gd_barnes on 2010-02-24 at 23:11
2010-02-24, 23:15   #101
gd_barnes
"Gary"
May 2007
Overland Park, KS

3·5·7·113 Posts

Quote:
Originally Posted by kar_bon
See post #97 in the first code block: it's llrnet.lua.

I'll use the same link as in post #1 for any new version.

I'll try to implement the other options next time; not sure I'll get to all of them today.

BTW: I thought of another helpful output:
when starting the script, first display the most important settings from llr-clientconfig.txt!

Code:
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-02-10, Version 0.61  |
+-------------------------------------+

Current configuration:
server = "nplb-gb1.no-ip.org"
port = 9950
username = "kar_bon"
WUCacheSize=1
That would have saved some time running and checking for errors in the first tests with the script (you know: I forgot to change the username in your settings).

Suggestions?

That's a good idea on displaying that info. But if you do it, we need to make sure Max agrees and that he can change the Linux client.
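A minimal sketch of the proposed startup banner, in Lua since that is what the clients are written in. It assumes llr-clientconfig.txt consists of simple `key = value` lines plus `--` comments, as in the example above; `ReadConfig`, `ConfigBanner` and the `IMPORTANT_KEYS` list are hypothetical names. It uses Lua 5.1+ string methods, while the LLRnet scripts appear to use the older global `format`/`date` style, so it may need adapting before going into either client.

```lua
-- Hypothetical sketch: settings to echo at startup, in display order.
IMPORTANT_KEYS = { "server", "port", "username", "WUCacheSize" }

-- Parse `key = value` lines; values are kept exactly as written (quotes
-- included). Assumes no value contains "--", which is treated as a comment.
function ReadConfig(text)
  local cfg = {}
  for line in text:gmatch("[^\r\n]+") do
    local clean = (line:gsub("%-%-.*$", ""))
    local key, value = clean:match("^%s*([%w_]+)%s*=%s*(.-)%s*$")
    if key and value ~= "" then
      cfg[key] = value
    end
  end
  return cfg
end

-- Build the "Current configuration" block; a missing key is flagged.
function ConfigBanner(cfg)
  local out = { "Current configuration:" }
  for _, key in ipairs(IMPORTANT_KEYS) do
    out[#out + 1] = string.format("%s = %s", key, cfg[key] or "(not set)")
  end
  return table.concat(out, "\n")
end

-- Usage (assumed file name from the thread):
-- local f = assert(io.open("llr-clientconfig.txt", "r"))
-- print(ConfigBanner(ReadConfig(f:read("*a")))); f:close()
```

Showing "(not set)" for a missing key makes a forgotten setting stand out, which is the kind of mistake (an unchanged username) the banner is meant to catch.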


Gary
2010-02-25, 01:02   #102
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1100001101010₂ Posts

Quote:
Originally Posted by gd_barnes
I now have a serious concern about the most recent stress test. With the 5 errors mentioned earlier fixed, there should be no problems. But the same thing has now happened 3 times in a row:

1. The server "loses" several pairs right at the very beginning. In this case, it is pairs numbered 6 thru 10. The first 5 went through OK. (BTW, my cache was set to 10 for this test.)

2. The server "loses" a large # of pairs at the very end. In this case, 42 of them, which is the fewest it's lost in any of the 3 stress tests I've run. (Likely because I was only running 4 cores vs. 31 cores.) Checking confirmed that it was the final 42 pairs.

They are just sitting in knpairs.txt and joblist.txt as though they were handed out and never processed. Yet checking my clients confirmed that they were processed.

I don't know if this is stress-related or related to problems in the Linux client/script. This occurred when running just one quad, which is effectively like 1000+ clients at n=~400K, so it makes a pretty decent stress test. I may need to run the Windows client to see if it has the same problem; I can simulate a similar load with 4 cores of my i7 and the same knpairs loaded in the server.

I initially thought that it might be related to the fact that all of the first few pairs are prime, but the same issue seems to be happening at the end of the file as well as at the beginning.

For reference, I'm attaching the final knpairs that didn't process and the joblist. See a few posts back where I posted the entire knpairs file. The prune period was set to 15 mins and the server ran dry some 9 hours ago, so these are not just straggling pairs that the server has yet to receive.


Gary
I have to wonder if this has something to do with what I suggested before: that the server might not "know" it's time to prune unless there's actually activity happening. I doubt it has a separate thread devoted to monitoring such things, so that kind of behavior would indeed be expected. In fact, come to think of it, in the past I was able to "trigger" an overdue prune by sending in a completely bogus result that I expected to be rejected; that was enough to "wake it up".
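The hypothesis above can be pictured with a toy model in Lua. This is an assumption about how the server behaves, not code from llrserver.lua: the prune check runs only as a side effect of handling a request, so no amount of idle time requeues an overdue pair until something (even a bogus result) arrives. Times are plain numbers in minutes, and `NewServer`, `HandOut` and `HandleRequest` are hypothetical names. It models only the prune trigger, not the question of results that never reached the server.

```lua
-- Toy model, NOT taken from llrserver.lua: the server has no timer of its
-- own, so overdue pairs are only requeued when a request makes it check.
function NewServer(pruneMinutes)
  return { pruneMinutes = pruneMinutes, handedOut = {} }
end

-- Record that a pair was handed to a client at time `now` (minutes).
function HandOut(server, pair, now)
  server.handedOut[pair] = now
end

-- Handle any incoming request (valid or bogus); pruning is a side effect.
-- Returns the pairs that were overdue and are no longer marked handed out.
function HandleRequest(server, now)
  local overdue = {}
  for pair, since in pairs(server.handedOut) do
    if now - since > server.pruneMinutes then
      overdue[#overdue + 1] = pair
    end
  end
  for _, pair in ipairs(overdue) do
    server.handedOut[pair] = nil
  end
  return overdue
end
```

In this model, a pair handed out at minute 0 with a 15-minute prune period is still marked as handed out 9 hours later, until the first request at minute 540 finally triggers the prune.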
2010-02-25, 01:13   #103
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

2·5⁵ Posts

Quote:
Originally Posted by gd_barnes
Max,

It took me a while but I finally concluded what you did: it's much better to "kill" the Linux client with the system manager than to do Ctrl-C. There were several times when I noticed it took 3-4 Ctrl-C's to kill it, usually on small tests -or- when the server had dried. (Don't quote me on the exact scenarios, but I do know that sometimes it didn't want to "die" on the first Ctrl-C.)

Can you please put something in the documentation about it being best to kill the clients when stopping them?
I wouldn't recommend using kill as a matter of course, since that won't give LLR a chance to save its checkpoint file. For small tests like the ones we're testing with it's not a terribly big deal, but it's not a good thing to recommend to users in general. It would probably be better to just say in the readme that sometimes you have to Ctrl-C it a few times to kill it (especially with small tests or when it's getting connection errors). For connection errors, though, it shouldn't hurt to just kill it the "hard" way.
2010-02-25, 03:42   #104
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

2·5⁵ Posts

Quote:
Originally Posted by gd_barnes
That's a good idea on displaying that info. But if you do it, we need to make sure Max agrees and that he can change the Linux client.
Yeah, I suppose I could do that--shouldn't be too hard.
2010-02-25, 06:04   #105
gd_barnes
"Gary"
May 2007
Overland Park, KS

27131₈ Posts

Quote:
Originally Posted by mdettweiler
I have to wonder if this has something to do with what I suggested before: that the server might not "know" it's time to prune unless there's actually activity happening. I doubt it has a separate thread devoted to monitoring such things, so that kind of behavior would indeed be expected. In fact, come to think of it, in the past I was able to "trigger" an overdue prune by sending in a completely bogus result that I expected to be rejected; that was enough to "wake it up".
Hmm. That doesn't quite hold water here. First, the first few pairs in the knpairs would have been processed long before. Second, the client shows the pairs as processed and sent to the server, yet they are not showing up in the results. For the prune to work, wouldn't the results have to be there? You might take a peek at port 9985. The files from yesterday's run, including joblist, knpairs, results, and stdout, all have a "-1st" extension on them. Shortly I'm going to run it again, but I'll make the file much smaller this go around.

Unfortunately I've already stopped the server, saved off the applicable files, and reloaded it. I'll have to retest with a smaller file that dries in < 1 hour or so instead of waiting 6-7 hours for it to dry.

Quote:
Originally Posted by mdettweiler
I wouldn't recommend using kill as a matter of course, since that won't give LLR a chance to save its checkpoint file. For small tests like the ones we're testing with it's not a terribly big deal, but it's not a good thing to recommend to users in general. It would probably be better to just say in the readme that sometimes you have to Ctrl-C it a few times to kill it (especially with small tests or when it's getting connection errors). For connection errors, though, it shouldn't hurt to just kill it the "hard" way.
OK, agreed. Makes sense. For the most part, I've been able to make it stop by the 2nd Ctrl-C and sometimes on the 1st. So yeah, just noting that you might have to hit Ctrl-C something like 2-4 times to get it to stop should be OK.


Gary

Last fiddled with by gd_barnes on 2010-02-25 at 06:05
2010-02-25, 06:05   #106
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

2×5⁵ Posts

I've now tested the do.pl script on Windows for most of today (since around 10 AM EST), and have encountered no problems except the small factor issue. BTW, I did a bit of investigating on that and found a few things:

-The first of the four results (which came before the small factor in the batch) was accepted fine.
-On the server end, the small-factor result was received and accepted, though with the NewPGen header (!) put in place of the residual.
-The remaining 3 results in the batch, all of which came after the small-factor one, were rejected and subsequently thrown out by the client.

I'm not positive, but I think normal LLRnet is designed to handle small factors correctly (though I haven't actually tested it). At any rate, no properly sieved file should ever have factors small enough for LLR to turn up, so I imagine it wouldn't be a big deal if we didn't bother to fix it: if there are small factors in the server's pairs, then there's a much bigger problem than just a few abandoned tests. Not to mention that if LLRnet doesn't have a precedent for handling these (as I said, I'm not sure if it does), then we wouldn't be able to fix it at all without adding code for it on the server end (which we probably don't want to get into).
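One way to keep a single malformed result from going out with the rest of a batch would be a client-side sanity check on the residue before submitting. This is a hypothetical sketch, not existing LLRnet code: it assumes a valid result carries either "0" (the prime convention discussed in this thread) or a 64-bit residue written as 16 hex digits, and the names `IsValidResidue` and `SplitBatch` are made up for illustration. Whether the server would then accept the remaining results is untested.

```lua
-- Hypothetical check: "0" (prime) or a 64-bit residue as 16 hex digits.
function IsValidResidue(residue)
  if type(residue) ~= "string" then
    return false
  end
  if residue == "0" then
    return true
  end
  return #residue == 16 and residue:match("^%x+$") ~= nil
end

-- Split a batch into results that look sane and ones to set aside, so a
-- NewPGen header that ended up in the residue field is not submitted.
function SplitBatch(results)
  local good, bad = {}, {}
  for _, r in ipairs(results) do
    if IsValidResidue(r.residue) then
      good[#good + 1] = r
    else
      bad[#bad + 1] = r
    end
  end
  return good, bad
end
```

A header-like string in the residue field (or the old 16-x prime marker) fails this check and is set aside, while the results queued around it are kept.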

Other than that, though, do.pl seems to be working perfectly. Gary, have you gotten the chance to test the latest version yet on Linux and do the stress test you were planning?
2010-02-25, 06:14   #107
gd_barnes
"Gary"
May 2007
Overland Park, KS

3·5·7·113 Posts

Quote:
Originally Posted by mdettweiler
I've now tested the do.pl script on Windows for most of today (since around 10 AM EST), and have encountered no problems except the small factor issue. BTW, I did a bit of investigating on that and found a few things:

-The first of the four results (which came before the small factor in the batch) was accepted fine.
-On the server end, the small-factor result was received and accepted, though with the NewPGen header (!) put in place of the residual.
-The remaining 3 results in the batch, all of which came after the small-factor one, were rejected and subsequently thrown out by the client.

I'm not positive, but I think normal LLRnet is designed to handle small factors correctly (though I haven't actually tested it). At any rate, no properly sieved file should ever have factors small enough for LLR to turn up, so I imagine it wouldn't be a big deal if we didn't bother to fix it: if there are small factors in the server's pairs, then there's a much bigger problem than just a few abandoned tests. Not to mention that if LLRnet doesn't have a precedent for handling these (as I said, I'm not sure if it does), then we wouldn't be able to fix it at all without adding code for it on the server end (which we probably don't want to get into).

Other than that, though, do.pl seems to be working perfectly. Gary, have you gotten the chance to test the latest version yet on Linux and do the stress test you were planning?

The version that I posted yesterday is the latest version. Correct? lol It is that latest version that I ran my big stress test on yesterday. About halfway through the stress test, I changed the residue for a prime from 16 x's to a single "0", like the Windows client. What I want to test today is the same script's cancellation of pairs, plus the problem of the server not processing pairs at the beginning and end of the file.

I just got back in after a long day and need to do a couple of things yet. But I plan to test in the wee hours here for 2-4 hours.

BTW, I also observed what you did on a pair that had a factor of 5: it put the file header in the residue. You know what? I think that might explain why the 4-5 pairs right after it were not accepted by the server even though the client processed them. Bingo! And...if what you said about the final pruning is causing them not to be processed at the end, well...that might completely explain what happened yesterday with the pairs that weren't processed by the server. That said, the server never showed the results for the missing pairs at the end, so I'm questioning how a final prune would actually be able to work.

Agreed that a small factor should never happen on a reasonably sieved file. As a programmer, though, I'd like to code around it, but not at the expense of a lot of extra time/testing. I'll see what the code looks like.


Gary

Last fiddled with by gd_barnes on 2010-02-25 at 06:17
2010-02-25, 06:24   #108
gd_barnes
"Gary"
May 2007
Overland Park, KS

27131₈ Posts

Quote:
Originally Posted by kar_bon
It's not as easy as I thought, but the following lines will do the trick.
Code:
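result, residue = primeTest(t, format("%s %s", k, n))
-- a result of 0 means the pair was found prime:
-- report the residue as "0", as the Windows client does
if result == 0 then
   residue = "0"
end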
So, if a prime is found, the residue is set to '0' and all is OK!

Note: this is not needed for the script, only for the 'old' version of the LLRnet client.

Karsten,

I was looking to make this change to the residue for a prime in llrnet.lua on the Linux side, but it appears to default to "0" already. Here is the code:

Code:
         -- perform prime test !
         if not asynchronous then
            Logout() -- logout before performing computation
         end
--       UpdateStatus(format("Working on : %s/%s (%s)", k, n, t))
--       print(format("Working on : %s/%s (%s)", k, n, t))
--       result, residue = primeTest(t, format("%s %s", k, n))
         result, residue = 0, "0"
         -- check user interruption
         if stopCheck() then
            return -- return with no error
         end
      end
      SemaWait(semaphore)

What change is needed to accomplish what you are talking about?

Edit: The code in the Windows client is the same. Please enlighten me.

Last fiddled with by gd_barnes on 2010-02-25 at 06:26
2010-02-25, 06:31   #109
kar_bon
Mar 2006
Germany

5·601 Posts

Quote:
Originally Posted by gd_barnes
BTW, I also observed what you did on a pair that had a factor of 5: it put the file header in the residue. You know what? I think that might explain why the 4-5 pairs right after it were not accepted by the server even though the client processed them. Bingo! And...if what you said about the final pruning is causing them not to be processed at the end, well...that might completely explain what happened yesterday with the pairs that weren't processed by the server. That said, the server never showed the results for the missing pairs at the end, so I'm questioning how a final prune would actually be able to work.

Agreed that a small factor should never happen on a reasonably sieved file. As a programmer, though, I'd like to code around it, but not at the expense of a lot of extra time/testing. I'll see what the code looks like.
I had a look at llrserver.lua:
try this: there are functions called PrunePairs() and PruneJoblist() (called in function ProxyUpdate). Right before every place one of them is called, add an output on the server such as print("PrunePairs Call 1") (and the same for PruneJoblist), and give every call its own number, so you can tell which call invokes the function.
Even better, put the date/time into it:
Code:
print(format("PrunePairs Call #1: [%s] ", date("%Y-%m-%d %H:%M:%S")))
Then test with a small number of pairs and a prune time of 15 mins.

This lets you see where and when the server prunes; perhaps that's the issue: it only prunes when results are received.
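The same instrumentation could also be applied through a small wrapper instead of hand-writing each print line. This is a hypothetical Lua 5.1+ sketch (`Traced` is not an LLRnet function): it prints the label and a timestamp in the same "PrunePairs Call #1: [date time]" form and then calls the wrapped function with its original arguments. The optional logger argument exists only so the output can be captured in tests.

```lua
-- Hypothetical helper: wrap fn so each call first logs a numbered label
-- and the current local date/time, then runs fn unchanged.
function Traced(label, fn, log)
  log = log or print
  return function(...)
    log(string.format("%s: [%s]", label, os.date("%Y-%m-%d %H:%M:%S")))
    return fn(...)
  end
end
```

At the first call site in ProxyUpdate, `PrunePairs()` would become `Traced("PrunePairs Call #1", PrunePairs)()`, with a different number at each site. llrserver.lua appears to use an older Lua (global `format` and `date`), and forwarding `...` as shown requires Lua 5.1 or later, so this may need adapting there.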

Last fiddled with by kar_bon on 2010-02-25 at 06:36
2010-02-25, 06:34   #110
gd_barnes
"Gary"
May 2007
Overland Park, KS

3·5·7·113 Posts

Quote:
Originally Posted by kar_bon
I had a look at llrserver.lua:
try this: there are functions called PrunePairs() and PruneJoblist() (called in function ProxyUpdate). Right before every place one of them is called, add an output on the server such as print("PrunePairs Call 1") (and the same for PruneJoblist), and give every call its own number, so you can tell which call invokes the function.
Even better, put the date/time into it:
Code:
print(format("PrunePairs Call #1: [%s] ", date("%Y-%m-%d %H:%M:%S")))
Then test with a small number of pairs and a prune time of 15 mins.

Don't you ever sleep? lol

I'm going to need some help with this. I'm not clear on where in llrserver.lua it goes. Can you post an updated llrserver.lua file with this change in it?
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
