2003-10-02, 20:40  #1 
P90 years forever!
Aug 2002
Yeehaw, FL
3^{2}·19·43 Posts 
Which exponents should be re-released for first-time tests?
GP2 has come up with several suggestions for creating a list of exponents to release as first-time tests. A few months ago I released exponents up to 16 million(?) that had no tests with non-zero error counts (the bottom two counters of the error count field).
GP2, can you post your proposals for open debate? Can we reach an agreement on what our rules should be?

If asking for first-time tests, the user should get an exponent with a reasonable chance of finding a prime. I chose the 16 million limit on the theory that a user would rather get an untested exponent at 20 million than a once-tested-maybe-bad exponent at 17 million. Maybe a sliding scale would be better: if there was only one error we trail the leading edge by 5 million, but if there were several errors we immediately kick it back for a retest. Ideas and debate welcome...

P.S. Next database update, the non-verified-LL database will contain the error count. 
2003-10-02, 23:24  #2 
Jul 2003
2·3·5 Posts 
It would be of considerable benefit in the long run to include in the hrf3.txt file the date the result was returned (the same goes for the "bad" list). The reason would be to correlate bad results from a particular user as a function of time. Numerous factors such as hardware changes/breakdowns, environmental heat, OC'ing, etc. can drastically change the reliability of a machine over time (for good or bad). With that information we could target suspected bad results with a high degree of accuracy, hence maintaining a high probability of the test yielding a true first-time LL.

Of course this approach doesn't help that much in the short term, but it can certainly make life simpler in the future. For now, the same data can be approximated using old hrf3.txt, status.txt, and cleared.txt files, then with manual grunt work identify suspected bad results. If the suspected bad results meet a density threshold, they are re-released for first-time tests. If the density of bad results falls below the threshold, then a group of Marin's Mersenne-aries volunteers could work on refining the list of suspected bad results for that user until they were either completed or met our threshold standards for re-release. As to what that threshold needs to be, I would think something on the order of 33-50% or more would warrant re-release, based on a sliding scale relative to the leading edge of first-time tests. 
2003-10-03, 00:32  #3 
Sep 2003
2583_{10} Posts 
OK, here are a few thoughts.
Most exponents only get a routine double-check a long time after the first-time LL test. If the first-time test gave an erroneous result, the worst-case scenario is that discovery of a new prime gets delayed for several years. So for obvious reasons, we'd like to accelerate the double-checking of any result where there's good reason to suspect it has a higher-than-average chance of being erroneous.

With this in mind, George already examines results returned with non-zero error codes and returns those exponents for early retesting. Non-zero error codes indicate a higher probability that the result was erroneous. Note though that the error code does not catch all bad results. In fact, 55% of results that turn out to be bad are returned with a zero error code! And conversely, some results with a non-zero error code turn out to be good anyway. So a lot of bad first-time results are slipping through the net.

How can we catch more of these bad first-time results? Well, it turns out that not all machines are created equal. Although the overall error rate for exponents above 4M runs around 3.5-4.0%, there is huge variation. The overwhelming majority of machines (90%+ of them) have a 0% error rate. Conversely, some machines have a much higher error rate, for whatever reason (hardware issues, overclocking, bad memory chips, underpowered power supply, poor quality electricity, etc.).

So my proposal is to identify the "error-prone" machines and target their first-time tests for early double-checking, regardless of whether those first-time tests returned a zero error code or not. We can identify error-prone machines by looking at their track record of old results in the data files, and checking to see if they have a history of returning results that were later confirmed to be bad. More specifics will follow in the next message.

Last fiddled with by GP2 on 2003-10-03 at 05:30 
2003-10-03, 00:56  #4 
Sep 2003
3^{2}·7·41 Posts 
What kind of exponents should be released for early retesting? Here are some thoughts.
The odds of finding a Mersenne prime are very small. However, if you do first-time LL testing your odds are about 25 to 35 times better than if you do routine double-checking (given that the overall error rate for exponents over 4M is around 3.5 to 4.0%). So we assume that people who do first-time LL testing only want to get exponents for which there is a "legitimate" chance of a prime. They don't want to get a double-check unless there is a good possibility that the first-time check was erroneous. And they probably don't want to get triple-checks at all, since the odds are very strong that one of the two original results was correct, and both returned non-prime. So that's the basic philosophy.

Should exponents less than M39 be released for early retesting? I'd say yes. First, George already does this with non-zero error code results. And second, I think people want to discover a prime even if it's not the current record-holder, just to get their name on the list of Mersenne discoverers. Any record will get broken in a couple of years anyway. Holding the record is a fleeting thing, but being a discoverer is forever.

So I'd say: Early double-checking? Yes. Early triple-checking? No. Exponents below M39 too? Yes. [Edit: early triple-checking would be done only if both existing tests were by error-prone machines]

A ban on triple-checking is very unfortunate though, because it's only through triple-checks that new bad results are confirmed. For many machines we currently have only inadequate statistical information as to whether they're error-prone or not, because they're doing mostly first-time tests of higher exponents. Perhaps such early triple-checking can be done through volunteer efforts, but there'd be a lot of them... it might be helpful if triple-checking was allowed after all. What do you folks think about this?

One other thing to consider: most people do not specifically ask for first-time LL checking. They keep the default of "whatever work makes the most sense". So theoretically such folks would not object to receiving an occasional triple-check if it makes "the most sense" for the overall good of the project. So, is it possible to make some small modifications to the client (and server) so that clients who accept "whatever makes the most sense" will randomly have a, say, 1% chance of getting a triple-check instead of a first-time LL test? Just wondering out loud...

Last fiddled with by GP2 on 2003-10-03 at 18:26 
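The 25-to-35x figure follows directly from the error rate: a double-check can only turn up a prime if the first-time test was wrong, so the odds ratio is roughly the reciprocal of the error rate. A back-of-the-envelope sketch (purely illustrative):

```python
# A double-check finds a prime only if the first-time test was erroneous,
# so first-time testing beats double-checking by roughly 1 / error_rate.
# With error rates around 3-4%, that gives the ~25-35x range quoted above.
for error_rate in (0.04, 0.035, 0.03):
    print(f"error rate {error_rate:.1%} -> first-time odds ~{1 / error_rate:.0f}x better")
```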
2003-10-03, 01:29  #5 
Sep 2003
101000010111_{2} Posts 
OK,
Finally some specifics. Calculate the error rate for confirmed results as follows: (bad)/(bad+good), where:

bad = results in BAD
good = results in LUCAS_V.TXT

Ignore HRF3.TXT (unconfirmed results) for now... more on that later.

So, if we set the standard for error-prone machines as:

- known error rate of 50% or more
- at least two bad results returned (to reduce the chances of a statistical fluke)

then we look for all unverified exponents returned by all such machines (unverified meaning the exponent doesn't have two matching results). There are 963 such exponents, of which 782 need double-checks and 181 need triple-or-higher checks.

Or, if we relax the standard to known error rates of 33% or more (instead of 50%), we get 1597 exponents, of which 1339 need double-checks and 258 need triple-or-higher checks.

So for starters, I'd propose those 782 [or 1339] double-checks to be released as first-time LL tests. We can try these for now and see how they turn out, and then perhaps I'd propose more...

[Edit: I forgot to filter out exponents which are already currently assigned in status.txt. That reduces the numbers slightly.
Old numbers (as originally posted): 50%: 1172 = 907 + 265; 33%: 1914 = 1531 + 383
New numbers: 50%: 963 = 782 + 181; 33%: 1597 = 1339 + 258]

Last fiddled with by GP2 on 2003-10-03 at 01:48 
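The selection rule above can be sketched in code. This is a hypothetical illustration: the per-machine counts would come from parsing BAD and LUCAS_V.TXT (not shown), and the machine names are made up.

```python
def error_prone_machines(bad_counts, good_counts, min_rate=0.50, min_bad=2):
    """Return the set of machine ids whose confirmed error rate
    bad/(bad+good) is at least min_rate, with at least min_bad bad
    results (to reduce the chance of a statistical fluke)."""
    prone = set()
    for m in set(bad_counts) | set(good_counts):
        bad = bad_counts.get(m, 0)
        good = good_counts.get(m, 0)
        if bad >= min_bad and bad / (bad + good) >= min_rate:
            prone.add(m)
    return prone

# Made-up counts for illustration:
bad = {"machineA": 3, "machineB": 1}
good = {"machineA": 2, "machineB": 10, "machineC": 5}
print(error_prone_machines(bad, good))  # only machineA: 3/5 = 60%, with >= 2 bad
```

Relaxing `min_rate` to 0.33 is the 33% variant discussed above.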
2003-10-03, 01:56  #6 
Sep 2003
3^{2}×7×41 Posts 
Here's the file for the 50% standard.
Note some of the very lowest exponents (in the 10.2M range) have probably already been assigned by now. 
2003-10-03, 01:56  #7 
Sep 2003
3^{2}×7×41 Posts 
Here's the file for the 33% standard.
Note some of the very lowest exponents (in the 10.2M range) have probably already been assigned by now. 
2003-10-03, 02:34  #8 
Sep 2003
3^{2}·7·41 Posts 
When looking for error-prone machines, we can look at verified results (verified good or verified bad), as mentioned in the previous messages.
But we can also look at unverified results. If, for instance, a machine has 5 unverified results needing a double-check and 7 unverified results needing a triple-check, it's probably error-prone. There are 7 known errors, and it's much more likely that those errors all came from the one machine, rather than 7 independent errors from 7 different other users. Remember, 90%+ of machines have a 0% error rate. So those 5 unverified results should probably be sent for early double-checking.

I'm not sure what the threshold should be. Perhaps:

- 50% [or 33%] or more of the unverified exponents need triple-checking
- at least two triple-checks required (to reduce the chance of a statistical fluke)

I'll run some numbers to see how many exponents this would generate. 
2003-10-03, 07:00  #9 
Sep 2003
101000010111_{2} Posts 
One final refinement: triple-checks would be done only if both original tests were done by error-prone machines (and a quadruple-check only if all 3 tests were done by error-prone machines, etc.).
If one LL test has been done, then:

- if it was done by an error-prone machine, schedule an immediate double-check
- if it was done by a non-error-prone machine, it will be routinely double-checked in due course

If two LL tests have been done, then:

- if both were done by error-prone machines, schedule an immediate triple-check
- if one or none were done by error-prone machines, it will be routinely triple-checked in due course

The two situations are more or less equivalent. As long as there is at least one presumed-good LL test for an exponent that came back with a non-prime result, there is no urgent need to schedule a verification test immediately. 
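Both cases reduce to one question: does the exponent have at least one presumed-good test? A minimal sketch (function and return values are my own names):

```python
def schedule(error_prone_flags):
    """One boolean per existing LL test for an exponent: True if that
    test came from an error-prone machine. An immediate re-test is
    needed only when no presumed-good result exists."""
    if error_prone_flags and all(error_prone_flags):
        return "immediate"
    return "routine"

print(schedule([True]))         # one error-prone test -> immediate double-check
print(schedule([False]))        # one presumed-good test -> routine double-check
print(schedule([True, True]))   # both error-prone -> immediate triple-check
print(schedule([True, False]))  # one presumed-good result -> routine triple-check
```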
2003-10-03, 17:54  #10 
Sep 2003
3^{2}×7×41 Posts 
OK,
To summarize: we define a presumed-good result as any result that was not returned by an error-prone machine. Any exponent that does not have at least one presumed-good result is a candidate for early re-LL-testing. Any exponent that does have at least one presumed-good result will get re-LL-tested in due course (in a few years' time).

How to define an error-prone machine? A preliminary definition, which might change over time:

At least 50% bad/(bad+good) with bad >= 2
or
At least 50% uv3_plus/(uv3_plus + uv2) with uv3_plus >= 2

where:

bad = results in BAD returned by that machine
good = results in LUCAS_V.TXT returned by that machine
uv2 = unverified-needs-a-second-check = results in HRF3.TXT returned by that machine, where only one result exists for that exponent (i.e., no other results were returned by other machines)
uv3_plus = unverified-needs-a-third-or-higher-check = results in HRF3.TXT returned by that machine, where two or more results exist for that exponent (returned by that machine and other machines)

Using this standard, we get 1407 exponents, of which 80 are triple-checks where both original tests were done by error-prone machines.

[Edit: we get 1407 exponents after we filter out anything in STATUS.TXT or CLEARED.TXT. We don't want to reassign anything already currently assigned, or interfere with anything recently cleared but not yet removed from the cleared list.]

So this list of 1407 is what I'm proposing at the moment. If it turns out well, we can lower the standard to 33% instead of 50% and do it again. Any comments or suggestions?

[Edit: the exponents in the attachment below (new_err50.zip) are not quite in numerical order. The final 80 start over again from 10.3M (those are the triple-checks with both existing tests done by error-prone machines).]

Last fiddled with by GP2 on 2003-10-04 at 04:56 
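The two-branch definition can be expressed as a small predicate. This is a sketch under the definitions above; it assumes the four counts have already been tallied per machine from the data files.

```python
def is_error_prone(bad, good, uv2, uv3_plus, rate=0.50, min_count=2):
    """bad/good: confirmed results for this machine in BAD / LUCAS_V.TXT.
    uv2: unverified results where no other machine has tested the exponent.
    uv3_plus: unverified results where a non-matching result from another
    machine already exists."""
    confirmed = bad + good
    unverified = uv2 + uv3_plus
    if confirmed and bad >= min_count and bad / confirmed >= rate:
        return True
    if unverified and uv3_plus >= min_count and uv3_plus / unverified >= rate:
        return True
    return False

# A machine with no confirmed results can still qualify on the
# unverified branch: 7 mismatches out of 12 unverified results.
print(is_error_prone(bad=0, good=0, uv2=5, uv3_plus=7))  # True
```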
2003-10-03, 18:11  #11  
Sep 2003
101000010111_{2} Posts 
Take the machine Team_Italia/Pisolo mentioned in the M77909869 thread. Look at all unverified exponents (in HRF3.TXT) that were returned by Team_Italia/Pisolo, including results for those exponents returned by other machines:

15421759,S01806,AMD1333,WX1
15421759,Team_Italia,Pisolo,WZ1
15605357,Team_Italia,Pisolo,WZ1
15605357,eccles,C633807E7,WX1
15605761,Team_Italia,Pisolo,WZ1
15605761,feiraus,RHXPCOMP,WX1
15606607,BranMuffin,Test_Unit,WX1
15606607,Team_Italia,Pisolo,WZ1
15607301,S61214,C9D7D87DC,WX1
15607301,Team_Italia,Pisolo,WZ1
16809887,Team_Italia,Pisolo,WZ1
17141533,Team_Italia,Pisolo,WZ1
17141701,Team_Italia,Pisolo,WZ1
17146933,Team_Italia,Pisolo,WZ1
17168813,Team_Italia,Pisolo,WZ1
8854607,S05753,c_harkins,WS5
8854607,Team_Italia,Pisolo,WZ1
9201217,SW,jungfrau,WV2
9201217,Team_Italia,Pisolo,WZ1

We get uv3_plus = 7 and uv2 = 5. That is, 5 of the exponents were returned by Pisolo alone, and 7 of them also had a non-matching result returned by 7 different other machines. This means we have 7 known errors here (after all, every time two results don't match, one of them must be wrong).

Now remember that 90%+ of machines have an error rate of 0%. So it is far more likely that the 7 errors are all due to Pisolo, rather than that any significant number of them are due to errors by the other 7 machines involved. So Pisolo is considered an error-prone machine, even though we have no verified results for it (in BAD or LUCAS_V.TXT).

Last fiddled with by GP2 on 2003-10-03 at 18:14 
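The uv2/uv3_plus tally for the example above can be reproduced mechanically from those HRF3-style lines. A sketch: the field order exponent,team,computer,code is taken from the listing itself.

```python
from collections import defaultdict

DATA = """\
15421759,S01806,AMD1333,WX1
15421759,Team_Italia,Pisolo,WZ1
15605357,Team_Italia,Pisolo,WZ1
15605357,eccles,C633807E7,WX1
15605761,Team_Italia,Pisolo,WZ1
15605761,feiraus,RHXPCOMP,WX1
15606607,BranMuffin,Test_Unit,WX1
15606607,Team_Italia,Pisolo,WZ1
15607301,S61214,C9D7D87DC,WX1
15607301,Team_Italia,Pisolo,WZ1
16809887,Team_Italia,Pisolo,WZ1
17141533,Team_Italia,Pisolo,WZ1
17141701,Team_Italia,Pisolo,WZ1
17146933,Team_Italia,Pisolo,WZ1
17168813,Team_Italia,Pisolo,WZ1
8854607,S05753,c_harkins,WS5
8854607,Team_Italia,Pisolo,WZ1
9201217,SW,jungfrau,WV2
9201217,Team_Italia,Pisolo,WZ1
"""

def uv_counts(lines, machine):
    """Count uv2 (machine is the only tester of the exponent) and
    uv3_plus (another machine also tested it) for one machine."""
    by_exp = defaultdict(set)
    for line in lines.strip().splitlines():
        exponent, team, computer, code = line.split(",")
        by_exp[exponent].add(computer)
    uv2 = uv3_plus = 0
    for computers in by_exp.values():
        if machine in computers:
            if len(computers) == 1:
                uv2 += 1
            else:
                uv3_plus += 1
    return uv2, uv3_plus

print(uv_counts(DATA, "Pisolo"))  # (5, 7)
```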
