2003-10-02, 20:40   #1
Prime95

Which exponents should be re-released for first time tests?

GP2 has come up with several suggestions for creating a list of exponents to release as first-time tests. A few months ago I released exponents up to 16 million(?) that had no tests with non-zero error counts (the bottom two counters of the error count field).

GP2 - can you post your proposals for open debate? Can we reach an agreement on what our rules should be?

If asking for first-time tests, the user should get an exponent with a reasonable chance of yielding a prime. I chose the 16 million limit on the theory that a user would rather get an untested exponent at 20 million than a once-tested-may-be-bad exponent at 17 million. Maybe a sliding scale would be better: if there was only one error we trail the leading edge by 5 million, but if there were several errors we immediately kick it back for a retest. Ideas and debate welcome...
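
As a rough sketch of what that sliding scale might look like in code (the tiers and the 5-million lag are just the numbers floated above, nothing decided), in Python:

Code:
# Hypothetical sliding-scale rule: the more errors a first-time result
# logged, the closer to the leading edge its exponent is re-released.
def rerelease_cutoff(leading_edge, error_count):
    """Highest exponent eligible for re-release as a first-time test,
    given the error count of its existing LL result."""
    if error_count == 0:
        return 0                         # clean result: no early retest
    if error_count == 1:
        return leading_edge - 5_000_000  # trail the leading edge by 5M
    return leading_edge                  # several errors: retest now

# With first-time tests being handed out around 21M, a once-errored
# 17M exponent would not yet be re-released, but a multi-error one would:
print(17_000_000 < rerelease_cutoff(21_000_000, 1))  # False
print(17_000_000 < rerelease_cutoff(21_000_000, 3))  # True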

P.S. Next database update, the non-verified-LL database will contain the error count.
2003-10-02, 23:24   #2
PrimeFun

It would be of considerable benefit in the long run to include in the hrf3.txt file the date each result was returned (the same goes for the "bad" list). The reason is to correlate bad results from a particular user as a function of time. Numerous factors such as hardware changes/breakdowns, environmental heat, overclocking, etc. can drastically change the reliability of a machine over time (for better or worse). With that information we could target suspected bad results with a high degree of accuracy, hence maintaining a high probability of the test yielding a true first-time LL.

Of course this approach doesn't help that much in the short term, but it can certainly make life simpler in the future. For now, the same data can be approximated using old hrf3.txt, status.txt, and cleared.txt files, and then with manual grunt work we can identify suspected bad results. If the suspected bad results for a user meet a density threshold, they are re-released for first-time tests. If the density of bad results falls below the threshold, a group of Marin's Mersenne-aries volunteers could work on refining the list of suspected bad results for that user until they were either completed or met our threshold standards for re-release.

As to what that threshold needs to be, I would think something on the order of 33-50% or more would warrant re-release, based on a sliding scale to the leading edge of first-time tests.
2003-10-03, 00:32   #3
GP2

OK, here are a few thoughts.


Most exponents only get a routine double-check a long time after the first-time LL test. If the first-time test gave an erroneous result, the worst-case scenario is that discovery of a new prime gets delayed for several years. So for obvious reasons, we'd like to accelerate the double-checking of any result where there's a good reason to suspect that it has a higher-than-average chance of being erroneous.


With this in mind, George already examines results returned with nonzero error codes and re-releases those exponents for early re-testing. Nonzero error codes indicate a higher probability that the result was erroneous.

Note though that the error code does not catch all bad results. In fact, 55% of results that turn out to be bad are returned with a zero error code! And conversely some results with a nonzero error code turn out to be good anyway. So a lot of bad first-time results are slipping through the net.


How can we catch more of these bad first-time results?

Well, it turns out that not all machines are created equal. Although the overall error rate for exponents above 4M runs around 3.5 - 4.0%, there is huge variation. The overwhelming majority of machines (90%+ of them) have a 0% error rate. Conversely, some machines have a much higher error rate, for whatever reason (hardware issues, overclocking, bad memory chips, underpowered power supply, poor quality electricity, etc).

So my proposal is to identify the "error-prone" machines and target their first-time tests for early double-checking, regardless of whether those first-time tests returned a zero error code or not.

We can identify error-prone machines by looking at their track record of old results in the data files, and checking to see if they have a history of returning results that were later confirmed to be bad.

More specifics will follow in the next message.

Last fiddled with by GP2 on 2003-10-03 at 05:30
2003-10-03, 00:56   #4
GP2

What kind of exponents should be released for early re-testing? Here are some thoughts.


The odds of finding a Mersenne prime are very small.

However, if you do first-time LL testing, your odds are about 25 to 35 times better than if you do routine double-checking: a double-check can only turn up a prime if the first-time test was erroneous, and for exponents over 4M that happens only around 3.5 to 4.0% of the time.

So we assume that people who do first-time LL testing only want to get exponents for which there is a "legitimate" chance of turning up a prime. They don't want to get a double-check unless there is a good possibility that the first-time test was erroneous. And they probably don't want to get triple-checks at all, since with two existing non-prime results the odds are very strong that at least one of them is correct.

So that's the basic philosophy.


Should exponents less than M39 be released for early re-testing? I'd say yes. For one thing, George already does this with nonzero-error-code results. For another, I think people want to discover a prime even if it's not the current record-holder, just to get their name on the list of Mersenne discoverers. Any record will get broken in a couple of years anyway. Holding the record is a fleeting thing, but being a discoverer is forever.

So I'd say:
Early double-checking? Yes.
Early triple-checking? No.
Exponents below M39 too? Yes.

[Edit: early triple-checking would be done only if both existing tests were by error-prone machines]

A ban on triple-checking is very unfortunate though, because it's only through triple-checks that new bad results are confirmed. For many machines we currently have inadequate statistical information as to whether they're error-prone or not, because they're doing mostly first-time tests of higher exponents.

Perhaps such early triple-checking can be done through volunteer efforts, but there'd be a lot of them... it might be helpful if triple-checking was allowed after all. What do you folks think about this?


One other thing to consider: most people do not specifically ask for first-time LL checking. They keep the default of "whatever work makes the most sense". So theoretically such folks would not object to receiving an occasional triple-check if it makes "the most sense" for the overall good of the project.

So, is it possible to make some small modifications to the client (and server) so that clients who accept "whatever makes the most sense" will have a, say, 1% chance of getting a triple-check instead of a first-time LL test? Just wondering out loud...
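
For what it's worth, a minimal sketch of how such a server-side tweak might work (the function name and the exact mechanism are hypothetical, purely to illustrate the idea):

Code:
import random

def assign_worktype(prefers_whatever: bool) -> str:
    """Hypothetical assignment rule: clients on the default 'whatever
    work makes the most sense' setting occasionally draw an early
    triple-check instead of a first-time LL test."""
    if prefers_whatever and random.random() < 0.01:  # ~1% of requests
        return "early triple-check"
    return "first-time LL test"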

Last fiddled with by GP2 on 2003-10-03 at 18:26
2003-10-03, 01:29   #5
GP2

OK,
Finally some specifics:

Calculate the error rate for confirmed results as follows: (bad)/(bad+good).

bad = results in BAD
good = results in LUCAS_V.TXT

Ignore HRF3.TXT (unconfirmed results) for now... more on that later.


So, if we set the standard for error-prone machines as:

- Known error rates of 50% or more
- At least two bad results returned (to reduce the chances of statistical fluke).
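
A sketch of how this criterion could be computed. I'm assuming here that lines in BAD and LUCAS_V.TXT carry the same exponent,user,computer,... comma-separated layout as the HRF3.TXT excerpt quoted later in this thread; adjust the parsing if the real layouts differ:

Code:
from collections import Counter

def machine_key(line):
    """(user, computer) pair from one result line, assuming an
    exponent,user,computer,... comma-separated layout."""
    fields = line.strip().split(",")
    return (fields[1], fields[2])

def error_prone_machines(bad_lines, good_lines, min_rate=0.50, min_bad=2):
    """Machines with bad/(bad+good) >= min_rate and at least min_bad
    confirmed-bad results (to reduce statistical flukes)."""
    bad = Counter(machine_key(l) for l in bad_lines if l.strip())
    good = Counter(machine_key(l) for l in good_lines if l.strip())
    return {m for m, b in bad.items()
            if b >= min_bad and b / (b + good.get(m, 0)) >= min_rate}

# e.g.: flagged = error_prone_machines(open("BAD"), open("LUCAS_V.TXT"))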

Then we look for all unverified exponents returned by any such machine (unverified meaning the exponent doesn't yet have two matching results).

There are 963 such exponents, of which 782 need double-checks and 181 need triple-or-higher-checks.

Or, if we relax the standard to known error rates of 33% or more (instead of 50%), we get 1597 exponents, of which 1339 need double-checks and 258 need triple-or-higher-checks.


So for starters, I'd propose those 782 [or 1339] double-checks to be released as first-time LL tests.

We can try these for now and see how they turn out, and then perhaps I'd propose more...


[Edit: I forgot to filter out exponents which are already currently assigned in status.txt. That reduces the numbers slightly.

Old numbers (as originally posted):
50% --> 1172 = 907 + 265
33% --> 1914 = 1531 + 383

New numbers:
50% --> 963 = 782 + 181
33% --> 1597 = 1339 + 258
]

Last fiddled with by GP2 on 2003-10-03 at 01:48
2003-10-03, 01:56   #6
GP2

Here's the file for the 50% standard.

Note some of the very lowest exponents (in the 10.2M range) have probably already been assigned by now.
Attached file: exp50pct.zip (3.1 KB)
2003-10-03, 01:56   #7
GP2

Here's the file for the 33% standard.

Note some of the very lowest exponents (in the 10.2M range) have probably already been assigned by now.
Attached file: exp33pct.zip (5.1 KB)
2003-10-03, 02:34   #8
GP2

When looking for error-prone machines, we can look at verified results (verified good or verified bad) as mentioned in the previous messages.

But we can also look at unverified results.

If, for instance, a machine has 5 unverified results needing a double-check and 7 unverified results needing a triple-check, it's probably error-prone. There are 7 known errors, and it's much more likely that those errors all came from the one machine than that they are 7 independent errors from 7 different users. Remember, 90%+ of machines have a 0% error rate.

So those 5 unverified results should probably be sent for early double-checking.

I'm not sure what the threshold should be. Perhaps:
- 50% [or 33%] or more of the unverified exponents need triple-checking
- At least two triple-checks required (to reduce the chance of a statistical fluke).

I'll run some numbers to see how many exponents this would generate.
2003-10-03, 07:00   #9
GP2

One final refinement: triple checks would be done if both original tests were done by error-prone machines (and quadruple-check if all 3 tests were done by error-prone machines, etc).

If one LL test has been done, then:
- if it was done by an error-prone machine, schedule an immediate double-check
- if it was done by a non-error-prone machine, it will be routinely double-checked in due course

If two LL tests have been done, then:
- if both were done by error-prone machines, schedule an immediate triple-check
- if one or none were done by error-prone machines, it will be routinely triple-checked in due course.

The two situations are more or less equivalent. As long as there is at least one presumed-good LL test for an exponent that came back with a non-prime result, there is no urgent need to schedule a verification test immediately.
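
In code form, the whole rule collapses to one test: is every existing result for the exponent from an error-prone machine? A sketch, with error_prone standing for the set of machines flagged by the criteria above:

Code:
def verification_action(testers, error_prone):
    """testers: the machines that have already LL-tested an exponent.
    Schedule an immediate (n+1)-th check only when ALL n existing
    tests came from error-prone machines."""
    if testers and all(m in error_prone for m in testers):
        return f"schedule immediate check #{len(testers) + 1}"
    return "routine verification in due course"

flagged = {("Team_Italia", "Pisolo")}
# One test, by a flagged machine -> immediate double-check:
print(verification_action([("Team_Italia", "Pisolo")], flagged))
# Two tests, one by a presumed-good machine -> no rush:
print(verification_action([("Team_Italia", "Pisolo"),
                           ("eccles", "C633807E7")], flagged))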
2003-10-03, 17:54   #10
GP2

OK,

To summarize:

We define a presumed-good result as any result that was not returned by an error-prone machine.

Any exponent that does not have at least one presumed-good result is a candidate for early re-LL-testing.

Any exponent that does have at least one presumed-good result will get re-LL-tested in due course (in a few years' time).


How to define an error-prone machine?
A preliminary definition, which might change over time:

At least 50% bad/(bad+good) with bad >= 2
or
At least 50% uv3_plus / (uv3_plus + uv2) with uv3_plus >= 2

bad = results in BAD returned by that machine
good = results in LUCAS_V.TXT returned by that machine

uv2 = unverified-needs-a-second-check =
results in HRF3.TXT returned by that machine, where only one result exists for that exponent (i.e., no other machine has returned a result for it).

uv3_plus = unverified-needs-a-third-or-higher-check =
results in HRF3.TXT returned by that machine, where two or more results exist for that exponent (returned by that machine and by other machines).
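
Here's a sketch of how the uv2/uv3_plus side of the definition could be computed from HRF3.TXT (again assuming the exponent,user,computer,... layout; the bad/good side would be computed from BAD and LUCAS_V.TXT as sketched earlier):

Code:
from collections import Counter, defaultdict

def uv_counts(hrf3_lines):
    """Per-machine uv2 and uv3_plus counts. An unverified result is
    uv2 if its machine is the only tester of that exponent, and
    uv3_plus if other machines have also returned (necessarily
    non-matching) results for it."""
    by_exponent = defaultdict(list)
    for line in hrf3_lines:
        if not line.strip():
            continue
        exponent, user, computer, *_ = line.strip().split(",")
        by_exponent[exponent].append((user, computer))
    uv2, uv3_plus = Counter(), Counter()
    for machines in by_exponent.values():
        bucket = uv2 if len(machines) == 1 else uv3_plus
        for m in machines:
            bucket[m] += 1
    return uv2, uv3_plus

def unverified_flagged(uv2, uv3_plus, min_rate=0.50, min_uv3=2):
    """Second criterion: uv3_plus/(uv3_plus + uv2) >= min_rate with at
    least min_uv3 results needing a third-or-higher check."""
    return {m for m, n in uv3_plus.items()
            if n >= min_uv3 and n / (n + uv2.get(m, 0)) >= min_rate}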

Using this standard, we get 1407 exponents, of which 80 are triple-checks where both original tests were done by error-prone machines.

[Edit: we get 1407 exponents after we filter out anything in STATUS.TXT or CLEARED.TXT. We don't want to reassign anything already currently assigned, or interfere with anything recently cleared but not yet removed from the cleared list.]

So this list of 1407 is what I'm proposing at the moment. If it turns out well, we can lower the standard to 33% instead of 50% and do it again.

Any comments or suggestions?

[Edit: the exponents in the attachment below (new_err50.zip) are not quite in numerical order. The final 80 start over again from 10.3M (those are the triple-checks with both existing tests done by error-prone machines).]
Attached file: new_err50.zip (5.4 KB)

Last fiddled with by GP2 on 2003-10-04 at 04:56
2003-10-03, 18:11   #11
GP2

Quote:
Originally posted by GP2

At least 50% uv3_plus / (uv3_plus + uv2) with uv3_plus >= 2
Just in case the "uv3_plus" business is mysterious, here's a concrete example:

Take the machine Team_Italia/Pisolo mentioned in the M77909869 thread.

Look at all unverified exponents (in HRF3.TXT) that were returned by Team_Italia/Pisolo, including results for those exponents returned by other machines.


15421759,S01806,AMD1333,WX1
15421759,Team_Italia,Pisolo,WZ1
15605357,Team_Italia,Pisolo,WZ1
15605357,eccles,C633807E7,WX1
15605761,Team_Italia,Pisolo,WZ1
15605761,feiraus,RHXPCOMP,WX1
15606607,BranMuffin,Test_Unit,WX1
15606607,Team_Italia,Pisolo,WZ1
15607301,S61214,C9D7D87DC,WX1
15607301,Team_Italia,Pisolo,WZ1
16809887,Team_Italia,Pisolo,WZ1
17141533,Team_Italia,Pisolo,WZ1
17141701,Team_Italia,Pisolo,WZ1
17146933,Team_Italia,Pisolo,WZ1
17168813,Team_Italia,Pisolo,WZ1
8854607,S05753,c_harkins,WS5
8854607,Team_Italia,Pisolo,WZ1
9201217,SW,jungfrau,WV2
9201217,Team_Italia,Pisolo,WZ1


We get uv3_plus = 7 and uv2 = 5.
That is, 5 of the exponents were returned by Pisolo alone, and the other 7 each also had a non-matching result returned by a different machine.
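
As a sanity check, a few lines of Python over the listing above reproduce those counts:

Code:
from collections import Counter

listing = """15421759,S01806,AMD1333,WX1
15421759,Team_Italia,Pisolo,WZ1
15605357,Team_Italia,Pisolo,WZ1
15605357,eccles,C633807E7,WX1
15605761,Team_Italia,Pisolo,WZ1
15605761,feiraus,RHXPCOMP,WX1
15606607,BranMuffin,Test_Unit,WX1
15606607,Team_Italia,Pisolo,WZ1
15607301,S61214,C9D7D87DC,WX1
15607301,Team_Italia,Pisolo,WZ1
16809887,Team_Italia,Pisolo,WZ1
17141533,Team_Italia,Pisolo,WZ1
17141701,Team_Italia,Pisolo,WZ1
17146933,Team_Italia,Pisolo,WZ1
17168813,Team_Italia,Pisolo,WZ1
8854607,S05753,c_harkins,WS5
8854607,Team_Italia,Pisolo,WZ1
9201217,SW,jungfrau,WV2
9201217,Team_Italia,Pisolo,WZ1"""

results_per_exponent = Counter(l.split(",")[0] for l in listing.splitlines())
pisolo = [l.split(",")[0] for l in listing.splitlines()
          if ",Team_Italia,Pisolo," in l]
uv2 = sum(1 for e in pisolo if results_per_exponent[e] == 1)
uv3_plus = sum(1 for e in pisolo if results_per_exponent[e] >= 2)
print(uv2, uv3_plus)  # prints: 5 7  ->  7 / (7 + 5) = 58% >= 50%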

This means we have 7 known errors here (after all, every time two results don't match, one of them must be wrong).

Now remember that 90%+ of machines have an error rate of 0%. So it is far more likely that all 7 errors are due to Pisolo than that any significant number of them are due to errors by the 7 other machines involved.

So Pisolo is considered an error-prone machine, even though we have no verified results for it (in BAD or LUCAS_V.TXT).

Last fiddled with by GP2 on 2003-10-03 at 18:14