mersenneforum.org Doing my own triple checks

 2021-03-21, 19:37 #1 tServo     "Marv" May 2009 near the Tannhäuser Gate 2·7·47 Posts Doing my own triple checks After running a DC on M56247413 via the latest Prime95, it didn't agree with the original results. So I ran it via CUDALucas on a different machine and the residues matched. However, is doing your own triple checks in this manner acceptable?
2021-03-21, 19:46   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
 Originally Posted by tServo After running a DC on M56247413 via the latest Prime95, it didn't agree with the original results. So I ran it via CUDALucas on a different machine and the residues matched. However, is doing your own triple checks in this manner acceptable?
Not very. CUDALucas is entirely on the honor system. It lacks the Jacobi check, so it is more prone to undetected error, and it's slower than gpuowl v6.11-380, which includes the Jacobi check. Doing a CUDALucas or other manual gpu run afterward is also not the best order: run the gpu DC first and report it; it will usually be quicker than a cpu-based DC.
Then, if a TC is needed, PrimeNet-connected prime95 can report interim residues, which is more convincing evidence that a TC was performed, and performed correctly. It is also less likely to lead server admins to run a fourth check to see whether the multiple submissions are real and reproducible, as they sometimes do for multiple LL tests on the same exponent by the same user.

Mainly, though, in normal cases it's more efficient and reliable to run PRP/GEC/proof on the same exponent, instead of an LL DC. There's a manual assignment choice for that, added recently.

2021-03-22, 04:01   #3
LaurV
Romulan Interpreter

Jun 2011
Thailand

7²×197 Posts

Quote:
 Originally Posted by tServo is doing your own triple checks in this manner acceptable?
YES! (sorry Ken!) cudaLucas has a random shift implemented, which some here consider as valuable as the GC (I would add "or more", because a random shift also protects against software/implementation errors). It is impossible* to get two matching residues for two different shifts if any kind of error, hardware or software, happened, because the FFT deals with a totally different set of data every time.

However, double-checking (and TC, QC, etc.) your own work this way is frowned upon, because there is no mechanism implemented in cudaLucas (like encryption, etc.) to prevent fraud: one could edit the reports with a text editor, change the shifts, and get additional credits, screwing up the whole system. Therefore, these DCs/TCs are accepted, but they go in a list** of exponents to be checked once more, with some "low priority", i.e. somebody will have to spend resources in the future to run one more test for them, when resource allocation and time allow. This goes even for "trusted" users (like myself; ask Madpoo how many TCs he ran for my self-DC'd exponents, and he will reply that if he catches me in a corner he'll break my legs, haha).

So (and here I agree with Ken), if you have a card that can run gpuOwl, and the speed difference is not heavily in cudaLucas' favor, then you'd better run gpuOwl and PRP tests with CERT (even for LLDC work; as George already confirmed, this would be allowed, because running CERTs is more efficient than re-running the failed LLs, given that something like 3% of LLs fail, on average). If you are forced to run cudaLucas, because either the Owl is too slow in comparison or can't run on your card, then limit yourself to LLDC work for which the first test was done by third parties.

If, in the last case, you get a mismatch, and you know your card is good, then you are probably right and the initial LL was wrong (some were done 12 years ago, with the hardware of that time!). In this case, post the exponent here publicly, requesting a TC. We have dedicated threads for such activity (look for the strategic DC and TC threads, if some mods didn't play with the titles; somebody will link you to them below, I bet). Some "big guns" read those threads and take exponents for verification, so somebody else will do a TC (and most probably confirm your residue) faster than you could do it yourself. And then you are on the "good side".

---------
* "impossible" is limited to the 16 hex digits of the residue; you have a statistical 1/2^64 chance of getting the same residue if you just pick a "random" one, haha. If cheating is involved, your chances are 1 in 256, because only the last byte is masked.

** there is no physical list on the server, but there are a few of us actively hunting for exponents/users in odd situations and "maintaining" that list. Users doing strange things will be caught, sooner or later.
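As a sanity check on the odds quoted in footnote * above, a minimal Python sketch (the figures themselves are from the footnote: a full 64-bit residue collides by luck with probability 1/2^64, while if only the last byte is effectively compared, a faked residue collides with probability 1 in 256):

```python
from fractions import Fraction

# Chance of two independent runs agreeing on a full 64-bit residue by pure luck.
full_res64 = Fraction(1, 2**64)

# Per the footnote, if only the last byte is masked/compared,
# a faked residue collides with probability 1 in 2^8 = 256.
last_byte_only = Fraction(1, 2**8)

print(full_res64)      # 1/18446744073709551616
print(last_byte_only)  # 1/256
```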

Last fiddled with by LaurV on 2021-03-22 at 04:10

2021-03-22, 08:41   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
 Originally Posted by LaurV cudaLucas has a random shift implemented, which some here consider as valuable as GC (I would add "or more", because a random shift also protects against software/implementation errors). It is impossible* to get two matching residues for two different shifts, if any kind of error, hardware or software, happened, as the FFT will deal every time with totally different sets of data.
That's wrong. CUDALucas v2.05 had a pseudorandom shift, yet showed systematic errors producing the same residues for different exponents, shifts, and iteration counts. The math and the bugs make the same bad residues appear in different situations, including in other software.

Quote:
 if you have a card that can run gpuOwl, and the speed difference is not heavily in cudaLucas' favor, then you'd better run gpuOwl and PRP tests with CERT (even for LLDC work; as George already confirmed, this would be allowed, because running CERTs is more efficient than re-running the failed LLs, given that something like 3% of LLs fail, on average). If you are forced to run cudaLucas, because either the Owl is too slow in comparison or can't run on your card, then limit yourself to LLDC work for which the first test was done by third parties.
The usual error-rate figure is 2% per primality test, but that is based mostly on smaller exponents, since those are the ones already done and checked. The very small sample of verified LL results available for 100Mdigit exponents indicates their error rate is around 19% per LL test. For confirming a 100Mdigit prime with LL, that's still workable; run 5 tests and probably 4 will agree.

Extrapolating on an equal error rate (errors per computing device-month), LL tests near p~1G have an estimated 88% overall final-residue error probability, i.e. a 12% chance of completing correctly. A 300Mdigit test requires good hardware, conservatively configured, perhaps a conservatively large fft length, and the Jacobi check, and will require ganged runs with regular interim res64 comparisons among the runs, plus frequent permanent save files for retreat and retry from the last-believed-good state when (not if) interim residues begin to differ. The chance of completing 4 of 4 independent 300Mdigit LL tests without error, to confirm a PRP as a Mersenne prime, is small: 0.12^4 is ~207 ppm. Running 8 independent 300Mdigit tests gives an expectation of about one correct final residue, with no knowledge of which one it is.

Maybe the confirmation team for large exponents uses special software, like the "supersafe" version of early gpuowl that did all the LL iterations twice and compared them along the way (https://mersenneforum.org/showpost.p...6&postcount=62), or adds on-the-fly statistical analysis of res64 values. Residues repeating monotonously or in short cycles are easily detected symptoms of bugs, including some that might not be identified yet. See #3 of https://www.mersenneforum.org/showpo...1&postcount=10
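The compounded-failure arithmetic in the paragraph above can be checked with a few lines of Python (the 12% per-test success figure is kriesel's estimate for ~300Mdigit LL tests, not a measured constant):

```python
# Assumed per-test success probability for one ~300Mdigit LL run,
# taken from the estimate in the post above.
p_success = 0.12

# Probability that 4 independent runs ALL complete correctly.
all_four = p_success ** 4
print(f"{all_four * 1e6:.0f} ppm")  # ~207 ppm

# Expected number of correct final residues among 8 independent runs.
expected_correct = 8 * p_success
print(expected_correct)  # ~0.96, i.e. about one, with no way to tell which
```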

---------
Quote:
 * impossible is limited to the 16 digits of the residue, you have a statistical 1/2^64 chance to get the same residue if you just pick a "random" one, haha. If cheating is involved, your chances are 1 in 256, because only the last byte is masked.
The statistical argument fails against systematic errors. The first thing Madpoo would do when a prime indication came up was check whether the result was from CUDALucas; those were presumed to be false positives, based on experience. The occurrence of over 300 falsely matching residues of 0x00 was far from a random distribution.
There's more on errors in primality testing at https://www.mersenneforum.org/showpo...40&postcount=4

Last fiddled with by kriesel on 2021-03-22 at 08:46

 2021-03-22, 09:28 #5 LaurV Romulan Interpreter     Jun 2011 Thailand 7²×197 Posts Lots of straw men there. The guy did a P95 run and got a mismatch. Then he did a cudaLucas run and got the same residue as his P95 run. Which is not a zero, not ef-ef-ef-ef, not 2 or -2. How is that "not very" acceptable? And what does all the fuss have to do with zero residues and other typical bugs (fixed residues) in older versions of cudaLucas? P95 and gpuOwl also had older versions which produced wrong residues. And where did you get the 19%? That would mean one test in 5 done with cudaLucas in the past is wrong, which is far from the truth, unless you count only the few results for the very large exponents, which are most probably fake reports for credit (as I said, it is easy to fake a cudaLucas report, and that's indeed a bad part about it, compared with the "newer stuff", which cares a lot more about the safety of the tests and reports).
2021-03-22, 21:46   #6
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
 Originally Posted by LaurV However, double-checking (and TC, QC, etc.) your own work this way is frowned upon, because there is no mechanism implemented in cudaLucas (like encryption, etc.) to prevent fraud: one could edit the reports with a text editor, change the shifts, and get additional credits, screwing up the whole system. Therefore, these DCs/TCs are accepted, but ...
Nor is there protection against systematic error when running the same case on the same hardware type and software.
Quote:
 if you have a card that can run gpuOwl, and the speed difference is not heavily in cudaLucas' favor, then you'd better run gpuOwl and PRP tests with CERT (even for LLDC work; as George already confirmed, this would be allowed, because running CERTs is more efficient than re-running the failed LLs, given that something like 3% of LLs fail, on average). If you are forced to run cudaLucas, because either the Owl is too slow in comparison or can't run on your card, then limit yourself to LLDC work for which the first test was done by third parties.
The considerable optimization preda and prime95 did on gpuowl a while back has, as far as I know, eliminated the cases where CUDALucas was competitive in performance for the same gpuowl-capable NVIDIA hardware and exponent. Danc2 or tdulcet recently reported a 78% speed advantage over CUDALucas for a fast Colab gpu. If anyone has counterexamples with a recent gpuowl version, please share specifics.
Quote:
 Originally Posted by LaurV The guy did a P95 run and got a mismatch. Then he did a cudaLucas run and got the same residue as his P95 run. Which is not a zero, not a ef-ef-ef-ef, not 2 or -2. How it is that "not very" acceptable?
(a) see your own statement about it in the first quote above.
(b) That order of result reporting makes fraud easy. As a thought experiment, try instead reporting a CUDALucas result and then duplicating the periodic interim residues that PrimeNet-connected prime95 would regularly generate and report along the way; that is a rather different degree of difficulty.
(c) It's contrary to the assignment rules.
(d) It therefore generated more work for Madpoo to additionally check, with the consequent inefficiency.
Quote:
 And what all the fuss has to do with zero-residues and other typical bugs (fixed residues) in older versions of cudaLucas?
They're not corrected in CUDALucas. They're trapped for, and execution is terminated. That's very different from correcting or preventing the underlying problem(s) that produce them. Their rate of occurrence is a measure of the software/hardware combination's unreliability.

We know to look to the extremes on res64, zero or near it, or equivalently all-f or near it, for anomalies, and they do appear there. Just because other res64 values or other outputs are less conspicuously suspect does not mean they're right.
Quote:
 P95 and gpuOwl also had older versions which produced wrong residues.
Which, unlike CUDALucas, are actively maintained, with the bugs being identified and CORRECTED.
Quote:
 And where did you get the 19%? That would mean one test in 5 done with cudaLucas in the past is wrong. Which is far away from the truth. Unless you count only few results for the very large exponents
I specifically stated that 19% concerned 100Mdigit (p~333M), based on the very small sample of verified LL-tested exponents available. I got the 19% from a calculation posted at https://mersenneforum.org/showpost.p...&postcount=930. Feel free to contribute accurate runs toward increasing the very small 100Mdigit LL DC or TC sample size (for research purposes, not production). There are a few other 332M TC candidates listed there. The prime95 LL TC of 332194529, on a system with ECC ram, has matched Fan Ming's gpuowl interim residues up to iteration 14M (4.2%) so far and will take a few months to complete.
The 19%/LL test at 100Mdigit is actually better reliability than I would project from Madpoo's statistical study of ~4%/exponent, i.e. 2%/LL test, error rate at p<40M. Probably only those with new, fast, reliable hardware stayed with the big exponents long enough.
If p~103M has a ~2%/LL-test error rate with today's level of error detection and correction (Jacobi symbol check included in most prime95 runs, and available in gpuowl v6.11-380, but still not available in CUDALucas), then 98% reliability per test at that run time corresponds to ~80% reliability at 332M (100Mdigit), assuming errors occur at an average rate per day of running: run time scales as ~p^2.1, so (332/103)^2.1 ≈ 11.7 times the run time, and 0.98^11.7 ≈ 0.79 probability of correct completion, i.e. a 21% probability of erroneous completion. That's close to the 19% from the scarce empirical data at 100Mdigit. The reliability estimates are even more ghastly near p~1G, for the same runtime-scaling reason: six-month LL runs on a fast gpu are more likely to be wrong than right.
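The scaling chain above can be reproduced in a few lines of Python (all inputs are kriesel's estimates: 2% error per LL test at p~103M, and run time growing roughly as p^2.1):

```python
# Assumed inputs, taken from the post above.
reliability_103M = 0.98   # ~2% error per LL test at p ~ 103M
runtime_exponent = 2.1    # run time scales roughly as p^2.1

# Relative run time of a 332M (100Mdigit) test vs a 103M test.
ratio = (332 / 103) ** runtime_exponent
print(f"{ratio:.1f}")     # ~11.7

# If errors arrive at a constant rate per unit run time,
# per-test reliability compounds over the longer run.
reliability_332M = reliability_103M ** ratio
print(f"{reliability_332M:.2f}")  # ~0.79, i.e. ~21% chance of a wrong result
```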
Quote:
 which are most probably fake reports for credit (as I said, it is easy to fake a cudaLucas report).
Different strokes for different folks. I like the idea of any deliberate fakers being discovered and discredited.

Last fiddled with by kriesel on 2021-03-22 at 21:47

2021-03-22, 22:57   #7
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7537₁₀ Posts

Quote:
 Originally Posted by tServo After running a DC on M56247413 via the latest Prime95, it didn't agree with the original results. So I ran it via CUDALucas on a different machine and the residues matched. However, is doing your own triple checks in this manner acceptable?
Yes, but frowned upon.

Ignore the discussion of Cudalucas and shift counts, etc. The issue is: "Is it acceptable to double-check your own results?" The server will accept your double-check, so technically the answer is yes. However, there are users who think this is "too trusting" or "lacking in rigor". These users will hunt down such cases and perform a triple-check.

Thus, performing your own double-check ends up wasting resources as it creates work for those that do triple-checking. Best is to let someone else resolve your mismatching double-check.

2021-03-22, 23:06   #8
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

10111001110₂ Posts

Quote:
 Originally Posted by LaurV YES! (sorry Ken!) cudaLucas has a random shift implemented, which some here consider as valuable as GC (I would add "or more", because a random shift also protects against software/implementation errors). It is impossible* to get two matching residues for two different shifts, if any kind of error, hardware or software, happened, as the FFT will deal every time with totally different sets of data.
These have been discussed multiple times. Here is a new thing, to demonstrate that the shift-count trick is actually not enough!
We claim that mult(x,y,p) returns (x*y)%(2^p-1), and implement [correctly!] the standard Lucas test with shift count [though the shifts are not done in a fast, clever way], in Pari-GP:

Code:
mult(x,y,p)=mp=2^p-1;return((2^p-3*2^vecsum(binary(bitor(x,y)))-85*2^vecsum(binary(bitand(x,y))))%mp)
LucasTest(p,sh)={mp=2^p-1;a=(4*2^sh)%mp;for(i=1,p-2,sh=(2*sh)%p;a=(mult(a,a,p)-2^(sh+1))%mp);return((a*2^(p-sh))%mp)}

LucasTest(607,1)%(2^64)
LucasTest(607,3)%(2^64)
LucasTest(607,5)%(2^64)
? %3 = 18446744073709551613
? %4 = 18446744073709551613
? %5 = 18446744073709551613
So we have lowered the number of known Mersenne primes by one?
It passed the ancient shift-count trick multiple times in a row, each time returning the same non-zero res64. We had a crappy multiplication code, and it passed the shift-count checks in the Lucas test.

ps. Replace with mult(x,y,p)=(x*y)%(2^p-1) to get the working multiplication method.
To make the case worse, the crappy multiplication is even commutative:
mult(x,y,p)=mult(y,x,p) for any x,y.
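For readers without Pari-GP, here is a line-for-line Python port of the demonstration above: the same deliberately broken `mult` (depending only on the popcounts of x|y and x&y), the same shifted Lucas test, with the expected outputs taken from the post.

```python
def popcount(x):
    return bin(x).count("1")

def mult_bad(x, y, p):
    # Deliberately broken "multiplication": the result depends only on the
    # popcounts of x|y and x&y, yet it is commutative, like a real product.
    mp = 2**p - 1
    return (2**p - 3 * 2**popcount(x | y) - 85 * 2**popcount(x & y)) % mp

def mult_good(x, y, p):
    return (x * y) % (2**p - 1)

def lucas_test(p, sh, mult):
    # Standard Lucas-Lehmer test on 2^p-1 with a shifted start value 4*2^sh;
    # the shift doubles each squaring, and the "-2" is subtracted shifted.
    mp = 2**p - 1
    a = (4 * 2**sh) % mp
    for _ in range(p - 2):
        sh = (2 * sh) % p
        a = (mult(a, a, p) - 2**(sh + 1)) % mp
    return (a * 2**(p - sh)) % mp  # unshift the final residue

# The broken mult passes the shift-count cross-check: all shifts agree on
# the same non-zero res64, so M607 would look composite.
for s in (1, 3, 5):
    print(lucas_test(607, s, mult_bad) % 2**64)  # 18446744073709551613

# The correct mult returns residue 0: M607 really is prime.
print(lucas_test(607, 1, mult_good))  # 0
```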

2021-03-23, 04:55   #9
LaurV
Romulan Interpreter

Jun 2011
Thailand

22665₈ Posts

Quote:
 Originally Posted by Prime95 Yes, but frowned upon.
That's exactly what I said.
Quote:
 Thus, performing your own double-check ends up wasting resources as it creates work for those that do triple-checking. Best is to let someone else resolve your mismatching double-check.
Agree with the second part. You just summarized well what everybody is saying in this thread; we are in violent agreement here. However, the first part is not true. Performing your own double-check ends up wasting YOUR OWN resources. You don't force anybody else, in any way, to waste theirs: whether you do a self-DC or not, some third-party guy will still do one more test. You don't cause anybody to waste resources by DC-ing your own work, except yourself, and you are free to do whatever you want with your resources. Double-check your own work as much as you want, if it gives you peace of mind. Well, you don't help the project much (as you could use your OWN resources more efficiently), but that is ANOTHER discussion.

This is my argument here for ages (see old discussions with Madpoo, etc).

PS @Robert, man, that is so fubar it will never happen in real life; you still need to find a multiplication method that works for most other numbers and fails for some. We have seen forced examples like that in the past; they are far from reality. But yeah, I got your point: CERT is better, nobody argues with that. Yet.

Last fiddled with by LaurV on 2021-03-23 at 04:58
