mersenneforum.org  

Old 2017-08-11, 02:06   #78
science_man_88
"Forget I exist"
Jul 2009
Dumbassville
2⁶×131 Posts

Quote:
Originally Posted by preda
The Brent-Zimmermann paper? (link?)
https://maths-people.anu.edu.au/~bre..._ACCMCC_10.pdf (pretty easy to find once you know what to search for).
Old 2017-08-11, 02:30   #79
Madpoo
Serpentine Vermin Jar
Jul 2014
CDF₁₆ Posts

Quote:
Originally Posted by preda
...
Talking about "wrong LL" probabilities, while the overall error rate as you say is small (about 4% or less), I would suspect a very skewed distribution from the POV of the hardware involved: I think there is a big number of producers with 0% errors, and a small number of producers with a very high error rate (let's say, 50% errors).

Intuitively, I say that the hardware is split in two distinct categories, "good" which produces only correct LL, and "broken" which produces mostly wrong LL.
...
You're mostly correct. My analysis has shown that machines generating bad results tend towards generating only bad results. Machines with awesome track records will generally be error free.

In fact, when I started and there were thousands of pending mismatches (now only a handful), I used those general trends to guess the winner and loser.

On the other hand, there are the oddball machines that started out great and then got worse over time, to the point where everything they turned in was junk. Maybe some memory module degraded, or it got dusty in there and heat started causing issues, or who knows what.

There are also systems where the bad results came and went in waves. Maybe they were trying out some overclocking and ran into issues so they dialed it back, then tried again later on. Could have been dozens of other things too, I'm sure. People living in hot seasonal climates, turning in bad results only in summer time? Maybe.

A classic example of that is a particular system by the user Robert_SoCal that currently has 220 bad results, 117 good ones, and still 519 unknown.

When I break that one down by year and month, it clearly had good months and bad months. From 2012 to the beginning of 2015, it did terribly. I think over 50% of the results were bad.

From April 2015 onwards, it started to improve. I, and others, have cherry-picked its newer 2015+ exponents to see whether the bad trend continued, but when I look at its history, its last bad result came in April 2015 and we've verified about 36 in the months up to Dec 2016. Still, there are a lot of unverified exponents out there, 25-35 per month, and only 1-5 that we've verified in each of those months. We may have got lucky and happened to verify the exponents it did right.

Of course in this case, this is his "Manual Testing" CPU, so the results are probably being pasted in from different systems over the years, all CUDALucas from 2.04 beta up to 2.05.1.
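
The worry that the roughly 36 verified results might simply be the lucky ones can be quantified very roughly. The sketch below assumes each result is independently bad with one fixed probability and that the verified exponents are a random sample of the machine's output. Both are simplifying assumptions (the cherry-picking described above is not random), and the function name is only illustrative.

```python
def prob_all_good(error_rate, verified):
    """Chance that `verified` independently sampled results are all correct,
    if each result is bad with probability `error_rate` (assumed constant)."""
    return (1.0 - error_rate) ** verified


# How likely are 36 clean double-checks under various true error rates?
for rate in (0.50, 0.20, 0.10, 0.05):
    chance = prob_all_good(rate, 36)
    print(f"true error rate {rate:.0%}: P(36 of 36 good) = {chance:.2e}")
```

Under these assumptions a machine still running at the old 50% error rate would essentially never produce 36 clean checks, while a 5% rate would do so roughly one time in six. The sample therefore argues against the old behaviour but cannot rule out a modest residual error rate.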
Old 2017-08-11, 02:56   #80
GP2
Sep 2003
A18₁₆ Posts

Quote:
Originally Posted by preda
Talking about "wrong LL" probabilities, while the overall error rate as you say is small (about 4% or less), I would suspect a very skewed distribution from the POV of the hardware involved: I think there is a big number of producers with 0% errors, and a small number of producers with a very high error rate (let's say, 50% errors).

Intuitively, I say that the hardware is split in two distinct categories, "good" which produces only correct LL, and "broken" which produces mostly wrong LL.
I believe this intuition is incorrect.

I mostly do double-checks of strategic exponents. That is, I try to identify exponents which have a high likelihood of having an incorrect first-time check and then perform double checks on them. So far, I've had almost 600 mismatches, where my double-check result differed from the first-time check (and all subsequently confirmed on triple check, as expected since I use servers with ECC memory).

So I spend time trying to look for patterns, and... it turns out, it's hard to come up with any general rules. The split into two distinct categories, which you posit, doesn't really exist in practice. There are machines with 10% error rates, with 20% error rates, with every kind of error rate under the sun. There are machines where erroneous results are strongly correlated with a nonzero mprime error code and other machines which produce erroneous results without setting any mprime error codes. There are some machines whose erroneous results are concentrated in certain calendar months and others where they are not. "Happy machines are all alike; every unhappy machine is unhappy in its own way."

But let's consider a machine with a 50% error rate. Glass half empty or half full? 50% of its results are good. So if we plot a probability histogram of the number of errors in LL tests, there is a peak at n=0, where P(n)=0.50... so what does the rest of the histogram look like for n > 0 ? Most likely it is monotonically decreasing, so there will be a sizeable number of LL tests where there is only one error. So even with a machine with such a high error rate, the Jacobi check with rewind to the last savefile will produce non-negligible benefits. And from empirical observation there are many machines with much lower error rates — at least for mprime. I don't know if the error characteristics of GPU-based programs will differ from CPU-based programs like mprime.
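
To put rough numbers on the histogram argument above, here is a minimal sketch that assumes the number of errors per LL test follows a Poisson distribution. That choice is purely an illustrative assumption (nothing in this thread measures the real shape), and the function name is hypothetical. The Poisson rate is fixed by requiring P(0) to equal the machine's fraction of good results.

```python
import math


def error_count_pmf(p_clean, max_n=5):
    """P(n errors in one LL test) for n = 0..max_n under an assumed Poisson
    model whose rate is chosen so that P(0) == p_clean."""
    lam = -math.log(p_clean)  # solves exp(-lam) == p_clean
    return [math.exp(-lam) * lam ** n / math.factorial(n)
            for n in range(max_n + 1)]


pmf = error_count_pmf(0.50)
for n, prob in enumerate(pmf):
    print(f"P({n} errors) = {prob:.4f}")
print(f"share of bad tests with exactly one error: {pmf[1] / (1 - pmf[0]):.2f}")
```

For P(0)=0.50 this model puts about 35% of all tests, or about 69% of the bad ones, at exactly one error, which is the case where a rewind to the last save file helps most.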

Last fiddled with by GP2 on 2017-08-11 at 03:02
Old 2017-08-11, 03:39   #81
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7×83 Posts

Quote:
Originally Posted by preda
Placing the buffers avoiding "bad memory" is a good idea, but I have not implemented that yet. A question I have about memory testing, is how does the virtual to physical address mapping interact with bad memory locations. I would suppose, if the virtual mapping changes, that the same bad physical location can show up at different spots in the virtual address space at different points in time, and pinning down the actual bad location (physical) becomes difficult.


The main reason for doing Jacobi on the CPU is ease of programming (in fact I just invoke GMP and I don't have to program it at all). Implementing that algorithm efficiently and on GPU would be a sizable piece of work.

A secondary benefit of pushing the Jacobi check on the CPU is that the GPU performance does not decrease when the check is on. (at the cost of more CPU cycles).


I suppose double checks are best done with a different software (e.g. mprime), and if that supports offset then nothing is lost by having gpuOwL use offset==0.
Thanks for the explanations.

Re allocating bad memory to lock it out: sorry if I was unclear before. I think the key is to do it at real physical addresses and have it persist long enough, which may be why in Linux it was a kernel driver implementation. I was speaking of CUDA or OpenCL calls permanently allocating bad physical GPU memory to take it out of circulation, not OS-managed general-purpose virtual memory in the system RAM DIMMs. Perhaps GPU RAM could be checked in a fast startup-time memory test; keep the blocks that test bad allocated and don't use them. That overhead would only need to be paid at startup for cards that have already tested bad in some memory blocks. (Does the OS virtualize and page out GPU memory while a GPU program is running!?) In my memory testing of one GPU card, via CUDALucas, the range of 25MB blocks that failed was quite stable from run to run (blocks 23-40 out of ~58). The CUDA memory model article at https://www.3dgep.com/cuda-memory-mo...A_Memory_Types does not contain the string "virtual" (for whatever that is worth), and slide 7 of http://www.seas.upenn.edu/~cis565/LECTURES/Lecture3.pdf contains "virtual memory -does not exist". There may be persistence issues with the approach.

Re double checks: little is lost until the next software package comes along that hasn't yet implemented nonzero offsets (as was once the case with practically everything, including CUDALucas and prime95), or until gpuOwL use becomes widespread and one LL test is done by gpuOwL while the other is done by gpuOwL or another zero-offset-only program. Lots of results submitted with zero offset set the stage for such an issue later.
Old 2017-08-11, 05:36   #82
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7×83 Posts

Quote:
Originally Posted by preda
gpuOwL does save "persistent" checkpoints, by default every 10M iterations. The program does not automatically use those for anything yet, but the user can manually rollback to them.

The issue with automatically rolling back on a Jacobi-error, is that the only previous point that's guaranteed to be correct is the very beginning.

Talking about "wrong LL" probabilities, while the overall error rate as you say is small (about 4% or less), I would suspect a very skewed distribution from the POV of the hardware involved: I think there is a big number of producers with 0% errors, and a small number of producers with a very high error rate (let's say, 50% errors).

Intuitively, I say that the hardware is split in two distinct categories, "good" which produces only correct LL, and "broken" which produces mostly wrong LL.

Now, the moment when a self-check detects a Jacobi-error, it places this particular instance of hardware in the "broken" category, with very high expected error rate. (Not the 4% average error rate, but much higher)
I see something different here in hardware reliability. I think there is a distribution of Mersenne-engine reliabilities (and the reliability declines with age for the same hardware unit). As an admittedly small data set: among the past 106 verified residues I've produced, 3 different hardware units produced one known bad result each. (Makes for a very small bad-residues log--so far.) One of them has run verified residues before and after, and also successfully repeated the LL test on one small prime before the error and 16 after. Another produced a bad residue after 3 verified ones. Another has produced 3 verified residues before the bad one in its current installation, and probably more verified before relocation there. None of the three systems seems to qualify as broken by your definition (a history of mostly wrong LL residues). In a fourth case the hardware was demonstrably and reliably producing memory errors, so its LL use was never begun. Nearly my whole fleet is old hardware bought used.

When running CUDA code, it is common for excessive roundoff error to be detected and corrected by a restart from the last save file and a retry, or by resetting the device and restarting from the last checkpoint. This happens even in runs whose results are later verified. Too much of it may result in a bad residue; some of it is no problem.

No interim residue or final residue is guaranteed to be correct. Passing the Jacobi test or any other error check is consistent with a higher probability of correctness, to that point, and that's as much as we can hope for.

Please consider giving users of your software the choice of continuing after a recovery from last believed-good save file if the Jacobi check indicates recent iterations went wrong, rather than abandoning all the previous work on that exponent run.
Old 2017-08-11, 06:14   #83
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5229₁₀ Posts

Quote:
Originally Posted by Madpoo
At the end of the test those temp files could be deleted then.

Probably the only reason Prime95 doesn't do that now has everything to do with the good old days when it started, back in '96. Drive space (and speed) were factors and saving a bunch of temp files along the way could have caused issues.
Back in the V18-20 prime95 QA effort (~16 years ago?) we saved interim files every million iterations for possible restart from there.
Old 2017-08-11, 06:19   #84
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7×83 Posts

Quote:
Originally Posted by preda
The drawback of -supersafe is that it's twice slower. The benefit is that it's very strongly protected from hardware errors such as memory corruption (at any level, global/cache/register) or non-systematic arithmetic corruption (if there is such a kind of hardware error).

I have a GPU that went bad. I could not use it for LL anymore. Now, with -supersafe, it's twice slower but I trust the results again. (also, I can drop the underclock that I was using in hope to improve reliability before).
That GPU and the electricity feeding it would probably better serve the GIMPS project by doing trial factoring than by running LL tests at half speed.

And, it's already benefited the project considerably, by motivating your inquiry into reliable running, thereby bringing the Jacobi test into play.

I wouldn't run code that _required_ running at half speed.

Last fiddled with by kriesel on 2017-08-11 at 06:20
Old 2017-08-11, 07:07   #85
preda
"Mihai Preda"
Apr 2015
2³×3²×19 Posts

Quote:
Originally Posted by kriesel
I wouldn't run code that _required_ running at half speed.
It doesn't *require* it, it's just an option for the user that suspects the hardware is not reliable yet wants to squeeze LL from it (and that'd be some pretty solid LL). It's a trade-off -- admittedly an expensive one.

The timing as is (4.5 ms / double iteration) isn't terrible. Now if I could just halve that.. :)
Old 2017-08-11, 14:00   #86
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7×83 Posts

Quote:
Originally Posted by preda
It doesn't *require* it, it's just an option for the user that suspects the hardware is not reliable yet wants to squeeze LL from it (and that'd be some pretty solid LL). It's a trade-off -- admittedly an expensive one.

The timing as is (4.5 ms / double iteration) isn't terrible. Now if I could just halve that.. :)
Good. Options are good. Throughput is good. (So guess how I feel about a Jacobi-detected error forcing a restart from the original value 4, costing 2% of throughput. Particularly when other software may offer a choice.)

Keep up the good work.
Old 2017-08-12, 23:01   #87
preda
"Mihai Preda"
Apr 2015
2³·3²·19 Posts

After consideration and input from this thread, this is the approach I ended up using in gpuOwL for the Jacobi check:

- on startup (from beginning or from savefile), establish a "good Jacobi" point that will be used if rollback is needed. When starting from a savefile this involves running one Jacobi check at the very beginning to verify that the savefile passes Jacobi (if it doesn't, it won't start).

- on every Jacobi check, either move the "good Jacobi" point forward if the check passes, or roll back to the most recent rollback point if the check fails.

The rollback point is kept in RAM, so no file read is involved in rolling back (which keeps the implementation simpler).

One more Jacobi check is done at the end (after the last iteration), with the same behavior.
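
The scheme above can be sketched in a few lines of Python. This is not gpuOwL's code, only a minimal illustration under stated assumptions: the check used is that the Jacobi symbol of (s_i - 2) modulo 2^p-1 equals -1 for every LL iteration i >= 1, the names `jacobi`, `ll_with_jacobi`, `check_every` and `faults` are hypothetical, and `faults` injects a one-shot simulated hardware error. Save files are left out; only the in-RAM rollback point is modelled.

```python
def jacobi(a, n):
    """Jacobi symbol (a|n) for odd n > 0, standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_with_jacobi(p, check_every, faults=None):
    """Lucas-Lehmer test of 2**p - 1 with periodic Jacobi self-checks.

    `faults` maps an iteration number to a value that is added to the residue
    the first time that iteration is computed (a simulated one-shot error).
    Returns (final_residue, number_of_rollbacks).
    """
    mp = (1 << p) - 1
    pending = dict(faults or {})
    s, i = 4, 0
    good_i, good_s = 0, 4          # "good Jacobi" rollback point, kept in RAM
    rollbacks = 0
    last = p - 2
    while i < last:
        i += 1
        s = (s * s - 2) % mp
        if i in pending:           # inject the simulated error once
            s = (s + pending.pop(i)) % mp
        if i % check_every == 0 or i == last:
            if jacobi(s - 2, mp) == -1:
                good_i, good_s = i, s      # check passed: move the point forward
            else:
                i, s = good_i, good_s      # check failed: roll back
                rollbacks += 1
    return s, rollbacks


print(ll_with_jacobi(61, 10))                      # clean run of M61
print(ll_with_jacobi(61, 10, faults={25: 12345}))  # run with one injected error
```

A single Jacobi check is expected to catch only about half of all random errors, so an injected fault may pass unnoticed and leave a wrong final residue. When it is caught, the run resumes from the last point that passed the check rather than from the very beginning.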
Old 2017-08-13, 12:13   #88
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
7·211 Posts

The reliable error-checking idea I described at http://mersenneforum.org/showthread.php?t=22510
also works for Mersenne numbers!

Why not do a Fermat pseudoprime test for base=3, but (for p>2) in the
equivalent form res=3^(2^p) mod mp, where mp=2^p-1?
If res is 9 (more precisely, 9 mod mp), then mp is a probable prime (PRP).
Exactly the same error-checking trick that worked for Proth numbers works here,
so with about 0.1% overhead we could get a rock-solid test.

Computing res should take about the same time as an LL test: we start with 3 and do
p modular squarings (we don't need to subtract 2).
For those primes p that pass this test we should then run a Lucas-Lehmer test
to prove that mp is "really" prime.

Which primes p would need an LL test? (A quick PARI/GP test up to p=25000.)
Code:
forprime(p=2,25000,q=2^p-1;if(Mod(3,q)^(q+1)==Mod(9,q),print1(p",")))
2,3,5,7,13,17,19,31,61,89,107,127,521,607,1279,2203,2281,3217,4253,4423,9689,9941,11213,19937,21701,23209,
As we can see, there was no composite Fermat PRP (for base=3) up to p=25000. Of course there could be some in a higher range,
though it is likely that there is not even a single composite PRP among the Mersenne numbers.
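
For readers without PARI/GP, the same test can be sketched in Python. This is only an illustrative translation of the one-liner above with a smaller limit; the helper names `is_base3_prp` and `small_primes` are made up for this example.

```python
def is_base3_prp(p):
    """True if 3**(2**p) == 9 (mod 2**p - 1), i.e. 2**p - 1 passes the
    base-3 Fermat test in the form described above."""
    mp = (1 << p) - 1
    res = 3 % mp
    for _ in range(p):           # p modular squarings, no "-2" step as in LL
        res = res * res % mp
    return res == 9 % mp


def small_primes(limit):
    """Primes below `limit` by trial division (fine for small limits)."""
    return [n for n in range(2, limit)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


print([p for p in small_primes(700) if is_base3_prp(p)])
```

With the limit of 700 used here, the printed exponents should match the beginning of the PARI/GP list, from 2 up to 607.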


Slightly modified test code; the heart of the algorithm is the same as for Proth numbers
(note that here lift(u0)=3 is small, so we would not need to store it):
Code:
\\ myrand(r,N) returns a random s from [0,N) such that s!=0 and s!=r (r and s are in Z_N).

myrand(r,N)={local(tmp);while(1,tmp=random(N);if(tmp!=0&&lift(tmp+r)!=0,return(tmp+r)))}

\\ We test the Mersenne number mp=2^p-1 (where p is prime).
\\ L is the error-checking block length. An error is injected in the i-th squaring
\\ with 50% chance if errpos[i]!=0 (if we return to the same i multiple times,
\\ each injection is decided independently of the previous choices).
\\ If printmsg!=0, some additional info is printed.
\\ The return value is 3^(mp+1) mod mp; for a PRP the return value is 9 (more precisely, 9 mod mp).
\\ If p is not prime or errpos is too short, the return value is (-1).

prpmersenne(p,L,errpos,printmsg=1)={
if(isprime(p)==0,if(printmsg,print("p is not prime."));return(-1));
if(length(errpos)<p,if(printmsg,print("The errpos array's length should be at least p"));return(-1));
mp=2^p-1;
numerr=0;
L2=L^2;
u0=Mod(3,mp);
prev_d=u0;
saved_u=u0;
saved_d=u0;
saved_i=0;
i=0;res=u0;while(i<p,i+=1;
res=res^2;
if(errpos[i]&&random(2),res=myrand(res,mp));
if(i%L==0,d=prev_d*res;set_d=0;
if(i%L2==0||(i%L==0&&i+L>=p),
if(d!=u0*prev_d^(2^L),
numerr+=1;if(printmsg,print("Found error at iteration=",i,", roll back to iteration=",saved_i));
i=saved_i;res=saved_u;prev_d=saved_d;set_d=1,
saved_i=i;saved_u=res;saved_d=d));
if(!set_d,prev_d=d)));
if(printmsg,print1("m",p);if(res==Mod(9,mp),print(" is prp."),print(" is composite."));
print("Number of errors (corrected and) detected=",numerr));
return(lift(res))}
For p=61, test mp=2^61-1 fifty times, injecting errors at i=21,23,45 with L=4
(i=21 and i=23 are in the same L2=16 block):
Code:
p=61;errpos=vector(p,i,0);errpos[21]=1;errpos[23]=1;errpos[45]=1;
cnt=0;for(h=1,50,cnt+=(prpmersenne(p,4,errpos,1)==9);print());cnt
the result:
Code:
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=4

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=11

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=5

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=14

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=8

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=10

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=5

Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=3

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=3

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=4

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=4

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=5

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=9

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=8

Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=3

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=6

Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=6

Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=9

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=11

Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=1

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=8

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=2

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=7

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=8

Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=4

m61 is prp.
Number of errors (corrected and) detected=0

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
m61 is prp.
Number of errors (corrected and) detected=4

Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=32, roll back to iteration=16
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
Found error at iteration=48, roll back to iteration=32
m61 is prp.
Number of errors (corrected and) detected=11

%5 = 50
?
All errors were detected and corrected in the 50 test runs.

The same test with the much smaller mp=2^17-1 and L=2:
Code:
p=17;errpos=vector(p,i,0);errpos[9]=1;errpos[11]=1;
sum(h=1,10^6,prpmersenne(p,2,errpos,0)==9)
%7 = 999988
So out of a million tests, it failed to find the error(s) in 12 cases,
a miss rate of less than 2/mp (12/10^6 = 1.2e-5, while 2/(2^17-1) is about 1.5e-5).

Of course, for a true run errpos=vector(p,i,0) (a zero array); it is used above only to insert false residues into the squaring computations. For largish Mersenne computations use L=2000 (or say L=1000), depending on how much overhead you allow.
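
The check at the heart of prpmersenne can also be sketched in Python. With u_i = 3^(2^i) mod mp and d_t the product of u_0, u_L, ..., u_(tL), the identity d_t = u_0 * d_(t-1)^(2^L) holds, and the code compares both sides every L^2 squarings. This is a loose transliteration of the PARI/GP code rather than a reference implementation; one deliberate difference is that faults here are deterministic and one-shot (a hypothetical `faults` dict mapping an iteration to a value added once) instead of random, so that runs are reproducible.

```python
def prp_mersenne_checked(p, L, faults=None):
    """Base-3 PRP test of 2**p - 1 with a product check every L*L squarings.

    Returns (3**(2**p) mod mp, number_of_detected_errors).
    """
    mp = (1 << p) - 1
    u0 = 3 % mp
    pending = dict(faults or {})   # iteration -> delta, each applied once
    res = u0
    prev_d = u0                    # running product d_(t-1)
    saved = (0, u0, u0)            # rollback point: (iteration, residue, product)
    errors = 0
    i = 0
    while i < p:
        i += 1
        res = res * res % mp
        if i in pending:           # inject a simulated one-shot error
            res = (res + pending.pop(i)) % mp
        if i % L == 0:
            d = prev_d * res % mp
            rolled_back = False
            if i % (L * L) == 0 or i + L >= p:
                if d != u0 * pow(prev_d, 1 << L, mp) % mp:
                    errors += 1
                    i, res, prev_d = saved     # mismatch: roll back
                    rolled_back = True
                else:
                    saved = (i, res, d)        # match: new rollback point
            if not rolled_back:
                prev_d = d
    return res, errors


print(prp_mersenne_checked(61, 4))                               # clean run
print(prp_mersenne_checked(61, 4, faults={21: 1, 23: 1, 45: 1}))  # injected errors
```

As in the PARI/GP version, squarings after the last multiple of L are not covered by a check. The faults at i=21 and i=23 fall in the same L^2 block, so they are caught by one check at i=32, and the fault at i=45 is caught at i=48.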
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.