2021-03-04, 01:07  #12  
P90 years forever!
Aug 2002
Yeehaw, FL
7×1,069 Posts 
Quote:
First off, you must be running a 32-bit OS, as the 5600K FFT is not chosen by 64-bit prime95. Perhaps this is why you are the only one reporting a problem. Second, I suspect you do not have roundoff checking turned on. The reason you are only seeing the problem in the last 50 iterations is that prime95 does no roundoff checking except in the first 50 and last 50 iterations (letting the Gerbicz check catch any real errors). Turn on roundoff checking and see whether these errors occur throughout the PRP test. Third, in a short run I was seeing errors of only 0.18. If you are getting errors throughout your test, it is *very* surprising that you aren't getting Gerbicz errors. Also surprising that you see this on multiple machines. Fourth, perhaps we should see if the issue occurs in 30.3 or maybe even 29.8. I'd like to understand what is going on rather than just fixing the problem by switching to a larger FFT size. 

2021-03-04, 22:59  #13  
Sep 2017
USA
5×47 Posts 
Thank you both for the helpful replies!
Quote:
(It is also very possible that user error is entirely to blame.) Last fiddled with by Runtime Error on 2021-03-04 at 23:28 

2021-03-04, 23:29  #14  
P90 years forever!
Aug 2002
Yeehaw, FL
7483_{10} Posts 
Quote:


2021-03-04, 23:37  #15 
P90 years forever!
Aug 2002
Yeehaw, FL
7×1,069 Posts 
You can probably work around your problem by setting "MaxRoundoffError=0.499" in prime.txt. This is NOT RECOMMENDED for the general public.

2021-03-04, 23:49  #16  
Sep 2017
USA
11101011_{2} Posts 
Quote:
However, on Linux, these always start from a "fresh" install with only {mprime, libgmp.so, libgmp.so.10, libgmp.so.10.3.2, libgmp.so.10.4.0, local.txt, prime.txt, worktodo.txt} in the folder. Those are still running at an FFT length of 5734400 = 5600K. Thanks again. 

2021-03-05, 00:09  #17  
P90 years forever!
Aug 2002
Yeehaw, FL
16473_{8} Posts 
Quote:
The problem is specific to 30.4. I will fix it in build 10. The TL;DR details: The problem occurs in the last 50 iterations, when prime95 switches from Gerbicz error checking to double-checking (running each iteration twice with different shift counts). To generate the 2nd shift count the value is doubled; the bug is that this addition is not getting normalized (carries propagated) when the exponent is just under 2800K below the FFT limit. The gwnum library was tweaked in this area in v30.4 (part of what led to some ECM speedup). 

2021-03-05, 02:34  #18  
Sep 2017
USA
5·47 Posts 
Quote:


2021-03-07, 20:42  #19 
P90 years forever!
Aug 2002
Yeehaw, FL
7·1,069 Posts 

2021-03-07, 21:50  #20 
∂^{2}ω=0
Sep 2002
República de California
7×11×151 Posts 
George, when doing e.g. PRP-testing and P-1 stage 1 we need to be able to include a small constant integer multiplier like 3 in the round-and-carry step anyway, yes? Here is some simple sample code from one of my non-SIMD C carry macros. All vars are doubles; x is the current FFT-convolution output word, wt/wtinv the DWT weight and its reciprocal, base/baseinv the power-of-2 base and its reciprocal for the current word, frac the fractional error in the FFT output, cy the carry into the next-higher word:
Code:
x *= wtinv;\
temp = DNINT(x);\
frac = fabs(x - temp);\
temp = temp*prp_mult + cy;\
if(frac > maxerr) maxerr = frac;\
cy = DNINT(temp*baseinv);\
x = (temp - cy*base)*wt;
2021-03-08, 01:09  #21  
P90 years forever!
Aug 2002
Yeehaw, FL
7×1,069 Posts 
Quote:
In a PRP test, when a Gerbicz block completes and there are fewer than 49 iterations remaining, prime95 switches to double-checking. Say the current value is x with a shift count of s. Prime95 does:
Code:
x2 = x + x;   // x2 now equals x with a shift count of s+1
do last N iterations on x
do last N iterations on x2
undo the shift counts
compare x and x2
Now, this bug has been there since version 29. So why did it only rear its ugly head now? Well, version 30.4 of the gwnum library was improved to keep better track of how many unnormalized adds have been done and to be more aggressive in not doing normalized adds. This more aggressive code triggered the bug.

The root cause of the problem is that I did not sufficiently study and understand the impact of unnormalized adds on future multiplies. Having done more study, I made some interesting discoveries. In the following, I discuss the unnormalized add impact in terms of "FFT output bits", where using one more FFT output bit will double the roundoff error.

1) Since forever, gwnum users were told that it was safe to do one unnormalized add prior to a gwmul. Why is this? It turns out gwsquare has much worse roundoff error than gwmul. I measured squaring (gwnum FFT sizes are chosen based on squaring roundoff) as 0.527 output bits worse than a multiply. Conveniently, doing an unnormalized add on random data requires 0.509 more output bits.
2) Doing two unnormalized adds, (a+b+c)*d, requires another 0.288 output bits.
3) Doing a third unnormalized add, (a+b+c+d)*e, requires another 0.218 output bits.
4) Doing an unnormalized add on non-random data requires 1.0 output bits. This is exactly what the PRP bug was doing. Adding x+x is decidedly non-random, doubling the magnitude of every FFT word.
5) Worse yet, the PRP bug was doing an unnormalized add of non-random data and then calling gwsquare. This requires 2.0 output bits, quadrupling the roundoff error.

The extensive fix is that the gwnum library interface for gwadd has been upgraded. You now pass in an option that includes how the output will ultimately be used (gwsquare, gwmul, etc.), an option to indicate non-random data, options to force a normalize or force no normalization, and a few other goodies. This info lets gwnum make much more sensible decisions on whether to do a normalization. Last fiddled with by Prime95 on 2021-03-08 at 01:11 

2021-03-13, 16:39  #22  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11·463 Posts 
Quote:
(Where's the 0.355 from? Maybe I botched it, but here goes. Base case, final iterations run twice with differing shifts always, as prime95 described: p-1 mod blocksize iterations x 2 shifts; on average blocksize/2 x 2 = 1 x blocksize iterations. Alternately, do the (p-1 mod blocksize)/blocksize <= 1/2 cases as shifted DC iterations, and the > 1/2 cases as an extension to the GEC block size. If the latter GEC passes, the successive iterations within it are shown good, including the final p-1 mod blocksize. For the <= 1/2 case, proceed as before: work is blocksize/4 x 2 shifts on average = blocksize/2, x 1/2 occurrence probability = blocksize/4. For the > 1/2 case, extend to the next blocksize: work is blocksize x 3/4 x 1 x 1/2 occurrence probability = 3/8 blocksize; total 5/8, a savings of 3/8 from the 1. But the GEC is not free. IIRC it's ~0.2% at the normal blocksize of 1000, or about 2 iterations; 2/50 = 4% at blocksize 50. The GEC cost is incurred in half the possible cases, those with (p-1 mod blocksize)/blocksize > 1/2, so 2% on average. 3/8 - 0.02 ~ 0.355.) Last fiddled with by kriesel on 2021-03-13 at 17:18 
