#12
P90 years forever!
Aug 2002
Yeehaw, FL
1110100000101₂ Posts
First off, you must be running a 32-bit OS, as the 5600K FFT is not chosen by 64-bit prime95. Perhaps this is why you are the only one reporting a problem. Second, I suspect you do not have round-off checking turned on. The reason you are only seeing the problem in the last 50 iterations is that prime95 does no round-off checking except in the first 50 and last 50 iterations (letting Gerbicz checking catch any real errors). Turn on round-off checking and see if these errors occur throughout the PRP test. Third, in a short run I was seeing errors of only 0.18. If you are getting errors throughout your test, it is *very* surprising that you aren't getting Gerbicz errors. Also surprising that you see this on multiple machines. Fourth, perhaps we should see if the issue occurs in 30.3 or maybe even 29.8. I'd like to understand what is going on rather than just fixing the problem by switching to a larger FFT size.
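The checking policy George describes (round-off checks only near the start and end of a PRP test, with Gerbicz verification covering the middle) can be sketched roughly as follows. This is an illustrative sketch only, not prime95 source; the 50-iteration windows are taken from the post, the function name is invented:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch (not prime95 code): round-off checking is done
 * only in the first 50 and last 50 iterations of a PRP test; Gerbicz
 * error checking covers everything in between. */
bool should_check_roundoff(long iter, long total_iters)
{
    return iter < 50 || iter >= total_iters - 50;
}
```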
#13
Sep 2017
USA
5·47 Posts
Thank you both for the helpful replies!
(It is also very possible that user error is entirely to blame.)

Last fiddled with by Runtime Error on 2021-03-04 at 23:28
#14
P90 years forever!
Aug 2002
Yeehaw, FL
17×19×23 Posts
#15 |
P90 years forever!
Aug 2002
Yeehaw, FL
1110100000101₂ Posts
This is NOT RECOMMENDED for the general public, but you can probably work around your problem by setting "MaxRoundoffError=0.499" in prime.txt.
#16
Sep 2017
USA
5·47 Posts
However, on Linux, these always start from a "fresh" install with only {mprime, libgmp.so, libgmp.so.10, libgmp.so.10.3.2, libgmp.so.10.4.0, local.txt, prime.txt, worktodo.txt} in the folder. Those are still running at an FFT length of 5734400 = 5600K. Thanks again.
#17
P90 years forever!
Aug 2002
Yeehaw, FL
1110100000101₂ Posts
The problem is specific to 30.4. I will fix it in build 10.

The TL;DR details: The problem occurs in the last 50 iterations, when prime95 switches from Gerbicz error checking to double-checking (running each iteration twice with different shift counts). To generate the 2nd shift count, the value is doubled -- the bug is that this addition is not getting normalized (carries propagated) when the exponent is just under 2800K below the FFT limit. The gwnum library was tweaked in this area in v30.4 (part of what led to some ECM speedup).
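A toy model of the normalization George mentions (hypothetical code, nothing like gwnum's internals): doubling a number held as an array of small digit words doubles every word, and until carries are propagated some words sit outside the legal digit range.

```c
#include <assert.h>

/* Toy model (not gwnum): a number stored as n base-10 digit words,
 * least significant first.  An unnormalized x + x simply doubles every
 * word, pushing some outside the legal digit range; a separate
 * normalization pass propagates the carries. */
#define BASE 10

void unnormalized_double(int *w, int n)
{
    for (int i = 0; i < n; i++)
        w[i] += w[i];                /* words may now be >= BASE */
}

void normalize(int *w, int n)        /* carry propagation */
{
    int cy = 0;
    for (int i = 0; i < n; i++) {
        int t = w[i] + cy;
        cy = t / BASE;
        w[i] = t % BASE;             /* back into 0..BASE-1 */
    }
    /* a final carry out of the top word is dropped in this toy model */
}
```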
#18
Sep 2017
USA
5×47 Posts
#19 |
P90 years forever!
Aug 2002
Yeehaw, FL
17·19·23 Posts
#20 |
∂²ω=0
Sep 2002
República de California
2×5,813 Posts
George, when doing e.g. PRP-testing and p-1 stage 1 we need to be able to include a small constant integer multiplier like 3 in the round-and-carry step anyway, yes? Here is some simple sample code from one of my non-SIMD C carry macros -- all vars doubles, x is the current FFT-convolution output word, wt|wtinv the DWT weight and its reciprocal, base|baseinv the power-of-2 base and its reciprocal for the current word, frac the fractional error in the FFT output, cy the carry into the next-higher word:
Code:
x *= wtinv;\
temp = DNINT(x);\
frac = fabs(x-temp);\
temp = temp*prp_mult + cy;\
if(frac > maxerr) maxerr=frac;\
cy = DNINT(temp*baseinv);\
x = (temp-cy*base)*wt;\
#21
P90 years forever!
Aug 2002
Yeehaw, FL
1110100000101₂ Posts
In a PRP test, when a Gerbicz block completes and there are fewer than 49 iterations remaining, prime95 switches to double-checking. Say the current value is x with a shift count of s. Prime95 does:

Code:
x2 = x + x;   // x2 now equals x with a shift count of s+1
do last N iterations on x
do last N iterations on x2
undo the shift counts
compare x and x2

Now, this bug has been there since version 29. So why did it only rear its ugly head now? Well, version 30.4 of the gwnum library was improved to keep better track of how many unnormalized adds have been done and to be more aggressive in not doing normalized adds. This more aggressive code triggered the bug.

The root cause of the problem is that I did not sufficiently study and understand the impact of unnormalized adds on future multiplies. Having done more study, I made some interesting discoveries. In the following, I discuss the unnormalized-add impact in terms of "FFT output bits", where using one more FFT output bit will double the round-off error.

1) Since forever, gwnum users were told that it was safe to do one unnormalized add prior to a gwmul. Why is this? It turns out gwsquare has much worse roundoff error than gwmul. I measured squaring (gwnum FFT sizes are chosen based on squaring roundoff) as 0.527 output bits worse than a multiply. Conveniently, doing an unnormalized add on random data requires 0.509 more output bits.

2) Doing two unnormalized adds, (a+b+c)*d, requires another 0.288 output bits.

3) Doing a third unnormalized add, (a+b+c+d)*e, requires another 0.218 output bits.

4) Doing an unnormalized add on non-random data requires 1.0 output bits. This is exactly what the PRP bug was doing. Adding x+x is decidedly non-random, doubling the magnitude of every FFT word.

5) Worse yet, the PRP bug was doing an unnormalized add of non-random data and then calling gwsquare. This requires 2.0 output bits -- quadrupling the roundoff error.

The extensive fix is that the gwnum library interface for gwadd has been upgraded. You now pass in an option describing how the output will ultimately be used (gwsquare, gwmul, etc.), an option to indicate the data is non-random, options to force a normalize or force no normalization, and a few other goodies. This info lets gwnum make much more sensible decisions on whether to do a normalization.

Last fiddled with by Prime95 on 2021-03-08 at 01:11
#22
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×11×229 Posts
(Where's the 0.355 from? Maybe I botched it, but here goes.

Base case, final iterations run twice with differing shifts always, as prime95 described: p-1 mod blocksize iterations x 2 shifts; on average blocksize/2 x 2 = 1 x blocksize iterations.

Alternately: do the p-1 mod blocksize <= blocksize/2 case as shifted DC iterations, and the > blocksize/2 case by extending to the next GEC block boundary. If the later GEC passes, the successive iterations within it are shown good, including the final p-1 mod blocksize ones.

For the p-1 mod blocksize <= blocksize/2 case, work as before: blocksize/4 x 2 on average = blocksize/2, times 1/2 occurrence probability = blocksize/4.

For the > blocksize/2 case, extend to the next blocksize: blocksize x 3/4 x 1, times 1/2 occurrence probability = 3/8 blocksize. Total 5/8; savings 3/8 from the 1.

But the GEC is not free. IIRC it's ~0.2% at the normal block size of 1000, or about 2 iterations; 2/50 = 4% at blocksize 50. The GEC cost is incurred in half the possible cases (p-1 mod blocksize / blocksize > 1/2), so on average 2%. 3/8 - 0.02 ~ 0.355.)

Last fiddled with by kriesel on 2021-03-13 at 17:18
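kriesel's expected-cost arithmetic checks out numerically; here is a hypothetical verification under his stated assumptions (p-1 mod blocksize uniform, GEC overhead ~4% of a 50-iteration block, incurred in half the cases; costs in units of one blocksize of single-shift iterations):

```c
#include <assert.h>

/* Base scheme: on average blocksize/2 final iterations, each run twice
 * with differing shifts = 1.0 blocksizes of work. */
double base_cost(void)
{
    return 0.5 * 2.0;
}

/* Alternate scheme from kriesel's post. */
double alternate_cost(void)
{
    double short_case = 0.25 * 2.0 * 0.5;  /* avg b/4 doubled, probability 1/2 = 0.25 */
    double long_case  = 0.75 * 1.0 * 0.5;  /* avg 3b/4 run once, probability 1/2 = 0.375 */
    double gec        = 0.04 * 0.5;        /* ~4% GEC cost in half the cases = 0.02 */
    return short_case + long_case + gec;   /* = 0.645 blocksizes */
}
```

The savings, base_cost() - alternate_cost(), comes out to 0.355 blocksizes, matching the post.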
Thread | Thread Starter | Forum | Replies | Last Post |
Large Round Off Errors | evoflash | Software | 8 | 2013-02-10 18:39 |
Hardware, FFT limits and round off errors | ewergela | Hardware | 9 | 2005-09-01 14:51 |
Reproducible error question | PhilF | Software | 0 | 2005-03-14 02:32 |
Round off errors | Matt_G | Hardware | 4 | 2004-04-12 14:46 |
Errors during Torture Test | sjhanson | Hardware | 20 | 2003-02-02 23:28 |