mersenneforum.org Reproducible round off errors near end of test

2021-03-04, 01:07   #12
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7×1,069 Posts

Quote:
 Originally Posted by Runtime Error Hi, I have encountered a few more reproducible roundoff errors at the end of tests. All occurred within the last 50 iterations. All exponents on different machines running mprime 30.4b9p8 with FFT length 5734400.
I'm somewhat stumped.

First off, you must be running a 32-bit OS as the 5600K FFT is not chosen by 64-bit prime95. Perhaps this is why you are the only one reporting a problem.

Second, I suspect you do not have round off checking turned on. The reason you are only seeing the problem in the last 50 iterations is that prime95 is not doing any roundoff checking except in the first 50 and last 50 iterations (letting Gerbicz catch any real errors). Turn on round-off checking and see if these errors are occurring throughout the PRP test.

Third, in a short run I was seeing errors of only 0.18. If you are getting errors throughout your test it is *very* surprising that you aren't getting Gerbicz errors. Also surprising that you see this on multiple machines.

Fourth, perhaps we should see if the issue occurs in 30.3 or maybe even 29.8.

I'd like to understand what is going on rather than just fixing the problem by switching to a larger FFT size.

2021-03-04, 22:59   #13
Runtime Error

Sep 2017
USA

5×47 Posts

Thank you both for the helpful replies!

Quote:
 Originally Posted by Prime95 First off, you must be running a 32-bit OS as the 5600K FFT is not chosen by 64-bit prime95.
I am running only on 64-bit OS. The first errors I reported (including stuck 102600269) were on 64-bit Windows. All others are on 64-bit Linux. I double checked and prime95/mprime are indeed the 64-bit versions.

Quote:
 Originally Posted by Prime95 Second, I suspect you do not have round off checking turned on. The reason you are only seeing the problem in the last 50 iterations is that prime95 is not doing any roundoff checking except in the first 50 and last 50 iterations (letting Gerbicz catch any real errors). Turn on round-off checking and see if these errors are occurring throughout the PRP test.
Correct, round off checking is probably turned off. Earlier, I turned on round off checking on 102600269 from the 102M checkpoint, and it remained at/below 0.191 until it predictably tripped on iteration 102600233 (picture attached to Post 1). As seen in the picture on Post 9, the round off error triggered on the iteration following a Gerbicz check. Perhaps that is useful information? I will keep round off checking turned on.

Quote:
 Originally Posted by Prime95 Third, in a short run I was seeing errors of only 0.18. If you are getting errors throughout your test it is *very* surprising that you aren't getting Gerbicz errors. Also surprising that you see this on multiple machines.
I have started some exponents on late model Xeons (with AVX-512) on 30.4b9p8 to see if this shows up there too.

Quote:
 Originally Posted by Prime95 Fourth, perhaps we should see if the issue occurs in 30.3 or maybe even 29.8.
Great idea. I will roll back half of my instances to 30.3. I will follow up in a week or two.

Quote:
 Originally Posted by Prime95 I'd like to understand what is going on rather than just fixing the problem by switching to a larger FFT size.
I am happy to help with debugging in any way that I can. Thank you!
(It is also very possible that user error is entirely to blame.)

Last fiddled with by Runtime Error on 2021-03-04 at 23:28

2021-03-04, 23:29   #14
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7483₁₀ Posts

Quote:
 Originally Posted by Runtime Error I am running only on 64-bit OS. The first errors I reported (including stuck 102600269) were on 64-bit Windows. All others are on 64-bit Linux. I double checked and prime95/mprime are indeed the 64-bit versions.
I'm puzzled on the selection of 5600K FFT. Do you have an old gwnum.txt lying around that is causing 5600K to be selected? Not that this matters, I'm just curious. A 5600K FFT should run just fine.

2021-03-04, 23:37   #15
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7×1,069 Posts

This is NOT RECOMMENDED for the general public. You can probably work around your problem by setting "MaxRoundoffError=0.499" in prime.txt.
2021-03-04, 23:49   #16
Runtime Error

Sep 2017
USA

11101011₂ Posts

Quote:
 Originally Posted by Prime95 I'm puzzled on the selection of 5600K FFT. Do you have an old gwnum.txt lying around that is causing 5600K to be selected? Not that this matters, I'm just curious. A 5600K FFT should run just fine.
Yes. My Windows client is now running at a 5760K FFT. Thank you.

However, on Linux, these always start from a "fresh" install with only {mprime, libgmp.so, libgmp.so.10, libgmp.so.10.3.2, libgmp.so.10.4.0, local.txt, prime.txt, worktodo.txt} in the folder. Those are still running at an FFT length of 5734400 = 5600K.

Thanks again.

2021-03-05, 00:09   #17
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

16473₈ Posts

Quote:
 Originally Posted by Runtime Error (It is also very possible that User error is entirely to blame )
Nope, my bad!

The problem is specific to 30.4. I will fix it in build 10.

The TL;DR details:
The problem occurs in the last 50 iterations when prime95 switches from Gerbicz error checking to double-checking (running each iteration twice with different shift counts). To generate the 2nd shift count the value is doubled -- the bug is that this addition is not getting normalized (carries propagated) when the exponent is just under 2800K below the FFT limit. The gwnum library was tweaked in this area in v30.4 (part of what led to some ECM speedup).

2021-03-05, 02:34   #18
Runtime Error

Sep 2017
USA

5·47 Posts

Quote:
 Originally Posted by Prime95 Nope, my bad! The problem is specific to 30.4. I will fix it in build 10. The TL;DR; details: The problem occurs in the last 50 iterations when prime95 switches from Gerbicz error checking to double-checking (running each iteration twice with different shift counts). To generate the 2nd shift count the value is doubled -- the bug is that this addition is not getting normalized (carries propagated) when the exponent is just under 2800K below the FFT limit. The gwnum library was tweaked in this area in v30.4 (part of what led to some ECM speedup).
Wow thank you, you rock!!! Great job!

2021-03-07, 20:42   #19
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7·1,069 Posts

Quote:
 Originally Posted by Prime95 The problem is specific to 30.4. I will fix it in build 10.
The fix is rather extensive and will be addressed in 30.5. I've got a 30.5 gwnum library released for some preliminary testing. If all goes well, expect prime95 version 30.5 in a few days.

2021-03-07, 21:50   #20
ewmayer
2ω=0

Sep 2002
República de California

7×11×151 Posts

Quote:
 Originally Posted by Prime95 The fix is rather extensive
George, when doing e.g. PRP-testing and p-1 stage 1 we need to be able to include a small constant integer multiplier like 3 in the round-and-carry step anyway, yes? Here is some simple sample code from one of my non-SIMD C carry macros -- all vars doubles, x is the current FFT-convolution output word, wt|wtinv the DWT weight and its reciprocal, base|baseinv the power-of-2 base and its reciprocal for the current word, frac the fractional error in the FFT output, cy the carry into the next-higher word:
Code:
x *= wtinv;\
temp = DNINT(x);\
frac = fabs(x-temp);\
temp = temp*prp_mult + cy;\
if(frac > maxerr) maxerr=frac;\
cy   = DNINT(temp*baseinv);\
x = (temp-cy*base)*wt;\
I'm sure I don't need to explain any of this to you, but for the benefit of the other readers: the key is that the small integer multiplier prp_mult gets applied to the convolution output x *after* rounding-to-nearest-int, so whether prp_mult = 1 or 2 or 3 should not materially affect the roundoff error, it being > 1 just makes for a slightly larger-magnitude carry into the next-higher word. The SIMD carry code is of course more intricate but the principle is the same. If your carry code is at all similar, why not just implement the multiply-by-2 in the final 50 iterations by setting the multiplier = 2?

2021-03-08, 01:09   #21
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7×1,069 Posts

Quote:
 Originally Posted by ewmayer If your carry code is at all similar, why not just implement the multiply-by-2 in the final 50 iterations by setting the multiplier = 2?
Fixing Runtime Error's bug is actually trivial, as you surmised. So, here's why the fix is extensive!

In a PRP test, when a Gerbicz block completes and there are fewer than 49 iterations remaining, prime95 switches to double-checking. Say the current value is x with shift count of s. Prime95 does:
Code:
x2 = x + x;    // x2 now equals x with a shift count of s+1
do last N iterations on x
do last N iterations on x2
undo the shift counts
compare x and x2
The bug is that prime95 was using gwadd(), which looks at how close we are to the FFT limit to decide whether it can do a faster add without normalization. The fix is trivial: either call the gwsmallmul() routine or call gwadd() in such a way that a normalization is always done.

Now, this bug has been there since version 29. So why did it only rear its ugly head now? Well, version 30.4 of the gwnum library was improved to keep better track of how many unnormalized adds have been done and to be more aggressive in not doing normalized adds. This more aggressive code triggered the bug.

The root cause of the problem is that I did not sufficiently study and understand the impact of unnormalized adds on future multiplies. Having done more study I made some interesting discoveries. In the following, I discuss the unnormalized add impact in terms of "FFT-output-bits" where using one more FFT output bit will double the round-off error.

1) Since forever, gwnum users were told that it was safe to do one unnormalized add prior to a gwmul. Why is this? It turns out gwsquare has much worse roundoff error than gwmul. I measured squaring (gwnum FFT sizes are chosen based on squaring roundoff) as 0.527 output bits worse than a multiply. Conveniently, doing an unnormalized add on random data requires 0.509 more output bits.
2) Doing two unnormalized adds (a+b+c)*d requires another 0.288 output bits.
3) Doing a third unnormalized add (a+b+c+d)*e requires another 0.218 output bits.
4) Doing an unnormalized add on non-random data requires 1.0 output bits. This is exactly what the PRP bug was doing. Adding x+x is decidedly non-random, doubling the magnitude of every FFT word.
5) Worse yet, the PRP bug was doing an unnormalized add of non-random data and then calling gwsquare. This requires 2.0 output bits -- quadrupling the roundoff error.

The extensive fix is that the gwnum library interface for gwadd has been upgraded. You now pass in options: how the output will ultimately be used (gwsquare, gwmul, etc.), an option to indicate the data is non-random, options to force a normalization or force no normalization, and a few other goodies. This info lets gwnum make much more sensible decisions on whether to do a normalization.

Last fiddled with by Prime95 on 2021-03-08 at 01:11

2021-03-13, 16:39   #22
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11·463 Posts

Quote:
 Originally Posted by Prime95 Fixing Runtime Error's bug is actually trivial, as you surmised. So, here's why the fix is extensive! In a PRP test, when a Gerbicz block completes and there are fewer than 49 iterations remaining, prime95 switches to double-checking. Say the current value is x with shift count of s. Prime95 does: Code: x2 = x + x; // x2 now equals x with a shift count of s+1 do last N iterations on x do last N iterations on x2 undo the shift counts compare x and x2 The bug is that prime95 was using gwadd(), which looks at how close we are to the FFT limit to decide whether it can do a faster add without normalization. The fix is trivial: either call the gwsmallmul() routine or call gwadd() in such a way that a normalization is always done. Now, this bug has been there since version 29. So why did it only rear its ugly head now? Well, version 30.4 of the gwnum library was improved to keep better track of how many unnormalized adds have been done and to be more aggressive in not doing normalized adds. This more aggressive code triggered the bug.
As I recall, gpuowl in some versions would compute past p-1 iterations to complete the last GEC block. It seems to me that would be more efficient when (p-1) mod (blocksize) > (blocksize/2) than performing the last (p-1) mod (blocksize) iterations twice with shift differing by one, and more reliable in detecting errors in the final iterations. Not much speed difference overall at p ~ 100M, blocksize 50: on average ~blocksize*0.355/p ~ 50*0.355/100M ~ 177E-9 (1.77E-5 percent), and one tenth that at the 10^9 limit of mersenne.org assignments. That's like boosting a 3 GHz processor by ~530 Hz (and RAM etc. proportionately).

(Where's the 0.355 from? Maybe I botched it, but here goes.
Base case, final iterations, twice with differing shifts always, as prime95 described:
p-1 mod blocksize iterations x 2 shifts; on average blocksize/2 x 2 = 1 x blocksize iterations.

Alternately, do p-1 mod blocksize <= blocksize/2 as shifted DC iterations, > as extend to GEC block size.
If the later GEC passes, the successive iterations within it are shown good, including p-1 mod blocksize.
Do the p-1 mod blocksize <= blocksize/2 case as before: work blocksize/4 * 2 on average = blocksize/2, times 1/2 occurrence probability = blocksize/4.
Do the p-1 mod blocksize > blocksize/2 case as extend to the next blocksize: work blocksize*3/4 * 1, times 1/2 occurrence probability = 3/8 blocksize;
total 5/8, savings 3/8 from the 1.
But the GEC is not free.
IIRC it's ~0.2% at normal blocksize of 1000, or about 2 iterations; 2/50 = 4% at blocksize 50.
The GEC cost is incurred in half the possible cases (p-1 mod blocksize > blocksize/2), so on average 2%.
3/8 - .02 ~ .355.)

Last fiddled with by kriesel on 2021-03-13 at 17:18


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.