2014-08-13, 07:12  #1
May 2013
East. Always East.
11×157 Posts 
Lots of roundoff errors
Hey, folks.
I've been getting a number of roundoff errors in a DC. I wouldn't be heartbroken if the test came out bad, since it's a smaller exponent (less wasted time), but I'd still like to get to the bottom of this. I've attached a screenshot of the worker in question. Here are the bits of results.txt relevant to the exponent in question.
Code:
[Tue Aug 12 00:32:29 2014] Trying 1000 iterations for exponent 32562559 using 1680K FFT.
If average roundoff error is above 0.24236, then a larger FFT will be used.
Final average roundoff error is 0.23838, using 1680K FFT for exponent 32562559.
[Tue Aug 12 07:50:20 2014] Iteration: 6557188/32562559, Possible error: round off (0.4375) > 0.40
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
[Tue Aug 12 22:27:40 2014] Iteration: 7679973/32562559, Possible error: round off (0.4375) > 0.40
Continuing from last save file.
[Wed Aug 13 01:03:36 2014] Iteration: 7679973/32562559, Possible error: round off (0.4375) > 0.40
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
When I started my worker up again, I took the screenshot. 7:50 on Tuesday is some random time of the morning. 22:27 is when we started the round. 1:03 on Wednesday is when I started everything up again.
The part that scares me is "2 roundoff errors of which 1 is repeatable".
EDIT: Now that I think of it, the 22:27 and 1:03 errors are probably the same one. I think I stopped the program before it had the chance to double-check the questionable iteration.
Last fiddled with by TheMawn on 2014-08-13 at 07:13
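The observation in the EDIT (two log entries, one iteration) can be checked mechanically. Below is a minimal Python sketch, assuming the warning lines in results.txt look exactly like the ones quoted in this post; the helper name `roundoff_events` is hypothetical and not part of prime95. It groups warnings by iteration number, so an iteration reported twice shows up as one entry with two timestamps.

```python
import re

# Pattern for the warning lines quoted in this post. The format is taken from
# this excerpt only and may differ between prime95 versions (assumption).
WARNING = re.compile(
    r"\[(?P<stamp>[^\]]+)\]\s*Iteration:\s*(?P<iter>\d+)/(?P<exp>\d+),\s*"
    r"Possible error: round off \((?P<roe>[0-9.]+)\) > (?P<limit>[0-9.]+)"
)

def roundoff_events(text):
    """Group round-off warnings by iteration (hypothetical helper).

    Returns a dict: iteration -> list of (timestamp, reported error),
    in the order the iterations first appear in the log.
    """
    events = {}
    for m in WARNING.finditer(text):
        events.setdefault(int(m.group("iter")), []).append(
            (m.group("stamp"), float(m.group("roe")))
        )
    return events

# The three warnings from the log in this post.
log = """
[Tue Aug 12 07:50:20 2014] Iteration: 6557188/32562559, Possible error: round off (0.4375) > 0.40
[Tue Aug 12 22:27:40 2014] Iteration: 7679973/32562559, Possible error: round off (0.4375) > 0.40
[Wed Aug 13 01:03:36 2014] Iteration: 7679973/32562559, Possible error: round off (0.4375) > 0.40
"""

for iteration, hits in roundoff_events(log).items():
    print(iteration, len(hits), [roe for _, roe in hits])
```

Run on the excerpt, this reports two distinct iterations, with 7679973 seen twice, which agrees with the reading that the 22:27 and 1:03 entries are the same error hit again after a restart.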
2014-08-13, 07:49  #2
Jun 2003
12357₈ Posts
Everything is consistent with an exponent right on the crossover point between FFT sizes. Things should be fine the way they are.
EDIT: Increase your checkpointing frequency so that you lose fewer iterations when restarting.
Last fiddled with by axn on 2014-08-13 at 07:50
2014-08-13, 16:59  #3
May 2013
East. Always East.
11·157 Posts 
Or should I just start over and force a larger FFT...?

2014-08-13, 17:16  #4
Jun 2003
23×233 Posts 
Perhaps that is better. At least you wouldn't have to worry this much.
Last fiddled with by axn on 2014-08-13 at 17:17
2014-08-13, 17:35  #5
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts 
Quote:
The above is a follow-up to trying to balance my set voltage against Load Line Calibration so that it does not dip below a threshold voltage which I consider safe, but does not end up too high under load. This is somewhat complicated, because the load varies widely between idle, P95 running, and P95 plus two hungry GTX 500 series GPUs. I think I have it under control now after considerable testing with various combinations of GPUs and the different Torture Test modes. I will have to complete a few more assignments (DC) to be really confident about the situation.
Last fiddled with by kladner on 2014-08-13 at 17:36

2014-08-13, 17:47  #6
P90 years forever!
Aug 2002
Yeehaw, FL
2·7·563 Posts 

2014-08-13, 20:27  #7
Feb 2012
3⁴·5 Posts
Maybe these should be called roundoff warnings. LOL

2014-08-13, 22:16  #8
∂²ω=0
Sep 2002
República de California
2×11×13×41 Posts 
While 0.4375 is mostly safe (~99% in my experience), if you get multiple such during a test, don't get too relaxed.
George, do you have any large-dataset stats on ROEs at the above level vs. bad results? A histogram of "number of 0.4375 errors during test vs. % of such tests which failed" would be really useful.
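For what it's worth, the tabulation asked for here is simple once the raw records exist. A minimal Python sketch, assuming each completed test is available as a pair (number of 0.4375-level warnings, whether the result later proved bad); that record layout and the function name are hypothetical, and no real GIMPS data is used or implied.

```python
from collections import defaultdict

def failure_rate_by_error_count(tests):
    """Tabulate bad-result rate per warning count (hypothetical helper).

    tests: iterable of (n_warnings, was_bad) pairs; this layout is an
    assumption, not an actual GIMPS database schema.
    Returns {n_warnings: (n_tests, percent_bad)}, ordered by n_warnings.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for n_warnings, was_bad in tests:
        totals[n_warnings] += 1
        if was_bad:
            bad[n_warnings] += 1
    return {
        n: (totals[n], 100.0 * bad[n] / totals[n])
        for n in sorted(totals)
    }
```

Each key of the returned dict is one histogram bin; the tuple gives the number of tests in the bin and the percentage of them that failed, so thinly populated bins are easy to spot before drawing conclusions.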
2014-08-13, 22:29  #9
May 2013
East. Always East.
11·157 Posts 
I think Misters Lucas and Lehmer would be pretty proud of what we're doing. Needing a special method using Sine and Cosine just to square a (big) integer.
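To make the "sine and cosine" remark concrete, here is a minimal Python sketch of squaring an integer with a floating-point FFT and measuring the roundoff error along the way. It is a plain zero-padded radix-2 FFT, chosen for brevity; it is not claimed to be the transform prime95 uses, and the names `fft`, `fft_square` and the `bits` parameter are illustrative assumptions.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        # The sine/cosine part: twiddle factor exp(+-2*pi*i*k/n).
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_square(x, bits=8):
    """Square a non-negative int via FFT; return (x*x, max roundoff error)."""
    base = 1 << bits
    digits = []
    while x:
        digits.append(x & (base - 1))   # split x into `bits`-bit digits
        x >>= bits
    digits = digits or [0]
    size = 1
    while size < 2 * len(digits):       # room for the full product
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = fft(padded)
    conv = fft([v * v for v in spectrum], invert=True)
    result, max_err = 0, 0.0
    for i, v in enumerate(conv):
        real = v.real / size            # inverse FFT above is unnormalised
        nearest = round(real)           # the NINT step
        max_err = max(max_err, abs(real - nearest))
        result += nearest << (bits * i) # Python ints handle the carries
    return result, max_err

n = 2**521 - 1
square, roe = fft_square(n)
print(square == n * n, roe)
```

Every output of the inverse transform should be an integer; the distance to the nearest integer is the roundoff error. Packing more bits into each digit makes the transform shorter and faster but pushes that error up, which is the trade-off behind choosing an FFT length for a given exponent.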

2014-08-13, 22:42  #10
P90 years forever!
Aug 2002
Yeehaw, FL
2·7·563 Posts 
No, I don't. However, prime95 uses a special method to redo any iteration with an ROE above 0.40625. In effect, prime95 can tolerate ROE up to 0.59375.

2014-08-13, 23:13  #11
∂²ω=0
Sep 2002
República de California
2·11·13·41 Posts 
You can reliably determine if e.g. a 0.4375 is really a 0.5625 which has been NINT-aliased? Do tell - something based on an FFT checksum?
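For readers unfamiliar with the term, the aliasing in question is plain arithmetic on the rounding step. A minimal Python sketch, with hypothetical helper names; it illustrates the ambiguity only and says nothing about how prime95 tells the two cases apart.

```python
def observed_roundoff(value):
    """Distance from value to the nearest integer (always in [0, 0.5])."""
    return abs(value - round(value))

def aliased_candidates(observed):
    """True errors below 1.0 that would be reported as `observed`."""
    return (observed, 1.0 - observed)

# An FFT output that should have been 1000 but came out 0.5625 too high
# is rounded to 1001, and the reported error is only 0.4375.
noisy = 1000 + 0.5625
print(round(noisy), observed_roundoff(noisy))   # 1001 0.4375
print(aliased_candidates(0.4375))               # (0.4375, 0.5625)
print(1.0 - 0.40625)                            # 0.59375
```

The last line shows that the two figures quoted in post #10 are mirror images around 0.5, which is consistent with a redo threshold of 0.40625 covering true errors up to 0.59375.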
