2016-02-02, 23:55  #1
Dec 2002
787 Posts 
Skylake FMA3 round off error
I have my new Skylake at work, and it has completed its first DC exponent, which was in the 43M range. The system appears to be stable. Before starting the work, mprime ran a test to see whether it could be done with an FFT size of 2240K. The round off error was just below the acceptable value. The DC Lucas-Lehmer test completed successfully.
The system then started work on its second exponent, 43175681. Same procedure, except that halfway through the DC test it started to show round off errors. Now, at 90% done, there are already 10 round off errors. A stress test that I ran after the first few errors appeared did not turn up any problems. I am thinking of running a second test on this exponent using FMA2 instead of FMA3 to see if the different rounding in the two methods might be the cause. Any thoughts?
2016-02-03, 05:51  #2
Sep 2006
Brussels, Belgium
3^{2}×181 Posts 
My experience with a Haswell-E (5820K) is that Prime95 is a bit too aggressive when near the limit for an FFT size. I fiddled with the "SoftCrossoverAdjust" parameter in prime.txt to get around this.
Jacob 
2016-02-03, 09:39  #3
Einyen
Dec 2003
Denmark
19·157 Posts 
I queued it up for a triple check.
FMA3 is part of AVX2. There is nothing called FMA2; the other FFT type in Prime95 is just called AVX. You can use SoftCrossover=0.1 or SoftCrossover=0 in prime.txt; then it will use a larger FFT size at a lower exponent.
Edit: Reading undoc.txt again, it might be better to use SoftCrossoverAdjust=0.002 or SoftCrossoverAdjust=0.004 instead of disabling the SoftCrossover feature (with SoftCrossover=0).
Last fiddled with by ATH on 2016-02-03 at 09:44
2016-02-03, 23:07  #4
Dec 2002
1100010011_{2} Posts 
SoftCrossover=0.3
SoftCrossoverAdjust=0.020
Let me see how this works out.
The exponent 43175683 is now being done with FFT size 2304K instead of 2240K. This bigger size turns out to be about 45% faster than the smaller one. Maybe some interaction with the memory speed or so.

2016-02-04, 00:55  #5
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3·29·83 Posts 

2016-02-04, 02:09  #6
∂^{2}ω=0
Sep 2002
República de California
3·7·19·29 Posts 
My FFT-error-modeling-based length-setting function suggests a maxP in the range 43005794-43235170 for 2240K, with 'where in the range' depending on the details of the short-length DFT algos needed to build up that FFT length (2240K = 2^16·5·7), and on whether the FMA versions of same give greater or lesser accuracy than the non-FMA ones. p = 43175683 is definitely pushing things.
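To put rough numbers on "pushing things", the arithmetic can be sketched as below. This only computes the average number of exponent bits carried per FFT word for the sizes and exponents quoted in this thread; it is not the error-modeling function mentioned above, and the helper names are made up for illustration.

```python
# Illustrative arithmetic only. The FFT sizes and exponents are the ones
# quoted in the thread; the helper names are hypothetical.
K = 1024


def fft_length(k_units: int) -> int:
    """Convert an FFT size quoted in 'K' into a number of words."""
    return k_units * K


def bits_per_word(p: int, k_units: int) -> float:
    """Average number of bits of an exponent-p residue stored per FFT word."""
    return p / fft_length(k_units)


# 2240K really does factor as 2^16 * 5 * 7.
assert fft_length(2240) == 2**16 * 5 * 7

# Quoted maxP range for 2240K, then the exponent under discussion
# at both the smaller and the larger FFT size.
for label, p, size in [
    ("low end of maxP range", 43005794, 2240),
    ("high end of maxP range", 43235170, 2240),
    ("p = 43175683", 43175683, 2240),
    ("p = 43175683", 43175683, 2304),
]:
    print(f"{label:>24} at {size}K: {bits_per_word(p, size):.4f} bits/word")
```

The fewer bits each word has to carry, the more headroom there is for round off error, which is why moving the same exponent to 2304K is the safer choice.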
In my own FMA3 code deployment I found that certain DFT radices showed surprisingly higher ROE levels when coded in the obvious fashion, and much of this was due to certain common arithmetic combinations (such as the twiddle-multiply by the odd-indexed 8th roots of unity, (±1±I)/sqrt(2), in the radix-8 DFT) when coded up to take advantage of FMA. My workaround for these was based on a lot of experimentation with 'strategic wrong-way rounding' of various key arithmetic constants, i.e. round the LSB float64 bit in the opposite direction of that indicated by the quad-float high-precision version of the same constant. The current state of my various sensitivity analyses here is as much voodoo as rigorous science, however.
Last fiddled with by ewmayer on 2016-02-04 at 02:11
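What "wrong-way rounding" of a constant means can be illustrated with a small sketch. It takes 1/sqrt(2), computes a high-precision reference with Python's decimal module (standing in for the quad-float value), and returns the float64 neighbour on the opposite side from the correctly rounded one. The function name is hypothetical, math.nextafter needs Python 3.9 or later, and nothing here reproduces the actual FFT code or says which constants benefit from the trick.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than a float64 can hold

# High-precision reference for 1/sqrt(2), the magnitude of the real and
# imaginary parts of the odd-indexed 8th roots of unity.
EXACT = Decimal(1) / Decimal(2).sqrt()


def wrong_way(exact: Decimal) -> float:
    """Return the float64 adjacent to the correctly rounded value of `exact`,
    on the opposite side of `exact` (hypothetical helper for illustration)."""
    nearest = float(exact)  # correctly rounded float64
    if Decimal(nearest) > exact:
        # Correct rounding went up, so the wrong-way value is one ulp down.
        return math.nextafter(nearest, -math.inf)
    # Correct rounding went down (or was exact), so step one ulp up.
    return math.nextafter(nearest, math.inf)


nearest = float(EXACT)
flipped = wrong_way(EXACT)
print("nearest  :", repr(nearest), "error", Decimal(nearest) - EXACT)
print("wrong-way:", repr(flipped), "error", Decimal(flipped) - EXACT)
```

The two candidates differ only in the last bit of the mantissa, and the wrong-way one is by construction the less accurate representation of the constant taken on its own. Any benefit would have to come from how its error combines with the other rounding errors in the surrounding arithmetic, which is the part described above as experimental.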
2016-02-04, 09:13  #7
Dec 2002
787 Posts 
On another machine I noticed this before as well: sometimes a larger FFT is faster than the smaller one. It would be worthwhile to extend the benchmark routine (or introduce a separate one) so that it successively tests each FFT size for speed and then disables, in software, the sizes that are slower than a bigger one.
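The selection step of such a routine could look roughly like the sketch below. It assumes the per-iteration timings have already been measured and are passed in as a plain dictionary; the function name and the sample timings are invented for illustration and are not Prime95 or mprime output.

```python
def sizes_to_disable(timings: dict[int, float]) -> list[int]:
    """Return the FFT sizes (in K) that are no faster than some larger size.

    `timings` maps FFT size in K to measured time per iteration.
    """
    disabled = []
    best_larger = float("inf")  # fastest time seen among larger sizes
    # Walk from the largest size down, tracking the best time so far.
    for size in sorted(timings, reverse=True):
        if timings[size] >= best_larger:
            # A bigger FFT is at least as fast, so this size is pointless.
            disabled.append(size)
        else:
            best_larger = timings[size]
    return sorted(disabled)


# Made-up timings in ms/iteration, purely to exercise the function.
sample = {2048: 5.10, 2240: 5.90, 2304: 5.60, 2560: 6.30}
print(sizes_to_disable(sample))
```

With these invented numbers only 2240K would be skipped, because 2304K is both larger and faster.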
Also, I have to make a confession. The errors started showing up after I closed the cabinet. The GTX580 in it has a power connector that presses against the cover. That may very well have exerted too much pressure on the motherboard. I am not willing to close the cabinet again before I have obtained or custom-made a new power cable.
ATH, are you doing this exponent on a Skylake, or a less recent machine?
2016-02-04, 09:19  #8
Einyen
Dec 2003
Denmark
19·157 Posts 

2016-02-04, 18:59  #9
Dec 2002
787 Posts 
I now believe the cause of the errors was the power cable of the GPU exerting pressure on the motherboard. I took the tie wraps off the cable, and the wires now bend easily against the side panel.
Can I have this one faulty result removed from the machine's track record?
2016-02-04, 22:52  #10
Serpentine Vermin Jar
Jul 2014
CCD_{16} Posts 
Okay, after I looked up the correct user's info... your stats for all of your systems combined are:
1 suspect result (M43175681) ... currently unknown if it's bad until a triple-check... hang in there!
329 good
0 bad
44 unknown
When I do my own statistical analysis of systems, I break down a machine's results by user, cpu, year, and app version. So if that does end up being bad, it will only affect that one cpu and app version for any 2016 results. And I only use that info to guess at tiebreakers for mismatched results. I currently consider any system with zero bad and >= 15 good to be the winner, and that's really all the guessing involved.
Last fiddled with by Madpoo on 2016-02-04 at 22:58
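The tiebreaker rule described here (zero bad results and at least 15 good ones) can be written down as a short sketch. The class and function names are hypothetical, the sample records are invented, and returning no winner when both or neither side qualifies is an assumption, since the post does not say what happens in those cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Result counts for one user/cpu/year/app-version bucket (hypothetical)."""
    good: int
    bad: int


def is_trusted(rec: Record, min_good: int = 15) -> bool:
    """Rule from the post: zero bad results and at least `min_good` good ones."""
    return rec.bad == 0 and rec.good >= min_good


def guess_winner(a: Record, b: Record) -> Optional[str]:
    """Guess which side of a mismatch is right, or None if the rule is silent.

    Assumption: if both or neither side is trusted, no guess is made.
    """
    trusted_a, trusted_b = is_trusted(a), is_trusted(b)
    if trusted_a and not trusted_b:
        return "a"
    if trusted_b and not trusted_a:
        return "b"
    return None


# Invented records, purely to exercise the rule.
print(guess_winner(Record(good=40, bad=0), Record(good=12, bad=1)))
```

A single confirmed bad result in a bucket is enough to drop it out of the trusted set, which is why the breakdown by cpu, year and app version matters for a case like this one.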

2016-02-05, 03:47  #11
Romulan Interpreter
Jun 2011
Thailand
2^{2}×2,239 Posts 
The rule was that the bad results stay bad, for whatever reason. Otherwise we completely mess up the statistics... Are we doing statistics or not?
Last fiddled with by LaurV on 2016-02-05 at 03:47
