mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Old 2020-02-18, 23:54   #1871
kriesel
 
Quote:
Originally Posted by axn View Post
You did notice the extra 0, right?
Yes. Fixed previously. The modern extra-sensitive-touchpad laptops are wreaking havoc with my interactive use; in this case, after highlighting 60 for overwrite with 55, apparently I didn't notice that the selection had changed to just the 6. I learned touch typing around 1970, when most typewriters were manual and it was an honor to get to use an IBM Selectric powered typewriter with the flying spinning ball head. One of the things I remember being taught is thumbs over the space bar. Unfortunately that puts them hovering over the often too-sensitive touchpad on a laptop, giving all manner of unintended cursor moves. Normally on my old 17" display laptop I would double-tap the upper left corner and it would indicate touchpad-off with a tiny LED there, but that laptop is out of action currently. The touchpad is now turned off by the Windows 10 control on this laptop, which I'm using to access the rest; the wireless mouse is SO much better behaved.

I have one laptop that also has a touch screen, and it developed the "bubbles" problem where it senses its own display bezel as touches! It became increasingly sensitive, to the point where interactive use was almost impossible. Disabling the touch screen device was what made it usable again. https://forums.lenovo.com/t5/Lenovo-...e/td-p/1362239

Last fiddled with by kriesel on 2020-02-19 at 00:18
Old 2020-02-19, 20:36   #1872
ewmayer

Just started in on a batch of p's ~= 96.4M on my Radeon 7 ... that is close to the upper limit of what can be done @5120K using Prime95 and Mlucas, but I notice gpuOwl more conservatively defaults to 5632K. Even without per-iteration ROE checking, the Gerbicz check should still catch residue corruption by excessive ROE in some output during the current G-check interval, so I'd like to test that out.

Is there a way to force it to try 5120K, and if so can this be done mid-run by ctrl-c and restarting with the needed FFT-length command-line flag?

EDIT: The readme is your friend... just killed the current run and restarted with '-fft 5120K'; that has proceeded for another million iterations, so far, so good. Has anyone reading this seen a case where an exponent close to the gpuOwl-set upper limit for a given FFT length hits an ROE error disguised as a Gerbicz-check error and causes the run to switch to the next-larger FFT length as a result?
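For anyone unfamiliar with why the G-check should catch ROE-corrupted residues, the invariant it verifies can be sketched in a few lines - toy parameters and a simulated bit-flip here, not gpuOwl's actual block sizes, amortized verification schedule, or residue representation:

```python
# Minimal sketch of the Gerbicz check for the squaring chain
# x -> x^2 mod N.  Every B squarings the current residue is folded
# into a running product P; the invariant
#     P_new == P_old^(2^B) * u0  (mod N)
# holds for an error-free block and fails with overwhelming
# probability if any residue in the block was corrupted.

p = 127
N = (1 << p) - 1        # toy Mersenne modulus
B = 8                   # check-block length (toy value)

def gerbicz_run(blocks, corrupt_at=None):
    x = u0 = 3          # PRP base
    P = x               # running product of block-boundary residues
    for j in range(blocks):
        for _ in range(B):
            x = x * x % N
        if j == corrupt_at:
            x ^= 4      # flip one bit: simulated hardware error
        P_old = P
        P = P * x % N
        # per-block check (real code amortizes this over many blocks)
        if pow(P_old, 1 << B, N) * u0 % N != P:
            return False
    return True

print(gerbicz_run(10))                 # clean run passes
print(gerbicz_run(10, corrupt_at=5))   # corruption is caught
```

The verification costs B extra squarings, which is why real implementations check only every B blocks or so rather than every block as done here.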

Last fiddled with by ewmayer on 2020-02-19 at 21:09
Old 2020-02-19, 21:17   #1873
ewmayer

Quote:
Originally Posted by ewmayer View Post
EDIT: The readme is your friend... just killed the current run and restarted with '-fft 5120K'; that has proceeded for another million iterations, so far, so good. Has anyone reading this seen a case where an exponent close to the gpuOwl-set upper limit for a given FFT length hits an ROE error disguised as a Gerbicz-check error and causes the run to switch to the next-larger FFT length as a result?
Once again in answer to my own question - predictably, literally seconds after posting my above edit, my run @5120K hit its first G-check error. The code retried 3 more times, then after the 4th attempt, quit with "3 sequential errors, will stop."

So this seems like a straightforward code fiddle: when a run hits a repeatable G-check error as mine did, and the exponent is close to or above the default limit for the FFT length in question, simply switch to the next-larger FFT length instead of just barfing.

One related question regarding running near the exponent limit for a given FFT length - the OpenCL args echoed by the program on runstart do not say anything re. the carry-chain length used, but I see a user option "-carry long|short". Which choice gives better accuracy, and how can one tell what the default choice is for a given exponent and FFT length?

And another followup question regarding the "n errors" field output at each checkpoint - my force-5120K run started with "0 errors", then quickly cycled through 1,2,3,4 errors as it hit repeatable G-check errors due to a roundoff-corrupted residue. It then aborted. On restart sans the -fft flag it again defaulted to 5632K and is happily chugging along, but the errors field is now stuck at "2 errors". How did we go from 4 to 2? And shouldn't a repeatable G-check error count as 1 error?

Last fiddled with by ewmayer on 2020-02-19 at 21:33
Old 2020-02-20, 09:47   #1874
preda ("Mihai Preda")

Quote:
Originally Posted by ewmayer View Post
Once again in answer to my own question - predictably, literally seconds after posting my above edit, my run @5120K hit its first G-check error. The code retried 3 more times, then after the 4th attempt, quit with "3 sequential errors, will stop."

So this seems like a straightforward code fiddle: when a run hits a repeatable G-check error as mine did, and the exponent is close to or above the default limit for the FFT length in question, simply switch to the next-larger FFT length instead of just barfing.

One related question regarding running near the exponent limit for a given FFT length - the OpenCL args echoed by the program on runstart do not say anything re. the carry-chain length used, but I see a user option "-carry long|short". Which choice gives better accuracy, and how can one tell what the default choice is for a given exponent and FFT length?

And another followup question regarding the "n errors" field output at each checkpoint - my force-5120K run started with "0 errors", then quickly cycled through 1,2,3,4 errors as it hit repeatable G-check errors due to a roundoff-corrupted residue. It then aborted. On restart sans the -fft flag it again defaulted to 5632K and is happily chugging along, but the errors field is now stuck at "2 errors". How did we go from 4 to 2? And shouldn't a repeatable G-check error count as 1 error?
In general I try to keep things simple, not putting too much smarts in the automatic-dynamic FFT size. For example, in this case, just using the default would have been OK. If the user wants more control, it is possible to be explicit about the desired FFT size, as you did. OTOH the behavior "let the user explicitly specify an FFT size, but dynamically increase it as needed" seems too complex (tricky) to me.

About carry size, -carry long provides better accuracy but it's so much slower that it's practically never used nowadays. Basically, moving to the next-larger FFT size might well be faster than the long carry. The default is always the short carry.

About the number of errors (2 vs. 4), this is a bit tricky: a savefile is only ever created with valid data that passed the check "right now". The number of errors is incremented in RAM, but it can only be written to disk as part of a valid savefile. What probably happened in your case is this: an error is hit (count becomes 1), the run backtracks, a check earlier than the error point passes OK and this saves (with count 1), it again hits the error point, backtracks, and eventually hits 3 consecutive errors and stops.

Anyway, improvements clearly can be made; but I'd like to identify changes that have a clear behavior, a clear benefit, and not excessive cost before proceeding.
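As a toy illustration of how the on-disk count can end up smaller than the in-RAM count - an assumed model of the behavior described above, not the actual gpuowl savefile code:

```python
# Toy model: the error count lives in RAM, but reaches disk only
# inside a savefile written at a *passing* check.

class Run:
    def __init__(self):
        self.save = (0, 0)       # (iteration, error count) on disk
        self.iters = 0
        self.ram_errors = 0

    def check(self, ok):
        """One G-check: on success write a savefile; on failure bump
        the RAM count and roll back to the last savefile."""
        if ok:
            self.save = (self.iters, self.ram_errors)
        else:
            self.ram_errors += 1
            self.iters = self.save[0]   # rollback; disk count unchanged
        return ok

run = Run()
consecutive = 0
# one error, a passing re-check that saves, then 3 errors in a row
for ok in [False, True, False, False, False]:
    run.iters += 100                    # advance one check interval
    if run.check(ok):
        consecutive = 0
    else:
        consecutive += 1
        if consecutive == 3:
            break                       # "3 sequential errors, will stop"

print(run.ram_errors)  # 4 errors were counted in RAM this session
print(run.save[1])     # but the last savefile recorded only 1
```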
Old 2020-02-20, 19:34   #1875
ewmayer

Quote:
Originally Posted by preda View Post
In general I try to keep things simple, not putting too much smarts in the automatic-dynamic FFT size. For example, in this case, just using the default would have been OK. If the user wants more control, it is possible to be explicit about the desired FFT size, as you did. OTOH the behavior "let the user explicitly specify an FFT size, but dynamically increase it as needed" seems too complex (tricky) to me.
My own code leverages the restart-from-interrupt logic to do this - on hitting an ROE, first retry the current iteration interval at the same FFT length to see if it's reproducible; then, if a larger FFT length is indicated based on that, restart from the last good savefile at the larger FFT length.
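In sketch form, that policy looks roughly like this - illustrative Python with made-up FFT limits and a stubbed-out check, not actual Mlucas source:

```python
# Toy model of the retry-then-bump recovery policy.  The limits below
# are rough assumed numbers, not values from any real program.

FFT_LIMITS = {5120: 96_000_000, 5632: 105_000_000}  # length (K) -> max p

def next_fft(fft_k):
    ladder = sorted(FFT_LIMITS)
    return ladder[ladder.index(fft_k) + 1]

def run_block(exponent, fft_k, transient_error):
    # stand-in for one verified interval: systematic failure above the
    # FFT length's safe limit, plus an optional injected glitch
    return exponent <= FFT_LIMITS[fft_k] and not transient_error

def advance(exponent, fft_k, transients):
    """Run one interval; retry once at the same length, and only a
    repeatable failure forces a restart (from the last good savefile,
    in the real scheme) at the next-larger FFT length."""
    for _ in range(2):           # first try + one retry
        glitch = transients.pop(0) if transients else False
        if run_block(exponent, fft_k, glitch):
            return fft_k         # interval verified at this length
    return next_fft(fft_k)       # repeatable error: bump FFT length

print(advance(90_000_000, 5120, [True]))  # transient: retry passes, stays at 5120
print(advance(96_400_000, 5120, []))      # repeatable: bumps to 5632
```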

Quote:
About carry size, -carry long provides better accuracy but it's so much slower that it's practically never used nowadays. Basically, moving to the next-larger FFT size might well be faster than the long carry. The default is always the short carry.
Thanks - your terminology had me confused, because it sounds very similar to a carry-related accuracy-vs-speed option I implement in my code, but apparently refers to a very different thing. In my current code, rather than computing all of the DWT weights from scratch or via my older 2-small-table-multiply scheme, I start with a high-accuracy DWT weight computed that way, but for the next few outputs use a simple recurrence to generate the needed weights: just "multiply up" each successive weight by the constant 2^(#smallwords/N); if the result is >= 2, multiply by 0.5. But accuracy degrades with increasing recurrence-chain length, so the code allows 3 different chain lengths, long|medium|short. At runstart it uses some simple "how close is the exponent to the upper limit for the given FFT length?" logic to set the initial chain length. If the run hits a dangerous ROE it will first try switching to the next-shorter chain length, and only if it hits an ROE while already using the short length will it switch to the next-larger FFT and revert the chain length to long. The performance hit from the shorter chain lengths is small enough - around 2% - that the next-larger FFT should always be a last resort.
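The recurrence is easy to sketch with toy parameters (standalone Python, whereas the real code folds this into the carry step; the exact weights are recomputed via integer math for comparison):

```python
# Toy sketch of the weight recurrence described above, not actual
# Mlucas source.  IBDWT weights a_j = 2^(ceil(j*p/N) - j*p/N) all lie
# in [1,2); successive weights differ by the constant factor
# 2^(#smallwords/N), reduced back into [1,2) whenever a product
# reaches 2.

p, N = 1201, 64                  # toy exponent and FFT length
b = p // N                       # bits in a small word
nsmall = N * (b + 1) - p         # number of small words
step = 2.0 ** (nsmall / N)       # constant recurrence multiplier

def weight_direct(j):
    """High-accuracy weight, with exact ceil via integer arithmetic."""
    c = -((-j * p) // N)         # ceil(j*p/N)
    return 2.0 ** (c - j * p / N)

w = 1.0                          # a_0 = 2^0 = 1
max_err = 0.0
for j in range(1, N):
    w *= step                    # "multiply up" by 2^(#smallwords/N)
    if w >= 2.0:                 # fell out of [1,2): renormalize
        w *= 0.5
    max_err = max(max_err, abs(w - weight_direct(j)) / weight_direct(j))

print(max_err)  # tiny for a short chain; roundoff accumulates with length
```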

You're right regarding the slowness of -carry long for your code - current expo running at 5632K at 755 us/iter. Halting and restarting with -fft 5120K as I did yesterday cuts timing to 708 us, but is more or less guaranteed to abort with G-check error resulting from incorrectly-rounded output due to excessive ROE. Using '-fft 5120K -carry long' might be safe in terms of ROE, but blows up the per-squaring time to 960 us, so I'm back to the default 5632K here.

Quote:
About the number of errors (2 vs. 4), this is a bit tricky: a savefile is only ever created with valid data that passed the check "right now". The number of errors is incremented in RAM, but it can only be written to disk as part of a valid savefile. What probably happened in your case is this: an error is hit (count becomes 1), the run backtracks, a check earlier than the error point passes OK and this saves (with count 1), it again hits the error point, backtracks, and eventually hits 3 consecutive errors and stops.
Do you expect my run results to be OK, or should I queue it up for early-DC just to be on the safe side?

Quote:
Anyway, improvements clearly can be made; but I'd like to identify changes that have a clear behavior, a clear benefit, and not excessive cost before proceeding.
Of course - being on both sides of the coder/user divide, I know that my job as a user is to say "gimme, gimme", and yours as a coder is to choose your battles very carefully.

Last fiddled with by ewmayer on 2020-02-20 at 19:54
Old 2020-02-20, 22:27   #1876
kriesel
 

Quote:
Originally Posted by ewmayer View Post
You're right regarding the slowness of -carry long for your code - current expo running at 5632K at 755 us/iter. Halting and restarting with -fft 5120K as I did yesterday cuts timing to 708 us, but is more or less guaranteed to abort with G-check error resulting from incorrectly-rounded output due to excessive ROE. Using '-fft 5120K -carry long' might be safe in terms of ROE, but blows up the per-squaring time to 960 us, so I'm back to the default 5632K here.
You're right to test carry length. It apparently behaves as advertised on Vega and Radeon VII, but with some older gpuowl versions and older GPU models (RX 550, RX 480, or both) -carry long was faster, I think at 4M FFT length.
Old 2020-02-21, 03:19   #1877
kriesel
 
gpuowl-win v6.11-148-gfc93773 build

Here it is, fresh from -h and no more testing than that. This commit should have the P-1 FFT size fix mentioned in https://mersenneforum.org/showpost.p...postcount=1868
Attached: gpuowl-v6.11-148-gfc93773.7z (448.0 KB)

Last fiddled with by kriesel on 2020-02-21 at 03:20