2017-07-26, 21:11  #617  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,733 Posts 
CUDALucas 2.05.1 error: zero residues
Quote:
The May 5 2.06 beta build incorporates the bad-residue checks and halts the program. If you can identify and contact the users, I suggest you recommend they switch versions. Perhaps also refer them to this post for background.

FFT benchmarking and threads benchmarking screen output should be redirected to a file. Then it can be examined for signs of trouble like this:

fft = 4K, ave time = 0.0760 ms, square: 32, splice: 128
fft = 4K, ave time = 0.0759 ms, square: 64, splice: 128
fft = 4K, ave time = 0.0760 ms, square: 128, splice: 128
fft = 4K, ave time = 0.0763 ms, square: 256, splice: 128
fft = 4K, ave time = 0.0788 ms, square: 512, splice: 128
fft = 4K, ave time = 0.0273 ms, square: 1024, splice: 128
fft = 4K, ave time = 0.0279 ms, square: 1024, splice: 32
fft = 4K, ave time = 0.0270 ms, square: 1024, splice: 64
fft = 4K, ave time = 0.0270 ms, square: 1024, splice: 128
fft = 4K, ave time = 0.0273 ms, square: 1024, splice: 256
fft = 4K, ave time = 0.0271 ms, square: 1024, splice: 512
fft = 4K, ave time = 0.0274 ms, square: 1024, splice: 1024
fft = 4K, min time = 0.0270 ms, square: 1024, splice: 128

Those square-1024 timings above that are barely a third of the others are signs of trouble. Normal variation is percentage points, not nearly 2- or 3-fold. Some cards need to be benchmarked without trying 1024 threads, or without 32 threads, to generate fft files and threads files with valid, reliable results. That's why the bit masks to skip them are provided in the m argument of the threadbench s e i m option.

I've also noted some CUDALucas quirks with a recently acquired GTX 1060. Mostly it won't run the 32-bit versions of the May 5 CUDALucas 2.06beta build, crashing instead of performing an fft benchmark, threads benchmark, residue test, or ordinary LL test. But when it does run win32 versions, it can produce all zeros for the residue check test and so fail every test. I need to look at this some more. 
Regardless of card, CUDALucas users should take care to avoid low CUDA levels. I've seen very bad benchmark results with older versions such as 4.0 and 4.1: in some cases small multiples faster, or orders of magnitude less iteration time, than is plausible when doing an fft benchmark or threads benchmark. If it looks too good to be true, it probably isn't true.

I have proposed, but no one has attempted yet, modifying CUDALucas so that anomalously low fft or threads benchmark values are detected and suppressed, or at least warned on. There would need to be some tolerance bounds around a function fit of roughly k * l^1.03, where l is the fft length. The effect of such exceptions is particularly pernicious in threads benchmarking, since the minimum value per fft length is sought. If any of the twelve or so tested combinations goes bad, so that its iteration time is too low, it supersedes the other choices for that fft length. Then, when the threads file is being output, it also supersedes (suppresses output of) values at shorter non-power-of-two fft lengths that may be valid but exceed the bad low iteration time.

Here's an example that occurred on a GTX 1070 with CUDA 4.1. Only the 4096K line is valid, believed to be from correct function; in the 4.1 win32 build, all timings over 4096K are anomalously low, by large multiples. Contiguous excerpt from the threads file:

4096 32 128 4.48234
8192 32 256 0.73438
16384 32 32 1.46306
32768 128 32 2.93054
65536 128 32 0.04897

People running 100-million-digit exponents will need fft lengths around 18816K or longer and so are more likely to run benchmarks all the way to 65536K. I've seen frequent trouble in exhaustive benchmarking all the way from min to max (1K-65536K). It can really fall off a cliff at 65536K, or be much more subtle. With 4.0 win32 on a GTX 1060:

16384 294471259 35.2237  (too small)
32768 580225813 70.4184  (big enough)
65536 1143276383 15.8164  (faster; one might use this without knowing it's trouble)
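The proposed tolerance check around a k * l^1.03 fit could be sketched roughly as follows. This is a minimal illustration, not CUDALucas code: the function names, the 50% tolerance, and the synthetic timings are all assumptions, and a least-squares fit like this only works when most of the benchmark points are good.

```python
def fit_scale(samples, p=1.03):
    """Least-squares estimate of k in t = k * l**p over (l, t) samples."""
    num = sum(t * l**p for l, t in samples)
    den = sum(l**(2 * p) for l, _ in samples)
    return num / den

def flag_low_outliers(samples, p=1.03, tol=0.5):
    """Return samples whose time falls below tol * the fitted trend.
    samples: list of (fft_length_k, iteration_time_ms)."""
    k = fit_scale(samples, p)
    return [(l, t) for l, t in samples if t < tol * k * l**p]

# Synthetic example: timings follow ~1e-3 * l^1.03 except one bad point
good = [(l, 1e-3 * l**1.03) for l in (1024, 2048, 4096, 8192, 32768)]
bad = (16384, 0.7)   # far below the ~18-22 ms the trend predicts
flagged = flag_low_outliers(good + [bad])
```

With mostly-good data the bad point is flagged; with majority-bad data (as in the CUDA 4.1 excerpt above, where four of five lines are wrong) the fit itself is dragged down and only the most extreme outlier stands out, which is part of why detecting this automatically is nontrivial.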
Iteration times are supposed to increase from the small fft lengths to the long ones, with some fluctuation upward of the approximately linear trend line between the power-of-two lengths. If they don't, something's wrong, and the fft file, threads file, and program output are highly suspect. The exception is at the very low fft lengths (under ~100K, which most people won't bother to run), where there appears to be variation in startup delay or other overhead. The cliff at the high end is clearly bad, and is responsible for removal of many valid, useful non-power-of-two fft values from the resulting fft file. Last fiddled with by kriesel on 2017-07-26 at 21:18 
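The "times should increase with fft length" rule lends itself to a simple sanity check: flag any entry that is faster than some shorter fft length's time. A minimal sketch, using the 4.0/win32 GTX 1060 numbers quoted above; the function name and the decision to ignore the sub-100K overhead region are assumptions:

```python
def suspect_entries(rows, min_len_k=128):
    """rows: (fft_length_k, iteration_time_ms) pairs, sorted by length.
    Returns entries that run faster than a shorter fft length did,
    ignoring lengths below min_len_k where startup overhead dominates."""
    flagged, slowest_so_far = [], 0.0
    for length, t in rows:
        if length < min_len_k:
            continue
        if t < slowest_so_far:
            flagged.append((length, t))   # faster than a shorter fft: suspect
        slowest_so_far = max(slowest_so_far, t)
    return flagged

# Numbers from the 4.0/win32 GTX 1060 example above
rows = [(16384, 35.2237), (32768, 70.4184), (65536, 15.8164)]
cliff = suspect_entries(rows)
```

This catches the 65536K cliff (15.8 ms after 70.4 ms at 32768K) without needing any model fit at all, at the cost of missing a run where every timing is uniformly too low.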

2017-07-26, 21:31  #618  
If I May
"Chris Halsall"
Sep 2002
Barbados
2·4,663 Posts 
Quote:
This becomes tiring. Are you familiar with the concept of temporal sampling? Last fiddled with by chalsall on 2017-07-26 at 21:39 

2017-07-26, 21:51  #619  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4733_{10} Posts 
credit for bad fast runs
Quote:
Are such erroneous and anomalously fast runs credited as the number of GHz-days the computation ought to have taken? Perhaps, if it isn't too much trouble, certain known-bad residues from causes known for years ought to earn less or no credit. Or something. 

2017-07-26, 22:13  #620 
If I May
"Chris Halsall"
Sep 2002
Barbados
246E_{16} Posts 

2017-07-27, 00:40  #621  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,733 Posts 
65536K iteration time cliff, sometimes
Quote:
Maybe I needed to be clearer. The cliff I referred to is the 65536K timing being more than a factor of 100 faster than the next shorter fft length. (No objection to natural variation, in combination with timing inaccuracy, putting some fuzz on the plot, or to larger percentage differences at the low lengths. Granted, the measurement of the fft-length iteration time is subject to certain experimental errors. The duration of a specific fft-length iteration is a scalar, though, not a time- or space-varying signal. Its sensitivity to digitization error, and perhaps to different operand inputs, is controllable by making multiple runs and averaging the duration, which CUDALucas has built in.) Maybe I don't understand the point you're trying to make above. Try again?

Several GPU / CUDA library / bitness combinations appear to produce a correct fft-length iteration time at the 65536K fft length. Some do not. I suspect that is a big red flag of a bug and of incorrect results for that too-fast 65536K computation. Based in part on nearly 50 years of experience with software, and nearly 40 years designing, building, and using scientific instrumentation and robotics, including real-time control programming in assembler, it looks to me like software issues specific to some software combinations (run the same way on the same hardware, and timed the same way too), not sampling issues. One of the things an engineer learns early on is when to distrust experimental data. Outliers are suspicious and get extra attention. Durations for 65536K much less than 134 msec seem like outliers to me. 
From an assortment of fft benchmarks tagged by GPU, CUDALucas version, CUDA library, and driver:

GEFORCE GTX 1060 3GB  FFT  2.06BETA  4.0  X64  N378.66.TXT  65536  1143276383   15.8225
GEFORCE GTX 1060 3GB  FFT  2.06BETA  4.1  X64  N378.66.TXT  65536  1143276383    0.2968
GEFORCE GTX 1060 3GB  FFT  2.06BETA  4.2  X64  N378.66.TXT  65536  1143276383  135.7008
GEFORCE GTX 1060 3GB  FFT  2.06BETA  5.0  X64  N378.66.TXT  65536  1143276383  135.6211
GEFORCE GTX 1060 3GB  FFT  2.06BETA  5.5  X64  N378.66.TXT  65536  1143276383  135.0425
GEFORCE GTX 1060 3GB  FFT  2.06BETA  6.0  X64  N378.66.TXT  65536  1143276383  134.6015
GEFORCE GTX 1060 3GB  FFT  2.06BETA  6.5  X64  N378.66.TXT  65536  1143276383  135.7659
GEFORCE GTX 1060 3GB  FFT  2.06BETA  7.0  X64  N378.66.TXT  65536  1143276383  134.4818
GEFORCE GTX 1060 3GB  FFT  2.06BETA  7.5  X64  N378.66.TXT  65536  1143276383  137.3554
GEFORCE GTX 1060 3GB  FFT  2.06BETA  8.0  X64  N378.66.TXT  65536  1143276383  136.9777 
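Since the same card and fft length should time about the same regardless of CUDA library, the table lends itself to a cross-run consistency check: flag runs far below the median of their peers. A small sketch using the 65536K timings quoted above; the 50% threshold is an assumption, not anything CUDALucas does:

```python
import statistics

# GTX 1060 3GB, 65536K fft, iteration time in ms per CUDA library version
times = {"4.0": 15.8225, "4.1": 0.2968, "4.2": 135.7008, "5.0": 135.6211,
         "5.5": 135.0425, "6.0": 134.6015, "6.5": 135.7659, "7.0": 134.4818,
         "7.5": 137.3554, "8.0": 136.9777}

median = statistics.median(times.values())   # consensus is ~135 ms
suspect = sorted(cuda for cuda, t in times.items() if t < 0.5 * median)
```

Here the median is robust because eight of the ten runs agree; both the CUDA 4.0 and 4.1 timings fall far below it and get flagged, matching the "much less than 134 msec" rule of thumb above.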

2017-07-27, 01:07  #622 
∂^{2}ω=0
Sep 2002
República de California
3^{2}·1,093 Posts 
"The point you're trying to make" assumes facts not in evidence. When Chris is bored at his job or whatever, he likes to troll his fellow forumites and beat up on n00bs and known easy targets just for sport. This leads to fairly regular intervals of timeouts, a.k.a. "toadification" in the parlance of the mods. If you encounter such a post, usually of a fairly well-established 1- or 2-line length and conveying no coherent point, it's best just to ignore it.

2017-07-27, 01:13  #623  
If I May
"Chris Halsall"
Sep 2002
Barbados
2·4,663 Posts 
Yeah... Sorry, most don't get my humour...
Quote:
You didn't get the associated joke (it involves powers of two).... 

2017-07-27, 01:19  #624  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,733 Posts 
no joke
Quote:
If a large-exponent result, representing a large amount of computation if performed correctly, already attracted attention because it was reported back too fast to be correct, and it signaled a new prime, and one cared about the quality of the stats and database, then I think it would be up to the mersenne.org board and the PrimeNet administrator whether the excessive computing credit got adjusted down, or an obviously bad and very recent special residue got cleared. Perhaps the policy is that everything is added to the database and never deleted; I don't know. (I know that my lifetime stats are seriously underrepresented already. That has nothing to do with "Erasure" above.) Falsely reported primes, no matter how innocently they occur, seem like a special case meriting more attention than the ordinary composite residue in this project. 

2017-07-27, 15:52  #625 
Sep 2009
2·7·139 Posts 
Presumably all the benchmarks for a given exponent are doing the same calculation with different FFT lengths etc., so they should all get the same result. Rejecting any that don't get the correct result should then fix the issue.
All you would need is a list of residues generated by each benchmark on known good hardware to compare the results with. Or have I misunderstood how it works? Chris 
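The suggestion above amounts to a lookup table of reference residues. A minimal sketch of how such a check might look; the table structure, function name, and every numeric value here are placeholders, not actual GIMPS reference data:

```python
# Hypothetical known-good table: (exponent, fft_length_k) -> interim
# residue recorded on trusted hardware. Values below are made up.
KNOWN_GOOD = {
    (70000001, 4096): 0x23AF5E1D9C0B7742,   # placeholder residue
}

def accept_run(exponent, fft_len_k, residue):
    """Reject a benchmark run whose interim residue disagrees with the
    known-good table, and always reject the all-zero residues that the
    thread describes as a known failure mode."""
    if residue == 0:
        return False                         # all-zero residue: known bad
    expected = KNOWN_GOOD.get((exponent, fft_len_k))
    if expected is not None and residue != expected:
        return False                         # disagrees with trusted result
    return True                              # matches, or no reference known
```

One caveat this sketch makes visible: combinations absent from the table can only be screened for the all-zero case, so the table would need an entry per (exponent, fft length) pair actually used in benchmarking.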
2017-07-27, 20:35  #626 
If I May
"Chris Halsall"
Sep 2002
Barbados
2×4,663 Posts 
Quite possibly.
In the optimal case, different candidates would run on different hardware and software, and converge on a very similar value. Ideally 0. Last fiddled with by chalsall on 2017-07-27 at 20:37 
2017-07-27, 22:51  #627  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,733 Posts 
CUDALucas 2.05.1
Quote:
The Windows-flavor v2.05.1 executables available now for download still don't have the illegal-interim-residue checks built in (except perhaps the CUDA 8 x64 executable), although the 2.06beta set does. Also, I've verified by testing that it's easy to hit various issues that generate the all-zero residues fast. Additionally, when the extended residue-check self test is run in CUDALucas 2.06beta, the largest fft length used is 8192K. (I ran a set for every x64 CUDA version available of the May 2.06beta on Windows, and none exceeded an 8192K residue test.) That's much shorter than the ~18816K needed for 100M digits. There's also a behavior where it picks a seemingly too-short fft length to run large exponents on. I vaguely recall reading of someone else seeing that happen too, on 100M exponents. It occurs at the very top end: an exponent that ought to get 65536K drops to 57600K, then generates all-zero residues too fast. My impression is CUDALucas requires more user attention to detail than prime95. I'm taking the details of my months of testing over to the CUDALucas thread. 
