2015-10-12, 05:51  #1
Mar 2010
3×137 Posts 
CUDALucas: which binary to use?
Good morning.
I decided to do a little research on binary selection for CUDALucas. I downloaded the available binaries from Sourceforge and used a nearly default configuration file. To time the LL product calculation I used M43112609 with a 2352K FFT size and 256 threads for both square and splice, all running on a good ol' GTX Titan. Each figure below is the average of five measurements:

Code:
C_4.2: 1.8538
C_5.0: 1.82502
C_5.5: 1.8279
C_6.0: 1.83506
C_6.5: 1.6982

Now, the same series of tests on M58496057 with a 4320K FFT size:

Code:
C_4.2: 3.52024
C_5.0: 3.40014
C_5.5: 3.32784
C_6.0: 3.31762
C_6.5: 3.33362

Miscellaneous observation(s):

1. For some reason, while the C4.2 to C5.5 binaries are OK with a smaller starting FFT length, the C6.0 and C6.5 binaries require bigger FFT sizes, and this erratic behaviour always occurs. Why this happens is beyond my knowledge of the topic. There is more to it than that:

Code:
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration = 80 < 1000 && err = 0.50000 > 0.35, increasing n from 4320K
The fft length 4608K is too large for exponent 58496057, decreasing to 4320K
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.00021, max error = 0.00032
Iteration 200, average error = 0.00024, max error = 0.00033
Iteration 300, average error = 0.00025, max error = 0.00033
Iteration 400, average error = 0.00026, max error = 0.00032
Iteration 500, average error = 0.00026, max error = 0.00034
Iteration 600, average error = 0.00026, max error = 0.00032
Iteration 700, average error = 0.00027, max error = 0.00032
Iteration 800, average error = 0.00027, max error = 0.00032
Iteration 900, average error = 0.00027, max error = 0.00032
Iteration 1000, average error = 0.00027 <= 0.25 (max error = 0.00034), continuing test.

Some initialisation bug?

2. The situation may (and I have a feeling it will) be different for NV cards with other shader models. Tracking the "golden" binary for each exponent isn't easy, and adding shader models to the mix makes it tougher. One day the developers of CUDALucas may have to consider maintaining only a single build of CUDALucas and deprecating the rest, thus "embracing progress".

Comments, along with results from other CUDA builds (especially 7.0 and 7.5), are welcome!
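For anyone who wants to compare the builds a bit more systematically, here is a minimal Python sketch that picks the fastest binary per test case and reports how far it is ahead of the slowest one. The numbers are the averages posted above; the post does not state their unit, and the sketch assumes that lower means faster. The names `timings` and `summarize` are made up for illustration and are not part of CUDALucas.

```python
# Averages copied from the post above; the unit is not stated there.
# Assumption: these are timings, so a lower value means a faster build.
timings = {
    "M43112609 @ 2352K": {
        "C_4.2": 1.8538, "C_5.0": 1.82502, "C_5.5": 1.8279,
        "C_6.0": 1.83506, "C_6.5": 1.6982,
    },
    "M58496057 @ 4320K": {
        "C_4.2": 3.52024, "C_5.0": 3.40014, "C_5.5": 3.32784,
        "C_6.0": 3.31762, "C_6.5": 3.33362,
    },
}


def summarize(results):
    """Return (fastest build, percent by which it beats the slowest build)."""
    fastest = min(results, key=results.get)
    slowest = max(results, key=results.get)
    gain = 100.0 * (results[slowest] - results[fastest]) / results[slowest]
    return fastest, gain


for case, results in timings.items():
    fastest, gain = summarize(results)
    print(f"{case}: fastest is {fastest}, {gain:.1f}% below the slowest build")
```

With only five samples per build, small differences between neighbouring builds may well be within measurement noise, so the ranking is only a rough guide.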
2015-10-12, 06:08  #2
Romulan Interpreter
"name field"
Jun 2011
Thailand
10011010000101_{2} Posts 
There is nothing wrong with the program; your FFT is simply too big. For this exponent I think a ~3M FFT would work better and faster than the 4M3 FFT you are using. When I get home I will check with my cudaLucas setup.
[edit: the 3M is an estimate based on the size of your errors. You may need a bit more than 3M. Generally, the best setting (i.e. optimal, fast and safe to test) is one where the error is around 0.2 (say from 0.15 to 0.25, depending on your FFT selection). This FFT is definitely too big]
Last fiddled with by LaurV on 2015-10-12 at 06:11
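The rule of thumb above can be written down as a small Python sketch. The 0.35 limit comes from the CUDALucas log in post #1, and the 0.15 to 0.25 band is the range suggested in this post. The function name `assess_fft`, its return labels, and the choice to feed it the maximum error rather than the average are assumptions made for illustration only, not anything CUDALucas itself does.

```python
def assess_fft(max_error, low=0.15, high=0.25, limit=0.35):
    """Classify an FFT length by its observed round-off error.

    limit: max error above which the log in post #1 restarts with a longer FFT.
    low/high: the band suggested in this post as optimal.
    The labels returned here are hypothetical, for illustration only.
    """
    if max_error > limit:
        return "too small"            # errors this high force a longer FFT
    if max_error < low:
        return "larger than needed"   # safe, but a shorter FFT may be faster
    if max_error <= high:
        return "in suggested range"
    return "near the limit"           # between the suggested band and the limit


# Values taken from the log in post #1
print(assess_fft(0.00034))  # the 4320K run after the restart
print(assess_fft(0.50000))  # the error reported at iteration 80
```

By this reading, a max error of 0.00034 at 4320K sits far below the suggested band, which matches the remark that the FFT is too big for this exponent.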
2015-10-12, 06:31  #3
Mar 2010
3·137 Posts 
Indeed, I've done more tests and found that this "bug", or whatever is happening here, can't always be reproduced.
Once the proper binary for a given FFT size is picked, proper internal benchmarks should be run to find the best FFT size/thread/splice combination.
Last fiddled with by Karl M Johnson on 2015-10-12 at 06:49 Reason: yes
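As a rough illustration of what such a search over FFT size/thread/splice combinations could look like, here is a minimal Python sketch. It is not how CUDALucas benchmarks itself; `best_combination` and the `measure` callback are hypothetical names, and `measure` stands in for whatever timing run you actually perform for a given combination.

```python
from itertools import product


def best_combination(fft_sizes, thread_counts, measure):
    """Try every (fft, square_threads, splice_threads) triple and return
    the one with the lowest measured time.

    measure: hypothetical caller-supplied function taking
             (fft, square_threads, splice_threads) and returning a timing.
    """
    candidates = product(fft_sizes, thread_counts, thread_counts)
    return min(candidates, key=lambda combo: measure(*combo))


# Usage with a stand-in timing function (NOT real benchmark data):
def fake_measure(fft, square, splice):
    return abs(fft - 4320) + abs(square - 256) + abs(splice - 128)


print(best_combination([4096, 4320, 4608], [128, 256, 512], fake_measure))
```

An exhaustive search like this grows quickly with the number of options, so in practice one would probably fix the FFT size first and then vary the thread settings.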
2015-10-12, 09:41  #4
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·5·397 Posts 
Sounds like an unstable card to me.

2015-10-12, 11:19  #5
Mar 2010
411_{10} Posts 
Not unlikely, even though I don't recall ever submitting a bad LL test before migrating from 337.xx Forceware.
As usual, needs more scientific testing. Any comments on the method? Any hints regarding CUDALucas and how it works with newer CUDA toolkit versions?
Last fiddled with by Karl M Johnson on 2015-10-12 at 11:20 Reason: yes
2015-10-12, 15:06  #6
Serpentine Vermin Jar
Jul 2014
3324_{10} Posts 
Quote:
The basic code itself won't change with different FFT sizes, so I think the big variable here is only the way the program allocates memory, and 6.0 and 6.5 must not be entirely the same in that regard. I've never used any of the CUDA compilers so I can't be more specific, but I'd look to see if any default options have changed, especially those related to memory. Maybe 6.5 has some build option that makes it spend a little more time in garbage collection or something weird, something that would affect a larger memory chunk more than a smaller one. Or maybe it allocates memory differently in some way.
Last fiddled with by Madpoo on 2015-10-12 at 15:07

2015-10-12, 16:37  #7
Mar 2010
3×137 Posts 
Okay, I thought the hardware was unstable (it may be losing overclocking potential), but as it turns out, it's not entirely related to that.
So far I've gotten no bad residues with the C4.2 binaries, but this doesn't mean anything yet. The (C5.5/)C6.0/C6.5 binaries could indeed be more stability-demanding, even if previous versions worked flawlessly for years. Will report my further findings.
2015-10-12, 16:53  #8
If I May
"Chris Halsall"
Sep 2002
Barbados
17·599 Posts 

2015-10-12, 17:37  #9
Jul 2003
So Cal
3×751 Posts 
Quote:


2015-10-12, 17:56  #10
If I May
"Chris Halsall"
Sep 2002
Barbados
17×599 Posts 

2015-10-12, 18:15  #11
Jul 2003
So Cal
4315_{8} Posts 
I throw information away, yes. I question whether it's important.
