#2619
"Kieren"
Jul 2011
In My Own Galaxy!
27A9₁₆ Posts
Quote:
EDIT: I don't have an active setup for CuLu, so I can't answer the question. I think I am correct that the '-r' argument is equivalent to '-r 0'. The higher-level self-test is '-r 1'.

Last fiddled with by kladner on 2017-07-28 at 02:29
#2620
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13·373 Posts
Quote:
From the readme: -r n runs the short (n = 0) or long (n = 1) version of the self-test.

In the table I posted, item 28 lists the fft lengths run for -r 1 on a GTX 1060 3GB. That list was obtained from a -r 1 run made after fft benchmarking and threads benchmarking. The max fft length it ran was 8192K, as listed there.

An earlier run,
Code:
cudalucas2.06beta-cuda5.0-windows-x64.exe -d %dev% -r 1 >>clstart.txt
made on the same GTX 1060 3GB before fft or threads benchmarking, ran the following residue checks, in K (a somewhat different, more extensive list):
Code:
1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1176, 1296, 1344, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3584, 4096, 4608, 4704, 5184, 5600, 5760, 6048, 6272, 6400, 6480, 7168, 7776, 8064, 8192

I run something like the following on any gpu I install or relocate (the version varies, usually now the 2.06beta May build and a higher cuda level, with the max possible memtest width):
Code:
cudalucas2.05.1-cuda4.2-windows-x64 -memtest 116 10 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 -r 1 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 -cufftbench 1 65536 5 >>clstart.txt
rem suppress 1024 thread value in threadbench since it causes problems with my GTX480s or Quadro 2000s
CUDALucas2.05.1-cuda4.2-windows-x64 -threadbench 1 65536 5 4 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 6972593 >>clstart.txt
(Sometimes the 65536 must be reduced; sometimes the threadbench mask allows 1024 threads; both depend on the GPU model.)

On a GTX 480,
Code:
cudalucas2.05.1-cuda4.2-windows-x64 -r 1 >>clstart.txt
produced the following assortment of fft lengths, _before_ fft or threads benchmarking were done. More lengths ran in total, none above 8192K:
Code:
1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1296, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3456, 3600, 4096, 4608, 4704, 5184, 5600, 5760, 6048, 6480, 7168, 8192

From a GTX 1070, before fft benchmarking and threads benchmarking (May 2.06beta, cuda 6.0, x64):
Code:
1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1176, 1296, 1344, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3584, 4096, 4608, 4704*, 5120, 5184, 5600, 5760, 6048, 6272, 6400, 6480, 7168, 7776, 8064, 8192

* 4704 appeared not to actually run:
Code:
Using threads: square 256, splice 128.
Starting self test M86845813 fft length = 4704K
Using threads: square 256, splice 128.
Starting self test M86845813 fft length = 5120K
Iteration 10000 / 86845813, 0x88220ac98093b65c, 5120K, CUDALucas v2.06beta, error = 0.04102, real: 1:05, 6.5254 ms/iter
This residue is correct.
Not completing a length is rare.

More variations on the same GTX 1060 3GB follow.
V2.06beta 32bit cuda 6.5 -r 0 (a rare successful run in 32-bit on this card):
Code:
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

V2.06beta 64bit cuda 6.5 -r 0:
Code:
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

V2.06beta 64bit cuda 6.5 -r (neither 0 nor 1 specified):
Code:
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

Your statement that -r (no switch value specified) is equivalent to -r 0 (short residue test) seems to be confirmed. My startup scripts all use -r 1 (long test); item 28 in the table was about -r 1 results. None of the <-r, -r 0, -r 1> tests, in any run (of dozens) I've reviewed, ever exceeded fft length 8192K. -r 2 is not a legal input and is not accepted by the May 2.06beta.
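Something like the following would check the -r / -r 0 equivalence directly (an untested sketch; the executable name is assumed, and the findstr filter relies on the self-test printing "fft length" lines, as in the logs above):
Code:
@echo off
rem run the default and explicit short self-tests, then compare the fft lengths each exercised
CUDALucas2.06beta-CUDA6.5-Windows-x64.exe -r > r_default.txt
CUDALucas2.06beta-CUDA6.5-Windows-x64.exe -r 0 > r_zero.txt
findstr /c:"fft length" r_default.txt > r_default_ffts.txt
findstr /c:"fft length" r_zero.txt > r_zero_ffts.txt
rem fc prints "no differences encountered" when the two lists match
fc r_default_ffts.txt r_zero_ffts.txt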
#2621 |
"Kieren"
Jul 2011
In My Own Galaxy!
11×13×71 Posts
Sorry. I did not look closely enough at the information provided.
#2622 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001011110001₂ Posts
#2623 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4849₁₀ Posts
Examining the CUDALucas 2.06beta May 5 build source code confirms that the max exponent for which there's a self-test residue is 149,447,533, corresponding to the 8192K maximum fft length.
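A rough consistency check on that pairing (back-of-the-envelope arithmetic, not from the source code): 8192K is 8192 × 1024 = 8,388,608 FFT words, so

$$\frac{149{,}447{,}533}{8{,}388{,}608} \approx 17.8\ \text{bits per word},$$

in line with the roughly 18 bits/word that double-precision IBDWT transforms tolerate at lengths this large.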
Last fiddled with by kriesel on 2017-07-28 at 20:46 |
#2624 |
Random Account
Aug 2009
U.S.A.
70E₁₆ Posts
I notice a lot of your tests were done with CUDA 6.5. I am using CUDA 8; my current version of mfaktc requires it. The best time I've gotten out of CuLu 2.06 is around 3.8 ms/iter on my GTX 480. To get that, I have to leave the threads/splice set at their default values of 256 and 128. It is problematic at this setting because I get frequent resets.
Lowering the threads/splice values increases the time to roughly 4.2 ms/iter, but the card seems better behaved at the lower settings. The 0.4 ms/iter difference is small enough not to be an issue. All this is for exponents in the 41M range.
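For scale (my arithmetic, assuming one LL iteration per bit of the exponent, so about 41 million iterations):

$$41\times10^{6} \times 3.8\ \text{ms} \approx 43.3\ \text{h}, \qquad 41\times10^{6} \times 4.2\ \text{ms} \approx 47.8\ \text{h},$$

so the 0.4 ms/iter gap works out to roughly 4.6 hours, or about 10%, per test.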
#2625
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001011110001₂ Posts
Quote:
I've often seen the CUDA 8.0 version of CUDALucas (and the 4.2 version) significantly slower in careful benchmark testing. It also depends on the GPU model and exponent size. A few percent slower is significant to me, since it's the same as losing a day or more of throughput per month, more than a week per year, or running one of a dozen GPUs at half speed.

There's a difference between the maximum CUDA level the NVIDIA driver supports, the minimum level that a given CUDALucas, CUDAPm1, or mfaktc build requires, and what a given level of the SDK supports. CUDALucas2.06beta-CUDA6.0-Windows-x64.exe, for example, can run with any driver that supports CUDA 6.0 or above, including the latest that supports CUDA 8, but not with an old driver that supports only up to CUDA 5.5 or lower.

With a driver installed that meets CUDA 8 requirements, one can run any version of CUDALucas with a minimum requirement of 4.0 through 8.0 (I've run the experiment by benchmarking all of 2.06beta CUDA 4.0 through 8 on the same driver version) and pick the CUDA level that gives the best speed within accuracy limits for the GPU and exponents at the time. (There are some card, CUDA, and fft length combinations that are not as dependable.) The driver's versatility on CUDA level is a good thing, in that it allows running mfaktc requiring 8, CUDALucas fastest at 5.5, and CUDAPm1 fastest at some other level, on the same system with the same single driver installation.

Recently I visited the CUDA Wikipedia page and saw that the CUDA 9 SDK will drop support for compute capability 2.x cards, which includes the older Quadros (2000, 4000) and the GTX 480, all the way up through the GTX 500s and 600s. The CUDA 6.5 SDK is the last to support the older compute capability 1.3 cards like the GTX 290. https://en.wikipedia.org/wiki/CUDA#GPUs_supported

The versions of mfaktc I found online when I was looking months ago require CUDA 6.5 or up, not 8.0 minimum. http://www.mersennewiki.org/index.php/Mfaktc lists lots of choices, at CUDA 4.2, 6.5, or 8.0. I haven't the time right now to benchmark the assortment of mfaktc versions.
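That same-driver, many-CUDA-levels benchmarking can be scripted. A minimal sketch (untested; it assumes the build filenames follow the CUDALucas2.06beta-CUDA<level>-Windows-x64.exe pattern, and the cufftbench range is trimmed for a quicker comparison):
Code:
@echo off
rem benchmark each CUDA-level build of CUDALucas on the same installed driver
for %%c in (4.0 4.2 5.0 5.5 6.0 6.5 7.0 7.5 8.0) do (
  echo ===== CUDA %%c ===== >> cudalevel_bench.txt
  CUDALucas2.06beta-CUDA%%c-Windows-x64.exe -cufftbench 1 8192 5 >> cudalevel_bench.txt
)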
#2626
Random Account
Aug 2009
U.S.A.
70E₁₆ Posts
Quote:
#2627 |
Random Account
Aug 2009
U.S.A.
11100001110₂ Posts
I had to modify the batch file shown in post 2610:
Code:
@echo off
set count=0
set program=cudalucas
:loop
TITLE %program% current reset count = %count%
set /a count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
if %count%==50 goto end
goto loop
:end
del log.txt
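One possible tweak (a sketch, not the script above): logging a timestamp for each relaunch, rather than only a count, makes it easier to correlate resets with driver or event-log entries:
Code:
@echo off
rem relaunch indefinitely, logging the date and time of each restart
set program=cudalucas
:loop
echo restarted at %date% %time% >> restarts.log
%program%.exe
goto loop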
#2628
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13×373 Posts
Quote:
#2629 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001011110001₂ Posts
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.