mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Old 2017-07-28, 02:25   #2619
kladner
 

Quote:
Originally Posted by kriesel View Post
This compilation is based on mostly my own running and testing since February on Windows, with some info from the forums mixed in. Please chime in with linux experience or in general. The absence of fft lengths greater than 8192k in the -r self test option seems like a priority item. Perhaps a separate -rbig or -r 2 option, with 1000 iterations for the big fft lengths >8192k?
What is the limit with -r 1 ?
EDIT: I don't have an active setup for CuLu, so I can't answer the question. I believe the '-r' argument is equivalent to '-r 0'. The higher-level self-test is '-r 1'.

Last fiddled with by kladner on 2017-07-28 at 02:29
Old 2017-07-28, 14:48   #2620
kriesel
 
-r 1 vs -r 0 vs -r (none), and variations among GPU models, CUDA level or ?

Quote:
Originally Posted by kladner View Post
What is the limit with -r 1 ?
EDIT: I don't have an active setup for CuLu, so I can't answer the question. I believe the '-r' argument is equivalent to '-r 0'. The higher-level self-test is '-r 1'.
Yes.

From the readme:

-r n runs the short (n = 0) or long (n = 1) version of
the self-test.

In the table I posted, item 28 lists the fft lengths run for -r 1 on a GTX 1060 3GB. That list was obtained from a -r 1 run made after fft and threads benchmarking. The maximum fft length it ran was 8192k, as listed there.

An earlier run of cudalucas2.06beta-cuda5.0-windows-x64.exe -d %dev% -r 1 >>clstart.txt on the same GTX 1060 3GB, made before fft or threads benchmarking, ran residue checks at the following fft lengths in k (a somewhat different, more extensive list):
1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1176, 1296, 1344, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3584, 4096, 4608, 4704, 5184, 5600, 5760, 6048, 6272, 6400, 6480, 7168, 7776, 8064, 8192

I run something like the following (the version varies; usually now the 2.06beta May build and a higher cuda level, with the maximum possible memtest width):

cudalucas2.05.1-cuda4.2-windows-x64 -memtest 116 10 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 -r 1 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 -cufftbench 1 65536 5 >>clstart.txt
rem suppress 1024 thread value in threadbench since it causes problems with my GTX480s or Quadro 2000s
CUDALucas2.05.1-cuda4.2-windows-x64 -threadbench 1 65536 5 4 >>clstart.txt
cudalucas2.05.1-cuda4.2-windows-x64 6972593 >>clstart.txt

on any GPU I install or relocate. (Sometimes the 65536 must be reduced; sometimes the threadbench mask allows 1024 threads, both depending on the GPU model.)

On a GTX 480, cudalucas2.05.1-cuda4.2-windows-x64 -r 1 >>clstart.txt produced the following assortment of fft lengths, again before fft or threads benchmarking was done. More lengths ran in total, but none above 8192k.

1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1296, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3456, 3600, 4096, 4608, 4704, 5184, 5600, 5760, 6048, 6480, 7168, 8192
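The complete lists above (GTX 1060 3GB vs GTX 480) are easier to compare mechanically than by eye. A quick throwaway script (the lists copied verbatim from this post) shows exactly which lengths differ:

```python
# Diff the -r 1 fft length lists (in k) reported above for the two GPUs.
gtx1060 = {1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80,
           96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288,
           320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672,
           720, 768, 784, 800, 864, 896, 1024, 1152, 1176, 1296, 1344, 1440,
           1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200,
           3584, 4096, 4608, 4704, 5184, 5600, 5760, 6048, 6272, 6400, 6480,
           7168, 7776, 8064, 8192}
gtx480 = {1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80,
          96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288,
          320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672,
          720, 768, 784, 800, 864, 896, 1024, 1152, 1296, 1440, 1568, 1600,
          1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3456, 3600,
          4096, 4608, 4704, 5184, 5600, 5760, 6048, 6480, 7168, 8192}

only_1060 = sorted(gtx1060 - gtx480)  # lengths the 1060 ran but the 480 did not
only_480 = sorted(gtx480 - gtx1060)   # and vice versa
print(only_1060)  # [1176, 1344, 3584, 6272, 6400, 7776, 8064]
print(only_480)   # [3456, 3600]
```

Both lists top out at 8192k; the differences are all in the middle and upper lengths.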

From a GTX 1070, before fft or threads benchmarking (May 2.06beta, cuda 6.0, x64):
1, 2, 4, 8, 10, 14, 16, 18, 32, 36, 42, 48, 56, 60, 64, 70, 80, 96, 112, 120, 128, 144, 160, 162, 168, 180, 192, 224, 256, 288, 320, 324, 336, 360, 384, 392, 400, 448, 512, 576, 640, 648, 672, 720, 768, 784, 800, 864, 896, 1024, 1152, 1176, 1296, 1344, 1440, 1568, 1600, 1728, 1792, 2048, 2304, 2592, 2688, 2880, 3136, 3200, 3584, 4096, 4608, 4704*, 5120, 5184, 5600, 5760, 6048, 6272, 6400, 6480, 7168, 7776, 8064, 8192

* 4704 appeared not to actually run:
Using threads: square 256, splice 128.
Starting self test M86845813 fft length = 4704K
Using threads: square 256, splice 128.
Starting self test M86845813 fft length = 5120K
Iteration 10000 / 86845813, 0x88220ac98093b65c, 5120K, CUDALucas v2.06beta, error = 0.04102, real: 1:05, 6.5254 ms/iter
This residue is correct.

Not completing a length is rare.

More variations on the same GTX 1060 3GB follow.

V2.06beta 32bit cuda 6.5 -r 0 (A rare successful run in 32-bit on this card)
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

V2.06beta 64bit cuda 6.5 -r 0
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

V2.06beta 64bit cuda 6.5 -r (neither 0 nor 1 specified)
4, 8, 16, 64, 72, 160, 360, 720, 1134, 1296, 1440, 1600, 1728, 2048, 2304, 3136

Your statement that -r (no switch value specified) is equivalent to -r 0 (short residue test) seems to be confirmed.

My startup scripts all use -r 1 (the long test). Item 28 in the table was about -r 1 results. None of the -r, -r 0, or -r 1 tests, in any run (of dozens) I've reviewed, ever exceeded fft length 8192k. -r 2 is not a legal input and is not accepted by the May 2.06beta.
Old 2017-07-28, 14:56   #2621
kladner
 

Sorry. I did not look closely enough at the information provided.
Old 2017-07-28, 18:52   #2622
kriesel
 

Quote:
Originally Posted by kladner View Post
Sorry. I did not look closely enough at the information provided.
It's ok.

By looking at it some more, I noticed and learned a few more things myself, so it's all good.
Old 2017-07-28, 20:46   #2623
kriesel
 
Self-test residues limit

Examining the CUDALucas 2.06beta May 5 build source code confirms that the maximum exponent for which there is a self-test residue is 149,447,533, corresponding to the 8192k maximum fft length.
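As a sanity check on those two numbers: GIMPS LL codes generally pack roughly 17-18 bits of the candidate per double-precision fft word near a length's limit (a general rule of thumb, not something taken from the CUDALucas source). The ratio works out:

```python
# Check that the max self-test exponent fits the usual bits-per-word band
# for an 8192k fft length.
max_exponent = 149_447_533
fft_words = 8192 * 1024          # 8192k fft length, in words
bits_per_word = max_exponent / fft_words
print(f"{bits_per_word:.2f} bits per word")  # 17.82 bits per word
```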

Last fiddled with by kriesel on 2017-07-28 at 20:46
Old 2017-07-29, 00:24   #2624
storm5510

I notice a lot of your tests were done with CUDA 6.5. I am using CUDA 8; my current version of mfaktc requires it. The best time I have gotten out of CuLu 2.06 is around 3.8 ms/iter on my GTX 480. To get that, I have to leave the threads/splice values at their defaults of 256 and 128, but that setting is problematic because I get frequent resets.

Lowering the threads/splice values increases the time to roughly 4.2 ms/iter, but the run seems better behaved at the lower settings, and a 0.4 ms difference is small enough not to matter to me. All this is for exponents in the 41M range.
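For scale, those per-iteration figures can be turned into whole-test times; an LL test of exponent p takes about p iterations, so back-of-the-envelope:

```python
# Rough whole-test times for the two per-iteration speeds quoted above.
p = 41_000_000                 # exponent in the 41M range
fast, slow = 3.8e-3, 4.2e-3    # seconds per iteration
hours_fast = p * fast / 3600
hours_slow = p * slow / 3600
print(f"{hours_fast:.1f} h vs {hours_slow:.1f} h "
      f"({hours_slow - hours_fast:.1f} h difference)")
# 43.3 h vs 47.8 h (4.6 h difference)
```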
Old 2017-07-29, 20:27   #2625
kriesel
 
CUDA levels

Quote:
Originally Posted by storm5510 View Post
I notice a lot of your tests were done with CUDA 6.5. I am using CUDA 8; my current version of mfaktc requires it. The best time I have gotten out of CuLu 2.06 is around 3.8 ms/iter on my GTX 480. To get that, I have to leave the threads/splice values at their defaults of 256 and 128, but that setting is problematic because I get frequent resets.

Lowering the threads/splice values increases the time to roughly 4.2 ms/iter, but the run seems better behaved at the lower settings, and a 0.4 ms difference is small enough not to matter to me. All this is for exponents in the 41M range.
I frequently run CUDALucas2.06beta-CUDA6.0-Windows-x64.exe or versions near that because they have done well in my benchmark testing.
I've often seen the CUDA 8.0 build (and 4.2) of CUDALucas run significantly slower in careful benchmark testing; it also depends on the GPU model and exponent size. A few percent slower is significant to me: it's the same as losing a day or more of throughput per month, more than a week per year, or running one of a dozen GPUs at half speed.

There's a difference between the maximum CUDA level the NVIDIA driver supports, the minimum level a given build of CUDALucas, CUDAPm1, or mfaktc requires, and what a given level of the SDK supports. CUDALucas2.06beta-CUDA6.0-Windows-x64.exe, for example, can run with any driver that supports CUDA 6.0 or above, including the latest that supports CUDA 8, but not an old driver that supports only up to CUDA 5.5 or lower.

With a driver installed that supports up to CUDA 8, one can run any version of CUDALucas with a minimum requirement of 4.0 through 8.0 (I've run the experiment by benchmarking all the 2.06beta builds from 4.0 through 8.0 on the same driver version) and pick the CUDA level that gives the best speed within accuracy limits for the GPU and exponents at hand. (Some card, CUDA, and fft length combinations are not as dependable.) This driver versatility on CUDA level is a good thing: it allows running mfaktc requiring 8, CUDALucas fastest at 5.5, and CUDAPm1 fastest at some other level, all on the same system with a single driver installation.
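One way to check which CUDA level the installed driver actually supports is to ask the driver API itself: cuDriverGetVersion reports the level encoded as major*1000 + minor*10 (8000 for CUDA 8.0, 6050 for 6.5). A minimal sketch, assuming a libcuda/nvcuda library is present (it returns None gracefully if not):

```python
import ctypes

def decode(version: int) -> tuple:
    """Decode cuDriverGetVersion's integer, e.g. 8000 -> (8, 0), 6050 -> (6, 5)."""
    return version // 1000, (version % 1000) // 10

def driver_cuda_version():
    """Return the (major, minor) CUDA level of the installed NVIDIA driver,
    or None if no driver library can be loaded."""
    for name in ("libcuda.so", "nvcuda.dll"):
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue
        v = ctypes.c_int(0)
        if lib.cuDriverGetVersion(ctypes.byref(v)) == 0:  # 0 == CUDA_SUCCESS
            return decode(v.value)
    return None

print(decode(8000), decode(6050))  # (8, 0) (6, 5)
print(driver_cuda_version())
```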

Recently I visited the CUDA wikipedia page and saw that the CUDA 9 SDK will drop support for compute capability 2.x (Fermi) cards, which includes older Quadros (2000, 4000), the GTX 480, and Fermi-based cards up through the GTX 500 series and some 600-series models.
The CUDA 6.5 SDK is the last to support older compute capability 1.3 cards such as the GTX 200 series.
https://en.wikipedia.org/wiki/CUDA#GPUs_supported

The versions of mfaktc I found online when I was looking months ago require CUDA 6.5 or up, not 8.0 as a minimum. http://www.mersennewiki.org/index.php/Mfaktc lists lots of choices, built for CUDA 4.2, 6.5, or 8.0. I don't have the time right now to benchmark the assortment of mfaktc versions.
Old 2017-07-30, 18:49   #2626
storm5510

Quote:
Originally Posted by kriesel View Post
...Recently I visited the CUDA wikipedia page and saw that the CUDA 9 SDK will drop support for compute capability 2.x (Fermi) cards, which includes older Quadros (2000, 4000), the GTX 480, and Fermi-based cards up through the GTX 500 series and some 600-series models.
The CUDA 6.5 SDK is the last to support older compute capability 1.3 cards such as the GTX 200 series.
https://en.wikipedia.org/wiki/CUDA#GPUs_supported

The versions of Mfaktc I found online when I was looking months ago require CUDA 6.5 or up, not 8.0 minimum. http://www.mersennewiki.org/index.php/Mfaktc lists lots of choices, and CUDA 4.2, 6.5 or 8.0. I haven't the time right now to benchmark the assortment of Mfaktc versions.
I get my drivers from NVIDIA's support pages; they're always the latest ones. I've never had any experience with anything below 8. As for the GTX 480 I have, its time is limited. I occasionally browse around to see what is available, and where. I would 'like' to have something that will get me away from the resets in CuLu.
Old 2017-08-07, 15:50   #2627
storm5510

I had to modify the batch file shown in post 2610:

Code:
@echo off
set count=0
set program=cudalucas
:loop
TITLE %program% current reset count = %count%
set /a count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
rem stop relaunching after 50 passes instead of looping forever
if %count%==50 goto end
goto loop
:end
del log.txt
If the worktodo.txt file contains no assignment, the batch goes into a continuous loop; I found the count this morning at over 700,000. I added the loop-limit lines. With a count value of 50, it loops for about a second before dropping out to the prompt. Of course, this value can be set to whatever one desires.
Old 2017-08-09, 17:44   #2628
kriesel
 

Quote:
Originally Posted by storm5510 View Post
I had to modify the batch file shown in post 2610:

Code:
@echo off
set count=0
set program=cudalucas
:loop
TITLE %program% current reset count = %count%
set /a count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
if %count%==50 goto end
goto loop
:end
del log.txt
If the worktodo.txt file contains no assignment, then the batch goes into a continuous loop. I found the count this morning at over 700,000. I added the lines in bold. With the count value of 50, it loops about a second before it drops out to the prompt. Of course, this value can be set to whatever one desires.
I experimented with increasing time delays between batch loop iterations as well as a loop count limit of 30. (Think what a mess unbounded loop iterations make of output redirected to a log file...) The increased time delay had no discernible effect on the NVIDIA driver timeout issue.
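The same relaunch-wrapper idea (a bounded restart count plus a pause between launches) can also be sketched portably in Python, for anyone running on linux rather than fighting batch syntax; the binary name below is a placeholder:

```python
import subprocess
import sys
import time

def keep_running(cmd, max_restarts=30, delay_s=5):
    """Relaunch cmd each time it exits, up to max_restarts times,
    pausing delay_s seconds between launches. Returns the launch count."""
    count = 0
    while count < max_restarts:
        count += 1
        print(f"launch {count} of {max_restarts}", file=sys.stderr)
        subprocess.run(cmd)
        time.sleep(delay_s)
    return count

# usage (hypothetical binary name):
#   keep_running(["cudalucas.exe"], max_restarts=50, delay_s=5)

# quick demonstration with a harmless no-op command:
demo = keep_running([sys.executable, "-c", "pass"], max_restarts=2, delay_s=0)
print(demo)  # 2
```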
Old 2017-08-14, 14:33   #2629
kriesel
 
CUDALucas bug and wish list update

Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse it for the workarounds included in some of the descriptions, and for awareness of some known pitfalls. Please respond with any comments, additions, or suggestions you may have.
Attached Files
File Type: pdf cudalucas bug and wishlist table.pdf (66.1 KB, 119 views)