mersenneforum.org  

2017-07-12, 21:49   #2608
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12EB₁₆ Posts
Anomalous CUDALucas fft benchmark timings at the high end of fftlength etc

Some CUDA library versions, in combination with some GPU models, produce anomalously low timings at fftlength=65536k in the -fftbench run of CUDALucas, including the latest available build, v2.06beta of May 5 2017. Other fft lengths that are not a power of two are then compared against that anomalously low value, and any whose accurate iteration time exceeds the anomalous time is left out of the resulting fft file. See the example attached and the listing below, where 65536k is reported at 0.1921 ms/iter while 32768k takes 40.1492 ms/iter. Anomalous values may be 2 or 3 times too fast, or orders of magnitude too fast. Because it removes usefully fast non-power-of-two fft lengths from the fft file, the anomalous timing increases run times unnecessarily for multiple ranges of exponents.

Code:
Device              GeForce GTX 1070
Compatibility       6.1
clockRate (MHz)     1708
memClockRate (MHz)  4004

  fft    max exp  ms/iter
    1      22133   0.0668
    2      43633   0.0667
    4      85933   0.0686
    8     169409   0.0969
   16     333803   0.0974
   32     657719   0.0969
   50    1017889   0.1241
   64    1296011   0.1257
   72    1454273   0.1281
   80    1612249   0.1293
   96    1927129   0.1321
  112    2240863   0.1388
  120    2397383   0.1667
  128    2553659   0.1689
  144    2865601   0.1914
  256    5031737   0.3427
  512    9914521   0.6101
 1024   19535569   1.1422
 2048   38492887   2.2206
 4096   75846319   4.4940
 8192  149447533   9.5765
16384  294471259  19.5743
32768  580225813  40.1492
65536 1143276383   0.1921
Since other codes were derived from CUDALucas and use the same CUDA libraries, they may also be affected. Note also that there is a somewhat analogous behavior with CUDALucas threadbench at the high end of fftlength, and another with squaring threads=1024 in CUDALucas threadbench on some GPU models.
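
The pruning effect described in this post can be illustrated with a small Python sketch. It is not taken from the CUDALucas source. It assumes, from the benchmark output alone, that powers of two are always kept and that any other length is kept only if it is faster than every longer length. The function name `prune_fft_table` and the 160k timing are hypothetical; the other timings come from the GTX 1070 listing above.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def prune_fft_table(timings):
    """Return the (length_k, ms_per_iter) rows that survive pruning.

    Assumed rule (an illustration, not the CUDALucas implementation):
    powers of two are always kept; any other length is kept only if it
    is faster than every longer length in the benchmark.
    """
    rows = sorted(timings)
    kept = []
    fastest_longer = float("inf")
    # Walk from the longest length down, tracking the fastest longer length.
    for length_k, ms in reversed(rows):
        if is_power_of_two(length_k) or ms < fastest_longer:
            kept.append((length_k, ms))
        fastest_longer = min(fastest_longer, ms)
    kept.reverse()
    return kept


# Timings from the GTX 1070 listing, plus one HYPOTHETICAL 160k entry.
measured = [
    (128, 0.1689),
    (144, 0.1914),
    (160, 0.2100),  # hypothetical value, for illustration only
    (256, 0.3427),
    (32768, 40.1492),
]

with_anomaly = prune_fft_table(measured + [(65536, 0.1921)])
without_anomaly = prune_fft_table(measured)

print([k for k, _ in with_anomaly])     # 160 is dropped: 0.2100 > 0.1921
print([k for k, _ in without_anomaly])  # 160 is kept: 0.2100 < 0.3427
```

Under that assumed rule, a single bad timing at the top of the range is enough to knock out any non-power-of-two length slower than it, however reasonable that length's own timing is.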

Multiple workarounds are available:

1. Don't benchmark that high. (Easiest and quickest.)
2. Use an unaffected CUDA library version / card combination.
3. Hand-edit the fft file, after capturing the benchmark's stdout output by redirecting it to a file during the benchmark run.
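
For workaround 3, here is a rough Python sketch of one way to spot suspect rows in the captured stdout before hand-editing. It assumes the three-column `fft  max exp  ms/iter` layout shown above, with rows in ascending fft length. The function name `find_suspect_rows` and the 0.5 ratio threshold are choices made for this sketch, not anything CUDALucas provides.

```python
import re

# One benchmark row: fft length (k), max exponent, ms per iteration.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+\.\d+)\s*$")


def find_suspect_rows(benchmark_text, ratio=0.5):
    """Return (fft_k, ms_per_iter) rows faster than `ratio` times the
    slowest timing seen at any shorter fft length."""
    suspects = []
    slowest_so_far = 0.0
    for line in benchmark_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip device info, headers and blank lines
        fft_k = int(match.group(1))
        ms = float(match.group(3))
        if ms < ratio * slowest_so_far:
            suspects.append((fft_k, ms))
        else:
            slowest_so_far = max(slowest_so_far, ms)
    return suspects


captured = """\
  fft    max exp  ms/iter
 8192  149447533   9.5765
16384  294471259  19.5743
32768  580225813  40.1492
65536 1143276383   0.1921
"""

print(find_suspect_rows(captured))
```

The threshold is deliberately loose so that small dips between neighbouring lengths, such as 1k versus 2k in the listing, are not flagged.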
Attached Thumbnails
Name: fft length iteration time example gtx1070 CUDA4.1.png
Size: 15.0 KB

2017-07-16, 17:53   #2609
storm5510
Random Account
Aug 2009
U.S.A.
3²·199 Posts

I found the message below when I got up this morning. It has been running around five hours, more or less.

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 30: unknown error.
Resetting device and restarting from last checkpoint.

Using threads: square 32, splice 32.
CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
This is from version 2.06 Beta. The GPU is a GTX-480 running driver set 384.76 under Windows 10 x64 v1703.

Note the 'square' and 'splice' values. If I used the default settings, this event would occur in just a few minutes. The lower I made them, the longer it would take for this error to happen.

Last fiddled with by storm5510 on 2017-07-16 at 17:54

2017-07-16, 19:21   #2610
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2²×43×59 Posts

Quote:
Originally Posted by storm5510
I found the message below when I got up this morning. It has been running around five hours, more or less.

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 30: unknown error.
Resetting device and restarting from last checkpoint.

Using threads: square 32, splice 32.
CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
This is from version 2.06 Beta. The GPU is a GTX-480 running driver set 384.76 under Windows 10 x64 v1703.

Note the 'square' and 'splice' values. If I used the default settings, this event would occur in just a few minutes. The lower I made them, the longer it would take for this error to happen.
This is a known problem with Compute Capability 2.0 cards. I think that these include the 470, 480, 570, 580 series. My GTX 460 is unaffected. It is a CC 2.1.
When running CuLu on these cards, folks often used batch files to automatically restart the program.
Code:
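@echo off
Set count=0
Set program=CUDALucas
:loop
rem Show how many restarts have happened so far in the window title.
TITLE %program% Current Reset Count = %count%
rem Count this launch and record it in log.txt and on the console.
Set /A count+=1
echo %count% >> log.txt
echo %count%
rem Run the program; when it exits for any reason, loop back and start it again.
%program%.exe
GOTO loop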
I also attached this file, but renamed it as .txt
It was generally thought that these timeout errors and restarts did not negatively affect the results. I was always nervous about this, and stopped running CuLu on my 5xx cards.
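
For anyone not using Windows batch, roughly the same restart loop can be sketched in Python. This is only an illustration: the function name `run_with_restarts`, the `max_restarts` cap and the injectable `launch` callable are assumptions of this sketch. Unlike the batch file, it stops once the program exits with code 0 or the cap is reached, and it assumes the program returns a non-zero exit code after a CUDA error, which has not been checked here.

```python
import subprocess


def run_with_restarts(command, max_restarts=1000, launch=subprocess.call,
                      log_path=None):
    """Run `command` repeatedly until it exits with code 0.

    Returns the number of launches. `launch` takes the command list and
    returns an exit code; it defaults to subprocess.call.
    """
    launches = 0
    while launches <= max_restarts:
        launches += 1
        if log_path is not None:
            # Same idea as `echo %count% >> log.txt` in the batch file.
            with open(log_path, "a") as log:
                log.write(f"{launches}\n")
        if launch(command) == 0:
            break  # clean exit, no restart needed
    return launches


# Demonstration with a stand-in launcher that fails twice, then succeeds,
# so nothing is actually executed here.
exit_codes = iter([30, 46, 0])
launch_count = run_with_restarts(["CUDALucas.exe"],
                                 launch=lambda cmd: next(exit_codes))
print(launch_count)
```

Passing `launch` as a parameter keeps the loop testable without a GPU; in real use the default `subprocess.call` would start the program named in `command`.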
Attached Files
File Type: txt CuLuLoop count.txt (181 Bytes, 42 views)

2017-07-16, 23:45   #2611
storm5510
Random Account
Aug 2009
U.S.A.
3²·199 Posts

Code:
@echo off
Set count=0
Set program=CUDALucas
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
GOTO loop
I'm not a stranger to batch files. I've probably written hundreds since the late 1980s. This one is a bit different. I'm not really sure how to 'break' out of this one. If I ^C, it will drop to the last line and return to the label in line 4. I must be missing something!

2017-07-17, 00:08   #2612
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2²×43×59 Posts

Quote:
Originally Posted by storm5510
Code:
@echo off
Set count=0
Set program=CUDALucas
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
GOTO loop
I'm not a stranger to batch files. I've probably written hundreds since the late 1980s. This one is a bit different. I'm not really sure how to 'break' out of this one. If I ^C, it will drop to the last line and return to the label in line 4. I must be missing something!
It has been a long time since I used it. It is certainly a more sophisticated batch file than those I have written. It may have come from FlashJH (Jerry).
Do a Google on "site: mersenneforum.org ['cudalucas timeout' or 'cudalucas restart' or related terms]"

2017-07-17, 00:22   #2613
storm5510
Random Account
Aug 2009
U.S.A.
1791₁₀ Posts

Quote:
Originally Posted by kladner
It has been a long time since I used it. It is certainly a more sophisticated batch file than those I have written. It may have come from FlashJH (Jerry).
Do a Google on "site: mersenneforum.org ['cudalucas timeout' or 'cudalucas restart' or related terms]"
I get it now. This batch file will restart the program after the error I posted about earlier occurs, even if no one is watching. Ctrl-Break exits the batch file properly. Sorry!

2017-07-17, 00:38   #2614
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2²×43×59 Posts

Quote:
Sorry
Don't be. Happy to provide some history. I spent a good bit of time with CuLu, but that was a few years back. Now that I have an i7-6700K that delivers 2.2 ms/it in the 40-42M range, the GPUs do LLTF.

2017-07-18, 06:20   #2615
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4843₁₀ Posts
nvidia driver timeout choices

Quote:
Originally Posted by storm5510
Code:
@echo off
Set count=0
Set program=CUDALucas
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
echo %count% >> log.txt
echo %count%
%program%.exe
GOTO loop
I'm not a stranger to batch files. I've probably written hundreds since the late 1980s. This one is a bit different. I'm not really sure how to 'break' out of this one. If I ^C, it will drop to the last line and return to the label in line 4. I must be missing something!
If I recall correctly, this is the NVIDIA driver timeout for older compute level 2 cards. There are two other possible approaches.
1. Downgrade your NVIDIA driver to about 306. and convince Windows not to upgrade it automatically.
2. Increase the driver timeout threshold. That can reduce but does not totally eliminate the timeouts. See post 2246 (page 205) by wombatman; post 2247 by CUDALucas author owftheevil; post 2257, which gives the registry key info; and post 2130 by flashjh, which also relates.

2017-07-20, 04:34   #2616
storm5510
Random Account
Aug 2009
U.S.A.
11011111111₂ Posts

Quote:
Originally Posted by kriesel
...and convince Windows not to upgrade it automatically.
...
There is an option in Windows 10 called "Device Installation Settings." You can select the "No" option button, which is supposed to stop hardware drivers from being updated. I have it set to "No" and it installs them anyway. Really useful!

2017-07-20, 17:13   #2617
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
29·167 Posts
unwanted driver updates

Quote:
Originally Posted by storm5510
There is an option in Windows 10 called "Device Installation Settings." You can select the "No" option button, which is supposed to stop hardware drivers from being updated. I have it set to "No" and it installs them anyway. Really useful!
Yes, I remember doing battle with automatic updating too, although not on Win10. http://winsupersite.com/windows-10/s...tes-windows-10 says "obligatory" for home version. NVIDIA driver packages are big and SLOW on my low speed link.

I have a vague recollection of also needing to disable driver updating in some NVIDIA-downloaded software while benchmarking versus driver version. Complete removal of higher versions from the system was required. And to make sure the version stayed put, I disconnected the network cable before launching a benchmark. Not so handy for a general-use machine, of course.

2017-07-28, 00:43   #2618
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
29×167 Posts
CUDALucas bug list and wish list

This compilation is based mostly on my own running and testing on Windows since February, with some info from the forums mixed in. Please chime in with Linux experience, or in general. The absence of fft lengths greater than 8192k in the -r self test option seems like a priority item. Perhaps a separate -rbig or -r 2 option, with 1000 iterations for the big fft lengths >8192k?
Attached Files
File Type: pdf cudalucas bug and wishlist table.pdf (54.4 KB, 179 views)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.