mersenneforum.org  

Old 2018-05-28, 20:08   #1
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4,909 Posts
CUDALucas-specific reference material

This thread is intended to hold only reference material specifically for CUDALucas.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.)


If you're already set up and running in CUDALucas, scroll to the bottom of the post for the thread table of contents.


How to set up and run CUDALucas

Gpuowl and PRP are recommended for new first-time primality tests, on GPUs that can run Gpuowl. Compared to CUDALucas 2.06, Gpuowl has superior error detection and handling, a much lower cost of verification due to its proof generation capability, and is also faster. For LL DC on GPUs that can run Gpuowl, Gpuowl is recommended too, unless the first test was done with zero shift, since recent Gpuowl versions include the Jacobi check but lack nonzero shift capability. Running PRP first is a good way to assess a GPU's reliability. Some older NVIDIA gpus can't run Gpuowl but can run CUDALucas, which lacks the Jacobi check. On those, running LL DC is recommended, not first-time LL tests.

Download from https://download.mersenne.ca/CUDALucas, for Linux https://download.mersenne.ca/CUDALuc...nux-x86_64.zip
or for Windows https://download.mersenne.ca/CUDALuc...indows32.64.7z
or https://sourceforge.net/projects/cudalucas/files/

Create a user directory. Unzip the software in it.
Get the appropriate CUDA level cufft and cudart library files for your gpu and OS from
https://download.mersenne.ca/CUDA-DLLs and place them in the same directory.

Review the cudalucas.ini file. Keep an original version for reference.
Only make changes you're sure of.

Get the cl-startup script below, for Windows, or tdulcet's scripts for Linux from http://www.mersenneforum.org/showpos...12&postcount=1.
Edit carefully to adapt to your gpu and environment.
Read and run them. Be patient. Depending on gpu model and other variables, the Windows startup script can take hours or days to complete. On an RTX2080 (which is better suited to TF), a single pass memory test alone takes about 75 minutes. If it crashes with an out of memory error, reduce the number of 25MB blocks to just below what it logged as attempting, and try again. (A test on RTX2080 worked with 314 blocks.)
The cl-startup script includes a rerun of a small known Mersenne prime, which you can choose by editing the file. Do not proceed to new work until it completes that correctly.

If the gpu shows memory errors, you might be able to clear them up by improving cooling or lowering the clock speed. Until it passes a comprehensive memory test, don't use it for primality testing. Retesting gpu memory annually and regularly performing double checks are recommended.
CUDALucas is very vulnerable to memory errors since it has neither the Gerbicz error check nor the Jacobi check. System ram errors or gpu vram errors can cause wrong primality test results.

See also the draft readme update at https://www.mersenneforum.org/showpo...84&postcount=6

It is likely that future discoveries of Mersenne primes will be made with Gpuowl via PRP/GEC, and confirmed with Gpuowl on Radeon VII running LL with Jacobi check. Parallel confirmations are likely in CUDALucas, prime95, and Mlucas on the fastest available reliable hardware.

To obtain LL DC assignments, go to https://www.mersenne.org/manual_assignment/ and check at the upper right that you're logged in.
Specify number of assignments and workers. (Start small.)
At "Preferred work type:" select "Double-check LL tests".
Click "Get Assignments". The page will update with assignments. (Eventually; be patient. Do not click page refresh unless you want multiple batches and have already copied the previous batch.)
Copy and paste the assignments from the page into a worktodo.txt file in your CUDALucas working directory. Then launch the CUDALucas program to test the assignments. These can take a long time. Start with the smallest you can get, until you develop a sense of the time required. Generally, the time required on a given gpu is proportional to p^2.1 and is measured in days, weeks, or months. An assignment taking longer than about a month is likely to be unreliable even on good equipment. ECC system ram may help reliability. An RTX2080 Super takes about 40 hours for a 53M LL DC, and is much more productive for the project when performing TF with mfaktc.
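To get a feel for the p^2.1 scaling before committing to an assignment, a measured timing can be scaled to other exponents. The following is a minimal sketch, assuming the 2.1 power holds over the range of interest and using the RTX2080 Super figure quoted above as the reference point. The function name is hypothetical, and the other exponents in the loop are arbitrary illustrations, not assignments.

```python
def scaled_run_time_hours(ref_hours, ref_exponent, exponent, power=2.1):
    """Scale a measured run time to another exponent, assuming time ~ p**power."""
    return ref_hours * (exponent / ref_exponent) ** power


# Reference point from the text: about 40 hours for a 53M LL DC on an RTX2080 Super.
for p in (53_000_000, 60_000_000, 80_000_000, 110_000_000):
    hours = scaled_run_time_hours(40.0, 53_000_000, p)
    print(f"{p:>11,d}  ~{hours:7.1f} hours  (~{hours / 24:5.1f} days)")
```

Substitute a timing measured on your own gpu for the reference point. Actual times also depend on fft length steps, clocks, and temperature, so treat the output as a rough guide only.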

To report results, go to https://www.mersenne.org/manual_result/ and check at the upper right that you're logged in.
Copy and paste recent (previously unreported) results into the page and click submit.
The page will refresh. Note any error messages, and whether your double-check(s) matched. I usually append a marker in the results.txt file they came from, to indicate that everything above it has been reported, then save it.

If the double-check does not match the first test's residue64, at least one of them is wrong. A triple check to resolve which one can be requested at https://mersenneforum.org/showthread.php?t=24148
On rare occasions quad or higher checks are needed and can be requested there too. You can also help out there with the workload of triple checks. See post one of that thread and the gpu link there.

While the workload of managing the lengthy tests manually is small, https://www.mersenneforum.org/showpo...92&postcount=3 includes an attachment describing client management software options, which might be useful if you'd like to try to add some automation to the assignment and result reporting process for your gpu(s).

For more information on CUDALucas specifications and limits, the development and discussion thread link, etc., see also the attachment at "Available Mersenne Prime hunting software".



Table of contents
  1. This post
  2. Run time scaling versus exponent for the NVIDIA GTX480 of CUDALucas v2.06 http://www.mersenneforum.org/showpos...23&postcount=2
  3. CUDALucas bug and wish list http://www.mersenneforum.org/showpos...24&postcount=3
  4. links to prior posts concerning pitfalls, setup, configuration, etc. http://www.mersenneforum.org/showpos...19&postcount=4
  5. Startup scripts http://www.mersenneforum.org/showpos...20&postcount=5
  6. Draft readme update https://www.mersenneforum.org/showpo...84&postcount=6
  7. What limits how big an exponent can be run https://www.mersenneforum.org/showpo...93&postcount=7
  8. What's the best CUDA level to run CUDALucas with my gpu? https://www.mersenneforum.org/showpo...47&postcount=8
  9. CUDALucas V2.06 -h help output https://www.mersenneforum.org/showpo...96&postcount=9
  10. Save file size versus exponent https://www.mersenneforum.org/showpo...1&postcount=10
  11. etc tbd

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: 7z cl-startup.7z (2.0 KB, 114 views)

Last fiddled with by kriesel on 2021-01-24 at 16:49 Reason: revised how-to section slightly to encourage gpuowl use
Old 2018-05-28, 20:11   #2
kriesel
 
Lucas-Lehmer run times for CUDALucas on NVIDIA GTX480

Timings for an assortment of exponents are tabulated and charted for reference. Note that only one trial per combination was tabulated, so there is no measure or indication of run-to-run reproducibility for the same inputs. (One exponent's run time was estimated from a fit made on results from several other exponents, with unexpectedly good results. Likely fit error appears to be 1-5% typically, subject to revision later.) See the attachment. This is a somewhat different way of looking at test speed than the GPU Lucas-Lehmer performance benchmarks at http://www.mersenne.ca/cudalucas.php
Run time power fits were made for timings from CUDALucas v2.05.1 on GTX480 separately, for p < 10^6 (p^1.339), 10^6 < p < 10^7 (p^1.849), and 10^7 < p < 10^8 (p^2.095). Compare those results to the expected asymptotic run time scaling p^2 log p log log p (~p^2.117).
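A power fit of this kind can be made by least squares on the logarithms of exponent and run time. The sketch below shows that general technique; it is not necessarily how the fits in the attachment were made. The function name is hypothetical, and the sample timings are synthetic values generated from a known power, not measurements.

```python
import math


def fit_power_law(exponents, times):
    """Least-squares fit of log(t) = log(c) + k*log(p); returns (c, k)."""
    xs = [math.log(p) for p in exponents]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                       # fitted power
    c = math.exp(mean_y - k * mean_x)   # fitted prefactor
    return c, k


# Synthetic timings (NOT measurements), generated from t = 3e-12 * p**2.095
sample_p = [1.2e7, 2.5e7, 4.0e7, 7.5e7]
sample_t = [3e-12 * p ** 2.095 for p in sample_p]
c, k = fit_power_law(sample_p, sample_t)
print(f"fitted power: {k:.3f}")
```

Fitting each decade of exponent separately, as described above, just means calling the function once per range with that range's timings.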


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf cudalucas ll test run time scaling.pdf (15.4 KB, 188 views)

Last fiddled with by kriesel on 2019-11-18 at 14:25
Old 2018-05-28, 20:23   #3
kriesel
 
CUDALucas bug and wish list

Here is the latest posted version of the list I am maintaining for CUDALucas. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have, preferably by PM to kriesel.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf cudalucas bug and wishlist table.pdf (112.3 KB, 185 views)

Last fiddled with by kriesel on 2019-11-18 at 14:25 Reason: added items and references
Old 2018-06-22, 15:46   #4
kriesel
 
Links to prior posts concerning pitfalls, setup, configuration, etc.

What it may look like if your gpu should not be allowed to run or benchmark 1024 threads
http://www.mersenneforum.org/showpos...postcount=2634

set up cudalucas, and notes on -r option; checks residues for up to 8192k, no higher
http://www.mersenneforum.org/showpos...postcount=2620

Choosing a CUDA level. Considerations include which gives maximum performance, what the installed driver supports, and what the gpu model(s) installed require.
http://www.mersenneforum.org/showpos...postcount=2625

Gpus changing device numbers when another drops out (the gory details)
http://www.mersenneforum.org/showpos...postcount=2603

edited readme vintage April 2017 (no longer current)
http://www.mersenneforum.org/showpos...postcount=2576

unanswered questions about the ini file contents
http://www.mersenneforum.org/showpos...postcount=2579


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:25
Old 2018-07-04, 15:52   #5
kriesel
 
Startup scripts

On Linux:
I haven't tried it or even looked at it, but http://www.mersenneforum.org/showpos...12&postcount=1 indicates a CUDALucas install and startup script.
Make sure you get the latest version of CUDALucas with checks for known-bad interim residues.

On Windows:
The attachment is a draft startup batch file (not installation), derived from a more compact but less annotated one I have used. It is for use after the necessary files are unzipped and placed in a folder and the CUDALucas.ini file configured, driver installed, dlls added, etc.

Note that while the CUDALucas program supports fft lengths up to 256M, the largest are not recommended. Run times at or above 64M can be years or decades. For example, on a GTX 1080 Ti, a run of an exponent of approximately 1 billion has an expected fft length of 57344k, which has a benchmark time of about 43.8 msec/iteration, corresponding to a 1.39 year run time estimate. Estimated time on a near maximal exponent ~2,147,483,647 would be ~2,147,483,647 iterations times 94.16 msec/iteration on fft length 128M, or 6.4 years on a GTX 1080 Ti. The chance of a primality test that long completing correctly without GEC or Jacobi check is small.
Exponent is currently capped at 2,147,483,645 for fft lengths 128M to 256M in the CUDALucas fft file. Reliability of a run taking multiple months or years is expected to be low, since there is no Gerbicz check or Jacobi check in CUDALucas. There's also no support in the program for residue self tests above fft length 8M.
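The run time estimates above are iteration count times benchmark time per iteration. A minimal sketch of that arithmetic follows, using the two benchmark figures quoted above. The function name and the 365.25-day year are assumptions for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def ll_run_time_years(exponent, ms_per_iteration):
    """Estimate LL run time; a Lucas-Lehmer test of M(p) takes p - 2 iterations."""
    iterations = exponent - 2
    return iterations * ms_per_iteration / 1000.0 / SECONDS_PER_YEAR


# GTX 1080 Ti benchmark figures quoted in the text
print(round(ll_run_time_years(1_000_000_000, 43.8), 2))    # ~1G exponent, fft length 57344k
print(round(ll_run_time_years(2_147_483_647, 94.16), 2))   # near-maximal exponent, fft length 128M
```

The results agree with the 1.39 and 6.4 year figures above to within rounding. The estimate assumes uninterrupted running at the benchmarked speed, which is optimistic over spans of years.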

Also, mersenne.org does not assign primality tests for such high exponents (p > 10^9) or accept results for them (nor does any other site to my knowledge).


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: zip cl-startup.zip (2.0 KB, 157 views)

Last fiddled with by kriesel on 2020-07-16 at 19:09
Old 2018-12-23, 14:28   #6
kriesel
 
Draft readme update

The CUDALucas readme has been updated somewhat to include info on maximum fft length and more recent CUDA levels, as well as other additions or clarifications.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: txt README-edited4.txt (49.4 KB, 177 views)

Last fiddled with by kriesel on 2019-11-18 at 14:26
Old 2019-01-10, 03:49   #7
kriesel
 
What limits how big an exponent can be run

The current CUDALucas code supports exponent values up to 2^31-1 = 2,147,483,647. A quick test on GTX1080Ti:
Code:
Wed Jan 09 04:41:05 2019 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.78                 Driver Version: 378.78                    | 
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. | 
|===============================+======================+======================|
|   0  Quadro 2000        WDDM  | 0000:02:00.0      On |                  N/A | 
|100%   78C    P0    N/A /  N/A |     88MiB /  1024MiB |     99%      Default | 
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 0000:03:00.0     Off |                  N/A | 
| 66%   82C    P2   220W / 250W |   1619MiB / 11264MiB |    100%      Default | 
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      | 
|=============================================================================|
|    0      1868    C   ... Documents\mfaktc q2000\mfaktc-win-64.exe N/A      | 
|    1      4644    C   ...CUDALucas2.06beta-CUDA8.0-Windows-x64.exe N/A      | 
+-----------------------------------------------------------------------------+
Code:
Continuing M999999937 @ iteration 4302 with fft length 57344K,  0.00% done 
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   | 
|  Jan 09  04:45:26  | M999999937      5000  0xb723ad2cf90fefd5  | 57344K  0.18750  40.3755   28.18s  | 473:09:25:34   0.00%  | 
|  Jan 09  04:46:07  | M999999937      6000  0x00c230e56a4bc3ca  | 57344K  0.20313  40.6178   40.61s  | 472:20:17:29   0.00%  |
|  Jan 09  04:46:48  | M999999937      7000  0x7d01674dde8ecc02  | 57344K  0.18945  40.9224   40.92s  | 472:22:59:37   0.00%  |
Run time, reliability, and hardware life are likely to become limiting before gpu memory capacity or the software does. Run time per exponent/primality test applies equally to PRP as to LL.

The table below extrapolates linearly for memory requirements (which is optimistic; above 2G, code gets a bit bigger) and by the 2.1 power for run time versus exponent. Note that while I was originally composing this, as the gpu warmed up, the projected run time increased about 0.5% beyond what's tabulated here for M1G, from which all the others are extrapolated.

Code:
p     VRAM GB  run time (years per exponent)
M617M   1.00      0.47
M1G     1.62      1.3
M1234M  2.00      2.0
M2G     3.24      5.6 (~current exponent limit in the CUDALucas code)
M2.53G  4.00      9.1 
M3G     4.86     13.1 
M3.32G  5.38     16.2
M3.7G   5.99     20.3 
M4G     6.48     23.9
M4.94G  8.00     37.2 
M5G     8.10     38.2
M6G     9.72     56.
M6.8G  11.02     73.
M7G    11.34     77.
M8G    12.96    102.
M9G    14.58    131.
An 8GB or even 6GB card seems adequate for gigadigit exponents if fast enough. (Yes that would also take some coding extensions.)
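The table rows can be regenerated from the M1G row with the two scaling rules stated above (linear for VRAM, 2.1 power for run time). This is a sketch under those stated assumptions only; the function and constant names are hypothetical.

```python
REF_EXPONENT = 1.0e9   # the M1G row, from which the others are extrapolated
REF_VRAM_GB = 1.62
REF_YEARS = 1.3


def extrapolate(exponent, power=2.1):
    """Return (VRAM in GB, run time in years) scaled from the M1G row."""
    ratio = exponent / REF_EXPONENT
    return REF_VRAM_GB * ratio, REF_YEARS * ratio ** power


for p in (0.617e9, 2e9, 4e9, 8e9):
    vram, years = extrapolate(p)
    print(f"M{p / 1e9:.3g}G  {vram:6.2f} GB  {years:6.1f} years")
```

Since every row derives from one measurement, any error in the M1G timing (such as the 0.5% warm-up drift noted above) carries through to all rows.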


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-01-05 at 20:55
Old 2019-06-11, 22:05   #8
kriesel
 
What's the best CUDA level to run CUDALucas with my gpu?

It depends. What does best mean? What does your gpu require? What's the lowest CUDA level usable with your gpu? What's the highest CUDA level that still supports your gpu? What other gpus do you have in the same system, that may constrain your choice of driver version or CUDA level? What driver versions and CUDA level support are available for the OS you are using? Which fft lengths will you run the most? Which releases have unacceptable reliability or bugs? Which is fastest on your hardware, and most-used fft lengths, providing the other considerations are acceptable?
In general, unless your gpu is so new that it requires the latest release, it's likely some other CUDA version could perform better. And it can vary depending on the program inputs.

I tested years ago for a performance dependence on NVIDIA driver version (~v260-v378), and did not see any statistically significant difference for driver version, on a GTX480 and Windows 7, CUDA versions 4.2-8.0. (However, on AMD, I have seen reductions of performance with driver version updates, including a drop of over 5%.) On NVIDIA, CUDA version does affect speed.

Test results ranging from CUDA5 to CUDA8 provided by ATH for his Titan black gpu are shown in the attachment. CUDA8 is rarely the fastest.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf titan black cuda timings.pdf (18.7 KB, 175 views)

Last fiddled with by kriesel on 2019-11-26 at 07:20
Old 2019-08-12, 15:59   #9
kriesel
 
CUDALucas V2.06 -h help output

Note, this apparently goes to stderr since it is unaffected by output redirection.
Code:
$ CUDALucas -h|-v

$ CUDALucas [-d device_number] [-info] [-i inifile] [-threads t1 t2] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-polite iteration] [-k] exponent|input_filename

$ CUDALucas [-d device_number] [-info] [-i inifile] [-threads t1 t2] -r [0|1]

$ CUDALucas [-d device_number] -cufftbench start end passes (see cudalucas.ini)

$ CUDALucas [-d device_number] -threadbench start end passes mode (see cudalucas.ini)

$ CUDALucas [-d device_number] -memtest size passes (see cudalucas.ini)

                       -h          print this help message
                       -v          print version number
                       -info       print device information
                       -i          set .ini file name (default = "CUDALucas.ini")
                       -threads    set threads numbers (eg -threads 256 128)
                       -f          set fft length (if round off error then exit)
                       -s          save all checkpoint files
                       -polite     GPU is polite every n iterations (default -polite 0) (-polite 0 = GPU aggressive)
                       -r          exec residue test.
                       -k          enable keys (see CUDALucas.ini for details.)

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:27
Old 2020-05-28, 23:32   #10
kriesel
 
Save file size versus exponent

Save file size is almost proportional to exponent. There is a small amount of space used for constants regardless of exponent size. Based on fits to observed file size over a wide range of exponents (~1.4M to 1G), a very wide extrapolation is made to estimate file sizes that would be required for some very large exponents, up to M127 as an exponent.
File systems' max file size would become limiting at ~67 bits, but run time becomes limiting much lower. Gpu ram size also imposes a limit.
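One way to read the ~67 bit figure is as simple arithmetic. The sketch below assumes the save file is dominated by a packed residue of one bit per unit of exponent, and that the file size limit is 2^64 bytes. Both are assumptions for illustration; the fits in the attachment are not reproduced here, and the function name and overhead parameter are hypothetical.

```python
def save_file_bytes(exponent, overhead_bytes=0):
    """Rough size: a packed residue of `exponent` bits, plus a fixed overhead."""
    return (exponent + 7) // 8 + overhead_bytes


FILE_SIZE_LIMIT = 2 ** 64  # assumed limit from a 64-bit byte count

# Find the smallest power of two exponent whose save file reaches the limit.
bits = 1
while save_file_bytes(2 ** bits) < FILE_SIZE_LIMIT:
    bits += 1
print(f"size reaches the assumed limit at an exponent of about 2^{bits}")
```

Under these assumptions the limit is reached at an exponent near 2^67, far beyond anything run time or gpu ram would allow.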


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf file size.pdf (14.0 KB, 125 views)

Last fiddled with by kriesel on 2020-05-31 at 14:06

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.