mersenneforum.org Mersenne Prime mostly-GPU Computing reference material

2018-05-24, 16:11   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×5×659 Posts
Available Mersenne Prime hunting software

The attachment is a pdf tabulating GIMPS program names, requirements, limits/capabilities, download locations, discussion forum threads, etc., versus computing hardware and computation type. It covers both gpu-oriented software and cpu-only software. It is periodically updated as changes come to my attention.

(Content of the available software tabulation was developed with the help of various posters at http://www.mersenneforum.org/showthread.php?t=22450 and some answers to questions by some of the code authors.)

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 Mersenne prime hunting software.pdf (132.9 KB, 76 views)

Last fiddled with by kriesel on 2022-03-20 at 20:38 Reason: update attachment

2018-05-24, 16:19   #3
kriesel

Available Mersenne prime hunting client management software

While separate client management software is not necessary to run single or multiple gpus on GIMPS tasks, some participants may find it useful.

The attachment is a pdf describing available software for automatically obtaining Mersenne-related work and/or reporting results, etc. While primarily oriented to gpu applications, it also includes information on support for the cpu-oriented applications prime95, mprime, and Mlucas.

While prime95 and mprime are very well supported by an integral PrimeNet API implementation, there is also reportedly a separate command-line monitor for those who want frequent status updates. See https://www.mersenneforum.org/showthread.php?t=25007

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 available client management software.pdf (106.2 KB, 31 views)

Last fiddled with by kriesel on 2022-04-19 at 00:21 Reason: updated & reformatted attachment

2018-05-24, 18:13   #5
kriesel

Ancestry of available software

Some software was derived from other, earlier software (or from multiple others). The attached crude diagram shows my understanding of ancestor/descendant relationships, gleaned from sources such as source-code comments/credits and web pages. It's intended to show ancestry of code, not of concepts. Code shown without connecting lines is believed to have been developed independently. The thin line between lucdwt and prime95 is intended to represent prime95's qualifier "loosely".

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 parentage.pdf (7.8 KB, 431 views)

Last fiddled with by kriesel on 2019-11-15 at 22:26

2018-05-27, 20:33   #6
kriesel

Utilities for GPU computing etc.

Here are some things I've run across that I found useful or interesting, or have seen recommended by others.

Versions stated for each, and links, are current as of March 22, 2019, or later.
Subject to change without notice, no warranty express or implied, availability versus OS etc. will vary, don't look a gift horse in the mouth, ...

URLs are clickable in the Acrobat pdf reader.

It can be a bit confusing which OpenCL device corresponds to which device number or platform number on a multiplatform or multi-gpu system, especially since a cpu and IGP may each add both devices and a platform. Numbering changes when a platform is uninstalled or malfunctioning. lsgpu is a simple scan, enumeration, and summary program. A modified version with source, doc, url, and Windows exe is attached in lsgpu.7z.
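For a quick look at the platform/device numbering without lsgpu, the widely available clinfo utility can be scripted. A minimal sketch (assumes clinfo is installed; if it isn't, the function simply returns an empty list):

```python
# Sketch: enumerate OpenCL platforms and devices via `clinfo -l`.
# Assumption: the clinfo utility is installed; if not, return [] rather than fail.
import shutil
import subprocess

def list_opencl_devices():
    """Return the platform/device summary lines from `clinfo -l`, if available."""
    if shutil.which("clinfo") is None:
        return []  # clinfo not installed; nothing to enumerate
    out = subprocess.run(["clinfo", "-l"], capture_output=True, text=True)
    # Each nonblank line names a platform ("Platform #0: ...") or a device under it.
    return [line for line in out.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    for line in list_opencl_devices():
        print(line)
```

Running this before and after a driver install is one way to see how the platform/device numbering shifts.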

Suggestions, additions, corrections invited by PM to kriesel.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 utilities.pdf (35.6 KB, 444 views) lsgpu.7z (229.9 KB, 250 views)

Last fiddled with by kriesel on 2020-10-27 at 01:18 Reason: added lsgpu & .7z file

2018-05-27, 20:46   #7
kriesel

List of fft lengths

A table of 7-smooth numbers that are multiples of 2^10, from 2^10 to 2^26 (1K to 65536K, or 64M), is provided in attachment "fft lengths.pdf".

This is the list from which code authors are likely to select lengths for fft code implementation, for primality testing or P-1 computation, of significance for running exponents within the mersenne.org exponent range p < 10^9. (Some software implements many of these lengths; some implements few or just one.)

Some gpu applications are coded for up to 128M or higher, sometimes varying by version number or CUDA level (at least CUDALucas and CUDAPm1). gpuOwL has been extended in some versions to a subset of the 7-smooth numbers, with a maximum fft length of up to 192M depending on version number. Mlucas is coded for up to 512M (and likewise up to 512M for Fermat numbers).
Additional tables covering up to 128M, 256M and 512M are also attached.

More recently, up to 13-smooth numbers are being used in gpuowl's fft lengths; the current maximum is 120M. Mlucas mentions up to 31-smooth in its source-code comments but implements up to 13-smooth.
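The candidate lists above can be regenerated programmatically. A short sketch, assuming the tables contain exactly the smooth numbers that are multiples of 2^10 in the stated range (the smoothness bound is a parameter: 7 for the classic list, 13 for the newer gpuowl/Mlucas sizes):

```python
# Generate candidate fft lengths: b-smooth numbers that are multiples of 2^10,
# between 2^10 (1K) and 2^26 (64M). b-smooth = no prime factor exceeding b.
def smooth_multiples(bound=7, lo=2**10, hi=2**26):
    primes = [p for p in (2, 3, 5, 7, 11, 13) if p <= bound]
    values = {1}
    for p in primes:
        extended = set()
        for n in values:
            m = n
            while m <= hi:       # extend n by every power of p within range
                extended.add(m)
                m *= p
        values = extended
    # keep only the multiples of 2^10 within [lo, hi]
    return sorted(n for n in values if lo <= n <= hi and n % 2**10 == 0)

lens7 = smooth_multiples(7)      # the classic 7-smooth list; first entry is 1024 (1K)
lens13 = smooth_multiples(13)    # superset including 11- and 13-multiple sizes
```

Every 7-smooth length is also 13-smooth, so lens7 is a subset of lens13, matching the way newer gpuowl versions extend rather than replace the older set.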

It's possible to go to higher fft lengths supporting larger exponents, but there's currently not much point in doing so, since the primality-test computation would be likely to take longer than the hardware's lifetime. Some already-implemented fft lengths would require years or decades of run time per exponent on the fastest readily available existing hardware.
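The growth in run time can be made concrete with a rough back-of-envelope model (my assumptions, not a benchmark): per-iteration cost scales about as n*log(n) for fft length n, a primality test takes about p iterations, and n grows roughly in proportion to p (the bits-per-word constant below is an illustrative guess):

```python
# Hedged scaling sketch: relative primality-test cost of exponent p2 vs p1,
# under an assumed cost model p * n * log(n) with fft length n ~ p / bits_per_word.
import math

def relative_test_cost(p1, p2, bits_per_word=18.0):
    n1 = p1 / bits_per_word      # assumed fft length for p1
    n2 = p2 / bits_per_word      # assumed fft length for p2
    cost = lambda p, n: p * n * math.log(n)
    return cost(p2, n2) / cost(p1, n1)

# A 10x larger exponent costs a bit more than 100x as much under this model.
ratio = relative_test_cost(3.32e8, 3.32e9)
```

So a test that takes months near the current wavefront would take on the order of a century at ten times the exponent on the same hardware, which is the point of the paragraph above.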

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 fft lengths.pdf (54.1 KB, 500 views) fft lengths above 64Mto 128M.pdf (10.6 KB, 323 views) fft lengths above 128M to 256M.pdf (11.3 KB, 321 views) fft lengths above 256M to 512M.pdf (13.4 KB, 313 views)

Last fiddled with by kriesel on 2022-04-20 at 15:45 Reason: Mlucas smoothness correction, other edits

2018-05-27, 20:54   #8
kriesel

Four primality test programs' performance charted together

clLucas, CUDALucas, gpuLucas, and gpuOwL compared.
When interpreting these values, note the speed disparity: the GTX480 CUDA gpu (at 244W) has 3.6-3.7 times the hardware performance rating of the low-power OpenCL card (50W). Normalizing to equal-speed hardware, gpuOwL appears to perform fastest by a comfortable margin.

A fairer hardware comparison would put clLucas and gpuOwl on an RX480, but I had none available at the time. See also http://www.mersenneforum.org/showpos...&postcount=386, which shows the RX480 running 3.4-3.6 times faster than the RX550 on the same exponents.

Note also that gpuowl v1.9 is what was measured and compared; many performance improvements have been implemented in gpuowl since.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 primality tester performance comparison.pdf (15.1 KB, 446 views)

Last fiddled with by kriesel on 2020-08-27 at 17:30

2018-05-28, 15:52   #9
kriesel

Mersenne prime hunt work coordination sites vs type and exponent

Here's another condensation of information I gathered along the way. Please reply in the discussion thread or by PM with any corrections or additions you may have.

This version includes a link to the archived version of Will Edgington's Mersenne-related site, which was at one time the only public location I knew of with broad coverage of Mersenne number factor data for exponents > 2^32. Unfortunately the zip files there are truncated to 128KB on download and so are not usable. Mersenne.ca has been expanded to 10^10.

As always, additions, corrections, or suggestions are invited by PM to kriesel.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-04-19 at 00:19 Reason: attachment updated for mersenne.ca change, GIMPS progress, etc
2018-05-28, 16:17   #10
kriesel

Devcon (Automating recovery from Windows TDR events for GPUs)

Windows has provision for detecting delays in display devices responding to it. Above a certain threshold delay, it concludes the device is hung and restarts it, even if the GPU involved is only running math code, not a display, and may not have a display physically connected. (You can find such restarts in the system event log, for example Event ID 4101: something like "Display driver nvlddmkm stopped responding and has successfully recovered." for NVIDIA. AMD GPUs are also affected: "Display driver amdkmdap stopped responding and has successfully recovered." IGPs may also be affected, and at smaller work chunks, since they are usually quite slow.) The purpose of such restarts is avoiding hung consoles or blue-screen OS crashes. Keeping individual GPU tasks small enough to complete within the TDR timeout is a known issue in the general GPU software developer community.

On Windows 10, an XFX Radeon VII has the issue and hangs for gpuowl unless a very small fft length is used. Older NVIDIA GPUs with low compute capability levels also have issues with these display restarts (for example, GTX480 and Quadro 4000, which are Compute Capability 2.0). As GIMPS work advances to larger exponents, computations take longer. Windows detects a long time for the GPU to respond, interprets it as a hung device, and stops and restarts it. Unfortunately, it does this in a way that does not reconnect existing sessions of GPU-Z or other utilities, or CUDA applications such as CUDALucas, CUDAPm1, or mfaktc, or the equivalent OpenCL applications or gpuowl. (Reportedly, this issue does not arise on linux.)

Historically, the applications have been wrapped in batch scripts to restart them, and adding a TdrDelay value in the registry larger than the implicit default has been recommended (see https://docs.microsoft.com/en-us/win...-registry-keys). However, these are incomplete solutions that often fail. Recently I've found increasing TdrDelay is not enough. On higher bit levels of large exponents, on old slow GPUs, increasing TdrDdiDelay seems to be needed also. Another approach is to run an older driver than the level at which the issue showed up (on NVIDIA, below about 300). That may be impractical if there's a newer card also present that requires a newer driver.

Sometimes, while Windows views the GPU as working properly, running applications such as GPU-Z or a newly started GPU computing application can't access the GPU. A system restart clears that situation up, but the restart is disruptive to GPU applications running on other GPUs, to prime95, and to anything else running on the system, and requires operator intervention to stop and restart it all.

But there is another approach: in Windows' Device Manager, disable and reenable the errant display device to avoid a system restart. I've seen this reenable access to sensor readings of a GPU in a preexisting GPU-Z session, as well as make the GPU available again for use by a newly launched CUDA application. The system restart is avoided, allowing prime95 and other GPUs' application instances to continue uninterrupted and undisturbed. This doesn't always work; sometimes the GPU can not be reenabled. It may be that the GPU is overheated, or the power supply is at its limit, etc.

The device disable/reenable can be done from the command line or a batch script, minimizing idle time and operator intervention, using the appropriate version of devcon.exe (available in Visual Studio, the Windows Driver Kit, etc.) per
https://superuser.com/questions/4290...a-command-line
https://docs.microsoft.com/en-us/win...devtest/devcon
for the version of Windows installed.

Such a script may benefit from delays between commands. Some Windows OS versions don't support the timeout command. On versions where timeout generates an error, delays can be provided by a conditional ping to a nonexistent address (preferably in your own LAN address space, for stability), padding the number of seconds to wait with 3 zeros, since ping timing is in milliseconds.
Code:
set delay=3
set nonexist=192.168.2.3
timeout /t %delay%
if errorlevel 1 ping %nonexist% -n 1 -w %delay%000
Another piece of the puzzle is the device id for devcon.exe in the correct form. Another is getting the batch file containing the devcon command to run as administrator. Otherwise devcon will run at too low a privilege level and will list devices but not control them, even when the batch file is launched by an account with administrator permissions. (Most online how-tos for it omit that crucial little detail.) For getting devcon to work, see http://classicshell.net/forum/viewtopic.php?f=5&t=423, particularly the requirement to create a shortcut to force the batch file to run as administrator, and https://docs.microsoft.com/en-us/win...local_computer to find the names of your gpu device(s).

So, putting the pieces together:
a) make the batch file that runs the CUDA app (CUDALucas, CUDAPm1) or OpenCL app that is affected by driver timeouts. In my experience mfaktc and mfakto seem less often affected; mfaktx seems affected in the higher bit levels of factoring, where run times per class are quite long.
b) make a shortcut to the batch file, and in the advanced tab of the shortcut properties, set it to run as administrator
c) modify the shortcut to cmd /k batchfile so it sticks around after it exits and the flow and any error messages can be examined
d) install devcon.exe on the system, either in the working directory of the CUDA app or in \windows\system32, or somewhere else that's in your path
e) use devcon.exe interactively to obtain the unique device ID for the GPU to be controlled
f) modify the batch file to use the unique device ID obtained, in the disable and enable lines, and adjust other settings as needed. Be sure to use enough of the id that it identifies a unique GPU device matching the CUDA device number affected.
g) secure your system so that running a batch file at elevated privilege is an acceptable risk
h) make adjustments to TDR-related registry settings as needed; increasing TdrDelay helps some; increasing TdrDdiDelay may also help. See https://docs.microsoft.com/en-us/win...-registry-keys for the list and defaults
i) test
j) use and enjoy

Draft batch file, to be run from the high-privilege shortcut:
Code:
set delay=1
set maxdelay=10
set count=0
set countmax=5
set exe=cudaPm1_win64_20130923_CUDA_55.exe
set model=GeForce GTX 480
set dev=1
set nonexist=192.168.2.3
cd "\Users\ken\My Documents\cudapm1"
echo worktodo.txt >>cudapm1.txt
goto loop
: change the above set commands etc & quoted device identifiers below, to suit your situation and preferences
: following is what does the production work, from the worktodo file, putting results in the results file and appending history in the cudapm1.txt log file
: limited looping may be useful in some cases (such as Windows TDR events); too high a count or no limit mostly pointlessly inflates log size, especially if worktodo is emptied
:loop
echo batch wrapper reports (re)launch of %exe% on %model% at %date% %time% reset count %count% of max %countmax% >>cudapm1.txt
title %computername% model %model% %exe% dev %dev% reset count %count% (%0)
%exe% -d %dev% >>cudapm1.txt
echo batch wrapper reports exit at %date% %time% >>cudapm1.txt
echo attempting disable/enable cycle on gpu device >>cudapm1.txt
devcon disable "PCI\VEN_10DE&DEV_06C0&SUBSYS_14803842" >>cudapm1.txt
devcon enable "PCI\VEN_10DE&DEV_06C0&SUBSYS_14803842" >>cudapm1.txt
timeout /T %delay%
if errorlevel 1 ping %nonexist% -n 1 -w %delay%000
if %delay% lss %maxdelay% set /A delay=delay*2
if %delay% gtr %maxdelay% set delay=%maxdelay%
set /A count=count+1
if %count% lss %countmax% goto loop
echo at %date% %time% countmax=%countmax% reached, exiting batch file
Possibly at some point the equivalent could be built into the applications' code. For now, there is this batch file workaround.

All the above relates to consumer grade NVIDIA GPUs and the WDDM driver model. The Tesla family can use a different driver mode, Tesla Compute Cluster (TCC) mode, for nondisplay compute-only gpus, which does not have the TDR issue. That driver mode is not applicable or available for consumer grade GPUs such as the GeForce models. (For more information, see the "Use a Suitable Driver Model" section in a recent version of "NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS".)

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-07-12 at 18:29 Reason: removed "old" qualifier
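For readers adapting the batch wrapper above, its retry/backoff behavior is the subtle part: the wait before each relaunch doubles, is capped at maxdelay, and the loop gives up after countmax relaunches. A small sketch of just that logic (the function name is mine, not from the batch file):

```python
# Sketch of the batch file's relaunch backoff: delay doubles each pass,
# capped at maxdelay, for at most countmax relaunch attempts.
def backoff_delays(delay=1, maxdelay=10, countmax=5):
    delays = []
    for _ in range(countmax):
        delays.append(delay)              # wait this long after this relaunch
        delay = min(delay * 2, maxdelay)  # double, but never exceed the cap
    return delays

print(backoff_delays())   # [1, 2, 4, 8, 10]
```

With the batch file's defaults (delay=1, maxdelay=10, countmax=5), the waits between relaunches are 1, 2, 4, 8, and 10 seconds.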
2018-06-01, 20:08   #11
kriesel


What exponent is required for a 10-, 100-, or 1000-megadigit Mersenne number? What are the nearest prime exponents, and the numbers of digits of their corresponding Mersenne numbers?
How were those calculated? See the attachment for a handy list. It's been checked against
http://oeis.org/A034887, which gives the formula floor(n*log(2)/log(10)) + 1 for the number of digits of 2^n.
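The calculation can be reproduced directly from that formula. A sketch (the helper names are my own, not from the attachment; simple trial division is adequate at these exponent sizes):

```python
# Digit count of a Mersenne number via OEIS A034887's formula, plus a helper
# to find the smallest prime exponent reaching a digit target (e.g. 10 megadigits).
import math

def mersenne_digits(p):
    """Number of decimal digits of 2^p - 1: floor(p*log10(2)) + 1."""
    return math.floor(p * math.log10(2)) + 1

def is_prime(n):
    """Trial division; fast enough for exponents in the GIMPS range."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

def smallest_prime_exponent_for(digits):
    """Smallest prime p such that 2^p - 1 has at least `digits` digits."""
    p = math.ceil((digits - 1) / math.log10(2))
    while not (is_prime(p) and mersenne_digits(p) >= digits):
        p += 1
    return p
```

As a sanity check, mersenne_digits(82589933) gives 24862048, matching the known digit count of the largest Mersenne prime found to date.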

Also included is a rough ballpark estimate of what's feasible on a GTX1070 in CUDALucas 2.06.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files

Last fiddled with by kriesel on 2019-11-15 at 22:28
