mersenneforum.org What size numbers can CudaLucas handle?

 2018-10-26, 12:26 #1 robertfrost   Oct 2018 2^2 Posts
What size numbers can CudaLucas handle?
I'm currently performing a Lucas Lehmer test on a 100 million digit prime using CudaLucas. Can it handle numbers that large?
2018-10-26, 13:18   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×5^2×79 Posts

Quote:
 Originally Posted by robertfrost I'm currently performing a Lucas Lehmer test on a 100 million digit prime using CudaLucas. Can it handle numbers that large?
Yes, and considerably larger. See the reference material at https://www.mersenneforum.org/forumd....php?f=154. The attachment in post two of https://www.mersenneforum.org/showthread.php?t=23371 lists the commonly used GPU software for Mersenne hunting and gives nutshell descriptions of their limits. There are also bug and wish lists for several programs, including CUDALucas, in application-specific threads. This material is being actively maintained, with several updates made yesterday.

 2018-10-26, 14:06 #3 robertfrost   Oct 2018 2^2 Posts
oh dear:(
Thanks - I searched for ages without finding that (before I asked here). The exponent in question is 3.3*10^8, which looks to be above the limit. Does that mean I must abandon my test and find another way?
EDIT... SORRY, IT GOES UP TO 1*10^9 doesn't it? So I'm okay. Not sure if I'm being daft.
Last fiddled with by robertfrost on 2018-10-26 at 14:08
2019-01-07, 16:36   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5^2·79 Posts

Quote:
 Originally Posted by robertfrost SORRY, IT GOES UP TO 1*10^9 doesn't it?
Actually, it turns out, upon further investigation, that CUDALucas theoretically goes up to 2^31-1. It will run its FFT benchmark and thread benchmark up to 256M length, and its maximum exponent is capped at 2147483647. See the attachment at post 3 of

 2019-01-09, 06:46 #5 LaurV Romulan Interpreter     "name field" Jun 2011 Thailand 2^4·613 Posts
Yes, cudaLucas is limited to a signed 32-bit word for the exponent, but you will reach the FFT limit sooner, due to the memory of the card, unless you rewrite the cuFFT library yourself.
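As a quick sanity check of the limits discussed so far, here is a minimal Python sketch. It only assumes the cap of 2^31-1 quoted above and the standard digit-count formula for 2^p - 1. The exponent 332192831 is an illustrative value in the "332M+" range mentioned in this thread, not necessarily the one the original poster is testing, and the helper names are made up for this example.

```python
import math

INT32_MAX = 2**31 - 1  # 2147483647, the exponent cap quoted in this thread


def mersenne_digits(p: int) -> int:
    """Decimal digits of 2**p - 1.

    2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    """
    return math.floor(p * math.log10(2)) + 1


def fits_signed_32(p: int) -> bool:
    """True if the exponent can be stored in a signed 32-bit integer."""
    return 0 < p <= INT32_MAX


# Illustrative exponents: one in the 100M-digit range, the one tested later
# in the thread, and one just past the signed 32-bit cap.
for p in (332_192_831, 999_999_937, 2**31):
    print(p, mersenne_digits(p), fits_signed_32(p))
```

An exponent around 3.3*10^8 is well inside the signed 32-bit range, so the word size is not what stops a 100M-digit test.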
2019-01-09, 11:05   #6
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5^2·79 Posts

Quote:
 Originally Posted by LaurV Yes, cudaLucas is limited to a signed 32-bit word for the exponent, but you will reach the FFT limit sooner, due to the memory of the card, unless you rewrite the cuFFT library yourself.
A quick test on GTX1080Ti:
Code:
Wed Jan 09 04:41:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.78                 Driver Version: 378.78                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000        WDDM  | 0000:02:00.0      On |                  N/A |
|100%   78C    P0    N/A /  N/A |     88MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 0000:03:00.0     Off |                  N/A |
| 66%   82C    P2   220W / 250W |   1619MiB / 11264MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1868    C   ... Documents\mfaktc q2000\mfaktc-win-64.exe N/A      |
|    1      4644    C   ...CUDALucas2.06beta-CUDA8.0-Windows-x64.exe N/A      |
+-----------------------------------------------------------------------------+
Code:
Continuing M999999937 @ iteration 4302 with fft length 57344K,  0.00% done

|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Jan 09  04:45:26  | M999999937      5000  0xb723ad2cf90fefd5  | 57344K  0.18750  40.3755   28.18s  | 473:09:25:34   0.00%  |
|  Jan 09  04:46:07  | M999999937      6000  0x00c230e56a4bc3ca  | 57344K  0.20313  40.6178   40.61s  | 472:20:17:29   0.00%  |
|  Jan 09  04:46:48  | M999999937      7000  0x7d01674dde8ecc02  | 57344K  0.18945  40.9224   40.92s  | 472:22:59:37   0.00%  |
Run time, reliability, and hardware life will probably become limiting before GPU memory does. The run time per primality test is about the same for PRP as for LL.
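The ETA in the log above can be roughly reproduced by hand. This is a minimal Python sketch, assuming an LL test of exponent p takes about p - 2 squarings and that the iteration time stays constant for the whole run (the log shows it drifting as the card warms up). The function name is invented for this example.

```python
def ll_runtime_days(p: int, ms_per_iter: float) -> float:
    """Rough wall-clock days for an LL test at a fixed time per iteration."""
    iterations = p - 2                      # LL needs p - 2 squarings
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86_400                 # seconds per day


# Values taken from the second line of the log above.
days = ll_runtime_days(999_999_937, 40.6178)
print(f"{days:.0f} days, {days / 365.25:.2f} years")
```

This comes out at about 470 days, close to the roughly 472 to 473 days the program reports as its ETA.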

Extrapolating linearly (which is optimistic; above 2G, the code gets a bit bigger). Note that while I was composing this, the projected run time increased by about 0.5% beyond what's tabulated here as the GPU warmed up:

Code:
p     VRAM GB  runtime (years per exponent)
M1G     1.62      1.3
M2G     3.24      2.6
M3G     4.86      3.9
M3.7G   5.99      4.8
M4G     6.48      5.2
M5G     8.10      6.5
M6G     9.72      7.8
M6.8G  11.02      8.8
M7G    11.34      9.1
An 8 GB or even 6 GB card seems adequate for gigadigit exponents if it is fast enough. (Yes, that would also take some coding extensions.)
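The VRAM column of the table can be reproduced with a few lines of Python. This is only a sketch of the linear extrapolation used in this post: it assumes memory use scales in proportion to the exponent, anchored at the roughly 1.62 GB observed for M999999937. The constant and function names are made up for the example.

```python
GB_PER_GIGA_EXPONENT = 1.62  # assumption: observed usage at ~M1G, scaled linearly


def vram_gb(p_giga: float) -> float:
    """Estimated VRAM in GB for an exponent given in units of 10**9."""
    return GB_PER_GIGA_EXPONENT * p_giga


def max_exponent_giga(card_gb: float) -> float:
    """Largest exponent (in units of 10**9) that fits under the same assumption."""
    return card_gb / GB_PER_GIGA_EXPONENT


for p in (1, 2, 3, 3.7, 4, 5, 6, 6.8, 7):
    print(f"M{p}G  {vram_gb(p):6.2f} GB")

print(f"6 GB card: up to about M{max_exponent_giga(6):.1f}G")
```

Under that assumption a 6 GB card tops out near M3.7G, which matches the M3.7G row in the table.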

Any idea why signed int was used instead of unsigned for exponent, or how hard it would be to change (hidden complications)?

Last fiddled with by kriesel on 2019-01-09 at 11:18

 2019-01-10, 03:53 #7 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 3·5^2·79 Posts
Oops
Please disregard the run times in the preceding post. The only one that's credible is the 1.3 years for M1G. The run times should be scaling at approximately p^2.1, not linearly. The extrapolation table has been adjusted and extended to include estimates for some typical GPU memory capacities, and posted at https://www.mersenneforum.org/showpo...93&postcount=7
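To see how much the correction matters, here is a minimal Python sketch of the p^2.1 scaling. It assumes the 1.3 years for M1G from the earlier post as the only anchor point and the approximate power of 2.1 stated above. The function name and defaults are illustrative, and the output is not the adjusted table from the linked post.

```python
def runtime_years(p_giga: float, base_years: float = 1.3, power: float = 2.1) -> float:
    """Estimated run time in years, scaling as p**power from the M1G anchor."""
    return base_years * p_giga ** power


for p in (1, 2, 3, 4):
    linear = 1.3 * p  # the retracted linear estimate, for comparison
    print(f"M{p}G  linear {linear:4.1f} y   p^2.1 {runtime_years(p):5.1f} y")
```

Already at M2G the estimate is about 5.6 years instead of 2.6, and the gap widens quickly from there.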
 2019-01-10, 09:16 #8 LaurV Romulan Interpreter     "name field" Jun 2011 Thailand 2^4×613 Posts
For the record, cuFFT uses more memory than gwlib/P95 does, and this is not always transparent to the user. I was never able to run a 100M-digit LL test (332M+ exponent) on my GTX580s with 1.5GB of memory (I still own 4 of them, only 2 in production, the other 2 shelved, no available PCIe slots). It will not say that it can't run, but you get a lot of strange errors and mismatches somewhere after a million iterations (for example), and you are never able to finish a test. With the 3GB version of the same card, you can go to about 550M (I can't remember the exact numbers; I had 2 such cards and sold them years ago). However, my 6GB Titans are currently testing M666666667 (ETA in ~4 months) and there is no problem with it.

Your CPU does the calculation sequentially, and therefore one iteration of LL does not need much memory. In the GPU, the whole butterfly is done at the same time in parallel, so cuFFT operates on all the data, somehow (well, this is not really true, but that is the idea), so it needs more memory than the few MB you give to P95 for LL tests.

More I can't say, but you don't know if it works until you really do a complete test at that size, backed up by a parallel run on a second card, of course, otherwise you lose the time. I get mismatch errors and need to resume weekly (2-3 times per month) at the clocks I push the Titans.
Last fiddled with by LaurV on 2019-01-10 at 09:22
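To put rough numbers on the memory argument, the Python sketch below uses the 57344K FFT length and the 1619MiB usage from the M999999937 run earlier in the thread. It assumes double-precision (8-byte) FFT elements and 1K = 1024 elements. How many such arrays CUDALucas and cuFFT actually allocate is not stated anywhere in this thread, so the ratio at the end is only an observation, and the helper names are invented for the example.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision FFT elements


def fft_array_mib(fft_length_k: int) -> float:
    """Size in MiB of one array of doubles for an FFT length given in K."""
    return fft_length_k * 1024 * BYTES_PER_DOUBLE / 2**20


def bits_per_element(p: int, fft_length_k: int) -> float:
    """Average number of bits of the exponent-sized number held per FFT element."""
    return p / (fft_length_k * 1024)


one_array = fft_array_mib(57344)
print(f"one array of doubles: {one_array:.0f} MiB")
print(f"bits per element:     {bits_per_element(999_999_937, 57344):.2f}")
print(f"observed / one array: {1619 / one_array:.1f}")
```

A single data array at that FFT length is already 448 MiB, and the observed usage is several times that, which is in line with the point that the GPU keeps the whole transform in memory at once.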
2019-01-10, 13:14   #9
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5^2·79 Posts

Quote:
 Originally Posted by LaurV For the records, cuFFT uses more memory than gwlib/P95 does, and not always transparent for the user.
Have you tried using nvidia-smi to show GPU memory usage? GPU-Z is useful for some things, but compared with nvidia-smi it seems to show memory usage approximately mod 4 GB.
