2018-10-26, 12:26   #1
robertfrost
 
Oct 2018

2² Posts
What size numbers can CudaLucas handle?

I'm currently performing a Lucas-Lehmer test on a 100-million-digit candidate using CudaLucas. Can it handle numbers that large?
2018-10-26, 13:18   #2
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5925₁₀ Posts

Quote:
Originally Posted by robertfrost
I'm currently performing a Lucas-Lehmer test on a 100-million-digit candidate using CudaLucas. Can it handle numbers that large?
Yes, and considerably larger. See the reference material at https://www.mersenneforum.org/forumd....php?f=154. The attachment in post two of https://www.mersenneforum.org/showthread.php?t=23371 lists the commonly used gpu software for Mersenne hunting and gives nutshell descriptions of their limits. There are also bug and wish lists for several programs, in application-specific threads, including CUDALucas. This material is currently being actively maintained, with several updates made yesterday.
2018-10-26, 14:06   #3
robertfrost
 
Oct 2018

100₂ Posts
oh dear :(

Thanks - I searched for ages without finding that before I asked here. The exponent in question is 3.3*10^8, which looks to be above the limit. Does that mean I must abandon my test and find another way?


EDIT... SORRY, IT GOES UP TO 1*10^9, doesn't it? So I'm okay. Not sure if I'm being daft.
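For reference, the digit/exponent conversion: 2^p - 1 has floor(p*log10(2)) + 1 decimal digits. A minimal C sketch, using a hypothetical exponent near the 3.3*10^8 above:

Code:
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Hypothetical exponent near 3.3*10^8; not a specific GIMPS assignment. */
    const long long p = 330000000LL;
    /* Decimal digits of 2^p - 1: floor(p * log10(2)) + 1. */
    const double digits = floor((double)p * log10(2.0)) + 1.0;
    printf("M%lld has about %.0f digits\n", p, digits);  /* ~9.93e7 digits */
    return 0;
}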

Last fiddled with by robertfrost on 2018-10-26 at 14:08
2019-01-07, 16:36   #4
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×5²×79 Posts

Quote:
Originally Posted by robertfrost
SORRY, IT GOES UP TO 1*10^9, doesn't it?
Actually, it turns out upon further investigation that CUDALucas theoretically goes up to 2^31 - 1. It will fft-benchmark and thread-benchmark up to 256M length, and its max exponent is capped at 2147483647. See the attachment at post 3 of https://www.mersenneforum.org/showthread.php?t=23371 and the CUDALucas reference thread linked from that thread.
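For illustration, a minimal C sketch of what that signed 32-bit cap means; the function name is hypothetical, not taken from the CUDALucas source:

Code:
#include <stdio.h>
#include <limits.h>

/* Hypothetical bounds check: with the exponent stored in a signed
   32-bit int, the hard ceiling is INT_MAX = 2^31 - 1 = 2147483647,
   independent of any FFT-length or memory limit. */
static int exponent_fits(long long p)
{
    return p > 0 && p <= INT_MAX;
}

int main(void)
{
    printf("M999999937:  %s\n", exponent_fits(999999937LL)  ? "fits" : "too big");
    printf("M2147483647: %s\n", exponent_fits(2147483647LL) ? "fits" : "too big");
    printf("M2147483648: %s\n", exponent_fits(2147483648LL) ? "fits" : "too big");
    return 0;
}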
2019-01-09, 06:46   #5
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

2⁴×613 Posts

Yes, CUDALucas is limited to a signed 32-bit word for the exponent, but you will hit the FFT limit imposed by the card's memory sooner, unless you rewrite the cuFFT library yourself.
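As a rough illustration of that memory constraint, a back-of-envelope sketch in C; the buffer multiplier K is an assumed fit to observed usage, not a documented cuFFT constant, and cuFFT can allocate more than this for awkward lengths:

Code:
#include <stdio.h>

/* Rough VRAM estimate for an LL test at a given FFT length, in
   double-precision words. K (residue plus cuFFT work buffers) is an
   assumption fitted to the nvidia-smi figures in the next post
   (~1.6 GiB at 57344K), not a documented cuFFT figure. */
#define BYTES_PER_WORD 8.0
#define K              3.5

static double vram_gib(double fft_len_k)
{
    return fft_len_k * 1024.0 * BYTES_PER_WORD * K / (1024.0 * 1024.0 * 1024.0);
}

int main(void)
{
    /* 19456K is a plausible FFT length for a 332M exponent;
       57344K is the length used for M999999937 in the next post. */
    printf("19456K -> %.2f GiB\n", vram_gib(19456.0));
    printf("57344K -> %.2f GiB\n", vram_gib(57344.0));  /* ~1.53 GiB */
    return 0;
}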
2019-01-09, 11:05   #6
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5²·79 Posts

Quote:
Originally Posted by LaurV
Yes, CUDALucas is limited to a signed 32-bit word for the exponent, but you will hit the FFT limit imposed by the card's memory sooner, unless you rewrite the cuFFT library yourself.
A quick test on GTX1080Ti:
Code:
Wed Jan 09 04:41:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.78                 Driver Version: 378.78                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000        WDDM  | 0000:02:00.0      On |                  N/A |
|100%   78C    P0    N/A /  N/A |     88MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 0000:03:00.0     Off |                  N/A |
| 66%   82C    P2   220W / 250W |   1619MiB / 11264MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1868    C   ... Documents\mfaktc q2000\mfaktc-win-64.exe N/A      |
|    1      4644    C   ...CUDALucas2.06beta-CUDA8.0-Windows-x64.exe N/A      |
+-----------------------------------------------------------------------------+
Code:
Continuing M999999937 @ iteration 4302 with fft length 57344K,  0.00% done

|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Jan 09  04:45:26  | M999999937      5000  0xb723ad2cf90fefd5  | 57344K  0.18750  40.3755   28.18s  | 473:09:25:34   0.00%  |
|  Jan 09  04:46:07  | M999999937      6000  0x00c230e56a4bc3ca  | 57344K  0.20313  40.6178   40.61s  | 472:20:17:29   0.00%  |
|  Jan 09  04:46:48  | M999999937      7000  0x7d01674dde8ecc02  | 57344K  0.18945  40.9224   40.92s  | 472:22:59:37   0.00%  |
Run time, reliability, and hardware life are probably issues before gpu memory is. The run time per exponent/primality test applies equally to PRP as to LL.

Extrapolating linearly (which is optimistic; above 2G the code gets a bit bigger). Note that while I was composing this, as the gpu warmed up, the projected run time increased about 0.5% beyond what's tabulated here:

Code:
p     VRAM GB  runtime (years per exponent)
M1G     1.62      1.3
M2G     3.24      2.6
M3G     4.86      3.9
M3.7G   5.99      4.8
M4G     6.48      5.2
M5G     8.10      6.5
M6G     9.72      7.8
M6.8G  11.02      8.8
M7G    11.34      9.1
An 8 GB or even 6 GB card seems adequate for gigadigit exponents, if it is fast enough. (Yes, that would also take some coding extensions.)

Any idea why a signed int was used instead of unsigned for the exponent, or how hard it would be to change (hidden complications)?

Last fiddled with by kriesel on 2019-01-09 at 11:18
2019-01-10, 03:53   #7
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×5²×79 Posts
Oops

Please disregard the run times in the preceding post. The only one that's credible is the 1.3 years for M1G. The run times should scale at approximately p^2.1, not linearly. The extrapolation table has been adjusted and extended to include estimates for some typical gpu memory capacities, and posted at https://www.mersenneforum.org/showpo...93&postcount=7
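A minimal sketch of the corrected scaling, anchored on the single credible data point (~1.3 years for M1G); illustrative only, since real run times depend on FFT length steps, clocks, and thermals:

Code:
#include <stdio.h>
#include <math.h>

/* Scale the one measured point (~1.3 years for M1G) by (p/1e9)^2.1,
   per the correction above. */
static double runtime_years(double p)
{
    return 1.3 * pow(p / 1.0e9, 2.1);
}

int main(void)
{
    /* Last entry is ~gigadigit: 10^9 digits needs p near 3.32e9. */
    const double exps[] = { 1.0e9, 2.0e9, 3.321928e9 };
    for (int i = 0; i < 3; i++)
        printf("M%.4gG: ~%.1f years\n", exps[i] / 1.0e9, runtime_years(exps[i]));
    return 0;
}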
2019-01-10, 09:16   #8
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

2⁴·613 Posts

For the record, cuFFT uses more memory than gwlib/P95 does, and not always transparently to the user. I was never able to run a 100M-digit LL test (332M+ exponent) with my GTX 580s with 1.5GB memory (I still own 4 of them; only 2 are in production, the other 2 are shelved, with no available PCIe slots). It will not say that it can't run, but you get a lot of strange errors and mismatches somewhere after a million iterations (for example), and you are never able to finish a test.

For the 3GB version of the same card, you can go to about 550M (I can't remember the exact numbers; I had 2 such cards and sold them years ago).

However, my 6GB Titans are currently testing M666666667 (ETA in ~4 months) and there is no problem with it.

Your CPU does the calculation sequentially, and therefore one LL iteration does not need much memory. On the GPU, the whole butterfly is done at the same time, in parallel, so cuFFT operates on all the data at once, more or less (well, this is not really true, but that is the idea), so it needs more memory than the few MB you give to P95 for LL tests.

More than that I can't say, but you don't know if it works until you really do a complete test at that size - backed up by a parallel run on a second card, of course, otherwise you lose the time. I get mismatch errors and need to resume from a checkpoint weekly (2-3 times per month) at the clocks I push the Titans to.

Last fiddled with by LaurV on 2019-01-10 at 09:22
2019-01-10, 13:14   #9
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5²·79 Posts

Quote:
Originally Posted by LaurV
For the record, cuFFT uses more memory than gwlib/P95 does, and not always transparently to the user.
Have you tried using nvidia-smi to show gpu memory usage? GPU-Z is useful for some things, but by comparison to nvidia-smi it seems to show memory usage approximately mod 4GB.
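For example, nvidia-smi's query mode can log just the memory columns at a fixed interval:

Code:
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv -l 5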