What size numbers can CudaLucas handle?
I'm currently performing a Lucas-Lehmer test on a 100-million-digit prime using CudaLucas. Can it handle numbers that large?

oh dear:(
Thanks. I searched for ages without finding that (before I asked here). The exponent in question is 3.3*10^8, which looks to be above the limit. Does that mean I must abandon my test and find another way?
EDIT... SORRY, IT GOES UP TO 1*10^9, doesn't it? So I'm okay. Not sure if I'm being daft. Last fiddled with by robertfrost on 2018-10-26 at 14:08
Actually, it turns out, upon further investigation, that CUDALucas theoretically goes up to 2^{31}-1. It will run FFT benchmarks and thread benchmarks up to 256M length, and its maximum exponent is capped at 2147483647. See the attachment at post 3 of
https://www.mersenneforum.org/showthread.php?t=23371 and the CUDALucas reference thread linked at that thread. 
Yes, CUDALucas is limited to a signed 32-bit word for the exponent, but you will reach the FFT limit imposed by the card's memory sooner, unless you rewrite the cuFFT library yourself.
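To make the two limits above concrete, here is a minimal sketch in Python (not CUDALucas code). The function names are hypothetical; the only facts taken from the thread are the signed 32-bit cap of 2147483647 and the "100M digit" target. The digit count uses the standard formula floor(p*log10(2)) + 1 for M_p = 2^p - 1.

```python
import math

INT32_MAX = 2**31 - 1  # 2147483647, the exponent cap quoted in the thread


def fits_signed_32bit(p):
    """True if exponent p can be stored in a signed 32-bit integer."""
    return 0 < p <= INT32_MAX


def mersenne_digits(p):
    """Number of decimal digits of M_p = 2**p - 1."""
    return math.floor(p * math.log10(2)) + 1


# Exponents mentioned in the thread (the "100M digit" one is about 3.3*10^8).
for p in (332_192_831, 666_666_667, 999_999_937):
    print(p, fits_signed_32bit(p), mersenne_digits(p))
```

Under this check an exponent around 3.3*10^8 is well inside the signed 32-bit range, so the exponent type is not what stops a 100M-digit test; card memory is the more likely obstacle, as the post above says.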

Code:
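Wed Jan 09 04:41:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.78                 Driver Version: 378.78                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000        WDDM  | 0000:02:00.0      On |                  N/A |
|100%   78C    P0    N/A /  N/A |     88MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 0000:03:00.0     Off |                  N/A |
| 66%   82C    P2   220W / 250W |   1619MiB / 11264MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1868    C   ... Documents\mfaktc q2000\mfaktcwin64.exe N/A        |
|    1      4644    C   ...CUDALucas2.06betaCUDA8.0Windowsx64.exe  N/A        |
+-----------------------------------------------------------------------------+

Code: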
Continuing M999999937 @ iteration 4302 with fft length 57344K, 0.00% done

|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Jan 09  04:45:26  | M999999937      5000  0xb723ad2cf90fefd5  | 57344K  0.18750  40.3755   28.18s  | 473:09:25:34   0.00%  |
|  Jan 09  04:46:07  | M999999937      6000  0x00c230e56a4bc3ca  | 57344K  0.20313  40.6178   40.61s  | 472:20:17:29   0.00%  |
|  Jan 09  04:46:48  | M999999937      7000  0x7d01674dde8ecc02  | 57344K  0.18945  40.9224   40.92s  | 472:22:59:37   0.00%  |

Extrapolating linearly (which is optimistic; above 2G, the code gets a bit bigger) gives the table below. Note that while I was composing this, as the GPU warmed up, the projected run time increased about 0.5% beyond what's tabulated here:

Code:
p        VRAM GB   runtime (years per exponent)
M1G       1.62     1.3
M2G       3.24     2.6
M3G       4.86     3.9
M3.7G     5.99     4.8
M4G       6.48     5.2
M5G       8.10     6.5
M6G       9.72     7.8
M6.8G    11.02     8.8
M7G      11.34     9.1

Any idea why a signed int was used instead of unsigned for the exponent, or how hard it would be to change (hidden complications)? Last fiddled with by kriesel on 2019-01-09 at 11:18

Oops
Please disregard the run times in the preceding post. The only one that's credible is the 1.3 years for M1G. The run times should be scaling at approximately p^{2.1}, not linearly. The extrapolation table has been adjusted and extended to include estimates for some typical gpu memory capacities, and posted at https://www.mersenneforum.org/showpo...93&postcount=7
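As a rough illustration of the corrected scaling, the sketch below (Python, hypothetical function names) converts a CUDALucas-style ETA string of the form days:hours:minutes:seconds into years and extrapolates with run time proportional to p^2.1. The only inputs taken from the thread are the ETA of about 473 days for M999999937 on this card, the resulting 1.3 years for M1G, and the 2.1 exponent; the adjusted table at the link above remains the reference, and these numbers are only a back-of-envelope estimate.

```python
def eta_to_years(eta):
    """Convert a 'days:hours:minutes:seconds' ETA string to years."""
    days, hours, minutes, seconds = (int(x) for x in eta.split(":"))
    total_days = days + hours / 24 + minutes / 1440 + seconds / 86400
    return total_days / 365.25


def estimate_years(p, ref_p=1e9, ref_years=1.3, power=2.1):
    """Scale a reference run time by (p / ref_p)**power.

    ref_years = 1.3 for ref_p = 1G is the figure from the thread;
    power = 2.1 is the approximate scaling stated there.
    """
    return ref_years * (p / ref_p) ** power


print(round(eta_to_years("473:09:25:34"), 2))  # the logged ETA, in years
for p in (332_192_831, 666_666_667, 1e9, 2e9):
    print(int(p), round(estimate_years(p), 2))
```

Doubling the exponent multiplies the estimate by 2^2.1, a bit more than 4, which is why the linear table overstated how practical the larger exponents would be.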

For the record, cuFFT uses more memory than gwlib/P95 does, and this is not always transparent to the user. I was never able to run a 100M-digit LL test (332M+ exponent) on my GTX 580s with 1.5GB of memory (I still own 4 of them, only 2 in production, the other 2 shelved for lack of available PCIe slots). CUDALucas will not say that it can't run, but you get a lot of strange errors and mismatches somewhere after, for example, a million iterations, and you are never able to finish a test.
With the 3GB version of the same card, you can go to about 550M (I can't remember the exact numbers; I had 2 such cards and sold them years ago). However, my 6GB Titans are currently testing M666666667 (ETA in ~4 months) and there is no problem with it. Your CPU does the calculation sequentially, and therefore one iteration of LL does not need much memory. On the GPU, all the butterflies are done at the same time in parallel, so cuFFT operates on all the data at once, somehow (well, this is not really true, but that is the idea), so it needs more memory than the few MB you give to P95 for LL tests. More I can't say, but you don't know if it works until you really do a complete test at that size, backed up by a parallel run on a second card, of course, otherwise you lose the time. I get mismatching errors and I need to resume weekly (2-3 times per month) at the clocks I push the Titans to. Last fiddled with by LaurV on 2019-01-10 at 09:22
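The idea of backing up a run with a parallel run on a second card can be sketched on small exponents. The Python below is plain big-integer arithmetic, not the FFT-based squaring CUDALucas performs, and all names are hypothetical. It runs the Lucas-Lehmer recurrence s -> s^2 - 2 (mod M_p), records the low 64 bits of s at checkpoints, and reports the first checkpoint at which two runs disagree. The assumption that these checkpoints correspond to the residues CUDALucas prints, and that its iteration numbering matches, is mine and not confirmed by the thread.

```python
def lucas_lehmer(p, every=1):
    """Lucas-Lehmer test for M_p = 2**p - 1, with p an odd prime.

    Returns (is_prime, checkpoints), where checkpoints maps the
    iteration number to the low 64 bits of s at that iteration.
    """
    m = (1 << p) - 1
    s = 4
    checkpoints = {}
    for i in range(1, p - 1):  # p - 2 squarings in total
        s = (s * s - 2) % m
        if i % every == 0:
            checkpoints[i] = s & 0xFFFFFFFFFFFFFFFF
    return s == 0, checkpoints


def first_mismatch(run_a, run_b):
    """First shared iteration at which two checkpoint dicts differ, else None."""
    for i in sorted(set(run_a) & set(run_b)):
        if run_a[i] != run_b[i]:
            return i
    return None


is_prime, good = lucas_lehmer(127, every=10)
bad = dict(good)
bad[50] ^= 1  # simulate a hardware error that flipped one bit
print(is_prime, first_mismatch(good, bad))
```

Catching a mismatch at an early checkpoint means only the work since the last agreeing checkpoint has to be redone, rather than discovering a bad final residue after months of run time.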
Have you tried using nvidia-smi to show GPU memory usage? GPU-Z is useful for some things, but it seems to show memory usage approximately mod 4GB by comparison to nvidia-smi.
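For anyone who wants to log memory use from a script, here is a small Python sketch with hypothetical helper names. It calls nvidia-smi with its --query-gpu and --format=csv,noheader,nounits options and parses the result. The `wrapped_mib` helper only illustrates what a reading taken mod 4 GiB would look like; that GPU-Z wraps in exactly this way is an assumption based on the observation above, not something verified here.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB, no header, no units) into int tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory of every GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)


def wrapped_mib(used_mib):
    """What a counter that wraps at 4 GiB (4096 MiB) would report."""
    return used_mib % 4096


# The two cards from the nvidia-smi listing earlier in the thread:
print(parse_memory_csv("88, 1024\n1619, 11264\n"))
# A hypothetical 6134 MiB allocation as a wrapped reading:
print(wrapped_mib(6134))
```

On a machine with the NVIDIA driver installed, calling `query_gpu_memory()` returns one tuple per GPU; comparing it with `wrapped_mib` of the same value shows whether another tool's reading is consistent with a 4 GiB wrap.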

