2011-02-01, 17:44  #397
Jul 2009
Tokyo
2·5·61 Posts 
Hi, Karl M Johnson
Quote:
Code:
msft@ubuntu:~$ ./CUDALucas c1000 33333333 &
[1] 14609
msft@ubuntu:~$ Iteration 10000 M( 33333333 )C, 0xd717246f501c7d94, n = 2097152, CUDALucas v1.0
msft@ubuntu:~$ kill 14609
msft@ubuntu:~$ [1]+ Done ./CUDALucas c1000 33333333
msft@ubuntu:~$ ./CUDALucas c1000 c33333333
caso 2
Iteration 20000 M( 33333333 )C, 0x7f036ff2b230121b, n = 2097152, CUDALucas v1.0

2011-02-01, 17:51  #398
Mar 2010
19B_{16} Posts 
Works! Thanks! 
2011-02-02, 02:50  #399
Jun 2005
3·43 Posts 
Small update. Windows will not rename a file while it is open. This happens in CUDALucas if you run from a checkpoint file: the c24603451 file remains open. All this means is that CUDALucas can't back up the old checkpoint before updating it. It's not a significant problem; I only noticed it because I needed both the current and backup files to time execution speed.
In any case, I've fixed the problem and attached the fix here. I've also included cudart64_31_9.dll in the archive, so it should be everything you need to build and/or run in one shot.
cudalucas.1.0a.winx64.zip
Run times on my factory-overclocked GTX 275, along with rough run times for current work assignments. I know these aren't the most efficient use of the code, but they're a good basis for comparison to a CPU.
8.96 msec/iter @ 2M FFT (~2.5 days for a 25M LL double check)
18.8 msec/iter @ 4M FFT (~11 days for a 47M LL first-time run)
Not sure how that compares to the Linux versions, but it's definitely fast enough to be useful.
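For anyone who wants to redo the arithmetic behind those day estimates, here is a minimal sketch. It assumes only that a Lucas-Lehmer test of M(p) needs p - 2 squarings and that every iteration costs the quoted msec/iter; the function name `ll_days` and the round exponents 25000000 and 47000000 are illustrative stand-ins for "25M" and "47M", not values from the post.

```c
/* Estimated wall-clock days for a Lucas-Lehmer test of M(p).
   Assumption: the test performs p - 2 squarings, each taking
   msec_per_iter milliseconds (checkpoint and startup time ignored). */
double ll_days(unsigned long p, double msec_per_iter)
{
    const double msec_per_day = 24.0 * 60.0 * 60.0 * 1000.0;
    return (double)(p - 2) * msec_per_iter / msec_per_day;
}
```

Calling `ll_days(25000000UL, 8.96)` gives roughly 2.6 days and `ll_days(47000000UL, 18.8)` roughly 10.2 days, in line with the rough figures quoted above.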
2011-02-04, 03:05  #400
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}·17·29 Posts 
Does Nvidia's license let you include cufft64_31_9.dll? If so, please include that in your zip file so that it truly includes everything you need to run cudalucas.
Last fiddled with by Prime95 on 2011-02-04 at 03:06
2011-02-04, 12:48  #401
Jul 2009
Tokyo
1001100010_{2} Posts 
CUFFT Bench
Code:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cutil_inline.h>

int main()
{
    cufftHandle plan;
    cudaEvent_t start, stop;
    double *x;
    double *g_x;
    int i, j, imax;

    imax = 1024*1024*4;
    /* A Z2Z transform of length j touches j complex values (2*j doubles),
       so size both buffers for imax complex elements. */
    size_t bytes = sizeof(cufftDoubleComplex)*imax;
    cutilSafeCall(cudaMalloc((void**)&g_x, bytes));
    x = (double *)malloc(bytes);
    for(i=0;i<2*imax;i++) x[i]=0;
    cutilSafeCall(cudaMemcpy(g_x, x, bytes, cudaMemcpyHostToDevice));
    cutilSafeCall( cudaEventCreate(&start) );
    cutilSafeCall( cudaEventCreate(&stop) );

    /* Complex-to-complex transforms, in-place, in steps of 1M elements. */
    for(j=1024*1024;j<imax;j+=1024*1024)
    {
        cufftSafeCall(cufftPlan1d(&plan, j, CUFFT_Z2Z, 1));
        /* Warm-up call, not timed. */
        cufftSafeCall(cufftExecZ2Z(plan,(cufftDoubleComplex *)g_x,(cufftDoubleComplex *)g_x, CUFFT_INVERSE));
        cutilSafeCall( cudaEventRecord(start, 0) );
        for(i=0;i<10;i++)
            cufftSafeCall(cufftExecZ2Z(plan,(cufftDoubleComplex *)g_x,(cufftDoubleComplex *)g_x, CUFFT_INVERSE));
        cutilSafeCall( cudaEventRecord(stop, 0) );
        cutilSafeCall( cudaEventSynchronize(stop) );
        float outerTime;
        cutilSafeCall( cudaEventElapsedTime(&outerTime, start, stop) );
        printf("CUFFT_Z2Z size=%d k time=%f msec\n",j/1024,outerTime/10);
        cufftSafeCall(cufftDestroy(plan));
    }

    /* Real-to-complex transforms, in-place, in steps of 256K elements. */
    for(j=1024*1024;j<imax;j+=256*1024)
    {
        cufftSafeCall(cufftPlan1d(&plan, j, CUFFT_D2Z, 1));
        /* Warm-up call, not timed. */
        cufftSafeCall(cufftExecD2Z(plan,g_x,(cufftDoubleComplex *)g_x));
        cutilSafeCall( cudaEventRecord(start, 0) );
        for(i=0;i<10;i++)
            cufftSafeCall(cufftExecD2Z(plan,g_x,(cufftDoubleComplex *)g_x));
        cutilSafeCall( cudaEventRecord(stop, 0) );
        cutilSafeCall( cudaEventSynchronize(stop) );
        float outerTime;
        cutilSafeCall( cudaEventElapsedTime(&outerTime, start, stop) );
        printf("CUFFT_D2Z size=%d k time=%f msec\n",j/1024,outerTime/10);
        cufftSafeCall(cufftDestroy(plan));
    }

    cutilSafeCall(cudaFree((char *)g_x));
    cutilSafeCall( cudaEventDestroy(start) );
    cutilSafeCall( cudaEventDestroy(stop) );
    free(x);
    return 0;
}

Output:
CUFFT_Z2Z size=2048 k time=6.288720 msec
CUFFT_Z2Z size=3072 k time=10.626810 msec
CUFFT_D2Z size=1024 k time=1.947040 msec
CUFFT_D2Z size=1280 k time=2.580678 msec
CUFFT_D2Z size=1536 k time=3.186858 msec
CUFFT_D2Z size=1792 k time=3.640893 msec
CUFFT_D2Z size=2048 k time=4.063977 msec
CUFFT_D2Z size=2304 k time=4.664579 msec
CUFFT_D2Z size=2560 k time=5.340890 msec
CUFFT_D2Z size=2816 k time=76.725174 msec
CUFFT_D2Z size=3072 k time=6.547805 msec
CUFFT_D2Z size=3328 k time=98.685196 msec
CUFFT_D2Z size=3584 k time=7.542326 msec
CUFFT_D2Z size=3840 k time=8.636828 msec

Non-power-of-2 sizes are an improvement, but not enough.
2011-02-04, 12:56  #402
Banned
"Luigi"
Aug 2002
Team Italia
4844_{10} Posts 
Quote:
Luigi 

2011-02-04, 13:02  #403
Jul 2009
Tokyo
2×5×61 Posts 

2011-02-04, 16:27  #404
Dec 2010
2^{3} Posts 
There are wide variations in the run time of similar-sized transforms depending on their factorization: CUFFT (CUDA 3.2) on Fermi supports 2^a * 3^b * 5^c * 7^d transforms, with pure powers of 2 and 3 being pretty fast, powers of 5 noticeably slower, and products of powers giving orders-of-magnitude differences in run time, some quite good, some horrible, depending on which bases and which powers.
I've got tabulations of run times from a complete search of [a, b, c, d] values giving FFTs of length between 2^18 and 2^24. I use them (manually, at present) to pick LL run lengths, and will be correlating them with the maximum word sizes for LL that give acceptable errors. This is more to understand convolution errors with balanced integers than for the Mersenne stuff per se. Just one more thing I should put online for anyone who's interested.
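As a rough illustration of the 2^a * 3^b * 5^c * 7^d constraint, the sketch below tests whether a length factors over {2, 3, 5, 7} and finds the next such length. The function names `is_7_smooth` and `next_7_smooth` are made up for this example and are not part of CUFFT or CUDALucas; the code says nothing about which of the supported lengths are actually fast, which is what the tabulated timings are for.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n has no prime factor other than 2, 3, 5 and 7,
   i.e. n = 2^a * 3^b * 5^c * 7^d for some a, b, c, d >= 0. */
bool is_7_smooth(unsigned long n)
{
    static const unsigned long primes[] = {2, 3, 5, 7};
    if (n == 0)
        return false;
    for (size_t i = 0; i < 4; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest length >= n of the form 2^a * 3^b * 5^c * 7^d.
   Plain linear scan: slow but adequate for a sketch. */
unsigned long next_7_smooth(unsigned long n)
{
    while (!is_7_smooth(n))
        n++;
    return n;
}
```

For the sizes benchmarked in post #401, `is_7_smooth(3072UL * 1024)` is true, while 2816k (11 * 2^18) and 3328k (13 * 2^18) are not of this form; those are also the two sizes that were more than ten times slower than their neighbours in that run.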
2011-02-05, 03:12  #405
Jun 2005
3×43 Posts 
Another update for the Windows version. It looks like the previous version I posted ends up in an infinite loop once a test finishes. The result is written to mersarch.txt, but not printed to the console.
There should be no problem using this version to complete a test started with one of the other Windows versions. I've included the fix, in source and executable form, in the attachment to this post. I didn't add cufft64_31_9.dll because that file is something like 5 MB compressed; it's too large to fit as an attachment.
cudalucas.1.0b.winx64.zip
2011-02-09, 23:22  #407
Jan 2011
Dudley, MA, USA
49_{16} Posts 
There seem to be a couple of upper limits to this right now. I tried running higher numbers and got a couple of different errors:
#CUDALucas 151150000
err = 0.353794, increasing n from 8388608
CUDALucas.cu(534) : cufftSafeCall() CUFFT error.
I'm guessing it's because of: "The cuFFT manual states that 1D ffts are supported for < 8 million elements."
The other is at exponents around 318750000, where I hit the memory limit on my 768 MB card. At 336000000, it wants over 1 GB. Combined, these prevent it from being useful for the 100-million-digit numbers. (I can't be the only one eyeing this as making that task feasible.)
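One way to see why the program wanted a longer FFT here is to look at how many bits of the exponent each FFT word has to carry on average. The sketch below is plain arithmetic; the helper name `bits_per_word` is hypothetical, and the link between bits per word and the reported round-off error is an assumption for illustration, not something taken from the CUDALucas source.

```c
/* Average number of bits of the exponent p packed into each of the
   n FFT words. More bits per word leaves less floating-point headroom. */
double bits_per_word(unsigned long p, unsigned long n)
{
    return (double)p / (double)n;
}
```

With the numbers in this thread, `bits_per_word(33333333UL, 2097152UL)` is about 15.9, while `bits_per_word(151150000UL, 8388608UL)` is about 18.0. The second run is the one that reported err = 0.353794 and tried to increase n past 8388608, which is where the CUFFT error appeared.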