mersenneforum.org  

Old 2011-02-01, 17:44   #397
msft
 
Jul 2009
Tokyo

2·5·61 Posts
Default

Hi, Karl M Johnson
Quote:
Originally Posted by Karl M Johnson View Post
By the way, is there a way to restart a certain exponent from a certain iteration?
Or, at least, from a checkpoint?

I got a certain exponent to 90k iterations, then got a reboot (I had the -c25k flag), and when I started it again, it went from the very beginning.
Code:
msft@ubuntu:~$ ./CUDALucas -c1000 33333333 &
[1] 14609
msft@ubuntu:~$ Iteration 10000 M( 33333333 )C, 0xd717246f501c7d94, n = 2097152, CUDALucas v1.0 

msft@ubuntu:~$ kill 14609
msft@ubuntu:~$ 
[1]+  Done                    ./CUDALucas -c1000 33333333
msft@ubuntu:~$ ./CUDALucas -c1000 c33333333 
caso 2
Iteration 20000 M( 33333333 )C, 0x7f036ff2b230121b, n = 2097152, CUDALucas v1.0
To resume from the checkpoint, pass "c33333333" instead of "33333333". The "c" prefix is needed.
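A minimal sketch of how a leading "c" could be told apart from a plain exponent on the command line. The helper name parse_exponent_arg is hypothetical and is not taken from the CUDALucas source; it only illustrates the convention shown in the session above, where "c33333333" names the checkpoint file to resume from.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from CUDALucas itself.
 *   "33333333"  -> fresh run,                     *resume = 0
 *   "c33333333" -> resume from file "c33333333",  *resume = 1
 * Returns the exponent, or 0 if the argument is malformed. */
unsigned long parse_exponent_arg(const char *arg, int *resume)
{
        char *end;
        unsigned long p;

        assert(arg != NULL && resume != NULL);
        *resume = (arg[0] == 'c');
        if (*resume)
                arg++;                  /* skip the "c" prefix */
        if (*arg < '0' || *arg > '9')
                return 0;               /* reject signs, spaces, empty input */
        p = strtoul(arg, &end, 10);
        if (*end != '\0')
                return 0;               /* trailing junk after the digits */
        return p;
}
```

With this sketch, parse_exponent_arg("c33333333", &resume) yields 33333333 with resume set to 1, while the plain "33333333" yields the same exponent with resume set to 0.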
Old 2011-02-01, 17:51   #398
Karl M Johnson
 
Mar 2010

6338 Posts
Default


Works!
Thanks!
Old 2011-02-02, 02:50   #399
kjaget
 
Jun 2005

3×43 Posts
Default

Small update. Windows will not rename a file if it is open. This happens in CUDALucas if you run from a checkpoint file - the c24603451 file will remain open. All this means is that CUDALucas won't be able to back up the old checkpoint before updating it. It's not a significant problem - I only noticed it because I needed both the current and backup files to time execution speed.

In any case, I've fixed the problem and included it here. I've also included cudart64_31_9.dll in the archive so it should be everything you need to build and/or run in one shot.
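For anyone curious, here is a rough sketch of the kind of fix involved, assuming the problem is simply that the checkpoint is still open when the backup rename happens. The function name rotate_checkpoint and the backup file name are hypothetical; this is not the code in the attached archive.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: move the current checkpoint `name` to `backup`.
 * `in` is the handle the checkpoint was read from (may be NULL).
 * Windows refuses to rename a file that is still open, so the handle
 * is closed first. Returns 0 on success, -1 on failure. */
int rotate_checkpoint(FILE *in, const char *name, const char *backup)
{
        assert(name != NULL && backup != NULL);
        if (in != NULL && fclose(in) != 0)
                return -1;
        /* rename() onto an existing file is implementation-defined in C,
         * so drop any old backup first; a missing backup is not an error. */
        remove(backup);
        return rename(name, backup) == 0 ? 0 : -1;
}
```

After a successful call the old checkpoint lives under the backup name, and the program is free to write a fresh checkpoint under the original name.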

cudalucas.1.0a.winx64.zip

Here are run times on my factory-overclocked GTX 275, along with some rough run times for current work assignments. I know these aren't the most efficient use of the code, but they're a good basis for comparison to a CPU.

8.96 msec/iter @ 2M FFT (~ 2.5 days for a 25M LL double check)
18.8 msec/iter @ 4M FFT (~ 11 days for a 47M LL first time run)

Not sure how that compares to Linux versions, but it's definitely fast enough to be useful.
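The rough estimates above can be reproduced with a little arithmetic, assuming the per-iteration time stays constant for the whole run and using the fact that a Lucas-Lehmer test of M(p) takes p - 2 squarings. The function name ll_days is made up for this sketch.

```c
#include <assert.h>

/* Estimated wall-clock days for an LL test of M(p):
 * (p - 2) iterations at ms_per_iter milliseconds each. */
double ll_days(unsigned long p, double ms_per_iter)
{
        assert(p > 2 && ms_per_iter > 0.0);
        return (double)(p - 2) * ms_per_iter / 1000.0 / 86400.0;
}
```

For a 25M exponent at 8.96 msec/iter this gives about 2.6 days, and for a 47M exponent at 18.8 msec/iter about 10.2 days, in line with the rough figures quoted above.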
Old 2011-02-04, 03:05   #400
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2^5×3^5 Posts
Default

Quote:
Originally Posted by kjaget View Post
I've also included cudart64_31_9.dll in the archive so it should be everything you need to build and/or run in one shot.
Does Nvidia's license let you include cufft64_31_9.dll? If so, please include that in your zip file so that it truly includes everything you need to run cudalucas.

Last fiddled with by Prime95 on 2011-02-04 at 03:06
Old 2011-02-04, 12:48   #401
msft
 
Jul 2009
Tokyo

2×5×61 Posts
Default CUFFT Bench

Code:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cutil_inline.h>
int main()
{
        cufftHandle     plan;
        cudaEvent_t     start, stop;
        double          *x;
        double          *g_x;
        float           outerTime;
        int i,j,imax;

        /* One device buffer serves both tests. Size it for complex data:
           an in-place Z2Z transform of j elements touches 2*j doubles. */
        imax = 1024*1024*4;
        cutilSafeCall(cudaMalloc((void**)&g_x, sizeof(cufftDoubleComplex)*imax));
        x = (double *)calloc(2*(size_t)imax, sizeof(double));
        if (x == NULL) return 1;
        cutilSafeCall(cudaMemcpy(g_x, x, sizeof(cufftDoubleComplex)*imax, cudaMemcpyHostToDevice));
        cutilSafeCall( cudaEventCreate(&start) );
        cutilSafeCall( cudaEventCreate(&stop) );

        /* Complex-to-complex, lengths 1M..3M in steps of 1M */
        for(j=1024*1024;j<imax;j+=1024*1024)
        {
                cufftSafeCall(cufftPlan1d(&plan, j, CUFFT_Z2Z, 1));
                /* warm-up call, not timed */
                cufftSafeCall(cufftExecZ2Z(plan,(cufftDoubleComplex *)g_x,(cufftDoubleComplex *)g_x, CUFFT_INVERSE));
                cutilSafeCall( cudaEventRecord(start, 0) );
                for(i=0;i<10;i++)
                        cufftSafeCall(cufftExecZ2Z(plan,(cufftDoubleComplex *)g_x,(cufftDoubleComplex *)g_x, CUFFT_INVERSE));
                cutilSafeCall( cudaEventRecord(stop, 0) );
                cutilSafeCall( cudaEventSynchronize(stop) );
                cutilSafeCall( cudaEventElapsedTime(&outerTime, start, stop) );
                /* average over the 10 timed transforms */
                printf("CUFFT_Z2Z size=%d k time=%f msec\n",j/1024,outerTime/10);
                cufftSafeCall(cufftDestroy(plan));
        }

        /* Real-to-complex, lengths 1M..3.75M in steps of 256k */
        for(j=1024*1024;j<imax;j+=256*1024)
        {
                cufftSafeCall(cufftPlan1d(&plan, j, CUFFT_D2Z, 1));
                /* warm-up call, not timed */
                cufftSafeCall(cufftExecD2Z(plan,g_x,(cufftDoubleComplex *)g_x));
                cutilSafeCall( cudaEventRecord(start, 0) );
                for(i=0;i<10;i++)
                        cufftSafeCall(cufftExecD2Z(plan,g_x,(cufftDoubleComplex *)g_x));
                cutilSafeCall( cudaEventRecord(stop, 0) );
                cutilSafeCall( cudaEventSynchronize(stop) );
                cutilSafeCall( cudaEventElapsedTime(&outerTime, start, stop) );
                printf("CUFFT_D2Z size=%d k time=%f msec\n",j/1024,outerTime/10);
                cufftSafeCall(cufftDestroy(plan));
        }
        cutilSafeCall(cudaFree(g_x));
        cutilSafeCall( cudaEventDestroy(start) );
        cutilSafeCall( cudaEventDestroy(stop) );
        free(x);
        return 0;
}

CUFFT_Z2Z size=1024 k time=3.043661 msec
CUFFT_Z2Z size=2048 k time=6.288720 msec
CUFFT_Z2Z size=3072 k time=10.626810 msec
CUFFT_D2Z size=1024 k time=1.947040 msec
CUFFT_D2Z size=1280 k time=2.580678 msec
CUFFT_D2Z size=1536 k time=3.186858 msec
CUFFT_D2Z size=1792 k time=3.640893 msec
CUFFT_D2Z size=2048 k time=4.063977 msec
CUFFT_D2Z size=2304 k time=4.664579 msec
CUFFT_D2Z size=2560 k time=5.340890 msec
CUFFT_D2Z size=2816 k time=76.725174 msec
CUFFT_D2Z size=3072 k time=6.547805 msec
CUFFT_D2Z size=3328 k time=98.685196 msec
CUFFT_D2Z size=3584 k time=7.542326 msec
CUFFT_D2Z size=3840 k time=8.636828 msec

Non-power-of-2 lengths are an improvement, but not enough.
Old 2011-02-04, 12:56   #402
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

2×41×59 Posts
Default

Quote:
Originally Posted by msft View Post
Code:
CUFFT_Z2Z size=1024 k time=3.043661 msec
CUFFT_Z2Z size=2048 k time=6.288720 msec
CUFFT_Z2Z size=3072 k time=10.626810 msec
CUFFT_D2Z size=1024 k time=1.947040 msec
CUFFT_D2Z size=1280 k time=2.580678 msec
CUFFT_D2Z size=1536 k time=3.186858 msec
CUFFT_D2Z size=1792 k time=3.640893 msec
CUFFT_D2Z size=2048 k time=4.063977 msec
CUFFT_D2Z size=2304 k time=4.664579 msec
CUFFT_D2Z size=2560 k time=5.340890 msec
CUFFT_D2Z size=2816 k time=76.725174 msec
CUFFT_D2Z size=3072 k time=6.547805 msec
CUFFT_D2Z size=3328 k time=98.685196 msec
CUFFT_D2Z size=3584 k time=7.542326 msec
CUFFT_D2Z size=3840 k time=8.636828 msec

Non-power-of-2 lengths are an improvement, but not enough.
Was the GPU updating the screen? Those values look rather uncommon...

Luigi
Old 2011-02-04, 13:02   #403
msft
 
Jul 2009
Tokyo

2·5·61 Posts
Default

Hi ,ET_
Quote:
Originally Posted by ET_ View Post
Was the GPU updating the screen? Those values look rather uncommon...
Weak length.
Old 2011-02-04, 16:27   #404
Andrew Thall
 
Dec 2010

23 Posts
Default

There are wide variations in the time taken by similar-sized transforms, depending on their factorization. CUFFT (CUDA 3.2) on Fermi supports 2^a * 3^b * 5^c * 7^d transforms; pure powers of 2 and 3 are pretty fast, powers of 5 are noticeably slower, and products of powers give orders-of-magnitude differences in runtimes, some quite good, some horrible, depending on which bases and which powers.

I've got tabulations of runtimes based on a complete search of [a, b, c, d] values giving FFTs of length between 2^18 and 2^24. I use it (manually, at present) to pick LL runlengths, and will be correlating it with maximum wordsizes for LL giving acceptable errors. That is more to understand convolution errors w/ balanced integers than for the Mersenne stuff per se.

Just one more thing I should put online for anyone who's interested.
Old 2011-02-05, 03:12   #405
kjaget
 
Jun 2005

3·43 Posts
Default

Another update for the windows version. Looks like the previous version I posted will end up in an infinite loop once a test finishes. The result will be printed to mersarch.txt, but not to the console.

There should be no problem using this version to complete a test started using one of the other windows versions.

I've included a fix in source and executable attached to this post. I didn't add cufft64_31_9.dll because that file is something like 5MB compressed - it's too large to fit as an attachment.

cudalucas.1.0b.winx64.zip
Old 2011-02-07, 18:11   #406
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts
Default

After a long hiatus while Gary's GPU was offline, it has now (finally!) finished the last n=35M LL-D assignment it started on back in October:

35000761

Old 2011-02-09, 23:22   #407
aaronhaviland
 
Jan 2011
Dudley, MA, USA

73 Posts
Default

There seem to be a couple of upper limits to this right now. I tried running higher numbers and got a couple of different errors:

#CUDALucas 151150000
err = 0.353794, increasing n from 8388608
CUDALucas.cu(534) : cufftSafeCall() CUFFT error.

I'm guessing it's because of: "The cuFFT manual states that 1-D ffts are supported for < 8 million elements."
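One way to read the "err" message, as a sketch only: in a weighted-transform LL test the p bits of the exponent are spread over n FFT words, so the average load is p/n bits per word, and round-off error grows with that load. The function name bits_per_word is made up here, and the threshold at which CUDALucas decides to increase n is not something I know from the source.

```c
#include <assert.h>

/* Average number of bits of M(p) carried by each of the n FFT words. */
double bits_per_word(unsigned long p, unsigned long n)
{
        assert(n > 0);
        return (double)p / (double)n;
}
```

For 151150000 at n = 8388608 this is about 18.0 bits per word, compared with about 15.9 for msft's 33333333 run at n = 2097152. That would fit with the program wanting a longer transform, which then runs into the length limit quoted above.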

The other is at exponents around 318750000, where I hit the memory limit on my 768MB card. At 336000000, it wants over 1 GB.

Combined, these prevent it from being useful for the 100 million digit numbers. (I can't be the only one eyeing this as making that task feasible.)
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
