Any updates?
|
Just received a note from Andrew Thall and he is releasing his gpuLucas program at [url]https://github.com/Almajester/gpuLucas[/url].
He claims it is still pretty ugly research code but between the ReadMe file, internal documentation and his [URL="http://andrewthall.org/papers/gpuMersenne2011MKII.pdf"]paper[/URL], that should be enough to make a working copy. It appears the program was developed under Windows 7 using Visual C++ in Visual Studio 2008. I may play with it (time permitting) to see if I can get a working version under Linux. |
[QUOTE=RichD;290603]Just received a note from Andrew Thall and he is releasing his gpuLucas program at [URL]https://github.com/Almajester/gpuLucas[/URL].
I may play with it (time permitting) to see if I can get a working version under Linux.[/QUOTE]
Very interesting! I've managed to get a Linux version working myself. (Had a bit of trouble with #include <qd/dd_real.h> being included under nvcc compilation.)
Observations: currently the number to test and the FFTlen are hard-coded; there is no checkpoint file; it does not bail/restart/change FFTlen if the error is too great; and there is no residue output for non-primes.
However, after a couple of tests, it does seem to be a fair bit faster than CUDALucas: estimated runtime for M(26xxxxxx) using the same FFT size (1572864) is about 47 hrs in CUDALucas and 40 hrs in gpuLucas (I've actually gotten it down to 36 hrs by fine-tuning FFT size and T_PER_B), but that's just [I]estimated[/I] run-time... |
Hey, that's great!!!
I found the QD package at [url]http://crd-legacy.lbl.gov/~dhbailey/mpdist/[/url] but then ran into another problem before getting sidetracked. Your observations are what I was expecting (unfortunately). I think [B]TheJudger[/B] has done a lot of work on threads per block (T_PER_B) in his mfaktc program. It might need to be tuned for each card. There is a lot of work that still needs to be done before it can be accepted by the community. Or maybe just the ideas in the code could be used in existing programs? |
A hybrid of GL and CL? (Oh, those are such unfortunate acronyms.)
|
Yes, gpuLucas appears considerably faster. On a GTX 480, for 43122609 using a 2304K FFT, gpuLucas claims to require 51.2 hours and CUDALucas 1.58 claims to require 63.7 hours. Of course both of these are ETAs and not actual runtimes, but that's a nearly 20% difference.
|
Hi,
[QUOTE=RichD;290663]I think [B]TheJudger[/B] has done a lot of work on threads per block (T_PER_B) in his mfaktc program. Might need to be tuned for each card.[/QUOTE]
*hmm* not really. Actually "threads per block" is currently fixed at 256 in mfaktc. When I chose this number I did some tests with other values: 512 runs out of registers on CC 1.1 GPUs, and for other GPUs it does not really make any difference whether it is 128, 256 or 512. The more important number for mfaktc is the number of threads per grid, but this might be specific to mfaktc and not apply to all CUDA applications.
Oliver |
Hi,
It works on Linux. I think the compile options are important.
Makefile:
[code]
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK

gpuLucas: gpuLucas.o
	g++ -fPIC -o gpuLucas gpuLucas.o -L/usr/local/cuda/lib64 $(NVIDIA_SDK)/C/lib/libcutil_x86_64.a -lqd -lcufft -lm

gpuLucas.o: gpuLucas.cu
	/usr/local/cuda/bin/nvcc -O3 -use_fast_math -gencode arch=compute_20,code=sm_20 --compiler-options="-fno-strict-aliasing" -w -I. -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc gpuLucas.cu -arch=sm_13 -c

clean:
	-rm *.o gpuLucas
[/code]
GTX-550Ti:
[code]
[0/50]: iteration 4300: max abs error = 0.226562
[0/50]: iteration 4300: max Bit Vector = 39.000000
Time to rebalance llint: 1.936 ms
Time to rebalance and write-back: 821.3 ms
Timing: To test M43112609
elapsed time : 75901 msec = 75.9 sec
dev. elapsed time: 143860 msec = 143.9 sec
est. total time: 620216064 msec = 620216.1 sec
Beginning full test of M43112609
[/code]
CUDALucas:
[code]
$ ./CUDALucas 43112609
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.58 (2:35 real, 15.4797 ms/iter, ETA 185:19:36)
[/code] |
[QUOTE=msft;290911]gpuLucas: est. total time: 620216064 msec = 620216.1 sec
CUDALucas v1.58: ETA 185:19:36[/QUOTE]
Looks like the difference between the two estimates is about 13:02:40. |
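For anyone who wants to check that figure, here is a minimal Python sketch. It assumes the two numbers being compared are gpuLucas's "est. total time" (620216064 msec) and the CUDALucas ETA (185:19:36) from msft's output; the helper names `parse_hms` and `format_hms` are made up for illustration.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string (hours may exceed 24) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta as 'H:MM:SS', rounded to the nearest second."""
    total = round(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Values copied from the posted program output.
gpulucas_estimate = timedelta(milliseconds=620216064)
cudalucas_eta = parse_hms("185:19:36")

difference = cudalucas_eta - gpulucas_estimate
saving = difference / cudalucas_eta  # fraction of the CUDALucas ETA

print("gpuLucas estimate:", format_hms(gpulucas_estimate))
print("difference:       ", format_hms(difference))
print(f"relative saving:   {saving:.1%}")
```

On these two estimates the gap works out to roughly 7% of the CUDALucas ETA, which is smaller than the nearly 20% reported for the GTX 480 earlier in the thread. Both figures are estimates, not measured runtimes.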
[QUOTE=science_man_88;290913]looks like the difference is about 13:02:40[/QUOTE]
Indeed.
[QUOTE=aaronhaviland;290657] and there is no residue output for non-primes. [/QUOTE]
Residue output is available, but it is not the same as mprime's.
gpuLucas:
[code]
M_1215421 tests as non-prime.
M_1215421, 0xfd93939b00a071bf, n = 65536, gpuLucas
[/code]
mprime:
[code]
[Work thread Feb 25 18:53] M1215421 is not prime. Res64: FE93935B009871C0. We8: 5EAF771A,140242,00000000
[/code]
each h_signalOUT[] value -1 or +1. |
[QUOTE=msft;290918]residue is available.
but not same mprime.
[code]
M_1215421 tests as non-prime.
M_1215421, 0xfd93939b00a071bf, n = 65536, gpuLucas
[/code]
mprime:
[code]
[Work thread Feb 25 18:53] M1215421 is not prime. Res64: FE93935B009871C0. We8: 5EAF771A,140242,00000000
[/code]
each h_signalOUT[] value -1 or +1.[/QUOTE]
Residue output wasn't available when I posted that. Since then, I have submitted a pull request with basic residue output and a basic Makefile, which has been merged.
As for the residue output, it actually works fine on my end (Linux, 64-bit, GTX 460):
[CODE]
M_1215421, 0xfe93935b009871c0, n = 65536, gpuLucas
M_1215421, 0xfe93935b009871c0, n = 61440, gpuLucas
[/CODE]
I've now tested it with several different testIntegers and different FFT lengths. |
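For context on what is being compared here: as far as I understand it, the Res64 that mprime prints is the low 64 bits of the final Lucas-Lehmer term, so two programs running the same test should agree on it regardless of FFT length. Below is a plain-Python sketch of that definition using big integers instead of an FFT. It is not gpuLucas's code, the function names are made up for illustration, and it is only practical for small exponents.

```python
def lucas_lehmer_final_term(p: int) -> int:
    """Final Lucas-Lehmer term s_(p-2) mod M_p, for an odd prime exponent p."""
    modulus = (1 << p) - 1  # M_p = 2**p - 1
    term = 4                # s_0
    for _ in range(p - 2):
        term = (term * term - 2) % modulus
    return term


def res64(term: int) -> int:
    """Low 64 bits of the final term (assumed to match the Res64 convention)."""
    return term & 0xFFFFFFFFFFFFFFFF


# Small exponents only: each step squares a p-bit integer.
for p in (7, 11, 13, 23):
    final = lucas_lehmer_final_term(p)
    verdict = "prime" if final == 0 else "not prime"
    print(f"M_{p}, 0x{res64(final):016x}, {verdict}")
```

The verdict is taken from the full final term rather than from the 64-bit residue, since a zero residue alone would not strictly prove the whole term is zero for exponents above 64.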