2012-05-27, 04:12   #1312
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts
CUDALucas 2.02

Well, I've tinkered with CUDALucas before, but this time I think it's a more worthwhile upgrade. In addition to standard worktodo.txt functionality, as I have done previously, this version now supports proper .ini functionality, in the same vein as mfakt*. That is, almost all of the command line options can now be set via "CUDALucas.ini", so you don't have to remember the complicated command each time you restart it (or spam the up-arrow, in my case). The best part is that any command line options will override the values in CUDALucas.ini. I've tentatively labelled it 2.02.
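To illustrate the override order described above, here is a minimal Python sketch (not CUDALucas's actual C code). The `DEFAULTS` values and the `resolve_options` helper are hypothetical; only the precedence rule, command line over ini file, comes from the post.

```python
# Hypothetical built-in defaults, using key names from CUDALucas.ini.
DEFAULTS = {"CheckpointIterations": 10000, "Polite": 1, "Threads": 256}

def resolve_options(ini_values, cli_values):
    """Merge settings with precedence: defaults < ini file < command line."""
    merged = dict(DEFAULTS)
    merged.update(ini_values)
    # A value of None means the option was not given on the command line,
    # so the ini (or default) value is kept.
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged

ini = {"Polite": 64, "Threads": 512}
cli = {"Polite": 0, "Threads": None}  # as if only "-polite 0" had been passed
options = resolve_options(ini, cli)
print(options)  # {'CheckpointIterations': 10000, 'Polite': 0, 'Threads': 512}
```

Here `Polite` comes from the command line, `Threads` from the ini file, and `CheckpointIterations` from the assumed default.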

I have not messed with any of the computational code; any bugs that cause incorrect results are also in 2.01. (I am not aware of any, for the record. I'm ~15/16 with 2.00 and 2.01.)

There is an extra file, plus anyone compiling on Windows (flash!) should look at lines 32-40. In order to make gcc link function calls in a .cu file to functions defined in a .c file, I had to add extern "C" -- no idea how MSVC will react.

There is also a new Makefile, which corrects the thing I mentioned two posts above -- it also now includes warnings from nvcc. msft, I got these warnings when I compiled it:
Code:
nvcc -O2 -arch=sm_13 --compiler-options=-Wall -c CUDALucas.cu
CUDALucas.cu: In function ‘void printbits(double*, int, int, int, int, double, double, int, int, char*)’:
CUDALucas.cu:895:20: warning: zero-length gnu_printf format string
CUDALucas.cu: In function ‘int check(int, char*)’:
CUDALucas.cu:1161:19: warning: comparison between signed and unsigned integer expressions
CUDALucas.cu: In function ‘void printbits(double*, int, int, int, int, double, double, int, int, char*)’:
CUDALucas.cu:848:7: warning: ‘fp’ may be used uninitialized in this function
I could probably fix those myself, but I didn't want to touch the code so I could make the statement above.

The functions to read the ini file were taken from mfaktc and modified to my tastes, styled after Prime95's ini-reading functions.
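For readers unfamiliar with this style of ini file, the lookup can be sketched as follows. This is a Python illustration only, not the mfaktc-derived C functions; `read_ini_value` is a hypothetical helper, and the only assumptions about the format are what the attached file shows: `key=value` lines and `#` comments.

```python
def read_ini_value(text, key, default=None):
    """Return the value for `key` from ini-style text, or `default` if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return default

sample = """# comment line
CheckpointIterations=10000
WorkFile=worktodo.txt
Polite=1
"""

checkpoint = int(read_ini_value(sample, "CheckpointIterations", 10000))
workfile = read_ini_value(sample, "WorkFile", "worktodo.txt")
threads = int(read_ini_value(sample, "Threads", 256))  # missing key: falls back
```

Falling back to a default when a key is missing means an incomplete ini file still yields a usable configuration.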

Note that the ini file name "CUDALucas.ini" will clobber the old .ini file on Windows, where file names are not case sensitive (or so I hear). If anyone has a better idea for the name before flash compiles, it's set at something like line 42 in CUDALucas.cu.

As the computational stuff hasn't been touched, the checkpoint files are the same. However, your worktodo.txt will need to be reformatted to the "Test=" or "DoubleCheck=" format. Lines copied and pasted from GPU272 or PrimeNet/Manual will work just fine.
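As a rough illustration of the new work file format, here is a Python sketch that pulls the exponent out of a "Test=" or "DoubleCheck=" line. The field layout is an assumption based on the usual PrimeNet style (an optional 32-character hex assignment ID, then the exponent, then further fields that are ignored here); it is not taken from the CUDALucas source, and the ID and exponent below are made-up placeholders.

```python
import re

LINE_RE = re.compile(r"^(Test|DoubleCheck)=(.+)$")
AID_RE = re.compile(r"[0-9A-Fa-f]{32}")  # assumed shape of an assignment ID

def parse_work_line(line):
    """Return (kind, exponent) for a work line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    kind = m.group(1)
    fields = [f.strip() for f in m.group(2).split(",")]
    # Assumption: a leading 32-character hex field is an assignment ID.
    if AID_RE.fullmatch(fields[0]):
        fields = fields[1:]
    if not fields or not fields[0].isdigit():
        return None
    return kind, int(fields[0])

with_id = parse_work_line(
    "DoubleCheck=0123456789ABCDEF0123456789ABCDEF,26000003,69,1")
without_id = parse_work_line("Test=26000003,69,1")
no_prefix = parse_work_line("26000003")  # neither prefix, so it is rejected
```

With these inputs, `with_id` is `("DoubleCheck", 26000003)`, `without_id` is `("Test", 26000003)`, and `no_prefix` is `None`.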

Any bugs should obviously be reported, but this passed some basic testing, and I'm not too worried since the core of the ini-reading code is already used in mfaktc.


Code:
bill@Gravemind:~/CUDALucas/test∰∂ cat CUDALucas.ini
# You can use this file to customize CUDALucas without having to create a long
# and complex command. I got tired of having to hit the up arrow a bunch of
# times whenever I rebooted, so I created this. You can set most of the command
# line options here; however, if you do use command line options, they will
# override their corresponding value in this file.

# CheckpointIterations is the same as the -c option; it determines how often
# checkpoints are written and also how often CUDALucas prints to terminal.
CheckpointIterations=10000

# This sets the name of the workfile used by CUDALucas.
WorkFile=worktodo.txt

# Polite is the same as the -polite option. If it's 1, each iteration is
# polite. If it's (for example) 12, then every 12th iteration is polite. Thus
# the higher the number, the less polite the program is. Set to 0 to turn off
# completely. Polite!=0 will incur a slight performance drop, but the screen 
# should be more responsive. Trade responsiveness for performance.
Polite=1

# CheckRoundoffAllIterations is the same as the -t option. When active, each 
# iteration's roundoff error is checked, at the price of a small performance 
# cost. I'm not sure how often it's checked otherwise. This is a binary option;
# set to 1 to activate, 0 to de-activate.
CheckRoundoffAllIterations=0

# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder" 
# option below. This is a binary option; set to 1 to activate, 0 to de-activate.
SaveAllCheckpoints=0

# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.
SaveFolder=savefiles

# Interactive is the same as the -k option. When active, you can press p, t, or
# s to change the respective options while the program is running. p is polite,
# t is CheckRoundoffAllIterations, and s is the SaveAllCheckpoints feature
# above. This is a binary option; set to 1 to activate, 0 to de-activate.
Interactive=0

# Threads is the same as the -threads option. This sets the number of threads
# used in the FFTs. This must be 32, 64, 128, 256, 512, or 1024. (Some FFT
# lengths have a higher minimum than 32.)
Threads=256

# DeviceNumber is the same as the -d option. Use this to run CUDALucas on a GPU
# other than "the first one". Only useful if you have more than one GPU.
DeviceNumber=0

# FFTLength is the same as the -f option. If this is 0, CUDALucas will 
# autoselect a length for each exponent. Otherwise, you can set this with an
# override length; this length will be used for all exponents in worktodo.txt, 
# which may not be optimal (or even possible). In the future, I would like to 
# both create a better FFT length selection function, as well as be able to 
# specify a length on an individual-exponent basis (probably through a field in
# Test= in the work file). To see a list of reasonable FFT lengths, try running
# "$ CUDALucas -cufftbench 32768 3276800 32768" which will test a large range.
# In my personal experience on a GTX 460, I've found that for 26M exponents, 
# FFTLength=1474560 is a good length. (Technical note: FFT length must be a 
# multiple of 128*threads. See
# http://www.mersenneforum.org/showpost.php?p=292776&postcount=959 )
FFTLength=0
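The technical note in the FFTLength comment (the length must be a multiple of 128*threads) can be checked with a short Python sketch. Both function names are hypothetical, and the real program may apply further constraints, such as the higher per-length thread minimum mentioned under Threads.

```python
ALLOWED_THREADS = (32, 64, 128, 256, 512, 1024)  # values listed in the ini file

def is_valid_fft_length(fft_length, threads):
    """True if fft_length is a positive multiple of 128 * threads."""
    return fft_length > 0 and fft_length % (128 * threads) == 0

def compatible_threads(fft_length):
    """List the Threads settings under which fft_length passes the rule."""
    return [t for t in ALLOWED_THREADS if is_valid_fft_length(fft_length, t)]

# FFTLength=1474560 is the value the post suggests for 26M exponents.
ok_256 = is_valid_fft_length(1474560, 256)   # 1474560 = 45 * (128 * 256)
ok_512 = is_valid_fft_length(1474560, 512)   # 1474560 / 65536 = 22.5
usable = compatible_threads(1474560)
print(ok_256, ok_512, usable)  # True False [32, 64, 128, 256]
```

So, under this rule alone, the suggested length works with the default Threads=256 but not with 512 or 1024.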
Edit: I probably should have put this in the attached ini file, but polite 0 is known to cause CUDALucas to take some small CPU time; polite 64 seems to be a good compromise.
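To make the Polite setting concrete, this Python sketch counts how many iterations would be polite for a given value. It follows the description in the ini comments (1 means every iteration, N means every Nth, 0 means never); which iteration index counts as "every Nth" is an assumption, and the function names are hypothetical.

```python
def is_polite_iteration(iteration, polite):
    """True if this iteration should yield, per the Polite=N description."""
    return polite != 0 and iteration % polite == 0

def polite_count(total_iterations, polite):
    """Count polite iterations among iterations 1..total_iterations."""
    return sum(is_polite_iteration(i, polite)
               for i in range(1, total_iterations + 1))

# Polite iterations per 10000-iteration checkpoint interval.
counts = {p: polite_count(10000, p) for p in (0, 1, 12, 64)}
print(counts)  # {0: 0, 1: 10000, 12: 833, 64: 156}
```

With Polite=64, only 156 of every 10000 iterations are polite, compared with all 10000 at Polite=1.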

Edit2: Forgot the todo list. Here it is:
1) Add some sort of way to specify the FFT on a per-exponent basis, presumably through some field in "DoubleCheck=...". The Prime95 way of doing it would be rather tedious to parse... anyone have any ideas?
2) I'd like to refine the FFT autoselect function to do better; to that end, at some point over the summer I'll write a script to test a bunch of exponents and record the round off error, then distribute that around here to get as much hardware covered as possible. Then we'd use the data to either create a table of FFT lengths or a reasonably accurate regression of some sort. (Prime95 uses (very large) tables.)

Edit3: Anyone who needs a Linux binary can of course just ask me.
Attached Files
File Type: bz2 CUDALucas.2.02.tar.bz2 (17.7 KB, 89 views)

Last fiddled with by Dubslow on 2012-05-27 at 04:47