mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

2012-05-25, 03:16   #1310
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
 Originally Posted by msft
Ver 2.01
Fix calculate roundoff err.
Code:
M( 26974951 )C, 0xe72576f52d2b0d8c, n = 1474560, CUDALucas v2.00
M( 26975743 )C, 0x67d12882dc466fd7, n = 1474560, CUDALucas v2.00
M( 26767891 )C, 0xbbeb0ad54a815dda, n = 1474560, CUDALucas v2.00
M( 26768243 )C, 0x3280d4e28ef0b188, n = 1474560, CUDALucas v2.00
M( 26822449 )C, 0xad9016be8bd360a9, n = 1474560, CUDALucas v2.00
M( 26823619 )C, 0xb989439408521303, n = 1474560, CUDALucas v2.00
M( 27722911 )C, 0x42c0350cdd3596a9, n = 1572864, CUDALucas v2.00
M( 27192083 )C, 0xc5ec2d3fd58a9ccf, n = 1474560, CUDALucas v2.00
M( 27192391 )C, 0xc882ca1522e59f5e, n = 1474560, CUDALucas v2.00
M( 27699043 )C, 0xd0eda360a4e70525, n = 1572864, CUDALucas v2.01
M( 27703817 )C, 0xd73a94c433bcd689, n = 1572864, CUDALucas v2.01
M( 27706841 )C, 0x6708a475e7e3db2b, n = 1572864, CUDALucas v2.01
M( 27707293 )C, 0xf82f49632f7b11c1, n = 1572864, CUDALucas v2.01
M( 27708413 )C, 0x78c5e390d697eec8, n = 1572864, CUDALucas v2.01
M( 27661351 )C, 0x13df0d77627419bc, n = 1572864, CUDALucas v2.01
All DC successfully.

The 2.01 makefile no longer has -lcudart in it. Is that correct? That is definitely not correct. Then again, it probably only matters to me.

Last fiddled with by Dubslow on 2012-05-25 at 03:18

2012-05-26, 02:45   #1311
kdgehman

Feb 2011

2²×3 Posts

Quote:
 Originally Posted by apsen
I've got a mismatch with 2.0 for 29668031. Could someone run it with P95?
Thanks,
Andriy

Sending result to server: UID: KDGehman1978/i5-2500k_2, M29668031 is not prime. Res64: 846461BD45E022__
PrimeNet success code with additional info:
LL test successfully completes double-check of M29668031
CPU credit is 29.5032 GHz-days.

2012-05-27, 04:12   #1312
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts
CUDALucas 2.02

Well, I've tinkered with CUDALucas before, but this time I think it's a more worthwhile upgrade. In addition to the standard worktodo.txt functionality I added previously, this version now supports proper .ini functionality, in the same vein as mfakt*. That is, almost all of the command line options can now be set via "CUDALucas.ini", so you don't have to remember the complicated command each time you restart it (or spam the up-arrow, in my case). The best part is that any command line options will override the values in CUDALucas.ini. I've tentatively labelled it 2.02.
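
The override rule described above can be pictured with a small sketch. This is only an illustration under assumptions, not the CUDALucas source: the function `resolve_int` and the sentinel `UNSET` are hypothetical names, and the real program may track "was this option given?" differently.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel meaning "this source did not set the option". */
#define UNSET INT_MIN

/* Pick the effective value of one integer option.
 * Precedence: command line > CUDALucas.ini > built-in default. */
static int resolve_int(int cli_value, int ini_value, int builtin_default)
{
    assert(builtin_default != UNSET);  /* the default must be a real value */

    if (cli_value != UNSET)
        return cli_value;              /* a command line option always wins */
    if (ini_value != UNSET)
        return ini_value;              /* otherwise use the ini file value */
    return builtin_default;            /* neither source set the option */
}
```

With this shape, `resolve_int(5000, 10000, 10000)` would yield 5000 (command line wins), while passing `UNSET` for the command line value falls back to the ini value.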

I have not messed with any of the computational code; any bugs that cause incorrect results are also in 2.01. (I am not aware of any, for the record. I'm ~15/16 with 2.00 and 2.01.)

There is an extra file, plus anyone compiling on Windows (flash!) should look at lines 32-40. In order to make gcc link function calls in a .cu file to functions defined in a .c file, I had to add extern "C" -- no idea how MSVC will react.
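
For readers unfamiliar with the extern "C" issue, here is a minimal sketch of the usual guard. It is an assumption-laden illustration, not the code at lines 32-40 of CUDALucas.cu: the helper `fft_length_ok` is a hypothetical name, chosen only to have something to declare. The guard itself is standard, and a C++ compiler such as MSVC in C++ mode accepts the same construct.

```c
#include <assert.h>

/* Declarations shared between the .c file and the .cu file.
 * nvcc compiles .cu files as C++, so __cplusplus is defined there and the
 * declaration gets C linkage (an unmangled symbol name that matches what the
 * C compiler emitted). A plain C compiler never sees the extern "C" lines. */
#ifdef __cplusplus
extern "C" {
#endif

int fft_length_ok(int fft_length, int threads);  /* hypothetical helper */

#ifdef __cplusplus
}
#endif

/* In a real project this definition would live in the .c file.
 * It checks the rule quoted in the ini comments in this thread: the FFT
 * length must be a multiple of 128*threads. */
int fft_length_ok(int fft_length, int threads)
{
    assert(threads > 0);
    return fft_length > 0 && fft_length % (128 * threads) == 0;
}
```

Without the guard, the C++ side would look for a mangled symbol name and the link step would fail with an undefined reference.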

There is also a new Makefile, which corrects the thing I mentioned two posts above -- it also now includes warnings from nvcc. msft, I got these warnings when I compiled it:
Code:
nvcc -O2 -arch=sm_13 --compiler-options=-Wall -c CUDALucas.cu
CUDALucas.cu: In function ‘void printbits(double*, int, int, int, int, double, double, int, int, char*)’:
CUDALucas.cu:895:20: warning: zero-length gnu_printf format string
CUDALucas.cu: In function ‘int check(int, char*)’:
CUDALucas.cu:1161:19: warning: comparison between signed and unsigned integer expressions
CUDALucas.cu: In function ‘void printbits(double*, int, int, int, int, double, double, int, int, char*)’:
CUDALucas.cu:848:7: warning: ‘fp’ may be used uninitialized in this function
I could probably fix those myself, but I didn't want to touch the code so I could make the statement above.

The functions to read the ini file were taken from mfaktc and modified to my tastes, styled after Prime95's ini-reading functions.
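
As a rough idea of what such an ini-reading helper does, here is a minimal sketch. It is not the mfaktc or CUDALucas code: the name `ini_line_get_int` is made up, it handles a single line rather than a file, and it assumes keys are matched case-sensitively, which the real functions may not do.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Try to read "key=<integer>" from one line of an ini file.
 * Returns 1 and stores the value in *out on success, 0 otherwise.
 * Lines starting with '#' are treated as comments. Anything after the
 * number (such as a trailing newline) is ignored. */
static int ini_line_get_int(const char *line, const char *key, int *out)
{
    size_t key_len;
    const char *value_start;
    char *end;
    long value;

    assert(line != NULL && key != NULL && out != NULL);

    if (line[0] == '#')
        return 0;                      /* comment line */

    key_len = strlen(key);
    if (strncmp(line, key, key_len) != 0 || line[key_len] != '=')
        return 0;                      /* some other key, or no '=' */

    value_start = line + key_len + 1;
    errno = 0;
    value = strtol(value_start, &end, 10);
    if (end == value_start || errno == ERANGE)
        return 0;                      /* no digits, or out of range for long */
    if (value < INT_MIN || value > INT_MAX)
        return 0;                      /* does not fit in an int */

    *out = (int)value;
    return 1;
}
```

A caller would run each line of the file through this helper once per option it cares about and keep the built-in default when no line matches.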

Note that the ini file name of "CUDALucas.ini" will clobber the old .ini file in Windows, where file names are not case sensitive (or so I hear). However, if anyone has a better idea before flash compiles, it's something like line 42 in CUDALucas.cu.

As the computational stuff hasn't been touched, the checkpoint files are the same. However, your worktodo.txt will need to be reformatted to the "Test=" or "DoubleCheck=" format. Copy and pastes from GPU272 or PrimeNet/Manual will work just fine.
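
To make the new worktodo.txt format concrete, here is a sketch of pulling the exponent out of one line. It assumes the PrimeNet-style layout `Test=<assignment id>,<exponent>,<bits>,<P-1 flag>` (and the same for `DoubleCheck=`); the function name `worktodo_exponent` is hypothetical, and the actual parser in parse.c may accept other variants.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the exponent from one worktodo.txt line.
 * Assumed layout (PrimeNet style):
 *   Test=<assignment id>,<exponent>,<bits>,<P-1 flag>
 *   DoubleCheck=<assignment id>,<exponent>,<bits>,<P-1 flag>
 * Returns the exponent, or 0 if the line does not match that layout. */
static unsigned long worktodo_exponent(const char *line)
{
    const char *p;
    char *end;
    unsigned long exponent;

    assert(line != NULL);

    if (strncmp(line, "Test=", 5) == 0)
        p = line + 5;
    else if (strncmp(line, "DoubleCheck=", 12) == 0)
        p = line + 12;
    else
        return 0;                  /* not an LL assignment line */

    p = strchr(p, ',');            /* skip the assignment id field */
    if (p == NULL)
        return 0;
    p++;

    if (*p < '0' || *p > '9')
        return 0;                  /* exponent must start with a digit */
    exponent = strtoul(p, &end, 10);
    if (*end != ',')
        return 0;                  /* exponent must be followed by another field */
    return exponent;
}
```

The assignment id is skipped rather than validated, which keeps the sketch short; a real parser would likely check it and the remaining fields as well.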

Any bugs should obviously be reported, but this passed some basic testing, and I'm not too worried since the core of this code is already used in mfaktc.

Code:
bill@Gravemind:~/CUDALucas/test∰∂ cat CUDALucas.ini
# You can use this file to customize CUDALucas without having to create a long
# and complex command. I got tired of having to hit the up arrow a bunch of
# times whenever I rebooted, so I created this. You can set most of the command
# line options here; however, if you do use command line options, they will
# override their corresponding value in this file.

# CheckpointIterations is the same as the -c option; it determines how often
# checkpoints are written and also how often CUDALucas prints to terminal.
CheckpointIterations=10000

# This sets the name of the workfile used by CUDALucas.
WorkFile=worktodo.txt

# Polite is the same as the -polite option. If it's 1, each iteration is
# polite. If it's (for example) 12, then every 12th iteration is polite. Thus
# the higher the number, the less polite the program is. Set to 0 to turn off
# completely. Polite!=0 will incur a slight performance drop, but the screen
# should be more responsive. Trade responsiveness for performance.
Polite=1

# CheckRoundoffAllIterations is the same as the -t option. When active, each
# iteration's roundoff error is checked, at the price of a small performance
# cost. I'm not sure how often it's checked otherwise. This is a binary option;
# set to 1 to activate, 0 to de-activate.
CheckRoundoffAllIterations=0

# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder"
# option below. This is a binary option; set to 1 to activate, 0 to de-activate.
SaveAllCheckpoints=0

# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.
SaveFolder=savefiles

# Interactive is the same as the -k option. When active, you can press p, t, or
# s to change the respective options while the program is running. P is polite,
# t is CheckRoundoffAllIterations, and s is the SaveAllCheckpoints feature
# below. This is a binary option; set to 1 to activate, 0 to de-activate.
Interactive=0

# used in the FFTs. This must be 32, 64, 128, 256, 512, or 1024. (Some FFT
# lengths have a higher minimum than 32.)

# DeviceNumber is the same as the -d option. Use this to run CUDALucas on a GPU
# other than "the first one". Only useful if you have more than one GPU.
DeviceNumber=0

# FFTLength is the same as the -f option. If this is 0, CUDALucas will
# autoselect a length for each exponent. Otherwise, you can set this with an
# override length; this length will be used for all exponents in worktodo.txt,
# which may not be optimal (or even possible). In the future, I would like to
# both create a better FFT length selection function, as well as be able to
# specify a length on an individual-exponent basis (probably through a field in
# Test= in the work file). To see a list of reasonable FFT lengths, try running
# "$CUDALucas -cufftbench 32768 3276800 32768" which will test a large range.
# In my personal experience on a GTX 460, I've found that for 26M exponents,
# FFTLength=1474560 is a good length. (Technical note: FFT length must be a
# multiple of 128*threads. See
# http://www.mersenneforum.org/showpost.php?p=292776&postcount=959 )
FFTLength=0
Edit: I probably should have put this in the attached ini file, but polite 0 is known to cause CUDALucas to take some small CPU time; polite 64 seems to be a good compromise.

Edit2: Forgot the todo list. Here it is:
1) Add some sort of way to specify the FFT on a per-exponent basis, presumably through some field in "DoubleCheck=...". The Prime95 way of doing it would be rather tedious to parse... anyone have any ideas?
2) I'd like to refine the FFT autoselect function to do better; to that end, at some point over the summer I'll write a script to test a bunch of exponents and record the round off error, then distribute that around here to get as much hardware covered as possible. Then we'd use the data to either create a table of FFT lengths or a reasonably accurate regression of some sort. (Prime95 uses (very large) tables.)

Edit3: Anyone who needs a Linux binary can of course just ask me.

Attached Files
 CUDALucas.2.02.tar.bz2 (17.7 KB, 85 views)

Last fiddled with by Dubslow on 2012-05-27 at 04:47

2012-05-27, 06:50   #1313
flashjh

"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

I'm running into a few issues compiling on Windows. I've been able to get around most, but here is the main stuff:
Code:
parse.c(166) : warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\string.h(105) : see declaration of 'strcpy'
parse.c(176) : warning C4127: conditional expression is constant
parse.c(191) : error C3861: 'strncasecmp': identifier not found
parse.c(192) : error C3861: 'strncasecmp': identifier not found
parse.c(305) : warning C4127: conditional expression is constant
parse.c(312) : warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\string.h(105) : see declaration of 'strcpy'
make: *** [parse.x64.obj] Error 2
I'll look at it more tomorrow, but maybe you can identify the problem?

2012-05-27, 07:42   #1314
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
 Originally Posted by flashjh
I'm running into a few issues compiling on Windows. I've been able to get around most, but here is the main stuff:
Code:
parse.c(166) : warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\string.h(105) : see declaration of 'strcpy'
make: *** [parse.x64.obj] Error 2
I'll look at it more tomorrow, but maybe you can identify the problem?

Bluh, you'll have to ask TheJudger how he compiles mfaktc for Windows. I'll look through his code tomorrow and look for Windows ifdefs.

Edit: Found one in compatability.h: #ifdef _MSC_VER; #define strncasecmp _strnicmp with the abuse of semicolon/newline. Next stop: makefile.win.

Last fiddled with by Dubslow on 2012-05-27 at 07:49

2012-05-27, 15:33   #1315
flashjh

"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
 Originally Posted by Dubslow
Bluh, you'll have to ask TheJudger how he compiles mfaktc for Windows. I'll look through his code tomorrow and look for Windows ifdefs.

Edit: Found one in compatability.h: #ifdef _MSC_VER; #define strncasecmp _strnicmp with the abuse of semicolon/newline. Next stop: makefile.win.

Probably should have them already, do you have a link to mfaktc source files?

I'm pretty sure I need _strnicmp instead of strncasecmp and strcpy_s instead of strcpy; however, I don't know what to do with the 'conditional expression is constant'.

2012-05-27, 15:58   #1316
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
 Originally Posted by flashjh
Probably should have them already, do you have a link to mfaktc source files?

http://mersenneforum.org/mfaktc/ (The unqualified 0.18 would be the source.)

Quote:
 Originally Posted by flashjh
I'm pretty sure I need _strnicmp instead of strncasecmp and strcpy_s instead of strcpy,

I don't see anywhere in the mfaktc source about strcpy_s, but I'll keep looking, and you certainly know better than me about MSVC.

Edit: http://msdn.microsoft.com/en-us/libr...(v=vs.80).aspx

Quote:
 Originally Posted by MS
For example, the strcpy function has no way of telling if the string that it is copying is too big for its destination buffer.
However, its secure counterpart, strcpy_s, takes the size of the buffer as a parameter, so it can determine if a buffer overrun will occur. If you use strcpy_s to copy eleven characters into a ten-character buffer, that is an error on your part; strcpy_s cannot correct your mistake, but it can detect your error and inform you by invoking the invalid parameter handler.

strcpy_s has a different definition, requiring a third size argument; it's just meant to be a "catch stupid programmer error" thing. Since the code does a check for a long line anyways, the extra functionality is unnecessary.
Code:
bill@Gravemind:~/CUDALucas/test∰∂ cat parse.c
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#ifdef _MSC_VER
#define strncasecmp _strnicmp
#define _CRT_SECURE_NO_WARNINGS
#endif

int isprime(unsigned int n)
/*
returns
0 if n is composite
1 if n is prime
*/
{
  unsigned int i;

  if(n<=1) return 0;
  if(n>2 && n%2==0)return 0;

  i=3;
  while(i*i <= n && i < 0x10000)
  {
    if(n%i==0)return 0;
    i+=2;
  }
  return 1;
}

Quote:
 Originally Posted by flashjh
however, I don't know what to do with the 'conditional expression is constant'.

Oh, that's just a while(1), that warning can be ignored. (Perhaps deleting the 1 and leaving the conditional blank will work?) Edit: Actually, in further looking, I actually have no clue why that loop is there at all. If the func finds a too-long line, then it reads in the characters until a control character then breaks? Wtf?

PS I forgot to mention above, but the help message was moved from no-args to the -h arg.

PPS Here's a slightly larger .ini file:
Code:
# Polite is the same as the -polite option. If it's 1, each iteration is
# polite. If it's (for example) 12, then every 12th iteration is polite. Thus
# the higher the number, the less polite the program is. Set to 0 to turn off
# completely. Polite!=0 will incur a slight performance drop, but the screen
# should be more responsive. Trade responsiveness for performance. (Note:
# polite=0 is known to cause CUDALucas to use some extra CPU time; Polite=64 or
# higher is a good compromise.)
Polite=1
flash, could you add this to your copy before you bundle the executable?

The attached archive contains the modified parse.c and CUDALucas.ini.

Attached Files
 CUDALucas.2.02.tar.bz2 (17.7 KB, 89 views)

Last fiddled with by Dubslow on 2012-05-27 at 16:27

2012-05-31, 04:21   #1317
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2×4,787 Posts

OT is moved to a separate thread

2012-05-31, 05:30   #1318
flashjh

"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
 Originally Posted by Dubslow
http://mersenneforum.org/mfaktc/ (The unqualified 0.18 would be the source.)

I don't see anywhere in the mfaktc source about strcpy_s, but I'll keep looking, and you certainly know better than me about MSVC.

Edit: http://msdn.microsoft.com/en-us/libr...(v=vs.80).aspx

strcpy_s has a different definition, requiring a third size argument; it's just meant to be a "catch stupid programmer error" thing. Since the code does a check for a long line anyways, the extra functionality is unnecessary.
Code:
bill@Gravemind:~/CUDALucas/test∰∂ cat parse.c
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#ifdef _MSC_VER
#define strncasecmp _strnicmp
#define _CRT_SECURE_NO_WARNINGS
#endif

int isprime(unsigned int n)
/*
returns
0 if n is composite
1 if n is prime
*/
{
  unsigned int i;

  if(n<=1) return 0;
  if(n>2 && n%2==0)return 0;

  i=3;
  while(i*i <= n && i < 0x10000)
  {
    if(n%i==0)return 0;
    i+=2;
  }
  return 1;
}

Oh, that's just a while(1), that warning can be ignored. (Perhaps deleting the 1 and leaving the conditional blank will work?) Edit: Actually, in further looking, I actually have no clue why that loop is there at all. If the func finds a too-long line, then it reads in the characters until a control character then breaks? Wtf?

PS I forgot to mention above, but the help message was moved from no-args to the -h arg.

PPS Here's a slightly larger .ini file:
Code:
# Polite is the same as the -polite option. If it's 1, each iteration is
# polite. If it's (for example) 12, then every 12th iteration is polite. Thus
# the higher the number, the less polite the program is. Set to 0 to turn off
# completely. Polite!=0 will incur a slight performance drop, but the screen
# should be more responsive. Trade responsiveness for performance. (Note:
# polite=0 is known to cause CUDALucas to use some extra CPU time; Polite=64 or
# higher is a good compromise.)
Polite=1
flash, could you add this to your copy before you bundle the executable?

The attached archive contains the modified parse.c and CUDALucas.ini.

Attached is CUDALucas 2.02 - UNTESTED. I included all the source files and the MAKEFILE for windows. I modified to 'safe' functions so MSVC would not complain**. The two while(1) lines cause a warning, but they're safe to ignore. No CUDA 3.2 because there were too many errors during compile for me to fix tonight.

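@Dubslow: This build is based on your newest archive from earlier this afternoon.

**I can't test the changes because I remoted into my system to do the build and CuLu doesn't detect my nVidia card while in a remote session. I'll test it tomorrow; if anyone else can test it and let me know if it performs as expected, that would be great -- thanks.

Attached Files
 CUDALucas2.02x64.zip (184.0 KB, 84 views)

Last fiddled with by flashjh on 2012-05-31 at 05:32

2012-05-31, 05:37   #1319
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

Quote:
 Originally Posted by flashjh
Attached is CUDALucas 2.02 - UNTESTED. I included all the source files and the MAKEFILE for windows. I modified to 'safe' functions so MSVC would not complain**. The two while(1) lines cause a warning, but they're safe to ignore. No CUDA 3.2 because there were too many errors during compile for me to fix tonight.

@Dubslow: This build is based on your newest archive from earlier this afternoon.

**I can't test the changes because I remoted into my system to do the build and CuLu doesn't detect my nVidia card while in a remote session. I'll test it tomorrow; if anyone else can test it and let me know if it performs as expected, that would be great -- thanks.

Heehee, I've been modifying the source with trifling differences for much of the evening, and that DL was symlinked to my files. I'll see how late it was. I've completed one expo with 2.02 myself, and another one is due in a few hours. I'll come back and edit this post with a source archive that compiles "out of the box" on both Windows and Linux.

Edit: Sheesh, where did you get that makefile? In all 2.x versions I've seen from msft, the makefiles he's included all had arch=sm_13 and -O2; this win makefile has a whole bunch of stuff that is (AFAICT) unnecessary. How did you even get it to compile with arch not sm_13? The change msft made in 2.01 requires sm_13 (at least nvcc threw me an error when I tried later arches; of course now it works fine), and we've figured out it's the fastest.
Code:
CUFLAGS = -m64 --ptxas-options=-v -ccbin=$(CCLOC) -D$(BIT) -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE  "-I\$(CUDA)/include"  -D__x86_64__ -O3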
I'm pretty sure if you did a Ctrl+F on the defines in the source, you wouldn't find anything. Here's my CUFLAGS:
Code:
CUFLAGS = -O2 -arch=sm_13 --compiler-options=-Wall
The first two flags are from msft's makefile, and the last one is the one I added which produced warnings.

Is this win makefile a relic of MacLucas?

Last fiddled with by Dubslow on 2012-05-31 at 06:12 Reason: s/two are/two flags are/

2012-05-31, 05:45   #1320
flashjh

"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
 Originally Posted by Dubslow
Heehee, I've been modifying the source with trifling differences for much of the evening, and that DL was symlinked to my files. I'll see how late it was. I've completed one expo with 2.02 myself, and another one is due in a few hours. I'll come back and edit this post with a source archive that compiles "out of the box" on both Windows and Linux.

Edit: Can you compile/run Windows versions?

Last fiddled with by flashjh on 2012-05-31 at 05:46
