#3081 | "TF79LL86GIMPS96gpu17" | Mar 2017 | US midwest
Quote:
I'm looking forward to updates and any bug fixes or enhancements whenever they're ready for field testing. (I can throw a variety of GPU models at it, from CC 2.0 up to a GTX 1080 Ti.)

One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
#3085 | "TF79LL86GIMPS96gpu17" | Mar 2017 | US midwest
Feel free to link to https://www.mersenneforum.org/showpo...23&postcount=6 from the wiki.
#3086 | "Sam Laur" | Dec 2018 | Turku, Finland
Ugh... I'm trying to compile mfaktc on Windows now. Instead of Visual Studio 2012, I got Visual Studio 2017 (Community). File paths are all over the place, and it really took a while to find all the extra bits needed so that the compile job would run through. But it seems that installing C++/CLI support, then finding and running vcvars64.bat, finally did the trick.

I had already installed MinGW earlier for other purposes and thus had GNU make. I also installed the CUDA Toolkit (10.0.130). Still, after a successful compile, the executable gives this error (I've also included the last bits of info given by the program):

Code:
CUDA version info
  binary compiled for CUDA   10.0
  CUDA runtime version       10.0
  CUDA driver version        10.0

CUDA device info
  name                       GeForce RTX 2060
  compute capability         7.5
  max threads per block      1024
  max shared memory per MP   65536 byte
  number of multiprocessors  30
  clock rate (CUDA cores)    1830MHz
  memory clock rate:         7001MHz
  memory bus width:          192 bit

Automatic parameters
  threads per grid           983040
  GPUSievePrimes (adjusted)  82486
  GPUsieve minimum exponent  1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function

The Makefile.win I'm using has this line for Turing:

Code:
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing

The same thing also happens if I replace code=sm_75 with code=compute_75 to enable just-in-time compilation. It shouldn't be because of the VS 2017 / VS 2012 difference, but who knows? Maybe I'll try that, too, but not right now.
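For what it's worth, error 8 ("invalid device function") usually means the binary that is actually running contains no SASS or PTX for the GPU's compute capability. One generic way to check what ended up inside the executable is the CUDA Toolkit's cuobjdump tool; the executable name below is just a placeholder for whatever your build produced:

Code:
REM list the embedded cubins (SASS) and PTX; for an RTX 2060 there should be an sm_75 or compute_75 entry
cuobjdump --list-elf mfaktc.exe
cuobjdump --list-ptx mfaktc.exe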
#3087 | "Sam Laur" | Dec 2018 | Turku, Finland
Code:
support for this version of Microsoft Visual Studio has been deprecated!
Only the versions between 2013 and 2017 (inclusive) are supported!
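Not something from this thread, but possibly useful when several Visual Studio versions are installed side by side: nvcc's -ccbin option lets you point it at a specific Visual C++ host compiler instead of whatever cl.exe it finds on its own. The paths and file names below are placeholders:

Code:
REM placeholder paths and file names; tell nvcc explicitly which cl.exe directory to use as the host compiler
nvcc -ccbin "C:\path\to\VC\bin\amd64" --generate-code arch=compute_75,code=sm_75 -c my_kernel.cu -o my_kernel.obj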
#3088 | "Sam Laur" | Dec 2018 | Turku, Finland
The basic outline is documented in the mfaktc README.txt, but here are the specific steps I had to take to make it work. Let's forget about Visual Studio 2017 for the moment and concentrate on Visual Studio 2012. All installation packages listed here are available for free; even though a Microsoft account is needed for downloading VS2012 Express, it's free to use. I'm running on Windows 7 64-bit.

1. 64-bit MinGW (originally installed for other reasons, but it includes GNU make), from https://nuwen.net/mingw.html - mingw-16.1-without-git.exe is enough for our purposes. Install that somewhere.
2. Visual Studio 2012 Express for Windows Desktop, from https://my.visualstudio.com/Download...2012%20express - log in, or create an account and then log in. The one marked "Visual Studio Express 2012" only works on Windows 8 (and up, maybe?), but the "for Windows Desktop" one also works on Windows 7. I got the installer EXE and then ran it.
3. CUDA Toolkit 10.0, from https://developer.nvidia.com/cuda-downloads - download and install.

Then prepare Makefile.win. First of all, you need to change CUDA_DIR to point to where your CUDA Toolkit was installed. For me this was

Code:
CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"

Also make sure the NVCCFLAGS lines cover the compute capability of the GPU you want to run on; for newer cards these are

Code:
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.0 Pascal / GTX10xx
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.0 Volta / Titan V
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing / RTX20xx, GTX16xx

Finally, time to start compiling. Start a command prompt window. Go to the root folder of where you installed MinGW and run set_distro_paths.bat from there. Then go to wherever that vcvars64.bat is and run it. Then go to the mfaktc-0.21 source folder and run make -f Makefile.win. Wait a while... (it seems to take a whole lot longer than with gcc + nvcc on Linux) Done! If you want to compile other variants (more/less classes, Wagstaff), these can be set by editing params.h and then recompiling.
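Put together, the command-prompt sequence from those steps might look roughly like this; the paths are placeholders for wherever MinGW, Visual Studio and the mfaktc source actually live on your machine:

Code:
REM placeholder paths; adjust to your own MinGW, Visual Studio and mfaktc locations
cd /d C:\MinGW
call set_distro_paths.bat
cd /d "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\amd64"
call vcvars64.bat
cd /d C:\src\mfaktc-0.21
make -f Makefile.win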
#3089 | "TF79LL86GIMPS96gpu17" | Mar 2017 | US midwest
Some of the existing points have been refined or expanded, and I've added several more recently; the list is now up to 40 entries. It's at https://www.mersenneforum.org/showpo...23&postcount=6
#3090 | "Sam Laur" | Dec 2018 | Turku, Finland
Another poke at the internals. I was factoring a few exponents in mfaktc where the bit depth was 76-77 (among others), and I wondered why the barrett87_mul32_gs kernel was chosen instead of barrett77_mul32_gs. Then I looked at the kernel_benchmarks.txt in the source directory. Okay, those tests were done back in CUDA 5.5 days, and the freshest card used was a Tesla K20m, three (and a half) generations old by now. That got me wondering, again: have things changed? Well, of course, I HAD to do some benchmarking of my own on Turing, and at least there, yes they have. Not by much, but now barrett77 is faster than barrett87 by about 1%.

Exponent tested: 66362159, bit depth 68-69 (the same as in kernel_benchmarks.txt), less classes, debug RAW GPU BENCH mode on (that disables sieving, so the GHz-d/d numbers are low because of it), CUDA 10.1, and an RTX 2080 locked at 1800 MHz:

Code:
                      time        GHz-d/day
barrett76_mul32_gs    02:15.827   572.49
barrett77_mul32_gs    02:24.794   537.04
barrett87_mul32_gs    02:26.262   531.65
barrett88_mul32_gs    02:30.296   517.38
barrett79_mul32_gs    02:45.376   470.20
barrett92_mul32_gs    02:56.342   440.96
75bit_mul32_gs        04:54.998   263.60
95bit_mul32_gs        06:04.134   213.55

It's a small difference, it only affects this single bit depth, and separate benchmarks would have to be run on every architecture to see whether anything has changed there as well. That's a lot of work, so is it worth it? I'd like to think yes, since GPU72 is now factoring over 76 bits, and every little bit of extra performance should help.
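For anyone wanting to repeat this kind of run: one way to pin the core clock on a Turing card, as was done for the numbers above, is nvidia-smi's clock-lock option. This is a generic driver feature rather than anything mfaktc-specific, it needs admin rights and a reasonably recent driver, and the exact clock value is of course up to you:

Code:
REM lock the GPU core clock to 1800 MHz for repeatable benchmark numbers (needs admin and a recent driver)
nvidia-smi -lgc 1800,1800
REM ... run the mfaktc kernel benchmarks here ...
REM then restore the default clock behaviour
nvidia-smi -rgc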
#3091 | "/X\(‘-‘)/X\" | Jan 2013
It would be worth it to test numbers around 90M, 100M, and 110M, too.