mersenneforum.org mfaktc: a CUDA program for Mersenne prefactoring

2019-02-09, 19:26   #3081
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12EF₁₆ Posts

Quote:
 Originally Posted by TheJudger Correct but not complete. The first Barrett kernel in mfaktc was BARRETT92; all other kernels are stripped-down versions. First, from BARRETT92 to BARRETT79 (fixed inverse, multibit in a single stage possible, a bit faster). From there we go from BARRETT92 to BARRETT88 and BARRETT87 by (re)moving interim correction steps and some other "tricks" (loss of accuracy in interim steps; small example: 22 mod 10 yields 12 instead of 2), trading accuracy for speed. The same "tricks" lead from BARRETT79 to BARRETT77 and BARRETT76. "-pre" versions aren't released into the wild and are not intended for production use. Removed old stuff (CC 1.x code, CUDA compatibility < 6.5 dropped, minor changes and bugfixes). Oliver
Thanks for the review, followup on Barretts, and clarification on -pre versions.

I'm looking forward to updates and any bug fixes or enhancements whenever they're ready for field testing. (I can throw a variety of GPU models at it, from CC 2.0 up to a GTX 1080 Ti.)

One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
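
For readers following the "22 mod 10 yields 12" remark in the quote, here is a minimal sketch of the idea in Python. It is a textbook Barrett reduction, not mfaktc's kernel code: the function names, the `correct` switch, and the choice of shift are assumptions made for illustration. It shows that skipping the final correction still gives a value in the right residue class, just not a fully reduced one.

```python
def barrett_setup(f):
    """Precompute the shift k and the scaled inverse mu = floor(4**k / f)."""
    k = f.bit_length()
    mu = (1 << (2 * k)) // f
    return k, mu


def barrett_reduce(x, f, k, mu, correct=True):
    """Reduce x modulo f for 0 <= x < 4**k without dividing by f."""
    q = (x * mu) >> (2 * k)  # estimate of x // f; never too large, at most 1 too small
    r = x - q * f            # congruent to x (mod f), but may still be >= f
    if correct:
        while r >= f:        # the correction step a faster variant might skip
            r -= f
    return r


k, mu = barrett_setup(10)
print(barrett_reduce(255, 10, k, mu, correct=True))   # 5
print(barrett_reduce(255, 10, k, mu, correct=False))  # 15: right residue class, not fully reduced
```

As long as later steps tolerate the slightly larger intermediate value, the uncorrected result is still usable, which is presumably where the speed gain comes from.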

2019-02-09, 20:04   #3082
TheJudger

"Oliver"
Mar 2005
Germany

2·3·5·37 Posts

Quote:
 Originally Posted by kriesel One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
GPU sieve

2019-02-11, 18:55   #3083
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

9,209 Posts

Quote:
 Originally Posted by kriesel Concepts in GIMPS trial factoring (TF) (note, sort of mfaktc oriented, more so toward the end)
This would make a good entry for the wiki.

2019-02-12, 03:06   #3084
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4847₁₀ Posts

Quote:
 Originally Posted by Uncwilly This would make a good entry for the wiki.
It's going in one of my reference threads.

2019-02-14, 03:13   #3085
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12EF₁₆ Posts

Quote:
 Originally Posted by Uncwilly This would make a good entry for the wiki.
Feel free to link to https://www.mersenneforum.org/showpo...23&postcount=6 from the wiki.

2019-02-15, 18:28   #3086
nomead

"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

Ugh... I'm trying to compile mfaktc on Windows now. Instead of Visual Studio 2012, I got Visual Studio 2017 (Community). File paths are all over the place. It really took a while to find all the extra bits needed so that the compile job would run through. But it seems that installing C++/CLI support, then finding and running vcvars64.bat, finally did the trick. I had already installed MinGW earlier for other purposes and thus had GNU make. I also installed CUDA Toolkit 10.0.130.

Still, after a successful compile, the executable gives this error (I've also included the last bits of info printed by the program):
Code:
CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       10.0

CUDA device info
  name                      GeForce RTX 2060
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 30
  clock rate (CUDA cores)   1830MHz
  memory clock rate:        7001MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          983040
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function
Which is strange, since I added this to the Makefile:
Code:
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing
And it seems to generate 7.5 code during the compilation process. The same thing also happens if I replace code=sm_75 with code=compute_75 to enable just-in-time compilation. It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that, too, but not right now.

2019-02-15, 19:06   #3087
nomead

"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

Quote:
 Originally Posted by nomead It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that, too, but not right now
How wrong can I be? First of all, I *had* to try it now on VS 2012. And now, it works! Even despite the NVCC compiler showing warnings like this:
Code:
support for this version of Microsoft Visual Studio has been deprecated! Only the versions between 2013 and 2017 (inclusive) are supported!

2019-02-16, 10:57   #3088
nomead

"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

Compilation notes

The basic outline is documented in the mfaktc README.txt, but here are the specific steps I had to take to make it work. Let's forget about Visual Studio 2017 for the moment and concentrate on Visual Studio 2012. All installation packages listed here are available for free. Even though a Microsoft account is needed for downloading VS2012 Express, it's free to use. I'm running Windows 7 64-bit.

First, I got 64-bit MinGW (originally for other reasons, but it includes GNU make) from https://nuwen.net/mingw.html From there, mingw-16.1-without-git.exe is enough for our purposes. Install that somewhere.

Then, Visual Studio 2012 Express for Windows Desktop. https://my.visualstudio.com/Download...2012%20express Log in, or create an account and then log in. The one marked "Visual Studio Express 2012" only works on Windows 8 (and up, maybe?) but the "for Windows Desktop" one also works on Windows 7. I got the installer EXE and then ran it.

Finally, CUDA Toolkit 10.0 https://developer.nvidia.com/cuda-downloads Download and install.

Then prepare Makefile.win. First of all, you need to change CUDA_DIR to point to where your CUDA Toolkit was installed. For me this was
Code:
CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
After that, add code generation for the cards you're planning to use. For example,
Code:
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.0 Pascal / GTX10xx
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.0 Volta / Titan V
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing / RTX20xx, GTX16xx
Then there was a problem with NVCC that needed a fix. It expects to find vcvars64.bat in a certain place, and it seems that the VS2012 Express installer doesn't put it there. Go to C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin and see if the subfolder amd64 exists there, with vcvars64.bat inside it. If not, you need to copy the subfolder x86_amd64 and its contents to amd64, and rename the copied vcvarsx86_amd64.bat to vcvars64.bat.

Finally, time to start compiling. Start a command prompt window. Go to the root folder of where you installed MinGW and run set_distro_paths.bat from there. Then go to wherever that vcvars64.bat is and run it. Then go to the mfaktc-0.21 source folder and run make -f Makefile.win

Wait a while... (It seems to take a whole lot longer than on Linux with gcc + nvcc.) Done!

If you want to compile other versions (more or fewer classes, Wagstaff), these can be set by editing params.h and then recompiling.

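If you target several cards, the NVCCFLAGS lines above follow a fixed pattern and can be generated rather than typed by hand. The following is only a sketch: `nvcc_codegen_lines` is a hypothetical helper, not part of mfaktc, and it does not check whether your installed CUDA Toolkit actually supports a given compute capability.

```python
def nvcc_codegen_lines(capabilities):
    """Return Makefile lines adding one --generate-code entry per compute capability."""
    lines = []
    for cc in capabilities:
        major, minor = cc.split(".")  # "7.5" -> ("7", "5")
        arch = f"{major}{minor}"      # "75", as used in compute_75 / sm_75
        lines.append(
            f"NVCCFLAGS += --generate-code arch=compute_{arch},code=sm_{arch} # CC {cc}"
        )
    return lines


# Print the lines for the three capabilities used in the example above.
for line in nvcc_codegen_lines(["6.0", "7.0", "7.5"]):
    print(line)
```

The printed lines can be pasted into Makefile.win next to the existing NVCCFLAGS entries.
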
2019-03-11, 14:30   #3089
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37·131 Posts

TF concepts updated

Some of the existing points have been refined or expanded, and I've added several additional points recently. It's now up to 40 entries. It's at https://www.mersenneforum.org/showpo...23&postcount=6

2019-03-14, 04:47   #3090
nomead

"Sam Laur"
Dec 2018
Turku, Finland

2·3·5·11 Posts

Another poke at the internals. I was factoring a few exponents in mfaktc where the bit depth was 76-77 (among others). I wondered why the barrett87_mul32_gs kernel was chosen instead of barrett77_mul32_gs. Then I looked at kernel_benchmarks.txt in the source directory. Okay, the tests were done back in the CUDA 5.5 days and the freshest card used was a Tesla K20m, three (and a half) generations old by now. That got me wondering, again: have things changed? Well, of course, I HAD to do some benchmarking of my own on Turing, and at least there, yes they have. Not by much, but now barrett77 is faster than barrett87 by about 1%.

Exponent tested: 66362159, bit depth 68-69 (the same as in kernel_benchmarks.txt), less classes, debug RAW GPU BENCH mode on (this disables sieving, so the GHz-d/day numbers are low because of that), CUDA 10.1 and an RTX 2080 locked at 1800 MHz:
Code:
kernel              time       GHz-d/day
barrett76_mul32_gs  02:15.827  572.49
barrett77_mul32_gs  02:24.794  537.04
barrett87_mul32_gs  02:26.262  531.65
barrett88_mul32_gs  02:30.296  517.38
barrett79_mul32_gs  02:45.376  470.20
barrett92_mul32_gs  02:56.342  440.96
75bit_mul32_gs      04:54.998  263.60
95bit_mul32_gs      06:04.134  213.55
There is a selection table in mfaktc.c that only checks for compute capability 1.x (where the speed order was 76 -> 77 -> 87 -> 88 -> 79 -> 92), and all the rest get 76 -> 87 -> 88 -> 77 -> 79 -> 92. So the barrett77_mul32_gs kernel is in effect never selected on anything newer than GTX2xx.

It's a small difference, it only affects this single bit depth, and separate benchmarks should be run on every architecture to see if there are any changes there as well. A lot of work, so is it worth it? I'd like to think yes, since GPU72 is now factoring over 76 bits, and every little bit of extra performance should help.

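To make the selection argument concrete, here is a rough sketch of a "first kernel in preference order that fits" rule, plus the arithmetic behind the "about 1%" figure. It is not the code from mfaktc.c. It assumes that the number in each kernel name is the largest factor size in bits that kernel handles, and it ignores any other constraints the real table may apply (minimum bit size, how many bit levels are done in one stage). `pick_kernel` and `speedup_percent` are hypothetical names.

```python
# Assumption: the number in a kernel's name is the largest factor size (bits) it handles.
MAX_BITS = {
    "barrett76": 76, "barrett77": 77, "barrett79": 79,
    "barrett87": 87, "barrett88": 88, "barrett92": 92,
}

# Preference orders as described in the post, fastest first.
ORDER_CC1X = ["barrett76", "barrett77", "barrett87", "barrett88", "barrett79", "barrett92"]
ORDER_OTHER = ["barrett76", "barrett87", "barrett88", "barrett77", "barrett79", "barrett92"]


def pick_kernel(bit_max, order):
    """Return the first kernel in the preference order that covers bit_max, or None."""
    for name in order:
        if MAX_BITS[name] >= bit_max:
            return name
    return None


def speedup_percent(slow_seconds, fast_seconds):
    """How much faster the quicker run is, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


print(pick_kernel(77, ORDER_OTHER))  # barrett87
print(pick_kernel(77, ORDER_CC1X))   # barrett77

# Times from the table: barrett87 02:26.262, barrett77 02:24.794.
print(round(speedup_percent(2 * 60 + 26.262, 2 * 60 + 24.794), 2))
```

Under these assumptions, barrett87 always comes before barrett77 in the non-1.x order and covers everything barrett77 does, so barrett77 is never returned for any bit depth.
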
2019-03-14, 14:59   #3091
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2929₁₀ Posts

It would be worth it to test numbers around 90M, 100M, and 110M, too.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.