mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2019-02-09, 19:26   #3081
kriesel
Quote:
Originally Posted by TheJudger View Post
Correct but not complete. The first Barrett kernel in mfaktc was BARRETT92; all other kernels are stripped-down versions.
From BARRETT92 we first went to BARRETT79 (fixed inverse, multiple bits per stage possible, a bit faster).
From there we go from BARRETT92 to BARRETT88 and BARRETT87 by (re)moving interim correction steps and some other "tricks" (losing accuracy in interim steps; a small example: 22 mod 10 yields 12 instead of 2). Trading accuracy for speed. The same "tricks" lead from BARRETT79 to BARRETT77 and BARRETT76.


"-pre" versions aren't released into the wild and are not intended for productive use. Removed old stuff (CC 1.x code, CUDA compatibility below 6.5 dropped, minor changes and bugfixes).

Oliver
Thanks for the review, the follow-up on the Barrett kernels, and the clarification on -pre versions.

I'm looking forward to updates and any bug fixes or enhancements whenever they're ready for field testing. (I can throw a variety of GPU models at it, from CC 2.0 up to a GTX 1080 Ti.)

One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
2019-02-09, 20:04   #3082
TheJudger
Quote:
Originally Posted by kriesel View Post
One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
GPU sieve
2019-02-11, 18:55   #3083
Uncwilly
Quote:
Originally Posted by kriesel View Post
Concepts in GIMPS trial factoring (TF) (note, sort of mfaktc oriented, more so toward the end)
This would make a good entry for the wiki.
2019-02-12, 03:06   #3084
kriesel

Quote:
Originally Posted by Uncwilly View Post
This would make a good entry for the wiki.
It's going in one of my reference threads.
2019-02-14, 03:13   #3085
kriesel

Quote:
Originally Posted by Uncwilly View Post
This would make a good entry for the wiki.
Feel free to link to https://www.mersenneforum.org/showpo...23&postcount=6 from the wiki.
2019-02-15, 18:28   #3086
nomead

Ugh... I'm trying to compile mfaktc on Windows now. Instead of Visual Studio 2012, I got Visual Studio 2017 (Community). File paths are all over the place, and it really took a while to find all the extra bits needed for the compile job to run through. In the end, getting C++/CLI support installed, then finding and running vcvars64.bat, finally did the trick.

I had already installed MinGW earlier for other purposes, and thus had GNU make.

I also installed the CUDA Toolkit 10.0.130.

Still, after a successful compile, the executable gives this error (I've also included the last bits of info printed by the program):

Code:
CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       10.0

CUDA device info
  name                      GeForce RTX 2060
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 30
  clock rate (CUDA cores)   1830MHz
  memory clock rate:        7001MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          983040
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function
Which is strange, since I added this to the Makefile:
Code:
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing
And it seems to generate 7.5 code during the compilation process.

The same thing also happens if I replace code=sm_75 with code=compute_75 to enable just-in-time compilation.

It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that too, but not right now.
2019-02-15, 19:06   #3087
nomead

Quote:
Originally Posted by nomead View Post
It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that, too, but not right now
How wrong can I be? First of all, I *had* to try it on VS 2012 right away. And now it works! Even though the NVCC compiler shows warnings like this:
Code:
support for this version of Microsoft Visual Studio has been deprecated! Only the versions between 2013 and 2017 (inclusive) are supported!
2019-02-16, 10:57   #3088
nomead
Compilation notes

The basic outline is documented in the mfaktc README.txt, but here are the specific steps I had to take to make it work. Let's forget about Visual Studio 2017 for the moment and concentrate on Visual Studio 2012. All installation packages listed here are available for free; even though a Microsoft account is needed for downloading VS2012 Express, it's free to use. I'm running Windows 7 64-bit.

First, I got 64-bit MinGW (originally for other reasons, but it includes GNU make) from
https://nuwen.net/mingw.html
From there, mingw-16.1-without-git.exe is enough for our purposes. Install that somewhere.

Then, Visual Studio 2012 Express for Windows Desktop.
https://my.visualstudio.com/Download...2012%20express
Log in, or create an account and then log in. The one marked "Visual Studio Express 2012" only works on Windows 8 (and up, maybe?), but the "for Windows Desktop" one also works on Windows 7. I downloaded the installer EXE and ran it.

Finally, CUDA Toolkit 10.0
https://developer.nvidia.com/cuda-downloads
Download and install.

Then prepare Makefile.win. First, change CUDA_DIR to point to where your CUDA Toolkit is installed. For me this was:
Code:
CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
After that, add code generation for the cards you're planning to use. For example,
Code:
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.0 Pascal / GTX10xx
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.0 Volta / Titan V
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing / RTX20xx, GTX16xx
Then there was a problem with NVCC that needed a fix. It expects to find vcvars64.bat in a certain place, and it seems that the VS2012 Express installer doesn't put it there. Go to C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin and see whether the subfolder amd64 exists there, with vcvars64.bat inside it. If not, copy the subfolder x86_amd64 and its contents to amd64, and rename the copied vcvarsx86_amd64.bat to vcvars64.bat.

Finally, time to start compiling. Start a command prompt window.

Go to the root folder of where you installed MinGW and run set_distro_paths.bat from there.

Then go to wherever that vcvars64.bat is and run it.

Then go to the mfaktc-0.21 source folder and run make -f Makefile.win.
Wait a while... (it seems to take a whole lot longer than gcc + nvcc on Linux.)
Done!

If you want to compile other versions (more/fewer classes, Wagstaff), these can be selected by editing params.h and recompiling.
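For reference, the switches in question look roughly like this (the macro names here are from memory and should be treated as assumptions; check the comments in your own params.h for the real ones):

```c
/* params.h-style settings -- names are my recollection, verify locally */
#define MORE_CLASSES      /* sieve with 4620 classes instead of 420    */
/* #define WAGSTAFF */    /* build the Wagstaff variant, not Mersenne  */
```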
2019-03-11, 14:30   #3089
kriesel
TF concepts updated

Some of the existing points have been refined or expanded, and I've added several additional points recently. It's now up to 40 entries. It's at https://www.mersenneforum.org/showpo...23&postcount=6
2019-03-14, 04:47   #3090
nomead

Another poke at the internals. I was factoring a few exponents in mfaktc where the bit depth was 76-77 (among others), and I wondered why the barrett87_mul32_gs kernel was chosen instead of barrett77_mul32_gs. Then I looked at kernel_benchmarks.txt in the source directory. Okay, those tests were done back in the CUDA 5.5 days, and the freshest card used was a Tesla K20m, three (and a half) generations old by now. That got me wondering, again: have things changed? Well, of course, I HAD to do some benchmarking of my own on Turing, and at least there, yes they have. Not by much, but now barrett77 is faster than barrett87 by about 1%.

Exponent tested: 66362159, bit depth 68-69 (the same as in kernel_benchmarks.txt), the "less classes" build, debug RAW GPU BENCH mode on (it disables sieving, so the GHz-d/d numbers are low because of that), CUDA 10.1, and an RTX 2080 locked at 1800 MHz:
Code:
                      time     GHz-d/day
barrett76_mul32_gs  02:15.827   572.49
barrett77_mul32_gs  02:24.794   537.04
barrett87_mul32_gs  02:26.262   531.65
barrett88_mul32_gs  02:30.296   517.38
barrett79_mul32_gs  02:45.376   470.20
barrett92_mul32_gs  02:56.342   440.96
75bit_mul32_gs      04:54.998   263.60
95bit_mul32_gs      06:04.134   213.55
There is a selection table in mfaktc.c that only checks for compute capability 1.x (where the speed order was 76 -> 77 -> 87 -> 88 -> 79 -> 92); everything else gets 76 -> 87 -> 88 -> 77 -> 79 -> 92. So the barrett77_mul32_gs kernel is in effect never selected on anything newer than the GTX 2xx series.

It's a small difference, it only affects this single bit depth, and separate benchmarks should be run on every architecture to see whether anything has changed there as well. That's a lot of work, so is it worth it? I'd like to think so, since GPU72 is now factoring above 76 bits, and every little bit of extra performance helps.
2019-03-14, 14:59   #3091
Mark Rose

It would be worth it to test numbers around 90M, 100M, and 110M, too.