#1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011001111₂ Posts
This thread is intended as a home for reference material specific to the mfakto program.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.)

Mfakto howto

The following assumes you've already read the generic How to get started in gpu computing for GIMPS portion of https://www.mersenneforum.org/showpo...89&postcount=1, and duplicates little of that here.

Download a suitable version of Mfakto from here. If you can't find a suitable version there, you may be able to get one built by one of the participants on the Mfakto thread.

Install in a suitable user subfolder. Set the needed folder permissions so suitable permissions are inherited by files created there.

Modify the mfakto.ini file to customize it to your gpu model, PrimeNet account name, and system name. (I usually incorporate system name, gpu identification, and instance number together. Something like systemname/gpuid-wn; condorella/rx480-w2 for example.)

Create a Windows batch file or Linux shell script with a short name. Set the device number there. Consider redirecting console output to a file or employing a good tee program.

Create a desktop shortcut for easy launch of the batch file or script. (Eventually, for multiple instances or multiple GPUs, this could launch a routine that invokes the individual-instance files with short time delays between them, so you have a few seconds to see whether each launched correctly or a bug occurred.)

You may want to try GPU-Z as a utility on your Windows system to see an indication of what the computer thinks is installed for your gpu (OpenCL, OpenGL, etc.), graphically monitor gpu parameters, and maybe even log them if you want. It is one of many utilities listed in https://www.mersenneforum.org/showpo...74&postcount=6, which also lists some Linux alternatives. It can be handy while getting a gpu application going.
When it's not needed, shut it down, along with other idle applications, to reduce overhead that costs performance.

Now is a good time to run Mfakto with -h >>help.txt in your working directory. Run once, refer to as often as needed.

Specifying the device is a little different in some OpenCL programs, including Mfakto. Multiple platforms may have OpenCL support on the same system. Examples of OpenCL platforms on a hypothetical system are:
Intel cpu package
NVIDIA OpenCL driver
AMD OpenCL driver

Specifying the device in Mfakto mostly uses a digit for the platform followed by a digit for the device on that platform. Device specifications in Mfakto command lines for that hypothetical system might be:
-d 00 Intel cpu
-d 01 Intel IGP
-d 10 first OpenCL supported NVIDIA gpu
-d 11 second OpenCL supported NVIDIA gpu
-d 20 first OpenCL supported AMD gpu

I recommend starting from an otherwise idle system and, with device load monitoring running, testing which device specification loads which device, one batch file or shell script at a time, and including documenting comments in them.

Run the self test for each device.
mfakto -st -d xy >>selftest.txt
Or the longer one:
mfakto -st2 -d xy >>selftest.txt
Check the results. Resolve any reliability issues before proceeding to real GIMPS work.

It may be necessary to install msvcr110.dll or msvcp110.dll or both on Windows. In case of further difficulty, see the debug options for mfakto in the help.txt generated earlier. Or, to check which device numbers are available, or for other utilities, see https://www.mersenneforum.org/showpo...74&postcount=6

Create a worktodo.txt file and put some assignments in there. Start with few or only one, in case your GPU or IGP does not work out. Get the type you plan to run the most.
Get them from https://www.mersenne.org/manual_gpu_assignment/
Results are reported manually at https://www.mersenne.org/manual_result/

Run one instance with default settings, then modify tuning in the mfakto.ini file, documenting performance for each modification. Tune one variable at a time. For best accuracy and reproducibility, minimize interactive use during a benchmark period. Averaging many timing samples in a spreadsheet improves accuracy. See Tuning mfakto.ini for performance for tuning advice.

When substantially changing the type of work, such as switching between 100M tasks and 100Mdigit tasks, or significantly changing bit levels, especially when a change of kernels results, re-tuning is suggested. (You may want to save different tunes for different exponent and bit level ranges, for later reuse.)

For best performance use an SSD, tune for single-instance first, and test for maximum throughput by experimenting with multiple instances last. (I suggest keeping the work of simultaneous instances similar. If they are too different, different kernels may reduce throughput. Test for that too.) Multiple instances may give slightly higher or lower sustained throughput, and will keep the gpu working if one instance has a problem, runs out of work, or is stopped briefly to replenish work and report results. Limited testing on one gpu indicated two instances produced a few percent lower throughput than one. So chaining through a batch file or shell script may be better. In Mfaktc, slow GPUs benefit less from large GPUSieveSize and multiple instances, and fast GPUs benefit more; Mfakto may act similarly.

Get familiar with the rocm-smi command line tool at some point if running on Linux with ROCm. It is more efficient once you get into production mode, since rocm-smi has less overhead than graphical monitoring utilities such as GPU-Z. Unfortunately I know of no Windows equivalent. Beware, GPU-Z gpu order may not match Mfakto device number order in the case of multiple GPUs per system.
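The tuning advice above (change one variable at a time, average many timing samples) can also be followed without a spreadsheet. The following is only a minimal sketch in Python: the function name best_setting and the sample readings are hypothetical, invented for illustration, and it assumes you have copied throughput readings for each setting by hand from the console output or a log.

```python
from statistics import mean

def best_setting(samples):
    """Return (label, average) for the setting with the highest mean reading.

    `samples` maps a setting label to a list of throughput readings taken
    while only that one variable was changed. Higher is assumed to be better.
    """
    averages = {label: mean(values) for label, values in samples.items() if values}
    best = max(averages, key=averages.get)
    return best, averages[best]

# Hypothetical readings, invented for illustration only (not measured data).
readings = {
    "GPUSieveSize=64": [553.1, 554.2, 553.9],
    "GPUSieveSize=96": [541.0, 541.5, 540.8],
}

label, avg = best_setting(readings)
print(label, round(avg, 2))
```

Keeping the raw samples per setting, rather than only the averages, makes it easier to spot a reading disturbed by interactive use and to repeat it.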
After getting the program functioning manually, you can consider continuing to operate it that way, or trying one of the client management programs described at Available Mersenne prime hunting client management software, each of which has its own install requirements, or GPU to 72 https://mersenneforum.org/forumdisplay.php?f=95

(Note much of the preceding was derived from or copied from https://mersenneforum.org/showthread.php?t=25673)

Table of Contents:
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-09-13 at 16:04 Reason: added msvcr110.dll, msvcp110.dll, debug options
#2
The first attachment shows run times in seconds for a wide variety of exponents and bit levels on the AMD (MSI) RX550; the array is necessarily sparse.
The second shows the associated GHz-days/day ratings. (Why isn't that the more concise and units-canceled GHz-equivalent, abbreviated GHz-eq?)
No RX480 data has been assembled yet.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-18 at 14:05
#3
Here is the most current posted version of the list I am maintaining for mfakto. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have, preferably by PM to kriesel, or in the separate discussion thread here.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-18 at 14:06
#4
Simply:
AMD: yes
Intel IGP: some (including HD4000 ~5 GHzD/d; HD4600, HD530 or HD620 ~18 GHzD/d; UHD630 ~20 GHzD/d)
NVIDIA: no; use mfaktc
Mali (in some cell phones): apparently not yet

Frequently, what are thought to be compatibility issues are really issues with the gpu's OpenCL driver or with the device number given on the command line. Divide and conquer, with an OpenCL test utility. OpenCL-Z is one for Windows.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-18 at 14:06
#5
Code:
mfakto 0.15pre6-Win (64bit build)

mfakto (mfakto 0.15pre6-Win) Copyright (C) 2009-2014,
  Oliver Weihe (o.weihe@t-online.de)
  Bertram Franz (bertramf@gmx.net)
This program comes with ABSOLUTELY NO WARRANTY; for details see COPYING.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING for details.

Usage: mfakto [options]
  -h|--help              display this help
  -d <xy>                specify to use OpenCL platform number x and
                         device number y in this program
  -d c                   force using all CPUs
  -d g                   force using the first GPU
  -v <n>                 verbosity level: 0=terse, 1=normal, 2=verbose, 3=debug
  -tf <exp> <min> <max>  trial factor M<exp> from 2^<min> to 2^<max>
                         instead of parsing the worktodo file
  -i|--inifile <file>    load <file> as inifile (default: mfakto.ini)
  -st                    selftest using the optimal kernel per testcase
  -st2                   selftest using all possible kernels

options for debugging purposes
  --timertest            test of timer functions
  --sleeptest            test of sleep functions
  --perftest [<n>]       performance tests, repeat each test <n> times (def: 10)
  --CLtest               test of some OpenCL functions
                         specify -d before --CLtest to test the specified device

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-18 at 14:06
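For anyone scripting the self tests rather than typing them, here is a small Python sketch that assembles the command line from the options listed in the help output above. It only builds the argument list and does not run anything. The helper names device_spec and selftest_command are hypothetical, the single-digit limit is an assumption based on the "-d <xy>" description, and the option order follows the "mfakto -st -d xy" form used in the howto.

```python
def device_spec(platform, device):
    """Build the two-character value for -d: platform digit, then device digit."""
    # Assumption: each of x and y is a single decimal digit, as "-d <xy>" suggests.
    if not (0 <= platform <= 9 and 0 <= device <= 9):
        raise ValueError("platform and device must each be a single digit")
    return f"{platform}{device}"

def selftest_command(platform, device, long_test=False, exe="mfakto"):
    """Argument list for a self test on one device (-st, or -st2 for the longer one)."""
    test_flag = "-st2" if long_test else "-st"
    return [exe, test_flag, "-d", device_spec(platform, device)]

# Platform 2, device 0: the first AMD gpu on the hypothetical system in the howto.
print(" ".join(selftest_command(2, 0)))
```

The list form can be handed to a process launcher as is, with console output redirected to selftest.txt as in the howto. Which platform and device digits apply on a given system still has to be confirmed by watching device load.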
#6
Based on extremely limited testing, there was no evidence that multiple instances help in mfakto as they do in mfaktc. There also seems to be less payoff from a large GPUSieveSize.
Code:
AMD RX480 gpu, Windows7 x64 Adrenalin v18.10.2 driver
Factor=E28E...03C3,110088763,73,76 used for performance tuning
Keep MoreClasses=1, CheckpointDelay=900, other default settings, tune GPUSieveProcessSize GPUSieveSize GPUSievePrimes

Initial tune
# Minimum: GPUSievePrimes=54
# Maximum: GPUSievePrimes=1075766
#
# Default: GPUSievePrimes=81157
GPUSievePrimes=81157

# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
# Higher is usually faster, but the screen may lag easier.
# (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize.
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=96
GPUSieveSize=96

# GPUSieveProcessSize defines how many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate. However, more shared memory is used which may reduce occupancy.
# (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize.
#
# Possible values: 8,16,24,32
#
# Default: GPUSieveProcessSize=24
GPUSieveProcessSize=24

# MoreClasses is a switch for defining if 420 (2*2*3*5*7) or 4620 (2*2*3*5*7*11) classes of
# factor candidates should be used. Normally, 4620 gives better results but for very small classes
# 420 reduces the class initialization overhead enough to provide an overall benefit.
# Used only when sieving on the GPU; the CPU-sieve will always use 4620 classes.
#
# Possible values:
# MoreClasses=0 (use 420 classes)
# MoreClasses=1 (use 4620 classes)
#
# Default: MoreClasses=1
MoreClasses=1

GPUSieveProcessSize=24 551.025
GPUSieveProcessSize=32 551.60
GPUSieveProcessSize=16 553.407 *
GPUSieveProcessSize=8 540.646

GPUSieveProcessSize=16
GPUSieveSize=128 550.26
GPUSieveSize=96 541.096
GPUSieveSize=64 553.93 *
GPUSieveSize=32 537.284
GPUSieveSize=112 538.294
GPUSieveSize=80 transition to 74-75 bit before its start 503.049
GPUSieveSize=64 500.249
GPUSieveSize=96 506.98 *
GPUSieveSize=112 496.092
GPUSieveSize=128 496.938

GPUSieveProcessSize=16, GPUSieveSize=64
GPUSievePrimes=100000 497.504
GPUSievePrimes=91000 500.103
GPUSievePrimes=90000 500.6 ***
GPUSievePrimes=89000 498.529
GPUSievePrimes=80000 497.565

GPUSieveProcessSize=16, GPUSieveSize=64, GPUSievePrimes=90000
2 instances: (identical work M110M 75 to 76bit in two folders)
1 239.73
2 240.51
combined 480.21, 4.65% less; vs 503.61 single instance; use single instance

Last fiddled with by kriesel on 2021-09-13 at 17:07
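Two pieces of arithmetic in the listing above are easy to check mechanically: the rule from the ini comments that (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize, and the percentage by which the combined two-instance figure falls short of the single-instance figure. The Python below is a sketch only. The function names are hypothetical, the ranges are the ones quoted in the ini comments, and it says nothing about how mfakto itself validates its settings.

```python
def sieve_sizes_compatible(gpu_sieve_size, gpu_sieve_process_size):
    """Check a GPUSieveSize / GPUSieveProcessSize pair against the ini comments."""
    if not 4 <= gpu_sieve_size <= 128:                   # quoted minimum and maximum
        return False
    if gpu_sieve_process_size not in (8, 16, 24, 32):    # quoted possible values
        return False
    # (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize.
    return (gpu_sieve_size * 1024) % gpu_sieve_process_size == 0

def percent_change(new, baseline):
    """Signed change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

print(sieve_sizes_compatible(64, 16))   # the tuned pair from the listing
print(sieve_sizes_compatible(64, 24))   # 65536 is not a multiple of 24
print(round(percent_change(480.21, 503.61), 2))   # combined vs single instance
```

Checking a candidate pair this way before editing mfakto.ini avoids spending a benchmark run on a combination the ini comments rule out, such as GPUSieveSize=64 with the default GPUSieveProcessSize=24.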