mersenneforum.org  

Old 2018-05-29, 13:50   #1
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4,201 Posts
Mfakto-specific reference material

This thread is intended as a home for reference material specific to the mfakto program.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.)


Mfakto howto

The following assumes you've already read the generic How to get started in gpu computing for GIMPS portion of https://www.mersenneforum.org/showpo...89&postcount=1, and duplicates little of that here.

Download a suitable version of Mfakto from here.
If you can't find a suitable version there, you may be able to get one built by one of the participants on the Mfakto thread.

Install in a suitable user subfolder.
Set the needed folder permissions so suitable permissions are inherited by files created there.

Modify the mfakto.ini file to customize it to your gpu model, PrimeNet account name, and system name.
(I usually incorporate system name, gpu identification, and instance number together. Something like systemname/gpuid-wn; condorella/rx480-w2 for example.)
Create a Windows batch file or Linux shell script with a short name.
Set the device number there.
Consider redirecting console output to a file or employing a good tee program.
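If you would rather script the launch in Python than pair a batch file with a separate tee program, here is a minimal sketch. It assumes mfakto can be started under the name `mfakto` from the instance's working folder. The helper names and the log file name in the usage note are hypothetical choices for illustration and do not come from this post.

```python
import subprocess
import sys


def mfakto_command(device, exe="mfakto"):
    """Command line for one instance; "-d <xy>" selects OpenCL platform x, device y (see mfakto -h)."""
    return [exe, "-d", device]


def run_with_tee(cmd, log_path):
    """Run cmd, copying each output line to the console and appending it to log_path.

    Returns the process exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same log
            text=True,
            errors="replace",          # don't crash on unexpected bytes in the output
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # console copy
            log.write(line)         # file copy
            log.flush()             # keep the log current if the run is interrupted
        return proc.wait()
```

Usage, from the instance's working folder: `run_with_tee(mfakto_command("20"), "mfakto-log.txt")`. As with any pipe-based tee, a program writing to a pipe may buffer its output, so lines can arrive in bursts rather than one at a time.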

Create a desktop shortcut for easy launch of the batch file or script. (Eventually, for multiple instances or multiple GPUs this could launch a routine that invokes the individual-instance files with short time delays between, so you have a few seconds to see whether each launched correctly or a bug occurred.)
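The staggered multi-instance launch described above can be sketched in Python. The folder names and the five-second default delay are hypothetical. Each folder is assumed to hold its own mfakto.ini and worktodo.txt.

```python
import subprocess
import time


def staggered_launch(instances, delay_s=5.0):
    """Start each instance in its own working folder, pausing between starts.

    instances: list of (working_folder, command_list) pairs.
    Returns the Popen objects so the caller can wait on or inspect them.
    """
    procs = []
    for i, (folder, cmd) in enumerate(instances):
        if i:
            # a few seconds to see whether the previous instance started correctly
            time.sleep(delay_s)
        print(f"starting {cmd} in {folder}")
        procs.append(subprocess.Popen(cmd, cwd=folder))
    return procs
```

For example: `staggered_launch([("w1", ["mfakto", "-d", "20"]), ("w2", ["mfakto", "-d", "20"])])`. Giving the full path to the executable avoids any ambiguity about where it is looked up once the working folder changes.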

You may want to try GPU-Z on your Windows system to see an indication of what the computer thinks is installed for your gpu (OpenCL, OpenGL, etc.), to monitor gpu parameters graphically, and perhaps to log them. It is one of many utilities listed in https://www.mersenneforum.org/showpo...74&postcount=6, which also lists some Linux alternatives. It can be handy while getting a gpu application going. When it's not needed, shut it down along with other idle applications to reduce overhead that costs performance.

Now is a good time to run Mfakto with -h >>help.txt in your working directory. Run once, refer to as often as needed.

Specifying device is a little different in some OpenCL programs, including Mfakto.
Multiple platforms may have OpenCL support on the same system.
Examples of OpenCL platform on a hypothetical system are:
Intel cpu package
NVIDIA OpenCL driver
AMD OpenCL driver
In Mfakto, a device is usually specified with a digit for the platform followed by a digit for the device on that platform.
Device specifications in Mfakto command lines for that hypothetical system might be
-d 00 Intel cpu
-d 01 Intel IGP
-d 10 First OpenCL supported NVIDIA gpu
-d 11 second OpenCL supported NVIDIA gpu
-d 20 First OpenCL supported AMD gpu
I recommend starting from an otherwise idle system and, with device load monitoring running, testing which device specification loads which device, one batch file or shell script at a time, with comments in each file documenting what was found.
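To illustrate the digit-pair convention, the sketch below decodes a `-d` value and looks it up in a table of the hypothetical system above. The table holds only the hypothetical entries from this post. The real platform and device order on any given system must be confirmed with the load-monitoring test just described. `-d c` and `-d g` are handled as the `mfakto -h` output lists them, and only the two-digit form is parsed.

```python
def parse_device_spec(spec):
    """Interpret an mfakto -d value: "xy" -> (platform x, device y); "c" -> all CPUs; "g" -> first GPU."""
    if spec == "c":
        return ("all CPUs",)
    if spec == "g":
        return ("first GPU",)
    if len(spec) == 2 and spec.isdigit():
        return (int(spec[0]), int(spec[1]))
    raise ValueError(f"unrecognized -d value: {spec!r}")


# The hypothetical system from the list above; real platform order is system-dependent.
HYPOTHETICAL = {
    (0, 0): "Intel cpu",
    (0, 1): "Intel IGP",
    (1, 0): "first OpenCL supported NVIDIA gpu",
    (1, 1): "second OpenCL supported NVIDIA gpu",
    (2, 0): "first OpenCL supported AMD gpu",
}


def describe(spec, table=HYPOTHETICAL):
    key = parse_device_spec(spec)
    if isinstance(key[0], str):  # "c" or "g": no platform/device pair to look up
        return key[0]
    return table.get(key, "unknown: verify with a device load monitor")
```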

Run the self test for each device: mfakto -st -d xy >>selftest.txt
Or run the longer one, which tries all possible kernels rather than only the optimal kernel per test case: mfakto -st2 -d xy >>selftest.txt
Check the results. Resolve any reliability issues before proceeding to real GIMPS work.
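The self test can be scripted across several devices. This sketch assembles and runs the same command lines shown above and appends everything to selftest.txt. It assumes `mfakto` can be started under that name. It records exit codes without assuming what they mean, so you still need to read the log for the actual test results.

```python
import subprocess


def selftest_commands(devices, long_test=False, exe="mfakto"):
    """One self-test command per -d value; -st2 tries all kernels and takes longer."""
    flag = "-st2" if long_test else "-st"
    return [[exe, flag, "-d", d] for d in devices]


def run_selftests(devices, log_path="selftest.txt", long_test=False, exe="mfakto"):
    """Run each self test, appending its output to log_path (like >>selftest.txt)."""
    codes = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for d, cmd in zip(devices, selftest_commands(devices, long_test, exe)):
            log.write(f"==== {' '.join(cmd)} ====\n")  # label each section of the log
            log.flush()  # write the label before the child process appends its output
            codes[d] = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
    return codes
```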

Create a worktodo.txt file and put some assignments in it. Start with only a few, or just one, in case your GPU or IGP does not work out. Get the type you plan to run most. Get them from https://www.mersenne.org/manual_gpu_assignment/
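The Factor= lines in worktodo.txt have the form seen later in this thread: `Factor=<assignment ID>,<exponent>,<starting bit level>,<ending bit level>`. A minimal Python parser for sanity-checking a worktodo file before a run might look like the one below. Accepting a line without the assignment ID is an assumption made here. `N/A` in the test is only a placeholder ID.

```python
from typing import NamedTuple, Optional


class TFAssignment(NamedTuple):
    aid: Optional[str]  # PrimeNet assignment ID, or None if absent (assumed allowed)
    exponent: int       # p in M(p) = 2^p - 1
    bit_min: int        # trial factor from 2^bit_min ...
    bit_max: int        # ... up to 2^bit_max


def parse_worktodo_line(line):
    """Parse a Factor= line; return None for blank lines and anything else."""
    line = line.strip()
    if not line.startswith("Factor="):
        return None
    fields = [f.strip() for f in line[len("Factor="):].split(",")]
    if len(fields) == 4:
        aid, rest = fields[0], fields[1:]
    elif len(fields) == 3:
        aid, rest = None, fields
    else:
        raise ValueError(f"unexpected field count in {line!r}")
    p, lo, hi = map(int, rest)
    if not lo < hi:
        raise ValueError(f"starting bit level must be below ending bit level in {line!r}")
    return TFAssignment(aid, p, lo, hi)
```

For example, `Factor=N/A,110088763,73,76` parses to exponent 110088763 with 76 - 73 = 3 bit levels to run.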

Results are reported manually at https://www.mersenne.org/manual_result/

Run one instance with default settings, modify tuning in the mfakto.ini file, document performance for each modification. Tune one variable at a time. For best accuracy and reproducibility, minimize interactive use during a benchmark period. Averaging many timing samples in a spreadsheet improves accuracy. See Tuning mfakto.ini for performance for tuning advice.
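The averaging can be done in a spreadsheet. The same arithmetic in Python is shown below. The samples are whatever throughput readings you copy from the console or log for one setting. The numbers in the usage note are made up for illustration.

```python
import math
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    m = mean(samples)
    if len(samples) < 2:
        return m, 0.0, 0.0
    s = stdev(samples)
    return m, s, s / math.sqrt(len(samples))


def percent_change(new, baseline):
    """Percent difference of new relative to baseline."""
    return 100.0 * (new - baseline) / baseline
```

For example, `m, s, se = summarize([550.1, 553.4, 551.9])`, then `percent_change(m, 541.1)`. As a rough rule of thumb, if the difference between two settings is not several times the standard error, treat it as noise and collect more samples.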

When substantially changing the type of work, such as switching between 100M tasks and 100Mdigit tasks, or significantly changing bit levels, and especially when the change results in a different kernel being used, re-tuning is suggested. (You may want to save different tunes for different exponent and bit level ranges, for later reuse.)
For best performance, use an SSD, tune for a single instance first, and test for maximum throughput by experimenting with multiple instances last. (I suggest keeping the work of simultaneous instances similar. If it differs too much, different kernels may reduce throughput. Test for that too.)
Multiple instances may give slightly higher or lower sustained throughput, and will keep the gpu working if one instance has a problem, runs out of work, or is stopped briefly to replenish work and report results. Limited testing on one gpu indicated two instances produced a few percent lower throughput, so a single instance, with runs chained through a batch file or shell script, may be better.
In Mfaktc, slow GPUs benefit less from a large GPUSieveSize and from multiple instances, while fast GPUs benefit more; Mfakto may behave similarly.
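Saving and re-applying tunes can be scripted as plain text edits to mfakto.ini. In this sketch the profile name is hypothetical. The values are the ones the RX480 test in post #6 of this thread settled on for 110M exponents at 73-76 bits. They appear only as an example of a saved tune, not as a recommendation for other gpus or other work.

```python
import re

# Hypothetical saved profile; values from the RX480 tuning log in post #6.
TUNES = {
    "rx480 110M 73-76": {"GPUSieveProcessSize": 16, "GPUSieveSize": 64, "GPUSievePrimes": 90000},
}


def apply_tune(ini_text, settings):
    """Return ini_text with every uncommented Key=value line for a key in settings
    rewritten to the new value; keys not already present are appended at the end.

    Lines starting with '#' (e.g. '# Default: GPUSieveSize=96') are left alone."""
    lines = ini_text.splitlines()
    found = set()
    for i, line in enumerate(lines):
        m = re.match(r"\s*(\w+)\s*=", line)
        if m and m.group(1) in settings:
            key = m.group(1)
            lines[i] = f"{key}={settings[key]}"
            found.add(key)
    lines += [f"{k}={v}" for k, v in settings.items() if k not in found]
    return "\n".join(lines) + "\n"
```

To use it, read mfakto.ini, pass its text and a profile to `apply_tune`, and write the result back, keeping a backup copy of the original file.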

Get familiar with the rocm-smi command line tool at some point if running on Linux with ROCm. It is the more efficient choice once you get into production mode, since rocm-smi has less overhead than graphical monitoring utilities such as GPU-Z. Unfortunately I know of no Windows equivalent.

Beware: on systems with multiple GPUs, GPU-Z's gpu order may not match Mfakto's device number order.

After getting the program functioning manually, you can consider continuing to operate it that way, or trying one of the client management programs described at Available Mersenne prime hunting client management software (each has its own install requirements), or GPU to 72 https://mersenneforum.org/forumdisplay.php?f=95

(Note much of the preceding was derived from or copied from https://mersenneforum.org/showthread.php?t=25673)


Table of Contents:
  1. This post
  2. Run time versus exponent and bit level for an RX480 or RX550 http://www.mersenneforum.org/showpos...59&postcount=2
  3. Bug and wish list http://www.mersenneforum.org/showpos...37&postcount=3
  4. Gpu compatibility http://www.mersenneforum.org/showpos...80&postcount=4
  5. Mfakto -h help output https://www.mersenneforum.org/showpo...92&postcount=5
  6. Tuning mfakto.ini for performance https://www.mersenneforum.org/showpo...80&postcount=6
  7. etc tbd
See also the Concepts in GIMPS Trial Factoring post at https://www.mersenneforum.org/showpo...23&postcount=6


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-07-16 at 19:05 Reason: added tuning for performance, howto section
Old 2018-05-29, 13:50   #2
run time versus exponent and bit level for an RX480 or RX550

The first attachment shows run times in seconds for a wide variety of exponents and bit levels on the AMD (MSI) RX550, although the array is necessarily sparse.

The second shows the associated GHz-days/day ratings. (Why isn't that expressed as the more concise, units-cancelled GHz-equivalent, abbreviated GHz-eq?)

No RX480 data assembled yet.
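When reading a runtime table like the attached one, it helps to know the first-order scaling. Factors of M(p) = 2^p - 1 have the form q = 2kp + 1, so one bit level (2^b to 2^(b+1)) spans about 2^b/(2p) values of k. Each additional bit level therefore roughly doubles the time, and doubling the exponent roughly halves it. This model is an editorial sketch, not derived from the attachments. Real run times deviate from it, for example where a different kernel is used at a different bit level. The credit passed to `ghzd_per_day` has to come from PrimeNet; no credit formula is assumed here.

```python
def candidates_per_bit_level(p, b):
    """Approximate number of k with 2^b <= 2*k*p + 1 < 2^(b+1), i.e. 2^b / (2p)."""
    return 2.0 ** b / (2 * p)


def scale_runtime(t_ref, p_ref, b_ref, p, b):
    """First-order estimate of run time for bit level b -> b+1 of M(p), scaled from a
    measured time t_ref for level b_ref -> b_ref+1 of M(p_ref) on the same gpu.
    Ignores kernel changes, sieving differences, and per-class overhead."""
    return t_ref * candidates_per_bit_level(p, b) / candidates_per_bit_level(p_ref, b_ref)


def ghzd_per_day(credit_ghzd, seconds):
    """Throughput rating: GHz-days of credit per day of run time."""
    return credit_ghzd / (seconds / 86400.0)
```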


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf mfakto trial factoring runtime scaling.pdf (19.2 KB, 47 views)
File Type: pdf mfakto trial factoring rx550 ghz-eq.pdf (395.3 KB, 61 views)

Last fiddled with by kriesel on 2019-11-18 at 14:05
Old 2018-05-30, 16:54   #3
mfakto bug and wish list

Here is the most current posted version of the list I am maintaining for mfakto. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have, preferably by PM to kriesel, or in the separate discussion thread here.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf mfakto bug and wish list.pdf (44.3 KB, 50 views)

Last fiddled with by kriesel on 2019-11-18 at 14:06
Old 2018-05-31, 04:17   #4
Gpu compatibility

Simply:
AMD: yes
Intel IGP: some (including HD4000 ~5 GHzD/d; HD4600, HD 530 or HD620 ~18 GHzD/d; UHD630 ~20 GHzD/d)
NVIDIA: no; use mfaktc
Mali (in some cell phones): apparently not yet
Frequently, what are thought to be compatibility issues are actually issues with the gpu's OpenCL driver or with the device number on the command line. Divide and conquer with an OpenCL test utility; OpenCL-Z is one for Windows.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:06
Old 2019-08-12, 15:35   #5
Mfakto -h help output

Code:
mfakto 0.15pre6-Win (64bit build)

mfakto (mfakto 0.15pre6-Win) Copyright (C) 2009-2014,
  Oliver Weihe (o.weihe@t-online.de)
  Bertram Franz (bertramf@gmx.net)
This program comes with ABSOLUTELY NO WARRANTY; for details see COPYING.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING for details.


Usage: mfakto [options]
  -h|--help              display this help
  -d <xy>                specify to use OpenCL platform number x and
                         device number y in this program
  -d c                   force using all CPUs
  -d g                   force using the first GPU
  -v <n>                 verbosity level: 0=terse, 1=normal, 2=verbose, 3=debug
  -tf <exp> <min> <max>  trial factor M<exp> from 2^<min> to 2^<max>
                         instead of parsing the worktodo file
  -i|--inifile <file>    load <file> as inifile (default: mfakto.ini)
  -st                    selftest using the optimal kernel per testcase
  -st2                   selftest using all possible kernels

options for debugging purposes
  --timertest            test of timer functions
  --sleeptest            test of sleep functions
  --perftest [<n>]       performance tests, repeat each test <n> times (def: 10)
  --CLtest               test of some OpenCL functions
                         specify -d before --CLtest to test the specified device

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:06
Old 2020-07-02, 01:30   #6
Tuning mfakto.ini for performance

Based on extremely limited testing, there is no evidence that multiple instances help in mfakto as they do in mfaktc. There also seems to be less payoff from a large GPUSieveSize.
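Before sweeping settings, candidate combinations can be checked against the limits stated in the mfakto.ini comments quoted below. This sketch encodes only those stated rules. How mfakto itself handles an invalid combination is not covered here.

```python
# Limits as stated in the mfakto.ini comments quoted in this post.
GPU_SIEVE_PRIMES_RANGE = (54, 1075766)
GPU_SIEVE_SIZE_RANGE = (4, 128)            # GPUSieveSize, in M bits
GPU_SIEVE_PROCESS_SIZES = (8, 16, 24, 32)  # GPUSieveProcessSize, in K bits


def check_sieve_settings(sieve_primes, sieve_size, process_size):
    """Return a list of rule violations; an empty list means the combination passes."""
    problems = []
    lo, hi = GPU_SIEVE_PRIMES_RANGE
    if not lo <= sieve_primes <= hi:
        problems.append(f"GPUSievePrimes={sieve_primes} is outside {lo}..{hi}")
    lo, hi = GPU_SIEVE_SIZE_RANGE
    if not lo <= sieve_size <= hi:
        problems.append(f"GPUSieveSize={sieve_size} is outside {lo}..{hi}")
    if process_size not in GPU_SIEVE_PROCESS_SIZES:
        problems.append(f"GPUSieveProcessSize={process_size} is not one of {GPU_SIEVE_PROCESS_SIZES}")
    elif (sieve_size * 1024) % process_size != 0:
        # M bits vs K bits, hence the factor 1024
        problems.append(f"GPUSieveSize*1024={sieve_size * 1024} is not a multiple of "
                        f"GPUSieveProcessSize={process_size}")
    return problems
```

For example, `check_sieve_settings(81157, 64, 24)` reports a problem because 64*1024 = 65536 is not a multiple of 24. When sweeping GPUSieveSize, it is therefore convenient to settle GPUSieveProcessSize first, as this post does with 16.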
Code:
AMD RX480 gpu, Windows7 x64 Adrenalin v18.10.2 driver
Factor=E28E...03C3,110088763,73,76 used for performance tuning
Keep MoreClasses=1, CheckpointDelay=900, other default settings, tune
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

Initial tune 
# Minimum: GPUSievePrimes=54
# Maximum: GPUSievePrimes=1075766
#
# Default: GPUSievePrimes=81157

GPUSievePrimes=81157

# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
# Higher is usually faster, but the screen may lag easier.
# (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize.
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=96

GPUSieveSize=96

# GPUSieveProcessSize defines how many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate.  However, more shared memory is used which may reduce occupancy.
# (GPUSieveSize * 1024) must be a multiple of GPUSieveProcessSize.
#
# Possible values: 8,16,24,32
#
# Default: GPUSieveProcessSize=24

GPUSieveProcessSize=24

# MoreClasses is a switch for defining if 420 (2*2*3*5*7) or 4620 (2*2*3*5*7*11) classes of
# factor candidates should be used. Normally, 4620 gives better results but for very small classes
# 420 reduces the class initialization overhead enough to provide an overall benefit.
# Used only when sieving on the GPU; the CPU-sieve will always use 4620 classes.
#
# Possible values:
# MoreClasses=0 (use 420 classes)
# MoreClasses=1 (use 4620 classes)
#
# Default: MoreClasses=1

MoreClasses=1


GPUSieveProcessSize=24 551.025
GPUSieveProcessSize=32 551.60
GPUSieveProcessSize=16 553.407 *
GPUSieveProcessSize=8  540.646


GPUSieveProcessSize=16 
GPUSieveSize=128 550.26
GPUSieveSize=96  541.096
GPUSieveSize=64  553.93 *
GPUSieveSize=32  537.284
GPUSieveSize=112 538.294

(work transitioned to the 74-75 bit level before this sweep started, so the figures below are not directly comparable to those above)
GPUSieveSize=80 503.049
GPUSieveSize=64 500.249
GPUSieveSize=96 506.98  *
GPUSieveSize=112 496.092
GPUSieveSize=128 496.938

GPUSieveProcessSize=16, GPUSieveSize=64
GPUSievePrimes=100000 497.504
GPUSievePrimes=91000 500.103
GPUSievePrimes=90000 500.6 ***
GPUSievePrimes=89000 498.529
GPUSievePrimes=80000 497.565

GPUSieveProcessSize=16, GPUSieveSize=64, GPUSievePrimes=90000
2 instances: (identical work M110M 75 to 76bit in two folders)
1 239.73
2 240.51
combined 480.21, 4.65% less;
 vs 503.61 single instance; use single instance
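The bookkeeping in the log above, picking the best value in a sweep and comparing one instance against two, is easy to script. The figures are copied from the first GPUSieveSize sweep and the two-instance test above, in the post's units. Summing the two per-instance figures gives 480.24 rather than the 480.21 written above. That rounding-level difference does not change the conclusion.

```python
# GPUSieveSize sweep at GPUSieveProcessSize=16, as listed above
size_sweep = {128: 550.26, 96: 541.096, 64: 553.93, 32: 537.284, 112: 538.294}


def best_setting(sweep):
    """Setting with the highest listed throughput."""
    return max(sweep, key=sweep.get)


def instance_comparison(per_instance, single):
    """Combined multi-instance throughput and its percent change versus one instance."""
    combined = sum(per_instance)
    return combined, 100.0 * (combined - single) / single


print(best_setting(size_sweep))  # 64
combined, pct = instance_comparison([239.73, 240.51], 503.61)
print(f"{combined:.2f} {pct:+.2f}%")  # 480.24 -4.64%
```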


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-07-02 at 02:46
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.