
2018-06-09, 13:49   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11353₈ Posts
Save file format as described by ewmayer

As posted at http://www.mersenneforum.org/showpos...4&postcount=36 by ewmayer, except as updated in bold. The quote first describes the V17.x format, then the planned V18 format additions:
Quote:
 Here is the current Mlucas file format:
 o 1 byte for test type, numerically, i.e. 256 possible values, mapped to an internal table;
 o 1 byte for modulus type, currently only Mersennes and Fermats supported;
 o 8 bytes for iteration count of the residue stored in the file;
 o ceiling(p/8) bytes for the residue R - i.e. maximally byte-compact, endian-and-FFT-length-of-run-independent;
 o 8 bytes for Res64 = R (mod 2^64), which should match the leading 8 full-residue bytes in the above bytewise form;
 o 5 bytes for R (mod 2^35-1);
 o 5 bytes for R (mod 2^36-1) [these last 2 a.k.a. the Selfridge-Hurwitz residues, based on those guys' Fermat-number work, using a 36-bit-hardware-integer machine; SH also used R (mod 2^36), but that is just the low 36 bits of GIMPS' Res64];
 After reading R, I directly compute the two SH residues and compare to the above file-stored checksums; this gives me an md5/sha1-style integrity check of the whole residue R, which the Res64 does not.
 For v18, I am adding several new fields:
 o 3 bytes (was 4) for FFT-length-in-K which the code was using at time of savefile write. This is so that if the code switches to a larger-than-default FFT length mid-run based on ROE behavior for the exponent in question, it will immediately resume using the larger FFT length on restart-from-interrupt, rather than resuming using the smaller default FFT length as the current release does.
 o 8 bytes for circular-shift to apply to the (unshifted) residue read from the file. I include the shift-count-at-iteration-of-savefile-write because [a] the code will choose a random shift count at run-start time (i.e. since this is not specified by the Primenet server, it cannot be read from the worktodo file), and [b] it saves the need for taking an initial-shift value s from the savefile and computing s * 2^iter (mod p). I remove the shift from R prior to the savefile write, so in fact there's really no need to store s to the file (i.e. I could resume-from-interrupt using an entirely different random shift value, applied to R after reading it from the savefile), but for aesthetic reasons I like the idea of doing the whole run based on a single initial value of s, rather than as-many-values-of-s-as-there-were-run-interrupts.
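The three checksums described in the quote can be sketched in a few lines of Python. This is only an illustration, not Mlucas code: the function name and dictionary keys are made up here, and it relies on the residue byte-array being least significant byte first, as stated further down in this post.

```python
def residue_checksums(residue_bytes: bytes) -> dict:
    """Compute Res64 and the two Selfridge-Hurwitz residues of R.

    Assumes residue_bytes holds R least-significant-byte first.
    The function and key names are hypothetical, for illustration only.
    """
    r = int.from_bytes(residue_bytes, "little")
    return {
        "res64": r % (1 << 64),        # R (mod 2^64)
        "sh35": r % ((1 << 35) - 1),   # R (mod 2^35-1)
        "sh36": r % ((1 << 36) - 1),   # R (mod 2^36-1)
    }


# Made-up 13-byte residue, just to exercise the function.
example = (123456789012345678901234567890).to_bytes(13, "little")
sums = residue_checksums(example)
# Res64 equals the leading 8 bytes of the bytewise form, as the quote says.
assert sums["res64"] == int.from_bytes(example[:8], "little")
```

Comparing the two SH values against the ones stored in the file checks the whole residue, whereas Res64 only covers its low 64 bits.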
For V19, the LL save file format is the same as for V18. For PRP, per https://www.mersenneforum.org/showpo...50&postcount=6, the following fields are appended after those:
• full-length residue byte-array (this one holding the accumulated Gerbicz checkproduct): ceiling(p/8) bytes for the value G, i.e. maximally byte-compact, endian-and-FFT-length-of-run-independent;
• 8 bytes for G (mod 2^64), which should match the leading 8 full-residue bytes in the above bytewise form;
• 5 bytes for G (mod 2^35-1);
• 5 bytes for G (mod 2^36-1)
The residue byte-arrays are least significant byte first.
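A rough Python sketch of splitting a save file into the fields listed above follows. Several things here are assumptions rather than facts from the posts: the function and key names are invented, the small integer fields (iteration count, FFT length, shift, checksums) are read as little-endian, and the V18 fields are taken to sit directly after the V17 fields in the order they are listed. Only the residue byte-arrays are stated to be least significant byte first.

```python
def parse_savefile(data: bytes, p: int, version: str = "v18", prp: bool = False) -> dict:
    """Split a save file image into named fields (hypothetical helper)."""
    nbytes = (p + 7) // 8          # ceiling(p/8)
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        chunk = data[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("file shorter than the layout requires")
        pos += n
        return chunk

    def take_int(n: int) -> int:
        # ASSUMPTION: integer fields are little-endian.
        return int.from_bytes(take(n), "little")

    fields = {
        "test_type": take_int(1),
        "modulus_type": take_int(1),
        "iteration": take_int(8),
        "residue": take(nbytes),
        "res64": take_int(8),
        "sh35": take_int(5),
        "sh36": take_int(5),
    }
    if version != "v17":           # V18 and V19 additions
        fields["fft_length_k"] = take_int(3)
        fields["shift"] = take_int(8)
    if prp:                        # V19 PRP: Gerbicz checkproduct G and its checksums
        fields["gerbicz"] = take(nbytes)
        fields["g_res64"] = take_int(8)
        fields["g_sh35"] = take_int(5)
        fields["g_sh36"] = take_int(5)
    if pos != len(data):
        raise ValueError("unexpected trailing bytes")
    return fields
```

The exponent p has to be passed in because it is not stored in the file. A caller would typically take it from the file name and then compare `fields["res64"]` with the first 8 bytes of `fields["residue"]`.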

I note that the exponent p itself is in the file name, not in the contents. Also note that the Mlucas V19 PRP implementation supports type 1 residues only.

Save file names are p<exponent>, q<exponent>, p<exponent>.10M, etc. For example, for M332220523: p332220523, q332220523, p332220523.10M, and eventually p332220523.20M and so on.

File sizes derived from the preceding are:
• V17.x: 28 + ceiling(p/8) bytes
• V18, and V19 LL: 39 + ceiling(p/8) bytes
• V19 PRP: 57 + 2 * ceiling(p/8) bytes
Size of the V17 p332220523 file is 41527594 bytes, which matches 28 + ceiling(332220523/8) = 28 + 41527566. (Size allocated on disk is larger due to the disk block size.)
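The size formulas above are easy to check with a short Python sketch. The function name and the format labels are made up for illustration.

```python
def savefile_size(p: int, fmt: str) -> int:
    """Expected save file size in bytes for exponent p (hypothetical helper)."""
    n = (p + 7) // 8                     # ceiling(p/8)
    if fmt == "v17":
        return 28 + n
    if fmt in ("v18", "v19-ll"):
        return 39 + n
    if fmt == "v19-prp":
        return 57 + 2 * n
    raise ValueError(f"unknown format label: {fmt}")


print(savefile_size(332220523, "v17"))   # 41527594, the size quoted above
```

Comparing this value with the actual size of a save file is a quick first sanity check before trying to read its contents.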

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-05-05 at 00:46 Reason: update for V19 PRP type

2019-08-12, 17:07   #3
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001011101011₂ Posts
Mlucas v17.1 -h help output

Code:
    Mlucas 17.1

    http://hogranch.com/mayer/README.html

INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.

 Mlucas command line options:

 Symbol and abbreviation key:
       <CR> :  carriage return
        |   :  separator for one-of-the-following multiple-choice menus
       []   :  encloses optional arguments
       {}   :  denotes user-supplied numerical arguments of the type noted.
               ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
  -argument :  Vertical stacking indicates argument short 'nickname' options,
  -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

 Supported arguments:

 <CR>          Default mode: looks for a worktodo.ini file in the local directory;
               if none found, prompts for manual keyboard entry

 Help submenus by topic. No additional arguments may follow the displayed ones:

 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
 the optimal FFT radix set for the runlength(s) tested to be created (if one did not
 exist previously) or appended (if one did) with new timing data. Such a file-write is
 triggered by each complete set of FFT radices available at a given FFT length being
 tested, i.e. by a self-test without a user-specified -radset argument.
 (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
 the program will use the largest permissible exponent for the given FFT length, based on
 its internal length-setting algorithm). The user must specify the number of iterations for
 the self-test via the -iters flag; while it is not required, it is strongly recommended to
 stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
 values being preferred for multithreaded timing tests, in order to assure a decently large
 slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
 roundoff error levels on a given compute platform are such that the default exponent at one or
 more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
 If the user lets the program set the exponent and uses one of the aforementioned standard
 self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
 resulting mlucas.cfg file if the timing-test result matches the internally-stored precomputed
 one for the given default exponent at the iteration count in question, with eligible radix sets
 consisting of those for which the roundoff error remains below an acceptable threshold.
 If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
 and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
 written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
 This is important for tuning code parameters to your particular platform.

 FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of 24 Mersenne exponents, ranging from 173431 to 1245877
 -s s        This will take around 10 minutes on a fast CPU..

 **** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
 *                                                                          *
 * -s medium Runs set of 24 Mersenne exponents, ranging from 1327099 to 9530803
 * -s m      This will take around an hour on a fast CPU.                   *
 *                                                                          *
 ****************************************************************************

 -s large    Runs set of 24 Mersenne exponents, ranging from 10151971 to 72851621
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of 16 Mersenne exponents, ranging from 77597293 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
             If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
             If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
             In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters
             to completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-LENGTH
             TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT MATCH
             FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
             If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a
             user-specified radix set for the user-set FFT length; otherwise the program will use
             the cfg file to select the radix set to be used for the user-forced FFT length.
             If the worktodo.ini file entry does not match the -m value, a set of timing self-tests
             is run on the user-specified Mersenne number using all sets of FFT radices available
             at the specified FFT length.
             If the -fftlen option is not invoked, the self-tests use all sets of FFT radices
             available at that exponent's default FFT length.
             Use this to find the optimal radix set for a single given Mersenne number exponent
             on your hardware, similarly to the -fftlen option.
             Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
             If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
             Performs as many iterations as specified via the -iters flag [required].

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:34

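As a small illustration of the -fftlen / -iters / -radset rules in the help text, here is a Python sketch that assembles the arguments for a single-FFT-length timing test. The helper name and its checks are my own; only the flag names and the list of standard iteration counts come from the help output.

```python
from typing import List, Optional

# "Standard timing-test values" named in the help text.
STANDARD_ITERS = (100, 1000, 10000)


def timing_test_args(fftlen_k: int, iters: int,
                     radset: Optional[int] = None,
                     mersenne_exp: Optional[int] = None) -> List[str]:
    """Build an argument list for a single-FFT-length timing test (hypothetical helper)."""
    if fftlen_k <= 0 or iters <= 0:
        raise ValueError("-fftlen and -iters take positive integers")
    args = ["-fftlen", str(fftlen_k), "-iters", str(iters)]
    if radset is not None:
        # -radset needs -fftlen and -iters, which are always present here.
        if radset < 0:
            raise ValueError("-radset takes a nonnegative integer")
        args += ["-radset", str(radset)]
    if mersenne_exp is not None:
        # The help advises against -m for timing tests unless roundoff errors force it.
        args += ["-m", str(mersenne_exp)]
    return args


def uses_standard_iters(iters: int) -> bool:
    """True if iters is one of the recommended timing-test values."""
    return iters in STANDARD_ITERS
```

The resulting list could be appended to the executable path, e.g. `["./Mlucas"] + timing_test_args(1024, 1000)`, and passed to `subprocess.run`. The FFT length 1024 is just an example value here, not a recommendation.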
2020-05-20, 19:07   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001011101011₂ Posts
Mlucas install script for Linux

Haven't tried it myself, but there's a post about one at https://mersenneforum.org/showpost.p...9&postcount=34

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-07-16 at 19:22

2020-11-19, 22:32   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29×167 Posts
Mlucas builds for Linux (or for running on WSL on Windows)

How I built Mlucas in WSL / Ubuntu 18.04 for multiple processor types
(rename the executable between builds to identify the flavor)
Note these are mostly untested.

Basic x86-64, and presumably the best bet for Knights Corner Xeon Phi:
Code:
gcc -c -O3 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
SSE2 such as Xeon X5650, E5645, E5-26xx
Code:
gcc -c -O3 -DUSE_SSE2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
FMA3 such as i7-7500U, i7-8750H
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
AVX-512 such as Hydra (Knights Landing MIC) Xeon Phi 7250
Code:
gcc -c -O3 -DUSE_AVX512 -march=knl -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
AVX-512 such as i5-1035G1
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
The above are for Linux multithreaded build/run environments. For Windows single-threaded end use, see the next post.
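To pick among the builds above on a given machine, one could map CPU feature flags to the compile line. The Python sketch below is untested against real Mlucas builds. The flag names (sse2, avx2, fma, avx512f, avx512er) are the ones Linux lists in /proc/cpuinfo, and using avx512er to tell Knights Landing from Skylake-type AVX-512 is my assumption. The helper name is made up.

```python
def compile_command(cpu_flags: set, threads: bool = True) -> str:
    """Return the gcc compile line from the post that fits the given CPU flags."""
    if "avx512f" in cpu_flags:
        # ASSUMPTION: avx512er is present on Knights Landing but not on Skylake-type CPUs.
        march = "knl" if "avx512er" in cpu_flags else "skylake-avx512"
        simd = ["-DUSE_AVX512", f"-march={march}"]
    elif "avx2" in cpu_flags and "fma" in cpu_flags:
        simd = ["-DUSE_AVX2", "-mavx2"]
    elif "sse2" in cpu_flags:
        simd = ["-DUSE_SSE2"]
    else:
        simd = []                      # basic x86-64 build
    parts = ["gcc", "-c", "-O3"] + simd
    if threads:
        parts.append("-DUSE_THREADS")
    parts.append("../src/*.c")
    return " ".join(parts)


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Collect the feature flags of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

For example, `compile_command({"sse2", "avx2", "fma"})` gives the FMA3 line shown above. The returned string contains a shell glob, so it is meant to be pasted into a shell or run with `shell=True`, followed by the same `grep error build.log` check and link step as in the post.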

Attachments are Mlucas v19 builds intended for Linux and were built on Ubuntu v18.04 running on WSL / Win10 on an i7-8750H.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 mlucas-avx512-knl-mt.tar.gz (1.84 MB)
 mlucas-avx512-skylake-mt.tar.gz (1.82 MB)
 mlucas-fma3-mt.tar.gz (1.86 MB)
 mlucas-sse2-mt.tar.gz (1.70 MB)
 mlucas-x86-mt.tar.gz (1.72 MB)

Last fiddled with by kriesel on 2021-01-14 at 00:35 Reason: split post to separate Linux and Windows builds, minor edits

2020-11-27, 15:01   #6
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29·167 Posts
Mlucas builds for Windows

Building for Windows in msys2 is similar to building for Linux or WSL, except that these are single-threaded builds, so -DUSE_THREADS and -lpthread are omitted.
How I built (or attempted to build) in msys2 for Windows single-threaded environments:

SSE2 such as Xeon X5650, E5645, E5-26xx
Code:
gcc -c -O3 -DUSE_SSE2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-sse2 *.o -lm -lrt
x86-64
Code:
gcc -c -O3 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-x86 *.o -lm -lrt
FMA3 such as i7-7500U, i7-8750H
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-fma3 *.o -lm -lrt
AVX512 such as i5-1035G1
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-avx512 *.o -lm -lrt

Attachments are single-threaded Mlucas v19 builds intended for Windows 7 or higher, and were built in msys2 running on Windows 7 Pro 64-bit on a dual-Xeon-E5645 HP Z600.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 mlucas-x86.zip (1.73 MB)
 mlucas-sse2.zip (1.75 MB)
 mlucas-fma3.zip (1.84 MB)

Last fiddled with by kriesel on 2020-11-27 at 15:05
