#1
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110101111001₂ Posts
This is a reference thread specific to Ernst Mayer's mfactor program (not to be confused with Peter Montgomery's factor program). And if/when it matters, to the CPU-oriented builds of it. Please comment in the reference material discussion thread, not here. (Posts here may be incorporated with attribution, moved, or removed without recourse.)
In most cases, GIMPS trial factoring should be performed on GPUs using mfaktc or mfakto, or, for special purposes, with special programs such as mmff on NVIDIA GPUs. Mfactor comes into the picture for special cases they won't handle, such as trial factoring Mersenne numbers beyond those programs' limits.
This whole thread is a draft in progress, and some posts may be mostly placeholders at the moment, in whole or in part. Please note that Ernst has described this software as "experimental" and "unsupported". Expect some rough edges. There's a forum thread about Mfactor here, which includes links to previous threads.
Mfactor does TF on CPU. Limits depend on executable type, compilation to support some number of words of width, and CPU type. (There was also a GPU capability. I've not compiled that, as it had lower performance and no higher exponent coverage than mfakto or mfaktc. Use them instead.) See the bits table attached.
(Getting started section work in progress)
Initial install
I usually set up a Windows subfolder and desktop shortcut early, to make testing things out easy. Tastes differ. This is what I typically use (everything inside the ""); adapt it to your tastes, system directory structure, etc.:
Shortcut name: "cmd in mfactor"
Target: "C:\Windows\System32\cmd.exe /k"
Start in directory: "C:\Users\kriesel\Documents\mfactor"
Download a suitable program and put it in that directory. If on Windows, it also needs libwinpthread-1.dll, even though it's single-threaded, because of how it was compiled. See the second attachment on this post.
I usually start with a simple run that writes the help text to a file. Note that Ernst has disavowed the accuracy of Mfactor's help. View its content as hints to what may work, not promises. In particular, checkpoint files have been disabled since multithreading was added to the code. Also note that it sometimes gets confused about the OS: this was run on Windows 10 Home x64, but built on Win 7 using msys2, which supports Linux-style build tools on Windows. Mfactor and Mlucas mistake that build environment for Linux.
Code:
mfactor-base-1w -h >help.txt
Code:
mfactor v2020-03-05
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
Mfactor command line options ...
<CR> Default mode: prompts for manual keyboard entry
-h Prints this help file and exits
-m {num} Trial-factor the Mersenne number M(num) = 2^num - 1.
-mm {num} Trial-factor the double-Mersenne number M(M(num)) = 2^(2^num) - 1.
-f {num} Trial-factor the Fermat number F(num) = 2^(2^num) + 1.
-file {string} Name of checkpoint file (needed for restart-from-interrupt)
-bmin {num} Log2(minimum factor to try), in floating double form. If > 10^9 its whole-number part is taken as the kmin value instead.
-bmax {num} Log2(maximum factor to try), in floating double form. If > 10^9 its whole-number part is taken as the kmax value instead.
-kmin {num} Lowest factor K value to be tried in each pass ( > 0).
-kmax {num} Highest factor K value to be tried in each pass ( < 2^64).
-passmin {num} Current factoring pass (0-15).
-passmax {num} Maximum pass for the run (0-15).
Test for basic operation:
If building yourself, or downloading someone else's build, test the resulting build(s), for example by finding the small known factors of MM31. If unfamiliar with Mfactor, try it out on something simple and fast, such as the following.
A simple example: command line
Code:
mfactor-base-1w -bmin 1 -bmax 48 -mm 31
Code:
mfactor v2020-03-05
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
'printf' is not recognized as an internal or external command,
operable program or batch file.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
Mfactor build flags:
TRYQ = 4
NUM_SIEVING_PRIME = 100000
TF_CLASSES = 60
MULH64_FAST = true
FACTOR_STANDALONE = true
NOBRANCH = true
USE_128x96 = 1
Mfactor self-tests:
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Base-2 PRP test of M127 passed: Time = 00:00:00.000
Base-2 PRP test of M607 passed: Time = 00:00:00.000
Base-3 PRP test of M607 passed: Time = 00:00:00.002
Base-2 PRP test of M4423 passed: Time = 00:00:00.078
Base-3 PRP test of M4423 passed: Time = 00:00:00.318
Testing 64-bit Fermat factors...
Testing 128-bit Fermat factors...
Testing 192-bit Fermat factors...
Testing 256-bit Fermat factors...
Testing > 256-bit Fermat factors...
Testing 63-bit factors...
Testing 64-bit factors...
Testing 65-bit factors...
Testing 96-bit factors...
Factoring self-tests completed successfully.
p mod 60 = 7
INFO: No factoring savefile t31 found ... starting from scratch.
Allocated 255255 words in master template, 4255 in per-pass bit_map [16 x that in bit_atlas]
Generating difference table of first 100000 small primes
Using first 100000 odd primes; max gap = 114
max sieving prime = 1299721
Searching in the interval k=[0, 16336320], i.e. q=[1.000000e+000, 7.016396e+016]
Each of 16 (p mod 60) passes will consist of 1 intervals of length 272272
2949120 ones bits of 16336320 in master sieve template.
TRYQ = 4, max sieving prime = 1299721
Time to set up sieve = 00:00:00.044
pass = 0
pass = 1
pass = 2
pass = 3
pass = 4
pass = 5
pass = 6
pass = 7
pass = 8
pass = 9
pass = 10
pass = 11
Factor with k = 68745. This factor is a probable prime.
pass = 12
pass = 13
pass = 14
pass = 15
MM(31) has 1 factors in range k = [0, 16336320], passes 0-15
Performed 657696 trial divides
Clocks = 00:00:01.302
Code:
Searching in the interval k=[0, 16336320], i.e. q=[1.000000e+000, 7.016396e+016]
Each of 16 (p mod 60) passes will consist of 1 intervals of length 272272
Factor with k = 68745. This factor is a probable prime.
MM(31) has 1 factors in range k = [0, 16336320], passes 0-15
Code:
mfactor-base-1w -bmin 58 -bmax 60 -m 2147483647
Code:
Searching in the interval k=[65345280, 277717440], i.e. q=[2.806558e+017, 1.192787e+018]
Each of 16 (p mod 60) passes will consist of 13 intervals of length 272272
M(2147483647) has 0 factors in range k = [65345280, 277717440], passes 0-15
Let's suppose we wanted to run one more bit level for M9999999943. That exponent is nearly 10^10, too big for mfaktc or mfakto because it is > 2^32. Normally, factoring such large exponents is discouraged, but one is used here because large exponents at low bit levels make fast examples.
Look up the exponent on mersenne.ca: https://www.mersenne.ca/exponent/9999999943 shows it has been trial factored up to 71 bits. We want to run the entire bit level, from 71 to 72 bits. (Reserve the assignment to avoid colliding with someone else who's following the rules.)
Check the bits table to determine the smallest word length that's eligible, because that will be fastest. The exponent fits in one word. The max bit level is also well within the one-word capability. The max k can be determined from f = 2kp+1 ~ 2^72: k = floor((f-1)/2/p), in this case 236,118,325,489, which is ~2^37.78 < 2^64. We have a winner.
For conceptual simplicity, let's run a 16-pass single-threaded version, and the base x86_64 build that ought to run on any modern Intel 64-bit CPU. Then the command line for Mfactor is
Code:
mfactor-base-1w.exe -m 9999999943 -bmin 71 -bmax 72
Code:
mfactor v2020-03-05
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
'printf' is not recognized as an internal or external command,
operable program or batch file.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
Mfactor build flags:
TRYQ = 4
NUM_SIEVING_PRIME = 100000
TF_CLASSES = 60
MULH64_FAST = true
FACTOR_STANDALONE = true
NOBRANCH = true
USE_128x96 = 1
Mfactor self-tests:
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Base-2 PRP test of M127 passed: Time = 00:00:00.000
Base-2 PRP test of M607 passed: Time = 00:00:00.000
Base-3 PRP test of M607 passed: Time = 00:00:00.002
Base-2 PRP test of M4423 passed: Time = 00:00:00.067
Base-3 PRP test of M4423 passed: Time = 00:00:00.232
Testing 64-bit Fermat factors...
Testing 128-bit Fermat factors...
Testing 192-bit Fermat factors...
Testing 256-bit Fermat factors...
Testing > 256-bit Fermat factors...
Testing 63-bit factors...
Testing 64-bit factors...
Testing 65-bit factors...
Testing 96-bit factors...
Factoring self-tests completed successfully.
INFO: No factoring savefile t9999999943 found ... starting from scratch.
Allocated 255255 words in master template, 4255 in per-pass bit_map [16 x that in bit_atlas]
Generating difference table of first 100000 small primes
Using first 100000 odd primes; max gap = 114
max sieving prime = 1299721
Searching in the interval k=[118046248320, 236125169280], i.e. q=[2.360925e+021, 4.722503e+021]
Each of 16 (p mod 60) passes will consist of 7228 intervals of length 272272
2949120 ones bits of 16336320 in master sieve template.
TRYQ = 4, max sieving prime = 1299721
Time to set up sieve = 00:00:00.049
pass = 0.............[k = 144999333545].............[k = 171951897305].............[k = 198904175645].............[k = 225859273565]....
pass = 1.............[k = 145002567128].............[k = 171956644088].............[k = 198908961008].............[k = 225862152428]....
pass = 2.............[k = 145000796832].............[k = 171952301052].............[k = 198906699432].............[k = 225857822232]....
pass = 3.............[k = 145001975837].............[k = 171958146797].............[k = 198910531997].............[k = 225866400137]....
pass = 4.............[k = 144998753900].............[k = 171950858120].............[k = 198903586280].............[k = 225856999700]....
pass = 5.............[k = 144997601721].............[k = 171951291861].............[k = 198906318861].............[k = 225859767621]....
pass = 6.............[k = 145001968652].............[k = 171957272552].............[k = 198908452292].............[k = 225862681712]....
pass = 7.............[k = 145000102653].............[k = 171952697133].............[k = 198907395813].............[k = 225862060713]....
pass = 8.............[k = 145000255236].............[k = 171956548476].............[k = 198911198436].............[k = 225865900656]....
pass = 9.............[k = 145001376821].............[k = 171956349581].............[k = 198910664141].............[k = 225861936521]....
pass = 10.............[k = 145000559625].............[k = 171956240805].............[k = 198908430525].............[k = 225860873745]....
pass = 11.............[k = 145001525568].............[k = 171954862548].............[k = 198907540188].............[k = 225860573928]....
pass = 12.............[k = 145002299813].............[k = 171955457753].............[k = 198907208213].............[k = 225862589693]....
pass = 13.............[k = 145000528136].............[k = 171954980936].............[k = 198909000656].............[k = 225863660096]....
pass = 14.............[k = 145001672757].............[k = 171953367597].............[k = 198906481317].............[k = 225859766157]....
pass = 15.............[k = 144999595200].............[k = 171951504420].............[k = 198905326260].............[k = 225859744080]....
M(9999999943) has 0 factors in range k = [118046248320, 236125169280], passes 0-15
Performed 4703862502 trial divides
Clocks = 00:33:14.472
Verifying a known factor
OK, suppose we just wanted to verify one of the known factors of that exponent using TF in Mfactor, either to confirm the factor or to test Mfactor. We could run the whole bit level containing the factor, but that can take a long time. Instead of specifying min and max bit levels with -bmin and -bmax, we can specify min and max k values with -kmin and -kmax. Let's try to verify the 67-bit factor that has k=8408497208. Even if kmin and kmax are specified very close together, Mfactor runs a range one sieve template wide, 16336320 k values (k=[8396868480, 8413204800] in the log below). This is pretty quick, but not nearly as fast as the server's verification.
Code:
mfactor v2020-03-05
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
'printf' is not recognized as an internal or external command,
operable program or batch file.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
Mfactor build flags:
TRYQ = 4
NUM_SIEVING_PRIME = 100000
TF_CLASSES = 60
MULH64_FAST = true
FACTOR_STANDALONE = true
NOBRANCH = true
USE_128x96 = 1
Mfactor self-tests:
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Base-2 PRP test of M127 passed: Time = 00:00:00.000
Base-2 PRP test of M607 passed: Time = 00:00:00.000
Base-3 PRP test of M607 passed: Time = 00:00:00.002
Base-2 PRP test of M4423 passed: Time = 00:00:00.077
Base-3 PRP test of M4423 passed: Time = 00:00:00.240
Testing 64-bit Fermat factors...
Testing 128-bit Fermat factors...
Testing 192-bit Fermat factors...
Testing 256-bit Fermat factors...
Testing > 256-bit Fermat factors...
Testing 63-bit factors...
Testing 64-bit factors...
Testing 65-bit factors...
Testing 96-bit factors...
Factoring self-tests completed successfully.
INFO: No factoring savefile t9999999943 found ... starting from scratch.
Allocated 255255 words in master template, 4255 in per-pass bit_map [16 x that in bit_atlas]
Generating difference table of first 100000 small primes
Using first 100000 odd primes; max gap = 114
max sieving prime = 1299721
Searching in the interval k=[8396868480, 8413204800], i.e. q=[1.679374e+020, 1.682641e+020]
Each of 16 (p mod 60) passes will consist of 1 intervals of length 272272
2949120 ones bits of 16336320 in master sieve template.
TRYQ = 4, max sieving prime = 1299721
Time to set up sieve = 00:00:00.043
pass = 0
pass = 1
Factor with k = 8408497208. This factor is a probable prime.
pass = 2
pass = 3
pass = 4
pass = 5
pass = 6
pass = 7
pass = 8
pass = 9
pass = 10
pass = 11
pass = 12
pass = 13
pass = 14
pass = 15
M(9999999943) has 1 factors in range k = [8396868480, 8413204800], passes 0-15
Performed 650708 trial divides
Clocks = 00:00:01.118
We could also consider verifying both factors in one run, using their product. Note, though, that the product of two factors q1 = 2*k1*p+1 and q2 = 2*k2*p+1 is itself of the form 2*k*p+1 with k = 2*k1*k2*p + k1 + k2, not simply k1*k2. Even the product of the two k's, 8408497208 * 41901698172 = 352330312089720703776, is over 68 bits, and the true k of the composite factor is larger still (~2^102.5). Besides the exponent and bits limits, there's a third limit, on k, of 64 bits:
Code:
mfactor-base-1w.exe -m 9999999943 -kmin 352330312089720703775 -kmax 352330312089720703777
So in this case, verify the factors separately:
Code:
mfactor-base-1w.exe -m 9999999943 -kmin 41901698171 -kmax 41901698173
Each process will make entries in results.txt upon completion. It is up to the user to examine them and determine whether a factor was found by any of the processes.
Linux multithreaded
...more someday...
Some Mfactor notes:
Naming convention for the builds posted in this thread:
Mfactor-<arch>-<x>w[-tfc][-mt], where:
<arch> is base, for any x86-64, ...
<x> is the number of words (1, 2, 3 or 4), or n for the variable-word-count build;
-tfc, if present, marks the 960-passes-out-of-4620-classes variant; otherwise it's 16 passes out of 60 classes;
-mt, if present, marks a multithreaded build; otherwise it's single-threaded.
Table of contents for Mfactor-specific thread (this thread)
For more background, see https://www.mersenneforum.org/showthread.php?t=25009 and other content available from links there.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2022-04-20 at 02:12 Reason: tyopfix
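The k arithmetic in the examples above can be checked with a few lines of Python. This is a sketch, not Mfactor code. The function names are hypothetical. The rule that Mfactor widens [kmin, kmax] to whole multiples of its 16336320-wide sieve template is an assumption inferred from the logged "Searching in the interval" lines, not something read from the source. The sketch also shows why the product of two factors cannot be verified in one run, and uses 2^p mod q == 1, which is the divisibility test trial factoring applies to each surviving candidate q.

```python
"""Sketch: k-range and factor arithmetic behind the Mfactor examples (hypothetical helpers)."""
import math

SIEVE_SPAN = 16336320  # "... ones bits of 16336320 in master sieve template" (60-class logs)
TF_CLASSES = 60        # 16-pass builds


def k_limits(p: int, bmin: int, bmax: int) -> tuple[int, int]:
    """k bounds (rounded down) covering candidates q = 2*k*p + 1 from 2^bmin up to 2^bmax."""
    kmin = (2**bmin - 1) // (2 * p)
    kmax = (2**bmax - 1) // (2 * p)  # k = floor((f-1)/2/p) with f = 2^bmax, as in the post
    return kmin, kmax


def sieve_round(kmin: int, kmax: int) -> tuple[int, int, int]:
    """ASSUMPTION (inferred from the logs): widen [kmin, kmax] to whole sieve
    templates. Returns (k_lo, k_hi, intervals per pass)."""
    k_lo = (kmin // SIEVE_SPAN) * SIEVE_SPAN
    k_hi = -(-kmax // SIEVE_SPAN) * SIEVE_SPAN  # round up to a multiple of SIEVE_SPAN
    return k_lo, k_hi, (k_hi - k_lo) // SIEVE_SPAN


def divides_mersenne(p: int, k: int) -> bool:
    """q = 2*k*p + 1 divides M(p) = 2^p - 1 exactly when 2^p mod q == 1."""
    return pow(2, p, 2 * k * p + 1) == 1


def product_k(k1: int, k2: int, p: int) -> int:
    """k of the composite factor (2*k1*p + 1) * (2*k2*p + 1), which equals 2*k*p + 1."""
    return 2 * k1 * k2 * p + k1 + k2


if __name__ == "__main__":
    p = 9999999943
    kmin, kmax = k_limits(p, 71, 72)
    print(f"kmax = {kmax} ~ 2^{math.log2(kmax):.2f}, below 2^64: {kmax < 2**64}")
    print("widened range and intervals per pass:", sieve_round(kmin, kmax))
    print("interval length per pass:", SIEVE_SPAN // TF_CLASSES)

    k1, k2 = 8408497208, 41901698172
    print("k1 factor verified:", divides_mersenne(p, k1))
    print("k1*k2 bits:", (k1 * k2).bit_length(),
          "| true k of the product, bits:", product_k(k1, k2, p).bit_length())
```

Under that assumption, the three logged runs come out as k=[0, 16336320] in 1 interval for -mm 31 -bmin 1 -bmax 48 (p = M31 = 2147483647), k=[65345280, 277717440] in 13 intervals for M(2147483647) from 2^58 to 2^60, and k=[118046248320, 236125169280] in 7228 intervals for M9999999943 from 2^71 to 2^72. These match the logs.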
#2
These were built using msys2 on a Windows 7 X64 Pro dual-Xeon E5645 system. They are single threaded because that's all that build approach supports.
Differing numbers of words allow fast runs on small operands, and support bigger factors and exponents. See the mfactor bits table attached to post one of this thread. These were built for the common base of 64-bit Intel-compatible CPUs, not the higher SSE2, AVX, AVX2, or AVX512 flavors, so they should run regardless of processor model. (Those higher processor capabilities are only supported for a subset of word lengths, as shown in the bits table attachment of post 1, but would give higher performance where supported.)
After renaming factor.c.txt to factor.c, these were built by the following:
Code:
gcc -c -Os ../get*.c && rm get_preferred_fft_radix.o
gcc -c -Os ../imul_macro.c ../mi64.c ../qfloat.c ../rng_isaac.c ../two*c ../types.c ../util.c
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 ../factor.c ../get_cpuid.c
gcc -o Mfactor-base-1w *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DP2WORD ../factor.c
gcc -o Mfactor-base-2w *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DP3WORD ../factor.c
gcc -o Mfactor-base-3w *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DP4WORD ../factor.c
gcc -o Mfactor-base-4w *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DNWORD ../factor.c
gcc -o Mfactor-base-nw *o -lm
I believe, based on comparing file dates and release dates, that these were created from source files released with Mlucas V19.0.
Last fiddled with by kriesel on 2021-09-19 at 20:42 Reason: cpu flavors
#3
These were built using msys2 on a Windows 7 X64 Pro dual-Xeon E5645 system. They are single threaded because that's all that build approach supports. The higher pass count is somewhat more efficient in avoiding composite factor candidates.
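To see where the 16 and 960 pass counts come from, here is a sketch that counts the residue classes of k (mod TF_CLASSES) that can still hold a factor q = 2kp+1 of M(p). It assumes two standard filters, and that Mfactor applies exactly these per class is an assumption about its internals. First, any factor of M(p) is 1 or 7 (mod 8). Second, candidates divisible by one of the odd primes in TF_CLASSES (3 and 5 for 60; 3, 5, 7 and 11 for 4620) are composite and can be skipped. The function name is hypothetical.

```python
"""Sketch: count the k classes (mod TF_CLASSES) that can hold a factor of M(p)."""
from math import gcd


def admissible_classes(p: int, tf_classes: int) -> list[int]:
    """Residues c = k mod tf_classes for which q = 2*k*p + 1 survives two filters
    (ASSUMED to match what Mfactor applies per class):
      - any factor of M(p) is 1 or 7 (mod 8);
      - q must share no prime with tf_classes/4 (3*5 = 15, or 3*5*7*11 = 1155).
    Both tests depend only on k mod tf_classes, because 4 divides tf_classes."""
    odd_part = tf_classes // 4
    keep = []
    for c in range(tf_classes):
        q = 2 * c * p + 1
        if q % 8 in (1, 7) and gcd(q, odd_part) == 1:
            keep.append(c)
    return keep


if __name__ == "__main__":
    p = 2147483647  # M31, i.e. the exponent of MM31
    for classes in (60, 4620):
        passes = admissible_classes(p, classes)
        print(f"TF_CLASSES = {classes}: {len(passes)} passes")  # 16, then 960
```

For any prime p > 11 this leaves 2*2*4 = 16 of the 60 classes and 16*6*10 = 960 of the 4620 classes, matching the pass counts of the two build types.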
The higher pass count also allows more flexibility in the number of processes run in parallel, if doing that. See post 5 for batch files for running in parallel.
Differing numbers of words allow fast runs on small operands, and support bigger factors and exponents. See the mfactor bits table attached to post one of this thread. These were built for the common base of 64-bit Intel-compatible CPUs, not the higher SSE2, AVX, AVX2, or AVX512 flavors, so they should run regardless of processor model. (Those higher processor capabilities are only supported for a subset of word lengths, as shown in the bits table attachment of post 1, but would give higher performance where supported.)
After renaming factor.c.txt to factor.c, and building the 16-pass versions (which already compiled some needed modules), these were built by the following:
Code:
rem large number of passes builds, for better sieving, finer pass granularity, better manycore multithreading
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 ../factor.c ../get_cpuid.c
gcc -o Mfactor-base-1w-tfc *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 -DP2WORD ../factor.c
gcc -o Mfactor-base-2w-tfc *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 -DP3WORD ../factor.c
gcc -o Mfactor-base-3w-tfc *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 -DP4WORD ../factor.c
gcc -o Mfactor-base-4w-tfc *o -lm
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 -DNWORD ../factor.c
gcc -o Mfactor-base-nw-tfc *o -lm
These may be very useful for long tasks on manycore systems (dual Xeons, Xeon Phi). However, for small tasks, they may be slower. Case in point: on condorella (dual E5645, Win 7 X64 Pro), there seems to be a considerable overhead disadvantage to many classes at a small exponent and bit level 68:
60:
M(2147483647) has 3 factors in range k = [0, 68726898240], passes 0-15
Performed 2740062501 trial divides
Clocks = 00:19:47.068
Clocks = 00:19:47.068 = 1187.068 seconds.
4620:
M(2147483647) has 3 factors in range k = [0, 69004615680], passes 0-959
Performed 2751128805 trial divides
Clocks = 00:23:34.701
Clocks = 00:23:34.701 = 1414.701 seconds = 1.19176 times that of the 60-classes 16-passes timing.
I believe, based on comparing file dates and release dates, that these were created from source files released with Mlucas V19.0.
Last fiddled with by kriesel on 2021-09-19 at 20:43 Reason: version info
#4
These were built on Ubuntu 18.04 / WSL on a Windows 10 Pro x64 i7-8750H system. See the mfactor bits table attached to post one of this thread. See also, for an indication of what is possible on Knights Landing with many threads on Linux, https://www.mersenneforum.org/showpo...&postcount=165
I believe, based on comparing file dates and release dates, that these were created from source files released with Mlucas V19.0. The build process for V19.0 was something like the following, for the base x64 1-word single-thread build:
Code:
wget https://www.mersenneforum.org/mayer/...mlucas_v19.txz
or https://www.mersenneforum.org/mayer/...lucas_v19.tbz2
tar (some options) filename to unzip
mv factor.c.txt factor.c
mkdir ./obj_mfac
cd ./obj_mfac
gcc -c -Os ../get*.c && rm get_preferred_fft_radix.o
gcc -c -Os ../imul_macro.c ../mi64.c ../qfloat.c ../rng_isaac.c ../two*c ../types.c ../util.c
gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 ../factor.c ../get_cpuid.c
gcc -o Mfactor-base-1w *o -lm
For 2-word, many factor classes, multithreaded, after the preceding, something like:
Code:
gcc -c -Os -DUSE_THREADS -DFACTOR_STANDALONE -DTRYQ=4 -DTF_CLASSES=4620 -DP2WORD ../factor.c ../get_cpuid.c
gcc -c -Os -DUSE_THREADS ../threadpool.c ../util.c
gcc -o Mfactor-base-2w-tfc-mt *o -lm -lpthread
A test run of the 1w executable, on Ubuntu 18.04/WSL1/Win10HomeX64, i7-8750H CPU:
Code:
./Mfactor-base-1w -mm 31 -bmin 1 -bmax 48
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
Mfactor build flags:
TRYQ = 4
NUM_SIEVING_PRIME = 100000
TF_CLASSES = 60
MULH64_FAST = true
FACTOR_STANDALONE = true
NOBRANCH = true
USE_128x96 = 1
Mfactor self-tests:
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Base-2 PRP test of M127 passed: Time = 00:00:00.000
Base-2 PRP test of M607 passed: Time = 00:00:00.000
Base-3 PRP test of M607 passed: Time = 00:00:00.000
Base-2 PRP test of M4423 passed: Time = 00:00:00.093
Base-3 PRP test of M4423 passed: Time = 00:00:00.375
Testing 64-bit Fermat factors...
Testing 128-bit Fermat factors...
Testing 192-bit Fermat factors...
Testing 256-bit Fermat factors...
Testing > 256-bit Fermat factors...
Testing 63-bit factors...
Testing 64-bit factors...
Testing 65-bit factors...
Testing 96-bit factors...
Factoring self-tests completed successfully.
p mod 60 = 7
INFO: Will write savefile t31 every 2^28 = 268435456 factor candidates tried.
INFO: No factoring savefile t31 found ... starting from scratch.
Allocated 255255 words in master template, 4255 in per-pass bit_map [16 x that in bit_atlas]
Generating difference table of first 100000 small primes
Using first 100000 odd primes; max gap = 114
max sieving prime = 1299721
Searching in the interval k=[0, 16336320], i.e. q=[1.000000e+00, 7.016396e+16]
Each of 16 (p mod 60) passes will consist of 1 intervals of length 272272
2949120 ones bits of 16336320 in master sieve template.
TRYQ = 4, max sieving prime = 1299721
Time to set up sieve = 00:00:00.078
pass = 0
pass = 1
pass = 2
pass = 3
pass = 4
pass = 5
pass = 6
pass = 7
pass = 8
pass = 9
pass = 10
pass = 11
Factor with k = 68745. This factor is a probable prime.
pass = 12
pass = 13
pass = 14
pass = 15
MM(31) has 1 factors in range k = [0, 16336320], passes 0-15
Performed 657696 trial divides
Clocks = 00:00:01.390
1) -nthread 2 or more, or omitted (which defaults to -nthread <# of hyperthreads>), fails. Only -nthread 1 worked. That may be due to an error in my build sequence for multithreaded compiles. The assertion messages below point that way: this is a scalar build, and "Multithreading currently only supported for SIMD builds!"; requesting more than the 6 threads twopmodq96_q4 sets up for trips a second assertion, "Multithreading requires max_threads >= NTHREADS".
Code:
./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48 -nthread 2
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
NTHREADS = 2
Set affinity for the following 2 cores: 0.1.
Factor.c: Init threadpool of 2 threads
twopmodq96_q4: Setting up for as many as 6 threads...
ERROR: at line 1092 of file ../twopmodq80.c
Assertion failed: Multithreading currently only supported for SIMD builds!
Code:
./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48 -nthread 6
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
NTHREADS = 6
Set affinity for the following 6 cores: 0.1.2.3.4.5.
Factor.c: Init threadpool of 6 threads
twopmodq96_q4: Setting up for as many as 6 threads...
ERROR: at line 1092 of file ../twopmodq80.c
Assertion failed: Multithreading currently only supported for SIMD builds!
Code:
./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48 -nthread 7
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
NTHREADS = 7
Set affinity for the following 7 cores: 0.1.2.3.4.5.6.
Factor.c: Init threadpool of 7 threads
twopmodq96_q4: Setting up for as many as 6 threads...
ERROR: at line 482 of file ../twopmodq96.c
Assertion failed: Multithreading requires max_threads >= NTHREADS
Code:
./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
NTHREADS = 12
Set affinity for the following 12 cores: 0.1.2.3.4.5.6.7.8.9.10.11.
Factor.c: Init threadpool of 12 threads
twopmodq96_q4: Setting up for as many as 6 threads...
ERROR: at line 482 of file ../twopmodq96.c
Assertion failed: Multithreading requires max_threads >= NTHREADS
Code:
./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48 -nthread 1
INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
NTHREADS = 1
Set affinity for the following 1 cores: 0.
Factor.c: Init threadpool of 1 threads
twopmodq96_q4: Setting up for as many as 6 threads...
*Mfactor build flags:
TRYQ = 4
NUM_SIEVING_PRIME = 100000
TF_CLASSES = 4620
MULH64_FAST = true
FACTOR_STANDALONE = true
NOBRANCH = true
USE_128x96 = 1
Mfactor self-tests:
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Base-2 PRP test of M127 passed: Time = 00:00:00.000
Base-2 PRP test of M607 passed: Time = 00:00:00.000
Base-3 PRP test of M607 passed: Time = 04:20:25.000
Base-2 PRP test of M4423 passed: Time = 39:03:45.000
Base-3 PRP test of M4423 passed: Time =151:54:35.000
Testing 64-bit Fermat factors...
Testing 128-bit Fermat factors...
Testing 192-bit Fermat factors...
Testing 256-bit Fermat factors...
Testing > 256-bit Fermat factors...
Testing 63-bit factors...
Testing 64-bit factors...
Testing 65-bit factors...
Testing 96-bit factors...
Factoring self-tests completed successfully.
p mod 4620 = 3727
p mod 4620 v2 = 1387
Warning: Differing (p % TF_CLASSES) values from Powering and direct-long-div! Proceeding using the 2nd result (1387).
INFO: Will write savefile t31 every 2^28 = 268435456 factor candidates tried.
INFO: No factoring savefile t31 found ... starting from scratch.
Allocated 255255 words in master template, 3537 in per-pass bit_map [960 x that in bit_atlas]
Generating difference table of first 100000 small primes
Using first 100000 odd primes; max gap = 114
max sieving prime = 1299721
Searching in the interval k=[0, 1045524480], i.e. q=[1.000000e+00, 4.490493e+18]
Each of 960 (p mod 4620) passes will consist of 1 intervals of length 226304
2949120 ones bits of 16336320 in master sieve template.
TRYQ = 4, max sieving prime = 1299721
Time to set up sieve =789:55:50.000
INFO: 960 passes to do; bit_map has 3536 64-bit words.
INFO: Doing 960 threadpool-waves of 1 pool threads each:
Pass 0:
...
Pass 217: Factor with k = 20269004. This factor is a probable prime.
Pass 218:
...
Pass 843: Factor with k = 68745. This factor is a probable prime.
Pass 844:
...
Pass 957:
Pass 958:
Pass 959:
MM(31) has 2 factors in range k = [0, 1045524480], passes 0-959
Performed 41933177 trial divides
Clocks =24431:25:25.000
Code:
time ./Mfactor-base-2w-tfc-mt -mm 31 -bmin 1 -bmax 48 -nthread 1 >2wmm31b48.txt
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
twopmodq96_q4: Setting up for as many as 6 threads...
Apr2015 mi64_div quicktest passes.
mi64_div quicktest passes.
Searching in the interval k=[0, 1045524480], i.e. q=[1.000000e+00, 4.490493e+18]
Each of 960 (p mod 4620) passes will consist of 1 intervals of length 226304

real 1m37.129s
user 1m30.875s
sys 0m0.234s
3) The build did not handle exponents > 57 bits, such as MM61 (while from the bits table I'd expect up to 114), or bmax higher than 96, as if it were a 1-word build, not 2-word. This was apparently a build error. The executable has been rebuilt and tested to not have that issue.
Last fiddled with by kriesel on 2021-09-19 at 20:46 Reason: version info, draft build process, 2word mt errors & replace
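The naming convention from post 1 maps onto the compile flags used in the build commands of posts 2 to 4. The sketch below uses hypothetical helper names and generates the factor.c compile and link commands for a given base variant. It assumes the shared objects from the first two gcc lines (get*.c, imul_macro.c, mi64.c and so on) are already compiled in the object directory. It always recompiles ../get_cpuid.c, a step post 2's later builds skip. The -mt variant reproduces the post 4 commands, whose scalar build only ran with -nthread 1.

```python
"""Sketch: map Mfactor-base-<x>w[-tfc][-mt] names to the gcc commands of posts 2-4."""

WORD_FLAGS = {"1": [], "2": ["-DP2WORD"], "3": ["-DP3WORD"], "4": ["-DP4WORD"], "n": ["-DNWORD"]}


def build_commands(words: str, tfc: bool = False, mt: bool = False) -> list[str]:
    """Compile-and-link commands for one base x86-64 variant; assumes the shared
    objects were already compiled in the current (object) directory."""
    name = f"Mfactor-base-{words}w" + ("-tfc" if tfc else "") + ("-mt" if mt else "")
    flags = ["-Os"]
    if mt:
        flags.append("-DUSE_THREADS")
    flags += ["-DFACTOR_STANDALONE", "-DTRYQ=4"]
    if tfc:
        flags.append("-DTF_CLASSES=4620")  # 960 passes instead of 16
    flags += WORD_FLAGS[words]           # word count of the build

    cmds = ["gcc -c " + " ".join(flags) + " ../factor.c ../get_cpuid.c"]
    libs = "-lm"
    if mt:
        cmds.append("gcc -c -Os -DUSE_THREADS ../threadpool.c ../util.c")
        libs += " -lpthread"
    cmds.append(f"gcc -o {name} *o {libs}")
    return cmds


if __name__ == "__main__":
    for words in ("1", "2", "3", "4", "n"):
        for tfc in (False, True):
            print("\n".join(build_commands(words, tfc)))
    print("\n".join(build_commands("2", tfc=True, mt=True)))
```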
#5
Launching separate processes for separate pass numbers with output redirection to pass-specific files permits using multiple cores on Windows. If there's a system crash before the processes complete their work, it's possible to resume each from roughly where it left off, by manually specifying beginning and ending k values. For large process counts, that can become tedious.
The following two paragraphs first appeared here:
"Poor man's multithreading" is running multiple processes for the same bit level and exponent, with different passmin and passmax values. For example, to use 4 cores with an msys2-compiled image, run 4 processes: passmin 0 passmax 3, passmin 4 passmax 7, passmin 8 passmax 11, and passmin 12 passmax 15. With 16 passes (0-15), as in this example, this works well for process counts that are powers of two: 1, 2, 4, 8. If the build is done with -DTF_CLASSES=4620 for finer pass granularity, passmin and passmax range from 0 to 959, which is 960 = 2^6 × 3 × 5 passes. Because this larger pass count has numerous small factors, it allows much more choice in the degree of parallelism. 960 has 28 divisors: 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 20, 24, 30, 32, 40, 48, 60, 64, 80, 96, 120, 160, 192, 240, 320, 480, 960. For brief runs there is no point in going to high degrees of parallelism, and -DTF_CLASSES=4620 seems to introduce higher overhead into a single run. For lengthy runs, high degrees of parallelism may be the only way to get reasonable run times. Using hyperthreading helps. These additional choices may better fit the number of CPU cores and hyperthreads available on a given system.
Note that after a system problem, an automated update restart, a power outage, etc., it can be unpleasant to have a large number of incomplete passes to deal with. A good UPS, a stable, up-to-date, reliable system, and a well-chosen number of parallel processes are recommended. Together they minimize the chore of continuing from the k values captured in log files. Alternatively, sacrifice some throughput: use a script to relaunch all the processes being resumed, each starting from the lowest maximum k value reached among them.
Nevertheless, parallel processes can be powerful for runs that take weeks even with 16-64 processes. Different hardware seems to behave differently.
On a dual-Xeon-E5-2697v2 system (dual 12-core, 2-way hyperthreading), I've run 16 processes in parallel. I saw only minor differences in duration among them, and a ~15% impact on prime95 throughput. On a Knights Landing Xeon Phi (4-way hyperthreading; 64, 68 or 72 cores), I ran 64 Mfactor processes and 4 prime95 workers. The Mfactor processes varied significantly in run time: the longest took 1.68 times as long as the shortest (255+ hours vs. 151.8 hours). That run was mfactor-base-2w-tfc -m 60651732991 -bmin 85 -bmax 86, split over 64 processes, with the OS assigning processes to cores without user involvement, using MCDRAM only. The impact on the prime95 workers varied greatly too: from ~10% up to, for the highest-numbered worker, more than a 100% increase in primality-test iteration time, IIRC.
The exponent and bit level entries in the attached Windows batch files are for illustration only. Please do not run them as is without coordinating with me; they take too long to waste time duplicating effort. I know of no web or other server site for coordinating work on such large exponents, other than perhaps posting messages somewhere on the forum. https://www.mersenneforum.org/showpo...04&postcount=5 gives an indication of how long one of the Mfactor runs took.
For simplicity (or maybe I didn't think of it soon enough), the log files are written to the working directory, not one level lower. You could create a folder for the exponent and final bit level, put the code there, and create all the run files there. You could then move the code to another folder for the next run, but that's not required. The individual processes' log files are named according to exponent, starting pass number, and ending bit level. So multiple bit levels, or even multiple exponents, can be run in the same directory at the same time without log file name collisions.
For example, one could run 1 process for bit level x, 2 processes for x+1, 4 processes for x+2, and 8 processes for x+3. These would all complete in about the same time, assuming there are enough hyperthreads available that each Mfactor process gets its own register set.
I strongly recommend starting with small process counts and small bit levels, so run times are short. That way you can become familiar with the program and the batch scripts, and confirm run time projections, before attempting higher bit levels or more complex runs. When setup overhead is small compared to factoring time, the run time of a bit level ideally scales as 2^bits / exponent / parallel process count. Run times of weeks are easy to exceed.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-01-12 at 18:24
#6
Some things I'd like to see incorporated:
Reworking the save files for parallel thread run resume compatibility. See https://mersenneforum.org/showpost.p...7&postcount=27
Build recipe for v20.x Mfactor on Linux. I think it would go like this for the base x64 1-word version:
Code:
~/mlucas_v20.1/src$ mv factor.c.txt factor.c
~/mlucas_v20.1/src$ cd ..
~/mlucas_v20.1$ mkdir ./obj_mfac
~/mlucas_v20.1$ cd ./obj_mfac
~/mlucas_v20.1/obj_mfac$ gcc -c -Os ../src/get*.c && rm get_preferred_fft_radix.o
~/mlucas_v20.1/obj_mfac$ gcc -c -Os ../src/imul_macro.c ../src/mi64.c ../src/qfloat.c ../src/rng_isaac.c ../src/two*c ../src/types.c ../src/util.c
~/mlucas_v20.1/obj_mfac$ gcc -c -Os -DFACTOR_STANDALONE -DTRYQ=4 ../src/factor.c
~/mlucas_v20.1/obj_mfac$ gcc -o Mfactor *.o -lm

A solution for building native multithreaded Windows versions.
A solution for the KNL/WSL/Ubuntu core-hopping seen with Mlucas, probably applicable to Mfactor also.
A solution for multithreading on the base x64 instruction set. If such a build is run with -nthread > 1, it warns that this is not supported and then terminates.
Output ETAs.
Run logging, without the need for tee or redirection.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-09-18 at 16:10 Reason: remove draft status, add ETAs, logging
#7
This is an incomplete list of bugs/issues for an incomplete list of versions of Mfactor.
Versions prior to V19 have not been built or tested.
V19
1) When compiled under msys2 for Windows, it nevertheless assumes there will be a Linux shell capable of accepting certain commands. This results in each run producing the following error:
V19.1
Not built or tested.
V20.1 (and probably V20.0)
Produces multiple error messages and fails to compile factor.c to factor.o, so no link to produce an executable is possible. This is due to a change of type, from int to uint32, of some variables used in routines shared with Mlucas; the corresponding declarations in factor.c were not changed to match. There are also errors involving too many or too few arguments.
Code:
../src/factor.c:37:6: error: conflicting types for ‘NRADICES’
 int NRADICES, RADIX_VEC[10]; /* RADIX_VEC[] stores sequence of complex FFT radices used. */
     ^~~~~~~~
In file included from ../src/carry.h:29:0,
                 from ../src/Mlucas.h:30,
                 from ../src/factor.c:33:
../src/Mdata.h:450:15: note: previous declaration of ‘NRADICES’ was here
 extern uint32 NRADICES, RADIX_VEC[10]; /* NRADICES, RADIX_VEC[] store number & set of complex FFT radices used. */
               ^~~~~~~~
../src/factor.c:37:16: error: conflicting types for ‘RADIX_VEC’
 int NRADICES, RADIX_VEC[10]; /* RADIX_VEC[] stores sequence of complex FFT radices used. */
               ^~~~~~~~~
In file included from ../src/carry.h:29:0,
                 from ../src/Mlucas.h:30,
                 from ../src/factor.c:33:
../src/Mdata.h:450:25: note: previous declaration of ‘RADIX_VEC’ was here
 extern uint32 NRADICES, RADIX_VEC[10]; /* NRADICES, RADIX_VEC[] store number & set of complex FFT radices used. */
                         ^~~~~~~~~
../src/factor.c: In function ‘main’:
../src/factor.c:1052:4: warning: #warning Building factor.c in unthreaded (i.e. single-main-thread) mode. [-Wcpp]
   #warning Building factor.c in unthreaded (i.e. single-main-thread) mode.
    ^~~~~~~
../src/factor.c:1248:3: error: too few arguments to function ‘mi64_shrl’
   mi64_shrl(p, q, findex, lenP);
   ^~~~~~~~~
In file included from ../src/Mdata.h:31:0,
                 from ../src/carry.h:29,
                 from ../src/Mlucas.h:30,
                 from ../src/factor.c:33:
../src/mi64.h:202:12: note: declared here
 DEV uint64 mi64_shrl (const uint64 x[], uint64 y[], uint32 nshift, uint32 len, uint32 output_len);
            ^~~~~~~~~
../src/factor.c:1266:4: error: too few arguments to function ‘mi64_shrl’
    mi64_shrl(q, q, findex, lenP);
    ^~~~~~~~~
In file included from ../src/Mdata.h:31:0,
                 from ../src/carry.h:29,
                 from ../src/Mlucas.h:30,
                 from ../src/factor.c:33:
../src/mi64.h:202:12: note: declared here
 DEV uint64 mi64_shrl (const uint64 x[], uint64 y[], uint32 nshift, uint32 len, uint32 output_len);
            ^~~~~~~~~
../src/factor.c:2366:12: error: too many arguments to function ‘PerPass_tfSieve’
    count += PerPass_tfSieve(
             ^~~~~~~~~~~~~~~
In file included from ../src/Mlucas.h:32:0,
                 from ../src/factor.c:33:
../src/factor.h:316:9: note: declared here
 uint64 PerPass_tfSieve(
        ^~~~~~~~~~~~~~~
../src/factor.c: At top level:
../src/factor.c:2537:9: error: conflicting types for ‘PerPass_tfSieve’
 uint64 PerPass_tfSieve(
        ^~~~~~~~~~~~~~~

https://www.mersenneforum.org/mayer/...mlucas_v19.txz or https://www.mersenneforum.org/mayer/...lucas_v19.tbz2
Compiling V20.1 seems to expect glibc v2.30 or higher; with an older glibc it produces lots of warnings about the version. Ubuntu 18.04 has v2.27. One example:
Code:
In file included from ../src/types.h:30:0,
                 from ../src/align.h:29,
                 from ../src/Mlucas.h:29,
                 from ../src/get_fft_radices.c:23:
../src/platform.h:1263:3: warning: #warning GLIBC either not defined or version < 2.30 ... including <sys/sysctl.h> header. [-Wcpp]
  #warning GLIBC either not defined or version < 2.30 ... including <sys/sysctl.h> header.
   ^~~~~~~

Last fiddled with by kriesel on 2021-11-23 at 11:15