mersenneforum.org  

Old 2021-08-31, 22:14   #1
ewmayer
2ω=0
 
Sep 2002
República de California

11658₁₀ Posts
Mlucas v20.1 (latest) available

This is an Update-release of v20, but with enough changes as to warrant a minor-version number increment. As always, download via the README page.

*** I urge users to delete (or rename) the mlucas.cfg file they are using for runs and run the self-tests using the v20.1 build to generate a fresh one, due to the v20 suboptimal-radix-set selection issue mentioned in the list below. ***

Changes include:
  • The help menu has been scrapped in favor of a help.txt file in the same top-level directory as makemake.sh and primenet.py.
  • Algorithmic improvements which yield a 10-20% faster p-1 stage 2. In my p-1 runs using the initial v20 release, the ratio between time-per-modmul in stage 2 vs stage 1 was in the 1.35-1.4 range. (We expect stage 2 modmuls to be somewhat slower than stage 1 because they FFT-convolve pairs of distinct inputs whereas stage 1 does auto-convolutions of a single input, but 1.4x is rather on the large side.) The improved code yields a timing ratio in the 1.15-1.2 range.
  • A bug in the stage 2 "number of buffers available based on current RAM allocation" computation was allowing the difference of that value and the number of auxiliary-computation stage 2 buffers (5) to drop below (signed int)0, which yielded nonsense when the result was stored in its target unsigned-int variable. (This led the stage 2 code to try to allocate some 4-billion-plus number of buffers, resulting in an unable-to-alloc error-exit.) That is now fixed. Also, said number-buffers-available computation is now done at the start of each stage 2, rather than just once at run-start.
  • A new command-line option '-pm1_s2_nbuf' allowing users to override the above runtime-auto-computation and directly set an upper bound on the number of stage 2 memory buffers used. The constraints on this are detailed in the help.txt file. For stage 2 restarts there is an added constraint related to small-prime relocation, namely that if stage 2 was begun with a multiple of 24 or 40 buffers, the restart-value must also be a multiple of the same base-count, 24 or 40. Said constraint will be automatically enforced. If the resulting buffer count exhausts available memory, performance will suffer due to system memory-swapping, thus this flag should only be invoked by users who know what they are doing.
  • A fix for 2 bugs brought to my attention by Ken Kriesel:
    1. A suboptimal-radix-set selection bug in the self-testing;
    2. For p-1 factor-found cases, the JSON output written to results.txt was not wrapping the factor (currently there will be at most 1 factor printed, which in rare cases will be the product of 2 prime factors) in double-quotes, which was causing submission of the result via the online manual result-reporting page at mersenne.org to fail. As best I can tell, automated submissions using either the primenet.py script which ships with the Mlucas v20 release or the Dulcet/Connelly enhanced primenet.py script should be fine with or without the quotes, but users are encouraged to upgrade to v20.1 to gain the benefit of the faster stage 2.
  • A fix for a missing null-string-terminator bug in the p-1 assignment-splitting code brought to my attention by tdulcet, which was causing the Test/PRP member of the resulting assignment pair to contain whatever chars the string buffer in question happened to be holding beyond the (missing) end of the Test/PRP assignment.
  • Reference-residues for 128-240M were incorrect, due to a hidden assumption in one piece of the residue-shift-handling code (which figures out where to inject the -2 of each LL-test iteration into the circularly-shifted residue) which amounted to assuming p < 2^31.
  • v20.1 raises the largest Mersenne number testable to match the longstanding Fermat-number limit, set by the maximum supported FFT length of 512M. (Note that exponents > 2^32, thus FFT lengths 256-512M, require '-shift 0' to run.) In practice, this translates to M(p) with p approaching 9 billion. Clearly, full-length primality tests of numbers this large are nowhere near practicable as of this writing, but such moduli can be useful for software and hardware parallel-scaling tests.
  • Miscellaneous additional minor bug- and pretty-print fixes.
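The buffer-count underflow described in the list above is easy to reproduce. What follows is only a sketch of the arithmetic, not Mlucas source: the function names are made up for illustration, and the 32-bit mask is an assumption that mimics storing a negative signed int in an unsigned 32-bit variable.

```python
# Sketch of the signed-to-unsigned wraparound. All names are hypothetical
# and not taken from the Mlucas source.
NUM_AUX_BUFFERS = 5        # auxiliary stage 2 buffers, per the release notes
UINT32_MASK = 0xFFFFFFFF   # assumption: the target is a 32-bit unsigned int


def usable_buffers_wrapping(nbuf_available: int) -> int:
    """Mimic storing (available - 5) in an unsigned 32-bit int with no check."""
    return (nbuf_available - NUM_AUX_BUFFERS) & UINT32_MASK


def usable_buffers_checked(nbuf_available: int) -> int:
    """One possible guard (an assumption): clamp at zero before use."""
    return max(nbuf_available - NUM_AUX_BUFFERS, 0)


print(usable_buffers_wrapping(3))  # a "4-billion-plus" buffer count
print(usable_buffers_checked(3))
```

With only 3 buffers available, the unchecked version wraps to a value just under 2^32, which is the allocation request that triggered the error-exit. The clamp is just one way to guard against it; the release notes do not say how v20.1 does so.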
As always, please subscribe to this thread (and unsubscribe from any older Mlucas-release threads) to be notified of any bug and patch reports.
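To illustrate the factor-quoting item in the list above, here is a small sketch using Python's standard json module. The "factors" key and the factor value come from the example result line later in this thread. The float check is an assumption about why a consumer might choke on the bare number; the thread does not say why the manual-submission page rejected it.

```python
import json

# Factor taken from the example result line in this thread.
factor = 188971360622975631014921

unquoted = json.dumps({"factors": [factor]})     # v20 style: bare number
quoted = json.dumps({"factors": [str(factor)]})  # v20.1 style: quoted string

# Assumption: a consumer that reads JSON numbers as 64-bit floats cannot
# represent this 78-bit odd integer exactly.
survives_float = int(float(factor)) == factor

print(unquoted)
print(quoted)
print(survives_float)
```

Emitting the factor as a string sidesteps any numeric-precision question on the receiving end.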

Last fiddled with by ewmayer on 2021-09-02 at 20:38 Reason: primenet.org -> mersenne.org
Old 2021-09-01, 22:18   #2
ewmayer
2ω=0
 
Sep 2002
República de California

2·3·29·67 Posts

Brief post illustrating how users who hit the v20 assignment-borkage-due-to-missing-string-terminator issue mentioned in the above list can manually patch up affected assignments, which is preferable to the code skipping them due to "unable to parse" reasons. Here is the original example sent to me by tdulcet:
Code:
cat worktodo.ini
Pminus1=F3AC27E83049B4409813291299C836B3,1,2,113334787,-1,900000,32000000
Test=F3AC27E83049B4409813291299C836B3,113334787,76,1", "fft-length":5767168, "B1":900000, "factors":[188971360622975631014921], "program":{"name":"Mlucas", "version":"20.0"}, "timestamp":"2021-08-27 13:58:46 GMT", "aid":"74ECE80F64762AFE11E83B9818CF3A46"}
1
The program has split a Test= assignment ending in ",76,0", with the trailing 0 indicating "no p-1 has been done", into a p-1 assignment and the same Test= assignment, but with the trailing 0 replaced by a 1. That way, if the p-1 does not find a factor, things proceed to the LL-test, which is now flagged as "p-1 has been done" and so does not again get split into a p-1/Test pair. The problem is that my initial implementation of this failed to first insert a string-terminating null '\0' following the "...,76,". The same string buffer had in the meantime also been used to hold a JSON-output line for writing to results.txt, so the ensuing strcat() with "1\n" left all the JSON-line contents following the "76," in place and appended the "1\n" at the string-terminator of the JSON output, which ends with ...A46"}.
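The mechanism can be mimicked outside of C with a byte buffer. This is a toy model only: the helper names, the shortened JSON line and the "AID" placeholder are all made up for illustration, and c_strcat merely imitates what C's strcat() does (find the first NUL, append there, re-terminate).

```python
# Toy model of the missing-terminator bug. Helper names and sample
# strings are hypothetical and are not Mlucas code.
STALE = b'{"factors":[14921], "aid":"A46"}\n'  # earlier JSON line in the buffer
PREFIX = b"Test=AID,113334787,76,"             # assignment up to the TF bits


def c_string(buf: bytearray) -> bytes:
    """Bytes up to, but not including, the first NUL, as C would read them."""
    return bytes(buf[:buf.index(0)])


def c_strcat(buf: bytearray, tail: bytes) -> None:
    """Append tail at the first NUL and re-terminate, like C strcat()."""
    end = buf.index(0)
    buf[end:end + len(tail)] = tail
    buf[end + len(tail)] = 0


def build_assignment(terminate: bool) -> bytes:
    buf = bytearray(128)          # zero-filled, so always NUL-terminated
    buf[:len(STALE)] = STALE      # buffer was reused for the JSON line
    buf[:len(PREFIX)] = PREFIX    # overwrite the start with the assignment
    if terminate:
        buf[len(PREFIX)] = 0      # the '\0' that the v20 code forgot
    c_strcat(buf, b"1\n")
    return c_string(buf)


print(build_assignment(terminate=False))  # prefix + JSON leftovers + "1\n"
print(build_assignment(terminate=True))   # clean "Test=...,76,1\n"
```

If the STALE copy is left out, the unterminated path happens to give the right answer, because a freshly zeroed buffer supplies the NUL by accident.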

Long story short, if you end up with such a mangled Test= (or DoubleCheck=) assignment in your worktodo.ini file, delete everything following the "[TF bits]," and replace with a "1"; in the above example the fixed-up assignment would be
Code:
Pminus1=F3AC27E83049B4409813291299C836B3,1,2,113334787,-1,900000,32000000
Test=F3AC27E83049B4409813291299C836B3,113334787,76,1
(Note that the 1 following the ,76 in the mangled assignment was a coincidence, it was the rightmost digit of the found factor reported in the JSON output, 188971360622975631014921.)

For PRP assignments mangled similarly, note that the trailing-digit convention is different than for Test/DoubleCheck: for PRP assignments, the trailing digit represents "PRP tests saved if a p-1 factor is found", thus a p-1/PRP assignment pair mangled like this:
Code:
Pminus1=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,800000,29000000
PRP=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,77,[stuff leftover from previous write of char-buffer]0
gets demangled like so:
Code:
Pminus1=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,800000,29000000
PRP=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,77,0
with the trailing ",0" in the PRP= assignment being the PRP version of "p-1 has been done".
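For anyone with more than a couple of mangled lines, the manual fix-up above can be scripted. This is a minimal sketch, not part of Mlucas or primenet.py: the function name is made up, and it assumes every assignment carries an assignment ID as its first field, as in the examples in this post. Any stray "1" left on a line of its own after a mangled assignment still needs deleting by hand.

```python
def demangle(line: str) -> str:
    """Rebuild a mangled Test=, DoubleCheck= or PRP= worktodo line.

    Hypothetical helper. Assumes the first field is the assignment ID.
    """
    kind, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    if kind in ("Test", "DoubleCheck"):
        # Keep AID, exponent, TF bits; trailing 1 means "p-1 has been done".
        return kind + "=" + ",".join(fields[:3]) + ",1"
    if kind == "PRP":
        # Keep AID, k, b, n, c, TF bits; trailing 0 means no tests saved.
        return "PRP=" + ",".join(fields[:6]) + ",0"
    return line.strip()  # leave Pminus1= and anything else untouched


mangled = ('Test=F3AC27E83049B4409813291299C836B3,113334787,76,1", '
           '"fft-length":5767168, "B1":900000}')
print(demangle(mangled))
```

Splitting on commas works here because the fields to keep all precede the leftover junk, so whatever the junk contains is simply discarded.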

Apologies for the fubar - all my tests of the assignment-splitting code happened to be under debugger, which nulls everything including char-buffers from one run to the next. (I.e. the debugger was providing the needed string-terminating null.)
Old 2021-09-05, 02:47   #3
paulunderwood
 
Sep 2002
Database er0rr

3,863 Posts
factor found on a73

Yay, I found my first factor using the a73 of my Odroid N2: Found 77-digit factor in Stage 2: 126440940410782170073559 (of M105592247)

Clocks for stage 2 have gone from 00:25:14 (v20) to 00:23:39 (v20.1)

Last fiddled with by paulunderwood on 2021-09-05 at 02:52
Old 2021-09-05, 03:02   #4
ewmayer
2ω=0
 
Sep 2002
República de California

2·3·29·67 Posts

Quote:
Originally Posted by paulunderwood View Post
Yay, I found my first factor using the a73 of my Odroid N2: Found 77-digit factor in Stage 2: 126440940410782170073559 (of M105592247)

Clocks for stage 2 have gone from 00:25:14 (v20) to 00:23:39 (v20.1)
Congrats, but note that due to a small typo, "77 digits" means binary digits, a.k.a. bits. We can dream of finding a 77-decimal-digit monster, tho.

So only a 6-7% stage 2 speedup on Arm, vs the 15% I see on avx-512 on my Intel NUC - but still decent.
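The bits-versus-decimal-digits point is quick to check with Python's built-in integers. This is just a sanity check on the quoted factor, not anything from the Mlucas code.

```python
# Factor quoted in the post above.
factor = 126440940410782170073559

bits = factor.bit_length()          # size in binary digits
decimal_digits = len(str(factor))   # size in decimal digits

print(bits, decimal_digits)
```

So the "77" in the program's message matches the bit count, while the factor has 24 decimal digits.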
Old 2021-09-17, 23:34   #5
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

Quote:
Originally Posted by ewmayer View Post
Congrats, but note that due to a small typo, "77 digits" means binary digits, a.k.a. bits.
FYI I have V20.0 examples of that for both stage 1 and 2.
Old 2021-09-19, 13:52   #6
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

I've updated the Mlucas reference thread somewhat, and added a list of the last several versions, with links to the corresponding threads, and a wish list, and a bug list.

A couple minor issues have been discussed in PM with Ernst but not appeared in Mlucas forum threads before now IIRC:

When there is a restart in P-1 stage 2, the following result record for P-1 stopped/restarted in stage 2 has 1970-01-01 midnight as time stamp, instead of the actual completion time.

P-1 factors found at a GCD early in stage 2 are reported as if they were found in stage 1, with only stage 1 bound given, omitting whatever the effective stage 2 bound was. (Gpuowl v7.x also does this.) This may be considered feature-absence rather than bug.

Last fiddled with by kriesel on 2021-09-19 at 14:00
Old 2021-09-19, 16:31   #7
axn
 
Jun 2003

19·271 Posts

Quote:
Originally Posted by kriesel View Post
P-1 factors found at a GCD early in stage 2 are reported as if they were found in stage 1, with only stage 1 bound given, omitting whatever the effective stage 2 bound was. (Gpuowl v7.x also does this.) This may be considered feature-absence rather than bug.
Due to the way the primes are paired, some of the smallest stage 2 primes are paired with some of the largest. So, at no point (until the very end), there might be a bound such that all smaller primes have been handled in stage 2.
Old 2021-09-19, 17:39   #8
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

Quote:
Originally Posted by axn View Post
Due to the way the primes are paired, some of the smallest stage 2 primes are paired with some of the largest. So, at no point (until the very end), there might be a bound such that all smaller primes have been handled in stage 2.
Understood. If it's smallest and largest paired and processed first, a modest B2 claim may be valid, up to the "largest smallest" prime of the pairs below which none got skipped. It's not simple, but I believe it's under consideration to implement that.
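A toy model may make the "largest smallest" idea concrete. Everything here is an assumption made for illustration: the outside-in pairing order is a simplification of what axn describes, the function names are made up, and neither Mlucas nor Gpuowl is claimed to work this way. The point is only that the B2 one can honestly claim at an early GCD is capped by the smallest stage 2 prime not yet processed.

```python
def primes_between(lo: int, hi: int) -> list[int]:
    """Primes q with lo < q <= hi, by trial division (toy ranges only)."""
    return [q for q in range(max(lo + 1, 2), hi + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def outside_in_order(primes: list[int]) -> list[int]:
    """Hypothetical processing order: smallest paired with largest first."""
    order, i, j = [], 0, len(primes) - 1
    while i < j:
        order += [primes[i], primes[j]]
        i, j = i + 1, j - 1
    if i == j:
        order.append(primes[i])
    return order


def effective_b2(b1: int, primes: list[int], done: set[int]) -> int:
    """Largest prime q such that every stage 2 prime <= q has been processed."""
    bound = b1
    for q in primes:
        if q not in done:
            break
        bound = q
    return bound


stage2_primes = primes_between(10, 50)   # toy B1 = 10, B2 = 50
order = outside_in_order(stage2_primes)
print(effective_b2(10, stage2_primes, set(order[:4])))  # after two pairs
print(effective_b2(10, stage2_primes, set(order)))      # after all pairs
```

After two pairs the primes 11, 13, 43 and 47 are done but 17 is not, so the claimable bound in this toy model is only 13, even though primes near B2 have already been covered.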

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.