mersenneforum.org Mlucas v20.1 available

 2021-08-31, 22:14 #1 ewmayer ∂2ω=0     Sep 2002 República de California 13·29·31 Posts

Mlucas v20.1 available

This is an update release of v20, but with enough changes to warrant a minor-version-number increment. As always, download via the README page.

*** I urge users to delete (or rename) the mlucas.cfg file they are using for runs and re-run the self-tests using the v20.1 build to generate a fresh one, due to the v20 suboptimal-radix-set selection issue mentioned in the list below. ***

Changes include:

- The help menu has been scrapped in favor of a help.txt file in the same top-level directory as makemake.sh and primenet.py.

- Algorithmic improvements which yield a 10-20% faster p-1 stage 2. In my p-1 runs using the initial v20 release, the ratio of time-per-modmul in stage 2 vs stage 1 was in the 1.35-1.4 range. (We expect stage 2 modmuls to be somewhat slower than stage 1, because they FFT-convolve pairs of distinct inputs whereas stage 1 does auto-convolutions of a single input, but 1.4x is rather on the large side.) The improved code yields a timing ratio in the 1.15-1.2 range.

- A bug in the stage 2 "number of buffers available based on current RAM allocation" computation was allowing the difference between that value and the 5 auxiliary-computation stage 2 buffers to drop below (signed int)0, which yielded nonsense when the result was stored in its target unsigned-int variable. (This led the stage 2 code to try to allocate some 4-billion-plus buffers, resulting in an unable-to-alloc error exit.) That is now fixed. Also, said number-of-buffers-available computation is now done at the start of each stage 2, rather than just once at run start.

- A new command-line option '-pm1_s2_nbuf' allowing users to override the above runtime auto-computation and directly set an upper bound on the number of stage 2 memory buffers used. The constraints on this are detailed in the help.txt file. For stage 2 restarts there is an added constraint related to small-prime relocation: if stage 2 was begun with a multiple of 24 or 40 buffers, the restart value must also be a multiple of the same base count, 24 or 40. Said constraint will be automatically enforced. If the resulting buffer count exhausts available memory, performance will suffer due to system memory-swapping, thus this flag should only be invoked by users who know what they are doing.

- A fix for 2 bugs brought to my attention by Ken Kriesel: (a) a suboptimal-radix-set selection bug in the self-testing; (b) for p-1 factor-found cases, the JSON output written to results.txt was not wrapping the factor (currently there will be at most 1 factor printed, which in rare cases will be the product of 2 prime factors) in double quotes, which was causing submission of the result via the online manual result-reporting page at mersenne.org to fail. As best I can tell, automated submissions using either the primenet.py script which ships with the Mlucas v20 release or the Dulcet/Connelly enhanced primenet.py script should be fine with or without the quotes, but users are encouraged to upgrade to v20.1 to gain the benefit of the faster stage 2.

- A fix for a missing null-string-terminator bug in the p-1 assignment-splitting code brought to my attention by tdulcet, which was leading the Test/PRP one of the resulting assignment pair to contain whatever chars the string buffer in question happened to be holding beyond the (missing) end of the Test/PRP assignment.

- Reference residues for 128-240M were incorrect, due to a hidden assumption in one piece of the residue-shift-handling code (which figures out where to inject the -2 of each LL-test iteration into the circularly shifted residue) which amounted to assuming p < 2^31. v20.1 raises the largest Mersenne number testable to match the longstanding Fermat-number limit, set by the maximum supported FFT length of 512M. (Note that exponents > 2^32, thus FFT lengths 256-512M, require '-shift 0' to run.) In practice, this translates to M(p) with p approaching 9 billion. Clearly, full-length primality tests of numbers this large are nowhere near practicable as of this writing, but such moduli can be useful for software and hardware parallel-scaling tests.

- Miscellaneous additional minor bug and pretty-print fixes.

As always, please subscribe to this thread (and unsubscribe from any older Mlucas-release threads) to be notified of any bug and patch reports.

Last fiddled with by ewmayer on 2021-09-02 at 20:38 Reason: primenet.org -> mersenne.org
 2021-09-01, 22:18 #2 ewmayer ∂2ω=0     Sep 2002 República de California 10110110100111₂ Posts

Brief post illustrating how users who hit the v20 assignment-borkage-due-to-missing-string-terminator issue mentioned in the above list can manually patch up affected assignments, which is preferable to the code skipping them for "unable to parse" reasons. Here is the original example sent to me by tdulcet:

Code:
cat worktodo.ini
Pminus1=F3AC27E83049B4409813291299C836B3,1,2,113334787,-1,900000,32000000
Test=F3AC27E83049B4409813291299C836B3,113334787,76,1", "fft-length":5767168, "B1":900000, "factors":[188971360622975631014921], "program":{"name":"Mlucas", "version":"20.0"}, "timestamp":"2021-08-27 13:58:46 GMT", "aid":"74ECE80F64762AFE11E83B9818CF3A46"} 1

The program has split a Test= assignment ending in ",76,0", with the trailing 0 indicating "no p-1 has been done", into a p-1 assignment and the same Test= assignment, but then it tries to replace the trailing 0 with a 1, so that if the p-1 does not find a factor, things proceed to the LL test, but that is now flagged as "p-1 has been done" and so does not again get split into a P-1/Test pair. The problem is that my initial implementation of this failed to first insert a string-terminating null '\0' following the "...,76,". The same string buffer had in the meantime also been used to hold a JSON-output line for writing to results.txt, so the ensuing strcat() with "1\n" left all the JSON-line contents following the "76," and appended the "1\n" starting at the string terminator for the JSON output, which ends with ...A46"}.

Long story short, if you end up with such a mangled Test= (or DoubleCheck=) assignment in your worktodo.ini file, delete everything following the "[TF bits]," and replace it with a "1"; in the above example the fixed-up assignment pair would be

Code:
Pminus1=F3AC27E83049B4409813291299C836B3,1,2,113334787,-1,900000,32000000
Test=F3AC27E83049B4409813291299C836B3,113334787,76,1

(Note that the 1 following the ,76 in the mangled assignment was a coincidence; it was the rightmost digit of the found factor reported in the JSON output, 188971360622975631014921.)

For PRP assignments mangled similarly, note that the trailing-digit convention is different from that for Test/DoubleCheck: for PRP assignments, the trailing digit represents "PRP tests saved if a p-1 factor is found", thus a p-1/PRP assignment pair mangled like this:

Code:
Pminus1=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,800000,29000000
PRP=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,77,[stuff leftover from previous write of char-buffer]0

gets demangled like so:

Code:
Pminus1=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,800000,29000000
PRP=C57FF1C644A0CB16F5E2B5B3A9FC4E1D,1,2,98024161,-1,77,0

with the trailing ",0" in the PRP= assignment being the PRP version of "p-1 has been done". Apologies for the fubar - all my tests of the assignment-splitting code happened to be run under the debugger, which nulls everything including char buffers from one run to the next. (I.e. the debugger was providing the needed string-terminating null.)
 2021-09-05, 02:47 #3 paulunderwood     Sep 2002 Database er0rr 111110000101₂ Posts

factor found on A73

Yay, I found my first factor using the A73 cores of my Odroid N2:

Found 77-digit factor in Stage 2: 126440940410782170073559 (of M105592247)

Clocks for stage 2 have gone from 00:25:14 (v20) to 00:23:39 (v20.1)

Last fiddled with by paulunderwood on 2021-09-05 at 02:52
2021-09-05, 03:02   #4
ewmayer
∂2ω=0

Sep 2002
República de California

13×29×31 Posts

Quote:
 Originally Posted by paulunderwood
Yay, I found my first factor using the A73 cores of my Odroid N2:
Found 77-digit factor in Stage 2: 126440940410782170073559 (of M105592247)
Clocks for stage 2 have gone from 00:25:14 (v20) to 00:23:39 (v20.1)
Congrats, but note that due to a small typo, "77 digits" means binary digits, a.k.a. bits. We can dream of finding a 77-decimal-digit monster, tho.

So only a 6-7% stage 2 speedup on Arm, vs the 15% I see with AVX-512 on my Intel NUC - but still decent.

2021-09-17, 23:34   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6088₁₀ Posts

Quote:
 Originally Posted by ewmayer Congrats, but note that due to a small typo, "77 digits" means binary digits, a.k.a. bits.
FYI I have V20.0 examples of that for both stage 1 and 2.

 2021-09-19, 13:52 #6 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 6088₁₀ Posts

I've updated the Mlucas reference thread somewhat, and added a list of the last several versions, with links to the corresponding threads, a wish list, and a bug list. A couple of minor issues have been discussed in PM with Ernst but have not appeared in Mlucas forum threads before now, IIRC:

- When there is a restart in P-1 stage 2, the subsequent result record for a P-1 run stopped/restarted in stage 2 has 1970-01-01 midnight as its timestamp, instead of the actual completion time.

- P-1 factors found at a GCD early in stage 2 are reported as if they were found in stage 1, with only the stage 1 bound given, omitting whatever the effective stage 2 bound was. (Gpuowl v7.x also does this.) This may be considered feature-absence rather than a bug.

Last fiddled with by kriesel on 2021-09-19 at 14:00
2021-09-19, 16:31   #7
axn

Jun 2003

2⁴×7×47 Posts

Quote:
 Originally Posted by kriesel P-1 factors found at a GCD early in stage 2 are reported as if they were found in stage 1, with only stage 1 bound given, omitting whatever the effective stage 2 bound was. (Gpuowl v7.x also does this.) This may be considered feature-absence rather than bug.
Due to the way the primes are paired, some of the smallest stage 2 primes are paired with some of the largest. So at no point (until the very end) is there a bound such that all smaller primes have been handled in stage 2.

2021-09-19, 17:39   #8
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³×761 Posts

Quote:
 Originally Posted by axn Due to the way the primes are paired, some of the smallest stage 2 primes are paired with some of the largest. So, at no point (until the very end), there might be a bound such that all smaller primes have been handled in stage 2.
Understood. If it's smallest-and-largest paired and processed first, a modest B2 claim may still be valid: up to the "largest smallest" prime of the pairs processed so far, provided none below it got skipped. It's not simple, but I believe implementing that is under consideration.

