mersenneforum.org Mlucas v20.1.1 (latest) available

2021-11-29, 03:18   #12
ewmayer
∂²ω=0

Sep 2002
República de California

2³×3²×163 Posts

Patch Alert: Some recent code changes to clean up the messaging and file-writing left a few dangling fclose() calls in the two *_mod_square.c source files, potentially leading to a null-pointer fclose crash following emission of a roundoff error warning. Fixed. Also a few help.txt file changes to improve coherence of the how-to-kill text. The md5 value in the OP has been updated to match this upload (1686232 bytes, md5 = dc5487e984196a32b47a8066ec9a6803).
2021-11-29, 13:53   #13
tdulcet

"Teal Dulcet"
Jun 2018

2×3×11 Posts

Quote:
 Originally Posted by ewmayer Those of you using tdulcet's mlucas.sh install script will want to grab the latest version, but check the SUM-field value and if it differs from the one (dc5487e984196a32b47a8066ec9a6803) for the current-download of v20.1.1, manually change it to the md5 checksum listed for the latter at the README.
I just pushed the updated script to my repository. See here for the full changes.

It will now automatically add lines to the bench.txt file for future reference, in the same format as those added by Prime95/MPrime to the respective results.bench.txt file when running a throughput benchmark. This is in addition to the benchmark summary table I added to that file in my previous update of the script (see post above). Here is an example of these lines on a 4 core ARM system:
Code:
Timings for 2048K FFT length (4 cores, 1 threads, 4 workers): 40.41, 41.10, 40.81, 40.32 ms.  Throughput: 98.381 iter/sec.
Timings for 2304K FFT length (4 cores, 1 threads, 4 workers): 48.30, 48.96, 48.34, 49.33 ms.  Throughput: 82.083 iter/sec.
Timings for 2560K FFT length (4 cores, 1 threads, 4 workers): 52.98, 53.24, 53.09, 53.06 ms.  Throughput: 75.340 iter/sec.
Timings for 2816K FFT length (4 cores, 1 threads, 4 workers): 58.45, 58.60, 58.59, 58.59 ms.  Throughput: 68.306 iter/sec.
Timings for 3072K FFT length (4 cores, 1 threads, 4 workers): 64.35, 64.80, 64.45, 64.69 ms.  Throughput: 61.947 iter/sec.
Timings for 3328K FFT length (4 cores, 1 threads, 4 workers): 69.29, 69.48, 69.34, 69.57 ms.  Throughput: 57.621 iter/sec.
Timings for 3584K FFT length (4 cores, 1 threads, 4 workers): 75.27, 75.47, 75.24, 75.52 ms.  Throughput: 53.067 iter/sec.
Timings for 3840K FFT length (4 cores, 1 threads, 4 workers): 81.52, 81.82, 81.66, 81.73 ms.  Throughput: 48.970 iter/sec.
Timings for 4096K FFT length (4 cores, 1 threads, 4 workers): 78.06, 78.70, 78.06, 78.87 ms.  Throughput: 51.007 iter/sec.
Timings for 4608K FFT length (4 cores, 1 threads, 4 workers): 97.03, 97.62, 97.09, 97.78 ms.  Throughput: 41.076 iter/sec.
Timings for 5120K FFT length (4 cores, 1 threads, 4 workers): 107.23, 107.67, 107.00, 107.60 ms.  Throughput: 37.253 iter/sec.
Timings for 5632K FFT length (4 cores, 1 threads, 4 workers): 118.54, 119.28, 118.63, 119.31 ms.  Throughput: 33.630 iter/sec.
Timings for 6144K FFT length (4 cores, 1 threads, 4 workers): 129.88, 130.56, 128.54, 129.97 ms.  Throughput: 30.832 iter/sec.
Timings for 6656K FFT length (4 cores, 1 threads, 4 workers): 141.49, 142.41, 141.57, 142.44 ms.  Throughput: 28.174 iter/sec.
Timings for 7168K FFT length (4 cores, 1 threads, 4 workers): 153.81, 155.76, 155.40, 155.35 ms.  Throughput: 25.794 iter/sec.
Timings for 7680K FFT length (4 cores, 1 threads, 4 workers): 167.57, 169.11, 168.69, 168.74 ms.  Throughput: 23.736 iter/sec.

2021-12-02, 19:38   #14
ewmayer
∂²ω=0

Sep 2002
República de California

2³·3²·163 Posts

Patch Alert: Due to a user report of bad behavior of regular (non '-9') kill with his multithreaded run, signal handling has been changed to immediate-exit without savefile write. Suggest users 'killall -9 Mlucas' any ongoing jobs at their earliest convenience and switch to the updated code; using that, regular 'kill' should work. Also, workfile assignments are now echoed to the per-exponent log (.stat) file, not just to stderr (e.g. to the nohup.out file). The md5 value in the OP has been updated to match this upload (1688188 bytes, md5 = 970c4dde58417bd7f6be0e4af4b59b4e).
2021-12-03, 10:58   #15
tdulcet

"Teal Dulcet"
Jun 2018

2·3·11 Posts

Quote:
 Originally Posted by ewmayer Those of you using tdulcet's mlucas.sh install script will want to grab the latest version, but check the SUM-field value and if it differs from the one (970c4dde58417bd7f6be0e4af4b59b4e) for the current-download of v20.1.1, manually change it to the md5 checksum listed for the latter at the README.
I just pushed the new md5sum to my repository. See here for the change.

2022-02-05, 22:36   #16
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×11×199 Posts

Quote:
 Originally Posted by ixfd64 I noticed that mlucas now supports exponents up to around nine billion.
Testing indicates the empirical limits are work type dependent. Something about signed and unsigned int32 in places. See attachment of https://www.mersenneforum.org/showpo...8&postcount=36

I think the limit situation may improve somewhat with the next release.

Last fiddled with by kriesel on 2022-02-05 at 22:37

2022-02-07, 18:58   #17
ewmayer
∂²ω=0

Sep 2002
República de California

2³×3²×163 Posts

Quote:
 Originally Posted by kriesel Testing indicates the empirical limits are work type dependent. Something about signed and unsigned int32 in places. See attachment of https://www.mersenneforum.org/showpo...8&postcount=36 I think the limit situation may improve somewhat with the next release.
There should be no issue running LL, PRP or p-1 with exponents approaching 9 billion - I think you may be alluding to the current 32-bit limit on residue shift. From the README: " exponents > 2^32, thus FFT lengths 256-512M, require '-shift 0' to run."

I've been running p-1 on F33, exponent ~8.6 billion, for around 6 months now, max ROEs are a little under 0.10 for that.

2022-02-07, 23:17   #18
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·11·199 Posts

Quote:
 Originally Posted by ewmayer There should be no issue running LL, PRP or p-1 with exponents approaching 9 billion - I think you may be alluding to the current 32-bit limit on residue shift. From the README: " exponents > 2^32, thus FFT lengths 256-512M, require '-shift 0' to run." I've been running p-1 on F33, exponent ~8.6 billion, for around 6 months now, max ROEs are a little under 0.10 for that.
Or started testing with an earlier version of 20.1.1, or forgot about -shift 0, or both. IIRC there was a time when we were ferreting out exponent limits and IIRC the usable exponent range varied. Lots of code paths as you well know. Will retest a few Mn cases & worktypes with v20.1.1 2021-12-02 and PM re any anomalies found.

2022-02-08, 20:23   #19
ewmayer
∂²ω=0

Sep 2002
República de California

2³×3²×163 Posts

Ken PMed me with some questions and examples of issues he hit for M(p) with p > 2^32 - for the benefit of any other users wanting to play with such stuff, copy of my reply to him:
Quote:
o Anyone who wishes to, can change the 'if(p > (1ull << 32))' used to limit #iters for expos > 2^32 to 'if(0)' in Mlucas.h or up the value of MAX_SELFTEST_ITERS in Mdata.h. I just want users to be darn sure they want to burn that kind of runtime, consider it a "feature must be manually enabled by user" caution.

o Re. your attempt with 8937021997, that is just above the (FMA-build mode) default limit for 512M FFT, and the ensuing error message prints just the low 32 bits of p - fixed in local build. You'll need to force '-fft 512M' for such cases.

o Re. "check_kbnc: Mersenne exponent must be prime!" for p = 8937021689 - yep, that's a bug, some incorrectly nested logic. In Mlucas.c::check_kbnc(), the clause starting with 'if(i == -1)' needs to be modified thusly:
Code:
		if(i == -1) {
			uint32 phi32 = (*p >> 32);
			if((phi32 && !isPRP64(*p)) || (!phi32 && !is_prime((uint32)*p))) {
				fprintf(stderr,"%s: Mersenne exponent must be prime!\n",func); break;
			}
			MODULUS_TYPE = MODULUS_TYPE_MERSENNE;
		} else if(i == 1) {
In the current release version, if the exponent > 2^32 the left-clause isPRP64 call returns 1 as expected, but the or (||)-following clause gets tested next, i.e. the code checks if the low 32 bits of p are prime. That latter clause must be taken only if p < 2^32. I was wondering why I didn't hit this issue for the p-1 example run of p = 8589934693 I PMed you and Teal about at end of last October, but the low 32 bits of that p happen to == 101, which is prime. Fixed via above in my local branch, but does not affect large-exponent self-testing, so will roll out in v21.

Quote:
 Originally Posted by kriesel Did that 2^32 exponent cap get fully undone, all Mersenne worktypes? If the tweak was modified to 1E6 iterations as selftest now states,
Code:
Full-length LL/PRP/Pepin tests on exponents this large not supported; will exit at self-test limit of 1000000.
and applies for Mersenne P-1 stage 1 also, that could be trouble for large-p attempts on Mersennes. OBD B1 ~17M so iter count ~ 25M. 1Gbit B1 ~ 5.5M so iter count ~8M. With so many cases and subcases, and F33 stage 1 behind you and lots of proof work ahead, it would be easy to miss a case or more.
If you search for the "will exit at" error-print in Mlucas.c, you'll see it's inside an if(TEST_TYPE == TEST_TYPE_PRIMALITY || TEST_TYPE == TEST_TYPE_PRP) clause, whose outermost if() is for Mersenne moduli, i.e. that limit applies only to Mersenne LL/PRP tests. The overall iteration < 2^32 limit for all test types still applies though. Should anyone decide to manually tweak the above to allow LL/PRP to iter > 10^6 and start such a run which hits the 2^32 limit before I get around to relaxing it, I'll be happy to let them accuse me of false advertising. :)

2022-02-09, 09:58   #20
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·11·199 Posts

Quote:
 Originally Posted by ewmayer Ken PMed me with some questions and examples of issues he hit for M(p) with p > 2^32 - for the benefit of any other users wanting to play with such stuff, copy of my reply to him:
"Should anyone decide to manually tweak the above to allow LL/PRP to iter > 10^6 and start such a run which hits the 2^32 limit before I get around to relaxing it, I'll be happy to let them accuse me of false advertising. :)"
Yeah, you're pretty safe there, since at an estimated 88.5 years to 2^32 iterations on 256M FFT length on one of my "faster" test systems, we'd potentially need to leave instructions for our heirs, and theirs! (Other than your source code that is.)

Last fiddled with by kriesel on 2022-02-09 at 10:30

2022-03-07, 14:44   #21
mathwiz

Mar 2019

277 Posts

I am trying to get the pm1 standalone binary to build on Ubuntu. Getting the following errors:
Code:
$ clang -c -DPM1_STANDALONE -O3 pm1.c
pm1.c:34:3: warning: Building pm1.c in PM1_STANDALONE mode. [-W#warnings]
  #warning Building pm1.c in PM1_STANDALONE mode.
   ^
pm1.c:898:4: warning: Building pm1_stage2() in standalone (modmul-counting) mode! [-W#warnings]
  #warning Building pm1_stage2() in standalone (modmul-counting) mode!
   ^
pm1.c:1004:2: error: use of undeclared identifier 'dtmp'; did you mean 'tmp'?
  dtmp = mlucas_getOptVal(MLUCAS_INI_FILE,"InterimGCD"); // Any failure-to-find-or-parse can be checked for via isNaN(dtmp)
  ^~~~
  tmp
pm1.c:956:9: note: 'tmp' declared here
  uint64 tmp,q,q0,q1,q2, qlo = 0ull,qhi, reloc_start, pinv64 = 0ull;
         ^
pm1.c:1005:5: error: use of undeclared identifier 'dtmp'; did you mean 'tmp'?
  if(dtmp == 0) {
     ^~~~
     tmp
pm1.c:956:9: note: 'tmp' declared here
  uint64 tmp,q,q0,q1,q2, qlo = 0ull,qhi, reloc_start, pinv64 = 0ull;
         ^
pm1.c:1008:3: error: use of undeclared identifier 'interim_gcd'
  interim_gcd = 0;
  ^
pm1.c:1715:2: error: use of undeclared identifier 'q_old_10M'
  q_old_10M = (uint32)(q0 * inv10m);
  ^
pm1.c:1715:28: error: use of undeclared identifier 'inv10m'
  q_old_10M = (uint32)(q0 * inv10m);
Am I missing some necessary -D flag, or is something else wrong? For reference I have:
Code:
$ clang --version
Debian clang version 13.0.1-+rc1-1~exp4
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

2022-03-07, 19:55   #22
ewmayer
∂²ω=0

Sep 2002
República de California

26730₈ Posts

Quote:
 Originally Posted by mathwiz I am trying to get the pm1 standalone binary to build on Ubuntu. Getting the following errors:
That's a developer-only flag I used to help gauge performance of various stage 2 prime-pairing-related settings during my algorithm development work, by way of counting modmuls without actually doing them. Looks like since I last turned it on, other parts of the code have changed.

There is no "p-1 standalone" mode for the actual user build - if p-1 is all you want to do, you use the regular build as described on the README webpage ('bash makemake.sh' inside the unzipped source tarball) and simply restrict the assignment types to Pminus1 and/or Pfactor ones.

Last fiddled with by ewmayer on 2022-03-07 at 20:00
