mersenneforum.org Prime95 v30.4/30.5/30.6
 Register FAQ Search Today's Posts Mark Forums Read

2021-09-30, 20:05   #452
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3·52·107 Posts

Quote:
 Originally Posted by ixfd64 Any chance we could write PRP results to results.txt too?
Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1"

2021-09-30, 20:31   #453
chalsall
If I May

"Chris Halsall"
Sep 2002

246208 Posts

Quote:
 Originally Posted by James Heinrich If by pretty-print you mean presenting JSON over multiple lines with indenting and such then no, as this will break manual results which is based on the assumption that one-line=one-result.
IMHO, JSON was designed for machines, not humans.

For the latter tools are available, for development, testing, and QA purposes.

Code:
chalsall@hobbit:~$echo '{"hello":"World"}' | jq { "hello": "World" } Last fiddled with by chalsall on 2021-09-30 at 20:34 Reason: s/, humans/, not humans/; # Inverses should *not* be inferred... 2021-09-30, 22:40 #454 ixfd64 Bemusing Prompter "Danny" Dec 2002 California 24·5·31 Posts Quote:  Originally Posted by Prime95 Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1" Yes, that's exactly what I was looking for. I checked undoc.txt and can't believe I didn't see these. Thank you! Last fiddled with by ixfd64 on 2021-09-30 at 22:41 2021-10-01, 12:27 #455 Zhangrc "University student" May 2021 Beijing, China 22×67 Posts Quote:  Originally Posted by kriesel Assuming the ~p2.1 scaling also applies to GCD operations The time complexity is almost linear logarithmic, O(log(ab)). Code: int gcd(int x,int y ) { if(x < y) return gcd(y,x); // x>y if( y == 0) return x; // if y=0, x is GCD else { if( !(x%2) ) { if( !(y%2) ) //x,y both even return 2*gcd(x >> 1, y >> 1); else // x is even, y is odd return gcd(x >> 1, y ); } else { if( !(y%2) ) // x is odd, y is even return gcd(x, y >> 1); else // x, y both odd return gcd(y,x-y); } } } and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step. Last fiddled with by Zhangrc on 2021-10-01 at 12:30  2021-10-01, 14:44 #456 tha Dec 2002 11010101012 Posts What cause of action would be recommended for this core segmentation fault? Nothing else was running at the time. Code: [Comm thread Oct 1 13:40] Sending result to server: UID: Tha/Z-170, M9262289 completed P-1, B1=2000000, B2=146000000, Wi8: E7182B42 [Comm thread Oct 1 13:40] [Work thread Oct 1 13:40] [Work thread Oct 1 13:40] P-1 on M9194659 with B1=5000000, B2=TBD [Work thread Oct 1 13:40] Setting affinity to run helper thread 1 on CPU core #2 [Work thread Oct 1 13:40] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads [Work thread Oct 1 13:40] Setting affinity to run helper thread 2 on CPU core #3 [Work thread Oct 1 13:40] Setting affinity to run helper thread 3 on CPU core #4 [Comm thread Oct 1 13:40] PrimeNet success code with additional info: [Comm thread Oct 1 13:40] CPU credit is 3.9457 GHz-days. [Comm thread Oct 1 13:40] Done communicating with server. [Work thread Oct 1 13:48] M9194659 stage 1 is 13.85% complete. Time: 458.543 sec. [Work thread Oct 1 13:56] M9194659 stage 1 is 27.71% complete. Time: 454.645 sec. [Work thread Oct 1 14:03] M9194659 stage 1 is 41.58% complete. Time: 458.479 sec. [Work thread Oct 1 14:11] M9194659 stage 1 is 55.44% complete. Time: 456.359 sec. [Work thread Oct 1 14:18] M9194659 stage 1 is 69.30% complete. Time: 460.225 sec. [Work thread Oct 1 14:26] M9194659 stage 1 is 83.16% complete. Time: 436.106 sec. [Work thread Oct 1 14:33] M9194659 stage 1 is 97.02% complete. Time: 436.115 sec. [Work thread Oct 1 14:35] M9194659 stage 1 complete. 14429932 transforms. Time: 3254.242 sec. [Work thread Oct 1 14:35] With trial factoring done to 2^71, optimal B2 is 91*B1 = 455000000. [Work thread Oct 1 14:35] If no prior P-1, chance of a new factor is 8.92% [Work thread Oct 1 14:35] D: 1050, relative primes: 2848, stage 2 primes: 23755332, pair%=94.69 [Work thread Oct 1 14:35] D: 1050, relative primes: 2819, stage 2 primes: 23755332, pair%=94.61 [Work thread Oct 1 14:35] Using 11021MB of memory. [Work thread Oct 1 14:35] Stage 2 init complete. 27785 transforms. Time: 16.280 sec. [Work thread Oct 1 14:44] M9194659 stage 2 is 7.41% complete. Time: 526.239 sec. [Work thread Oct 1 14:52] M9194659 stage 2 is 14.96% complete. Time: 526.555 sec. [Work thread Oct 1 15:01] M9194659 stage 2 is 22.61% complete. Time: 526.607 sec. [Work thread Oct 1 15:10] M9194659 stage 2 is 30.29% complete. Time: 526.931 sec. [Work thread Oct 1 15:19] M9194659 stage 2 is 37.96% complete. Time: 526.598 sec. [Work thread Oct 1 15:27] M9194659 stage 2 is 45.63% complete. Time: 526.763 sec. [Work thread Oct 1 15:36] M9194659 stage 2 is 53.28% complete. Time: 526.889 sec. [Work thread Oct 1 15:45] M9194659 stage 2 is 60.82% complete. Time: 526.498 sec. [Work thread Oct 1 15:54] M9194659 stage 2 is 68.22% complete. Time: 526.293 sec. [Work thread Oct 1 16:03] M9194659 stage 2 is 75.58% complete. Time: 526.346 sec. [Work thread Oct 1 16:11] M9194659 stage 2 is 82.96% complete. Time: 532.123 sec. [Work thread Oct 1 16:20] M9194659 stage 2 is 90.33% complete. Time: 526.133 sec. [Work thread Oct 1 16:29] M9194659 stage 2 is 97.73% complete. Time: 526.361 sec. [Work thread Oct 1 16:32] M9194659 stage 2 complete. 26612490 transforms. Time: 7011.609 sec. [Work thread Oct 1 16:32] Stage 2 GCD complete. Time: 1.302 sec. [Work thread Oct 1 16:32] P-1 found a factor in stage #2, B1=5000000, B2=455000000. [Work thread Oct 1 16:32] M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000) [Comm thread Oct 1 16:32] Sending result to server: UID: Tha/Z-170, M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000) [Comm thread Oct 1 16:32] [Work thread Oct 1 16:32] [Work thread Oct 1 16:32] P-1 on M9262387 with B1=2000000, B2=TBD [Work thread Oct 1 16:32] Setting affinity to run helper thread 1 on CPU core #2 [Work thread Oct 1 16:32] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads [Work thread Oct 1 16:32] Setting affinity to run helper thread 2 on CPU core #3 [Work thread Oct 1 16:32] Setting affinity to run helper thread 3 on CPU core #4 [Comm thread Oct 1 16:32] PrimeNet success code with additional info: [Comm thread Oct 1 16:32] Composite factor 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 = 367786361 * 201803650751527169 * 267438738796927 * 101994518728598164687111 * 441343633 * 422426099229769 * 23010527886744119 * 62376566657 * 2185975009297 [Comm thread Oct 1 16:32] Already have factor 367786361 for M9194659 [Comm thread Oct 1 16:32] Already have factor 201803650751527169 for M9194659 [Comm thread Oct 1 16:32] Already have factor 267438738796927 for M9194659 [Comm thread Oct 1 16:32] Already have factor 441343633 for M9194659 [Comm thread Oct 1 16:32] Already have factor 422426099229769 for M9194659 [Comm thread Oct 1 16:32] Already have factor 23010527886744119 for M9194659 [Comm thread Oct 1 16:32] Already have factor 62376566657 for M9194659 [Comm thread Oct 1 16:32] Already have factor 2185975009297 for M9194659 [Comm thread Oct 1 16:32] CPU credit is 11.8294 GHz-days. Segmentation fault (core dumped) henk@Z170:~/mersenne$ ^C Code: Linux64,Prime95,v30.6,build 3 Your choice: 14 Consult readme.txt prior to changing any of these settings. Temporary disk space limit in GB/worker (6.000000): Daytime P-1/P+1/ECM stage 2 memory in GB (10.800000): Nighttime P-1/P+1/ECM stage 2 memory in GB (10.800000): Upload bandwidth limit in Mbps (5.000000): Upload large files time period start (00:00): Upload large files time period end (24:00): Download limit for certification work in MB/day (400): Skip advanced resource settings (Y):
2021-10-01, 14:49   #457
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

25×3×71 Posts
Parallelism in GCD computation; runtime order

Quote:
 Originally Posted by Zhangrc The time complexity is almost linear logarithmic, O(log(ab)). and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step.
Thanks for that for single-precision operands. The larger operand would be Mp. The smaller would be ~Mp/2. o(log Mp2/2) would be o(p2). https://stackoverflow.com/questions/...clid-algorithm

Now consider the usual GIMPS case where one of the operands is Mp, p~108, requiring millions of words to store, not just one, and even simple operations may take time ~proportional to p to execute once. The other starting operand is 3large_power Mod Mp, so ~p-1 bits large on average. Also note that potential factors are typically >264 before P-1 is attempted, due to prior TF depth, so the final operands are likely multi-word also. Even assuming the operands are stored in multi-word arrays of >1M packed 64-bit-unsigned initially, one can imagine dividing x>>1 into some small m parallel threads for most of the work, and alternatively computing a new least significant word and bit offset and mask. I don't see how you avoid at least proportional to p iterations, since the algorithm posted removes one bit out of p bits per pass. What is the run time for multiprecision general subtraction, o(n) each where n is the number of bits in the lesser operand?

Anything less than ~p2.1 scaling (really O(p2 log p log log p)) makes my case stronger, that scaling downward from larger exponents to the wavefront exponents indicates significant single-core-GCD time at the wavefront.

GIMPS code parallelizes individual iterations of PRP or LL or P-1 powering. The fact that one iteration depends on the previous does not preclude parallelizing in iterations or individual operations within iterations, and gaining considerably in speed with available numbers of processor cores. (Four cores / task optimal throughput is routine in CPU-based code prime95, Mlucas. GPU code uses much higher parallelism for the same algorithms.)

Making use of parallelism to speed individual iterations of GCD is possible, as a quick web search reveals: https://www.sciencedirect.com/scienc...70866707000585. No one's coded it yet for GIMPS, choosing to focus optimization efforts on larger portions of the P-1 factoring computation first. That paper gives o(n / log n) for n1+e processors "where n is the bit-length of the larger input" which is p for Mp. That is a LOT of processors. GPUs have lots of processors, but not that many. Converting that back to sequential not parallel gives o(p2+e/log p).

There are probably slight savings to be had based on knowledge of Mp and its possible factors always being odd, and knowing an Mp-oriented GCD can always be called with a specific operand order, avoiding the initial x>y test.

The order of GMP's GCD is given in https://oaciss.uoregon.edu/icpp18/pu...23s2-file1.pdf as
Quote:
 The GCD algorithms used by GMP for large input are essentiallyO(N^(1+ε) logN) for a fairly large value ofε[6, section 15.3.3]
(An interesting paper in its own right from 2018, extrapolating that modular integers may be useful to compute GCDs faster using multiple GPUs.)

http://www.iaeng.org/IJCS/issues_v42...CS_42_4_01.pdf is a somewhat different approach, where many GCDs are done in parallel on CPU or GPU.

Last fiddled with by kriesel on 2021-10-01 at 15:27

2021-10-01, 15:19   #458
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×52×107 Posts

Quote:
 Originally Posted by tha What cause of action would be recommended for this core segmentation fault? Nothing else was running at the time.
Probably already fixed (in 30.7). Looks like the "message too large from server" bug.

2021-10-02, 03:05   #459
Zhangrc

"University student"
May 2021
Beijing, China

22×67 Posts

Quote:
 Originally Posted by kriesel o(log Mp2/2) would be o(p2).
No because log(ab)=log(a)+log(b). O(log Mp2/2) is O(log 22p-1) = O(2p-1).

2021-10-02, 14:51   #460
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1AA016 Posts

Quote:
 Originally Posted by Zhangrc No.
In gmp, & gmplib, which IIRC is used in prime95 / mprime, gpuowl, and now also Mlucas v20.x, P-1 GCD (and probably P+1 GCD in mprime/ prime95), per https://gmplib.org/manual/Nomenclature-and-Types
"A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits."

https://gmplib.org/manual/Binary-GCD
"At small sizes GMP uses an O(N^2) binary style GCD." where N is number of limbs per operand.
"Currently, the binary algorithm is used for GCD only when N < 3." So x,y each less than 129 bits at most.

https://gmplib.org/manual/Lehmer_0027s-Algorithm
Per iteration, reduces inputs by almost a word size.
"The resulting algorithm is asymptotically O(N^2), just as the Euclidean algorithm and the binary algorithm." where N is number of limbs per operand.
For Mp of interest to GIMPS wavefront activity, N= p/limbsize = ~10^8/64 or /32.

"For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function, as a generalization to Lehmer’s algorithm."
"The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two N-limb numbers." and where N is number of limbs per operand.
M(Mp) is O(p log p log log p) as per Knuth and many other sources for large operands using fft methods. Log2 Mp is ~p; N is p/64 or p/32 depending on unsigned-64-bit or 32-bit in gmp; log N is log (p-64 or p-32) = ~log p for large p such as the >10^8 bit operands common in GIMPS first test or preparatory P-1.
So O(M(N)*log(N)) ~ O(p log p log log p * log p) = O(p (log p)^2 log log p)

(Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)

https://www.ams.org/journals/mcom/20...17-0/home.html confirms asymptotic run time O(n (log n)^2 log log n) where n is number of bits of an operand.

Last fiddled with by kriesel on 2021-10-02 at 14:58

2021-10-02, 18:26   #461
SethTro

"Seth"
Apr 2019

19·23 Posts

Quote:
 Originally Posted by kriesel (Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)
I wrote some of that documentation :)

You can find thresholds at https://gmplib.org/devel/thres/
GCD_DC_THRESHOLD seems to be around 300-400 depending on platform.

Last fiddled with by SethTro on 2021-10-02 at 18:26

2021-10-02, 20:12   #462
tha

Dec 2002

85310 Posts

Quote:
 Originally Posted by Prime95 Probably already fixed (in 30.7). Looks like the "message too large from server" bug.
Can we download and test that version? Or should we wait for more wrinkles to be ironed out?

All times are UTC. The time now is 16:17.

Mon Sep 26 16:17:35 UTC 2022 up 39 days, 13:46, 1 user, load averages: 2.65, 2.45, 2.29