mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2021-09-30, 20:05   #452
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

23·7·137 Posts
Default

Quote:
Originally Posted by ixfd64 View Post
Any chance we could write PRP results to results.txt too?
Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1"
Prime95 is offline   Reply With Quote
Old 2021-09-30, 20:31   #453
chalsall
If I May
 
chalsall's Avatar
 
"Chris Halsall"
Sep 2002
Barbados

234678 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
If by pretty-print you mean presenting JSON over multiple lines with indenting and such then no, as this will break manual results which is based on the assumption that one-line=one-result.
IMHO, JSON was designed for machines, not humans.

For the latter tools are available, for development, testing, and QA purposes.

Code:
chalsall@hobbit:~$ echo '{"hello":"World"}' | jq
{
  "hello": "World"
}

Last fiddled with by chalsall on 2021-09-30 at 20:34 Reason: s/, humans/, not humans/; # Inverses should *not* be inferred...
chalsall is offline   Reply With Quote
Old 2021-09-30, 22:40   #454
ixfd64
Bemusing Prompter
 
ixfd64's Avatar
 
"Danny"
Dec 2002
California

32×269 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1"
Yes, that's exactly what I was looking for. I checked undoc.txt and can't believe I didn't see these. Thank you!

Last fiddled with by ixfd64 on 2021-09-30 at 22:41
ixfd64 is offline   Reply With Quote
Old 2021-10-01, 12:27   #455
Zhangrc
 
"University student"
May 2021
Beijing, China

53 Posts
Default

Quote:
Originally Posted by kriesel View Post
Assuming the ~p2.1 scaling also applies to GCD operations
The time complexity is almost linear logarithmic, O(log(ab)).
Code:
int gcd(int x,int y )
{
    if(x < y) return gcd(y,x);  // x>y
    if( y == 0) return x;  // if y=0, x is GCD 
    else
    {
         if( !(x%2) )
         {                 
           if( !(y%2) )  //x,y both even
               return 2*gcd(x >> 1, y >> 1);    
           else      // x is even, y is odd
               return gcd(x >> 1, y );  
         }
         else 
         {
           if( !(y%2) )  // x is  odd,  y is even
               return gcd(x, y >> 1);
           else       // x, y both odd
               return gcd(y,x-y); 
         }
    }
}
and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step.

Last fiddled with by Zhangrc on 2021-10-01 at 12:30
Zhangrc is offline   Reply With Quote
Old 2021-10-01, 14:44   #456
tha
 
tha's Avatar
 
Dec 2002

14738 Posts
Default

What cause of action would be recommended for this core segmentation fault? Nothing else was running at the time.

Code:
[Comm thread Oct 1 13:40] Sending result to server: UID: Tha/Z-170, M9262289 completed P-1, B1=2000000, B2=146000000, Wi8: E7182B42
[Comm thread Oct 1 13:40] 
[Work thread Oct 1 13:40] 
[Work thread Oct 1 13:40] P-1 on M9194659 with B1=5000000, B2=TBD
[Work thread Oct 1 13:40] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Oct 1 13:40] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads
[Work thread Oct 1 13:40] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Oct 1 13:40] Setting affinity to run helper thread 3 on CPU core #4
[Comm thread Oct 1 13:40] PrimeNet success code with additional info:
[Comm thread Oct 1 13:40] CPU credit is 3.9457 GHz-days.
[Comm thread Oct 1 13:40] Done communicating with server.
[Work thread Oct 1 13:48] M9194659 stage 1 is 13.85% complete. Time: 458.543 sec.
[Work thread Oct 1 13:56] M9194659 stage 1 is 27.71% complete. Time: 454.645 sec.
[Work thread Oct 1 14:03] M9194659 stage 1 is 41.58% complete. Time: 458.479 sec.
[Work thread Oct 1 14:11] M9194659 stage 1 is 55.44% complete. Time: 456.359 sec.
[Work thread Oct 1 14:18] M9194659 stage 1 is 69.30% complete. Time: 460.225 sec.
[Work thread Oct 1 14:26] M9194659 stage 1 is 83.16% complete. Time: 436.106 sec.
[Work thread Oct 1 14:33] M9194659 stage 1 is 97.02% complete. Time: 436.115 sec.
[Work thread Oct 1 14:35] M9194659 stage 1 complete. 14429932 transforms. Time: 3254.242 sec.
[Work thread Oct 1 14:35] With trial factoring done to 2^71, optimal B2 is 91*B1 = 455000000.
[Work thread Oct 1 14:35] If no prior P-1, chance of a new factor is 8.92%
[Work thread Oct 1 14:35] D: 1050, relative primes: 2848, stage 2 primes: 23755332, pair%=94.69
[Work thread Oct 1 14:35] D: 1050, relative primes: 2819, stage 2 primes: 23755332, pair%=94.61
[Work thread Oct 1 14:35] Using 11021MB of memory.
[Work thread Oct 1 14:35] Stage 2 init complete. 27785 transforms. Time: 16.280 sec.
[Work thread Oct 1 14:44] M9194659 stage 2 is 7.41% complete. Time: 526.239 sec.
[Work thread Oct 1 14:52] M9194659 stage 2 is 14.96% complete. Time: 526.555 sec.
[Work thread Oct 1 15:01] M9194659 stage 2 is 22.61% complete. Time: 526.607 sec.
[Work thread Oct 1 15:10] M9194659 stage 2 is 30.29% complete. Time: 526.931 sec.
[Work thread Oct 1 15:19] M9194659 stage 2 is 37.96% complete. Time: 526.598 sec.
[Work thread Oct 1 15:27] M9194659 stage 2 is 45.63% complete. Time: 526.763 sec.
[Work thread Oct 1 15:36] M9194659 stage 2 is 53.28% complete. Time: 526.889 sec.
[Work thread Oct 1 15:45] M9194659 stage 2 is 60.82% complete. Time: 526.498 sec.
[Work thread Oct 1 15:54] M9194659 stage 2 is 68.22% complete. Time: 526.293 sec.
[Work thread Oct 1 16:03] M9194659 stage 2 is 75.58% complete. Time: 526.346 sec.
[Work thread Oct 1 16:11] M9194659 stage 2 is 82.96% complete. Time: 532.123 sec.
[Work thread Oct 1 16:20] M9194659 stage 2 is 90.33% complete. Time: 526.133 sec.
[Work thread Oct 1 16:29] M9194659 stage 2 is 97.73% complete. Time: 526.361 sec.
[Work thread Oct 1 16:32] M9194659 stage 2 complete. 26612490 transforms. Time: 7011.609 sec.
[Work thread Oct 1 16:32] Stage 2 GCD complete. Time: 1.302 sec.
[Work thread Oct 1 16:32] P-1 found a factor in stage #2, B1=5000000, B2=455000000.
[Work thread Oct 1 16:32] M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000)
[Comm thread Oct 1 16:32] Sending result to server: UID: Tha/Z-170, M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000)
[Comm thread Oct 1 16:32] 
[Work thread Oct 1 16:32] 
[Work thread Oct 1 16:32] P-1 on M9262387 with B1=2000000, B2=TBD
[Work thread Oct 1 16:32] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Oct 1 16:32] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads
[Work thread Oct 1 16:32] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Oct 1 16:32] Setting affinity to run helper thread 3 on CPU core #4
[Comm thread Oct 1 16:32] PrimeNet success code with additional info:
[Comm thread Oct 1 16:32] Composite factor 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 = 367786361 * 201803650751527169 * 267438738796927 * 101994518728598164687111 * 441343633 * 422426099229769 * 23010527886744119 * 62376566657 * 2185975009297
[Comm thread Oct 1 16:32] Already have factor 367786361 for M9194659
[Comm thread Oct 1 16:32] Already have factor 201803650751527169 for M9194659
[Comm thread Oct 1 16:32] Already have factor 267438738796927 for M9194659
[Comm thread Oct 1 16:32] Already have factor 441343633 for M9194659
[Comm thread Oct 1 16:32] Already have factor 422426099229769 for M9194659
[Comm thread Oct 1 16:32] Already have factor 23010527886744119 for M9194659
[Comm thread Oct 1 16:32] Already have factor 62376566657 for M9194659
[Comm thread Oct 1 16:32] Already have factor 2185975009297 for M9194659
[Comm thread Oct 1 16:32] CPU credit is 11.8294 GHz-days.
Segmentation fault (core dumped)
henk@Z170:~/mersenne$ ^C
Code:
Linux64,Prime95,v30.6,build 3

Your choice: 14

Consult readme.txt prior to changing any of these settings.

Temporary disk space limit in GB/worker (6.000000): 
Daytime P-1/P+1/ECM stage 2 memory in GB (10.800000): 
Nighttime P-1/P+1/ECM stage 2 memory in GB (10.800000): 
Upload bandwidth limit in Mbps (5.000000): 
Upload large files time period start (00:00): 
Upload large files time period end (24:00): 
Download limit for certification work in MB/day (400): 
Skip advanced resource settings (Y):
tha is offline   Reply With Quote
Old 2021-10-01, 14:49   #457
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

19·311 Posts
Default Parallelism in GCD computation; runtime order

Quote:
Originally Posted by Zhangrc View Post
The time complexity is almost linear logarithmic, O(log(ab)).
and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step.
Thanks for that for single-precision operands. The larger operand would be Mp. The smaller would be ~Mp/2. o(log Mp2/2) would be o(p2). https://stackoverflow.com/questions/...clid-algorithm

Now consider the usual GIMPS case where one of the operands is Mp, p~108, requiring millions of words to store, not just one, and even simple operations may take time ~proportional to p to execute once. The other starting operand is 3large_power Mod Mp, so ~p-1 bits large on average. Also note that potential factors are typically >264 before P-1 is attempted, due to prior TF depth, so the final operands are likely multi-word also. Even assuming the operands are stored in multi-word arrays of >1M packed 64-bit-unsigned initially, one can imagine dividing x>>1 into some small m parallel threads for most of the work, and alternatively computing a new least significant word and bit offset and mask. I don't see how you avoid at least proportional to p iterations, since the algorithm posted removes one bit out of p bits per pass. What is the run time for multiprecision general subtraction, o(n) each where n is the number of bits in the lesser operand?

Anything less than ~p2.1 scaling (really O(p2 log p log log p)) makes my case stronger, that scaling downward from larger exponents to the wavefront exponents indicates significant single-core-GCD time at the wavefront.

GIMPS code parallelizes individual iterations of PRP or LL or P-1 powering. The fact that one iteration depends on the previous does not preclude parallelizing in iterations or individual operations within iterations, and gaining considerably in speed with available numbers of processor cores. (Four cores / task optimal throughput is routine in CPU-based code prime95, Mlucas. GPU code uses much higher parallelism for the same algorithms.)

Making use of parallelism to speed individual iterations of GCD is possible, as a quick web search reveals: https://www.sciencedirect.com/scienc...70866707000585. No one's coded it yet for GIMPS, choosing to focus optimization efforts on larger portions of the P-1 factoring computation first. That paper gives o(n / log n) for n1+e processors "where n is the bit-length of the larger input" which is p for Mp. That is a LOT of processors. GPUs have lots of processors, but not that many. Converting that back to sequential not parallel gives o(p2+e/log p).

There are probably slight savings to be had based on knowledge of Mp and its possible factors always being odd, and knowing an Mp-oriented GCD can always be called with a specific operand order, avoiding the initial x>y test.

The order of GMP's GCD is given in https://oaciss.uoregon.edu/icpp18/pu...23s2-file1.pdf as
Quote:
The GCD algorithms used by
GMP for large input are essentiallyO(N^(1+ε) logN) for a fairly large
value ofε[6, section 15.3.3]
(An interesting paper in its own right from 2018, extrapolating that modular integers may be useful to compute GCDs faster using multiple GPUs.)

http://www.iaeng.org/IJCS/issues_v42...CS_42_4_01.pdf is a somewhat different approach, where many GCDs are done in parallel on CPU or GPU.

Last fiddled with by kriesel on 2021-10-01 at 15:27
kriesel is online now   Reply With Quote
Old 2021-10-01, 15:19   #458
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

23×7×137 Posts
Default

Quote:
Originally Posted by tha View Post
What cause of action would be recommended for this core segmentation fault? Nothing else was running at the time.
Probably already fixed (in 30.7). Looks like the "message too large from server" bug.
Prime95 is offline   Reply With Quote
Old 2021-10-02, 03:05   #459
Zhangrc
 
"University student"
May 2021
Beijing, China

53 Posts
Default

Quote:
Originally Posted by kriesel View Post
o(log Mp2/2) would be o(p2).
No because log(ab)=log(a)+log(b). O(log Mp2/2) is O(log 22p-1) = O(2p-1).
Zhangrc is offline   Reply With Quote
Old 2021-10-02, 14:51   #460
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

19×311 Posts
Default

Quote:
Originally Posted by Zhangrc View Post
No.
In gmp, & gmplib, which IIRC is used in prime95 / mprime, gpuowl, and now also Mlucas v20.x, P-1 GCD (and probably P+1 GCD in mprime/ prime95), per https://gmplib.org/manual/Nomenclature-and-Types
"A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits."

https://gmplib.org/manual/Binary-GCD
"At small sizes GMP uses an O(N^2) binary style GCD." where N is number of limbs per operand.
"Currently, the binary algorithm is used for GCD only when N < 3." So x,y each less than 129 bits at most.

https://gmplib.org/manual/Lehmer_0027s-Algorithm
Per iteration, reduces inputs by almost a word size.
"The resulting algorithm is asymptotically O(N^2), just as the Euclidean algorithm and the binary algorithm." where N is number of limbs per operand.
For Mp of interest to GIMPS wavefront activity, N= p/limbsize = ~10^8/64 or /32.

https://gmplib.org/manual/Subquadratic-GCD says
"For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function, as a generalization to Lehmer’s algorithm."
"The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two N-limb numbers." and where N is number of limbs per operand.
M(Mp) is O(p log p log log p) as per Knuth and many other sources for large operands using fft methods. Log2 Mp is ~p; N is p/64 or p/32 depending on unsigned-64-bit or 32-bit in gmp; log N is log (p-64 or p-32) = ~log p for large p such as the >10^8 bit operands common in GIMPS first test or preparatory P-1.
So O(M(N)*log(N)) ~ O(p log p log log p * log p) = O(p (log p)^2 log log p)

(Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)

https://www.ams.org/journals/mcom/20...17-0/home.html confirms asymptotic run time O(n (log n)^2 log log n) where n is number of bits of an operand.

Last fiddled with by kriesel on 2021-10-02 at 14:58
kriesel is online now   Reply With Quote
Old 2021-10-02, 18:26   #461
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

1011111102 Posts
Default

Quote:
Originally Posted by kriesel View Post
(Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)
I wrote some of that documentation :)

You can find thresholds at https://gmplib.org/devel/thres/
GCD_DC_THRESHOLD seems to be around 300-400 depending on platform.

Last fiddled with by SethTro on 2021-10-02 at 18:26
SethTro is offline   Reply With Quote
Old 2021-10-02, 20:12   #462
tha
 
tha's Avatar
 
Dec 2002

827 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Probably already fixed (in 30.7). Looks like the "message too large from server" bug.
Can we download and test that version? Or should we wait for more wrinkles to be ironed out?
tha is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 14:03.


Sat Nov 27 14:03:01 UTC 2021 up 127 days, 8:32, 0 users, load averages: 0.59, 0.86, 0.96

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.