20210930, 20:05  #452 
P90 years forever!
Aug 2002
Yeehaw, FL
8084_{10} Posts 

20210930, 20:31  #453  
If I May
"Chris Halsall"
Sep 2002
Barbados
2×3^{4}×67 Posts 
Quote:
For the latter tools are available, for development, testing, and QA purposes. Code:
chalsall@hobbit:~$ echo '{"hello":"World"}'  jq { "hello": "World" } Last fiddled with by chalsall on 20210930 at 20:34 Reason: s/, humans/, not humans/; # Inverses should *not* be inferred... 

20210930, 22:40  #454  
Bemusing Prompter
"Danny"
Dec 2002
California
3·829 Posts 
Quote:
Last fiddled with by ixfd64 on 20210930 at 22:41 

20211001, 12:27  #455 
"University student"
May 2021
Beijing, China
269_{10} Posts 
The time complexity is almost linear logarithmic, O(log(ab)).
Code:
int gcd(int x,int y ) { if(x < y) return gcd(y,x); // x>y if( y == 0) return x; // if y=0, x is GCD else { if( !(x%2) ) { if( !(y%2) ) //x,y both even return 2*gcd(x >> 1, y >> 1); else // x is even, y is odd return gcd(x >> 1, y ); } else { if( !(y%2) ) // x is odd, y is even return gcd(x, y >> 1); else // x, y both odd return gcd(y,xy); } } } Last fiddled with by Zhangrc on 20211001 at 12:30 
20211001, 14:44  #456 
Dec 2002
1534_{8} Posts 
What cause of action would be recommended for this core segmentation fault? Nothing else was running at the time.
Code:
[Comm thread Oct 1 13:40] Sending result to server: UID: Tha/Z170, M9262289 completed P1, B1=2000000, B2=146000000, Wi8: E7182B42 [Comm thread Oct 1 13:40] [Work thread Oct 1 13:40] [Work thread Oct 1 13:40] P1 on M9194659 with B1=5000000, B2=TBD [Work thread Oct 1 13:40] Setting affinity to run helper thread 1 on CPU core #2 [Work thread Oct 1 13:40] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads [Work thread Oct 1 13:40] Setting affinity to run helper thread 2 on CPU core #3 [Work thread Oct 1 13:40] Setting affinity to run helper thread 3 on CPU core #4 [Comm thread Oct 1 13:40] PrimeNet success code with additional info: [Comm thread Oct 1 13:40] CPU credit is 3.9457 GHzdays. [Comm thread Oct 1 13:40] Done communicating with server. [Work thread Oct 1 13:48] M9194659 stage 1 is 13.85% complete. Time: 458.543 sec. [Work thread Oct 1 13:56] M9194659 stage 1 is 27.71% complete. Time: 454.645 sec. [Work thread Oct 1 14:03] M9194659 stage 1 is 41.58% complete. Time: 458.479 sec. [Work thread Oct 1 14:11] M9194659 stage 1 is 55.44% complete. Time: 456.359 sec. [Work thread Oct 1 14:18] M9194659 stage 1 is 69.30% complete. Time: 460.225 sec. [Work thread Oct 1 14:26] M9194659 stage 1 is 83.16% complete. Time: 436.106 sec. [Work thread Oct 1 14:33] M9194659 stage 1 is 97.02% complete. Time: 436.115 sec. [Work thread Oct 1 14:35] M9194659 stage 1 complete. 14429932 transforms. Time: 3254.242 sec. [Work thread Oct 1 14:35] With trial factoring done to 2^71, optimal B2 is 91*B1 = 455000000. [Work thread Oct 1 14:35] If no prior P1, chance of a new factor is 8.92% [Work thread Oct 1 14:35] D: 1050, relative primes: 2848, stage 2 primes: 23755332, pair%=94.69 [Work thread Oct 1 14:35] D: 1050, relative primes: 2819, stage 2 primes: 23755332, pair%=94.61 [Work thread Oct 1 14:35] Using 11021MB of memory. [Work thread Oct 1 14:35] Stage 2 init complete. 27785 transforms. Time: 16.280 sec. [Work thread Oct 1 14:44] M9194659 stage 2 is 7.41% complete. Time: 526.239 sec. [Work thread Oct 1 14:52] M9194659 stage 2 is 14.96% complete. Time: 526.555 sec. [Work thread Oct 1 15:01] M9194659 stage 2 is 22.61% complete. Time: 526.607 sec. [Work thread Oct 1 15:10] M9194659 stage 2 is 30.29% complete. Time: 526.931 sec. [Work thread Oct 1 15:19] M9194659 stage 2 is 37.96% complete. Time: 526.598 sec. [Work thread Oct 1 15:27] M9194659 stage 2 is 45.63% complete. Time: 526.763 sec. [Work thread Oct 1 15:36] M9194659 stage 2 is 53.28% complete. Time: 526.889 sec. [Work thread Oct 1 15:45] M9194659 stage 2 is 60.82% complete. Time: 526.498 sec. [Work thread Oct 1 15:54] M9194659 stage 2 is 68.22% complete. Time: 526.293 sec. [Work thread Oct 1 16:03] M9194659 stage 2 is 75.58% complete. Time: 526.346 sec. [Work thread Oct 1 16:11] M9194659 stage 2 is 82.96% complete. Time: 532.123 sec. [Work thread Oct 1 16:20] M9194659 stage 2 is 90.33% complete. Time: 526.133 sec. [Work thread Oct 1 16:29] M9194659 stage 2 is 97.73% complete. Time: 526.361 sec. [Work thread Oct 1 16:32] M9194659 stage 2 complete. 26612490 transforms. Time: 7011.609 sec. [Work thread Oct 1 16:32] Stage 2 GCD complete. Time: 1.302 sec. [Work thread Oct 1 16:32] P1 found a factor in stage #2, B1=5000000, B2=455000000. [Work thread Oct 1 16:32] M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P1, B1=5000000, B2=455000000) [Comm thread Oct 1 16:32] Sending result to server: UID: Tha/Z170, M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P1, B1=5000000, B2=455000000) [Comm thread Oct 1 16:32] [Work thread Oct 1 16:32] [Work thread Oct 1 16:32] P1 on M9262387 with B1=2000000, B2=TBD [Work thread Oct 1 16:32] Setting affinity to run helper thread 1 on CPU core #2 [Work thread Oct 1 16:32] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads [Work thread Oct 1 16:32] Setting affinity to run helper thread 2 on CPU core #3 [Work thread Oct 1 16:32] Setting affinity to run helper thread 3 on CPU core #4 [Comm thread Oct 1 16:32] PrimeNet success code with additional info: [Comm thread Oct 1 16:32] Composite factor 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 = 367786361 * 201803650751527169 * 267438738796927 * 101994518728598164687111 * 441343633 * 422426099229769 * 23010527886744119 * 62376566657 * 2185975009297 [Comm thread Oct 1 16:32] Already have factor 367786361 for M9194659 [Comm thread Oct 1 16:32] Already have factor 201803650751527169 for M9194659 [Comm thread Oct 1 16:32] Already have factor 267438738796927 for M9194659 [Comm thread Oct 1 16:32] Already have factor 441343633 for M9194659 [Comm thread Oct 1 16:32] Already have factor 422426099229769 for M9194659 [Comm thread Oct 1 16:32] Already have factor 23010527886744119 for M9194659 [Comm thread Oct 1 16:32] Already have factor 62376566657 for M9194659 [Comm thread Oct 1 16:32] Already have factor 2185975009297 for M9194659 [Comm thread Oct 1 16:32] CPU credit is 11.8294 GHzdays. Segmentation fault (core dumped) henk@Z170:~/mersenne$ ^C Code:
Linux64,Prime95,v30.6,build 3 Your choice: 14 Consult readme.txt prior to changing any of these settings. Temporary disk space limit in GB/worker (6.000000): Daytime P1/P+1/ECM stage 2 memory in GB (10.800000): Nighttime P1/P+1/ECM stage 2 memory in GB (10.800000): Upload bandwidth limit in Mbps (5.000000): Upload large files time period start (00:00): Upload large files time period end (24:00): Download limit for certification work in MB/day (400): Skip advanced resource settings (Y): 
20211001, 14:49  #457  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×3×37 Posts 
Parallelism in GCD computation; runtime order
Quote:
Now consider the usual GIMPS case where one of the operands is Mp, p~10^{8}, requiring millions of words to store, not just one, and even simple operations may take time ~proportional to p to execute once. The other starting operand is 3^{large_power} Mod Mp, so ~p1 bits large on average. Also note that potential factors are typically >2^{64} before P1 is attempted, due to prior TF depth, so the final operands are likely multiword also. Even assuming the operands are stored in multiword arrays of >1M packed 64bitunsigned initially, one can imagine dividing x>>1 into some small m parallel threads for most of the work, and alternatively computing a new least significant word and bit offset and mask. I don't see how you avoid at least proportional to p iterations, since the algorithm posted removes one bit out of p bits per pass. What is the run time for multiprecision general subtraction, o(n) each where n is the number of bits in the lesser operand? Anything less than ~p^{2.1} scaling (really O(p^{2} log p log log p)) makes my case stronger, that scaling downward from larger exponents to the wavefront exponents indicates significant singlecoreGCD time at the wavefront. GIMPS code parallelizes individual iterations of PRP or LL or P1 powering. The fact that one iteration depends on the previous does not preclude parallelizing in iterations or individual operations within iterations, and gaining considerably in speed with available numbers of processor cores. (Four cores / task optimal throughput is routine in CPUbased code prime95, Mlucas. GPU code uses much higher parallelism for the same algorithms.) Making use of parallelism to speed individual iterations of GCD is possible, as a quick web search reveals: https://www.sciencedirect.com/scienc...70866707000585. No one's coded it yet for GIMPS, choosing to focus optimization efforts on larger portions of the P1 factoring computation first. That paper gives o(n / log n) for n^{1+e} processors "where n is the bitlength of the larger input" which is p for Mp. That is a LOT of processors. GPUs have lots of processors, but not that many. Converting that back to sequential not parallel gives o(p^{2+e}/log p). There are probably slight savings to be had based on knowledge of Mp and its possible factors always being odd, and knowing an Mporiented GCD can always be called with a specific operand order, avoiding the initial x>y test. The order of GMP's GCD is given in https://oaciss.uoregon.edu/icpp18/pu...23s2file1.pdf as Quote:
http://www.iaeng.org/IJCS/issues_v42...CS_42_4_01.pdf is a somewhat different approach, where many GCDs are done in parallel on CPU or GPU. Last fiddled with by kriesel on 20211001 at 15:27 

20211001, 15:19  #458 
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 

20211002, 03:05  #459 
"University student"
May 2021
Beijing, China
269 Posts 

20211002, 14:51  #460 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×3×37 Posts 
In gmp, & gmplib, which IIRC is used in prime95 / mprime, gpuowl, and now also Mlucas v20.x, P1 GCD (and probably P+1 GCD in mprime/ prime95), per https://gmplib.org/manual/NomenclatureandTypes
"A limb means the part of a multiprecision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits." https://gmplib.org/manual/BinaryGCD "At small sizes GMP uses an O(N^2) binary style GCD." where N is number of limbs per operand. "Currently, the binary algorithm is used for GCD only when N < 3." So x,y each less than 129 bits at most. https://gmplib.org/manual/Lehmer_0027sAlgorithm Per iteration, reduces inputs by almost a word size. "The resulting algorithm is asymptotically O(N^2), just as the Euclidean algorithm and the binary algorithm." where N is number of limbs per operand. For Mp of interest to GIMPS wavefront activity, N= p/limbsize = ~10^8/64 or /32. https://gmplib.org/manual/SubquadraticGCD says "For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function, as a generalization to Lehmer’s algorithm." "The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two Nlimb numbers." and where N is number of limbs per operand. M(Mp) is O(p log p log log p) as per Knuth and many other sources for large operands using fft methods. Log2 Mp is ~p; N is p/64 or p/32 depending on unsigned64bit or 32bit in gmp; log N is log (p64 or p32) = ~log p for large p such as the >10^8 bit operands common in GIMPS first test or preparatory P1. So O(M(N)*log(N)) ~ O(p log p log log p * log p) = O(p (log p)^2 log log p) (Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?) https://www.ams.org/journals/mcom/20...170/home.html confirms asymptotic run time O(n (log n)^2 log log n) where n is number of bits of an operand. Last fiddled with by kriesel on 20211002 at 14:58 
20211002, 18:26  #461  
"Seth"
Apr 2019
1D9_{16} Posts 
Quote:
You can find thresholds at https://gmplib.org/devel/thres/ GCD_DC_THRESHOLD seems to be around 300400 depending on platform. Last fiddled with by SethTro on 20211002 at 18:26 

20211002, 20:12  #462 
Dec 2002
2^{2}·5·43 Posts 
