20230421, 01:59  #45 
If I May
"Chris Halsall"
Sep 2002
Barbados
2^{2}·5·571 Posts 

20230421, 12:45  #46  
Dec 2022
2×3×7×13 Posts 
Quote:
I immediately ask myself it is changes optimum bounds. Clearly, it can't change the optimum B2 for a fixed B1  but should B1 remain fixed? I think the answer is, theoretically, no. As the average run time is reduces for the same factor probability, the marginal benefit should favor increasing B1 (and so also B2, we can take a first approximation that B2/B1 should be fixed). That is small, though, as only if a factor is actually found in stage 2 does it come into action. Mental calculation beats James's calculator for differences this small  simply work out the time saved and the increased factor probablity with bounds  I get a rough estimate that the increase should be about 1.5% for your conditions  not really enough to worry about, and smaller than the effect of the extra bit of TF you do over the 'target'. 

20230423, 16:40  #47 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
8,069 Posts 
IIRC, the OBD TF target of 92 bits was chosen based on the high GHD demand making use of recent NVIDIA GPUs likely, which have high TF performance and extremely low DP/SP performance ratio such as 1/64, which pushes the optimal use of such GPUs to higher TF bits than the usual target. And OBD TF software was ready, long before OBD P1 software was.
Improvements in efficiency of a P1 stage will increase the optimal bound of that stage, and affect the optimal bound for the other stage slightly. Addition of polynomial multiplication in prime95 improved its stage 2 efficiency considerably, drastically increasing the optimal stage 2 bound and somewhat increasing the optimal stage 1 bound. Addition of polynomial multiplication to gpuowl stage 2 is unlikely, since most GPUs lack sufficient onboard ram to make good use of it. Addition of polynomial multiplication to Mlucas could help OBD, but I don't expect that to occur within the year or longer, if ever. Extension of prime95 to much larger fft lengths (~192M) would bring polynomial multiplication accelerated P1 stage 2 to OBD, but that also is not expected to occur any time soon. Last fiddled with by kriesel on 20230423 at 16:41 
20230425, 12:45  #48 
Dec 2022
2·3·7·13 Posts 
I am sure your decision to TF to 92 was sensibly based on your GPU capability, yes, so that's not in question. Although the periodic GCDs seem to make _stage 2_ more efficient, they do not actually raise the optimum B2/B1  as I omitted a proof I give this: fixing B1, the optimal B2 is when the marginal increase in factors found falls below the marginal cost of a longer stage 2 (compared with the primality test), and neither is changed by the periodic GCDs. It is, as I said, rather raising optimal B1 that is the primary effect.

20230425, 14:26  #49 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
8069_{10} Posts 
OBD TF to 92 bits might be 1 bit too high to be computeeffortoptimal. But it may be closer to calendar time optimal.
Calendar could be compressed further by doing the last bit of TF in parallel with the previous bits and the start of P1 on stage 1, on 3 distinct runs & hardware, at some risk of wasted effort if one of them finds a factor. Periodic GCD in P1 stage 2 reduces stage2 cost when a factor is found early. Optimal P1 bounds are whatever maximizes the probable savings in total compute time per exponent on the average. I think that will be where the partial derivatives of total compute time with B1 or B2 are zero, within the constraint of allowable ram. Mersenne.ca target bounds probably do not reflect the effect of periodic stage 2 GCD, at least within the 01G range or prime95 range ~1.169G, if not 10G. Early termination on factor found would I think cause slightly higher B2 to be optimal, by lowering average cost up to a specified B2. Finding optimal B1,B2 for the Mlucas case of early and often stage 2 GCD is more complex than for the prime95, gpuowl, cudapm1 etc case of performing stage 2 GCD once at the end. I note that in GIMPS software currently, GCD is a singlecore computation. To my knowledge no one has implemented for GIMPS software a multithreaded GCD algorithm. This can be confirmed by viewing source code, by cpu core utilization during GCDs, and by observing a increase in performance of prime95 when running with Mlucas during Mlucas GCDs. Gpuowl: stage 1 GCD is done on cpu in parallel (as a separate thread) with speculative start of stage 2 on the GPU; stage 2 GCD is done in parallel (as a separate thread) with the next worktodo item if available occupying the GPU. prime95: GCD is done sequentially with other work for the worker, so n1 cores go idle for the duration of the GCD on an n core worker. Mlucas: GCD is done sequentially, so n1 cores go idle for the duration of the GCD in an ncore instance. CUDAPm1: GCDs are done on one cpu core, so the GPU goes idle for the duration of the GCD. For currently plausible numbers of cpu cores, the potential performance gain from parallelizing GCD seems small. https://www.wisdom.weizmann.ac.il/~oded/PDF/gcd.pdf https://oaciss.uoregon.edu/icpp18/pu...23s2file1.pdf Last fiddled with by kriesel on 20230425 at 14:31 
20230429, 12:43  #50 
Dec 2022
2×3×7×13 Posts 
This is the correct rule and how I drew my conclusion. The extra GCDs _do not_ change the derivative with B2, as the new factors found with a marginal increase in it occur at the end of stage 2 where there's no time saving. They _do_ change that with B1, because the numerator (factor probability) remains the same, while the denominator (time) falls. It is still nothing to worry about as P1 efficiency is a very flat plateau.

20230703, 17:18  #51 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
8,069 Posts 
The additional stage 2 GCDs have nonzero cost and nonzero probable benefit to overall compute time. If chosen correctly they can reduce average cost of stage 2 to a given bound, by up to perhaps ~1/2 stage 2 compute time x probability of stage 2 factor found, changing the totalcomputetime (of P1 and PRP) cost function vs. bounds, and therefore likely changing the averageprobablecomputetimeoptimal B2 (slightly). Optimal B2 increased greatly with recent efficiency improvements in prime95. The additional cost saving due to additional stage 2 GCDs is a small effect; very roughly 3% stage 2 factor probability, times ~40% stage 2 run time, times ~2.5% ratio P1 time/prp time overall, times ~1/2 stage 2 as a total of P1 run time, .03 0.4 /40 /2 ~ 0.015% of PRP time, or 0.6% of P1 time. Number of GCDs is an additional optimization variable. GCD time scales more slowly with exponent than PRP time does, so optimization would favor more GCDs at larger exponent.
The benefit was real enough & implementation straightforward enough for Ernst Mayer to implement interim stage 2 GCDs in Mlucas. (~0.6% of a year on F33 or OBD depending on hardware is ~2.2 days saved per run on average.) Given a choice to think Ernst got it wrong, or Andrew Usher, I'll bet on Ernst being right. He's been at this for decades, publicly. The payoff functions in total probable compute time over many exponents of unknown factorability other than TF history, could vary between polynomial multiplication implementation of P1 and other implementations. Which is to say, the cost of additional GCDs might make sense in Mlucas P1 implementation and not in v30.8 or later mprime's polynomial multiplication implementation. I would also trust George to get that call right, as well as prioritizing other larger efficiency or reliability improvements. Just yesterday Gerbicz posted about GEC applicability to ECM stage 1, which might appeal to George as a future enhancement. Last fiddled with by kriesel on 20230703 at 17:26 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
A couple of 15e candidates  fivemack  NFS@Home  1  20141130 07:52 
How to calculate FFT lengths of candidates  pepi37  Riesel Prime Search  8  20140417 20:51 
No available candidates on server  japelprime  Prime Sierpinski Project  2  20111228 07:38 
Adding New Candidates  wblipp  Operation Billion Digits  6  20110410 17:45 
new candidates for M...46 and M48  cochet  Miscellaneous Math  4  20081024 14:33 