![]() |
![]() |
#45 |
If I May
"Chris Halsall"
Sep 2002
Barbados
1129110 Posts |
![]() |
![]() |
![]() |
![]() |
#46 | |
Dec 2022
3×5×29 Posts |
![]() Quote:
I immediately ask myself it is changes optimum bounds. Clearly, it can't change the optimum B2 for a fixed B1 - but should B1 remain fixed? I think the answer is, theoretically, no. As the average run time is reduces for the same factor probability, the marginal benefit should favor increasing B1 (and so also B2, we can take a first approximation that B2/B1 should be fixed). That is small, though, as only if a factor is actually found in stage 2 does it come into action. Mental calculation beats James's calculator for differences this small - simply work out the time saved and the increased factor probablity with bounds - I get a rough estimate that the increase should be about 1.5% for your conditions - not really enough to worry about, and smaller than the effect of the extra bit of TF you do over the 'target'. |
|
![]() |
![]() |
![]() |
#47 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11110000110002 Posts |
![]()
IIRC, the OBD TF target of 92 bits was chosen based on the high GHD demand making use of recent NVIDIA GPUs likely, which have high TF performance and extremely low DP/SP performance ratio such as 1/64, which pushes the optimal use of such GPUs to higher TF bits than the usual target. And OBD TF software was ready, long before OBD P-1 software was.
Improvements in efficiency of a P-1 stage will increase the optimal bound of that stage, and affect the optimal bound for the other stage slightly. Addition of polynomial multiplication in prime95 improved its stage 2 efficiency considerably, drastically increasing the optimal stage 2 bound and somewhat increasing the optimal stage 1 bound. Addition of polynomial multiplication to gpuowl stage 2 is unlikely, since most GPUs lack sufficient onboard ram to make good use of it. Addition of polynomial multiplication to Mlucas could help OBD, but I don't expect that to occur within the year or longer, if ever. Extension of prime95 to much larger fft lengths (~192M) would bring polynomial multiplication accelerated P-1 stage 2 to OBD, but that also is not expected to occur any time soon. Last fiddled with by kriesel on 2023-04-23 at 16:41 |
![]() |
![]() |
![]() |
#48 |
Dec 2022
3·5·29 Posts |
![]()
I am sure your decision to TF to 92 was sensibly based on your GPU capability, yes, so that's not in question. Although the periodic GCDs seem to make _stage 2_ more efficient, they do not actually raise the optimum B2/B1 - as I omitted a proof I give this: fixing B1, the optimal B2 is when the marginal increase in factors found falls below the marginal cost of a longer stage 2 (compared with the primality test), and neither is changed by the periodic GCDs. It is, as I said, rather raising optimal B1 that is the primary effect.
|
![]() |
![]() |
![]() |
#49 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
23×32×107 Posts |
![]()
OBD TF to 92 bits might be 1 bit too high to be compute-effort-optimal. But it may be closer to calendar time optimal.
Calendar could be compressed further by doing the last bit of TF in parallel with the previous bits and the start of P-1 on stage 1, on 3 distinct runs & hardware, at some risk of wasted effort if one of them finds a factor. Periodic GCD in P-1 stage 2 reduces stage2 cost when a factor is found early. Optimal P-1 bounds are whatever maximizes the probable savings in total compute time per exponent on the average. I think that will be where the partial derivatives of total compute time with B1 or B2 are zero, within the constraint of allowable ram. Mersenne.ca target bounds probably do not reflect the effect of periodic stage 2 GCD, at least within the 0-1G range or prime95 range ~1.169G, if not 10G. Early termination on factor found would I think cause slightly higher B2 to be optimal, by lowering average cost up to a specified B2. Finding optimal B1,B2 for the Mlucas case of early and often stage 2 GCD is more complex than for the prime95, gpuowl, cudapm1 etc case of performing stage 2 GCD once at the end. I note that in GIMPS software currently, GCD is a single-core computation. To my knowledge no one has implemented for GIMPS software a multithreaded GCD algorithm. This can be confirmed by viewing source code, by cpu core utilization during GCDs, and by observing a increase in performance of prime95 when running with Mlucas during Mlucas GCDs. Gpuowl: stage 1 GCD is done on cpu in parallel (as a separate thread) with speculative start of stage 2 on the GPU; stage 2 GCD is done in parallel (as a separate thread) with the next worktodo item if available occupying the GPU. prime95: GCD is done sequentially with other work for the worker, so n-1 cores go idle for the duration of the GCD on an n core worker. Mlucas: GCD is done sequentially, so n-1 cores go idle for the duration of the GCD in an n-core instance. CUDAPm1: GCDs are done on one cpu core, so the GPU goes idle for the duration of the GCD. For currently plausible numbers of cpu cores, the potential performance gain from parallelizing GCD seems small. https://www.wisdom.weizmann.ac.il/~oded/PDF/gcd.pdf https://oaciss.uoregon.edu/icpp18/pu...23s2-file1.pdf Last fiddled with by kriesel on 2023-04-25 at 14:31 |
![]() |
![]() |
![]() |
#50 |
Dec 2022
3·5·29 Posts |
![]()
This is the correct rule and how I drew my conclusion. The extra GCDs _do not_ change the derivative with B2, as the new factors found with a marginal increase in it occur at the end of stage 2 where there's no time saving. They _do_ change that with B1, because the numerator (factor probability) remains the same, while the denominator (time) falls. It is still nothing to worry about as P-1 efficiency is a very flat plateau.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
A couple of 15e candidates | fivemack | NFS@Home | 1 | 2014-11-30 07:52 |
How to calculate FFT lengths of candidates | pepi37 | Riesel Prime Search | 8 | 2014-04-17 20:51 |
No available candidates on server | japelprime | Prime Sierpinski Project | 2 | 2011-12-28 07:38 |
Adding New Candidates | wblipp | Operation Billion Digits | 6 | 2011-04-10 17:45 |
new candidates for M...46 and M48 | cochet | Miscellaneous Math | 4 | 2008-10-24 14:33 |