Thread: GpuOwl 7.x
View Single Post
Old 2020-09-26, 04:41   #2
preda's Avatar
"Mihai Preda"
Apr 2015

53×11 Posts
Default Merged PRP and P-1 first-stage

I'll explain briefly why this is a good thing compared to the "classic" P-1 first-stage.

P-1 first stage is conceptually simple:
a) given the bound B1, compute powerSmooth(B1) which is roughly the product of all primes under B1. The small primes are taken to a higher power (than 1).
b) compute X = 3^powerSmooth(B1)
c) compute the GCD(X - 1, Mp) hoping to come up with a factor

The costly operation is computing 3^powerSmooth(B1). In the classical setup, its cost is log2(powerSmooth(B1)) "squarings". (i.e. the number of bits in powerSmooth(B1)).

log2(powerSmooth(B1)) ~= 1.442 * B1
(BTW, 1.442 ~= 1/ln(2))

For example, doing first-stage with B1=1M costs about 1.4M "PRP iterations".

2. Merged with PRP
During normal PRP we compute 3^(2^k) iterating over k. We can use these values (the normal PRP iterations) for cheaper P-1 first stage -- Robert Gerbicz showed a nice way to aggregate them for computing the 3^powerSmooth(B1). The new algorithm uses a number N of "cache" buffers (thus, a lot of memory). The cost of P-1 first-stage (in iterations) *on top of the PRP* becomes:

B1 * 1.442 / (log2(N) + 2)

Where N is the number of buffers. This represents a reduction of the cost compared to the "classic" P-1 FS of a factor of log2(N)+2. For example, with about 256 buffers, that's a 10x reduction in cost.

The catch is that this tiny cost is "on top of" the PRP cost.

Let's make clear what that means:
if somebody wants to run the PRP anyway, then the additional cost of P-1 FS is really tiny and that's great. If somebody does not care for the PRP and wants to do *only* the P-1 FS, then there is no cost reduction at all compared to the "classic" P-1.

classicCost = B1 * 1.442
"number of buffers" N
"reduction factor" F = 1 / (log2(N) + 2)
cost excluding the PRP iterations = classicCost * F
cost including the PRP iterations = classicCost * (1 + F)

When F=1/10 as in the example (N=256), the cost "on top" of PRP is only 0.1 * classic, while the cost including the PRP component is 1.1 * classic (a slight *increase*).

Aside from these cost consideration, one little advantage of the new algorithm is that it is partially protected with the GEC error check. ("partially" means that only the PRP iterations themselves are protected, but this is still an improvement compared to the "classic" P-1 FS which is not protected at all).

Last fiddled with by preda on 2020-09-26 at 04:47
preda is offline   Reply With Quote