Merged PRP and P1 firststage
I'll explain briefly why this is a good thing compared to the "classic" P1 firststage.
P1 first stage is conceptually simple:
a) given the bound B1, compute powerSmooth(B1) which is roughly the product of all primes under B1. The small primes are taken to a higher power (than 1).
b) compute X = 3^powerSmooth(B1)
c) compute the GCD(X  1, Mp) hoping to come up with a factor
The costly operation is computing 3^powerSmooth(B1). In the classical setup, its cost is log2(powerSmooth(B1)) "squarings". (i.e. the number of bits in powerSmooth(B1)).
log2(powerSmooth(B1)) ~= 1.442 * B1
(BTW, 1.442 ~= 1/ln(2))
For example, doing firststage with B1=1M costs about 1.4M "PRP iterations".
2. Merged with PRP
During normal PRP we compute 3^(2^k) iterating over k. We can use these values (the normal PRP iterations) for cheaper P1 first stage  Robert Gerbicz showed a nice way to aggregate them for computing the 3^powerSmooth(B1). The new algorithm uses a number N of "cache" buffers (thus, a lot of memory). The cost of P1 firststage (in iterations) *on top of the PRP* becomes:
B1 * 1.442 / (log2(N) + 2)
Where N is the number of buffers. This represents a reduction of the cost compared to the "classic" P1 FS of a factor of log2(N)+2. For example, with about 256 buffers, that's a 10x reduction in cost.
The catch is that this tiny cost is "on top of" the PRP cost.
Let's make clear what that means:
if somebody wants to run the PRP anyway, then the additional cost of P1 FS is really tiny and that's great. If somebody does not care for the PRP and wants to do *only* the P1 FS, then there is no cost reduction at all compared to the "classic" P1.
classicCost = B1 * 1.442
"number of buffers" N
"reduction factor" F = 1 / (log2(N) + 2)
cost excluding the PRP iterations = classicCost * F
cost including the PRP iterations = classicCost * (1 + F)
When F=1/10 as in the example (N=256), the cost "on top" of PRP is only 0.1 * classic, while the cost including the PRP component is 1.1 * classic (a slight *increase*).
Aside from these cost consideration, one little advantage of the new algorithm is that it is partially protected with the GEC error check. ("partially" means that only the PRP iterations themselves are protected, but this is still an improvement compared to the "classic" P1 FS which is not protected at all).
Last fiddled with by preda on 20200926 at 04:47
