mersenneforum.org gpuOWL for Wagstaff
 Register FAQ Search Today's Posts Mark Forums Read

 2019-07-08, 16:50 #1 GP2     Sep 2003 258610 Posts mprime can do k*b^n+c How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3
2019-07-08, 20:10   #2
paulunderwood

Sep 2002
Database er0rr

2·11·191 Posts

Quote:
 Originally Posted by GP2 mprime can do k*b^n+c How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3

Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3. E.g:

Code:
p=127;Mod(3,2^p+1)^2^p
Mod(9, 170141183460469231731687303715884105729)
Gerbicz error checking can be done too,

2019-07-08, 20:34   #3
GP2

Sep 2003

2×3×431 Posts

Quote:
 Originally Posted by paulunderwood Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3.
In mprime, a type-5 residue for Wagstaff simply calculates 3^(2^p) mod (2^p + 1). So I don't think you need to do a division.

Probably type-4 would also be applicable to Wagstaff? Or perhaps type-2, which is similar to type-4 except using N−1 instead of N+1.

Code:
2:	SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
4:	SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N

2019-07-09, 19:03   #4
GP2

Sep 2003

2·3·431 Posts

Quote:
 Originally Posted by kriesel I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.
It's up to him to decide what he spends his time and effort doing. I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.

Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.

2019-07-10, 00:51   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×11×199 Posts

Quote:
 Originally Posted by GP2 It's up to him to decide what he spends his time and effort doing.
Of course. He volunteers his time, according to his talents and interests, like many others. None of us has a claim on him or each other, or authority to select one path versus another for him. To his credit, he sometimes accepts or asks for input from the user community. And if we users summarize outstanding issues or new desires, it can make him more efficient. Win-win.

Quote:
 I was thinking that there might be some relatively trivial modification.
It seems to me that the power difference is trivial, but the mod difference is less so. mod 2p-1 result fits in p bits, and can be done rapidly in binary by adding the quotient to the remainder displaced rightward by p bits; mod 2p+1 can't. Seems like p+1 bits storage and subtract quotient after a p bit right shift would be in order. That in turn implies borrows rather than carries as in the existing code. But all that is from thinking in untransformed integer binary operand terms.

Quote:
 Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues. Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.
Which would be ~gpuowl v1.5 to 3.9. https://www.mersenneforum.org/showpo...3&postcount=15
There are other ways to do Wagstaff, to ~920M, though maybe not as high a p as you'd like to go to if you're thinking of taking the new Mersenne conjecture testing further.
There are also other ways to do p-1 factoring on Mersennes, although not above ~432.5M in CUDAPm1 in practice, or ~920M in mprime/prime95, and not on OpenCl at all.

2019-07-11, 21:33   #6
preda

"Mihai Preda"
Apr 2015

56616 Posts

Quote:
 Originally Posted by GP2 I was thinking that there might be some relatively trivial modification. Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.

For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the mersenne case, the weights are real. IF for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.

2019-07-12, 00:13   #7
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7,901 Posts

Quote:
 Originally Posted by preda This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1"
There are two weights. You still have the real weights to distribute the p bits uniformly over the FFTLEN words.

You also need to apply complex roots-of-minus-one to "trick" the FFT into doing a negacyclic convolution instead of a cyclic convolution. You don't need any extra FFT memory, but you do need a modified first pass that takes real inputs and produces weighted complex FFT'ed outputs. Not easy, but not hard either.

Next you need a new simpler second pass that scraps all the Hermetian symmetry computations before the point-wise squaring.

2019-07-19, 23:27   #8
ewmayer
2ω=0

Sep 2002
República de California

2DD816 Posts

Quote:
 Originally Posted by preda This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1": In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allow us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.

Quote:
 For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed? To add a bit more detail: in the mersenne case, the weights are real. IF for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.
As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic-weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic-multiplier awt[j] = exp(I*j/(2*n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.

2019-07-20, 00:51   #9
preda

"Mihai Preda"
Apr 2015

2×691 Posts

Thank you, this is useful!

Quote:
 Originally Posted by ewmayer No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allow us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length. As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic-weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic-multiplier awt[j] = exp(I*j/(2*n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.

 2019-11-22, 07:30 #10 axn     Jun 2003 124008 Posts Forked the Wagstaff discussion into a separate thread (copy - original posts are still there).
2019-11-22, 16:29   #11
paulunderwood

Sep 2002
Database er0rr

2×11×191 Posts

Quote:
 Originally Posted by axn Forked the Wagstaff discussion into a separate thread (copy - original posts are still there).
Now we need someone competent to fork the source code

I for one would run Wagstaff.

 Similar Threads Thread Thread Starter Forum Replies Last Post SELROC GpuOwl 69 2021-09-29 10:07 GP2 Wagstaff PRP Search 414 2020-12-27 08:11 Prime95 GpuOwl 91 2019-12-30 08:30 M344587487 GpuOwl 14 2018-12-29 08:11 preda PrimeNet 2 2017-10-07 21:32

All times are UTC. The time now is 09:00.

Tue Jun 28 09:00:20 UTC 2022 up 75 days, 7:01, 1 user, load averages: 1.34, 1.23, 1.13