mersenneforum.org Is this a puzzle?
 Register FAQ Search Today's Posts Mark Forums Read

2020-08-05, 16:26   #12
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

7·199 Posts

Quote:
 Originally Posted by axn Code: 136110417:717547 173564051:717463

I'm at set bits=644353 for an equivalent defined problem AND still counting.

2020-08-05, 16:47   #13
axn

Jun 2003

111258 Posts

Quote:
 Originally Posted by R. Gerbicz https://www.youtube.com/watch?v=EY27...e=youtu.be&t=9 I'm at set bits=644353 for an equivalent defined problem AND still counting.
LOL. More like too dumb, too slow

BTW, Interested in the your problem and solution approach.

2020-08-05, 19:12   #14
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

7×199 Posts

Quote:
 Originally Posted by axn BTW, Interested in the your problem and solution approach.
OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].

See the history here: https://mersenneforum.org/showpost.p...8&postcount=21 why we need this for P-1 test
(from another topic)
The problem is to compute x^N mod M given x^(2^e) mod M for all (small values).
It is similar to addition chain problem but here the 2^e is given free.

Use the sliding window method, you could ask why a left to right method works here,
the short reason is that because the "bases" x^(2^e) is given.

So let N=sum(i=0,t,2^e[i]*u[i]), where for a fixed E>0:
0<u[i]<2^(E+1) and u[i] is odd, and in each possible choice we choose the maximal u[] that still fits
in (0,2^(E+1)) [this choice is unique].

You need to compute x^N mod M what is x^sum(i=0,t,2^e[i]*u[i]) mod M=

prod(i=1,2^(E+1),2,(x^(2^e[i]))^u[i]) =

= prod(i=1,2^(E+1)-1,2,(prod(x^(2^e[j]) : u[j]=i))^i) mod M =

= prod(i=1,2^(E+1)-1,2,y[i]^i) mod M,

Here for each i we defined y[i]=prod(x^(2^e[j]) : u[j]=i)
And can still compute that final product with a chain method:

Say for E=2:

you need R=y[1]*y[3]^3*y[5]^5*y[7]^7

a0=y[7]
res=y[7]

a1=y[5]*a0 (=y[5]*y[7])
res=res*a1 (=y[5]*y[7]^2)

a2=y[3]*a1 (=y[3]*y[5]*y[7])
res=res*a2 (=y[3]*y[5]^2*y[7]^3)

a4=y[1]*a3 (=y[1]*y[3]*y[5]*y[7])
res=res^2
res=res*a4 (=y[1]*y[3]^3*y[5]^5*y[7]^7)

then R=res.

So basically you'd need only 2^(E+1) mulmods in this last part.

Returning to our original problem, the optimal solution is using E=13, with O(2^E*length(M)) memory, not store the x^(2^e) numbers in memory just the y[] array.
Use such number that is still optimal/fits in GPU memory.

Last fiddled with by R. Gerbicz on 2020-08-06 at 14:15 Reason: minor typos (in one place there was N instead of M)

2020-08-05, 21:53   #15
uau

Jan 2017

2×37 Posts

Quote:
 Originally Posted by R. Gerbicz OK, here it is: Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].
Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...

Last fiddled with by uau on 2020-08-05 at 21:53

2020-08-05, 22:20   #16
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

7·199 Posts

Quote:
 Originally Posted by uau Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...
See: the original task for P-1 method is to get x^N mod M, but with multiplying N with a "small" number is still good for the method, so when you compute r(k)=x^(k*N) mod M [and then gcd(r(k)-1,M)]. In our case x^(2^e) is given free, because you're computing the prp residue for M=2^p-1. The idea from Preda was to reuse these residues to compute the required residue for the P-1 test cheaper than the standard way, and it is really possible you need just hamming(N) mulmods if log2(N)=b this is roughly b/2 for random N, and here you can still use a small multiple of N to lower this, what this topic started.

In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given. Note that this is a non-constant improvement, in general we can reach O(b/log(b)) mulmods, what is much better than the average b/2. You could still use the k*N idea here but as in above that gives very small gain.

2020-08-05, 22:38   #17
uau

Jan 2017

2·37 Posts

Quote:
 Originally Posted by R. Gerbicz In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given.
Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?

Last fiddled with by uau on 2020-08-05 at 22:38

2020-08-05, 22:43   #18
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

139310 Posts

Quote:
 Originally Posted by uau Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?
Yes, so used the "number of set bits"="number of multiplications" to describe the solution's goodness.
Written a small Pari code to get that number for E=13 for the given N.

2020-08-06, 06:14   #19
preda

"Mihai Preda"
Apr 2015

118910 Posts

Quote:
 Originally Posted by R. Gerbicz OK, here it is:
Nice Robert, thank you! I look forward to implementing this, I have a feeling P-1 first-stage just got a speed boost :)

(disclaimer: when used in conjuction with PRP)

 2020-08-06, 17:11 #20 bsquared     "Ben" Feb 2007 CDA16 Posts If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}. Code: powersmooth(1000000) has 1442080 bits exp has 720737 non-zeros recoded exp has 480780 non-zeros sliding windowed recoded exp needs 100622 muls Edit: Also, all of the signed exponent words are odd. Storage of the precomputed window terms doubles because you need to precompute and store 2^-E through 2^E, but is then halved because you only need to precompute/store half of them (the odd ones), so you still need storage of O(2^E*length(M)). Last fiddled with by bsquared on 2020-08-06 at 17:16
2020-08-06, 17:54   #21
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

7×199 Posts

Quote:
 Originally Posted by bsquared If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}. Code: powersmooth(1000000) has 1442080 bits exp has 720737 non-zeros recoded exp has 480780 non-zeros sliding windowed recoded exp needs 100622 muls
Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal).
In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).

2020-08-06, 18:44   #22
bsquared

"Ben"
Feb 2007

63328 Posts

Quote:
 Originally Posted by R. Gerbicz Really don't understand, could you explain it? But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal). In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).
I'm just looking for ways to minimize the number of multiplications in a modular exponentiation x^N mod M.
In the handbook of applied cryptography, chapter 14 algorithm 14.85 we can just use left-to-right sliding windows to get the number of multiplications down to 102989 plus precomputing 2^12 odd multipliers x^b, 1 <= b < 2^k (b odd), using the (optimal) max window size k=13.

Algorithm 14.121 describes how to recode the exponent in signed digits.
For example e = 2^9 + 2^8 + 2^6 + 2^5 + 2^4 + 2^2 + 2 + 1 = 2^10 - 2^7 - 2^3 - 1.

Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits).
Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd.

So, now that I've written all this down, I see that recoding probably doesn't save much over just doing sliding window. Especially since you need x^-1 mod M with recoding.

 Similar Threads Thread Thread Starter Forum Replies Last Post Harrywill Puzzles 4 2017-05-03 05:10 Uncwilly Puzzles 75 2012-06-12 13:42 bcp19 Puzzles 18 2012-03-02 04:11 nibble4bits Puzzles 37 2006-02-27 09:35 Orgasmic Troll Puzzles 6 2005-12-08 07:19

All times are UTC. The time now is 12:10.

Sat Sep 19 12:10:10 UTC 2020 up 9 days, 9:21, 1 user, load averages: 2.28, 1.95, 1.87