mersenneforum.org TF algorithm(s)
 Register FAQ Search Today's Posts Mark Forums Read

 2011-06-19, 13:31 #1 davieddy     "Lucan" Dec 2006 England 2·3·13·83 Posts TF algorithm(s) I assume the "exponentiation" algorithm as described on the "Math" page is as good as any. Say the TFs have 45 bits. The first five steps can be precalculated for a given exponent. For the rest of the bits in the exponent, one has to find a 90 bit number mod the TF. I accomplished this by shifting both right (or more likely selecting the appropriate storage bytes) and doing a 64/32 bit div (careful about the quotient overflowing), multiplying the quotient by the TF and then a tiny bit of "suck and see". Two questions: 1) Is this the best way of doing division on this size of number? 2) Is there someone (Euclid perhaps) who has a cleverer way of arriving at the remainder? David Last fiddled with by davieddy on 2011-06-19 at 13:43
 2011-06-19, 14:42 #2 LaurV Romulan Interpreter     Jun 2011 Thailand 25×5×59 Posts you can build a lookup table with 45 entries for every power of 2 bigger or equal to your TF, so your mod function is just maximum 46 additions mod TF. This is not the fastest method, but is the easiest (in fact, depending how fast you can add and search the bits on the 90-bit result from 45 to 90, shift and check bit x, it can be the fastest too, long ago i tried to implement that on a virtex, but failed miserably due to insuficient knowlwdges of fpga design). think like that, if you do LL, compute Sn=(Sn-1)^2-2, then to to compute the Sn mod 2^n-1 at every step, that is just adding the two parts of the result together (the lower n bits with the higher n bits) and check if you got 2^n-1 (in this case the mod is 0, but when you repeat in the next step, it does not matter, if you compute the square, as in LL or exponentiation, then (x^2 mod n) = ((n-x)^2 mod n) so I always select the one with less bits). same for general case: 11010110 (base2) mod 5 is (from right to left): 2+4+1+4+3=4 mod 5 Last fiddled with by LaurV on 2011-06-19 at 14:46
2011-06-19, 15:42   #3
davieddy

"Lucan"
Dec 2006
England

647410 Posts

Quote:
 Originally Posted by LaurV you can build a lookup table with 45 entries for every power of 2 bigger or equal to your TF, so your mod function is just maximum 46 additions mod TF. This is not the fastest method, but is the easiest (in fact, depending how fast you can add and search the bits on the 90-bit result from 45 to 90, shift and check bit x, it can be the fastest too, long ago i tried to implement that on a virtex, but failed miserably due to insuficient knowlwdges of fpga design). think like that, if you do LL, compute Sn=(Sn-1)^2-2, then to to compute the Sn mod 2^n-1 at every step, that is just adding the two parts of the result together (the lower n bits with the higher n bits) and check if you got 2^n-1 (in this case the mod is 0, but when you repeat in the next step, it does not matter, if you compute the square, as in LL or exponentiation, then (x^2 mod n) = ((n-x)^2 mod n) so I always select the one with less bits). same for general case: 11010110 (base2) mod 5 is (from right to left): 2+4+1+4+3=4 mod 5
THX

Sounds "uncrankish"
Will attempt to interpret it when sober (cf Winston Churchill).

Meantime can I take this opportunity to repeat my plea for participants
to divulge their age, and approx location*. "Experience" is optional, and quickly sussed.

As I think Uncwilly said in another thread:
Welcome to the House of Fun

David

*Default sex seems to be male more's the pity!

Last fiddled with by davieddy on 2011-06-19 at 15:45

2011-06-21, 08:56   #4
davieddy

"Lucan"
Dec 2006
England

2·3·13·83 Posts

Quote:
 Originally Posted by davieddy Sounds "uncrankish" Will attempt to interpret it when sober (cf Winston Churchill).
@LaurV
I'm afraid I think you have got the wrong end of the stick there.

We are not TFing a fixed 91 bit number.
We are TFing 2^50xxxxxx - 1, and 50M has 26 bits.

The "exponentiation" algorithm (see the Math page) involves
26 steps of which the result of the first 5 can be precalculated if the
TF bit range >47 bits, because the result (a power of 2) is less than the TFs.

For the remaining 21 steps, the result has to be found mod TF.
If the TF has 50 bits, the next step may produce 101 bits (squaring
and doubling the unpredictable 50 bit result of the previous step).
This is what needs to be "modded TF".

No point in building a lookup table for just 21 steps is there?
(Given a decent div instruction of course).

David

Last fiddled with by davieddy on 2011-06-21 at 09:39

2011-06-21, 10:52   #5
LaurV
Romulan Interpreter

Jun 2011
Thailand

100100111000002 Posts

Quote:
 Originally Posted by davieddy @LaurV I'm afraid I think you have got the wrong end of the stick there.
no, I did not. You have an N which is 100 bits (in your former example it was 45 or so, that was why I used that number as an example) which you have to compute it modulo a number of 50 bits. Maybe my English is a bit "crankish" especially when I am in hurry, but my math here is not. I stressed out "here".

You have to TF one m which is 2^p-1. Say m=2^50xxxxxx, as you said, with 20-30 bits on the exponent, but that does not matter. To do TF, you consider potential factors.

Not my business how you compute that potential factors. I personally consider numbers of the form 8kp+sp+1 (with s=2 or 6, depending of p being 3 or 1 mod 4, respectively), or 8kp+1 which are 2SPRP as potential factors. It is easy to show that if a factor f is of the form 2kp+1 and 8x+1, then it is of the form 8kp+1, because then 4 divides k. It is also easy to show that if a factor is 2kp+1 and 8x-1, that it has to be 8kp+2p+1 for all p that are 3 mod 4, and it has to be 8kp+6p+1 for all p that are 1 mod 4.

So, the TF code could look more or less like that:

1. take next p prime (from the exponent's list).
2. s=2p (left shift 1 position), z=4s (left shift 2 positions)
3. if p=1 mod 4, then s=3s (just check last 2 bits of p, for multiplying by 3 is just a shift and add)
4. f=s+1
5. if f is not 2SPRP go to step 8 (much faster then primality check or sieving an 60 to 80 bit interval, and you can end up trying some factors which are not primes, but it really does not matter).
6. check if 2^p=1 mod f, by exponentiation
7. if so, factor found and it is f, stop.
8. s=z-s
9. f=f+s
10. if not at the end of interval for factors that you want to check, go to step 5.

This is the fastest method to generate potential factors, as you need to make only two additions and an PRP check to get to the next factor, without walking through all 2kp+1 numbers, assuming you do not have plenty of memory to store a sieved string of bits (as GIMPS is doing). As I said already, I tried to implement this stuff in "hardware", as my job is electronic design and MCU programming. A 4kp+1 or 12kp+1 or 20kp+1 could never be a factor of a mersenne number, for an odd k.

So, for some p which is 1 mod 4 you need only to test 6p+1, 8p+1, 14p+1, 16p+1, 22p+1, 24p+1 and so on, and for some p which is 3 mod 4 you only need to test 2p+1, 8p+1, 10p+1, 16p+1, 18p+1, 24p+1 ... for SPRP-ness and eventually do an exponentiation. Now there could be a long discussion here, about the existence of the "+" factors (that is, factors of the form 8x+1) and "-" factors (that is factors of the form 8x-1). One composite m can not have only "+" factors, because their product would be 1 mod 8, and m is always 7 mod 8. So, if m is composite, it always has an odd number of "-" factors (read: at least one) and it can possibly have no "+" factors. So, it could make sense that someone looks only for the factors of the form 8kp+sp+1, as above, therefore reaching higher "bit count" faster, but this could be a waste of resources if the only existent "-" factor is huge. Nobody says the "-" factor is the smallest, in fact is not, for example 2^29-1 has the factors 233, 1103, 2089, and the "minus" factor is 1103. Whatever...

I assumed you were only asking here about step 6 in the algorithm. If not, then, yes, I got the wrong side of the stick.

For your example, you can "precalculate" the first 5 steps (nothing to precalculate, just a 1 shifted 32 positions to the left). Then you square it (not my business how you do that!) and you have a result, call it w, which is to be computed "mod f" (same f from my step 6 above). Did I got it right up to now?

Now, you start aligning your f to w, and doing a series of comparings and subtractions, in fact w is just a power of 2 in this step, and the enigma could be solved with a simple lookup in an array that stores "precalculated" powers of 2, modulo f.

Say f is 70 bits (more realistic case, because you want to find factors in 2^60++ range for a 50M exponent). To generate the table you start with 2^70 (an 1 shifted 70 positions to the left), which is the first power of 2 bigger then your f, and you store 2^70-f as the first entry in the table. This is right, because your f is 70 bits, so is higher then 2^69, therefore 2^70 can not contain 2f, it just contains 1f, so 2^70-f is smaller then f. Then you do 70 times a shift left by 1 bit, and if the result is bigger then f, you subtract f and store it in the table. For that, you did 70 shifts and about (statistically) 55 subtractions.

Now, the only problem you have with exponentiation is related to the squaring, because multiplying by 2 is just a shift left, a test (if w>f) and an eventual subtraction of f.

So, for your 21 squares you will get numbers in range of 1 to 140 bits. That numbers, 21 of them, you have to compute mod f. For each, you have to do aligning f to the MSB and check if the result of the subtraction is not negative, in such case realigning f to the bit which is "next to MSB" in w (well, this you can do it in a more clever way, but just for the sake of the example), subtracting f, and if the result is still bigger then f, repeat. In the worst case, you make 70 align-check-subtract cycles. Align is costly-est if not done clever.

If you would have a table as built above, then you have to parse the bits in w from 70 to 140, and for each bit you find being 1 you have to add the value in the table to the modulus. If is bigger then f, you subtract f. In the worst case you make 70 additions and 70 subtractions, but you avoid the alignment process. This is "theoretical" worst case, because it would imply all bits being 1 from 70 to 140, and all additions being bigger then f, which will never happen. Statistically you have about half of the bits set, and about 65 percent of additions will require a subtract of f.

Trust me, this is faster then the "classic", and only involves additions and subtractions and bit check. For our particular example, if you just make 7 operation (addition or subtraction of f) less on each of your 21 steps, then you already have 144 operations less comparing with the "clasic", and you spent 140 to generate the table, so you are already 4 operations faster.

Last fiddled with by LaurV on 2011-06-21 at 11:07

2011-06-21, 12:58   #6
davieddy

"Lucan"
Dec 2006
England

194A16 Posts
Getting on the same wavelength/hymnsheet

Any assistance from a third party in this direction would be appreciated.
(PM if necessary).

Quote:
 Originally Posted by LaurV Maybe my English is a bit "crankish" especially when I am in hurry
Which is why I was wondering where you come from.
Quote:
 but my math here is not. I stressed out "here".
I was taking that for granted.
I stressed out too:)
Quote:
 You have to TF one m which is 2^p-1. Say m=2^50xxxxxx, as you said, with 20-30 bits on the exponent
26 bits
Quote:
 To do TF, you consider potential factors. So, the TF code could look more or less like that: .... 6. check if 2^p=1 mod f, by exponentiation This is the fastest method to generate potential factors, as you need to make only two additions and an PRP check to get to the next factor, without walking through all 2kp+1 numbers, assuming you do not have plenty of memory to store a sieved string of bits (as GIMPS is doing)
Sieving say 15015*8 bits (k values) at a time works fine.
Quote:
 . As I said already, I tried to implement this stuff in "hardware", as my job is electronic design and MCU programming.
Yep. This might be the source of misunderstanding. I am not
discussing how division is accomplished bit by bit in either software
or hardware.
Quote:
 I assumed you were only asking here about step 6 in the algorithm. If not, then, yes, I got the wrong side of the stick.
I was
Quote:
 For your example, you can "precalculate" the first 5 steps (nothing to precalculate, just a 1 shifted 32 positions to the left). Then you square it (not my business how you do that!) and you have a result, call it w, which is to be computed "mod f" (same f from my step 6 above). Did I got it right up to now?
No. You get a power of 2 between 32 and 47, depending on the
top five bits of the exponent
Quote:
That's what I have been doing

Calculating 1/f might be worth it though.
The whole question is hardware dependent anyway.

David

Last fiddled with by davieddy on 2011-06-21 at 13:17

2011-06-21, 15:03   #7
LaurV
Romulan Interpreter

Jun 2011
Thailand

25·5·59 Posts

Quote:
 Originally Posted by davieddy No. You get a power of 2 between 32 and 47, depending on the top five bits of the exponent
Bingo! you got me here! That was a stupid thing I said, I did not think enough deep for it! But his changes nothing, if you know your f being larger then 47 bits, you just shift the 1 in cause with 32+n, where n is the number given by the first 5 bits of the exponent.

Quote:
 The whole question is hardware dependent anyway.
Fully agree. That I was saying since first post :D

Last fiddled with by LaurV on 2011-06-21 at 15:10

2011-06-21, 15:44   #8

"Richard B. Woods"
Aug 2002
Wisconsin USA

22·3·641 Posts

Help me understand here:
Quote:
 Originally Posted by LaurV So, the TF code could look more or less like that: 1. take next p prime (from the exponent's list).
What "next" p prime?

If we're TFing 2^50xxxxxx-1, isn't p a single fixed value (50xxxxxx) while we try a list of candidate factors 2kp+1?

Or, are you proposing some shortcut for TFing a list of several exponents (50xxxxxx, 50xxxxyy, 50xxxxzz, ...)?

Last fiddled with by cheesehead on 2011-06-21 at 15:54

2011-06-21, 16:18   #9
davieddy

"Lucan"
Dec 2006
England

2·3·13·83 Posts

Quote:
 Originally Posted by LaurV Bingo! you got me here! That was a stupid thing I said, I did not think enough deep for it!
I still agree that "worthwhile precalculation" was somewhat
overemphasizing the difficulty (although not the usefulness)
of doing it.

You Really Got Me

David

PS Excuse my music links. They are a bit of a trade/birthmark, like
the gerbils, snakes, fish and bamboozle trees

2011-06-21, 17:27   #10
LaurV
Romulan Interpreter

Jun 2011
Thailand

25·5·59 Posts

Quote:
 Originally Posted by cheesehead Help me understand here: What "next" p prime? If we're TFing 2^50xxxxxx-1, isn't p a single fixed value (50xxxxxx) while we try a list of candidate factors 2kp+1? Or, are you proposing some shortcut for TFing a list of several exponents (50xxxxxx, 50xxxxyy, 50xxxxzz, ...)?
neither of it, read step 1 as "chose your exponent" :D i assumed you get them from a list. that is not the point of the post.

(joke) of course in my mind was something more grandiose as "trial factoring all mersenne for all prime exponents" (hehe) but the algoritm went wrong, because in that case I should not say "stop" in step 7, but say "go to step 1"

2011-06-21, 18:35   #11
davieddy

"Lucan"
Dec 2006
England

2×3×13×83 Posts

Quote:
 Originally Posted by cheesehead Or, are you proposing some shortcut for TFing a list of several exponents (50xxxxxx, 50xxxxyy, 50xxxxzz, ...)?
That thought occurred to me.
That way it would make sense to do anything that facilitated doing "mod f"
for fixed f.

But how many "p"s satisfy f=2kp + 1 etc?

David

PS Do note that "choose p" was the outermost loop
in LaurV's "program".

Last fiddled with by davieddy on 2011-06-21 at 19:01

 Similar Threads Thread Thread Starter Forum Replies Last Post jasonp Math 2 2012-06-17 20:04 geoff Twin Prime Search 5 2010-11-07 17:02 Unregistered Homework Help 0 2007-08-09 17:40 nuggetprime Riesel Prime Search 5 2007-04-20 04:19 Angular Math 6 2002-09-26 00:13

All times are UTC. The time now is 14:04.

Fri May 7 14:04:55 UTC 2021 up 29 days, 8:45, 0 users, load averages: 2.75, 2.65, 2.72