 mersenneforum.org Sieving freakishly big MMs (was "World record" phone number?)
 Register FAQ Search Today's Posts Mark Forums Read  2012-10-10, 01:34   #111
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1E5E16 Posts Quote:
 Originally Posted by ewmayer Very nice! So now one just needs to do the normalize/carry step - that will allow me to work out the details of the wraparound carry - and divide every resulting word by 185 before the next squaring?
Cool - we're getting close.

Here are the comments from the code:

; *************** Top carry adjust macro ******************
; This macro corrects the carry out of the topmost word when k is not 1.
; The problem is the top carry is from b^ceil(logb(k)+n) rather than at k*b^n.
; So we recompute the top carry by multiplying by b^ceil(logb(k)) and then
; dividing by k. The integer part is the new carry and the remainder is
; added back to the top three words.

In your example, this means we leave leave 24 bits in the top word (not 23.53... bits). Part of carry out of the top word is wrapped to the first FFT word and part is added to the top 2 or 3 words of the FFT.

There is no "divide every word by k by 185 before the next squaring" step. Before we square the value is multiplied by 1/k. Squaring gives us the result times 1/k^2. Carry propagation (as you saw in your 4-word example) does a multiply by k. This gives us a result times 1/k -- exactly what is needed for the next squaring step.

In short, we pre-divide by k before doing a series of squarings and post-multiply by k when we need to produce a final result.   2012-10-10, 22:51   #112
ewmayer
2ω=0

Sep 2002
República de California

23×3×487 Posts Exact result: z = 185*x^2 % m = 57343615719861605804690122685

z0=z%b1;z1=(z>>25)%b0;z2=(z>>49)%b0;z3=(z>>73);

gives

z0 = 3418045
z1 = 14771380
z2 = 8030859
z3 = 6071491

Compare with the actual outputs:

y0 = round(185*x0); y1 = round(185*x1); y2 = round(185*x2); y3 = round(185*x3);

Check: y = y0+b1*(y1+b0*(y2+b0*y3)) = 804533277125630163550906103961834307913

y % m = 57343615719861605804690122685, as desired . So the modded result is correct;
trick is to get the carry step to correctly mod the result by way of the wraparound carry.

Normalize/carry of outputs y0-y3:

y0 = 88232760403902793 = 23937353 + cy*b1, cy = (y0>>25) = 2629541170, 23937353 not close to z0 but still lacking wraparound
y1 = 40134908515305136; t = y1+cy = 40134911144846306 = 14771170 + cy*b0, cy = (t>>24) = 2392227121, 14771170 is close to z1
y2 = 81210328945528154; t = y2+cy = 81210331337755275 = 8030859 + cy*b0, cy = (t>>24) = 4840512951, 8030859 matches z2
y3 = 85183269270472460; t = y3+cy = 85183274110985411 = 3187907, that differs from expected-value z3 above, but you note that is expected .

Our pre-wraparound-carry digits y0 and y1 collectively sum to 23937353 + b1*14771170 = 495638243262793 ;
Exact result has z0 + b1*z1 = 495645269174205, which implies that we need a wraparound carry = 7025911412 .
We need to extract this from the high word via your

Quote:
 Originally Posted by Prime95 ; *************** Top carry adjust macro ****************** ; This macro corrects the carry out of the topmost word when k is not 1. ; The problem is the top carry is from b^ceil(logb(k)+n) rather than at k*b^n. ; So we recompute the top carry by multiplying by b^ceil(logb(k)) and then ; dividing by k. The integer part is the new carry and the remainder is ; added back to the top three words. In your example, this means we leave leave 24 bits in the top word (not 23.53... bits). Part of carry out of the top word is wrapped to the first FFT word and part is added to the top 2 or 3 words of the FFT.
In my example: b = 2, k = 185, so logb(k) = log2(185) = 7.53..., what I call 'l2k' in my Pari code.
The multiplier b^ceil(logb(k)) = 2^ceil(l2k) = 2^ceil(7.53...) = 2^8 .
So multiply y3 by 128/185 (or just multiply the high FFT output by 128 rather than 185 during output processing) to yield y3 = 58937613333083648 instead of 85183269270472460 as above.
This divided by expected wraparound carry = 58937613333083648 / 7025911412 = 8388607.5239178731620477767561126203395 .
Note 2^23 = 8388608, i.e. this quotient is just less than 2^23.

So we can get the needed wraparound carry out of the modified high output via ceil(y3/2^23), but that would yield a negative normalized high digit, not the expected value, 6071491.   2012-10-11, 04:02   #113
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·132·23 Posts Quote:
 Originally Posted by ewmayer y3 = 85183269270472460; t = y3+cy = 85183274110985411 = 3187907, that differs from expected-value z3 above, but you note that is expected .
85183274110985411 = 3187907 + 5077318794*b0.

5077318794 is the carry out of the top word that we need to split.

Quote:
 The multiplier b^ceil(logb(k)) = 2^ceil(l2k) = 2^ceil(7.53...) = 2^8 .
5077318794 * 2^8 / 185 = 3512955706 which is half the wrap around carry you need, so I've made a mistake somewhere.

5077318794 * 2^8 % 185 = 22

z3 = 6071491. z3 = 3187907 + (22 << 17).

So we need to figure out where the off by a factor of 2 bug comes from and how to derive the constant 17. But, it's late....

Last fiddled with by Prime95 on 2012-10-11 at 08:02 Reason: Fixed mess made by cutting and pasting   2012-10-11, 08:12   #114
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

11110010111102 Posts Quote:
 Originally Posted by Prime95 5077318794 * 2^8 / 185 = 3512955706 which is half the wrap around carry you need, so I've made a mistake somewhere.
Ah, the mistake is I typed 128 (for 2^8 -- thank Ernst for writing about multiplying y3 by 128 / 185....) into the calculator instead of 256. Moral of the story, don't do math late at night after several beers. I've used that excuse on the forum several times before, but I'm a slow learner.

5077318794 * 2^8 / 185 = 7025911412
5077318794 * 2^8 % 185 = 44

z3 = 3187907 + (44 << 16)

The 16 comes from the original n in k*b^n+c. That n was 89. Sixteen is 89 - 25 bits in z0 - 24 bits in z1 - 24 bits in z2.   2012-10-11, 15:48   #115
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

71×139 Posts Till you solve the math challenges (btw interesting discussion and I just begun to understand what you are talking about :D) I will be back doing the dirty work. Here is the table, where I STOPPED. This means I don't do sieving anymore, as it is low-profit, compared with Ernst's sieve. Because my pari seems to be fast enough under 32 bits, I took almost all the table to 4G29, finding a couple of factors in the way. The few cells which are still at 1G had some errors in the log files and I let them to 1G. Also, most of the k's on my rows were taken to 20G, but not all, few of them still at 10G. I found few more factors on the way. I did not touch the #47 line, as that is now most probably sieved much higher with Ernst's tool, except for the last cell on the line (the over_1000 mark) which is probable not in the big sieve (no idea if you sieve under 1000 only, or other ranges).

So, here I stopped with prescreening/sieving.

What I am going to do for the time being, is extending the "watermarks" for TF, as seen on the third page of the sheet, I am going to push some of those limits higher. The results written there are from last weekend. For the blue lines I started from scratch (this means I double-checked Toni's, etc's, ranges/results, and continue higher). For the yellow cells I only went higher (as the process is slower), i.e. did not recheck the previous range. I may tackle MM#15 too, but that 2G limit is a bit high, it would need more then a week for a full retest, with the free cores I have.

The log files attached may not be complete (collecting them manually from different places) but the excel table should be complete.
Attached Files Pari.zip (33.3 KB, 109 views)

Last fiddled with by LaurV on 2012-10-11 at 15:50   2012-10-11, 16:31   #116
ET_
Banned

"Luigi"
Aug 2002
Team Italia

113468 Posts Quote:
 Originally Posted by LaurV Till you solve the math challenges (btw interesting discussion and I just begun to understand what you are talking about :D) I will be back doing the dirty work. Here is the table, where I STOPPED. This means I don't do sieving anymore, as it is low-profit, compared with Ernst's sieve. Because my pari seems to be fast enough under 32 bits, I took almost all the table to 4G29, finding a couple of factors in the way. The few cells which are still at 1G had some errors in the log files and I let them to 1G. Also, most of the k's on my rows were taken to 20G, but not all, few of them still at 10G. I found few more factors on the way. I did not touch the #47 line, as that is now most probably sieved much higher with Ernst's tool, except for the last cell on the line (the over_1000 mark) which is probable not in the big sieve (no idea if you sieve under 1000 only, or other ranges). So, here I stopped with prescreening/sieving. What I am going to do for the time being, is extending the "watermarks" for TF, as seen on the third page of the sheet, I am going to push some of those limits higher. The results written there are from last weekend. For the blue lines I started from scratch (this means I double-checked Toni's, etc's, ranges/results, and continue higher). For the yellow cells I only went higher (as the process is slower), i.e. did not recheck the previous range. I may tackle MM#15 too, but that 2G limit is a bit high, it would need more then a week for a full retest, with the free cores I have. The log files attached may not be complete (collecting them manually from different places) but the excel table should be complete.
Thank you for your update, LaurV! I will update my tables as soon as possible with your results. The basic idea is to fill the "holes" and take each range to the same level; after that I will dedicate one core to raise the level of M#47 candidates.

Luigi   2012-10-11, 21:35   #117
ewmayer
2ω=0

Sep 2002
República de California

23·3·487 Posts Quote:
 Originally Posted by Prime95 Ah, the mistake is I typed 128 (for 2^8 -- thank Ernst for writing about multiplying y3 by 128 / 185....) into the calculator instead of 256. Moral of the story, don't do math late at night after several beers. I've used that excuse on the forum several times before, but I'm a slow learner.
Or a steady drinker. :) But what's my excuse? (Here's one: I think the 2 and 8 in 2^8 somehow caused the nearby power of two 128 to pop into my head).

Quote:
 5077318794 * 2^8 / 185 = 7025911412 5077318794 * 2^8 % 185 = 44 z3 = 3187907 + (44 << 16) The 16 comes from the original n in k*b^n+c. That n was 89. Sixteen is 89 - 25 bits in z0 - 24 bits in z1 - 24 bits in z2.
Aha - So the final details are that only the *carryout* of the high FFT word gets multiplied by ceil(logb(k))/k = 2^8/185, and now I finally understand what you mean by "splitting". Many thanks for your patience in explaining this. How long did it take you to work it all out originally?

In terms of practical coding issues, besides the input-wordsize restrictions added by the k-term, the major difference I see is that instead of using pure-integer arithmetic to keep track of the current wordsize (e.g. by tracking #smallwords*array-index % N), one now needs to allow non-integral quantities (via n + log2(k)) in the wordsize bookkeeping, is that right?

Lastly, what to do if c in k*b^n+c is negative? Do you just layer on a standard set of negacyclic-convolution weights and apply the generalized Mersenne-mod-style DWT to k*b^n+|c| ?   2012-10-14, 15:48 #118 ET_ Banned   "Luigi" Aug 2002 Team Italia 2×41×59 Posts Deep sieving project page To LaurV, Ernst, Phil, axn and all the sievers that helped: Here is the online table with all the ranges. It is still a beta-production, some subpages have still not been uploaded, but it is useful to have a place where we can watch the project proceed. As you can see, there are some glitches: as I don't have the factors for some results, I am rerunning them. The k implied are the following: Code: 1257787: k = 8,29,48,53 (2T, 1T, 1T, 500G) 1398269: k = 5,44 (2T, 500G) 3021377: k = 44,60 (2T, 1T) 6972593: k = 5,24 (1T) 20996011: k = 8 (1T) 24036583: k = 8 (500G) eliminated by sieving If you, perchance, have the factors of such ks, I will gladly add them to the DB, and start other sievings. Thank you. Luigi Last fiddled with by ET_ on 2012-10-17 at 19:03   2012-10-14, 20:12   #119
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

171368 Posts Quote:
 Originally Posted by ewmayer How long did it take you to work it all out originally?
Not very long. Once I got the idea to treat k*2^n as 2^(n+log2k) so that carries wrap around properly, the big question was can I retrieve integer results after the FFT. I did a small example, like you did and the rest fell into place.

The hard part is writing all that assembly code to propagate and split carries without overflowing 53-bits of precision.

Quote:
 In terms of practical coding issues, besides the input-wordsize restrictions added by the k-term, the major difference I see is that instead of using pure-integer arithmetic to keep track of the current wordsize (e.g. by tracking #smallwords*array-index % N), one now needs to allow non-integral quantities (via n + log2(k)) in the wordsize bookkeeping, is that right?
Tracking word sizes is pretty easy for me. I do it once up front and store a big-vs-little flag in an array. This flag is used to index into other arrays so that there are no conditionals during carry propagation.

You need to make sure you compute the big-vs-little word flag *exactly* the same way whenever you use the flag. I vaguely remember a nasty bug where the % N operation was extremely close to an integer -- one piece of code rounded up and another rounded down.

Quote:
 Lastly, what to do if c in k*b^n+c is negative? Do you just layer on a standard set of negacyclic-convolution weights and apply the generalized Mersenne-mod-style DWT to k*b^n+|c| ?
Yes, if c is negative you apply DWT weights that range between 1 and c. If c is positive you'll need to also apply weights but the weights are complex numbers, not reals.   2012-10-16, 17:08 #120 LaurV Romulan Interpreter   "name field" Jun 2011 Thailand 71·139 Posts MM#36 (as you eliminated k=1001): the next possible candidate bigger then a thousand is 1064 (sieved to q=20G). Code: Testing for p=2976221 from q=0 to 20000000000 2* 1004 *M2976221+1 is divisible by 41 2* 1005 *M2976221+1 is divisible by 9693913 2* 1008 *M2976221+1 is divisible by 1523 2* 1009 *M2976221+1 is divisible by 3 2* 1012 *M2976221+1 is divisible by 3 2* 1013 *M2976221+1 is divisible by 757 2* 1016 *M2976221+1 is divisible by 7 2* 1017 *M2976221+1 is divisible by 5 2* 1020 *M2976221+1 is divisible by 157 2* 1021 *M2976221+1 is divisible by 3 2* 1024 *M2976221+1 is divisible by 3 2* 1025 *M2976221+1 is divisible by 127 2* 1028 *M2976221+1 is divisible by 11 2* 1029 *M2976221+1 is divisible by 7043 2* 1032 *M2976221+1 is divisible by 5 2* 1033 *M2976221+1 is divisible by 3 2* 1036 *M2976221+1 is divisible by 3 2* 1037 *M2976221+1 is divisible by 5 2* 1040 *M2976221+1 is divisible by 17 2* 1041 *M2976221+1 is divisible by 6247 2* 1044 *M2976221+1 is divisible by 7 2* 1045 *M2976221+1 is divisible by 3 2* 1048 *M2976221+1 is divisible by 3 2* 1049 *M2976221+1 is divisible by 13 2* 1052 *M2976221+1 is divisible by 5 2* 1053 *M2976221+1 is divisible by 21757 2* 1056 *M2976221+1 is divisible by 20789 2* 1057 *M2976221+1 is divisible by 3 2* 1060 *M2976221+1 is divisible by 3 2* 1061 *M2976221+1 is divisible by 11 2* 1064 *M2976221+1 reached the upper limit q=20000000000 without a factor being found! Also, for MM#34: Code: 2*92*M1257787+1 does not divide MM1257787 The next possible candidate is 93, which is now under test (on a slow computer).   2012-10-16, 17:28   #121
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2·41·59 Posts Quote:
 Originally Posted by LaurV MM#36 (as you eliminated k=1001): the next possible candidate bigger then a thousand is 1064 (sieved to q=20G). Code: Testing for p=2976221 from q=0 to 20000000000 2* 1004 *M2976221+1 is divisible by 41 2* 1005 *M2976221+1 is divisible by 9693913 2* 1008 *M2976221+1 is divisible by 1523 2* 1009 *M2976221+1 is divisible by 3 2* 1012 *M2976221+1 is divisible by 3 2* 1013 *M2976221+1 is divisible by 757 2* 1016 *M2976221+1 is divisible by 7 2* 1017 *M2976221+1 is divisible by 5 2* 1020 *M2976221+1 is divisible by 157 2* 1021 *M2976221+1 is divisible by 3 2* 1024 *M2976221+1 is divisible by 3 2* 1025 *M2976221+1 is divisible by 127 2* 1028 *M2976221+1 is divisible by 11 2* 1029 *M2976221+1 is divisible by 7043 2* 1032 *M2976221+1 is divisible by 5 2* 1033 *M2976221+1 is divisible by 3 2* 1036 *M2976221+1 is divisible by 3 2* 1037 *M2976221+1 is divisible by 5 2* 1040 *M2976221+1 is divisible by 17 2* 1041 *M2976221+1 is divisible by 6247 2* 1044 *M2976221+1 is divisible by 7 2* 1045 *M2976221+1 is divisible by 3 2* 1048 *M2976221+1 is divisible by 3 2* 1049 *M2976221+1 is divisible by 13 2* 1052 *M2976221+1 is divisible by 5 2* 1053 *M2976221+1 is divisible by 21757 2* 1056 *M2976221+1 is divisible by 20789 2* 1057 *M2976221+1 is divisible by 3 2* 1060 *M2976221+1 is divisible by 3 2* 1061 *M2976221+1 is divisible by 11 2* 1064 *M2976221+1 reached the upper limit q=20000000000 without a factor being found! Also, for MM#34: Code: 2*92*M1257787+1 does not divide MM1257787 The next possible candidate is 93, which is now under test (on a slow computer).
Thank you for your sieving work LaurV, I will add k=1064 immediately. As for MM#34: obviously k=92 should be eliminated, but I would like to have the value of the factor to update all the possible candidates given by Ernst's table. Any hint?

Luigi   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post CRGreathouse Number Theory Discussion Group 51 2018-12-16 21:55 LaurV Hobbies 74 2018-07-11 19:33 Batalov Computer Science & Computational Number Theory 40 2013-03-16 09:19 outlnder Soap Box 20 2005-02-03 09:30 nitai1999 Software 7 2004-08-26 18:12

All times are UTC. The time now is 20:39.

Sat Jan 22 20:39:00 UTC 2022 up 183 days, 15:07, 0 users, load averages: 1.85, 1.93, 1.95