20220720, 22:46  #23 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2^{4}×3^{3}×23 Posts 
Re: 32 bits, and later 64 bit limit, and then higher.
Every road begins with a few first steps.
Just to give you, perhaps, a sandwich for the road (because this road is long), there are certain elements that can be used for 1) prototyping, 2) refactoring into something intermediate, 3) deep refactoring, with custom building blocks. (That is all with the metadecision to stop at any stage if/when you might find that this was already done, or idea turned to be invalid or unoriginal or simply you might have lost interest. Or every once in a while you might have given it all you've got and the problem has not yielded. I've had a few where you do a lot of invisible work with no result to show... except maybe that final search limit was this or that. The other metaconsideration is to stop after or in parallel with Step 1 and estimate what, statistically, you expect to get. And at expense of how many coreyears. It is a separate skill to acquire. Many people keep banging at problems where their expected yield is 1 hit per 10,000 core years ... and they only have 32 cores. Do the arithmetic.) For step 1) most of the time one can find that if you start using Pari/GP and get over the few first syntax hurdles, you will never quit it. Once you have a script you can test some hypothesis to a certain limit and with a bit of luck for some problems you will be done. For step 2) one would take a gp scaffold code and refactor it into GMPassisted C/C++. This will be faster, but not lightning fast. For step 3) you can define (or refine based on your results from steps 1 and 2) the definition domain of the final implementation. Let's say this will be a 96bit limit. (64bit is even easier to implement; but 96 is quite commonly good; btw lots of folks have 96bit oriented code.) Then if we take this toy problem as an example, you will realize that you want to implement a class (or a structure) with three 32bit unsigned int limbs, and implement a) addition of elements a + b from it (sum up, carry over), b) multiplication x * y  and you can do either "grade school" multiplication or Karatsuba. You can also borrow ad hoc implementations by studying other folks code, for example srsieve or polysieve (there is readily available ~55bit multiplication implementation there that is cleverly using doubles for an intermediate). You can also find small ASM snippets to mulmod, powmod, invmod etc that can be embedded into C that will make code much faster, and then leave all other logic (if/else/for's) to gcc O3. Then it is always fun to compare what you will get with simply using GMP despite its 'shoot from large cannons' overhead when all you need is 64 or 96 bits. Just 2 cents for the road. These hints might be useful for the general process of prototyping anything. Anything that you might think of tomorrow. 
20220720, 23:20  #24 
If I May
"Chris Halsall"
Sep 2002
Barbados
2^{4}×5×7×19 Posts 

20220721, 18:30  #25  
Jul 2022
7^{2} Posts 
Quote:
I will try to follow your advice. In the meantime I created this function that allows you to find the factor (~ 42 bits) of 13379827 or the factor (~ 46 bits) of 668267879 in less than 0.7 s. Code:
uint64_t XY_modulo_D_LN(uint64_t X,uint64_t Y,uint64_t D,uint64_t D_max) { // D<2^54 D>D_max uint64_t mod_D; if (X<=D_max && Y<=D_max ) { mod_D=(X*Y)%D; } else if (X<=D_max  Y<=D_max ) { uint64_t X1,N_t; uint64_t RD=(uint64_t)D%D_max; uint64_t KD=(uint64_t)D/D_max; if (Y<=D_max) { X1=Y; Y=X; X=X1; } mod_D=(X*(Y%D_max))%D_max; N_t=X*(Y/D_max)+(X*(Y%D_max))/D_max; mod_D=(mod_D+(D_max*(N_t%KD))%D)%D; mod_D=(mod_D+D((N_t/KD)*RD)%D)%D; } else { uint64_t RD=(uint64_t)D%D_max; uint64_t KD=(uint64_t)D/D_max; uint64_t Rx=(uint64_t)X%D_max; uint64_t Ry=(uint64_t)Y%D_max; uint64_t Kx=(uint64_t)X/D_max; uint64_t Ky=(uint64_t)Y/D_max; uint64_t mod_D_t,N_t,R_t; mod_D=(Rx*Ry)%D; mod_D_t=((D_max*(D_max%KD))%D)%D; mod_D_t=(mod_D_t+D((D_max/KD)*RD)%D)%D; N_t=mod_D_t; R_t=Kx; if (N_t>=(D_max/R_t)) { mod_D_t=(R_t*(N_t%D_max))%D_max; N_t=R_t*(N_t/D_max)+(R_t*(N_t%D_max))/D_max; mod_D_t=(mod_D_t+(D_max*(N_t%KD))%D)%D; mod_D_t=(mod_D_t+D((N_t/KD)*RD)%D)%D; } else mod_D_t=mod_D_t*R_t; N_t=mod_D_t; R_t=Ky; if (N_t>=(D_max/R_t)) { mod_D_t=(R_t*(N_t%D_max))%D_max; N_t=R_t*(N_t/D_max)+(R_t*(N_t%D_max))/D_max; mod_D_t=(mod_D_t+(D_max*(N_t%KD))%D)%D; mod_D_t=(mod_D_t+D((N_t/KD)*RD)%D)%D; } else mod_D_t=mod_D_t*R_t; N_t=mod_D+mod_D_t; mod_D=N_t; if (N_t>=D_max) { mod_D=(N_t)%D_max; mod_D=(mod_D+(D_max*((N_t/D_max)%KD))%D)%D; mod_D=(mod_D+D(((N_t/D_max)/KD)*RD)%D)%D; } if (Rx>=KD) { mod_D_t=((D_max*(Rx%KD))%D)%D; mod_D_t=(mod_D_t+D((Rx/KD)*RD)%D)%D; } else mod_D_t=(Rx*D_max)%D; N_t=mod_D_t; R_t=Ky; if (N_t>=(D_max/R_t)) { mod_D_t=(R_t*(N_t%D_max))%D_max; N_t=R_t*(N_t/D_max)+(R_t*(N_t%D_max))/D_max; mod_D_t=(mod_D_t+(D_max*(N_t%KD))%D)%D; mod_D_t=(mod_D_t+D((N_t/KD)*RD)%D)%D; } else mod_D_t=mod_D_t*R_t; N_t=mod_D+mod_D_t; mod_D=N_t; if (N_t>=D_max) { mod_D=(N_t)%D_max; mod_D=(mod_D+(D_max*((N_t/D_max)%KD))%D)%D; mod_D=(mod_D+D(((N_t/D_max)/KD)*RD)%D)%D; } if (Ry>=KD) { mod_D_t=((D_max*(Ry%KD))%D)%D; mod_D_t=(mod_D_t+D((Ry/KD)*RD)%D)%D; } else mod_D_t=(Ry*D_max)%D; N_t=mod_D_t; R_t=Kx; if (N_t>=(D_max/R_t)) { mod_D_t=(R_t*(N_t%D_max))%D_max; N_t=R_t*(N_t/D_max)+(R_t*(N_t%D_max))/D_max; mod_D_t=(mod_D_t+(D_max*(N_t%KD))%D)%D; mod_D_t=(mod_D_t+D((N_t/KD)*RD)%D)%D; } else mod_D_t=mod_D_t*R_t; N_t=mod_D+mod_D_t; mod_D=N_t; if (N_t>=D_max) { mod_D=(N_t)%D_max; mod_D=(mod_D+(D_max*((N_t/D_max)%KD))%D)%D; mod_D=(mod_D+D(((N_t/D_max)/KD)*RD)%D)%D; } } return mod_D; } bool is_2pow_onemod(uint64_t P,uint64_t D) { // return true if 2^P = 1 (mod D) uint64_t x=2; uint64_t y=1; uint64_t D_max=(uint64_t)1<<30; if (D<=D_max) { while (P > (uint64_t)1) { if (P%2 == 1) y = (x * y) %D; x = (x * x) %D; P >>= 1; } return (((x*y)%D) == 1); } else { while (P > (uint64_t)1) { if (P%2 == 1) y=XY_modulo_D_LN(x,y,D,D_max); x=XY_modulo_D_LN(x,x,D,D_max); P >>= 1; } return (XY_modulo_D_LN(x,y,D,D_max) == (uint64_t)1); } } 

20220721, 19:23  #26 
"Forget I exist"
Jul 2009
Dumbassville
2^{4}·3·5^{2}·7 Posts 
I'm not fluent in C but did your code use Mp prime ? If so D=2k(Mp)+1 for some k. You only have to test D with certain k values. Nevermind I see the 1 mod 8 versus 7 mod 8 talk earlier in the thread didn't read it fully.
Last fiddled with by science_man_88 on 20220721 at 20:00 
20220721, 21:17  #27 
"Forget I exist"
Jul 2009
Dumbassville
8400_{10} Posts 
you can check because that's when 2kp+1 doesn't divide by 3.

20220722, 08:21  #28  
Jul 2022
7^{2} Posts 
Quote:
Here you find the final version https://gist.github.com/user140242/0...d2738788bb1cae. A wheel sieve is used which has been set to eliminate all multiples of small primes less than min(bW, 524288) including multiples of 3. 

20220904, 15:44  #29  
Jul 2022
110001_{2} Posts 
To get the mind busy.
Several times I have been told in this thread that the best solution to calculate (2 ^ p1) module D is the one described in the math page. As described, given a certain number P, a number of steps equal to the number of bits of P converted into binary are required. Example for p=86000063 you need 27 steps. In reality, different solutions can be adopted to achieve the same result with the same complexity, for example: fix b = 6 then 2^b=64 P=P%64 +P/64 *64 = P%64 +b1 *64 b1=b1%64 +b1/64 *64 = P%64 +b2 *64 then b2=b1/64 ..... bm+1 =bm%64 +bm/64 *64 continue until bm/64 <64 the remainder must be stored in a vector [P%64, b1%64, ... , bm%64] then we start from mod=2^(bm/64) the product is repeated for b=6 times mod=(mod*mod)%D mod=(mod*mod)%D mod=(mod*mod)%D mod=(mod*mod)%D mod=(mod*mod)%D mod=(mod*mod)%D then it is calculated mod=(mod* 2^(bm%64) )%D the whole is repeated for all the remainders in the vector. Practical example P=86000063 and D=61920045361 Quote:
Quote:


20220915, 13:21  #30 
Jul 2022
7^{2} Posts 
In reference to post #29 maybe this: Find factors of a Mersenne number might be of interest to someone.
