20220821, 14:34  #1 
Jul 2022
34_{16} Posts 
LucasLehmer Primality Testing
I made this version of LucasLehmer Primality Testing if (2^P  1) is prime then the elements of the vector esp_L_i are all 1.
The process is a little slow but it works on the sum of exponents and there are no multiplications do you think it can be useful? I want to clarify that I am not a programmer, I came up with the idea and I realized it quickly without trying to optimize it because it is too slow and I do not know if it is worth spending time to improve it. Code:
#include <iostream> #include <cmath> #include <algorithm> #include <vector> #include <cstdlib> #include <stdint.h> bool ll_test(uint64_t P) { std::vector<uint64_t> esp_L_i(P); std::vector<uint64_t> esp_L_i_p2(P); std::vector<uint64_t> esp_L_i_m2(P); esp_L_i[2]=1; esp_L_i_m2[1]=1; for (uint64_t step=2;step<P;step++) { if (esp_L_i[1]==0) { for (uint64_t i=0;i<P;i++) esp_L_i_p2[i]=esp_L_i[i]; esp_L_i_p2[1]=1; } else { esp_L_i_p2[0]=esp_L_i[0]; esp_L_i_p2[1]=1+esp_L_i[1]; for (uint64_t i=2;i<P;i++) { esp_L_i_p2[i]=esp_L_i[i]+esp_L_i_p2[i1]/2; esp_L_i_p2[i1]%=2; } while (esp_L_i_p2[P1]>1) { esp_L_i_p2[0]+=esp_L_i_p2[P1]/2; esp_L_i_p2[P1]%=2; for (uint64_t i=1;i<P;i++) { esp_L_i_p2[i]+=esp_L_i_p2[i1]/2; esp_L_i_p2[i1]%=2; } } } for (uint64_t i=0;i<P;i++) esp_L_i[i]=0; for (uint64_t i=0;i<P;i++) if (esp_L_i_m2[i]==1) for (uint64_t j=0;j<P;j++) if (esp_L_i_p2[j]==1) { esp_L_i[(i+j)%P]+=1; if (esp_L_i[(i+j)%P]==P) { for (uint64_t i=1;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } while (esp_L_i[P1]>1) { esp_L_i[0]+=esp_L_i[P1]/2; esp_L_i[P1]%=2; for (uint64_t i=1;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } } } } for (uint64_t i=1;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } while (esp_L_i[P1]>1) { esp_L_i[0]+=esp_L_i[P1]/2; esp_L_i[P1]%=2; for (uint64_t i=1;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } } for (uint64_t i=0;i<P;i++) esp_L_i_m2[i]=esp_L_i[i]; if (esp_L_i[1]==0) esp_L_i[1]=1; else { esp_L_i[1]+=1; for (uint64_t i=2;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } while (esp_L_i[P1]>1) { esp_L_i[0]+=esp_L_i[P1]/2; esp_L_i[P1]%=2; for (uint64_t i=1;i<P;i++) { esp_L_i[i]+=esp_L_i[i1]/2; esp_L_i[i1]%=2; } } } } uint64_t i; for (i=0;i<P;i++) if(esp_L_i[i]==0) break; if (i<P) return false; return true; } int main() { std::cout <<"ll test: "<< ll_test(607)<< std::endl; return 0; } 
20220821, 15:13  #2 
Jul 2022
52_{10} Posts 
I want to post a further clarification the idea was born why
I observed this detail in L_2 and L_4 note that if L_1=4 then L_2=L_1^22=(L_12)(L_1+2)+2 L_2 = 14 = 2^3 + 2^2 + 2^1 and the exponents 3%3=0 , 1 , 2%3=2 they are all present L_4 = 37634 = 2^15 + 2^12 + 2^9 + 2^8 + 2^1 and the exponents 15%5=0 , 1 , 12%5=2 , 8%5=3 , 9%5=4 they are all present I thought about developing the program and testing it for larger numbers. In fact I think that adding the exponents of 2 is the same as doing a multiplication. Last fiddled with by User140242 on 20220821 at 15:54 
20220822, 07:15  #3 
Jul 2022
2^{2}·13 Posts 
In rewriting the code I made a mistake I forgot the 0 to initialize the vectors in the declaration.
What it does is perform the multiplication of (L_i2)*(L_i+2) and at the same time the modulo (2^P1) is done. In fact for example 2^(b*P) = 1 (mod 2^P 1) so 1 is added in esp_L_i[0] furthermore we have 2^(a + b*P) = 2^a *(2^b*P1) + 2^a and a* 2^(P1) = a/2 * (2^P 1) + a/2 then the value of (L_i2)*(L_i+2) is stored for use in the next step and the value of L_(i+1) is calculated by adding 2. Last fiddled with by User140242 on 20220822 at 07:19 
20220822, 08:04  #4 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·3^{4}·41 Posts 
I'm not sure what you are trying to achieve by writing the code.
I'm guessing you are already aware that there are plenty of arbitrary precision calculators already available for free. So assuming you just want to play with some code for fun and experience then if you want it to be faster for large numbers, the you will need a different multiply algorithm. Look into FFT or NTT. 
20220822, 08:10  #5  
Jul 2022
34_{16} Posts 
Quote:
Thanks for the advice, in fact I wrote the code just for fun. 

20220822, 15:20  #6 
Jul 2022
64_{8} Posts 
Even if it is useless, this is the best I have been able to do to optimize it.
Code:
#include <iostream> #include <cmath> #include <algorithm> #include <vector> #include <cstdlib> #include <stdint.h> bool ll_test(uint64_t P) { std::vector<uint64_t> bin_L_i(P,0); std::vector<uint64_t> esp_L_i_m2(P,0); uint64_t l_m2=1; esp_L_i_m2[0]=1; for (uint64_t step=2;step<P;step++) { for (uint64_t i=0;i<l_m2;i++) { bin_L_i[(2+esp_L_i_m2[i])%P]+=1; for (uint64_t j=0;j<l_m2;j++) bin_L_i[(esp_L_i_m2[i]+esp_L_i_m2[j])%P]+=1; } for (uint64_t i=1;i<P;i++) { bin_L_i[i]+=bin_L_i[i1]>>1; bin_L_i[i1]&=1; } while (bin_L_i[P1]>1) { bin_L_i[0]+=bin_L_i[P1]>>1; bin_L_i[P1]&=1; for (uint64_t i=1;i<P;i++) { bin_L_i[i]+=bin_L_i[i1]>>1; bin_L_i[i1]&=1; } } l_m2=0; for (uint64_t i=0;i<P;i++) if (bin_L_i[i]==1) { bin_L_i[i]=0; esp_L_i_m2[l_m2]=i; l_m2++; } } if (l_m2==P1 && esp_L_i_m2[1]==2) return true; return false; } int main() { std::cout <<"ll test: "<< ll_test(607)<< std::endl; return 0; } 
20220822, 17:11  #7  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5×1,423 Posts 
Quote:
Also I don't see anything that looks like multithreading. Or use of floating point for speed. Or output of the least significant 64 bits of the final residue for error detection by doublechecking. Or Jacobi symbol check. It looks to me very wasteful of cycles and space. I suspect an optimizing compiler is saving it somewhat. Perhaps I misunderstand what you're doing there. Some explanatory comments in the post, code, or both might help. Compiling it via msys32/mingw and using its environment for timing runs, I get O(p^{~3}) with a large proportionality constant, such that it is thousands or millions of times slower than prime95 on the same CPU core, so unusable at ~100K or higher. Last fiddled with by kriesel on 20220822 at 17:20 

20220822, 18:14  #8  
Jul 2022
52_{10} Posts 
Quote:
I had already realized that it was very slow I posted it only because I had proposed a different way. The code in my last post does this: example if P=5 and MP=2^5 1 =31 in the single step=3 we know that L2=14 L2_m2=(L22)%31= 12 = 2^2+2^3 (L32)%31= L2_m2*L2_m2 +4*L2_m2 = 192%31= 6 multiplication is done by adding the exponents of 2 (2^2 +2^3) * (2^2 +2^3) +2^2 *(2^2 +2^3)= =2^2 * 2^2 + 2^2 * 2^3 + 2^3 * 2^2 + 2^3 * 2^3 + 2^2 * 2^2 +2^2 * 2^3= = 2^4 +2^5 +2^5+ 2^6 +2^4 +2^5 if module 5 of the exponents is made you get 2^4 +2^0 +2^0 + 2^1 + 2^4 +2^0 you get bin_L_i=[3 1 0 0 2] after for loop bin_L_i=[1 0 1 0 2] and after the while loop bin_L_i=[0 1 1 0 0] so is found esp_L_i_m2=[1 , 2] which corresponds to 2^1+2^2 = 6 =192%31 

20220826, 07:49  #9  
Jul 2022
2^{2}·13 Posts 
It is always very slow but looking at the report in
Quote:
the execution time for P = 4423 is 981s making some small changes it is possible to go down to 90s I don't know if it is possible to optimize it again. Code:
#include <iostream> #include <cmath> #include <algorithm> #include <vector> #include <cstdlib> #include <stdint.h> bool ll_test(uint64_t P) { std::vector<uint64_t> bin_L_i(P,0); std::vector<uint64_t> esp_L_i_m2(P,0); uint64_t b = 31; if (P<32) b=P1; std::vector<uint64_t> C_max(b,0); for (uint64_t i=1;i<b;i++) C_max[i]=1<<i; uint64_t cd = 0; uint64_t esp_1=P/2+1; uint64_t l_m2=1; uint64_t i1,j1; esp_L_i_m2[0]=1; for (uint64_t step=2;step<P;step++) { cd=0; for (i1=0;i1<l_m2 && esp_L_i_m2[i1]<esp_1;i1++) { bin_L_i[2+esp_L_i_m2[i1]]+=1; bin_L_i[2*esp_L_i_m2[i1]]+=1; for (j1=i1+1;j1<l_m2 && (1+esp_L_i_m2[i1]+esp_L_i_m2[j1])<P;j1++) bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]]+=1; for (;j1<l_m2;j1++) bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]P]+=1; } for (;i1<l_m2 && (2+esp_L_i_m2[i1])<P;i1++) { bin_L_i[2+esp_L_i_m2[i1]]+=1; bin_L_i[2*esp_L_i_m2[i1]P]+=1; for (j1=i1+1;j1<l_m2;j1++) bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]P]+=1; } for (;i1<l_m2;i1++) { bin_L_i[2+esp_L_i_m2[i1]P]+=1; bin_L_i[2*esp_L_i_m2[i1]P]+=1; for (j1=i1+1;j1<l_m2;j1++) bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]P]+=1; } bin_L_i[2]+=bin_L_i[1]>>1; bin_L_i[1]&=1; bin_L_i[0]+=bin_L_i[P1]>>1; bin_L_i[P1]&=1; bin_L_i[1]+=bin_L_i[0]>>1; bin_L_i[0]&=1; for (uint64_t esp=Pb+1;esp<P1;esp++) { if (bin_L_i[esp]>=C_max[Pesp]) { bin_L_i[0]+=bin_L_i[esp]/C_max[Pesp]; bin_L_i[esp]%=C_max[Pesp]; } } l_m2=0; for (uint64_t i=1;i<P;i++) { bin_L_i[i]+=bin_L_i[i1]>>1; bin_L_i[i1]&=1; if (bin_L_i[i1]==1) { esp_L_i_m2[l_m2]=i1; l_m2++; cd++; } } if (bin_L_i[P1]==1) { esp_L_i_m2[l_m2]=P1; l_m2++; } else { if (bin_L_i[P1]<4) { if (cd==P1 && bin_L_i[P1]==2) { l_m2=1; esp_L_i_m2[0]=P1; bin_L_i[P1]=0; } else if (cd==P1 && bin_L_i[P1]==3) { l_m2=1; esp_L_i_m2[0]=0; bin_L_i[P1]=0; } else if (cd==P2 && bin_L_i[0]==0) { l_m2=P1; if (bin_L_i[P1]==3) l_m2++; for (uint64_t i=0;i<l_m2;i++) esp_L_i_m2[i]=i; bin_L_i[P1]=0; } } } while (bin_L_i[P1]>1) { l_m2=0; bin_L_i[0]+=bin_L_i[P1]>>1; bin_L_i[P1]&=1; for (uint64_t i=1;i<P;i++) { bin_L_i[i]+=bin_L_i[i1]>>1; bin_L_i[i1]&=1; if (bin_L_i[i1]==1) { esp_L_i_m2[l_m2]=i1; l_m2++; } } if (bin_L_i[P1]==1) { esp_L_i_m2[l_m2]=P1; l_m2++; } } std::fill(bin_L_i.begin(), bin_L_i.end(), 0); } if (l_m2==P1 && esp_L_i_m2[1]==2) return true; return false; } int main() { std::cout <<"ll test: "<< ll_test(607)<< std::endl; return 0; } Last fiddled with by User140242 on 20220826 at 08:47 

20220826, 18:41  #10 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1101111001011_{2} Posts 
It takes multithreading and efficient implementation of the fastest known algorithms to be competitive (IBDWT or possibly NTT). Grammar school multiplication is known to be competitive for only the shortest operands, <~256 bits, because it is O(p^{2}) per iteration, so O(p^{3}) per primality test, a very slow algorithm for large operands as in GIMPS.
It would be helpful if your code accepted exponent as a command line parameter, instead of hard coding and requiring recompile for each. Last fiddled with by kriesel on 20220826 at 18:50 
20220827, 15:32  #11 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5×1,423 Posts 
Oops, g++ default optimization is O0 (none). O3 provides considerable improvement. (Mprime is still estimated ~10^{5} times faster at p~1M on the same CPU core, and has an increasingly large advantage at higher exponents.)
Modified main() to accept exponent input: Code:
int main(int argc,char* argv[]) { int expon; // cout << "You have entered " << argc << " arguments:" << "\n"; // for (int i = 0; i < argc; ++i) cout << argv[i] << "\n"; if ( argc > 1 ) { expon=std::stoi(argv[1]); printf ( "Exponent %d\n",expon); std::cout <<"ll test (1 prime, 0 composite): "<< ll_test(expon)<< std::endl; } else { printf ( "Enter an integer exponent greater than one, as a command line parameter.\n"); } return 0; } 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Alternative Math for Lucas Lehmer Testing and Perfect Numbers  LarsNet  Miscellaneous Math  0  20210309 22:44 
LucasLehmer Primes  henryzz  And now for something completely different  42  20190603 14:09 
Modifying the Lucas Lehmer Primality Test into a fast test of nothing  Trilo  Miscellaneous Math  25  20180311 23:20 
LucasLehmer  Dougal  Information & Answers  9  20090206 10:25 
LucasLehmer primality test  CMOSMIKE  Math  5  20020906 18:46 