mersenneforum.org Lucas-Lehmer Primality Testing
 Register FAQ Search Today's Posts Mark Forums Read

 2022-08-21, 14:34 #1 User140242   Jul 2022 3416 Posts Lucas-Lehmer Primality Testing I made this version of Lucas-Lehmer Primality Testing if (2^P - 1) is prime then the elements of the vector esp_L_i are all 1. The process is a little slow but it works on the sum of exponents and there are no multiplications do you think it can be useful? I want to clarify that I am not a programmer, I came up with the idea and I realized it quickly without trying to optimize it because it is too slow and I do not know if it is worth spending time to improve it. Code: #include #include #include #include #include #include bool ll_test(uint64_t P) { std::vector esp_L_i(P); std::vector esp_L_i_p2(P); std::vector esp_L_i_m2(P); esp_L_i[2]=1; esp_L_i_m2[1]=1; for (uint64_t step=2;step1) { esp_L_i_p2[0]+=esp_L_i_p2[P-1]/2; esp_L_i_p2[P-1]%=2; for (uint64_t i=1;i1) { esp_L_i[0]+=esp_L_i[P-1]/2; esp_L_i[P-1]%=2; for (uint64_t i=1;i1) { esp_L_i[0]+=esp_L_i[P-1]/2; esp_L_i[P-1]%=2; for (uint64_t i=1;i1) { esp_L_i[0]+=esp_L_i[P-1]/2; esp_L_i[P-1]%=2; for (uint64_t i=1;i
 2022-08-21, 15:13 #2 User140242   Jul 2022 5210 Posts I want to post a further clarification the idea was born why I observed this detail in L_2 and L_4 note that if L_1=4 then L_2=L_1^2-2=(L_1-2)(L_1+2)+2 L_2 = 14 = 2^3 + 2^2 + 2^1 and the exponents 3%3=0 , 1 , 2%3=2 they are all present L_4 = 37634 = 2^15 + 2^12 + 2^9 + 2^8 + 2^1 and the exponents 15%5=0 , 1 , 12%5=2 , 8%5=3 , 9%5=4 they are all present I thought about developing the program and testing it for larger numbers. In fact I think that adding the exponents of 2 is the same as doing a multiplication. Last fiddled with by User140242 on 2022-08-21 at 15:54
 2022-08-22, 07:15 #3 User140242   Jul 2022 22·13 Posts In rewriting the code I made a mistake I forgot the 0 to initialize the vectors in the declaration. What it does is perform the multiplication of (L_i-2)*(L_i+2) and at the same time the modulo (2^P-1) is done. In fact for example 2^(b*P) = 1 (mod 2^P -1) so 1 is added in esp_L_i[0] furthermore we have 2^(a + b*P) = 2^a *(2^b*P-1) + 2^a and a* 2^(P-1) = a/2 * (2^P -1) + a/2 then the value of (L_i-2)*(L_i+2) is stored for use in the next step and the value of L_(i+1) is calculated by adding 2. Last fiddled with by User140242 on 2022-08-22 at 07:19
 2022-08-22, 08:04 #4 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 2·34·41 Posts I'm not sure what you are trying to achieve by writing the code. I'm guessing you are already aware that there are plenty of arbitrary precision calculators already available for free. So assuming you just want to play with some code for fun and experience then if you want it to be faster for large numbers, the you will need a different multiply algorithm. Look into FFT or NTT.
2022-08-22, 08:10   #5
User140242

Jul 2022

3416 Posts

Quote:
 Originally Posted by retina I'm not sure what you are trying to achieve by writing the code. I'm guessing you are already aware that there are plenty of arbitrary precision calculators already available for free. So assuming you just want to play with some code for fun and experience then if you want it to be faster for large numbers, the you will need a different multiply algorithm. Look into FFT or NTT.

Thanks for the advice, in fact I wrote the code just for fun.

 2022-08-22, 15:20 #6 User140242   Jul 2022 648 Posts Even if it is useless, this is the best I have been able to do to optimize it. Code: #include #include #include #include #include #include bool ll_test(uint64_t P) { std::vector bin_L_i(P,0); std::vector esp_L_i_m2(P,0); uint64_t l_m2=1; esp_L_i_m2[0]=1; for (uint64_t step=2;step>1; bin_L_i[i-1]&=1; } while (bin_L_i[P-1]>1) { bin_L_i[0]+=bin_L_i[P-1]>>1; bin_L_i[P-1]&=1; for (uint64_t i=1;i>1; bin_L_i[i-1]&=1; } } l_m2=0; for (uint64_t i=0;i
2022-08-22, 17:11   #7
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5×1,423 Posts

Quote:
 Originally Posted by User140242 I made this version of Lucas-Lehmer Primality Testing if (2^P - 1) is prime then the elements of the vector esp_L_i are all 1. The process is a little slow but it works on the sum of exponents and there are no multiplications do you think it can be useful?
Useful? Other than as a learning exercise, I doubt it. It's not clear to me what's going on there. I see deeply nested for loops that look to me like its run time may scale as high as O(p5) although if the conditional branches are taken rarely it may be lower order. Competitive code scales as p2 log p log log p per primality test, or roughly p2.1. 62M(5-2.1) would be a very large speed disadvantage. Even an O(p^0.5) disadvantage can be deadly to usefulness. See also https://www.mersenneforum.org/showpo...21&postcount=7

Also I don't see anything that looks like multithreading. Or use of floating point for speed. Or output of the least significant 64 bits of the final residue for error detection by double-checking. Or Jacobi symbol check. It looks to me very wasteful of cycles and space. I suspect an optimizing compiler is saving it somewhat. Perhaps I misunderstand what you're doing there. Some explanatory comments in the post, code, or both might help.

Compiling it via msys32/mingw and using its environment for timing runs, I get O(p~3) with a large proportionality constant, such that it is thousands or millions of times slower than prime95 on the same CPU core, so unusable at ~100K or higher.
Attached Files
 primality testing run time scalings compared.pdf (32.3 KB, 30 views)

Last fiddled with by kriesel on 2022-08-22 at 17:20

2022-08-22, 18:14   #8
User140242

Jul 2022

5210 Posts

Quote:
 Originally Posted by kriesel Useful? Other than as a learning exercise, I doubt it. It's not clear to me what's going on there. ...

I had already realized that it was very slow I posted it only because I had proposed a different way.

The code in my last post does this:

example if P=5 and MP=2^5 -1 =31 in the single step=3 we know that L2=14

L2_m2=(L2-2)%31= 12 = 2^2+2^3

(L3-2)%31= L2_m2*L2_m2 +4*L2_m2 = 192%31= 6

multiplication is done by adding the exponents of 2

(2^2 +2^3) * (2^2 +2^3) +2^2 *(2^2 +2^3)=
=2^2 * 2^2 + 2^2 * 2^3 + 2^3 * 2^2 + 2^3 * 2^3 + 2^2 * 2^2 +2^2 * 2^3=
= 2^4 +2^5 +2^5+ 2^6 +2^4 +2^5

if module 5 of the exponents is made you get

2^4 +2^0 +2^0 + 2^1 + 2^4 +2^0

you get
bin_L_i=[3 1 0 0 2]

after for loop
bin_L_i=[1 0 1 0 2]

and after the while loop
bin_L_i=[0 1 1 0 0]

so is found
esp_L_i_m2=[1 , 2]

which corresponds to 2^1+2^2 = 6 =192%31

2022-08-26, 07:49   #9
User140242

Jul 2022

22·13 Posts

It is always very slow but looking at the report in

Quote:
 Originally Posted by kriesel Compiling it via msys32/mingw and using its environment for timing runs, I get O(p~3) with a large proportionality constant, such that it is thousands or millions of times slower than prime95 on the same CPU core, so unusable at ~100K or higher.

the execution time for P = 4423 is 981s making some small changes it is possible to go down to 90s I don't know if it is possible to optimize it again.

Code:
#include <iostream>
#include <cmath>
#include <algorithm>
#include <vector>
#include <cstdlib>
#include <stdint.h>

bool ll_test(uint64_t P)
{

std::vector<uint64_t> bin_L_i(P,0);
std::vector<uint64_t> esp_L_i_m2(P,0);

uint64_t b = 31;
if (P<32)
b=P-1;
std::vector<uint64_t> C_max(b,0);
for (uint64_t i=1;i<b;i++)
C_max[i]=1<<i;
uint64_t cd = 0;
uint64_t esp_1=P/2+1;
uint64_t l_m2=1;
uint64_t i1,j1;
esp_L_i_m2[0]=1;
for (uint64_t step=2;step<P;step++)
{
cd=0;
for (i1=0;i1<l_m2 && esp_L_i_m2[i1]<esp_1;i1++)
{
bin_L_i[2+esp_L_i_m2[i1]]+=1;
bin_L_i[2*esp_L_i_m2[i1]]+=1;
for (j1=i1+1;j1<l_m2 && (1+esp_L_i_m2[i1]+esp_L_i_m2[j1])<P;j1++)
bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]]+=1;
for (;j1<l_m2;j1++)
bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]-P]+=1;
}
for (;i1<l_m2 && (2+esp_L_i_m2[i1])<P;i1++)
{
bin_L_i[2+esp_L_i_m2[i1]]+=1;
bin_L_i[2*esp_L_i_m2[i1]-P]+=1;
for (j1=i1+1;j1<l_m2;j1++)
bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]-P]+=1;
}
for (;i1<l_m2;i1++)
{
bin_L_i[2+esp_L_i_m2[i1]-P]+=1;
bin_L_i[2*esp_L_i_m2[i1]-P]+=1;
for (j1=i1+1;j1<l_m2;j1++)
bin_L_i[1+esp_L_i_m2[i1]+esp_L_i_m2[j1]-P]+=1;
}
bin_L_i[2]+=bin_L_i[1]>>1;
bin_L_i[1]&=1;
bin_L_i[0]+=bin_L_i[P-1]>>1;
bin_L_i[P-1]&=1;
bin_L_i[1]+=bin_L_i[0]>>1;
bin_L_i[0]&=1;
for (uint64_t esp=P-b+1;esp<P-1;esp++)
{
if (bin_L_i[esp]>=C_max[P-esp])
{

bin_L_i[0]+=bin_L_i[esp]/C_max[P-esp];
bin_L_i[esp]%=C_max[P-esp];
}
}
l_m2=0;
for (uint64_t i=1;i<P;i++)
{
bin_L_i[i]+=bin_L_i[i-1]>>1;
bin_L_i[i-1]&=1;
if (bin_L_i[i-1]==1)
{
esp_L_i_m2[l_m2]=i-1;
l_m2++;
cd++;
}
}
if (bin_L_i[P-1]==1)
{
esp_L_i_m2[l_m2]=P-1;
l_m2++;
}
else
{
if (bin_L_i[P-1]<4)
{
if (cd==P-1 && bin_L_i[P-1]==2)
{
l_m2=1;
esp_L_i_m2[0]=P-1;
bin_L_i[P-1]=0;
}
else if (cd==P-1 && bin_L_i[P-1]==3)
{
l_m2=1;
esp_L_i_m2[0]=0;
bin_L_i[P-1]=0;
}
else if (cd==P-2 && bin_L_i[0]==0)
{
l_m2=P-1;
if (bin_L_i[P-1]==3)
l_m2++;
for (uint64_t i=0;i<l_m2;i++)
esp_L_i_m2[i]=i;
bin_L_i[P-1]=0;
}
}
}
while (bin_L_i[P-1]>1)
{
l_m2=0;
bin_L_i[0]+=bin_L_i[P-1]>>1;
bin_L_i[P-1]&=1;
for (uint64_t i=1;i<P;i++)
{
bin_L_i[i]+=bin_L_i[i-1]>>1;
bin_L_i[i-1]&=1;
if (bin_L_i[i-1]==1)
{
esp_L_i_m2[l_m2]=i-1;
l_m2++;
}
}
if (bin_L_i[P-1]==1)
{
esp_L_i_m2[l_m2]=P-1;
l_m2++;
}
}
std::fill(bin_L_i.begin(), bin_L_i.end(), 0);
}
if (l_m2==P-1 && esp_L_i_m2[1]==2)
return true;
return false;
}

int main()
{

std::cout <<"ll test: "<< ll_test(607)<< std::endl;

return 0;
}

Last fiddled with by User140242 on 2022-08-26 at 08:47

2022-08-26, 18:41   #10
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11011110010112 Posts

It takes multithreading and efficient implementation of the fastest known algorithms to be competitive (IBDWT or possibly NTT). Grammar school multiplication is known to be competitive for only the shortest operands, <~256 bits, because it is O(p2) per iteration, so O(p3) per primality test, a very slow algorithm for large operands as in GIMPS.
It would be helpful if your code accepted exponent as a command line parameter, instead of hard coding and requiring recompile for each.
Attached Files
 primality testing run time scalings compared.pdf (56.7 KB, 14 views)

Last fiddled with by kriesel on 2022-08-26 at 18:50

2022-08-27, 15:32   #11
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5×1,423 Posts

Oops, g++ default optimization is -O0 (none). -O3 provides considerable improvement. (Mprime is still estimated ~105 times faster at p~1M on the same CPU core, and has an increasingly large advantage at higher exponents.)

Modified main() to accept exponent input:
Code:
int main(int argc,char* argv[])
{
int expon;
// cout << "You have entered " << argc         << " arguments:" << "\n";
// for (int i = 0; i < argc; ++i)       cout << argv[i] << "\n";
if ( argc > 1 ) {
expon=std::stoi(argv[1]);
printf ( "Exponent %d\n",expon);
std::cout <<"ll test (1 prime, 0 composite): "<< ll_test(expon)<< std::endl;
} else {
printf ( "Enter an integer exponent greater than one, as a command line parameter.\n");
}
return 0;
}
Attached Files
 primality testing run time scalings compared.pdf (58.2 KB, 15 views)

 Similar Threads Thread Thread Starter Forum Replies Last Post LarsNet Miscellaneous Math 0 2021-03-09 22:44 henryzz And now for something completely different 42 2019-06-03 14:09 Trilo Miscellaneous Math 25 2018-03-11 23:20 Dougal Information & Answers 9 2009-02-06 10:25 CMOSMIKE Math 5 2002-09-06 18:46

All times are UTC. The time now is 18:36.

Tue Dec 6 18:36:25 UTC 2022 up 110 days, 16:04, 0 users, load averages: 0.69, 0.70, 0.78