mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2020-05-27, 04:26   #23
Citrix
 
Citrix's Avatar
 
Jun 2003

30478 Posts
Default gcwsieve enhancements

Code:
// We're trying to determine if n*b^n (mod p) = +1/-1.  This requires an two
// operations, an exponentiation (b^n) and a multiplication (n).  We can make a
// change to eliminate one of those operations and convert the second one into addition which    // should be faster. Doing this requires us to
// compute the multiplicative inverse of b.  The multiplicative inverse of
// b (which I will call B) is the number such at b*B = 1 mod p.  Here is how
// the algorithm works.
//
// We start with our number:
//       n*b^n (mod p) = +1/-1
// We then generate B such that b*B=1 (mod p)
// // To get then next term (which I will call N) where N>n , we need to
// check the following  
//	   N=n+r
//       N*b^N(mod p) = +1/-1
//       (n+r)*b^(n+r) (mod p) = +1/-1
// We multiply both sides by B^r
//       (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r
//       (n+r)*b^n (mod p) = +B^r/-B^r
//       n*b^n + r*b^n (mod p) = +B^r/-B^r
// We already know n*b^n and b^n are from the previous step.

// (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin))
// (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition
// We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1.

M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops.

// once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax 
   n*b^n + rmax*b^n (mod p)
   (n+rmax)* b^n (mod p)
   (n+rmax)* b^n*b^rmax(mod p)
   n'*b^n'(mod p)
   and also generate b^n'=b^n*b^rmax. (mod p)
// We can now repeat the loop again starting at n'.


Other suggestions
Your current code takes into account n values left mod 2 but not for other higher primes.
Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. 
Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency.

On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.
Citrix is offline   Reply With Quote
Old 2020-05-27, 15:41   #24
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

10111010000012 Posts
Default

Quote:
Originally Posted by Citrix View Post
Code:
// We're trying to determine if n*b^n (mod p) = +1/-1.  This requires an two
// operations, an exponentiation (b^n) and a multiplication (n).  We can make a
// change to eliminate one of those operations and convert the second one into addition which 
// should be faster. Doing this requires us to
// compute the multiplicative inverse of b.  The multiplicative inverse of
// b (which I will call B) is the number such at b*B = 1 mod p.  Here is how
// the algorithm works.
//
// We start with our number:
//       n*b^n (mod p) = +1/-1
// We then generate B such that b*B=1 (mod p)
//
// To get then next term (which I will call N) where N>n , we need to
// check the following  
//	   N=n+r
//       N*b^N(mod p) = +1/-1
//       (n+r)*b^(n+r) (mod p) = +1/-1
// We multiply both sides by B^r
//       (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r
//       (n+r)*b^n (mod p) = +B^r/-B^r
//       n*b^n + r*b^n (mod p) = +B^r/-B^r
// We already know n*b^n and b^n are from the previous step.

// (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin))
// (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition
// We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1.

M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops.

// once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax 
   n*b^n + rmax*b^n (mod p)
   (n+rmax)* b^n (mod p)
   (n+rmax)* b^n*b^rmax(mod p)
   n'*b^n'(mod p)
   and also generate b^n'=b^n*b^rmax. (mod p)
// We can now repeat the loop again starting at n'.

Other suggestions
Your current code takes into account n values left mod 2 but not for other higher primes.
Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. 
Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency.

On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.
It took me a while, but I believe I see where you are going with this. Obviously as the range of n increases, so does the benefit.

Memory is definitely a concern on the GPU, but I think that something similar could be done there as well.

I will table your other suggestions because I don't understand them. Someone else (maybe you) would be interested implementing.
rogue is online now   Reply With Quote
Old 2020-05-27, 16:54   #25
Citrix
 
Citrix's Avatar
 
Jun 2003

32·52·7 Posts
Default

Quote:
Originally Posted by rogue View Post
I will table your other suggestions because I don't understand them.
I am suggesting using a few more primes, other than 2, like 3,5,7,13 etc (Covering sets)
The code below is from mtsieve for prime 2.

Code:
 // If the base is odd, then all n must be even and thus the difference
      // between any two remaining n for the base must also be even.
      if (idx > 2 && ii_Base & 1)
      {
         if (idx & 1)
            continue;
         
         avx_set_16a(powers[idx-2]);
         avx_set_16b(powers[2]);
         avx_mulmod(dps, reciprocals);
         avx_get_16a(powers[idx]);
      }
      else 
      {
         avx_set_16a(powers[idx-1]);
         avx_set_16b(powers[1]);
         avx_mulmod(dps, reciprocals);
         avx_get_16a(powers[idx]);
      }
Citrix is offline   Reply With Quote
Old 2020-05-27, 18:21   #26
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

174116 Posts
Default

I see. This is how I have commented the non-AVX code:

Code:
// We're trying to determine if n*b^n (mod p) = +1/-1.  This requires an two
// operations, an exponentiation (b^n) and a multiplcation (n).  By precomputing
// certain values, we can actually replace the exponentiation and mulitplication
// with array lookups.  Doing this requires us to compute the multiplicative inverse
// of b and to build two arrays.  The multiplicative inverse of b (which I will call B)
// is the number such at b*B = 1 mod p.
//
// We start with our n where n=minN:
//       n*b^n (mod p) = +1/-1
//    -> n*b^n*B^n (mod p) = +1/-1 * B^n (mod p)
//    -> n (mod p) = +1/-1 B^n (mod p)
//
// Find all distinct r staring with n=minN where n+r is the next term until n=maxN.
//
// Pre-compute B^r (mod p) for all possible r.
// Pre-compute n*b^n + r*b^n (mod p) for all possible r.
//
// For each successive term:
//       (n+r)*b^(n+r) (mod p) = +1/-1
//    -> (n+r)*b^(n+r)*B^r (mod p) = +/-B^r (mod p)
//    -> (n+r)*b^n (mod p) = +/-B^r (mod p)
//    -> n*b^n + r*b^n (mod p) = +/-B^r (mod p)
//
//    Since both n*b^n + r*b^n (mod p) and +/-B^r (mod p) have been pre-cmoputed
//    we just need to compare elements of two arrays.
//
//    If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1.
//    If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1
I think that your suggestion for bestQ is not necessary as I build a table of "the covering set" of r infrequently.

There are now two loops for each group of primes. One to compute B^r (mod p) and n*b^n + r*b^n (mod p) and the one that detects if we have a factor. This has not been tested yet, so I have likely made mistakes along the way. Because of the need to convert so many doubles back to uint64_t, I'm not certain that an AVX version will be faster than this. That will need some testing. On the GPU side, I can make additional changes to eliminate unused elements of the left and right arrays.

Code:
   while (ii_Terms[termIndex] > 0)
   {
      theN = ii_Terms[termIndex];

      diff = prevN - theN;
      
      for (pIdx=0; pIdx<4; pIdx++)
      {
         // If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1.
         // If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1
         if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == il_RightTerms[DIFF_INDEX(diff)+pIdx])
            if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, -1))
               VerifyFactor(ps[pIdx], theN, -1);
            
         if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == ps[pIdx] - il_RightTerms[DIFF_INDEX(diff)+pIdx])
            if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, +1))
               VerifyFactor(ps[pIdx], theN, +1);
      }
         
      prevN = theN;
      termIndex++;
   };
rogue is online now   Reply With Quote
Old 2020-05-28, 05:25   #27
Citrix
 
Citrix's Avatar
 
Jun 2003

30478 Posts
Default

I am not sure if you understood the algorithm correctly.
Here is the pseudocode

Code:
Given a base b, nmin, nmax, prime p to sieve.

Choose a value C where C<=nmax-nmin  && (nmax-nmin+1)%C==0; need to experimentally find the best C value so code is fastest. This will be constant for all primes. Can adjust nmax so it solves the second modular part.

C=0 or C=nmax-nmin are not the best solutions. The best solution is somewhere in between based on efficiency of the code. (see ***)

Then generate B such that b*B=1 (mod p)

Then create an Array Invs[] = B^0, B^1,..., B^C (mod p)
Then create an Array Invsm[] = p-B^0, p-B^1,..., p-B^C (mod p)

Then we create a loop starting at X=nmin with step C upto nmax (e.g. for(int X=nmin; X<nmax; X=X+C)) {

We generate b^X and X*b^X (in each iteration of the loop can calculate B^lastX*B^C=B^CurrentX and then multiply by X)

We create an array Array MaxDiff []= 0*b^X, 1*b^X, ...., m*b^X (mod p; modular addition)
m would be around 50 for base 2. Could be smaller or larger based on density of remaining candidates in file. Similar to size of MAX_POWERS in current code. Use mean and standard deviations to find best m value. The small values 1*b^X are rarely used and might not need them. 


Now we find all remaining candidates in the sieve between n=X and X+C (read from a bitmap)

Let the first candidate be n=r1

Then we add X^b^x+MaxDiff[r1-X] (mod p) = y
Check if y==Invs[r1-X] (for Woodall) and y==Invsm[r1-X] (for Cullen)  --> Print factors

Let the next candidate be n=r2

Then we add y+MaxDiff[r2-r1] (mod p) = y
Check if y==Invs[r2-X] (for Woodall) and y==Invsm[r2-X] (for Cullen) --> Print factors

and so on till we reach the last term under X+C

Need to check if difference between 2 consecutive remaining terms is greater than Size of MaxDiff then we would need an extra step (similar to current code)
}// end loop


Best Q can further help as some values in Array Invs and Invsm might not need to be generated. Need to choose C value that is a multiple of Best_Q and adjust the nmin value so it is also a multiple of the Best_Q.

*For odd bases MaxDiff only needs 1/2 the values - otherwise should not apply Best_Q to MaxDiff

Why are you converting from double to uint64_t? You can do modular addition in double type.



***Notes on calculating best C value
Work done = C/q mulmod in first loop + N_range/C*(2 mulmod + 50 add and 50 subtract for MaxDiff array) in second loop.
Where q is adjustment due to Best_Q improvements
50 add+ 50 subtract = t mulmod (base on CPU time)
Then we have C/q + (N_range* (2+t))/C needs to be the smallest so work done is fastest.

Then C= sqrt((N_range*(2+t)*q)) is the best solution.
Then Array Invs should be of size ~= sqrt(n_range)*sqrt((2+t)*q) for best results.
I hope this makes more sense.

Last fiddled with by Citrix on 2020-05-28 at 05:40
Citrix is offline   Reply With Quote
Old 2020-05-28, 14:38   #28
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

5,953 Posts
Default

Honestly I don't fully understand and can't spend the brain power on it at the moment. If you want to take a crack at writing the code per your algorithm, then go for it. I think you have the best chance of implementing the code per your vision.
rogue is online now   Reply With Quote
Old 2020-05-28, 20:41   #29
Citrix
 
Citrix's Avatar
 
Jun 2003

32·52·7 Posts
Default

I did try implementing the algorithm. There possibly is some small bug as it is not producing all the factors. I can't find it right now. I do not have a debugger for GCC to help me.

For base 2
I initially sieved range nmin=10,000 to nmax=100,000 to remove all small p (under 1M). There were ~ 3,600 candidates left.
I used this sieved file to test the following:-
The original code gives a speed of 33Kp/sec for p range 1G to 2G.
The new code gives a speed of 2.5Kp/sec for p range 1G to 2G.

Then I tested the code nmin=1,000,000 to nmax=2,000,000 from 1G to 2G (without removing any small terms).
The original code gives a speed of 128 p/sec for p range 1G to 2G.
The new code gives a speed of 804p/sec for p range 1G to 2G.

So 8 times faster but as the original code removes terms from the sieve the code would get faster and faster whereas the new algorithm would stay at almost the same speed.

For the new algorithm to be significantly faster than the old algorithm we would need to test a range of at least 100-400M.

Since no one that I know of is planning on sieving a range that large- I would hold off on further development of this code.
Citrix is offline   Reply With Quote
Old 2020-05-28, 21:53   #30
Citrix
 
Citrix's Avatar
 
Jun 2003

32×52×7 Posts
Default

I am posting a simpler explanation of the algorithm here - in case some can come up with a faster implementation

Code:
Take all the remaining candidates in a range Nmin to Nmax
Let b be the base and B be the multiplicative inverse such that b*B = 1 mod p.

We choose a C= sqrt(Nmax-Nmin)
We then divide all n values into C different groups
In each group G_r={n1,n2,n3...,} such that n1=n2=n3=r (mod C)

We want to sieve each of these C groups separately. 

For G_r group
since n1=r (mod C) then n1=k1*C+r
since n2=r (mod C) then n2=k2*C+r

The first term in this group is n1*b^n1+-1
The second term in this group is n2*b^n2+-1

We multiply all these terms by their corresponding B^(k*C) value

The first term 
    n1*b^n1= +-1  (mod p)
    n1*b^n1*B^(k1*C)= +-B^(k1*C)  (mod p)
    Since we know that n1=k1*C+r we get
    n1*b^r=+-B^(k1*C)  (mod p)


The second term 
    n2*b^n2= +-1  (mod p)
    n2*b^n2*B^(k2*C)= +-B^(k2*C)  (mod p)
    Since we know that n2=k2*C+r we get
    n2*b^r=+-B^(k2*C)  (mod p)

Note that B^(k*C) values will be used in all the C groups and we can save on this work instead of repeatedly calculating it.
Also note n2*b^r - n1*b^r = (k2-k1)*C*b^r
If we know C*b^r (which we can calculate for each group) we can generate all these terms by addition only. 

If n1*b^r=-B^(k1*C) (mod p) then n1*b^n1+1 is divisible by p
If n1*b^r=B^(k1*C) (mod) p then n1*b^n1-1 is divisible by p
Citrix is offline   Reply With Quote
Old 2020-06-30, 15:19   #31
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

5,953 Posts
Default

I see where my logic is wrong referring to what I was trying to implement. Not certain if I can resurrect.
rogue is online now   Reply With Quote
Old 2020-07-08, 16:20   #32
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

135018 Posts
Default

One place where the framework could benefit significantly is by switching from right-to-left exponentiation to left-to-right exponentiation. If anyone is interested in changing an ASM routine to use left-to-right exponentiation, please let me know.
rogue is online now   Reply With Quote
Old 2020-08-02, 17:22   #33
Happy5214
 
Happy5214's Avatar
 
"Alexander"
Nov 2008
The Alamo City

1100001012 Posts
Default

Quote:
Originally Posted by rogue View Post
One place where the framework could benefit significantly is by switching from right-to-left exponentiation to left-to-right exponentiation. If anyone is interested in changing an ASM routine to use left-to-right exponentiation, please let me know.
I was thinking about doing this as an exercise in learning floating-point ASM, but the existing code appears to me to already be left-to-right, matching the algorithm in the source comments. Am I reading it wrong?
Happy5214 is online now   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
srsieve/sr2sieve enhancements rogue Software 281 2020-09-29 16:36
mtsieve rogue Software 447 2020-09-19 05:50
LLRnet enhancements kar_bon No Prime Left Behind 10 2008-03-28 11:21
TODO list and suggestions/comments/enhancements Greenbank Octoproth Search 2 2006-12-03 17:28
Suggestions for future enhancements Reboot It Software 16 2003-10-17 01:31

All times are UTC. The time now is 23:12.

Tue Oct 27 23:12:13 UTC 2020 up 47 days, 20:23, 2 users, load averages: 2.32, 1.90, 1.84

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.