 mersenneforum.org mtsieve enhancements
 User Name Remember Me? Password
 Register FAQ Search Today's Posts Mark Forums Read  2020-05-27, 04:26 #23 Citrix   Jun 2003 63416 Posts gcwsieve enhancements Code: // We're trying to determine if n*b^n (mod p) = +1/-1. This requires an two // operations, an exponentiation (b^n) and a multiplication (n). We can make a // change to eliminate one of those operations and convert the second one into addition which // should be faster. Doing this requires us to // compute the multiplicative inverse of b. The multiplicative inverse of // b (which I will call B) is the number such at b*B = 1 mod p. Here is how // the algorithm works. // // We start with our number: // n*b^n (mod p) = +1/-1 // We then generate B such that b*B=1 (mod p) // // To get then next term (which I will call N) where N>n , we need to // check the following // N=n+r // N*b^N(mod p) = +1/-1 // (n+r)*b^(n+r) (mod p) = +1/-1 // We multiply both sides by B^r // (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r // (n+r)*b^n (mod p) = +B^r/-B^r // n*b^n + r*b^n (mod p) = +B^r/-B^r // We already know n*b^n and b^n are from the previous step. // (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin)) // (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition // We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1. M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops. // once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax n*b^n + rmax*b^n (mod p) (n+rmax)* b^n (mod p) (n+rmax)* b^n*b^rmax(mod p) n'*b^n'(mod p) and also generate b^n'=b^n*b^rmax. (mod p) // We can now repeat the loop again starting at n'. Other suggestions Your current code takes into account n values left mod 2 but not for other higher primes. Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency. On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.   2020-05-27, 15:41   #24
rogue

"Mark"
Apr 2003
Between here and the

11010010011112 Posts Quote:
 Originally Posted by Citrix Code: // We're trying to determine if n*b^n (mod p) = +1/-1. This requires an two // operations, an exponentiation (b^n) and a multiplication (n). We can make a // change to eliminate one of those operations and convert the second one into addition which // should be faster. Doing this requires us to // compute the multiplicative inverse of b. The multiplicative inverse of // b (which I will call B) is the number such at b*B = 1 mod p. Here is how // the algorithm works. // // We start with our number: // n*b^n (mod p) = +1/-1 // We then generate B such that b*B=1 (mod p) // // To get then next term (which I will call N) where N>n , we need to // check the following // N=n+r // N*b^N(mod p) = +1/-1 // (n+r)*b^(n+r) (mod p) = +1/-1 // We multiply both sides by B^r // (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r // (n+r)*b^n (mod p) = +B^r/-B^r // n*b^n + r*b^n (mod p) = +B^r/-B^r // We already know n*b^n and b^n are from the previous step. // (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin)) // (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition // We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1. M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops. // once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax n*b^n + rmax*b^n (mod p) (n+rmax)* b^n (mod p) (n+rmax)* b^n*b^rmax(mod p) n'*b^n'(mod p) and also generate b^n'=b^n*b^rmax. (mod p) // We can now repeat the loop again starting at n'. Other suggestions Your current code takes into account n values left mod 2 but not for other higher primes. Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency. On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.
It took me a while, but I believe I see where you are going with this. Obviously as the range of n increases, so does the benefit.

Memory is definitely a concern on the GPU, but I think that something similar could be done there as well.

I will table your other suggestions because I don't understand them. Someone else (maybe you) would be interested implementing.   2020-05-27, 16:54   #25
Citrix

Jun 2003

22×397 Posts Quote:
 Originally Posted by rogue I will table your other suggestions because I don't understand them.
I am suggesting using a few more primes, other than 2, like 3,5,7,13 etc (Covering sets)
The code below is from mtsieve for prime 2.

Code:
 // If the base is odd, then all n must be even and thus the difference
// between any two remaining n for the base must also be even.
if (idx > 2 && ii_Base & 1)
{
if (idx & 1)
continue;

avx_set_16a(powers[idx-2]);
avx_set_16b(powers);
avx_mulmod(dps, reciprocals);
avx_get_16a(powers[idx]);
}
else
{
avx_set_16a(powers[idx-1]);
avx_set_16b(powers);
avx_mulmod(dps, reciprocals);
avx_get_16a(powers[idx]);
}   2020-05-27, 18:21 #26 rogue   "Mark" Apr 2003 Between here and the 3×5×449 Posts I see. This is how I have commented the non-AVX code: Code: // We're trying to determine if n*b^n (mod p) = +1/-1. This requires an two // operations, an exponentiation (b^n) and a multiplcation (n). By precomputing // certain values, we can actually replace the exponentiation and mulitplication // with array lookups. Doing this requires us to compute the multiplicative inverse // of b and to build two arrays. The multiplicative inverse of b (which I will call B) // is the number such at b*B = 1 mod p. // // We start with our n where n=minN: // n*b^n (mod p) = +1/-1 // -> n*b^n*B^n (mod p) = +1/-1 * B^n (mod p) // -> n (mod p) = +1/-1 B^n (mod p) // // Find all distinct r staring with n=minN where n+r is the next term until n=maxN. // // Pre-compute B^r (mod p) for all possible r. // Pre-compute n*b^n + r*b^n (mod p) for all possible r. // // For each successive term: // (n+r)*b^(n+r) (mod p) = +1/-1 // -> (n+r)*b^(n+r)*B^r (mod p) = +/-B^r (mod p) // -> (n+r)*b^n (mod p) = +/-B^r (mod p) // -> n*b^n + r*b^n (mod p) = +/-B^r (mod p) // // Since both n*b^n + r*b^n (mod p) and +/-B^r (mod p) have been pre-cmoputed // we just need to compare elements of two arrays. // // If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1. // If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1 I think that your suggestion for bestQ is not necessary as I build a table of "the covering set" of r infrequently. There are now two loops for each group of primes. One to compute B^r (mod p) and n*b^n + r*b^n (mod p) and the one that detects if we have a factor. This has not been tested yet, so I have likely made mistakes along the way. Because of the need to convert so many doubles back to uint64_t, I'm not certain that an AVX version will be faster than this. That will need some testing. On the GPU side, I can make additional changes to eliminate unused elements of the left and right arrays. Code:  while (ii_Terms[termIndex] > 0) { theN = ii_Terms[termIndex]; diff = prevN - theN; for (pIdx=0; pIdx<4; pIdx++) { // If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1. // If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1 if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == il_RightTerms[DIFF_INDEX(diff)+pIdx]) if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, -1)) VerifyFactor(ps[pIdx], theN, -1); if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == ps[pIdx] - il_RightTerms[DIFF_INDEX(diff)+pIdx]) if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, +1)) VerifyFactor(ps[pIdx], theN, +1); } prevN = theN; termIndex++; };   2020-05-28, 05:25 #27 Citrix   Jun 2003 22×397 Posts I am not sure if you understood the algorithm correctly. Here is the pseudocode Code: Given a base b, nmin, nmax, prime p to sieve. Choose a value C where C<=nmax-nmin && (nmax-nmin+1)%C==0; need to experimentally find the best C value so code is fastest. This will be constant for all primes. Can adjust nmax so it solves the second modular part. C=0 or C=nmax-nmin are not the best solutions. The best solution is somewhere in between based on efficiency of the code. (see ***) Then generate B such that b*B=1 (mod p) Then create an Array Invs[] = B^0, B^1,..., B^C (mod p) Then create an Array Invsm[] = p-B^0, p-B^1,..., p-B^C (mod p) Then we create a loop starting at X=nmin with step C upto nmax (e.g. for(int X=nmin; X Print factors Let the next candidate be n=r2 Then we add y+MaxDiff[r2-r1] (mod p) = y Check if y==Invs[r2-X] (for Woodall) and y==Invsm[r2-X] (for Cullen) --> Print factors and so on till we reach the last term under X+C Need to check if difference between 2 consecutive remaining terms is greater than Size of MaxDiff then we would need an extra step (similar to current code) }// end loop Best Q can further help as some values in Array Invs and Invsm might not need to be generated. Need to choose C value that is a multiple of Best_Q and adjust the nmin value so it is also a multiple of the Best_Q. *For odd bases MaxDiff only needs 1/2 the values - otherwise should not apply Best_Q to MaxDiff Why are you converting from double to uint64_t? You can do modular addition in double type. ***Notes on calculating best C value Work done = C/q mulmod in first loop + N_range/C*(2 mulmod + 50 add and 50 subtract for MaxDiff array) in second loop. Where q is adjustment due to Best_Q improvements 50 add+ 50 subtract = t mulmod (base on CPU time) Then we have C/q + (N_range* (2+t))/C needs to be the smallest so work done is fastest. Then C= sqrt((N_range*(2+t)*q)) is the best solution. Then Array Invs should be of size ~= sqrt(n_range)*sqrt((2+t)*q) for best results. I hope this makes more sense. Last fiddled with by Citrix on 2020-05-28 at 05:40   2020-05-28, 14:38 #28 rogue   "Mark" Apr 2003 Between here and the 673510 Posts Honestly I don't fully understand and can't spend the brain power on it at the moment. If you want to take a crack at writing the code per your algorithm, then go for it. I think you have the best chance of implementing the code per your vision.   2020-05-28, 20:41 #29 Citrix   Jun 2003 22·397 Posts I did try implementing the algorithm. There possibly is some small bug as it is not producing all the factors. I can't find it right now. I do not have a debugger for GCC to help me. For base 2 I initially sieved range nmin=10,000 to nmax=100,000 to remove all small p (under 1M). There were ~ 3,600 candidates left. I used this sieved file to test the following:- The original code gives a speed of 33Kp/sec for p range 1G to 2G. The new code gives a speed of 2.5Kp/sec for p range 1G to 2G. Then I tested the code nmin=1,000,000 to nmax=2,000,000 from 1G to 2G (without removing any small terms). The original code gives a speed of 128 p/sec for p range 1G to 2G. The new code gives a speed of 804p/sec for p range 1G to 2G. So 8 times faster but as the original code removes terms from the sieve the code would get faster and faster whereas the new algorithm would stay at almost the same speed. For the new algorithm to be significantly faster than the old algorithm we would need to test a range of at least 100-400M. Since no one that I know of is planning on sieving a range that large- I would hold off on further development of this code.   2020-05-28, 21:53 #30 Citrix   Jun 2003 22×397 Posts I am posting a simpler explanation of the algorithm here - in case some can come up with a faster implementation Code: Take all the remaining candidates in a range Nmin to Nmax Let b be the base and B be the multiplicative inverse such that b*B = 1 mod p. We choose a C= sqrt(Nmax-Nmin) We then divide all n values into C different groups In each group G_r={n1,n2,n3...,} such that n1=n2=n3=r (mod C) We want to sieve each of these C groups separately. For G_r group since n1=r (mod C) then n1=k1*C+r since n2=r (mod C) then n2=k2*C+r The first term in this group is n1*b^n1+-1 The second term in this group is n2*b^n2+-1 We multiply all these terms by their corresponding B^(k*C) value The first term n1*b^n1= +-1 (mod p) n1*b^n1*B^(k1*C)= +-B^(k1*C) (mod p) Since we know that n1=k1*C+r we get n1*b^r=+-B^(k1*C) (mod p) The second term n2*b^n2= +-1 (mod p) n2*b^n2*B^(k2*C)= +-B^(k2*C) (mod p) Since we know that n2=k2*C+r we get n2*b^r=+-B^(k2*C) (mod p) Note that B^(k*C) values will be used in all the C groups and we can save on this work instead of repeatedly calculating it. Also note n2*b^r - n1*b^r = (k2-k1)*C*b^r If we know C*b^r (which we can calculate for each group) we can generate all these terms by addition only. If n1*b^r=-B^(k1*C) (mod p) then n1*b^n1+1 is divisible by p If n1*b^r=B^(k1*C) (mod) p then n1*b^n1-1 is divisible by p   2020-06-30, 15:19 #31 rogue   "Mark" Apr 2003 Between here and the 1A4F16 Posts I see where my logic is wrong referring to what I was trying to implement. Not certain if I can resurrect.   2020-07-08, 16:20 #32 rogue   "Mark" Apr 2003 Between here and the 151178 Posts One place where the framework could benefit significantly is by switching from right-to-left exponentiation to left-to-right exponentiation. If anyone is interested in changing an ASM routine to use left-to-right exponentiation, please let me know.   2020-08-02, 17:22   #33
Happy5214

"Alexander"
Nov 2008
The Alamo City

2·3·139 Posts Quote:
 Originally Posted by rogue One place where the framework could benefit significantly is by switching from right-to-left exponentiation to left-to-right exponentiation. If anyone is interested in changing an ASM routine to use left-to-right exponentiation, please let me know.
I was thinking about doing this as an exercise in learning floating-point ASM, but the existing code appears to me to already be left-to-right, matching the algorithm in the source comments. Am I reading it wrong?   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post rogue Software 729 2022-09-10 06:06 rogue Software 304 2021-11-06 13:51 kar_bon No Prime Left Behind 10 2008-03-28 11:21 Greenbank Octoproth Search 2 2006-12-03 17:28 Reboot It Software 16 2003-10-17 01:31

All times are UTC. The time now is 19:20.

Wed Sep 28 19:20:17 UTC 2022 up 41 days, 16:48, 2 users, load averages: 1.14, 1.62, 1.72

Copyright ©2000 - 2022, Jelsoft Enterprises Ltd. This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔