mersenneforum.org Pretty Fast Primality Test (for numbers = 3 mod 4)

 2014-04-12, 20:22 #2 tapion64   Apr 2014 125₈ Posts

The post is now set in stone. I hope I'm not wrong >.>
 2014-04-13, 17:52 #3 tapion64   Apr 2014 5×17 Posts

I did some more work on this. Essentially, if n = 4m+11, then the highest term to add to the product chain is |16m^2 - 16m - 5| < 16m^2, so sqrt(|16m^2 - 16m - 5|) < sqrt(16m^2) = 4m < 4m+11; thus we can guarantee that if 4m+11 is prime, it won't appear in the product chain before you test it. This is necessary but not sufficient to prove the conjecture: there is a possibility that n is not prime but its factors simply haven't appeared in the product chain yet. It seems unlikely that this would happen, but I'm not sure how to prove it. Still, the number is guaranteed composite if it divides that product, and 'probably' prime if it does not. To take it to the deterministic stage without proving my conjecture, you'd then have to verify that the residue is congruent to 2^(n-10) mod n, which is still a fairly inexpensive operation to prove primality. Using a divide and conquer approach requires log(m+1) stages of multiplication mod n, so this algorithm is basically O(log(n)^2 * log(log(n)) * log(log(log(n)))) using FFT, although that might only hold if you have at least (m+1)/2 parallel streams. Still, seems fast.

Last fiddled with by tapion64 on 2014-04-13 at 18:22
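The divide-and-conquer recombination mentioned above can be sketched as a balanced product tree that reduces mod n at every stage. This is a minimal serial sketch of the idea (my own illustration, not code from the thread); the post envisions the pair multiplications within each stage running in parallel:

```python
def product_mod(terms, n):
    """Multiply a non-empty list of terms mod n via a balanced tree.

    Each while-iteration is one 'stage': adjacent pairs are multiplied
    and reduced mod n.  With (m+1)/2 parallel multipliers, the log2
    number of stages is the parallelism the post is counting on.
    """
    level = [t % n for t in terms]
    while len(level) > 1:
        nxt = [(level[i] * level[i + 1]) % n
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd term out carries to the next stage
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

For example, `product_mod([9, 20, 35], 19)` computes 6300 mod 19 = 11 in two stages instead of a serial chain of multiplications.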
 2014-04-13, 21:14 #4 tapion64   Apr 2014 1010101₂ Posts

Well, I'm a loser. I spent all this time just assuming that the algorithm didn't work for n = 1 mod 4... It actually does! The only thing that changes is how you calculate m: instead of m = (n-11)/4, use m = (n-9)/4; the congruence test will still work. I'm also getting closer to proving that if n is composite, it's sufficient to say that it will divide the product of the sequence. I wonder if there's a way to come up with linear instead of quadratic product sequences that would have the same effect? They need to grow quickly enough to accumulate factors and become large enough to be divisible by n, which is guaranteed by the quadratic growth, but honestly I don't see why you couldn't do the same thing with the right linear sequence.

To highlight: this is now the first ever general purpose, parallelizable primality test. I will be working on writing CUDA code to benchmark, as soon as Nvidia approves my registered developer application.
2014-04-13, 21:42   #5
R.D. Silverman

Nov 2003

7460₁₀ Posts

Quote:
 Originally Posted by tapion64 So, I was actually trying to prove Sophie Germain conjecture by utilizing Wilson's theorem with respect to properties of Mersenne factors, but instead I found a pretty neat (and probably not exceedingly useful) consequence of the theorem. I won't bother going through the entirety of the math, as it took me several hours of work to get the result,
So what? It took a whole several hours?? Gollee!

If you won't post the actual math and algorithm then do us all a favor:

DON'T POST. This is a *math* forum.

I read what you did post. It is full of undefined variables, undefined expressions, and general gibberish.

However, the little bit that does make sense tells us that the algorithm
(if it works; we only have your word for it, which is not sufficient for
scientific discussion) is pretty much worthless. If it takes (n-4m+3)/4
iterations, then it is a PURELY EXPONENTIAL ALGORITHM.

Furthermore, what you say immediately below:

"runs until (n-4m+3)/4, m chosen sufficiently small for the number to test"

is totally WRONG, because as a function of n, the smaller m is, the LARGER
(n-4m+3)/4 becomes. This claim alone makes me suspect that you
don't know what you are doing.

Prove that I am full of it. Post the math and the full algorithm.

2014-04-13, 21:46   #6
R.D. Silverman

Nov 2003

7460₁₀ Posts

Quote:
 Originally Posted by tapion64 Well, I'm a loser.
This, I believe.

Quote:
 To highlight: This is now the first ever general purpose, parallelizable primality test. I will be working on writing CUDA code to benchmark, as soon as Nvidia approves my registered developer application.
The sentence above is pure horseshit. There are existing algorithms that
run perfectly in parallel.

Stop making grandiose claims that you cannot back up. Post the
math or STFU.

Last fiddled with by R.D. Silverman on 2014-04-13 at 21:47

2014-04-14, 04:21   #7
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

908₁₀ Posts

Quote:
 Originally Posted by tapion64 This is now the first ever general purpose, parallelizable primality test.

ECPP: Valente's 1992 thesis describes distributed ECPP in quite a bit of detail. Morain has been running distributed ECPP for a long time. I believe the Franke, Kleinjung, and Wirth group also had a distributed version. Modern Primo runs on multiple cores and parallelizes quite nicely. I've debated adding an MPI layer to my code. There are numerous ways to find parallelism.

APR-CL is a great method, but it isn't something I'm very familiar with. Bosma's 1990 thesis has a short section about parallelization.

AKS: Trivially easy to parallelize or distribute. Each of the s tests requires only three variables (the loop number a, the polynomial modulus r, and the input n). A very slow and not currently useful algorithm compared to others, but easy to parallelize.

The BLS75 methods (n-1, n+1, hybrids) should be easily parallelizable, given that on non-trivial numbers they are stuck factoring large numbers. I don't believe these are really used any more on numbers that require deep factoring, as ECPP and APR-CL are far more suitable. However, there are quite a few numbers for which they do work.

Or are we discussing compositeness tests à la BPSW?

 2014-04-14, 06:18 #8 tapion64   Apr 2014 125₈ Posts

Wow, dude. Calm down. First, when I say it took several hours, I mean that it's all kind of jumbled up across text files and I haven't had the time to make it nice and neat yet. The math is there, but it's not a formal proof; it's just me tweaking, testing, and retweaking. I wanted to prove the last conjecture, that it's sufficient to test only whether the congruence is 0, before I put it all together in a formal proof. I tried to give bounds and conditions on numbers where necessary, but like I said, this isn't the formal proof yet. It should be obvious that all variables discussed here are integers, and their bounds are stated unless I missed a few. Anyways, it's not particularly hard to get to my version using basic manipulation of Wilson's Theorem, that n is prime iff (n-1)! = -1 mod n.

Step 1. You can write (n-1)! as (n-1)*(n-2)*...*((n-3)/2)*...*(1).

Step 2. You can split this into +/-(((n-1)/2)!)^2. The sign is uncertain: if (n-1)/2 is odd, the sign will be negative, and if it's even, the sign will be positive. Previously I solved this by forcing the condition n = 3 mod 4, but I have a better way now. To account for this, we have (-1)^((n-3)/2) * (((n-1)/2)!)^2 = 1 mod n.

Step 3. Now we take the square root. If n = 3 mod 4, we get ((n-1)/2)! = +/-1 mod n. If n = 1 mod 4, we get sqrt(-1)*((n-1)/2)! = +/-1 mod n. Note that (n-1)^2 = n^2 - 2n - 1 = -1 mod n, so n-1 is always a solution for sqrt(-1); but since we have +/- on the RHS, it doesn't matter, and we can use ((n-1)/2)! = +/-1 for all n = 1 mod 2.

Step 4. (n-1)/2 * 2 = n-1 = -1 mod n, but again we have +/- on the RHS, so we can get rid of both of those terms, giving us ((n-3)/2)! = +/-2 mod n.

Step 5. Here's where it gets tricky. I had to write this out by example a few times to get it right, so showing an example will probably work best here too. Take n = 11: ((11-3)/2)! = 4! = 4*3*2. Then look at n = 13: ((13-3)/2)! = 5! = 5*4*3*2.
Notice again the discrepancy in the number of terms based on n = 1 or 3 mod 4. I'll work the 3 case first, then show it for 1. Set the 2 aside for a moment, and you have 4*3. If you have n = 15 instead, you have 6*5*4*3. We're going to split this into two products of length (n-7)/4: we can write it as (n-3)/2 * ... * (n+5)/4 multiplied by (n+1)/4 * ... * 3, or as products over an index k: product((n-2k-3)/2; k = 0 to (n-11)/4, n >= 11) and product(k+3; k = 0 to (n-11)/4). We could actually do this for n >= 3 or 7, but those are trivial and 11 sets up nicely.

Notice that each of the (n-7)/4 terms of the first product has a 2 in the denominator. Then remember the 2 we set aside earlier, so we have (n-11)/4 of them left in the denominator. Multiply both sides by 2^((n-11)/4): that makes the first product's terms n-2k-3, and our RHS is now +/-2^((n-7)/4). Note that we can now remove n from the first product to get -(2k+3), and both of our products are independent of n and run on the same index. Since we have -(2k+3) and the sign doesn't matter, we combine (2k+3)*(k+3) = 2k^2 + 9(k+1). This gives us the first base case:

product(2k^2 + 9(k+1); k = 0 to (n-11)/4, n >= 11, n = 3 mod 4) = +/-2^((n-7)/4) mod n

Now, that's fine and good and all, but here's why I said to choose n = 4m+11 small enough, which was bad wording on my part: if you choose n = 11 in that formula, you're doing a product from k = 0 to 0, i.e. a single evaluation of the formula at 0, which is trivially seen to always be 9. In the form above, it's not easy to see how this is useful, because it's based on n >= 11. But the product can be "pushed forward" by calculating the value of the formula at the current upper limit k = (n-11)/4, tacking it on as a precalculated part of the product (or storing it, or whatever), and moving the limit for k down to (n-15)/4.
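The base case above is easy to check numerically. Here is a small sketch (my own, not from the thread) that walks n = 3 mod 4 up to a bound and verifies that primes satisfy the +/-2^((n-7)/4) congruence while the composites tried make the product vanish mod n, as conjectured:

```python
def is_prime(n):
    """Trial division; fine for a small numeric check."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def wilson_product(n):
    """product(2k^2 + 9(k+1)) for k = 0 .. (n-11)/4, reduced mod n."""
    P = 1
    for k in range((n - 11) // 4 + 1):
        P = P * (2 * k * k + 9 * (k + 1)) % n
    return P

for n in range(11, 500, 4):                      # n = 3 mod 4, n >= 11
    rhs = pow(2, (n - 7) // 4, n)                # 2^((n-7)/4) mod n
    if is_prime(n):
        assert wilson_product(n) in (rhs, n - rhs)   # matches +/- RHS
    else:
        assert wilson_product(n) == 0            # conjectured composite case
```

At n = 11 the product is the single term 9 = -2 mod 11, matching +/-2^1; at n = 15 = 3*5 it is 9*20 = 180 = 0 mod 15.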
Thus if you have already calculated all the terms in the product sequence that came before, choosing n = 15 results in a k = 0, which means we multiply the product sequence by 9 and check whether it's equal to +/-2^((n+5)/4) mod n.

I'm getting tired now, and I've explained all of the math that I didn't put in the original post, so you can see how everything flows to the point where my original post begins explaining. Oh, except for the product calculation for n = 1 mod 4... but maybe you'll take my word that it works out nicely for now, or work it yourself; I'll finish explaining tomorrow. As to why you can't follow my original post from that point: maybe it was difficult to understand without the math that got there? Perhaps it will no longer be gibberish to you. But I can guarantee that it all logically flows, and I'll clarify whatever you're having trouble following.

I'll just end by acknowledging that yes, there are some other general purpose methods that run in parallel, but most are probabilistic or unproven, or simply too expensive when n gets large, like trial factoring. I should have specified that I meant the first deterministic /feasible/ general purpose algorithm to run in parallel. Since it's a sequence of multiplications mod n that can be calculated independently (and a large amount of it could be cached to save time), followed by a divide and conquer recombination mod n using the power of the GPU's threading, it becomes a very usable primality test for large numbers (at least, numbers that we can reasonably do FFT mod n on). If n is composite, the odds are also high that a product divisible by n comes up within the first few steps of divide and conquer, based on the relative size of the numbers. But some of it will be just luck: if the right factors aren't grouped together until the last multiplication, it's the same worst case as n actually being prime. Anyways, you don't have to believe my findings.
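The "push forward" reuse of the chain across successive candidates can be sketched like this (again my own illustration, not code from the thread; it keeps the shared product as an exact integer, which is fine for a demo but grows far too fast to be practical at scale):

```python
def chain_tests(limit):
    """Test n = 11, 15, 19, ... by extending one shared product chain.

    Each new n adds exactly one term (2k+3)(k+3) = 2k^2 + 9(k+1) to
    the chain.  Returns {n: (divides, matches)}, where 'divides' means
    the chain is 0 mod n (guaranteed composite, per the post) and
    'matches' means it is +/-2^((n-7)/4) mod n ('probably' prime).
    """
    results = {}
    P, k = 1, 0
    for n in range(11, limit, 4):          # n = 3 mod 4
        while k <= (n - 11) // 4:          # push the chain forward one term
            P *= (2 * k + 3) * (k + 3)
            k += 1
        r = P % n
        rhs = pow(2, (n - 7) // 4, n)
        results[n] = (r == 0, r in (rhs, n - rhs))
    return results
```

Running `chain_tests(30)` reports 11, 19, and 23 as matching the congruence and 15 and 27 as dividing the chain, which is consistent with the base-case formula above.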
But at least try and understand what I have written before insulting me.

One more thing. I do realize that this is essentially O(n) multiplication operations, so it's technically an exponential time algorithm. But because we can precalc, cache, reduce via modulo before and after each stage, and divide and conquer on GPU, it will have more like an O(log(N)) speed.

Last fiddled with by tapion64 on 2014-04-14 at 07:10
2014-04-14, 07:09   #9
tapion64

Apr 2014

5×17 Posts

Quote:
 Originally Posted by danaj What about examples such as, ECPP: Valente's 1992 thesis describes distributed ECPP in quite a bit of detail. Morain has been running distributed ECPP for a long time. I believe the Franke, Kleinjung, and Wirth group also had a distributed version. Modern Primo runs on multiple cores and parallelizes quite nicely. I've debated adding an MPI layer to my code. There are numerous ways to find parallelism. APR-CL is a great method, but it isn't something I'm very familiar with. Bosma's 1990 thesis has a short section about parallelization. AKS: Trivially easy to parallelize or distribute. Each of the s tests requires only three variables (the loop number a, the polynomial modulus r, and the input n). A very slow and not currently useful algorithm compared to others, but easy to parallelize. The BLS75 methods (n-1, n+1, hybrids) should be easily parallelizable, given that on non-trivial numbers they are stuck factoring large numbers. I don't believe these are really used any more on numbers that require deep factoring, as ECPP and APR-CL are far more suitable. However, there are quite a few numbers for which they do work. Or are we discussing compositeness tests à la BPSW?
It's my understanding that ECPP still relies on some unproven conjectures to get to O(log(n)^6). Likewise, AKS needs Agrawal's conjecture to be true to get to O(log(n)^3.5), but evidence suggests that conjecture is probably false, so it sits at the same complexity as ECPP, which is still too slow. I hadn't looked into the other two very much. I can't find values for APR-CL's constant anywhere for some reason, but it's a superpolynomial (though subexponential) algorithm, which means it eventually can't compete with AKS. The point at which that happens might be really far off, but I can't say without knowing the constant. Also, BLS seems to rely too heavily on factoring.

EDIT: Looked into Modern Primo: the website only shows times for CPU runs; a 5000 digit number took over 9 hours on a 4 core / 8 thread 3.4 GHz i7. It doesn't even mention a GPU port on the site. Where'd you find this?

EDIT2: Also, I can't believe I forgot to mention this, but we can use this to quickly factor the composite number as well, by backtracking its product tree to show which values of the sequence multiplied together to yield a multiple of n. It would require precalculating the factors of the sequence, but those would be reusable for any number.
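The backtracking idea can be sketched as a descent of the product tree: at each level, check which half of the terms shares a factor with n and recurse into it, ending at a single sequence value whose gcd with n is a factor. This is my own sketch of that idea, not code from the thread:

```python
from math import gcd

def find_factor(terms, n):
    """Given terms whose product shares a factor with n, descend the
    product tree to a single term and return its gcd with n -- a
    factor > 1, though possibly n itself if one term absorbs all of n.
    """
    if len(terms) == 1:
        return gcd(terms[0], n)
    mid = len(terms) // 2
    left = 1
    for t in terms[:mid]:
        left = left * t % n        # gcd(P mod n, n) equals gcd(P, n)
    if gcd(left, n) > 1:
        return find_factor(terms[:mid], n)
    return find_factor(terms[mid:], n)   # factor must be in the right half
```

For example, with n = 15 and chain [9, 20], the left half 9 already shares the factor 3 with 15, so the descent returns 3.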

Last fiddled with by tapion64 on 2014-04-14 at 07:39

2014-04-14, 07:46   #10
axn

Jun 2003

4901₁₀ Posts

Quote:
 Originally Posted by tapion64 One more thing. I do realize that this is essentially O(n) multiplication operations, so it's technically an exponential time algorithm. But because we can precalc, cache, reduce via modulo before and after each stage, and divide and conquer on GPU, it will have more like an O(log(N)) speed.
Sounds like wishful thinking. A GPU can't really change the complexity of the algorithm -- at most, it can provide a constant factor speedup over a CPU. Do you have an actual algorithmic improvement (that changes the complexity) using all the other techniques listed? Or do they all amount to constant factor speedups?

2014-04-14, 09:00   #11
LaurV
Romulan Interpreter

Jun 2011
Thailand

17·19·29 Posts

Quote:
 Originally Posted by tapion64 One more thing. I do realize that this is essentially O(n) multiplication operations, so it's technically an exponential time algorithm. But because we can precalc, cache, reduce via modulo before and after each stage, and divide and conquer on GPU, it will have more like an O(log(N)) speed.
Do we still have that thread of "white pearls said on mersenneforum"? If so, this is my proposal for it.

