mersenneforum.org > Math Stuff > Computer Science & Computational Number Theory
Old 2013-06-14, 16:48   #12   danaj ("Dana Jacobsen")

Quote:
Originally Posted by henryzz
In terms of BOINC, I would like to see it where a huge number is proved by cooperation from many people.
This is interesting in that the structure would be substantially different from that of a program meant for proving many "small" numbers.

Some thoughts off the top of my head. Using ECPP and going purely by the certificate required, we can do one step at a time. Given the n value, our job is to find a suitable m/q/a/b/x/y set. We can break this down into smaller and parallel pieces. (A rough statement of what "suitable" means for one step is given just after this list.)
  • find suitable D values. This is trivial with small numbers. I'm not sure whether it would be worth running in parallel or not. We'll have to later generate the corresponding Hilbert or Weber polynomial, which isn't so trivial. That may be a project in itself -- create monster databases of class polynomials (see http://hilbert-class-polynomial.appspot.com for example).
  • For each suitable D value we generate 2, 4, or 6 candidate m values. For each candidate m we want to find a suitable prime factor q. This now becomes a variant on distributed factoring. This means we can either send one m value to each host and let them churn away on it (e.g. parallel or GPU ECM), or something finer grained that would need more communication.
  • We checkpoint occasionally, with the project storing the candidate D/m values and their partial factorizations.
  • Once we've found a suitable q, we may or may not want the other jobs to keep looking for a while -- every new one they find gives us another tree. At some point we'll want to terminate all the searches at this level (either someone has found one or more q values, or we've decided this branch is a dead end and we're doing better on a previous one).
  • Now we have a q value that looks good, so we do this all again and repeat until q is small enough. Based on the previous item, we can see the tree starts out rather narrow (say, 60ish jobs depending on how many D values we check) but after the first q is found we branch out. Since we can assume these values are huge, then we can take full advantage of parallel ECM to keep each host busy.
  • Now it's time to generate the polynomials, find the roots, find the appropriate curve, and then points on the curve. These can be done in a second parallel step after we've made the entire chain, or in parallel with the other work. The former has the advantage of not doing work we will throw away. The latter has more immediate parallelism. If something is screwy and we can't find a curve and points for a given n/m/q then the latter would let us know sooner. This data gets stored with the n/D/m/q data so we have all the required parts for the proof if we use this m/q value.
  • Another task can run an independent validation on the data generated in the previous item. We may or may not want to do yet another full validation when we're all done, but this seems like a good idea to include here.
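For reference, here is roughly what a single step has to satisfy in the standard Goldwasser-Kilian / Atkin-Morain setup -- condition sets vary a little between papers, so treat this as a sketch rather than the definitive list. Given n with gcd(n, 6) = 1, a discriminant D leads (via the class polynomial) to a curve y^2 = x^3 + a*x + b (mod n) with gcd(4*a^3 + 27*b^2, n) = 1, and we need a point P = (x, y) on that curve plus a factorization m = k*q where q is a probable prime larger than (n^(1/4) + 1)^2. The checks are that m*P is the point at infinity while (m/q)*P is defined and is not the point at infinity (any inversion that fails along the way exposes a factor of n, so n was composite). If all of that holds and q is later proved prime, then n is prime, which is why the whole job reduces to finding the D/m/q/a/b/x/y data and then recursing on q.
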
Old 2013-06-14, 17:58   #13   henryzz ("David")

It is finding q that I suspect could be best made parallel.

How many factors are found that leave a large composite? Do you treat those composites any differently from the original ones? They should be worth more effort, since the size removed will be twice as big, assuming the next factor leaves a PRP.

How difficult is it to generate the polynomials? It seems that your program ships with only a modest set and could do with more. If you have a program that will generate more, I am willing to run it for a bit.

It would be nice to try to push up the limit of where your code is better than Primo. It looks like it could hopefully be made useful for smallish numbers even if Primo takes over further up.
Currently the factordb has a load of candidates of 550-600 digits which need processing.
What is needed to improve that region?

Old 2013-06-14, 17:59   #14   danaj

Quote:
Originally Posted by yoyo
Thanks for starting this thread. I don't have any clue about the math behind prime certificate generation and prime proving, and would enjoy it if other experts joined the discussion and produced a good open source tool for it.
This page on the "prime pages" website is pretty useful. What follows is a rambling text with much handwaving and oversimplification, and it has not been reviewed for correctness. There's plenty of literature.

We first want to run a compositeness test on the number, as these are very fast compared to proofs, and some of the proofs can take ages to finish if given a composite. This may be a single Miller-Rabin test, M-R with the first N prime bases, M-R with N random bases, a BPSW variant, or BPSW + one or more M-R tests. Which one depends on which decade you're living in and how paranoid you are. BPSW is what most people use these days (exceptions being libtommath and hence Perl 6, GMP's mpz_probab_prime_p, and old versions of most math software before they all switched to BPSW).

If our number is small enough, we could be done right here. BPSW on numbers smaller than 2^64 is deterministic (people have hand-checked the results). There are deterministic M-R sets for small numbers -- Primo uses an old set of these.
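To make the pre-test concrete, here is a minimal C+GMP sketch of the strong probable-prime (Miller-Rabin) half of a BPSW test; the Lucas half is omitted, and the function name is mine rather than something from the libraries mentioned above.

    /* Strong probable-prime (Miller-Rabin) test of n to a single base.
     * This is the first half of BPSW; an (extra) strong Lucas test would
     * follow for the full thing.  Compile and link with -lgmp. */
    #include <gmp.h>

    int is_strong_probable_prime(const mpz_t n, unsigned long base)
    {
        if (mpz_cmp_ui(n, 3) < 0) return mpz_cmp_ui(n, 2) == 0;
        if (mpz_even_p(n)) return 0;

        mpz_t d, x, nm1;
        mpz_inits(d, x, nm1, NULL);
        mpz_sub_ui(nm1, n, 1);
        mpz_set(d, nm1);                  /* write n-1 = d * 2^s, d odd */
        unsigned long s = 0;
        while (mpz_even_p(d)) { mpz_fdiv_q_2exp(d, d, 1); s++; }

        mpz_set_ui(x, base);
        mpz_powm(x, x, d, n);             /* x = base^d mod n */
        int probable = (mpz_cmp_ui(x, 1) == 0 || mpz_cmp(x, nm1) == 0);
        for (unsigned long r = 1; !probable && r < s; r++) {
            mpz_powm_ui(x, x, 2, n);      /* keep squaring, looking for -1 */
            if (mpz_cmp(x, nm1) == 0) probable = 1;
        }
        mpz_clears(d, x, nm1, NULL);
        return probable;                  /* 1 = n is a probable prime to this base */
    }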

Some proofs that don't give certificates:
  • Miller-Rabin. Step 1, prove the generalized Riemann hypothesis. Step 2, run M-R with all bases less than 2*ln(n)^2. We're still working on step 1, so this isn't useful.
  • AKS. Polynomial time. Super, duper, gosh-golly slow. No certificate. In theory it is its own polynomial time certificate, but run times for large numbers are measured in hundreds of years (cite: Brent's 2010 slides, these are numbers that take under 10 seconds with APR-CL and ECPP), so that's not helpful. Might run faster someday. There are various open source implementations.
  • APR-CL. Quite fast. A few open source implementations. WraithX's GMP version looks very readable.
Proofs with certificates. Almost all of these will be nested or chained, as we have to make sure every sub-step gets verified (e.g. for n-1 proofs, we have to verify primality of each factor, for ECPP we have to verify primality of each 'q'):
  • Lucas, where the certificate is often called a "Pratt" certificate after Pratt's seminal 1975 paper. Factor n-1 completely, then run the Lucas primality test. The values needed are n, all the factors of n-1, and a single 'a' value. This works nicely for small numbers and numbers where n-1 is easily factored. It's not very practical for larger values. Verifying the certificate is really easy (a small verification sketch follows after this list).
  • n-1 (generalized Pocklington, sometimes called Pocklington-Lehmer). Requires factoring n-1 to n^(1/2). The certificate will consist of n, the factors of n-1, and an 'a' value for each factor. Pari v2.1 used this test; Primo has a very specific version (where n-1 is a semiprime). Certificates are very easy to check.
  • n-1 (BLS75 theorem 5). Requires factoring n-1 to (n/2)^(1/3). Very similar to the above, with just two extra conditions to check. Math::Prime::Util uses this. Certificates are just as easy and fast to check as generalized Pocklington.
  • n-1 (BLS75 theorem 7, Konyagin & Pomerance, etc.) These let one do even less factoring of n-1. BLS75 T7 becomes a lot more work for certification as one has to trial divide the unfactored portion up to a given limit.
  • n+1. The BLS75 paper includes lots of examples of this, and also hybrids with n-1. Primo uses a specific form of n+1, but I've never used it. Certificates shouldn't be too hard to check?
  • ECPP. Works on elliptic curves, and creates a sequence of proofs along the lines of: given this n, a, b, m, q, x, and y value, if various easily checkable conditions hold and if q is prime, then n is prime [I'm handwaving away years of fascinating progress and math -- reading more about ECPP is worth the effort]. We just keep walking 'q' values down until we can prove it via another method (I use a BPSW test and stop when q is less than 2^64, and have a compile time option to do a quick n-1 test for each q so it can stop earlier; GMP-ECPP stops at 1000M and does a trial division loop; Primo stops at 34*10^13 and does a strong probable prime test to the first 7 prime bases). The certificate is just a sequence of [n, a, b, m, q, x, y] blocks, where we can check the conditions on each one until the last one which will use some other method for the final q. Primo calls these [N, A, B, R*S, R, X, Y], but encodes A,B,X,Y in either 3 values (type 3: A,B,T) or 2 values (type 4: J,T) and computes A,B,X,Y from them. The validation does require EC point multiplication in affine coordinates. This isn't too hard to program with GMP, and you can find code in many places (e.g. ecpp-dj, GMP-ECPP, WraithX's Primo validator, etc.). I also have a version in pure Perl, but it's rather slow.
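To show how simple the Lucas/Pratt check mentioned above really is, here is a C+GMP sketch of verifying one step; the function name and calling convention are mine. A real verifier would also confirm that the supplied factors, with their exponents, multiply back to n-1, and would recursively verify a certificate for each factor.

    /* Verify one Lucas/Pratt step: a^(n-1) = 1 (mod n) and
     * a^((n-1)/p) != 1 (mod n) for every prime factor p of n-1.
     * p[] holds the k distinct prime factors claimed for n-1. */
    #include <gmp.h>

    int pratt_step_ok(const mpz_t n, const mpz_t a, mpz_t *p, int k)
    {
        mpz_t nm1, e, x;
        mpz_inits(nm1, e, x, NULL);
        mpz_sub_ui(nm1, n, 1);

        int ok = 1;
        mpz_powm(x, a, nm1, n);                 /* a^(n-1) mod n */
        if (mpz_cmp_ui(x, 1) != 0) ok = 0;

        for (int i = 0; ok && i < k; i++) {
            if (!mpz_divisible_p(nm1, p[i])) { ok = 0; break; }
            mpz_divexact(e, nm1, p[i]);
            mpz_powm(x, a, e, n);               /* a^((n-1)/p_i) mod n */
            if (mpz_cmp_ui(x, 1) == 0) ok = 0;  /* must not be 1 */
        }
        mpz_clears(nm1, e, x, NULL);
        return ok;
    }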


For Factordb/Boinc use, I think the most crucial step is coming up with a well defined certificate format. Then we could use all sorts of software as long as it produces the correct format or could be transformed to it. I have scripts that turn GMP-ECPP and SAGE output into the certificates I use, for instance. I could do Primo if I implement a validator for its n+1 test.

IMO we want:
  • small number. Primo calls this type 0. I think we have sufficient evidence now that we can just put 64-bit numbers here, as there are 7-base M-R tests as well as BPSW to validate these.
  • n-1. I think BLS75 theorem 5 (m=1) here, as generalized Pocklington is a trivial subset of this. Primo's type 1 should fit here as well. We store n, a list of partial factors of n-1, and the 'a' values associated with them.
  • n+1. I'm not very familiar with this, but certainly whatever goes here should be able to handle Primo type 2. I think BLS75 theorem 17 (m=1) is a general case that should cover most situations and produce certificates that can be easily verified. I believe the data for the certificate would be similar to the n-1 case, but I'd need to check (the validation process is completely different).
  • ECPP. The most general form would be a list of [n, a, b, m, q, x, y] values. Primo type 3 and 4 tests can be transformed to these. GMP-ECPP and ecpp-dj already generate this data, so they can easily create it. We could also have shortcuts or more types for the shortened forms Primo uses.
I think non-markup plain text is a good idea, but if someone thinks YAML is good, that could work. XML would take some convincing :). IMO base 10 and maybe an option for base 62. Primo is set up to do a chain walking down, which is fine for ECPP and the restricted n-1 and n+1 tests, but doesn't work as well for general n-1 / n+1. One option is nesting; another is a flat list where the validator ensures that everything that has to be proved eventually shows up.
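Purely to illustrate the chaining idea, an invented plain-text layout (this is not the actual ecpp-dj or Primo format, just a hypothetical example) might look like:

    N  <number to prove>
    Type ECPP
      A <a>  B <b>  M <m>  Q <q>  X <x>  Y <y>        proves N if Q is prime
    Type BLS5
      N <q>                                            the Q from the block above
      Q[1] <factor of q-1>   A[1] <witness>
      Q[2] <factor of q-1>   A[2] <witness>
    Type Small
      N <final q>                                      below 2^64, BPSW / 7-base M-R

Each block's "if this is prime" obligation is discharged by a later block, and the validator just walks the list making sure nothing is left unproved.
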
Old 2013-06-15, 09:47   #15   danaj

Quote:
Originally Posted by henryzz
It is finding q that I suspect could be best made parallel.

How many factors are found that leave a large composite? Do you treat those composites any differently from the original ones? They should be worth more effort, since the size removed will be twice as big, assuming the next factor leaves a PRP.
I'm not following what you're asking. With the FPS version of my code run with verbose 2, if it can't find any easy factors, it will show the progress of each candidate, e.g. "q no luck (820 digits)" meaning it didn't find any more factors or "q reduced (798 digits)" indicating it found one or more factors and now it's smaller (but still composite). I did contemplate having it put more effort into the smallest ones.

My FPS code stores all the partial factorizations so it can run each stage starting exactly where it left off. I chose a hackier solution for FAS, where every factor found after stage 1 is added to a list (sfacs). The first thing done for stage 2+ factoring is to trial divide by all the values on that list (which is usually empty or has only a few values). It's kind of ugly, but it works reasonably well and with FAS it rarely gets used anyway. My understanding of GMP-ECPP is that it refactors the numbers from scratch every time it increases Bmax or Dmax.
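The sfacs idea is essentially just the following (a trivial sketch with my own names, not the actual code):

    /* Before doing any expensive factoring on a new composite, trial divide
     * by every factor already found at earlier stages.  sfacs[] is the list
     * of nsfacs previously found factors, maintained by the caller. */
    #include <gmp.h>

    void divide_out_known_factors(mpz_t n, mpz_t *sfacs, int nsfacs)
    {
        for (int i = 0; i < nsfacs; i++) {
            if (mpz_cmp_ui(sfacs[i], 1) <= 0) continue;   /* skip junk entries */
            while (mpz_divisible_p(n, sfacs[i]))
                mpz_divexact(n, n, sfacs[i]);
        }
    }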

There are also some possible speedups I'm not doing here. For instance, I just added some code that does simple n-1 and n+1 reductions, which can create steps akin to Primo's type 1 and type 2 steps. This basically adds two more candidates. Additionally, it's worth looking at Crandall and Pomerance section 3.3 "Recognizing smooth numbers" (or Bernstein's paper). I believe this is what Primo is doing with its sieve parameter, but I'm just guessing. Basically there are some ways to trial factor a list of numbers that are very efficient, which would be great to use at the start.
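As a rough illustration of the simplest form of that idea -- this is the gcd-with-a-primorial trick rather than Bernstein's full remainder-tree batch algorithm, and the function name is mine:

    /* Strip all prime factors <= B from m, where primes_product is the
     * product of every prime <= B (computed once and reused for the whole
     * batch of candidates).  The gcd picks up all small primes of m at once;
     * the loop catches repeated factors.  A real implementation would also
     * factor the (squarefree, B-smooth) gcd to record which primes came out. */
    #include <gmp.h>

    void strip_small_factors(mpz_t m, const mpz_t primes_product)
    {
        mpz_t g;
        mpz_init(g);
        mpz_gcd(g, m, primes_product);
        while (mpz_cmp_ui(g, 1) != 0) {
            mpz_divexact(m, m, g);
            mpz_gcd(g, m, g);
        }
        mpz_clear(g);
    }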

Quote:
How difficult is it to generate the polynomials? It seems that your program runs with not many and could do with more. If you have a program that will generate more, I am willing to run it for a bit.
Atkin/Morain 1992 notes that they used 119 polys when they started. They had 2246 for later runs. I'm currently using 2650, and have a test version with 3457. Morain's 2005 paper indicates it was version 11.0.5 that computed polys at runtime.

For generation, SAGE and GMP-ECPP will generate polys, and NZMath put up the web service for small Hilbert polys. I have ~4000 generated -- some of them are enormous so I didn't include them, and some of them fail root finding, which makes me think they aren't generated correctly. I'm sorry to say I don't have a nice standalone program that spits out ready-to-use polys. Valente, Cohen, Atkin/Morain, Crandall and Pomerance, Kaltofen et al., Konstantinou et al., etc. are all good resources. Another thought is to ask someone like Marcel Martin or François Morain, since I'm sure both of them have files containing them as well as code to generate them. Either would be useful.
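For anyone who wants to experiment: the degree of the Hilbert class polynomial for a discriminant D is the class number h(D), which is easy to compute by counting reduced primitive forms. A naive sketch written from the textbook definition (plain C, nothing fast, and not taken from any of the programs above):

    /* Class number h(D) for a negative discriminant D (D = 0 or 1 mod 4),
     * counted as the number of reduced primitive forms a*x^2 + b*x*y + c*y^2
     * with b^2 - 4ac = D.  h(D) is the degree of the Hilbert class poly. */
    #include <stdlib.h>

    static long gcd3(long a, long b, long c)
    {
        long g = labs(a), x = labs(b);
        while (x) { long t = g % x; g = x; x = t; }
        x = labs(c);
        while (x) { long t = g % x; g = x; x = t; }
        return g;
    }

    long class_number(long D)
    {
        long h = 0;
        for (long b = labs(D) & 1; 3 * b * b <= -D; b += 2) {
            long t = (b * b - D) / 4;                 /* t = a*c for each form */
            for (long a = (b > 1 ? b : 1); a * a <= t; a++) {
                if (t % a) continue;
                long c = t / a;
                if (gcd3(a, b, c) != 1) continue;
                h += (b == 0 || a == b || a == c) ? 1 : 2;   /* count +b and -b */
            }
        }
        return h;
    }

For example, class_number(-23) returns 3, so the D = -23 Hilbert polynomial has degree 3.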

Quote:
It would be nice to try and push up the limit of where your code is better than primo. It looks like it would hopefully be possible to make it useful for smallish numbers even if primo takes over further up.
I hope it is at least a little bit useful now :) It's useful for me at least. Beating Primo seems ... ambitious to say the least -- Martin has about a 15 year head start on me. I guess there is a niche in the small range. I like that my code is open source, command line, and is relatively portable.

Quote:
Currently the factordb has a load of candidates of 550-600 digits which need processing. What is needed to improve that region?
Lots of things I can think of, and I'm sure a lot I haven't thought of.
  • Discriminants -- there will be a few numbers that give us trouble but we just move on and don't worry about those until later. I could add some more values to the current set, but it seems like a diminishing return.
  • Add the n-1 and n+1 steps. My brief testing today looked positive, but I'm not sure if it'll have a big impact on 500+ digit numbers.
  • Investigate Bernstein 2004 smoothness paper (C&P section 3.3).
  • Factoring speedups are always useful. Morain seems to use Pollard Rho first, but I found it was worse than going straight to p-1. This is implementation specific, but still worth some more investigation.
  • Maybe a two-pass system for factoring stage 1, where we try to spend even less time factoring in the hopes one of the results will pop out early. Or look into whether some sort of parallel factoring (across all candidates) would make any sense at all.
  • Various heuristics. This means things like not using a q immediately if it isn't "small enough" (see Morain 2005 for ideas).
  • Tuning. I found I got much better overall speed with the big poly set if I turned off the large degree polys after we got past the early steps. I have it slowly reducing the max degree. The idea is that we save time if we just backtrack vs. searching another 50 numbers at this level (where the root finding also is more expensive later).
  • I bet my poly sqrt code could be faster.
  • Crandall and Pomerance mention a few times that we can use projective coordinates for the EC math (I may be misunderstanding). At the moment that isn't a bottleneck, but it wouldn't hurt.
Old 2013-07-21, 07:33   #16   danaj

Version 1.01 is available: ecpp-dj-1.01.tar.gz

Some of the changes:
  • A much better defined text certificate format (described in the proof-text-format.txt file).
  • Inclusion of a Perl script to do verification of this format as well as Primo certificates. Also includes a trivial converter for GMP-ECPP output. A caveat is that the Perl script uses a development version of my Perl modules, so you'd need to get the two modules from github until I put the newest version on CPAN. I'll write a C+GMP version soon so it will be completely self-contained.
  • Simple n-1 and n+1 proofs are attempted at each stage, just like Atkin and Morain described back in 1992.
  • I added a really simple single-stage p+1 factoring algorithm which helps out in a few cases.
  • More discriminants, mostly thanks to Echidna, whose polynomials all worked without a hitch just plugging them in. Everything is filled in through degree 21, and I could add some more. They're getting pretty big at this point. Figuring out how to use Weber polys when D = 0 mod 3, or another invariant, would help a lot in reducing the size (and improve performance a bit).
  • If you put WraithX's mpz_aprcl code in the directory and add "-DUSE_APRCL" to the Makefile, then -aprcl is an available option to allow easy comparison.
  • Some changes to the heuristics in factoring depths and which discriminants to use at which stages. Some of this was due to the extra discriminants, and some based on what I saw while testing 1000-2000 digit inputs, which I had done very little of before. It still could use tuning. Something that helps one set of numbers will often hurt another set, and it can take 1-10 hours to see the results of a change over a set of large inputs, making this time consuming.
  • I changed the BPSW test from a strong Lucas-Selfridge to an extra strong (Baillie OEIS parameters) test.
Performance is generally better than version 1.00. It should be better behaved for larger inputs.

This is a work in progress, so please give feedback.
Old 2013-07-21, 10:29   #17   danaj

Performance graph: [link to image]

Because of the scale (from under 0.0001 seconds to over 40,000 seconds) I've used a log-log graph. This does require some extra thought to interpret. 1 second is in the middle; differences below that line may look large but aren't a big deal for most uses. Differences near the top can be huge though, e.g. where GMP-ECPP swings between 2 minutes and 11 hours for various numbers.

The numbers used are the ones from WraithX's timing data for his APRCL code. Using multiple numbers and showing some combination of min, max, average, median would be better for the ECPP algorithms whose times vary wildly with number. For the algorithms that allowed (BPSW, APRCL, ECPP-dj) I ran multiple invocations until they passed 5 minutes to get more timing resolution (noting this results in an average time). I used a constant seed for GMP-ECPP and ECPP-dj so the results would be repeatable.

In gray I show the time for the extra-strong BPSW test. This of course will be faster than any of the other results. This is the strong probable prime test (e.g. Miller-Rabin) with base 2, followed by an extra-strong Lucas test. I recently switched from the strong Lucas-Selfridge test -- the extra strong test runs faster, has fewer pseudoprimes, has similar residue class properties, and a reduced version of it has been good enough for Pari.

The pale orange is the time for my Perl script to verify an ECPP certificate output by ECPP-dj. The EC computations are sent to C+GMP, and are in fact the same code as used in my ECPP, but there is still a fair bit of Perl BigInt work. I'd expect a pure C+GMP version to be a little faster. This seems to run about 1.5-3x faster than the one by WraithX, but it also clearly hasn't gotten as much use or as many eyes on it. The point here is that doing a verification takes much less time than the original proof -- for the 2000 digit example it is ~80 seconds vs. 2.5 hours or more for the proof.

The nice smooth blue line is WraithX's APRCL code. It does not produce a certificate, but it is quite fast and gives very predictable performance. This is similar in behaviour to the times I've seen from Pari's APRCL.

The red line is my ECPP code, with the preliminary n-1 test turned off (since I figured we were benchmarking ECPP). The real reason for the preliminary test is to take care of some small inputs (typically under 30 digits) that can be done quickly, but as Morain's ECPP code shows, some of the example numbers are special cases where n-1 is easy. Overall on these examples it is faster than APRCL until about 1000 digits, though with the caveat that any given number may be faster or slower based on how the factoring turns out.

The dark green line is Morain's ECPP 6.4.5a from 2001. It is not open source, but is publicly available. He has moved on to different, better, faster code, but it is not available in any form. You can clearly see the numbers where it did a preliminary factoring step (e.g. 210 digits where the example number is 2^688 * 687 - 1). Past 310 digits it starts taking much longer to prove than my code.

The pale blue is GMP-ECPP 2.46 (10 July 2013). I left everything as default and used "-s 1" on the command line. Twiddling with #defines and parameters for each number can improve the performance a lot, but this makes the user guess how low POLY_SIZE can be set before things segfault, and whether to lower or raise the precision (the default 10000 is far too high for small numbers, but probably too low for big numbers -- Atkin/Morain 1992 as well as the plethora of papers by Konstantinou et al. discuss estimating the precision needed). As I found last time, it's much slower than the other implementations.

GMP-ECPP prints out enough information to trivially create a certificate, which I verified for each of the results. ECPP-dj was run with certificate output and the results verified (though with my own code, so take that with a grain of salt).

One thing I see is that all these ECPP implementations start getting very choppy in performance once the size goes above a certain point. With the 1000+ digit numbers I noticed that my poly root finder finally becomes a bottleneck, which actually made me feel better, since many of the early papers indicate it was one of the largest performance issues, yet for "small" numbers it just didn't really come up for me. With the big numbers especially, it becomes a game of tradeoffs and heuristics deciding how deep to factor and whether to look at large-degree polynomials at all, knowing that they'll hurt in stage 2.

I didn't put numbers for Primo here. Based on the earlier results it would be very fast.
Old 2013-07-21, 15:33   #18   CRGreathouse

Wonderful graph, thanks! Very helpful in seeing what's going on.
Old 2013-07-21, 23:00   #19   henryzz

That looks like quite a nice improvement. It is a little hard to compare with Primo, as working out exact values is pretty much impossible. It looks like beating APRCL should be the aim for now.
Old 2013-07-22, 15:17   #20   mjm

Quote:
Originally Posted by danaj
Primo calls these [N, A, B, R*S, R, X, Y], but encodes A,B,X,Y in either 3 values (type 3: A,B,T) or 2 values (type 4: J,T) and computes A,B,X,Y from them. The validation does require EC point multiplication in affine coordinates.
Why does it require EC point multiplication in affine coordinates?
Old 2013-07-29, 07:21   #21   danaj

Quote:
Originally Posted by mjm
Why does it require EC point multiplication in affine coordinates?
It shouldn't actually require affine coordinates. I've thought about changing my code to do it in projective coordinates, which should be much faster (it certainly is for ECM). This won't change the proof times very much (so it hasn't been high on my list), but it is where almost all verification time is spent.
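For anyone wondering where the time goes, here is a sketch of one affine doubling step as a verifier might do it (C+GMP, my own naming); the mpz_invert is the per-step cost that projective coordinates would avoid.

    /* Double the point (x, y) in place on y^2 = x^3 + a*x + b (mod n),
     * using affine coordinates.  Returns 0 if 2y has no inverse mod n; in
     * that case gcd(2y, n) > 1, so either a factor of n falls out or the
     * point had order 2 -- either way this certificate step fails. */
    #include <gmp.h>

    int ec_double_affine(mpz_t x, mpz_t y, const mpz_t a, const mpz_t n)
    {
        mpz_t l, t, x3;
        mpz_inits(l, t, x3, NULL);

        mpz_mul_2exp(t, y, 1);                 /* t = 2y */
        if (!mpz_invert(t, t, n)) {            /* the expensive modular inverse */
            mpz_clears(l, t, x3, NULL);
            return 0;
        }
        mpz_mul(l, x, x);
        mpz_mul_ui(l, l, 3);
        mpz_add(l, l, a);
        mpz_mul(l, l, t);
        mpz_mod(l, l, n);                      /* l = (3x^2 + a) / (2y) mod n */

        mpz_mul(x3, l, l);
        mpz_submul_ui(x3, x, 2);
        mpz_mod(x3, x3, n);                    /* x3 = l^2 - 2x */

        mpz_sub(t, x, x3);
        mpz_mul(t, t, l);
        mpz_sub(t, t, y);
        mpz_mod(y, t, n);                      /* y3 = l*(x - x3) - y */
        mpz_set(x, x3);

        mpz_clears(l, t, x3, NULL);
        return 1;
    }
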
Old 2013-08-05, 16:33   #22   chris2be8

Quote:
Originally Posted by danaj
This page on the "prime pages" website is pretty useful.
One question inspired by that page: can ECPP exploit known factors of n+1 or n-1? E.g. cases like R49081, where n-1 has several algebraic factors, but not enough are fully factored to prove it prime by the n-1 method.

Chris