mersenneforum.org  

Old 2017-10-11, 00:48   #34
jasonp

The MPQS code in GGNFS was incredibly fast back when I tested it; the ECM code in CADO-NFS is also blazing fast for ranges of inputs that are above the sizes discussed here (64-128 bits).

If you want another (not well optimized) SIQS, I wrote this in 2005 and modernized it a short while ago.

Old 2017-10-11, 04:00   #35
danaj ("Dana Jacobsen")

Quote:
Originally Posted by jasonp
The MPQS code in GGNFS was incredibly fast back when I tested it; the ECM code in CADO-NFS is also blazing fast for ranges of inputs that are above the sizes discussed here (64-128 bits).

If you want another (not well optimized) SIQS, I wrote this in 2005 and modernized it a short while ago.
I think 64-128 bits is a different range, calling for a different set of codes to examine. Looking at code that doesn't require GMP is also interesting for some uses (albeit not big factoring projects).

The cof-siqs run on the semiprime set does quite poorly at small bit sizes. It misses factoring a few inputs at each size, so it would need a backup. It looks like it outperforms smallmpqs on the data sets I have (which stop at 64 bits). At 63 bits it's close, but Ben's new Pollard-Rho beats it, as does mine when I use his new mulredc. At 64 bits it is ~30% faster than my native Pollard-Rho and about 10% faster than Ben's 64-bit version; factor63 doesn't handle that size.

Old 2017-10-11, 09:34   #36
arbooker ("Andrew Booker")

Here are a few ideas for improving factor63 that I had after reading recent posts. Any opinions?
  • I could generate most of the table at compile time. That would take a few minutes but reduce the download size to about half a GB. I could also include the download as part of the build so that it's less work for the user.
  • I could add a simple checksum calculation at initialization time, which would guard against file corruption and might help convince the OS that we really want the table loaded in memory; see the sketch after this list. (The call to madvise() is supposed to do that, but maybe it isn't enough.) On my Core i5 laptop running Linux, that adds about 0.42 seconds to initfactor63() when the table is already cached. On a slower AMD server, it adds about 3 seconds.
  • Would it be worth writing a 64-bit version? I could use one of your implementations of 64-bit mulredc() and extend everything else in a straightforward way. That would make the data table even larger though (about 6GB).
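A minimal sketch of the checksum idea from the second bullet, assuming the table has already been mmap()'d into a buffer of 64-bit words (the additive checksum and the names here are illustrative, not factor63's actual layout):

Code:
#include <stddef.h>
#include <stdint.h>

/* Walk the whole table once at init time.  Comparing the result against a
   value stored when the table was generated catches a truncated or corrupted
   download, and faulting every page in is a stronger hint to the OS than
   madvise(MADV_WILLNEED) alone. */
static uint64_t table_checksum(const uint64_t *tab, size_t len_bytes)
{
    uint64_t sum = 0;
    size_t i, nwords = len_bytes / sizeof(uint64_t);
    for (i = 0; i < nwords; i++)
        sum += tab[i];   /* simple additive checksum, illustrative only */
    return sum;
}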

Old 2017-10-11, 13:36   #37
bsquared ("Ben")

Quote:
Originally Posted by arbooker
Here are a few ideas for improving factor63 that I had after reading recent posts. Any opinions?
  • I could generate most of the table at compile time. That would take a few minutes but reduce the download size to about half a GB. I could also include the download as part of the build so that it's less work for the user.
  • I could add a simple checksum calculation at initialization time, which would guard against file corruption and might help convince the OS that we really want the table loaded in memory. (The call to madvise() is supposed to do that, but maybe it isn't enough.) On my corei5 laptop running Linux, that adds about 0.42 seconds to initfactor63() when the table is already cached. On a slower AMD server, it adds about 3 seconds.
  • Would it be worth writing a 64-bit version? I could use one of your implementations of 64-bit mulredc() and extend everything else in a straightforward way. That would make the data table even larger though (about 6GB).
1) I like it, but even better, how about creating a second program that generates the table completely? I would much rather download a 1kB file, compile it, and let it run for a while before starting to use factor63 than download 4 or 6 GB. But I am ignorant - does building the full table take much longer than a typical download (i.e., hours)?
2) A small up-front cost is fine for most of the projects that I imagine would use factor63, since it would be greatly amortized. That said, the systems I've been running it on seem to cache it just fine (real ~= user).
3) I'm leaning toward no, mostly because of the additional file size and because my pbrent routine is now within about 1% of factor63, or even slightly faster, using no table. But admittedly that is for the specialized semiprime problem, not for general inputs. Also, I think the speed of my pbrent routine mostly comes from an assembler implementation of mulredc/addmod, which you are welcome to use (PM me if interested), and that would likely put factor63 solidly in the lead again.

Old 2017-10-11, 14:00   #38
bsquared ("Ben")

Here are some quick comparisons. "rand" is the first test we tried with inputs from an LCG. I haven't tried "rand" with the new pbrent yet - it needs a recipe for checking isprime, finding small factors, etc., before trying that test.

Code:
bits  pbrent   f63    f63-asm
48     3.6     4.2     3.9
55    11.3    13.3    12.2
60    27.4    26.5    24.2
rand   --      1.62    1.47

Old 2017-10-11, 14:53   #39
arbooker ("Andrew Booker")

Quote:
Originally Posted by bsquared
1) I like it, but even better, how about creating a second program that generates the table completely? I would much rather download a 1kB file, compile it, and let it run for a while before starting to use factor63 than download 4 or 6 GB. But I am ignorant - does building the full table take much longer than a typical download (i.e., hours)?
There are actually three tables:
  • Factorization table for small numbers. This is the largest (about 3GB in factor63) and the easiest to generate; it takes just a few minutes.
  • Table of Pollard rho collisions, about 560MB. This could in principle be done by the user, but we're talking at least days to compute the full table. My memory is hazy, but it was some number of hours on 64 cores when I computed it.
  • Table of strong pseudoprimes, about 180 MB. This is the result of a lot of computing effort by Feitsma et al and could not be easily reproduced. (In fact to my knowledge no one has independently verified the computation, which should be done at some point.) An alternative is to add a second base or Lucas test (a la Baillie-PSW) so that a much smaller table (or even no table) suffices, but that would make isprime() take about twice as long when the input is prime.
Quote:
2) A small up front cost is fine for most of the projects that I imagine would use factor63 since it would be greatly amortized. That said, the systems I've been running it on seem to cache it just fine (real ~= user)
I could add it as an option, but I won't bother if there's no real need.
Quote:
3) I'm leaning toward no, mostly because of the additional file size and because my pbrent routine is now within about 1% or even slightly faster than factor63, using no table. But admittedly that is for the specialized semiprime problem, not for general inputs.
This is understandable. The collision table handles some rare failure cases for Pollard rho, but those could be dealt with by a backup algorithm without hurting the average performance. (This could even be just a second P-R with a different polynomial, which would shrink the table a lot.) Where I think the tables are most likely to help is in factoring small numbers. In factor63 that's done entirely by table lookup for numbers of square root size (at most 2^31.5). That even helps when doing Pollard rho, since smooth composite factors are pretty common. That speedup won't be apparent from a semiprime test, so it would be worth comparing performance on random numbers also.
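To illustrate the lookup idea, a conceptual sketch only: spf_lookup() is a hypothetical routine returning the smallest prime factor of n from the precomputed table, which is not how factor63's (compressed) table is actually laid out.

Code:
#include <stdint.h>

/* Hypothetical: smallest prime factor of n, read from the precomputed table.
   Only valid for n below the table limit (about 2^31.5 in the factor63 case). */
extern uint32_t spf_lookup(uint64_t n);

/* Fully factor a small (or smooth) cofactor by repeated lookup; no trial
   division or rho is needed once n is under the table limit. */
static int factor_by_table(uint64_t n, uint32_t *factors)
{
    int k = 0;
    while (n > 1) {
        uint32_t p = spf_lookup(n);
        factors[k++] = p;
        n /= p;
    }
    return k;   /* number of prime factors written, with multiplicity */
}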
Quote:
And also the speed of my pbrent routine I think mostly comes from an assembler implementation of mulredc/addmod, which you are welcome to use
Thanks. That's definitely worth a try, though I'm skeptical of making a huge improvement for 63-bit mulredc. Pre-Pentium, I understood what the chip was doing well enough to write better assembly language than my compiler. But that's no longer the case, and I usually find that it's best to coax the C compiler into emitting something close to what I want. That's not too hard for 63-bit mulredc, but for 64-bit you need access to the carry bit, so assembly language is the only reasonable route. [EDIT: I just saw your stats on this. Nice! How about making a pull request on GitHub with your asm mulredc?]
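For reference, the kind of C one would coax the compiler with in the 63-bit case looks roughly like this (a sketch, not factor63's actual routine): n is odd and below 2^63, inputs are in Montgomery form with R = 2^64, and npi is the precomputed -n^{-1} mod 2^64.

Code:
#include <stdint.h>

typedef unsigned __int128 uint128_t;

/* Montgomery multiplication a*b*R^{-1} mod n with R = 2^64, for odd n < 2^63.
   gcc/clang compile the 128-bit arithmetic to mul plus an add/adc pair. */
static inline uint64_t mulredc63(uint64_t a, uint64_t b, uint64_t n, uint64_t npi)
{
    uint128_t t = (uint128_t)a * b;
    uint64_t  m = (uint64_t)t * npi;
    uint64_t  u = (uint64_t)((t + (uint128_t)m * n) >> 64);
    return (u >= n) ? u - n : u;   /* single conditional correction */
}

For full 64-bit n the intermediate sum t + m*n can carry out of 128 bits, which is exactly the carry-bit access problem mentioned above and why the 64-bit version wants assembly.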
Quote:
that would likely put factor63 solidly in the lead again.
We shouldn't view this as a competition, and I don't mind if anyone wants to extract just some bits of the factor63 code and leave the rest.

Old 2017-10-11, 16:34   #40
danaj ("Dana Jacobsen")

Quote:
Originally Posted by arbooker
Table of strong pseudoprimes, about 180 MB. This is the result of a lot of computing effort by Feitsma et al and could not be easily reproduced. (In fact to my knowledge no one has independently verified the computation, which should be done at some point.) An alternative is to add a second base or Lucas test (a la Baillie-PSW) so that a much smaller table (or even no table) suffices, but that would make isprime() take about twice as long when the input is prime.
You could use an Euler-Plumb test, which is only a couple of extra lines, is typically faster, and generates about 40% as many pseudoprimes. Or a strong pseudoprime test, which costs about the same as a Fermat test and generates about 30% as many. Both are subsets of psp-2, so you can easily create the complete list from the Feitsma data. Or, as you say, add a Lucas test, though that makes it take longer.
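For concreteness, the strong pseudoprime (base-2 Miller-Rabin) option looks like this; a sketch using plain __int128 modular arithmetic for clarity rather than the Montgomery routines a real implementation would share.

Code:
#include <stdint.h>

typedef unsigned __int128 uint128_t;

static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)(((uint128_t)a * b) % n);
}

static uint64_t powmod64(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t r = 1;
    while (e) {
        if (e & 1) r = mulmod64(r, a, n);
        a = mulmod64(a, a, n);
        e >>= 1;
    }
    return r;
}

/* Strong probable prime test to base 2, for odd n > 2.  The composites that
   pass are exactly the strong pseudoprimes base 2 (the ~30% subset of psp-2
   mentioned above). */
static int is_sprp2(uint64_t n)
{
    uint64_t d = n - 1, x;
    int s = 0, i;
    while ((d & 1) == 0) { d >>= 1; s++; }
    x = powmod64(2, d, n);
    if (x == 1 || x == n - 1) return 1;
    for (i = 1; i < s; i++) {
        x = mulmod64(x, x, n);
        if (x == n - 1) return 1;
    }
    return 0;
}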

Quote:
[...] In factor63 that's done entirely by table lookup for numbers of square root size (at most 2^31.5). That even helps when doing Pollard rho, since smooth composite factors are pretty common. That speedup won't be apparent from a semiprime test, so it would be worth comparing performance on random numbers also.
Both types of test are important. Ben's Lehman implementation helps immensely at small sizes, so that comparison is going to be an interesting thing to see. So far there's been some good progress in 32- to 64-bit single-algorithm semiprime factoring.

Quote:
Nice! How about making a pull request on github with your asm mulredc?
No pull request, but you can see Ben's 63-bit and 64-bit asm here.

Quote:
We shouldn't view this as a competition, and I don't mind if anyone wants to extract just some bits of the factor63 code and leave the rest.
I think this has been great, since this is something I care about. Everyone has something to offer, and together we can make faster solutions. There are also a variety of use cases, making some solutions better for some purposes. There's plenty of room. Hopefully this will pull out a few more implementations as well.

Old 2017-10-11, 16:55   #41
bsquared ("Ben")

Quote:
Originally Posted by danaj
Both types of tests are important. Ben's Lehman implementation helps immensely for small sizes. That's going to be an interesting thing to see. So far there's been some good progress in 32- to 64-bit single-algorithm semiprime factoring.
Proper attribution: Warren D. Smith kindly provided that Lehman implementation to yafu some time ago. It is very well tuned.

I don't really view this as a competition, but a competitive angle does sometimes provide motivation :)

Old 2017-10-11, 22:19   #42
danaj ("Dana Jacobsen")

Never mind my suggested test update; I got confused by the terms "psp" and "Fermat test" in the code. You're already doing a Miller-Rabin (a.k.a. strong pseudoprime) test.

For 32-bit inputs you could use a small hash table of bases so that only one test is needed for a correct answer, though you're already using your big factor table for most of that range, and it doesn't materially change your table size. After that, I guess the choice is: the big table (memory cost), two tests plus a ~1M-entry table (+1x CPU, still some memory cost), or a Lucas test and no table (+1.5x CPU, no memory). There are hashed 2-M-R solutions, but they only go to ~49 bits, which doesn't seem to really help you.
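A sketch of the hashed-base idea for 32-bit inputs; the base table and the hash here are placeholders (a real table has to be found by search against the psp-2 data), and is_sprp() is the usual strong test with a variable base.

Code:
#include <stdint.h>

/* Hypothetical: 256 bases found by search so that a single strong test with
   bases[hash32(n) & 255] gives the correct answer for every odd n < 2^32.
   No verified table is included here. */
extern const uint32_t bases[256];

/* Usual strong probable prime test with a chosen base (not shown). */
extern int is_sprp(uint64_t n, uint64_t base);

static uint32_t hash32(uint32_t n)
{
    /* Any cheap mixing function works, as long as the search used the same one. */
    n = ((n >> 16) ^ n) * 0x45d9f3b;
    n = ((n >> 16) ^ n) * 0x45d9f3b;
    return (n >> 16) ^ n;
}

int isprime32(uint32_t n)
{
    /* Tiny n would be handled before this in practice (e.g. by the factor table). */
    if (n < 2) return 0;
    if ((n & 1) == 0) return n == 2;
    return is_sprp(n, bases[hash32(n) & 255]);
}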


Ben is right that Warren D. Smith wrote the Lehman implementation, which has had minor changes since. My HOLF implementation seems to do well at small sizes; more testing would be useful here for those few people who care about microsecond differences in factoring 17- to 32-bit composites. The 32- to 42-bit range, where Pollard-Rho crosses over with HOLF/Lehman, is another interesting one. From 42 to 63 bits, so far it seems the optimized Pollard-Rho implementations are winning.

Old 2017-10-12, 21:17   #43
arbooker ("Andrew Booker")

Quote:
Originally Posted by danaj
After that I guess your choice of big table (memory cost), two tests and a ~1M entry table (+1x CPU, still some memory cost), or a Lucas test and no table (+1.5x CPU, no memory).
Agreed, though I strongly favor the first solution. For PCs (or even smartphones), 180MB isn't too much memory to sacrifice these days, and despite the poor cache locality, it's still quite a bit quicker than running a second test.

I'm experimenting with a 64-bit version using Ben's asm mulredc. One thing that surprised me is how much time is saved in factor63 by using the Sylvester sequence, which allowed me to replace one addmod by a simple addition (the line y = mulredc(y,y+one)). For 64 bits we need to use addmod (or submod, which is slightly quicker), and that single change adds more time than the savings from using asm mulredc. (On the bright side, this shows how close to the bone we're cutting in the code.) I noticed that Ben's routine handles 63-bit and 64-bit n separately, and I might just move the if check farther out so the penalty is only incurred for 64-bit numbers.
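To make the addmod point concrete, here is a stripped-down sketch of a rho loop built around that iteration (Floyd cycling with one gcd per step, just for clarity; real implementations use Brent's variant and batch the gcds). mulredc63() is as in the earlier sketch, one is the Montgomery form of 1, and y0 is a Montgomery-form starting value. With n < 2^63 the sum y + one cannot wrap 64 bits, so the plain '+' is safe; a 64-bit version needs an addmod or submod there.

Code:
#include <stdint.h>

extern uint64_t mulredc63(uint64_t a, uint64_t b, uint64_t n, uint64_t npi);
extern uint64_t gcd64(uint64_t a, uint64_t b);   /* any uint64 gcd */

/* Pollard rho with the iteration y <- y*(y+1), all values in Montgomery form.
   Returns a nontrivial factor of n, or 1 if this starting point failed. */
uint64_t rho63(uint64_t n, uint64_t npi, uint64_t one, uint64_t y0, uint64_t maxiter)
{
    uint64_t x = y0, y = y0, g = 1, i;
    for (i = 0; i < maxiter && g == 1; i++) {
        x = mulredc63(x, x + one, n, npi);        /* tortoise: one step  */
        y = mulredc63(y, y + one, n, npi);        /* hare: two steps     */
        y = mulredc63(y, y + one, n, npi);
        g = gcd64(x > y ? x - y : y - x, n);      /* gcd(|x - y|, n)     */
    }
    return (g != n && g > 1) ? g : 1;   /* g == n means the walk collapsed */
}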

Old 2017-10-12, 22:10   #44
bsquared ("Ben")

Quote:
Originally Posted by arbooker
One thing that surprised me is how much time is saved in factor63 by using the Sylvester sequence, which allowed me to replace one addmod by a simple addition (the line y = mulredc(y,y+one)).
This is a new one for me... do you have a reference you could point me to? I was wondering what kind of iteration was going on in your code. I had assumed there was an addmod buried in there somewhere that I wasn't smart enough to find.