Old 2017-08-19, 14:44   #1
arbooker
 
 
"Andrew Booker"
Mar 2013

5×17 Posts
Some code for factoring numbers up to 2^63

While working on aliquot sequences last year, I wrote some fast code for factoring and primality testing of numbers < 2^63. I figured that it might be useful to others, so I've made it available on github at
https://github.com/arbooker/factor63

One feature of the code is a guaranteed upper bound on the running time for factoring. It's based on Pollard rho, with a data table for resolving collisions and prime factors with long cycles. The primality testing is based on Jan Feitsma's base-2 pseudoprime database.
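
To sketch the general technique (illustrative only, not the code in the repository; in_psp2_table() is just a placeholder for the table lookup): a single base-2 strong-probable-prime test plus a check against a precomputed list of base-2 pseudoprimes gives a deterministic answer in this range.
Code:
#include <stdint.h>

/* 2^d mod n by square-and-multiply; __uint128_t keeps the 64x64 products exact. */
static uint64_t pow2mod(uint64_t d, uint64_t n)
{
    __uint128_t r = 1, b = 2 % n;
    while (d) {
        if (d & 1) r = r * b % n;
        b = b * b % n;
        d >>= 1;
    }
    return (uint64_t)r;
}

/* Base-2 strong-probable-prime test for odd n > 2. */
static int sprp2(uint64_t n)
{
    uint64_t d = n - 1, x;
    int s = 0;
    while (!(d & 1)) { d >>= 1; s++; }
    x = pow2mod(d, n);
    if (x == 1 || x == n - 1) return 1;
    while (--s > 0) {
        x = (uint64_t)((__uint128_t)x * x % n);
        if (x == n - 1) return 1;
    }
    return 0;
}

/* Placeholder: lookup in a precomputed, sorted list of base-2 strong
   pseudoprimes (derivable from Feitsma's database). */
extern int in_psp2_table(uint64_t n);

int isprime_sketch(uint64_t n)
{
    if (n < 2) return 0;
    if (!(n & 1)) return n == 2;
    return sprp2(n) && !in_psp2_table(n);
}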

Sergei Chernykh has released similar code for factoring that works up to 2^64. I stopped at 2^63 both to simplify the coding of Montgomery arithmetic and to keep the data table under 4GB.

The code is written for GCC on 64-bit Linux. I'm not planning to port it to any other platforms, but I don't mind if other people with the know-how would like to do so (that's why I put it on github). As far as I can see, the only system-dependent things that would need to be ported are the POSIX function mmap(), support for 128-bit arithmetic (GCC's __uint128_t) and GCC's __builtin_ctzl()/__builtin_clzl().
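
For anyone considering a port, those pieces are small. Something like the following is typically all that is involved (illustrative only, not code from the repository):
Code:
#include <stdint.h>

/* (a*b) mod n with an exact 128-bit intermediate product (this is where
   __uint128_t comes in; a port would have to emulate the wide multiply). */
static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((__uint128_t)a * b % n);
}

/* Strip the factors of 2 from a nonzero n, using the ctz builtin a port would replace. */
static uint64_t odd_part(uint64_t n)
{
    return n >> __builtin_ctzl(n);
}
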
Old 2017-10-05, 18:27   #2
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

2×227 Posts

Did you compare it to a good SQUFOF implementation?
In my experiments, SQUFOF turns out to be much faster than any Pollard-Rho-style algorithm, though yours may be much more refined...
Till
Old 2017-10-05, 20:25   #3
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

2²×227 Posts

Quote:
Originally Posted by Till View Post
Did you compare it to a good SQUFOF implementation?
In my experiments, SQUFOF turns out to be much faster than any Pollard-Rho-style algorithm, though yours may be much more refined...
Till
I haven't yet looked at it in detail, but a couple of comments about that.

GNU factor uses both Pollard-Rho and SQUFOF. Pollard-Rho has its place, especially since it's quite amenable to using Montgomery math.
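
For reference, here's a bare-bones sketch of the Brent-style rho loop we're all talking about (my own illustration, not code from GNU factor, factor63, or MPU); with Montgomery math the mulmod below becomes a mulredc and the iterates simply stay in Montgomery form:
Code:
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((__uint128_t)a * b % n);
}

static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Plain Pollard-Rho with Brent's cycle detection, for odd composite n < 2^63.
   Returns a nontrivial factor, or 0 if the iteration budget runs out
   (retry with a different c or fall back to something else in that case). */
static uint64_t rho_brent(uint64_t n, uint64_t c, uint64_t maxiter)
{
    uint64_t x = 2, y = 2, d = 1, i = 0, len = 1;
    while (d == 1 && i < maxiter) {
        uint64_t j;
        x = y;                              /* saved point at index 2^k - 1 */
        for (j = 0; j < len && d == 1 && i < maxiter; j++, i++) {
            y = mulmod(y, y, n) + c;        /* y = y^2 + c mod n, with c < n */
            if (y >= n) y -= n;
            d = gcd64(x > y ? x - y : y - x, n);
        }
        len *= 2;
    }
    return (d != 1 && d != n) ? d : 0;
}
A tuned implementation would also batch the gcds, accumulating a product of differences and taking one gcd only every so often.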

Yafu is another place to look for some very well optimized 64-bit factoring, though that code is not well separated from the rest of its source.

I use small trial factoring followed by some Pollard-Rho (Brent), P-1, and SQUFOF. Your mileage will vary. I found it to be faster than Pari/GP and GNU factor. My is_prime is about 4-5x faster than Pari's C function uisprime.

For my measurements I typically tune for performance assuming random inputs. That means some wasted time if you know the inputs are skewed more toward semiprimes.


As for factor63, the tiny code size is nice, but the 3.8GB mmapped file it requires is obnoxious to download and makes the whole thing unusable for most purposes. I couldn't distribute a project using this, but I can see that it could work for people who just want to run something on their own machine.

I recommend adding an #ifdef'd main function that does simple factoring.
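
Something along these lines, say. This is only a sketch: FACTOR63_MAIN is a made-up guard name, and the i64 type and the initfactor63()/factor63() calls assume the same interface as the test wrapper below.
Code:
#ifdef FACTOR63_MAIN
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    i64 p[64];
    int i, k, e[64];

    if (argc < 2) {
        fprintf(stderr, "usage: %s n   (n < 2^63)\n", argv[0]);
        return 1;
    }
    initfactor63("factor.bin");                     /* mmap the data table */
    k = factor63(p, e, strtoull(argv[1], NULL, 10));
    for (i = 0; i < k; i++)
        printf("%lld^%d ", (long long)p[i], e[i]);  /* one prime power per factor */
    printf("\n");
    return 0;
}
#endif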

The isprime for large values is about 1.6x faster than mine, though it requires the huge table and stops at 63 bits. The Fermat test is nice and fast regardless.

For factoring, it's a bit interesting. The actual CPU time used is pretty good (about 30% faster than mine for 62-bit inputs), but the total clock time is worse due to having to traverse the giant file.

Given the input 8871153694124723083 it fails with a floating-point exception. It works with the iteration count bumped to 500000. In my case I run SQUFOF if pbrent fails to find a factor, then tune the iteration count for best performance (the number varies with input size and with how fast the mulmods are, since I have to run on other platforms without x86 or __uint128_t). It seems you need a backup plan, or to raise the iteration count to something larger.

I've been meaning for years to pull my code out into a generic library. I'm kind of thinking now that I should merge it with my GMP code for similar functionality. My holdup is mainly that this is a lot of work that isn't fun numerical coding :).
Old 2017-10-05, 20:42   #4
CRGreathouse
 
 
Aug 2006

3·1,993 Posts

Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?
Old 2017-10-05, 21:30   #5
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

2²×227 Posts

Quote:
Originally Posted by CRGreathouse View Post
Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits.
In section 3 it says the composites are all semiprimes, which makes it an entirely different problem from looking at random n-bit inputs. If the goal is to optimize average time on random n-bit inputs, then it's definitely worth doing some work to look for small factors, and Pollard-Rho is often a good choice to use before SQUFOF.

Quote:
Originally Posted by CRGreathouse View Post
ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?

factor63's big table of numbers makes it problematic. Not only does it have a mod-2 bitmask of primality and millions of pseudoprimes, but it includes 146 million 32-bit numbers used in the Pollard-Rho implementation to help find collisions.
Old 2017-10-05, 21:36   #6
bsquared
 
 
"Ben"
Feb 2007

2²·3·293 Posts

Quote:
Originally Posted by CRGreathouse View Post
Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?
The Lehman method is also good for small inputs... Yafu uses it below 40 bits.

Here is some comparative data for squfof implementations. (Yafu's squfof, now an AVX2-vectorized version, is about twice as fast as was reported there.)

I'll see if I can get Factor63 running on the same system to compare.

Last fiddled with by bsquared on 2017-10-05 at 21:36
Old 2017-10-06, 02:34   #7
bsquared
 
 
"Ben"
Feb 2007

2²×3×293 Posts

Adding this quick test wrapper:

Code:
int main(int argc, char **argv)
{
    u64 state = 28953;                /* LCG seed */
    i64 p[64];                        /* prime factors */
    int i, d, num = 10000, e[64];     /* e[] = exponents, d = factor count */
    FILE *fid;
    
    initfactor63("factor.bin");       /* mmap the precomputed data table */
    
    fid = fopen("results.txt", "w");
    for (i=0; i<num; i++)
    {
        /* 64-bit LCG for pseudo-random inputs; shifted right to stay below 2^63 */
        state = 6364136223846793005ULL * state + 1442695040888963407ULL;
        d = factor63(p, e, state>>1);
        fprintf(fid, "%lu has %d factors\n", state>>1, d);
    }
    
    fclose(fid);
    return 0;
}
I get about 1.2 ms per random input, on average:
Code:
 
 493 % gcc -O2 factor63.c -lm
 494 % time ./a.out
0.226u 0.240s 0:12.83 3.5%      0+0k 0+664io 1529pf+0w
Old 2017-10-06, 04:08   #8
CRGreathouse
 
 
Aug 2006

5,979 Posts

Do you have comparison figures for other programs/algorithms? The link only goes to 59 bits.
Old 2017-10-06, 05:14   #9
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

2²·227 Posts

Quote:
Originally Posted by bsquared View Post
I get about 1.2 ms per random input, on average:
Code:
 493 % gcc -O2 factor63.c -lm
 494 % time ./a.out
0.226u 0.240s 0:12.83 3.5%      0+0k 0+664io 1529pf+0w
I'm not following how that turns into 1.2ms (0.226s for 10k inputs). Wouldn't that be 23us?

On my MacBook, using the same wrapper but bumped to 100k inputs. Note the real vs. user time.
Code:
$ gcc -O2 -march=native factor63.c
$ time ./a.out

real	0m5.312s
user	0m2.066s
sys	0m0.472s
Using a wrapper around the Math-Prime-Util factor_exp call that gives the identical 100k-line output:
Code:
real	0m2.728s
user	0m2.681s
sys	0m0.026s
With a bit more P-R before calling SQUFOF I can shave off another 0.1s, but that's still not 2.0s. At that point only 6 of the 100,000 numbers make it past the P-R at all; with my default settings, 108/100,000 get through. So almost all the time is spent in P-R.

I'd love to have a faster SQUFOF even though it looks like it wouldn't make much difference here (but it would if we made the inputs harder). About a year ago I finally converted my Pollard-Rho-Brent to Montgomery math, which gave a nice speedup.
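
For the curious, the core of that conversion is the Montgomery (REDC) multiply. A minimal sketch for odd n < 2^63 (my own illustration, not MPU's actual code):
Code:
#include <stdint.h>

/* -n^{-1} mod 2^64 for odd n, by Newton iteration; each step doubles the
   number of correct low bits (3 -> 6 -> 12 -> 24 -> 48 -> 96 >= 64). */
static uint64_t mont_ninv(uint64_t n)
{
    uint64_t x = n;                 /* n*n = 1 mod 8, so 3 bits are already right */
    x *= 2 - n * x;
    x *= 2 - n * x;
    x *= 2 - n * x;
    x *= 2 - n * x;
    x *= 2 - n * x;
    return (uint64_t)0 - x;         /* negate: we want -n^{-1} mod 2^64 */
}

/* REDC multiply: returns a*b*2^{-64} mod n, for a, b < n and odd n < 2^63
   (the n < 2^63 bound keeps t + m*n from overflowing 128 bits). */
static uint64_t mont_mulmod(uint64_t a, uint64_t b, uint64_t n, uint64_t ninv)
{
    __uint128_t t = (__uint128_t)a * b;
    uint64_t m = (uint64_t)t * ninv;    /* chosen so the low 64 bits of t + m*n are zero */
    uint64_t u = (uint64_t)((t + (__uint128_t)m * n) >> 64);
    return u >= n ? u - n : u;
}
In the rho loop everything can stay in Montgomery form; since gcd((x - y)*2^64 mod n, n) = gcd(x - y, n) for odd n, even the gcd step needs no conversion back.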


Added:
FLINT2 (https://github.com/fredrik-johansson/flint2), using n_factor from ulong_extras. The output has the same shasum.
Code:
$ time ./a.out

real	0m27.774s
user	0m26.480s
sys	0m0.353s

Last fiddled with by danaj on 2017-10-06 at 05:46 Reason: Add FLINT2
Old 2017-10-06, 06:57   #10
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

2²·227 Posts

GNU factor, which handles up to 127 bits. Similar wrapper, calling factor() directly so it skips all the text-based input. Same 100k-line output.

The factor() code does some trial division first, then calls the appropriate algorithm via either a 1-word path (what we use here) or a 2-word path (for 127-bit inputs), so we just use the normal path. The main loop of the 64-bit P-R algorithm looks quite similar to factor63 and MPU, just with different code for the mont_mulmod (aka mulredc) and different inner/outer loop counts depending on how each was tuned. factor63 goes in a completely different direction once a gcd != 1 is found and the main loop exits.

Pollard-Rho:

Code:
$ time ./factor

real	0m4.046s
user	0m4.005s
sys	0m0.029s
SQUFOF:

Code:
$ time ./factor

real	0m27.199s
user	0m26.372s
sys	0m0.203s
Old 2017-10-06, 07:03   #11
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

2²×227 Posts

Quote:
Originally Posted by danaj View Post
I'm not following how that turns into 1.2ms (0.226s for 10k inputs). Wouldn't that be 23us?
Finally decoded this, not long after my edit window closed. So on my MacBook with an SSD there is a penalty for using the big file, but it's only ~2-3x. On your machine it was a massive 50x penalty, hence the 1283us instead of 23us. Ouch.