mersenneforum.org Some code for factoring numbers up to 2^63

 2017-08-19, 14:44 #1 arbooker     "Andrew Booker" Mar 2013 2²·23 Posts

Some code for factoring numbers up to 2^63

While working on aliquot sequences last year, I wrote some fast code for factoring and primality testing of numbers < 2^63. I figured that it might be useful to others, so I've made it available on github at https://github.com/arbooker/factor63

One feature of the code is a guaranteed upper bound on the running time for factoring. It's based on Pollard rho, with a data table for resolving collisions and prime factors with long cycles. The primality testing is based on Jan Feitsma's base-2 pseudoprime database. Sergei Chernykh has released similar code for factoring that works up to 2^64. I stopped at 2^63 both to simplify the coding of Montgomery arithmetic and to keep the data table under 4GB.

The code is written for GCC on 64-bit Linux. I'm not planning to port it to any other platforms, but I don't mind if other people with the know-how would like to do so (that's why I put it on github). As far as I can see, the only system-dependent things that would need to be ported are the POSIX function mmap(), support for 128-bit arithmetic (GCC's __uint128_t), and GCC's __builtin_ctzl()/__builtin_clzl().
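[For readers unfamiliar with the method, the basic iteration behind Pollard rho looks like the sketch below. This is a minimal Floyd-cycle version for illustration only; factor63 itself is considerably more refined, with Montgomery arithmetic and the collision-resolution table described above.]

```c
#include <stdint.h>

/* Minimal Pollard rho with Floyd cycle detection, for odd composite
   n < 2^63. Illustration only: factor63 uses Montgomery arithmetic,
   a Brent-style iteration, and a precomputed collision table. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((__uint128_t)a * b % n);  /* GCC __uint128_t */
}

static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Returns a factor of n; may return n itself on failure, in which
   case the caller retries with a different constant c. */
uint64_t pollard_rho(uint64_t n, uint64_t c)
{
    uint64_t x = 2, y = 2, d = 1;
    while (d == 1) {
        x = (mulmod(x, x, n) + c) % n;  /* tortoise: one step */
        y = (mulmod(y, y, n) + c) % n;  /* hare: two steps */
        y = (mulmod(y, y, n) + c) % n;
        d = gcd64(x > y ? x - y : y - x, n);
    }
    return d;
}
```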
 2017-10-05, 18:27 #2 Till     "Tilman Neumann" Jan 2016 Germany 13×41 Posts Did you compare it to a good Squfof implementation? In my experiments, Squfof turns out much faster than any PollardRho-style algorithm. Albeit yours may be much more refined... Till
2017-10-05, 20:25   #3
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

909₁₀ Posts

Quote:
 Originally Posted by Till Did you compare it to a good Squfof implementation? In my experiments, Squfof turns out much faster than any PollardRho-style algorithm. Albeit yours may be much more refined... Till
I haven't yet looked at it in detail, but a couple of comments about that.

GNU factor uses both P-R and SQUFOF. Pollard-Rho has its place. Especially since it's quite amenable to using Montgomery math.
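[For context, "Montgomery math" here means replacing each expensive mod by a REDC reduction. A bare sketch of the 64-bit mulredc idea, assuming odd n < 2^63; this is illustrative, not the actual code from GNU factor or factor63:]

```c
#include <stdint.h>

/* Precompute npi = -n^{-1} mod 2^64 by Newton iteration (n odd).
   Starting from x = n is correct to 3 bits; each step doubles that. */
static uint64_t mont_npi(uint64_t n)
{
    uint64_t x = n;
    for (int i = 0; i < 5; i++)
        x *= 2 - n * x;          /* x -> n^{-1} mod 2^64 */
    return (uint64_t)0 - x;
}

/* Montgomery multiplication: computes a * b * 2^{-64} mod n.
   Needs n < 2^63 so one conditional subtraction suffices. */
static uint64_t mulredc(uint64_t a, uint64_t b, uint64_t n, uint64_t npi)
{
    __uint128_t t = (__uint128_t)a * b;
    uint64_t m = (uint64_t)t * npi;                  /* mod 2^64 */
    uint64_t r = (uint64_t)((t + (__uint128_t)m * n) >> 64);
    return r >= n ? r - n : r;
}
```

Operands are kept in "Montgomery form" a·2^64 mod n, so the 2^{-64} factors cancel across a chain of multiplications; converting in and out costs one multiplication each way. The n < 2^63 restriction here is the same one arbooker mentions for factor63.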

Yafu is another place to look for some very well optimized 64-bit factoring, though it's not well separated from its source.

I use small trial factoring followed by some Pollard-Rho (Brent), P-1, and SQUFOF. Your mileage will vary. I found it to be faster than Pari/GP and GNU factor. My is_prime is about 4-5x faster than Pari's C function uisprime.
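[The P-1 step in that pipeline catches prime factors p for which p-1 is smooth. A bare-bones stage-1 sketch, illustrative rather than the tuned implementation being described:]

```c
#include <stdint.h>

/* Bare-bones Pollard p-1, stage 1 only: compute 2^E mod n for a
   highly composite exponent E, then gcd(2^E - 1, n). Finds a prime
   factor p whenever p-1 divides E (i.e. p-1 is smooth enough). */
static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t r = 1;
    while (e) {
        if (e & 1) r = (uint64_t)((__uint128_t)r * a % n);
        a = (uint64_t)((__uint128_t)a * a % n);
        e >>= 1;
    }
    return r;
}

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Returns a nontrivial factor of odd n, or 0 on failure. */
uint64_t pminus1(uint64_t n, uint64_t bound)
{
    uint64_t a = 2;
    for (uint64_t k = 2; k <= bound; k++)
        a = powmod(a, k, n);     /* a = 2^(bound!) mod n */
    uint64_t g = gcd_u64(a > 1 ? a - 1 : 0, n);
    return (g > 1 && g < n) ? g : 0;
}
```

For example, 299 = 13·23 falls to a tiny bound because 13-1 = 12 divides 4!, while 23-1 = 22 does not.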

For my measurements I typically assume random inputs in terms of tuning for performance. That means some wasted time if you knew the inputs were more skewed toward semiprimes.

As for factor63, the tiny code size is nice, but the 3.8GB mmapped file required for downloading is obnoxious and makes the whole thing unusable for most purposes. I couldn't distribute a project using this. But I see that for people who just want to run something on their own it could work.

I recommend making a #ifdef'd main function that does simple factoring.

The isprime for large values is about 1.6x faster than mine, though it requires the huge table and stops at 63 bits. The Fermat test is nice and fast regardless.

For factoring, it's a bit interesting. The actual CPU time used is pretty good -- about 30% faster than mine for 62-bit inputs, but the total clock time is worse due to having to traverse the giant file.

Given the input 8871153694124723083, it will fail with a floating point exception. It works with the iteration count bumped to 500000. In my case I run SQUFOF if pbrent fails to find a factor, then tune the iteration count to get the best performance (the number varies by input size and by how fast mulmods are, since I have to run on other platforms without x86 or uint128_t). It seems you need a backup plan, or to raise the iteration count to something larger.

I've been meaning for years to pull my code out into a generic library. I'm kind of thinking now that I should merge it with my GMP code for similar functionality. My holdup is mainly that this is a lot of work that isn't fun numerical coding :).

 2017-10-05, 20:42 #4 CRGreathouse     Aug 2006 5,987 Posts Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?
2017-10-05, 21:30   #5
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

1110001101₂ Posts

Quote:
 Originally Posted by CRGreathouse Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits.
In section 3 it says the composites are all semiprimes, making it an entirely different problem than looking at random n-bit inputs. If the goal is to optimize average time on random n-bit inputs, then it's definitely worth doing some work to look for small factors, and Pollard-Rho is often a good choice to use before SQUFOF.

Quote:
 Originally Posted by CRGreathouse ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?

factor63's big table of numbers makes it problematic. Not only does it have a mod-2 bitmask of primality and millions of pseudoprimes, but it includes 146 million 32-bit numbers used in the Pollard-Rho implementation to help find collisions.
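[For reference, a bitmask over the odd numbers of the kind mentioned here is typically indexed as below. This only illustrates the layout idea; factor63's actual on-disk format may differ.]

```c
#include <stdint.h>

/* One bit per odd number: odd n maps to bit index n >> 1, so 2^63
   odd values fit in 2^59 bytes of address space when mmapped (and a
   primality mask for small ranges is tiny). Illustrative sketch. */
static int odd_bitmask_test(const uint8_t *bits, uint64_t n)
{
    uint64_t i = n >> 1;                 /* odd n -> bit index */
    return (bits[i >> 3] >> (i & 7)) & 1;
}
```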

2017-10-05, 21:36   #6
bsquared

"Ben"
Feb 2007

3731₁₀ Posts

Quote:
 Originally Posted by CRGreathouse Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?
The Lehman method is also good for small inputs... Yafu uses it below 40 bits.

Here is some comparative data for SQUFOF implementations. (Yafu's squfof is now about twice as fast as reported there, thanks to an AVX2-vectorized version.)

I'll see if I can get Factor63 running on the same system to compare.

Last fiddled with by bsquared on 2017-10-05 at 21:36

 2017-10-06, 02:34 #7 bsquared     "Ben" Feb 2007 7×13×41 Posts

Adding this quick test wrapper:
Code:
int main(int argc, char **argv)
{
	u64 state = 28953;
	i64 p[64];
	int i, d, num = 10000, e[64];
	FILE *fid;

	initfactor63("factor.bin");
	fid = fopen("results.txt", "w");
	for (i = 0; i < num; i++) {
		/* PRNG step reconstructed: the forum stripped the original
		   update between '<' and '>' characters */
		state = 6364136223846793005UL * state + 1442695040888963407UL;
		d = factor63(p, e, state >> 1);
		fprintf(fid, "%lu has %d factors\n", state >> 1, d);
	}
	fclose(fid);
	return 0;
}
I get about 1.2 ms per random input, on average:
Code:
 493 % gcc -O2 factor63.c -lm
 494 % time ./a.out
0.226u 0.240s 0:12.83 3.5%	0+0k 0+664io 1529pf+0w
 2017-10-06, 04:08 #8 CRGreathouse     Aug 2006 5,987 Posts Do you have comparison figures for other programs/algorithms? The link only goes to 59 bits.
2017-10-06, 05:14   #9
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

3²×101 Posts

Quote:
 Originally Posted by bsquared I get about 1.2 ms per random input, on average: Code:  493 % gcc -O2 factor63.c -lm 494 % time ./a.out 0.226u 0.240s 0:12.83 3.5% 0+0k 0+664io 1529pf+0w
I'm not following how that turns into 1.2ms (0.226s for 10k inputs). Wouldn't that be 23us?

On my Macbook using the same wrapper but bumped to 100k. Note the real vs. user time.
Code:
$ gcc -O2 -march=native factor63.c
$ time ./a.out

real	0m5.312s
user	0m2.066s
sys	0m0.472s
Using a wrapper around the Math-Prime-Util factor_exp call that gives the identical 100k-line output:
Code:
real	0m2.728s
user	0m2.681s
sys	0m0.026s
With a bit more P-R before calling SQUFOF I can shave off another 0.1s, but that's still not 2.0s. At that point only 6 of the 100,000 numbers make it past the P-R at all. With my default settings, 108/100,000 get through. So almost all the time is spent in P-R.

I'd love to have a faster SQUFOF even though it looks like it wouldn't make much difference here (but would if we made the inputs harder). About one year ago I finally converted my P-R-Brent to Montgomery math which gave a nice speedup.

FLINT2 (https://github.com/fredrik-johansson/flint2), n_factor from ulong_extras. Same shasum.
Code:
$ time ./a.out

real	0m27.774s
user	0m26.480s
sys	0m0.353s
Last fiddled with by danaj on 2017-10-06 at 05:46 Reason: Add FLINT2

 2017-10-06, 06:57 #10 danaj     "Dana Jacobsen" Feb 2011 Bangkok, TH 3²×101 Posts

GNU factor, which handles up to 127 bits. Similar wrapper, calling factor() so it skips all the text-based input. Same 100k-line output. The factor() code does some trial division first, then calls the appropriate algorithm with either a 1-word path (what we use) or a 2-word path (for 127-bit inputs), so we just use the normal path. The 64-bit P-R algorithm's main loop looks quite similar to factor63's and MPU's, just with different code snippets for the mont_mulmod aka mulredc and different inner/outer loop counts depending on how we tuned. factor63 goes in a completely different direction once we find a gcd != 1 and exit the main loop.

Pollard-Rho:
Code:
$ time ./factor

real	0m4.046s
user	0m4.005s
sys	0m0.029s
SQUFOF:
Code:
$ time ./factor

real	0m27.199s
user	0m26.372s
sys	0m0.203s
2017-10-06, 07:03   #11
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

3²×101 Posts

Quote:
 Originally Posted by danaj I'm not following how that turns into 1.2ms (0.226s for 10k inputs). Wouldn't that be 23us?
Finally decoded this, not long after my edit window closed. So on my Macbook with SSD there is a penalty for using the big file but it's ~2-3x. On your machine it was a massive 50x penalty. Hence the 1283us instead of 23us. Ouch.

