20170819, 14:44  #1 
"Andrew Booker"
Mar 2013
2^{2}·23 Posts 
Some code for factoring numbers up to 2^63
While working on aliquot sequences last year, I wrote some fast code for factoring and primality testing of numbers < 2^63. I figured that it might be useful to others, so I've made it available on github at
https://github.com/arbooker/factor63 One feature of the code is a guaranteed upper bound on the running time for factoring. It's based on Pollard rho, with a data table for resolving collisions and prime factors with long cycles. The primality testing is based on Jan Feitsma's base2 pseudoprime database. Sergei Chernykh has released similar code for factoring that works up to 2^64. I stopped at 2^63 both to simplify the coding of Montgomery arithmetic and to keep the data table under 4GB. The code is written for GCC on 64bit Linux. I'm not planning to port it to any other platforms, but I don't mind if other people with the knowhow would like to do so (that's why I put it on github). As far as I can see, the only systemdependent things that would need to be ported are the POSIX function mmap(), support for 128bit arithmetic (GCC's __uint128_t) and GCC's __builtin_ctzl()/__builtin_clzl(). 
20171005, 18:27  #2 
"Tilman Neumann"
Jan 2016
Germany
13×41 Posts 
Did you compare it to a good Squfof implementation?
In my experiments, Squfof turns out much faster than any PollardRhostyle algorithm. Albeit yours may be much more refined... Till 
20171005, 20:25  #3  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
909_{10} Posts 
Quote:
GNU factor uses both PR and SQUFOF. PollardRho has its place. Especially since it's quite amenable to using Montgomery math. Yafu is another place to look for some very well optimized 64bit factoring, though it's not well separated from its source. I use small trial factoring followed by some PollardRho (Brent), P1, and SQUFOF. Your mileage will vary. I found it to be faster than Pari/GP and GNU factor. My is_prime is about 45x faster than Pari's C function uisprime. For my measurements I typically assume random inputs in terms of tuning for performance. That means some wasted time if you knew the inputs were more skewed toward semiprimes. As for factor63, the tiny code size is nice, but the 3.8GB mmapped file required for downloading is obnoxious and makes the whole thing unusable for most purposes. I couldn't distribute a project using this. But I see that for people who just want to run something on their own it could work. I recommend making a #ifdef'd main function that does simple factoring. The isprime for large values is about 1.6x faster than mine, though requires the huge table and stops at 63 bit. The Fermat test is nice and fast regardless. For factoring, it's a bit interesting. The actual CPU time used is pretty good  about 30% faster than mine for 62bit inputs, but the total clock time is worse due to having to traverse the giant file. Given the input 8871153694124723083 it will fail with a floating point exception. It works with iterations bumped to 500000. In my case I run SQUFOF if pbrent fails to find a factor, then tune the iteration count to get the best performance (number varies by input size and how fast mulmods are, since I have to run on other platforms without x86 or uint128_t). It seems you need a backup plan, or raise the iterations to something larger. I've been meaning for years to pull my code out into a generic library. I'm kind of thinking now that I should merge it with my GMP code for similar functionality. My holdup is mainly that this is a lot of work that isn't fun numerical coding :). 

20171005, 20:42  #4 
Aug 2006
5,987 Posts 
Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?

20171005, 21:30  #5  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
1110001101_{2} Posts 
Quote:
[/quote] ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?[/QUOTE] factor63's big table of numbers makes it problematic. Not only does it have a mod2 bitmask of primality and millions of pseudoprimes, but it includes 146 million 32bit numbers used in the PollardRho implementation to help find collisions. 

20171005, 21:36  #6  
"Ben"
Feb 2007
3731_{10} Posts 
Quote:
Here is some comparitive data for squfof implementations. (Yafu's squfof is now about twice as fast as was reported there (an avx2 vectorized version).) I'll see if I can get Factor63 running on the same system to compare. Last fiddled with by bsquared on 20171005 at 21:36 

20171006, 02:34  #7 
"Ben"
Feb 2007
7×13×41 Posts 
Adding this quick test wrapper:
Code:
int main(int argc, char **argv) { u64 state = 28953; i64 p[64]; int i, d, num = 10000, e[64]; FILE *fid; initfactor63("factor.bin"); fid = fopen("results.txt", "w"); for (i=0; i<num; i++) { state = 6364136223846793005ULL * state + 1442695040888963407ULL; d = factor63(p, e, state>>1); fprintf(fid, "%lu has %d factors\n", state>>1, d); } fclose(fid); return 0; } Code:
493 % gcc O2 factor63.c lm 494 % time ./a.out 0.226u 0.240s 0:12.83 3.5% 0+0k 0+664io 1529pf+0w 
20171006, 04:08  #8 
Aug 2006
5,987 Posts 
Do you have comparison figures for other programs/algorithms? The link only goes to 59 bits.

20171006, 05:14  #9  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}×101 Posts 
Quote:
On my Macbook using the same wrapper but bumped to 100k. Note the real vs. user time. Code:
$ gcc O2 march=native factor63.c $ time ./a.out real 0m5.312s user 0m2.066s sys 0m0.472s Code:
real 0m2.728s user 0m2.681s sys 0m0.026s I'd love to have a faster SQUFOF even though it looks like it wouldn't make much difference here (but would if we made the inputs harder). About one year ago I finally converted my PRBrent to Montgomery math which gave a nice speedup. Added: FLINT2 (https://github.com/fredrikjohansson/flint2), n_factor from ulong_extras. Same shasum. Code:
$ time ./a.out real 0m27.774s user 0m26.480s sys 0m0.353s Last fiddled with by danaj on 20171006 at 05:46 Reason: Add FLINT2 

20171006, 06:57  #10 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}×101 Posts 
GNU NT Factor, handles up to 127bits. Similar wrapper, calling factor() so skips all the text based input. Same 100k line output.
The factor() code does some trial division first then calls the appropriate algorithm with either 1word (what we use) or 2word (for 127bit) paths. So we just use the normal path. The 64bit PR algorithm main loop looks quite similar to factor63 and MPU, just with different code snippets for the mont_mulmod aka mulredc and different inner/outer loop counts depending on how we tuned. factor63 goes a completely different direction once we find a gcd != 1 and exit the main loop. PollardRho: Code:
$ time ./factor real 0m4.046s user 0m4.005s sys 0m0.029s Code:
$ time ./factor real 0m27.199s user 0m26.372s sys 0m0.203s 
20171006, 07:03  #11 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}×101 Posts 
Finally decoded this, not long after my edit window closed. So on my Macbook with SSD there is a penalty for using the big file but it's ~23x. On your machine it was a massive 50x penalty. Hence the 1283us instead of 23us. Ouch.

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Factoring Mersenne numbers  paulunderwood  Miscellaneous Math  18  20170827 14:56 
Factoring Big Numbers In c#  ShridharRasal  Factoring  10  20080320 17:17 
Question about factoring code  Peter Nelson  Software  9  20050330 18:28 
Factoring Fermat Numbers  Axel Fox  Software  14  20030704 18:57 
Questions about SSE2 code and Factoring  Joe O  Software  2  20020913 23:39 