![]() |
![]() |
#1 |
"Andrew Booker"
Mar 2013
22·23 Posts |
![]()
While working on aliquot sequences last year, I wrote some fast code for factoring and primality testing of numbers < 2^63. I figured that it might be useful to others, so I've made it available on github at
https://github.com/arbooker/factor63 One feature of the code is a guaranteed upper bound on the running time for factoring. It's based on Pollard rho, with a data table for resolving collisions and prime factors with long cycles. The primality testing is based on Jan Feitsma's base-2 pseudoprime database. Sergei Chernykh has released similar code for factoring that works up to 2^64. I stopped at 2^63 both to simplify the coding of Montgomery arithmetic and to keep the data table under 4GB. The code is written for GCC on 64-bit Linux. I'm not planning to port it to any other platforms, but I don't mind if other people with the know-how would like to do so (that's why I put it on github). As far as I can see, the only system-dependent things that would need to be ported are the POSIX function mmap(), support for 128-bit arithmetic (GCC's __uint128_t) and GCC's __builtin_ctzl()/__builtin_clzl(). |
![]() |
![]() |
![]() |
#2 |
"Tilman Neumann"
Jan 2016
Germany
13×41 Posts |
![]()
Did you compare it to a good Squfof implementation?
In my experiments, Squfof turns out much faster than any PollardRho-style algorithm. Albeit yours may be much more refined... Till |
![]() |
![]() |
![]() |
#3 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
90910 Posts |
![]() Quote:
GNU factor uses both P-R and SQUFOF. Pollard-Rho has its place. Especially since it's quite amenable to using Montgomery math. Yafu is another place to look for some very well optimized 64-bit factoring, though it's not well separated from its source. I use small trial factoring followed by some Pollard-Rho (Brent), P-1, and SQUFOF. Your mileage will vary. I found it to be faster than Pari/GP and GNU factor. My is_prime is about 4-5x faster than Pari's C function uisprime. For my measurements I typically assume random inputs in terms of tuning for performance. That means some wasted time if you knew the inputs were more skewed toward semiprimes. As for factor63, the tiny code size is nice, but the 3.8GB mmapped file required for downloading is obnoxious and makes the whole thing unusable for most purposes. I couldn't distribute a project using this. But I see that for people who just want to run something on their own it could work. I recommend making a #ifdef'd main function that does simple factoring. The isprime for large values is about 1.6x faster than mine, though requires the huge table and stops at 63 bit. The Fermat test is nice and fast regardless. For factoring, it's a bit interesting. The actual CPU time used is pretty good -- about 30% faster than mine for 62-bit inputs, but the total clock time is worse due to having to traverse the giant file. Given the input 8871153694124723083 it will fail with a floating point exception. It works with iterations bumped to 500000. In my case I run SQUFOF if pbrent fails to find a factor, then tune the iteration count to get the best performance (number varies by input size and how fast mulmods are, since I have to run on other platforms without x86 or uint128_t). It seems you need a backup plan, or raise the iterations to something larger. I've been meaning for years to pull my code out into a generic library. I'm kind of thinking now that I should merge it with my GMP code for similar functionality. My holdup is mainly that this is a lot of work that isn't fun numerical coding :). |
|
![]() |
![]() |
![]() |
#4 |
Aug 2006
5,987 Posts |
![]()
Jérôme Milan compared algorithms and found that, of the algorithms he tested, SQUFOF was far and away the king from 45 bits until the crossover with SIQS around 59 bits. ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?
|
![]() |
![]() |
![]() |
#5 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
11100011012 Posts |
![]() Quote:
[/quote] ECM, CFRAC, McKee, MPQS, and QS didn't even get close. Rho, unfortunately, wasn't compared, but maybe someone could run factor63 against TIFA's siqs_factors and squfof_factors? And/or Msieve and yafu?[/QUOTE] factor63's big table of numbers makes it problematic. Not only does it have a mod-2 bitmask of primality and millions of pseudoprimes, but it includes 146 million 32-bit numbers used in the Pollard-Rho implementation to help find collisions. |
|
![]() |
![]() |
![]() |
#6 | |
"Ben"
Feb 2007
373110 Posts |
![]() Quote:
Here is some comparitive data for squfof implementations. (Yafu's squfof is now about twice as fast as was reported there (an avx2 vectorized version).) I'll see if I can get Factor63 running on the same system to compare. Last fiddled with by bsquared on 2017-10-05 at 21:36 |
|
![]() |
![]() |
![]() |
#7 |
"Ben"
Feb 2007
7×13×41 Posts |
![]()
Adding this quick test wrapper:
Code:
int main(int argc, char **argv) { u64 state = 28953; i64 p[64]; int i, d, num = 10000, e[64]; FILE *fid; initfactor63("factor.bin"); fid = fopen("results.txt", "w"); for (i=0; i<num; i++) { state = 6364136223846793005ULL * state + 1442695040888963407ULL; d = factor63(p, e, state>>1); fprintf(fid, "%lu has %d factors\n", state>>1, d); } fclose(fid); return 0; } Code:
493 % gcc -O2 factor63.c -lm 494 % time ./a.out 0.226u 0.240s 0:12.83 3.5% 0+0k 0+664io 1529pf+0w |
![]() |
![]() |
![]() |
#8 |
Aug 2006
5,987 Posts |
![]()
Do you have comparison figures for other programs/algorithms? The link only goes to 59 bits.
|
![]() |
![]() |
![]() |
#9 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32×101 Posts |
![]() Quote:
On my Macbook using the same wrapper but bumped to 100k. Note the real vs. user time. Code:
$ gcc -O2 -march=native factor63.c $ time ./a.out real 0m5.312s user 0m2.066s sys 0m0.472s Code:
real 0m2.728s user 0m2.681s sys 0m0.026s I'd love to have a faster SQUFOF even though it looks like it wouldn't make much difference here (but would if we made the inputs harder). About one year ago I finally converted my P-R-Brent to Montgomery math which gave a nice speedup. Added: FLINT2 (https://github.com/fredrik-johansson/flint2), n_factor from ulong_extras. Same shasum. Code:
$ time ./a.out real 0m27.774s user 0m26.480s sys 0m0.353s Last fiddled with by danaj on 2017-10-06 at 05:46 Reason: Add FLINT2 |
|
![]() |
![]() |
![]() |
#10 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32×101 Posts |
![]()
GNU NT Factor, handles up to 127-bits. Similar wrapper, calling factor() so skips all the text based input. Same 100k line output.
The factor() code does some trial division first then calls the appropriate algorithm with either 1-word (what we use) or 2-word (for 127-bit) paths. So we just use the normal path. The 64-bit P-R algorithm main loop looks quite similar to factor63 and MPU, just with different code snippets for the mont_mulmod aka mulredc and different inner/outer loop counts depending on how we tuned. factor63 goes a completely different direction once we find a gcd != 1 and exit the main loop. Pollard-Rho: Code:
$ time ./factor real 0m4.046s user 0m4.005s sys 0m0.029s Code:
$ time ./factor real 0m27.199s user 0m26.372s sys 0m0.203s |
![]() |
![]() |
![]() |
#11 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32×101 Posts |
![]()
Finally decoded this, not long after my edit window closed. So on my Macbook with SSD there is a penalty for using the big file but it's ~2-3x. On your machine it was a massive 50x penalty. Hence the 1283us instead of 23us. Ouch.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Factoring Mersenne numbers | paulunderwood | Miscellaneous Math | 18 | 2017-08-27 14:56 |
Factoring Big Numbers In c# | ShridharRasal | Factoring | 10 | 2008-03-20 17:17 |
Question about factoring code | Peter Nelson | Software | 9 | 2005-03-30 18:28 |
Factoring Fermat Numbers | Axel Fox | Software | 14 | 2003-07-04 18:57 |
Questions about SSE2 code and Factoring | Joe O | Software | 2 | 2002-09-13 23:39 |