mersenneforum.org  

mersenneforum.org > Factoring Projects > Factoring

Old 2017-11-01, 18:32   #67
danaj
"Dana Jacobsen"
Feb 2011
Bangkok, TH

Quote:
Originally Posted by jasonp View Post
If you want another (not well optimized) SIQS, I wrote this in 2005 and modernized it a short while ago.
Thank you very much for sharing this, Jason. Is this also public domain? The files don't have any author or licensing info so I'd like to verify. It looks like this is all new on 2014-09-28 and not related to any of the RDS code, but instead your 2007 tinyqs (you mention it's originally from 2005).

My initial performance indications for this are great. The x86-64 "pbrent" codes are slightly faster up to 63-64 bits, but co-siqs doesn't use any asm. From 64-70 bits it's debatable, but co-siqs is certainly close if not ahead, and up to 126 bits it is faster than anything else I have tried.

I wrote a "SQUFOF126" which is a modified version of my version of Ben's racing SQUFOF, using 64-bit native int core and GMP for the setup. It's at least in the ballpark, but not faster than co-siqs. It seems SQUFOF has been pushed out of every range.

For semiprimes that aren't an even split, I find a small amount of low-overhead p-1 in front can as much as halve the average time taken. My experience has been that ECM isn't as useful for this sort of optimistic up-front test at small sizes. I need to add a standalone small p-1 to my todo list (currently it uses my prime iterator, which is a little more costly than a simple array and pulls in a lot more code, albeit more flexible).
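For reference, here is a minimal sketch in C of the kind of low-overhead stage-1 p-1 test described above. The bound, function names, and 64-bit-only arithmetic are illustrative choices, not the actual code being discussed; a real version would use a precomputed prime list rather than trial division.

```c
#include <stdint.h>
#include <assert.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    return (uint64_t)((unsigned __int128)a * b % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n) {
    uint64_t r = 1 % n;
    for (a %= n; e; e >>= 1, a = mulmod(a, a, n))
        if (e & 1) r = mulmod(r, a, n);
    return r;
}

static uint64_t gcd64(uint64_t a, uint64_t b) {
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Stage-1 p-1 with bound B: raise 2 to the product of all prime
   powers <= B, then take a gcd.  Finds p | n whenever p-1 is B-smooth.
   Returns a nontrivial factor, or 0 if none was found. */
uint64_t pminus1(uint64_t n, uint64_t B) {
    uint64_t a = 2;
    for (uint64_t p = 2; p <= B; p++) {
        int is_prime = 1;                  /* trial division; fine for tiny B */
        for (uint64_t d = 2; d * d <= p; d++)
            if (p % d == 0) { is_prime = 0; break; }
        if (!is_prime) continue;
        uint64_t pe = p;
        while (pe <= B / p) pe *= p;       /* largest power of p that is <= B */
        a = powmod(a, pe, n);
    }
    uint64_t g = gcd64(a ? a - 1 : 0, n);
    return (g > 1 && g < n) ? g : 0;
}
```

For example, 4332007 = 1201 * 3607 falls to B = 30 because 1200 = 2^4 * 3 * 5^2 is 30-smooth while 3606 = 2 * 3 * 601 is not.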
Old 2017-11-01, 20:20   #68
Till
"Tilman Neumann"
Jan 2016
Germany

Quote:
Originally Posted by danaj View Post
I wrote a "SQUFOF126" which is a modified version of my version of Ben's racing SQUFOF, using 64-bit native int core and GMP for the setup. It's at least in the ballpark, but not faster than co-siqs. It seems SQUFOF has been pushed out of every range.
Hi Dana,
if you use multipliers of the form 1680 * "square-free sequence" = {1, 2, 3, 5, 6, 7, 10, 11, 13, ...} and switch multipliers after ~1.5*N^(1/5) iterations, you'd get the algorithm I proposed in my PSIQS thread. It only works up to roughly 90 bits because of the larger multipliers, but it looked quite good in Java. I am curious how a highly optimized C/asm implementation compares.

Btw., 1680 is just one very smooth constant I found to work well; 5040, 10080, or the like could be slightly better.
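To make the recipe concrete, here is a small C sketch of the multiplier schedule (my reading of the description above, not Till's Java code): walk the square-free sequence, scale by 1680, and cap each multiplier's run at about 1.5*N^(1/5) iterations.

```c
#include <stdint.h>
#include <assert.h>

/* 1 if k has no repeated prime factor (k is square-free) */
static int squarefree(uint64_t k) {
    for (uint64_t d = 2; d * d <= k; d++)
        if (k % (d * d) == 0) return 0;
    return 1;
}

/* Fill out[] with the first m multipliers 1680 * {1,2,3,5,6,7,10,11,13,...} */
void squfof_multipliers(uint64_t *out, int m) {
    uint64_t k = 0;
    for (int i = 0; i < m; ) {
        k++;
        if (squarefree(k)) out[i++] = 1680 * k;
    }
}

/* Integer fifth root of N, computed without libm; products are done
   in 128 bits so the boundary case near 2^64 cannot overflow. */
static uint64_t iroot5(uint64_t N) {
    uint64_t r = 1;
    for (;;) {
        unsigned __int128 p = 1;
        for (int i = 0; i < 5; i++) p *= (r + 1);
        if (p > N) break;
        r++;
    }
    return r;
}

/* Iteration budget per multiplier, ~1.5 * N^(1/5) as suggested above */
uint64_t squfof_iter_cap(uint64_t N) {
    return 3 * iroot5(N) / 2;
}
```

With this schedule the first few multipliers come out as 1680, 3360, 5040, 8400, 10080, 11760, 16800, 18480, 21840.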
Old 2017-11-02, 20:40   #69
jasonp
Tribal Bullet
Oct 2004

Quote:
Originally Posted by danaj View Post
Thank you very much for sharing this, Jason. Is this also public domain? The files don't have any author or licensing info so I'd like to verify. It looks like this is all new on 2014-09-28 and not related to any of the RDS code, but instead your 2007 tinyqs (you mention it's originally from 2005).
My pleasure; consider it public domain and include it anywhere.

The code has nothing to do with RDS' original sieve, it was part of an early project to build a line sieve suitable for kilobit-scale GNFS. I never did anything else with it and dusted it off when there was a chance RDS would contribute to the group effort for sieving 2,991- if his code could be overhauled to scale up to problems that large. Unfortunately just adding 3LP cofactorization wasn't enough to make his lattice sieve scale up to handle large numbers, but at least the code is out there.

ISTR the SIQS code that is in the Franke-Kleinjung sievers was twice as fast as this when I last benchmarked it a lifetime ago, but it does have a 96-bit limit.
Old 2017-11-02, 22:55   #70
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

Quote:
Originally Posted by jasonp View Post
ISTR the SIQS code that is in the Franke-Kleinjung sievers was twice as fast as this when I last benchmarked it a lifetime ago, but it does have a 96-bit limit.
The code in the v5 sievers has a much higher limit.
Old 2017-11-08, 18:08   #71
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

The key idea used is Bernstein's method, see: https://cr.yp.to/papers/sf.pdf.
We factorize in polynomial time(!) per integer, but this is only true if you have MANY integers to factorize,
hence this method also needs a lot of memory.
There is some trade-off here if you simply have not got enough memory: see my tests
where we factorized 2e6 and also 9e6 fifty-five-bit integers. In the latter
case we are close to an optimal setup, but even in the first case, with only 2e6 integers,
the time used is still not that bad.

I have some improvements on this algorithm; that's why this code is somewhat longer.
It also contains an improvement on this idea: http://www.mersenneforum.org/showpos...40&postcount=8
we use only 3 bases(!)
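For readers new to the method, here is a toy C sketch of the core remainder-tree step (my illustration, not the attached code, which uses GMP and far larger trees): with P the product of the primes up to some bound, one computation of P mod (n_1*...*n_m) is pushed down a tree to give P mod n_i for every i, and gcd(P mod n_i, n_i) then exposes the primes below the bound dividing each n_i.

```c
#include <stdint.h>
#include <assert.h>

typedef unsigned __int128 u128;

static uint64_t gcd64(uint64_t a, uint64_t b) {
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Push r = P mod (n[lo]*...*n[hi-1]) down the tree; at each leaf,
   store gcd(P mod n[i], n[i]) = the product of the distinct primes
   (<= the bound) dividing n[i].  Toy sizes only: subtree products
   must fit in 128 bits, so real code uses GMP and stores the
   product tree instead of recomputing it per node. */
static void rem_tree(u128 r, const uint64_t *n, uint64_t *out, int lo, int hi) {
    if (hi - lo == 1) {
        out[lo] = gcd64((uint64_t)(r % n[lo]), n[lo]);
        return;
    }
    int mid = (lo + hi) / 2;
    u128 left = 1, right = 1;
    for (int i = lo;  i < mid; i++) left  *= n[i];
    for (int i = mid; i < hi;  i++) right *= n[i];
    rem_tree(r % left,  n, out, lo,  mid);
    rem_tree(r % right, n, out, mid, hi);
}

void batch_smooth_part(const uint64_t *n, uint64_t *out, int m, u128 P) {
    u128 T = 1;
    for (int i = 0; i < m; i++) T *= n[i];
    rem_tree(P % T, n, out, 0, m);
}
```

With P = 6469693230 (the product of the primes up to 29) and inputs {77, 221, 437, 899}, the first three are detected as fully 29-smooth, while 899 = 29*31 yields only the factor 29.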

Sample runs of my code.
Code:
gerbicz@gerbicz:~/gmp-6.1.2$ g++ -static -m64 -O2 -fomit-frame-pointer -frename-registers -flto -mtune=skylake -march=skylake -mavx2 -Wall -o bernstein bernstein.c -lgmp -lm
gerbicz@gerbicz:~/gmp-6.1.2$
gerbicz@gerbicz:~/gmp-6.1.2$
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_55_9e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542;
primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_55_9e6.txt
Processed 9000000 lines in 349 sec.; total 9000000 lines in 349 sec.
Total processed 9000000 numbers in 349 sec.
There were (too) few numbers in the file, for optimal performance we would expect roughly 9957119 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$
gerbicz@gerbicz:~/gmp-6.1.2$
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_60_15e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542;
primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_60_15e6.txt
Processed 15000000 lines in 1390 sec.; total 15000000 lines in 1390 sec.
Total processed 15000000 numbers in 1390 sec.
There were (too) few numbers in the file, for optimal performance we would expect roughly 51630451 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$ 
gerbicz@gerbicz:~/gmp-6.1.2$
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_55_2e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542;
primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_55_2e6.txt
Processed 2000000 lines in 93 sec.; total 2000000 lines in 93 sec.
Total processed 2000000 numbers in 93 sec.
There were (too) few numbers in the file, for optimal performance we would expect roughly 9957292 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$ 
gerbicz@gerbicz:~/gmp-6.1.2$ 
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 0 -in easy_40_4e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542;
primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=easy_40_4e6.txt
Processed 469841 lines in 3 sec.; total 469841 lines in 3 sec.
Processed 475395 lines in 3 sec.; total 945236 lines in 6 sec.
Processed 470891 lines in 2 sec.; total 1416127 lines in 8 sec.
Processed 472562 lines in 3 sec.; total 1888689 lines in 11 sec.
Processed 471654 lines in 3 sec.; total 2360343 lines in 14 sec.
Processed 472816 lines in 2 sec.; total 2833159 lines in 16 sec.
Processed 471924 lines in 3 sec.; total 3305083 lines in 19 sec.
Processed 470969 lines in 3 sec.; total 3776052 lines in 22 sec.
Processed 223948 lines in 1 sec.; total 4000000 lines in 23 sec.
Total processed 4000000 numbers in 23 sec.
gerbicz@gerbicz:~/gmp-6.1.2$
In table format, comparing only with the fast factor63 (I failed to install some others):

Code:
              factor63.c   bernstein.c
semi_55_9e6:    847 sec.      349 sec.
semi_60_15e6:  3407 sec.     1390 sec.
semi_55_2e6:    201 sec.       93 sec.
easy_40_4e6:      5 sec.       23 sec.
(for a file name_L_n we factorize n integers, each of them L bits)

Yeah, for "easy" numbers my algorithm is slower, not only at 40 bits but also at 60 bits, and maybe even at 64 bits.

In the code you can see the PARI/GP functions used to generate these files, but
you can also download them, in the same order as above:
https://drive.google.com/file/d/1WLQ...ew?usp=sharing
https://drive.google.com/file/d/1CJG...ew?usp=sharing
https://drive.google.com/file/d/14sn...ew?usp=sharing
https://drive.google.com/file/d/13wQ...ew?usp=sharing

P.S. I have 16 GB of RAM, and in the lone 60-bit test (factorization of 15e6 hard numbers)
we used all of the memory for some seconds and dipped into swap,
but we survived. [We don't use a constant amount of memory the whole time, so this helps.]

If you have many numbers in the file, then for a better run it should be
sorted in increasing order [the PARI/GP method does not do this].
Attached Files
File Type: zip bernstein.zip (48.1 KB, 67 views)
Old 2017-11-08, 18:31   #72
danaj
"Dana Jacobsen"
Feb 2011
Bangkok, TH

Thanks Robert, that's quite interesting.

Re hashed bases, a single hashed base is definitely a win for 32-bit inputs, but since a Lucas test takes about 1.5x the cost of a M-R test, I find it nicer to just do that and get faster overall performance and skip the big table as well. Though it depends on your test input set (if they're all SPSP2 pseudoprimes, then the 2 or 3 M-R test will look good).

Also depends on your implementations of course. The 1.5x or so cost is for x86-64 with Montgomery math. GMP for larger inputs is something in the 1.5x to 1.8x range if I recall correctly but I'd have to dig to get the measurements. LibTomMath was more like 6x but maybe it could be sped up.

Always nice to have more options, and I'll have to play with it.
Old 2017-11-08, 18:59   #73
arbooker
"Andrew Booker"
Mar 2013

Quote:
Originally Posted by R. Gerbicz View Post
The used key idea is the Bernstein's method, see: https://cr.yp.to/papers/sf.pdf.
Nice work! I was aware of this algorithm of Bernstein and his toy implementation. It's nice to see an optimized version and impressive that it compares favorably to the one-off algorithms even for small bit sizes.

Probably the optimal strategy is to do some low-level Pollard rho to find the small prime factors, then run Bernstein to take care of all the stubborn cases in one go.
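For what it's worth, a minimal Pollard rho in C of the sort that could run in front (a generic Floyd-cycle sketch, not tuned Brent code; the caller must pass a composite, since rho never terminates on a prime):

```c
#include <stdint.h>
#include <assert.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    return (uint64_t)((unsigned __int128)a * b % n);
}

static uint64_t gcd64(uint64_t a, uint64_t b) {
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Floyd-cycle Pollard rho with x -> x^2 + c; on a cycle failure
   (gcd == n) retry with the next c.  n must be composite. */
uint64_t pollard_rho(uint64_t n) {
    if (n % 2 == 0) return 2;
    for (uint64_t c = 1; ; c++) {
        uint64_t x = 2, y = 2, d = 1;
        while (d == 1) {
            x = (mulmod(x, x, n) + c) % n;       /* tortoise: one step  */
            y = (mulmod(y, y, n) + c) % n;       /* hare: two steps     */
            y = (mulmod(y, y, n) + c) % n;
            d = gcd64(x > y ? x - y : y - x, n);
        }
        if (d != n) return d;
    }
}
```

A production version would batch the gcds (multiply several |x-y| values together before one gcd) and use Brent's cycle detection, but the structure is the same.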
Old 2017-11-08, 19:20   #74
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

Quote:
Originally Posted by danaj View Post
Thanks Robert, that's quite interesting.

Re hashed bases, a single hashed base is definitely a win for 32-bit inputs, but since a Lucas test takes about 1.5x the cost of a M-R test, I find it nicer to just do that and get faster overall performance and skip the big table as well. Though it depends on your test input set (if they're all SPSP2 pseudoprimes, then the 2 or 3 M-R test will look good).
In the interesting case for us, hard numbers (diff=1), we are not doing even a single isprime call.
For easy numbers we call it at most once per number: nothing for c<2^32 (just a table lookup),
or when c<=maxp^2 (then we know that c=1 or c is prime), where maxp is the largest prime in the smaller product tree.
(And here there is also a special saving when you use the composites=1 switch.)

With your fast isprime() functions the method would be somewhat faster for the easy case.
[You have to replace the isprime64 function.]

Quote:
Originally Posted by danaj View Post
LibTomMath was more like 6x but maybe it could be sped up.
In the costly operations on large integers we are mostly doing multiplication and no hard division (only division by 2^e, which is cheap). It is interesting to ask why: this is due to the
scaled remainder tree (SRT in the code). By doing only one hard floating-point division, then multiplications, we avoid all the other integer divisions that are common in a "regular" remainder tree.
Though, interestingly, this SRT gave almost no speedup in the code.
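The same divide-once-then-multiply trade shows up in miniature in Barrett reduction, which may help make the idea concrete (a generic integer illustration of the trick, not the SRT in the attached code): precompute one reciprocal of the modulus, then every later reduction costs only multiplications, shifts, and at most two subtractions.

```c
#include <stdint.h>
#include <assert.h>

/* One-time "hard" division: mu = floor(2^64 / n), for moduli n < 2^32 */
uint64_t barrett_setup(uint64_t n) {
    return (uint64_t)(((unsigned __int128)1 << 64) / n);
}

/* a mod n with no runtime division: estimate the quotient from the
   precomputed reciprocal, then correct with conditional subtractions */
uint64_t barrett_mod(uint64_t a, uint64_t n, uint64_t mu) {
    uint64_t q = (uint64_t)(((unsigned __int128)a * mu) >> 64);
    uint64_t r = a - q * n;       /* true remainder fits; wraparound is harmless */
    while (r >= n) r -= n;        /* at most two iterations for n < 2^32 */
    return r;
}
```

The quotient estimate q is at most 2 below the true quotient, hence the short correction loop; the one real division happens once in barrett_setup.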
Old 2017-11-08, 19:56   #75
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

Quote:
Originally Posted by arbooker View Post
Nice work! I was aware of this algorithm of Bernstein and his toy implementation. It's nice to see an optimized version and impressive that it compares favorably to the one-off algorithms even for small bit sizes.
I didn't remember that toy.c. My basic implementation, which I'd call vintage, was this bernstein_demo.c (renamed today); see just one test run:
Code:
gerbicz@gerbicz:~/gmp-6.1.2$ ./demo
Polinom ideju faktorizalas egyszerre, sok 'kis' szamra
D. J. Bernstein modszere alapjan.
Hany bites random szamokat szeretnel (<63 bit) ? 40
Hanyat generaljak? 1000000
A random szamok maximuma=1099510120925
Primek szama a maximum negyzetgyokeig=82025
Kesz 7 masodperc alatt.
Faktorizalas eredmenye a ki.txt fileban talalhato.
gerbicz@gerbicz:~/gmp-6.1.2$
I left the text in Hungarian: the first question asks for L, the second for the n value; the code then generates n random integers, each of L bits, and factorizes them. The output is in ki.txt. This version has the L<63 limitation [the original Bernstein can factorize up to L=64] and uses more memory for large L, since it uses only one remainder tree, but it still has decent speed, though slower than bernstein.c. It displays the time used on the next-to-last line.

Quote:
Originally Posted by arbooker View Post
Probably the optimal strategy is to do some low-level Pollard rho to find the small prime factors, then run Bernstein to take care of all the stubborn cases in one go.
Note that in this version, in the easy case (diff=0), we actually first extract the small prime factors using smaller product trees, but with the same algorithm. For the remaining composite(!) cofactors we use the big trees (with primes up to max sqrt(n)).
That is why diff=0 is faster than diff=1, of course only if there were easy numbers in the input file.
But it is possible that extracting the smallish factors with, for example, Pollard rho would be faster.
Attached Files
File Type: zip bernstein_demo.zip (1.8 KB, 57 views)
Old 2017-11-09, 10:02   #76
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

Could Bernstein's method be useful for splitting large primes in SIQS or GNFS? It is certainly possible to have a lot of numbers to factor simultaneously. I get the impression that memory usage would be an issue for GNFS at least. I need to remind myself how Bernstein's method works.
Old 2017-11-09, 12:50   #77
danaj
"Dana Jacobsen"
Feb 2011
Bangkok, TH

Quote:
Originally Posted by henryzz View Post
Could Bernstein's method be useful for splitting large primes in SIQS or GNFS? It is certainly possible to have a lot of numbers to factor simultaneously. I get the impression that memory usage would be an issue for GNFS at least. I need to remind myself how Bernstein's method works.
Memory usage is a big deal for my application, but something like this for much larger inputs has been on my wish list for ECPP for a while. There we get 50-500 numbers and the goal is to strip out factors until we get a satisfactory one or decide we should give up and try a different tree. Not the sheer volume of numbers that QS or NFS has, and significantly larger (hundreds or thousands of digits). Primo does a variety of clever things here.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.