2017-11-01, 18:32  #67  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}·227 Posts 
Quote:
My initial performance indications are great for this. The x86-64 "pbrent" codes are slightly faster up to 63-64 bits, but cosiqs doesn't have the asm use. From 64-70 bits it's debatable, but cosiqs is certainly close if not ahead, and up to 126 bits it is faster than anything else I have tried. I wrote a "SQUFOF126", a modified version of my version of Ben's racing SQUFOF, using a 64-bit native-int core and GMP for the setup. It's at least in the ballpark, but not faster than cosiqs. It seems SQUFOF has been pushed out of every range. For non-even-split semiprimes, I find a small amount of low-overhead P-1 in front can as much as halve the average time taken. My experience has been that ECM isn't as useful for this sort of optimistic up-front test at small sizes. I need to add a standalone small P-1 to my todo list (currently it uses my prime iterator, which is a little higher cost than a simple array as well as pulling in a lot more code, albeit more flexible). 
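To make the "small amount of low-overhead P-1 in front" concrete, here is a minimal Python sketch of a stage-1 P-1 pass (illustrative only, not Dana's code; the bound B and the trial-division prime test are arbitrary choices made for clarity):

```python
from math import gcd, log

def pminus1(n, B=1000):
    """Tiny Pollard P-1 stage 1: returns a nontrivial factor p of n
    for which p-1 is B-smooth, or None if none was found."""
    a = 2
    for q in range(2, B + 1):
        if all(q % d for d in range(2, int(q**0.5) + 1)):  # q is prime
            e = int(log(B, q))        # largest e with q^e <= B
            a = pow(a, q**e, n)       # a <- a^(q^e) mod n
    g = gcd(a - 1, n)
    return g if 1 < g < n else None
```

In a factoring pipeline this would run in front of SQUFOF/tinyqs and simply fall through on a None result.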

2017-11-01, 20:20  #68  
"Tilman Neumann"
Jan 2016
Germany
2×3×71 Posts 
Quote:
If you use multipliers of the form 1680 * "squarefree sequence" = {1, 2, 3, 5, 6, 7, 10, 11, 13, ...} and switch multipliers after ~1.5*N^(1/5) iterations, you'd get the algorithm I proposed in my PSIQS thread. It will only work until some 90+ bits because of the larger multipliers, but it looked quite fine in Java. I am curious how a highly optimized C/asm implementation compares. Btw, 1680 is just one very smooth constant I found to work well; 5040, 10080 or the like could be slightly better. 
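For reference, the multiplier schedule described in the quote is easy to generate; this is a small illustrative Python sketch (the function names are mine, not taken from PSIQS):

```python
def squarefree(k):
    """True if no square > 1 divides k."""
    d = 2
    while d * d <= k:
        if k % (d * d) == 0:
            return False
        d += 1
    return True

def multipliers(base=1680, count=9):
    """First `count` multipliers base * {1, 2, 3, 5, 6, 7, 10, 11, 13, ...}."""
    out, k = [], 1
    while len(out) < count:
        if squarefree(k):
            out.append(base * k)
        k += 1
    return out

def iteration_budget(n):
    """Switch to the next multiplier after ~1.5 * N^(1/5) iterations."""
    return int(1.5 * n ** 0.2)
```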

2017-11-02, 20:40  #69  
Tribal Bullet
Oct 2004
2×3×19×31 Posts 
Quote:
The code has nothing to do with RDS's original sieve; it was part of an early project to build a line sieve suitable for kilobit-scale GNFS. I never did anything else with it and dusted it off when there was a chance RDS would contribute to the group effort for sieving 2,991 if his code could be overhauled to scale up to problems that large. Unfortunately just adding 3LP cofactorization wasn't enough to make his lattice sieve scale up to handle large numbers, but at least the code is out there. ISTR the SIQS code in the Franke-Kleinjung sievers was twice as fast as this when I last benchmarked it a lifetime ago, but it does have a 96-bit limit. 

2017-11-02, 22:55  #70 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×2,909 Posts 

2017-11-08, 18:08  #71 
"Robert Gerbicz"
Oct 2005
Hungary
5×17^{2} Posts 
The key idea used is Bernstein's method; see https://cr.yp.to/papers/sf.pdf.
We factorize in polynomial time(!) per integer, but this is only true if you have MANY integers to factorize, hence this method also needs a lot of memory. There is some trade-off here if you simply have not got enough memory; see also my tests where we factorized 2e6 and then 9e6 fifty-five-bit integers. In the latter case we are close to an optimal setup, but even in the first case, with only 2e6 integers, the time used is still not that bad. I have some improvements on this algorithm, which is why this code is somewhat longer. It also contains an improvement on this idea: http://www.mersenneforum.org/showpos...40&postcount=8 (we use only 3 bases(!)). Sample runs of my code. Code:
gerbicz@gerbicz:~/gmp-6.1.2$ g++ -static -m64 -O2 -fomit-frame-pointer -frename-registers -flto -mtune=skylake -march=skylake -mavx2 -Wall -o bernstein bernstein.c -lgmp -lm
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_55_9e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542; primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_55_9e6.txt
Processed 9000000 lines in 349 sec.; total 9000000 lines in 349 sec.
Total processed 9000000 numbers in 349 sec.
There were (too) few numbers in the file, for optimal performance we would expect roughly 9957119 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_60_15e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542; primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_60_15e6.txt
Processed 15000000 lines in 1390 sec.; total 15000000 lines in 1390 sec.
Total processed 15000000 numbers in 1390 sec.
There were (too) few numbers in the file, for optimal performance we would expect roughly 51630451 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 1 -in semi_55_2e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542; primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=semi_55_2e6.txt
Processed 2000000 lines in 93 sec.; total 2000000 lines in 93 sec.
Total processed 2000000 numbers in 93 sec. 
There were (too) few numbers in the file, for optimal performance we would expect roughly 9957292 numbers within the same range and difficulty.
gerbicz@gerbicz:~/gmp-6.1.2$ ./bernstein -composites 0 -diff 0 -in easy_40_4e6.txt
Simultaneous fast factorization for many (small) numbers using D. J. Bernstein's method.
primepi(2^16)=6542; primepi(2^32)=203280221;
Setup took 4 sec. [to build the isprime table]; not counted in total time(!)
Read input=easy_40_4e6.txt
Processed 469841 lines in 3 sec.; total 469841 lines in 3 sec.
Processed 475395 lines in 3 sec.; total 945236 lines in 6 sec.
Processed 470891 lines in 2 sec.; total 1416127 lines in 8 sec.
Processed 472562 lines in 3 sec.; total 1888689 lines in 11 sec.
Processed 471654 lines in 3 sec.; total 2360343 lines in 14 sec.
Processed 472816 lines in 2 sec.; total 2833159 lines in 16 sec.
Processed 471924 lines in 3 sec.; total 3305083 lines in 19 sec.
Processed 470969 lines in 3 sec.; total 3776052 lines in 22 sec.
Processed 223948 lines in 1 sec.; total 4000000 lines in 23 sec.
Total processed 4000000 numbers in 23 sec.
gerbicz@gerbicz:~/gmp-6.1.2$
Code:
              factor63.c   bernstein.c
semi_55_9e6:     847 sec.     349 sec.
semi_60_15e6:   3407 sec.    1390 sec.
semi_55_2e6:     201 sec.      93 sec.
easy_40_4e6:       5 sec.      23 sec.

Yeah, for "easy" numbers my algorithm is slower, not only at 40 bits but also at 60 bits, and maybe even at 64 bits. In the code you can see the PARI/GP functions used to generate these files, but you can also download them, in the same order as above: https://drive.google.com/file/d/1WLQ...ew?usp=sharing https://drive.google.com/file/d/1CJG...ew?usp=sharing https://drive.google.com/file/d/14sn...ew?usp=sharing https://drive.google.com/file/d/13wQ...ew?usp=sharing

P.S. I have 16 GB of RAM, and in the only 60-bit test (factorization of 15e6 hard numbers) we used all of the memory for some seconds and also went into swap, but we survived. [We don't use a constant amount of memory the whole time, so this helps.] If you have many numbers in the file, then for a better run they should be sorted in increasing order [the PARI/GP method does not do this]. 
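For readers who want the shape of the algorithm rather than the benchmark, here is a hedged Python sketch of the batch trick at the heart of Bernstein's paper: one product tree over the inputs, one remainder tree for the big prime product, then a gcd per number. It only extracts the distinct primes below the bound (repeat it, or divide out and recurse, for multiplicities), and it is nothing like as optimized as the C code above:

```python
from math import gcd, prod

def product_tree(xs):
    """Levels of pairwise products; level 0 is xs, the last level is [prod(xs)]."""
    tree = [list(xs)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree

def remainder_tree(z, tree):
    """z mod x for every leaf x, pushed down through the product tree."""
    rems = [z % tree[-1][0]]
    for level in reversed(tree[:-1]):
        rems = [rems[i // 2] % x for i, x in enumerate(level)]
    return rems

def batch_small_factors(ns, primes):
    """For each n in ns, the product of the distinct primes (from `primes`)
    dividing n -- one big remainder tree instead of len(primes) trial
    divisions per number."""
    z = prod(primes)
    return [gcd(n, r) for n, r in zip(ns, remainder_tree(z, product_tree(ns)))]
```

The real code (per the sf.pdf paper) goes further, e.g. computing full smooth parts by repeated squaring, but the memory behaviour Robert describes comes from exactly these trees of ever-larger products.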
2017-11-08, 18:31  #72 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}×227 Posts 
Thanks Robert, that's quite interesting.
Re hashed bases: a single hashed base is definitely a win for 32-bit inputs, but since a Lucas test takes about 1.5x the cost of an MR test, I find it nicer to just do that, get faster overall performance, and skip the big table as well. Though it depends on your test input set (if they're all SPSP-2 pseudoprimes, then the 2- or 3-base MR test will look good). It also depends on your implementations, of course. The 1.5x or so cost is for x86-64 with Montgomery math. GMP for larger inputs is something in the 1.5x to 1.8x range if I recall correctly, but I'd have to dig to get the measurements. LibTomMath was more like 6x, but maybe it could be sped up. Always nice to have more options, and I'll have to play with it. 
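To make the base-count discussion concrete: below is a standard strong probable-prime (MR) test in Python, combined with the known fact that the three fixed bases 2, 7, 61 are deterministic for all n < 4,759,123,141 and hence for every 32-bit input, with no hash table at all. This is an illustrative sketch, not Dana's implementation:

```python
def is_sprp(n, a):
    """Strong probable-prime test of odd n > 2 to base a."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def is_prime_u32(n):
    """Deterministic primality for 32-bit n via the bases {2, 7, 61}."""
    if n < 2:
        return False
    for a in (2, 7, 61):
        if n % a == 0:
            return n == a
        if not is_sprp(n, a):
            return False
    return True
```

A BPSW-style test replaces the second and third MR rounds with a single Lucas test, which is the ~1.5x cost trade-off being discussed.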
2017-11-08, 18:59  #73  
"Andrew Booker"
Mar 2013
85_{10} Posts 
Quote:
Probably the optimal strategy is to do some low-level Pollard rho to find the small prime factors, then run Bernstein to take care of all the stubborn cases in one go. 
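The front end of that pipeline can be sketched in Python (illustrative only): a cheap rho pass peels off small factors, and whatever it fails to split quickly would be queued for the batch method.

```python
from math import gcd
from random import randrange

def pollard_rho(n):
    """Pollard's rho: returns a nontrivial factor of composite n.
    Loops forever on primes, so screen inputs with a primality test first."""
    if n % 2 == 0:
        return 2
    while True:
        c = randrange(1, n)
        x = y = randrange(n)
        d = 1
        while d == 1:
            x = (x * x + c) % n          # tortoise: one step
            y = (y * y + c) % n          # hare: two steps
            y = (y * y + c) % n
            d = gcd(abs(x - y), n)
        if d != n:                       # d == n is a failed cycle; retry with new c
            return d
```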

2017-11-08, 19:20  #74  
"Robert Gerbicz"
Oct 2005
Hungary
5×17^{2} Posts 
Quote:
For easy numbers we call it at most once per number; here, nothing at all for c<2^32 (just a table lookup) or when c<=maxp^2 (when we know that c=1 or c is prime), where maxp is the max prime in the smaller product tree (and here there is also a special saving when you use composites=1 as a switch). With your decent isprime() functions the method would be somewhat faster for the easy case [you would have to replace the isprime64 function]. Among the costly operations on large integers we are mostly doing multiplication and no hard division (only division by 2^e, which is cheap). It could be interesting to ask why: this is due to the scaled remainder tree (SRT in the code); by doing only one hard floating-point division and multiplication we avoid all the other integer divisions that are common in a "regular" remainder tree. Though interestingly this SRT gave almost no speedup in the code. 

2017-11-08, 19:56  #75  
"Robert Gerbicz"
Oct 2005
Hungary
5A5_{16} Posts 
Quote:
Code:
gerbicz@gerbicz:~/gmp-6.1.2$ ./demo
Polynomial-time factorization at once for many 'small' numbers, based on D. J. Bernstein's method.
How many bits should the random numbers have (<63 bits)? 40
How many should I generate? 1000000
Maximum of the random numbers=1099510120925
Number of primes up to the square root of the maximum=82025
Done in 7 seconds.
The result of the factorization can be found in the file ki.txt.
gerbicz@gerbicz:~/gmp-6.1.2$
Quote:
That is why diff=0 is faster than diff=1, of course only if there were easy numbers in the input file. But it could be possible that extracting the smallish factors with, for example, Pollard rho would be faster. 

2017-11-09, 10:02  #76 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5818_{10} Posts 
Could Bernstein's method be useful for splitting large primes in SIQS or GNFS? It is certainly possible to have a lot of numbers to factor simultaneously. I get the impression that memory usage would be an issue for GNFS at least. I need to remind myself how Bernstein's method works.

2017-11-09, 12:50  #77  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}×227 Posts 
Quote:

