Thilo and me found another fast algorithm.

This developed as follows:

1. The long tail of our splitted Lehman implementation is just a Hart one-line factorizer (HOLF). This let us believe that the congruences would speed up HOLF as well. Indeed that gave an immediate improvement like factor 3 compared to a simple HOLF implementation.

2. Next we noticed that only the most discriminative k-loop with k==0 % 3 is needed. Having a single loop, we need no bound check either.

3. We found a better multiplier 315=3*3*5*7, so now we test k with k==0 % 315 only. This has no negative consequences for memory requirements, because only sqrts of k that are multiples of 315 need to be stored.

4. Finally, it showed to be beneficious to _race_ HOLF against trial division in a single loop. This makes the algorithm robust and very fast for test numbers having small prime factors.

In Java, the new algorithm is factoring hard semiprimes about 20% faster than the Lehman implementation we proposed before. The smaller the factors can be, the bigger the speedup gets. The crossover point of the new algorithm with trial division is at ~25 bit and with Pollard-Rho at ~50 bit. Arrays with 1 million entries are sufficient to factor any number <= 54 bit.

Here is the source code:

https://github.com/TilmanNeumann/jav...TDiv_Race.java
We'll try next to apply the racing idea to our Lehman implementation.

P.S. Segmenting the arrays might be another idea to speed the algorithms up in C(++). It didn't help in Java, though.