![]() |
![]() |
#12 | |
Feb 2012
Prague, Czech Republ
101010112 Posts |
![]() Quote:
Computing a number modulo Mersenne number is fast: http://homepage.divms.uiowa.edu/~jon...d.shtml#exmod3. The bits take only 2MB of memrory, possibly half of that using pre-filtering mod 4. |
|
![]() |
![]() |
![]() |
#13 |
"Tilman Neumann"
Jan 2016
Germany
1101100102 Posts |
![]()
Ben, how did you implement isSquare(test = a^2 - 4kN)? I am curious if some tricks help to speed it up in C. Can we find your source code somewhere?
And, it will be worth to give the "divide by multiplication of inverse" tdiv a try. |
![]() |
![]() |
![]() |
#14 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
22×227 Posts |
![]()
What I have found quite fast, portable, and not needing big tables in C is something like:
Code:
static int is_perfect_square(uint64_t n) { /* Step 1, reduce to 18% of inputs */ uint32_t m = n & 127; if ((m*0x8bc40d7d) & (m*0xa1e2f5d1) & 0x14020a) return 0; /* Step 2, reduce to 7% of inputs (mod 99 reduces to 4% but slower) */ m = n %240; if ((m*0xfa445556) & (m*0x8021feb1) & 0x614aaa0f) return 0; /* m = n % 99; if ((m*0x5411171d) & (m*0xe41dd1c7) & 0x80028a80) return 0; */ /* Step 3, do the square root instead of any more rejections */ m = isqrt(n); return (uint64_t)m*(uint64_t)m == n; } Having a fast is_perfect_square() is completely critical to the performance of Hart's OLF and SQUFOF, and important for the Smith Lehman also. For each of those I found is a bit faster on x86_64 by using just the single mod 128 test. Which is better depends somewhat on the platform and library. isqrt is a wrapper function around libm's sqrt() but checking for overflow and ensuring correct integer results regardless of rounding from some dodgy platform. I'd love to see something faster that is still portable and not using large tables. As an aside, similar filters can be made for testing for perfect cubes and higher powers. |
![]() |
![]() |
![]() |
#15 |
"Ben"
Feb 2007
22·23·37 Posts |
![]()
Nothing I tried w.r.t. sqrt guarding helped improve speed. Well, a single compare with Warren's issq1024[test & 1023] table was just as fast, but not faster. Everything else slowed it down. It may be that the speed of sqrt on modern processors has made such checks unnecessary.
On my Xeon E5-2697 processor, 100k 32-bit semiprimes are factored in 0.08 seconds. That's 800 nanoseconds per factorization, on average. Pretty cool! |
![]() |
![]() |
![]() |
#16 |
Sep 2003
A1B16 Posts |
![]() |
![]() |
![]() |
![]() |
#17 | |
Nov 2005
101 Posts |
![]() Quote:
since 255 = 3 * 5 * 17 this should at least check if the number is a square mod 5*17, since in the Lehman_Fast code we use a multiplier 2*3. I wrote a test and yes the following code Code:
private static long mod255Jones(long a) { a = (a >> 16) + (a & 0xFFFF); /* sum base 2**16 digits */ a = (a >> 8) + (a & 0xFF); /* sum base 2**8 digits */ if (a < 255) return a; if (a < (2 * 255)) return a - 255; return a - (2 * 255); } So I precompute the squares % 255 in the boolean array squaresMod255, added the following check to the Lehman code (only in the lehmanEven method): Code:
// added this check if (squaresMod255[(int) mod255Jones(test)]) { final long b = (long) Math.sqrt(test); if (b*b == test) { return gcdEngine.gcd(a+b, N); } } Code:
if (squaresMod255[(int) test % 255]) { More interesting might be the Mersenne number 4095 = 2^12-1 = 3 * 3 * 5 * 7 * 13 since each prime divides the possible squares by 2. With a % 4095 still 37% slowdown. Is there als a faster way to calculate mod 4095? Yes M24 = 16777215 = 3* 3* 5* 7*13* 17* 241 reduces the number of squares by an other factor of 4. Surprisingly mod 24 = 3*8 has no slowdown. We set up the 'a' in the loop such that this check will always be true. But interesting that the Java Just Intime Compiler seems to know of this fact. It seem like it is better to use multipliers for k to increase the chance that the numbers will be a square then checking before testing. I have a more complex algorithm https://github.com/ThiloHarich/facto...an_Fast33.java with an additional multiplier 2*3*5 this gives some more speed in java. Other bigger multipliers are also possible, but they reduce the chance of finding a solution. Last fiddled with by ThiloHarich on 2019-01-18 at 23:35 |
|
![]() |
![]() |
![]() |
#18 |
"Ben"
Feb 2007
D4C16 Posts |
![]() |
![]() |
![]() |
![]() |
#19 |
Nov 2005
101 Posts |
![]()
Lehman suggest to use multipliers for k.
His argument was that if a/b is closest to the ratio of the divisors of n then using k = a*b*k' is looking in the area. But there is an other reason. If k = d*k' then a^2 - kn = y^2 <-> a^2 - dk'n = y^2 <-> a^2 = y^2 mod d. This means that the a is chosen such that test = a^2 - kn is a square mod d. So using a multiplier 2*3*5*7 for example ensures that a^2 - kn is always a square mod 2*3*5*7. This means we do not have to check this with a expensive mod operation. For k even we know that all a must be odd. If k is odd there are less 'a' mod 2^k. This is why we should first investigate in even k with are multiples of some small primes. A simple version of this idea can be found here: https://github.com/ThiloHarich/facto...manSimple.java The code is rather simple and short.. This Algorithm is based on the lehman algorithm and the Hart variant of it. It somehow tries to combine both algorithms. First it tries to find solution with a k = 2*3*3*5*7*k' = 630 * k'. This ensures that the created a = ceil (sqrt (k*n)) | 1 produces numbers a^2 - kn which are squares mod 2,9,5 and 7. This is there is a high chance that this numbers are squares and the algorithm finds a factor. If this part of the algorithm does not find a factor we investigate in numbers for an odd k. Here we choose k=1. in each step we look into numbers of both kinds. The numbers for k=1 were investigated very seldom. They are like a insurance to find the factor, if the fast part is nit finding numbers. Sorry for brainstorming you with ideas of algorithms. All have quite similar running times. Maybe somebody has the right idea to make more progress here. I also tried to look into multiple (different) multipliers in parallel, but no performance gain. |
![]() |
![]() |
![]() |
#20 |
Sep 2003
1010000110112 Posts |
![]() |
![]() |
![]() |
![]() |
#21 |
"Tilman Neumann"
Jan 2016
Germany
1B216 Posts |
![]()
I did a little benchmark in Java. The following snippet (I added the addition to sum and logging it to prevent the compiler optimizing it to nothing)
Code:
long t0 = System.currentTimeMillis(); long sum = 0; for (int i=0; i<10000000; i++) { sum += (int) Math.sqrt(NArray[i]); } long t1 = System.currentTimeMillis(); LOG.info("Unguarded sqrt took " + (t1-t0) + "ms, sum=" + sum); So it computes more than 125 Mio sqrts per second. The other way round: 1 sqrt, 1 add, 1 increment, 1 comparison and 1 array access do not take more than 16 to 22 clock cycles. This looks like Math.sqrt() being a pretty fast hardware operation. I also tested Danas proposition to guard the square root but it slows the Lehman down a little bit. The above would explain why. Last fiddled with by Till on 2019-01-19 at 17:12 Reason: make explicit it is Math.sqrt() which seems to be implemented in hardware |
![]() |
![]() |
![]() |
#22 |
"Tilman Neumann"
Jan 2016
Germany
43410 Posts |
![]()
Btw. it might be possible to further speed up the C implementation using AVX/SSE. Java is doing that under the hood when it finds suitable loops.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Do normal adults give themselves an allowance? (...to fast or not to fast - there is no question!) | jasong | jasong | 35 | 2016-12-11 00:57 |
SQUFOF implementation | alpertron | Factoring | 15 | 2010-04-12 19:16 |
ECM/FPGA Implementation | rdotson | Hardware | 12 | 2006-03-26 22:58 |
need C implementation of divb(r,k) | Leith | Miscellaneous Math | 4 | 2005-01-18 23:14 |
RSA implementation | flava | Programming | 12 | 2004-10-26 03:51 |