mersenneforum.org New feature in my ECM applet

2010-04-06, 16:46   #56
bsquared

"Ben"
Feb 2007

2·3²·191 Posts

Quote:
 Originally Posted by alpertron Just search the string int numberThreads = 1; in the file ecm.java and then change the number of threads.
Thanks.

Interestingly, the JVM seems to do a much better job with threading than gcj. I measured the scaling efficiency for 2 and 4 threads and compared against YAFU for reference. Efficiency is computed as the ratio of the actual performance improvement to the theoretical peak.

Does your applet multi-thread the linear algebra as well as the sieving? YAFU does not multi-thread the post-processing, so the attachment reports the efficiency both with and without the post-processing time included.
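The efficiency metric described above can be sketched in a few lines. This is a minimal illustration with made-up timings, not YAFU's actual bookkeeping; the class and method names are mine.

```java
// Scaling efficiency = actual speedup / ideal speedup, optionally
// excluding a serial post-processing phase. All timings hypothetical.
public class ScalingEfficiency {

    // t1: single-threaded wall time, tn: wall time with n threads
    static double efficiency(double t1, double tn, int n) {
        return (t1 / tn) / n;
    }

    // Same metric with a serial phase (e.g. linear algebra) subtracted out
    static double efficiencyExcluding(double t1, double tn, double serial, int n) {
        return ((t1 - serial) / (tn - serial)) / n;
    }

    public static void main(String[] args) {
        double t1 = 100.0, t4 = 30.0, serial = 4.0; // hypothetical seconds
        System.out.printf("4 threads, total:        %.3f%n", efficiency(t1, t4, 4));
        System.out.printf("4 threads, sieving only: %.3f%n", efficiencyExcluding(t1, t4, serial, 4));
    }
}
```

Subtracting the serial phase always raises the reported efficiency, which is why including or excluding it makes a noticeable difference in the comparison.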

I loathe table formatting with vBulletin, so the data is in the attachment.
Attached Files
 comparison.zip (2.3 KB, 94 views)

2010-04-06, 18:36   #57
alpertron

Aug 2002
Buenos Aires, Argentina

54C₁₆ Posts

The multithreading is only done on the sieving. Since the Block Lanczos code needs only a few percent of the total running time, I think the applet does not need to execute parallel linear algebra. In your spreadsheet it is clear that the double large prime variation is a lot faster than my single large prime variation for larger composite numbers.
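The reasoning above is an instance of Amdahl's law: a serial phase that takes only a few percent of the run time caps the benefit of parallelizing it. A minimal sketch (the 3% figure is illustrative; class and method names are mine):

```java
// Amdahl's law: with serial fraction s, the best possible speedup on
// n threads is 1 / (s + (1 - s)/n). If Block Lanczos is ~3% of the
// run time, parallelizing only the sieve already gets close to ideal.
public class Amdahl {

    static double maxSpeedup(double serialFraction, int threads) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / threads);
    }

    public static void main(String[] args) {
        // With s = 0.03, four threads give at most ~3.67x instead of 4x,
        // so parallel linear algebra would buy very little here.
        System.out.printf("s=0.03, 4 threads: %.2fx%n", maxSpeedup(0.03, 4));
    }
}
```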
2010-04-06, 18:45   #58
bsquared

"Ben"
Feb 2007

2×3²×191 Posts

Quote:
 Originally Posted by alpertron The multithreading is only done on the sieving. Since the Block Lanczos code only needs a few percent of the total running time, I think the applet does not need to execute parallel linear algebra.
I agree, but it does make a noticeable difference if this time is included in the efficiency calculations or not. I just wanted to be fair in the comparison to YAFU efficiency numbers.

Quote:
 Originally Posted by alpertron In your spreadsheet it is clear that the double large prime variation is a lot faster than my simple prime variation for larger composite numbers.
Well, it's true that DLP benefits larger composites, but YAFU doesn't "turn on" the double large prime variation until above about C82, so all of these tests use the single large prime variation. The C82 cutoff was empirically determined... below that DLP starts to be slower, at least in YAFU.

2010-04-06, 19:01   #59
alpertron

Aug 2002
Buenos Aires, Argentina

2²×3×113 Posts

Quote:
 Originally Posted by bsquared I agree, but it does make a noticeable difference if this time is included in the efficiency calculations or not. I just wanted to be fair in the comparison to YAFU efficiency numbers. Well, it's true that DLP benefits larger composites, but YAFU doesn't "turn on" the double large prime variation until above about C82, so all of these tests use the single large prime variation. The C82 cutoff was empirically determined... below that DLP starts to be slower, at least in YAFU.
It is interesting to note that in the Lenstra and Manasse paper about the double large prime method, they said the code ran faster starting from 75 digits. They used Pollard rho. So it appears that I have to implement it and then test the speed.

2010-04-07, 07:10   #60
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

29BE₁₆ Posts

Quote:
 Originally Posted by alpertron It is interesting to notice that in the Lenstra and Manasse paper about the double large prime method they said that the code ran faster starting from 75 digits. They used Pollard Rho. So it appears that I have to implement it and then test the speed.
There are many differences in QS implementations since those early days. In particular, that work predates the invention of SIQS.

Pollard rho is good enough for factoring small cofactors, but subsequent experiments, some of them done by me, appear to show that QS specifically tailored to such small integers is significantly better. SQUFOF also seems to be better than rho.

Paul
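For reference, the baseline being compared against can be sketched in a few lines. This is a minimal Pollard rho with Floyd cycle detection, restricted to odd composite moduli below 2³¹ so that the squarings fit in a signed 64-bit long; the class and method names are mine.

```java
// Minimal Pollard rho (Floyd cycle detection) for odd composite n < 2^31.
// Keeping n below 2^31 means x*x never overflows a signed long.
public class RhoDemo {

    static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    // Returns a nontrivial factor of an odd composite n
    // (does not terminate if n is prime -- callers must screen for that)
    static long rho(long n) {
        for (long c = 1; ; c++) {             // retry with a new constant on failure
            long x = 2, y = 2, d = 1;
            while (d == 1) {
                x = (x * x + c) % n;          // tortoise: one step
                y = (y * y + c) % n;          // hare: two steps
                y = (y * y + c) % n;
                d = gcd(Math.abs(x - y), n);
            }
            if (d != n) return d;             // d == n means the cycle failed; retry
        }
    }

    public static void main(String[] args) {
        long n = 10403L;                      // = 101 * 103, a toy input
        long f = rho(n);
        System.out.println(n + " = " + f + " * " + (n / f));
    }
}
```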

2010-04-07, 12:13   #61
alpertron

Aug 2002
Buenos Aires, Argentina

2²×3×113 Posts

I was profiling the applet using the standard tools:
Code:
 appletviewer -J-Xrunhprof:cpu=samples,depth=6 ecm.htm
The output is saved in java.hprof.txt.

There are several timings that must be considered (at least in my applet):
* Sieving
* Walking through the sieve to find candidates for trial division
* Trial division

The polynomial initialization only needs 3% of the running time, so it is negligible. The second step in my applet takes about 15% - 20% of the total running time, and the trial division depends on the input number: for 60 digits it takes about 40% of the running time, while for 100 digits the percentage drops below 10%.

In C you can cast the sieve array to whatever type you want, so the timing of the walk can be reduced almost to nothing, but in Java this cast is not possible. So I sieve two polynomials at the same time and add the logarithm of the current prime to the first or the second byte of the short.

When using two large primes, I expect the sieve time to be roughly halved and the trial division time to increase, so this step clearly needs to be optimized or it will dominate the execution time. Naive trial division requires n divisions, where n is the number of multiple-precision words, and the problem is that Java performs 64-bit divisions, which are very slow. I was able to improve the timing by using sieve information: for the primes that do not divide the quadratic coefficient, I compute the difference between the sieve index and soln1 (using Contini's notation) modulo the current prime. If this value is zero, or equals prime - difsoln where difsoln = soln2 - soln1 (mod prime), then that prime divides (ax+b)^2 - N, so I divide by the prime one or more times (while the remainder is zero). This requires only one 32-bit division on average instead of 4 or more 64-bit divisions.
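The sieve-based divisibility test described in that post can be sketched as follows. This is an illustration, not the applet's code: the roots are made up, and the sign convention (whether the second root shows up as difsoln or prime - difsoln) depends on which direction the difference is taken; this sketch uses (x - soln1) mod p.

```java
// Sketch of a sieve-based divisibility test. For a factor-base prime p
// with sieve roots soln1 and soln2 (Contini's notation), p divides
// (ax+b)^2 - N exactly when the sieve index x is congruent to soln1 or
// soln2 mod p. Storing soln1 and difsoln = soln2 - soln1 (mod p) lets
// a single 32-bit remainder decide divisibility, instead of several
// 64-bit multiple-precision divisions.
public class SieveDivTest {

    static boolean divides(int x, int p, int soln1, int difsoln) {
        int v = (x - soln1) % p;
        if (v < 0) v += p;                 // Java's % can return negatives
        return v == 0 || v == difsoln;     // x ≡ soln1 or x ≡ soln2 (mod p)
    }

    public static void main(String[] args) {
        int p = 11, soln1 = 3, soln2 = 8;  // hypothetical roots mod p
        int difsoln = Math.floorMod(soln2 - soln1, p);
        for (int x = 0; x < 25; x++)       // hits x = 3, 8, 14, 19
            if (divides(x, p, soln1, difsoln))
                System.out.println(p + " divides at index " + x);
    }
}
```

Only the indices that pass this test get the expensive multiple-precision division, which is where the saving comes from.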
2010-04-07, 13:55   #62
alpertron

Aug 2002
Buenos Aires, Argentina

2514₈ Posts

I was thinking of implementing Pollard rho using Montgomery multiplication, because it requires no divisions. According to the third table at http://www.behardware.com/articles/6...-duo-test.html, divisions are very slow compared with multiplications.
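The division-free multiplication mentioned above can be sketched with a minimal 32-bit Montgomery REDC. This is a generic textbook construction, not alpertron's code: it fixes R = 2³² and restricts the odd modulus to below 2³⁰ so every intermediate product fits in a signed 64-bit long; all names are mine.

```java
// Minimal Montgomery multiplication with R = 2^32, the division-free
// primitive one would use inside Pollard rho. Odd modulus n < 2^30.
public class MontDemo {
    static final long MASK = 0xFFFFFFFFL;

    // n^{-1} mod 2^32 by Newton iteration (n odd); correct bits double
    // each step, so 5 iterations from the 3-bit seed cover 32 bits
    static long inv32(long n) {
        long x = n;                            // n is its own inverse mod 8
        for (int i = 0; i < 5; i++)
            x = (x * (2 - n * x)) & MASK;
        return x;
    }

    // REDC: returns a*b*R^{-1} mod n for a, b in Montgomery form
    static long montMul(long a, long b, long n, long nprime) {
        long t = a * b;                              // < 2^60, no overflow
        long m = ((t & MASK) * nprime) & MASK;       // m = -t * n^{-1} mod 2^32
        long u = (t + m * n) >>> 32;                 // low 32 bits cancel exactly
        return u >= n ? u - n : u;                   // single conditional subtract
    }

    // Convenience wrapper: plain a*b mod n via Montgomery arithmetic
    static long mulMod(long a, long b, long n) {
        long nprime = (-inv32(n)) & MASK;
        long r = (1L << 32) % n;
        long r2 = r * r % n;                         // R^2 mod n
        long am = montMul(a, r2, n, nprime);         // a * R mod n
        long bm = montMul(b, r2, n, nprime);         // b * R mod n
        return montMul(montMul(am, bm, n, nprime), 1, n, nprime); // strip the R
    }

    public static void main(String[] args) {
        long n = 1000003L;                           // hypothetical odd modulus
        System.out.println(mulMod(123456L, 654321L, n));
        System.out.println((123456L * 654321L) % n); // same value, by construction
    }
}
```

In a real rho loop one would keep x and y in Montgomery form throughout and convert back only at the gcd step, so the conversion cost is paid once rather than per iteration.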
2010-04-08, 14:23   #63
ThiloHarich

Nov 2005
101 Posts

@bsquared: so you are using JDK 1.4? Switching to JDK 1.6 gives a big performance increase, since the -server option is not on for a standard applet. See http://www.mersenneforum.org/showpos...1&postcount=30

I stopped working on my Java factorizer, since I had the same performance as alpertron, except for the matrix step, because I do not have the Block Lanczos matrix step implemented. And the parallel variant was not running properly. Maybe I will start working on it again.

Pollard rho works rather slowly for me. I expected much more out of it.

I resieve, and since the smooth numbers are very sparse, I do the division with the standard BigInteger.divide on the smooth numbers. It still takes very little time.
2010-04-08, 17:22   #64
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

24676₈ Posts

Quote:
 Originally Posted by ThiloHarich Pollard Rho works rather slow for me.
I'm not surprised. Try SQUFOF, or ECM and/or QS especially tailored for very small integers.

Paul

2010-04-08, 18:23   #65
bsquared

"Ben"
Feb 2007

2·3²·191 Posts

Quote:
 Originally Posted by ThiloHarich @bsquared: so you are using JDK 1.4? Switching to JDK 1.6 gives a big performance increase, since the -server option is not on for a standard applet. See http://www.mersenneforum.org/showpos...1&postcount=30
Code:
 % javac -version
javac 1.6.0_18

% java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
I first compiled the applet code with javac, then ran it with "java ecm". This was slower than the binary produced by gcj but, as I mentioned before, threading seems to scale better under the JVM.

2010-04-08, 19:02   #66
alpertron

Aug 2002
Buenos Aires, Argentina

1356₁₀ Posts

I'll use SQUFOF, given that its implementation takes little space (this is a bonus in Java).

