mersenneforum.org  

Old 2010-04-06, 16:46   #56
bsquared
Quote:
Originally Posted by alpertron
Just search the string int numberThreads = 1; in the file ecm.java and then change the number of threads.
Thanks.

Interestingly, the bytecode compiler seems to do a much better job with threading than gcj. I measured the scaling efficiency for 2 and 4 threads and compared with YAFU for reference. Efficiency is computed as the ratio of the actual performance improvement to the theoretical peak.

Does your applet multi-thread the linear algebra as well as sieving? YAFU does not multi-thread the post-processing, so the attachment has the efficiency computed both with and without post-processing times included.

I loathe table formatting with vBulletin, so the data is in the attachment.
Attached: comparison.zip (2.3 KB)
Old 2010-04-06, 18:36   #57
alpertron
The multithreading is only done on the sieving. Since the Block Lanczos code needs only a few percent of the total running time, I think the applet does not need to execute parallel linear algebra. In your spreadsheet it is clear that the double large prime variation is a lot faster than my single large prime variation for larger composite numbers.
Old 2010-04-06, 18:45   #58
bsquared
Quote:
Originally Posted by alpertron
The multithreading is only done on the sieving. Since the Block Lanczos code only needs a few percent of the total running time, I think the applet does not need to execute parallel linear algebra.
I agree, but it does make a noticeable difference whether or not this time is included in the efficiency calculations. I just wanted the comparison with YAFU's efficiency numbers to be fair.

Quote:
Originally Posted by alpertron
In your spreadsheet it is clear that the double large prime variation is a lot faster than my single large prime variation for larger composite numbers.
Well, it's true that DLP benefits larger composites, but YAFU doesn't "turn on" the double large prime variation until above about C82, so all of these tests used the single large prime variation. The C82 cutoff was determined empirically; below it, DLP starts to be slower, at least in YAFU.
Old 2010-04-06, 19:01   #59
alpertron
Quote:
Originally Posted by bsquared
I agree, but it does make a noticeable difference whether or not this time is included in the efficiency calculations. I just wanted the comparison with YAFU's efficiency numbers to be fair.

Well, it's true that DLP benefits larger composites, but YAFU doesn't "turn on" the double large prime variation until above about C82, so all of these tests used the single large prime variation. The C82 cutoff was determined empirically; below it, DLP starts to be slower, at least in YAFU.
It is interesting to note that in the Lenstra and Manasse paper about the double large prime method, they said the code ran faster starting from 75 digits. They used Pollard rho. So it appears that I have to implement it and then test the speed.
Old 2010-04-07, 07:10   #60
xilman
Quote:
Originally Posted by alpertron
It is interesting to note that in the Lenstra and Manasse paper about the double large prime method, they said the code ran faster starting from 75 digits. They used Pollard rho. So it appears that I have to implement it and then test the speed.
QS implementations have changed in many ways since those early days. In particular, that paper predates the invention of SIQS.

Pollard rho is good enough for factoring small cofactors, but subsequent experiments, some of them done by me, appear to show that a QS specifically tailored to such small integers is significantly better. SQUFOF also seems to be better than rho.


Paul
Old 2010-04-07, 12:13   #61
alpertron
I was profiling the applet using the standard tools:

Code:
appletviewer -J-Xrunhprof:cpu=samples,depth=6 ecm.htm
The output is saved in java.hprof.txt

There are several timings that must be considered (at least in my applet):

* Sieving
* Walk through sieve in order to find candidates for trial division
* Trial division.

The polynomial initialization only needs 3% of the running time, so it is negligible. The second step in my applet takes about 15% - 20% of the total running time, and the trial division depends on the input number: for 60 digits it takes about 40% of the running time while for 100 digits the percentage is reduced to less than 10%.

In C you can cast the sieve array to whatever type you want, so the cost of the walk can be reduced to almost nothing, but in Java this cast is not possible. So I sieve two polynomials at the same time, adding the logarithm of the current prime to either the first or the second byte of each short.
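The packing could look roughly like this sketch (illustrative names, not the applet's actual code; the per-byte sums must stay below 256 so polynomial A's byte never carries into polynomial B's):

```java
// Two polynomials share one short[]: polynomial A uses the low byte,
// polynomial B the high byte, so one pass over the array covers both.
static void addLog(short[] sieve, int idx, int logp, boolean firstPoly) {
    if (firstPoly)
        sieve[idx] += logp;        // low byte: polynomial A
    else
        sieve[idx] += logp << 8;   // high byte: polynomial B
}

// Walk the sieve once, reporting candidates for either polynomial.
static java.util.List<String> scan(short[] sieve, int threshold) {
    java.util.List<String> hits = new java.util.ArrayList<>();
    for (int i = 0; i < sieve.length; i++) {
        int v = sieve[i] & 0xFFFF;                          // both bytes, unsigned
        if ((v & 0xFF) >= threshold) hits.add("A:" + i);    // polynomial A candidate
        if ((v >>> 8) >= threshold) hits.add("B:" + i);     // polynomial B candidate
    }
    return hits;
}
```

Real sieves often also initialize each counter with a bias so a single test like `(v & 0x8080) != 0` rejects both polynomials at once.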

When using two large primes, I assume that the sieve time will be roughly halved and the trial division time will increase. It is clear that this step needs to be optimized; otherwise it will dominate the execution time.

The naive trial division requires n divisions, where n is the number of multiple-precision words. The problem is that Java performs 64-bit divisions, which are very slow. I was able to improve the timing by using sieve information: for the primes that are not divisors of the quadratic coefficient, I compute the difference between the sieve index and soln1 (using Contini's notation) modulo the current prime. If this value is zero, or equals prime - difsoln where difsoln = soln2 - soln1 (mod prime), then that prime is a divisor of (ax+b)^2 - N, so I divide by the prime one or more times (while the remainder is zero). This requires only one 32-bit division on average, instead of 4 or more 64-bit divisions.
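In code, the test described above might look like this sketch (soln1/soln2 follow Contini's notation; the post's "prime - difsoln" form corresponds to taking the difference the other way around, so the version below uses the equivalent direct comparison):

```java
// Does prime p divide (ax+b)^2 - N at sieve index x?  Instead of a
// multi-word division, one 32-bit remainder against the sieve roots
// soln1, soln2 of the polynomial mod p decides it.
static boolean pDivides(int x, int p, int soln1, int soln2) {
    int diff = (x - soln1) % p;
    if (diff < 0) diff += p;              // Java's % may be negative
    int difsoln = (soln2 - soln1) % p;
    if (difsoln < 0) difsoln += p;
    // x ≡ soln1 (mod p)  ⇒  diff == 0
    // x ≡ soln2 (mod p)  ⇒  diff == difsoln
    return diff == 0 || diff == difsoln;
}
```

Only when this returns true does one pay for the actual multi-precision divisions to strip out the powers of p.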
Old 2010-04-07, 13:55   #62
alpertron
I was thinking of Pollard rho using Montgomery multiplication, because it requires no divisions. According to the third table at http://www.behardware.com/articles/6...-duo-test.html, divisions are very slow compared with multiplications.
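For what it's worth, a minimal Montgomery multiplication sketch in Java (my own illustration, not the applet's code; restricted to odd moduli below 2^30 so every intermediate fits in a signed long, with R = 2^32):

```java
static final long R_MASK = 0xFFFFFFFFL;   // R - 1, R = 2^32

// nPrime = -n^{-1} mod 2^32, by Newton iteration (n must be odd).
// Long overflow wraps mod 2^64, which is harmless: only 32 bits are kept.
static long montInv(long n) {
    long inv = n;                                // correct mod 2^3 for odd n
    for (int i = 0; i < 4; i++)
        inv = (inv * (2 - n * inv)) & R_MASK;    // precision doubles each pass
    return (R_MASK + 1 - inv) & R_MASK;          // negate mod 2^32
}

// One Montgomery product a*b*R^{-1} mod n: no division at all,
// just two extra multiplies, a shift, and a conditional subtract.
static long montMul(long a, long b, long n, long nPrime) {
    long t = a * b;                              // a, b < n < 2^30, so t < 2^60
    long m = ((t & R_MASK) * nPrime) & R_MASK;   // m = t * n' mod R
    long u = (t + m * n) >>> 32;                 // exact: R divides t + m*n
    return u >= n ? u - n : u;                   // u < 2n, one subtract suffices
}
```

For Pollard rho one would keep the iterates in Montgomery form throughout; since gcd(R, n) = 1, the gcd step can even be applied to Montgomery-form values directly.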
Old 2010-04-08, 14:23   #63
ThiloHarich
@bsquared: so you are using JDK 1.4?
Switching to JDK 1.6 gives a big performance increase, since the -server option is not enabled for a standard applet. See http://www.mersenneforum.org/showpos...1&postcount=30

I stopped working on my Java factorizer, since I got the same performance as alpertron, except in the matrix step, because I have not implemented Block Lanczos. And the parallel variant was not running properly. Maybe I will start working on it again.
Pollard rho works rather slowly for me; I expected much more from it.
I resieve, and since the smooth numbers are very sparse I do the division with the standard BigInteger.divide on the smooth numbers. It still takes very little time.
Old 2010-04-08, 17:22   #64
xilman
Quote:
Originally Posted by ThiloHarich
Pollard rho works rather slowly for me.
I'm not surprised. Try SQUFOF, or ECM and/or QS specially tailored for very small integers.

Paul
Old 2010-04-08, 18:23   #65
bsquared
Quote:
Originally Posted by ThiloHarich
@bsquared: so you are using JDK 1.4?
Switching to JDK 1.6 gives a big performance increase, since the -server option is not enabled for a standard applet. See http://www.mersenneforum.org/showpos...1&postcount=30
Code:
 % javac -version
javac 1.6.0_18

% java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
I first compiled the applet code with javac, then ran it using "java ecm". This was slower than the binary produced by gcj but, as I mentioned before, threading under the JVM seems to scale better.
Old 2010-04-08, 19:02   #66
alpertron
I'll use SQUFOF, given that its implementation takes little space (this is a bonus in Java).
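For reference, a bare-bones SQUFOF sketch (my own illustration, without the multipliers and form queue a robust version needs, so it can fail on some inputs; it follows the standard continued-fraction recurrences):

```java
static long isqrt(long n) {                       // exact integer square root
    long r = (long) Math.sqrt((double) n);
    while (r * r > n) r--;
    while ((r + 1) * (r + 1) <= n) r++;
    return r;
}

static long gcd(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

static long squfof(long n) {
    long s = isqrt(n);
    if (s * s == n) return s;                     // n itself is a square
    long pPrev = s, qPrev = 1, q = n - s * s, sq = 0;
    boolean found = false;
    for (long i = 1; i < 1_000_000; i++) {        // forward cycle
        if (i % 2 == 0) {                         // square form at even index?
            sq = isqrt(q);
            if (sq * sq == q && sq > 1) { found = true; break; }
        }
        long b = (s + pPrev) / q;
        long p = b * q - pPrev;
        long qNext = qPrev + b * (pPrev - p);
        qPrev = q; q = qNext; pPrev = p;
    }
    if (!found) return 0;                         // a real version retries with a multiplier
    long b = (s - pPrev) / sq;                    // take the square root of the form
    long p = b * sq + pPrev;
    qPrev = sq;
    q = (n - p * p) / qPrev;
    while (true) {                                // reverse cycle until P repeats
        b = (s + p) / q;
        long pNext = b * q - p;
        if (pNext == p) break;
        long qNext = qPrev + b * (p - pNext);
        qPrev = q; q = qNext; p = pNext;
    }
    long g = gcd(n, p);
    return (g > 1 && g < n) ? g : 0;              // 0 signals failure
}
```

The whole thing is a few dozen lines of 64-bit arithmetic with no multi-precision objects at all, which is what makes it attractive in Java for small cofactors.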


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.