2010-04-06, 16:46  #56  
"Ben"
Feb 2007
2×3^{2}×191 Posts 
Quote:
Interestingly, the bytecode compiler seems to do a much better job with threading than gcj. I measured the scaling efficiency for 2 and 4 threads and compared to YAFU, for reference. Efficiency is computed as the ratio of theoretical peak performance improvement to actual. Does your applet multithread the linear algebra as well as the sieving? YAFU does not multithread the postprocessing, so the attachment has the efficiency computed both with and without postprocessing times included. I loathe table formatting with vBulletin, so the data is in the attachment. 
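For readers without the attachment, the conventional way to compute parallel scaling efficiency is speedup divided by thread count; a tiny sketch with made-up timings (the numbers below are illustrative, not the attachment's data):

```java
// Illustrative only -- the timings are invented, not measured YAFU data.
//   speedup    = T(1 thread) / T(p threads)
//   efficiency = speedup / p          (1.0 = perfect scaling)
public class ScalingEfficiency {
    static double efficiency(double t1, double tp, int p) {
        return (t1 / tp) / p;
    }
    public static void main(String[] args) {
        double t1 = 120.0;                           // hypothetical 1-thread time (s)
        System.out.println(efficiency(t1, 65.0, 2)); // 2 threads
        System.out.println(efficiency(t1, 36.0, 4)); // 4 threads
    }
}
```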

2010-04-06, 18:36  #57 
Aug 2002
Buenos Aires, Argentina
2^{2}·3·113 Posts 
The multithreading is only done on the sieving. Since the Block Lanczos code needs only a few percent of the total running time, I think the applet does not need to execute parallel linear algebra. From your spreadsheet it is clear that the double large prime variation is a lot faster than my single large prime variation for larger composite numbers.

2010-04-06, 18:45  #58  
"Ben"
Feb 2007
D6E_{16} Posts 
Quote:
Well, it's true that DLP benefits larger composites, but YAFU doesn't "turn on" the double large prime variation until above about C82, so all of these tests used the single large prime variation. The C82 cutoff was empirically determined... below that, DLP starts to be slower, at least in YAFU. 

2010-04-06, 19:01  #59  
Aug 2002
Buenos Aires, Argentina
2^{2}·3·113 Posts 
Quote:


2010-04-07, 07:10  #60  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2×7^{2}×109 Posts 
Quote:
Pollard rho is good enough for factoring small cofactors, but subsequent experiments, some of them done by me, appear to show that a QS specifically tailored for such small integers is significantly better. SQUFOF also seems to be better than rho. Paul 

2010-04-07, 12:13  #61 
Aug 2002
Buenos Aires, Argentina
2^{2}×3×113 Posts 
I was profiling the applet using the standard tools:
Code:
appletviewer -J-Xrunhprof:cpu=samples,depth=6 ecm.htm

There are several timings that must be considered (at least in my applet):
* Sieving
* Walking through the sieve to find candidates for trial division
* Trial division

The polynomial initialization needs only 3% of the running time, so it is negligible. The second step in my applet takes about 15% - 20% of the total running time, and the trial division depends on the input number: for 60 digits it takes about 40% of the running time, while for 100 digits the percentage is reduced to less than 10%. In C you can cast the sieve array to whatever type you want, so the cost of the walk can be reduced almost to nothing, but in Java this cast is not possible. So I sieve two polynomials at the same time and then add the logarithm of the current prime to the first or the second byte of the short.

When using two large primes, I assume that the sieve time will be about halved and the trial division time will be increased. It is clear that this step needs to be optimized, otherwise it will dominate the execution time. The naive trial division requires n divisions, where n is the number of multiple-precision words. The problem is that Java performs 64-bit divisions, which are very slow. So I was able to improve the timing by using sieve information: for the primes which are not divisors of the quadratic coefficient, I compute the difference between the sieve index and soln1 (using Contini's notation) modulo the current prime. Then if this value is zero or prime - difsoln, where difsoln = soln2 - soln1 (mod prime), that prime is a divisor of (ax+b)^2 - N, so I divide one or more times (while the remainder is zero) by the prime. This requires only one 32-bit division on average instead of 4 or more 64-bit divisions. 
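A minimal sketch of that divisibility test (class and variable names are mine, not the applet's). For a factor-base prime p with sieve roots soln1 and soln2 in Contini's notation, p divides (ax+b)^2 - N exactly at the sieve indices congruent to soln1 or soln2 mod p. Taking the difference as (x - soln1) mod p, the second root shows up as difsoln = (soln2 - soln1) mod p; computing the difference the other way round gives the post's prime - difsoln form:

```java
// One 32-bit remainder replaces a multi-precision trial division:
// p divides (a*x + b)^2 - N  iff  x is congruent to soln1 or soln2 (mod p).
public class SieveTrialDiv {
    static boolean divides(int x, int p, int soln1, int soln2) {
        int diff = (x - soln1) % p;          // one 32-bit division
        if (diff < 0) diff += p;
        int difsoln = (soln2 - soln1) % p;
        if (difsoln < 0) difsoln += p;
        return diff == 0 || diff == difsoln;
    }
    public static void main(String[] args) {
        // Self-check against the definition for a small hypothetical prime.
        int p = 17, soln1 = 5, soln2 = 11;
        for (int x = 0; x < 200; x++) {
            boolean expected = (x % p == soln1) || (x % p == soln2);
            if (divides(x, p, soln1, soln2) != expected)
                throw new AssertionError("mismatch at x = " + x);
        }
        System.out.println("ok");
    }
}
```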
2010-04-07, 13:55  #62 
Aug 2002
Buenos Aires, Argentina
2^{2}·3·113 Posts 
I was thinking of implementing Pollard rho with Montgomery multiplication, because it requires no divisions. According to the third table at http://www.behardware.com/articles/6...duotest.html, divisions are very slow compared with multiplications.
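A sketch of that idea, under the assumption of an odd modulus below 2^31 so every product fits in a Java long (R = 2^32, REDC constant by Newton iteration). This is my illustration, not code from the applet:

```java
public class MontRho {
    // REDC for odd n < 2^31 with R = 2^32: returns a*b*R^(-1) mod n.
    static long montmul(long a, long b, long n, long nprime) {
        long t = a * b;                                  // a, b < n < 2^31, so t < 2^62
        long m = ((t & 0xFFFFFFFFL) * nprime) & 0xFFFFFFFFL;
        long u = (t + m * n) >>> 32;                     // exact: the low word cancels
        return u >= n ? u - n : u;
    }
    // -n^(-1) mod 2^32 by Newton iteration (n odd).
    static long inverse(long n) {
        long inv = n;                                    // correct mod 2^3
        for (int k = 0; k < 5; k++)
            inv = (inv * (2 - n * inv)) & 0xFFFFFFFFL;   // doubles the precision each step
        return (-inv) & 0xFFFFFFFFL;
    }
    static long toMont(long a, long n)   { return (a << 32) % n; }
    static long fromMont(long a, long n) { return montmul(a, 1, n, inverse(n)); }
    static long gcd(long a, long b)      { return b == 0 ? a : gcd(b, a % b); }

    // Floyd-cycle Pollard rho iterating x -> x^2 + c entirely in the Montgomery
    // domain; the gcds are unaffected because gcd(R, n) = 1 for odd n.
    static long rho(long n) {
        long np = inverse(n);
        for (long c = 1; c < 20; c++) {                  // retry with a new c on failure
            long cM = toMont(c, n);
            long x = toMont(2, n), y = x;
            for (int i = 0; i < 100000; i++) {
                x = (montmul(x, x, n, np) + cM) % n;
                y = (montmul(y, y, n, np) + cM) % n;
                y = (montmul(y, y, n, np) + cM) % n;
                long d = gcd(Math.abs(x - y), n);
                if (d == n) break;                       // cycle collapsed; try next c
                if (d > 1) return d;
            }
        }
        return 0;
    }
    public static void main(String[] args) {
        System.out.println(rho(8051));                   // a nontrivial factor of 8051 = 83 * 97
    }
}
```

Note that only one division per iteration remains (the cheap final `% n` on a value below 2n could also be replaced by a conditional subtraction).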

2010-04-08, 14:23  #63 
Nov 2005
101 Posts 
@bsquared: so you are using JDK 1.4?
Switching to JDK 1.6 gives a big performance increase, since the server option is not enabled for a standard applet. See http://www.mersenneforum.org/showpos...1&postcount=30

I stopped working on my Java factorizer, since I had the same performance as alpertron, except for the matrix step, because I do not have the Block Lanczos matrix step implemented, and the parallel variant was not running properly. Maybe I will start working on it again. Pollard rho works rather slowly for me; I expected much more from it. I resieve, and since the smooth numbers are very sparse, I do the division on the smooth numbers with the standard BigInteger.divide. It still takes only very little time. 
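A minimal sketch of that last step (the class name and factor base are mine, for illustration): once resieving has flagged a report as probably smooth, dividing out the factor-base primes with plain BigInteger.divide is affordable because it runs only on the rare smooth candidates:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class SmoothCheck {
    // Divide out the factor-base primes from a sieve report with plain
    // BigInteger arithmetic; returns the factor list, or null if the
    // candidate is not smooth over the factor base.
    static List<Integer> trialFactor(BigInteger q, int[] factorBase) {
        List<Integer> factors = new ArrayList<>();
        for (int p : factorBase) {
            BigInteger bp = BigInteger.valueOf(p);
            while (q.mod(bp).signum() == 0) {   // divide while the remainder is zero
                q = q.divide(bp);
                factors.add(p);
            }
        }
        return q.equals(BigInteger.ONE) ? factors : null;
    }
    public static void main(String[] args) {
        int[] fb = {2, 3, 5, 7, 11, 13};
        System.out.println(trialFactor(BigInteger.valueOf(234), fb)); // [2, 3, 3, 13]
        System.out.println(trialFactor(BigInteger.valueOf(34), fb)); // null (17 is outside the base)
    }
}
```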
2010-04-08, 17:22  #64 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
24672_{8} Posts 

2010-04-08, 18:23  #65  
"Ben"
Feb 2007
2×3^{2}×191 Posts 
Quote:
Code:
% javac -version
javac 1.6.0_18
% java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode) 

2010-04-08, 19:02  #66 
Aug 2002
Buenos Aires, Argentina
2^{2}·3·113 Posts 
I'll use SQUFOF, given that its implementation takes little space (this is a bonus in Java).
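For reference, a compact SQUFOF sketch (my own simplified rendering of Shanks's algorithm, with no multipliers and without the usual queue that filters out trivial square forms, so it can fail and return 0 on some inputs; the applet's implementation may well differ):

```java
public class Squfof {
    static long isqrt(long x) {
        long r = (long) Math.sqrt((double) x);
        while (r * r > x) r--;
        while ((r + 1) * (r + 1) <= x) r++;
        return r;
    }
    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    // Simplified SQUFOF: run the forward cycle of the continued-fraction
    // expansion of sqrt(n) until an even-indexed Q is a perfect square, then
    // run the reverse cycle until P repeats; gcd(n, P) is then a factor.
    static long squfof(long n) {
        long s = isqrt(n);
        if (s * s == n) return s;
        long Qprev = 1, Q = n - s * s, P = s;
        long r = 0, Psave = 0;
        boolean found = false;
        for (long i = 1; i < 1000000; i++) {
            long b = (s + P) / Q;
            long Pnext = b * Q - P;
            long Qnext = Qprev + b * (P - Pnext);
            if ((i & 1) == 1) {                  // Qnext sits at an even index
                r = isqrt(Qnext);
                if (r * r == Qnext) { Psave = Pnext; found = true; break; }
            }
            Qprev = Q; Q = Qnext; P = Pnext;
        }
        if (!found) return 0;
        long b = (s - Psave) / r;                // start of the reverse cycle
        long P1 = b * r + Psave;
        long Qp = r, Q1 = (n - P1 * P1) / r;
        while (true) {
            long bb = (s + P1) / Q1;
            long Pn = bb * Q1 - P1;
            if (Pn == P1) break;                 // P repeats: a factor is found
            long Qn = Qp + bb * (P1 - Pn);
            Qp = Q1; Q1 = Qn; P1 = Pn;
        }
        long g = gcd(n, P1);
        return (g > 1 && g < n) ? g : 0;
    }
    public static void main(String[] args) {
        System.out.println(squfof(11111));       // 11111 = 41 * 271; prints 41
    }
}
```

The whole routine fits in a few dozen lines with only long arithmetic, which is exactly the small-footprint property that makes it attractive for an applet.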
