20081210, 21:54  #1 
∂^{2}ω=0
Sep 2002
República de California
2^{2}·5·11·53 Posts 
Need GMP trialdivision timings
If someone who regularly uses GMP would be so kind, I would very much appreciate the following: could you let me know how long it takes the latest version of GMP [or whichever you have installed on your PC, so long as it's fairly recent] to trialfactor a random 1megabit integer up to a smallprime limit of 1,000,000? Also let me know what hardware/OS you used.
If possible, timings under both 32 and 64bit Linux or Windows would be nice. Many thanks, Ernst 
20081211, 12:15  #2 
(loop (#_fork))
Feb 2006
Cambridge, England
3×19×113 Posts 
64bit Ubuntu on K8/2200, trial division of a random megabit number by the first 78498 primes by repeated calls to mpz_tdiv_ui takes 12.63 seconds.
GCD(megabit number, product of first 78498 primes) takes 1.88 seconds, so you probably want to do that instead. See attached source. 
20081211, 16:05  #3 
∂^{2}ω=0
Sep 2002
República de California
2D8C_{16} Posts 
Many thanks, Tom  indeed once the product of the trial divisors becomes sufficiently large one should instead use the GCD approach. But I wanted to gauge the speed of raw trialdivision, to compare with some new code I am playing with there. [Which needs 6 seconds for the same task in 32bit mode on a 2GHz p4, using straight integer code, no SSE. Will post the 64bit timing once I get a chance to try it on a linux box].

20081211, 16:26  #4 
(loop (#_fork))
Feb 2006
Cambridge, England
3·19·113 Posts 
On a 2.8GHz 32bit P4 (Xeon) the time for princ.cpp as posted above is
trial division 69.5s GCD 12.25s so your code is faster than using GCD. On a 64bit system I'm finding GCD very significantly (factor 10 or more) faster than trial division on megabit numbers even for primes up to 10^4: see attached graph. Last fiddled with by fivemack on 20081211 at 16:37 
20081211, 17:05  #5 
"Ben"
Feb 2007
2×1,789 Posts 
Some timings on various systems using Tom's code, compiled with gmp 4.2.3
on: Code:
processor : 3 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 265 stepping : 2 cpu MHz : 1793.741 cache size : 1024 KB physical id : 1 siblings : 2 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy bogomips : 3587.56 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp trial: 14.45 sec gcd: 2.31 sec On: Code:
vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.60GHz physical id : 255 siblings : 1 runqueue : 1 stepping : 3 cpu MHz : 3600.255 cache size : 2048 KB fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm ferr syscall nx lm sse3 monitor dscpl gv3 tm2 cnxtid bogomips : 7182.74 clflush size : 64 address sizes : 36 bits physical, 48 bits virtual trial: 19.13 sec gcd: 3.37 sec and on: Code:
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU X5365 @ 3.00GHz physical id : 0 siblings : 4 core id : 0 cpu cores : 4 runqueue : 0 stepping : 11 cpu MHz : 2992.510 cache size : 4096 KB fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm ferr syscall lm sse3 monitor dscpl gv3 tm2 bogomips : 5976.88 clflush size : 64 address sizes : 38 bits physical, 48 bits virtual trial: 13.44 sec gcd: 1.3 sec 
20081211, 19:44  #6  
Feb 2006
Denmark
2×5×23 Posts 
Quote:
PFGW Version 20041023.Win_Dev (Beta 'caveat utilitor') [FFT v23.8] It uses GMP 4.1.3. The trial factoring algorithm is based on my TreeSieve program and takes 1s on a core of a busy Core 2 Duo 2.4 GHz with 32bit Windows Vista. No PFGW release version has been made since 2004 and the release versions use a far slower trial factoring (around 20s). 

20081211, 19:59  #7  
∂^{2}ω=0
Sep 2002
República de California
2^{2}×5×11×53 Posts 
Quote:
Also, could you give a very brief description of your algorithm, and what kinds of operations predominate? Thanks, Ernst 

20081211, 22:12  #8 
Feb 2006
Denmark
346_{8} Posts 
I was not first to think of the algorithm which is also called a remainder tree. It's dominated by modulo operations where both operands are large.
See for example the first paragraph of http://cr.yp.to/arith/scaledmod20040820.pdf. My experimental 2004 implementation with a description is in http://hjem.get2net.dk/jka/math/treesieve.zip The following is copied from treesieve.txt there. Suppose we want to trial factor a large number c by a lot of small primes, e.g. with around 20 bits each. Let xn mean an nbit number. An asymptotically fast modulus, like the one in GMP, can compute c mod x200 much faster than 10 computations of c mod x20. It can also compute c mod x400 faster than c mod x200 two times, and so on. TreeSieve uses this, e.g. by first multiplying 10 consecutive small primes x20_i to form x200, then two of these to form x400, and so on. The multiplications can e.g. stop when xn is around sqrt(c). Let mod be a binary operator (C's %) as in 13 mod 5 = 3. If x20 divides x200 then elementary math says c mod x20 = (c mod x200) mod x20. Let rn_i = c mod xn_i Algorithm example where the product sequence goes x20, x200, x400: 1) Compute the next 20 primes x20_1..x20_20 2) Compute x200_1 = x20_1 * x20_2 * ... * x20_10 and x200_2 = x20_11 * x20_12 * ... * x20_20 3) Compute x400 = x200_1 * x200_2 4) Compute r400 = c mod x400 5) Compute r200_1 = r400 mod x200_1 and r200_2 = r400 mod x200_2 6) For i=1..10, compute r20_i = r200_1 mod x20_i For i=11..20, compute r20_i = r200_2 mod x20_i 7) Use r20_i in the application (test for 0 in TreeSieve) 8) Goto 1) We now have c mod x20 for 20 primes. Experimentation with TreeSieve shows 1) to 6) is much faster than the traditional: For i=1..20, compute r20_i = c mod x20_i 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Sublinear complexity of trial division?  yih117  Math  5  20180202 02:49 
Mersenne trial division implementation  mathPuzzles  Math  8  20170421 07:21 
trial division over a factor base  Peter Hackman  Factoring  7  20091026 18:27 
P95 trial division strategy  SPWorley  Math  8  20090824 23:26 
Trial division software for Mersenne  SPWorley  Factoring  7  20090816 00:23 