![]() |
![]() |
#1 |
∂2ω=0
Sep 2002
República de California
2·11·13·41 Posts |
![]()
If someone who regularly uses GMP would be so kind, I would very much appreciate the following: could you let me know how long it takes the latest version of GMP [or whichever you have installed on your PC, so long as it's fairly recent] to trial-factor a random 1-megabit integer up to a small-prime limit of 1,000,000? Also let me know what hardware/OS you used.
If possible, timings under both 32 and 64-bit Linux or Windows would be nice. Many thanks, -Ernst |
![]() |
![]() |
![]() |
#2 |
(loop (#_fork))
Feb 2006
Cambridge, England
33·239 Posts |
![]()
64-bit Ubuntu on K8/2200, trial division of a random megabit number by the first 78498 primes by repeated calls to mpz_tdiv_ui takes 12.63 seconds.
GCD(megabit number, product of first 78498 primes) takes 1.88 seconds, so you probably want to do that instead. See attached source. |
![]() |
![]() |
![]() |
#3 |
∂2ω=0
Sep 2002
República de California
101101110011102 Posts |
![]()
Many thanks, Tom - indeed once the product of the trial divisors becomes sufficiently large one should instead use the GCD approach. But I wanted to gauge the speed of raw trial-division, to compare with some new code I am playing with there. [Which needs 6 seconds for the same task in 32-bit mode on a 2GHz p4, using straight integer code, no SSE. Will post the 64-bit timing once I get a chance to try it on a linux box].
|
![]() |
![]() |
![]() |
#4 |
(loop (#_fork))
Feb 2006
Cambridge, England
33·239 Posts |
![]()
On a 2.8GHz 32-bit P4 (Xeon) the time for princ.cpp as posted above is
trial division 69.5s GCD 12.25s so your code is faster than using GCD. On a 64-bit system I'm finding GCD very significantly (factor 10 or more) faster than trial division on megabit numbers even for primes up to 10^4: see attached graph. Last fiddled with by fivemack on 2008-12-11 at 16:37 |
![]() |
![]() |
![]() |
#5 |
"Ben"
Feb 2007
3,617 Posts |
![]()
Some timings on various systems using Tom's code, compiled with gmp 4.2.3
on: Code:
processor : 3 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 265 stepping : 2 cpu MHz : 1793.741 cache size : 1024 KB physical id : 1 siblings : 2 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy bogomips : 3587.56 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp trial: 14.45 sec gcd: 2.31 sec On: Code:
vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.60GHz physical id : 255 siblings : 1 runqueue : 1 stepping : 3 cpu MHz : 3600.255 cache size : 2048 KB fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm ferr syscall nx lm sse3 monitor ds-cpl gv3 tm2 cnxt-id bogomips : 7182.74 clflush size : 64 address sizes : 36 bits physical, 48 bits virtual trial: 19.13 sec gcd: 3.37 sec and on: Code:
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU X5365 @ 3.00GHz physical id : 0 siblings : 4 core id : 0 cpu cores : 4 runqueue : 0 stepping : 11 cpu MHz : 2992.510 cache size : 4096 KB fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm ferr syscall lm sse3 monitor ds-cpl gv3 tm2 bogomips : 5976.88 clflush size : 64 address sizes : 38 bits physical, 48 bits virtual trial: 13.44 sec gcd: 1.3 sec |
![]() |
![]() |
![]() |
#6 | |
Feb 2006
Denmark
2·5·23 Posts |
![]() Quote:
PFGW Version 20041023.Win_Dev (Beta 'caveat utilitor') [FFT v23.8] It uses GMP 4.1.3. The trial factoring algorithm is based on my TreeSieve program and takes 1s on a core of a busy Core 2 Duo 2.4 GHz with 32-bit Windows Vista. No PFGW release version has been made since 2004 and the release versions use a far slower trial factoring (around 20s). |
|
![]() |
![]() |
![]() |
#7 | |
∂2ω=0
Sep 2002
República de California
101101110011102 Posts |
![]() Quote:
Also, could you give a very brief description of your algorithm, and what kinds of operations predominate? Thanks, -Ernst |
|
![]() |
![]() |
![]() |
#8 |
Feb 2006
Denmark
3468 Posts |
![]()
I was not first to think of the algorithm which is also called a remainder tree. It's dominated by modulo operations where both operands are large.
See for example the first paragraph of http://cr.yp.to/arith/scaledmod-20040820.pdf. My experimental 2004 implementation with a description is in http://hjem.get2net.dk/jka/math/treesieve.zip The following is copied from treesieve.txt there. Suppose we want to trial factor a large number c by a lot of small primes, e.g. with around 20 bits each. Let xn mean an n-bit number. An asymptotically fast modulus, like the one in GMP, can compute c mod x200 much faster than 10 computations of c mod x20. It can also compute c mod x400 faster than c mod x200 two times, and so on. TreeSieve uses this, e.g. by first multiplying 10 consecutive small primes x20_i to form x200, then two of these to form x400, and so on. The multiplications can e.g. stop when xn is around sqrt(c). Let mod be a binary operator (C's %) as in 13 mod 5 = 3. If x20 divides x200 then elementary math says c mod x20 = (c mod x200) mod x20. Let rn_i = c mod xn_i Algorithm example where the product sequence goes x20, x200, x400: 1) Compute the next 20 primes x20_1..x20_20 2) Compute x200_1 = x20_1 * x20_2 * ... * x20_10 and x200_2 = x20_11 * x20_12 * ... * x20_20 3) Compute x400 = x200_1 * x200_2 4) Compute r400 = c mod x400 5) Compute r200_1 = r400 mod x200_1 and r200_2 = r400 mod x200_2 6) For i=1..10, compute r20_i = r200_1 mod x20_i For i=11..20, compute r20_i = r200_2 mod x20_i 7) Use r20_i in the application (test for 0 in TreeSieve) 8) Goto 1) We now have c mod x20 for 20 primes. Experimentation with TreeSieve shows 1) to 6) is much faster than the traditional: For i=1..20, compute r20_i = c mod x20_i |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Sublinear complexity of trial division? | yih117 | Math | 5 | 2018-02-02 02:49 |
Mersenne trial division implementation | mathPuzzles | Math | 8 | 2017-04-21 07:21 |
trial division over a factor base | Peter Hackman | Factoring | 7 | 2009-10-26 18:27 |
P95 trial division strategy | SPWorley | Math | 8 | 2009-08-24 23:26 |
Trial division software for Mersenne | SPWorley | Factoring | 7 | 2009-08-16 00:23 |