mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2008-12-10, 21:54   #1
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

101101001100102 Posts
Default Need GMP trial-division timings

If someone who regularly uses GMP would be so kind, I would very much appreciate the following: could you let me know how long it takes the latest version of GMP [or whichever you have installed on your PC, so long as it's fairly recent] to trial-factor a random 1-megabit integer up to a small-prime limit of 1,000,000? Also let me know what hardware/OS you used.

If possible, timings under both 32 and 64-bit Linux or Windows would be nice.

Many thanks,
-Ernst
ewmayer is offline   Reply With Quote
Old 2008-12-11, 12:15   #2
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

23·797 Posts
Default

64-bit Ubuntu on K8/2200, trial division of a random megabit number by the first 78498 primes by repeated calls to mpz_tdiv_ui takes 12.63 seconds.

GCD(megabit number, product of first 78498 primes) takes 1.88 seconds, so you probably want to do that instead.

See attached source.
Attached Files
File Type: txt princ.cpp.txt (1.5 KB, 208 views)
fivemack is offline   Reply With Quote
Old 2008-12-11, 16:05   #3
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

2D3216 Posts
Default

Many thanks, Tom - indeed once the product of the trial divisors becomes sufficiently large one should instead use the GCD approach. But I wanted to gauge the speed of raw trial-division, to compare with some new code I am playing with there. [Which needs 6 seconds for the same task in 32-bit mode on a 2GHz p4, using straight integer code, no SSE. Will post the 64-bit timing once I get a chance to try it on a linux box].
ewmayer is offline   Reply With Quote
Old 2008-12-11, 16:26   #4
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

23×797 Posts
Default

On a 2.8GHz 32-bit P4 (Xeon) the time for princ.cpp as posted above is

trial division 69.5s
GCD 12.25s

so your code is faster than using GCD.

On a 64-bit system I'm finding GCD very significantly (factor 10 or more) faster than trial division on megabit numbers even for primes up to 10^4: see attached graph.
Attached Thumbnails
Click image for larger version

Name:	trialdiv.png
Views:	193
Size:	30.9 KB
ID:	3010  

Last fiddled with by fivemack on 2008-12-11 at 16:37
fivemack is offline   Reply With Quote
Old 2008-12-11, 17:05   #5
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

3,347 Posts
Default

Some timings on various systems using Tom's code, compiled with gmp 4.2.3

on:
Code:
processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 265
stepping        : 2
cpu MHz         : 1793.741
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips        : 3587.56
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
I get:

trial: 14.45 sec
gcd: 2.31 sec

On:
Code:
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.60GHz
physical id     : 255
siblings        : 1
runqueue        : 1
stepping        : 3
cpu MHz         : 3600.255
cache size      : 2048 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm ferr syscall nx lm sse3 monitor ds-cpl gv3 tm2 cnxt-id
bogomips        : 7182.74
clflush size    : 64
address sizes   : 36 bits physical, 48 bits virtual
I get:
trial: 19.13 sec
gcd: 3.37 sec

and on:
Code:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           X5365  @ 3.00GHz
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
runqueue        : 0
stepping        : 11
cpu MHz         : 2992.510
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm ferr syscall lm sse3 monitor ds-cpl gv3 tm2
bogomips        : 5976.88
clflush size    : 64
address sizes   : 38 bits physical, 48 bits virtual
I get:
trial: 13.44 sec
gcd: 1.3 sec
bsquared is offline   Reply With Quote
Old 2008-12-11, 19:44   #6
Jens K Andersen
 
Jens K Andersen's Avatar
 
Feb 2006
Denmark

23010 Posts
Default

Quote:
Originally Posted by ewmayer View Post
once the product of the trial divisors becomes sufficiently large one should instead use the GCD approach.
GCD is not the fastest option. At the files area of http://tech.groups.yahoo.com/group/openpfgw/ is the development version
PFGW Version 20041023.Win_Dev (Beta 'caveat utilitor') [FFT v23.8]
It uses GMP 4.1.3. The trial factoring algorithm is based on my TreeSieve program and takes 1s on a core of a busy Core 2 Duo 2.4 GHz with 32-bit Windows Vista. No PFGW release version has been made since 2004 and the release versions use a far slower trial factoring (around 20s).
Jens K Andersen is offline   Reply With Quote
Old 2008-12-11, 19:59   #7
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

101101001100102 Posts
Default

Quote:
Originally Posted by Jens K Andersen View Post
GCD is not the fastest option. At the files area of http://tech.groups.yahoo.com/group/openpfgw/ is the development version
PFGW Version 20041023.Win_Dev (Beta 'caveat utilitor') [FFT v23.8]
It uses GMP 4.1.3. The trial factoring algorithm is based on my TreeSieve program and takes 1s on a core of a busy Core 2 Duo 2.4 GHz with 32-bit Windows Vista. No PFGW release version has been made since 2004 and the release versions use a far slower trial factoring (around 20s).
Hi, Jens: I thought Vista was strictly 64-bit.

Also, could you give a very brief description of your algorithm, and what kinds of operations predominate?

Thanks,
-Ernst
ewmayer is offline   Reply With Quote
Old 2008-12-11, 22:12   #8
Jens K Andersen
 
Jens K Andersen's Avatar
 
Feb 2006
Denmark

2×5×23 Posts
Default

I was not first to think of the algorithm which is also called a remainder tree. It's dominated by modulo operations where both operands are large.
See for example the first paragraph of http://cr.yp.to/arith/scaledmod-20040820.pdf.
My experimental 2004 implementation with a description is in http://hjem.get2net.dk/jka/math/treesieve.zip
The following is copied from treesieve.txt there.

Suppose we want to trial factor a large number c by a lot of small primes, e.g. with around 20 bits each. Let xn mean an n-bit number.
An asymptotically fast modulus, like the one in GMP, can compute c mod x200 much faster than 10 computations of c mod x20. It can also compute c mod x400 faster than c mod x200 two times, and so on.
TreeSieve uses this, e.g. by first multiplying 10 consecutive small primes x20_i to form x200, then two of these to form x400, and so on. The multiplications can e.g. stop when xn is around sqrt(c).
Let mod be a binary operator (C's %) as in 13 mod 5 = 3.
If x20 divides x200 then elementary math says c mod x20 = (c mod x200) mod x20.

Let rn_i = c mod xn_i
Algorithm example where the product sequence goes x20, x200, x400:
1) Compute the next 20 primes x20_1..x20_20
2) Compute x200_1 = x20_1 * x20_2 * ... * x20_10
and x200_2 = x20_11 * x20_12 * ... * x20_20
3) Compute x400 = x200_1 * x200_2
4) Compute r400 = c mod x400
5) Compute r200_1 = r400 mod x200_1
and r200_2 = r400 mod x200_2
6) For i=1..10, compute r20_i = r200_1 mod x20_i
For i=11..20, compute r20_i = r200_2 mod x20_i
7) Use r20_i in the application (test for 0 in TreeSieve)
8) Goto 1)

We now have c mod x20 for 20 primes. Experimentation with TreeSieve shows 1) to 6) is much faster than the traditional:
For i=1..20, compute r20_i = c mod x20_i
Jens K Andersen is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sublinear complexity of trial division? yih117 Math 5 2018-02-02 02:49
Mersenne trial division implementation mathPuzzles Math 8 2017-04-21 07:21
trial division over a factor base Peter Hackman Factoring 7 2009-10-26 18:27
P95 trial division strategy SPWorley Math 8 2009-08-24 23:26
Trial division software for Mersenne SPWorley Factoring 7 2009-08-16 00:23

All times are UTC. The time now is 09:45.

Wed Dec 2 09:45:25 UTC 2020 up 83 days, 6:56, 1 user, load averages: 3.32, 3.08, 2.63

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.