mersenneforum.org  

Old 2012-02-12, 10:07   #34
debrouxl

Another data point, for numbers between C144 and C29x: C237 is slower on the GPU, but obviously faster on the CPU, than C29x:
Code:
$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./gpu_ecm -vv -n 64 -save 80009_248_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.256s
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, firstinvd=563947071, with 64 curves
[snip]
gpu_ecm took : 1637.614s (0.000+1637.610+0.004)
Throughput : 0.039


$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./ecm -c 1 3000000
bash: ./ecm: No such file or directory
debrouxl@asus2:~/ecm/gpu/gpu_ecm$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=379651352
Step 1 took 42974ms
Step 2 took 12981ms
On that number, the core busy with gpu_ecm is spending a bit more than half of its time in the "system" state. I guess that's what xilman mentioned above?
Quote:
A third is to reduce the (presently extortionate IMO) amount of cpu time used by busy-waiting for the kernels to complete.
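For a rough per-curve comparison of the two runs above, the stage 1 times can be normalised as below. This is only a sketch: it assumes the gpu_ecm wall time covers stage 1 for all 64 curves and that the reported "Throughput" is curves per second, and it leaves out the CPU's step 2, for which the gpu_ecm log reports no counterpart.

```python
# Figures copied from the two logs above.
gpu_wall_s = 1637.614   # gpu_ecm wall time for the whole batch
gpu_curves = 64         # -n 64
cpu_step1_s = 42.974    # GMP-ECM "Step 1 took 42974ms", one curve on one core

gpu_s_per_curve = gpu_wall_s / gpu_curves
throughput = gpu_curves / gpu_wall_s       # assumed meaning of "Throughput"
ratio = cpu_step1_s / gpu_s_per_curve      # > 1 means the GPU batch wins per curve

print(f"GPU: {gpu_s_per_curve:.2f} s per curve ({throughput:.3f} curves/s)")
print(f"CPU step 1 time / GPU per-curve time: {ratio:.2f}")
```

Under those assumptions the GT 540M comes out at somewhat under twice the stage 1 rate of one CPU core on this C237, before accounting for the core that is kept busy waiting.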
Old 2012-02-12, 12:58   #35
lorgix

Quote:
Originally Posted by xilman
I screwed up computing the time per curve


1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve.

The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve.
Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core.

Sorry about that.
Still a very interesting development. Congrats on the factors.
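The arithmetic in the quoted correction is easy to re-check. A minimal sketch using only the figures quoted above (1792 curves, 141 hours):

```python
curves = 1792
hours = 141

# The mistaken expression from the quote: curves * hours / 3600
wrong_s_per_curve = curves * hours / 3600
# The corrected expression: total seconds divided by number of curves
right_s_per_curve = hours * 3600 / curves

print(f"mistaken:  {wrong_s_per_curve:.0f} s per curve")
print(f"corrected: {right_s_per_curve:.0f} s per curve")
print(f"corrected / mistaken: {right_s_per_curve / wrong_s_per_curve:.1f}")
```

The two values differ by a factor of about four, matching the "four times worse" remark.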
Old 2012-02-12, 13:18   #36
xilman

Since my first experiments, I've been playing with a version which uses 512-bit arithmetic (fudged with CFLAGS+=-DNB_DIGITS=16 in the relevant line of Makefile). As expected, ECM runs around 3 times faster on ~500 bit numbers with this change.
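As a rough guide to which arithmetic size a given composite needs, the sketch below converts a decimal digit count to bits and picks the smaller of the two sizes mentioned in this thread. It assumes 32-bit digits (inferred from NB_DIGITS=16 being described as 512-bit arithmetic) and ignores any headroom bits the modular arithmetic may require; the function names are illustrative and not part of gpu_ecm.

```python
import math

BITS_PER_DIGIT = 32  # assumption: 512 bits / NB_DIGITS=16

def bits_for_decimal_digits(d: int) -> int:
    """Upper bound on the bit length of a d-digit decimal number."""
    return math.ceil(d * math.log2(10))

def smallest_nb_digits(d: int, candidates=(16, 32)) -> int:
    """Smallest candidate NB_DIGITS whose width holds a d-digit number."""
    need = bits_for_decimal_digits(d)
    for nb in candidates:
        if nb * BITS_PER_DIGIT >= need:
            return nb
    raise ValueError(f"a C{d} needs {need} bits, more than any candidate size")

for d in (144, 151, 237, 302):
    print(f"C{d}: {bits_for_decimal_digits(d)} bits -> NB_DIGITS={smallest_nb_digits(d)}")
```

On these assumptions, anything up to roughly C154 fits in 512 bits, while the C237 and C302 discussed in this thread need the 1024-bit build.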

One of the things on my to-do list is to add greater flexibility to the choice of bignum sizes.

Experiments with both 1024 and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing, presumably by hiding memory latency. The downside, of course, is that the display stays rather sluggish for a proportionately long time. I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.


Paul
Old 2012-02-12, 21:26   #37
frmky

Quote:
Originally Posted by xilman
I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.
I added a percent complete counter in the for loop launching the kernels in cudautil.cu. I don't think adding an ETA would be difficult.
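An ETA can be derived from the same progress counter by linear extrapolation. The sketch below is not the code in cudautil.cu; the function and parameter names are hypothetical, and it assumes every kernel launch takes about the same time and that the elapsed time is measured after each launch has actually completed.

```python
def eta_seconds(elapsed_s: float, done: int, total: int) -> float:
    """Estimate the remaining time from progress so far (linear extrapolation)."""
    if done <= 0 or done > total:
        raise ValueError("done must be in 1..total")
    return elapsed_s * (total - done) / done

# Example: 25 of 100 kernel launches finished after 30 seconds.
remaining = eta_seconds(30.0, 25, 100)
print(f"{100 * 25 / 100:.0f}% complete, about {remaining:.0f} s remaining")
```

If the launches are queued asynchronously, the estimate would have to be taken at a synchronisation point rather than inside the launch loop itself.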
Old 2012-02-12, 22:07   #38
R.D. Silverman

Quote:
Originally Posted by xilman
In case it is not clear to bystanders, this code is not fire and forget. It is not production quality.

If you want to use it, you will need to get your hands dirty. I'm prepared to help as best I can after you've followed the instructions in the svn distro and after you've made a sincere effort to get things working by yourself. I am not prepared to bottle-feed, to wipe noses or to change {nappies,diapers}.

That may sound harsh but it's the way the world of alpha-code development works and you'll need to get used to it if you want to play with the big boys and girls. Once you pass the audition you'll find most developers are very friendly and helpful.

Neither am I addressing these remarks to any particular individuals who may, or may not, have posted in this thread.

Paul
I totally agree with you.

However, allow me to point out that when I present a similar attitude
toward the learning of the algorithms discussed herein and the mathematics
behind them, I am lambasted for my efforts.

Participants should be willing to put in the effort or they should leave.
Old 2012-02-12, 23:40   #39
xilman

Quote:
Originally Posted by R.D. Silverman
I totally agree with you.

However, allow me to point out that when I present a similar attitude
toward the learning of the algorithms discussed herein and the mathematics
behind them, I am lambasted for my efforts.

Participants should be willing to put in the effort or they should leave.
It seems to me that one difference is that there is a large amount of fire-and-forget code available and that code is suited to the majority of the people here. Only those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a subversion repository.

Much of the mathematics discussed here is not at the bleeding edge, IMO. It is closer in spirit to oft-times cranky but nonetheless well understood and supported applications such as mainstream gmp-ecm.

IMO, your diatribes against those wishing to perform bleeding edge mathematics are fully justified. They are less appropriate, again IMO, further away from the bleeding edge. I hope I would never feel the urge to issue my earlier warnings to those who only wish to use gmp-ecm and are confused by its jargon and multitudinous options.
Old 2012-02-13, 00:40   #40
R.D. Silverman

Quote:
Originally Posted by xilman
It seems to me that one difference is that there is a large amount of fire-and-forget code available and that code is suited to the majority of the people here. Only those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a subversion repository.
We agree.

Indeed. I have even heard one of the people (whom I hold in contempt)
admit that he does not even know how to use a compiler.

Quote:

Much of the mathematics discussed here is not at the bleeding edge, IMO. It is closer in spirit to oft-times cranky but nonetheless well understood and supported applications such as mainstream gmp-ecm.
And from my point of view too many of the participants herein do
not understand things even at that level. Nor do they seem willing
to make the attempt. They don't even understand mathematics that
was known 150+ years ago. Nor do they want to make the effort.
Old 2012-02-14, 19:12   #41
xilman

Quote:
Originally Posted by xilman
Experiments with both 1024 and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing
Another data point shows that even choosing the correct default number is significant.

Out of the box (well, my box anyway) the default build appears to use parameters suitable for a CC1.3 system, despite there being a Fermi card installed. A run on a C302 with these parameters chooses 112 curves arranged 32x16 x 7x1x1 and takes 3845.428 seconds. Rebuilding with "make cc=2" and re-running took 5539.049 seconds for 224 curves arranged 32x32 x 7x1x1. The ratio (224/112) * (3845.428 / 5539.049) is 1.388.

I suggest a 39% speed-up is worth having.
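The ratio can be reproduced by comparing curves per second for the two builds. A minimal sketch using only the figures quoted above:

```python
# (curves, wall seconds) for the two runs on the C302
runs = {
    "cc=1.3 defaults": (112, 3845.428),
    "make cc=2":       (224, 5539.049),
}

# Curves completed per second for each build
rates = {name: curves / seconds for name, (curves, seconds) in runs.items()}
speedup = rates["make cc=2"] / rates["cc=1.3 defaults"]

for name, rate in rates.items():
    print(f"{name}: {rate:.5f} curves/s")
print(f"speed-up: {speedup:.3f}")
```

This is the same quantity as (224/112) * (3845.428 / 5539.049), just arranged as a ratio of throughputs.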
Old 2012-02-14, 22:02   #42
Ralf Recker
A few quick tests with a small B1 value

CC 2.0 card (GTX 470, stock clocks), 512 bit arithmetic, CUDA SDK 4.0. The c151 was taken from the Aliquot sequence 890460:i898

Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=24351435, with 448 curves
gpu_ecm took : 116.363s (0.000+116.355+0.008)
Throughput : 3.850
Doubling the number of curves improves the throughput:

Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1471710578, with 896 curves
gpu_ecm took : 179.747s (0.000+179.731+0.016)
Throughput : 4.985
With 32 fewer curves, the throughput increases by roughly another 30%:
Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1374804691, with 864 curves
gpu_ecm took : 130.964s (0.000+130.948+0.016)
Throughput : 6.597
The throughput on a CC 2.1 card (GTX 460, 725 MHz factory OC) for the same number:

Code:
 224 curves - Throughput : 2.289
 416 curves - Throughput : 4.223
 448 curves - Throughput : 4.547
 480 curves - Throughput : 3.039
 672 curves - Throughput : 4.233
 896 curves - Throughput : 4.638
1792 curves - Throughput : 4.753
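The "Throughput" figure in the GTX 470 logs above appears to be simply curves divided by wall-clock seconds. That reading is an inference from the numbers, not something taken from the gpu_ecm source; the sketch below checks it against the three runs.

```python
# (curves, wall seconds, reported throughput) from the three GTX 470 runs
runs = [
    (448, 116.363, 3.850),
    (896, 179.747, 4.985),
    (864, 130.964, 6.597),
]

computed = [curves / seconds for curves, seconds, _ in runs]
for (curves, seconds, reported), value in zip(runs, computed):
    print(f"{curves:4d} curves: computed {value:.3f}, reported {reported:.3f}")

# Relative gain from dropping 896 curves to 864
gain = computed[2] / computed[1] - 1
print(f"864 vs 896 curves: {gain:.0%} higher throughput")
```

All three computed values agree with the logs to three decimal places.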

Last fiddled with by Ralf Recker on 2012-02-14 at 22:36 Reason: Caption, CC 2.1 results
Old 2012-02-15, 19:19   #43
ET_
gpu_ecm ready to work

OK, I downloaded the source code with cc=1.3 and successfully compiled it.

Sadly, I see differences between the Xilman and Ralf Recker outputs.

The executable passes the test.

What does the (required) parameter N on the command line represent? All I can see is that it has to do with the xfin, zfin and xunif parameters, and should be odd...

I also tried ./gpu_ecm 9699691 11000 -n 1 <in where in contains the number 65798732165875434667. I got the factor 347, which is not a factor of the input number...

To show my good will:

Code:
./gpu_ecm 9699691 11000 -n 1 <in
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GTX 275, compute capability 1.3, 30 MPs.
#gpu_ecm launched with :
N=9699691
B1=11000
curves=1
firstsigma=11
#used seed 1329332970 to generate sigma

#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11
  xfin=3111202
  zfin=7720056
  #Factor found : 347 (with z)
#Results : 1 factor found

#Temps gpu : 15.080 init&copy=0.040 computation=15.040
Now, I understand that the program is not "fire and forget", and I would really, REALLY like to know more about it, but the interface is not documented, its usage differs from gmp-ecm's, and the link posted by Jason gives no indication that a README file is present anywhere in the trunk.

Would you mind (now that my hands have been contaminated by bits and compilers) shedding some light on this obscure valley? Even a link explaining what N means in this context would suffice...

Many thanks...

Luigi

P.S. after some more fiddling, I noticed that 347 is a factor of 9699691, so I think I got the meaning of N after all...
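The log above can be checked independently. The sketch below assumes that "(with z)" means the factor was obtained as gcd(zfin, N); that is a guess from the output format, not something confirmed from the gpu_ecm source.

```python
from math import gcd

N = 9699691          # first command-line argument in the run above
zfin = 7720056       # zfin value printed in the log
stdin_number = 65798732165875434667  # contents of the file "in"

factor = gcd(zfin, N)          # assumed way the "(with z)" factor is extracted
cofactor = N // factor

print(f"gcd(zfin, N) = {factor}")
print(f"N = {factor} * {cofactor}")
# A non-zero remainder here means 347 does not divide the number read from stdin.
print(f"stdin number mod {factor} = {stdin_number % factor}")
```

The gcd reproduces the reported 347, which supports reading the first positional argument as the number being factored.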

With N3 and 448 curves, my GTX275 has the same speed as my Intel I5-750.

Last fiddled with by ET_ on 2012-02-15 at 19:51 Reason: Gee... I shouldn't mess with it when I'm back from work.
Old 2012-02-15, 19:44   #44
jasonp

That directory was not the trunk, this is, complete with lots of readme files.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.