Another data point, for numbers between C144 and C29x: C237 is slower on the GPU, but obviously faster on the CPU, than C29x:
Code:
$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./gpu_ecm -vv -n 64 -save 80009_248_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.256s
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, firstinvd=563947071, with 64 curves
[snip]
gpu_ecm took : 1637.614s (0.000+1637.610+0.004)
Throughput : 0.039
$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./ecm -c 1 3000000
bash: ./ecm: No such file or directory
debrouxl@asus2:~/ecm/gpu/gpu_ecm$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=379651352
Step 1 took 42974ms
Step 2 took 12981ms
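To put the two logs side by side, here is a rough sketch of the arithmetic. It assumes the gpu_ecm time covers stage 1 only for all 64 curves, and that the GMP-ECM "Step 1" time is for a single curve on one CPU core; the variable names are mine, not from either program.

```python
# Figures copied from the two logs above.
gpu_curves = 64
gpu_seconds = 1637.614       # "gpu_ecm took" (assumed: stage 1, all 64 curves)
cpu_step1_seconds = 42.974   # "Step 1 took 42974ms" (one curve, one core)

# Average GPU cost of one stage-1 curve.
gpu_seconds_per_curve = gpu_seconds / gpu_curves

# How many stage-1 curves a single CPU core would finish in the same wall time.
cpu_curves_in_gpu_time = gpu_seconds / cpu_step1_seconds

print(f"GPU: {gpu_seconds_per_curve:.2f} s per curve")
print(f"CPU: {cpu_step1_seconds:.2f} s per curve")
print(f"One CPU core does about {cpu_curves_in_gpu_time:.1f} curves "
      f"while the GPU does {gpu_curves}")
```

On these numbers the GT 540M is worth a bit less than two of these CPU cores for stage 1, under the assumptions stated above.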


Quote:


Since my first experiments, I've been playing with a version which uses 512-bit arithmetic (fudged with CFLAGS+=-DNB_DIGITS=16 in the relevant line of Makefile). As expected, ECM runs around 3 times faster on ~500-bit numbers with this change.
One of the things on my to-do list is to add greater flexibility to the choice of bignum sizes. Experiments with both 1024- and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing, presumably by hiding memory latency. The downside, of course, is that the display stays rather sluggish for a proportionately long time. I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.
Paul
I added a percent complete counter in the for loop launching the kernels in cudautil.cu. I don't think adding an ETA would be difficult.
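For what it's worth, a linear ETA on top of a percent counter could look something like the sketch below. This is Python for illustration only, not the code in cudautil.cu; `eta_seconds`, `run_with_progress` and `launch_one` are hypothetical names. It also assumes each launch blocks until its work is done. With asynchronous kernel launches you would need to synchronize before reading the clock, otherwise the estimate is meaningless.

```python
import time

def eta_seconds(done, total, elapsed):
    """Linear estimate of the remaining time after `done` of `total` steps."""
    if done <= 0:
        return None  # nothing finished yet, so there is no rate to extrapolate
    return elapsed * (total - done) / done

def run_with_progress(total, launch_one):
    """Call launch_one(i) for each step and print percent complete plus ETA."""
    start = time.monotonic()
    for i in range(total):
        launch_one(i)  # assumed to block until step i has finished
        done = i + 1
        elapsed = time.monotonic() - start
        eta = eta_seconds(done, total, elapsed)
        print(f"{100 * done // total:3d}% complete, ETA {eta:.1f}s")
```

Something like `run_with_progress(4, lambda i: time.sleep(0.01))` exercises it. The estimate is only as good as the assumption that every step costs about the same.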

Quote:
However, allow me to point out that when I present a similar attitude toward the learning of the algorithms discussed herein and the mathematics behind them, I am lambasted for my efforts. Participants should be willing to put in the effort or they should leave. 

Quote:
Much of the mathematics discussed here is not at the bleeding edge, IMO. It is closer in spirit to ofttimes cranky but nonetheless well understood and supported applications such as mainstream GMP-ECM. IMO, your diatribes against those wishing to perform bleeding edge mathematics are fully justified. They are less appropriate, again IMO, further away from the bleeding edge. I hope I would never feel the urge to issue my earlier warnings to those who only wish to use GMP-ECM and are confused by its jargon and multitudinous options. 

Quote:
Indeed. I have even heard one of the people (whom I hold in contempt) admit that he does not even know how to use a compiler.
Quote:
not understand things even at that level. Nor do they seem willing to make the attempt. They don't even understand mathematics that was known 150+ years ago. Nor do they want to make the effort. 

Quote:
Out of the box (well, my box anyway) the default build appears to use parameters suitable for a CC1.3 system, despite there being a Fermi card installed. A run on a C302 with these parameters chooses 112 curves arranged 32x16 x 7x1x1 and takes 3845.428 seconds. Rebuilding with "make cc=2" and rerunning took 5539.049 seconds for 224 curves arranged 32x32 x 7x1x1. The ratio (224/112) * (3845.428 / 5539.049) is 1.388. I suggest a 39% speedup is worth having. 
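The ratio quoted above is just curves per second for the cc=2 build divided by curves per second for the default build. A small sketch that reproduces it (the function name `relative_speed` is mine):

```python
def relative_speed(curves_a, seconds_a, curves_b, seconds_b):
    """Throughput of run B divided by throughput of run A (curves per second)."""
    return (curves_b / seconds_b) / (curves_a / seconds_a)

# Default (CC 1.3 parameters) build: 112 curves in 3845.428 s.
# "make cc=2" build:                 224 curves in 5539.049 s.
ratio = relative_speed(112, 3845.428, 224, 5539.049)
print(f"cc=2 build is {ratio:.3f}x the default build, "
      f"i.e. about {100 * (ratio - 1):.0f}% faster")
```

Comparing throughput rather than wall time matters here because the two builds pick different curve counts for the same card.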

A few quick tests with a small B1 value
CC 2.0 card (GTX 470, stock clocks), 512-bit arithmetic, CUDA SDK 4.0. The c151 was taken from the Aliquot sequence 890460:i898.
Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=24351435, with 448 curves
gpu_ecm took : 116.363s (0.000+116.355+0.008)
Throughput : 3.850
Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1471710578, with 896 curves
gpu_ecm took : 179.747s (0.000+179.731+0.016)
Throughput : 4.985
Code:
ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1374804691, with 864 curves
gpu_ecm took : 130.964s (0.000+130.948+0.016)
Throughput : 6.597
Code:
 224 curves  Throughput : 2.289
 416 curves  Throughput : 4.223
 448 curves  Throughput : 4.547
 480 curves  Throughput : 3.039
 672 curves  Throughput : 4.233
 896 curves  Throughput : 4.638
1792 curves  Throughput : 4.753
Last fiddled with by Ralf Recker on 2012-02-14 at 22:36 Reason: Caption, CC 2.1 results
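The Throughput figure in the three full logs above matches curves divided by total seconds. The sketch below recomputes it and ranks the runs; that definition is inferred from the numbers, not taken from the gpu_ecm source.

```python
# (curves, total seconds) copied from the three GTX 470 logs above.
runs = [(448, 116.363), (896, 179.747), (864, 130.964)]

# Throughput in curves per second, rounded the way the program prints it.
throughput = {curves: round(curves / seconds, 3) for curves, seconds in runs}

for curves, value in sorted(throughput.items()):
    print(f"{curves:4d} curves  Throughput : {value:.3f}")

best = max(throughput, key=throughput.get)
print(f"Best of these three runs: {best} curves")
```

Note that more curves is not automatically better: the 864-curve run beats the 896-curve run here, so it is worth trying a few values of the curve count on your own card.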
gpu_ecm ready to work
OK, I downloaded the source code and successfully compiled it with cc=1.3.
Sadly, I see differences between the Xilman and Ralf Recker outputs. The executable passes the test.
What does the (required) parameter N on the command line represent? All I can see is that it has to do with the xfin, zfin and xunif parameters, and should be odd...
I also tried ./gpu_ecm 9699691 11000 -n 1 < in, where the file in contains the number 65798732165875434667. I got the factor 347, which is not a factor of the input number...
To testify my good will:
Code:
./gpu_ecm 9699691 11000 -n 1 < in
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GTX 275, compute capability 1.3, 30 MPs.
#gpu_ecm launched with : N=9699691 B1=11000 curves=1 firstsigma=11
#used seed 1329332970 to generate sigma
#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11 xfin=3111202 zfin=7720056
#Factor found : 347 (with z)
#Results : 1 factor found
#Temps gpu : 15.080 init&copy=0.040 computation=15.040
Would you mind (now that my hands have been contaminated by bits and compilers) shedding some light on this obscure valley? Even a link explaining what N means in this context would suffice...
Many thanks...
Luigi
P.S. After some more fiddling, I noticed that 347 is a factor of 9699691, so I think I got the meaning of N after all...
With N3 and 448 curves, my GTX 275 has the same speed as my Intel i5-750.
Last fiddled with by ET_ on 2012-02-15 at 19:51 Reason: Gee... I shouldn't mess with it when I'm back from work.
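A quick sanity check for this kind of confusion is to test which of the two numbers the reported factor actually divides. The sketch below uses only the values from the run above; the variable names are mine.

```python
n_command_line = 9699691           # first positional argument, logged as N=9699691
n_stdin = 65798732165875434667     # contents of the file redirected on stdin
factor = 347                       # "Factor found : 347"

for label, n in (("command-line N", n_command_line), ("stdin number", n_stdin)):
    verdict = "divides" if n % factor == 0 else "does not divide"
    print(f"{factor} {verdict} the {label} ({n})")

# The cofactor left after removing the factor from the command-line N.
cofactor = n_command_line // factor
print(f"{n_command_line} = {factor} * {cofactor}")
```

This agrees with the P.S.: in this build the number being factored is the N given on the command line, and the number fed on stdin played no part in the result.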
