Resurrecting this thread. If anyone is running numbers smaller than C155 they should reach out to me.

(Moderator note: Referenced thread is here)

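My new CGBN-enabled code is roughly 7x faster in Step 1 on this C146 (7258ms vs. 1019ms of GPU time). Step 2 still runs on the CPU, so its time is unchanged.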

Code:

$ echo "(2^499-1)/20959" | ./ecm -gpu -gpucurves 3584 -sigma 3:1000 20000
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:4583 (3584 curves)
Computing 3584 Step 1 took 93ms of CPU time / 7258ms of GPU time
Computing 3584 Step 2 on CPU took 71933ms
$ echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:4583 (3584 curves)
Computing 3584 Step 1 took 15ms of CPU time / 1019ms of GPU time
Computing 3584 Step 2 on CPU took 72142ms

For numbers smaller than C300 it's generally 2-3x faster in Step 1.

Code:

$ echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:2791 (1792 curves)
Computing 1792 Step 1 took 91ms of CPU time / 3810ms of GPU time
Computing 1792 Step 2 on CPU took 83417ms
$ echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:2791 (1792 curves)
Computing 1792 Step 1 took 15ms of CPU time / 1588ms of GPU time
Computing 1792 Step 2 on CPU took 83521ms
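
If you want to check the ratios yourself, here is a rough Python sketch that pulls the Step 1 GPU time out of each log and divides old by new. The regex is only matched against the line format shown in the output above, and the helper names (`step1_gpu_ms`, `speedup`) are made up for this post, not part of GMP-ECM.

```python
import re

# Matches timing lines in the format shown in the logs above, e.g.
#   Computing 3584 Step 1 took 93ms of CPU time / 7258ms of GPU time
STEP1_RE = re.compile(
    r"Computing (\d+) Step 1 took (\d+)ms of CPU time / (\d+)ms of GPU time"
)


def step1_gpu_ms(log: str) -> int:
    """Return the Step 1 GPU time (ms) found in one run's output."""
    match = STEP1_RE.search(log)
    if match is None:
        raise ValueError("no Step 1 timing line found")
    return int(match.group(3))


def speedup(baseline_log: str, cgbn_log: str) -> float:
    """Ratio of baseline GPU time to CGBN GPU time for Step 1."""
    return step1_gpu_ms(baseline_log) / step1_gpu_ms(cgbn_log)


# Timing lines copied from the four runs above.
C146_OLD = "Computing 3584 Step 1 took 93ms of CPU time / 7258ms of GPU time"
C146_CGBN = "Computing 3584 Step 1 took 15ms of CPU time / 1019ms of GPU time"
C301_OLD = "Computing 1792 Step 1 took 91ms of CPU time / 3810ms of GPU time"
C301_CGBN = "Computing 1792 Step 1 took 15ms of CPU time / 1588ms of GPU time"

print(f"C146: {speedup(C146_OLD, C146_CGBN):.2f}x")  # C146: 7.12x
print(f"C301: {speedup(C301_OLD, C301_CGBN):.2f}x")  # C301: 2.40x
```

This only compares Step 1 GPU time. In both runs Step 2 on the CPU takes 70+ seconds either way, so end-to-end time improves much less than these ratios suggest.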

I'm actively working on the code in https://github.com/sethtroisi/gmp-ec...pu_integration if you are a developer, and I could possibly distribute Linux binaries if we had a place to store them.