 Do they need to be B1 power-smooth or something similar? Maybe there is a limit on exponent? What language is that code? If it is python 2 then I have a few things undefined starting with ZZ. What tweaks would be needed to the code for the other parameterizations? Non-gpu code uses parameterization 1 currently I think.
2020-01-17, 12:00   #13
SethTro

"Seth"
Apr 2019

2·5·41 Posts

Quote:
 Originally Posted by henryzz Do they need to be B1 power-smooth or something similar? Maybe there is a limit on exponent? What language is that code? If it is python 2 then I have a few things undefined starting with ZZ. What tweaks would be needed to the code for the other parameterizations? Non-gpu code uses parameterization 1 currently I think.
Yes, they need to be B1 power-smooth.

The code is Sage.

Here's the parameterization for param0, given by Paul Zimmermann in https://homepages.cwi.nl/~herman/Zimmermann.pdf

Code:
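def GroupOrderParam0(p, sigma):
    K = GF(p)
    v = (4*sigma) * K.gen()
    u = (sigma^2 - 5) * K.gen()
    x = u^3
    b = 4*x*v
    a = (v-u)^3*(3*u+v)
    # Montgomery coefficient A
    A = a/b-2
    # x-coordinate of the starting point
    x = x/v^3
    b = x^3 + A*x^2 + x
    # Factored order of the equivalent Weierstrass curve
    return factor(EllipticCurve([0,b*A,0,b^2,0]).order())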
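For anyone without Sage, the same check can be sketched in plain Python for small primes. This is only an illustration, not part of GMP-ECM: `group_order_param0`, `factorize` and `is_power_smooth` are hypothetical helper names, the point count is brute force (so it only works for small p), and power-smoothness of the whole group order is used as a sufficient condition for stage 1 to succeed.

```python
def group_order_param0(p, sigma):
    """Brute-force the group order of the param0 curve over GF(p), p an odd prime.

    Mirrors the Sage GroupOrderParam0 formulas, but counts points naively,
    so it is only practical for small p. Needs Python 3.8+ for pow(x, -1, p).
    """
    v = (4 * sigma) % p
    u = (sigma * sigma - 5) % p
    x = pow(u, 3, p)
    b = (4 * x * v) % p
    a = (pow(v - u, 3, p) * (3 * u + v)) % p
    A = (a * pow(b, -1, p) - 2) % p      # raises ValueError if b == 0 mod p
    x = (x * pow(pow(v, 3, p), -1, p)) % p
    b = (pow(x, 3, p) + A * x * x + x) % p
    # Count points on y^2 = t^3 + b*A*t^2 + b^2*t, as in the Sage version.
    order = 1                            # the point at infinity
    for t in range(p):
        rhs = (pow(t, 3, p) + b * A * t * t + b * b * t) % p
        if rhs == 0:
            order += 1                   # one point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:
            order += 2                   # rhs is a square: two values of y
    return order


def factorize(n):
    """Return {prime: exponent} for n >= 1 by trial division (small n only)."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_power_smooth(n, B1):
    """True if every prime power p^e exactly dividing n satisfies p^e <= B1."""
    return all(q ** e <= B1 for q, e in factorize(n).items())


if __name__ == "__main__":
    p, sigma, B1 = 1009, 11, 50          # arbitrary illustrative values
    order = group_order_param0(p, sigma)
    print(order, factorize(order), is_power_smooth(order, B1))
```

If `is_power_smooth(order, B1)` is true, stage 1 with that sigma should find p; a single leftover prime between B1 and B2 would be left for stage 2, which this sketch does not model.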

2020-01-17, 15:53   #14
Nooks

Jul 2018

19 Posts

GPU-enabled ECM has been very useful as I work my way through https://stdkmd.net/nrr/3/37771.htm; being able to bring that to bear on composites of more than 300 digits would be quite helpful.
2020-01-18, 15:23   #15
EdH

"Ed Hall"
Dec 2009

2×11×191 Posts

Quote:
 Originally Posted by chris2be8 . . . Of course being able to do QS (or even better lattice sieving) on a GPU would be *very* nice. . . .
Have you seen my thread on GPU-GLS?

There are no instructions for use, but some hints were provided in the thread and the source code has some further command info. I got a little way playing with it using Colab, but haven't pursued it much further.

2020-01-18, 17:14   #16
chris2be8

Sep 2009

2^4·139 Posts

GPU-GLS would be nice, but the slowest step in the size range I'm working on is sieving. And has anyone actually got it working?

Chris

2020-01-21, 02:54   #17
SethTro

"Seth"
Apr 2019

2·5·41 Posts

I finished a script to generate random primes and verify the results from the GPU: https://github.com/sethtroisi/gmp-ecm/pull/1

The end result is that changing ECM_GPU_NB_DIGITS doesn't seem to change first-stage results (this could maybe have been verified more easily by just generating two save files and comparing them, but what you gonna do).
2020-01-21, 10:20   #18
SethTro

"Seth"
Apr 2019

410₁₀ Posts

Quote:
 Originally Posted by VBCurtis Start around post 160 of the GPU-ECM thread: https://mersenneforum.org/showthread.php?t=16480&page=4
I found that this can cause bad results:

Code:
make clean
make
# Modify ECM_GPU_NB_DIGITS
make # without make clean
run_tests.sh

Maybe a quick self-check at startup would be good.

2020-01-21, 12:11   #19
storm5510
Random Account

Aug 2009

7D9₁₆ Posts

The last time I checked, the GPU ceiling for GMP-ECM was something like 2^1018, or similar. I stopped using GMP-ECM completely because its results format was not compatible with the Primenet server. Simply put, there was no way to submit results.

As for the ECMs themselves, I keep them low by leaving the RAM settings in Prime95 at their defaults. The largest I ever see is less than 2^50000. I hammered on the infamous M1277 for about a month. I ran stage 1 on Prime95 and stage 2 on GMP-ECM.
2020-02-08, 16:55   #20
chris2be8

Sep 2009

2^4·139 Posts

I've managed to get ecm gpu using 512 bits working (ecm-gpu.orig contains ecm as downloaded):

Code:
chris@sirius:~$ cp -rp ecm-gpu.orig ecm-gpu.512
chris@sirius:~$ cd ecm-gpu.512/trunk/
chris@sirius:~/ecm-gpu.512/trunk$ vi ecm-gpu.h
chris@sirius:~/ecm-gpu.512/trunk$ diff ecm-gpu.h ~/ecm-gpu/trunk/ecm-gpu.h
11c11
< #define ECM_GPU_NB_DIGITS 16 //by default
---
> #define ECM_GPU_NB_DIGITS 32 //by default
chris@sirius:~/ecm-gpu.512/trunk$ autoreconf -si
chris@sirius:~/ecm-gpu.512/trunk$ ./configure --enable-gpu=sm30
chris@sirius:~/ecm-gpu.512/trunk$ make

make check produced errors though, due to trying to process numbers over 506 bits. I managed to patch test.gpuecm to bypass them, so it seems to work. But I can't be sure it is fully tested.

And it's about 3 times faster:

Code:
chris@sirius:~/ecm-gpu.512/trunk$ time /home/chris/ecm-gpu.512/trunk/ecm -gpu -save test512.save 25e4 1
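The two size limits mentioned in this thread (about 2^1018 for the default build, 506 bits for the 512-bit build) both fit one simple rule, sketched below. The 32-bit digit size and the 6 reserved bits are assumptions inferred only from those two figures, not taken from the GMP-ECM source, and `max_input_bits` is a hypothetical helper name.

```python
def max_input_bits(nb_digits, digit_bits=32, reserved_bits=6):
    """Largest input size in bits for a given ECM_GPU_NB_DIGITS.

    digit_bits and reserved_bits are assumptions chosen to match the
    figures quoted in the thread (16 -> 506, 32 -> 1018).
    """
    return nb_digits * digit_bits - reserved_bits


if __name__ == "__main__":
    for nb_digits in (16, 32):
        print(nb_digits, max_input_bits(nb_digits))
```

Under that assumption, any test input above `max_input_bits(16)` bits would be expected to fail in the 512-bit build, which matches the make check errors described above.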
2020-02-09, 02:35   #21
SethTro

"Seth"
Apr 2019

2×5×41 Posts

If you check the SVN repo, I added check_gpuecm.sage, which tests finding primes vs theoretical results (it generates a bunch of small primes that can be found with B1=x, then checks that they get found correctly).

I'm on mobile but can write more later.
2021-09-10, 03:54   #22
SethTro

"Seth"
Apr 2019

2×5×41 Posts

Quote:
 Originally Posted by R.D. Silverman Interesting question. I guess [based on what is publicly reported] that it is under 1024 bits. One still must run stage 2. Even if GPU stage 1 took zero time the net result only cuts the total time in half after running stage 2.
I finished some new code today to measure the speedup from adding a GPU, in the extreme case where you use just one GPU and one CPU core.

All of these are equivalent to one t35 (using -param3)
1116 curves at B1=1e6, B2=1e9 (traditional)
747 curves at B1=1.3e6, B2=2.86e9 (equal time for both stages)
1850 curves at B1=1.9e6, B2=28.5e6 (equal time for both stages with a 40x faster stage 1)

Now these take respectively (for the 146 digit input I tested)
1116 * (1082 + 757)ms = 34.2 minutes
747 * (1414 + 1479)ms = 36 minutes
1850 * (2045 + 72)ms = 65 minutes (on CPU)
1850 * (2045/40 + 72)ms = 3.8 minutes (1.5 minutes GPU + 2 minutes CPU)

So 34.2 minutes for a t35 has been reduced to 3.8 minutes, a 9x speedup!
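The arithmetic above can be reproduced with a few lines of Python. This is just a sketch using the curve counts and per-curve timings quoted in this post; `total_minutes` is a hypothetical helper, and it assumes the stages run back to back with no overlap between GPU and CPU work.

```python
def total_minutes(curves, stage1_ms, stage2_ms, stage1_speedup=1.0):
    """Wall time in minutes for `curves` curves, stage 1 then stage 2.

    stage1_speedup divides the per-curve stage 1 time (e.g. 40 for the GPU).
    """
    per_curve_ms = stage1_ms / stage1_speedup + stage2_ms
    return curves * per_curve_ms / 60_000


if __name__ == "__main__":
    traditional = total_minutes(1116, 1082, 757)
    equal_time = total_minutes(747, 1414, 1479)
    cpu_only = total_minutes(1850, 2045, 72)
    with_gpu = total_minutes(1850, 2045, 72, stage1_speedup=40)

    print(f"traditional: {traditional:.1f} min")
    print(f"equal time:  {equal_time:.1f} min")
    print(f"CPU only:    {cpu_only:.1f} min")
    print(f"with GPU:    {with_gpu:.1f} min")
    print(f"speedup:     {traditional / with_gpu:.1f}x")
```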

In the case that you pair 8 CPUs with a GPU, the stage 1 effective speedup is only 40/8 = 5x and the overall speedup is muted to 3x: from 34.2/8 = 4.3 minutes to 45 seconds GPU + 51 seconds CPU = 1.5 minutes.

