#34
"Ed Hall"
Dec 2009
Adirondack Mtns
2²·3·5·59 Posts
It took quite a few tries to get a machine with avx512 today, but all went well when I did.
I haven't tried this yet, but I hope to soon, and I wondered what the thoughts are on a setup that uses the GPU branch of ECM for stage 1 and this AVX-ECM for stage 2. This may be the exact answer I've been searching for with my Colab ECM-GPU session setup, since prior to this I could only run a single stage-2 thread against all the stage-1 curves done via GPU. Thoughts?
#35
"Ben"
Feb 2007
6441₈ Posts
A GPU is no doubt the most efficient approach for stage 1 if you can get one. This program is probably a good option for stage 2, but keep in mind:
1) You will have to run more curves: not quite 2x more, but probably close to that. GMP-ECM with -v is your friend.
2) AVX-ECM does not yet have the ability to resume curves :)
#36
"Ed Hall"
Dec 2009
Adirondack Mtns
2²·3·5·59 Posts
Quote:
Thanks for all your work.
#37
Aug 2006
13511₈ Posts
#38
Mar 2019
11×13 Posts
I notice that with large inputs, say 2^8192+1, GMP-ECM (with GWNUM) is still significantly faster, at least on my system (Xeon Gold 6154 @ 3 GHz).
For example, with this input and B1=1e6, GMP-ECM curves take about 20-30 seconds each:

Code:
Using gwnum_ecmStage1(1, 2, 8192, 1, 1000000, 1)
Step 1 took 19146ms
Estimated memory usage: 203.12MB
Initializing tables of differences for F took 6ms
Computing roots of F took 186ms
Building F from its roots took 393ms
Computing 1/F took 176ms
Initializing table of differences for G took 32ms
Computing roots of G took 142ms
Building G from its roots took 388ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 105ms
Reducing G * H mod F took 156ms
Computing roots of G took 142ms
Building G from its roots took 389ms
Computing G * H took 105ms
Reducing G * H mod F took 155ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 106ms
Reducing G * H mod F took 156ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 105ms
Reducing G * H mod F took 155ms
Computing roots of G took 143ms
Building G from its roots took 387ms
Computing G * H took 104ms
Reducing G * H mod F took 155ms
Computing polyeval(F,G) took 911ms
Computing product of all F(g_i) took 64ms
Step 2 took 6291ms
********** Factor found in step 2: 3603109844542291969
Found prime factor of 19 digits: 3603109844542291969
Composite cofactor ((2^8192+1)/2710954639361)/3603109844542291969 has 2436 digits
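As a quick cross-check of the "20-30 seconds each" figure, the two step timings in the log above can be summed; this is just arithmetic on the millisecond values quoted in the output:

```shell
# Sum stage 1 and stage 2 times from the GMP-ECM log: 19146 ms + 6291 ms.
awk 'BEGIN { printf "%.1f s\n", (19146 + 6291) / 1000 }'
# prints: 25.4 s
```

which sits comfortably inside the quoted 20-30 second range.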
#39
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×3×5×59 Posts
Quote:
#40
"Ben"
Feb 2007
3,361 Posts
Quote:
1) No fair using gwnum!
2) You are running avx-ecm with 4 threads, which is 32 simultaneous curves, so whatever the wall time ends up being, you will have to divide by 32 for a fair per-curve comparison.

But you are correct that avx-ecm will become less and less competitive as the input size goes up, because I do not have any sub-quadratic methods implemented. You are also correct that GMP-ECM will benefit significantly on numbers of special form (like the one you picked).

I'm not advocating that avx-ecm replace gmp-ecm, far from it! It is just another tool that may be useful in some situations. Long term, it probably makes the most sense to merge avx-ecm into gmp-ecm somehow.

Last fiddled with by bsquared on 2020-01-05 at 16:51
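To make the divide-by-32 point concrete: 4 threads times 8 curves per AVX-512 vector means one run works 32 curves at once, so the per-curve cost is the wall time divided by 32. A small sketch; the 960-second wall time is a made-up placeholder, not a measured figure:

```shell
THREADS=4   # avx-ecm thread count from the run being discussed
LANES=8     # curves packed into one AVX-512 vector
WALL=960    # hypothetical wall time in seconds, for illustration only
CURVES=$((THREADS * LANES))
awk -v t="$WALL" -v c="$CURVES" \
    'BEGIN { printf "%d curves, %.1f s per curve\n", c, t / c }'
# prints: 32 curves, 30.0 s per curve
```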
#41
"Ben"
Feb 2007
3,361 Posts
Quote:
So the question of which path is most efficient depends on the size and form of the number.
#42
Sep 2009
11110111100₂ Posts
Quote:
Eg:

Code:
rm $NAME.save
ecm -gpu -save $NAME.save $B1 1 <$INI | tee -a $LOG
split -nr/2 $NAME.save $NAME.save
wait # If you are in a loop
ecm -resume $NAME.saveaa $B1 | tee -a $LOG.1 | grep [Ff]actor &
ecm -resume $NAME.saveab $B1 | tee -a $LOG.2 | grep [Ff]actor &
# This would be the end of the loop
wait # Until all stage 2's end

$INI is the name of a file with the number in it. $B1 you can probably guess. $LOG is the name of the log files. If you want to do more curves than the GPU does in one go, then put the code into a loop. This is a simplified version of my code, which has to allow for the CPU being faster doing stage 2 than the GPU doing stage 1. But I've not tested this. See ecm's README file for details of what the parameters do. And README.gpu.

Chris
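In case the `split -nr/2` step above is unfamiliar: GNU split's round-robin mode deals the save file's lines alternately into two chunks, so each stage-2 job gets half the curves. A toy demonstration with stand-in residue lines (the file names here are made up, not from the script above):

```shell
# Four fake residue lines play the role of a GPU save file.
printf 'curve1\ncurve2\ncurve3\ncurve4\n' > demo.save
# Round-robin split into two chunks: demo.saveaa and demo.saveab.
split -nr/2 demo.save demo.save
cat demo.saveaa   # curve1, curve3
cat demo.saveab   # curve2, curve4
```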
#43
If I May
"Chris Halsall"
Sep 2002
Barbados
2·3·1,567 Posts
Quote:
I've already "burnt" my disposable Kaggle account, so I've been hesitant to experiment with CPU usage there using my main / development account. But CPU-only Kaggle instances give you two real cores, hyperthreaded to four. FWIW.
#44
"Ben"
Feb 2007
3,361 Posts
Quote:
I was right, it crashed.

While I was fixing the bug I realized that I was using Montgomery arithmetic for this input, which would be much better off using a special Mersenne reduction. So I implemented that feature, and now AVX-ECM is about 4.5 times faster than GMP-ECM on 2^1277-1.

GMP-ECM:

Code:
B1     Time, 1 curve, seconds
7e5    5.67
7e6    57.2
7e7    581
...
7e9    59448 (extrapolated)

AVX-ECM:

Code:
B1     Time, 8 curves, seconds
7e5    10.5
7e6    101.1
7e7    1019
...
7e9    102351 (actual)

I've checked the latest code into r393 if anyone wants to do some testing. You will need an AVX-512 capable CPU.

Last fiddled with by bsquared on 2020-10-30 at 21:00 Reason: specify input, and requirements
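The 4.5x figure can be reproduced from the two timing tables: at B1=7e6, GMP-ECM takes 57.2 s for one curve while AVX-ECM takes 101.1 s for 8 simultaneous curves, so the per-curve speedup is:

```shell
# Per-curve speedup at B1=7e6: 57.2 s/curve vs 101.1 s for 8 curves.
awk 'BEGIN { printf "%.2fx\n", 57.2 / (101.1 / 8) }'
# prints: 4.53x
```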