2020-01-04, 17:57  #34 
"Ed Hall"
Dec 2009
Adirondack Mtns
111001100011_{2} Posts 
It took quite a few tries to get a machine with AVX-512 today, but all went well when I did.
I haven't tried this yet, but hope to soon, and wondered what the thoughts are on a setup that uses the GPU branch of ECM for stage 1 and this avxecm for stage 2. This may be the exact answer I've been searching for with my Colab ECM-GPU session setup, since prior to this I could only run a single stage 2 thread against all the stage 1 curves from the GPU. Thoughts? 
2020-01-04, 20:47  #35 
"Ben"
Feb 2007
41·83 Posts 
A GPU is no doubt the most efficient approach for stg1 if you can get one. This program is probably a good option for stg2, but keep in mind:
1) You will have to run more curves, not quite 2x more but probably close to that. GMP-ECM with -v is your friend.
2) avxecm does not yet have the ability to resume curves :) 
2020-01-05, 04:00  #36  
"Ed Hall"
Dec 2009
Adirondack Mtns
29×127 Posts 
Quote:
Thanks for all your work. 

2020-01-05, 04:11  #37 
Aug 2006
3^{2}·5·7·19 Posts 

2020-01-05, 15:38  #38 
Mar 2019
149 Posts 
I notice that with large inputs, say 2^8192+1, GMP-ECM (with GWNUM) is still significantly faster, at least on my system (Xeon Gold 6154 @ 3 GHz).
For example, with this input and B1=1e6, GMP-ECM curves take about 20-30 seconds each: Code:
Using gwnum_ecmStage1(1, 2, 8192, 1, 1000000, 1)
Step 1 took 19146ms
Estimated memory usage: 203.12MB
Initializing tables of differences for F took 6ms
Computing roots of F took 186ms
Building F from its roots took 393ms
Computing 1/F took 176ms
Initializing table of differences for G took 32ms
Computing roots of G took 142ms
Building G from its roots took 388ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 105ms
Reducing G * H mod F took 156ms
Computing roots of G took 142ms
Building G from its roots took 389ms
Computing G * H took 105ms
Reducing G * H mod F took 155ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 106ms
Reducing G * H mod F took 156ms
Computing roots of G took 146ms
Building G from its roots took 388ms
Computing G * H took 105ms
Reducing G * H mod F took 155ms
Computing roots of G took 143ms
Building G from its roots took 387ms
Computing G * H took 104ms
Reducing G * H mod F took 155ms
Computing polyeval(F,G) took 911ms
Computing product of all F(g_i) took 64ms
Step 2 took 6291ms
********** Factor found in step 2: 3603109844542291969
Found prime factor of 19 digits: 3603109844542291969
Composite cofactor ((2^8192+1)/2710954639361)/3603109844542291969 has 2436 digits 
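As a quick sanity check on the per-curve estimate, the two step timings quoted in the log above can simply be added (a trivial sketch; the millisecond values are copied from the log):

```python
# Per-curve time for the gwnum-assisted GMP-ECM run quoted above:
# stage 1 and stage 2 wall times, in milliseconds, from the log.
step1_ms = 19146
step2_ms = 6291

total_s = (step1_ms + step2_ms) / 1000.0
print(f"one curve takes about {total_s:.1f} s")  # ~25.4 s, within the 20-30 s range
```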
2020-01-05, 16:19  #39  
"Ed Hall"
Dec 2009
Adirondack Mtns
111001100011_{2} Posts 
Quote:


2020-01-05, 16:41  #40  
"Ben"
Feb 2007
41·83 Posts 
Quote:
1) No fair using gwnum.
2) You are running avxecm with 4 threads, which is 32 curves, so whatever the time ends up being, you will have to divide by 32 for a fair comparison.

But you are correct that avxecm will become less and less competitive as the input size goes up, because I do not have any subquadratic methods implemented. You are also correct that GMP-ECM will benefit significantly on numbers of special form (like the one you picked). I'm not advocating that avxecm replace gmpecm, far from it! It is just another tool that may be useful in some situations. Long term, it probably makes the most sense to merge avxecm into gmpecm somehow.

Last fiddled with by bsquared on 2020-01-05 at 16:51 
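The per-curve accounting described above can be made concrete with a small sketch (the batch wall time here is a placeholder, not a measured value):

```python
# avxecm with AVX-512 runs 8 curves per thread in parallel vector lanes,
# so a 4-thread run processes a batch of 4 * 8 = 32 curves at once.
threads = 4
curves_per_thread = 8
batch_curves = threads * curves_per_thread  # 32

batch_time_s = 640.0  # hypothetical wall time for one 32-curve batch
per_curve_s = batch_time_s / batch_curves  # effective time per curve

print(batch_curves, per_curve_s)  # 32 curves, 20.0 s effective per curve
```

So a batch time should be divided by 32 before comparing against a single-curve GMP-ECM timing.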

2020-01-05, 16:51  #41  
"Ben"
Feb 2007
41·83 Posts 
Quote:
So the question of which path is most efficient depends on the size and form of the number. 

2020-01-05, 17:29  #42  
Sep 2009
3^{4}·5^{2} Posts 
Quote:
Eg: Code:
rm $NAME.save ecm gpu save $NAME.save $B1 1 <$INI  tee a $LOG split nr/2 $NAME.save $NAME.save wait # If you are in a loop ecm resume $NAME.saveaa $B1  tee a $LOG.1  grep [Ff]actor & ecm resume $NAME.saveaa $B1  tee a $LOG.1  grep [Ff]actor & # This would be the end of the loop wait # Until all stage 2's end $INI is name of file with the number in it. $B1 you can probably guess. $LOG is the name of the log files. If you want to do more curves than the GPU does in 1 go then put the code into a loop. This is a simplified version of my code, which has to allow for the CPU being faster doing stage 2 than the GPU doing stage 1. But I've not tested this. See ecm's README file for details of what the parameters do. And README.gpu. Chris 

2020-01-05, 18:01  #43  
If I May
"Chris Halsall"
Sep 2002
Barbados
3·5^{2}·127 Posts 
Quote:
I've already "burnt" my disposable Kaggle account, so I've been hesitant to experiment with CPU usage there with my main / development account, but CPU-only Kaggle instances give you two real cores, hyperthreaded to four. FWIW. 

2020-10-30, 20:55  #44  
"Ben"
Feb 2007
D4B_{16} Posts 
Quote:
I was right, it crashed. While I was fixing the bug I realized that I was using Montgomery arithmetic for this input when it would be much better off using a special Mersenne reduction. So I implemented that feature, and now AVX-ECM is about 4.5 times faster than GMP-ECM on 2^1277-1.

GMP-ECM: Code:
B1    Time, 1 curve, seconds
7e5   5.67
7e6   57.2
7e7   581
...
7e9   59448 (extrapolated)

AVX-ECM: Code:
B1    Time, 8 curves, seconds
7e5   10.5
7e6   101.1
7e7   1019
...
7e9   102351 (actual)

I've checked the latest code into r393 if anyone wants to do some testing. You will need an AVX-512 capable CPU.

Last fiddled with by bsquared on 2020-10-30 at 21:00 Reason: specify input, and requirements 
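The quoted speedup can be checked directly from the two tables above: the avxecm times are for 8 curves per batch, so the effective per-curve time is the batch time divided by 8. A quick sketch, using only the timings quoted above:

```python
# Timings from the tables above: GMP-ECM is per single curve,
# avxecm is per 8-curve batch ("..." rows omitted).
gmp_ecm_1curve = {7e5: 5.67, 7e6: 57.2, 7e7: 581, 7e9: 59448}
avxecm_8curves = {7e5: 10.5, 7e6: 101.1, 7e7: 1019, 7e9: 102351}

for b1, t_gmp in gmp_ecm_1curve.items():
    t_avx_per_curve = avxecm_8curves[b1] / 8  # effective per-curve time
    print(f"B1={b1:.0e}: speedup {t_gmp / t_avx_per_curve:.2f}x")
```

The ratios come out between roughly 4.3x and 4.7x across the B1 range, consistent with the "about 4.5 times faster" figure.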
