mersenneforum.org A Python Driver for GMP-ECM...

2016-06-10, 05:38   #89
cgy606

Feb 2012

3²×7 Posts

Quote:
 Originally Posted by WraithX
Recently I noticed that the ecm.py script would crash if it was given factorial or primorial input strings. This is because the python "eval" function can't handle these characters. So, instead of writing my own equation parser to figure out how many digits are in these input strings, I've just grabbed the output from ecm.exe to see how many digits it reports are in the input number. So,
Announcing ecm.py v0.35:
Code:
Fixed:
- ecm.py no longer calculates number of digits on its own, it reads this
  information from the ecm executable. This fixed a problem where the python
  "eval" function would crash when it encountered factorial or primorial
  characters. ie, you can now do:
  echo "140!+1" | python ecm.py 1e6
  and it will work correctly, without crashing.
- also fixed the output when the ecm binary is not found. It will no longer
  print out the misleading "ECM_BIN_PATH", it will print out "ECM_PATH" to
  match the variable name in the python code.
Great, I have been factoring factorial +/- numbers.
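
For readers wondering what "reading the digit count from the ecm executable" could look like, here is a minimal sketch. It is not the code from ecm.py. It assumes GMP-ECM prints a line of the form "Input number is ... (N digits)" for each input, and the function name and the sample text are made up for illustration.

```python
import re

# Assumed shape of the line GMP-ECM prints for each input number, e.g.
#   Input number is 2050449353925555290706354283 (28 digits)
DIGITS_RE = re.compile(r"^Input number is .* \((\d+) digits\)")


def digits_from_ecm_output(output):
    """Return the digit count reported in the captured output, or None if absent."""
    for line in output.splitlines():
        match = DIGITS_RE.match(line.strip())
        if match:
            return int(match.group(1))
    return None


# Hypothetical captured output; only the "Input number is" line matters here.
sample = (
    "some other line printed before the input\n"
    "Input number is 2050449353925555290706354283 (28 digits)\n"
)
print(digits_from_ecm_output(sample))
```

Because the count comes from the binary's own output, the script never has to evaluate expressions such as "140!+1" itself.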

2016-06-10, 06:06   #90
cgy606

Feb 2012

3²·7 Posts

Quote:
So some food for thought on how to proceed. Clearly we need not be concerned about case '1', as in principle that is already implemented... we check the factor found and the cofactor for primality, and if they both pass, kill the script and drink a beer.

Case '2' is the most fruitful of our efforts. The 'best' way to proceed IMHO is to kill all the threads running (assuming that the threads start and stop at roughly the same time at the start of each curve, the way that yafu works) and test the primality of each factor. If the larger one is composite (WLOG let us assume that the smaller factor is the one found), then the script determines how many curves have been completed (at the current B1/B2 values), calculates how many curves remain in order to complete the original input, and then reschedules the remaining number of curves given the number of threads being used. To illustrate:

Factoring C170 B1 = 3e6 B2 =### threads = 4 total curves remaining = 2352
factor found prp37 (curve 221 thread = 1 sigma = ###)
composite cofactor C134

Factoring C134 B1 = 3e6 B2 =### threads = 4 total curves = 1468

I think you get the idea...
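
The arithmetic in the illustration above can be sketched in a few lines of Python. This is only a sketch of the bookkeeping, not code from ecm.py: the function names are hypothetical, and it assumes every thread had completed the same number of curves (221) when the factor was found.

```python
def remaining_curves(total_requested, curves_done_per_thread, threads):
    """Curves still owed after every thread has finished the same number of curves."""
    done = curves_done_per_thread * threads
    return max(total_requested - done, 0)


def split_across_threads(curves, threads):
    """Share the curves as evenly as possible; the first few threads take any extras."""
    base, extra = divmod(curves, threads)
    return [base + 1 if i < extra else base for i in range(threads)]


# Numbers from the C170 example: 2352 curves requested, 4 threads,
# factor found on curve 221 of thread 1.
left = remaining_curves(2352, 221, 4)
print(left, split_across_threads(left, 4))
```

With those inputs, 4 × 221 = 884 curves are counted as done, which leaves the 1468 curves shown for the C134.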

Case '3' is nothing more than a transpose of case '2'. We found a "small" composite factor and a large probable prime factor. In principle we could reschedule the curves like we did in case '2', but there is probably a better way to factor this number, which I will explain in case '4'.

For case '4' we find 2 composite cofactors. Let's assume for the sake of argument that one is larger (i.e. has more decimal digits) than the other. We could continue factoring that one in the same fashion as we did in case '2', but let's turn our attention to the smaller one. What does it mean when ecm finds two composite cofactors? Usually it means that smaller factors were not eliminated from the beginning (i.e. with some other factoring method like trial division, rho, or P+1/P-1) and thus B1 was selected so high that a single curve effectively found 2 factors and not one (we would like to claim this was intentional but no one would believe this statement)! Anyways, if it finds a composite factor of N digits, then the smallest prime factor of that composite can have at most ~N/2 digits. But how large is this composite factor of the original number expected to be? Well, the current ecm record is 82 digits (I think). For the sake of argument, let's be a little conservative and assume that somebody out there runs a curve at B1 = 25e9 (or something crazy like that) on a C300 number and finds a C90 and a C210 (lucky!!!). Clearly, one of the prime factors of the C90 has at most 45 digits, so it should have been found by about 5k curves at B1 = 11e6 (on average of course). In principle we could run ecm on the C90 up to the B1 = 11e6 level, or we could let SIQS or some other factoring algorithm hack at it (perhaps even trial division, given that maybe even smaller factors were not eliminated from the C300 to begin with). Anyways, the story I am trying to paint here is that if two composites are found, it basically means that a very large B1 bound was selected while at the same time small factors were not eliminated. Given that the smaller cofactor is not large (less than 90 digits or so), we should focus on the larger (and more important) cofactor and reschedule the remaining curves for that guy, analogous to case '2'.

I hope this makes sense...
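
To make the four cases concrete, here is a small Python sketch that sorts a (factor, cofactor) split into case 1 to 4. It is an illustration only: the function names are hypothetical, and the primality check is a plain Miller-Rabin test with fixed bases, which for numbers of the size discussed here is a probable-prime test rather than a proof.

```python
def is_probable_prime(n):
    """Miller-Rabin with the first twelve primes as bases (probable-prime test for large n)."""
    small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in small_primes:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small_primes:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True


def classify_split(factor, cofactor):
    """Return the case number (1-4) for a factor found by ecm and its cofactor."""
    factor_prp = is_probable_prime(factor)
    cofactor_prp = is_probable_prime(cofactor)
    if factor_prp and cofactor_prp:
        return 1  # both probable primes: done
    if factor_prp:
        return 2  # composite cofactor: reschedule the remaining curves on it
    if cofactor_prp:
        return 3  # small composite factor, probable prime cofactor
    return 4      # both composite: continue on the larger one


print(classify_split(2**61 - 1, 1009 * 1013))
```

Only case 2 and case 4 leave a large composite that needs the remaining curves rescheduled; in case 3 the small composite could be handed to another method, as suggested above.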

2016-06-18, 20:25   #91
WraithX

Mar 2006

2·3⁵ Posts
Announcing ecm.py v0.36...

Announcing ecm.py v0.36:
Code:
New Feature:
- Added the ability for ecm.py to perform the remaining number of requested curves on any composite factors found.
(You can activate this by setting "find_one_factor_and_stop = 0", it is 1 by default)
I've added the ability for ecm.py to continue working on any composite factors found: it will perform the remaining number of requested curves on them. I've run quite a few tests locally and it seems to work well. However, if you do run into any problems, please let me know.
Attached Files
 ecm-py_v0.36.zip (16.2 KB, 166 views)

2016-07-10, 04:24   #92
WraithX

Mar 2006

2·3⁵ Posts
Announcing ecm.py v0.38...

Announcing ecm.py v0.38:
Code:
New feature:
- Added the ability for ecm.py to resume a GMP-ECM (compatible) save file, and
it will evenly distribute the resume lines across several instances of GMP-ECM
Calling this can be as simple as:
ecm.py -resume resume.txt

Or you can use additional options, like:
Code:
ecm.py -threads 3 -out output.txt -maxmem 300 -pollfiles 60 -resume resume.txt
------------------------------------------------------------------------------
Which would spread the resume lines from resume.txt across 3 instances of gmp-ecm,
and give each one the command line option "-maxmem 100"  ( = 300/3)
and poll the output files every 60 seconds to look for factors, or see if a gmp-ecm instance has finished
and save all gmp-ecm output to the file output.txt
* Like always, you can specify the "threads" and "pollfiles" options inside the script
Here is a description of this new feature, which can also be found in the script:
Code:
# If we are using the "-resume" feature of gmp-ecm, we will make some assumptions about the job...
# 1) This is designed to be a _simple_ way to speed up resuming ecm by running several resume jobs in parallel.
#      ie, we will not try to replicate all resume capabilities of gmp-ecm
# 2) If we find identical lines in our resume file, we will only resume one of them and skip the others
#      - If this happens, we will print out a notice to the user (if VERBOSE >= v_normal) so they know what is going on
# 3) We will use the B1 value in the resume file, and not resume with higher values of B1
# 4) We will let gmp-ecm determine which B2 value to use, which can be affected by "-maxmem" and "-k"
# 5) We will try to split up the resume work evenly between the threads.
#     - We will put total/num_threads resume lines into each file, and total%num_threads files will each get one extra line.
#      At the end of a job or when restarting a job, we will write any completed resume lines out to a "finished file"
#      This "finished file" will be used to help us keep track of work done, in case we are interrupted and need to (re)resume later
#      We will query the output files once every poll_file_delay seconds.
#    resume_job_<filename>_inp_t00.txt # input resume file for use by gmp-ecm in thread 0
#    resume_job_<filename>_inp_t01.txt # input resume file for use by gmp-ecm in thread 1
#    ...etc...
#    resume_job_<filename>_out_t00.txt # output file for resume job of gmp-ecm in thread 0
#    resume_job_<filename>_out_t01.txt # output file for resume job of gmp-ecm in thread 1
#    ...etc...
#    resume_job_<filename>_finished.txt # file where we write out each resume line that we have finished with gmp-ecm
#    where <filename> is based on the resume file name, but with any "." characters replaced by a dash.
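
Points 2 and 5 of the description, plus the file naming scheme, can be sketched roughly as follows. This is not the code from ecm.py. The function names are hypothetical, and two details are assumptions: that the first total%num_threads files are the ones that get the extra line, and that each file receives a contiguous run of lines.

```python
def unique_lines(lines):
    """Drop blank lines and exact duplicates, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def split_resume_lines(lines, num_threads):
    """Give each thread total // num_threads lines; the first total % num_threads get one more."""
    base, extra = divmod(len(lines), num_threads)
    chunks, start = [], 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def job_file_name(resume_file, kind, thread):
    """Build e.g. resume_job_resume-txt_inp_t00.txt; kind is "inp" or "out"."""
    stem = resume_file.replace(".", "-")
    return "resume_job_%s_%s_t%02d.txt" % (stem, kind, thread)


# Placeholder strings stand in for real resume lines.
lines = unique_lines(["a", "b", "a", "c", "d", "e", "f", "g", "h", "i", "j"])
for t, chunk in enumerate(split_resume_lines(lines, 3)):
    print(job_file_name("resume.txt", "inp", t), len(chunk))
```

Writing one input file per thread, rather than one per resume line, is what keeps the number of files small compared with the v0.37 approach described below.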
I know this skips over v0.37. I did create a version 0.37 with similar functionality, but it put each resume line into its own file (one at a time, not all at once), gave that input file to gmp-ecm to resume, and saved the output to another file. Once that resume line had finished processing, it would delete both the input and output files and then move on to the next resume line. So, if a resume file had 1000 lines to resume, the script would create/delete 1000 input files and 1000 output files. I didn't want to tax any filesystems by creating/deleting so many files, so I rewrote it as detailed above.
Attached Files
 ecm-py_v0.38.zip (22.7 KB, 168 views)

2016-07-10, 05:29   #93
wombatman
I moo ablest echo power!

May 2013

3·5·7·17 Posts
This is AWESOME. Thanks for making this update.

2016-07-14, 12:45   #94
swellman

Jun 2012

2⁴×7×29 Posts
+1 Fantastic functionality. Love it!

2016-08-03, 06:14   #95
UBR47K

Aug 2015

44₁₆ Posts
Is there any way to specify a B1 value when using the "-resume" switch? I'd like to use GMP-ECM for stage 2 with Prime95 stage 1 results.txt

2016-08-05, 23:46   #96
cgy606

Feb 2012

3²×7 Posts
I tried running the script on a resume file produced from a gpu stage 1 run. I am getting the following error:

python ecm.py -threads 8 -resume gpu.save
-> ___________________________________________________________________
-> | Running ecm.py, a Python driver for distributing GMP-ECM work |
-> | on a single machine. It is copyright, 2011-2016, David Cleaver |
-> | and is a conversion of factmsieve.py that is Copyright, 2010, |
-> | Brian Gladman. Version 0.38 (Python 2.6 or later) 7th Jul 2016 |
-> |_________________________________________________________________|

-> Resuming work from resume file: gpu.save
-> Spreading the work across 8 thread(s)
->=============================================================================
-> Working on the number(s) in the resume file: gpu.save
-> Using up to 8 instances of GMP-ECM...
-> Found 1024 unique resume lines to work on.
-> Will start working on the 1024 resume lines.
Traceback (most recent call last):
  File "ecm.py", line 2393, in <module>
    parse_ecm_options(sys.argv, set_args = True, first = True)
  File "ecm.py", line 2235, in parse_ecm_options
    run_ecm_resume_job()
  File "ecm.py", line 1850, in run_ecm_resume_job
    threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
NameError: name 'xrange' is not defined

Any ideas about what is going wrong?

2016-08-06, 00:19   #97
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5071₁₀ Posts
Looks like you didn't give B1 or B2 parameters to ecm.py. When I do stage 2 from a GPU'ed stage 1, I put on the command line the same B1 value I ran Stage 1 on (note you can put a higher one here, and it'll use the CPU to extend B1 before starting stage 2).

2016-08-06, 00:28   #98
cgy606

Feb 2012

3²×7 Posts

Quote:
 Originally Posted by VBCurtis
Looks like you didn't give B1 or B2 parameters to ecm.py. When I do stage 2 from a GPU'ed stage 1, I put on the command line the same B1 value I ran Stage 1 on (note you can put a higher one here, and it'll use the CPU to extend B1 before starting stage 2).
The command line input that the ecm.py creator posted didn't indicate a B1 or B2 value. I tried adding the B1 and B2 values at the end of the command line, but it had no effect:

python ecm.py -threads 8 -resume gpu.save 11e6 35133391030
-> ___________________________________________________________________
-> | Running ecm.py, a Python driver for distributing GMP-ECM work |
-> | on a single machine. It is copyright, 2011-2016, David Cleaver |
-> | and is a conversion of factmsieve.py that is Copyright, 2010, |
-> | Brian Gladman. Version 0.38 (Python 2.6 or later) 7th Jul 2016 |
-> |_________________________________________________________________|

-> Resuming work from resume file: gpu.save
->=============================================================================
-> Working on the number(s) in the resume file: gpu.save
-> Using up to 8 instances of GMP-ECM...
-> Found 1024 unique resume lines to work on.
-> Will start working on the 1024 resume lines.
Traceback (most recent call last):
  File "ecm.py", line 2393, in <module>
    parse_ecm_options(sys.argv, set_args = True, first = True)
  File "ecm.py", line 2235, in parse_ecm_options
    run_ecm_resume_job()
  File "ecm.py", line 1850, in run_ecm_resume_job
    threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
NameError: name 'xrange' is not defined

Last fiddled with by cgy606 on 2016-08-06 at 00:29

2016-08-06, 01:18   #99
UBR47K

Aug 2015

2²×17 Posts
Try running with Python 2. That error happens when you try to run the script with Python 3.
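
For anyone who cannot switch interpreters, a common workaround for this particular NameError is a small compatibility shim near the top of the script. The sketch below is not part of ecm.py v0.38. It only addresses `xrange`, and the script may well contain other Python 2 only constructs that would still fail. The value of `intNumThreads` is a stand-in for "-threads 8".

```python
# Python 3 removed xrange; its range is already lazy, so an alias is enough.
try:
    xrange            # exists on Python 2
except NameError:
    xrange = range    # Python 3 fallback

intNumThreads = 8     # stand-in value, matching "-threads 8" in the post above

# The line from the traceback now works on either interpreter.
threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
print(len(threadList))
```

Running the unmodified script under Python 2, as suggested, remains the simpler route.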
