20130223, 11:15  #1  
Feb 2013
7 Posts 
General Questions
Hi,
If any of you have some spare time it would be great if you could answer some of my beginner questions :) I've been looking into factoring for the last few days  I'm hoping to eventually build a rig to factor C15x. The way I understand it at the moment is that there are a few steps to doing this. The first is using msieve to generate some "good" polynomials. So to start with I split up msieve and made it distributed so I can run it in Amazon EC2  this seems to work well and I get loads of results. How do I determine what is the best polynomial? e.g. I've read I have to look at the "e" value but are these ok for a C15x? Quote:
Thanks P. 

20130223, 13:52  #2 
Tribal Bullet
Oct 2004
110111010000_{2} Posts 
Those polynomials are pretty good, but you can probably find polynomials that make the sieving run 20% faster if you spent a bit longer looking. Of course if you have EC2 instances waiting to sieve, and the sieving will take at most a few days, then it's better to take what you have and get started. Beyond an illdefined point, the time you save in sieving a better polynomial will be wasted in the search for that better polynomial.
You want the polynomial that sieves the fastest, and that's loosely correlated to the polynomial with the largest Evalue. In practice you have to 'test sieve', i.e. run the sieving tools for maybe 1/1000 of the complete job, and find which of your top 25 polynomials generates the highest relationspersecond rate with all the other sieving parameters held constant. Folks here have a lot of practice doing that, I don't. Again, with EC2 instances waiting it's less important to find the absolute best polynomial, because the job will be so short anyway. Explaining what exactly the number field sieve does is tricky, the references in the Msieve readme do a much better job than a forum post will. Each relation from the sieving is a congruence a == b, and we multiply millions of those congruences together so that the product of the a's and the b's are squares, but the square root of each product is different modulo the number N to be factored. That they are different means that gcd(sqrt(product(a))+sqrt(product(b)), N) is a factor of N about half the time. The quadratic sieve is easier to understand, because each relation has a and b as integers at all times. The number field sieve is both much more powerful and much more difficult to understand, because the 'b' part of each relation is not a number but a 'thing' in an algebraic number field. At the end of the algorithm, before the gcd, we apply a 'trap door' that converts the square root of the product of all the b's from thingworld into a number, and that number is a usable square root. This is more powerful because the size of each 'thing' is smaller and easier to deal with on average, compared to working in numbers all the time. The polynomial you choose determines the number field, and the shape of the trap door, and the size of the things in thingworld. The sieving searches through the polynomials, evaluated at x = c/d for lots of different c and d, looking for polynomial values that factor into small primes. Those become the relations that are raw material for the square roots, and the matrix that is built helps determine which relations participate in the product of the a' and b's. Last fiddled with by jasonp on 20130223 at 14:22 Reason: lots of corrections 
20130223, 17:01  #3 
Sep 2009
977 Posts 
For 512bit semiprimes, I've already seen polynomials with E values above 3.2e12, so you would indeed get a better polynomial by having your CPUs (or preferably, GPUs, which dwarf CPUs in throughput on that task).
Maybe you're trying to factor a C15x with x a bit higher than 4 or 5, in which case E ~2.7e12 is good. With recent processors, one can sieve a 512bit semiprime on 32 cores in less than 4 days wall clock time, and postprocess it in less than half a day. 
20130223, 17:32  #4 
Sep 2009
2,027 Posts 
I suggest you go to http://gilchrist.ca/jeff/factoring/n...ers_guide.html and follow the instructions to factor a 100 digit number. Seeing the whole process through with an easy case will make it easier to understand.
You may want to try 110, 120, 130 and 140 digit numbers before starting on 15x digit numbers. Chris (still a relative beginner) 
20130224, 05:47  #5 
Feb 2013
7 Posts 
Thanks for all the replies, they've made things a lot clearer :) I managed to find E ~3.06e12 for a C159 using mseive  I converted it to a .poly file for use with gnfs. I'm now working on a script to get gnfs distributed over EC2 using spot instances. As it stands, i'm looking at ~16 days on a CoreI7 (8 cores) for the sieving  hopefully once I have it working on EC2 I can bring this down to ~1 hour for under $500 :D

20130224, 06:18  #6 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
1110000110101_{2} Posts 
Yeesh, that's a lot of money for a single factorization! Depending on your motives, some of us might have (or maybe will) help out with the hardware we already have...
Last fiddled with by Dubslow on 20130224 at 06:19 Reason: s/in/out (first time i've made that mistake...) 
20130226, 08:38  #7 
Feb 2013
7 Posts 
I've used 60 Amazon "Spot" instances (32 core Xeons)  a total of 1920 cores  this bought the second stage down to 2 hours and a total cost of $40. :)

20130226, 09:47  #8 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
41×229 Posts 
Can you take the matrix step down to 2 hours, though?

20130226, 18:21  #9 
Sep 2009
3D1_{16} Posts 
Heh, what kind of software setup are you using to distribute jobs on so many cores ?

20130227, 00:01  #10 
Feb 2013
7 Posts 
Im using 2 scripts I wrote, 1 sits on a sever (PHP) and distributes work via an API that a client (Python) requests. I've setup the spot instances so as soon as they boot they request work. The work is basically a set of arbitrary commands that the client runs (32 cores means 32 separate clients)  work is then sent back to the server and it requests more work. Its pretty much just a version of the factmsieve.py script that runs over HTTP  crude but effective.
Costs for the EC2 spot instances (2x16 core xeons) are around 37 us/cents per hour which is quite reasonable. I'm sending the work out in 10 minute chunks (in case the spot instance is terminated  however, this didn't happen at all during the processing)  I've not looked at distributing the later stages of it yet, i'll probably look at that tonight. Once I've got a bunch of scripts that are relatively stable I'll stick them up on github. 
20130227, 07:01  #11  
Sep 2009
977 Posts 
Quote:
Several months ago, I had precisely started making a similar, simple HTTPbased reservation system for distributing the factorization of a 512bit semiprime more easily then setting up a local instance of RSALSderived BOINC (NFS@Home, Tom Ritter's cloudandcontrol) and attaching clients to it. I had the first version of a server (Perl) mostly done: it could handle reservations, submissions (files not uploaded to the server, for the first version) and it kept a log that it used for correctly resuming distribution after the last distributed WU. I never finished the client, because it was, in the end, easy enough to make sh scripts that called gnfslasieve4I14e directly and shuffle the files around through scp / sftp. I had "only" 4042 cores on 8 hosts or so. Quote:
You'll be doing everyone a favor. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Is this solvable in general.  jwaltos  Homework Help  34  20170921 14:12 
General Status???  R.D. Silverman  NFSNET Discussion  4  20070719 18:43 
General formula  pacionet  Miscellaneous Math  15  20051208 08:00 
some questions of mine, in general  jerico2day  Software  5  20050330 09:19 
general Mersenne  Val  15k Search  10  20040313 20:56 