mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2010-12-01, 21:57   #1
tal
 
Nov 2010

916 Posts
Talking Factored my first SemiPrime -Some Questions

Hey all - I have some beginners questions about the use of msieve in conjunction with ggnfs, hopefully I put this in (mostly) the right place.

The idea of factoring some large numbers appealed to me, but I know it's a bit more complicated than just running a script - so I've been reading up on it a lot lately. I followed the guide and successfully factored RSA-100, which was really exciting - but I have some questions so I actually understand what I was doing besides just doing the magical incantations. I realize some of this process might be overkill when talking about a 100-digit semiprime, but I want to get the process right for something small, so when I move on to larger numbers I can understand it all.

For the record, I'm running on a Linux box, 2GB RAM, Dual-Core Athlon64 4200+ with Hyperthreading enabled, in 32 bit (x86) - yes that's not a typo. When I installed the OS I made the mistake of not doing x64, and rebuilding the toolchain in gentoo is impossible, so I will probably wipe and reinstall soon, but for now it's 32 bit. I intend to do benchmarking to compare with and without Hyperthreading, x32 and x64, how optimal and suboptimal L1_BITS affects it... I have two GeForce 480's in my desktop (running Win7) to do CUDA testing on, and I have some more boxes I can borrow or turn on to do distributed sieving.

Part 1: Polynomial Selection

Selecting a good polynomial reduces the time spent sieving, and there are tests to determine what a 'good' polynomial is - the one I see mentioned a lot being the Murphy test. And I saw people comparing polynomials using a Murphy Score, which was of the form 2.64e-12... here are some selections from rsa100.dat.p:
...
# norm 7.551775e-14 alpha -4.695883 e 1.040e-08 rroots 2
skew: 1607312.30
c0: -15717476569555150190764775856
c1: 9064072593136772886192
c2: 29855101586021
c3: 1661649458
c4: 7320
Y0: -675330080117467742399765
Y1: 80844668967923
# norm 5.777448e-14 alpha -4.550075 e 9.261e-09 rroots 0
skew: 1995747.27
c0: 37257076699243727116282745836
c1: -3080766778179746568716
c2: -21401399698772505
c3: 1237242768
c4: 8316
Y0: -654133332715871134713393
Y1: 77694547530973
...
I recognize c0-c4 are the coefficients used in the polynomial, and I assume that Y0 and Y1 are more polynomial search parameters that are way over my head. But what are norm, e, alpha, and rroots (real roots?)? Is e the Murphy Score? It goes up and down over the course of the search - I thought it would continually get better. (Why keep track of polynomials that are worse than the one you have?) Here is the log for rsa-100.dat.p

Later on, I'd like to run multiple polynomial selections in parallel over different ranges and choose the best. How can I objectively compare two polynomials to see which is 'better'? Is it only worth comparing Murphy Scores?

Part 2: Sieving

After polynomial selection ran for about an hour (longer than the .35 hours it promised) I killed it, and ran the script to start sieving. It estimated about 4M relations needed, and kicked off four jobs at once to compute them. It started q at q = 900000. I understand Q is the range over which we are searching for relations, and the most distrubutable part of the operation, and it seems that you find more relations at lower Q values - but I have a few academic questions.

Why did it start Q at 900K? Similarly in over in this thread Q is started at 20M, and here, Q=30M. How do you determine a good Q-start-value?

Related, I have no idea what the -M 1 parameter means. I know -M 0 means start on the "algebraic side of mpqs treatment" and -M 1 means the rational side - but I haven't the faintest what that _actually_ means.

I also don't understand the difference between the gnfs-lasieve4IXXe binaries. 11 through 16, they seem to be suited for different jobs. I've seen one thread talk about using 15e, another 16e. Looking at the code of ./factmsieve.py, it seems each is for a different range of digits in the composite (when working with the gnfs, I'm not too interested in the sfns):

- <95 = lasieve4I11e
- 96-110 = lasieve4I12e
- 111-140 = lasieve4I13e
- 141-158 = lasieve4I14e
- 158-185 = lasieve4I15e
- >185 = lasieve4I16e

If that is correct, what is the (approximate) difference between them? I suppose as far as telling when it is better to take a step up (or down) - it would just do test sieving and see which performs faster?

So I sieved and I sieved and it didn't take very long, by the time Q got to 1.3M, I had 4778012 relations, 116.7% of the estimated minimum (4095000). Time to move on.

Part 3: Linear Algebra

Once upon a time I took a computer vision course, things like edge detection and pattern recognition, and movement prediction - without taking linear algebra. I'm proud (or ashamed) to say that after me they made Linear Algebra a prequisite for the course. So I'm not too hot at it.

Tue Nov 30 15:43:12 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc1
Tue Nov 30 15:47:24 2010 LatSieveTime: 2859.37
Tue Nov 30 15:47:24 2010 -> Running matrix solving step ...
Tue Nov 30 15:47:24 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc2
Tue Nov 30 15:55:16 2010 -> Running square root step ...
Tue Nov 30 15:55:16 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc3
Tue Nov 30 15:57:30 2010 -> Computing 1.29115e+09 scale for this machine...
Tue Nov 30 15:57:30 2010 -> procrels -speedtest> PIPE
Tue Nov 30 15:57:46 2010 -> Factorization summary written to g100-rsa100.txt
Step -nc1 has to do with checking the relations. Can this be done incrementally as sieve jobs complete? Or must it be run on the entire sieve results? Can it be run, and the results discarded, to check the validity of a sieve job, as a sanity check? If possible, can you explain what it's actually *doing*?

Anway, I was so excited when I saw the news - my primes were ready to come out of the oven.

$ cat g100-rsa100.txt
Number: rsa100
N = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 (100 digits)
Divisors found:
Version: Msieve-1.40
Total time: 2.93 hours.
Factorization parameters were as follows:
n: 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
Y0: -871428731799383793275219
Y1: 81811708972999
c0: 22384488491775383738875545270
c1: -8624944207655121457719
c2: -16162116011420596
c3: 3616907164
c4: 2640
skew: 2095497.31
type: gnfs
Factor base limits: 1800000/1800000
Large primes per side: 3
Large prime bits: 26/26
Sieved algebraic special-q in [0, 0)
Total raw relations: 4778012
Relations:
Pruned matrix :
Polynomial selection time: 0.00 hours.
Total sieving time: 2.93 hours.
Total relation processing time: 0.00 hours.
Matrix solve time: 0.00 hours.
time per square root: 0.00 hours.
Prototype def-par.txt line would be: gnfs,99,4,58,1500,0.003,0.4,220,15,10000,2000,1800000,1800000,26,26,48,48,2.5,2.5,100000
total time: 2.93 hours.
I have no primes, no primes for me =( I had no idea what went wrong.

But I typed up nearly this whole post, and finally found them!

$ tail rsa100.log.mpi00 -n 3
Tue Nov 30 15:57:30 2010 prp50 factor: 37975227936943673922808872755445627854565536638199
Tue Nov 30 15:57:30 2010 prp50 factor: 40094690950920881030683735292761468389214899724061
Tue Nov 30 15:57:30 2010 elapsed time 00:02:14
My primes! I haven't the faintest why the python script failed to give the result of all my (okay not really 'my') hard work, but I found them.

So I have a lot of academic questions wrapped up in this narrative (I tried to make them stand out), but I also wanted to thank everyone (especially jasonp) profusely for all the hard work they've put in to making these tools - it's really exciting.
tal is offline   Reply With Quote
Old 2010-12-02, 02:18   #2
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

For the poly selection, Y0 and Y1 are the coefficients of the rational polynomial. The other numbers attempt to measure how good the polynomial will be if you sieved with it; the Murphy E score is the most accurate, and the most expensive to compute. It was basically invented so that two polynomials could be directly compared by their E values. The reason the code prints out large numbers of polynomials is that the E value is not a perfect measure of the potential in a polynomial, and for larger problems there will be many polynomials that have very similar E values. So for the largest problems you have to choose the best polynomial via test sieving, and the E score comparison is there to reduce the amount of test sieving you have to do. I agree that it's more helpful to print out, say, the top 10 polynomials, but you also don't want the code to print nothing for weeks until it's done. Right now any polynomial whose E value exceeds a given (fairly stringent) bound is printed.

The linear algebra is divided into two halves; -nc1 does the first and -nc2 does the second. Both require all the relations to be available at once. The first half is the 'filtering' phase, which throws away relations that will not help complete the factorization, and combines the rest into small groups that make the resulting matrix much smaller. The actual linear algebra then selects a very specific collection of (about half of) those groups that allow the factorization to proceed. I'm afraid the details here are complicated.

Congrats on getting started!
jasonp is offline   Reply With Quote
Old 2010-12-02, 03:21   #3
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Full (very long) quote removed
Please only quote what you are replying to


Would you like references? If you really want to learn about the algorithm,
I can provide them. Otherwise, it will probably remain just a black box.
And I would be happy to answer any mathematical questions you might
have about the algorithm.

And you should start by learning about QS and how it works. Trying to
understand NFS before studying QS will be nigh to impossible unless
you have a very strong math background. (advanced undergrad with at
least one course in algebraic number theory or abstract algebra; do you
know what a prime ideal is?)

An excellent general discussion can be found in the Crandall & Pomerance
book: Prime Numbers; A Computational Perspective. It will also give the
number theory background material that is needed.

Last fiddled with by smh on 2010-12-16 at 20:38 Reason: eliminate some redundancy
R.D. Silverman is offline   Reply With Quote
Old 2010-12-02, 03:27   #4
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

3·1,993 Posts
Default

tal -- For what it's worth, Silverman here is a great resource on this subject, but he doesn't like beginner questions. :) He's a real expert on the NFS and quite generous with his time.

Quote:
Originally Posted by R.D. Silverman View Post
An excellent general discussion can be found in the Crandall & Pomerance
book: Prime Numbers; A Computational Perspective. It will also give the
number theory background material that is needed.
Yes, this is an excellent introduction to the subject.

Quote:
Originally Posted by R.D. Silverman View Post
And you should start by learning about QS and how it works. Trying to
understand NFS before studying QS will be nigh to impossible unless
you have a very strong math background. (advanced undergrad with at
least one course in algebraic number theory or abstract algebra; do you
know what a prime ideal is?)
I'd call it impossible, frankly, without either knowing the QS or having at least that much background. Even with that background I'd certainly learn the QS first.

Last fiddled with by CRGreathouse on 2010-12-02 at 03:28
CRGreathouse is offline   Reply With Quote
Old 2010-12-02, 03:41   #5
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
tal -- For what it's worth, Silverman here is a great resource on this subject, but he doesn't like beginner questions. :) .
On the contrary. I respond quite well to questions. What I don't like
is beginner assertions and presumptions. I also openly admit to intolerance
toward "willful ignorance": those the claim to be interested, but are unwilling
to learn.
R.D. Silverman is offline   Reply With Quote
Old 2010-12-02, 05:18   #6
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

135338 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
On the contrary. I respond quite well to questions. What I don't like
is beginner assertions and presumptions. I also openly admit to intolerance
toward "willful ignorance": those the claim to be interested, but are unwilling
to learn.
I think you respond extremely well to questions from those who have appropriate background and who have done appropriate preparation. You have in the past faired much less well with questions from beginners (though, I grant, not uniformly poorly). My comment was meant only as guidance to the OP: learn your material, then when you have a decent understanding come to you with questions.

CRGreathouse is offline   Reply With Quote
Old 2010-12-02, 14:26   #7
Brian Gladman
 
Brian Gladman's Avatar
 
May 2008
Worcester, United Kingdom

72×11 Posts
Default

Hi Tal,

With reference to your 'missing primes' when using my factmsieve.py script, I have had bug reports indicating that the script sometimes fails to run the square root step for no apparent reason. If the final summary file is deleted and the script is then run again, it then completes the factorisation. I have not been able to debug this as I have not seen the behaviour.

But I have now spent some time trying to find what might cause this and I think that I have may have found the issue. It seems to be caused by some Python versions becoming confused about indentation when a script contains mixed tab/space indents.

I hope the attached version (v0.76) corrects this problem. This file now uses only spaces (no tabs) and is a Windows convention file with CRLF line endings (if it is run on *nix, it _might_ need to be converted for *nix line endings).

I have also made a few other minor changes to improve the scripts termination behaviour. Please report any issues here.

Brian
Attached Files
File Type: zip factmsieve.76.zip (19.1 KB, 190 views)
Brian Gladman is offline   Reply With Quote
Old 2010-12-02, 14:51   #8
tal
 
Nov 2010

32 Posts
Default

R.D. - I certainly don't want to waste your time or patience, so I'll treat it more as a black box for now, brush up on my background and read up about the Quadratic Sieve for now. I think I understand enough right now to go through factoring larger numbers, and benchmarking each step should give me an idea going forward if things are taking more or less time than expected, and if I should start experimenting with other options.

Brian - I was running v0.74 on a Linux box, under Python 2.6.5 - I run gentoo, so everything, including python, is compiled from source. I'll take a look at the newest version and hopefully time will permit me to run through the whole process again today and see what might have gone awry.
tal is offline   Reply With Quote
Old 2010-12-02, 15:15   #9
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

3·1,993 Posts
Default

Quote:
Originally Posted by tal View Post
CRGreathouse is offline   Reply With Quote
Old 2010-12-02, 15:40   #10
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

11101001001002 Posts
Default

Quote:
Originally Posted by tal View Post
R.D. - I certainly don't want to waste your time or patience, so I'll treat it more as a black box for now,
I'm not trying to be mean. (But many people might believe otherwise).
But you can't ask any meaningful questions until you have at least minimal
understanding of how the algorithms work.

Even something simple as a reply to your question about the meaning
of Y0 and Y1 in the inputs will not make much sense unless you have
some understanding of the algorithm itself. Similarly for other
parametric inputs.


There are some simplified explanations that you can find for both QS
and NFS on the web. I will give you one of my favorite mantras:
"Google is my friend"

What is your CS background? Have you had a course in algorithms?
You say that you have taken Linear Algebra. Have you taken any
other math courses?
R.D. Silverman is offline   Reply With Quote
Old 2010-12-02, 16:01   #11
tal
 
Nov 2010

32 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
Even something simple as a reply to your question about the meaning
of Y0 and Y1 in the inputs will not make much sense unless you have
some understanding of the algorithm itself. Similarly for other
parametric inputs.
Completely reasonable.


Quote:
Originally Posted by R.D. Silverman View Post
There are some simplified explanations that you can find for both QS
and NFS on the web. I will give you one of my favorite mantras:
"Google is my friend"

What is your CS background? Have you had a course in algorithms?
You say that you have taken Linear Algebra. Have you taken any
other math courses?
I have a strong CS background - compilers, databases, esoteric C techniques, networking, data structures, and algorithms (asymptotic notation included of course) - reading, editing, or compiling the code doesn't scare me in the least. I've been reading and writing code for the past 6 years full time at both work and home. It's my math background that's lacking. I did not take linear algebra - nor much Calculus (neither Differential Equations nor Multivariable). I took a Discrete Math for Cryptography course a few years ago that I loved -I recently wanted to relearn the details of ElGamal encryption, so I spent the time relearning the basics of Group Theory and by the time I had a solid grasp on that, the actual ElGamal part was obvious.

I'm not scared to start at QS, and work my way backwards googling and reading, ingesting each building block as I come across it - it's just going to take some time.

Last fiddled with by tal on 2010-12-02 at 16:04 Reason: clarification
tal is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Semiprime and n-almost prime candidate for the k's with algebra for the Sierpinski/Riesel problem sweety439 sweety439 11 2020-09-23 01:42
Generating a uniform random n-bit semiprime danaj Programming 17 2017-09-15 16:41
Factored vs. Completely factored aketilander Factoring 4 2012-08-08 18:09
2,709+ factored JHansen Factoring 47 2005-06-29 17:59
RSA-200 factored xilman Factoring 23 2005-06-02 03:24

All times are UTC. The time now is 16:23.


Mon Oct 25 16:23:40 UTC 2021 up 94 days, 10:52, 0 users, load averages: 1.44, 1.36, 1.31

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.