mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2015-08-14, 14:15   #1
mfeltz
 
mfeltz's Avatar
 
Aug 2015
Virginia, US

38 Posts
Default Combining Msieve with CADO NFS

I am getting into GNFS factoring through work and have been reading a lot on the subject but still have a long way to go so I apologize in advance for n00b questions.

Through my reading I have found that CADO NFS seems to have state of the art poly select and sieving code, while msieve seems to have the best filtering code. I can't tell which one has better linear algebra (CADO uses Block Wiedemann and Msieve of course uses Block Lanczos).

What I would like to do is perform poly select and sieving through CADO and then finish with Msieve. The CADO readme.msieve file has some directions on how to conjoin the two:

Quote:
II) Using msieve filtering and linear algebra with relations in CADO format

1) Create a file msieve.fb, which contains:

N <number to be factored>
R0 <coeff of x^0, rational side>
R1 <coeff of x^1, rational side>
A0 <coeff of x^0, algebraic side>
A1 <etc>
A2 <etc>
A3 <etc>
A4 <etc>
A5 <etc>

This can be done with:

$ ./convert_poly -of msieve < cxxx.poly > msieve.fb

2) create a file msieve.dat, which contains:

N <number to be factored>
<all the relations in GGNFS/CADO format>

(Do not include free relations, since the CADO-NFS format is not
recognized by msieve, and msieve includes them in the filtering.)

3) then run "msieve -nc -v <number to be factored>"
The msieve output goes into msieve.log.
You can add the "-t 4" option to use 4 threads.
My questions are: What is the msieve.dat file supposed to look like? Does anyone know which output file(s) from CADO contain(s) just the non-free relations? Has anyone else tried to do this? Does what I am doing make any sense whatsoever?
mfeltz is offline   Reply With Quote
Old 2015-08-14, 15:53   #2
xilman
Bamboozled!
 
xilman's Avatar
 
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across

246768 Posts
Default

Quote:
Originally Posted by mfeltz View Post
Through my reading I have found that CADO NFS seems to have state of the art poly select and sieving code, while msieve seems to have the best filtering code.
My personal experience, using each of them to factor C15x with GNFS running on my particular hardware set-up, is that GGNFS has the faster siever. A properly set up msieve installation will use that siever. CADO-NFS may well have the better polynomial finder but if you have a GPU only msieve will use it for the moment. Msieve's GPU poly finder is perhaps 100 times as fast as its CPU-only version and that's a big head start for CADO-NFS to overcome. CADO-NFS also appears to be the winner in the linear algebra department but I can't give you hard figures on that one. Where CADO really shines is the ease with which the polynomial selection and sieving phases can be split over multiple networked systems. It can be done with msieve but, frankly, it's a PITA.

Bear in mind that all of the above applies to my set-up factoring small integers. The opinions might be quite different if you want to use hundreds of machines to factor >C200 composites.
xilman is offline   Reply With Quote
Old 2015-08-19, 16:17   #3
mfeltz
 
mfeltz's Avatar
 
Aug 2015
Virginia, US

3 Posts
Default

Thank you for the feedback. I am actually doing CADO on a cluster at the moment and it does indeed shine in that department. I have made it further along in the process and can now ask a more refined question. Specifically I have the purge.purged output from CADO which I believe is what I need to feed into msieve as the relations file. Here are some example lines:

ae1de,17d:c,73,f7b,fbd,6da1,4,a,13,85,2b0,724,863,193e,8c6c
-aee1ef,13:0,c,23,3a,531,56a,2b95,9,91,288,520,e83,18b0,4074,8c6c
fd9c,409:c,10b,1cc,5cb,56d1,2,6,9,30,16b,186,566,ff2,1b97,8c6c
313019,1e0:22,ec,1cf,537,16b6d,4,8,11,13,16,1b,33,39,15b,9ff,8c6c,11fd3
32e4f3,1034:32,61a,702,bf5,1496,6,a,29,40,68,21c,1389,2848,2cb0,8c6c

However after doing some more reading on this forum and looking at the msieve source code, it appears that the first two elements on each line (before the colon) are supposed to be digits. They appear if anything to be in hex format. Here is the msieve source code that leads me to believe they should be digits:

/* read the relation coordinates */

a = strtoll(buf, &next_field, 10);
tmp = next_field;
if (tmp[0] != ',' || !isdigit(tmp[1]))
return -1;

When I attempt to read the file, msieve gives me the following error for every relation:

error -1 reading relation 21571

Am I using the wrong file?
mfeltz is offline   Reply With Quote
Old 2015-08-19, 18:07   #4
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

2×29×61 Posts
Default

The purge file format was changed after the msieve parser was written, and I haven't had the time to adapt to the new format.
jasonp is offline   Reply With Quote
Old 2015-08-19, 19:12   #5
mfeltz
 
mfeltz's Avatar
 
Aug 2015
Virginia, US

3 Posts
Default

Okay thanks for letting me know Jason. I think I will try my hand at updating the parser. If you have time to reply again and have any hints as to how to adapt it for the new format I would much appreciate it.
mfeltz is offline   Reply With Quote
Old 2015-12-01, 04:30   #6
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

478910 Posts
Default

Since I still haven't gotten CUDA properly set up under Linux, I'm game to try the CADO poly-select tools. I've gotten CADO 2.1.1 built, and have run the sample factorization to confirm things work.

I then edited the sample parameters file to start poly select on a C166 I have on my todo list. Two questions:
1. What is the P parameter in poly select?
2. How does one tell CADO to only run poly select? I would like to give it a couple CPU-days to poly-select, then compare to a GPU-msieve poly via test sieving, but I haven't figured out how to prevent it from proceeding to sieve the poly it chooses.

I plan to have it root-opt the 200 best polys from a ~100 thread-hour first pass. Anyone with CADO experience wish to comment on that plan?
VBCurtis is offline   Reply With Quote
Old 2015-12-01, 13:36   #7
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

18F016 Posts
Default

Quote:
Originally Posted by mfeltz View Post
Thank you for the feedback. I am actually doing CADO on a cluster at the moment and it does indeed shine in that department. I have made it further along in the process and can now ask a more refined question. Specifically I have the purge.purged output from CADO which I believe is what I need to feed into msieve as the relations file. Here are some example lines:

ae1de,17d:c,73,f7b,fbd,6da1,4,a,13,85,2b0,724,863,193e,8c6c
-aee1ef,13:0,c,23,3a,531,56a,2b95,9,91,288,520,e83,18b0,4074,8c6c
fd9c,409:c,10b,1cc,5cb,56d1,2,6,9,30,16b,186,566,ff2,1b97,8c6c
313019,1e0:22,ec,1cf,537,16b6d,4,8,11,13,16,1b,33,39,15b,9ff,8c6c,11fd3
32e4f3,1034:32,61a,702,bf5,1496,6,a,29,40,68,21c,1389,2848,2cb0,8c6c

However after doing some more reading on this forum and looking at the msieve source code, it appears that the first two elements on each line (before the colon) are supposed to be digits.
Also, it looks as if the numbers appearing in the rest of the line from CADO are indices (written out in hex) into an array of small primes, rather than the primes themselves, and it's not quite clear where the space between rational-side and algebraic-side factorisations is; so you may need to re-factorise the values of the algebraic and rational polynomials at X=0xae1de Y=0x17d.
fivemack is online now   Reply With Quote
Old 2016-03-16, 06:53   #8
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Since I still haven't gotten CUDA properly set up under Linux, I'm game to try the CADO poly-select tools. I've gotten CADO 2.1.1 built, and have run the sample factorization to confirm things work.

I then edited the sample parameters file to start poly select on a C166 I have on my todo list. Two questions:
1. What is the P parameter in poly select?
2. How does one tell CADO to only run poly select? I would like to give it a couple CPU-days to poly-select, then compare to a GPU-msieve poly via test sieving, but I haven't figured out how to prevent it from proceeding to sieve the poly it chooses.

I plan to have it root-opt the 200 best polys from a ~100 thread-hour first pass. Anyone with CADO experience wish to comment on that plan?
What did you figure out? I'm trying the same thing you are (on the good old C195 we all know and love).

As far as I can tell, there is no documentation of all the available parameters, nor how to indicate which parameters are irrelevant to what I'm trying to do (I have deliberately left the large prime bounds unset thank you very much, since you won't be doing any sieving. Now stfu and find a poly )

Last fiddled with by Dubslow on 2016-03-16 at 06:58
Dubslow is offline   Reply With Quote
Old 2016-03-16, 15:51   #9
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

4,789 Posts
Default

I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked.

Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
VBCurtis is offline   Reply With Quote
Old 2016-03-16, 16:30   #10
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked.

Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
Yich...

To the mailing list we go!
Dubslow is offline   Reply With Quote
Old 2016-03-16, 21:12   #11
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked.

Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
Did you ever find this documentation? It's in the parameters/factor/params.c90 file.

Code:
###########################################################################
# Polynomial selection task with Kleinjung's algorithm (2008)
###########################################################################

tasks.polyselect.degree = 4             # degree of the algebraic polynomial

#tasks.polyselect.threads = 2
#    # How many threads to use per polyselect process

tasks.polyselect.P = 10000              # choose lc(g) with two prime factors in [P,2P]
    # Setting a large value will most likely find better polynomials,
    # but the number of found polynomials will be smaller.
    # As a rule of thumb, we want at least 100 found polynomials in total
    # (without norm limitation, see below).

tasks.polyselect.admin = 0          # min value for leading coefficient of f(x)
    # If not set, the default is 0.

tasks.polyselect.admax = 150e3      # max value for leading coefficient of f(x)

tasks.polyselect.incr = 60          # increment for leading coefficient of f(x)
    # This factor is usually a smooth number, which forces projective roots in
    # the algebraic polynomial. 60 is a good start, 210 is popular as well.
    # Warning: to ensure lc(f) is divisible by incr, admin should be divisible
    # by incr too.

    # The polynomial selection search time is proportional to the
    # length of the search interval, i.e., (admax-admin)/incr.

tasks.polyselect.nrkeep = 40

tasks.polyselect.adrange = 5000         # size of individual tasks
    # Polynomial selection is split into several individual tasks. The
    # complete range from admin to admax has to be covered for the polynomial
    # selection to complete. The number of individual tasks is
    # (polsel_admax-polsel_admin)/polsel_adrange. Each such task is issued as
    # a workunit to a slave for computation.

tasks.polyselect.nq = 256
    # Number of small primes in the leading coefficient of the linear polynomial
    # Safe to leave at the default value
    # Recommended values are powers of the degree, e.g., 625 for degree 5,
    # or 1296 for degree 6. Here 256 = 4^4 thus the leading coefficient of
    # the linear polynomial will be q1*q2*q3*q4*p1*p2 where q1,q2,q3,q4 are
    # small primes, and P <= p1, p2 < 2*P.
What do lc(f) and lc(g) mean? Presumably f and g are the polynomials?

Edit: Here's another interesting tidbit:
Code:
tasks.I = 11
    # Sieving range in lattice siever
    # The lattice siever sieves over a range in the (i,j) plane which is
    # 2^I times 2^(I-1), to put things simply (some rescaling may change
    # this, but the size of the sieve area remains constant). Increasing
    # I by 1 multiplies the amount of required RAM for the siever by a
    # factor of 4.
    # Note: now I is used also in the polynomial selection (for the computation
    # of Murphy's E value) thus we define it at the "tasks" level, not at the
    # "tasks.sieve" level.
MoreEdit: Apparently they use I=14 all the way up through C190, with the C220 example file at I=15. That's rather different than the ggnfs sievers.

Last fiddled with by Dubslow on 2016-03-16 at 21:26
Dubslow is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
use msieve linear algebra after CADO-NFS filtering aein Msieve 2 2017-10-05 01:52
Combining CPUs Edmond Lounge 11 2017-07-03 16:31
Combining low quality random numbers sources only_human Miscellaneous Math 3 2016-05-20 05:47
how to run msieve or cado-nfs on mpi-cluster? ravlyuchenko Msieve 1 2011-08-16 12:12
Combining NewPGen files? roger Riesel Prime Search 4 2008-01-15 00:01

All times are UTC. The time now is 11:19.

Tue May 18 11:19:36 UTC 2021 up 40 days, 6 hrs, 0 users, load averages: 1.39, 1.71, 1.68

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.