mersenneforum.org Combining Msieve with CADO NFS
 Register FAQ Search Today's Posts Mark Forums Read

2015-08-14, 14:15   #1
mfeltz

Aug 2015
Virginia, US

310 Posts

I am getting into GNFS factoring through work and have been reading a lot on the subject but still have a long way to go so I apologize in advance for n00b questions.

Through my reading I have found that CADO NFS seems to have state of the art poly select and sieving code, while msieve seems to have the best filtering code. I can't tell which one has better linear algebra (CADO uses Block Wiedemann and Msieve of course uses Block Lanczos).

What I would like to do is perform poly select and sieving through CADO and then finish with Msieve. The CADO readme.msieve file has some directions on how to conjoin the two:

Quote:
 II) Using msieve filtering and linear algebra with relations in CADO format 1) Create a file msieve.fb, which contains: N R0 R1 A0 A1 A2 A3 A4 A5 This can be done with: \$ ./convert_poly -of msieve < cxxx.poly > msieve.fb 2) create a file msieve.dat, which contains: N (Do not include free relations, since the CADO-NFS format is not recognized by msieve, and msieve includes them in the filtering.) 3) then run "msieve -nc -v " The msieve output goes into msieve.log. You can add the "-t 4" option to use 4 threads.
My questions are: What is the msieve.dat file supposed to look like? Does anyone know which output file(s) from CADO contain(s) just the non-free relations? Has anyone else tried to do this? Does what I am doing make any sense whatsoever?

2015-08-14, 15:53   #2
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2D8316 Posts

Quote:
 Originally Posted by mfeltz Through my reading I have found that CADO NFS seems to have state of the art poly select and sieving code, while msieve seems to have the best filtering code.
My personal experience, using each of them to factor C15x with GNFS running on my particular hardware set-up, is that GGNFS has the faster siever. A properly set up msieve installation will use that siever. CADO-NFS may well have the better polynomial finder but if you have a GPU only msieve will use it for the moment. Msieve's GPU poly finder is perhaps 100 times as fast as its CPU-only version and that's a big head start for CADO-NFS to overcome. CADO-NFS also appears to be the winner in the linear algebra department but I can't give you hard figures on that one. Where CADO really shines is the ease with which the polynomial selection and sieving phases can be split over multiple networked systems. It can be done with msieve but, frankly, it's a PITA.

Bear in mind that all of the above applies to my set-up factoring small integers. The opinions might be quite different if you want to use hundreds of machines to factor >C200 composites.

 2015-08-19, 16:17 #3 mfeltz     Aug 2015 Virginia, US 310 Posts Thank you for the feedback. I am actually doing CADO on a cluster at the moment and it does indeed shine in that department. I have made it further along in the process and can now ask a more refined question. Specifically I have the purge.purged output from CADO which I believe is what I need to feed into msieve as the relations file. Here are some example lines: ae1de,17d:c,73,f7b,fbd,6da1,4,a,13,85,2b0,724,863,193e,8c6c -aee1ef,13:0,c,23,3a,531,56a,2b95,9,91,288,520,e83,18b0,4074,8c6c fd9c,409:c,10b,1cc,5cb,56d1,2,6,9,30,16b,186,566,ff2,1b97,8c6c 313019,1e0:22,ec,1cf,537,16b6d,4,8,11,13,16,1b,33,39,15b,9ff,8c6c,11fd3 32e4f3,1034:32,61a,702,bf5,1496,6,a,29,40,68,21c,1389,2848,2cb0,8c6c However after doing some more reading on this forum and looking at the msieve source code, it appears that the first two elements on each line (before the colon) are supposed to be digits. They appear if anything to be in hex format. Here is the msieve source code that leads me to believe they should be digits: /* read the relation coordinates */ a = strtoll(buf, &next_field, 10); tmp = next_field; if (tmp[0] != ',' || !isdigit(tmp[1])) return -1; When I attempt to read the file, msieve gives me the following error for every relation: error -1 reading relation 21571 Am I using the wrong file?
 2015-08-19, 18:07 #4 jasonp Tribal Bullet     Oct 2004 32·5·79 Posts The purge file format was changed after the msieve parser was written, and I haven't had the time to adapt to the new format.
 2015-08-19, 19:12 #5 mfeltz     Aug 2015 Virginia, US 3 Posts Okay thanks for letting me know Jason. I think I will try my hand at updating the parser. If you have time to reply again and have any hints as to how to adapt it for the new format I would much appreciate it.
 2015-12-01, 04:30 #6 VBCurtis     "Curtis" Feb 2005 Riverside, CA 13×433 Posts Since I still haven't gotten CUDA properly set up under Linux, I'm game to try the CADO poly-select tools. I've gotten CADO 2.1.1 built, and have run the sample factorization to confirm things work. I then edited the sample parameters file to start poly select on a C166 I have on my todo list. Two questions: 1. What is the P parameter in poly select? 2. How does one tell CADO to only run poly select? I would like to give it a couple CPU-days to poly-select, then compare to a GPU-msieve poly via test sieving, but I haven't figured out how to prevent it from proceeding to sieve the poly it chooses. I plan to have it root-opt the 200 best polys from a ~100 thread-hour first pass. Anyone with CADO experience wish to comment on that plan?
2015-12-01, 13:36   #7
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·7·461 Posts

Quote:
 Originally Posted by mfeltz Thank you for the feedback. I am actually doing CADO on a cluster at the moment and it does indeed shine in that department. I have made it further along in the process and can now ask a more refined question. Specifically I have the purge.purged output from CADO which I believe is what I need to feed into msieve as the relations file. Here are some example lines: ae1de,17d:c,73,f7b,fbd,6da1,4,a,13,85,2b0,724,863,193e,8c6c -aee1ef,13:0,c,23,3a,531,56a,2b95,9,91,288,520,e83,18b0,4074,8c6c fd9c,409:c,10b,1cc,5cb,56d1,2,6,9,30,16b,186,566,ff2,1b97,8c6c 313019,1e0:22,ec,1cf,537,16b6d,4,8,11,13,16,1b,33,39,15b,9ff,8c6c,11fd3 32e4f3,1034:32,61a,702,bf5,1496,6,a,29,40,68,21c,1389,2848,2cb0,8c6c However after doing some more reading on this forum and looking at the msieve source code, it appears that the first two elements on each line (before the colon) are supposed to be digits.
Also, it looks as if the numbers appearing in the rest of the line from CADO are indices (written out in hex) into an array of small primes, rather than the primes themselves, and it's not quite clear where the space between rational-side and algebraic-side factorisations is; so you may need to re-factorise the values of the algebraic and rational polynomials at X=0xae1de Y=0x17d.

2016-03-16, 06:53   #8
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
 Originally Posted by VBCurtis Since I still haven't gotten CUDA properly set up under Linux, I'm game to try the CADO poly-select tools. I've gotten CADO 2.1.1 built, and have run the sample factorization to confirm things work. I then edited the sample parameters file to start poly select on a C166 I have on my todo list. Two questions: 1. What is the P parameter in poly select? 2. How does one tell CADO to only run poly select? I would like to give it a couple CPU-days to poly-select, then compare to a GPU-msieve poly via test sieving, but I haven't figured out how to prevent it from proceeding to sieve the poly it chooses. I plan to have it root-opt the 200 best polys from a ~100 thread-hour first pass. Anyone with CADO experience wish to comment on that plan?
What did you figure out? I'm trying the same thing you are (on the good old C195 we all know and love).

As far as I can tell, there is no documentation of all the available parameters, nor how to indicate which parameters are irrelevant to what I'm trying to do (I have deliberately left the large prime bounds unset thank you very much, since you won't be doing any sieving. Now stfu and find a poly )

Last fiddled with by Dubslow on 2016-03-16 at 06:58

 2016-03-16, 15:51 #9 VBCurtis     "Curtis" Feb 2005 Riverside, CA 13·433 Posts I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked. Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
2016-03-16, 16:30   #10
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

722110 Posts

Quote:
 Originally Posted by VBCurtis I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked. Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
Yich...

To the mailing list we go!

2016-03-16, 21:12   #11
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

722110 Posts

Quote:
 Originally Posted by VBCurtis I never found an answer to 2, but it mattered less once I set P high enough. P indicates the depth to search each leading coefficient- a bit like msieve's seconds-per-A1 setting. I tinkered with choices of P such that cado would cover a range of A1 coefficients roughly as fast as msieve-GPU, but that was done because I was running both at the same time and wanted similar ranges checked. Param settings can be changed mostly-on-the-fly, with a restart of CADO to effect the changes. So, if you tire of stage 1, you can change the coeff range to be what you've already completed to force stage 2, etc.
Did you ever find this documentation? It's in the parameters/factor/params.c90 file.

Code:
###########################################################################
# Polynomial selection task with Kleinjung's algorithm (2008)
###########################################################################

tasks.polyselect.degree = 4             # degree of the algebraic polynomial

#    # How many threads to use per polyselect process

tasks.polyselect.P = 10000              # choose lc(g) with two prime factors in [P,2P]
# Setting a large value will most likely find better polynomials,
# but the number of found polynomials will be smaller.
# As a rule of thumb, we want at least 100 found polynomials in total
# (without norm limitation, see below).

# If not set, the default is 0.

# This factor is usually a smooth number, which forces projective roots in
# the algebraic polynomial. 60 is a good start, 210 is popular as well.
# Warning: to ensure lc(f) is divisible by incr, admin should be divisible
# by incr too.

# The polynomial selection search time is proportional to the

# Polynomial selection is split into several individual tasks. The
# complete range from admin to admax has to be covered for the polynomial
# selection to complete. The number of individual tasks is
# a workunit to a slave for computation.

# Number of small primes in the leading coefficient of the linear polynomial
# Safe to leave at the default value
# Recommended values are powers of the degree, e.g., 625 for degree 5,
# or 1296 for degree 6. Here 256 = 4^4 thus the leading coefficient of
# the linear polynomial will be q1*q2*q3*q4*p1*p2 where q1,q2,q3,q4 are
# small primes, and P <= p1, p2 < 2*P.
What do lc(f) and lc(g) mean? Presumably f and g are the polynomials?

Edit: Here's another interesting tidbit:
Code:
tasks.I = 11
# Sieving range in lattice siever
# The lattice siever sieves over a range in the (i,j) plane which is
# 2^I times 2^(I-1), to put things simply (some rescaling may change
# this, but the size of the sieve area remains constant). Increasing
# I by 1 multiplies the amount of required RAM for the siever by a
# factor of 4.
# Note: now I is used also in the polynomial selection (for the computation
# of Murphy's E value) thus we define it at the "tasks" level, not at the
# "tasks.sieve" level.
MoreEdit: Apparently they use I=14 all the way up through C190, with the C220 example file at I=15. That's rather different than the ggnfs sievers.

Last fiddled with by Dubslow on 2016-03-16 at 21:26

 Similar Threads Thread Thread Starter Forum Replies Last Post aein Msieve 2 2017-10-05 01:52 Edmond Lounge 11 2017-07-03 16:31 only_human Miscellaneous Math 3 2016-05-20 05:47 ravlyuchenko Msieve 1 2011-08-16 12:12 roger Riesel Prime Search 4 2008-01-15 00:01

All times are UTC. The time now is 21:44.

Sun Jan 29 21:44:21 UTC 2023 up 164 days, 19:12, 0 users, load averages: 0.52, 0.75, 0.83