mersenneforum.org  

Old 2021-04-05, 07:13   #1
bur
Explanation for yafu output

There are some things I wonder about; is there a simple explanation of what's going on? I know this isn't strictly yafu, but I hope this is still the most fitting subforum.

Feel free to explain any single line of the output I pasted; when waiting for a factorization to finish, it's interesting to know what is going on. :) My knowledge of math is limited though.
  • How is the "need at least x relations" for NFS and SIQS chosen?
    Code:
    found 542503 relations, need at least 4468504
  • SIQS mentions full and partial relations, what are those? It's mentioned in the Pollard paper, but not in a way that I can understand.
  • After nfs found the polynomial it accumulates relations:
    Code:
    total yield: 13895, q=1150861 (0.00246 sec/rel; ETA 0h02m)
    What is q? When does it stop for a specific q?
  • After that it says
    Code:
    326 Special q, 488 reduction iterations
    reports: 99084057->10611292->9562226->4168473->4168470->3934638
    Number of relations with k rational and l algebraic primes for (k,l)=:
    Can this output be explained in simple terms?
  • Afterwards several of these
    Code:
    Total yield: 70275
    0/0 mpqs failures, 32943/35804 vain mpqs
    milliseconds total: Sieve 39943 Sched 0 medsched 23512
    TD 72687 (Init 3342, MPQS 45683) Sieve-Change 57558
    TD side 0: init/small/medium/large/search: 859 3317 1029 1114 6297
    come up. It seems this yields some more relations. Once more, can it be explained in simple terms?
  • In the linear algebra/filtering step it says
    Code:
    keeping 5478273 ideals with weight <= 200, target excess is 37555
    commencing in-memory singleton removal
    begin with 4730654 relations and 5478273 unique ideals
    reduce to 1339923 relations and 1328963 ideals in 22 passes
    max relations containing the same ideal: 85
    nfs: raising min_rels by 5.00 percent to 4997472
    What is the "target excess"? And what is the goal here? It seems like it wasn't reached, as min_rels was increased and sieving started again. This sometimes happens over and over. In this case it stopped when "... same ideal: 77".
  • Code:
    found 313232 cycles, need 308046
    weight of 308046 cycles is about 21812975 (70.81/cycle)
    distribution of cycle lengths:
    1 relations: 32845
    2 relations: 31392
    3 relations: 31314
    4 relations: 28921
    5 relations: 26046
    6 relations: 22902
    7 relations: 20085
    8 relations: 17671
    9 relations: 15533
    10+ relations: 81337
    heaviest cycle: 26 relations
    commencing cycle optimization
    start with 2114433 relations
    pruned 48198 relations
    Can this part be explained briefly?
  • Now it seems like the matrix is built
    Code:
    matrix is 307866 x 308046 (93.8 MB) with weight 29784334 (96.69/col)
    sparse part has weight 20889229 (67.81/col)
    filtering completed in 2 passes
    matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col)
    sparse part has weight 20875519 (67.97/col)
    matrix starts at (0, 0)
    matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col)
    sparse part has weight 20875519 (67.97/col)
    saving the first 48 matrix rows for later
    matrix includes 64 packed rows
    matrix is 306901 x 307129 (89.6 MB) with weight 23610028 (76.87/col)
    sparse part has weight 20425605 (66.50/col)
    using block size 65536 for processor cache size 4096 kB
    commencing Lanczos iteration (4 threads)
    Does the weight per column have to be large enough, or why is it going through several iterations? Does a large weight mean a small matrix size?
  • Then I think it is looking for the gcd
    Code:
    commencing square root phase
    reading relations for dependency 1
    read 153621 cycles
    cycles contain 571496 unique relations
    read 571496 relations
    multiplying 571496 relations
    multiply complete, coefficients have about 23.13 million bits
    initial square root is modulo 4407917
    This continues until the gcd is different from 1 and N. Each time, the dependency number increases; what does that mean? And is there a specific reason why the size of the coefficients is mentioned? It seems to fluctuate slightly with each dependency.
Old 2021-04-05, 07:42   #2
LaurV

Generally, you have to solve a system of m equations in n variables. As long as you have independent equations, and more equations than variables, you can solve the system. The whole "sieving" phase exists to find those equations. Then the linear algebra solves the system.

Say you want to find out x, y, and z, and you ask your friends, and one tells you that 2x+3y=7; you still don't know them. Another friend tells you that x+y+z=6; you still don't know them. Now you ask a third friend and he tells you that 2x+2y+2z=12, and you kick his ass, because this is not useful to you: it is not independent from the second. Then he says sorry and tells you 2y-z+w=5. Now, you don't need w, but you can't get more from this guy, you already pulled his fingernails and squeezed his nose... but you continue asking, and somebody else tells you that 5x-w=1. Now you have two "partial" relations which you can add together to get 5x+2y-z=6, and you can solve the system and get x, y, and z.
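Out of curiosity, the toy system above can be checked mechanically. Here's a quick sketch with exact rational arithmetic (it turns out LaurV's made-up system is solvable after all, just not in integers):

```python
from fractions import Fraction as F

# The three independent equations LaurV ends up with:
#   2x + 3y      = 7
#    x +  y +  z = 6
#   5x + 2y -  z = 6   (the two "partials" combined)
A = [[F(2), F(3), F(0), F(7)],
     [F(1), F(1), F(1), F(6)],
     [F(5), F(2), F(-1), F(6)]]

# Plain Gaussian elimination over the rationals.
for i in range(3):
    piv = next(r for r in range(i, 3) if A[r][i] != 0)
    A[i], A[piv] = A[piv], A[i]
    A[i] = [v / A[i][i] for v in A[i]]
    for r in range(3):
        if r != i:
            A[r] = [a - A[r][i] * b for a, b in zip(A[r], A[i])]

x, y, z = (row[3] for row in A)
print(x, y, z)  # 5/4 3/2 13/4
```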

Factoring is the same, but your variables are small primes, the coefficients in front are the exponents of these primes, and the addition is multiplication. Now you know...

With SIQS things are simple: you start with a number of "variables" (primes) that depends on the size of the number you want to factor (larger number, more variables), and you know pretty much "exactly" how many equations you will need. With NFS the waters are murky...

Five bucks for whoever explains it more simply... (and don't kick me hard if the system is not solvable, I made it up as I was writing it, I think it is ok...)

Last fiddled with by LaurV on 2021-04-05 at 07:45
Old 2021-04-05, 12:51   #3
Dr Sardonicus

If memory serves, for quadratic sieves the result you need here is that a homogeneous system of linear equations with more equations than variables automatically contains a linear dependency. (A system is homogeneous when there are no constant terms. A homogeneous system is thus automatically consistent, since setting all the variables equal to 0 gives a solution.)

The application to factoring the composite number N is to find a non-trivial solution to x^2 == y^2 (mod N) (i.e. where neither x - y nor x + y is divisible by N).

As I recall, the basic idea is to select a bound B, and then look for squares (mod N) whose prime factors are all less than B (i.e. are "B-smooth"). Say there are m such primes, p1 to pm.

Say we find x^2 == p1^e1 * ... * pm^em (mod N).

Since we're only interested in squareness, we can reduce the exponents (mod 2), so the e's are all either 0 or 1. Multiplying x-values corresponds to adding the exponents (which can then be reduced mod 2).

So we can replace our x-values by rows of m 0's and 1's (mod 2). If we have k such x-values, we can form a k by m matrix of 0's and 1's (mod 2). Adding rows corresponds to multiplying x-values.

The idea is that if k > m (and there are no duplicate rows) there will automatically be a non-trivial dependency among the rows, which means a nontrivial relation x^2 == y^2 (mod N).

Luckily, the integers mod 2 obey the same rules of addition and multiplication as the rational or real numbers (a "field") so dependencies can be found by doing Gaussian elimination on the matrix (linear algebra).
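The elimination idea can be sketched in a few lines: represent each x-value by its exponent-parity vector as a bitmask, reduce mod 2, and the moment some combination reduces to zero, the corresponding x-values multiply to a perfect square. (The vectors below are toy examples, not real factorization data.)

```python
def find_dependency(rows):
    """Return indices of rows whose XOR (sum mod 2) is zero, or None."""
    basis = {}  # pivot bit -> (reduced vector, bitmask of contributing rows)
    for i, r in enumerate(rows):
        combo = 1 << i
        while r:
            pivot = r & -r          # lowest set bit of the current vector
            if pivot not in basis:
                basis[pivot] = (r, combo)
                break
            v, c = basis[pivot]
            r, combo = r ^ v, combo ^ c
        if r == 0:                  # reduced to zero: dependency found
            return [j for j in range(len(rows)) if combo >> j & 1]
    return None

# Three toy exponent-parity vectors over three primes; their XOR is zero,
# so multiplying the corresponding x-values gives a square.
print(find_dependency([0b101, 0b011, 0b110]))  # [0, 1, 2]
```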

The success of the method depends on selecting B appropriately, and finding enough "B-smooth" squares (mod N).

There are ways to fiddle the B-smoothness requirement slightly, but it's been ages since I looked at this closely.

I am not sufficiently familiar with number field sieves (using polynomials of degree greater than 2) to explain much.
Old 2021-04-05, 15:51   #4
charybdis

Quote:
Originally Posted by LaurV View Post
Now you ask a third friend and he tells you that 2x+2y+2z=12, you kick his ass, because this is not useful to you, it is not independent from the second, and then he says sorry and tells you 2y-z+w=5. Now, you don't need w, but you can't get more from this guy, you already pulled his fingernails and squeezed his nose


I don't have much more to say about QS; LaurV and Dr Sardonicus have done a pretty good job of explaining the basics. The one thing I'd add to connect their two posts is that we can allow x^2 mod N to have one or two factors ("large primes") a bit larger than the smoothness bound B; these are the "partial relations" that LaurV refers to, and they can be combined into full relations when the same large prime appears in more than one partial relation.
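As a toy illustration of that combining step (the factorizations below are made up): if two partial relations share the same large prime, multiplying them makes that prime appear to an even power, so it drops out of the mod-2 bookkeeping just like a full relation would.

```python
from collections import Counter

# Hypothetical factorizations of two x^2 mod N values, each B-smooth
# except for the shared large prime 101.
partial1 = Counter({2: 3, 5: 1, 101: 1})
partial2 = Counter({3: 2, 5: 1, 101: 1})

combined = partial1 + partial2   # multiplying values adds exponents
print(combined[101] % 2)         # 0 -- the large prime contributes a square
```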

The mathematics involved in NFS is more sophisticated than QS, but the general idea is similar. Roughly speaking, "relations" and "cycles" in NFS correspond respectively to "partial relations" and "full/combined relations" in QS.

Quote:
Originally Posted by bur View Post
How is the "need at least x relations" for NFS and SIQS chosen?
Code:
found 542503 relations, need at least 4468504
For QS, we know the number of variables in the system of equations that we need to solve, so the number of (full/combined) relations needed is set to slightly more than that number of variables. With NFS, the number of relations needed is just an estimate produced by yafu. Often this is an underestimate, so if there aren't enough relations, yafu adds 5% to its estimate and goes back to sieving.

Quote:
After nfs found the polynomial it accumulates relations:
Code:
total yield: 13895, q=1150861 (0.00246 sec/rel; ETA 0h02m)
What is q? When does it stop for a specific q?
Relations in NFS consist of two numbers (the "rational norm" and "algebraic norm") which both satisfy some smoothness conditions, i.e. their prime factors are small. q=1150861 means that the siever is searching for relations where 1150861 is a factor of the algebraic norm, or maybe the rational norm if this is SNFS.

How long is spent on each q value depends on which siever yafu chooses. The sievers are labelled gnfs-lasieve4I<n>e with <n> ranging from 11 to 16; the larger this value, the longer the siever spends on a specific q.

Quote:
After that it says
Code:
326 Special q, 488 reduction iterations
reports: 99084057->10611292->9562226->4168473->4168470->3934638
Number of relations with k rational and l algebraic primes for (k,l)=:
Can this output be explained in simple terms?
"326 Special q" tells you how many q values have been tried in that range. (This is actually a slight lie for reasons that I won't go into.) I don't know what the rest of the numbers mean but I'd guess that they relate to the finer details of what the siever has been doing.

Quote:
Afterwards several of these
Code:
Total yield: 70275
0/0 mpqs failures, 32943/35804 vain mpqs
milliseconds total: Sieve 39943 Sched 0 medsched 23512
TD 72687 (Init 3342, MPQS 45683) Sieve-Change 57558
TD side 0: init/small/medium/large/search: 859 3317 1029 1114 6297
come up. It seems this yields some more relations. Once more, can it be explained in simple terms?
"Total yield: 70275" is the number of relations that the siever found in the range it's just completed. I think most of the other numbers here are timing information, showing how much CPU time was spent in each stage of the sieving process.

Quote:
In the linear algebra/filtering step it says
Code:
keeping 5478273 ideals with weight <= 200, target excess is 37555
commencing in-memory singleton removal
begin with 4730654 relations and 5478273 unique ideals
reduce to 1339923 relations and 1328963 ideals in 22 passes
max relations containing the same ideal: 85
nfs: raising min_rels by 5.00 percent to 4997472
What is the "target excess"? And what is the goal here? It seems like it wasn't reached, as min_rels was increased and sieving started again. This sometimes happens over and over. In this case it stopped when "... same ideal: 77".
The goal is to have enough relations to finish the factorization. yafu (well, really msieve) wants to know if it has enough relations to start combining ("merging") them into cycles. In this case it discovers that it doesn't, because 1339923-1328963 < 37555.

Quote:
Code:
found 313232 cycles, need 308046
weight of 308046 cycles is about 21812975 (70.81/cycle)
distribution of cycle lengths:
1 relations: 32845
2 relations: 31392
3 relations: 31314
4 relations: 28921
5 relations: 26046
6 relations: 22902
7 relations: 20085
8 relations: 17671
9 relations: 15533
10+ relations: 81337
heaviest cycle: 26 relations
commencing cycle optimization
start with 2114433 relations
pruned 48198 relations
Can this part be explained briefly?
This is a breakdown of the cycles by how many relations have been merged to make them.

Quote:
Code:
matrix is 307866 x 308046 (93.8 MB) with weight 29784334 (96.69/col)
sparse part has weight 20889229 (67.81/col)
filtering completed in 2 passes
matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col)
sparse part has weight 20875519 (67.97/col)
matrix starts at (0, 0)
matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col)
sparse part has weight 20875519 (67.97/col)
saving the first 48 matrix rows for later
matrix includes 64 packed rows
matrix is 306901 x 307129 (89.6 MB) with weight 23610028 (76.87/col)
sparse part has weight 20425605 (66.50/col)
using block size 65536 for processor cache size 4096 kB
commencing Lanczos iteration (4 threads)
Does the weight per col have to be large enough or why is it going through several iterations? A large weight equals small matrix size?
The weight is the number of non-zero entries in the matrix, i.e. the number of 1s, since the matrix is mod 2 as Dr Sardonicus explained. The lower the weight, the easier it is to find the linear dependencies, but there's a trade-off, as if we aim for a sparser matrix during the merge phase, it ends up being larger. This is an important consideration when handling large numbers for which the matrix step can take weeks, but the target matrix density can only be altered by calling msieve manually rather than via yafu.

Quote:
Then I think it is looking for the gcd
Code:
commencing square root phase
reading relations for dependency 1
read 153621 cycles
cycles contain 571496 unique relations
read 571496 relations
multiplying 571496 relations
multiply complete, coefficients have about 23.13 million bits
initial square root is modulo 4407917
This continues until the gcd is different from 1 and N. Each time the dependency increases, what does that mean? And is there a specific reason why the size of the coefficients is mentioned? It seems to fluctuate slightly with each dependency.
In QS, the aim is to find x and y such that x^2 = y^2 mod N, and then taking gcd(N, x-y) gives a factor of N. However, half of the time this factor is 1 or N. To insure against this possibility, we want to find lots of (x,y) pairs. Luckily, this requires virtually no extra effort compared to finding one such pair. Suppose that instead of n+1 equations in n variables, we have n+50 equations in n variables. Now we get 50 independent solutions (dependencies), and if each of these has a 1/2 probability of producing a non-trivial factor of N, then it's overwhelmingly likely that we find the factors. 50 more relations can be found very quickly, so when running QS, yafu sieves until there are enough relations that a factorization is virtually guaranteed.
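A tiny worked instance of the x^2 = y^2 mod N trick, with N small enough to brute-force the congruence (finding such congruences efficiently is of course where QS does its real work):

```python
from math import gcd

N = 91  # toy composite, 7 * 13
# Brute-force a nontrivial congruence x^2 == y^2 (mod N),
# i.e. one where neither x - y nor x + y is divisible by N.
for x in range(2, N):
    for y in range(1, x):
        if (x * x - y * y) % N == 0 and (x - y) % N and (x + y) % N:
            f = gcd(x - y, N)
            print(f"{x}^2 == {y}^2 mod {N}: gcd gives {f} * {N // f}")
            break
    else:
        continue
    break
```

The first hit is 10^2 = 3^2 mod 91, and gcd(10 - 3, 91) = 7 splits N. With 50 independent dependencies each having a 1/2 chance of success, the failure probability is about 2^-50.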

With NFS, the principle is the same: we have lots of linear equations mod 2, and each independent solution/dependency has a 1/2 chance of finding the factors. However, in NFS, we don't get x and y such that x^2 = y^2 mod N straight away; we have to take the square root of an enormous complex number to find them. As for the coefficient sizes, you'd have to ask msieve's creator jasonp to find out why they're printed. But at least they give an idea of the size of the number we're taking the square root of.
Old 2021-04-05, 16:42   #5
bsquared

Excellent info provided so far. The only thing I can add is some specifics on how many relations are needed for NFS and SIQS.

For SIQS, the number of relations needed is just a few more than the number of factor base primes. We know at any point during the sieving process how many relations have been found: the number of full relations + the number of relations combined from partials. A full relation occurs whenever Q(x)/a = (ax + 2b)x + c fully factors over the factor base. (a, b, and c are coefficients of a quadratic polynomial and x is some small integer). If Q(x)/a almost fully factors, with 1 or 2 large primes remaining, then we have a partial relation. These are added to a special data structure that can be quickly processed to tell us how these partial relations can be combined into fulls (I can't beat the simplicity of LaurV's example here). It is a little more work to gather and process these partial relations, but it is well worth it. Large jobs obtain most of their full relations by combining from partials.
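The trial-division bookkeeping can be sketched like this (a hypothetical helper, not yafu's actual code): divide each sieve value by the factor-base primes and classify it by what's left over.

```python
def classify(q_value, factor_base, large_prime_bound):
    """Classify a sieve value as a full relation, a partial, or neither."""
    n = abs(q_value)
    for p in factor_base:
        while n % p == 0:
            n //= p
    if n == 1:
        return "full"        # factored completely over the factor base
    if n <= large_prime_bound:
        return "partial"     # one large prime left over
    return "none"            # cofactor too big to be useful

print(classify(2**3 * 5 * 7, [2, 3, 5, 7], 10**4))   # full
print(classify(2 * 3 * 9973, [2, 3, 5, 7], 10**4))   # partial (9973 is prime)
```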

For NFS, the number of relations needed is mostly a function of the large prime bounds. These are the lpba and lpbr lines of the .job file. This thread provides a little more info on the estimates yafu makes (the last few posts in particular).

Last fiddled with by bsquared on 2021-04-05 at 16:44
Old 2021-04-05, 21:09   #6
henryzz

In reality NFS has similar requirements to QS for how many relations are needed; we just count them differently, which muddles the comparison. In QS we tend to count full and combined relations, while in NFS we count partial relations. For 1 and 2 large primes it is possible to fully count how many combined relations have been found. This becomes a lot harder for 3 or more (a lot of NFS work allows 2 large primes on the rational side and 2 on the algebraic side, making 4).
Old 2021-04-07, 17:34   #7
bur

Thanks a lot guys, that is some great reading. You know someone really understood a topic if they can explain the basics in very simple terms. Reading it put a smile on my face, both due to finally understanding what's going on (on a low level, probably) and because the poor guy got his nails pulled (shame on me).

It also made me appreciate what a genius idea these sieves are and how enormous the computational work is.

I will have to go through it a few more times and likely some questions will come up.

Thanks again.
Old 2021-04-07, 17:51   #8
bsquared

Quote:
Originally Posted by LaurV View Post
you ask a third "friend" ... you kick his ass ... you already pulled his fingernails and squeezed his nose..
That's how you treat your friends, eh? Remind me never to become your enemy
Old 2021-04-10, 08:52   #9
bur

Code:
commencing square root phase
reading relations for dependency 1
read 182232 cycles
cycles contain 637316 unique relations
read 637316 relations
multiplying 637316 relations
multiply complete, coefficients have about 27.84 million bits
initial square root is modulo 98794741
GCD is 1, no factor found
reading relations for dependency 2
read 181783 cycles
cycles contain 636612 unique relations
read 636612 relations
multiplying 636612 relations
multiply complete, coefficients have about 27.80 million bits
initial square root is modulo 96579421
GCD is N, no factor found
 reading relations for dependency 3
...
charybdis, I didn't really get the explanation for the last part. I included another example showing that it takes several attempts to find a GCD != 1 or N.

If I got you right, more than n relations are found for n variables, so with the excess, the chance of finding a factor is 50% for each excess dependency. But why does it happen so often that I still see GCD is 1 or N despite this large excess?


So it's two questions, first why does it happen so often and second, what is changed (apparently "the dependency") until finally a non-trivial GCD is found?



And something else:
Code:
begin with 1395803 relations and 1627193 unique ideals
reduce to 1343077 relations and 1305488 ideals in 9 passes
max relations containing the same ideal: 84
I understood that "target excess is 28409" means that 28409 more relations than ideals are required, and that's fulfilled in the example. But what are those ideals? Maybe it's even possible to explain the difference between unique ideals and ideals?
Old 2021-04-10, 12:50   #10
charybdis

Quote:
Originally Posted by bur View Post
charybdis, I didn't really get the explanation for the last part. I included another example showing that it takes several attempts to find a GCD != 1 or N.

If I got you right, more than n relations are found for n variables, so with the excess our chance of finding a factor are is 50% for each excess relation. But why does it happen so often that I still see GCD is 1 or N with this large excess?


So it's two questions, first why does it happen so often and second, what is changed (apparently "the dependency") until finally a non-trivial GCD is found?
Each dependency is a single solution to the equations, i.e. a set of columns of the matrix which add to 0. At the end of the linear algebra you'll see something like "recovered 30 nontrivial dependencies". Each of these dependencies has a 1/2 probability of finding a non-trivial gcd, assuming there are only two factors.

Quote:
I understood that target excess is 28409 means that 28409 more relations than ideals are required and it's fulfilled in the example, but what are those ideals? Maybe it's even possible to explain the difference between unique ideals and ideals?
Continuing the analogy from earlier, ideals are essentially the variables that our equations are in. (They correspond roughly to factors of the rational and algebraic norms.) I don't think there's any difference between "ideals" and "unique ideals" here; the word "unique" implies that we count each ideal only once, no matter how many relations it appears in.

Last fiddled with by charybdis on 2021-04-10 at 13:38 Reason: rows -> columns
Old 2021-04-10, 15:41   #11
LaurV

Quote:
Originally Posted by bsquared View Post
That's how you treat your friends, eh? Remind me never to become your enemy
"Please God protect me from my friends. From my enemies I protect myself." (old Romanian proverb).