 mersenneforum.org Questions about the QS
 Register FAQ Search Today's Posts Mark Forums Read 2007-03-24, 23:17 #1 Carmichael   Mar 2007 22 Posts Questions about the QS Hi, I wanted to understand how the quadratic sieve works. I have understand the most part of it but I have some questions because I don't understand everything. I am sixteen years, sorry if I maybe I ask some stupid things. I found this forum thanks to the good mersenne wiki :) 1) I understand the process of finding congruence of square, and the idea of finding smooth number so that we will have smaller vector but I don't understand the most important process : the sieving. How does that works ? I understand that we first need a factor base with the Legendre symbol 1 (n is the number i want to factorise, p a number thant (n/p) must be one. But after, how do we check that y's ( y(x) = (floor sqrt n + x)² - n) are smooths ?? 2) Why do we have to find one relation more than the size of the factor base to can resolve our matrix. 3) So my big question is about the smooth numbers and the sieving : / Please help me :) Thank you very much !!   2007-03-25, 14:04   #2
bdodson

Jun 2005
lehigh.edu

210 Posts Quote:
 Originally Posted by Carmichael I wanted to understand how the quadratic sieve works. I have understand the most part of it ... but I don't understand the most important process : the sieving.
You already understand a lot more than many of the people trying to
explain QS, since, as you say, most explanations omit the sieving. As
it happens, I was just explaining this in my math 396, intro crypto, course,
so have recently reminded myself of the main points. If we use y(x)
for the original QS polyn (as distinct from MPQS or SIQS versions), we're
supposed to look at y(x) "mod p", for each prime p in our factor base
(actually, I'm used to omitting the "tiny" primes, but that seems to
depend upon who one asks). Anyway, we write y(x) = (x - r1)(x - r2) =
x^2 - (r1 + r2)x + (r1)(r2) where r1 and r2 are between 0 and p-1
so that " 2 floor sqrt n" + (r1 + r2) is a multiple of p and
("floor sqrt n")^2 -n - (r1)(r2) is a multiple of p. We're supposed to
be looking for locations "i" so that y(i) is divisible by "sufficiently many"
of the factor base primes we're sieving by. That's where the critical
"fast memory access" that distinguishes sieving from other computationally
intensive tasks come in, so we start by initiallizing all locations in a sieving
interval --- say [-L, L] = -L, (-L)+1, ... , -1, 0, 1, ..., L --- for the sake
of discussion. One might consider just factoring the y(i)'s, but the
factor base is supposed to be sufficiently small that y(i) is extremely
unlikely to factor completely using only factor base primes. We'd be wasting
way too much time; factoring way too many numbers, almost all of
which have no chance of working. So here's the Key Fact: a factor
base prime p divides y(i) if and only if the location i differs from one of the
"mod p roots" r1 or r2 by a multiple of p. Another point, it's not just how
many primes occur at a given location, but also their size, so we mark the
locations given by the Key Fact with log(p). We only send y(i) off to be
factored when the location has the sum of the log(p)'s that occur there
sufficiently close to log(y(i)). If we set our "sufficently close" too large, we
get too many false-positives, and spend too much time factoring y(i)'s
that don't work --- too small, and we miss too many locations for which
y(i) would have worked.

As the numbers we try to factor get larger, we have to take larger factor
base sizes. But, eventually, there's a diminshing return that sets in;
further increases give fewer-and-fewer additional factor base primes,
and they're less-and-less likely to occur in factorizations. That's one of
two reason's why it's also the Multiple-Polynomials, in MPQS or SIQS, that's
the Key Idea to getting QS to work well. There's not just the initial
Quadratic Sieve polyn y(x) that has a given collection of primes as
quadratic residues, but a large collection of polyn. So as we get to locations
i further away from 0, the y(i) get larger/harder to factor; but since
there are other polyn to use (for small i's), we just throw y(x) away,
and go on to the next polyn. Even better, we break up the possible polyn
into disjoint regions, and send different polyn to different computers,
and/or to different people to use on their computers, which is what
makes the sieving step a distributed computation (better than parallel;
during sieving on a given polyn, there's no communication needed at
all between cpus). Hope this helps.

As I recall reading the early Math Comp papers (Bob's for one) and the
factor-by-email ppmpqs code, there was a clear description of what needed
to be done to make sure that all of the polyn used give the same collection
of quad residue primes. Perhaps I didn't read closely enough, but iirc
I wasn't able to see that being done for the g_a,b(x)'s in the wiki. -bd

postscript -- as the numbers being factored get larger yet, even
switching polyn doesn't have enough small primes for the factor base;
and nfs comes to the rescue by adding small algebraic primes, as determined
by the specific Number Field (the nf of nfs) being used. One of the key
steps there -- to get from RSA130 to RSA140/RSA155=RSA512 for example
-- is to pick the nf to have many more small primes than average. IIRC,
for the (Lehigh) polyn used for RSA155, we have the usual 25 primes
smaller than 100, and expect a random nf to also have 25 primes "smaller"
than 100, but the nf used actually managed over 50, more than twice as
many as a random nf.

Quote:
 I understand that we first need a factor base with the Legendre symbol 1 (n is the number i want to factorise, p a number thant (n/p) must be one. But after, how do we check that y's ( y(x) = (floor sqrt n + x)ÃÂÃÂ² - n) are smooths ??

Last fiddled with by bdodson on 2007-03-25 at 14:10 Reason: typo on sieve interval, and a missing log, inconsistency in notation   2007-03-25, 17:23   #3
jasonp
Tribal Bullet

Oct 2004

5·709 Posts Quote:
 Originally Posted by Carmichael 2) Why do we have to find one relation more than the size of the factor base to can resolve our matrix.
Bruce already explained a lot about the sieving. When implementing QS I found Scott Contini's paper 'Factoring Integeres Using the Self-Initializing Quadratic Sieve' to be a very useful resource.

You use the sieving technique to fill up an array of log values of y(x). Then you scan linearly through the array, looking for values that are 'close' in some sense to log(y(i)) . In order to do that, you can compute log(y(i)) for all the y(i) you're going to be checking, but that's too much work. A better way for ordinary QS is to compute log(y(i)) every once in a while (say, once every 10000 i values) and use that value as a cutoff for the next bunch of y(i). Any value in the sieve array that is almost this number gets trial factored; the best value of the cutoff should therefore allow a little room below log(y(i)), exactly how much room being implmentation dependent.

As for your other question, linear algebra tells you that any time you have a matrix A with N linear equations, if those equations have at least N+1 variables then A*x = 0 has a nonzero solution for x. In the QS case, every row of the matrix represents a prime number, every column represents a relation from the sieving, and x is an array of bits that tell you whether or not a relation is used in the QS square root. If A*x = 0, then some combination of relations produce a product of prime numbers where all the primes have even exponents (this is what a zero vector means), and so the resulting product of relations is a square. Wikipedia has a good worked example of QS that attempts to explain all this.

jasonp   2007-03-26, 17:02   #4
Carmichael

Mar 2007

22 Posts Thank you very much for your replies !

bdodson

Thank you for this long reply. But it's a bit to far for me, I want first to understand everything in the basic QS of Pomerance and then I'll look for the MPQS.
Now I understand how the sieving works. It's like the sieve of Erathostem but we cross the numbers, and the numbers that are the most crossed are smooth.
Or if we divide, the smooth number we'll be the number that at the end five "1" because they were divided by all the numbers in the factor base.

But what I don't understand, is how do we implant this in an algorithm based on squares to find the smooth numbers ??? I don't understand how do we get these strange roots.

From wikipedia :

Quote:
 Here is an example. Let n = 1817, therefore m, the floor of the square root of n, is 42. Since n is small, the basic polynomial is enough: y(x) = (x + 42)2 − 1817.  Data collection For a factor base, only primes p whose Legendre symbol (n/p) is one are needed: F = { − 1,2,7,13}. Now, for sieving purposes, solve the congruence x^2\equiv 1817\pmod{p} for each p in the factor base. The square roots are: Mod 2: 1 Mod 7: 2 and 5 Mod 13: 6 and 7.
I don't understand this :

Now, for sieving purposes, solve the congruence

x^2\equiv 1817\pmod{p}

for each p in the factor base. The square roots are:

Mod 2: 1
Mod 7: 2 and 5
Mod 13: 6 and 7

Do you have a piece of code, or some documents that I should read about the smooth numbers, quadratic sieve... ?

jasonp

Thank you now I understand. (I have to revise the matrixes...)

Thank you very much for your help !

Last fiddled with by Carmichael on 2007-03-26 at 17:03   2007-03-26, 18:08   #5
jasonp
Tribal Bullet

Oct 2004

5×709 Posts Quote:
 Originally Posted by Carmichael But what I don't understand, is how do we implant this in an algorithm based on squares to find the smooth numbers ??? I don't understand how do we get these strange roots.
y(x) is a quadratic polynomial, and you need to find all the x where this polynomial is divisible by a prime p. This means finding the x where y(x) mod p = 0. Because y(x) is a quadratic polynomial there are two polynomial roots r1 and r2, so any x = r1 + k*p and x = r2 + k*p for any k has y(x) divisible by p. Finding r1 and r2 is a little tricky; these are 'modular roots', not the ordinary polynomial roots (see here and also Contini's paper, it's the reference that made this clear for me). It is these values of x in the sieve array that get log(p) added to them, to indicate divisibility by p.

jasonp

Last fiddled with by jasonp on 2007-03-26 at 18:10   2007-04-07, 16:16 #6 Carmichael   Mar 2007 410 Posts Ok, Now I understand nearly all about the use and implementation of the quadratic sieve. But if I sieve with the trial division. So I mean, i can use and understand the quadratic sieve just for for short number. Larger numbers need the use of sieving with the Shank Tonelli algorithm. I still don't understand how these modular roots can help me in sieving and finding the smooth numbers y(x) :( Please help my brain    2007-04-07, 20:05   #7
jasonp
Tribal Bullet

Oct 2004

67318 Posts Quote:
 Originally Posted by Carmichael Ok, Now I understand nearly all about the use and implementation of the quadratic sieve. But if I sieve with the trial division. So I mean, i can use and understand the quadratic sieve just for for short number. Larger numbers need the use of sieving with the Shank Tonelli algorithm. I still don't understand how these modular roots can help me in sieving and finding the smooth numbers y(x) :( Please help my brain Did you try reading Contini's paper?   2007-04-09, 12:47   #8
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×3×311 Posts Quote:
 Originally Posted by Carmichael Ok, Now I understand nearly all about the use and implementation of the quadratic sieve. But if I sieve with the trial division. So I mean, i can use and understand the quadratic sieve just for for short number. Larger numbers need the use of sieving with the Shank Tonelli algorithm. I still don't understand how these modular roots can help me in sieving and finding the smooth numbers y(x) :( Please help my brain My 1987 paper, "The Multiple Polynomial Quadratic Sieve" explains
everything in detail.

Let Q(x) be a [quadratic] polynomial. We seek values of x such
that Q(x) is smooth with respect to a fixed set of primes {p1, p2, p3.......}

Observe that if p divides Q(x), then it also divides Q(x + kp) for all
integers k. Solve Q(x) = 0 mod p. There will be two roots (unless p
divides the discriminant). Call these r1 and r2.
Note now that Q(r1), Q(r1 + p), Q(r1 + 2p), .... and Q(r2), Q(r2 + p)...
are ALL divisible by p. Thus, we add log(p) to sieve locations
r1, r1+p, r1+2p, ....
r2, r2+p, r2+2p....

for ALL primes in our fixed set of primes. Now observe that if the
accumulated sum at any location x is close to the value of log(Q(x)),
then Q(x) will factored over our fixed set of primes. Estimating
log(Q(x)) is easy.   2007-04-10, 11:30 #9 Carmichael   Mar 2007 22 Posts Thank you very much for all your help, now I understand the QS algorithm      Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post Dubslow GPU Computing 1 2011-08-05 18:22 OmbooHankvald Factoring 6 2005-08-28 19:31 OmbooHankvald Prime Sierpinski Project 2 2005-08-01 20:18 OmbooHankvald Math 6 2005-06-23 11:42 xtreme2k Lounge 59 2002-10-31 06:20

All times are UTC. The time now is 06:12.

Thu Aug 11 06:12:51 UTC 2022 up 35 days, 1 hr, 2 users, load averages: 1.40, 1.22, 1.16