2004-05-26, 15:30   #3
R.D. Silverman

Nov 2003


Quote:
 Originally Posted by xilman Allow me to add a bit.... Right, now fix a large prime, one bigger than any in the factorbase for a particular polynomial. Call it q. Better, call it special_q because we've chosen it to be special. Now, you will again agree that norms divisible by this special-q are regularly separated in the sieving region, right? That is, they form a lattice.
They actually form a sub-lattice of the original lattice. One keeps track
of the sub-lattice by finding a basis (which is easy) and then using LLL or
some other method to QUICKLY find a reduced basis. The quality of the reduction
is important. See below.
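The basis-finding step can be made concrete. For a special-q with a root r of the polynomial mod q, the sub-lattice is {(a, b) : a ≡ b·r (mod q)}, which has the obvious basis (q, 0), (r, 1). In rank 2 one does not even need full LLL: Gauss/Lagrange reduction (the two-dimensional case that LLL generalizes) already produces an optimal reduced basis. A minimal sketch, not the author's code, with a hypothetical q and root r:

```python
def gauss_reduce(u, v):
    """Lagrange/Gauss reduction of a rank-2 lattice basis (u, v)."""
    def dot(x, y):
        return x[0] * y[0] + x[1] * y[1]

    while True:
        if dot(u, u) > dot(v, v):
            u, v = v, u                        # keep u the shorter vector
        m = round(dot(u, v) / dot(u, u))       # nearest-integer projection
        if m == 0:
            return u, v                        # basis is reduced
        v = (v[0] - m * u[0], v[1] - m * u[1])

def special_q_basis(q, r):
    """Obvious basis of the sub-lattice {(a, b) : a = b*r (mod q)}."""
    return (q, 0), (r, 1)

# Hypothetical special-q and polynomial root mod q:
u, v = gauss_reduce(*special_q_basis(1000003, 123456))
```

The reduced vectors span the same lattice (same determinant q), and the shorter one has length on the order of sqrt(q), which is what makes the transformed sieve region well-shaped.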

Quote:
 Originally Posted by xilman The clever bit of the lattice sieve is to transform the polynomials or, entirely equivalent, the coordinate system of the sieving rectangle so that these lattice points become adjacent in the transformed system. Now sieve the transformed region with the factorbase primes. You know, a priori, that all the norms are divisible by a large prime and so, after that prime is divided out, the norm will be smaller
Yes. But there is a "seesaw" effect. As one pushes the norms of one
polynomial smaller, the norms of the other one get bigger. If the reduced
basis isn't short enough, the decrease in one norm is outweighed by a
larger increase in the other.
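A toy numerical sketch of why basis quality matters so much here (the sizes below are hypothetical, not taken from the author's siever): a sieve point is (a, b) = i·u + j·v, so |a| and |b| scale with the longer reduced basis vector, and a degree-d homogeneous norm scales with its d-th power:

```python
q = 1_000_003        # hypothetical special-q
d = 6                # degree of the polynomial NOT carrying the special-q

ideal = q ** 0.5     # well-reduced basis: both vectors ~ sqrt(q) long
poor = 20 * ideal    # poorly reduced basis: longest vector 20x longer

# A degree-d norm grows like (vector length)**d, so the blow-up caused by
# a poor basis is the length ratio raised to the d-th power:
blowup = (poor / ideal) ** d
print(blowup)        # roughly 20**6 = 6.4e7, dwarfing the factor q ~ 10**6
                     # saved by dividing the special-q out of the other norm
```

The exponent d is why the quality of the reduction, and the choice of which polynomial carries the special-q, decide whether the seesaw tips the right way.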

Quote:
 Originally Posted by xilman That's the handwaving reason why the lattice sieve is likely to be faster than the line sieve. It completely glosses over some features which damage its performance. For a start, reducing the norm of one polynomial is likely, in practice, to increase the norm of the other. To some extent this can be counteracted by using special-q on the polynomial which typically has the greater norm. Another source of inefficiency is the requirement for the coordinate transformations for each special-q.
Yes. And computing the transform does take time.

Quote:
 Originally Posted by xilman It's my belief that Jens Franke's lattice siever gains most of its speed over the CWI line siever from implementational differences. It has assembly language support for various x86 systems whereas the CWI siever uses much more general purpose code. The lattice siever's use of a very fast mpqs for factoring 2-large prime candidates probably outperforms the CWI-siever's use of rho and squfof, though I haven't evaluated that area in any detail.
I considered using QS a long time ago, but thought that SQUFOF (in single
precision) would be faster. In any event, my code spends ~7% of its total
run time in SQUFOF, so doubling its speed would only yield a ~3% improvement
in total run time. Pollard rho is quite a bit slower than SQUFOF, but my code
succeeds with SQUFOF better than 95% of the time; only when SQUFOF fails do I
use rho.

QS looks better for the 3 large prime variation.
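For concreteness, here is a sketch of the rho fallback mentioned above (Floyd cycle-finding; this stands in for, and is not, the author's SQUFOF/rho code, and the primes below are illustrative):

```python
from math import gcd

def pollard_rho(n, c=1):
    """Return a nontrivial factor of an odd composite n (Floyd cycle-finding)."""
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step of x -> x^2 + c
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    if d == n:                       # unlucky cycle: retry with a new constant
        return pollard_rho(n, c + 1)
    return d

# A cofactor that is the product of two large primes (illustrative values):
n = 999983 * 1000003
p = pollard_rho(n)
print(sorted((p, n // p)))           # -> [999983, 1000003]
```

Rho's expected cost is O(p^(1/4)) modular multiplications for the smaller prime p, which is why a single-precision SQUFOF that succeeds 95% of the time relegates it to a fallback role.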

Here is siever data (mine) for 2,653+. The sieve length is [-13M, 13M] per
b-value. The total number of values sieved is 13.6 x 10^9 (26 x 2^20 x 500).

Siever built on Mar 18 2004 12:47:42
Finished processing the range 704500 to 704999 <-- b's
In 2296.507117 elapsed seconds
This is approximately 1920 Million Arithmetic Operations/sec <-- by estimated count of arithmetic ops only

<-- times are msec -->
Total sieve time = 1419237.100308 (odd b sieving plus all subroutines)
Total even sieve time = 858458.400465 ("" even b's plus subroutines)
Total resieve time = 74504.261681 (time to factor by resieving odd b)
Total even resieve time = 62589.224396 ("" even b)
Total trial int time = 12307.307793 (time for trial division; linear poly)
Total trial alg time = 208669.855863 ("" sextic poly)
Total alg scan time = 29042.052749 (time to scan for successes)
Total alg squfof time = 161006.544098 (time running squfof on sextic)
Total int squfof time = 9264.036061 ("" linear)

Total asieve, isieve = 561935.884146 599546.527601

This last line is the actual time spent JUST sieving odd b's. Even b's
take about 55% of the odd ones.

Quote:
 Originally Posted by xilman Down sides of the lattice sieve become apparent when you consider the post-sieving phases. First off, a prime can be a special-q and also a regular large prime for a different special-q and vice versa. That is, duplicate relations are almost inevitable when using a lattice sieve. The dups have to be identified and rejected. This takes computation and storage and, in a distributed computation, comms bandwidth. It also means that the raw relations/second measure isn't quite such a good measure of efficiency as it is for the line siever. Worse, the number of duplicates increases as the number of relations grows (another view of the birthday paradox) and so the effective rate of relation production falls as the computation proceeds.
Yes. Comparing speed by looking at output over a short interval falsely
compares the two methods, because toward the end the lattice siever
generates quite a lot of duplicates.
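The duplicate rejection can be sketched as a simple hash-set filter keyed on the coprime (a, b) pair that identifies each relation (one possible scheme, not necessarily what any particular siever does):

```python
def unique_relations(relations):
    """Yield each relation once; a relation is identified by its (a, b) pair."""
    seen = set()
    for rel in relations:
        key = (rel[0], rel[1])       # the coprime (a, b) pair
        if key not in seen:
            seen.add(key)
            yield rel

# Two special-q ranges can report the same (a, b) relation:
batch = [(3, 5, "factors..."), (7, 2, "factors..."),
         (3, 5, "factors..."), (1, 9, "factors...")]
print(len(list(unique_relations(batch))))   # -> 3
```

In a distributed computation the `seen` set (or a sorted merge playing the same role) is exactly the storage and bandwidth cost referred to above, and it grows with the total number of relations collected.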

One advantage the lattice siever has is the following. The yield rate of the
line siever decreases over time, because the norms get bigger as the sieve
region moves away from the origin. The lattice siever brings the sieve region
"back to the origin" each time the special-q changes. This might be its biggest
advantage.