#12
"Ben"
Feb 2007
7041₈ Posts
Quote:
The resulting matrix for the C135 wasn't very large, but it was pretty dense. Some of that can be improved by rejecting cycles that are huge, which I didn't do. Even so, single-thread Lanczos only took an hour or so to run on the dense matrix. |
#13 |
Tribal Bullet
Oct 2004
DD9₁₆ Posts
Ok, I was asking because if you want to tune that part of the process, you can convert the relations into the intermediate format that Msieve's NFS filtering uses and then run all the filtering as if it were an NFS job. I can almost guarantee the resulting matrix would wind up much better, because you can leverage the algorithm research that goes into NFS filtering.
Of course that takes a backseat to tuning the sieving.
#14
Bamboozled!
xilman
May 2003
Down not across
2·5,657 Posts
Quote:
Back in the day there were still many composites of interest which were best done by QS, not least because GNFS implementations were not easily available and, even if you could get hold of one, it was bleeding-edge code which needed skill and patience to nurse its way through a factorization. Sic transit gloria mundi.
#15
"Ben"
Feb 2007
7041₈ Posts
Quote:
An NFS run on the same number and same computer took about 8 hours for poly find and sieving. The matrix was larger, of course, but used Jason's multi-threaded and better-optimized Lanczos. The matrix + sqrt took about another hour (on a different machine better suited to those tasks). In round numbers, QS ran about 75 hrs / 9 hrs ~ 8 times slower than NFS. The NFS in question is yafu's automated msieve+ggnfs version.
#16
Bamboozled!
xilman
May 2003
Down not across
2×5,657 Posts
Quote:
The hardware architectures are radically different. Back in the day memory could almost keep up with the CPU; that hasn't been true for a long time now. Our computation ran for a large part on MIPS and SPARC processors. The part that did use x86 ran on 486 or early (inevitably 32-bit) Pentium CPUs. You also appear to be comparing elapsed times. A fairer comparison would be to use CPU-core time.

Last fiddled with by xilman on 2019-07-26 at 19:41
#17
"Ben"
Feb 2007
3,617 Posts
Quote:
#18 |
Tribal Bullet
Oct 2004
5·709 Posts
I've been remiss in adding my congrats; getting end-to-end QS with three large primes is a major, major undertaking.
#19
Bamboozled!
xilman
May 2003
Down not across
10110000110010₂ Posts
Quote:
Last fiddled with by xilman on 2019-07-26 at 21:01
#20
Sep 2009
91A₁₆ Posts
Quote:
Chris |
#21
"Tilman Neumann"
Jan 2016
Germany
757₈ Posts
Quote:
Quote:
Otherwise, I might be able to deduce the parameters of Ben's hardware from other threads, but https://infoscience.epfl.ch/record/1...es/NPDF-28.pdf does not seem to contain the necessary information. Anyway, whoever wants more accurate numbers is free to compute them :-) |
#22
"Ben"
Feb 2007
7041₈ Posts
Quote:
The paper has timings for the main computational steps of the class group computations for imaginary quadratic fields with d-digit discriminants. They mention that the sieving step of this computation is the same as that used during SIQS integer factorization, but I don't know much more than that about class group computations. Anyway, the table reports a time of 2.23 core-years spent in the sieving step for a C130 input. Their core is a 1.9 GHz Opteron core.

I used a 7210 KNL processor that has 256 virtual cores (64 physical cores, each with 4 hyperthreads) running at 1.5 GHz. The C135 sieving took 74 wall-clock hours * 256 cores = 18,944 core-hours = 2.16 core-years. I'm not sure how to handle the hyperthreads, though... if you only count physical cores (performance certainly does not scale linearly past 64 cores on this processor) then it spent 0.54 core-years. Call it double that, 1.08 core-years, to account for the diminishing return of the hyperthreads.

Scaling the ratio of core-years by the ratio of clock frequencies and inversely by the ratio of effort, I get:

2.23 / 1.08 * 1.9 / 1.5 * exp(sqrt(ln C135 * ln ln C135)) / exp(sqrt(ln C130 * ln ln C130))

which, if I did that right, means 4.88 times the effort for equivalent job sizes. This still isn't right because the sievers are probably memory-bound, not CPU-bound, and the two implementations use very different instruction sets (and AVX-512 could maybe also accelerate their ideas). But it's enough to tell me that it might not be worth the effort to try to implement it. Even so, I do need to read more and try to understand it better.
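For anyone who wants to replay the bookkeeping above, here is a minimal Python sketch of the core-year arithmetic and the L[1/2, 1] effort-scaling factor. The 10^130 and 10^135 inputs are stand-ins for the actual composites (which aren't given in the thread), and the hyperthread discount is whatever you choose to apply, so the scaling factor is only a rough estimate rather than a reproduction of the exact figure quoted.

```python
# Sanity-check sketch of the core-year bookkeeping and QS effort scaling
# discussed above. The composites are stand-ins (10**130 and 10**135), so
# the effort ratio is only approximate; the real inputs and the chosen
# hyperthread discount change the final comparison.
from math import exp, log, sqrt

HOURS_PER_YEAR = 24 * 365

wall_hours = 74          # C135 sieving wall-clock time on the 7210 KNL
virtual_cores = 256      # 64 physical cores x 4 hyperthreads
physical_cores = 64

print(wall_hours * virtual_cores / HOURS_PER_YEAR)   # ~2.16 core-years
print(wall_hours * physical_cores / HOURS_PER_YEAR)  # ~0.54 core-years

def qs_effort(n):
    """Heuristic QS cost L_n[1/2, 1] = exp(sqrt(ln n * ln ln n))."""
    return exp(sqrt(log(n) * log(log(n))))

# Effort ratio between a 135-digit and a 130-digit input (stand-in values):
print(qs_effort(10**135) / qs_effort(10**130))        # roughly 2.5
```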