![]() |
![]() |
#34 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2·17·293 Posts |
![]()
Wow! Ben, that is freaking wonderful! And you kept it for yourself for such a long time!
![]() Now I will have to upgrade yafu... (grrr... still using old versions here and there, you know, if it works, don't fix it...) (unfortunately, since I discovered pari, I did most of my "numerology" in it, so the loops and screws came somehow too late, but be sure I will give it a try, I still have some ideas in a far side corner of the brain, which I never had the time/knowledge/balls to follow...) Last fiddled with by LaurV on 2020-05-22 at 04:39 |
![]() |
![]() |
![]() |
#35 | |
"Ben"
Feb 2007
361710 Posts |
![]() Quote:
![]() Last fiddled with by bsquared on 2020-05-22 at 16:24 |
|
![]() |
![]() |
![]() |
#36 |
"Ben"
Feb 2007
3,617 Posts |
![]()
I've been spending a little time with the TLP variation again... integrating jasonp's batch factoring code and investigating parameters. The batch factoring code provides a huge speedup for TLP, easily twice as fast as before at C110 sizes. The crossover with DLP quadratic sieve now appears to be around C110, although I'm still not convinced I have found optimal parameters for TLP. There are a lot of influential parameters. So as of now, TLP is still slower than regular DLP in sizes of interest to the quadratic sieve (C95-C100).
Code:
C110 48178889479314834847826896738914354061668125063983964035428538278448985505047157633738779051249185304620494013 80 threads of cascade-lake based xeon: DLP: 1642 seconds for sieving TLP: 1578 seconds for sieving |
![]() |
![]() |
![]() |
#37 | |
"Tilman Neumann"
Jan 2016
Germany
2·13·19 Posts |
![]() Quote:
Compare a polynomial-major SIQS to a prime-major SIQS. (both single-threaded) The conversion of my old (polynomial-major) SIQS to a prime-major SIQS took two steps: 1. Compute the solution arrays for all (a, b)-polynomials of one a-parameter outside the sieve. 2. In the sieve, have outer loop over primes, inner loop over polynomials. At first, performance was very bad. I had to switch two parameters to improve that: 1. Increase the sieve array size. 2. Reduce qCount = the number of q's whose product gaves the a-parameter. These measures improved performance a lot, but the result is still bad. Here is a comparison to the polynomial-major sieve: Code:
200 bit: Poly-mj 4s, 789ms, Prime-mj 11s, 224ms, ratio = 2.34370 210 bit: Poly-mj 8s, 22ms, Prime-mj 21s, 269ms, ratio = 2.65133 220 bit: Poly-mj 19s, 609ms, Prime-mj 55s, 826ms, ratio = 2.84695 230 bit: Poly-mj 40s, 232ms, Prime-mj 2m, 12s, 960ms, ratio = 3.30483 240 bit: Poly-mj 55s, 305ms, Prime-mj 3m, 12s, 622ms, ratio = 3.48290, Prime-mj sieveArraySize = 119552 250 bit: Poly-mj 2m, 15s, 526ms, Prime-mj 8m, 11s, 63ms, ratio = 3.62338, Prime-mj sieveArraySize = 159744 260 bit: Poly-mj 4m, 49s, 561ms, Prime-mj 43m, 30s, 551ms, ratio = 9.01554, Prime-mj sieveArraySize = 206848 For smaller N, the performance gap stems mostly from additional array accesses and another loop for the largest primes that is not necessary in the polynomial-major version. The performance break-in at 260 bit might point out too many L3 cache misses. Kleinjungs SIQS might remedy many of the problems of a silly prime-major sieve. But maybe we can see here what the challenge is. ----------------------------------------------- @Jason: It seems that Colin Percivals stuff has never been published? The only thing I found were some comments in another forum... Last fiddled with by Till on 2020-07-12 at 16:24 |
|
![]() |
![]() |
![]() |
#38 |
Tribal Bullet
Oct 2004
DD916 Posts |
![]()
It was never published. When I asked him about it in ~2007 he said he was too busy with his startup company to work on it.
|
![]() |
![]() |
![]() |
#39 | |
"Ben"
Feb 2007
3,617 Posts |
![]() Quote:
First instance of a C100 faster by TLP! Standard double-large-prime (small-blocks, large-core-count server): Code:
./yafu "siqs(1802716097522165018257858828415111497060066282677325501816640492782221110851604465066510547671104729)" -v -threads 72 -siqsNB 6 -inmem 200 116794 rels found: 28831 full + 87963 from 1988560 partial, (24888.91 rels/sec) Elapsed time: 81.0558 sec QS elapsed time = 81.2256 seconds. Tweaked triple-large-prime (small-blocks, large-core-count server): Code:
./yafu "siqs(1802716097522165018257858828415111497060066282677325501816640492782221110851604465066510547671104729)" -v -threads 72 -siqsNB 6 -forceTLP -siqsBDiv 3 -siqsMFBT 2.6 -siqsMFBD 1.9 -noopt -siqsTFSm 18 -siqsLPB 27 -siqsTF 95 -siqsB 60000 -inmem 200 -siqsBT 250000 last: 60736, now: 10347 full, 147634 slp, 683638 dlp, 159612 tlp, (10763 r/sec) TLP filtering time = 2.8450 seconds. QS elapsed time = 79.5713 seconds. It might be difficult to get these parameter tweaks automated, but I thought this was a neat hero experiment at least. p.s. C100 in < 80 seconds! That's a long way from where yafu started. Granted on a fairly massive machine. |
|
![]() |
![]() |
![]() |
#40 |
"Tilman Neumann"
Jan 2016
Germany
2·13·19 Posts |
![]()
Amazing performance (for a C-100, no matter if 2 or 3 large primes). Can you tell how much better your current version is on that number compared to YaFU 1.34.5? Just for fun maybe compare RSA-100, too.
Last fiddled with by Till on 2021-03-21 at 22:09 Reason: "no matter if 2 or 3 large primes" comment |
![]() |
![]() |
![]() |
#41 | |
"Ben"
Feb 2007
3,617 Posts |
![]() Quote:
Code:
random C100 vNew, DLP 81 sec vNew, TLP 79 sec v1.34.5, DLP 129 sec RSA-100 vNew, DLP 103 sec vNew, TLP 108 sec v1.34.5, DLP 179 sec Last fiddled with by bsquared on 2021-03-22 at 00:15 |
|
![]() |
![]() |
![]() |
#42 | |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
25·11·17 Posts |
![]() Quote:
My dad and I are currently working towards adding TLP to our SIQS implementation(record is C120 currently I think). The current target is ECM with Edwards curves. Is there a particular reason why you(and GMP-ECM) avoided Edwards curves? Last fiddled with by henryzz on 2021-03-22 at 10:42 |
|
![]() |
![]() |
![]() |
#43 |
"Ben"
Feb 2007
3,617 Posts |
![]()
I suspect because the rsa numbers were chosen to have poor QS factor base properties, but I haven't verified this.
Congrats on the C120! I have looked at the various Edwards curves papers, but haven't found the energy yet to implement any of it. So for me at least it was a path of least resistance: the C&P and gmp-ecm references were too good to not use for a first implementation. And after that was working, the amount of additional work necessary to get an incremental improvement (hopefully) from Edwards curves has just been a non-starter. |
![]() |
![]() |
![]() |
#44 | |
"Tilman Neumann"
Jan 2016
Germany
2×13×19 Posts |
![]() Quote:
Thanks for the data and congrats, great improvements really! |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Where do I send my PRP primes with large k? | Trilo | Riesel Prime Search | 3 | 2013-08-20 00:32 |
48-bit large primes! | jasonp | Msieve | 24 | 2010-06-01 19:14 |
NFS with 5 and 6 large primes | jasonp | Factoring | 4 | 2007-12-04 18:32 |
Why only three large primes | fivemack | Factoring | 18 | 2007-05-10 12:14 |
What is the use of these large primes | Prime Monster | Lounge | 34 | 2004-06-10 18:12 |