20200522, 04:37  #34 
Romulan Interpreter
Jun 2011
Thailand
2499_{16} Posts 
Wow! Ben, that is freaking wonderful! And you kept it for yourself for such a long time!
Now I will have to upgrade yafu... (grrr... still using old versions here and there, you know, if it works, don't fix it...) (unfortunately, since I discovered pari, I did most of my "numerology" in it, so the loops and screws came somehow too late, but be sure I will give it a try, I still have some ideas in a far side corner of the brain, which I never had the time/knowledge/balls to follow...) Last fiddled with by LaurV on 20200522 at 04:39 
20200522, 16:24  #35  
"Ben"
Feb 2007
41×83 Posts 
Quote:
Last fiddled with by bsquared on 20200522 at 16:24 

20200527, 18:48  #36 
"Ben"
Feb 2007
6513_{8} Posts 
I've been spending a little time with the TLP variation again... integrating jasonp's batch factoring code and investigating parameters. The batch factoring code provides a huge speedup for TLP, easily twice as fast as before at C110 sizes. The crossover with DLP quadratic sieve now appears to be around C110, although I'm still not convinced I have found optimal parameters for TLP. There are a lot of influential parameters. So as of now, TLP is still slower than regular DLP in sizes of interest to the quadratic sieve (C95C100).
Code:
C110 48178889479314834847826896738914354061668125063983964035428538278448985505047157633738779051249185304620494013 80 threads of cascadelake based xeon: DLP: 1642 seconds for sieving TLP: 1578 seconds for sieving 
20200712, 16:06  #37  
"Tilman Neumann"
Jan 2016
Germany
2·7·31 Posts 
Quote:
Compare a polynomialmajor SIQS to a primemajor SIQS. (both singlethreaded) The conversion of my old (polynomialmajor) SIQS to a primemajor SIQS took two steps: 1. Compute the solution arrays for all (a, b)polynomials of one aparameter outside the sieve. 2. In the sieve, have outer loop over primes, inner loop over polynomials. At first, performance was very bad. I had to switch two parameters to improve that: 1. Increase the sieve array size. 2. Reduce qCount = the number of q's whose product gaves the aparameter. These measures improved performance a lot, but the result is still bad. Here is a comparison to the polynomialmajor sieve: Code:
200 bit: Polymj 4s, 789ms, Primemj 11s, 224ms, ratio = 2.34370 210 bit: Polymj 8s, 22ms, Primemj 21s, 269ms, ratio = 2.65133 220 bit: Polymj 19s, 609ms, Primemj 55s, 826ms, ratio = 2.84695 230 bit: Polymj 40s, 232ms, Primemj 2m, 12s, 960ms, ratio = 3.30483 240 bit: Polymj 55s, 305ms, Primemj 3m, 12s, 622ms, ratio = 3.48290, Primemj sieveArraySize = 119552 250 bit: Polymj 2m, 15s, 526ms, Primemj 8m, 11s, 63ms, ratio = 3.62338, Primemj sieveArraySize = 159744 260 bit: Polymj 4m, 49s, 561ms, Primemj 43m, 30s, 551ms, ratio = 9.01554, Primemj sieveArraySize = 206848 For smaller N, the performance gap stems mostly from additional array accesses and another loop for the largest primes that is not necessary in the polynomialmajor version. The performance breakin at 260 bit might point out too many L3 cache misses. Kleinjungs SIQS might remedy many of the problems of a silly primemajor sieve. But maybe we can see here what the challenge is.  @Jason: It seems that Colin Percivals stuff has never been published? The only thing I found were some comments in another forum... Last fiddled with by Till on 20200712 at 16:24 

20200713, 15:52  #38 
Tribal Bullet
Oct 2004
110111010000_{2} Posts 
It was never published. When I asked him about it in ~2007 he said he was too busy with his startup company to work on it.

20210318, 18:17  #39  
"Ben"
Feb 2007
41·83 Posts 
Quote:
First instance of a C100 faster by TLP! Standard doublelargeprime (smallblocks, largecorecount server): Code:
./yafu "siqs(1802716097522165018257858828415111497060066282677325501816640492782221110851604465066510547671104729)" v threads 72 siqsNB 6 inmem 200 116794 rels found: 28831 full + 87963 from 1988560 partial, (24888.91 rels/sec) Elapsed time: 81.0558 sec QS elapsed time = 81.2256 seconds. Tweaked triplelargeprime (smallblocks, largecorecount server): Code:
./yafu "siqs(1802716097522165018257858828415111497060066282677325501816640492782221110851604465066510547671104729)" v threads 72 siqsNB 6 forceTLP siqsBDiv 3 siqsMFBT 2.6 siqsMFBD 1.9 noopt siqsTFSm 18 siqsLPB 27 siqsTF 95 siqsB 60000 inmem 200 siqsBT 250000 last: 60736, now: 10347 full, 147634 slp, 683638 dlp, 159612 tlp, (10763 r/sec) TLP filtering time = 2.8450 seconds. QS elapsed time = 79.5713 seconds. It might be difficult to get these parameter tweaks automated, but I thought this was a neat hero experiment at least. p.s. C100 in < 80 seconds! That's a long way from where yafu started. Granted on a fairly massive machine. 

20210321, 22:04  #40 
"Tilman Neumann"
Jan 2016
Germany
2×7×31 Posts 
Amazing performance (for a C100, no matter if 2 or 3 large primes). Can you tell how much better your current version is on that number compared to YaFU 1.34.5? Just for fun maybe compare RSA100, too.
Last fiddled with by Till on 20210321 at 22:09 Reason: "no matter if 2 or 3 large primes" comment 
20210322, 00:14  #41  
"Ben"
Feb 2007
41×83 Posts 
Quote:
Code:
random C100 vNew, DLP 81 sec vNew, TLP 79 sec v1.34.5, DLP 129 sec RSA100 vNew, DLP 103 sec vNew, TLP 108 sec v1.34.5, DLP 179 sec Last fiddled with by bsquared on 20210322 at 00:15 

20210322, 10:39  #42  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
13335_{8} Posts 
Quote:
My dad and I are currently working towards adding TLP to our SIQS implementation(record is C120 currently I think). The current target is ECM with Edwards curves. Is there a particular reason why you(and GMPECM) avoided Edwards curves? Last fiddled with by henryzz on 20210322 at 10:42 

20210322, 13:19  #43 
"Ben"
Feb 2007
41×83 Posts 
I suspect because the rsa numbers were chosen to have poor QS factor base properties, but I haven't verified this.
Congrats on the C120! I have looked at the various Edwards curves papers, but haven't found the energy yet to implement any of it. So for me at least it was a path of least resistance: the C&P and gmpecm references were too good to not use for a first implementation. And after that was working, the amount of additional work necessary to get an incremental improvement (hopefully) from Edwards curves has just been a nonstarter. 
20210322, 18:57  #44  
"Tilman Neumann"
Jan 2016
Germany
2×7×31 Posts 
Quote:
Thanks for the data and congrats, great improvements really! 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Where do I send my PRP primes with large k?  Trilo  Riesel Prime Search  3  20130820 00:32 
48bit large primes!  jasonp  Msieve  24  20100601 19:14 
NFS with 5 and 6 large primes  jasonp  Factoring  4  20071204 18:32 
Why only three large primes  fivemack  Factoring  18  20070510 12:14 
What is the use of these large primes  Prime Monster  Lounge  34  20040610 18:12 