2016-12-31, 09:01  #1 
"Tilman Neumann"
Jan 2016
Germany
494_{10} Posts 
QS questions about partials
Hi QS experts and friends,
recently I tested my PSIQS to factor numbers up to 340 bits. Giving it 12 GB RAM, I noticed that starting from 330-bit inputs it runs into memory problems, which lead to many lengthy garbage collections. I found out that it's the collected (1- and 2-)partials that occupy most of the memory. At 340 bits, there may be 10-20 million of them. My data structures are fast but not very memory-efficient, caused partially by Java and partially by my current design; so one partial may need something like 500 to 1000 bytes. Making the data structures somewhat leaner would only delay the problem to slightly bigger inputs, though.

So here are my questions:
1. Is there some kind of best practice to reduce the set of partials that are stored?
2. Should I stop storing partials having factors that do not fit into a long? (Such partials might turn up in my SIQS at inputs of about 350 bits.)

Thanks for your help
Till 
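For illustration, here is a minimal sketch of what a leaner partial record could look like in Java. The class and field names are hypothetical (not from PSIQS), assuming each partial carries at most two large primes below 2^63 and that the full factorization data is kept elsewhere and looked up by index:

```java
// Hypothetical compact representation of a partial relation:
// the one or two large primes as primitive longs, plus an index
// into an external store holding the full relation data. Such an
// object costs on the order of 32 bytes on a 64-bit JVM, versus
// hundreds of bytes for a structure holding BigIntegers.
final class CompactPartial {
    final long p1;        // first large prime
    final long p2;        // second large prime, or 0 for a 1-partial
    final int relationId; // index of the full relation data elsewhere

    CompactPartial(long p1, long p2, int relationId) {
        this.p1 = p1;
        this.p2 = p2;
        this.relationId = relationId;
    }

    boolean isSingleLargePrime() {
        return p2 == 0;
    }
}
```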
2016-12-31, 09:32  #2  
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2^{2}·3·941 Posts 
Quote:
2. What do you mean by "long"? On many machines these days a long is a 64-bit integer. If that's the case in your implementation, then you are definitely saving partials with primes which are too big. If you mean 32-bit integers, you should be good up to 450 bits or so. Back in the day, Alec Muffett, Arjen Lenstra, Sam Wagstaff, Bruce Dodson and I factored a ~450-bit integer with large primes of that size.

Beware: much above 350 bits QS starts becoming markedly slower than NFS, and I really wouldn't recommend pushing above 380 bits. 

2016-12-31, 11:03  #3  
"Tilman Neumann"
Jan 2016
Germany
2×13×19 Posts 
Thanks for the quick reply.
Quote:
Not doing on-the-fly combination, how does one decide when to start the combining stage? This can only be a (somewhat crude) estimate, I guess?
Quote:
I know, and some people might say the transition point is even lower, maybe around 320 bits. Maybe I'll try to implement NFS one day, too; but not now, because it seems to be quite a challenge. 

2016-12-31, 14:30  #4  
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2^{2}×3×941 Posts 
Quote:
However, the traditional way is to monitor the factorization by counting the number of cycles episodically throughout the sieving stage, assuming you are performing a significant factorization, that is. (If your computation takes less than a day or two, it doesn't really matter how bad your estimate is as long as it's an overestimate.) The number of cycles grows somewhat faster than quadratically, and you can plot log(cycles) against #relations to produce a nice straight line. All this is in the literature of 25-30 years ago. Verb. Sap.

Don't try to think of a QS factorization as a monolithic computation. Break it into portions such as parameter selection, sieving, cycle counting, cycle generation, linear algebra and square root phases. You can then overlap the sieving and counting without the latter interrupting the former. 
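As a sketch of the episodic cycle counting described above, here is a minimal union-find over the graph of large primes, using the standard formula #cycles = #relations + #components - #vertices (with vertex 1 standing in for single-large-prime relations). All names are hypothetical, and this only counts cycles; it does not construct them:

```java
import java.util.HashMap;
import java.util.Map;

// Union-find cycle counter for QS partials with up to two large primes.
// Each partial adds an edge between its two large primes; a 1-partial
// is treated as an edge to the special vertex 1.
final class CycleCounter {
    private final Map<Long, Long> parent = new HashMap<>();
    private int relations = 0;  // partials added so far (edges)
    private int components = 0; // connected components in the graph
    private int vertices = 0;   // distinct primes seen (incl. vertex 1)

    /** Add a partial; pass p2 = 1 for a single-large-prime relation. */
    void addPartial(long p1, long p2) {
        relations++;
        long r1 = find(p1), r2 = find(p2);
        if (r1 != r2) { parent.put(r1, r2); components--; }
    }

    /** Classic formula: #cycles = #relations + #components - #vertices. */
    int cycleCount() {
        return relations + components - vertices;
    }

    private long find(long x) {
        Long p = parent.get(x);
        if (p == null) { // new vertex: its own root, its own component
            parent.put(x, x); vertices++; components++; return x;
        }
        if (p == x) return x;
        long root = find(p);
        parent.put(x, root); // path compression
        return root;
    }
}
```

Calling `cycleCount()` every few minutes during sieving gives exactly the episodic monitoring curve mentioned above, without interrupting the sieve for long.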

2016-12-31, 16:09  #5 
"Tilman Neumann"
Jan 2016
Germany
2·13·19 Posts 
Thank you for the explanation of the estimate. I read loads of articles but missed that part, probably because I had chosen the mentioned "on-the-fly combining" approach. And I do have all those separate phases you mention.
Looking at it from another point: My program shows that you can have all partials in memory up to 330 bits. Using optimized data structures could surely extend that range to 350 bits or more. So if NFS takes over at ~350 bits, having the partials in memory could be quite interesting for high-performance SIQS implementations like Yafu? If the developers are not ashamed to use so much memory, that is ;) 
2016-12-31, 19:15  #6 
Tribal Bullet
Oct 2004
5·709 Posts 
You can also compromise: rather than having all partials in memory with full information, or no relations in memory with no information, you can keep relations on disk along with a graph of the large primes in memory for counting cycles, built incrementally as relations arrive. Following the description in Manasse's nice paper for QS with two large primes, the size of the graph is proportional to the number of large primes you find, which should be much smaller than the complete list of relations. With a single large prime you don't even need a cycle-counting graph, just a hashtable of bits, one for each large prime encountered.
These data structures can't solve the entire cycle composition problem, but they don't need to; their job is to find out when the postprocessing can begin.

For the largest QS jobs, if you're worried about the memory usage you can run through the dataset on disk twice: the first pass counts the full relations and builds a reduced-size version of each partial relation that only stores the large primes. This is probably 1/5 the size of the complete dataset, but is enough to run singleton removal in memory, which can tell you which relations to never even bother reading from disk. Then the second pass skips over those relations, which in practice removes 70-90% of the memory footprint of the dataset. You'll find that even for the largest QS jobs this will make the surviving relations fit in memory on almost any modern machine.

For NFS datasets that you expect to actually be big enough to not fit in memory (i.e. hundreds of GB in size) you need multiple passes plus much more advanced data structures. See here for more detail than you ever wanted on the techniques involved; the advanced filtering algorithms are all applicable to QS, and incidentally handle an arbitrary number of large primes per relation.

Last fiddled with by jasonp on 2016-12-31 at 19:32 
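A minimal sketch of the in-memory singleton-removal pass over the reduced dataset (large primes only): a relation whose large prime occurs nowhere else can never complete a cycle, and removals can create new singletons, so the pass iterates to a fixed point. Names are illustrative, not from msieve or any real codebase:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SingletonFilter {
    /** rels.get(i) holds only the large primes of relation i. */
    static BitSet removeSingletons(List<long[]> rels) {
        int n = rels.size();
        BitSet alive = new BitSet(n);
        alive.set(0, n);
        boolean changed = true;
        while (changed) { // repeat: removals can create new singletons
            changed = false;
            // count occurrences of each large prime among surviving relations
            Map<Long, Integer> count = new HashMap<>();
            for (int i = alive.nextSetBit(0); i >= 0; i = alive.nextSetBit(i + 1))
                for (long p : rels.get(i))
                    count.merge(p, 1, Integer::sum);
            // drop any relation containing a prime seen exactly once
            for (int i = alive.nextSetBit(0); i >= 0; i = alive.nextSetBit(i + 1))
                for (long p : rels.get(i))
                    if (count.get(p) == 1) { alive.clear(i); changed = true; break; }
        }
        return alive; // set bits = relations worth re-reading from disk in pass two
    }
}
```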
2017-01-01, 13:07  #7 
"Tilman Neumann"
Jan 2016
Germany
1EE_{16} Posts 
Thanks, Jason. I see that there is a big difference between toy tools like mine that could eventually factor 350-bit numbers and high-end tools that are meant to deal with 400, 500 and more bits.
I will remember that larger jobs will require storing the full information about partials on disk. But for the moment I'll try to get as far as possible doing everything in memory. I already got two clues from this thread about how to continue:
1. I am storing too many partials; those having too-big factors do not contribute to finding smooth relations.
2. Dropping partials with too-big factors also means that I do not need to store the large factors in BigIntegers. This will help a lot, because the memory cost of BigIntegers is really quite high: for a number N they need 64 + 4*size bytes, where size = bitLength(N)/32.
Furthermore, I already found a few simplifications of my data structures that will help, too.
Thanks for all replies!
Till 
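As a worked check of that memory estimate (assuming the ~64-byte object/array overhead quoted above, and rounding the 32-bit word count up; the method name is made up for illustration):

```java
// Approximate heap cost of a java.math.BigInteger of the given bit length,
// per the 64 + 4*size estimate above (size = number of 32-bit words).
final class MemoryEstimate {
    static long approxBigIntegerBytes(int bitLength) {
        int size = (bitLength + 31) / 32; // 32-bit words, rounded up
        return 64 + 4L * size;
    }
}
// e.g. a 50-bit large prime: 64 + 4*2 = 72 bytes as a BigInteger,
// but only 8 bytes as a primitive long, roughly a 9x saving per factor.
```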
2017-01-02, 17:11  #8 
"Tilman Neumann"
Jan 2016
Germany
2×13×19 Posts 
I'd like to report some progress...
After implementing some of the measures mentioned before, I did a new test run. Usually I test one not-too-easy random number of r bits, then one of r+10 bits, one of r+20 bits, and so on. This night I found:

310 bit - needs 2.4 GB RAM - took 1h:24m
320 bit - needs 2.8 GB RAM - took 1h:44m
330 bit - needs 3.3 GB RAM - took 4h:42m
340 bit - needs 4.0 GB RAM - took 9h:10m

Taking into account that before, 12 GB were not enough to factor a 340-bit number, this is a nice improvement in the memory management, I think. The last factored number was a
C103 = 1338789862071581121049463469318139104929907866001906470793076524738452716923872080265576143257108151871
The factor I found is 31447532672384751301305460460001853881953018305177 (165 bits).

Running YaFu 1.34.5's 64-bit Windows binary with siqs(.) on the same machine and with the same number of threads (6) took 6420s = 1h:47m. Older threads in this forum suggested that the difference between C and Java in these applications is about a factor of 30. Now it seems to be around 5.89. That's some improvement, isn't it? 
2017-01-02, 18:00  #9 
"Tilman Neumann"
Jan 2016
Germany
2×13×19 Posts 
Oops
Sorry, I had a typo in my calculation. Correct is:
9h:10m / 1h:47m = 550 min / 107 min ~ 5.14
So that's the true factor by which Yafu was better on that number than my current PSIQS in development. I'm planning the next release for the 2nd half of January. 
2017-01-02, 20:48  #10 
"Ben"
Feb 2007
E21_{16} Posts 
I'd say that is quite good! Alpertron's implementation is the other high-performance Java SIQS that might be comparable. I don't know of any benchmarks performed at this input size.

2017-01-04, 14:59  #11 
"Tilman Neumann"
Jan 2016
Germany
111101110_{2} Posts 
I feel honored that you call my program "high-performance". There is still much to improve. But thanks!
