mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2016-12-31, 09:01   #1
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

49410 Posts
Default QS questions about partials

Hi QS experts and friends,

recently I tested my PSIQS to factor numbers up to 340 bit. Giving it 12GB RAM, I noticed that starting from 330 bit inputs it runs into memory problems, which lead to many lengthy garbage collections.

I found out that it's the collected (1- and 2-) partials that occupy most of the memory. At 340 bit, there may be 10-20 million of them. My data structures are fast but not very memory-efficient, partially caused by Java, partially by my current design; so one partial may need something like 500 to 1000 bytes.

Making the data structures somewhat leaner would only delay the problem to slightly bigger inputs, though.

So here are my questions:
1. Is there some kind of best practice to reduce the set of partials that are stored?
2. Should I retreat from storing partials having factors that do not fit into long? (such partials might turn up in my SIQS at inputs with about 350 bit)

Thanks for your help
Till
Till is offline   Reply With Quote
Old 2016-12-31, 09:32   #2
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

22·3·941 Posts
Default

Quote:
Originally Posted by Till View Post
So here are my questions:
1. Is there some kind of best practice to reduce the set of partials that are stored?
2. Should I retreat from storing partials having factors that do not fit into long? (such partials might turn up in my SIQS at inputs with about 350 bit)

Thanks for your help
Till
1. It's traditional to store them in an external file where they don't take up memory until the combining and subsequent stages. It's also traditional to find those which form cycles using file I/O rather than RAM storage.

2. What do you mean by "long"? On many machines these days a long is a 64-bit integer. If that's the case in your implementation then you are definitely saving partials with primes which are too big. If you mean 32-bit integers, you should be good up to 450 bits or so. Back in the day Alec Muffett, Arjen Lenstra, Sam Wagstaff, Bruce Dodson and I factored a ~450 integer with large primes of that size.

Beware: much above 350 bits QS starts becoming markedly slower than NFS and I really wouldn't recommend pushing above 380 bits.
xilman is online now   Reply With Quote
Old 2016-12-31, 11:03   #3
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2×13×19 Posts
Default

Thanks for the quick reply.

Quote:
Originally Posted by xilman View Post
1. It's traditional to store them in an external file where they don't take up memory until the combining and subsequent stages. It's also traditional to find those which form cycles using file I/O rather than RAM storage.
Ok, so you get the same amount of data, and the problem is that I am doing it all in memory. This has the advantage that I can combine partials to smooth relations "on the fly", and have exact information about when to start a solver run. But then it seems as if the amount of data makes the approach unfeasible for numbers of some size.

Not doing on-the-fly combination, how does one decide when to start the combining stage? This can only be a (somewhat crude) estimate I guess?

Quote:
Originally Posted by xilman View Post
2. What do you mean by "long"? On many machines these days a long is a 64-bit integer. If that's the case in your implementation then you are definitely saving partials with primes which are too big. If you mean 32-bit integers, you should be good up to 450 bits or so. Back in the day Alec Muffett, Arjen Lenstra, Sam Wagstaff, Bruce Dodson and I factored a ~450 integer with large primes of that size.
With long I meant 64 bit, yes. I'll try to drop the partials with the largest primes, then.

Quote:
Originally Posted by xilman View Post
Beware: much above 350 bits QS starts becoming markedly slower than NFS and I really wouldn't recommend pushing above 380 bits.
I know, and some people might say the transition point is even lower, maybe around 320 bit. Maybe I'll try to implement NFS one day, too; but not now, because it seems to be quite a challenge.
Till is offline   Reply With Quote
Old 2016-12-31, 14:30   #4
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

22×3×941 Posts
Default

Quote:
Originally Posted by Till View Post
Not doing on-the-fly combination, how does one decide when to start the combining stage? This can only be a (somewhat crude) estimate I guess?
With experience the estimate can be made quite precisely.

However, the traditional way is to monitor the factorization by counting the number of cycles episodically throughout the sieving stage, assuming you are performing a significant factorization that is. (If you computation takes less than a day or two, it doesn't really matter how bad your estimate is as long as it's an over-estimate.) The number of cycles grows somewhat faster than quadratically and you can plot log(cycles) against #relations to produce a nice straight line. All this is in the literature of 25-30 years ago.

Verb. Sap. Don't try to think of a QS factorization as a monolithic computation. Break it into portions such as parameter selection, sieving, cycle counting, cycle generation, linear algebra and square root phases. You can then overlap the sieving and counting without the latter interrupting the former.
xilman is online now   Reply With Quote
Old 2016-12-31, 16:09   #5
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2·13·19 Posts
Default

Thank you for the explanation of the estimate. I read loads of articles but I missed that part, probably because I had chosen the mentioned "on-the-fly combining" approach. And, I do have all those separate phases you mention.

Looking at it from another point:
My program shows that you can have all partials in memory until 330 bit. Using optimized data structures could surely extend that range to 350 bits or more. So if NFS takes over at ~ 350 bits, having the partials in memory could be quite interesting for high-performance SIQS implementations like Yafu? If the developers are not ashamed to use so much memory, it is to say ;-)
Till is offline   Reply With Quote
Old 2016-12-31, 19:15   #6
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·709 Posts
Default

You can also compromise: rather than having all partials in memory with full information, or no relations in memory with no information, you can have relations on disk along with a graph of the large primes in memory for counting cycles, that is built incrementally as relations arrive. Following the description in Manasse's nice paper for QS with two large primes, the size of the graph is proportional to the number of large primes you find which should be much smaller than the complete list of relations. With a single large prime you don't even need a cycle counting graph but a hashtable of bits, one for each large prime encountered.

These data structures can't solve the entire cycle composition problem but they don't need to; their job is find out when the postprocessing can begin.

For the largest QS jobs, if you're worried about the memory usage you can run through the dataset on disk twice: the first pass counts the full relations and builds a reduced size version of each partial relation, that only stores the large primes. This is probably 1/5 the size of the complete dataset but is enough to run singleton removal in memory, which can tell you which relations to never even bother reading from disk. Then the second pass skips over those relations, which in practice removes 70-90% of the memory footprint of the dataset. You'll find that even for the largest QS jobs this will make the surviving relations fit in memory on almost any modern machine. For NFS datasets that you expect to actually be big enough to not fit in memory (i.e. hundreds of GB in size) you need multiple passes plus much more advanced data structures. See here for more detail than you ever wanted on the techniques involved; the advanced filtering algorithms are all applicable to QS, and incidentally handle an arbitrary number of large primes per relation

Last fiddled with by jasonp on 2016-12-31 at 19:32
jasonp is offline   Reply With Quote
Old 2017-01-01, 13:07   #7
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

1EE16 Posts
Default

Thanks, Jason. I see that there is a big difference between toy-tools like mine that could eventually factor 350 bit numbers and high-end tools that are meant to deal with 400, 500 and more bits.

I will remember that doing larger jobs will require to store the full information about partials on disk. But for the moment I'll try to get as far as possible doing everything in memory. I already got two clues from this thread how to continue:
1. I am storing too many partials, those having too big factors do not contribute to finding smooth relations.
2. Dropping partials with too big factors also means that I do not need to store the large factors in BigIntegers. This will help a lot, because the memory cost of BigIntegers is really quite high: For a number N they need 64+4*size byte, where size is bitLength(N)/32.

Furthermore I already found a few simplifications of my data structures that will help, too.

Thanks for all replies!
Till
Till is offline   Reply With Quote
Old 2017-01-02, 17:11   #8
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2×13×19 Posts
Default

I'ld like to report some progress...

After realizing some of the measures mentioned before, I did a new test run. Usually I test one not too easy random numbers of r bits, then one of r+10 bits, one of r+20 bits, and so on.

This night I found:
310 bit -- needs 2.4 GB RAM -- took 1h:24m
320 bit -- needs 2.8 GB RAM -- took 1h:44m
330 bit -- needs 3.3 GB RAM -- took 4h:42m
340 bit -- needs 4.0 GB RAM -- took 9h:10m

Taking into account that before 12 GB were not enough to factor a 340 bit number, this is a nice improvement in the memory management, I think.

The last factored number was a C103 = 1338789862071581121049463469318139104929907866001906470793076524738452716923872080265576143257108151871
The factor I found is 31447532672384751301305460460001853881953018305177 (165 bits).

Running YaFu 1.34.5 's 64.bit windows binary with siqs(.) on the same machine and with the same number of threads (6) took 6420s = 1h:47min.

Older threads in this forum suggested that the difference between C and Java in these applications is about factor 30. Now it seems to be around 5.89. That's some improvement, isn't it?
Till is offline   Reply With Quote
Old 2017-01-02, 18:00   #9
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2×13×19 Posts
Default Oops

Sorry I had a typo in my calculation. Correct is:
9h:10m / 1h:47m = 550min / 107min ~ 5.14

So that's the true factor Yafu was better on that number than my current PSIQS in development.

I'm planning the next release for the 2nd half of January.
Till is offline   Reply With Quote
Old 2017-01-02, 20:48   #10
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

E2116 Posts
Default

Quote:
Originally Posted by Till View Post
Sorry I had a typo in my calculation. Correct is:
9h:10m / 1h:47m = 550min / 107min ~ 5.14

So that's the true factor Yafu was better on that number than my current PSIQS in development.

I'm planning the next release for the 2nd half of January.
I'd say that is quite good! Alpertron's implementation is the other high performance java siqs that might be comparable. I don't know of any benchmarks performed at this size input.
bsquared is online now   Reply With Quote
Old 2017-01-04, 14:59   #11
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

1111011102 Posts
Default

I feel honored that you call my program "high-performance". There is still much to improve. But thanks!
Till is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Two questions: Dubslow GPU Computing 1 2011-08-05 18:22
gmp-ecm questions yoyo GMP-ECM 34 2009-03-20 18:06
Some questions... OmbooHankvald PSearch 3 2005-09-17 19:29
5 questions OmbooHankvald Factoring 6 2005-08-28 19:31
Questions OmbooHankvald Prime Sierpinski Project 2 2005-08-01 20:18

All times are UTC. The time now is 19:54.


Mon May 16 19:54:34 UTC 2022 up 32 days, 17:55, 1 user, load averages: 1.63, 1.39, 1.35

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

โ‰  ยฑ โˆ“ รท ร— ยท โˆ’ โˆš โ€ฐ โŠ— โŠ• โŠ– โŠ˜ โŠ™ โ‰ค โ‰ฅ โ‰ฆ โ‰ง โ‰จ โ‰ฉ โ‰บ โ‰ป โ‰ผ โ‰ฝ โŠ โŠ โŠ‘ โŠ’ ยฒ ยณ ยฐ
โˆ  โˆŸ ยฐ โ‰… ~ โ€– โŸ‚ โซ›
โ‰ก โ‰œ โ‰ˆ โˆ โˆž โ‰ช โ‰ซ โŒŠโŒ‹ โŒˆโŒ‰ โˆ˜ โˆ โˆ โˆ‘ โˆง โˆจ โˆฉ โˆช โจ€ โŠ• โŠ— ๐–• ๐–– ๐–— โŠฒ โŠณ
โˆ… โˆ– โˆ โ†ฆ โ†ฃ โˆฉ โˆช โŠ† โŠ‚ โŠ„ โŠŠ โŠ‡ โŠƒ โŠ… โŠ‹ โŠ– โˆˆ โˆ‰ โˆ‹ โˆŒ โ„• โ„ค โ„š โ„ โ„‚ โ„ต โ„ถ โ„ท โ„ธ ๐“Ÿ
ยฌ โˆจ โˆง โŠ• โ†’ โ† โ‡’ โ‡ โ‡” โˆ€ โˆƒ โˆ„ โˆด โˆต โŠค โŠฅ โŠข โŠจ โซค โŠฃ โ€ฆ โ‹ฏ โ‹ฎ โ‹ฐ โ‹ฑ
โˆซ โˆฌ โˆญ โˆฎ โˆฏ โˆฐ โˆ‡ โˆ† ฮด โˆ‚ โ„ฑ โ„’ โ„“
๐›ข๐›ผ ๐›ฃ๐›ฝ ๐›ค๐›พ ๐›ฅ๐›ฟ ๐›ฆ๐œ€๐œ– ๐›ง๐œ ๐›จ๐œ‚ ๐›ฉ๐œƒ๐œ— ๐›ช๐œ„ ๐›ซ๐œ… ๐›ฌ๐œ† ๐›ญ๐œ‡ ๐›ฎ๐œˆ ๐›ฏ๐œ‰ ๐›ฐ๐œŠ ๐›ฑ๐œ‹ ๐›ฒ๐œŒ ๐›ด๐œŽ๐œ ๐›ต๐œ ๐›ถ๐œ ๐›ท๐œ™๐œ‘ ๐›ธ๐œ’ ๐›น๐œ“ ๐›บ๐œ”