20100406, 20:49  #1 
Oct 2006
Berlin, Germany
607 Posts 
gmpecm needed memory and runtime
Hello,
I run gmpecm with B1=43M and different input number length. Afterwards I made a chart of estimated memory (red line) and measured run time (blue line). X: number length Estimated memory is nearly linear increasing, this shows also the chart for different B1. But why is there a big jump in the line? Runtime seems to grow potencially (bah, what is the right engl. word?). Can somobod confirm or explain it? yoyo 
20100407, 07:14  #2 
Bamboozled!
"ð’‰ºð’ŒŒð’‡·ð’†·ð’€"
May 2003
Down not across
2^{5}×331 Posts 

20100407, 07:48  #3 
(loop (#_fork))
Feb 2006
Cambridge, England
2·3,191 Posts 
This isn't an exponentialgrowth case. I can't think of a nice English word for "as some power of the input"; 'polynomially' suggests it's an integer exponent which isn't the case.

20100407, 13:52  #4  
Nov 2003
2^{2}×5×373 Posts 
Quote:
A Practical Analysis of ECM Math. Comp. 

20100407, 16:09  #5 
"William"
May 2003
New Haven
3·787 Posts 
Bob's paper would answer your question, but it's overkill for you needs, it's hard to get hold of if you aren't associated with a university, and it has an error that made it hard, at least for me, to follow (although not relevant to your question).
The jump in memory happens when the granularity of the FFT changes. Large numbers are represented as arrays of numbers  much like decimal numbers are arrays of digits. As the number gets larger, the maximum size permitted in each array element gets smaller. To make up some numbers for the sake up discussion, as the size goes from 100 to 500 digits, you are using 3 digits per element, and the memory grows because you need 34 elements for the 100 digit numbers and 167 elements for the 500 digit number. But between 500 and 600 digits you must shift to 2 digits per element, so 600 digits requires arrays of length 300  much longer than the 200 you would get from the earlier growth. It then grows slowly until the next jump. The change in granularity is necessary because of how FFT multiplication deals with carrying. When we do multiplication by longhand, we treat the numbers as arrays of single digits and we integrate the carries in two stages. But FFT multiplication computes a convolution  this is as if we kept the results of each multiply (for 5x5, keeping the whole 25 instead of keeping the 5 and carrying the 2), AND in the addition stage we keep the whole totals instead of keeping the last digit and carrying the rest. Hence the convolution of 153 x 718 would be Code:
1 5 3 7 1 8  8 40 24 1 5 3 7 35 21  7 36 34 43 24 From which you could then do the carries as 24 43 34 36 7  109854 To get the product William 
20100407, 16:13  #6 
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
What is the unit of "length" and on what kind of system (in particular, 32 bit or 64 bit) were the timings produced?
Alex 
20100407, 16:42  #7  
Oct 2006
Berlin, Germany
1001011111_{2} Posts 
Quote:
It was produced on a 64bit Suse Linux i7 processor: http://www.rechenkraft.net/yoyo/show...hp?hostid=8036 yoyo 

20100409, 16:48  #8 
"Nancy"
Aug 2002
Alexandria
100110100011_{2} Posts 
Consider stage 1 and stage 2 separately, their run times behave rather differently for increasing input number size.
For relatively small input, usually bounded by GMP's MUL_TOOM22_THRESHOLD (28 by default on K8/K10 systems), the multiplication is simply quadratic time grammarschool, so in theory the time grows like the square of the number of words in the input. In practice the growth is a little slower due to overhead. In GMPECM 6.3 we also switch from GMPECM's own multiplication functions to GMP's which are faster for operands of more than 1520 words (depending on CPU), so the growth will again slow down a bit, even though each algorithm is quadratic time. Above MUL_TOOM22_THRESHOLD, GMP uses asymptotically faster algorithms like Karatsuba with complexity O(n^1.59), Toom3, etc, and eventually SchÃ¶nhageStrassen with O(n log n loglog n). So for large enough input, the stage 1 time eventually becomes quasilinear. In stage 2 the time is mostly linear in the size of the inputs. There is some modular arithmetic, but the time for polynomial multiplication normally dominates. With NTT, that time is O(deg * n^2) for conversion between Z/NZ and NTT primes, and O(n * deg * log(deg)) for the transforms. The initialisation of the NTT primes is cubic time however, which will dominate for large enough input. Without NTT, stage 2 uses SchÃ¶nhageStrassen which results in quasipolynomial time for pretty much all input sizes, at the cost of using more memory. So, sadly, there's no easy "grows like n^c" answer. Stage 1 is quasilinear eventually, but almost quadratic for most numbers that we try to factor. Stage 2 becomes cubic(!) eventually when using NTT, but is pretty much linear for numbers and parameters we typically use, and stays pretty much linear if you don't use the NTT. For stage 1, here's a graph for 1, ..., 23 words with B1=1M on a Core 2 2133 MHz.The graph fits a 51x^{2}+106x+982 function pretty well up to 20 words, where a jump occurs since we switch to GMP functions at that point which are a no faster than our own on Core 2 but incur more overhead (Note: On AMD cpus we switch to GMP functions earlier, because they actually are faster there). I haven't made a graph for stage 2 times yet, but I expect that for reasonable B2 it will be pretty linear in the number of NTT primes, which is moreorless proportional to the number of words in the modulus. Alex Last fiddled with by akruppa on 20100409 at 17:17 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Is ECC memory needed?  joe a  Hardware  35  20161107 22:06 
Predicting QS and NFS runtime  jasonp  Factoring  2  20110307 01:22 
Memory Needed  Unregistered  Information & Answers  2  20090920 15:21 
ECM Runtime and F20  D. B. Staple  Factoring  11  20071212 16:52 
amount of memory needed for Stage 2 (factor)  pin  Software  13  20030318 19:54 