![]() |
![]() |
#1 |
Dec 2005
22·23 Posts |
![]()
Last November, I wrote program capable of finding testing ranges of mersenne prime numbers for fun in C using the C Standard Libraries and the GNU Multi-Precision Library. I wanted to multithread it with the pthreads library, but could not get compilation to work so I put the program down until this weekend. I have gotten multithreading (coarse multithreading, as it runs separate Lucas Lehmer tests in separate threads, rather than actually multithreading the Lucas Lehmer tests themselves) working on both Unix and Windows.
Anyway, I have never used the program test large prime numbers (I have always tested low ranges, like M1 to M12000) and I decided to give testing M32582657 a try. I calculated that my program should use approximately 12.6MB (1 MB = 2^20 bytes, with (32582657 * 2 + 1) / 64 bytes for the sieve and 32582657 / 8 * 3 bytes for the Lucas Lehmer test) of RAM for data and assumed that it would use another megabyte of RAM for its instructions. I was wrong. It uses approximately 30MB to 55 MB of RAM, alternating between the two and while I am describing it as an alternation, it is not a true alternation, but that is the best way I can describe its behavior. If I could plot a normal distribution of its memory consumption, I would imagine that the mean of the normal distribution would be approximately 44 to 45 MB. To explain this, I reasoned that the GMP library functions are allocating their own memory as a scratch space to perform computations. I tried a few things to lower my program's memory consumption such as changing the GMP functions used in the Lucas Lehmer test and switching back to the single threaded version, which I am using to run the M32582657 test now, before realizing this. Digging through the code for GMP's mpz_powm_ui() function confirmed this. However, I cannot see how that would require 3 to 4 times the amount of memory I computed. I installed Prime95 to see how much memory it used and it allocated exactly 22,780 KB, although it has since declined to 18,512 KB since I started typing this. Does a program using GMP to do Lucas Lehmer tests requiring 2 (+/- 0.5) times the amount of memory Prime95 uses sound reasonable to anyone? |
![]() |
![]() |
![]() |
#2 |
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
![]()
First, you shouldn't use mpz_powm_ui() for the modular exponentiation. You can do the reduction mod 2^n-1 with a shift and add, but mpz_powm_ui() does not know that. Just square and reduce repeatedly instead.
The Schönhage-Strassen multiplication in GMP makes a copy of your data cut into pieces, to act as the coefficients of polynomials that get multiplied. The input polynomials are zero padded to degree equal to that of the product polynomial, and each coefficient is zero-padded to the largest possible size of coefficients in the product polynomial, which explains a 4-fold increase over the size of the input numbers. You could avoid some of the zero padding by writing a DWT on top of the Schönhage-Strassen convolution, but that would require messing with some of the core GMP routines. Alex Last fiddled with by akruppa on 2008-02-25 at 20:26 Reason: typo |
![]() |
![]() |
![]() |
#3 |
Dec 2005
22·23 Posts |
![]()
How would you do the reduction mod 2^n-1 with a shift and add? I knew how to do reductions mod 2^n with a binary AND, but I did not think there was any equivalent fast operation for mod 2^n-1.
|
![]() |
![]() |
![]() |
#4 |
"Bob Silverman"
Nov 2003
North of Boston
11101001010002 Posts |
![]() |
![]() |
![]() |
![]() |
#5 |
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
![]()
Another way of putting it is 2^n == 1 (mod 2^n-1)
So 2^n * A + B == 1 * A + B == A+B (mod 2^n-1) So take the unreduced product, separate high and low bits with mpz_tdiv_q_2exp() and mpz_tdiv_r_2exp() and add the two pieces. Alex |
![]() |
![]() |
![]() |
#6 |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
199E16 Posts |
![]()
... and if there is an overflow past 2^n-2, truncate to n bits and add 1. But this won't catch a final 2^n-1 result, if that would be a problem, then instead: if there is an overflow past 2^n-2, subtract 2^n-1.
Last fiddled with by retina on 2008-02-25 at 16:09 |
![]() |
![]() |
![]() |
#7 |
Dec 2005
22·23 Posts |
![]()
I am an undergraduate Biochemistry/Computer Science double major and I did not quite understand what you guys said, so I asked one of the graduate mathematics students to look over it and after explaining binary operations to him, he said that what you were saying is in order to compute y, where y = x mod (2^n - 1), I am sum every n bits AND (2^n - 1).
He used a square bracket notation that is alien to me to indicate that a number was being reduced modulo another number, so I am assuming that the course where I would learn that is abstract/applied algebra. So to compute 111000 mod 10^n - 1 where n = 10, I do (000 AND 011) + (111 AND 011), to get 10. Is that what you were saying or do I have things terribly wrong? |
![]() |
![]() |
![]() |
#8 |
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
![]()
The binary (binary as in base 2) AND operator does not work if you do it with decimal digits.
Saying "sum every n bits AND (2^n - 1)" seems correct (although a bit vague) but is misleading, there's never any AND operation actually performed. Your example doesn't look right to me, if n=10, you'd be reducing modulo 9999999999, so 111000 would remain unchanged. Simply look at the algebra: N = 2n * A + B ≡ A+B (mod 2n-1) If you want to reduce N modulo 2n-1, cut N into two pieces: the lower n bits (=B) and the remaining upper bits (=A). Then add these pieces. There's nothing more to it. As retina pointed out, this sum is not necessarily < 2n-1; if N ≤ 22n-1, then A, B ≤ 2n-1 and A+B ≤ 2n+1-1. If you square this again, you get something ≤ 22n+2-1, etc., i.e. the size can run away. You can prevent this by applying the reduction twice. Alex |
![]() |
![]() |
![]() |
#9 |
Dec 2005
9210 Posts |
![]()
I should have mentioned that my example uses base 2 to perform computations, so 2 (base 10) = 10 (base 2) and 111000 (base 2) = 56 (base 10). The AND operator works here, as I am doing binary arithmetic.
|
![]() |
![]() |
![]() |
#10 |
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
![]()
Ah, ok... the 10^n had me confused.
Alex |
![]() |
![]() |
![]() |
#11 | |
Dec 2005
9210 Posts |
![]()
I have an update. I replaced mpz_mod(n->s, n->s, n->num); with the following in my code:
Quote:
I had no idea that GMP's modulus operation was so expensive. Edit: I did some more testing. It takes 2-3 seconds for Prime95 to test M44497 on my computer and 124 seconds for my program, which uses GMP. Prior to this optimization, my program took 555 seconds on my computer to test M44497 and the speed-up is proportional (to 2 significant figures) to the one I observed for M21701, which surprised me. Anyway, this means that Prime95 is 40 to 60 times faster than my program at testing prime numbers around M44497, which is remarkable. Last fiddled with by ShiningArcanine on 2008-02-29 at 00:33 |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
"Hybrid Memory Cube" offers 1 Tb/s memory bandwith at just 1.4 mW/Gb/s | ixfd64 | Hardware | 4 | 2011-12-14 21:24 |
about the program | boooh | Information & Answers | 2 | 2010-03-22 15:22 |
Is there any program... | mart_r | Software | 2 | 2009-11-15 20:06 |
What program? | ThomRuley | LMH > 100M | 4 | 2004-10-15 06:27 |
program P-1 for K*2^n-1 | jocelynl | 15k Search | 19 | 2004-01-11 17:24 |