mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2008-02-25, 02:04   #1
ShiningArcanine
 
ShiningArcanine's Avatar
 
Dec 2005

22·23 Posts
Default Should my program use so much memory?

Last November, I wrote program capable of finding testing ranges of mersenne prime numbers for fun in C using the C Standard Libraries and the GNU Multi-Precision Library. I wanted to multithread it with the pthreads library, but could not get compilation to work so I put the program down until this weekend. I have gotten multithreading (coarse multithreading, as it runs separate Lucas Lehmer tests in separate threads, rather than actually multithreading the Lucas Lehmer tests themselves) working on both Unix and Windows.

Anyway, I have never used the program test large prime numbers (I have always tested low ranges, like M1 to M12000) and I decided to give testing M32582657 a try. I calculated that my program should use approximately 12.6MB (1 MB = 2^20 bytes, with (32582657 * 2 + 1) / 64 bytes for the sieve and 32582657 / 8 * 3 bytes for the Lucas Lehmer test) of RAM for data and assumed that it would use another megabyte of RAM for its instructions. I was wrong. It uses approximately 30MB to 55 MB of RAM, alternating between the two and while I am describing it as an alternation, it is not a true alternation, but that is the best way I can describe its behavior. If I could plot a normal distribution of its memory consumption, I would imagine that the mean of the normal distribution would be approximately 44 to 45 MB.

To explain this, I reasoned that the GMP library functions are allocating their own memory as a scratch space to perform computations. I tried a few things to lower my program's memory consumption such as changing the GMP functions used in the Lucas Lehmer test and switching back to the single threaded version, which I am using to run the M32582657 test now, before realizing this. Digging through the code for GMP's mpz_powm_ui() function confirmed this. However, I cannot see how that would require 3 to 4 times the amount of memory I computed. I installed Prime95 to see how much memory it used and it allocated exactly 22,780 KB, although it has since declined to 18,512 KB since I started typing this. Does a program using GMP to do Lucas Lehmer tests requiring 2 (+/- 0.5) times the amount of memory Prime95 uses sound reasonable to anyone?
ShiningArcanine is offline   Reply With Quote
Old 2008-02-25, 10:42   #2
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

First, you shouldn't use mpz_powm_ui() for the modular exponentiation. You can do the reduction mod 2^n-1 with a shift and add, but mpz_powm_ui() does not know that. Just square and reduce repeatedly instead.

The Schönhage-Strassen multiplication in GMP makes a copy of your data cut into pieces, to act as the coefficients of polynomials that get multiplied. The input polynomials are zero padded to degree equal to that of the product polynomial, and each coefficient is zero-padded to the largest possible size of coefficients in the product polynomial, which explains a 4-fold increase over the size of the input numbers.

You could avoid some of the zero padding by writing a DWT on top of the Schönhage-Strassen convolution, but that would require messing with some of the core GMP routines.

Alex

Last fiddled with by akruppa on 2008-02-25 at 20:26 Reason: typo
akruppa is offline   Reply With Quote
Old 2008-02-25, 11:53   #3
ShiningArcanine
 
ShiningArcanine's Avatar
 
Dec 2005

22·23 Posts
Default

How would you do the reduction mod 2^n-1 with a shift and add? I knew how to do reductions mod 2^n with a binary AND, but I did not think there was any equivalent fast operation for mod 2^n-1.
ShiningArcanine is offline   Reply With Quote
Old 2008-02-25, 12:15   #4
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

11101001010002 Posts
Default

Quote:
Originally Posted by ShiningArcanine View Post
How would you do the reduction mod 2^n-1 with a shift and add? I knew how to do reductions mod 2^n with a binary AND, but I did not think there was any equivalent fast operation for mod 2^n-1.


Put N = 2^n A + B + A - A = A(2^n-1) + B + A.
R.D. Silverman is offline   Reply With Quote
Old 2008-02-25, 12:40   #5
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

Another way of putting it is 2^n == 1 (mod 2^n-1)

So 2^n * A + B == 1 * A + B == A+B (mod 2^n-1)

So take the unreduced product, separate high and low bits with mpz_tdiv_q_2exp() and mpz_tdiv_r_2exp() and add the two pieces.

Alex
akruppa is offline   Reply With Quote
Old 2008-02-25, 16:08   #6
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

199E16 Posts
Default

Quote:
Originally Posted by akruppa View Post
Another way of putting it is 2^n == 1 (mod 2^n-1)

So 2^n * A + B == 1 * A + B == A+B (mod 2^n-1)

So take the unreduced product, separate high and low bits with mpz_tdiv_q_2exp() and mpz_tdiv_r_2exp() and add the two pieces.
... and if there is an overflow past 2^n-2, truncate to n bits and add 1. But this won't catch a final 2^n-1 result, if that would be a problem, then instead: if there is an overflow past 2^n-2, subtract 2^n-1.

Last fiddled with by retina on 2008-02-25 at 16:09
retina is offline   Reply With Quote
Old 2008-02-25, 23:17   #7
ShiningArcanine
 
ShiningArcanine's Avatar
 
Dec 2005

22·23 Posts
Default

I am an undergraduate Biochemistry/Computer Science double major and I did not quite understand what you guys said, so I asked one of the graduate mathematics students to look over it and after explaining binary operations to him, he said that what you were saying is in order to compute y, where y = x mod (2^n - 1), I am sum every n bits AND (2^n - 1).

He used a square bracket notation that is alien to me to indicate that a number was being reduced modulo another number, so I am assuming that the course where I would learn that is abstract/applied algebra.

So to compute 111000 mod 10^n - 1 where n = 10, I do (000 AND 011) + (111 AND 011), to get 10.

Is that what you were saying or do I have things terribly wrong?
ShiningArcanine is offline   Reply With Quote
Old 2008-02-26, 09:59   #8
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

The binary (binary as in base 2) AND operator does not work if you do it with decimal digits.

Saying "sum every n bits AND (2^n - 1)" seems correct (although a bit vague) but is misleading, there's never any AND operation actually performed. Your example doesn't look right to me, if n=10, you'd be reducing modulo 9999999999, so 111000 would remain unchanged.

Simply look at the algebra: N = 2n * A + B ≡ A+B (mod 2n-1)
If you want to reduce N modulo 2n-1, cut N into two pieces: the lower n bits (=B) and the remaining upper bits (=A). Then add these pieces. There's nothing more to it.

As retina pointed out, this sum is not necessarily < 2n-1; if N ≤ 22n-1, then A, B ≤ 2n-1 and A+B ≤ 2n+1-1. If you square this again, you get something ≤ 22n+2-1, etc., i.e. the size can run away. You can prevent this by applying the reduction twice.

Alex
akruppa is offline   Reply With Quote
Old 2008-02-26, 12:07   #9
ShiningArcanine
 
ShiningArcanine's Avatar
 
Dec 2005

9210 Posts
Default

I should have mentioned that my example uses base 2 to perform computations, so 2 (base 10) = 10 (base 2) and 111000 (base 2) = 56 (base 10). The AND operator works here, as I am doing binary arithmetic.
ShiningArcanine is offline   Reply With Quote
Old 2008-02-26, 12:36   #10
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

Ah, ok... the 10^n had me confused.

Alex
akruppa is offline   Reply With Quote
Old 2008-02-28, 23:53   #11
ShiningArcanine
 
ShiningArcanine's Avatar
 
Dec 2005

9210 Posts
Default

I have an update. I replaced mpz_mod(n->s, n->s, n->num); with the following in my code:

Quote:
mpz_tdiv_q_2exp(n->t, n->s, n->exp);

mpz_tdiv_r_2exp(n->s, n->s, n->exp);

mpz_add(n->s, n->s, n->t);

if (mpz_cmp(n->s, n->num) >= 0)
{

mpz_sub(n->s, n->s, n->num);

}
Tests on my university's Unix server of all of the mersenne numbers from 0 to 3217 now take 2 seconds instead of 7 and a test from 0 to 10000 now takes 90 seconds. Testing M21701 now takes 8 seconds instead of 31 seconds. On my computer, testing M21701 now takes 20 seconds instead of 90 seconds and testing all of the mersenne numbers from 0 to 3217 now takes 9 seconds instead of 30 seconds.

I had no idea that GMP's modulus operation was so expensive.

Edit: I did some more testing. It takes 2-3 seconds for Prime95 to test M44497 on my computer and 124 seconds for my program, which uses GMP. Prior to this optimization, my program took 555 seconds on my computer to test M44497 and the speed-up is proportional (to 2 significant figures) to the one I observed for M21701, which surprised me. Anyway, this means that Prime95 is 40 to 60 times faster than my program at testing prime numbers around M44497, which is remarkable.

Last fiddled with by ShiningArcanine on 2008-02-29 at 00:33
ShiningArcanine is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
"Hybrid Memory Cube" offers 1 Tb/s memory bandwith at just 1.4 mW/Gb/s ixfd64 Hardware 4 2011-12-14 21:24
about the program boooh Information & Answers 2 2010-03-22 15:22
Is there any program... mart_r Software 2 2009-11-15 20:06
What program? ThomRuley LMH > 100M 4 2004-10-15 06:27
program P-1 for K*2^n-1 jocelynl 15k Search 19 2004-01-11 17:24

All times are UTC. The time now is 01:36.


Mon Aug 8 01:36:22 UTC 2022 up 31 days, 20:23, 1 user, load averages: 0.89, 1.11, 1.13

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔