mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2012-04-17, 11:13   #34
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×3×7×137 Posts
Default

Quote:
Originally Posted by axn View Post
I am out of ideas Perhaps you can save/restore registers in each asm block. Try adding "push <reg>" statements at the beginning of each asm block and do a "pop <reg>" at the end. Save/restore all registers except eax & edx.

I am thinking of rewriting the whole thing in C without asm.



They look correct. Note that for sextuples, you can include 7 (30 becomes 210) also into the mix and thus be 7 times faster.
Saving and restoring registers will require a bit of thought as I am not very up on asm. Could there be a problem because your asm is 32-bit while I am compiling for 64-bit?

I think I have done the change from 30 to 210 correctly. This will probably limit n to >210 now.

If you rewrite can I suggest you have primes in blocks as well as ranges of k. Currently those arrays use lots of memory if sieving to 1G(in fact can't go quite that far for 6-tuples). Would it be possible to have portions of ndx in the L2 cache as well as sieve?
Attached Files
File Type: zip Sex_2.zip (2.5 KB, 80 views)

Last fiddled with by henryzz on 2012-04-17 at 11:31 Reason: forgot attachment
henryzz is offline   Reply With Quote
Old 2012-04-17, 13:45   #35
axn
 
axn's Avatar
 
Jun 2003

479010 Posts
Default

Quote:
Originally Posted by henryzz View Post
Could there be a problem because your asm is 32-bit while I am compiling for 64-bit?
Could be. Can you try moving the declaration of pre_sieve and sieve before the declaration of sprime?

Quote:
Originally Posted by henryzz View Post
I think I have done the change from 30 to 210 correctly. This will probably limit n to >210 now.
There needs to be additional modular considerations to correctly implement this. And some other minor changes in the index.

Quote:
Originally Posted by henryzz View Post
If you rewrite can I suggest you have primes in blocks as well as ranges of k. Currently those arrays use lots of memory if sieving to 1G(in fact can't go quite that far for 6-tuples). Would it be possible to have portions of ndx in the L2 cache as well as sieve?
Because of the way the sieving works, it requires that much memory. The good news it that, for the 5- and 6-tuple versions, there really is no point in further sieving by NewPGen. You can directly feed the output to LLR. And you don't need to sieve that deep with the program itself, either.
axn is online now   Reply With Quote
Old 2012-04-17, 14:50   #36
axn
 
axn's Avatar
 
Jun 2003

2×5×479 Posts
Default

Quote:
Originally Posted by axn View Post
There needs to be additional modular considerations to correctly implement this. And some other minor changes in the index.
You need to make the following changes for the 6-tuple version:

Code:
n_mod: array[0..11] of byte = (207,51,183,39,177,141,123,9,57,81,93,99);
Code:
nm := n_mod[ THE_N mod 12 ];
and in init_indexes, first line:
Code:
i:=5;
axn is online now   Reply With Quote
Old 2012-04-18, 23:56   #37
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×3×7×137 Posts
Default

Quote:
Originally Posted by axn View Post
You need to make the following changes for the 6-tuple version:

Code:
n_mod: array[0..11] of byte = (207,51,183,39,177,141,123,9,57,81,93,99);
Code:
nm := n_mod[ THE_N mod 12 ];
and in init_indexes, first line:
Code:
i:=5;
Didn't realize I had to change to mod 12 from mod 4 as well as 30 to 210.
henryzz is offline   Reply With Quote
Old 2012-04-19, 11:47   #38
biwema
 
biwema's Avatar
 
Mar 2004

3·127 Posts
Default

Hi Peter

Congratulations for your new discovery of the new quadruplet.
This time you were quite lucky again: At k*2^9999 (k odd) there is on average a 4-tuple every 375T.

I am also searching for a 4-tuple for some time. I my case I am sieving candidates of the form k* 2^10000. Our Name is similar and we have similar ideas…


NewPGen
I also use NewPGen for sieving and it is true that it cannot allocate a lot of memory. Besides that the algorithms are not optimal and there was no update since a long time.
I assume that Jocelyn implemented NewPGen when he was a student and now he has a job, family and no time anymore.
I did not experience that NewPGen can allocate more memory beyond 1G with the combined sieve. Could be that it is the case if you have more than 100M candidates. But in that case I would not be fast anyway.
Below 1G, it uses a bitmap with one bit per entry. That is the fastest mode in NewPGen. After combining the sieve it uses an array or fast array. The array needs 8 Byte for one candidate and it performs a binary lookup if a sieved candidate is in the list. That means Log2(Number of candidates) lookups every time.
If there is at least twice as much memory available, it uses fast array, which is a hashtable. In the best case one lookup is enough; in case of hash collisions it needs more.
One of the worst disadvantages is, that NewPGen searches for k*2^n instead of k*15*2^n. Otherwise the array would be 15 times more dense.

I will talk more about algorithmic improvements in a later post. I did not yet check, which improvement axn has already implemented in his code.


Primorial or Power of 2?
About 10 years ago I was searching for triplets and was using k*2^n with constant n and was quite alone with that. There was the common opinion that primorial are more efficient than simple powers.
The Power of 2 is a special form where it much more efficient to sieve and prp the candidates. On the other hand, primorial are a product of the smallest primes so that they don’t need to be sieved anymore. So the density of candidates is much higher and a smaller range needs to be searched.
I experienced, that powers of 2 is faster for twins and 3-tuple; 4-tuple are also faster, but the difference is not big anymore.
For 5 and higher tuple I would recommend primorial.

Some rough calculations about the expectations:
3 Tuple:
Size: 10000 digits
PRP test: 2 Seconds
Optimal NewPGen Sieve depth: 10T
Remaining candidates after sieving a range of 1T to 1G: 52500000
Range per one tuple: 3.5 T

4 Tuple:
Size: 3030 digits
PRP test: 0.15 Seconds
Optimal NewPGen Sieve depth: 0.6T
Remaining candidates after sieving a range of 1T to 1G: 2000000
Range per one tuple: 375 T

5 Tuple:
Size: 1100 digits
PRP test: 0.02 Seconds
Optimal NewPGen Sieve depth: 1G
Remaining candidates after sieving a range of 1T to 1G: 130000
Range per one tuple: 6400 T

6 Tuple:
Size: 560 digits
PRP test: 0.005 Seconds
Optimal NewPGen Sieve depth: 1G
Remaining candidates after sieving a range of 1T to 1G: 6000
Range per one tuple: 146000 T

7 Tuple:
Size: 301 digits
Remaining candidates after sieving a range of 1T to 1G: 500
Range per one tuple: 730000 T

8 Tuple:
Size: 218 digits
Remaining candidates after sieving a range of 1T to 1G: 35
Range per one tuple: 9750000 T


Even if you sieve 6-tuple only up to 1 Million, there are only 68000 candidates left, which can be prp tested within 6 minutes.

It is not difficult to find a 3 tuple with 10000 digits; maybe 2 years with NewPGen and one core. But then primalty certification of the +5 is really hard. Depending on the software that could take some years.
biwema is offline   Reply With Quote
Old 2012-04-19, 20:12   #39
Puzzle-Peter
 
Puzzle-Peter's Avatar
 
Jun 2009

2A416 Posts
Default

Quote:
Originally Posted by biwema View Post
It is not difficult to find a 3 tuple with 10000 digits; maybe 2 years with NewPGen and one core. But then primalty certification of the +5 is really hard. Depending on the software that could take some years.
Thanks for all the input! I looked for a quadruple also at n=10000 up to k=90T or so with no hit. Same with n=10001 to 10010 to k=10T each.

About proving the +5: The new version of PRIMO can use multiple CPUs when running under Linux. According to this table: http://www.ellipsa.eu/public/primo/primo.html a 10000 digit number should not take more than a few weeks on a fast machine.

I really hope somebody will extend the sieve to longer k-tuples as I am still trying to figure out some parts of the code. It's the main sieving procedure that I still just don't understand.

EDIT: By the way, how do you calculate your "Range per one tuple" figures? I get much higher numbers when I try to estimate the k range needed per tuple.

Last fiddled with by Puzzle-Peter on 2012-04-19 at 20:16
Puzzle-Peter is offline   Reply With Quote
Old 2012-04-20, 11:10   #40
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×3×7×137 Posts
Default

I have attached 5, 6 and 7
7 is back with mod 30 as -5 has been lost and I haven't worked out the numbers for mod 210 yet
The tuples we have programs for are:
3: -1 +1 +5
4: -1 +1 +5 +7
5: -1 +1 +5 +7 +11
6: -5 -1 +1 +5 +7 +11
7: -1 +1 +5 +7 +11 +17 +19
@axn Do you have a quick way of calculating the values for n_mod? My way so far has been really slow and isn't really suitable for 12. 4 was bad enough.
How large tuples do you want Puzzle-Peter?
Attached Files
File Type: zip 5and6and7.zip (7.8 KB, 85 views)
henryzz is offline   Reply With Quote
Old 2012-04-20, 12:39   #41
firejuggler
 
firejuggler's Avatar
 
Apr 2010
Over the rainbow

13×191 Posts
Default

just a minor nitpick : when you sieve for 7-tuple the file is named "quinxxx.txt"
firejuggler is offline   Reply With Quote
Old 2012-04-20, 14:54   #42
Puzzle-Peter
 
Puzzle-Peter's Avatar
 
Jun 2009

22×132 Posts
Default

I am still trying to understand enough so I can do the modifications myself. There is some success, but I still don't get several things.

1) PreSieve = ... product of all primes <19 minus some of the smaller ones. Multiplying the missing ones seems to give the "step length" which is 30 or 210 in our examples here. But why <19 and not <23, <29 or whichever prime number?

2) Probably the largest question mark: how do you determine the length and the numbers of n_mod? axn posted the numbers for step 210. 12 of them which is 1 more than the +11 used for 6tuples. Is this the way to calculate the length of n_mod?

How large should the tuples be? Well, this page http://sites.google.com/site/anthony...cts=1#largest4 lists tuples up to 23 numbers long. I don't understand the "there are no known 24tuples". Why not add the 17 to the 23-tuple? However, the longest non-trivial so far is a 19-tuple. Wouldn't it be fun to experiment in these ranges? I don't want to ask you to write this. As I said I hope I can get to the point where I am able to to this myself.

EDIT: Did I get this right: When I don't modify n_mod and stick to 210 with the numbers axn posted, all I need to do is complete the init_indexes according to the pattern I'm looking for? As I understand it, modifying the step (6, 30, 210,...) together with n_mod is purely a performance thing?

Last fiddled with by Puzzle-Peter on 2012-04-20 at 15:09
Puzzle-Peter is offline   Reply With Quote
Old 2012-04-20, 16:26   #43
biwema
 
biwema's Avatar
 
Mar 2004

3·127 Posts
Default

1) Presieve:

(I am not completely sure, if axn uses the presieve for the same purpose, as I do).
For me it is not necessary, but it is a way to improve the performance. When you sieve, you remove a candidate from the bitmap, add (for example) 7 remove the next candidate, add 7 and so on. When the prime you sieve is small, it takes a large number of steps to walk through the whole array. That’y why sieving small primes is slow (NewPGen is slow at the beginning). In this case it is easier to create an array of the size 7*11*13*17*19 * 8 (that it into bytes without remainder), sieve these first primes and take then this array as basis for future sieving. A large array (up to 23, 29 etc) is faster, when you sieve a big area, but takes longer to presieve.
A combined approach is to take an array of 56 Bits (7 bytes), sieve the prime 7, then concat the resulting array 11 times (77 Bytes). Now you can sieve the prime 11, then concat the resulting array 13 times (1001 Bytes), sieve the prime 13, concat the resulting array 17 times and sieve the prime 17 etc. until you reach the final size.
Maybe you want to keep the resulting array in case you want to sieve more than one range, in case one range does not fit into the memory.

2) n_mod:
Indeed it is a performance thing (and you can keep the memory usage 15 or 105 times smaller)
When you sieve 5-tuple, you have for example the form N + d, where d= -1, 1, 5, 7, 11.
You can show that N needs the form
0 (mod 2)
0 (mod 3)
2 (mod 5)
When you combine them together you get 12 (mod 30).
For 4 tuple and 5 tuple you only need to sieve the candidates of the form 12 (mod 30)
You can also show that primes need the form 4 or 5 (mod 7). That will result in 12 or 102 (mod 210).
So you can either sieve (mod 30) or you generate 2 runs with 12 (mod 210) and 102 (mod 210). That is not faster but 7 times more candidates fit into memory

When sieving 6-tuple (N + d, where d= -5, -1, 1, 5, 7, 11) then just 4 (mod 7) is remaining.
In this case you can sieve mod 210.
The next larger mod you get when sieveing 10-tuples with mod 2310.
biwema is offline   Reply With Quote
Old 2012-04-20, 18:27   #44
firejuggler
 
firejuggler's Avatar
 
Apr 2010
Over the rainbow

1001101100112 Posts
Default

where should i change Sprime to sieve a bit deeper for quad? (right now, quad10011_5_20... 32.3 M k ...at 1.22 B, advancing slooowly 1k every 0.2/0.1 sec)
firejuggler is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
How/Where to get Jens Kruse Andersen's prime constellation sieve? Stargate38 And now for something completely different 2 2017-04-28 00:08
Efficiently finding a linear progression in data fivemack Math 27 2015-12-12 18:42
GPU Prime Sieve tapion64 GPU Computing 7 2014-04-10 06:15
Sieve depth vs. prime probability Unregistered Information & Answers 2 2010-05-25 20:51
Prime in Riesel Sieve Project Sloth Prime Sierpinski Project 1 2006-05-10 02:02

All times are UTC. The time now is 03:28.

Sat Dec 5 03:28:44 UTC 2020 up 1 day, 23:40, 0 users, load averages: 1.75, 1.68, 1.62

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.