20230520, 12:27  #89 
Dec 2022
1B8_{16} Posts 
OK, the range to reach the new gaps and a little beyond is 2^57, let's use that. At 10^8 per second that's 50 years, so too slow. As at most one division for each prime should be needed per sieve interval, using the biggest intervals you can (whatever fits in memory) speeds it up. I'm not sure how the divisions are being done now, but I'd think 64bit assembly to be required for optimal performance  I don't think any highlevel language implements fulllength integer division, and I've used 32bit assembly for my trial division codes, hence the limit to 32bit divisors. Here 128/64 (64bit DIV) would be really needed to do a division in one piece.
I'd imagine the divisions should then take possibly a second per sieve interval. It's likely the rest of sieving would take longer, as it involves a pass over the whole of memory being used, so ultimately memory speed should be limiting. If you want 10^10 per second, you need a memory bandwidth that accommodates that length of sieve (at one odd integer per bit, 600MB/second). This should be possible. 
20230520, 17:22  #90 
Jan 2007
Germany
677 Posts 
I have an other way to store primes up to 4311711871. [sqrt (2^64+2^57)]
This will take 143,7 MB RAM in total via Byte. Code: Every prime is 30k+1,7,11,13,17,19,23,29 . These are 8 points. 1 Byte have 8 Bits To encode primes 90 to 120, we have [01111110] at Arrayposition 3 To encode primes 120 to 150, we have [10110110] at Arrayposition 4 and so on 143,7MB * 30 => primes up to 4,31 Billion @mart_r: "Currently the division of the 65bit offset number by the upto33bit numbers is the bottleneck" This is not the problem, also for this I have a trick. You will sieve from 18446744074000000000 to 18446744075000000000 and you will find the starting point for p=997...  take 18446744074000000000\2=9223372037000000000  9223372037000000000 MOD 997 = 157  the starting point for p=997 is 18446744074000000000+9972*157=18446744074000000683, so 683+(997*2)x  is starting point even, so set +p p=4170000013 > 4170000013546082470*2=3077835073 ; 4170000013 times 18446744077077835073 To multiply by 2 ... maybe via SHR option ? Last fiddled with by Cybertronic on 20230520 at 18:02 
20230521, 10:02  #91  
Dec 2008
you know...around...
2×443 Posts 
Quote:


20230521, 14:52  #92 
Jan 2022
2^{2} Posts 
I am trying to expand the range of my sieve to about 25*2^64. I am using a segmented sieve, with sieving primes stored in buckets. The sieve is mod 30, so 1 byte = 30n+[1,7,11,13,17,19,23,29].
It takes 1.52 s. to sieve an interval of 10^9, just below 2^64. My guess is that i will have almost same speed above 2^64. The main reason for this, is the use of buckets. The starting point for each prime is only calculated once, and then stored in the right bucket. When sieving, i only use the primes in the buckets assosiated with the current segment, and move the primes to the next bucket. This way i can skip using a large prime for many segments. I am using segment size 512k bytes. Any sieving prime close to 4*10^9 will skip at least 127 segments. ( (4*10^9)/(2*30*512k)). The bottleneck is storing the primes, using 8 bytes for each prime. that is at least 1.6 gb of RAM. ( and up to 8 gb, when close to 25*2^64 ) But this is just a free time project, competing with a lot of other interests. So the progress is slow. One second concern, is that first occurrences and maximal gaps are very rare above 2^64. I did not count this, but below 2^64 there are ~750 first occurrences, and i estimated there to be ~1525 first occurrences between 2^64 and 2^65. So even with fast algoritms there will only be a small gain in known first occurrences. 
20230521, 15:36  #93 
Jan 2007
Germany
677 Posts 
@kog67
To store alle primes for 25*2^64 take 715 MB, not 8GB. Another way is, to store differences between primes> 3,2,2,4,2,... This will take 2Byte per prime> ~ 1.8GB in total. 
20230521, 18:20  #94  
Jan 2022
100_{2} Posts 
Quote:
A simple segmented sieve, is 3 nested loops like this: 1) loop through all segments 2) Loop through all primes < sgrt( end of segment )this is slow, when starting at a large number, using a small segment. in step 2, large amounts of primes will miss the segment, and the starting position will be calculated repeated times until you reach the correct segment.3) loop through all positions inside segments that divide the primenumber But using a larger segment is not cachefriendly, resulting in slow memory access. Improving with buckets: 1) Initialize buckets  loop through all primes, and calculate start position  calculate witch segment it belongs ( Segment id )  Calculate position within that segment.  Save prime number, and position within segment in Bucket[ Segment id ] in step 2) we loop through primes only in Bucket[ Segment id ] in this loop the bucket has to be loaded, processed, and distributed to next buckets That is the reason the prime number has to be saved, along with its position in the next segment an example: Start of sieve = 10^10 Segment size = 1000 only odd numbers are considered Prime = 99707 First multiple is 10^10 + 113565 We save prime 99707, and position 565 in bucket[ 113 ] Sieve segments 0..112 without the need to consider this prime in Segment 113 we cross of position 565 we find next position : 565 + 2*99707 = 119979 Next bucket for this prime is 113 + 119 = 312 Move 99707 with position 979 to bucket[ 312 ] After the initializing no division or modulo is calculated in my actual implementation, i am using 32 bits to save the prime number divided by 30, and then 3 bits to mark the class of the prime modulo 30. ( 0=1, 1=7, 2=11...7=29 ).24 bits is used for position inside segment, and 3 bits is used for current multiple modulo 30. ( if we cross off M * prime, and m =1 (mod 30 ), then the next cross off is (M+6) * prime, and i need to store this info in the bucket ) 62 bits used, 2 dummy bits. 

20230522, 19:04  #95 
Dec 2008
you know...around...
376_{16} Posts 

20230524, 11:56  #96 
Dec 2022
670_{8} Posts 
I see some possibly good ideas here  Cybertronic's proposal of dropping the last bit in division could avoid 64bit DIV for all but the small number of 33bit primes, which may be faster and even might make it possible to still do this in 32bit code.
The last, describing the use of 'buckets', would need testing. Small segments are inefficient, as he admits, and it's not clear that using buckets would overcome this. While it's true that a long segment is 'not cache friendly' I'm not sure that any alternative is, which is why I stated that the speed of main memory is ultimately going to limit sieving. The list of primes and their buckets won't fit in a cache any more than a long sieve segment would; the idea I guess is that at least the first can be scanned linearly only once, but that applies for each (shorter) segment, which would appear to destroy the advantage. And again I repeat that I'd need to know that this would be accepted as an official verification (and who is making that decision)  certainly with a large investment of my and my computer's time, I don't want to have repeated what happened with F22 (https://mersenneforum.org/showpost.p...6&postcount=38) and only be ignored and mocked. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Guess the next maximal prime gap.  Bobby Jacobs  Prime Gap Searches  7  20220828 12:12 
Gaps between maximal prime gaps  Bobby Jacobs  Prime Gap Searches  52  20200822 15:20 
Superprime gaps  Bobby Jacobs  Prime Gap Searches  5  20190317 20:01 
Top 50 gaps  robert44444uk  Prime Gap Searches  1  20180710 20:50 
Gaps and more gaps on <300 site  gd_barnes  Riesel Prime Search  11  20070627 04:12 