20220716, 10:46  #1 
Jul 2022
2^{2}·13 Posts 
Sieve Wheel
To reduce the memory size of the storage of prime numbers I had the idea of using modular arithmetic and then choosing a suitable base bW < sqrt(n) and storing prime numbers greater than p_s, with p_1 * p_2 *...* p_s <= bW .
Taking RW a vector, of length nR, of the possible remainder modulo bW not divisible for prime p with p <= p_s it is possible to store in an array PRIMES of dimension nR*n/bW+1 the prime numbers greater than p_s and less than n. If PRIMES[i,j]=1 then RW[i]+bW*j is prime. This way you can use a wheel sieve to find prime numbers with this algorithm: In the Boolean array SIEVE of size nR*(n//bW+2) initially all set to true multiples of primes p=RW[j]+bW*k can be set to false so: Code:
for (k=1; k<=sqrt(n)/bW+1; k++) for (j=0; j<len(RW); j++) if(SIEVE[j,k]) { for (i=0; i<len(RW); i++) { m_min=bW*k*k+k*C1[i,j]+C2[i,j]; for (m=m_min; m<n/bW+2; m+=RW[j]+bW*k) SIEVE[i,m]=false; } } for each RW[j] finding RW[x] such that (RW[j]*RW[x])%bW=RW[i] then C1[i,j]=RW[j]+RW[x] and if (RW[j]*RW[x])%bW ==1 then C2[i,j]=(RW[j]*RW[x])//bW otherwise C2[i,j]= 1 + (RW[j]*RW[x])//bW Is this procedure already known? Can it be useful for making a quick sieve?  Here you can find some examples: https://gist.github.com/user140242 Last fiddled with by User140242 on 20220716 at 11:09 
20220719, 11:21  #2 
Jul 2022
2^{2}×13 Posts 
In the previous post I forgot that for convenience RW[i] <= 1 and bW must be divisible by the prime numbers p with p <= p_s to find possible remains.
The sieve wheel can be adapted to Mersenne numbers using a base bW = 8*P. In this example https://www.mersenneforum.org/showpo...4&postcount=21 the multiples of the primes less than min(bW, 524288) are eliminated, which is greater than 40000 as indicated here https://www.mersenne.org/various/math.php . Last fiddled with by User140242 on 20220719 at 11:33 
20220727, 08:56  #3 
Jul 2022
2^{2}·13 Posts 
I tried looking for something similar but for example in gpusieve.cpp of mfacto I saw something like this still used:
Code:
for (i=1; i<=sqrt(n)/2; i++) if (SIEVE[i]){ p=2*i+1; for (m=i+p; m<n/2+1; m+=p) SIEVE[m]=false; } Instead using this: Code:
for (i=1; i<=sqrt(n)/6+1; i++){ if (SIEVE5mod6[i]){ p=6*i1; for (m=6*i*i; m<n/6+1; m+=p) SIEVE5mod6[m]=false; for (m=6*i*i2*i; m<n/6+1; m+=p) SIEVE1mod6[m]=false; } if (SIEVE1mod6[i]){ p=6*i+1; for (m=6*i*i; m<n/6+1; m+=p) SIEVE5mod6[m]=false; for (m=6*i*i+2*i; m<n/6+1; m+=p) SIEVE1mod6[m]=false; } } should result in a better result in terms of speed. I'm wrong? Last fiddled with by User140242 on 20220727 at 08:57 
20220727, 11:23  #4 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·983 Posts 
In post one you refer to using Boolean arrays. https://www.scaler.com/topics/booleaninc/ indicates those use a byte for storage.
Mfaktx use bit maps for sieving for maximum storage efficiency. I don't know whether that was driven by capacity limitations of GPU ram in circa 2009 GPUs or imposes any performance penalty. Some versions maxed out at gpusievesize 128M bits ~ 16MiB. Current maximum gpusievesize is 2047M bits ~256 MiB, limited by max magnitude of a signed 32 bit int variable used in bit position computations, 2^{31}1. All reasonably modern GPUs have considerably more VRAM than that, generally orders of magnitude. On the faster GPUs (~RTX2080) with the current bit mapped storage there are indications there would be slightly more performance if it was rewritten to unsigned to allow up to 4095M bits ~512MiB. In post 3 IIUC you paraphrase the mfakto code for sieving small primes. Those are done once per run, and are relatively few, so its performance optimization is not necessary. (I may be missing some factors of two or more in the following, but I don't think it changes the conclusion.) We want to check for factors of some substantial Mersenne number, in a range of trial factor candidates 2^{start} to 2^{stop}, say for example 2^{75} to 2^{76} on M(~110,000,000). Since only 1 or 7 mod 8 may be factors, that's ~2^{75}/110000000/4 ~ 86.E12 possible factor candidates, of which a moreclasses wheel generator "only" produces ~ 1/2 2/3 4/5 6/7 10/11 ~0.208 or ~17.8E12. It produces more performance in mfaktx to allow a relatively few composite candidates to slip through to be tried as potential factors, than to completely sieve all ~17.8E12 candidates. So it doesn't sieve the factor candidates all the way to sqrt(candidate) ~2^{37.5} to 2^{38}. Instead it goes to ~256M to ~4G dependent on program limits & on GPU&operandsspecific tuning for performance. Smallprime limits observed in performance tuning are ~100k values typically, so values up to ~2^{20}. Sieving those ~10^{5} values once, up to a suitable small limit <~1000 is relatively quick, so optimizing its comparatively small amount of work is not worthwhile. Last fiddled with by kriesel on 20220727 at 12:20 
20220727, 11:56  #5 
Jul 2022
2^{2}×13 Posts 
Yes, in the first post I used Boolean arrays only to understand how it works, of course it must be transformed through the use of bits to make it more efficient. What I noticed however in the various sieves used that the final size of the SIEVE is always n or even the prime numbers are stored in a second array. But using a sieve wheel described by me this is not necessary because by choosing an appropriate base bW the space used to store the data is always less and the complexity does not vary much and the number of iterations is the same furthermore we must not store the prime numbers in a second vector but using p=RW[i]+bW*j. 
20220727, 12:28  #6 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×983 Posts 
The smallprimes array is reusable on different bit levels or Mersenne numbers, since its size therefore content depend only on maktx.ini tuning parameters.
The factor candidates must be generated for each bit level & Mersenne number combination. (edit time expired on post 4) Last fiddled with by kriesel on 20220727 at 12:38 
20220727, 12:43  #7 
Jul 2022
2^{2}·13 Posts 
I tried to see how maktx works but it's the code is too long and I can't understand everything. But as I understand it is used primes[SIEVE_PRIMES_MAX] and SIEVE_PRIMES_MAX is the number of primes that are used. But with the post 3 with sieve base 6 it is possible to generate primes equal to 3*SIEVE_PRIMES_MAX using the same memory and the same complexity. This reduces the factors to be verified and therefore increases the speed. I hadn't read this Last fiddled with by User140242 on 20220727 at 13:09 
20220727, 13:34  #8 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·983 Posts 
See also Concepts in GIMPS trial factoring (TF) https://www.mersenneforum.org/showpo...23&postcount=6

20220810, 09:41  #9 
Jul 2022
2^{2}×13 Posts 
I propose this thread again, I have implemented this version of the wheel sieve with adjustable base:
https://gist.github.com/user140242/e...8e5f9bafdd20f7 I do not have the opportunity to test it to see if it is performing like the codes reported in this thread. Can anyone tell me if it is a good idea and if it is possible to make it more performing? MODERATOR NOTE: This post was found starting a new thread elsewhere. I moved it here. It is NOT a good idea to start a new thread on an existing topic, especially if it's an earlier thread you started. Don't do that again. Last fiddled with by Dr Sardonicus on 20220810 at 11:48 
20220810, 11:56  #10  
Jul 2022
2^{2}×13 Posts 
Quote:
I inserted the post in this thread: https://www.mersenneforum.org/showthread.php?t=22961 because I thought it was more suitable since I needed help to make a comparison with the codes reported in the thread. 

20220810, 13:04  #11  
Feb 2017
Nowhere
13557_{8} Posts 
Quote:
It just happened that your post started a new page, I read "I propose this thread again," and thought it was a new thread. That'll teach me to do moderator things before I'm fully awake... 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
PunchWheel: A (new?) wheel / segmented sieve idea (w/ code)  hansl  Computer Science & Computational Number Theory  6  20191016 20:36 
Wheel Factorization  a1call  Factoring  11  20170619 14:04 
Advantage of lattice sieve over line sieve  binu  Factoring  3  20130413 16:32 
Wheel factorization: Efficient?  3.14159  Miscellaneous Math  520  20100830 07:21 
A Wheel  storm5510  Puzzles  7  20100625 10:29 