mersenneforum.org mtsieve

2021-01-10, 04:35   #485
rogue

"Mark"
Apr 2003
Between here and the

3·5·419 Posts

Quote:
 Originally Posted by MisterBitcoin Code:  p=24662984657, 428.2K p/sec, 2771 factors found at 13.13 sec per factor, 98.6% done. ETC 2021-01-10 03:06 Sieve completed at p=25000000013. Processor time: 9358.14 sec. (15.48 sieving) (3.87 cores) I wonder how it arrived at 15.48 for sieving. Maybe it's just an error; I used srsieve v 1.1 with the -W 4 flag. This is the first time sieving goes above 1 for me :D
The framework differentiates the time used to generate a list of primes for testing from the rest of the execution time. You can see that only a tiny percentage of the time is needed for sieving compared to the function that looks for factors given a list of primes.
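
As a rough check on the numbers in the quoted log, the split can be worked out directly. This is only a sketch: the variable names are mine, and it assumes that "15.48 sieving" is processor seconds spent generating primes and that "3.87 cores" is processor time divided by elapsed time.

```python
# Figures copied from the quoted log line.
processor_time_sec = 9358.14   # total processor time
sieving_time_sec = 15.48       # assumed: seconds spent generating the list of primes
cores_used = 3.87              # assumed: processor time divided by elapsed time

# Share of processor time spent generating primes.
sieving_share = sieving_time_sec / processor_time_sec
# Everything else is spent in the function that looks for factors.
factoring_time_sec = processor_time_sec - sieving_time_sec
# Elapsed time implied by the reported core count.
elapsed_estimate_sec = processor_time_sec / cores_used

print(f"sieving share of processor time: {sieving_share:.3%}")
print(f"time spent looking for factors:  {factoring_time_sec:.2f} sec")
print(f"estimated elapsed time:          {elapsed_estimate_sec:.0f} sec")
```

Under those assumptions, sieving accounts for roughly 0.17% of the processor time.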

2021-01-10, 19:04   #486
rogue

"Mark"
Apr 2003
Between here and the

3×5×419 Posts

I have released 2.1.4. Here are the changes:

Code:
 framework: Fixed an issue with creating GPU kernels on OS X.
 srsieve2cl: new release

Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2. On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not know what the limits are for other GPUs. It will switch to the GPU at p>1e6.

On an older GPU, srsieve2cl struggles with 1000 sequences, causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.

2021-01-10, 19:30   #487
rebirther

Sep 2011
Germany

2771₁₀ Posts

Quote:
 Originally Posted by rogue I have released 2.1.4. Here are the changes: Code:  framework: Fixed an issue with creating GPU kernels on OS X. srsieve2cl: new release Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2. On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not know what the limits are for other GPUs. It will switch to the GPU at p>1e6. On an older GPU, srsieve2cl struggles with 1000 sequences, causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.
How much VRAM is used for 5000 sequences and 80000?

2021-01-10, 19:43   #488
rogue

"Mark"
Apr 2003
Between here and the

3·5·419 Posts

Quote:
 Originally Posted by rebirther How much VRAM is used for 5000 sequences and 80000?
3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).

I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.

Last fiddled with by rogue on 2021-01-10 at 19:46

2021-01-11, 01:35   #489
Citrix

Jun 2003

1,579 Posts

Quote:
 Originally Posted by rogue I have released 2.1.4. Here are the changes: Code:  framework: Fixed an issue with creating GPU kernels on OS X. srsieve2cl: new release Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2. On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not know what the limits are for other GPUs. It will switch to the GPU at p>1e6. On an older GPU, srsieve2cl struggles with 1000 sequences, causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.
I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?

2021-01-11, 03:24   #490
rogue

"Mark"
Apr 2003
Between here and the

3·5·419 Posts

Quote:
 Originally Posted by Citrix I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?
I do not look at p/sec, as it is calculated differently. I look at factors per second, which is far more accurate. Nevertheless, srsieve2 and sr2sieve can be faster if your GPU isn't particularly fast.

2021-01-11, 19:00   #491
Dylan14

"Dylan"
Mar 2017

1075₈ Posts

Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3, which is currently used: improved caching of primes; an improved switch statement in EratSmall and EratMedium; and improved cache size detection on Linux and with the Apple Silicon CPUs (which could be useful for compiling this for ARM).

2021-01-11, 19:04   #492
rebirther

Sep 2011
Germany

17×163 Posts

Quote:
 Originally Posted by rogue 3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager). I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.

I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used. I think it was 7.4 GB.

srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24 GB of RAM.

2021-01-11, 19:36   #493
rogue

"Mark"
Apr 2003
Between here and the

3·5·419 Posts

Quote:
 Originally Posted by rebirther I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used. I think it was 7.4 GB. srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB 2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24 GB of RAM.
Try using a lower value for -g (10 is the default). That should reduce some of the GPU memory usage.

2021-01-11, 19:38   #494
rogue

"Mark"
Apr 2003
Between here and the

3·5·419 Posts

Quote:
 Originally Posted by Dylan14 Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3, which is currently used: improved caching of primes; an improved switch statement in EratSmall and EratMedium; and improved cache size detection on Linux and with the Apple Silicon CPUs (which could be useful for compiling this for ARM).
That shouldn't be too hard to do.

2021-01-11, 19:54   #495
rebirther

Sep 2011
Germany

17·163 Posts

Quote:
 Originally Posted by rebirther I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used. I think it was 7.4 GB. srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB 2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24 GB of RAM.
vs Ryzen 3950X with 16 cores

srsieve2 -n2501 -N10000 -P1e9 -W16 -spl_remain.txt -fB

2021-01-11 20:50:35: Sieve completed at p=1000000007. Primes tested 50847420. Found 92827983 factors. 10729517 terms remaining. Time 4990.80 seconds

The CPU reduces the sieve file a bit more than the GPU does.
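
For a side-by-side view of the two runs, here is a small sketch that derives factors per second and the ratio of reported times from the two "Sieve completed" log lines. The figures are copied from the logs; the dictionary layout and key names are only mine, and the comparison is approximate because the two runs stopped at slightly different values of p.

```python
# Figures copied from the two "Sieve completed" log lines.
runs = {
    "srsieve2cl (GPU)": {"factors": 87459308, "remaining": 16098192, "seconds": 239.43},
    "srsieve2 -W16 (CPU)": {"factors": 92827983, "remaining": 10729517, "seconds": 4990.80},
}

for name, r in runs.items():
    # Factors per second, the rate preferred over p/sec earlier in the thread.
    r["factors_per_sec"] = r["factors"] / r["seconds"]
    # Sum of reported factors and remaining terms, as a consistency check.
    r["factors_plus_remaining"] = r["factors"] + r["remaining"]
    print(f"{name}: {r['factors_per_sec']:,.0f} factors/sec, "
          f"{r['factors_plus_remaining']:,} factors + remaining terms")

gpu = runs["srsieve2cl (GPU)"]
cpu = runs["srsieve2 -W16 (CPU)"]

# Ratio of the reported run times.
speedup = cpu["seconds"] / gpu["seconds"]
# Terms left by the GPU run that the CPU run removed.
extra_removed = gpu["remaining"] - cpu["remaining"]

print(f"ratio of reported times (CPU / GPU): {speedup:.1f}x")
print(f"terms the CPU run removed beyond the GPU run: {extra_removed:,}")
```

Both runs add up to the same total of factors plus remaining terms, so the extra terms removed by the CPU run match the extra factors it reported.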