Thanks, I had thought about using the GPU since at Primegrid sieving Proth numbers with GPU is much faster than any CPU. The reason I didn't was that I only have a GTX 760 and a GTX 1660. The 760 is quite slow and not useful. The 1660 might be ok, but I prefer to use it for Wieferich/Wall-Sun-Sun search currently and as Happy said, it'd need to be really fast to compete against the 12 cores of the Ryzen 9 3900X.
Btw, hijack all you want, I'm glad to hear about such things. |
[QUOTE=bur;594352]Thanks, I had thought about using the GPU since at Primegrid sieving Proth numbers with GPU is much faster than any CPU. The reason I didn't was that I only have a GTX 760 and a GTX 1660. The 760 is quite slow and not useful. The 1660 might be ok, but I prefer to use it for Wieferich/Wall-Sun-Sun search currently and as Happy said, it'd need to be really fast to compete against the 12 cores of the Ryzen 9 3900X.
Btw, hijack all you want, I'm glad to hear about such things.[/QUOTE] I agree. Just using 6 cores on my 3800XT gives around 4-5 mp/s, but using -G 2 - g40 (which seems to be the limit of improving efficiency for me) on a GTX 1650 with GDDR6 only gives me around 1 mp/s. |
While we're on the topic of sieving on GPU, has anyone tried Colab sessions for it? I don't have any experience with it; I just began copy&pasting the GPU72 code, which seems to run fine. Is there a similar "fire&forget" setup available for srsieve2cl?
|
No new primes, just another status update:
No sieving was done since the last update. All n < 5,600,000 have now been checked. No prime for more than 5M candidates, low weight indeed. :)

Since the FFT size grew to 640K with n > 5.6M, the 64 MB L3 cache of the Ryzen 9 3900X ran out when testing 12 numbers simultaneously. Initially I ran six 2-threaded LLR instances, but noticed that two of them were about 30% slower than the other four. The reason is the special layout of the processor: the L3 cache is split across four so-called CCXs with 16 MB each. Since each CCX houses three cores, two of the LLR instances ran across two separate CCXs. So I switched to four 3-threaded LLR instances, each occupying a single CCX. Maybe special constructs like 4 2-threaded and 4 single-threaded LLRs would lead to a higher throughput; I didn't run any tests.

Smallest LLR test currently running:
n = 5.62M
FFT = 640K
duration = 4060 s / test
digits = 1.69M
Caldwell entry rank: 241

Largest LLR test currently running:
n = 5.65M
FFT = 640K
duration = 4090 s / test
digits = 1.70M
Caldwell entry rank: 238 |
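The slowdown described in the Ryzen 9 3900X update (two of six 2-threaded instances spanning two 16 MB L3 groups) can be reproduced with a small counting sketch. It assumes instances are placed on consecutive core numbers and that cores 0-2, 3-5, 6-8 and 9-11 each share one L3 group; the real placement depends on the OS scheduler and any affinity settings, and the function names are made up for illustration.

```python
def l3_groups_per_instance(total_cores, cores_per_group, threads_per_instance):
    """Return, for each instance, the set of L3 groups its cores fall into.

    Assumption: instances occupy consecutive core numbers, and each run of
    `cores_per_group` consecutive cores shares one L3 group.
    """
    instances = []
    for first in range(0, total_cores, threads_per_instance):
        cores = range(first, first + threads_per_instance)
        instances.append({core // cores_per_group for core in cores})
    return instances


def count_straddling(total_cores, cores_per_group, threads_per_instance):
    """Count instances whose threads are spread over more than one L3 group."""
    groups = l3_groups_per_instance(total_cores, cores_per_group, threads_per_instance)
    return sum(1 for g in groups if len(g) > 1)


# 12 cores, 3 cores per L3 group, as described in the post
print("six 2-threaded instances straddling:", count_straddling(12, 3, 2))
print("four 3-threaded instances straddling:", count_straddling(12, 3, 3))
```

Under these assumptions, the 2-threaded layout leaves two instances split across L3 groups, while the 3-threaded layout leaves none, which matches the observation in the post.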
Long time no update...
After a pause, the tests are now running on a 12-core i9-10920X with 32 GB RAM and 20 MB L3 cache under Ubuntu 22.04 LTS. It supports AVX-512, which not only gives a nice speed-up but also decreased the FFT size from 640K to 588K (I assume that's what caused it). Since I'm now only running 2 simultaneous tests, I can comfortably run each one single-threaded.

All [B]n < 5,800,000[/B] have been checked for primality now. No new primes.

Largest known prime: [URL="https://www.rieselprime.de/ziki/Proth_prime_2_1281979"]n = 485014 (146010 digits)[/URL]

Some stats for the [B]4,100,000 < n < 10,000,000[/B] range:
[LIST]
[*]Initial sieving with p < 1E6 removed 5,666,278 of the 5,900,000 candidates, i.e. 96%.
[*]233,722 candidates were left after that first step.
[*]Sieving with 1E6 < p < 825E12 found 172,022 factors
[*]93,945 candidates were left unfactored
[*]27,383 LLR2 tests done
[*]66,816 candidates left
[/LIST]
(the surplus of 254 LLR2 tests is due to tests done on numbers that were factored simultaneously by sieving)

[B]Sieving[/B]
Recently sieved: 800E12 < p < 825E12
Software: sr1sieve 1.4.7
Factors found: 77
Largest factor found: 824937311469287 (15 digits) | 1281979 * 2^6579962 + 1

[B]LLR[/B]
Currently testing: 5,800,000 <= n < 5,820,000
Software: LLR2 1.1.1
FFT = 588K
duration = 7400 s / test
digits = 1.74M - 1.75M
Caldwell entry rank: 249 |
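The digit counts quoted in these status updates follow directly from n. A minimal sketch, using logarithms because the numbers are far too large to print; the function name is made up for illustration:

```python
import math


def proth_digits(k, n):
    """Number of decimal digits of k * 2**n + 1.

    k * 2**n is even, so it can never be of the form 10**m - 1; adding 1
    therefore never changes the digit count and the logarithm of k * 2**n
    is enough.
    """
    return math.floor(n * math.log10(2) + math.log10(k)) + 1


# Known prime for k = 1281979 mentioned in the thread
print(proth_digits(1281979, 485014))

# Start of the 5,800,000 <= n < 5,820,000 test range
print(proth_digits(1281979, 5_800_000))
```

For n = 485014 this gives the 146010 digits listed for the largest known prime.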
Still running on the same hardware.
All [B]n < 6,500,000[/B] have been checked for primality now. No new primes.

Largest known prime: [URL="https://www.rieselprime.de/ziki/Proth_prime_2_1281979"]n = 485014 (146010 digits)[/URL]

Some stats for the [B]4,100,000 < n < 10,000,000[/B] range. Differences in brackets refer to the last update, almost 9 months ago.
[LIST]
[*]Initial sieving with p < 1E6 removed 5,666,278 of the 5,900,000 candidates, i.e. 96%.
[*]233,722 candidates were left after that first step.
[*]Sieving with 1E6 < p < 975E12 found 172,366 factors ([COLOR="SeaGreen"]+344[/COLOR]) and removed 140,119 candidates
[*]93,603 candidates ([COLOR="DarkOrange"]-342[/COLOR]) were left unfactored
[*]38,420 LLR2 tests ([COLOR="SeaGreen"]+11,037[/COLOR]) done
[*]55,451 candidates ([COLOR="DarkOrange"]-11,365[/COLOR]) left
[/LIST]
(the surplus of 268 LLR2 tests is due to tests done on numbers that were factored simultaneously by sieving)

[B]Sieving[/B]
Recently sieved: 825E12 <= p < 975E12
Software: sr1sieve 1.4.7
Factors found: 344
Duration: 18000 (s * threads) / factor
Largest factor found: 974804682848417 (15 digits) | 1281979 * 2^8320810 + 1

[B]LLR[/B]
Currently testing: 6,500,000 <= n < 6,540,000
Software: LLR2 1.1.1, 3 threads
FFT = 672K ([COLOR="SeaGreen"]+84K[/COLOR])
Duration = 17400 (s * thread) / test, i.e. 5800 s / test
Digits = 1.96M - 1.97M ([COLOR="seagreen"]+0.22M[/COLOR])
Caldwell entry rank: 180 ([COLOR="seagreen"]+69[/COLOR])

I will probably do another round of sieving soon, as the removal rates are close again: 5.75E-5 / s (LLR) vs. 5.56E-5 / s (sieving). |
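The two removal rates compared at the end of the update are just the reciprocals of the thread-seconds spent per sieve factor and per LLR test. A minimal sketch of that arithmetic, with a made-up function name, treating every factor and every test as removing exactly one candidate (which is only approximately true for sieving):

```python
def removal_rate(thread_seconds_per_candidate):
    """Candidates removed per thread-second."""
    return 1.0 / thread_seconds_per_candidate


sieve_rate = removal_rate(18000)  # 18000 (s * threads) per factor
llr_rate = removal_rate(17400)    # 17400 (s * thread) per LLR test

print(f"LLR:     {llr_rate:.2E} / s")
print(f"sieving: {sieve_rate:.2E} / s")
print(f"ratio sieving / LLR: {sieve_rate / llr_rate:.2f}")
```

Since LLR tests take longer as n grows while the sieve cost per factor depends on p, the ratio drifts over time, which is why the comparison is worth repeating before each new round.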