![]() |
![]() |
#749 |
"Seth"
Apr 2019
1111011012 Posts |
![]()
When I wrote my quick discrete-log solver I added this very useful optimization. I looked over the code and didn't find it in mt-sieve.
When considering a single sequence k*b^n+c with 0 <= n <= N (e.g. in the sierpinski riesel) I solve the discrete_log(b, p, mod_inv(k) * -c) = x <=> b^x = mod_inv(k, p) * -c <=> p | k*b^x+c For each prime I'm hoping x <= MaxN, I noticed that when x > MaxN I don't care about the value. So I only compute baby-steps/giant-steps up to min(MaxN, order_b(p-1)) With MaxN = 1e6 and p near 1e9 this is around 2*sqrt(1e6)=2000 steps vs 2*sqrt(1e9)=63000 steps! Maybe this is already covered by GenericWorker::InitializeWorker when choosing r? or in some other way I don't understand? |
![]() |
![]() |
![]() |
#750 | |
"Mark"
Apr 2003
Between here and the
2·3·17·71 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#751 | |
"Seth"
Apr 2019
17·29 Posts |
![]() Quote:
My rough understanding of srsieve is that you convert b^n (in my case 9^n) into several sub-sequences 9^0*(9^3)^a, 9^1*(9^3)^a, 9^2*(9^3)^a. After this some of these sub sequence will always have trivial divisors (related to the residual classes) and can entirely be dropped. |
|
![]() |
![]() |
![]() |
#752 | |
"Mark"
Apr 2003
Between here and the
724210 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#753 |
"Seth"
Apr 2019
17·29 Posts |
![]()
I found a bug with HashTable that when I fix gives me a 20-30% speedup (at least for srsieve2). I'm looking around but I don't see a test suite in the SVN repository how do you test changes?
Last fiddled with by SethTro on 2022-11-16 at 09:20 |
![]() |
![]() |
![]() |
#754 | |
"Mark"
Apr 2003
Between here and the
724210 Posts |
![]() Quote:
What is the bug? I assume this doesn't impact the number of factors found, but impacts either putting something into the hash table or looking something up. It is possible that this impacts the OpenCL kernel as well. |
|
![]() |
![]() |
![]() |
#755 |
"Seth"
Apr 2019
17×29 Posts |
![]()
The bug is that `htable` is initializing and cleared with `0` instead of `empty_slot`.
This causes Insert to mark every entry as a probing conflict, e.g. `htable[slot] = (J | HASH_MASK1)`. On Lookup this causes additional probing. I'm working on getting your code into a git repository so I can break out my changes into smaller commits. I should have a small diff by the end of the day. |
![]() |
![]() |
![]() |
#756 |
"Seth"
Apr 2019
17×29 Posts |
![]()
The change is here: https://github.com/sethtroisi/mtsiev...7ada11384c9f6e
Before Code:
five:~/Projects/mtsieve-git$ ./srsieve2_clean -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors) CPU time: 18.90 sec. (0.04 sieving) (0.90 cores) Primes tested: 5761244. Factors found: 126233. Remaining terms: 23768. Time: 20.99 seconds. five:~/Projects/mtsieve-git$ ./srsieve2_clean -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors) CPU time: 178.19 sec. (4.04 sieving) (7.21 cores) Primes tested: 50847324. Factors found: 128864. Remaining terms: 21137. Time: 24.69 seconds. Code:
five:~/Projects/mtsieve-git$ ./srsieve2 -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors) CPU time: 15.23 sec. (0.04 sieving) (0.88 cores) Primes tested: 5761244. Factors found: 126233. Remaining terms: 23768. Time: 17.30 seconds. five:~/Projects/mtsieve-git$ ./srsieve2 -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors) CPU time: 144.05 sec. (4.36 sieving) (7.08 cores) Primes tested: 50847324. Factors found: 128864. Remaining terms: 21137. Time: 20.34 seconds. 178.19 / 144.05 = 24% faster |
![]() |
![]() |
![]() |
#757 |
"Mark"
Apr 2003
Between here and the
2·3·17·71 Posts |
![]()
Looks like the removal of memset() is where the performance boost comes from. How much of an improvement will likely be compiler specific. The OpenCL code does not use memset(), so it won't gain anything. I see that you removed the other constructor. It might be used by one of the other sieves.
This is a really nice performance boost. I greatly appreciate it. Not many people have been brave enough to look at my code and offer optimizations like this. I will look into making the changes in sourceforge and getting a new release out soon. |
![]() |
![]() |
![]() |
#758 | |
"Seth"
Apr 2019
17·29 Posts |
![]() Quote:
The real benefit is you go from 100% conflicts to 8-16% on Insert() and Lookup() I have a little more diagnostics in https://github.com/sethtroisi/mtsiev...511d3ab577cf8a I also fixed a segfault in OpenCL with https://github.com/sethtroisi/mtsiev...4b92914b0ac1f3 |
|
![]() |
![]() |
![]() |
#759 |
"Seth"
Apr 2019
1111011012 Posts |
![]()
In two quick checks I saw a 10% improvement in the openCL code by making this one line fix
Code:
for (idx=0; idx<HASH_SIZE; idx++) - h_table[idx] = 0; + h_table[idx] = HASH_ELEMENTS; Code:
# Before $ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 1e9 -P 20e8 -i b9_n_1e9.abc CPU time: 56.43 sec. (1.21 sieving) (1.95 cores) GPU time: 26.96 sec. Primes tested: 47432100. Factors found: 660. Remaining terms: 20477. Time: 28.82 seconds. # After $ $ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 1e9 -P 20e8 -i b9_n_1e9.abc CPU time: 51.72 sec. (0.93 sieving) (1.96 cores) GPU time: 24.91 sec. Primes tested: 47410816. Factors found: 659. Remaining terms: 20478. Time: 26.26 seconds. |
![]() |
![]() |