mersenneforum.org mtsieve
 Register FAQ Search Today's Posts Mark Forums Read

 2022-11-04, 19:28 #749 SethTro     "Seth" Apr 2019 503 Posts When I wrote my quick discrete-log solver I added this very useful optimization. I looked over the code and didn't find it in mt-sieve. When considering a single sequence k*b^n+c with 0 <= n <= N (e.g. in the sierpinski riesel) I solve the discrete_log(b, p, mod_inv(k) * -c) = x <=> b^x = mod_inv(k, p) * -c <=> p | k*b^x+c For each prime I'm hoping x <= MaxN, I noticed that when x > MaxN I don't care about the value. So I only compute baby-steps/giant-steps up to min(MaxN, order_b(p-1)) With MaxN = 1e6 and p near 1e9 this is around 2*sqrt(1e6)=2000 steps vs 2*sqrt(1e9)=63000 steps! Maybe this is already covered by GenericWorker::InitializeWorker when choosing r? or in some other way I don't understand?
2022-11-04, 23:09   #750
rogue

"Mark"
Apr 2003
Between here and the

7,481 Posts

Quote:
 Originally Posted by SethTro When I wrote my quick discrete-log solver I added this very useful optimization. I looked over the code and didn't find it in mt-sieve. When considering a single sequence k*b^n+c with 0 <= n <= N (e.g. in the sierpinski riesel) I solve the discrete_log(b, p, mod_inv(k) * -c) = x <=> b^x = mod_inv(k, p) * -c <=> p | k*b^x+c For each prime I'm hoping x <= MaxN, I noticed that when x > MaxN I don't care about the value. So I only compute baby-steps/giant-steps up to min(MaxN, order_b(p-1)) With MaxN = 1e6 and p near 1e9 this is around 2*sqrt(1e6)=2000 steps vs 2*sqrt(1e9)=63000 steps! Maybe this is already covered by GenericWorker::InitializeWorker when choosing r? or in some other way I don't understand?
Look at FindBestQ() is computed in the Helper class. q is computed once and is independent of p. This allows the discrete log to take the same amount of time regardless of p.

2022-11-11, 11:21   #751
SethTro

"Seth"
Apr 2019

503 Posts

Quote:
 Originally Posted by rogue Look at FindBestQ() is computed in the Helper class. q is computed once and is independent of p. This allows the discrete log to take the same amount of time regardless of p.
I've only spent a couple of days on this problem so I'm missing a lot of the deeper context you have. I'd love to understand why q makes the discrete log constant time (or the same per batch?). If there's an article or paper on the subject I'd enjoy reading it.

My rough understanding of srsieve is that you convert b^n (in my case 9^n) into several sub-sequences 9^0*(9^3)^a, 9^1*(9^3)^a, 9^2*(9^3)^a. After this some of these sub sequence will always have trivial divisors (related to the residual classes) and can entirely be dropped.

2022-11-11, 14:06   #752
rogue

"Mark"
Apr 2003
Between here and the

7,481 Posts

Quote:
 Originally Posted by SethTro I've only spent a couple of days on this problem so I'm missing a lot of the deeper context you have. I'd love to understand why q makes the discrete log constant time (or the same per batch?). If there's an article or paper on the subject I'd enjoy reading it. My rough understanding of srsieve is that you convert b^n (in my case 9^n) into several sub-sequences 9^0*(9^3)^a, 9^1*(9^3)^a, 9^2*(9^3)^a. After this some of these sub sequence will always have trivial divisors (related to the residual classes) and can entirely be dropped.
The code in srsieve2/srsieve2cl originated from srsieve/sr1sieve. I was not the writer of that code. I do not recall if any of that code referenced a wiki page or other document specifying the logic behind the discrete log.

 2022-11-16, 09:18 #753 SethTro     "Seth" Apr 2019 7678 Posts I found a bug with HashTable that when I fix gives me a 20-30% speedup (at least for srsieve2). I'm looking around but I don't see a test suite in the SVN repository how do you test changes? Last fiddled with by SethTro on 2022-11-16 at 09:20
2022-11-16, 13:15   #754
rogue

"Mark"
Apr 2003
Between here and the

7,481 Posts

Quote:
 Originally Posted by SethTro I found a bug with HashTable that when I fix gives me a 20-30% speedup (at least for srsieve2). I'm looking around but I don't see a test suite in the SVN repository how do you test changes?
Nice! I do not have a test suite. I typically run a set of tests with the previous version and the new version to compare results. If invalid factors are found or factors are missed, then I need to dig further.

What is the bug? I assume this doesn't impact the number of factors found, but impacts either putting something into the hash table or looking something up. It is possible that this impacts the OpenCL kernel as well.

 2022-11-16, 18:42 #755 SethTro     "Seth" Apr 2019 50310 Posts The bug is that htable is initializing and cleared with 0 instead of empty_slot. This causes Insert to mark every entry as a probing conflict, e.g. htable[slot] = (J | HASH_MASK1). On Lookup this causes additional probing. I'm working on getting your code into a git repository so I can break out my changes into smaller commits. I should have a small diff by the end of the day.
 2022-11-16, 21:24 #756 SethTro     "Seth" Apr 2019 503 Posts The change is here: https://github.com/sethtroisi/mtsiev...7ada11384c9f6e Before Code: five:~/Projects/mtsieve-git$./srsieve2_clean -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors) CPU time: 18.90 sec. (0.04 sieving) (0.90 cores) Primes tested: 5761244. Factors found: 126233. Remaining terms: 23768. Time: 20.99 seconds. five:~/Projects/mtsieve-git$ ./srsieve2_clean -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors) CPU time: 178.19 sec. (4.04 sieving) (7.21 cores) Primes tested: 50847324. Factors found: 128864. Remaining terms: 21137. Time: 24.69 seconds. After Code: five:~/Projects/mtsieve-git$./srsieve2 -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors) CPU time: 15.23 sec. (0.04 sieving) (0.88 cores) Primes tested: 5761244. Factors found: 126233. Remaining terms: 23768. Time: 17.30 seconds. five:~/Projects/mtsieve-git$ ./srsieve2 -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7" Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors) CPU time: 144.05 sec. (4.36 sieving) (7.08 cores) Primes tested: 50847324. Factors found: 128864. Remaining terms: 21137. Time: 20.34 seconds. 18.9 / 15.23 = 24% faster 178.19 / 144.05 = 24% faster
 2022-11-16, 22:19 #757 rogue     "Mark" Apr 2003 Between here and the 1D3916 Posts Looks like the removal of memset() is where the performance boost comes from. How much of an improvement will likely be compiler specific. The OpenCL code does not use memset(), so it won't gain anything. I see that you removed the other constructor. It might be used by one of the other sieves. This is a really nice performance boost. I greatly appreciate it. Not many people have been brave enough to look at my code and offer optimizations like this. I will look into making the changes in sourceforge and getting a new release out soon.
2022-11-16, 23:23   #758
SethTro

"Seth"
Apr 2019

503 Posts

Quote:
 Originally Posted by rogue Looks like the removal of memset() is where the performance boost comes from. How much of an improvement will likely be compiler specific. The OpenCL code does not use memset(), so it won't gain anything. I see that you removed the other constructor. It might be used by one of the other sieves. This is a really nice performance boost. I greatly appreciate it. Not many people have been brave enough to look at my code and offer optimizations like this. I will look into making the changes in sourceforge and getting a new release out soon.
I don't believe removing memset changes the performance, I still have to write a value to every index.

The real benefit is you go from 100% conflicts to 8-16% on Insert() and Lookup()
I have a little more diagnostics in https://github.com/sethtroisi/mtsiev...511d3ab577cf8a

I also fixed a segfault in OpenCL with https://github.com/sethtroisi/mtsiev...4b92914b0ac1f3

 2022-11-16, 23:40 #759 SethTro     "Seth" Apr 2019 503 Posts In two quick checks I saw a 10% improvement in the openCL code by making this one line fix Code: ` for (idx=0; idx

All times are UTC. The time now is 10:02.

Fri Sep 29 10:02:34 UTC 2023 up 16 days, 7:44, 0 users, load averages: 1.03, 0.97, 0.81

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔