mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2022-11-04, 19:28   #749
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

503 Posts
Default

When I wrote my quick discrete-log solver I added this very useful optimization. I looked over the code and didn't find it in mt-sieve.

When considering a single sequence k*b^n+c with 0 <= n <= N (e.g. in the sierpinski riesel)
I solve the discrete_log(b, p, mod_inv(k) * -c) = x <=> b^x = mod_inv(k, p) * -c <=> p | k*b^x+c

For each prime I'm hoping x <= MaxN, I noticed that when x > MaxN I don't care about the value.
So I only compute baby-steps/giant-steps up to min(MaxN, order_b(p-1))
With MaxN = 1e6 and p near 1e9 this is around 2*sqrt(1e6)=2000 steps vs 2*sqrt(1e9)=63000 steps!

Maybe this is already covered by GenericWorker::InitializeWorker when choosing r? or in some other way I don't understand?
SethTro is offline   Reply With Quote
Old 2022-11-04, 23:09   #750
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

7,481 Posts
Default

Quote:
Originally Posted by SethTro View Post
When I wrote my quick discrete-log solver I added this very useful optimization. I looked over the code and didn't find it in mt-sieve.

When considering a single sequence k*b^n+c with 0 <= n <= N (e.g. in the sierpinski riesel)
I solve the discrete_log(b, p, mod_inv(k) * -c) = x <=> b^x = mod_inv(k, p) * -c <=> p | k*b^x+c

For each prime I'm hoping x <= MaxN, I noticed that when x > MaxN I don't care about the value.
So I only compute baby-steps/giant-steps up to min(MaxN, order_b(p-1))
With MaxN = 1e6 and p near 1e9 this is around 2*sqrt(1e6)=2000 steps vs 2*sqrt(1e9)=63000 steps!

Maybe this is already covered by GenericWorker::InitializeWorker when choosing r? or in some other way I don't understand?
Look at FindBestQ() is computed in the Helper class. q is computed once and is independent of p. This allows the discrete log to take the same amount of time regardless of p.
rogue is offline   Reply With Quote
Old 2022-11-11, 11:21   #751
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

503 Posts
Default

Quote:
Originally Posted by rogue View Post
Look at FindBestQ() is computed in the Helper class. q is computed once and is independent of p. This allows the discrete log to take the same amount of time regardless of p.
I've only spent a couple of days on this problem so I'm missing a lot of the deeper context you have. I'd love to understand why q makes the discrete log constant time (or the same per batch?). If there's an article or paper on the subject I'd enjoy reading it.

My rough understanding of srsieve is that you convert b^n (in my case 9^n) into several sub-sequences 9^0*(9^3)^a, 9^1*(9^3)^a, 9^2*(9^3)^a. After this some of these sub sequence will always have trivial divisors (related to the residual classes) and can entirely be dropped.
SethTro is offline   Reply With Quote
Old 2022-11-11, 14:06   #752
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

7,481 Posts
Default

Quote:
Originally Posted by SethTro View Post
I've only spent a couple of days on this problem so I'm missing a lot of the deeper context you have. I'd love to understand why q makes the discrete log constant time (or the same per batch?). If there's an article or paper on the subject I'd enjoy reading it.

My rough understanding of srsieve is that you convert b^n (in my case 9^n) into several sub-sequences 9^0*(9^3)^a, 9^1*(9^3)^a, 9^2*(9^3)^a. After this some of these sub sequence will always have trivial divisors (related to the residual classes) and can entirely be dropped.
The code in srsieve2/srsieve2cl originated from srsieve/sr1sieve. I was not the writer of that code. I do not recall if any of that code referenced a wiki page or other document specifying the logic behind the discrete log.
rogue is offline   Reply With Quote
Old 2022-11-16, 09:18   #753
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

7678 Posts
Default

I found a bug with HashTable that when I fix gives me a 20-30% speedup (at least for srsieve2). I'm looking around but I don't see a test suite in the SVN repository how do you test changes?

Last fiddled with by SethTro on 2022-11-16 at 09:20
SethTro is offline   Reply With Quote
Old 2022-11-16, 13:15   #754
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

7,481 Posts
Default

Quote:
Originally Posted by SethTro View Post
I found a bug with HashTable that when I fix gives me a 20-30% speedup (at least for srsieve2). I'm looking around but I don't see a test suite in the SVN repository how do you test changes?
Nice! I do not have a test suite. I typically run a set of tests with the previous version and the new version to compare results. If invalid factors are found or factors are missed, then I need to dig further.

What is the bug? I assume this doesn't impact the number of factors found, but impacts either putting something into the hash table or looking something up. It is possible that this impacts the OpenCL kernel as well.
rogue is offline   Reply With Quote
Old 2022-11-16, 18:42   #755
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

50310 Posts
Default

The bug is that `htable` is initializing and cleared with `0` instead of `empty_slot`.
This causes Insert to mark every entry as a probing conflict, e.g. `htable[slot] = (J | HASH_MASK1)`.
On Lookup this causes additional probing.

I'm working on getting your code into a git repository so I can break out my changes into smaller commits. I should have a small diff by the end of the day.
SethTro is offline   Reply With Quote
Old 2022-11-16, 21:24   #756
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

503 Posts
Default

The change is here: https://github.com/sethtroisi/mtsiev...7ada11384c9f6e

Before
Code:
five:~/Projects/mtsieve-git$ ./srsieve2_clean -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7"
Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors)
CPU time: 18.90 sec. (0.04 sieving) (0.90 cores)
Primes tested: 5761244.  Factors found: 126233.  Remaining terms: 23768.  Time: 20.99 seconds.

five:~/Projects/mtsieve-git$ ./srsieve2_clean -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7"
Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors)
CPU time: 178.19 sec. (4.04 sieving) (7.21 cores)
Primes tested: 50847324.  Factors found: 128864.  Remaining terms: 21137.  Time: 24.69 seconds.
After
Code:
five:~/Projects/mtsieve-git$ ./srsieve2 -fA -W1 -p 8 -P 1e8 -n 100000 -N 250000 -s "4*9^n-7"
Sieve started: 8 < p < 1e8 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 133067 factors)
CPU time: 15.23 sec. (0.04 sieving) (0.88 cores)
Primes tested: 5761244.  Factors found: 126233.  Remaining terms: 23768.  Time: 17.30 seconds.

five:~/Projects/mtsieve-git$ ./srsieve2 -fA -W8 -p 8 -P 1e9 -n 100000 -N 250000 -s "4*9^n-7"
Sieve started: 8 < p < 1e9 with 150001 terms (100000 < n < 250000, k*9^n-7) (expecting 134949 factors)
CPU time: 144.05 sec. (4.36 sieving) (7.08 cores)
Primes tested: 50847324.  Factors found: 128864.  Remaining terms: 21137.  Time: 20.34 seconds.
18.9 / 15.23 = 24% faster
178.19 / 144.05 = 24% faster
SethTro is offline   Reply With Quote
Old 2022-11-16, 22:19   #757
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

1D3916 Posts
Default

Looks like the removal of memset() is where the performance boost comes from. How much of an improvement will likely be compiler specific. The OpenCL code does not use memset(), so it won't gain anything. I see that you removed the other constructor. It might be used by one of the other sieves.

This is a really nice performance boost. I greatly appreciate it. Not many people have been brave enough to look at my code and offer optimizations like this. I will look into making the changes in sourceforge and getting a new release out soon.
rogue is offline   Reply With Quote
Old 2022-11-16, 23:23   #758
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

503 Posts
Default

Quote:
Originally Posted by rogue View Post
Looks like the removal of memset() is where the performance boost comes from. How much of an improvement will likely be compiler specific. The OpenCL code does not use memset(), so it won't gain anything. I see that you removed the other constructor. It might be used by one of the other sieves.

This is a really nice performance boost. I greatly appreciate it. Not many people have been brave enough to look at my code and offer optimizations like this. I will look into making the changes in sourceforge and getting a new release out soon.
I don't believe removing memset changes the performance, I still have to write a value to every index.

The real benefit is you go from 100% conflicts to 8-16% on Insert() and Lookup()
I have a little more diagnostics in https://github.com/sethtroisi/mtsiev...511d3ab577cf8a

I also fixed a segfault in OpenCL with https://github.com/sethtroisi/mtsiev...4b92914b0ac1f3
SethTro is offline   Reply With Quote
Old 2022-11-16, 23:40   #759
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

503 Posts
Default

In two quick checks I saw a 10% improvement in the openCL code by making this one line fix

Code:
    for (idx=0; idx<HASH_SIZE; idx++)
-      h_table[idx] = 0;  
+      h_table[idx] = HASH_ELEMENTS;
Code:
# Before
$ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 1e9 -P 20e8 -i b9_n_1e9.abc
CPU time: 56.43 sec. (1.21 sieving) (1.95 cores) GPU time: 26.96 sec. 
Primes tested: 47432100.  Factors found: 660.  Remaining terms: 20477.  Time: 28.82 seconds.

# After
$ $ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 1e9 -P 20e8 -i b9_n_1e9.abc
CPU time: 51.72 sec. (0.93 sieving) (1.96 cores) GPU time: 24.91 sec. 
Primes tested: 47410816.  Factors found: 659.  Remaining terms: 20478.  Time: 26.26 seconds.
SethTro is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 10:02.


Fri Sep 29 10:02:34 UTC 2023 up 16 days, 7:44, 0 users, load averages: 1.03, 0.97, 0.81

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔