mersenneforum.org mtsieve

2022-12-04, 13:24   #804
Jean Penné

May 2004
FRANCE

617 Posts
Does the static binary work for you?

Quote:
 Originally Posted by pepi37 Nope, just for Linux, and as far as I know it is not fast enough (in my case)
Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough...

Jean

2022-12-04, 15:14   #805
pepi37

Dec 2011
After milion nines:)

2·3²·7·13 Posts

Quote:
 Originally Posted by Jean Penné Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough... Thank you in advance, Jean
Yes, it works :) But one instance "eats" one CPU core.
Using the libsleep trick, I can reduce it to 50% of one CPU core. Speed is the same as one core of a Ryzen 7 3700X: both need around 17 minutes to test a 535,000-digit candidate.

Quote:
root@OMICRON:~/LLR# ./sllrCUDA -d -q"4569*2^1778899+1"
Starting Proth prime test of 4569*2^1778899+1
Using complex irrational base DWT, FFT length = 262144, a = 5
^Ceration: 160000 / 1778910 [8.49%], ms/iter: 0.596, ETA: 00:16:04
Caught signal. Terminating.
Stopping Proth prime test of 4569*2^1778899+1 at iteration 164342 [9.23%]
root@OMICRON:~/LLR# LD_PRELOAD="/usr/local/lib/libsleep.so" ./sllrCUDA -d -q"4569*2^1778899+1"
libsleep: Sleep time: 50usec
Resuming Proth prime test of 4569*2^1778899+1 at bit 164343 [9.23%]
Using complex irrational base DWT, FFT length = 262144, a = 5
^Ceration: 310000 / 1778910 [16.93%], ms/iter: 0.593, ETA: 00:14:30
Caught signal. Terminating.
Stopping Proth prime test of 4569*2^1778899+1 at iteration 317616 [17.85%]
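A quick sanity check of the "17 minutes for a 535,000-digit candidate" figure, using only the numbers in the log above. This is a sketch that assumes ms/iter stays constant for the whole test; the helper names are hypothetical.

```python
import math

def decimal_digits(k: int, n: int) -> int:
    """Decimal digit count of N = k*2^n + 1, computed from logarithms.

    The "+1" never changes the count here: k*2^n (n >= 1) is even, so it can
    never equal 10^d - 1, the only case where adding 1 would add a digit.
    """
    return math.floor(math.log10(k) + n * math.log10(2)) + 1

def runtime_minutes(iterations: int, ms_per_iter: float) -> float:
    """Projected wall-clock time, assuming ms/iter stays constant for the whole test."""
    return iterations * ms_per_iter / 1000.0 / 60.0

# Figures taken from the log above (1778910 iterations for this Proth test).
print(decimal_digits(4569, 1778899))              # a little over 535,000 digits
print(round(runtime_minutes(1778910, 0.596), 1))  # 17.7 (without libsleep)
print(round(runtime_minutes(1778910, 0.593), 1))  # 17.6 (with libsleep)
```

The two projected run times are practically identical, which matches the observation that libsleep halves the CPU load without slowing the test down.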

2022-12-04, 20:51   #806
Citrix

Jun 2003

1,609 Posts

I am getting the following error. What settings do I need to change?

Code:
srsieve2cl.exe -i sr_2.abcd -W4 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -G12 -M100000 -l1000
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272
2022-12-04, 21:59   #807
rogue

"Mark"
Apr 2003
Between here and the

1101110001110₂ Posts

Quote:
 Originally Posted by Citrix I am getting the following error. What settings do I need to change?
Code:
srsieve2cl.exe -i sr_2.abcd -W4 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -G12 -M100000 -l1000
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272
I ran into this in the past week, so I have a solution for it. I posted an experimental build on SourceForge that should address this.

2022-12-04, 22:05   #808
Citrix

Jun 2003

1,609 Posts

With the new build I get:

Code:
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -l10000 -w1000 -G12
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 204 base 2 sequences into 9182 base 2^720 sequences.
Legendre summary: Approximately 4752 B needed for Legendre tables
  204 total sequences
  204 are eligible for Legendre tables
  0 are not eligible for Legendre tables
  204 have Legendre tables in memory
  0 cannot have Legendre tables in memory
  0 have Legendre tables loaded from files
  204 required building of the Legendre tables
17625600 bytes used for congruent subseq indices
1360000 bytes used for congruent subseqs
Fatal Error: Must use generic worker if using GPU with multiple sequences by specifying -l0

With the generic code:

Code:
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -w1000 -G6
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Must use generic sieving logic because -l was not specified for mutiple sequences
Sieving with generic logic for p >= 10000000000000
Split 204 base 2 sequences into 20555 base 2^2880 sequences.
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
GPU primes per worker is 57344
Sieve started: 1e13 < p < 11e12 with 134418 terms (2500875 < n < 20000000, k*2^n-1) (expecting 427 factors)
Increasing worksize to 16000 since each chunk is tested in less than a second
OpenCL Error: Out of host memory in call to clEnqueueNDRangeOpenCLKernel kernelName: generic_kernel globalworksize 57344 localworksize 256

Last fiddled with by Citrix on 2022-12-04 at 22:08
2022-12-05, 03:39   #809
rogue

"Mark"
Apr 2003
Between here and the

2·3,527 Posts

Quote:
 Originally Posted by Citrix With generic code
Code:
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -w1000 -G6
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Must use generic sieving logic because -l was not specified for mutiple sequences
Sieving with generic logic for p >= 10000000000000
Split 204 base 2 sequences into 20555 base 2^2880 sequences.
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
GPU primes per worker is 57344
Sieve started: 1e13 < p < 11e12 with 134418 terms (2500875 < n < 20000000, k*2^n-1) (expecting 427 factors)
Increasing worksize to 16000 since each chunk is tested in less than a second
OpenCL Error: Out of host memory in call to clEnqueueNDRangeOpenCLKernel kernelName: generic_kernel globalworksize 57344 localworksize 256
You should use -g to increase the GPU primes per worker rather than increasing the number of GPU threads. The framework, at this time, does not support one executable running concurrently on multiple GPUs.

Using -G impacts GPU memory usage, but with that many subsequences I suggest that you use -b (a value less than 1.0) to reduce the size of the hash table that the GPU will use. You might also want to use -K to split the sequences across multiple chunks. This will require some trial and error on your part. There is no way (that I am aware of) to compute the memory required for a kernel, so the code cannot "auto-tune" these parameters.
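Because the memory a kernel needs cannot be computed up front, the trial and error can at least be scripted. The sketch below is hypothetical: the helper names, the candidate -b values, and the way a failure is detected (a non-zero exit status or an "OpenCL Error" line, as in the log above) are assumptions, and each trial should use a short p range so it finishes quickly.

```python
import subprocess
from typing import Callable, Iterable, Optional, Sequence

def trial_run(base_args: Sequence[str], b_value: float,
              exe: str = "srsieve2cl.exe") -> bool:
    """Run one short srsieve2cl trial with -b set; True if it looked successful.

    Assumption: a failed trial either exits non-zero or prints "OpenCL Error".
    """
    cmd = [exe, *base_args, "-b", str(b_value)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and "OpenCL Error" not in output

def find_working_b(try_run: Callable[[float], bool],
                   candidates: Iterable[float]) -> Optional[float]:
    """Return the first candidate -b value whose trial run succeeds, else None."""
    for b in candidates:
        if try_run(b):
            return b
    return None
```

For example, find_working_b(lambda b: trial_run(args, b), [0.9, 0.75, 0.5, 0.25]), where args holds the rest of the command line (including -l0, since -l > 0 is not supported on the GPU with multiple sequences). Larger values are tried first on the assumption that you want to keep the hash table as large as the GPU allows.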

You cannot use -l > 0 with the GPU when you have multiple sequences. srsieve2cl does not support it at this time.

I also do not recommend mixing -W and -G. The factor rate calculation does not work correctly when using both CPU and GPU workers.

You can use -p10e12 -P11e12 if that is easier to read.
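To double-check the shorthand: 10e12 means 10 × 10^12 = 10^13, so -p10e12 -P11e12 covers the same range as -p 10000000000000 -P 11000000000000, which is also why the log prints the lower bound as 1e13. The helper below is hypothetical (it is not srsieve2cl's own parser) and does the conversion with exact integer arithmetic, assuming an integer mantissa.

```python
def parse_sci(text: str) -> int:
    """Convert a value such as '10e12' to an exact int (integer mantissa assumed)."""
    mantissa, _, exponent = text.lower().partition("e")
    # No 'e' present: the exponent part is empty, so treat it as 10**0.
    return int(mantissa) * 10 ** int(exponent or 0)

print(parse_sci("10e12") == 10000000000000)  # True
print(parse_sci("11e12") == 11000000000000)  # True
```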

2022-12-05, 18:44   #810
storm5510
Random Account

Aug 2009
Not U. + S.A.

101000110111₂ Posts

@rogue
Q.: Does srsieve2cl generate an exit code when it finishes? Running small sieves from a batch file would sometimes fail because I had -M set too low. It was at 3,500; now it is 10,000. It varied based on the k value: some k's caused problems and others did not. All used the same values for -n, -N, and -P.
2022-12-05, 19:24   #811
rogue

"Mark"
Apr 2003
Between here and the

2·3,527 Posts

Quote:
 Originally Posted by storm5510 @rogue Q.: Does srsieve2cl generate an exit code when it finishes? Running small sieves from a batch file would sometimes fail because I had -M set too low. It was at 3,500; now it is 10,000. It varied based on the k value: some k's caused problems and others did not. All used the same values for -n, -N, and -P.
For normal completion it will output the number of terms written to the output file and the time it took to run.

SEGFAULTs will just give you the command prompt without any of that. If that happens, let me know.

Last fiddled with by rogue on 2022-12-05 at 19:25

2022-12-05, 23:41   #812
storm5510
Random Account

Aug 2009
Not U. + S.A.

5×523 Posts

Quote:
 Originally Posted by rogue For normal completion it will output the number of terms written to the output file and the time it took to run. SEGFAULTs will just give you the command prompt without any of that. If that happens let me know.
Forgive me, but I didn't specify it correctly. An error code?

For a normal program run and exit, an error code of zero is expected. If there is an error, a non-zero code is returned.

Quote:
 Originally Posted by Jean Penné Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough... Thank you in advance, Jean
Off-topic: I am running it as a test. According to nvidia-smi, it is using about 30% of the GPU's capability. I am running "1955*2^n+1" for the test; the k is my birth year. The n's are around 102K presently. Despite not being all that fast, it is quite stable in my case. I am on Ubuntu 20.04.4 LTS with a GTX 1080. The iteration time holds steady at 0.14 seconds. The overall time is increasing gradually.

2022-12-06, 00:09   #813
rogue

"Mark"
Apr 2003
Between here and the

2×3,527 Posts

Quote:
 Originally Posted by storm5510 Forgive me, but I didn't specify it correctly. An error code? For a normal program run and exit, an error code of zero is expected. If there is an error, a non-zero code is returned.
It will be zero upon successful completion. A FatalError (caught and output to the console) is -1. I'm not certain what assert() will exit with.

I do not understand why you care. The error code is not output to the console.
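For the batch use case, the exit status can be checked by the calling script instead of reading the console. A minimal sketch with hypothetical helper names: only "0 on success" and "-1 from FatalError" come from the reply above; the other mappings are standard C runtime behaviour, not anything confirmed for srsieve2cl. exit(-1) usually reaches the parent as 255 on Linux and 4294967295 (0xFFFFFFFF) on Windows; a failed assert() calls abort() (SIGABRT on Linux, typically exit code 3 on Windows); a segfault is SIGSEGV on Linux or 0xC0000005 on Windows. Python's subprocess reports death by a signal as a negative return code.

```python
import signal
import subprocess
import sys
from typing import Sequence

# Assumed meanings of some non-zero codes (C runtime conventions, not
# confirmed for srsieve2cl itself).
KNOWN_CODES = {
    255: "exit(-1), e.g. FatalError (POSIX)",
    0xFFFFFFFF: "exit(-1), e.g. FatalError (Windows)",
    3: "possibly abort(), e.g. a failed assert() (Windows)",
    0xC0000005: "access violation / segfault (Windows)",
}

def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "success"
    if returncode < 0:  # POSIX only: the process was killed by signal -returncode
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return KNOWN_CODES.get(returncode, f"exit code {returncode}")

def run_batch(commands: Sequence[Sequence[str]]) -> int:
    """Run commands in order, stopping at the first non-zero exit status."""
    for cmd in commands:
        code = subprocess.run(list(cmd)).returncode
        if code != 0:
            print(f"{' '.join(cmd)} failed: {describe_exit(code)}", file=sys.stderr)
            return code
    return 0
```

Each srsieve2cl command line in the batch would be passed to run_batch as a list of arguments; a failure such as the HASH_MAX_ELTS assertion earlier in the thread then stops the batch instead of letting it continue silently.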

2022-12-06, 03:34   #814
Citrix

Jun 2003

1,609 Posts

@Rogue
I can get the program to work, but it is extremely slow without the Legendre tables.

A couple of other questions/thoughts:
1. I get the following error with the CPU code (srsieve2) as well. Can you release a fix?
Code:
Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272
2. BASE_MULTIPLE has a limit of 60. Can this be increased to 256 or higher?
3. Possible bug: the GPU code seems to crash if the n range is large (~15M), and seems to produce false factors if the n range is large and LIMIT_BASE is huge.
4. For what types of sequences is it best to use the GPU, and for which should you stick to the CPU?

Thanks

Last fiddled with by Citrix on 2022-12-06 at 04:00 Reason: Sp


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
