mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2021-12-25, 19:09   #606
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

11×599 Posts
Default

I fixed the issue that caused the c=1 (single sequence) sieve to slow down. While playing around with the code, I decided to make BASE_MULTIPLE, LIMIT_BASE, and POWER_RESIDUE_LCM configurable via command line switches. These are used primarily for the c=1 sieves and not the generic sieve. Although the default setting for these provides some of the fastest sieving, I have found some configurations that provide a 4% speed bump. Many other configurations are dreadful. The point is that you will want to take a range of 1e9 primes and try various settings for the switches. This could take a couple of hours, but considering that sieving deeply enough could take many days to reach optimal sieving depth, it would be worth the time.

Before I take the c=1 (multiple sequence) sieve testing I will finally implement the code for -L, where srsieve2 can read from a file then save Legendre tables to a file. Right now it appears that the main benefit is when one has to build large Legendre tables (or many Legendre tables) because building small ones doesn't take very long.
rogue is offline   Reply With Quote
Old 2021-12-29, 15:19   #607
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

11001101111012 Posts
Default

In working the logic to read/write files for the Legendre tables, I have a question to the users. Would it be preferable to have a single file for all sequences or to have a single file per sequence?

In the case of a single file, if you have a lot of sequences, then that file can be quite large. In one of my current tests with nearly 4000 sequences that file is over 6 GB in size. It takes over five hours to build (single threaded).

I'm leaning towards a change to have a single file per sequence. -L would now be used to provide a directory name (instead of a file name). The file names would be auto-generated. This would allow a user to run multiple instances of srsieve2 concurrently (same box or different box) to build those files then run a single instance (with one or more workers) that could then read the files for the sequences it uses. Those files could easily be saved for future use.

Thoughts?
rogue is offline   Reply With Quote
Old 2021-12-29, 18:00   #608
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

149C16 Posts
Default

my initial reaction was "one file please, clutter."
Then your plan for organization convinced me that one file per sequence is fine too!
VBCurtis is offline   Reply With Quote
Old 2021-12-30, 20:36   #609
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

11·599 Posts
Default

Not a lot of feedback, but my mind was made up for me in testing.

I am running into an issue with large binary files, specifically the Legendre file when it exceeds 2 GB in size. Per the gcc doc I should be able to use fseek() to go beyond 2 GB, but I cannot despite the defining _FILE_OFFSET_BITS as 64 at compile time. I suspect that srsieve2 was never used to manage such a large file since it doesn't work either.

For this reason I am going to modify -L to represent a directory (which can be relative or absolute to srsieve2) instead of a file. That probably isn't what everyone wants to read, but I cannot imagine it being too much of a hardship for users.
rogue is offline   Reply With Quote
Old 2022-01-03, 20:32   #610
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

146758 Posts
Default

I have finally made enough progress to commit my changes. I haven't posted a build yet as I need to do more testing, but everything looks good right now. The new release is version 1.6.0.

My goal was never to make this logic as fast as sr2sieve. My hope was to get it "in the ballpark", but the results below are worse than what I expected. I know some of the reasons why it is slower (no hand-tuned assembly for unrolling loops), but there could be others. The long term goal was to implement this logic in the GPU, where I expect it to perform much better than sr2sieve. The unknown is whether or not it will perform better that the generic logic in srsieve2cl 1.5.3. I have not started coding it yet. I think it will be partly dependent upon whether or not I can determine the reasons for v1.6.0 to be so much slower.

Here are some timings for one range that has 265 sequences. Each run produced the same list of factors. All of these were run on the same laptop. The CPU is an Intel i7-8850H. The GPU is an NVIDIA Quadro P3200.

Code:
srsieve                                              5397 seconds
sr2sieve (no Legendre tables)                        4083 seconds
sr2sieve (with Legendre tables)                      1772 seconds
srsieve2 v1.5.3 (generic logic)                      3424 seconds
srsieve2cl v1.5.3 (generic logic, -g10 (default))     672 secomds
srsieve2cl v1.5.3 (generic logic, -g100)              602 secomds
srsieve2 v1.6.0 (c=1 logic, no legendre tables)      5015 seconds
srsieve2 v1.6.0 (c=1 logic, with legendre tables)    3504 seconds
Once I can compile and run this on OS X, I can compare there as I have an AMD Radeon on that computer.

I think the results are pretty clear if you have multiple sequences where abs(c) = 1 for all sequences:
  • If you have a GPU, use srsieve2cl, although I strongly suggest you run some tests as GPU speeds vary significantly.
  • If you do not have a GPU and are sieving sequences which support Legendre tables, use sr2sieve.
  • If you do not have a GPU and are sieving sequences which do not support Legendre tables, use srsieve2.
  • If you do not have a GPU and have a mix of sequences where some support Legendre tables and some do not, split into two inputs, run one thru sr2sieve and one thru srsieve2.
rogue is offline   Reply With Quote
Old 2022-01-13, 04:52   #611
Dylan14
 
Dylan14's Avatar
 
"Dylan"
Mar 2017

25016 Posts
Default

On the latest commit (r160) I am running into an issue with the compilation of srsieve2:

Code:
sierpinski_riesel/CisOneWithOneSequenceGpuWorker.cpp:87:56: error: ‘POWER_RESIDUE_LCM’ was not declared in this scope
   87 |    sprintf(define12, "#define POWER_RESIDUE_LCM %u\n", POWER_RESIDUE_LCM);
      |
which seems strange as a) we are defining this here, and b) in CisOneSequenceHelper.h we have the related ii_PowerResidueLcm defined.
Dylan14 is online now   Reply With Quote
Old 2022-01-13, 13:42   #612
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

11·599 Posts
Default

Quote:
Originally Posted by Dylan14 View Post
On the latest commit (r160) I am running into an issue with the compilation of srsieve2:

Code:
sierpinski_riesel/CisOneWithOneSequenceGpuWorker.cpp:87:56: error: ‘POWER_RESIDUE_LCM’ was not declared in this scope
   87 |    sprintf(define12, "#define POWER_RESIDUE_LCM %u\n", POWER_RESIDUE_LCM);
      |
which seems strange as a) we are defining this here, and b) in CisOneSequenceHelper.h we have the related ii_PowerResidueLcm defined.
That has to be fixed before I can release the next build. If you need the GPU worker for the single sequence c=1 sieve, then d/l the latest .7z file of the Windows executables.

If you want to take srsieve2 for multi sequence c=1 sieving for a spin, make srsieve2 and not srsieve2cl.
rogue is offline   Reply With Quote
Old 2022-01-17, 17:57   #613
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

11×599 Posts
Default

I fixed the source to address the compiler errors for srsieve2cl.
rogue is offline   Reply With Quote
Old 2022-01-17, 20:14   #614
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

146758 Posts
Default

mtsieve 2.2.3 is finally released.

At this time if you are sieving multiple sequences with srsieve2/srsieve2cl, use -l0 to force usage of the generic logic. Without it that command line option it will use the sr2sieve logic, but my testing has shown that to be slower than the generic logic. Also there is no OpenCL variant supporting sr2sieve logic.

Here are the updates:

Code:
   framework:
      Modified sieving and factor rate calculations to account for long-running sieving
      processes which lead to those rates fluctuating a lot.

      Added xmallocNew() which can return NULL if not enough memory can be allocated
      which allows callers to decide how to handle that condition.

      For OpenCL sievers, added a call to clFinish() immediately after the call to
      clEnqueueNDRangeKernel() so that the GPU time is more accurately calculated.

   fkbnsieve:  version 1.4
      Do not create output file if there are no terms to write to it.

      Exit sieving early if there are no remaining terms.
   
   smsieve, smsievecl: version 1.0
      Initial release for sieving of Smarandache numbers between Sm(100000) and Sm(9999999).
      Cannot be used at this time for sieving p < 10000, but that is okay since a project
      has a pre-sieved input file to use.  It is impractical at this time to sieve values
      larger than Sm(9999999) due to the time needed for a PRP test with pfgw.
      
   srsieve2, srsieve2cl:  version 1.6.0
      Changed the calculation of factor density so that the GPU workers can use a
      lot less GPU memory.
      
      Added -S to split sequences into groups when feeding the GPU.
      
      The following only apply to the abs(c) = 1 logic:
         Added sr2sieve logic which uses Legendre checks in support of sieving when there
         are multiple sequences.
         
         Modified -L to represent a directory to hold a back up of the Legendre tables.
         There will be one file per unique b/k/c.

         Modified -l to specify the amount of memory to allocate to Legendre tsbles. -l0
         disables use of Legendre tables.  

         Added -U to specify a multiplier to compute BASE_MULTIPLE.  The specified value
         is multiplied by 2 to compute BASE_MULTIPLE.
         
         Added -V to specify a multiplier to compute POWER_RESIDUE_LCM.  The specified value
         is multiplied by BASE_MULTIPLE to compute POWER_RESIDUE_LCM.

         Added -X to specify a multiplier to compute LIMIT_BASE.  The specified value
         is multiplied by POWER_RESIDUE_LCM to compute LIMIT_BASE.
         
         BASE_MULTIPLE: least exponent Q for which sieving in subsequence base b^Q will be allowed.
         
         POWER_RESIDUE_LCM: For a prime p that satisfies p=1 (mod r), an "r-th power residue test"
                            checks whether a subsequence of k*b^n+c can possibly contain any terms of
                            the form x^r (mod p). If there are none then that subsequence can be
                            omitted from the BSGS step.
                            
                            To conduct r-th power residue tests for each r in a set R of prime
                            powers, set POWER_RESIDUE_LCM to lcm(R).
         
         LIMIT_BASE:  Allow sieving in base b^Q for Q chosen from the divisors of LIMIT_BASE.

         When sieving with a single sequence the defaults are:
            bmMultiplier = 15, prlMultiplier = 24, lbMultiplier = 1
            BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
         
         When sieving with multiple sequences the defaults are:
            bmMultiplier = 1, prlMultiplier = 360, lbMultiplier = 1
            BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
            
         It is recommend to play with these multipliers to find the optimal values as the provided
         defaults are from sr1sieve and sr2sieve and some values can provide faster sieving.
         The recommendation is to sieve a range of 1e9 and see how long it takes (use sr2sieve.log
         to see how long it actually takes).  Vary -U, -V, and-X evaluate the results.  With a
         script and multiple cores, this could done relatively quickly for most inputs.
rogue is offline   Reply With Quote
Old 2022-03-11, 23:04   #615
pepi37
 
pepi37's Avatar
 
Dec 2011
After milion nines:)

2·757 Posts
Default

latest revision


e:\MTSIEVE\216\MT>k1b2sieve -P 100000000 -w 1e7 -W 1 -n 100 -N 10000 -c 1 -c 3
k1b2sieve v1.1, a program to find factors of 2^n+c numbers for variable n and c
Fatal Error: kmax must be specified
pepi37 is offline   Reply With Quote
Old 2022-03-12, 00:50   #616
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

19BD16 Posts
Default

Quote:
Originally Posted by pepi37 View Post
latest revision


e:\MTSIEVE\216\MT>k1b2sieve -P 100000000 -w 1e7 -W 1 -n 100 -N 10000 -c 1 -c 3
k1b2sieve v1.1, a program to find factors of 2^n+c numbers for variable n and c
Fatal Error: kmax must be specified
Probably means cmax, i.e. -C. I will fix the text.
rogue is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 01:56.


Tue May 17 01:56:04 UTC 2022 up 32 days, 23:57, 0 users, load averages: 1.67, 1.38, 1.24

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔