#1068
"Gary"
May 2007
Overland Park, KS
2²×3²×349 Posts
#1069
"Mark"
Apr 2003
Between here and the
3³·277 Posts
To download all of the code you can use svn checkout.

srsieve2/srsieve2cl is by far the most complex code built upon the framework. There are the "Generic" classes, which have the srsieve functionality. The "CisOneWithMultipleSequences" classes have the sr2sieve functionality. The "CisOneWithOneSequence" classes have the sr1sieve functionality.

All GPU code is in the .gpu files. At build time these are run through a converter to create the .h files, which are needed by the GPU worker classes. The .gpu files use OpenCL C, which is easily understood if you know C.

As I have stated before, sr1sieve and sr2sieve are likely faster than srsieve2, but srsieve2cl is likely faster than sr1sieve and sr2sieve. The only reason to use srsieve2 on Windows is if you want to take advantage of multi-threading or if you cannot use sr1sieve/sr2sieve.

I have no intention of changing srsieve2 to compete directly with sr1sieve/sr2sieve. That would require a lot of ASM code, and I have avoided such code to ensure portability to other CPU architectures, such as ARM. Some less used sieves still have ASM. Unless asked, I will probably not update those for ARM support. Some sieves support AVX (which uses ASM), but they also have a non-AVX code path.

In short, srsieve2 is not meant as a replacement for sr1sieve/sr2sieve. I was focused on srsieve2cl. At some point I will write the GPU equivalent code for sr2sieve. Fortunately the Generic code in srsieve2cl is fast enough to replace sr2sieve, so it hasn't been too high on my priority list.

I would be happy to answer any questions.
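As a rough illustration of the .gpu to .h conversion step described above, the sketch below wraps OpenCL C source text in a C string literal so that it could be compiled into a host program. This is only an assumption about what such a converter might do: the function name `gpu_source_to_header`, the variable name `demo_kernel`, and the output layout are all hypothetical, and the real mtsieve converter may produce something quite different.

```python
def gpu_source_to_header(source: str, var_name: str) -> str:
    """Wrap OpenCL C source text in a C string literal, one quoted line per source line.

    Hypothetical helper for illustration only; not the mtsieve converter.
    """
    lines = [f"const char *{var_name} ="]
    for raw in source.splitlines():
        # Escape backslashes first, then double quotes, so the C literal stays valid.
        escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'"{escaped}\\n"')
    if len(lines) == 1:
        lines.append('""')  # an empty source file still needs a valid literal
    lines.append(";")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A made-up kernel, used only to show the shape of the generated header.
    kernel = "__kernel void demo(__global ulong *out)\n{\n   out[get_global_id(0)] = 0;\n}\n"
    print(gpu_source_to_header(kernel, "demo_kernel"))
```

Embedding the kernel text as a string is one common way to ship OpenCL code inside a single executable, since the OpenCL runtime compiles kernels from source at run time.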
#1070
Random Account
Aug 2009
Oceanus Procellarum
3027₁₀ Posts
fbncsieve fatal error:
Code:
D:\sieve>fbncsieve -P5e12 -i1897-3.abcd -o1897-4.abcd
fbncsieve v1.5, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 1000000000039 < p < 5e12 with 30071 terms (1000010 < k < 1999980, k*18970509^3+1) (expecting 1655 factors)
Increasing worksize to 1600000 since each chunk is tested in less than a second
Increasing worksize to 200000000 since each chunk is tested in less than a second
Fatal Error: 1302598*18970509^3+1 mod 1014558378077 = 758131303968

I checked to make sure I had the latest build. Unless something has changed in the past day, it appears I do.
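The fatal error above reports a nonzero residue for a candidate factor, meaning the prime does not actually divide the term. A minimal sketch of that kind of check, independent of fbncsieve's own code, is shown below. The function names `residue` and `is_factor` are made up for this example.

```python
def residue(k: int, b: int, n: int, c: int, p: int) -> int:
    """Return (k*b^n + c) mod p using modular exponentiation."""
    return ((k % p) * pow(b, n, p) + c) % p


def is_factor(k: int, b: int, n: int, c: int, p: int) -> bool:
    """True when p divides k*b^n + c."""
    return residue(k, b, n, c, p) == 0


if __name__ == "__main__":
    # Values taken from the console output above.
    r = residue(1302598, 18970509, 3, 1, 1014558378077)
    print("residue:", r, "valid factor:", r == 0)
```

A reported factor is only valid when the residue is 0, so a check like this can be used to confirm a factor by hand before trusting a sieve file.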
#1071
"Mark"
Apr 2003
Between here and the
3³×277 Posts
#1072
Random Account
Aug 2009
Oceanus Procellarum
3×1,009 Posts
#1073
"Mark"
Apr 2003
Between here and the
3³·277 Posts
I can fix this, but I lose some speed in the process. I don't know if I can restore the speed without re-introducing this issue. I need to look at it further.
#1074
"Gary"
May 2007
Overland Park, KS
11000100010100₂ Posts
What all would be involved in creating my own executable?

It's interesting that you bring up srsieve2 not being able to compete with sr2sieve/sr1sieve as far as overall throughput on multi-core machines. I've generally found that to be true, but I have found a major exception: CRUS Sierp base 66. I'm getting much more overall throughput with srsieve2 vs. multiple instances of sr2sieve with the -x switch on 3 different machines: Intel 8-core/8-thread, Intel 8-core/16-thread, and AMD 16-core/32-thread. Perhaps srsieve2 is faster when you have to use the -x switch in sr2sieve due to the many large k-values. But based on your explanations in various places, I don't know why.

Eventually I want to fiddle with running srsieve2cl. I don't know anything about GPUs, but I believe my Ryzen 3950X has one that would do quite well with this.
#1075
"Mark"
Apr 2003
Between here and the
3³·277 Posts
To build on Windows I use clang 14.0.0 (from the LLVM project on GitHub). To build the GPU executables you will also need perl. With those installed, you just need to run "make" or "make <program>" from the command line in the directory containing the makefile.

I have seen similar results with sr2sieve -x vs. srsieve2. In other words, some conjectures sieve faster with sr2sieve -x, but others sieve faster with srsieve2. I have not investigated why. As you stated, it likely has something to do with large k, but it isn't obvious from looking at either sr2sieve or srsieve2, since they have very different implementations.

FYI, all command line output is generated with calls to WriteToConsole(). Many of these are in App.cpp. You will find most (but not all) of the rest in the xxApp.cpp class specific to the sieve.

https://www.mersenneforum.org/rogue/mtsieve.html has more detail on the framework, including descriptions of the framework classes and methods. I would be happy to answer any questions.
#1076
"Mark"
Apr 2003
Between here and the
3³×277 Posts
I found the issue. For larger bases it requires different logic. twinsieve is also impacted by this, but I think I can use the faster logic for ccsieve for some forms.
#1077
"Mark"
Apr 2003
Between here and the
1110100110111₂ Posts
I have posted mtsieve 2.4.5 at sourceforge. Here is a list of changes:
Code:
framework:
   Replace vsprintf with vsnprintf.

srsieve2/srsieve2cl: version 1.6.9
   Fix an issue that occurs when logging factors and using multiple threads.

gcwsieve/gcwsievecl: version 1.5.1
   Log terms of GFN or Mersenne forms as they are removed.

fbncsieve: version 1.6
   Implement different logic (which is 5x slower) for larger bases to avoid invalid factors.
   Only verify first factor for the first k for each prime.
   Reduce memory usage for odd bases since we only track even k.
   Reduce memory usage for base 2 since we only track odd k.
   Output primes to a separate file.

twincsieve: version 1.6
   Implement different logic (which is 5x slower) for larger bases to avoid invalid factors. This only applies to b^n forms.
   Add support to sieve for factorial/primorial twins.
   Reduce memory usage for odd bases since we only track even k.
   Reduce memory usage for base 2 since we only track odd k.
   Only verify first factor for the first k for each prime.

ccsieve: version 1.2
   Implement different logic (which is up to 2x faster) for b^n forms.
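The "only track even k" items rest on a parity argument: for an odd base b and odd k, k*b^n is odd, so k*b^n+1 and k*b^n-1 are both even and never need sieving. The sketch below shows one way to halve the storage by keeping one bit per even k. It is an illustration only, not how fbncsieve or twinsieve store their terms, and the class name `EvenKBitmap` is invented for this example.

```python
class EvenKBitmap:
    """One bit per even k in [k_min, k_max]; a set bit means the term is still a candidate."""

    def __init__(self, k_min: int, k_max: int) -> None:
        self.first = k_min + (k_min % 2)      # round up to the first even k
        last = k_max - (k_max % 2)            # round down to the last even k
        self.count = max(0, (last - self.first) // 2 + 1)
        self.bits = bytearray(b"\xff" * ((self.count + 7) // 8))

    def _index(self, k: int) -> int:
        if k % 2 != 0:
            raise ValueError("odd k is never tracked")
        i = (k - self.first) // 2
        if not 0 <= i < self.count:
            raise ValueError("k is out of range")
        return i

    def contains(self, k: int) -> bool:
        i = self._index(k)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def remove(self, k: int) -> None:
        """Clear the bit for k, for example after a factor has been found."""
        i = self._index(k)
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF


if __name__ == "__main__":
    bitmap = EvenKBitmap(11, 20)   # tracks k = 12, 14, 16, 18, 20
    bitmap.remove(14)
    print([k for k in range(12, 21, 2) if bitmap.contains(k)])
```

The same idea applies to base 2 with the parity flipped: an even k can be rewritten as a smaller odd k with a larger n, so only odd k needs a bit.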
#1078
Random Account
Aug 2009
Oceanus Procellarum
BD3₁₆ Posts
Code:
...287 factors found at 234 sec per factor (last 163 min)...