mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

ryanp 2021-10-25 16:39

[QUOTE=rogue;591577]I recommend bumping -g. You will have to play around to see where you start seeing diminishing returns.

I have noticed that when running many workers that the code that feeds the worker threads is not fast enough. In some cases it is better to have multiple instances of srsieve2 running. To address this would require significant changes to the framework.[/QUOTE]

I've tried a number of combinations of -G and -g, ranging from "-g 16" up to "-G 8 -g 288". No matter what, it always tops out around 9.7 or 9.8M p/sec.

This is pretty surprising (the A100 has a max 19.5 TFlop/s single precision) considering I can easily get higher than that with 48 workers on a Xeon.

rogue 2021-10-25 17:46

[QUOTE=ryanp;591580]I've tried a number of combinations of -G and -g, ranging from "-g 16" up to "-G 8 -g 288". No matter what, it always tops out around 9.7 or 9.8M p/sec.

This is pretty surprising (the A100 has a max 19.5 TFlop/s single precision) considering I can easily get higher than that with 48 workers on a Xeon.[/QUOTE]

I have used -g500 (no -G). It is unlikely that the GPU is the bottleneck.

ryanp 2021-10-25 18:24

[QUOTE=rogue;591588]I have used -g500 (no -G). It is unlikely that the GPU is the bottleneck.[/QUOTE]

I've tested "-g 72", "-g 144", "-g 600" and for fun, "-g 6192". All produce about 7.5M to 8M p/sec.

With "-G 2" or "-G 4" added in as well, i can get up to about 9.5 to 9.8M p/sec, but no higher.

R. Gerbicz 2021-10-25 18:25

[QUOTE=ryanp;591575]
On a Tesla A100, I couldn't get srsieve2cl to go much above 9 to 10M p/sec, after fiddling with values for a while. By comparison, a plain [C]./srsieve2 -W 48[/C] on a 72-core Xeon CPU gives me about 15M p/sec.[/QUOTE]

Just for curiosity that is is how many k/n pairs, so what is the #k * (nmax-nmin) ?

Citrix 2021-10-26 04:46

[QUOTE=rogue;591588]I have used -g500 (no -G). It is unlikely that the GPU is the bottleneck.[/QUOTE]

The "Sieve of Eratosthenes" code becomes a bottle neck around 10-15 Mp/sec.

henryzz 2021-10-26 10:27

Have you tried multiple instances of mtsieve?

rogue 2021-10-26 12:41

[QUOTE=Citrix;591639]The "Sieve of Eratosthenes" code becomes a bottle neck around 10-15 Mp/sec.[/QUOTE]

I use primesieve as the prime generator. Maybe it could be used in a more efficient manner.

Citrix 2021-10-27 03:02

[QUOTE=rogue;591664]I use primesieve as the prime generator. Maybe it could be used in a more efficient manner.[/QUOTE]

The license prevents us from changing the code. You could write a faster Sieve of Eratosthenes if you let a few composites slip through.

[QUOTE=henryzz;591660]Have you tried multiple instances of mtsieve?[/QUOTE]

You can get 60MP/sec for 4 cores.

Happy5214 2021-11-02 00:40

Would it be possible for mtsieve to return with a non-zero exit code if it was terminated manually (i.e. with SIGINT)? I have several shell scripts for the older srsieve family of programs that have that behavior, and it's useful to know whether the range completed (particularly when it comes to determining whether or not to update a sieve progress control file, which I currently have to revert manually to the old value if I manually kill an mtsieve-family sieve program).

---------

[QUOTE=Citrix;591744]The license prevents us from changing the code. You could write a faster Sieve of Eratosthenes if you let a few composites slip through.[/QUOTE]

Assuming [url]https://github.com/kimwalisch/primesieve[/url] is the same library, what stops you from forking it and modifying it? It's under a 2-clause BSD license, one of the most liberal FOSS licenses there is.

rogue 2021-11-02 12:49

[QUOTE=Happy5214;592249]Would it be possible for mtsieve to return with a non-zero exit code if it was terminated manually (i.e. with SIGINT)? I have several shell scripts for the older srsieve family of programs that have that behavior, and it's useful to know whether the range completed (particularly when it comes to determining whether or not to update a sieve progress control file, which I currently have to revert manually to the old value if I manually kill an mtsieve-family sieve program).

---------

Assuming [url]https://github.com/kimwalisch/primesieve[/url] is the same library, what stops you from forking it and modifying it? It's under a 2-clause BSD license, one of the most liberal FOSS licenses there is.[/QUOTE]

The output when abnormally terminated has the string "Interrupted" in the text, e.g. "2021-10-01 13:16:49: Sieve interrupted at p=26666330822323". Can you parse the output to determine that it was stopped without completing the range?

The challenge is that the framework was designed around slower sieves, ones where each chunk of work takes much more time to process than getting the "chunk of work". This applies to most sieves. It doesn't work as well for faster sieves or computers with many/many cores where one wants a worker per core.

A "chunk of work" means a fixed number of primes, not a fixed range of primes. With the former, there is a mutex around the code getting the primes so that no two workers work on the same primes and so that each worker has the optimal number of primes for its looping logic.

I could change this to a "fixed range of primes". This is slightly less efficient for the worker threads, but significantly reduces the scope of the mutex.

Citrix 2021-11-07 03:11

[QUOTE=Happy5214;592249]

Assuming [url]https://github.com/kimwalisch/primesieve[/url] is the same library, what stops you from forking it and modifying it? It's under a 2-clause BSD license, one of the most liberal FOSS licenses there is.[/QUOTE]

It was under a different license before.
Maybe this would be helpful:-
[url]https://github.com/curtisseizert/CUDASieve[/url]

matzetoni 2021-11-07 21:34

[C]> gfndsievecl.exe -n22001 -N25000 -k2000000 -K3000000 -o"out1.txt"[/C]

I tried running above input with the newest mtsieve version 2.2.2.7 and the program just exits after 5 minutes with no output / error in log file written.

rogue 2021-11-08 00:20

[QUOTE=matzetoni;592673][C]> gfndsievecl.exe -n22001 -N25000 -k2000000 -K3000000 -o"out1.txt"[/C]

I tried running above input with the newest mtsieve version 2.2.2.7 and the program just exits after 5 minutes with no output / error in log file written.[/QUOTE]

I found the issue and will fix. In the interim use gfndsieve to sieve to 1e8, then switch to gfndsievecl.

rogue 2021-12-03 21:27

I am sieving a conjecture with 7223 sequences for CRUS. I was trying to use srsieve2cl and ran into some issues.

I discovered that srsieve2cl consumes too much memory when using a lot of sequences so I have made an adjustment to calculation of factor density, which can be changed using the -M command line switch. This impacts how much memory is allocated for factors returned from GPU memory to CPU memory.

In the future I will add functionality to account for reduced factor density on the fly. As one sieves more deeply, the amount of memory needed for moving factors from GPU to CPU will decrease. For now the adjustment to the calculation is sufficient.

I have noticed that when using the GPU that even though the CPU is waiting on the GPU that a full core of CPU is used (on Windows). I will modify the code to exclude GPU time when computing the factor rate until I have fully investigated the issue.

I noticed in testing that long-running kernels (a few seconds or longer) impact the computed factor rate making it fluctuate up and down depending upon how many kernel executions completed in the previous minute. To account for this, I have increased the minimum number of minutes used to compute the factor rate.

My hope is to time between Christmas and New Year's to finally get the sr2sieve functionality (which supports Legendre tables when one has multiple sequences) implemented into srsieve2 and srsieve2cl. Based upon how long it takes to build Legendre tables when one has many sequences, I will try to offload that logic into the GPU so it should take seconds to build the sequences instead of minutes or hours. For these 7233 sequences took hours to build those tables.

Using a fixed range of P and the exact same input file (7233 sequences and 6868941 terms), here is the wall time (in seconds) to run of the various programs:

[code]
srsieve 1407
sr2sieve 2334 (no Legendre tables)
sr2sieve fail (with Legendre tables)
srsieve2 1183
srsieve2cl 213 (using the default of -M10 -g10)
srsieve2cl 147 (using -M1 -g100)
[/code]

I didn't use clock time because srsieve/sr2sieve don't compute it and because srsieve2cl computes CPU utilization to account for the fact that it is multi-threaded. I ensured that other CPU intensive applications were not running so I cannot determine how much these programs are impacted by other CPU intensive programs that are running concurrently.

There was not enough memory to build the Legendre tables for sr2sieve. I don't know how much memory it needed. It stopped about halfway thru building them. That implies that srsieve2 and srsieve2cl will fail to allocate the necessary memory for those tables. When I make the enhancements to srsieve2, I will see what I can do about failing earlier rather than later. I doubt there is much I can do about reducing memory using with so many sequences.

rogue 2021-12-04 00:14

[QUOTE=rogue;594426]I am sieving a conjecture with 7223 sequences for CRUS. I was trying to use srsieve2cl and ran into some issues.

I discovered that srsieve2cl consumes too much memory when using a lot of sequences so I have made an adjustment to calculation of factor density, which can be changed using the -M command line switch. This impacts how much memory is allocated for factors returned from GPU memory to CPU memory.

In the future I will add functionality to account for reduced factor density on the fly. As one sieves more deeply, the amount of memory needed for moving factors from GPU to CPU will decrease. For now the adjustment to the calculation is sufficient.

I have noticed that when using the GPU that even though the CPU is waiting on the GPU that a full core of CPU is used (on Windows). I will modify the code to exclude GPU time when computing the factor rate until I have fully investigated the issue.

I noticed in testing that long-running kernels (a few seconds or longer) impact the computed factor rate making it fluctuate up and down depending upon how many kernel executions completed in the previous minute. To account for this, I have increased the minimum number of minutes used to compute the factor rate.

My hope is to time between Christmas and New Year's to finally get the sr2sieve functionality (which supports Legendre tables when one has multiple sequences) implemented into srsieve2 and srsieve2cl. Based upon how long it takes to build Legendre tables when one has many sequences, I will try to offload that logic into the GPU so it should take seconds to build the sequences instead of minutes or hours. For these 7233 sequences took hours to build those tables.

Using a fixed range of P and the exact same input file (7233 sequences and 6868941 terms), here is the wall time (in seconds) to run of the various programs:

[code]
srsieve 1407
sr2sieve 2334 (no Legendre tables)
sr2sieve fail (with Legendre tables)
srsieve2 1183
srsieve2cl 213 (using the default of -M10 -g10)
srsieve2cl 147 (using -M1 -g100)
[/code]

I didn't use clock time because srsieve/sr2sieve don't compute it and because srsieve2cl computes CPU utilization to account for the fact that it is multi-threaded. I ensured that other CPU intensive applications were not running so I cannot determine how much these programs are impacted by other CPU intensive programs that are running concurrently.

There was not enough memory to build the Legendre tables for sr2sieve. I don't know how much memory it needed. It stopped about halfway thru building them. That implies that srsieve2 and srsieve2cl will fail to allocate the necessary memory for those tables. When I make the enhancements to srsieve2, I will see what I can do about failing earlier rather than later. I doubt there is much I can do about reducing memory using with so many sequences.[/QUOTE]

sr2sieve has a limit of a little over 2GB. It is a 64-bit build, so I don't know what is causing that limit.

I trimmed down the input file to 2541 sequences with 2421027 terms. The time for sr2sieve here is after it has built the Legendre tables.

[code]
sr2sieve 759 (no Legendre tables)
sr2sieve 327 (with Legendre tables)
srsieve2 446
srsieve2cl 36 (using the default of -M10 -g10)
srsieve2cl 32 (using -M1 -g100)
[/code]

So even without Legendre tables, srsieve2cl kicks sr2sieve's ass.

FYI sr2sieve took over an hour to build those Legendre tables. That time is not included in this table.

srsieve2 and srsieve2cl with Legendre tables will be faster, but how much faster is unknown until I write that code.

All tests were for the same range of primes, so I don't know why the times do not scale. I have about 1/3 the number of terms, but the time is more than 3x as fast. I suspect cache misses in memory, but that is just a guess. It requires further investigation.

rob147147 2021-12-04 01:26

Mark, I was intrigued by your speed tests, so I decided to dig my old code up and run a few too.

I assumed from the numbers (and reservations page) you were running tests on R126, so I fired up a quick sieve file with 7223 sequences and 8212814 terms, so reasonably similar, just less well sieved.

I see a roughly 12x performance difference between srsieve (~30,000 p/sec) and my CUDA code (~365,000 p/sec) on my hardware (5600x and GTX1060), which is pretty similar to the 10x difference you see between srsieve and srsieve2cl. If you ran your tests on the same hardware you mentioned a few pages back (i7-8850H and P3200) then adjusting for hardware, my native CUDA code looks to be about 50% faster than srsieve2cl. I do however have a form of Legendre implemented, so I would expect you to see a pretty reasonable performance bump from implementing Legendre based on that difference and obviously the underlying theory. I really need to spend an evening getting srsieve2cl to work on my machine, then I might be able to help identify any potential areas for gains.

When you come to implement it you may find you get more performance calculating Legendre on the fly, rather than generating the tables and storing them. I certainly found that as it helped to relieve memory pressure of looking things up.

I also suspect you are correct that your greater than 3x performance improvement with 1/3 the terms is memory pressure. The GPU cache size makes an incredible difference to my code, I've seen 2x performance from cards with similar compute but 33%-50% more L2 cache, it appears your code sees something reasonably similar.

rogue 2021-12-04 03:57

[QUOTE=rob147147;594442]Mark, I was intrigued by your speed tests, so I decided to dig my old code up and run a few too.

I assumed from the numbers (and reservations page) you were running tests on R126, so I fired up a quick sieve file with 7223 sequences and 8212814 terms, so reasonably similar, just less well sieved.

I see a roughly 12x performance difference between srsieve (~30,000 p/sec) and my CUDA code (~365,000 p/sec) on my hardware (5600x and GTX1060), which is pretty similar to the 10x difference you see between srsieve and srsieve2cl. If you ran your tests on the same hardware you mentioned a few pages back (i7-8850H and P3200) then adjusting for hardware, my native CUDA code looks to be about 50% faster than srsieve2cl. I do however have a form of Legendre implemented, so I would expect you to see a pretty reasonable performance bump from implementing Legendre based on that difference and obviously the underlying theory. I really need to spend an evening getting srsieve2cl to work on my machine, then I might be able to help identify any potential areas for gains.

When you come to implement it you may find you get more performance calculating Legendre on the fly, rather than generating the tables and storing them. I certainly found that as it helped to relieve memory pressure of looking things up.

I also suspect you are correct that your greater than 3x performance improvement with 1/3 the terms is memory pressure. The GPU cache size makes an incredible difference to my code, I've seen 2x performance from cards with similar compute but 33%-50% more L2 cache, it appears your code sees something reasonably similar.[/QUOTE]

The memory hit for the Legendre tables imply that computing on the fly could be better.

rogue 2021-12-09 23:39

I have started implementing the changes to support the sr2sieve functionality into srsieve2. So far I have focused on the building of the Legendre tables. Although not difficult, the main issue with sr2sieve is that one has not idea of the progress when it is expected to take a long time to build those tables. srsieve2 will now give an estimate as to when it will complete. srsieve2 will also "abort" building the Legendre tables if are too large, i.e. need too much memory. sr2sieve would just terminate. srsieve2 will go on its merry way after giving a message. What I don't know yet is if srsieve2 without Legendre tables when using the c=1 logic will be faster than the generic logic, which does not use Legendre tables. What I also don't know is if computing Legendre values on the fly will be faster if there isn't enough memory to build them.

I do intend to make a change to split the sequences into groups when feeding the GPU. That should boost performance (fewer cache hits) and it should address problems for those with a GPU that has less memory.

rogue 2021-12-16 19:16

It has been a week, so time for an update.

The code is a mess right now. As I was working on the logic for sr2sieve, I ended up having to refactor some of the single sequence logic (sr1sieve) used to build congruent q, ladder, and legendre tables since some of that code is shared between both of those sieves. A lot of of this was in consideration of using OpenCL for the new sr2sieve logic. On the plus side, I addressed some code that was confusing to read and thus hard to understand.

So at this point it won't compile. It might take a day or more to fix all of those problems, but I first need to revisit the new code for sr2sieve functionality because I'm certain that I have missed some logic or did not implement some logic correctly.

For the Legendre tables, the code will first determine how much memory is needed for those tables and spit that out. It will then try to allocate that memory. If it fails, then it won't use the Legendre tables.

But there are other considerations that I have to look into.

First, what if the code can build some Legendre tables, but not all of them? Based upon my understanding of the code, srsieve2 should use the Legendre tables for the sequences for which there was enough memory but then fall back and compute the Legendre symbol on the fly for the other sequences.

Second, I recall reading that there are cases where one could use sr1sieve to sieve more than one k (in series) and that would be faster than sieving both k with sr2sieve. I haven't tested that out. I can imagine that might be useful if one can build the Legendre tables for only one k, but not both (due to memory limitations). I will have to do some testing to see if that is true. I don't think it is, but I could be wrong.

Third, how much of a performance hit will it be to the OpenCL logic (for multiple sequences) if it has to compute the Legendre symbol on the fly? This goes back to my concern about how much GPU memory is needed.

Fourth, before I started on these changes I had implemented the logic to split the number of sequences sieved at a time with the GPU. I didn't complete that testing, although the results at the time were promising.

Fifth, I need to address building OpenCL code on OS X. With Apple's changes that are pushing developers to use Metal it no longer supplies the OpenCL headers even though I can use the OpenCL framework. I will have to pull those from kronos and put into SVN and use those for builds on OS X.

Next year I hope to get an M1 MacBook, so mtsieve will be supported on that, although some sieves might not be ported because few people use them and they rely heavily on x86 asm.

KEP 2021-12-17 20:54

Very good work Rogue.

A note on your 1 k or 2 k sequence sieve.

We have to go back to the time just a few years after CRUS started, when someone finally upgraded sr2sieve, to run more efficient. Since then, if memory serves me correct, or someone can find the post our dear missed Lennart made, we are all the way back to 2010 or 2011.

Before that update of sr2sieve, it was indeed faster to sieve just 1 sequence and running 2 instances of sr1sieve sieving 1 sequence at a time. After the upgrade sr1sieve only was faster for a single sequence and not for conjectures with 2 or more k's remaining.

Before the upgrade it was infact fastest to use sr2sieve, for 3 or more sequences. After upgrade of sr2sieve, only fastest to use sr1sieve for 1 sequence.

Looking forward to see what can be done on my GT 1030's once I complete SR383 testing to n=1M and has to start sieving about 39 sequences for either 1M n, 9M n or 49M n (39M candidates, 351M candidates or 1,911M candidates) - that sieve is about 970 days into the future :smile:

rogue 2021-12-21 23:58

Thanks for letting me know KEP. I will do some testing to verify behavior when the code is ready.

In the next release the -l parameter, which is currently used to turn off Legendre logic, will be modified to accept a value indicating how much memory you want to allocate for Legendre tables. It will default to 2^62, which is ridiculous of course. At runtime srsieve2 will tell you how much memory is needed and will give you an idea how long it will take to build the Legendre tables, giving an ETC. If your machine doesn't have enough memory it will terminate immediately (as opposed to trying to build tables then fail before it finishes). It won't tell you how much memory you have available, but if you combine how much memory it tells it needs vs how much you can see is available, you can run again with -l limiting how much memory it will allocate for those table. srsieve2 will only allocate up to that limit. This means that some sequences will be tested using pre-built Legendre tables and others will compute the Legendre symbol on the fly.

This brings up a few things.

First, I suspect that srsieve2 with calculating the Legendre symbol might be slower than the generic sieving logic. I have seen this with srsieve vs sr2sieve. This might be dependent upon the sequences being sieved or might be true for all sequences.

Second, if computing the Legendre symbol on the fly is slower than generic sieving, then it would make sense to sieve sequences with a Legendre table with the sr2sieve code and then sieve those without with the srsieve code. I could possibly do this in code, but it would be easier for me to have users split the sequences across two input files and sieve them separately. This could be a challenge to do in srsieve2. If anything it would go on the wish list.

Third, I am thinking about building two Legendre table for mixed parity sequences, which is what sr1sieve and srsieve2 (with a single sequence) do. When sr2sieve was initially written Geoff was more concerned about memory requirements, but today's computers typically have a lot more memory than the ones that existed when sr2sieve was first created. I think that this could improve performance by about 25%, but it will take some time for me to verify the accuracy of that assumption. I'm trying to think how to best manage this if a computer doesn't have enough memory. I have some options. One is "all or nothing" where all sequences use only one method. This could be based upon available memory, as creating two tables instead of one for a sequence requires more memory. In this case I try to allocate all memory for two Legendre tables (for mixed parity sequences) and if there isn't enough, switch to allocating memory for one Legendre table (for mixed parity sequences). I could write the code to use available memory to pick and choose the sequences that use one Legendre table and which use two Legendre tables, but that would be a lot of work and might not be worth the effort.

Fourth, although not written yet, the GPU logic for multiple sequences with Legendre tables is a concern. Some computers will have more available CPU memory than GPU memory. They will fail when sieving starts as opposed to when the program builds the Legendre tables. So this is a question of whether or not to force the GPU to compute the Legendre symbol on the fly when the CPU could use Legendre tables.

Fifth, it might be possible that the GPU logic with computing the Legendre symbol (no Legendre tables) is faster than the CPU logic with computing the Legendre symbol yet the CPU with the Legendre table is faster than the GPU with the Legendre tables. This is likely dependent upon the end user's GPU and CPU. If that is the case, then the end user will need to do some testing to find out what is best for them. Right now I do not have a switch to force use of generic logic when the c=1 logic is available. That will be a requirement for the next release.

If anyone here has opinions on the direction I am taking, feel free to share them.

KEP 2021-12-23 20:52

I like your ideas Mark :smile:

When that is said, in wake of the experience that I have now had, trying to make mfaktc running on Linux Mint 20.4 - ONE really really big wish is that if mtsieve is not like mprime a truly STAND-ALONE program, where all needed for the program to execute and test is selfcontained in the coding or compilation, that the driverfiles and what else externally needed for mtsieve to work on a Linux machine, is included in the zip file (with instructions on where to put the files).

It simply isn't as much a walk in the park, using Linux, especially if one has low experience and the machine(s) is unable to get online for at least 4 or maybe even 6 months - if ever, when one has to install forinstance CUDA and make mfaktc (just to give an example) see the claimed by Linux terminal installed version of CUDA.

Still looking forward to be seeing what can be done and achieved when using mtsieve :smile:

Merry christmas to you and your loved ones :smile:

rogue 2021-12-24 00:39

Someone could probably write some kind of service to start a .bat file at startup. On Linux or OS X, this would be a .sh file.

One could wrap a client/server app around it, but that would take a lot of work.

As of today, the generic sieve and c=1 sieve (for a single sequence) are now working again, which required at bit of testing after all of the refactoring to support the c=1 sieve (for multiple sequences). Unfortunately the c=1 sieve for a single sequence lost a lot of speed. I do not know why. It finds all of the expected factors so I need to investigate this further. Hopefully the cause is fairly obvious.

There is another issue in the c=1 GPU sieve (for a single sequence) that causes it to crash, but I cannot trigger the problem when running thru gdb. Very mysterious and annoying. I have yet to find the cause of that problem. Once I build on OS X, it is more likely that I will be able to trigger the issue in lldb, but that remains to be seen.

I have yet to start testing the c=1 sieve (for multiple sequences). I know it won't work "out of the box", but I hope it is close. Once that works I can focus on the GPU version of that sieve.

All being said, there is still a lot of testing to be done, but I feel confident that it will be ready next week.

rogue 2021-12-25 19:09

I fixed the issue that caused the c=1 (single sequence) sieve to slow down. While playing around with the code, I decided to make BASE_MULTIPLE, LIMIT_BASE, and POWER_RESIDUE_LCM configurable via command line switches. These are used primarily for the c=1 sieves and not the generic sieve. Although the default setting for these provides some of the fastest sieving, I have found some configurations that provide a 4% speed bump. Many other configurations are dreadful. The point is that you will want to take a range of 1e9 primes and try various settings for the switches. This could take a couple of hours, but considering that sieving deeply enough could take many days to reach optimal sieving depth, it would be worth the time.

Before I take the c=1 (multiple sequence) sieve testing I will finally implement the code for -L, where srsieve2 can read from a file then save Legendre tables to a file. Right now it appears that the main benefit is when one has to build large Legendre tables (or many Legendre tables) because building small ones doesn't take very long.

rogue 2021-12-29 15:19

In working the logic to read/write files for the Legendre tables, I have a question to the users. Would it be preferable to have a single file for all sequences or to have a single file per sequence?

In the case of a single file, if you have a lot of sequences, then that file can be quite large. In one of my current tests with nearly 4000 sequences that file is over 6 GB in size. It takes over five hours to build (single threaded).

I'm leaning towards a change to have a single file per sequence. -L would now be used to provide a directory name (instead of a file name). The file names would be auto-generated. This would allow a user to run multiple instances of srsieve2 concurrently (same box or different box) to build those files then run a single instance (with one or more workers) that could then read the files for the sequences it uses. Those files could easily be saved for future use.

Thoughts?

VBCurtis 2021-12-29 18:00

my initial reaction was "one file please, clutter."
Then your plan for organization convinced me that one file per sequence is fine too!

rogue 2021-12-30 20:36

Not a lot of feedback, but my mind was made up for me in testing.

I am running into an issue with large binary files, specifically the Legendre file when it exceeds 2 GB in size. Per the gcc doc I should be able to use fseek() to go beyond 2 GB, but I cannot despite the defining _FILE_OFFSET_BITS as 64 at compile time. I suspect that srsieve2 was never used to manage such a large file since it doesn't work either.

For this reason I am going to modify -L to represent a directory (which can be relative or absolute to srsieve2) instead of a file. That probably isn't what everyone wants to read, but I cannot imagine it being too much of a hardship for users.

rogue 2022-01-03 20:32

I have finally made enough progress to commit my changes. I haven't posted a build yet as I need to do more testing, but everything looks good right now. The new release is version 1.6.0.

My goal was never to make this logic as fast as sr2sieve. My hope was to get it "in the ballpark", but the results below are worse than what I expected. I know some of the reasons why it is slower (no hand-tuned assembly for unrolling loops), but there could be others. The long term goal was to implement this logic in the GPU, where I expect it to perform much better than sr2sieve. The unknown is whether or not it will perform better that the generic logic in srsieve2cl 1.5.3. I have not started coding it yet. I think it will be partly dependent upon whether or not I can determine the reasons for v1.6.0 to be so much slower.

Here are some timings for one range that has 265 sequences. Each run produced the same list of factors. All of these were run on the same laptop. The CPU is an Intel i7-8850H. The GPU is an NVIDIA Quadro P3200.

[code]
srsieve 5397 seconds
sr2sieve (no Legendre tables) 4083 seconds
sr2sieve (with Legendre tables) 1772 seconds
srsieve2 v1.5.3 (generic logic) 3424 seconds
srsieve2cl v1.5.3 (generic logic, -g10 (default)) 672 secomds
srsieve2cl v1.5.3 (generic logic, -g100) 602 secomds
srsieve2 v1.6.0 (c=1 logic, no legendre tables) 5015 seconds
srsieve2 v1.6.0 (c=1 logic, with legendre tables) 3504 seconds
[/code]

Once I can compile and run this on OS X, I can compare there as I have an AMD Radeon on that computer.

I think the results are pretty clear if you have multiple sequences where abs(c) = 1 for all sequences:
[LIST][*]If you have a GPU, use srsieve2cl, although I strongly suggest you run some tests as GPU speeds vary significantly.[*]If you do not have a GPU and are sieving sequences which support Legendre tables, use sr2sieve.[*]If you do not have a GPU and are sieving sequences which do not support Legendre tables, use srsieve2.[*]If you do not have a GPU and have a mix of sequences where some support Legendre tables and some do not, split into two inputs, run one thru sr2sieve and one thru srsieve2.[/LIST]

Dylan14 2022-01-13 04:52

On the latest commit (r160) I am running into an issue with the compilation of srsieve2:

[code]sierpinski_riesel/CisOneWithOneSequenceGpuWorker.cpp:87:56: error: ‘POWER_RESIDUE_LCM’ was not declared in this scope
87 | sprintf(define12, "#define POWER_RESIDUE_LCM %u\n", POWER_RESIDUE_LCM);
|
[/code]

which seems strange as a) we are defining this here, and b) in CisOneSequenceHelper.h we have the related ii_PowerResidueLcm defined.

rogue 2022-01-13 13:42

[QUOTE=Dylan14;597820]On the latest commit (r160) I am running into an issue with the compilation of srsieve2:

[code]sierpinski_riesel/CisOneWithOneSequenceGpuWorker.cpp:87:56: error: ‘POWER_RESIDUE_LCM’ was not declared in this scope
87 | sprintf(define12, "#define POWER_RESIDUE_LCM %u\n", POWER_RESIDUE_LCM);
|
[/code]

which seems strange as a) we are defining this here, and b) in CisOneSequenceHelper.h we have the related ii_PowerResidueLcm defined.[/QUOTE]

That has to be fixed before I can release the next build. If you need the GPU worker for the single sequence c=1 sieve, then d/l the latest .7z file of the Windows executables.

If you want to take srsieve2 for multi sequence c=1 sieving for a spin, make srsieve2 and not srsieve2cl.

rogue 2022-01-17 17:57

I fixed the source to address the compiler errors for srsieve2cl.

rogue 2022-01-17 20:14

mtsieve 2.2.3 is finally released.

At this time if you are sieving multiple sequences with srsieve2/srsieve2cl, use -l0 to force usage of the generic logic. Without it that command line option it will use the sr2sieve logic, but my testing has shown that to be slower than the generic logic. Also there is no OpenCL variant supporting sr2sieve logic.

Here are the updates:

[code]
framework:
Modified sieving and factor rate calculations to account for long-running sieving
processes which lead to those rates fluctuating a lot.

Added xmallocNew() which can return NULL if not enough memory can be allocated
which allows callers to decide how to handle that condition.

For OpenCL sievers, added a call to clFinish() immediately after the call to
clEnqueueNDRangeKernel() so that the GPU time is more accurately calculated.

fkbnsieve: version 1.4
Do not create output file if there are no terms to write to it.

Exit sieving early if there are no remaining terms.

smsieve, smsievecl: version 1.0
Initial release for sieving of Smarandache numbers between Sm(100000) and Sm(9999999).
Cannot be used at this time for sieving p < 10000, but that is okay since a project
has a pre-sieved input file to use. It is impractical at this time to sieve values
larger than Sm(9999999) due to the time needed for a PRP test with pfgw.

srsieve2, srsieve2cl: version 1.6.0
Changed the calculation of factor density so that the GPU workers can use a
lot less GPU memory.

Added -S to split sequences into groups when feeding the GPU.

The following only apply to the abs(c) = 1 logic:
Added sr2sieve logic which uses Legendre checks in support of sieving when there
are multiple sequences.

Modified -L to represent a directory to hold a back up of the Legendre tables.
There will be one file per unique b/k/c.

Modified -l to specify the amount of memory to allocate to Legendre tsbles. -l0
disables use of Legendre tables.

Added -U to specify a multiplier to compute BASE_MULTIPLE. The specified value
is multiplied by 2 to compute BASE_MULTIPLE.

Added -V to specify a multiplier to compute POWER_RESIDUE_LCM. The specified value
is multiplied by BASE_MULTIPLE to compute POWER_RESIDUE_LCM.

Added -X to specify a multiplier to compute LIMIT_BASE. The specified value
is multiplied by POWER_RESIDUE_LCM to compute LIMIT_BASE.

BASE_MULTIPLE: least exponent Q for which sieving in subsequence base b^Q will be allowed.

POWER_RESIDUE_LCM: For a prime p that satisfies p=1 (mod r), an "r-th power residue test"
checks whether a subsequence of k*b^n+c can possibly contain any terms of
the form x^r (mod p). If there are none then that subsequence can be
omitted from the BSGS step.

To conduct r-th power residue tests for each r in a set R of prime
powers, set POWER_RESIDUE_LCM to lcm(R).

LIMIT_BASE: Allow sieving in base b^Q for Q chosen from the divisors of LIMIT_BASE.

When sieving with a single sequence the defaults are:
bmMultiplier = 15, prlMultiplier = 24, lbMultiplier = 1
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720

When sieving with multiple sequences the defaults are:
bmMultiplier = 1, prlMultiplier = 360, lbMultiplier = 1
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720

It is recommend to play with these multipliers to find the optimal values as the provided
defaults are from sr1sieve and sr2sieve and some values can provide faster sieving.
The recommendation is to sieve a range of 1e9 and see how long it takes (use sr2sieve.log
to see how long it actually takes). Vary -U, -V, and-X evaluate the results. With a
script and multiple cores, this could done relatively quickly for most inputs.
[/code]

pepi37 2022-03-11 23:04

latest revision


e:\MTSIEVE\216\MT>k1b2sieve -P 100000000 -w 1e7 -W 1 -n 100 -N 10000 -c 1 -c 3
k1b2sieve v1.1, a program to find factors of 2^n+c numbers for variable n and c
Fatal Error: kmax must be specified

rogue 2022-03-12 00:50

[QUOTE=pepi37;601563]latest revision


e:\MTSIEVE\216\MT>k1b2sieve -P 100000000 -w 1e7 -W 1 -n 100 -N 10000 -c 1 -c 3
k1b2sieve v1.1, a program to find factors of 2^n+c numbers for variable n and c
Fatal Error: kmax must be specified[/QUOTE]

Probably means cmax, i.e. -C. I will fix the text.

pepi37 2022-03-22 00:13

What is use of k1b2sieve? ( what kind of sieve files we got, or what type of prime we search)
Thanks

rogue 2022-03-22 12:20

[QUOTE=pepi37;602254]What is use of k1b2sieve? ( what kind of sieve files we got, or what type of prime we search)
Thanks[/QUOTE]

These are primes of the form 2^n+c for variable n and c. The output file is ABC, which can be used with pfgw. I do not know if its ABC format is compatible with llr. When abs(c) > 1 the numbers will be PRP, unless small enough to be proven prime using other means.

pepi37 2022-03-22 12:27

Is there any prime in that form exist?

rogue 2022-03-22 14:02

[QUOTE=pepi37;602286]Is there any prime in that form exist?[/QUOTE]

Mersenne Primes (when c = -1). Nothing for c = +1.

Almost all other PRPs cannot be proven prime with current hardware.

chris2be8 2022-03-22 16:41

[QUOTE=rogue;602295]Nothing for c = +1.
[/QUOTE]

Except for Fermat primes when n is a power of 2.

Most cases with abs(c) > 1 will be difficult to prove prime if they are large enough to be out of range of primo etc. Exceptions might be when c+/-1 is a power of 2 and the remainder after dividing out 2s has a lot of algebraic factors (or is a Mersenne prime).

rogue 2022-05-09 14:28

I am pleased to announce that mtsieve 2.3.1 is now available at sourceforge. Here is a summary of the changes:

[code]
framework:
Updated to msys2 gcc version 11.2.0, which required changing string to std:string
and vector to std::vector instead of using the std namespace.

Due to a compiler bug, now compiling xyyxsieve and afsieve with -O2 instead of -O3.

Refactored the OpenCL implementation so that Metal implementation can use the same
interface as OpenCL. As a result KernelArgument.cpp no longer exists. The Kernel
now manages all CPU and GPU memory used by the Kernel.

Removed the embedded ASM logic for NVIDIA from all GPU kernels. The OpenCL compiler
generates code that is about 10% faster. This is likely due to hard-coded register
usage in the ASM.

Added Metal support (for Apple hardware) since Apple has deprecated OpenCL on their
hardware. As a result, all sieves that can use OpenCL on Apple hardware can now use
Metal on that same hardware.

Added ARM support for sieves that do not require x86 ASM functions.

Updated factor validation of mfsieve, gcwsieve, etc. to no longer require x86 asm.

Moved cltoh.pl to the main directory. make will now build the xxx.gpu.h file used
by the GpuWorkers as part of compiling the GpuWorker object files.

afsieve, afsievecl: version 1.2:
The GPU code uses the Montgomery logic for the mulmod.
All factors, found by either CPU or GPU are now validated.

psieve, psievecl: version 1.5
Replaced x86 ASM FPU code with Montgomery logic for the mulmod as the x86 ASM FPU
code was missing factors. There is no speed difference between the two. The x86
AVX code worked and has not been changed.

srsieve2, srsieve2cl: version 1.6.2
Changed default value for -g from to to 16. GPUs tend to like powers of 2 due
to their architecture.

Added -C forsieving with the GPU. This reduces threading overhead which has a
noticeable impact on sieving speed when sieving a single sequence. Using -C5 can
improve speed by 50%.

Change -S to -K so that less guessing is needed when one requires multiple GPU
kernels due to having a lot of sequences. The program will now create groups of
sequences that are approximately the same size for each call to the kernel.
[/code]

This doesn't mean that the Metal code is working yet or that all of the sieves will build on Apple M1. Many of them will build and run on Apple M1 out of the box but others needs more work. The components for Metal support are there, but I haven't tried to compile any Metal code yet. I have three tasks remaining:
[LIST=1][*]Build and test sieves with no x86 ASM on Apple M1.[*]Modify all Open CL sieves to build with Metal on Apple Intel.[*]Build and test Metal sieves with no x86 ASM on Apple M1. These are the sieves for the first item that also have an OpenCL version.[/LIST]
What has been most annoying for me is that I cannot upgrade gdb on msys2 so I cannot use the debugger on Windows at all right now. I am getting an error trying to update gdb using pacman and none of the resolutions I have tried (per google) have worked.

pepi37 2022-05-14 00:35

e:\PRIME>srsieve2 -P 50000000000000 -W 1 -w1e6 -L legend.txt -i t17_b10_k99999998.npg -o t17_b10_k99999998.npg -O fact999991.txt -f B
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with single sequence c=1 logic for p >= 110728654783
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 10 sequence into 30 base 10^240 sequences.
Fatal Error: Could not open Legendre file legend.txt\b10_k99999998_c-1.leg9999998)

pepi37 2022-05-14 06:22

e:\PRIME\srsieve2 -P 50000000000000 -W 6 -w1e7 -i sr_10.abcd -O factors.txt
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 79917202699
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 10 base 10 sequences into 27 base 10^144 sequences.
Legendre summary: Approximately 4740 B needed for Legendre tables
10 total sequences
1 are eligible for Legendre tables
9 are not eligible for Legendre tables
1 have Legendre tables in memory
9 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
864000 bytes used for congruent subseq indices
12000 bytes used for congruent subseqs
Sieve started: 79917202699 < p < 5e13 with 6902 terms (100005 < n < 999892, k*10^n+1) (expecting 1409 factors)

And just exit , no sieve process started
When I use old version of srsieve2 sieving process continue without any problem



e:\PRIME\srsieve2 -P 20000000000000 -W 6 -w 5e6 -i sr_10.abcd -O factors.txt
srsieve2 v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Cannot use Legendre tables because square-free part of k is too large
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 79917202700
Split 10 base 10 sequences into 14 base 10^96 sequences.
Sieve started: 79917202700 < p < 2e13 with 6897 terms (100005 < n < 999892, k*10^n+1) (expecting 1244 factors)

rogue 2022-05-14 13:09

Can you send me the file you are using?

Also, you can use -l0 to use the generic logic for multiple sequences. It will be faster.

pepi37 2022-05-14 14:44

[QUOTE=rogue;605867]Can you send me the file you are using?

Also, you can use -l0 to use the generic logic for multiple sequences. It will be faster.[/QUOTE]


[URL]https://www.dropbox.com/t/YD3zba2CafIGX80M[/URL]


bat file, exe file and sieve file inside zip file :)

rogue 2022-05-16 14:42

There are two different bugs. Unfortunately I need to update gdb in my environment as it doesn't work with the newer gcc, but I'm getting errors when trying to update gdb and nothing I've tried (per google searches) has worked yet. Ugh!

rogue 2022-05-17 14:13

I tried switching to the cygwin g++ compiler, but it won't link. It appears to be a bug in primesieve or the linker or compiler used by cygwin. So I am now trying to use the llvm-mingw compiler. It appears that I can debug with lldb or gdb for programs built on it.

This will require a bit of testing. It seems to crash when resetting the rounding mode for SSE, but only one sieve uses SSE, so I will probably just modify that sieve to use different logic and thus remove SSE assembler completely from the code. It is possible that something else I changed between releases triggered this crash, but it is odd that only srsieve2 seems to be affected by it. Without use of gdb on msys2 builds, I cannot diagnose the root cause of the crash.

ryanp 2022-05-23 13:30

Found what seems to be a bug in the latest SVN code. I have not had time to recompile or test with [C]gdb[/C].

[CODE]$ ./srsieve2 -P 1e14 -W 64 -s "39*2^n+1" -o ferm39_10M_20M_sv1e14.txt -n 10e6 -N 20e6
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e14 with 10000001 terms (10000000 < n < 20000000, k*2^n+1) (expecting 9659200 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 408 base 2^720 sequences.
Legendre summary: Approximately 40 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
Unable to lock mutex thread_6_worker. Exiting.[/CODE]

Interestingly, this only seems to happen for [C]39*2^n+1[/C] and not [C]37*2^n+1[/C] or [C]41*2^n+1[/C].

rogue 2022-05-23 15:00

[QUOTE=ryanp;606332]Found what seems to be a bug in the latest SVN code. I have not had time to recompile or test with [C]gdb[/C].

[CODE]$ ./srsieve2 -P 1e14 -W 64 -s "39*2^n+1" -o ferm39_10M_20M_sv1e14.txt -n 10e6 -N 20e6
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e14 with 10000001 terms (10000000 < n < 20000000, k*2^n+1) (expecting 9659200 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 408 base 2^720 sequences.
Legendre summary: Approximately 40 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
Unable to lock mutex thread_6_worker. Exiting.[/CODE]

Interestingly, this only seems to happen for [C]39*2^n+1[/C] and not [C]37*2^n+1[/C] or [C]41*2^n+1[/C].[/QUOTE]

srsieve2 doesn't seem to perform nicely for large numbers of threads. I think it would require a rethinking of the framework to do that. You are probably off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread. That might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on.

I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.

ryanp 2022-05-23 17:16

[QUOTE=rogue;606337]srsieve2 doesn't seem to perform nicely for large numbers of threads. I think it would require a rethinking of the framework to do that. You are probably off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread. That might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on.

I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.[/QUOTE]

I've been able to repro this on multiple machines. Observations:

* it fails consistently and only with [C]-s "39*2^n+1"[/C]
* it fails with even -W 4, though this produces:

[CODE]518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
corrupted size vs. prev_size
Aborted[/CODE]

instead of:

[CODE]518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
srsieve2: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted[/CODE]

Here's the result of debugging with [C]gdb[/C]:

[CODE][New Thread 0x7ffff6ef9640 (LWP 206359)]
[New Thread 0x7ffff5ef7640 (LWP 206360)]
corrupted size vs. prev_size

Thread 1 "srsieve2" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007ffff7a4a546 in __GI_abort () at abort.c:79
#2 0x00007ffff7aa1eb8 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff7bbfa78 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007ffff7aa991a in malloc_printerr (
str=str@entry=0x7ffff7bbd714 "corrupted size vs. prev_size")
at malloc.c:5628
#4 0x00007ffff7aaa816 in unlink_chunk (p=p@entry=0x5555555f9e00,
av=<optimized out>) at malloc.c:1608
#5 0x00007ffff7aad24a in _int_malloc (
av=av@entry=0x7ffff7bf6ba0 <main_arena>, bytes=bytes@entry=112)
at malloc.c:4263
#6 0x00007ffff7aae4b1 in __GI___libc_malloc (bytes=112) at malloc.c:3237
#7 0x00007ffff7e0e6bc in operator new(unsigned long) ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x000055555556f308 in __gnu_cxx::new_allocator<primesieve::SievingPrime>::allocate (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/ext/new_allocator.h:127
#9 0x000055555556efb2 in std::allocator_traits<std::allocator<primesieve::SievingPrime> >::allocate (__a=..., __n=14)
at /usr/include/c++/11/bits/alloc_traits.h:464
#10 0x000055555556ea3c in std::_Vector_base<primesieve::SievingPrime, std::allocator<primesieve::SievingPrime> >::_M_allocate (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/bits/stl_vector.h:346
#11 0x000055555556e6c4 in std::vector<primesieve::SievingPrime, std::allocator<primesieve::SievingPrime> >::reserve (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/bits/vector.tcc:78
#12 0x000055555556ba1c in primesieve::EratSmall::init (this=0x7fffffffd5d0,
stop=1021020, l1CacheSize=17017, maxPrime=17) at sieve/EratSmall.cpp:57
#13 0x000055555556f9d4 in primesieve::PreSieve::initBuffer (
this=0x55555583efd8, maxPrime=17, primeProduct=510510)
at sieve/PreSieve.cpp:86
#14 0x000055555556f8dc in primesieve::PreSieve::init (this=0x55555583efd8,
start=11924379, stop=23704475) at sieve/PreSieve.cpp:68
#15 0x0000555555564079 in primesieve::Erat::init (this=0x55555583ec70,
start=11924379, stop=23704475, sieveSize=512, preSieve=...)
at sieve/Erat.cpp:79
#16 0x000055555557793b in primesieve::PrimeGenerator::initErat (
this=0x55555583ec70) at sieve/PrimeGenerator.cpp:159
#17 0x00005555555778ad in primesieve::PrimeGenerator::init (
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:147
#18 0x0000555555577bab in primesieve::PrimeGenerator::sieveSegment (
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:232
#19 0x0000555555577d56 in primesieve::PrimeGenerator::fill (
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:291
#20 0x00005555555850d5 in primesieve::iterator::generate_next_primes (
this=0x7fffffffd8f0) at sieve/iterator.cpp:67
#21 0x0000555555560110 in primesieve::iterator::next_prime (
this=0x7fffffffd8f0) at sieve/primesieve/iterator.hpp:69
#22 0x0000555555560632 in primesieve::store_n_primes<std::vector<unsigned long, std::allocator<unsigned long> > > (n=216804, start=1257,
primes=std::vector of length 783196, capacity 1000000 = {...})
at sieve/primesieve/StorePrimes.hpp:87
#23 0x000055555556028c in primesieve::generate_n_primes<unsigned long> (
n=1000000, start=1258, primes=0x5555556fb390)
at core/../sieve/primesieve.hpp:62
#24 0x0000555555561fff in Worker::ProcessNextPrimeChunk (this=0x5555556fb380,
startFrom=1257, maxPrimeForChunk=1257) at core/Worker.cpp:155
#25 0x000055555555cbe1 in App::Sieve (this=0x5555555d98e0) at core/App.cpp:450
#26 0x000055555555ca3b in App::Run (this=0x5555555d98e0) at core/App.cpp:405
#27 0x0000555555563288 in main (argc=13, argv=0x7fffffffdba8)
at core/main.cpp:87[/CODE]

pepi37 2022-05-23 18:15

What is range of n?


[QUOTE=ryanp;606344]I've been able to repro this on multiple machines. Observations:

* it fails consistently and only with [C]-s "39*2^n+1"[/C]
* it fails with even -W 4, though this produces:

[

rogue 2022-05-23 19:29

When I have some time I will play around with this using the latest build as it uses a different compiler. I have also upgraded to the latest primesieve code. One of those changes might fix this issue.

pepi37 2022-05-23 21:58

fix for problem (temporary)
Make initial sieve with srsieve

use srsieve2 version 1.5.3
working on 6 core CPU without problem



srsieve2 -P 1446900638801[COLOR=Red][B] -W 6[/B][/COLOR] -w5e6 -i a.txt -o b.txt -O fact39.txt
srsieve2 v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 100000000
Split 1 base 2 sequence into 204 base 2^360 sequences.
712822 bytes used for congruence tables
522 bytes used for Legendre tables
Sieve started: 1e8 < p < 1446900638801 with 1353444 terms (11924391 < n < 23704473, k*2^n+1) (expecting 463052 factors)
p=731561923, 679.7K p/sec, 142804 factors found at 476.8 f/sec (last 1 min), 0.0% done. ETC 2022-05-25 14:45

pepi37 2022-05-23 22:00

[QUOTE=rogue;606337]srsieve2 doesn't seem to perform nicely for large numbers of threads. I think it would require a rethinking of the framework to do that. You are probably off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread. That might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on.

I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.[/QUOTE]




I noticed this on 12 core Ryzen ( HT is off) When I set 12 cores I only have 785% CPU utilization ( Linux)

dannyridel 2022-05-25 23:37

sgsieve returning CC 2nd kind numbers in sieve file
 
Hi rogue,

I used the sgsieve program (part of mtsieve) in order to sieve for my SG search. This is a sample command that I used:
sgsieve -P 10e13 --workers=5 -w1e7 -k 3 -K 2500000000 -n 143106 --outputterms=143105.txt -f NEWPGEN

However, when I opened my sieve file, I see the first line (for the example above):
100000000000067:C:0:2:5

According to the LLR documentation, the C indicates that the sieve file wants LLR to search "CC 2nd kind len 2", which isn't what I asked the sieve to produce. However, the numbers do come out to match the sg criteria (both the original prime and the to-be sg prime are in +1 form). Could you explain why the sieve sieves for +1 candidates instead of the vast majority of -1 primes seen on t5k?

Thank you,
Daniel

rogue 2022-05-26 12:14

[QUOTE=dannyridel;606511]Hi rogue,

I used the sgsieve program (part of mtsieve) in order to sieve for my SG search. This is a sample command that I used:
sgsieve -P 10e13 --workers=5 -w1e7 -k 3 -K 2500000000 -n 143106 --outputterms=143105.txt -f NEWPGEN

However, when I opened my sieve file, I see the first line (for the example above):
100000000000067:C:0:2:5

According to the LLR documentation, the C indicates that the sieve file wants LLR to search "CC 2nd kind len 2", which isn't what I asked the sieve to produce. However, the numbers do come out to match the sg criteria (both the original prime and the to-be sg prime are in +1 form). Could you explain why the sieve sieves for +1 candidates instead of the vast majority of -1 primes seen on t5k?

Thank you,
Daniel[/QUOTE]

Sophie Germain: A prime p is said to be a Sophie Germain prime if both p and 2p+1 are prime.
Cunningham Chain of the first kind of length 2: A prime p is said to be a Sophie Germain prime if both p and 2p+1 are prime.
Cunningham Chain of the second kind of length 2: A prime p is said to be a Sophie Germain prime if both p and 2p-1 are prime.

You can see that the first two are the same. A "C" in newpgen formst corresponds to that. You can run newgpen yourself and discover that.

newpgen can also sieve for Sophie Germain, but it uses p of the form k*2^n-1. sgsieve uses with p of the form k*2^n+1, since the main code originated from gfndsieve.

I can look into changing sgsieve to support either +1 or -1 terms for p.

dannyridel 2022-05-27 04:27

So in essence, both types of Cunningham Chains, if they result in prime pairs, can count as SG pairs? Eg. SG pairs can be either (p,2p-1) or (p,2p+1)?

Also, doesn't a C in newpgen correspond to CC 2nd kind? And S corresponds to first kind/SG?

axn 2022-05-27 05:03

[QUOTE=dannyridel;606614]So in essence, both types of Cunningham Chains, if they result in prime pairs, can count as SG pairs? Eg. SG pairs can be either (p,2p-1) or (p,2p+1)?[/QUOTE]

No. Only (p, 2p+1) is SG

[QUOTE=rogue;606553]Sophie Germain: A prime p is said to be a Sophie Germain prime if both p and 2p+1 are prime.[/QUOTE]

When p=k*2^n-1, 2p+1 = k*2^(n+1)-1. Both are easily provable by LLR.
When p=k*2^n+1, 2p+1 = k*2^(n+1)+3, but is still provable, since, by definition, we have 100% factorization of N-1. But you'll need PFGW with p=k*2^n+1 as a helper

rogue 2022-05-27 13:15

From the pfgw doc on newpgen formats:

[code]
99991:S:0:11:10 k*11^n-1 & k*11^(n+1)-1 (SG) (generalized SG since base is not 2)
99991:C:0:2:5 k*2^n+1 & k*2^(n+1)+1 (CC)
99991:B:0:2:15 k*2^n+-1 & k*2^(n+1)+-1 (BiTwin)
99991:Y:3:2:29 k*2^n+-1 & k*2^(n-1)+1 & k*2^(n+1)+1 (LP)
99991:Z:3:2:46 k*2^n+-1 & k*2^(n-1)-1 & k*2^(n+1)-1 (LM)
99991:J:3:2:11 k*2^n+-1 & k*2^(n+1)-1 (Twin/SG)
99991:K:3:2:7 k*2^n+-1 & k*2^(n+1)+1 (Twin/CC)
99991:1:3:2:42 CC 1st kind len 3 k*2^(n-1)-1 & k*2^n-1 & k*2^(n+1)-1
99991:2:3:2:21 CC 2st kind len 3 k*2^(n-1)+1 & k*2^n+1 & k*2^(n+1)+1
[/code]

sgsieve is outputting the second type because it is using k*2^n+1 instead of k*2^n-1.

The difference between "C", "1", and "2" is that "C" always has length of 2 whereas "1" and "2" will have a length > 2.

Note that sgsieve is limited to base 2. It should allow for other bases.

At some point I will modify sgsieve to be more generic. Some day I will create a ccsieve to support all of the forms above. sgsieve might be rolled into ccsieve when that happens.

It appears that the pfgw doc has CC 1st kind and CC 2nd kind reversed per the definition of Cunningham Chains (see [URL="https://mathworld.wolfram.com/CunninghamChain.html"]MathWorld[/URL]), but it is possible that newpgen is also reversed. There are a few possible places with a mistake: the pfgw doc, the pfgw code, the llr code, the newpgen code. I suspect that all of them are wrong. It would be easy enough in the pfgw doc to change the description of "1" and "2", but that could be misleading if anyone is using newpgen.

Happy5214 2022-05-29 15:25

Regarding the ARM version, what ever happened to the ARM ASM routines I sent you? Those were supposed to be translations of the [c]fpu_*.S[/c] files for x86. Did those not work?

Edit: They were attached to [url=https://www.mersenneforum.org/showpost.php?p=567515&postcount=71]this post[/url].

rogue 2022-05-30 00:11

[QUOTE=Happy5214;606780]Regarding the ARM version, what ever happened to the ARM ASM routines I sent you? Those were supposed to be translations of the [c]fpu_*.S[/c] files for x86. Did those not work?

Edit: They were attached to [url=https://www.mersenneforum.org/showpost.php?p=567515&postcount=71]this post[/url].[/QUOTE]

I have not integrated them yet. I suspect that the non-asm version of the mulmod is faster, but I have not compared the speeds. I only say that because the non-asm version is faster than the x86 asm. AVX is faster than the non-asm version of the mulmod, but very few of the sieves use AVX.

dannyridel 2022-06-02 16:01

[QUOTE=rogue;606636]From the pfgw doc on newpgen formats:

[code]
99991:S:0:11:10 k*11^n-1 & k*11^(n+1)-1 (SG) (generalized SG since base is not 2)
99991:C:0:2:5 k*2^n+1 & k*2^(n+1)+1 (CC)
99991:B:0:2:15 k*2^n+-1 & k*2^(n+1)+-1 (BiTwin)
99991:Y:3:2:29 k*2^n+-1 & k*2^(n-1)+1 & k*2^(n+1)+1 (LP)
99991:Z:3:2:46 k*2^n+-1 & k*2^(n-1)-1 & k*2^(n+1)-1 (LM)
99991:J:3:2:11 k*2^n+-1 & k*2^(n+1)-1 (Twin/SG)
99991:K:3:2:7 k*2^n+-1 & k*2^(n+1)+1 (Twin/CC)
99991:1:3:2:42 CC 1st kind len 3 k*2^(n-1)-1 & k*2^n-1 & k*2^(n+1)-1
99991:2:3:2:21 CC 2st kind len 3 k*2^(n-1)+1 & k*2^n+1 & k*2^(n+1)+1
[/code]

sgsieve is outputting the second type because it is using k*2^n+1 instead of k*2^n-1.

The difference between "C", "1", and "2" is that "C" always has length of 2 whereas "1" and "2" will have a length > 2.

Note that sgsieve is limited to base 2. It should allow for other bases.

At some point I will modify sgsieve to be more generic. Some day I will create a ccsieve to support all of the forms above. sgsieve might be rolled into ccsieve when that happens.[/QUOTE]

So this means that the files that sgsieve are outputting actually don't sieve for sg pairs but for CC chains of second kind?

I found a CC length 2 of 2nd kind with it anyways...

rogue 2022-06-02 16:35

[QUOTE=dannyridel;607014]So this means that the files that sgsieve are outputting actually don't sieve for sg pairs but for CC chains of second kind?

I found a CC length 2 of 2nd kind with it anyways...[/QUOTE]

Sorry I have created some confusion as I have confused myself. newpgen and pfgw are correct. My math was wrong.

If a term output from sgsieve is prime/PRP, then it is both Sophie-Germain and CC of the first kind, length 2.

I need to look at sgsieve and verify that it is doing what I think it is doing. I just haven't had time to do that.

If pfgw or llr say that both terms are prime/PRP, then that is good enough.

dannyridel 2022-06-03 01:38

[QUOTE=rogue;607015]Sorry I have created some confusion as I have confused myself. newpgen and pfgw are correct. My math was wrong.

If a term output from sgsieve is prime/PRP, then it is both Sophie-Germain and CC of the first kind, length 2.

I need to look at sgsieve and verify that it is doing what I think it is doing. I just haven't had time to do that.

If pfgw or llr say that both terms are prime/PRP, then that is good enough.[/QUOTE]


Oh that's fine. At least twinsieve works :)
LLR does find the primes, but they don't find SGs, as I said above.
When will the fix come out?

rogue 2022-06-03 12:28

[QUOTE=dannyridel;607048]Oh that's fine. At least twinsieve works :)
LLR does find the primes, but they don't find SGs, as I said above.
When will the fix come out?[/QUOTE]

I do not know. I have yet to finish the work for the Apple Metal support. I'm close to getting that working, but I haven't had much time.

rogue 2022-06-03 16:02

I have just uploaded mtsieve 2.3.2 to sourceforge. Here are the changes:

[code]
Switched to llvm-mingw compiler in Windows. I have tried three other compilers, I cannot
use gdb with the latest msys2 or mingw64 compilers. gdb gives an error when I start the
programs, but only if they are built with -g. The latest cygwin compiler fails to
link the code. Others have seen the same error I have seen with other sofwtware and
nobody has provided a solution.

There were some spurious reports of bugs with the 2.3.0/2.3.1 release. This bugs
seem to have gone away now that I switched to llvm-mingw.

framework:
Updated primesieve to 7.8.

All remaining SSE code has been removed due to a bug resetting the rounding mode.
kbbsieve was the only program using it and is now faster without it.

Created abstract Device and Kernel classes and extend OpenCL and Metal classes
from those. This means that none of the GpuWorker classes need to include
or need to rely on logic specific to OpenCL or Metal as the implementation is
hidden from those workers.

kbbsieve: version 1.1
Switched to Montgomery logic. This is about 20% faster than the SSE logic.
[/code]

The bug reported by ryanp still exists. This seems to be a problem when starting a new sieve with -W > 1. Unfortunately although gdb is now working again, I cannot reproduce when running within gdb. For the time being I suggest starting the sieve with a single worker and sieving to 1e4. Once sieved to that depth, start srsieve2 again using the output from the previous run and -W > 1. srsieve should automatically do that. In other words it should use only a single worker to start (to knock out all of the candidates with small divisors), then start the other workers. I suspect problem lies with that mechanism.

rogue 2022-06-03 16:37

[QUOTE=rogue;607083]The bug reported by ryanp still exists. This seems to be a problem when starting a new sieve with -W > 1. Unfortunately although gdb is now working again, I cannot reproduce when running within gdb. For the time being I suggest starting the sieve with a single worker and sieving to 1e4. Once sieved to that depth, start srsieve2 again using the output from the previous run and -W > 1. srsieve should automatically do that. In other words it should use only a single worker to start (to knock out all of the candidates with small divisors), then start the other workers. I suspect problem lies with that mechanism.[/QUOTE]

Upon further review, this appears to be a bug in primesieve given the stack trace or that primesieve is triggering a bug on the OS. It does not consistently crash on Windows, which is why I'm having difficulty reproducing the problem. In other words I can sometimes run and it fails and I can sometimes run and it does not fail. I just cannot get it to fail under gdb. Very annoying. I will see if I can reproduce on OS X.

Happy5214 2022-06-06 15:09

[QUOTE=rogue;606800]I have not integrated them yet. I suspect that the non-asm version of the mulmod is faster, but I have not compared the speeds. I only say that because the non-asm version is faster than the x86 asm. AVX is faster than the non-asm version of the mulmod, but very few of the sieves use AVX.[/QUOTE]

I assume that's both an algorithmic and processor unit advantage (Montgomery arithmetic with integers on the faster ALU vs. a more straightforward implementation with floats on the slower FPU, or whatever unit handles SSE). I compiled a few of the Montgomery functions to ASM on x86 just to see how much optimization GCC does on them with [C]-O2[/C], and at least on [C]REDC[/C] and [C]mul[/C] I couldn't find anywhere to hand-optimize. I imagine ARM would be similar, though I'd have to get my ODROID set up again (we switched to a new ISP and that messed up the wiring setup) to know for sure.

rogue 2022-06-06 15:39

[QUOTE=Happy5214;607228]I assume that's both an algorithmic and processor unit advantage (Montgomery arithmetic with integers on the faster ALU vs. a more straightforward implementation with floats on the slower FPU, or whatever unit handles SSE). I compiled a few of the Montgomery functions to ASM on x86 just to see how much optimization GCC does on them with [C]-O2[/C], and at least on [C]REDC[/C] and [C]mul[/C] I couldn't find anywhere to hand-optimize. I imagine ARM would be similar, though I'd have to get my ODROID set up again (we switched to a new ISP and that messed up the wiring setup) to know for sure.[/QUOTE]

If you want to do some comparisons on your own, take a look at the MpArith.h class and it uses. MpArithVec.h is the same, but does 4 at a time. I expect -O2/-O3 to help greatly when using that.

rogue 2022-06-06 16:10

In making changes to sgsieve to work like the Sophie-Germain sieve in newpgen, it appears that newpgen is missing factors for 2p+1 terms. sgsieve is outputting valid factors (per pfgw). I need to do more investigation to see if I am misunderstanding something else.

rogue 2022-06-07 14:55

I think that I have determined what is happening.

sgsieve is sieving for k*b^n-1 and 2*k*b^n-1

I suspect that newpgen is sieving for k*b^n-1 and k*b^(n+1)-1

sgsieve is sieving for the traditional form which is p and 2p+1. This is only a problem when b != 2.

I do not know if anyone is sieving for b != 2 with newpgen.

I do not know if llr, pfgw, sgsieve, and newpgen are all in sync with npg file formats when b != 2.

I would appreciate if someone could run some tests with llr, pfgw, and newpgen to determine if they all handle the npg file format for Sophie-Germains properly for b != 2.

rogue 2022-06-07 18:38

I looked again at the pfgw doc on newpgen formats. I was correct. When the base != 2, then newpgen sieves per my suspicion. I will modify sgsieve to support a switch for "true" SG sieving (what it does now) or "generalized" SG sieving (which is what newpgen does). Since nobody (as far as I know) is searching for Sophie-Germain primes for base != 2, this shouldn't be a problem.

Note this is one of the reasons I dislike newpgen file formats. ABC and ABCD formats are easier to understand.

twobombs 2022-06-08 18:59

1 Attachment(s)
[QUOTE=rogue;607235]In making changes to sgsieve to work like the Sophie-Germain sieve in newpgen, it appears that newpgen is missing factors for 2p+1 terms. sgsieve is outputting valid factors (per pfgw). I need to do more investigation to see if I am misunderstanding something else.[/QUOTE]

FYI: mtsieve runs into compile errors on the Sophie-Germain code since a couple of days. ( see attachment of Docker Hub build log)

Code invoking the build : [url]https://github.com/twobombs/thereminq/blob/f37abb8f9f2df2479adf101428541ea1ab8e9fb0/Dockerfiles/Dockerfile-sieve#L12[/url]

rogue 2022-06-08 19:36

[QUOTE=twobombs;607363]FYI: mtsieve runs into compile errors on the Sophie-Germain code since a couple of days. ( see attachment of Docker Hub build log)

Code invoking the build : [url]https://github.com/twobombs/thereminq/blob/f37abb8f9f2df2479adf101428541ea1ab8e9fb0/Dockerfiles/Dockerfile-sieve#L12[/url][/QUOTE]

Which compiler is this? The syntax is correct. [] is an overloaded operator of MpResVector.

Does mfsieve build? It uses the same syntax.

japelprime 2022-07-17 19:44

fbncsieve
 
I was looking at fkbncsieve. In the html multi- threaded sieve framework link it says this program is for both k*b^n+1 and k*b^n-1 form. Using Command prompt and fbncsieve -h it says k*b^n+c form. Is there any -1 or +1 switch for fbncsieve win batch file for some of the mtsieve programs. I was just thinking of start sieving k*b^n-1 from scratch without any "-i" input file and also later with input file.

C:\EB\Prime\mtsieve\mtsieve_2.3.2 - profa>fkbnsieve -h
fkbnsieve v1.4, a program to find factors of k*b^n+c numbers for fixed k, b, and n and variable c
-h --help prints this help
-p --pmin=P0 sieve start: P0 < p (default 1)
-P --pmax=P1 sieve end: p < P1 (default 2^62)
-w --worksize=w primes per chunk of work (default 1000000)
-W --workers=W start W workers (default 0)
-A --applyandexit apply factors and exit (used with -I)
-i --inputterms=i input file of remaining candidates
-I --inputfactors=I input file with factors (used with -A)
-o --outputterms=o output file of remaining candidates
-O --outputfactors=O output file with new factors
-c --cmin=c Minimum c to search
-C --cmax=C Maximum c to search
-s --sequence=s Sequence to find factors of in form k*b^n+c where k, b, and n are integer values

rogue 2022-07-17 22:48

[QUOTE=japelprime;609710]I was looking at fkbncsieve. In the html multi- threaded sieve framework link it says this program is for both k*b^n+1 and k*b^n-1 form. Using Command prompt and fbncsieve -h it says k*b^n+c form. Is there any -1 or +1 switch for fbncsieve win batch file for some of the mtsieve programs. I was just thinking of start sieving k*b^n-1 from scratch without any "-i" input file and also later with input file.

C:\EB\Prime\mtsieve\mtsieve_2.3.2 - profa>fkbnsieve -h
fkbnsieve v1.4, a program to find factors of k*b^n+c numbers for fixed k, b, and n and variable c
-h --help prints this help
-p --pmin=P0 sieve start: P0 < p (default 1)
-P --pmax=P1 sieve end: p < P1 (default 2^62)
-w --worksize=w primes per chunk of work (default 1000000)
-W --workers=W start W workers (default 0)
-A --applyandexit apply factors and exit (used with -I)
-i --inputterms=i input file of remaining candidates
-I --inputfactors=I input file with factors (used with -A)
-o --outputterms=o output file of remaining candidates
-O --outputfactors=O output file with new factors
-c --cmin=c Minimum c to search
-C --cmax=C Maximum c to search
-s --sequence=s Sequence to find factors of in form k*b^n+c where k, b, and n are integer values[/QUOTE]

Not certain that I understand your question. You can use -1 or +1 for "+c".

japelprime 2022-07-18 01:03

[QUOTE=rogue;609717]Not certain that I understand your question. You can use -1 or +1 for "+c".[/QUOTE]

Of course. Silly me.
Thanks.

rogue 2022-07-18 14:26

I have finally released mtsieve 2.3.3. Changes include:

[code]
framework:
Updated primesieve to 7.9. This addresses an issue in primesieve that can crash
the mtsieve applications.

Changed usage of primesieve (thanks to hints from Kim Wallisch, its creator) to
improve sieving peformance. As a result the worker threads now use an array instead
of a vector.

Broke the main sieving loop into two loops, one for when limited to a single thread
or for sieving prior to switching to the GPU. The second loop handles multiple worker
threads much better.

Reduced the initial CPU worksize from 1e6 to 96e3. When the largest prime tested
reaches 1e6, the worker thread can automatically adjust the worksize in an effort
to have each chunk take between 1 and 5 seconds to process. This has two affects.
First, it will allow the CPU workers to stop more quickly when the user hits ^C.
Second, it will do a better job at keeping CPU workers busy when using mulitple workers.

fkbnsieve: version 1.5
Only verify the first few factors found for each prime to speed up initial sieve.

sgsieve: version 1.3
Changed to support p/2p+1 where p is of the form k*b^n-1. This makes it compatible
with newpgen. As of 1.2 p was of the form k*b^n+1.

srsieve2: vesion 1.6.3
Default to not use Legendre tables for multiple sequences unless -l is used.
[/code]

I have still not completed the Metal changes with testing. It has been a busy summer so I can't get that to the finish line. I think it is close.
I just need to have the time and motivation to finish. The sieves with no x86 ASM will compile on ARM and thus on Apple's M1/M2 CPUs.

japelprime 2022-07-18 23:34

I notice trouble with the ^ sign for me in batch file. Maybe this error outcome is regarding my wrong format for k*b^n+c but I am not sure what is wrong when the output file is not the same as input.
command promt shows -s k*21964-1 (without powerfaktor ^) but my batch file is written with k*2^1964-1

Command prompt:
C:\EB\Prime\mtsieve\fbncsieve_minus_1964>fbncsieve.exe -s k*21964-1 -k 1 -K 4000000 -P 10e14 -o remaining_faktors_prufa.txt
fbncsieve v1.4, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Fatal Error: sequence must be in form k*b^n+c where you specify values for b, n and c

slandrum 2022-07-19 01:11

[QUOTE=japelprime;609801]I notice trouble with the ^ sign for me in batch file. Maybe this error outcome is regarding my wrong format for k*b^n+c but I am not sure what is wrong when the output file is not the same as input.
command promt shows -s k*21964-1 (without powerfaktor ^) but my batch file is written with k*2^1964-1[/QUOTE]

Caret (^) is an escape character in .cmd or .bat batch files, try doubling it.

rogue 2022-07-19 12:35

[QUOTE=japelprime;609801]I notice trouble with the ^ sign for me in batch file. Maybe this error outcome is regarding my wrong format for k*b^n+c but I am not sure what is wrong when the output file is not the same as input.
command promt shows -s k*21964-1 (without powerfaktor ^) but my batch file is written with k*2^1964-1

Command prompt:
C:\EB\Prime\mtsieve\fbncsieve_minus_1964>fbncsieve.exe -s k*21964-1 -k 1 -K 4000000 -P 10e14 -o remaining_faktors_prufa.txt
fbncsieve v1.4, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Fatal Error: sequence must be in form k*b^n+c where you specify values for b, n and c[/QUOTE]

The ^ symbol is like an escape characters on the Window command line. You can either use -s k*2^^1964+1 or -s "k*2^1864+1".

pepi37 2022-07-20 09:00

Give me advice :)
If I run srsieve1 then I got this
[QUOTE]sr1sieve145.exe -i b.txt -o b.txt -P 600000000000000 -f factors.txt
sr1sieve 1.4.5 -- A sieve for one sequence k*b^n+/-1.
Read 14081 terms for 4*395^n+1 from NewPGen file `b.txt'.
sr1sieve 1.4.5 started: 500026 <= n <= 999994, 500000000000000 <= p <= 600000000000000
p=500002651324609, 44403358 p/sec, 0 factors, 0.0% done, 0 sec/factor, ETA 15 Aug 14:50[/QUOTE]If I run latest srsieve2 (posted before few days) on six core CPU (utilization near 96%)
[QUOTE]srsieve2 -P 600000000000000 -W6 -w2500000 -i b.txt -o b1.txt -f B -O factors.txt
srsieve2 v1.6.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with single sequence c=1 logic for p >= 500000000000000
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 395 sequence into 40 base 395^180 sequences.
Legendre summary: Approximately 1 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
259200 bytes used for congruent qs and ladders
Sieve started: 5e14 < p < 6e14 with 14081 terms (500026 < n < 999994, k*395^n+1) (expecting 75 factors)
[COLOR=red]p=500016415279873, 7.969M p/sec[/COLOR], no factors found, 0.0% done. ETC 2022-07-24 19:14[/QUOTE]

And last , latest version of srsieve2cl. I use sieve file from CRUS page , sequence 4*395+1. I use countless combination of commands to get GPU work, without any success. Max speed I get is same as using 6 core GPU.
[B]GPU utilization is always 0 % [/B]. So please be kind , use same sieve as I , and run it on your GPU, and then post what commands you use to utilize GPU to work.
[QUOTE]mtsieve2022>srsieve2cl -P 600000000000000 -W 6 -g 10 -D 1 -d 0 -i b.txt -o b1.txt -f B -O factors.txt
srsieve2cl v1.6.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with single sequence c=1 logic for p >= 500000000000000
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 395 sequence into 40 base 395^180 sequences.
Legendre summary: Approximately 1 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
259200 bytes used for congruent qs and ladders
Sieve started: 5e14 < p < 6e14 with 14081 terms (500026 < n < 999994, k*395^n+1) (expecting 75 factors)
Increasing worksize to 160000 since each chunk is tested in less than a second
Increasing worksize to 2560000 since each chunk is tested in less than a second
[COLOR=red]p=500031657619723, 7.727M p/sec,[/COLOR] no factors found, 0.0% done. ETC 2022-07-24 21:51
CTRL-C accepted. Threads will stop after sieving to 500045438037643
Sieve interrupted at p=500045438037643.
CPU time: 1015.75 sec. (22.17 sieving) (5.83 cores) [COLOR=Red][B]GPU time: 0.00 sec.[/B][/COLOR]
14081 terms written to b1.txt
Primes tested: 1342496000. Factors found: 0. Remaining terms: 14081. Time: 174.24 seconds.[/QUOTE]

rogue 2022-07-20 12:30

-W is for CPU workers. -G is for GPU workers. If you do not specify -W then it will not use a CPU worker to find factors.

Also note that the p/sec rates are computed differently. You can see that srsieve2cl will finish weeks earlier than sr1sieve for the same range.

It is possible that the calculation of time spent in the GPU is incorrect. You can use Windows Task Manager (Performance tab) to see how heavily it is using the GPU.

Based upon Task Manager's GPU utilization you should adjust -G/-g.

pepi37 2022-07-20 21:15

srsieve2cl -P 600000000000000 -G 5 -g 100 -D 1 -d 1 -i b.txt -o b1.txt -f B -O factors.txt
works on [COLOR=Red]version 1.6.2[/COLOR] and give me speed of 4.351M p/sec, GPU utilization 99%
On same settings 1.6.3 give me error
[QUOTE]srsieve2cl -P 600000000000000 -G 5 -g 100 -D 1 -d 1 -i b.txt -o b1.txt -f B -O factors.txt
srsieve2cl v1.6.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with single sequence c=1 logic for p >= 500000000000000
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 395 sequence into 40 base 395^180 sequences.
Legendre summary: Approximately 1 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
259200 bytes used for congruent qs and ladders
GPU primes per worker is 563200
Sieve started: 5e14 < p < 6e14 with 14081 terms (500026 < n < 999994, k*395^n+1) (expecting 75 factors)
OpenCL Error: Out of resources
OpenCL Error: Out of resources
in call to clEnqueueWriteBuffer
argument: babySteps[/QUOTE]

rogue 2022-07-20 21:21

[QUOTE=pepi37;609924]srsieve2cl -P 600000000000000 -G 5 -g 100 -D 1 -d 1 -i b.txt -o b1.txt -f B -O factors.txt
works on [COLOR=Red]version 1.6.2[/COLOR] and give me speed of 4.351M p/sec, GPU utilization 99%
On same settings 1.6.3 give me error[/QUOTE]

"Out of resources" implies that -g or -G is too high or that -D or -d are pointing to the wrong GPU.

If you have a single GPU, I suggest not using -G at all, but adjust -g. Use -h to list the platforms and devices. Most likely you will not want to change -D or -d.

pepi37 2022-07-20 21:25

[QUOTE=rogue;609925]"Out of resources" implies that -g or -G is too high or that -D or -d are pointing to the wrong GPU.

If you have a single GPU, I suggest not using -G at all, but adjust -g. Use -h to list the platforms and devices. Most likely you will not want to change -D or -d.[/QUOTE]


Same settings on same card WORK on 1.62 without any problem.
So 1.63 is problematic, or give me error, or just exit after few seconds.


I have 3 GPU, one is intel , integrated and second and third card are Nvidia 1660 Super with 6 GB DDR6
Settings are point to "second" Nvidia

rogue 2022-07-20 22:25

[QUOTE=pepi37;609927]Same settings on same card WORK on 1.62 without any problem.
So 1.63 is problematic, or give me error, or just exit after few seconds.


I have 3 GPU, one is intel , integrated and second and third card are Nvidia 1660 Super with 6 GB DDR6
Settings are point to "second" Nvidia[/QUOTE]

How many primes per GPU worker in 1.6.2 vs 1.6.3?

pepi37 2022-07-20 22:45

[QUOTE=rogue;609931]How many primes per GPU worker in 1.6.2 vs 1.6.3?[/QUOTE]


In both case same


GPU primes per worker is 563200

rogue 2022-07-21 01:44

Not certain why 1.6.3 would behave differently. Can you reduce -g and try again?

Happy5214 2022-07-31 15:02

The srsieve BOINC output format mask (i.e. the 257 or 258) is incompatible with LLR versions in the 4.0.x series. Jean added a new meaning for the 0x100 bit in LLR 4.0.0 that makes those sieve files process the 2[I]k[/I]*[I]b[/I]^[I]n[/I]-3 form as well. The correct values should be 1 or 2, respectively, though I did bring this up in the LLR thread in the event Jean wants to revert the change. The bit was undocumented in the LLR readme, so I don't know if he dug it out of NewPGen or somewhere.

pepi37 2022-07-31 21:28

e:\MTSIEVE\[COLOR=Magenta][B]MTSIEVE-2-3-3[/B][/COLOR]>kbbsieve -P 100000 -W1 -k 77 -b 2 -B 200000
kbbsieve v1.1, a program to find factors of k*b^b+c numbers for fixed k and c = +1/-1
199998 even terms removed
Sieve started: 3 < p < 1e5 with 199998 terms (k = 77, 2 <= b <= 200000) (expecting 180913 factors)
CTRL-C accepted. Threads will stop after sieving to 100000
Decreasing worksize to 5024 since each chunk needs more than 5 seconds to test
Sieve completed at p=100003.
CPU time: 10.28 sec. (0.00 sieving) (1.11 cores)
[U][B]Fatal Error: Something is wrong. Counted terms (22861) != expected terms (22859)[/B][/U]

rogue 2022-08-01 03:53

[QUOTE=pepi37;610574]e:\MTSIEVE\[COLOR=Magenta][B]MTSIEVE-2-3-3[/B][/COLOR]>kbbsieve -P 100000 -W1 -k 77 -b 2 -B 200000
kbbsieve v1.1, a program to find factors of k*b^b+c numbers for fixed k and c = +1/-1
199998 even terms removed
Sieve started: 3 < p < 1e5 with 199998 terms (k = 77, 2 <= b <= 200000) (expecting 180913 factors)
CTRL-C accepted. Threads will stop after sieving to 100000
Decreasing worksize to 5024 since each chunk needs more than 5 seconds to test
Sieve completed at p=100003.
CPU time: 10.28 sec. (0.00 sieving) (1.11 cores)
[U][B]Fatal Error: Something is wrong. Counted terms (22861) != expected terms (22859)[/B][/U][/QUOTE]

I see what the issue is. A factor is counted when it shouldn't be. The output .pfgw file is correct. I will fix sometime this week.

japelprime 2022-08-01 15:41

Maybe this helps pepi37. I was running gfndsievecl. I found that if -g is to low then the CPU will not load any work to GPU. The CPU will do the work without distributing work to GPU. (explains high CPU time and low or no GPU time) when I skip -g command no work goes to GPU because default limit is to low. Not even -g 100 moves the card to workload. When putting -g 200 I start to see the CPU buffering up the extra work to GPU card. From 5% workload for -g 200 to 8% for -g 300. So maybe default -g value should be somehow higher. I used -S 10000

rogue 2022-08-01 18:09

[QUOTE=japelprime;610617]Maybe this helps pepi37. I was running gfndsievecl. I found that if -g is to low then the CPU will not load any work to GPU. The CPU will do the work without distributing work to GPU. (explains high CPU time and low or no GPU time) when I skip -g command no work goes to GPU because default limit is to low. Not even -g 100 moves the card to workload. When putting -g 200 I start to see the CPU buffering up the extra work to GPU card. From 5% workload for -g 200 to 8% for -g 300. So maybe default -g value should be somehow higher. I used -S 10000[/QUOTE]

You could also try a higher value for -G. I haven't tried that myself.

pepi37 2022-08-01 21:10

[QUOTE=japelprime;610617]Maybe this helps pepi37. I was running gfndsievecl. I found that if -g is to low then the CPU will not load any work to GPU. The CPU will do the work without distributing work to GPU. (explains high CPU time and low or no GPU time) when I skip -g command no work goes to GPU because default limit is to low. Not even -g 100 moves the card to workload. When putting -g 200 I start to see the CPU buffering up the extra work to GPU card. From 5% workload for -g 200 to 8% for -g 300. So maybe default -g value should be somehow higher. I used -S 10000[/QUOTE]
I will try this
Thanks!

japelprime 2022-08-14 13:07

I am trying xyyxsievecl. I have some questions.

OpenCL Error: Out of resources
in call to clEnqueueReadBuffer
argument: factorCount

Is this error a driver- old hardware problem or wrong parameter setting ? I am not sure where to look. I did clean driver install.

On command prompt I see "xyyxsieve v1.8" on screen when calling xyyxsievecl version (both versions under same folder)

switch -s b seems not to be working for me here.

When calling xyyxsievecl the program works fine but only for cpu work. -W1 and -W2 are fine but not -G or different -M or any other parameters when I try to effect GPU. Most of the time I get blank screen for 2 second as the card try to start working and then this OpenCL error. When it is stable there is no workrate at GPU when running. I have Geforce GTX 760 card.

The same PC works fine trying gfndsievecl for the gpu part.

rogue 2022-08-15 00:13

This implies you do not have enough GPU memory. I believe there is a -S option. Try reducing the number of steps per GPU execution

japelprime 2022-08-16 20:46

[QUOTE=rogue;611461]This implies you do not have enough GPU memory. I believe there is a -S option. Try reducing the number of steps per GPU execution[/QUOTE]

Thanks. It is working now.


All times are UTC. The time now is 18:40.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.