mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

gd_barnes 2023-03-21 09:32

[QUOTE=kar_bon;627123]Look at the "SVN" index in the "sierpinski_riesel" folder: main source is SierpinskiRieselApp.cpp.[/QUOTE]

Great! Thanks, Karsten.

rogue 2023-03-21 13:06

To d/l all of the code you can use svn checkout.

srsieve2/srsieve2cl is by far the most complex code built upon the framework. There are the "Generic" classes, which have the srsieve functionality. The "CisOneWithMultipleSequences" classes have the sr2sieve functionality. The "CisOneWithOneSequence" classes have the sr1sieve functionality. All GPU code is in the .gpu files. At build time these are run through a converter to create the .h files, which are needed by the GPU worker classes. The .gpu files use OpenCL C, which is easily understood if you know C.

As I have stated before, sr1sieve and sr2sieve are likely faster than srsieve2, but srsieve2cl is likely faster than sr1sieve and sr2sieve. The only reason to use srsieve2 on Windows is if you want to take advantage of multi-threading or if you cannot use sr1sieve/sr2sieve. I have no intention of changing srsieve2 to compete directly with sr1sieve/sr2sieve. That would require a lot of ASM code, and I have avoided such code to ensure portability to other CPU architectures, such as ARM. Some less-used sieves still have ASM. Unless asked, I will probably not update those for ARM support. Some sieves support AVX (which uses ASM), but they also have a non-AVX code path.

In short srsieve2 is not meant as a replacement for sr1sieve/sr2sieve. I was focused on srsieve2cl. At some point I will write the GPU equivalent code for sr2sieve. Fortunately the Generic code in srsieve2cl is fast enough to replace sr2sieve so it hasn't been too high on my priority list.

I would be happy to answer any questions.

storm5510 2023-03-21 15:48

[U]fbncsieve fatal error:
[/U]
[CODE]D:\sieve>fbncsieve -P5e12 -i1897-3.abcd -o1897-4.abcd
fbncsieve v1.5, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 1000000000039 < p < 5e12 with 30071 terms (1000010 < k < 1999980, k*18970509^3+1) (expecting 1655 factors)
Increasing worksize to 1600000 since each chunk is tested in less than a second
Increasing worksize to 200000000 since each chunk is tested in less than a second
[B]Fatal Error: 1302598*18970509^3+1 mod 1014558378077 = 758131303968[/B][/CODE]

The original series was "k*18970509^3+1." -k 1e6, -K 2e6. I had no problem running -P to 1e12. After trying to run the above, I dropped -P to 2e12. Same error.
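The fatal error above appears to report a nonzero remainder for a term that p was supposed to divide. A minimal sketch of that kind of check, assuming p fits in 64 bits and the compiler supports the GCC/Clang `unsigned __int128` extension. The names `termResidue` and `isValidFactor` are hypothetical and are not taken from the mtsieve source.

```cpp
#include <cassert>
#include <cstdint>

// (a * b) mod m without overflow for 64-bit operands.
// unsigned __int128 is a GCC/Clang extension, not standard C++.
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
   return (uint64_t)(((unsigned __int128)a * b) % m);
}

// b^n mod m by binary (square-and-multiply) exponentiation.
static uint64_t powmod(uint64_t b, uint64_t n, uint64_t m)
{
   uint64_t result = 1 % m;
   b %= m;
   while (n > 0)
   {
      if (n & 1) result = mulmod(result, b, m);
      b = mulmod(b, b, m);
      n >>= 1;
   }
   return result;
}

// Hypothetical helper: returns (k*b^n + c) mod p, where c > 0 means +1 and c <= 0 means -1.
uint64_t termResidue(uint64_t k, uint64_t b, uint64_t n, int c, uint64_t p)
{
   assert(p != 0);
   uint64_t r = mulmod(k % p, powmod(b, n, p), p);
   if (c > 0) return (r + 1) % p;
   return (r == 0) ? p - 1 : r - 1;
}

// p divides k*b^n + c exactly when the residue is zero.
bool isValidFactor(uint64_t k, uint64_t b, uint64_t n, int c, uint64_t p)
{
   return termResidue(k, b, n, c, p) == 0;
}
```

For example, `isValidFactor(3, 2, 4, +1, 7)` is true because 3*2^4+1 = 49 = 7*7.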

I checked to make sure I had the latest build. Unless something has changed in the past day, it appears I do.

rogue 2023-03-21 16:30

[QUOTE=storm5510;627134][U]fbncsieve fatal error:
[/U]
[CODE]D:\sieve>fbncsieve -P5e12 -i1897-3.abcd -o1897-4.abcd
fbncsieve v1.5, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 1000000000039 < p < 5e12 with 30071 terms (1000010 < k < 1999980, k*18970509^3+1) (expecting 1655 factors)
Increasing worksize to 1600000 since each chunk is tested in less than a second
Increasing worksize to 200000000 since each chunk is tested in less than a second
[B]Fatal Error: 1302598*18970509^3+1 mod 1014558378077 = 758131303968[/B][/CODE]

The original series was "k*18970509^3+1." -k 1e6, -K 2e6. I had no problem running -P to 1e12. After trying to run the above, I dropped -P to 2e12. Same error.

I checked to make sure I had the latest build. Unless something has changed in the past day, it appears I do.[/QUOTE]

Please send me your ABCD file. I will take a look.

storm5510 2023-03-21 18:42

[QUOTE=rogue;627138]Please send me your ABCD file. I will take a look.[/QUOTE]

Attached.

rogue 2023-03-21 20:16

I can fix this, but lost some speed in the process. I don't know if I can restore the speed without re-introducing this issue. I need to look at it further.

gd_barnes 2023-03-22 02:01

[QUOTE=rogue;627127]To d/l all of the code you can use svn checkout.

srsieve2/srsieve2cl is by far the most complex code built upon the framework. There are the "Generic" classes, which have the srsieve functionality. The "CisOneWithMultipleSequences" classes have the sr2sieve functionality. The "CisOneWithOneSequence" classes have the sr1sieve functionality. All GPU code is in the .gpu files. At build time these are run through a converter to create the .h files, which are needed by the GPU worker classes. The .gpu files use OpenCL C, which is easily understood if you know C.

As I have stated before, sr1sieve and sr2sieve are likely faster than srsieve2, but srsieve2cl is likely faster than sr1sieve and sr2sieve. The only reason to use srsieve2 on Windows is if you want to take advantage of multi-threading or if you cannot use sr1sieve/sr2sieve. I have no intention of changing srsieve2 to compete directly with sr1sieve/sr2sieve. That would require a lot of ASM code, and I have avoided such code to ensure portability to other CPU architectures, such as ARM. Some less-used sieves still have ASM. Unless asked, I will probably not update those for ARM support. Some sieves support AVX (which uses ASM), but they also have a non-AVX code path.

In short srsieve2 is not meant as a replacement for sr1sieve/sr2sieve. I was focused on srsieve2cl. At some point I will write the GPU equivalent code for sr2sieve. Fortunately the Generic code in srsieve2cl is fast enough to replace sr2sieve so it hasn't been too high on my priority list.

I would be happy to answer any questions.[/QUOTE]

Thanks for the info, Mark. It's interesting to see all of the complex code. As a former mainframe programmer, I know barely enough to be dangerous in C/C++, having had single classes of Pascal and C++ in college. Mainly I'd like to dabble a little bit with creating my own builds. I don't know enough to change general logic, but it would be interesting to tweak the cosmetics of the output in srsieve2.

What all would be involved in creating my own executable?

It's interesting that you bring up srsieve2 not being able to compete with sr2sieve/sr1sieve as far as overall throughput on multi-core machines. I've generally found that to be true but I have found a major exception: CRUS Sierp base 66. I'm getting much more overall throughput with srsieve2 vs. multiple instances of sr2sieve with the -x switch on 3 different machines: Intel 8-core/8-thread, Intel 8-core/16-thread, and AMD 16-core/32-thread. Perhaps srsieve2 is faster when you have to use the -x switch in sr2sieve due to the many large k-values. But based on your explanations in various places, I don't know why.

Eventually I want to fiddle with running srsieve2cl. I don't know anything about GPU's but I believe my Ryzen 3950X has one that would do quite well with this.

rogue 2023-03-22 12:37

To build on Windows I use clang 14.0.0 (from the llvm project on github). To build the GPU executables you will also need perl. With those installed you just need to run "make" or "make <program>" from the command line in the directory with the makefile.

I have seen similar results with sr2sieve -x vs srsieve2. In other words, some conjectures sieve faster with sr2sieve -x, but others sieve faster with srsieve2. I have not investigated why. As you stated, it likely has something to do with large k, but it isn't obvious from looking at either sr2sieve or srsieve2 since they have very different implementations.

FYI all command line output is generated with calls to WriteToConsole(). Many of these are in App.cpp. You will find most (but not all) of the rest in the xxApp.cpp class specific to the sieve. [url]https://www.mersenneforum.org/rogue/mtsieve.html[/url] has more detail on the framework including descriptions of the framework classes and methods. I would be happy to answer any questions.

rogue 2023-03-22 14:26

[QUOTE=rogue;627145]I can fix this, but lost some speed in the process. I don't know if I can restore the speed without re-introducing this issue. I need to look at it further.[/QUOTE]

I found the issue. For larger bases it requires different logic. twinsieve is also impacted by this, but I think I can use the faster logic for ccsieve for some forms.

rogue 2023-03-24 16:17

I have posted mtsieve 2.4.5 at sourceforge. Here is a list of changes:

[code]
framework:
Replace vsprintf with vsnprintf.

srsieve2/srsieve2cl: version 1.6.9
Fix an issue that occurs when logging factors and using multiple threads.

gcwsieve/gcwsievecl: version 1.5.1
Log terms of GFN or Mersenne forms as they are removed.

fbncsieve: version 1.6
Implement different logic (which is 5x slower) for larger bases to avoid invalid factors.
Only verify first factor for the first k for each prime.
Reduce memory usage for odd bases since we only track even k.
Reduce memory usage for base 2 since we only track odd k.
Output primes to a separate file.

twinsieve: version 1.6
Implement different logic (which is 5x slower) for larger bases to avoid invalid
factors. This only applies to b^n forms.
Add support to sieve for factorial/primorial twins.
Reduce memory usage for odd bases since we only track even k.
Reduce memory usage for base 2 since we only track odd k.
Only verify first factor for the first k for each prime.

ccsieve: version 1.2
Implement different logic (which is up to 2x faster) for b^n forms.

[/code]

storm5510 2023-03-25 17:29

[CODE]...287 factors found at 234 sec per factor (last 163 min)...[/CODE]

From [I]srsieve2[/I]: What is this time keeping method? It is certainly not real-time like a clock, other than the elapsed time at the end. :ermm:

rogue 2023-03-25 21:53

[QUOTE=storm5510;627328][CODE]...287 factors found at 234 sec per factor (last 163 min)...[/CODE]

From [I]srsieve2[/I]: What is this time keeping method? It is certainly not real-time like a clock, other than the elapsed time at the end. :ermm:[/QUOTE]

It tries to take CPU utilization into account. So if you removed 10 terms in 20 minutes then you would think that it would be 1 factor per 120 seconds, but if CPU utilization is only 50%, then it would compute 1 factor per 60 seconds.

I'm sure it isn't perfect.
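The arithmetic in that example can be sketched as follows. This is only an illustration of the scaling described in the post, not the actual srsieve2 code; `cpuSecondsPerFactor` is a hypothetical name, and utilization is expressed as a fraction (0.5 for 50%, and possibly above 1.0 with several threads).

```cpp
#include <cassert>

// Elapsed seconds per factor, scaled by CPU utilization.
// cpuUtilization is a fraction: 0.5 means the CPU was busy half of the time.
double cpuSecondsPerFactor(double elapsedSeconds, double cpuUtilization, unsigned factors)
{
   assert(factors > 0 && cpuUtilization > 0.0);
   return elapsedSeconds * cpuUtilization / factors;
}
```

With 10 factors in 20 minutes (1200 seconds), `cpuSecondsPerFactor(1200.0, 1.0, 10)` gives 120 and `cpuSecondsPerFactor(1200.0, 0.5, 10)` gives 60, matching the numbers above.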

storm5510 2023-03-26 15:25

[QUOTE=rogue;627352]It tries to take CPU utilization into account. So if you removed 10 terms in 20 minutes then you would think that it would be 1 factor per 120 seconds, but if CPU utilization is only 50%, then it would compute 1 factor per 60 seconds.

I'm sure it isn't perfect.[/QUOTE]

I read a short article about CPU time vs. real time last week.

[QUOTE]For example, if a program accesses the CPU for one second every five seconds, then its total CPU time in the span of one minute is 12 seconds.[/QUOTE]

The above is pretty cut-and-dried. Perhaps the following mod might un-muddy the water a bit for others:

[CODE]...22 factors found at 214 [B]cpu[/B] sec per factor... [/CODE]

Citrix 2023-03-26 21:02

Feature Request
 
Is it possible in the next release we could have a command line option to limit primes being tested to certain classes (similar to pfgw -f{n,+-1})? The program currently does not automatically catch these.

Thanks.

henryzz 2023-03-27 09:54

It would also be nice to be able to see the sec/factor for shorter periods than from the start. Starting a sieve and then looking at average sec/factor makes no sense.

p/sec also seems to be based on time elapsed rather than cpu seconds. Some consistency would make sense.

rogue 2023-03-27 12:14

[QUOTE=Citrix;627390]Is it possible in the next release we could have a command line option to limit primes being tested to certain classes (similar to pfgw -f{n,+-1}). The program currently does not automatically catch these. [/QUOTE]

Is this for all sieves or only some sieves?

This could be a fairly major effort.

rogue 2023-03-27 12:22

[QUOTE=henryzz;627411]It would also be nice to be able to see the sec/factor for shorter periods than from the start. Starting a sieve and then looking at average sec/factor makes no sense.

p/sec also seems to be based on time elapsed rather than cpu seconds. Some consistency would make sense.[/QUOTE]

The program tries to compute sec/factor in a way that includes enough time to represent a meaningful average. Obviously, the more time, the better the average. This time can be up to 5 days. The best way to compute is to increase the period of time for the calculation as the removal rate slows down.

If you have ideas on computing a better average, please share them.

As for p/sec, I see what you mean. It shouldn't be too hard to adjust.

Citrix 2023-03-28 05:13

[QUOTE=rogue;627413]Is this for all sieves or only some sieves?

This could be a fairly major effort.[/QUOTE]

I mainly use srsieve2/srsieve2cl. I do not think the other sieves have special factor classes.

The changes would only require a few lines of code. We only need to keep the primes where p%n belongs to certain classes. The sieve of Eratosthenes code does not need to be rewritten.

If n=2^x then the modulus step can be even faster.
If n is extremely smooth (e.g. 5-smooth) then the modulus step can be fast as well.

henryzz 2023-03-28 09:34

[QUOTE=rogue;627414]The program tries to compute sec/factor in a way that includes enough time to represent a meaningful average. Obviously, the more time, the better the average. This time can be up to 5 days. The best way to compute is to increase the period of time for the calculation as the removal rate slows down.

If you have ideas on computing a better average, please share them.

As for p/sec, I see what you mean. It shouldn't be too hard to adjust.[/QUOTE]

When p changes by a significant factor, including it in a longer average makes no sense. For many sieves p will change massively over a 5 day period. I recently ran a sieve for 2k minutes (p increased by a factor of 7 during the sieve), which was estimating 59 sec per factor. I then stopped it and restarted for 100 minutes, and the estimate was 152 seconds after 41 factors. This would have been even more extreme if I hadn't already restarted after the first 400 minutes of the sieve.

Something like the last 100 factors (configurable) would make sense to me. Something like a std::deque would make it fairly easy to store the cpu time for the last x factors. My only concern is that this could be a bit dodgy when factors are being found very quickly near the beginning of a sieve. Maybe more factors should be considered then; however, the rate of factor finding will be changing rapidly then anyway, so it probably doesn't matter too much.
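A minimal sketch of that idea, assuming the caller can supply the cumulative CPU seconds at the moment each factor is found. The class name `RecentFactorRate` and its methods are hypothetical; nothing here is taken from mtsieve.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Remembers the CPU time at which each of the last `window` factors was found.
class RecentFactorRate
{
public:
   explicit RecentFactorRate(std::size_t window) : window_(window)
   {
      assert(window >= 2);   // a rate needs at least two timestamps
   }

   // Record the cumulative CPU seconds at the moment a factor is found.
   void addFactor(double cpuSeconds)
   {
      times_.push_back(cpuSeconds);
      if (times_.size() > window_) times_.pop_front();   // age out the oldest factor
   }

   // Average CPU seconds per factor over the window,
   // or -1 while fewer than two factors have been recorded.
   double secondsPerFactor() const
   {
      if (times_.size() < 2) return -1.0;
      return (times_.back() - times_.front()) / (times_.size() - 1);
   }

private:
   std::size_t window_;
   std::deque<double> times_;
};
```

With a window of 100, the reported rate only reflects the most recent 100 factors, so an early burst of quickly found factors stops influencing the estimate once it has aged out.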

Another option would be to report an expected sec/f based on recent p/sec. That would require the estimated f/p to be accurate. Some of the current sieves don't give accurate estimates for the number of factors. For example, I think last time I used ccsieve, it didn't adjust estimates for sieving multiple terms per candidate.

rogue 2023-03-28 12:46

Citrix, if you have some code, send it my way and I will take a look at how to best integrate it.

henryzz, I might be able to do something along the line of what you suggested, i.e. track rate for the last xx factors as opposed to the last xx minutes, but I don't think that solves the problem. When one starts a new sieve p/sec is lower due to factor validation and other overhead associated with removing candidates due to a factor.

I think that the best option (whatever that option is) would require saving runtime and factor removal details in a file so that upon restart it could read from that file and continue as if the program was not stopped and restarted.

Since you have some concrete ideas regarding the calculation, would you mind experimenting with the code to find an algorithm that computes the rate in a way that makes more sense?

ccsieve and twinsieve both miscalculate since candidates can be removed "more than one way" for each p. If you have ideas on how to better compute the number of candidates to be removed for those sieves, please share.

henryzz 2023-03-28 14:22

[QUOTE=rogue;627471]henryzz, I might be able to do something along the line of what you suggested, i.e. track rate for the last xx factors as opposed to the last xx minutes, but I don't think that solves the problem. When one starts a new sieve p/sec is lower due to factor validation and other overhead associated with removing candidates due to a factor.
[/QUOTE]
If you are tracking the cpu-time at which each factor is found, shouldn't factors "age out" pretty quickly if factors are being found quickly enough that validation is taking a meaningful amount of time? Is the issue that the cpu-time measurement doesn't include validation? I am struggling to see the issue. If validation time is still having a significant impact, then it is very unlikely that the sieve should stop.

[QUOTE=rogue;627471]
I think that the best option (whatever that option is) would require saving runtime and factor removal details in a file so that upon restart it could read from that file and continue as if the program was not stopped and restarted.
[/QUOTE]
Saving this sort of information would enable more accurate estimation upon restart although I suspect implementation is overkill. If an estimate has to be provided on a reduced number of factors for a while that isn't too bad. The issue that exists currently is more in long runs where sec/fac has changed significantly and the average sec/fac over the run is meaningless.

[QUOTE=rogue;627471]
Since you have some concrete ideas regarding the calculation, would you mind experimenting with the code to find an algorithm that computes the rate in a way that makes more sense?
[/QUOTE]

I can have a look at experimenting. I plan to implement something for a sieve of my own soon, so I will probably experiment in that. My sieve doesn't verify factors (currently at least), so mileage may vary a little.

[QUOTE=rogue;627471]
ccsieve and twinsieve both miscalculate since candidates can be removed "more than one way" for each p. If you have ideas on how to better compute the number of candidates to be removed for those sieves, please share.[/QUOTE]

If x% of candidates would be removed for one of the ways, then I believe that for n ways a fraction of approximately 1 - (1-x/100)^n of the candidates should be removed. This isn't perfect as it assumes independence, but I believe it is probably a good enough approximation. My only concern is for very small p, where the probability of removing both candidates if they were independent would be higher. My sieve sieves tuples, so I should be able to test the formula there.
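As a worked illustration of that formula (a sketch under the same independence assumption, with `expectedRemovedFraction` being a hypothetical name):

```cpp
#include <cassert>
#include <cmath>

// Fraction (0..1) of candidates expected to be removed when each of `ways`
// independent chances removes xPercent percent of the candidates.
double expectedRemovedFraction(double xPercent, int ways)
{
   assert(xPercent >= 0.0 && xPercent <= 100.0 && ways >= 0);
   return 1.0 - std::pow(1.0 - xPercent / 100.0, ways);
}
```

For x = 5 and n = 2 this gives 1 - 0.95^2 = 0.0975, i.e. 9.75% rather than the 10% obtained by simply doubling x.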

I have limited time to spend on this, so it may take a little while to get around to implementing everything. My aim for my siever is to support a number of different techniques for finding/removing factors so I can compare them in different situations. Tuple sieves can become very sparse, which means a sieve array is not ideal, but a list of candidates can take time checking if a candidate remains as well.

Citrix 2023-03-29 04:32

[QUOTE=rogue;627471]Citrix, if you have some code, send it my way and I will take a look at how to best integrate it.
[/QUOTE]

Here is the simplest pseudocode:
1. Get n and the classes to be tested (generally +1 or -1 or both) from the command line. If n is 2, all primes need to be tested and we do not need to filter, as all primes above 2 are odd; 2 is the default value.
2. At the location where il_PrimeList is initialized with primes, we can insert the following lines of code to filter the primes.
(p is the current prime)

[CODE]
while (p < max)
{
   // n <= 2 means no filtering; p%n == n-1 is the "-1" class
   if (n <= 2 || p%n == 1 || p%n == n-1) { /* enter this prime in the array */ }
   // otherwise skip this prime
   p = next prime from iterator
}
[/CODE]

3. p%n can be calculated faster for certain n values (e.g. n=2^x), though this might not provide any significant speedup. I will let you decide.
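The filter and the power-of-two shortcut from step 3 can be sketched as below. This is an illustration only: `isPowerOfTwo`, `fastMod` and `keepPrime` are hypothetical names, and the code is not tied to how mtsieve populates il_PrimeList.

```cpp
#include <cassert>
#include <cstdint>

// True when n is a power of two.
bool isPowerOfTwo(uint64_t n)
{
   return n != 0 && (n & (n - 1)) == 0;
}

// p mod n, using a bit mask instead of a division when n is a power of two.
uint64_t fastMod(uint64_t p, uint64_t n)
{
   assert(n != 0);
   return isPowerOfTwo(n) ? (p & (n - 1)) : (p % n);
}

// Keep p when no filtering is requested (n <= 2)
// or when p is congruent to +1 or -1 modulo n.
bool keepPrime(uint64_t p, uint64_t n)
{
   if (n <= 2) return true;
   uint64_t r = fastMod(p, n);
   return r == 1 || r == n - 1;
}
```

Note that the "-1" class is tested as `n - 1`, because the remainder of a positive p is never negative.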

storm5510 2023-03-29 17:00

I use [I]p/sec[/I] as a relative speed indicator. On my aging i7 hardware, it will run between 500K and 650K with 7 threads. I have tried 8, but the Windows GUI gets sluggish at times. It is fine at 7. Of course, this throughput depends on how large the series is. There are 36 [I]k's[/I] in the series I am running now for CRUS. A smaller series means a little faster. It would be nice if [I]srsieve2[/I] could run faster, but it would need to maintain the stability it has now. This is where the "not broken, don't fix" idea comes from.

Citrix 2023-04-03 01:01

For version 2_4.0 I am getting

[CODE]

twinsieve.exe -k2 -K1000000 -n1000000 -r -b2 -p3 -P10e14
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 3 < p < 1e15 with 499999 terms (2 < k < 1000000, k*2^1000000) (expecting 484095 factors)
p=0, 0.000 p/sec, no factors found

[/CODE]

Am I doing something wrong?
Thanks

henryzz 2023-04-03 08:57

[QUOTE=Citrix;627748]For version 2_4.0 I am getting

[CODE]

twinsieve.exe -k2 -K1000000 -n1000000 -r -b2 -p3 -P10e14
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 3 < p < 1e15 with 499999 terms (2 < k < 1000000, k*2^1000000) (expecting 484095 factors)
p=0, 0.000 p/sec, no factors found

[/CODE]

Am I doing something wrong?
Thanks[/QUOTE]

You are using an old buggy version. Hopefully an update should solve your issue.

rogue 2023-04-03 12:05

[QUOTE=Citrix;627748]For version 2_4.0 I am getting

[CODE]

twinsieve.exe -k2 -K1000000 -n1000000 -r -b2 -p3 -P10e14
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 3 < p < 1e15 with 499999 terms (2 < k < 1000000, k*2^1000000) (expecting 484095 factors)
p=0, 0.000 p/sec, no factors found

[/CODE]

Am I doing something wrong?
Thanks[/QUOTE]

This was fixed in twinsieve 1.4. The current version is 1.6.

rogue 2023-04-03 13:34

[QUOTE=Citrix;627508]Here is the simplest pseudocode:
1. Get n and the classes to be tested (generally +1 or -1 or both) from the command line. If n is 2, all primes need to be tested and we do not need to filter, as all primes above 2 are odd; 2 is the default value.
2. At the location where il_PrimeList is initialized with primes, we can insert the following lines of code to filter the primes.
(p is the current prime)

[CODE]
while (p < max)
{
   // n <= 2 means no filtering; p%n == n-1 is the "-1" class
   if (n <= 2 || p%n == 1 || p%n == n-1) { /* enter this prime in the array */ }
   // otherwise skip this prime
   p = next prime from iterator
}
[/CODE]

3. p%n can be calculated faster for certain n values (e.g. n=2^x), though this might not provide any significant speedup. I will let you decide.[/QUOTE]

Is "n" the only value to be specified on the command line? I cannot change where the il_PrimeList is populated. I would have to eliminate the primes in TestMegaPrimeChunk(). For the GPU, the kernel itself would need to ignore them.

Citrix 2023-04-04 02:05

[QUOTE=rogue;627758]This was fixed in twinsieve 1.4. The current version is 1.6.[/QUOTE]

[code]

twinsieve.exe -W16 -k2 -K1000000 -n1000000 -r -b2 -p3 -P10e14 -fA -t1 -r
twinsieve v1.6, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 3 < p < 1e15 with 2 terms (3 < k < 999999, k*2^1000000) (expecting 2 factors)
Increasing worksize to 80000 since each chunk is tested in less than a second
Increasing worksize to 10000000 since each chunk is tested in less than a second
Increasing worksize to 50000000 since each chunk is tested in less than a second
Decreasing worksize to 25000000 since each chunk needs more than 5 seconds to test
p=82453759727, 57.39M p/sec, 1 factors found at 180 sec per factor (last 1 min), 0.0% done. ETC 2023-04-12 17:16

CTRL-C accepted. Threads will stop after sieving to 158741299859
Sieve interrupted at p=160495427809.
CPU time: 410.95 sec. (21.41 sieving) (4.20 cores)
Fatal Error: Something is wrong. Counted terms (0) != expected terms (1)

[/code]

How does this need to be fixed?

Citrix 2023-04-04 02:13

[QUOTE=rogue;627759]Is "n" the only value to be specified on the command line? I cannot change where the il_PrimeList is populated. I would have to eliminate the primes in TestMegaPrimeChunk(). For the GPU, the kernel itself would need to ignore them.[/QUOTE]

The value of N and the possible classes need to be specified on the command line. If possible, can we allow an arbitrary number of classes, e.g. -f{N,a,b,c,d...}, so that we allow p%N = a or b or c or d?

You can eliminate in TestMegaPrimeChunk() or wherever you think it would be appropriate.
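A sketch of how a filter with an arbitrary list of classes might look. The `ResidueFilter` struct and `passesFilter` function are hypothetical and only illustrate the request; the option syntax itself is the one proposed above, not an existing mtsieve switch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for a -f{N,a,b,c,...} style request.
struct ResidueFilter
{
   uint64_t modulus;               // N
   std::vector<int64_t> classes;   // a, b, c, ...; negative values count back from N
};

// True when p % N matches one of the requested classes.
// A zero modulus or an empty class list means "no filtering".
bool passesFilter(uint64_t p, const ResidueFilter &f)
{
   if (f.modulus == 0 || f.classes.empty()) return true;
   assert(f.modulus <= (uint64_t)INT64_MAX);   // sketch assumes N fits in a signed 64-bit value

   int64_t  m = (int64_t)f.modulus;
   uint64_t r = p % f.modulus;

   for (int64_t c : f.classes)
   {
      // Normalize the class into 0..N-1 so that -1 matches N-1.
      uint64_t normalized = (uint64_t)(((c % m) + m) % m);
      if (r == normalized) return true;
   }
   return false;
}
```

For N = 12 with classes {1, 5, -1}, the primes 13, 17 and 23 pass while 19 (remainder 7) is skipped.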

rogue 2023-04-04 12:36

[QUOTE=Citrix;627795][code]

twinsieve.exe -W16 -k2 -K1000000 -n1000000 -r -b2 -p3 -P10e14 -fA -t1 -r
twinsieve v1.6, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 3 < p < 1e15 with 2 terms (3 < k < 999999, k*2^1000000) (expecting 2 factors)
Increasing worksize to 80000 since each chunk is tested in less than a second
Increasing worksize to 10000000 since each chunk is tested in less than a second
Increasing worksize to 50000000 since each chunk is tested in less than a second
Decreasing worksize to 25000000 since each chunk needs more than 5 seconds to test
p=82453759727, 57.39M p/sec, 1 factors found at 180 sec per factor (last 1 min), 0.0% done. ETC 2023-04-12 17:16

CTRL-C accepted. Threads will stop after sieving to 158741299859
Sieve interrupted at p=160495427809.
CPU time: 410.95 sec. (21.41 sieving) (4.20 cores)
Fatal Error: Something is wrong. Counted terms (0) != expected terms (1)

[/code]

How does this need to be fixed?[/QUOTE]

For base 2, even k are already removed, so you don't need -r. I broke this in 1.6. I will fix the code. I think I should remove -r and do that automatically. Thoughts?

storm5510 2023-04-04 15:57

[I]srsieve2[/I] uses values of [B]n[/B] and [I]fbncsieve[/I] uses values of [B]k[/B].

I don't believe this is a 1-to-1 relationship. So, what is the conversion value for [B]n[/B] to [B]k[/B]?

rogue 2023-04-04 17:16

[QUOTE=storm5510;627823][I]srsieve2[/I] uses values of [B]n[/B] and [I]fbncsieve[/I] uses values of [B]k[/B].

I don't believe this is a 1-to-1 relationship. So, what is the conversion value for [B]n[/B] to [B]k[/B]?[/QUOTE]

I don't understand the question.

storm5510 2023-04-04 18:18

[QUOTE=rogue;627826]I don't understand the question.[/QUOTE]

You don't understand the question. OK. I will make it simpler.

Is k=1e6 the same as n=1e6?

rogue 2023-04-04 19:30

[QUOTE=storm5510;627829]You don't understand the question. OK. I will make it simpler.

Is k=1e6 the same as n=1e6?[/QUOTE]

Yes. All numeric inputs support scientific notation. They also support "g" and "m" and a few other characters.

storm5510 2023-04-04 23:23

[QUOTE=rogue;627835]Yes. All numeric inputs support scientific notation. They also support "g" and "m" and a few other characters.[/QUOTE]

I think you're still missing the question. Is k=10000 in [I]fbncsieve[/I] equivalent to n=10000 in [I]srsieve2[/I]? I was told quite a few years ago that a small n value was some gigantic number when converted to k.

I don't know how else to word this...

gd_barnes 2023-04-05 00:02

[QUOTE=storm5510;627840]I think you're still missing the question. Is k=10000 in [I]fbncsieve[/I] equivalent to n=10000 in [I]srsieve2[/I]? I was told quite a few years ago that a small n value was some gigantic number when converted to k.

I don't know how else to word this...[/QUOTE]

Why would the multiplier be equivalent to the exponent when talking about k*b^n-1 (or +1) forms? Since both fbncsieve and srsieve2 sieve those specific forms, the question comes across as not making sense.

Fbncsieve sieves a wide range of k. Srsieve2 sieves a wide range of n. Is that what you are asking?

To answer your question: no. They are not equivalent. 3*2^10000-1 is not the same as 10000*2^3-1. I'm not sure why the question had to be asked. The calculator in Windows would have given the answer.
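To put numbers on the difference: 10000*2^3-1 is 79999, while 3*2^10000-1 has 3011 decimal digits. A small sketch that estimates the digit count of k*b^n from logarithms (`decimalDigits` is a hypothetical helper, and floating-point rounding could be off by one when k*b^n is extremely close to a power of ten):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of decimal digits of k*b^n, computed as floor(log10(k) + n*log10(b)) + 1.
// k*b^n-1 and k*b^n+1 have the same digit count unless k*b^n is within 1 of a power of ten.
uint64_t decimalDigits(uint64_t k, uint64_t b, uint64_t n)
{
   assert(k > 0 && b > 0);
   double log10Value = std::log10((double)k) + (double)n * std::log10((double)b);
   return (uint64_t)std::floor(log10Value) + 1;
}
```

Here `decimalDigits(10000, 2, 3)` is 5 and `decimalDigits(3, 2, 10000)` is 3011, so swapping the multiplier and the exponent changes the size of the number enormously.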

storm5510 2023-04-05 00:46

[QUOTE=gd_barnes;627841]Why would the multiplier be equivalent to the exponent when talking about k*b^n-1 (or +1) forms? Since both fbncsieve and srsieve2 sieve those specific forms, the question comes across as not making sense.

Fbncsieve sieves a wide range of k. Srsieve2 sieves a wide range of n. Is that what you are asking?
[/QUOTE]

No. It is not about ranges. There is a conversion for decimal to binary. Both can be the same number, just expressed in different ways. Example: n could be 15,383 and its equivalent k could be 584,101. It's like comparing pennies to $1 paper bills. Both the same value, but don't look the same.

I will let this go. It is not important. I was just goofing to pass some time.

Citrix 2023-04-05 04:04

[QUOTE=rogue;627809]For base 2, even k are already removed, so you don't need -r. I broke this in 1.6. I will fix the code. I think I should remove -r and do that automatically. Thoughts?[/QUOTE]

I would prefer having a -r option or if -r is present by default then an include option.

rogue 2023-04-05 12:40

[QUOTE=storm5510;627844]Example: n could be 15,383 and its equivalent k could be 584,101. It's like comparing pennies to $1 paper bills. Both the same value, but don't look the same.[/QUOTE]

This doesn't make any sense to me so I don't think you are asking the right question.

Both sieves sieve k*b^n+/-1, but fbncsieve sieves for a fixed n and variable k (expressed as a range using -k and -K). srsieve2 sieves on variable n (expressed as a range using -n and -N) for one or more k (expressed as a sequence, e.g. k*b^n+/-1)

For srsieve2, -n1e6 -N2e6 means that variable n has a value between 1000000 and 2000000.
For fbncsieve, -k1e6 -K2e6 means that variable k has a value between 1000000 and 2000000.

henryzz 2023-04-05 13:18

Is he referring to something like 1024*2^n-1 == 2^(n+10)-1 ?
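That identity holds because 1024 = 2^10, so powers of two can be moved from k into the exponent. A small sketch of that normalization for base 2 (`normalizeBase2` is a hypothetical name, used only for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Rewrites k*2^n as k'*2^n' with k' odd by moving factors of two from k into n.
// Returns the pair (k', n').
std::pair<uint64_t, uint64_t> normalizeBase2(uint64_t k, uint64_t n)
{
   assert(k != 0);
   while ((k & 1) == 0)
   {
      k >>= 1;
      ++n;
   }
   return {k, n};
}
```

For example, `normalizeBase2(1024, 5)` returns (1, 15), since 1024*2^5 = 2^15.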

storm5510 2023-04-05 13:59

[QUOTE=rogue;627867][COLOR="Gray"]This doesn't make any sense to me so I don't think you are asking the right question.

Both sieves sieve k*b^n+/-1, but fbncsieve sieves for a fixed n and variable k (expressed as a range using -k and -K). srsieve2 sieves on variable n (expressed as a range using -n and -N) for one or more k (expressed as a sequence, e.g. k*b^n+/-1)[/COLOR]

[COLOR="DarkRed"][B]For srsieve2, -n1e6 -N2e6 means that variable n has a value between 1000000 and 2000000.
For fbncsieve, -k1e6 -K2e6 means that variable k has a value between 1000000 and 2000000.[/B][/COLOR][/QUOTE]

The highlighted above is [U]exactly[/U] what I was looking for. Both are numerically weighted the same. It would appear that I was being led-by-the-nose years ago when I was told they were not.

[I]Many thanks, and apologies for the confusion![/I] :smile:

storm5510 2023-04-06 17:04

The error below happens with an inline series, but not with an "abcd" input file.

Example:
[CODE]fbncsieve -k 3 -K 1000000 -p 3 -P 1e10 -W 6 -s "k*1061955^6+1" -o 1e10.abcd[/CODE]

Result:
[CODE]fbncsieve v1.6, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 3 < p < 1e10 with 499999 terms (3 < k < 1000000, k*1061955^6+1) (expecting 476143 factors)
Increasing worksize to 400000 since each chunk is tested in less than a second
Increasing worksize to 10000000 since each chunk is tested in less than a second
Increasing worksize to 50000000 since each chunk is tested in less than a second
Sieve completed at p=10171501019.
CPU time: 52.31 sec. (2.78 sieving) (4.89 cores)
[B]Fatal Error: Something is wrong. Counted terms (26337) != expected terms (26336)[/B][/CODE]

If I repeat the example line, the counted and expected terms on the bottom line are different each time. The same occurs if I change the series in the example line.

[U]Sorry![/U]

rogue 2023-04-06 17:14

[QUOTE=storm5510;627939]The error below happens with an inline series, but not with an "abcd" input file.

Example:
[CODE]fbncsieve -k 3 -K 1000000 -p 3 -P 1e10 -W 6 -s "k*1061955^6+1" -o 1e10.abcd[/CODE]

Result:
[CODE]fbncsieve v1.6, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 3 < p < 1e10 with 499999 terms (3 < k < 1000000, k*1061955^6+1) (expecting 476143 factors)
Increasing worksize to 400000 since each chunk is tested in less than a second
Increasing worksize to 10000000 since each chunk is tested in less than a second
Increasing worksize to 50000000 since each chunk is tested in less than a second
Sieve completed at p=10171501019.
CPU time: 52.31 sec. (2.78 sieving) (4.89 cores)
[B]Fatal Error: Something is wrong. Counted terms (26337) != expected terms (26336)[/B][/CODE]

If I repeat the example line, the counted and expected terms on the bottom line are different each time. The same occurs if I change the series in the example line.

[U]Sorry![/U][/QUOTE]

I will take a look. It shouldn't be too hard to fix.

rogue 2023-04-07 15:19

[QUOTE=rogue;627940]I will take a look. It shouldn't be too hard to fix.[/QUOTE]

The code is fixed and committed to sourceforge. This happens only with ABCD formatted output files.

rogue 2023-04-07 15:36

[QUOTE=rogue;627809]For base 2, even k are already removed, so you don't need -r. I broke this in 1.6. I will fix the code.[/QUOTE]

This is fixed and committed to sourceforge.

For odd bases, all odd k are automatically removed.
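The parity argument behind this can be checked directly: with an odd base b, b^n is odd, so for odd k both k*b^n+1 and k*b^n-1 are even and can never be prime. Below is a minimal Python sketch of that check. The helper name is made up for illustration and is not code from mtsieve. It also reproduces the term counts fbncsieve printed earlier in this thread, under the assumption that those counts are simply the even k in the requested range.

```python
def count_even_k(k_min, k_max):
    """Count even k with k_min <= k <= k_max (hypothetical helper)."""
    first = k_min + (k_min % 2)   # round up to the next even number
    last = k_max - (k_max % 2)    # round down to the previous even number
    return 0 if first > last else (last - first) // 2 + 1

# Odd base and odd k: k*b^n+1 and k*b^n-1 are both even.
b, n = 1955, 6
assert all((k * b**n + 1) % 2 == 0 and (k * b**n - 1) % 2 == 0
           for k in range(1, 100, 2))

print(count_even_k(3, 1000000))       # compare with "499999 terms" in the log
print(count_even_k(100000, 200000))   # compare with "50001 terms" in the log
```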

rogue 2023-04-07 15:43

[QUOTE=Citrix;627390]Is it possible in the next release we could have a command line option to limit primes being tested to certain classes (similar to pfgw -f{n,+-1}). The program currently does not automatically catch these.[/QUOTE]

This is what pfgwdoc.txt has for -f:

[code]
-f[percent][[{Mod_Expr}][{condition}[{condition}...]]]
Modular factoring:
-f{801} uses only primes which are of the form k*801+1
-f{632,-1} uses only primes which are of the form k*632-1
** The {801} and the {632,-1} are the optional {Mod_Expr}
*** NOTE new code added to do both -1 and +1. the format
would be -f{801,+-1} (the +-1 MUST look just like that)
-f{256}{y,8,1) uses only primes which are of the form k*256+1 where
the resultant primes are also of the form j*8+1
-f{256}{n,8,1) uses only primes which are of the form k*256+1 where
the resultant primes are not of the form j*8+1
-f500{256}{y,8,1){y,8,7){n,32,1) uses only primes which are of the
form k*256+1 where the resultant primes are also of the form
j*8+-1 but not j*32+1. There is also a 500% factoring level.
-f{8132}{y,8,1){f,8132} uses only primes which are of the
form k*8132+1 where the resultant primes are also of the form
j*8+1. Also, all factors of 8132 (2,19,107) are checked first.
-f{8132}{y,8,1){p,8133} uses only primes which are of the
form k*8132+1 where the resultant primes are also of the form
j*8+1. Also, ALL primes <= 8133 are checked first.
[/code]

Are you requesting this full functionality for such a switch in srsieve2?

storm5510 2023-04-08 15:02

[QUOTE=rogue;627983]The code is fixed and committed to sourceforge. This happens only with ABCD formatted output files.[/QUOTE]

Something else:
[CODE]fbncsieve v1.6, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 3 < p < 1e9 with 50001 terms (100000 < k < 200000, k*1955^6+1) (expecting 47350 factors)
Increasing worksize to 1600000 since each chunk is tested in less than a second
Increasing worksize to 200000000 since each chunk is tested in less than a second
[B]Fatal Error: Unable to allocate 8000000080 bytes of memory for N/A[/B][/CODE]

This particular system has 16GB of RAM.

Changing the values behind the caret results in different behaviors. 6+1 is not a good choice. 2+1 generates the previous error. 2-1 and 3-1 are alright. I have probably been trying to make the program do what it was not designed for.

I have [I]srfile[/I] v1.1.4 from 2019. It is unable to read ABCD formats from [I]fbncsieve[/I]. I looked around for something newer but couldn't find one.

[I]Again, I am sorry for all this trouble.[/I] :blush:

rogue 2023-04-08 16:58

[QUOTE=storm5510;628042]Something else:
[CODE]fbncsieve v1.6, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 3 < p < 1e9 with 50001 terms (100000 < k < 200000, k*1955^6+1) (expecting 47350 factors)
Increasing worksize to 1600000 since each chunk is tested in less than a second
Increasing worksize to 200000000 since each chunk is tested in less than a second
[B]Fatal Error: Unable to allocate 8000000080 bytes of memory for N/A[/B][/CODE]

This particular system has 16GB of RAM.

Changing the values behind the caret results in different behaviors. 6+1 is not a good choice. 2+1 generates the previous error. 2-1 and 3-1 are alright. I have probably been trying to make the program do what it was not designed for.

I have [I]srfile[/I] v1.1.4 from 2019. It is unable to read ABCD formats from [I]fbncsieve[/I]. I looked around for something newer but couldn't find one.

[I]Again, I am sorry for all this trouble.[/I] :blush:[/QUOTE]

Your computer does not have enough memory for the buffer of primes. That is something I haven't run into, but is certainly possible with fbncsieve and gfndsieve (and possibly others). I will see what I can do to address this.

Citrix 2023-04-08 23:27

[QUOTE=rogue;627987]

Are you requesting this full functionality for such a switch in srsieve2?[/QUOTE]

Yes, if possible. That would be very useful.
Note:- we do not need the {Percent}

Also, thank you for fixing the twinsieve code.

storm5510 2023-04-09 00:00

[QUOTE=rogue;628049]Your computer does not have enough memory for the buffer of primes. That is something I haven't run into, but is certainly possible with fbncsieve and gfndsieve (and possibly others). I will see what I can do to address this.[/QUOTE]

Unless I miscounted the digits in the error message, I believe it is indicating a smidge over 8GB. I have 16GB in this system. Roughly 13GB is available at idle.
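For reference, the digit count can be checked with a few lines of Python. This is only unit arithmetic on the number in the error message. The 8-bytes-per-prime figure is an assumption for illustration, not something taken from the mtsieve source.

```python
requested = 8000000080  # bytes, from the "Unable to allocate" message

gb = requested / 10**9    # decimal gigabytes
gib = requested / 2**30   # binary gibibytes, which many OS tools label "GB"

print(f"{gb:.2f} GB, {gib:.2f} GiB")

# Assumption: if each buffered prime is stored as a 64-bit (8-byte) value,
# the request would cover this many entries.
entries = requested // 8
print(entries)
```

So the request is about 8 GB in a single allocation. The program may well be holding other buffers at the same time, which could explain a failure even with roughly 13GB idle.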

In any case, I will leave this alone and move on to other things. Again, sorry for all the trouble!

Citrix 2023-04-09 08:19

[QUOTE=storm5510;628072]Unless I miscounted the digits in the error message, I believe it is indicating a smidge over 8GB. I have 16GB in this system. Roughly 13GB is available at idle.

In any case, I will leave this alone and move on to other things. Again, sorry for all the trouble![/QUOTE]

I think the problem is with "k*1955^6+1"

This is too small causing the program to be too fast and hence requiring large amounts of memory. You are better off using pfgw directly for such small numbers.

Try replacing it with "k*1955^60000+1"

henryzz 2023-04-09 11:41

[QUOTE=Citrix;628083]I think the problem is with "k*1955^6+1"

This is too small causing the program to be too fast and hence requiring large amounts of memory. You are better off using pfgw directly for such small numbers.

Try replacing it with "k*1955^60000+1"[/QUOTE]

With tests that small most of the time in pfgw will be in overhead. Hence why sieving is done first.

rogue 2023-04-09 13:36

For some sieves, such as twinsieve and fbncsieve, there is a secondary buffer: primes passing the first phase of the sieve are put into it for the second phase. So when the number of primes per chunk grows, so does the need for additional memory. This can be a problem because those sieves are so fast that each chunk is really large, which puts a strain on memory utilization.

The solution requires two changes. First, eliminate that secondary buffer. Second, add a parameter to limit the maximum number of primes per chunk.

storm5510 2023-04-09 18:07

[QUOTE=Citrix;628083]I think the problem is with "k*1955^6+1"

This is too small causing the program to be too fast and hence requiring large amounts of memory. You are better off using pfgw directly for such small numbers.

Try replacing it with "k*1955^60000+1"[/QUOTE]

I extended it out to 10061955, which is my full birthdate. My starting k was 100e3. I wondered how [I]fbncsieve[/I] would respond to a number evenly divisible by 5. The program ran this very fast. This speed was what got me wondering about the absolute values of [I]k[/I] vs. [I]n.[/I] This was resolved. [I]k=500[/I] and [I]n=500[/I] have the same numeric Base 10 weight.

I know how to use [I]pfgw[/I], but I didn't. [I]cllr64[/I] didn't like the taste of the input npg file. The console screen text went from the default light gray to a deep blue which I could not read on a black background. I tried it in [I]PowerShell[/I]. [I]cllr64[/I] ran one line then stopped. At least, I could read it. [I]srfile[/I] was not able to convert an ABCD file to npg from [I]fbncsieve[/I]. I ended up changing the output form to npg.

[I]srsieve2[/I] runs a variation very well. "10061955*6^n-1." n from 100e3 to 500e3. [I]cllr64[/I] accepted the converted npg file just fine.

I got to a point where I needed to get away from the formal GIMPS work and the side projects. Running R42 for CRUS got a little stressful. It took 39 days to get it to P=4e13. [B]gd_barnes[/B] was pleased with the effort.

So, I end up jumping around running one thing or another. Nothing to be submitted. Just goofing. I find it relaxing.

rogue 2023-04-09 21:04

Regarding the memory issue I’m thinking about using the -w switch to lock the size of each prime chunk. Without -w, it will auto adjust the chunks as it runs. This would be easy to implement. Thoughts?

kruoli 2023-04-09 21:06

Maybe adding some special character after the parameter to lock it? E.g. [C]-w 1e6![/C].

rogue 2023-04-13 22:01

I have posted 2.4.6 over at sourceforge. Here are the changes:

[code]
framework:
Support 'f' or 'F' at the end of the -w argument. This will "fix" the
number of primes per CPU workunit and not resize the workunit.

twinsieve: version 1.6.1
Do not apply -r logic to base 2 since even k are already excluded.

fbncsieve: version 1.6.1
Fix issue when generating ABCD file as it counts terms incorrectly.
[/code]

I know that 'l' was suggested and I chose 'f' instead. In any case this was a workable solution.

storm5510 2023-04-13 23:37

[QUOTE=rogue;628429]I have posted 2.4.6 over at sourceforge. Here are the changes:

[code]
framework:
Support 'f' or 'F' at the end of the -w argument. This will "fix" the
number of primes per CPU workunit and not resize the workunit.

twinsieve: version 1.6.1
Do not apply -r logic to base 2 since even k are already excluded.

fbncsieve: version 1.6.1
Fix issue when generating ABCD file as it counts terms incorrectly.
[/code]

I know that 'l' was suggested and I chose 'f' instead. In any case this was a workable solution.[/QUOTE]

Like this: -W6F?

rogue 2023-04-14 02:36

[QUOTE=storm5510;628436]Like this: -W6F?[/QUOTE]

Not quite, more like -w1e6f. Use -w to specify the number of primes per worker. -W is the number of workers and typically would not exceed the number of CPU cores.

If you are going to use that feature, then setting that value higher will improve the rate. For example if you use -w1e6f vs -w1e8f, you will see that -w1e8f is faster. I would only recommend using this under two conditions. First, if you run out of memory, which can happen with the faster sieves. Second, if you want to see if larger prime chunks provide better removal rates for the slower sieves. The downside is that chunks that need a very long time to process will require you to wait longer if you use ^C and you will also likely sieve deeper than you want without it.
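To make the -w syntax concrete, here is a small Python sketch of how a value such as 1e6f could be read: a number in scientific notation, optionally followed by 'f' or 'F' to fix the chunk size. The function is hypothetical and only illustrates the format described in the 2.4.6 change notes. It is not the parser mtsieve actually uses.

```python
def parse_worksize(arg):
    """Parse a -w style value such as '1e6' or '1e8f' (hypothetical helper).

    Returns (primes_per_chunk, fixed), where fixed is True when the value
    ends in 'f' or 'F'.
    """
    fixed = arg[-1:] in ("f", "F")
    number = arg[:-1] if fixed else arg
    return int(float(number)), fixed

for arg in ("1e6", "1e6f", "1e8F"):
    print(arg, parse_worksize(arg))
```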

storm5510 2023-04-14 14:46

[QUOTE=rogue;628441]Not quite, more like -w1e6f. Use -w to specify the number of primes per worker. -W is the number of workers and typically would not exceed the number of CPU cores.

If you are going to use that feature, then setting that value higher will improve the rate. For example if you use -w1e6f vs -w1e8f, you will see that -w1e8f is faster. I would only recommend using this under two conditions. First, if you run out of memory, which can happen with the faster sieves. Second, if you want to see if larger prime chunks provide better removal rates for the slower sieves. The downside is that chunks that need a very long time to process will require you to wait longer if you use ^C and you will also likely sieve deeper than you want without it.[/QUOTE]

-W6 -w1e8f. Sorry I fudged it. I had to go back and look at all the switches. I've seen [I]srsieve2[/I] resize. Sometimes up and other times down, or both in short order. I have never had a memory problem with it.

Should your [B]^C[/B] above be something else?

rogue 2023-04-14 15:33

[QUOTE=storm5510;628462]Should your [B]^C[/B] above be something else?[/QUOTE]

No. When using ^C, some sieves will process the entire chunk they are currently working on, then terminate. For others it can terminate in the middle of a chunk.

storm5510 2023-04-14 16:14

[QUOTE=rogue;628469]No. When using ^C, some sieves will process the entire chunk they are currently working on, then terminate. For others it can terminate in the middle of a chunk.[/QUOTE]

OK. Mine seems to always finish the chunk then drop out to the prompt. A bit of patience is required. :smile:

pepi37 2023-04-16 01:30

I did some testing with the latest srsieve2cl with a single sequence.

Win10 , RTX3060Ti with 8 GB VRAM

g 32 0.34 core 6.632Mp/s
g 1000 1 core 13.68Mp/s
g 5000 1 core 13.89Mp/s
g 16834 1 core 13.61Mp/s
G10 g32 2.36 core 15.08Mp/s
G3 g400 1 core 15.58Mp/s
G30 g10 5.25 core 14.56Mp/s
G2 g1782 1.1 core 15.41Mp/s

On the same sequence, the CPU with 8 workers (8 cores) gets around 17.8 Mp/s.
If nothing else, the CPU draws less power than the GPU :) The speed is nearly the same.

pepi37 2023-04-16 11:31

And very important additional info to my post above: you will get those values only [B][COLOR="Red"]if your GPU is in a PCIe x16 slot[/COLOR][/B].
I compiled srsieve2cl on my small rig where the cards are on risers, and the fastest I can get on a 2070 Super is only 172K p/sec.

rogue 2023-04-16 13:37

Note that some of the command line switches might give you a performance boost. These same switches could hurt performance. Play around with the Q/U/V/X switches.

pepi37 2023-04-16 14:31

[QUOTE=rogue;628595]Note that some of the command line switches might give you a performance boost. These same switches could hurt performance. Play around with the Q/U/V/X switches.[/QUOTE]

[QUOTE]-U --bmmulitplier=U multiplied by 2 to compute BASE_MULTIPLE (default 15 for single 1 for multi
default BASE_MULTIPLE=30, BASE_MULTIPLE=2 for multi)
-V --prmmultiplier=V multiplied by BASE_MULTIPLE to compute POWER_RESIDUE_LCM (default 24 for single 360 for multi
default POWER_RESIDUE_LCM=360, POWER_RESIDUE_LCM=360 for multi)
-X --lbmultipler=X multiplied by POWER_RESIDUE_LCM to compute LIMIT_BASE (default 1 for single 1 for multi
default LIMIT_BASE=24, LIMIT_BASE=360 for multi)[/QUOTE]

I cannot even understand what is written here. Using those switches is a big mystery to me.
Is there any manual, samples, anything?

For example

[QUOTE]q = 2 with 16 subseq yields bs = 2679, gs = 168, work = 5375
q = 4 with 30 subseq yields bs = 2587, gs = 87, work = 5212
q = 8 with 51 subseq yields bs = 2394, gs = 47, work = 4817
q = 16 with 102 subseq yields bs = 2446, gs = 23, work = 4844
q = 32 with 203 subseq yields bs = 2344, gs = 12, work = 4884
q = 64 with 406 subseq yields bs = 2344, gs = 6, work = 4987
q = 3 with 36 subseq yields bs = 3297, gs = 91, work = 6589
q = 6 with 38 subseq yields bs = 2381, gs = 63, work = 4794
q = 12 with 65 subseq yields bs = 2206, gs = 34, work = 4450
q = 24 with 111 subseq yields bs = 2084, gs = 18, work = 4143
q = 48 with 222 subseq yields bs = 2084, gs = 9, work = 4204
q = 96 with 434 subseq yields bs = 1876, gs = 5, work = 4287
q = 192 with 868 subseq yields bs = 2344, gs = 2, work = 4562
q = 9 with 100 subseq yields bs = 3126, gs = 32, work = 6372
q = 18 with 104 subseq yields bs = 2273, gs = 22, work = 4615
q = 36 with 175 subseq yields bs = 2084, gs = 12, work = 4279
q = 72 with 297 subseq yields bs = 2084, gs = 6, work = 4035
q = 144 with 594 subseq yields bs = 2084, gs = 3, work = 4204
q = 288 with 1163 subseq yields bs = 1563, gs = 2, work = 4556
q = 576 with 2322 subseq yields bs = 1564, gs = 1, work = 5218
q = 5 with 74 subseq yields bs = 3674, gs = 49, work = 7333
q = 10 with 79 subseq yields bs = 2648, gs = 34, work = 5373
q = 20 with 147 subseq yields bs = 2648, gs = 17, work = 5220
q = 40 with 241 subseq yields bs = 2250, gs = 10, work = 4784
q = 80 with 481 subseq yields bs = 2250, gs = 5, work = 4903
q = 160 with 957 subseq yields bs = 2813, gs = 2, work = 5222
q = 320 with 1914 subseq yields bs = 2813, gs = 1, work = 5717
q = 15 with 166 subseq yields bs = 3158, gs = 19, work = 6389
q = 30 with 176 subseq yields bs = 2308, gs = 13, work = 4687
q = 60 with 299 subseq yields bs = 2143, gs = 7, work = 4398
q = 120 with 491 subseq yields bs = 1876, gs = 4, work = 4120
q = 240 with 977 subseq yields bs = 1876, gs = 2, work = 4389
q = 480 with 1908 subseq yields bs = 1876, gs = 1, work = 4883
q = 960 with 3816 subseq yields bs = 939, gs = 1, work = 6953
q = 45 with 462 subseq yields bs = 2858, gs = 7, work = 6308
q = 90 with 482 subseq yields bs = 2001, gs = 5, work = 4667
q = 180 with 805 subseq yields bs = 2501, gs = 2, work = 4559
q = 360 with 1312 subseq yields bs = 2501, gs = 1, work = 4590
q = 720 with 2612 subseq yields bs = 1251, gs = 1, work = 5412
q = 1440 with 5109 subseq yields bs = 626, gs = 1, work = 8787
q = 2880 with 10202 subseq yields bs = 314, gs = 1, work = 16613
q = 1 with 15 subseq yields bs = 3674, gs = 245, work = 7356[/QUOTE]

What parameters do you recommend, looking at this report?

rogue 2023-04-16 15:41

The program will choose the q with the lowest value for work. "work" is an estimate of the effort to do a discrete log for each p. The lower the "work", then the more p can be tested per second. This is just an estimate. Reality is sometimes different. I recommend sieving to 1e9 (or deeper) to eliminate terms with small factors as they will skew the results. Take the file of remaining candidates and run a range of at least 1e9 (e.g. 10e9 to 11e9) for each q that is within 20% of the q with the lowest value for work. Look at srsieve2.log or the console output to see which value for -q executed that range in the shortest period of time. That will most often be the default value for q, but not always.

As for U/V/X, those are a bit more nuanced and can impact the q which has the lowest value for work. You can play around with these if you want to squeeze out more performance. Some combinations of U/V/X won't work. In other words they might result in invalid factors (the program will terminate if that happens).

I do not have a way today to test all the various combinations to determine which is best. I have thought about adding a command line switch, but I think that would be a lot of work. For now choosing the best q/U/V/X values for each set of sequences can only be done manually.
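The manual procedure above (find the lowest "work", then time every q within 20% of it) starts with a shortlist that is easy to script. The Python sketch below parses a few rows copied from the -Q report quoted earlier. In the full report q = 72 has the lowest work. The function name and the integer-percent comparison are illustrative choices, not part of srsieve2.

```python
import re

# A few rows copied from the -Q report earlier in the thread.
report = """\
q = 2 with 16 subseq yields bs = 2679, gs = 168, work = 5375
q = 8 with 51 subseq yields bs = 2394, gs = 47, work = 4817
q = 16 with 102 subseq yields bs = 2446, gs = 23, work = 4844
q = 24 with 111 subseq yields bs = 2084, gs = 18, work = 4143
q = 72 with 297 subseq yields bs = 2084, gs = 6, work = 4035
q = 120 with 491 subseq yields bs = 1876, gs = 4, work = 4120
q = 1 with 15 subseq yields bs = 3674, gs = 245, work = 7356
"""

def shortlist(text, tolerance_pct=20):
    """Return (best_q, [q values whose work is within tolerance_pct of the best])."""
    rows = [(int(q), int(w))
            for q, w in re.findall(r"q = (\d+) .*?work = (\d+)", text)]
    best_q, best_work = min(rows, key=lambda r: r[1])
    # Integer comparison avoids floating-point edge cases at the threshold.
    near = [r for r in rows if r[1] * 100 <= best_work * (100 + tolerance_pct)]
    near.sort(key=lambda r: r[1])
    return best_q, [q for q, _ in near]

best_q, candidates = shortlist(report)
print("lowest work at q =", best_q)
print("worth timing with -q:", candidates)
```

Each candidate would then be timed over the same p range, as described above, since "work" is only an estimate.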

pepi37 2023-04-16 15:50

-Q gives the report.
-q can be used for user input.

cxc 2023-04-17 08:39

I have a possibly odd question – are there versions of mtsieve binaries compiled for Mac Intel that can be downloaded from somewhere? I have a new machine (which is Metal) so it would be impossible to run mtsieve on the current setup as mtsieve doesn’t support Metal (yet); the older machine refuses to compile mtsieve with the old version of Xcode I have installed there, and updating Xcode doesn’t seem to fix the problem. (And if I were to try compiling on the new machine the binary wouldn’t execute on the old machine, so I seem to be stuck.)

After a day of bashing my head against the proverbial brick wall (Xcode) and finding it impervious, I thought I’d ask here to see if anyone has anything that might help. I’m looking to do searching on Fermat numbers, so specifically I think I’m after a binary of the gfn_divisor sieve.

rogue 2023-04-17 12:55

[QUOTE=cxc;628643]I have a possibly odd question – are there versions of mtsieve binaries compiled for Mac Intel that can be downloaded from somewhere? I have a new machine (which is Metal) so it would be impossible to run mtsieve on the current setup as mtsieve doesn’t support Metal (yet); the older machine refuses to compile mtsieve with the old version of Xcode I have installed there, and updating Xcode doesn’t seem to fix the problem. (And if I were to try compiling on the new machine the binary wouldn’t execute on the old machine, so I seem to be stuck.)

After a day of bashing my head against the proverbial brick wall (Xcode) and finding it impervious, I thought I’d ask here to see if anyone has anything that might help. I’m looking to do searching on Fermat numbers, so specifically I think I’m after a binary of the gfn_divisor sieve.[/QUOTE]

I can build on OS X. Please PM or e-mail to talk about the issues you are running into when compiling.

rogue 2023-04-21 19:37

I am working on an experimental change to srsieve2. With this change I am adding a -S parameter. With this parameter one can split the input file by q. srsieve2 will determine the best q for each sequence in the file then write that sequence (or terms for that sequence) to one file per q. In theory each file can be run with srsieve2. For example if I take a file with 6108 sequences, which will have varying best q for each k, it will spit out these files:

[code]
Split 6108 base 3 sequences into 17624 base 3^6 sequences.
1 sequences with 788 terms written to q006_b3_n.abcd
679 sequences with 873666 terms written to q012_b3_n.abcd
3 sequences with 2255 terms written to q015_b3_n.abcd
1 sequences with 1506 terms written to q016_b3_n.abcd
59 sequences with 62472 terms written to q018_b3_n.abcd
5 sequences with 10979 terms written to q020_b3_n.abcd
1579 sequences with 1635356 terms written to q024_b3_n.abcd
126 sequences with 147455 terms written to q030_b3_n.abcd
1569 sequences with 1647256 terms written to q036_b3_n.abcd
11 sequences with 17332 terms written to q040_b3_n.abcd
813 sequences with 734827 terms written to q048_b3_n.abcd
1090 sequences with 1194326 terms written to q060_b3_n.abcd
171 sequences with 112236 terms written to q072_b3_n.abcd
1 sequences with 422 terms written to q090_b3_n.abcd
[/code]

Note that only 1 sequence of the 6108 has a q of 6, yet all are being sieved with that q. Note that srsieve2 might not choose that q when running the file associated with that q. This is due to how it computes the work for the combined k for that file. With limited testing I have seen that using the q in the file name with the -q parameter actually out-performs the one that srsieve2 would choose. For example with the q036_b3_n.abcd file above I could get 181K p/sec with -q36, but the default q of 12 only yields 142K p/sec. With all 6108 sequences srsieve2cl chooses q of 6 and gets only 35K p/sec, which is pretty much the same speed as q of 12 with 1569 sequences. So one quarter of the sequences gives over 5x the speed with -q36. I did run q006_b3_n.abcd, which uses Legendre tables. It ran at about 10M p/sec, which is worse than running the entire file. I'm thinking that the best option is to "peel off" the files with the most sequences and test them with the desired q to the desired depth, then combine the remaining ones into a single file and sieve them to the desired depth. It might even be possible that each q has a different optimal sieving depth. This needs a lot more experimentation, so when the code is ready I will post it on sourceforge.
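As a sanity check on the figures in that paragraph, the listing above can be tallied with a short Python sketch. It only parses the text of the listing. The regular expression and variable names are illustrative and not part of srsieve2.

```python
import re

listing = """\
1 sequences with 788 terms written to q006_b3_n.abcd
679 sequences with 873666 terms written to q012_b3_n.abcd
3 sequences with 2255 terms written to q015_b3_n.abcd
1 sequences with 1506 terms written to q016_b3_n.abcd
59 sequences with 62472 terms written to q018_b3_n.abcd
5 sequences with 10979 terms written to q020_b3_n.abcd
1579 sequences with 1635356 terms written to q024_b3_n.abcd
126 sequences with 147455 terms written to q030_b3_n.abcd
1569 sequences with 1647256 terms written to q036_b3_n.abcd
11 sequences with 17332 terms written to q040_b3_n.abcd
813 sequences with 734827 terms written to q048_b3_n.abcd
1090 sequences with 1194326 terms written to q060_b3_n.abcd
171 sequences with 112236 terms written to q072_b3_n.abcd
1 sequences with 422 terms written to q090_b3_n.abcd
"""

# Map q -> number of sequences written to that q's file.
counts = {int(q): int(n) for n, q in
          re.findall(r"(\d+) sequences with \d+ terms written to q(\d+)_", listing)}

total = sum(counts.values())
share_q36 = counts[36] / total   # fraction of sequences in q036_b3_n.abcd
speedup = 181 / 35               # -q36 on that file vs. q of 6 on everything

print(total)
print(round(share_q36, 3))
print(round(speedup, 2))
```

The per-file counts add up to the 6108 input sequences, the q036 file holds roughly a quarter of them, and 181K versus 35K p/sec is a little over 5x.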

For new sieves, it has to sieve a bit first, since that removes most of the n and the remaining n have an impact on the best q. In this case it will sieve up to a maximum of 2^16. Only the sequences are output and not the file of sequences as that file could be very large. Here is what that output looks like:

[code]
Split 12000 base 3 sequences into 23223 base 3^3 sequences.
1212 sequences for q 12 written to q012_b3.in
10 sequences for q 15 written to q015_b3.in
3 sequences for q 16 written to q016_b3.in
105 sequences for q 18 written to q018_b3.in
2 sequences for q 20 written to q020_b3.in
3171 sequences for q 24 written to q024_b3.in
214 sequences for q 30 written to q030_b3.in
3076 sequences for q 36 written to q036_b3.in
17 sequences for q 40 written to q040_b3.in
1 sequences for q 45 written to q045_b3.in
1700 sequences for q 48 written to q048_b3.in
2166 sequences for q 60 written to q060_b3.in
320 sequences for q 72 written to q072_b3.in
1 sequences for q 80 written to q080_b3.in
2 sequences for q 90 written to q090_b3.in
[/code]

The reason I did this is because we have some conjectures over at CRUS with many thousands, if not tens of thousands or hundreds of thousands of sequences. Since srsieve2cl is memory constrained much more than srsieve2, finding a way to split the various sequences optimally is very important.

This could benefit those who want to split sequences across multiple CPUs or multiple computers. This could benefit those who still use sr2sieve as the logic for selection of q is the same between the programs. IIRC, sr2sieve allows you to specify q on the command line. Use srsieve2 to split the sequences and use the output file as input to sr2sieve. Note that since sr2sieve cannot start with a file of sequences you will have to presieve to some low p with srsieve2, then use -S to split the terms.

One more thing, there is a limit of 2^15 babySteps so too many sequences can yield an assertion error. So if you have tens of thousands or hundreds of thousands of sequences you will need to split into smaller sets of sequences before using -S.

storm5510 2023-04-22 15:18

It is really difficult to create a new series from scratch with [I]srsieve2[/I], example k*1923^n-1. I wrote a script to write the series one k at a time. I end up with many millions of remaining terms in a small range, like k from 2 to 1000. I suspect that I am not using the correct program. Ideas?

rogue 2023-04-22 15:29

[QUOTE=storm5510;629119]It is really difficult to create a new series from scratch with [I]srsieve2[/I], example k*1923^n-1. I wrote a script to write the series one k at a time. I end up with many millions of remaining terms in a small range, like k from 2 to 1000. I suspect that I am not using the correct program. Ideas?[/QUOTE]

Absolutely, but all sequences must have the same base. You can use -s as many times as you want to add sequences or you can use -s with an input file with a sequence on each line. I do this all of the time. Since I have a GPU, I never use srsieve, sr1sieve, or sr2sieve.
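For the input file route, the file is just one sequence per line, so it can be generated in a few lines. This is a minimal Python sketch using the k*1923^n-1 example with k from 2 to 1000. The helper name is hypothetical and the output would still need to be saved to a file for -s.

```python
def make_sequences(k_min, k_max, base, c):
    """Return one 'k*b^n+c' style sequence string per k (hypothetical helper)."""
    sign = "+" if c >= 0 else "-"
    return [f"{k}*{base}^n{sign}{abs(c)}" for k in range(k_min, k_max + 1)]

lines = make_sequences(2, 1000, 1923, -1)
text = "\n".join(lines) + "\n"   # save this text to a file and pass that file to -s

print(lines[0])
print(lines[-1])
print(len(lines))
```

With an odd base such as 1923, odd k only ever produce even numbers, so it may be worth generating only the even k.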

rebirther 2023-04-22 15:49

Is it possible to add a parameter to set how often the output file is written, replacing the hard-coded 1h interval in srsieve2?

rogue 2023-04-22 17:23

[QUOTE=rebirther;629123]Is it possible to add a parameter to set how often the output file is written, replacing the hard-coded 1h interval in srsieve2?[/QUOTE]

Yes, but I'm not inclined to add one. What is wrong with writing that file once per hour?

rebirther 2023-04-22 17:26

[QUOTE=rogue;629129]Yes, but I'm not inclined to add one. What is wrong with writing that file once per hour?[/QUOTE]

It's more user-friendly to define shorter times for small bases and longer times for bigger bases.

storm5510 2023-04-22 17:58

[QUOTE=rogue;629120]Absolutely, but all sequences must have the same base. You can use -s as many times as you want to add sequences or you can use -s with an input file with a sequence on each line. I do this all of the time. Since I have a GPU, I never use srsieve, sr1sieve, or sr2sieve.[/QUOTE]

I have a RTX-2080 and can use [I]srsieve2cl[/I]. It just seemed like what I was doing was the long way around.

My input file contained the same base for all sequences. I ended up with 46-million terms in my first try with P=1e9 for k from 2 to 1000.

I have gotten really good throughput using Legendre tables with [I]srsieve2[/I]. Many times it was faster than [I]srsieve2cl[/I]. Still, I will give it a try again.

rogue 2023-04-22 20:14

[QUOTE=rebirther;629131]It's more user-friendly to define shorter times for small bases and longer times for bigger bases.[/QUOTE]

Why? I do not understand.

rogue 2023-04-22 20:16

[QUOTE=storm5510;629133]I have a RTX-2080 and can use [I]srsieve2cl[/I]. It just seemed like what I was doing was the long way around.

My input file contained the same base for all sequences. I ended up with 46-million terms in my first try with P=1e9 for k from 2 to 1000.

I have gotten really good throughput using Legendre tables with [I]srsieve2[/I]. Many times it was faster than [I]srsieve2cl[/I]. Still, I will give it a try again.[/QUOTE]

What inputs are you using for the programs? Can you also show some outputs while running the same range?

rebirther 2023-04-22 20:25

[QUOTE=rogue;629143]Why? I do not understand.[/QUOTE]

When sieving, I would mostly be happy to have the output written every 15-30 min while I'm testing, so as not to lose time.

storm5510 2023-04-22 23:32

[QUOTE=rogue;629144]What inputs are you using for the programs? Can you also show some outputs while running the same range?[/QUOTE]

A short "input.txt" as a series sample.

[CODE][SIZE="3"]100*1032^n-1
101*1032^n-1
102*1032^n-1
103*1032^n-1
104*1032^n-1
105*1032^n-1
106*1032^n-1
107*1032^n-1
108*1032^n-1
109*1032^n-1
110*1032^n-1
111*1032^n-1
112*1032^n-1
113*1032^n-1
114*1032^n-1
115*1032^n-1
[/SIZE][/CODE]


[CODE]srsieve2 -n3 -N100e3 -P100e6 -W4 -l4M -sinput.txt -o100e6.abcd[/CODE]

[CODE]srsieve2 v1.6.9, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
(kp) Sequence has algebraic factorization: 100*1032^n-1 -> (10^2)*1032^n-1
(kp) Sequence 100*1032^n-1 has 49999 terms removed due to algebraic factors of the form 10*1032^(n/2)-1
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e8 with 1549969 terms (3 < n < 100000, k*1032^n-1) (expecting 1457529 factors)
Sieving with multi-sequence c=1 logic for p >= 1032
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 16 base 1032 sequences into 431 base 1032^60 sequences.
Legendre summary: Approximately 84 KB needed for Legendre tables
16 total sequences
16 are eligible for Legendre tables
0 are not eligible for Legendre tables
16 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
16 required building of the Legendre tables
1382400 bytes used for congruent subseq indices
212000 bytes used for congruent subseqs
Increasing worksize to 80000 since each chunk is tested in less than a second
Increasing worksize to 400000 since each chunk is tested in less than a second
Sieve completed at p=100000007.
CPU time: 52.19 sec. (0.00 sieving) (3.08 cores)
141591 terms written to 100e6.abcd
Primes tested: 5761135. Factors found: 1408378. Remaining terms: 141591. Time: 16.97 seconds.
[/CODE]

This short list produces a lot of remaining terms. Imagine the series list containing a thousand, or more. My -P setting may be too low. I changed it to -P1e9 and ran it again. 126,224 remaining.
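The numbers in that log can be cross-checked with a little arithmetic, sketched here in Python. It assumes the n range is inclusive (3 to 100000). The difference-of-squares identity is the algebraic factorization the log reports for 100*1032^n-1 with even n.

```python
sequences = 16
n_min, n_max = 3, 100000

terms_per_sequence = n_max - n_min + 1   # n values per sequence, range assumed inclusive
even_n = sum(1 for n in range(n_min, n_max + 1) if n % 2 == 0)

# For even n: 100*1032^n - 1 = (10*1032^(n/2) - 1) * (10*1032^(n/2) + 1)
n = 4
assert 100 * 1032**n - 1 == (10 * 1032**(n // 2) - 1) * (10 * 1032**(n // 2) + 1)

start_terms = sequences * terms_per_sequence - even_n
factors_found, remaining = 1408378, 141591   # taken from the log above

print(even_n)                       # compare with "49999 terms removed"
print(start_terms)                  # compare with "1549969 terms"
print(factors_found + remaining)    # should equal the starting term count
print(round(remaining / start_terms, 3))
```

So about 9% of the starting terms survive to p=1e8, which works out to several thousand n per sequence across the 16 sequences.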

rogue 2023-04-23 03:25

[QUOTE=rebirther;629145]When sieving, I would mostly be happy to have the output written every 15-30 min while I'm testing, so as not to lose time.[/QUOTE]

IIUC you start sieving, then while sieving use the output file to start testing at the same time. I have never used srsieve2 in that way. I suspect that isn't an efficient use of the CPU.

rogue 2023-04-23 03:29

[QUOTE=storm5510;629152]A short "input.txt as a series sample."

[CODE]srsieve2 -n3 -N100e3 -P100e6 -W4 -l4M -sinput.txt -o100e6.abcd[/CODE]

[CODE]srsieve2 v1.6.9, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
(kp) Sequence has algebraic factorization: 100*1032^n-1 -> (10^2)*1032^n-1
(kp) Sequence 100*1032^n-1 has 49999 terms removed due to algebraic factors of the form 10*1032^(n/2)-1
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e8 with 1549969 terms (3 < n < 100000, k*1032^n-1) (expecting 1457529 factors)
Sieving with multi-sequence c=1 logic for p >= 1032
Sieve completed at p=100000007.
CPU time: 52.19 sec. (0.00 sieving) (3.08 cores)
141591 terms written to 100e6.abcd
Primes tested: 5761135. Factors found: 1408378. Remaining terms: 141591. Time: 16.97 seconds.
[/CODE]

This short list produces a lot of remaining terms. Imagine the series list containing a thousand, or more. My -P setting may be too low. I changed it to -P1e9 and ran it again. 126,224 remaining.[/QUOTE]

You need to sieve to 1e7, then use that output to sieve to 1e10 (or deeper) with srsieve2 with Legendre tables and srsieve2cl without Legendre tables. Can you share the output of those two runs? Run srsieve2cl with -H to show the GPU being used. If I had to guess, all of those small k mean that you have small Legendre tables. Larger k will likely penalize you more.

rebirther 2023-04-23 06:38

[QUOTE=rogue;629158]IIUC you start sieving, then while sieving use the output file to start testing at the same time. I have never used srsieve2 in that way. I suspect that isn't an efficient use of the CPU.[/QUOTE]

No, I am not using the output file as the input file.

rogue 2023-04-23 13:09

[QUOTE=rebirther;629162]No, I am not using the output file as the input file.[/QUOTE]

I do not understand. When you use ^C the output file will be written. If you are not using ^C, then it seems to me that you want that output file to be written more frequently, so instead of stopping srsieve2, you take the output file it has written then start PRP testing at the same time.

You can get close to the behavior you want by using -O to save the factors. Factors are written to the file immediately, so you can use -I with -i and -A to apply factors to any input file to create a new output file. And if you have primes (CRUS-type project), you can also add -R along with -I, -i, and -A to remove those sequences.

storm5510 2023-04-23 14:49

[QUOTE=rogue;629159]You need to sieve to 1e7, then use that output to sieve to 1e10 (or deeper) with srsieve2 with Legendre tables and srsieve2cl without Legendre tables. Can you share the output of those two runs? Run srsieve2cl with -H to show the GPU being used. If I had to guess, all of those small k mean that you have small Legendre tables. Larger k will likely penalize you more.[/QUOTE]

Output? I take this to mean the screen outputs which are below. 1e7 first then 1e10. ABCD's are not much to look at.

[I]srsieve2cl[/I] is an underperformer on my hardware. [I]GPU-Z[/I] indicates my 2080 is underpowered. [I]srsieve2cl -H[/I] generates an error. When I use the -h switch, it shows my GPU as device 0, which it should be.

[CODE][SIZE="2"]srsieve2 v1.6.9, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
(kp) Sequence has algebraic factorization: 100*1032^n-1 -> (10^2)*1032^n-1
(kp) Sequence 100*1032^n-1 has 49999 terms removed due to algebraic factors of the form 10*1032^(n/2)-1
Sieving with generic logic for p >= 3
Sieve started: 3 < p < [B]1e7[/B] with 1549969 terms (3 < n < 100000, k*1032^n-1) (expecting 1444323 factors)
Sieving with multi-sequence c=1 logic for p >= 1032
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 16 base 1032 sequences into 431 base 1032^60 sequences.
Legendre summary: Approximately 84 KB needed for Legendre tables
16 total sequences
16 are eligible for Legendre tables
0 are not eligible for Legendre tables
16 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
16 required building of the Legendre tables
1382400 bytes used for congruent subseq indices
212000 bytes used for congruent subseqs
Increasing worksize to 80000 since each chunk is tested in less than a second
Increasing worksize to 400000 since each chunk is tested in less than a second
Sieve completed at p=10000019.
CPU time: 8.31 sec. (0.00 sieving) (2.03 cores)
161420 terms written to 1e7.abcd
Primes tested: 664259. Factors found: 1388549. Remaining terms: 161420. Time: 4.10 seconds.[/SIZE][/CODE]

[CODE][SIZE="2"]srsieve2 v1.6.9, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000019
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 16 base 1032 sequences into 431 base 1032^60 sequences.
Legendre summary: Approximately 84 KB needed for Legendre tables
16 total sequences
16 are eligible for Legendre tables
0 are not eligible for Legendre tables
16 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
16 required building of the Legendre tables
1382400 bytes used for congruent subseq indices
212000 bytes used for congruent subseqs
Sieve started: 10000019 < p < [B]1e10[/B] with 161420 terms (3 < n < 99998, k*1032^n-1) (expecting 48426 factors)
Increasing worksize to 400000 since each chunk is tested in less than a second
p=716679811, 629.1K p/sec, 33439 factors found at 93.50 f/sec (last 1 min), 7.1% done. ETC 2023-04-23 10:18
p=1515103531, 626.0K p/sec, 37787 factors found at 11.98 f/sec (last 1 min), 15.1% done. ETC 2023-04-23 10:17
p=2329318477, 623.9K p/sec, 40222 factors found at 6.710 f/sec (last 1 min), 23.2% done. ETC 2023-04-23 10:16
p=3160401193, 622.4K p/sec, 41908 factors found at 4.646 f/sec (last 1 min), 31.5% done. ETC 2023-04-23 10:16
p=3964469057, 621.1K p/sec, 43212 factors found at 3.593 f/sec (last 1 min), 39.6% done. ETC 2023-04-23 10:16
p=4805565209, 620.0K p/sec, 44222 factors found at 2.783 f/sec (last 1 min), 48.0% done. ETC 2023-04-23 10:16
p=5648664317, 619.1K p/sec, 45121 factors found at 2.478 f/sec (last 1 min), 56.4% done. ETC 2023-04-23 10:16
p=6484736749, 618.4K p/sec, 45842 factors found at 1.988 f/sec (last 1 min), 64.8% done. ETC 2023-04-23 10:16
p=7339403521, 617.8K p/sec, 46476 factors found at 1.748 f/sec (last 1 min), 73.4% done. ETC 2023-04-23 10:16
p=8176309507, 617.4K p/sec, 47013 factors found at 1.480 f/sec (last 1 min), 81.7% done. ETC 2023-04-23 10:16
p=9036196537, 616.7K p/sec, 47541 factors found at 1.455 f/sec (last 1 min), 90.4% done. ETC 2023-04-23 10:16
p=9882168259, 615.9K p/sec, 47953 factors found at 1.136 f/sec (last 1 min), 98.8% done. ETC 2023-04-23 10:16
Sieve completed at p=10000000019.
CPU time: 4387.84 sec. (0.06 sieving) (5.94 cores)
113428 terms written to 1e10.abcd
Primes tested: 454387933. Factors found: 47992. Remaining terms: 113428. Time: 739.06 seconds.
[/SIZE][/CODE]
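
As a rough cross-check on the "expecting ... factors" figures in the two logs above, here is a minimal Python sketch. It assumes the usual Mertens-style heuristic that the fraction of terms surviving a sieve over p_min < p < p_max is about ln(p_min)/ln(p_max). Whether srsieve2 computes its estimate exactly this way is an assumption, not something stated in this thread; the function name is made up for illustration.

```python
import math

def expected_factors(terms, p_min, p_max):
    """Estimate how many of `terms` candidates get a factor for p_min < p < p_max.

    Assumption: the surviving fraction is ln(p_min) / ln(p_max)
    (Mertens-style heuristic). This is a sketch, not srsieve2's code.
    """
    survive = math.log(p_min) / math.log(p_max)
    return round(terms * (1.0 - survive))

# Inputs copied from the "Sieve started" lines in the two logs above.
runs = [
    (1549969, 3, 1e7),          # log reports: expecting 1444323 factors
    (161420, 10000019, 1e10),   # log reports: expecting 48426 factors
]
for terms, p_min, p_max in runs:
    print(terms, p_min, p_max, expected_factors(terms, p_min, p_max))
```

Under that assumption the estimates come out the same as the figures in the logs, while the runs actually found 1388549 and 47992 factors. The same heuristic suggests diminishing returns: going from 1e10 to 1e11 would remove only about 1 - 10/11, or roughly 9%, of what remains.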

What I am trying to do is understand how all the tables [URL="http://www.noprimeleftbehind.net/crus/Riesel-conjecture-reserves.htm"]here[/URL] were created.

rebirther 2023-04-23 15:03

[QUOTE=rogue;629182]I do not understand. When you use ^C the output file will be written. If you are not using ^C, then it seems to me that you want that output file to be written more frequently, so instead of stopping srsieve2, you take the output file it has written then start PRP testing at the same time.

You can get close to the behavior you want by using -O to save the factors. Factors are written to the file immediately, so you can use -I with -i and -A to apply factors to any input file to create a new output file. And if you have primes (CRUS-type project), you can also add -R along with -I, -i, and -A to remove those sequences.[/QUOTE]

It only helps if the program crashes; then the current output can be used as input. I am only sieving 2.5-10k.

rogue 2023-04-23 15:08

[QUOTE=storm5510;629190]Output? I take this to mean the screen outputs which are below. 1e7 first then 1e10. ABCD's are not much to look at.

[I]srsieve2cl[/I] is an underperformer on my hardware. [I]GPU-Z[/I] indicates my 2080 is underpowered. [I]srsieve2cl -H[/I] generates an error. When I use the -h switch, it shows my GPU as device 0, which it should be.

What I am trying to do is understand how all the tables [URL="http://www.noprimeleftbehind.net/crus/Riesel-conjecture-reserves.htm"]here[/URL] were created.[/QUOTE]

If the GPU is "underpowered", then that is something out of my control. Maybe you need a larger power supply. Maybe it is not seated correctly into the slot. What error do you get with srsieve2cl?

There are a number of steps that go into the creation of those tables. It starts with a program that computes the covering set for each base, which yields the conjectured k. This is the smallest k such that k*b^n+1 (Sierpinski) or k*b^n-1 (Riesel) is composite for all n. That program is available on mersenneforum, but I would have to do some digging to find it. Once we know the conjectured k for that base, we need to find a prime for each k less than that conjectured k. srbsieve (one of my creations that can be found on this forum) can be used to find the primes for the small k. This can eliminate well over 90% of the k less than the conjectured k. From there one has to use a combination of srsieve2/srsieve2cl/srsieve/sr2sieve/sr1sieve with llr/pfgw to find primes for higher n.
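
To make the covering-set idea concrete, here is a small Python sketch that checks a given covering set rather than searching for one. It is not the program mentioned above. It uses the well-known base 2 Sierpinski example k = 78557 with covering set {3, 5, 7, 13, 19, 37, 73}; the function name and the choice of example are mine.

```python
def covered_by(k, b, c, primes, period):
    """Return True if k*b^n + c has a factor in `primes` for every n in one period.

    `period` should be a multiple of the multiplicative order of b modulo
    each prime, so checking n = 0 .. period-1 covers all n.
    """
    for n in range(period):
        # pow(b, n, p) keeps the numbers small; only residues mod p matter.
        if not any((k * pow(b, n, p) + c) % p == 0 for p in primes):
            return False
    return True

# Base 2, Sierpinski side (c = +1). 36 is the LCM of the orders of 2
# modulo the primes in the covering set.
covering_set = [3, 5, 7, 13, 19, 37, 73]
print(covered_by(78557, 2, +1, covering_set, 36))  # prints True
```

Because every n falls into a residue class handled by one of those primes, 78557*2^n+1 is composite for all n. Finding the smallest such k for a base is the harder search that the covering-set program performs.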

gd_barnes (Gary) maintains those pages, although some like [url]http://www.noprimeleftbehind.net/crus/vstats_new/crus-unproven.htm[/url] are generated by code that reads the details from the other pages that Gary maintains.

storm5510 2023-04-23 16:49

[QUOTE=rogue;629192]If the GPU is "underpowered", then that is something out of my control. Maybe you need a larger power supply. Maybe it is not seated correctly into the slot. What error do you get with srsieve2cl?[/QUOTE]

It is seated correctly into the slot. The PSU is 750W. No error. OpenCL, it struggles with. CUDA, not really. For [I]srsieve2[/I] and [I]srsieve2cl[/I], I gauge them with p/sec. [I]srsieve2cl[/I] never goes above 50K. I have seen [I]srsieve2[/I] run at 600K, or more.

For me, this is a learning experience only. I don't plan to post anything.

rogue 2023-04-23 17:30

[QUOTE=storm5510;629199]It is seated correctly into the slot. The PSU is 750W. No error. OpenCL, it struggles with. CUDA, not really. For [I]srsieve2[/I] and [I]srsieve2cl[/I], I gauge them with p/sec. [I]srsieve2cl[/I] never goes above 50K. I have seen [I]srsieve2[/I] run at 600K, or more.

For me, this is a learning experience only. I don't plan to post anything.[/QUOTE]

The poor performance doesn't make any sense to me.

rogue 2023-04-23 23:00

With a couple of small changes, I ran all remaining k for S3 in a single run of srsieve2. Here are the results:

[code]
Split 411412 base 3 sequences into 411412 base 3^1 sequences.
51 sequences for q 6 written to q006_b3.in
46406 sequences for q 12 written to q012_b3.in
351 sequences for q 15 written to q015_b3.in
139 sequences for q 16 written to q016_b3.in
3648 sequences for q 18 written to q018_b3.in
89 sequences for q 20 written to q020_b3.in
104877 sequences for q 24 written to q024_b3.in
8084 sequences for q 30 written to q030_b3.in
107384 sequences for q 36 written to q036_b3.in
610 sequences for q 40 written to q040_b3.in
41 sequences for q 45 written to q045_b3.in
55312 sequences for q 48 written to q048_b3.in
73656 sequences for q 60 written to q060_b3.in
10715 sequences for q 72 written to q072_b3.in
7 sequences for q 80 written to q080_b3.in
42 sequences for q 90 written to q090_b3.in
[/code]
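
A quick way to confirm that the per-q files account for every sequence is to total the counts. This is a minimal Python sketch that parses the lines exactly as printed above; the regular expression and variable names are only illustrative.

```python
import re

# Lines copied from the srsieve2 output above.
log = """\
Split 411412 base 3 sequences into 411412 base 3^1 sequences.
51 sequences for q 6 written to q006_b3.in
46406 sequences for q 12 written to q012_b3.in
351 sequences for q 15 written to q015_b3.in
139 sequences for q 16 written to q016_b3.in
3648 sequences for q 18 written to q018_b3.in
89 sequences for q 20 written to q020_b3.in
104877 sequences for q 24 written to q024_b3.in
8084 sequences for q 30 written to q030_b3.in
107384 sequences for q 36 written to q036_b3.in
610 sequences for q 40 written to q040_b3.in
41 sequences for q 45 written to q045_b3.in
55312 sequences for q 48 written to q048_b3.in
73656 sequences for q 60 written to q060_b3.in
10715 sequences for q 72 written to q072_b3.in
7 sequences for q 80 written to q080_b3.in
42 sequences for q 90 written to q090_b3.in
"""

# Only the "written to" lines match; the "Split ..." line is skipped.
pattern = re.compile(r"(\d+) sequences for q (\d+) written to (\S+)")
per_q = {int(q): int(count) for count, q, _ in pattern.findall(log)}

total = sum(per_q.values())
print(total)  # prints 411412, matching the "Split 411412 ..." line
```

The q 24 and q 36 files together hold a little over half of the sequences.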

If anyone is interested, I have posted the Windows exe over at sourceforge. I have to do some additional testing before I push the code updates because I had to change the HashTable class to use a larger primitive so I don't know if that will impact performance.

storm5510 2023-04-23 23:38

[QUOTE=rogue;629201]The poor performance doesn't make any sense to me.[/QUOTE]

[I]mfaktc[/I] often averages 3000GHz-d/day. [I]gpuOwl[/I] finishes wavefront P-1's in three hours. So, no, it doesn't make sense to me either.

I found two versions of [I]srbsieve[/I]. One from 2015 and another a bit newer I believe. I was unable to get either to do anything. No matter.

I can test-drive your latest srsieve2 release. It will give me something to do. :smile:

storm5510 2023-04-26 17:12

[QUOTE=rogue;629201]The poor performance doesn't make any sense to me.[/QUOTE]

I have gotten [I]srsieve2cl[/I] to perform better than [I]srsieve2[/I]. It's in the GPU-specific parameters. I have it set to -g24 -G8 and -M2500. I don't know if these are what they should be or not. The program is stable though.

rogue 2023-04-26 17:33

[QUOTE=storm5510;629441]I have gotten [I]srsieve2cl[/I] to perform better than [I]srsieve2[/I]. It's in the GPU-specific parameters. I have it set to -g24 -G8 and -M2500. I don't know if these are what they should be or not. The program is stable though.[/QUOTE]

Interesting. You typically only need to set -M for initial sieving due to higher factor density. How many combinations of -g and -G have you played with? What if you use -g128 or -g256 without -G?

kruoli 2023-04-26 17:40

Please try [C]-g46 -G1[/C] and [C]-g46 -G2[/C] and leave out [C]-M[/C].

This assumes a "normal" 2080 and not a 2080 Super (which would be [C]-g48[/C]) or a 2080 Ti (which would be [C]-g68[/C]).

rogue 2023-04-26 20:06

[QUOTE=kruoli;629444]Please try [C]-g46 -G1[/C] and [C]-g46 -G2[/C] and leave out [C]-M[/C].

This assumes a "normal" 2080 and not a 2080 Super (which would be [C]-g48[/C]) or a 2080 Ti (which would be [C]-g68[/C]).[/QUOTE]

Why -g46? Normally I see multiples of 8 working best.

henryzz 2023-04-26 20:15

[QUOTE=rogue;629450]Why -g46? Normally I see multiples of 8 working best.[/QUOTE]

The number of streaming multiprocessors for the 2080 is 46.
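
For anyone wanting to benchmark the suggestions in this exchange side by side, here is a small Python sketch that lists the flag combinations mentioned above. The SM counts are the ones given in these posts. The helper name is hypothetical, and the script only builds the flag strings; it makes no claim about what -g and -G do internally.

```python
# SM counts as stated in the posts above.
SM_COUNT = {"2080": 46, "2080 Super": 48, "2080 Ti": 68}

def candidate_flags(model):
    """Return the -g/-G combinations suggested in the thread for `model`."""
    sms = SM_COUNT[model]
    # -g set to the SM count, with -G1 and -G2.
    flags = [f"-g{sms} -G{big_g}" for big_g in (1, 2)]
    # -g128 or -g256 without -G.
    flags += ["-g128", "-g256"]
    return flags

for flags in candidate_flags("2080"):
    print("srsieve2cl", flags)
```

Each line still needs the usual sieve arguments appended. Comparing the p/sec reported for each setting on the same input is probably the simplest way to pick one.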


All times are UTC.