mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

SethTro 2022-11-22 19:58

I'm glad these could get integrated.

I wanted to find a few primes in [URL="https://oeis.org/A063679"]a sequence[/URL] and I was so happy to find that a full-featured sieving tool already existed for my problem. My first attempt was single-threaded and missed factors, so it was great to find sr2sieve.

I also appreciate that it's fully open source and that I could modify and improve it, as I needed this to hack around all the terms being divisible by 2 for (3^n-7)/2.

If you wanted to add a line somewhere that acknowledged optimization/profiling from Seth Troisi, it would make me feel extra valued for the work I did.

rogue 2022-11-22 23:24

[QUOTE=SethTro;618288]I'm glad these could get integrated.

I wanted to find a few primes in [URL="https://oeis.org/A063679"]a sequence[/URL] and I was so happy to find that a full-featured sieving tool already existed for my problem. My first attempt was single-threaded and missed factors, so it was great to find sr2sieve.

I also appreciate that it's fully open source and that I could modify and improve it, as I needed this to hack around all the terms being divisible by 2 for (3^n-7)/2.

If you wanted to add a line somewhere that acknowledged optimization/profiling from Seth Troisi, it would make me feel extra valued for the work I did.[/QUOTE]

Sorry about that. I should have added your name to CHANGES.txt. I will update that soon.

The divisible by d sequences need work. I laid out in one of these threads the conditions that must be met for srsieve2/srsieve2cl to sieve such sequences. I just don't recall where it is. That would be a nice contribution if you want to work on it.
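
For readers who have not seen what sieving a "divisible by d" sequence involves, here is a minimal Python sketch for (3^n-7)/2. It is not how srsieve2 or sr2sieve work internally and it is far slower; it only relies on the fact that, for an odd prime p, p divides (3^n-7)/2 exactly when 3^n ≡ 7 (mod p). The function names and the brute-force loop are illustrative assumptions.

```python
def small_primes(limit):
    """Return all primes below limit (limit >= 2) with a basic sieve of Eratosthenes."""
    flags = bytearray([1]) * limit
    flags[:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i*i.
            flags[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def sieve_3n_minus_7_over_2(n_min, n_max, p_max):
    """Map n -> smallest odd prime p < p_max dividing (3^n - 7)/2, for n_min <= n <= n_max.

    Assumes n_min >= 2 so that every term is positive.
    """
    factors = {}
    for p in small_primes(p_max):
        if p == 2:
            continue  # dividing by 2 means p = 2 would need a check mod 4; skipped here
        target = 7 % p
        residue = pow(3, n_min, p)  # 3^n mod p, updated incrementally below
        for n in range(n_min, n_max + 1):
            if residue == target and n not in factors:
                factors[n] = p
            residue = residue * 3 % p
    return factors
```

Calling `sieve_3n_minus_7_over_2(2, 40, 50)` returns the n values that can be discarded. Note that a term which is itself a prime below `p_max` is also reported (n = 4 gives 37), so a real sieve has to guard against removing such terms.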

rogue 2022-11-28 15:32

I have posted mtsieve 2.3.6 to sourceforge. Outside of modifying CHANGES.txt to mention Seth Troisi's addition, here are the changes for 2.3.6:

[code]
cksieve/cksievecl: version 1.4
Initial release of cksievecl.
cksieve will now run on non-x86 CPUs. It is 25% faster than the previous version.
cksievecl is about 5x faster than cksieve when comparing i9-11950H vs NVIDIA RTX A5000
[/code]

The only sieves without ARM builds are afsieve, gcwsieve, pixsieve, xyyxsieve and their OpenCL/Metal equivalents.

rogue 2022-11-29 17:26

I have posted mtsieve 2.3.7 to sourceforge. Here are the changes for 2.3.7:

[code]
gcwsieve/gcwsievecl: version 1.5
Added support for non-x86 CPUs. FPU or AVX is still used on x86 CPUs.
Added -A to enable AVX on x86 CPUs. AVX code can be faster than the FPU,
but you will have to test ranges (for p > max n) to see which is faster.
Updated invmod method in the GPU and FPU code to gain about 2%.
[/code]

storm5510 2022-11-29 18:05

I have an older package. With all due respect, I have not yet seen any good examples of how to use these things.

rogue 2022-11-29 18:23

[QUOTE=storm5510;618695]I have an older package. With all due respect, I have not yet seen any good examples of how to use these things.[/QUOTE]

Sorry, but I have not updated the webpage in a while. Do you want to use a specific sieve from the framework?

storm5510 2022-11-29 23:17

[QUOTE=rogue;618696]Sorry, but I have not updated the webpage in a while. Do you want to use a specific sieve from the framework?[/QUOTE]

Not at this time. What web page are you referring to? Perhaps it might be of assistance for learning.

kruoli 2022-11-30 15:42

[URL="http://mersenneforum.org/rogue/mtsieve.html"]Here[/URL] you go. :smile:

rogue 2022-11-30 16:07

[QUOTE=kruoli;618743][URL="http://mersenneforum.org/rogue/mtsieve.html"]Here[/URL] you go. :smile:[/QUOTE]

Horribly out of date. Working on an update.

rogue 2022-11-30 20:57

[QUOTE=rogue;618749]Horribly out of date. Working on an update.[/QUOTE]

Updated

storm5510 2022-12-01 00:25

[QUOTE=kruoli;618743][URL="http://mersenneforum.org/rogue/mtsieve.html"]Here[/URL] you go. :smile:[/QUOTE]

This is helpful. Many thanks!

Several years ago, when I was running LLR's, I used the [I]srsieve[/I] group for several months. The command-line switches are different with this [I]srsieve2[/I]. There were two min/max parameters then. I only see one now.

My RTX 2080 supports OpenCL, but using it did not seem to make much difference in throughput.

I will have to do more experimentation with this, and others. :smile:

rogue 2022-12-01 13:42

[QUOTE=storm5510;618774]This is helpful. Many thanks!

Several years ago, when I was running LLR's, I used the [I]srsieve[/I] group for several months. The command-line switches are different with this [I]srsieve2[/I]. There were two min/max parameters then. I only see one now.

My RTX 2080 supports OpenCL, but using it did not seem to make much difference in throughput.

I will have to do more experimentation with this, and others. :smile:[/QUOTE]

Yes, the command line parameters are different. This is partly due to all sieves using some common parameters. It is also partly because some of the parameters from srsieve have no equivalent in srsieve2, and srsieve2 has parameters that have no equivalent in srsieve.

srsieve2cl supports OpenCL. srsieve2 does not. srsieve2cl will start using the GPU when p > 1e6.

I have been using -g32 as that provides better rates compared to the default of -g8. With thousands of sequences you might need to use -K or -b with -K. You can also play around with -U, -V, and -X. You will likely need to use -M at lower p due to higher factor density. It will tell you if -M needs to be changed for the range.

Unfortunately the program does not "auto-tune" to come up with the best values for these parameters. I recommend that you find a fixed range that takes at least one minute to sieve then create a script to run that range multiple times, but changing the values for those switches. When done look at srsieve2.log to see which combination was the best.
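
One possible shape for such a script, as a rough Python sketch. The switch names (-g, -M, -i, -p, -P) are the ones mentioned in this thread; the input file name, the p range, and the value grid are hypothetical placeholders. It measures wall-clock time itself instead of parsing srsieve2.log, which is a simplification.

```python
import itertools
import subprocess
import time


def build_commands(exe, base_args, grid):
    """Yield one argument list per combination of switch values in grid."""
    switches = sorted(grid)
    for values in itertools.product(*(grid[s] for s in switches)):
        tuned = [f"{s}{v}" for s, v in zip(switches, values)]
        yield [exe, *base_args, *tuned]


def time_commands(commands, run=subprocess.run):
    """Run each command and return (seconds, command) pairs, fastest first."""
    results = []
    for cmd in commands:
        start = time.perf_counter()
        run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
        results.append((time.perf_counter() - start, cmd))
    return sorted(results, key=lambda pair: pair[0])


def main():
    # Hypothetical fixed range and input file; pick a range that takes
    # at least one minute so that start-up cost does not dominate.
    base = ["-i", "input.abcd", "-p", "1e9", "-P", "2e9"]
    grid = {"-g": [8, 16, 32], "-M": [3500, 10000]}
    for seconds, cmd in time_commands(build_commands("srsieve2cl", base, grid)):
        print(f"{seconds:8.1f}s  {' '.join(cmd)}")
```

Calling `main()` runs the six combinations one after another on the same range. Other switches such as -K, -b, -U, -V, and -X can be added to the grid in the same way.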

storm5510 2022-12-01 17:10

[QUOTE=rogue;618785]Yes, the command line parameters are different. This is partly due to all sieves using some common parameters. It is also partly because some of the parameters from srsieve have no equivalent in srsieve2, and srsieve2 has parameters that have no equivalent in srsieve.

srsieve2cl supports OpenCL. srsieve2 does not. srsieve2cl will start using the GPU when p > 1e6.

I have been using -g32 as that provides better rates compared to the default of -g8. With thousands of sequences you might need to use -K or -b with -K. You can also play around with -U, -V, and -X. You will likely need to use -M at lower p due to higher factor density. It will tell you if -M needs to be changed for the range.

Unfortunately the program does not "auto-tune" to come up with the best values for these parameters. I recommend that you find a fixed range that takes at least one minute to sieve then create a script to run that range multiple times, but changing the values for those switches. When done look at srsieve2.log to see which combination was the best.[/QUOTE]

I found some of the older sieve programs on an external hard drive. [I]sr1sieve[/I], and others. I used the [C]-h[/C] switch to look at the parameters for each. It seems my ability to remember things has dimmed somewhat. I can remember [C]-p[/C], [C]-P[/C], [C]-n[/C], and [C]-N[/C], but not much more. I used to run these from a batch file so I would not have to remember the specific switches. Looking through the forums might help.

rogue 2022-12-01 18:19

[QUOTE=storm5510;618799]I found some of the older sieve programs on an external hard drive. [I]sr1sieve[/I], and others. I used the [C]-h[/C] switch to look at the parameters for each. It seems my ability to remember things has dimmed somewhat. I can remember [C]-p[/C], [C]-P[/C], [C]-n[/C], and [C]-N[/C], but not much more. I used to run these from a batch file so I would not have to remember the specific switches. Looking through the forums might help.[/QUOTE]

To start sieving one or more sequences with srsieve2/srsieve2cl the only required parameters are -s, -n, and -N. It will stop upon ^C if you do not specify -P. The output file name and format for that file are defaulted. You can change with -o and -F.

rogue 2022-12-01 22:12

Here are some relative speeds for the programs. I used S750 from CRUS as the base for the sequences to be tested. I pre-sieved to 1e9. These times (in seconds) are for sieving from 1e9 to 2e9 with default values used for -g and -w. The CPU code ran on i9-11950H and the GPU code ran on NVIDIA RTX A5000.

[code]
seqs  sr1sieve  sr2sieve  sr2sieve  srsieve2  srsieve2  srsieve2cl  srsieve2cl
                w/Leg     wo/Leg    w/Leg     wo/Leg    w/Leg       wo/Leg
1     54        n/a       n/a       65        218       30          30
10    n/a       247       282       801       994       ***         91
100   n/a       1214      1580      3645      4198      ***         319

*** -> uses generic sieving logic in the GPU, which does not support Legendre tables for multiple sequences
[/code]

1000 sequences takes much longer, but I expect similar results. In other words srsieve2cl should be faster than anything else.

In the future I will add Legendre support in the GPU when using multiple sequences, but I'm not certain how much of a benefit it will have, especially when one has hundreds of sequences.
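
To put those timings in relative terms, this short Python sketch divides the best CPU time in each row by the srsieve2cl time without Legendre tables. The numbers are copied from the table above; the choice of "best CPU time" as the baseline is an assumption made for illustration, and the ratios only apply to the hardware stated.

```python
# Times in seconds from the table above, keyed by number of sequences.
# CPU entries: sr1sieve, sr2sieve and srsieve2 (n/a entries omitted).
cpu_times = {
    1: [54, 65, 218],
    10: [247, 282, 801, 994],
    100: [1214, 1580, 3645, 4198],
}
# srsieve2cl without Legendre tables (the only GPU column with data in every row).
gpu_times = {1: 30, 10: 91, 100: 319}


def speedups():
    """Return best-CPU-time / GPU-time for each sequence count."""
    return {seqs: min(cpu_times[seqs]) / gpu_times[seqs] for seqs in gpu_times}


for seqs, ratio in speedups().items():
    print(f"{seqs:4d} sequences: srsieve2cl is {ratio:.2f}x faster than the best CPU run")
```

On these figures the GPU advantage grows with the number of sequences, from about 1.8x for one sequence to about 3.8x for 100.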

storm5510 2022-12-01 23:33

[QUOTE=rogue;618806]To start sieving one or more sequences with srsieve2/srsieve2cl the only required parameters are -s, -n, and -N. It will stop upon ^C if you do not specify -P. The output file name and format for that file are defaulted. You can change with -o and -F.[/QUOTE]

After doing more "digging" on my external drive, I found the batch files I had written for [I]srsieve[/I] and [I]sr1sieve[/I]. [I]srsieve[/I] runs to a point then [I]sr1sieve[/I] takes over after [I]srfile[/I] does a conversion. This may take some time.

Many thanks! :smile:

rogue 2022-12-02 14:44

[QUOTE=storm5510;618821]After doing more "digging" on my external drive, I found the batch files I had written for [I]srsieve[/I] and [I]sr1sieve[/I]. [I]srsieve[/I] runs to a point then [I]sr1sieve[/I] takes over after [I]srfile[/I] does a conversion. This may take some time.

Many thanks! :smile:[/QUOTE]

There is no reason to use srsieve anymore. Use srsieve2, even if you don't have a GPU. Without a GPU you sieve to 1e6 with srsieve2, then switch to sr1sieve/sr2sieve. The default output format from srsieve2 (ABCD) can be read by the current versions of sr1sieve/sr2sieve.

storm5510 2022-12-02 17:43

[QUOTE=rogue;618839]There is no reason to use srsieve anymore. Use srsieve2, even if you don't have a GPU. Without a GPU you sieve to 1e6 with srsieve2, then switch to sr1sieve/sr2sieve. The default output format from srsieve2 (ABCD) can be read by the current versions of sr1sieve/sr2sieve.[/QUOTE]

I did. [I]srsieve2cl[/I]. The GPU utilization never dropped below 80%. Consider the following:

[CODE]srsieve2cl -n 1e3 -N 15e6 -P 5e9 -M 3500 -s "101*2^n+1"[/CODE]

This is me experimenting with the switches. What I ended up with was 831,020 remaining terms. This is, by far, too many to be practical for any LLR process. The largest result in an output file never appears to exceed the value of [C]-N[/C]. The command quoted above took five minutes to run. I believe I need to [U]greatly[/U] increase the value of [C]-P[/C]. I will try this again with [C]-P[/C] at 100e9 and see what is left over.

[U]Note[/U]: I used 101 in the sequence because I knew it was a prime number, just not Mersenne. I was receiving GPU messages until [C]-M[/C] was at 3,500.

rogue 2022-12-02 19:28

[QUOTE=storm5510;618850]I did. [I]srsieve2cl[/I]. The GPU utilization never dropped below 80%. Consider the following:

[CODE]srsieve2cl -n 1e3 -N 15e6 -P 5e9 -M 3500 -s "101*2^n+1"[/CODE]

This is me experimenting with the switches. What I ended up with was 831,020 remaining terms. This is, by far, too many to be practical for any LLR process. The largest result in an output file never appears to exceed the value of [C]-N[/C]. The command quoted above took five minutes to run. I believe I need to [U]greatly[/U] increase the value of [C]-P[/C]. I will try this again with [C]-P[/C] at 100e9 and see what is left over.

[U]Note[/U]: I used 101 in the sequence because I knew it was a prime number, just not Mersenne. I was receiving GPU messages until [C]-M[/C] was at 3,500.[/QUOTE]

Hopefully the speed is to your liking. Use -g. The default is 8. I recommend a power of 2, such as -g16 or -g32. That should increase GPU utilization. Once you get to P of about 1e9 or maybe even 1e10 you will need to add -M. In the future I will modify the code so that -M is adjusted automatically while running.

Yes, no n in the output file will be outside of the range you specified on the command line.

storm5510 2022-12-02 22:43

[QUOTE=rogue;618858]Hopefully the speed is to your liking. Use -g. The default is 8. I recommend a power of 2, such as -g16 or -g32. That should increase GPU utilization. Once you get to P of about 1e9 or maybe even 1e10 you will need to add -M. In the future I will modify the code so that -M is adjusted automatically while running.

Yes, no n in the output file will be outside of the range you specified on the command line.[/QUOTE]

I adjusted [C]-P[/C] to 100e9. The run took 1.3 hours. The number of remaining terms was about half of what I had before. The GPU stayed below 50°C, so that is really good. I will give [C]-g 16[/C] a try. Perhaps the high number of terms is related to the size of the number at the front of the series. I should use something larger.

rogue 2022-12-02 23:30

[QUOTE=storm5510;618874]I adjusted [C]-P[/C] to 100e9. The run took 1.3 hours. The number of remaining terms was about half of what I had before. The GPU stayed below 50°C, so that is really good. I will give [C]-g 16[/C] a try. Perhaps the high number of terms is related to the size of the number at the front of the series. I should use something larger.[/QUOTE]

The range of n determines how many terms you start with so I'm not certain what you mean by "use something larger".

storm5510 2022-12-03 05:58

[QUOTE=rogue;618878]The range of n determines how many terms you start with so I'm not certain what you mean by "[B]use something larger[/B]".[/QUOTE]

Using a larger value for [I]k[/I].

You are to be congratulated for the amazing performance increase. I can run sieves in a few hours which three years ago may have taken days when I was running Riesel's for [URL="https://www.rieselprime.de/ziki/Main_Page"]Prime Wiki[/URL].

[U]Off-topic[/U]: I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slow-down which started around 800K for [I]n[/I]. The [I]k[/I] value did not much matter.

rogue 2022-12-03 14:06

[QUOTE=storm5510;618889]You are to be congratulated for the amazing performance increase. I can run sieves in a few hours which three years ago may have taken days when I was running Riesel's for [URL="https://www.rieselprime.de/ziki/Main_Page"]Prime Wiki[/URL].

[U]Off-topic[/U]: I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slow-down which started around 800K for [I]n[/I]. The [I]k[/I] value did not much matter.[/QUOTE]

Thank you!

The speed of llr/pfgw is a result of the FFT size needed to do the PRP/primality test. This is primarily driven by n, since k is only a handful of bits and n is many thousands of bits. There are some GPU programs that can do PRP/primality tests, such as llrCUDA, proth20, and various versions of genefer. proth20 is limited to base 2. genefer is limited to GFNs.
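
To make the "n dominates k" point concrete, here is a small Python sketch that sizes a k*2^n+1 candidate without constructing it. The function name is made up for illustration, and the digit count is a floating-point estimate. The example value is the 4569*2^1778899+1 candidate tested elsewhere in this thread.

```python
import math


def size_of_proth_candidate(k, n):
    """Return (bits, approx_digits) for k*2^n+1 with k >= 1 and n >= 1."""
    # k*2^n is k shifted left by n bits; adding 1 only sets the low bit.
    bits = k.bit_length() + n
    # Digit count estimated from logarithms; the +1 term is negligible.
    approx_digits = math.floor(math.log10(k) + n * math.log10(2)) + 1
    return bits, approx_digits


bits, digits = size_of_proth_candidate(4569, 1778899)
print(f"k contributes {(4569).bit_length()} bits, n contributes 1778899 bits")
print(f"total: {bits} bits, roughly {digits} decimal digits")
```

Here k accounts for 13 of the roughly 1.78 million bits, which is why changing k barely affects the FFT size while increasing n does.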

kruoli 2022-12-03 15:53

[QUOTE=storm5510;618889]I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slow-down which started around 800K for [I]n[/I]. The [I]k[/I] value did not much matter.[/QUOTE]

There is a [URL="http://mersenneforum.org/showthread.php?t=28170"]CUDA version[/URL] of LLR. If you want to test larger numbers, you might be interested in trying this on your 2080.

storm5510 2022-12-04 00:30

[QUOTE=kruoli;618911]There is a [URL="http://mersenneforum.org/showthread.php?t=28170"]CUDA version[/URL] of LLR. If you want to test larger numbers, you might be interested in trying this on your 2080.[/QUOTE]

Thanks. I did on my Linux box, (Ubuntu 20.04.4 LTS). That system has a GTX 1080 in it. It is not a good performer because of limited power. Tests started out taking 13 seconds and got slower as it went on. The same system is a dual-boot, of sorts. Windows 7 exists on one drive and Ubuntu lives on another. I switch the drive cables. The Windows version began the same tests taking about 3 seconds. It is an HP workstation probably 8 years old. Long-in-the-tooth. My 2080 is in a Windows 10 system from 2018. I make do with what I have.

[QUOTE=rogue]Thank you![/QUOTE]

I give recognition when due. It was decidedly due here! I am quite pleased with it. :smile:

Honza 2022-12-04 08:31

Is there Windows binary for latest version of LLRCUDA somewhere?

pepi37 2022-12-04 11:24

[QUOTE=Honza;618948]Is there Windows binary for latest version of LLRCUDA somewhere?[/QUOTE]

Nope, just for Linux, and as far as I know it is not fast enough (in my case).

Jean Penné 2022-12-04 13:24

Does the static binary work for you?
 
[QUOTE=pepi37;618951]Nope, just for Linux, and as far as I know it is not fast enough (in my case).[/QUOTE]

Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough...

Thank you in advance,

Jean

pepi37 2022-12-04 15:14

[QUOTE=Jean Penné;618957]Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough...

Thank you in advance,

Jean[/QUOTE]

Yes, it works. :) But one instance "eats" one CPU core.
Using a trick with libsleep, I can reduce that to 50% of one CPU core. The speed is the same as a Ryzen 7 3700X per core, since both need around 17 minutes to test a candidate of 535000 digits.

[QUOTE]root@OMICRON:~/LLR# ./sllrCUDA -d -q"4569*2^1778899+1"
Starting Proth prime test of 4569*2^1778899+1
Using complex irrational base DWT, FFT length = 262144, a = 5
^Ceration: 160000 / 1778910 [8.49%], ms/iter: 0.596, ETA: 00:16:04
Caught signal. Terminating.
Stopping Proth prime test of 4569*2^1778899+1 at iteration 164342 [9.23%]


root@OMICRON:~/LLR# [COLOR="Red"]LD_PRELOAD="/usr/local/lib/libsleep.so"[/COLOR] ./sllrCUDA -d -q"4569*2^1778899+1"
libsleep: Sleep time: 50usec
Resuming Proth prime test of 4569*2^1778899+1 at bit 164343 [9.23%]
Using complex irrational base DWT, FFT length = 262144, a = 5
^Ceration: 310000 / 1778910 [16.93%], ms/iter: 0.593, ETA: 00:14:30
Caught signal. Terminating.
Stopping Proth prime test of 4569*2^1778899+1 at iteration 317616 [17.85%]
[/QUOTE]
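
The "around 17 minutes" figure can be cross-checked from the log above. This Python sketch multiplies the iteration count by the reported ms/iter; the helper names are illustrative, and the rate is the rounded value printed by the program, so results can be off by a second or so.

```python
def eta_seconds(total_iters, done_iters, ms_per_iter):
    """Estimated seconds left, assuming a constant time per iteration."""
    return (total_iters - done_iters) * ms_per_iter / 1000.0


def hms(seconds):
    """Format whole seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Values taken from the first run in the log: 160000 of 1778910 iterations done.
print(hms(eta_seconds(1778910, 160000, 0.596)))        # remaining time at that point
print(f"{eta_seconds(1778910, 0, 0.596) / 60:.1f} min")  # full test from scratch
```

The first line reproduces the ETA shown in the log, and the full test comes out a little under 18 minutes, consistent with the estimate above.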

Citrix 2022-12-04 20:51

I am getting the following error. What settings do I need to change?

[CODE]
srsieve2cl.exe -i sr_2.abcd -W4 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -G12 -M100000 -l1000

srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272
[/CODE]

rogue 2022-12-04 21:59

[QUOTE=Citrix;618983]I am getting the following error. What settings do I need to change?

[CODE]
srsieve2cl.exe -i sr_2.abcd -W4 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -G12 -M100000 -l1000

srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272
[/CODE][/QUOTE]

I ran into this in the past week so I have a solution for it. I posted an experimental build over at sourceforge that should address this.

Citrix 2022-12-04 22:05

With the new build I get:

[CODE]
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -l10000 -w1000 -G12
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with multi-sequence c=1 logic for p >= 10000000000000
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 204 base 2 sequences into 9182 base 2^720 sequences.
Legendre summary: Approximately 4752 B needed for Legendre tables
204 total sequences
204 are eligible for Legendre tables
0 are not eligible for Legendre tables
204 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
204 required building of the Legendre tables
17625600 bytes used for congruent subseq indices
1360000 bytes used for congruent subseqs
Fatal Error: Must use generic worker if using GPU with multiple sequences by specifying -l0

[/CODE]

With generic code
[CODE]
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -w1000 -G6
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Must use generic sieving logic because -l was not specified for mutiple sequences
Sieving with generic logic for p >= 10000000000000
Split 204 base 2 sequences into 20555 base 2^2880 sequences.
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
GPU primes per worker is 57344
Sieve started: 1e13 < p < 11e12 with 134418 terms (2500875 < n < 20000000, k*2^n-1) (expecting 427 factors)
Increasing worksize to 16000 since each chunk is tested in less than a second

OpenCL Error: Out of host memory
in call to clEnqueueNDRangeOpenCLKernel
kernelName: generic_kernel globalworksize 57344 localworksize 256
[/CODE]

rogue 2022-12-05 03:39

[QUOTE=Citrix;618989]With generic code
[CODE]
srsieve2cl.exe -i sr_2.abcd -W2 -p 10000000000000 -P 11000000000000 -Ofactors.txt -osr_2_new.abcd -M1000 -w1000 -G6
srsieve2cl v1.6.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Must use generic sieving logic because -l was not specified for mutiple sequences
Sieving with generic logic for p >= 10000000000000
Split 204 base 2 sequences into 20555 base 2^2880 sequences.
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
bestQ = 2880 yields bs = 6077, gs = 1, sieveLow = 868, sieveRange = 6077
GPU primes per worker is 57344
Sieve started: 1e13 < p < 11e12 with 134418 terms (2500875 < n < 20000000, k*2^n-1) (expecting 427 factors)
Increasing worksize to 16000 since each chunk is tested in less than a second

OpenCL Error: Out of host memory
in call to clEnqueueNDRangeOpenCLKernel
kernelName: generic_kernel globalworksize 57344 localworksize 256
[/CODE][/QUOTE]

You should use -g to increase GPU primes per worker as opposed to the number of GPU threads. The framework, at this time, does not support one executable running concurrently on multiple GPUs.

Using -G impacts GPU memory usage, but with that many subsequences I suggest that you use -b (a value less than 1.0) to reduce the size of the hash table that the GPU will use. You might also want to use -K to split the sequences across multiple chunks. This will require some trial and error on your part. There is no way (that I am aware of) to compute the memory required for a kernel so the code cannot "auto-tune" these parameters.

You cannot use -l > 0 with the GPU when you have multiple sequences. srsieve2cl does not support it at this time.

I also do not recommend mixing -W and -G. The factor rate calculation does not work correctly when using both CPU and GPU workers.

You can use -p10e12 -P11e12 if that is easier to read.

storm5510 2022-12-05 18:44

[B]@rogue[/B]

[U]Q[/U].: Does [I]srsieve2cl[/I] generate an exit code when it finishes? Running small sieves from a batch file would sometimes fail because I had [C]-M[/C] set too low. It was at 3,500. Now, it is 10,000. It varied based on what the [I]k[/I] value was. Some [I]k[/I]'s caused problems and others did not. All used the same values for [I]-n[/I], [I]-N[/I], and [I]-P[/I].

rogue 2022-12-05 19:24

[QUOTE=storm5510;619056][B]@rogue[/B]

[U]Q[/U].: Does [I]srsieve2cl[/I] generate an exit code when it finishes? Running small sieves from a batch file would sometimes fail because I had [C]-M[/C] set too low. It was at 3,500. Now, it is 10,000. It varied based on what the [I]k[/I] value was. Some [I]k[/I]'s caused problems and others did not. All used the same values for [I]-n[/I], [I]-N[/I], and [I]-P[/I].[/QUOTE]

For normal completion it will output the number of terms written to the output file and the time it took to run.

SEGFAULTs will just give you the command prompt without any of that. If that happens let me know.

storm5510 2022-12-05 23:41

[QUOTE=rogue;619059]For normal completion it will output the number of terms written to the output file and the time it took to run.

SEGFAULTs will just give you the command prompt without any of that. If that happens let me know.[/QUOTE]

Forgive me, but I didn't specify it correctly. An error code?

For a normal program run and exit, an error code of zero is expected. If there is an error, a non-zero code is returned.

[QUOTE=Jean Penné]Nevertheless, I need to know if the static binary of llrCUDA works for you, even if it is not fast enough...

Thank you in advance,

Jean[/QUOTE]

[U]Off-topic[/U]: I am running it as a test. According to [I]nvidia-smi[/I], it is using about 30% of the GPU's capability. I am running "1955*2^n+1" for the test. The [I]k[/I] is my birth year. The [I]n's[/I] are around 102K presently. Despite not being all that fast, it is quite stable in my case. [I]Ubuntu 20.04.4 LTS[/I] using a GTX 1080. The iteration time holds steady at 0.14 seconds. The overall time is increasing gradually.

rogue 2022-12-06 00:09

[QUOTE=storm5510;619083]Forgive me, but I didn't specify it correctly. An error code?

For a normal program run and exit, an error code of zero is expected. If there is an error, a non-zero code is returned.[/QUOTE]

It will be zero upon successful completion. A FatalError (caught and output to the console) is -1. I'm not certain what assert() will exit with.

I do not understand why you care. The error code is not output to the console.

Citrix 2022-12-06 03:34

@Rogue

I can get the program to work but it is extremely slow without the Legendre tables.

Couple of other questions/thoughts

1. I get the following error with the CPU code as well (srsieve2). Can you release a fix?
[CODE]Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272[/CODE]

2. For BASE_MULTIPLE there is a limit of 60. Can this be increased to 256 or higher?

3. Possible bug: The GPU code seems to crash if the n range is large (~15M), and it seems to produce false factors if the n range is large and LIMIT_BASE is huge.

4. For what type of sequences is it best to use the GPU, and for which ones should you stick to the CPU?

Thanks

rogue 2022-12-06 04:21

[QUOTE=Citrix;619096]I can get the program to work but it is extremely slow without the Legendre tables.

Couple of other questions/thoughts

1. I get the following error with the CPU code as well (srsieve2). Can you release a fix?
[CODE]Assertion failed: m <= HASH_MAX_ELTS, file sierpinski_riesel/AbstractSequenceHelper.cpp, line 272[/CODE]

2. For BASE_MULTIPLE there is a limit of 60. Can this be increased to 256 or higher?

3. Possible bug: The GPU code seems to crash if the n range is large (~15M), and it seems to produce false factors if the n range is large and LIMIT_BASE is huge.

4. For what type of sequences is it best to use the GPU, and for which ones should you stick to the CPU?[/QUOTE]

If you are on Windows, I posted a Windows build of srsieve2cl which should resolve that. I haven't committed the code yet.

I will take a look regarding BASE_MULTIPLE. There might be memory considerations if one makes it too large.

Please provide the command line arguments you are using to cause the issue with srsieve2cl. If it is crashing (with no nastygrams), then it is a bug. It is also a bug if it is producing false factors. If this is different input than the crash, then please provide details.

AFAIK, srsieve2cl is always better than any alternative. What GPU are you using? Note that it will be slower for smaller p due to factor validation.

henryzz 2022-12-06 10:47

[QUOTE=rogue;618818]Here are some relative speeds for the programs. I used S750 from CRUS as the base for the sequences to be tested. I pre-sieved to 1e9. These times (in seconds) are for sieving from 1e9 to 2e9 with default values used for -g and -w. The CPU code ran on i9-11950H and the GPU code ran on NVIDIA RTX A5000.

[code]
seqs  sr1sieve  sr2sieve  sr2sieve  srsieve2  srsieve2  srsieve2cl  srsieve2cl
                w/Leg     wo/Leg    w/Leg     wo/Leg    w/Leg       wo/Leg
1     54        n/a       n/a       65        218       30          30
10    n/a       247       282       801       994       ***         91
100   n/a       1214      1580      3645      4198      ***         319

*** -> uses generic sieving logic in the GPU, which does not support Legendre tables for multiple sequences
[/code]

1000 sequences takes much longer, but I expect similar results. In other words srsieve2cl should be faster than anything else.

In the future I will add Legendre support in the GPU when using multiple sequences, but I'm not certain how much of a benefit it will have, especially when one has hundreds of sequences.[/QUOTE]
Is the A5000 mentioned the mobile version? Regardless, the CPU and GPU comparison is a little unfair, as the CPU is a low-power part (35 to 45 W) and the GPU is one of the best server GPUs (although it may be fairly power limited if in a laptop). This may be why others are seeing a much smaller difference between CPUs and GPUs.

rogue 2022-12-06 13:34

[QUOTE=henryzz;619101]Is the A5000 mentioned the mobile version? Regardless, the CPU and GPU comparison is a little unfair, as the CPU is a low-power part (35 to 45 W) and the GPU is one of the best server GPUs (although it may be fairly power limited if in a laptop). This may be why others are seeing a much smaller difference between CPUs and GPUs.[/QUOTE]

You make a good point, but I cannot speak for what others are seeing because I haven't seen others post comparisons.

storm5510 2022-12-06 16:38

[QUOTE=rogue;619086]It will be zero upon successful completion. A FatalError (caught and output to the console) is -1. I'm not certain what assert() will exit with.

I do not understand why you care. The error code is not output to the console.[/QUOTE]

If, and only if, somebody wanted to run it in a batch process.

[CODE]srsieve2cl -n x -N x -P x -s "k*2^n+1"
if %errorlevel% neq 0 ....[/CODE]

The OS catches the error code. It does not need to be seen.

rogue 2022-12-06 16:55

[QUOTE=storm5510;619127]If, and only if, somebody wanted to run it in a batch process.

[CODE]srsieve2cl -n x -N x -P x -s "k*2^n+1"
if %errorlevel% neq 0 ....[/CODE]

The OS catches the error code. It does not need to be seen.[/QUOTE]

I see. It should only return 0 upon successful completion, but I haven't verified that.
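
For anyone scripting this outside of a batch file, a minimal Python sketch of the same check follows. It only assumes that the sieve exits with 0 on success and non-zero otherwise, which has not been verified for every failure path; the `run_step` helper is a made-up name, and the exact non-zero value may differ by platform.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one command; return True only if it exits with status 0."""
    completed = subprocess.run(cmd)
    if completed.returncode != 0:
        print(f"step failed with exit status {completed.returncode}: {cmd}",
              file=sys.stderr)
        return False
    return True


# Example command line taken from earlier in this thread (not executed here):
# run_step(["srsieve2cl", "-n", "1e3", "-N", "15e6", "-P", "5e9",
#           "-M", "3500", "-s", "101*2^n+1"])
```

A driver script can stop at the first `False` instead of parsing console output or log files.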

Happy5214 2022-12-07 19:13

[QUOTE=storm5510;619127]If, and only if, somebody wanted to run it in a batch process.

[CODE]srsieve2cl -n x -N x -P x -s "k*2^n+1"
if %errorlevel% neq 0 ....[/CODE]

The OS catches the error code. It does not need to be seen.[/QUOTE]
I also made a similar request a few months ago for scripting purposes, specifically for a nonzero exit status for interrupted (i.e. SIGINT) sieves. I've developed more robust log parsing in my workflow since then, but an exit code-based solution would only be a couple of lines in Perl.

rogue 2022-12-07 20:16

[QUOTE=Happy5214;619189]I also made a similar request a few months ago for scripting purposes, specifically for a nonzero exit status for interrupted (i.e. SIGINT) sieves. I've developed more robust log parsing in my workflow since then, but an exit code-based solution would only be a couple of lines in Perl.[/QUOTE]

If you try using a script with something built from the framework, but it doesn't work, please let me know and I will look into it. I would not expect the change to be difficult.

storm5510 2022-12-07 23:53

I tried running [I]srsieve2cl[/I] on a Windows 7 system I have. It has a GTX 1080 in it. An error dialog appeared indicating it was looking for something beginning with "api." I have seen this many times over the years. Doing a simple drive search, these things are in multiple places, but not where the program could find them. :ermm:

rogue 2022-12-08 04:15

[QUOTE=storm5510;619202]I tried running [I]srsieve2cl[/I] on a Windows 7 system I have. It has a GTX 1080 in it. An error dialog appeared indicating it was looking for something beginning with "api." I have seen this many times over the years. Doing a simple drive search, these things are in multiple places, but not where the program could find them. :ermm:[/QUOTE]

I don't know what it would be looking for. Do you have more details?

storm5510 2022-12-08 17:04

[QUOTE=rogue;619210]I don't know what it would be looking for. Do you have more details?[/QUOTE]

It will only display the first one it does not find. Below are some sample names I found.

[CODE]api-ms-win-core-console-l1-1-0.dll
api-ms-win-core-console-l1-2-0.dll
api-ms-win-core-datetime-l1-1-0.dll[/CODE]

There are many more. I "think" these may be [I].Net Framework[/I], but I am not sure.

rogue 2022-12-08 17:26

[QUOTE=storm5510;619245]It will only display the first one it does not find. Below are some sample names I found.

[CODE]api-ms-win-core-console-l1-1-0.dll
api-ms-win-core-console-l1-2-0.dll
api-ms-win-core-datetime-l1-1-0.dll[/CODE]

There are many more. I "think" these may be [I].Net Framework[/I], but I am not sure.[/QUOTE]

Okay. I built on Windows 10. I do not have a Windows 7 machine to build on and I'm not certain how to build a Windows 7 compatible binary. If you want to try your own build, then you will need to install this:

[code]D:\test\done>gcc --version
clang version 14.0.0 (https://github.com/llvm/llvm-project.git 329fda39c507e8740978d10458451dcdb21563be)
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/llvm-mingw-20220323-ucrt-x86_64/bin[/code]

And an OpenCL SDK. I don't know if the AMD one I have used is still available online, but there are others.

storm5510 2022-12-08 21:20

[QUOTE=rogue;619247]Okay. I built on Windows 10. I do not have a Windows 7 machine to build on and I'm not certain how to build a Windows 7 compatible binary. If you want to try your own build, then you will need to install this:

[code]D:\test\done>gcc --version
clang version 14.0.0 (https://github.com/llvm/llvm-project.git 329fda39c507e8740978d10458451dcdb21563be)
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/llvm-mingw-20220323-ucrt-x86_64/bin[/code]

And an OpenCL SDK. I don't know if the AMD one I have used is still available online, but there are others.[/QUOTE]

No, I won't try to do a custom build. The older one you wrote in 2020 is working just fine. Using the -W switch to specify threads helps tremendously.

storm5510 2022-12-09 05:07

1 Attachment(s)
[B]@rogue[/B]

Look at the last line in the attached image. It may have something to do with the line after the invocation.

Happy5214 2022-12-09 12:44

[QUOTE=storm5510;619245]It will only display the first one it does not find. Below are some sample names I found.

[CODE]api-ms-win-core-console-l1-1-0.dll
api-ms-win-core-console-l1-2-0.dll
api-ms-win-core-datetime-l1-1-0.dll[/CODE]

There are many more. I "think" these may be [I].Net Framework[/I], but I am not sure.[/QUOTE]

From what I could gather on the Microsoft website, those are internal API DLLs for the operating system (the specific phrase used was "implementation detail"), and they're Windows version-specific. They will not work if copied from a newer version of Windows. To answer your question, they're not related to .Net.

rogue 2022-12-09 13:51

[QUOTE=storm5510;619281][B]@rogue[/B]

Look at the last line in the attached image. It may have something to do with the line after the invocation.[/QUOTE]

This is not a valid use case for srsieve2. You must pre-sieve all sequences to remove small terms before triggering the special logic which is used for larger p. You should never start sieving a new sequence at p > 1e6. In fact you should never specify -p when you start a new sequence.

I will add some validation to prevent this.

storm5510 2022-12-09 17:02

[QUOTE=rogue;619303]This is not a valid use case for srsieve2. You must pre-sieve all sequences to remove small terms before triggering the special logic which is used for larger p. You should never start sieving a new sequence at p > 1e6. In fact you should never specify -p when you start a new sequence.

I will add some validation to prevent this.[/QUOTE]

This is the 3rd of 3 batch files. The first two use lower p values.

Look at my last post in "Sieving for CRUS." If you change the program, I may not be able to do what I am doing now unless I stay with the current version, which I would have to.

pepi37 2022-12-10 18:02

In srsieve2cl, what is the -M switch?
I cannot find it in the help menu...

[QUOTE]srsieve2cl -g 512 -M 7500 -P 5e13 -f B -i b1732_n.boinc -o b17323_n.boinc -O fact173.txt[/QUOTE]

I did some testing on a GPU (2070 SUPER) and found that -g must be 512 to get a stable 95% GPU utilization and the same speed as 4 cores of a Ryzen 7 5700X at 4 GHz.
But the -M switch is still a mystery (and it looks like the speed is the same with or without it).

rogue 2022-12-10 18:29

[QUOTE=pepi37;619409]In srsieve2cl, what is the -M switch?
I cannot find it in the help menu...

I did some testing on a GPU (2070 SUPER) and found that -g must be 512 to get a stable 95% GPU utilization and the same speed as 4 cores of a Ryzen 7 5700X at 4 GHz.
But the -M switch is still a mystery (and it looks like the speed is the same with or without it).[/QUOTE]

-M sets the maxfactordensity, which indicates the number of expected factors per 1e6 terms. If the GPU cannot report all of the factors it will terminate the run and tell you to adjust it. At higher p you shouldn't need to adjust it. If you need to set it and set it too high, then it could impact performance due to how much more memory is required.

rogue 2022-12-12 16:47

I have posted mtsieve 2.3.8. Here are the changes:

[code]
mfsieve/mfsievecl: version 2.1
Build a list of terms with powers prior to sieving so that computing minn! is faster.
For factorial, this improves the calculation of minn! by 30% with minn=1e6 when using
the GPU and by 40% when using the CPU.
Multi-factorials where minn < 1e6 will see less of a boost in performance.

srsieve2/srsieve2cl: version 1.6.6
Added -Q which will output estimated work for each possible q.
Added -q which can be used to specify the q to use (if that q is possible),
overriding the computed best q.

To use -Q, first sieve your sequence(s) to at least 1e6. This will ensure that
subsequent runs are using the correct sieving subroutines. Starting with the
output file, run that output file with the -q flag then stop immediately after
it outputs a group of lines looking like this:

q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793

work is an estimated cost for that q with lower costs implying higher throughput.

To use -q, take each of the q values output from -Q and run a range of at least
1e9 to determine the actual amount of time it takes for that q. Using the
Run a range of at least 1e9 using -Q to specific which q to run with selecting
the q output from using the -q flag. Although you can run all q, you can limit
to those q withing 20% of the lowest cost. For each run observe the total time
for that run to determine which q required the least amount of time. You should
then run the entire range with that q. This will not necessrily be the q with
the lowest cost.
[/code]

Happy5214 2022-12-12 19:01

[QUOTE=rogue;619548]I have posted mtsieve 2.3.8. Here are the changes:

[code]
[...]

srsieve2/srsieve2cl: version 1.6.6
Added -Q which will output estimated work for each possible q.
Added -q which can be used to specify the q to use (if that q is possible),
overriding the computed best q.

To use -Q, first sieve your sequence(s) to at least 1e6. This will ensure that
subsequent runs are using the correct sieving subroutines. Starting with the
output file, run that output file with the -q flag then stop immediately after
it outputs a group of lines looking like this:

q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793

work is an estimated cost for that q with lower costs implying higher throughput.

To use -q, take each of the q values output from -Q and run a range of at least
1e9 to determine the actual amount of time it takes for that q. Using the
Run a range of at least 1e9 using -Q to specific which q to run with selecting
the q output from using the -q flag. Although you can run all q, you can limit
to those q withing 20% of the lowest cost. For each run observe the total time
for that run to determine which q required the least amount of time. You should
then run the entire range with that q. This will not necessrily be the q with
the lowest cost.
[/code][/QUOTE]

Your description of the [c]-q[/c] and [c]-Q[/c] flag usage seems to flip the flags around as it goes along. Can you please look over those again and clarify?

rogue 2022-12-12 19:09

Sorry about that. How about this:

[code]
Added -Q which will output estimated work for each possible q.
Added -q which can be used to specify the q to use (if that q is possible),
overriding the computed best q.

To use -Q, first sieve your sequence(s) to at least 1e6. This will ensure that
subsequent runs are using the correct sieving subroutines. Starting with the
output file, run that output file with the -Q flag then stop immediately after
it outputs a group of lines looking like this:

q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793

work is an estimated cost for that q with lower costs implying higher throughput.

To use -q, take each of the q values output from -Q and run a range of at least
1e9 to determine the actual amount of time it takes for that q. Using the
Run a range of at least 1e9 using -q to specific which q to run with selecting
the q output from using the -q flag. Although you can run all q, you can limit
to those q withing 20% of the lowest cost. For each run observe the total time
for that run to determine which q required the least amount of time. You should
then run the entire range with that q. This will not necessrily be the q with
the lowest cost.
[/code]

Happy5214 2022-12-12 19:34

[QUOTE=rogue;619566]Sorry about that. How about this:

[code]
Added -Q which will output estimated work for each possible q.
Added -q which can be used to specify the q to use (if that q is possible),
overriding the computed best q.

To use -Q, first sieve your sequence(s) to at least 1e6. This will ensure that
subsequent runs are using the correct sieving subroutines. Starting with the
output file, run that output file with the -Q flag then stop immediately after
it outputs a group of lines looking like this:

q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793

work is an estimated cost for that q with lower costs implying higher throughput.

To use -q, take each of the q values output from -Q and run a range of at least
1e9 to determine the actual amount of time it takes for that q. Using the [B][?][/B]
Run a range of at least 1e9 using -q to [B][STRIKE]specific[/STRIKE][/B][specify] which q to run [B][STRIKE]with[/STRIKE][/B][by] selecting
the q output from using the [B][STRIKE]-q[/STRIKE][/B][-Q] flag. Although you can run all q, you can limit
to those q withing 20% of the lowest cost. For each run observe the total time
for that run to determine which q required the least amount of time. You should
then run the entire range with that q. This will not [B][STRIKE]necessrily[/STRIKE][/B][necessarily] be the q with
the lowest cost.
[/code][/QUOTE]

I just marked up the quote to show what I best understood it to mean.

rogue 2022-12-12 21:06

Thanks.

Happy5214 2022-12-13 13:35

Can you make it stop automatically once it prints out the [i]q[/i]'s when passing the [c]-Q[/c] flag? Not having to worry about killing the sieve will make writing a script to orchestrate automated testing of [i]q[/i]'s easier.

rogue 2022-12-13 13:43

[QUOTE=Happy5214;619632]Can you make it stop automatically once it prints out the [i]q[/i]'s when passing the [c]-Q[/c] flag? Not having to worry about killing the sieve will make writing a script to orchestrate automated testing of [i]q[/i]'s easier.[/QUOTE]

Try using -A and passing an empty factor file with -I. I think that would work.

Happy5214 2022-12-13 14:51

[QUOTE=rogue;619633]Try using -A and passing an empty factor file with -I. I think that would work.[/QUOTE]

It does not. The following invocation:

[code]../../../bin/srsieve2cl -A -I /dev/null -i b2_n.boinc -Q[/code]

produces this:

[code]srsieve2cl v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Read 0 factors from /dev/null which removed 0 terms.[/code]

Reading from [C]/dev/null[/C] is an immediate EOF, and I also tested with an empty file in the same directory.

rogue 2022-12-13 14:59

Create an empty file in the directory. Use that with -I.

Happy5214 2022-12-13 16:10

[QUOTE=Happy5214;619638]It does not. The following invocation:

[code]../../../bin/srsieve2cl -A -I /dev/null -i b2_n.boinc -Q[/code]

produces this:

[code]srsieve2cl v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Read 0 factors from /dev/null which removed 0 terms.[/code]

Reading from [C]/dev/null[/C] is an immediate EOF, and [B]I also tested with an empty file in the same directory.[/B][/QUOTE]

[QUOTE=rogue;619639]Create an empty file in the directory. Use that with -I.[/QUOTE]

Please re-read my post. This was the aforementioned second run with the file in the same directory:

[code]$ ../../../bin/srsieve2cl -A -I blank_file -i b2_n.boinc -Q
srsieve2cl v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Read 0 factors from blank_file which removed 0 terms.[/code]

rogue 2022-12-13 17:21

So the use of -A and -I doesn't work.

How about this?

srsieve2cl -ifile.abcd -p1000000 -P1000100 -ofile.temp -Q

You can throw out file.temp after running this.

Happy5214 2022-12-13 18:21

[QUOTE=rogue;619648]So the use of -A and -I doesn't work.

How about this?

srsieve2cl -ifile.abcd -p1000000 -P1000100 -ofile.temp -Q

You can throw out file.temp after running this.[/QUOTE]

The file is sieved to 1T already, but experimentation shows that the [c]-P[/c] doesn't really matter much:

[code]$ ../../../bin/srsieve2cl -i b2_n.boinc -P 5 -o /dev/null -Q
srsieve2cl v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Must use generic sieving logic because -l was not specified for [B]mutiple[/B] sequences
Sieving with generic logic for p >= 1000000000000
q = 2 with 5 subseq yields bs = 1116, gs = 224, work = 2239
[i]a bunch more q's[/i]
q = 1 with 5 subseq yields bs = 1582, gs = 316, work = 3164
Split 5 base 2 sequences into 18 base 2^72 sequences.
Fatal Error: pmin must be less than pmax[/code]

Again, note the use of [c]/dev/null[/c] to toss the output (and the bolded typo).
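Since the [c]-Q[/c] lines have a fixed shape, a script can pull out each [i]q[/i] and its work estimate and then apply the "within 20% of the lowest cost" rule from the 2.3.8 notes. This is a minimal sketch: the function names are hypothetical, and reading "within 20%" as work <= 1.2 × the lowest work is an assumption about what the notes intend.

```python
import re

# Matches lines such as:
# q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793
Q_LINE = re.compile(
    r"q = (\d+) with (\d+) subseq yields bs = (\d+), gs = (\d+), work = (\d+)"
)


def parse_q_lines(text):
    """Return a list of (q, work) pairs found in the sieve output."""
    return [(int(m.group(1)), int(m.group(5))) for m in Q_LINE.finditer(text)]


def candidates_within(pairs, tolerance=0.20):
    """Keep the q values whose work is within `tolerance` of the lowest work."""
    if not pairs:
        return []
    lowest = min(work for _, work in pairs)
    limit = lowest * (1 + tolerance)
    return sorted(q for q, work in pairs if work <= limit)


if __name__ == "__main__":
    # The two q lines quoted from the run above.
    output = (
        "q = 2 with 5 subseq yields bs = 1116, gs = 224, work = 2239\n"
        "q = 1 with 5 subseq yields bs = 1582, gs = 316, work = 3164\n"
    )
    pairs = parse_q_lines(output)
    print(pairs)                     # [(2, 2239), (1, 3164)]
    print(candidates_within(pairs))  # [2]
```

Each surviving [i]q[/i] would then get its own timed run with [c]-q[/c], since the lowest estimate is not necessarily the fastest in practice.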

I can work with this now. Another question I had is about the interplay between [c]-q[/c] and [c]-g[/c]. Is the best [i]q[/i] (by speed, not score) going to be the fastest regardless of the value chosen for [c]-g[/c], allowing me to check for [i]q[/i], fix it, and check the [c]-g[/c] values; or do I have to check all (reasonable) [i]q[/i]/[c]-g[/c] combinations within that 20% [i]q[/i] score threshold, sort of like a 2×2 matrix?

rogue 2022-12-13 18:54

Some settings of -q will allow you to increase -g; others will require you to decrease -g, as -q can impact how much memory is needed by each kernel.
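Because the two settings interact, a test script probably has to time every reasonable ([i]q[/i], [c]-g[/c]) pair rather than fixing one first. The sketch below only builds the command lines for such a grid; it uses flags that appear in this thread ([c]-i[/c], [c]-p[/c], [c]-P[/c], [c]-q[/c], [c]-g[/c], [c]-o[/c]), while the function name, the output file naming pattern, and the example values are placeholders.

```python
from itertools import product


def build_trial_commands(q_values, g_values, input_file, pmin, pmax,
                         program="srsieve2cl"):
    """Build one command line per (q, g) pair for timing runs.

    pmin and pmax are passed through as strings so that forms like
    "1e12" reach the program unchanged.
    """
    commands = []
    for q, g in product(q_values, g_values):
        commands.append([
            program,
            "-i", input_file,
            "-p", str(pmin),
            "-P", str(pmax),
            "-q", str(q),
            "-g", str(g),
            "-o", f"trial_q{q}_g{g}.out",  # placeholder output name
        ])
    return commands


if __name__ == "__main__":
    # Two q values times three -g values gives six trial runs over a 1e9 range.
    for cmd in build_trial_commands([2, 4], [8, 16, 32],
                                    "b2_n.boinc", "1e12", "1001e9"):
        print(" ".join(cmd))
```

Each command can be run and timed separately; a pair that fails because the kernel needs too much memory can simply be skipped.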

ryanp 2022-12-13 23:06

Another bug in srsieve2?

[CODE]$ ./srsieve2 -P 4e14 -o out.txt -W 32 -s "11*2^n+1" -n 15e6 -N 25e6
srsieve2 v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 4e14 with 10000001 terms (15000000 < n < 25000000, k*2^n+1) (expecting 9673251 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Fatal Error: Expected 15 subsequences but 233 were created[/CODE]

storm5510 2022-12-14 01:08

I awakened very early this morning to find the latest [I]srsieve2cl[/I] has caused a reboot. Not sure why. It may be in the system logs. I haven't looked yet.

chalsall 2022-12-14 01:11

[QUOTE=storm5510;619693]I awakened very early this morning to find the latest [I]srsieve2cl[/I] has caused a reboot. Not sure why. It may be in the system logs. I haven't looked yet.[/QUOTE]

Please rest.

The kit can wait.

storm5510 2022-12-14 01:21

[QUOTE=chalsall;619694]Please rest.

The kit can wait.[/QUOTE]

Kit?

chalsall 2022-12-14 01:31

[QUOTE=storm5510;619695]Kit?[/QUOTE]

LOL... Glad to see you feeling better.

Kit is slang. It means the equipment you regularly work with. Or, in a military context, what you carry.

Nowadays it means the compute you spin up beside you. Or, near you. Sometimes "in the cloud".

Just to share... I hate it when people say "in the cloud", because usually they have no idea what the cloud actually is.

To share... People who think AI is going to end the world might want to study a bit about how neural networks work.

storm5510 2022-12-14 02:25

[QUOTE=chalsall;619696]LOL... Glad to see you feeling better...

Just to share... I hate it when people say "in the cloud", because usually they have no idea what the cloud actually is.[/QUOTE]


[U]Off-topic[/U]: Other than a mild headache and sneezing, it is mostly gone. A less than 24-hour deal. I get these now and then. A cloud system can be in a chicken house behind a barn in a rural area. It does not float. A server farm, more likely. It is a totally misapplied description.

rogue 2022-12-14 13:35

[QUOTE=ryanp;619687]Another bug in srsieve2?

[CODE]$ ./srsieve2 -P 4e14 -o out.txt -W 32 -s "11*2^n+1" -n 15e6 -N 25e6
srsieve2 v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 4e14 with 10000001 terms (15000000 < n < 25000000, k*2^n+1) (expecting 9673251 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Fatal Error: Expected 15 subsequences but 233 were created[/CODE][/QUOTE]

Interesting. This is the first time I have seen this get triggered. I wonder if this is something I accidentally introduced in the latest build for single sequences.

Trying2Sieve 2022-12-15 10:53

NewPGen Output from mtsieve?
 
Noob here. Kind of. I haven't found a large prime in about 10 years and I see so much has changed. Last one I found was ranked 84 on the list (2013-ish).

I now have access to a ridiculous number of cores and would like to start my esoteric prime hunting again.

I have a great custom parallelizer tool for NewPGen formatted output, but NewPGen is very slow.

For me, the mtsieve output file is confusing. I can't determine the exponents being tested by cursory inspection.

Is there a way to link the speed of mtsieve with the NewPGen output?

I tried the -N setting to no avail.

Any reply that helps will be greatly appreciated.

rogue 2022-12-15 17:10

[QUOTE=ryanp;619687]Another bug in srsieve2?

[CODE]$ ./srsieve2 -P 4e14 -o out.txt -W 32 -s "11*2^n+1" -n 15e6 -N 25e6
srsieve2 v1.6.6, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 4e14 with 10000001 terms (15000000 < n < 25000000, k*2^n+1) (expecting 9673251 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Fatal Error: Expected 15 subsequences but 233 were created[/CODE][/QUOTE]

I have fixed this and committed code changes. I ran into a segfault with multiple sequences when using Legendre tables. This does not appear to be new. I would like to fix it before posting a new build.

rogue 2022-12-15 17:18

[QUOTE=Trying2Sieve;619860]Noob here. Kind of. I haven't found a large prime in about 10 years and I see so much has changed. Last one I found was ranked 84 on the list (2013-ish).

I now have access to a ridiculous number of cores and would like to start my esoteric prime hunting again.

I have a great custom parallelizer tool for NewPGen formatted output, but NewPGen is very slow.

For me, the mtsieve output file is confusing. I can't determine the exponents being tested by cursory inspection.

Is there a way to link the speed of mtsieve with the NewPGen output?

I tried the -N setting to no avail.[/QUOTE]

Did you try to use -h to see what options are available for the program you are using?

Go [URL="https://www.mersenneforum.org/rogue/mtsieve.html"]here[/URL]. The default output for many sieves is ABCD format because it is compact. Some sieves do not support the ABCD format. The ABCD format is supported by pfgw. I do not know if llr supports that format off the top of my head. Some sieves have a -f option that allows you to specify the format of the output. As newpgen is not supported and since the most popular sieves it supports are either in the mtsieve framework or somewhere else, it is not recommended to use that format. The ABC format, which llr and pfgw both support, is preferred as the header of those files is not as cryptic as newpgen. The pfgw readme explains the ABC and ABCD formats. ABC is really easy to understand compared to newpgen.

kruoli 2022-12-15 19:39

When I try to continue a file I presieved with srsieve2 to 1e10 with srsieve2cl and Legendre tables, I get:
[C]Fatal Error: Expected 81 subsequences but 967 were created[/C]
Command line:
[C]srsieve2cl -p 1e10 -i remaining.abcd -g 16 -Q -o remaining_new.abcd -H -l {any value greater than 0}[/C]

Additionally, it takes a really long time to start the sieve. I omitted [C]-l[/C]. It used more than 20 GB RAM and has not started sieving after 20 minutes. I was trying to sieve the 77 remaining sequences of R53. Maybe I was doing something wrong?

The initial sieve was called as:
[C]srsieve2 -n 100e3 -N 250e3 -P 1e10 -W 16 -Q -o remaining.abcd -s k.in[/C]
k.in held all k's, one per line.

rogue 2022-12-15 23:50

[QUOTE=kruoli;619893]When I try to continue a file I presieved with srsieve2 to 1e10 with srsieve2cl and Legendre tables, I get:
[C]Fatal Error: Expected 81 subsequences but 967 were created[/C]
Command line:
[C]srsieve2cl -p 1e10 -i remaining.abcd -g 16 -Q -o remaining_new.abcd -H -l {any value greater than 0}[/C]

Additionally, it takes a really long time to start the sieve. I omitted [C]-l[/C]. It used more than 20 GB RAM and has not started sieving after 20 minutes. I was trying to sieve the 77 remaining sequences of R53. Maybe I was doing something wrong?

The initial sieve was called as:
[C]srsieve2 -n 100e3 -N 250e3 -P 1e10 -W 16 -Q -o remaining.abcd -s k.in[/C]
k.in held all k's, one per line.[/QUOTE]

The bug I have to fix is with multiple sequences and using Legendre tables.

I will take a look to see why it is using so much memory and takes so long to load. It shouldn't take that long.

Trying2Sieve 2022-12-16 01:58

[QUOTE=rogue;619884]Did you try to use -h to see what options are available for the program you are using?

Go [URL="https://www.mersenneforum.org/rogue/mtsieve.html"]here[/URL]. The default output for many sieves is ABCD format because it is compact. Some sieves do not support the ABCD format. The ABCD format is supported by pfgw. I do not know if llr supports that format off the top of my head. Some sieves have a -f option that allows you to specify the format of the output. As newpgen is not supported and since the most popular sieves it supports are either in the mtsieve framework or somewhere else, it is not recommended to use that format. The ABC format, which llr and pfgw both support, is preferred as the header of those files is not as cryptic as newpgen. The pfgw readme explains the ABC and ABCD formats. ABC is really easy to understand compared to newpgen.[/QUOTE]

Ok, I understand, there is a large base of support for the ABCD format.
Maybe I can do the equivalent in that format, [B](if I understood it)[/B] so let me show you what I am doing and maybe it will give you guys an idea or two also.

[CODE]
7 40
7 45
7 48
7 68
7 80
7 83
7 97
7 119
7 124
7 129
7 130
[/CODE]

That's the sample NewPGen code to find primes that start with 7, end in 1, and have 00000...00000 in the middle.
The number next to the 7 is the power of 10 that will be used, like 10^45, 10^48 etc.

It's a vertical text file, and each line will be read one at a time, as you all know and expect.

My "parallelizer" basically turns this into a huge rectangular grid of data points.

Something like this:

[CODE]
7 40 7 45 7 48 7 68 7 80 7 83 7 97
7 273 7 278 7 282 7 293 7 311 7 336 7 352
7 545 7 560 7 563 7 564 7 566 7 568 7 578
7 780 7 808 7 810 7 823 7 824 7 827 7 832
7 1082 7 1083 7 1091 7 1092 7 1106 7 1118 7 1123
7 1314 7 1317 7 1320 7 1346 7 1350 7 1352 7 1379
7 1526 7 1542 7 1546 7 1559 7 1565 7 1566 7 1569
7 1680 7 1682 7 1690 7 1701 7 1705 7 1721 7 1722
7 1940 7 1942 7 1943 7 1958 7 1964 7 1978 7 1980
7 2154 7 2156 7 2160
[/CODE]

Each [B]VERTICAL [/B]column of candidate primes will be given to a different core.

In this small example, I used 30-way parallelism to demonstrate its usefulness.

Candidate prime #2 on core one is actually #31 in the master sieve list.
Candidate prime #3 on core one is actually #61 in the master sieve list.

It's like "skipping ahead" 30 candidates at a time, or making the artificial sieve depth for that core nearly 97% better. I'll encounter larger primes more quickly on average, even if they are scattered haphazardly, because I am not testing them "in order."
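The column layout described here amounts to a round-robin split: candidate number i in the master list goes to core i mod (number of cores). A minimal sketch in Python, where [c]split_round_robin[/c] is a hypothetical name and the master list is just the numbers 1 to 90 standing in for candidate positions:

```python
def split_round_robin(candidates, workers):
    """Deal candidates out like cards: item i goes to worker i % workers."""
    return [candidates[start::workers] for start in range(workers)]


if __name__ == "__main__":
    master = list(range(1, 91))          # positions 1..90 in the sieve file
    columns = split_round_robin(master, 30)
    print(columns[0])                    # [1, 31, 61]
```

With 30 workers, the first core receives positions 1, 31 and 61 of the master list, which matches the numbering given above.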

With the ABCD format, I have no idea how to do this.

Or is this just a waste of time?

I'm sure I'll encounter larger primes quicker this way (I have 365 cores actually) so each day of searching should statistically be equivalent to one year's worth of probing on a single core.

[B]How can I do this in ABCD format?[/B]

And thanks Rogue, your contributions here have been enormous and undoubtedly everyone vastly appreciates your effort.

rogue 2022-12-16 03:55

None of the sieves in the framework sieve repdigits, which is what this appears to be. Which option in newpgen sieves that format?

For pfgw you would want the ABC format. The first line would be something like this:
ABC 7*10^($a+1)+1
40
45
48

Again, nothing in the framework that does that.

For distributing the work across many clients, if they are all on the same network, PRPNet is an option, but you would have to use the generic pfgw format in the server as the server doesn't support repdigit searches.

kruoli 2022-12-16 08:20

[QUOTE=rogue;619921]I will take a look to see why it is using so much memory and takes so long to load. It shouldn't take that long.[/QUOTE]

Thank you! After 12 hours, it still has not done anything, but memory usage reduced to around 1 GB. I will kill it now.

kruoli 2022-12-16 10:33

It looks like it has to do with the GPU I use. I now inserted a GT 1030 and it works fine with it in the same machine. No large memory usage on startup and no delay etc.

The card I wanted to use was an R9 290. This card works fine with other OpenCL programs such as mfakto and gpuOwl. So maybe srsieve2 is trying to do some optimisation on the R9 290 that it does not do on the GT 1030?

rogue 2022-12-16 14:05

[QUOTE=kruoli;619955]It looks like it has to do with the GPU I use. I now inserted a GT 1030 and it works fine with it in the same machine. No large memory usage on startup and no delay etc.

The card I wanted to use was an R9 290. This card works fine with other OpenCL programs such as mfakto and gpuOwl. So maybe srsieve2 is trying to do some optimisation on the R9 290 that it does not do on the GT 1030?[/QUOTE]

The OpenCL code knows nothing about the driver or the GPU. The code is generic. It could be an incompatibility between driver and GPU.

kruoli 2022-12-16 14:15

Since it is working with other programs, I wonder what exactly might be the problem in this case. If I call it with -H, it hangs before the OpenCL details get printed out. Do you know of any way we can track down exactly where it hangs and maybe find a workaround?

rogue 2022-12-16 14:24

[QUOTE=kruoli;619968]Since it is working with other programs, I wonder what exactly might be the problem in this case. If I call it with -H, it hangs before the OpenCL details get printed out. Do you know of any way we can track down exactly where it hangs and maybe find a workaround?[/QUOTE]

Does it hang with one card or with both?

Are the drivers out of date?

The only thing one could do is make with debug=yes and let it run then ^C after it "hangs" to see where it is.

If I had to guess, it is hanging when creating the kernel.

kruoli 2022-12-16 14:33

Since I assume you mean each card individually: It only hangs with the R9 290. I now have both of them in the machine and one works while the other does not; but the other programs still work. So maybe the AMD card needs some specialty the GT 1030 does not need.

I will try to build it myself later or this weekend. If I can find something, I will try to find a workaround. I programmed OpenCL a few years ago, so maybe I will spot something.

The driver [I]should[/I] be fine since it is the one AMD suggests on their site for this GPU.

If I find out something and/or find a workaround, I will inform you, thank you! :smile:

PS: When running the GT 1030 with sufficiently large -g, srsieve2cl still uses consistently one full core while doing 60K p/sec and around 1 f/sec. Is this expected, some kind of overhead, factor verification? I assumed it should not be the prime generation, since I can run much faster with e.g. 16 cores.

Happy5214 2022-12-16 14:37

Can you add a flag to either suppress the log file generation or redirect it to another file? My script (the one I mentioned I was writing for testing [i]q[/i] values) will likely run in mixed environments, some where a log already exists (and I don't want it deleted), and some where no log exists (and I don't want one created).

rogue 2022-12-17 00:10

[QUOTE=kruoli;619973]Since I assume you mean each card individually: It only hangs with the R9 290. I now have both of them in the machine and one works while the other does not; but the other programs still work. So maybe the AMD card needs some specialty the GT 1030 does not need.

I will try to build it myself later or this weekend. If I can find something, I will try to find a workaround. I programmed OpenCL a few years ago, so maybe I will spot something.

The driver [I]should[/I] be fine since it is the one AMD suggests on their site for this GPU.

If I find out something and/or find a workaround, I will inform you, thank you! :smile:

PS: When running the GT 1030 with sufficiently large -g, srsieve2cl still uses consistently one full core while doing 60K p/sec and around 1 f/sec. Is this expected, some kind of overhead, factor verification? I assumed it should not be the prime generation, since I can run much faster with e.g. 16 cores.[/QUOTE]

If this is on Windows, I suspect something with Windows because I have seen the same behavior.

rogue 2022-12-17 00:12

[QUOTE=Happy5214;619974]Can you add a flag to either suppress the log file generation or redirect it to another file? My script (the one I mentioned I was writing for testing [i]q[/i] values) will likely run in mixed environments, some where a log already exists (and I don't want it deleted), and some where no log exists (and I don't want one created).[/QUOTE]

Unlikely. That would require large changes to the framework. I do not understand why the log file is an issue. Do you have a script reading from it? You can run a copy of the programs from different directories. They will not share a log file. I do not understand why you would want to suppress it.
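
For a wrapper script, the separate-directory suggestion can be automated by launching each run from a throwaway working directory. The sketch below is Python and rests on an assumption this thread does not confirm: that the sieve writes its log relative to the current working directory. The helper name run_in_scratch_dir is hypothetical, and the demo command is a stand-in child process, not a real srsieve2 invocation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(cmd):
    """Run cmd with a fresh temporary directory as its working directory.

    Returns (exit code, sorted names of the files the command left there).
    The directory and its contents are deleted on return, so input and
    output files that must survive should be passed as absolute paths.
    ASSUMPTION: the program writes its log relative to the working directory.
    """
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(cmd, cwd=scratch)
        leftovers = sorted(p.name for p in Path(scratch).iterdir())
    return result.returncode, leftovers


# Stand-in for a sieve run: a child Python process that writes "demo.log"
# into whatever directory it is started from.
demo_cmd = [sys.executable, "-c", "open('demo.log', 'w').write('started')"]
code, files = run_in_scratch_dir(demo_cmd)
print(code, files)
```

With this approach an existing log in the caller's directory is never touched, and no new log is left behind once the scratch directory is removed.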

Trying2Sieve 2022-12-17 08:55

[QUOTE=rogue;619942]None of the sieves in the framework sieve repdigits, which is what this appears to be. Which option in newpgen sieves that format?

For pfgw you would want the ABC format. The first line would be something like this:
ABC 7*10^($a+1)+1
40
45
48

Again, nothing in the framework that does that.

For distributing the work across many clients, if they are all on the same network, PRPNet is an option, but you would have to use the generic pfgw format in the server as the server doesn't support repdigit searches.[/QUOTE]

But I'm [B]NOT[/B] looking for repdigits.

7 * 10^n + 1 is a form of candidate primes that is supported by almost every siever and of course pfgw.

So my original question still stands, I think.

How can I REORGANIZE the output of a fast sieve?

It seems none of them write out the unsieved exponents the way NewPGen does.

I see output that looks like this:

[CODE]
ABCD 7*10^$a+1 [1] // Sieved to 9147679528817
1
1
1
1
3
1
31
5
3
49
22
5
5
1
6
6
6
4
6
9
[/CODE]


I have no idea which exponents remain. How can we possibly know from such a format?
How can I change this to something that shows the unsieved exponents?

kruoli 2022-12-17 09:14

[QUOTE=rogue;620021]If this is on Windows, I suspect something with Windows because I have seen the same behavior.[/QUOTE]

It looks like it has to do with the OpenCL compiler. For example, if I use -K 10 and -H, it takes around three minutes to show the OpenCL details and half an hour to start sieving. But then it works!

rogue 2022-12-17 13:54

[QUOTE=kruoli;620060]It looks like it has to do with the OpenCL compiler. For example, if I use -K 10 and -H, it takes around three minutes to show the OpenCL details and half an hour to start sieving. But then it works![/QUOTE]

If you need -K10, then the GPU is slow or lacking memory. A CPU will likely be faster.

rogue 2022-12-17 13:56

As I have already stated, use the ABC format. That is one of the options with -f.
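
For a file that is already in ABCD format, the remaining exponents can be recovered without re-sieving. In pfgw's ABCD format the bracketed number in the header is the first value of $a, and each following line is the increment from the previous value, so a running sum gives the exponents. Below is a minimal Python sketch, assuming a single-variable header shaped like the sample posted above. The names decode_abcd and to_abc are hypothetical helpers, not part of mtsieve or pfgw.

```python
import re
from itertools import accumulate

# Header shape taken from the sample in this thread:
#   ABCD <expression> [<first value>] // optional comment
HEADER = re.compile(r"^ABCD\s+(?P<expr>.+?)\s+\[(?P<start>-?\d+)\]")


def decode_abcd(lines):
    """Return (expression, exponents) for a single-variable ABCD block."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError("first line is not a single-variable ABCD header")
    deltas = [int(ln) for ln in lines[1:]]
    # Running sum: the header value is the first term, each delta adds to it.
    exponents = list(accumulate(deltas, initial=int(match["start"])))
    return match["expr"], exponents


def to_abc(expr, exponents):
    """Render the same candidates as ABC text, one explicit exponent per line."""
    return "\n".join([f"ABC {expr}"] + [str(n) for n in exponents])


SAMPLE = """ABCD 7*10^$a+1 [1] // Sieved to 9147679528817
1
1
1
1
3
1
31
5
3
49
22
5
5
1
6
6
6
4
6
9
"""

expr, exponents = decode_abcd(SAMPLE.splitlines())
print(exponents[:10])  # -> [1, 2, 3, 4, 5, 8, 9, 40, 45, 48]
print(to_abc(expr, exponents).splitlines()[0])  # -> ABC 7*10^$a+1
```

The decoded values 40, 45 and 48 match the explicit ABC lines quoted earlier in the thread, which is consistent with the delta reading of the format. Sieving again with the ABC option of -f avoids the conversion step altogether.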

rogue 2022-12-17 15:20

I have posted 2.3.9 over at sourceforge.

This fixes the -q/-Q issue with srsieve2cl and a segfault I encountered in srsieve2cl.

