mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

rogue 2022-12-02 14:44

[QUOTE=storm5510;618821]After doing more "digging" on my external drive, I found the batch files I had written for [I]srsieve[/I] and [I]sr1sieve[/I]. [I]srsieve[/I] runs to a point then [I]sr1sieve[/I] takes over after [I]srfile[/I] does a conversion. This may take some time.

Many thanks! :smile:[/QUOTE]

There is no reason to use srsieve anymore. Use srsieve2, even if you don't have a GPU. Without a GPU, sieve to 1e6 with srsieve2, then switch to sr1sieve/sr2sieve. The default output format from srsieve2 (ABCD) can be read by the current versions of sr1sieve/sr2sieve.
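
As a rough sketch of that two-stage, CPU-only workflow (the [C]-o[/C] and [C]-i[/C] terms-file options are the usual mtsieve/sr1sieve ones and are assumed here; the sequence and file names are only placeholders):

[CODE]srsieve2 -n 1e3 -N 1e6 -P 1e6 -s "101*2^n+1" -o terms.abcd
sr1sieve -i terms.abcd -o terms_out.txt -P 1e12[/CODE]

The first command sieves to p = 1e6 and writes the surviving terms in ABCD format; the second reads that file and continues the sieve much deeper.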

storm5510 2022-12-02 17:43

[QUOTE=rogue;618839]There is no reason to use srsieve anymore. Use srsieve2, even if you don't have a GPU. Without a GPU, sieve to 1e6 with srsieve2, then switch to sr1sieve/sr2sieve. The default output format from srsieve2 (ABCD) can be read by the current versions of sr1sieve/sr2sieve.[/QUOTE]

I did. [I]srsieve2cl[/I]. The GPU utilization never dropped below 80%. Consider the following:

[CODE]srsieve2cl -n 1e3 -N 15e6 -P 5e9 -M 3500 -s "101*2^n+1"[/CODE]

This is me experimenting with the switches. I ended up with 831,020 remaining terms, which is far too many to be practical for any LLR run. The largest n in an output file never appears to exceed the value of [C]-N[/C]. The run quoted above took five minutes. I believe I need to [U]greatly[/U] increase the value of [C]-P[/C]. I will try this again with [C]-P[/C] at 100e9 and see what is left over.

[U]Note[/U]: I used 101 in the sequence because I knew it was a prime number, just not a Mersenne prime. I was receiving GPU messages until I raised [C]-M[/C] to 3,500.
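
A continuation run along those lines might look like this (a sketch only: it assumes srsieve2cl accepts the usual mtsieve [C]-i[/C]/[C]-o[/C] terms-file options so the sieving already done to 5e9 is not repeated, and the file name is just a placeholder for whatever the first run wrote out):

[CODE]srsieve2cl -i terms.abcd -o terms.abcd -P 100e9 -M 3500[/CODE]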

rogue 2022-12-02 19:28

[QUOTE=storm5510;618850]I did. [I]srsieve2cl[/I]. The GPU utilization never dropped below 80%. Consider the following:

[CODE]srsieve2cl -n 1e3 -N 15e6 -P 5e9 -M 3500 -s "101*2^n+1"[/CODE]

This is me experimenting with the switches. I ended up with 831,020 remaining terms, which is far too many to be practical for any LLR run. The largest n in an output file never appears to exceed the value of [C]-N[/C]. The run quoted above took five minutes. I believe I need to [U]greatly[/U] increase the value of [C]-P[/C]. I will try this again with [C]-P[/C] at 100e9 and see what is left over.

[U]Note[/U]: I used 101 in the sequence because I knew it was a prime number, just not a Mersenne prime. I was receiving GPU messages until I raised [C]-M[/C] to 3,500.[/QUOTE]

Hopefully the speed is to your liking. Use -g; the default is 8. I recommend a power of 2, such as -g16 or -g32. That should increase GPU utilization. Once you get to a P of about 1e9, or maybe even 1e10, you will need to add -M. In the future I will modify the code so that -M is adjusted automatically while running.

Yes, no n in the output file will be outside of the range you specified on the command line.
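
Using the command posted earlier with that suggestion applied, it would look something like this (only [C]-g[/C] added; the other values are unchanged from that post):

[CODE]srsieve2cl -n 1e3 -N 15e6 -P 5e9 -M 3500 -g16 -s "101*2^n+1"[/CODE]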

storm5510 2022-12-02 22:43

[QUOTE=rogue;618858]Hopefully the speed is to your liking. Use -g; the default is 8. I recommend a power of 2, such as -g16 or -g32. That should increase GPU utilization. Once you get to a P of about 1e9, or maybe even 1e10, you will need to add -M. In the future I will modify the code so that -M is adjusted automatically while running.

Yes, no n in the output file will be outside of the range you specified on the command line.[/QUOTE]

I adjusted [C]-P[/C] to 100e9. The run took 1.3 hours. The number of remaining terms was about half of what I had before. The GPU stayed below 50°C, so that is really good. I will give [C]-g 16[/C] a try. Perhaps the high number of terms is related to the size of the number at the front of the series. I should use something larger.

rogue 2022-12-02 23:30

[QUOTE=storm5510;618874]I adjusted [C]-P[/C] to 100e9. The run took 1.3 hours. The number of remaining terms was about half of what I had before. The GPU stayed below 50°C, so that is really good. I will give [C]-g 16[/C] a try. Perhaps the high number of terms is related to the size of the number at the front of the series. I should use something larger.[/QUOTE]

The range of n determines how many terms you start with so I'm not certain what you mean by "use something larger".

storm5510 2022-12-03 05:58

[QUOTE=rogue;618878]The range of n determines how many terms you start with so I'm not certain what you mean by "[B]use something larger[/B]".[/QUOTE]

Using a larger value for [I]k[/I].

You are to be congratulated for the amazing performance increase. I can now run sieves in a few hours that would have taken days three years ago, when I was running Riesel candidates for [URL="https://www.rieselprime.de/ziki/Main_Page"]Prime Wiki[/URL].

[U]Off-topic[/U]: I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slowdown starting around 800K for [I]n[/I]. The [I]k[/I] value did not matter much.

rogue 2022-12-03 14:06

[QUOTE=storm5510;618889]You are to be congratulated for the amazing performance increase. I can now run sieves in a few hours that would have taken days three years ago, when I was running Riesel candidates for [URL="https://www.rieselprime.de/ziki/Main_Page"]Prime Wiki[/URL].

[U]Off-topic[/U]: I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slowdown starting around 800K for [I]n[/I]. The [I]k[/I] value did not matter much.[/QUOTE]

Thank you!

The speed of llr/pfgw is a result of the FFT size needed to do the PRP/primality test. This is primarily driven by n, since k is only a handful of bits and n is many thousands of bits. There are some GPU programs that can do PRP/primality tests, such as llrCUDA, proth20, and various versions of genefer. proth20 is limited to base 2. genefer is limited to GFNs.
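
To put rough numbers on that: a candidate like 101*2^800000+1 is about 800,000 bits, or roughly 800,000 × log10(2) ≈ 240,000 decimal digits, and k = 101 contributes only 7 of those bits. Even a 30-bit k would change the size of the number, and hence the FFT length the test needs, by well under 0.01%, so the slowdown tracks n and k hardly matters.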

kruoli 2022-12-03 15:53

[QUOTE=storm5510;618889]I am not aware of anything being done with the [I]LLR[/I] group of programs. In my case, there was a major slowdown starting around 800K for [I]n[/I]. The [I]k[/I] value did not matter much.[/QUOTE]

There is a [URL="http://mersenneforum.org/showthread.php?t=28170"]CUDA version[/URL] of LLR. If you want to test larger numbers, you might be interested in trying this on your 2080.

storm5510 2022-12-04 00:30

[QUOTE=kruoli;618911]There is a [URL="http://mersenneforum.org/showthread.php?t=28170"]CUDA version[/URL] of LLR. If you want to test larger numbers, you might be interested in trying this on your 2080.[/QUOTE]

Thanks. I tried it on my Linux box (Ubuntu 20.04.4 LTS). That system has a GTX 1080 in it, and it is not a good performer because of limited power. Tests started out taking 13 seconds each and got slower as they went on. The same system is a dual-boot of sorts: Windows 7 lives on one drive, Ubuntu on another, and I switch the drive cables. The Windows version began the same tests at about 3 seconds each. It is an HP workstation, probably 8 years old and long in the tooth. My 2080 is in a Windows 10 system from 2018. I make do with what I have.

[QUOTE=rogue]Thank you![/QUOTE]

I give recognition when due. It was decidedly due here! I am quite pleased with it. :smile:

Honza 2022-12-04 08:31

Is there a Windows binary for the latest version of LLRCUDA somewhere?

pepi37 2022-12-04 11:24

[QUOTE=Honza;618948]Is there Windows binary for latest version of LLRCUDA somewhere?[/QUOTE]

Nope, just for Linux, and as far as I know it is not fast enough (in my case).

