mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

rogue 2018-06-21 15:58

To this point I have only been using extended FPU and SSE routines within the sieving code. Starting with code written by Ernst, I have written AVX routines that will improve performance on the CPU. I estimate between 30% and 50% faster sieving. Now that I understand AVX much better, AVX512 is a possibility, but I don't have access to a CPU with AVX512 support.

It will take time for me to integrate into the various sieves and some sieves will not be a good candidate for the AVX routines. One example of this is cksieve. I will need to evaluate that separately.

rogue 2018-06-25 20:36

I have released mtsieve 1.6. Here are the changes:

[code]
Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k < n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.

Added AVX asm code for use by the Worker classes.
Added a mini-chunk mode that can be used when the worker classes
handles primes in chunks, such as AVX mode, which is chunks of 16 primes.

gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynold's version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version.
[/code]

You can get the latest code with Windows builds [URL="http://mersenneforum.org/rogue/mtsieve.html"]here[/URL].

I expect the AVX routines to fail on non-Windows OSes. If they do then I know what I need to fix. It is a matter of finding the time. I've been doing some refactoring with the hope that using this framework becomes easier for others once that is done.

If anyone is truly interested in helping me, I need help adding AVX support to the other sieves. If interested, please contact me via PM or e-mail.

henryzz 2018-06-26 08:25

I need to give you my code for [TEX]b_1^{n_1}b_2^{n_2}+c[/TEX] at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.

It also needs to be converted to use SSE2/AVX where possible.

rogue 2018-06-26 16:36

[QUOTE=henryzz;490582]I need to give you my code for [TEX]b_1^{n_1}b_2^{n_2}+c[/TEX] at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.

It also needs to be converted to use SSE2/AVX where possible.[/QUOTE]

Good to know that someone is trying to use my framework for their own benefit.

You need to call CpuSupportsAvx() to determine if the CPU has AVX support. This is declared in Worker.h. If it returns false then you need to code with the FPU or SSE routines.

I suggest that you a look at avx-asm-x86.h as well as CullenWoodallWorker.cpp to help familiarize yourself with the AVX routines.

One other caveat is that you cannot use the AVX code and have data of type double (double * is fine). This is because the AVX routines don't save the xmm registers upon entry and restore them upon exit. I'll address that limitation in an upcoming release.

rogue 2018-07-21 21:44

I have posted mtsieve 1.7 to my website. Here are the changes:

[code]
Added a timestamp to liens written to the log.
Canged usage of some registers in the AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines.
Added psieve for primorials.

psieve supports AVX and is about 30% faster than fpsieve.
[/code]

pepi37 2018-07-25 08:34

fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07


So simple math show that speed of this sieve is not 25350000 but much higher 952380952


Or 25.35M p/sec means something different?

henryzz 2018-07-25 08:51

[QUOTE=pepi37;492446]fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07


So simple math show that speed of this sieve is not 25350000 but much higher 952380952


Or 25.35M p/sec means something different?[/QUOTE]

Primes vs range of p?

axn 2018-07-25 09:40

[QUOTE=henryzz;492447]Primes vs range of p?[/QUOTE]

Sure sounds like it. log(92471161196761) ~= 32. So there will be a factor of 32 between the prime range method and the prime count method. But OP's calculation has a factor of 37 -- don't know how.
(122230982980247 - 92230982980247 )*0.8% / 308 ~= 779m, not 952m.

ATH 2018-07-25 10:21

2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32.
The -c and -C are also called kmin and kmax instead of cmin and cmax.

rogue 2018-07-25 16:18

[QUOTE=henryzz;492447]Primes vs range of p?[/QUOTE]

Correct. "range of p" is really primes between -p and -P. The software only tests primes in that range, so counting primes that were tested makes a lot more sense.

rogue 2018-07-25 16:23

[QUOTE=ATH;492453]2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32.
The -c and -C are also called kmin and kmax instead of cmin and cmax.[/QUOTE]

Thanks for the heads-up. I'll fix in the next release.


All times are UTC. The time now is 20:01.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.