mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

ET_ 2018-09-19 13:23

Hi folks.

I have a basic Core 2 machine at home, and an AVX-512 capable 4-core server via AWS. Now, I need a copy of [B]gfndsieve[/B] compiled for [B]Linux 64-bit[/B] with all the [B]AVX optimizations[/B] turned on and no OpenCL, but on the AWS system I only have gcc version 4.8 and can't compile my optimized copy.

Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine?

Thanks in advance.

Luigi
---

rogue 2018-09-19 13:33

[QUOTE=ET_;496357]Hi folks.

I have a basic Core 2 machine at home, and an AVX-512 capable 4-core server via AWS. Now, I need a copy of [B]gfndsieve[/B] compiled for [B]Linux 64-bit[/B] with all the [B]AVX optimizations[/B] turned on and no OpenCL, but on the AWS system I only have gcc version 4.8 and can't compile my optimized copy.

Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine?[/QUOTE]

There is no AVX512 code (yet), and the choice between AVX and SSE/FPU is made at runtime based upon the capability of the CPU.

To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile.

Once you do that, what issues are you getting on that box with gcc?

ET_ 2018-09-19 14:05

[QUOTE=rogue;496360]There is no AVX512 code (yet), and the choice between AVX and SSE/FPU is made at runtime based upon the capability of the CPU.

To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile.

Once you do that, what issues are you getting on that box with gcc?[/QUOTE]

I knew there were distinct code paths in the executable, but I thought one had to enable the relevant processor optimizations at compile time for the code to recognize them.

In other words, if my code is compiled with -march=native, and I have an Intel G2030 processor (a crippled Ivy Bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake architecture?

If so, then I solved the issue.
If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations.

I'm a master in complicating my own life... :sad:

rogue 2018-09-19 14:32

[QUOTE=ET_;496363]I knew there were distinct code paths in the executable, but I thought one had to enable the relevant processor optimizations at compile time for the code to recognize them.

In other words, if my code is compiled with -march=native, and I have an Intel G2030 processor (a crippled Ivy Bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake architecture?

If so, then I solved the issue.
If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations.

I'm a master in complicating my own life... :sad:[/QUOTE]

It should. It calls a gcc function called __builtin_cpu_supports() when deciding if it can use AVX code. I assume that function checks something specific to the computer upon which the code is executing.
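
For readers who want to see what such a runtime check looks like, here is a minimal sketch. It is not taken from the mtsieve sources: the names SieveKernel, pick_kernel and kernel_name are hypothetical, and the only real API used is the GCC/Clang x86 builtin pair __builtin_cpu_init() and __builtin_cpu_supports(). The point is that the feature test runs on the machine executing the binary, so the result does not depend on where the binary was built.

```cpp
#include <string>

// Hypothetical names for illustration only; mtsieve's own code may differ.
enum class SieveKernel { Generic, Avx };

// Choose a code path from the features of the CPU the program is running on.
SieveKernel pick_kernel() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang builtins for x86: initialise the CPU feature data,
    // then ask whether the running CPU reports AVX support.
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return SieveKernel::Avx;
#endif
    // Fallback for CPUs without AVX, or for non-x86 / non-GCC builds.
    return SieveKernel::Generic;
}

// Human-readable label for the selected path.
std::string kernel_name(SieveKernel k) {
    return k == SieveKernel::Avx ? "AVX" : "SSE/FPU";
}
```

Printing kernel_name(pick_kernel()) at startup would be one way to report which path was chosen. Note that a binary built with -march=native on an AVX machine can still crash on a CPU without AVX, because the compiler may use AVX instructions outside the guarded path; a runtime check only protects the code it guards.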

ET_ 2018-09-19 14:50

[QUOTE=rogue;496366]It should. It calls a gcc function called __builtin_cpu_supports() when deciding if it can use AVX code. I assume that function checks something specific to the computer upon which the code is executing.[/QUOTE]

Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?

ET_ 2018-09-19 15:47

[QUOTE=ET_;496369]Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?[/QUOTE]

With 50G primes tested using the same executable, the AVX code path on the Xeon is 35%-40% faster than the base code path at the same clock.

rogue 2018-09-19 16:51

[QUOTE=ET_;496369]Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?[/QUOTE]

Not at this time, but it is something I have considered adding.

rogue 2018-09-26 01:36

I have posted mtsieve 1.8.0 at my website. Here are the changes:

[code]
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.

Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have a custom "start sieving" message so that each shows more detail specific to that sieve.
[/code]

The default behavior of twinsieve is to sieve such that only potential twin primes are remaining, but there is a -i switch that allows one to sieve the +1 and -1 side independently.

pepi37 2018-09-26 07:33

[QUOTE=rogue;496776]I have posted mtsieve 1.8.0 at my website. Here are the changes:

[code]
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.

Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have a custom "start sieving" message so that each shows more detail specific to that sieve.
[/code]

The default behavior of twinsieve is to sieve such that only potential twin primes are remaining, but there is a -i switch that allows one to sieve the +1 and -1 side independently.[/QUOTE]


In twinsieve you use the -i switch in two different places:

-i --inputterms=i input file of remaining candidates
-i --independent Sieve +1 and -1 independently
if (!ib_OnlyTwins && it_Format == FF_ABC)
FatalError("Can only support ABC format if sieving +1 and -1 independently");
If I use --independent, there is always zero output, regardless of the ABC format.


d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r
twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors)


If the -r switch is on the command line you get this; if you remove it, all is OK.
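
A note on the numbers in that output: 18446744073709502166 is exactly 2^64 - 49450, which is what an unsigned 64-bit counter shows after it has been decremented below zero. Whether twinsieve's term counter really underflows this way is not confirmed anywhere in this thread, so the following is only a sketch of the arithmetic, with hypothetical function names.

```cpp
#include <cstdint>

// Hypothetical helper: subtract removed terms from an unsigned 64-bit count.
// If more terms are removed than were counted, the result wraps modulo 2^64.
std::uint64_t remaining_terms(std::uint64_t total, std::uint64_t removed) {
    return total - removed;
}

// Reinterpret the same value as a signed count to expose the wraparound.
// (Two's complement behaviour, which is what mainstream compilers provide.)
std::int64_t as_signed(std::uint64_t terms) {
    return static_cast<std::int64_t>(terms);
}
```

With these helpers, remaining_terms(0, 49450) gives 18446744073709502166, and as_signed() of that value gives -49450. A count that large alongside a k range of only 261 < k < 99309 is a hint that the counter was modified before it was properly initialised, or was decremented too often.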

pepi37 2018-09-26 12:19

And one last thing:
If the sieve passes 54105949591 (or gets very close to this value), there is no output and the program just terminates.
If the sieve depth is lower than that value, the program writes its output as it should.

rogue 2018-09-26 13:08

[QUOTE=pepi37;496791]In twinsieve you use the -i switch in two different places:

-i --inputterms=i input file of remaining candidates
-i --independent Sieve +1 and -1 independently
if (!ib_OnlyTwins && it_Format == FF_ABC)
FatalError("Can only support ABC format if sieving +1 and -1 independently");
If I use --independent, there is always zero output, regardless of the ABC format.


d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r
twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors)

If the -r switch is on the command line you get this; if you remove it, all is OK.[/QUOTE]

I'll switch it to use a different character as -i is reserved for the underlying framework.
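
A clash like this can be caught mechanically before it reaches users. The sketch below is an assumption about how one might do it, not a description of how mtsieve registers its options: OptionSpec and duplicate_short_options are hypothetical names. It scans a table of option definitions and reports every short letter that is claimed more than once.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one command-line option.
struct OptionSpec {
    char shortName;        // e.g. 'i'
    std::string longName;  // e.g. "inputterms"
};

// Return each short letter that is used by more than one option,
// in the order the first clash for that letter is encountered.
std::vector<char> duplicate_short_options(const std::vector<OptionSpec>& opts) {
    std::set<char> seen;      // letters encountered so far
    std::set<char> reported;  // letters already added to the result
    std::vector<char> dups;
    for (const auto& o : opts) {
        bool isNew = seen.insert(o.shortName).second;
        if (!isNew && reported.insert(o.shortName).second)
            dups.push_back(o.shortName);
    }
    return dups;
}
```

Fed the two definitions quoted above, {'i', "inputterms"} and {'i', "independent"}, the function returns a single entry, 'i'. Running such a check once at startup, over the combined framework and sieve-specific options, would flag the conflict immediately.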

