mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

rogue 2021-01-10 19:04

I have released 2.1.4. Here are the changes:

[code]
framework:
Fixed an issue with creating GPU kernels on OS X.

srsieve2cl: new release
Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2.
On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not
know what the limits are for other GPUs. It will switch to the GPU at p>1e6.
[/code]

On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.

rebirther 2021-01-10 19:30

[QUOTE=rogue;568920]I have released 2.1.4. Here are the changes:

[code]
framework:
Fixed an issue with creating GPU kernels on OS X.

srsieve2cl: new release
Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2.
On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not
know what the limits are for other GPUs. It will switch to the GPU at p>1e6.
[/code]

On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.[/QUOTE]

How much VRAM is used for 5000 sequences and 80000?

rogue 2021-01-10 19:43

[QUOTE=rebirther;568925]How much VRAM is used for 5000 sequences and 80000?[/QUOTE]

3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).

I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.

Citrix 2021-01-11 01:35

[QUOTE=rogue;568920]I have released 2.1.4. Here are the changes:

[code]
framework:
Fixed an issue with creating GPU kernels on OS X.

srsieve2cl: new release
Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2.
On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not
know what the limits are for other GPUs. It will switch to the GPU at p>1e6.
[/code]

On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.[/QUOTE]

I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?

rogue 2021-01-11 03:24

[QUOTE=Citrix;568959]I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?[/QUOTE]

I do not look at p/sec as it is calculated differently. I look at factors per second. It is far more accurate. Nevertheless srsieve2 and sr2sieve can be faster if your GPU isn't particularly fast.

Dylan14 2021-01-11 19:00

Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
[LIST][*]improved caching of primes[*]improved switch statement in EratSmall and EratMedium[*]cache size detection improved on Linux and with Apple Silicon CPUs (which could be useful for compiling this for ARM)[/LIST]

rebirther 2021-01-11 19:04

[QUOTE=rogue;568928]3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).

I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.[/QUOTE]


I have now tried the cl version on an RX 5500 XT with 8GB of VRAM but hit the limit: there was a driver timeout because too much memory was used (I think it was 7.4GB).

[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B]

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24GB of VRAM.

rogue 2021-01-11 19:36

[QUOTE=rebirther;569010]I have now tried the cl version on an RX 5500 XT with 8GB of VRAM but hit the limit: there was a driver timeout because too much memory was used (I think it was 7.4GB).

[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B]

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24GB of VRAM.[/QUOTE]

Try using a lower value for -g (10 is the default). That should reduce some of the GPU memory usage.

rogue 2021-01-11 19:38

[QUOTE=Dylan14;569009]Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
[LIST][*]improved caching of primes[*]improved switch statement in EratSmall and EratMedium[*]cache size detection improved on Linux and with Apple Silicon CPUs (which could be useful for compiling this for ARM)[/LIST][/QUOTE]

That shouldn't be too hard to do.

rebirther 2021-01-11 19:54

[QUOTE=rebirther;569010]I have now tried the cl version on an RX 5500 XT with 8GB of VRAM but hit the limit: there was a driver timeout because too much memory was used (I think it was 7.4GB).

[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B]

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24GB of VRAM.[/QUOTE]

vs Ryzen 3950X with 16 cores

[B]srsieve2 -n2501 -N10000 -P1e9 -W16 -spl_remain.txt -fB[/B]

2021-01-11 20:50:35: Sieve completed at p=1000000007. Primes tested 50847420. Found 92827983 factors. 10729517 terms remaining. Time 4990.80 seconds

The CPU reduces the sievefile a bit more than GPU.
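
For a rough sense of scale, the two runs above can be compared by wall-clock time and by factors per second. This is only a sketch using the numbers from the two "Sieve completed" lines; the variable and function names are my own, and the runs are not strictly like-for-like since they stopped at slightly different p and left different term counts.

```python
# Figures copied from the two "Sieve completed" log lines above.
runs = {
    "srsieve2cl (GPU)": {"factors": 87459308, "seconds": 239.43},
    "srsieve2 -W16 (CPU)": {"factors": 92827983, "seconds": 4990.80},
}


def factors_per_second(run):
    """Average factor removal rate over the whole run."""
    return run["factors"] / run["seconds"]


gpu = runs["srsieve2cl (GPU)"]
cpu = runs["srsieve2 -W16 (CPU)"]

# How many times longer the CPU run took, and how the removal rates compare.
wall_clock_ratio = cpu["seconds"] / gpu["seconds"]
rate_ratio = factors_per_second(gpu) / factors_per_second(cpu)

for name, run in runs.items():
    print(f"{name}: {factors_per_second(run):,.0f} factors/sec")
print(f"wall-clock ratio (CPU/GPU): {wall_clock_ratio:.1f}x")
print(f"factor-rate ratio (GPU/CPU): {rate_ratio:.1f}x")
```

Both ratios come out at roughly 20x in favour of the GPU run for this particular file and hardware.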

rebirther 2021-01-11 20:09

[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 5000 -g 5 -spl_remain.txt -fB[/B]

I tried reducing the values, but it ends in a driver timeout every time. VRAM usage was 6.5-6.8GB. The error message came from the AMD driver (amdkmdag in the Windows log).

rogue 2021-01-11 21:57

[QUOTE=rebirther;569021][B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 5000 -g 5 -spl_remain.txt -fB[/B]

I tried reducing the values, but it ends in a driver timeout every time. VRAM usage was 6.5-6.8GB. The error message came from the AMD driver (amdkmdag in the Windows log).[/QUOTE]

I'm not certain how to address that. How many sequences are in the file? Can you cut the number of sequences in half and see if that works?

rebirther 2021-01-12 16:44

[QUOTE=rogue;569027]I'm not certain how to address that. How many sequences are in the file? Can you cut the number of sequences in half and see if that works?[/QUOTE]


13808 sequences

pepi37 2021-01-13 11:04

[QUOTE]e:\PRIME\NEW-PROJECT1>twinsieve -P 2000000000000000 -W 5 -w 5e7 -i k_b2_n4194304.pfgw -o k_b2_n4194304.pfgw -O 4194304.txt
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Switching to ABC format since other formats are not supported when using -s
Sieve started: 1646125914229267 < p < 2e15 with 128499 terms (7 < k < 3999975, k*2^4194304) (expecting 710 factors)
p=1646139929503171, 46.01M p/sec, 23 factors found at 1328 sec per factor (last 128 min), 0.0% done. [COLOR=Red][B]ETC 2027-04-15[/B][/COLOR] 09:31


C:\Users\Desktop\NEW-PROJECT>twinsieve -P 1500000000000000 -W 3 -w 2e7 -i k_b2_n4194304.pfgw -o k_b2_n4194304.pfgw -O 4194304.txt
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Switching to ABC format since other formats are not supported when using -s
Sieve started: 1215757153905743 < p < 15e14 with 131939 terms (7 < k < 3999975, k*2^4194304) (expecting 793 factors)
p=1226583079600591, 30.11M p/sec, 32 factors found at 925 sec per factor (last 164 min), 3.8% done. ETC [B][COLOR=Red]2021-01-16 [/COLOR][/B]10:29[/QUOTE]


I found the predicted completion time to be very wrong, as you can see in this example. I also found why it is so wrong: the sieve only increases the current p from time to time (in this example p=1646139929503171), and once it does, the prediction becomes "normal". In this case the sieve will be finished in 5-6 days, not 6 years. It is a cosmetic bug, but a slightly strange one. On the other CPU, the prediction with the same sieve file is very accurate.

rogue 2021-01-13 13:06

[QUOTE=pepi37;569155]I found the predicted completion time to be very wrong, as you can see in this example. I also found why it is so wrong: the sieve only increases the current p from time to time (in this example p=1646139929503171), and once it does, the prediction becomes "normal". In this case the sieve will be finished in 5-6 days, not 6 years. It is a cosmetic bug, but a slightly strange one. On the other CPU, the prediction with the same sieve file is very accurate.[/QUOTE]

The first output shows 0.0% done. This implies that the sample size is too small to accurately compute the ETC.

Was something hindering execution such as another CPU intensive process? The first snippet shows very little progress (after at least 128 minutes) compared to the second.
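
To illustrate how sensitive an estimate is to a tiny sample, here is a sketch that extrapolates linearly from the p range covered so far. This is an assumption about how such an estimate could be formed, not twinsieve's actual formula, and it treats the "last N min" figure in each status line as the elapsed time. The function name is hypothetical.

```python
def linear_etc_minutes(p_start, p_now, p_end, elapsed_minutes):
    """Remaining minutes, assuming progress is linear in p (an assumption)."""
    done = (p_now - p_start) / (p_end - p_start)
    total = elapsed_minutes / done
    return total - elapsed_minutes


# First snippet: 128 minutes elapsed, p has barely moved from its start.
first = linear_etc_minutes(1646125914229267, 1646139929503171, 2 * 10**15, 128)
# Second snippet: 164 minutes elapsed, about 3.8% of the range covered.
second = linear_etc_minutes(1215757153905743, 1226583079600591, 15 * 10**14, 164)

first_years = first / (60 * 24 * 365.25)
second_days = second / (60 * 24)

print(f"first:  {first_years:.1f} years remaining")
print(f"second: {second_days:.1f} days remaining")
```

Under these assumptions the first snippet works out to about six years and the second to about three days, which is in the same neighborhood as the two ETC dates shown (2027 and 2021-01-16).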

MisterBitcoin 2021-01-13 16:10

I found, well, not a bug but a flaw in srsieve2. When srfile is used to remove sequences from a remaining k's file, it writes a file that begins like this:

pmin=0

I tried starting a sieve and got this result:

[CODE]C:\Users\Sydekum\Documents\other stuff\cllr38iwin64_v2>srsieve2.exe -s R7_4800.out -n 4800 -N 5000 -P 500e6
srsieve2 v1.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Fatal Error: sequence must be in form k*b^n+c where you specify values for k, b and c[/CODE]


After removing the line it worked fine.
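
The manual workaround can be scripted. This is a minimal sketch, assuming the only problem is a line starting with "pmin="; the function name is my own, and the sample lines other than "pmin=0" are placeholders, not real srfile output.

```python
def strip_pmin_lines(lines):
    """Return the lines with any line starting with 'pmin=' removed."""
    return [line for line in lines if not line.startswith("pmin=")]


# Hypothetical file contents: only the first line is taken from the post above;
# the other two are placeholders, not real srfile output.
sample = ["pmin=0\n", "<sequence header line>\n", "<term line>\n"]
cleaned = strip_pmin_lines(sample)
print("".join(cleaned), end="")
```

For a real file, the same function can be applied to the result of `readlines()` and the cleaned lines written to a new file, leaving the original untouched.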

rogue 2021-01-13 16:28

[QUOTE=MisterBitcoin;569172]I found, well, not a bug but a flaw in srsieve2. When srfile is used to remove sequences from a remaining k's file, it writes a file that begins like this:

pmin=0

I tried starting a sieve and got this result:

[CODE]C:\Users\Sydekum\Documents\other stuff\cllr38iwin64_v2>srsieve2.exe -s R7_4800.out -n 4800 -N 5000 -P 500e6
srsieve2 v1.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Fatal Error: sequence must be in form k*b^n+c where you specify values for k, b and c[/CODE]

After removing the line it worked fine.[/QUOTE]

That is one of those oddities from srsieve/srfile. If you use a different output format from srsieve/srfile (such as ABCD or PFGW), that line isn't included.

I have added these to FUTURE.txt.
[list][*]skipping the pmin line in srsieve2[*]adding logic to srsieve2 to support removal of sequences, similar to what srfile does[/list]

pepi37 2021-01-13 21:18

[QUOTE=rogue;569161]The first output shows 0.0% done. This implies that the sample size is too small to accurately compute the ETC.

Was something hindering execution such as another CPU intensive process? The first snippet shows very little progress (after at least 128 minutes) compared to the second.[/QUOTE]


No, there is no such process. This is an Intel CPU with 6 cores, of which 5 are used. It is a dedicated machine used only for sieving, and from time to time I open Firefox.

rogue 2021-01-13 22:06

[QUOTE=pepi37;569213]No, there is no such process. This is an Intel CPU with 6 cores, of which 5 are used. It is a dedicated machine used only for sieving, and from time to time I open Firefox.[/QUOTE]

All I can say is that the output is telling me that the CPUs are not being fully utilized by twinsieve, unless there is a bug causing the CPU to burn cycles without doing anything useful.

pepi37 2021-01-13 22:15

[QUOTE=rogue;569220]All I can say is that the output is telling me that the CPUs are not being fully utilized by twinsieve, unless there is a bug causing the CPU to burn cycles without doing anything useful.[/QUOTE]
That is the reason why I say it is a cosmetic bug: as soon as p increases, the predicted time becomes accurate.

Plutie 2021-01-17 19:54

This is most likely an issue with my syntax, but when I run srsieve2 with a sequence in the form (k*b^n+c)[B]/d[/B], the program does not seem to be recognizing the division.

[CODE]uwu@DESKTOP-7I8GNER:~/Math/mtsieve$ ./srsieve2 -W2 -o=45557 -s"(41*10^n+13)/9" -n100001 -N150000
srsieve2 v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic
Sieve started: 3 < p < 2^62 with 50000 terms (100001 < n < 150000, k*10^n+c)
Fatal Error: Invalid factor: 41*10^100001+13 mod 3 = 12[/CODE]
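
As a side note on the arithmetic, a prime p that divides the numerator k*b^n+c does not necessarily divide the quotient (k*b^n+c)/d when p also divides d. The sketch below is plain modular arithmetic, not srsieve2's validation code, and the function names are my own. It assumes d divides the numerator exactly, which holds here since 41*10^n+13 is always a multiple of 9.

```python
def divides_quotient(k, b, n, c, d, p):
    """True if p divides (k*b^n + c) / d, assuming d divides k*b^n + c exactly.

    Writing N = k*b^n + c = d*Q, p divides Q exactly when p*d divides N,
    so one modular exponentiation modulo p*d is enough.
    """
    m = p * d
    return (k * pow(b, n, m) + c) % m == 0


def divides_numerator(k, b, n, c, p):
    """True if p divides k*b^n + c, ignoring the divisor d."""
    return (k * pow(b, n, p) + c) % p == 0


# n = 3 is small enough to check by hand: (41*10^3 + 13) / 9 = 4557 = 3 * 7^2 * 31.
print(divides_quotient(41, 10, 3, 13, 9, 7))       # True
# The term from the error message: 3 divides the numerator but not the quotient.
print(divides_numerator(41, 10, 100001, 13, 3))    # True
print(divides_quotient(41, 10, 100001, 13, 9, 3))  # False
```

So for n=100001, p=3 is a factor of 41*10^n+13 but not of (41*10^n+13)/9, which is the kind of case a factor check has to treat separately when a divisor is present.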

rogue 2021-01-17 23:57

[QUOTE=Plutie;569531]This is most likely an issue with my syntax, but when I run srsieve2 with a sequence in the form (k*b^n+c)[B]/d[/B], the program does not seem to be recognizing the division.

[CODE]uwu@DESKTOP-7I8GNER:~/Math/mtsieve$ ./srsieve2 -W2 -o=45557 -s"(41*10^n+13)/9" -n100001 -N150000
srsieve2 v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic
Sieve started: 3 < p < 2^62 with 50000 terms (100001 < n < 150000, k*10^n+c)
Fatal Error: Invalid factor: 41*10^100001+13 mod 3 = 12[/CODE][/QUOTE]

There are some gaps in the factor validation for these. I'll take a look to see if I can fix them.

rogue 2021-01-26 20:31

I have released 2.1.5. Here are the changes:

[code]
framework:
Added MpArith.h (non-vectorized) and changed class names in MpArithVector.h.
Overloaded HashTable constructor as needed for srsieve2.

srsieve2, srsieve2cl: version 1.4
Lots of refactoring to support special sieving logic for c=1 sequences.
Implemented sr1sieve logic using Montgomery mulmod logic (CPU only).
Change array of sequences to a linked list to avoid compiler warnings.
Add support for pmin= line in input file (as generated by srsieve/srfile).
[/code]

This is a beta release for srsieve2 and srsieve2cl because I had to refactor a lot of code in order to implement the c=1 logic (sr1sieve) cleanly. As this is a beta, I am asking interested users to give it a spin. You should be able to start sieving a new sequence and it will switch to the c=1 logic automatically.

Right now the c=1 logic only works for a single sequence. If you have multiple sequences it will use the generic logic. Support for multiple sequences will come in the future, but that isn't next on my list. The c=1 logic is about 15% slower than sr1sieve based upon the limited testing I have done. Most of that is due to having zero hand-tuned ASM in that logic. sr1sieve has a ton of ASM and I am rather loath to pull it into srsieve2. On the plus side I intend to focus next on fixing bugs (if any are reported) and implementing the OpenCL logic for a single c=1 sequence. It should be doable, but I don't know how fast it will be or if I will find other limits that prevent it from performing well.

I think that the issue reported by Plutie is fixed, but I have not tested it.

Dylan14 2021-01-26 23:02

Compiling the latest version of mtsieve (r92) fails at CisOneSequenceHelper.cpp:

[code]g++ -Isieve -m64 -Wall -O3 -std=c++11 -c -o sierpinski_riesel/CisOneSequenceHelper_cpu.o sierpinski_riesel/CisOneSequenceHelper.cpp
sierpinski_riesel/CisOneSequenceHelper.cpp:13:10: fatal error: HashTable.h: No such file or directory
13 | #include "HashTable.h"
| ^~~~~~~~~~~~~
compilation terminated.
make: *** [makefile:131: sierpinski_riesel/CisOneSequenceHelper_cpu.o] Error 1
[/code]

This is fixed if line 13 is changed to

[code]#include "../core/HashTable.h"[/code]

rogue 2021-01-27 00:05

[QUOTE=Dylan14;570182]Compiling the latest version of mtsieve (r92) fails at CisOneSequenceHelper.cpp:

[code]g++ -Isieve -m64 -Wall -O3 -std=c++11 -c -o sierpinski_riesel/CisOneSequenceHelper_cpu.o sierpinski_riesel/CisOneSequenceHelper.cpp
sierpinski_riesel/CisOneSequenceHelper.cpp:13:10: fatal error: HashTable.h: No such file or directory
13 | #include "HashTable.h"
| ^~~~~~~~~~~~~
compilation terminated.
make: *** [makefile:131: sierpinski_riesel/CisOneSequenceHelper_cpu.o] Error 1
[/code]

This is fixed if line 13 is changed to

[code]#include "../core/HashTable.h"[/code][/QUOTE]

Thanks. I wonder why it compiles in Windows. In any case that #include is not needed.

BTW, if anyone has ideas for optimizations for the new c=1 logic, I would appreciate if you posted them in the "mtsieve enhancements" thread.

pepi37 2021-01-27 23:40

Srsieve2
 
[QUOTE]srsieve2 -P 11000000000000000 -W4 -w 1e7 -i t16_b155_k4.npg -o t16_b155_k4.npg -f B -O factgenefer.txt[/QUOTE]


The latest version crashes; version 1.3 works without problems.

rogue 2021-01-28 03:45

[QUOTE=pepi37;570300]The latest version crashes; version 1.3 works without problems.[/QUOTE]

Can you post or e-mail me the input file?

pepi37 2021-01-28 10:22

[QUOTE=rogue;570304]Can you post or e-mail me the input file?[/QUOTE]


This is part of input file


[QUOTE]10000000000000000:P:1:155:257
4 1174326
4 1174366
4 1174374
4 1174582
4 1174598
4 1174630
4 1174646
4 1174830
4 1174950
4 1174974
4 1174998
4 1175014
4 1175142
4 1175150
4 1175254
4 1175278
4 1175302
4 1175398
4 1175430
4 1175454
4 1175574
4 1175742
4 1175822[/QUOTE]

rogue 2021-01-28 13:37

Found the problem. It will be fixed in the next release.

pepi37 2021-01-28 21:13

Great news, thanks!

rogue 2021-01-29 03:17

I posted 1.4.1 of srsieve2 at sourceforge in its own 7z file.

Upon some further testing, it is about 30% slower than sr1sieve (with x86 asm) and 10% slower than sr1sieve (with no x86 asm).

I fully expect that srsieve2cl with c=1 support in the GPU will be much faster than sr1sieve even on modest GPUs, so I'm not too concerned about the poorer performance at this time. As much as I would love to stop supporting sr1sieve, I don't think that is going to happen anytime soon.

henryzz 2021-01-29 16:44

[QUOTE=rogue;570376]I posted 1.4.1 of srsieve2 at sourceforge in its own 7z file.

Upon some further testing, it is about 30% slower than sr1sieve (with x86 asm) and 10% slower than sr1sieve (with no x86 asm).

I fully expect that srsieve2cl with c=1 support in the GPU will be much faster than sr1sieve even on modest GPUs, so I'm not too concerned about the poorer performance at this time. As much as I would love to stop supporting sr1sieve, I don't think that is going to happen anytime soon.[/QUOTE]

Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell, srsieve2 with sr1sieve logic is spending around 30% of its time calculating Legendre symbols. I get the following message if I try to turn it on: "[COLOR="Red"]Ingoring [/COLOR]-L option since Legendre tables cannot be used"

Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1""

This is using r95 of the code on Sourceforge.

rogue 2021-01-29 18:09

[QUOTE=henryzz;570419]Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell, srsieve2 with sr1sieve logic is spending around 30% of its time calculating Legendre symbols. I get the following message if I try to turn it on: "[COLOR="Red"]Ingoring [/COLOR]-L option since Legendre tables cannot be used"

Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1""

This is using r95 of the code on Sourceforge.[/QUOTE]

-L isn't supported (yet). By default it will create a Legendre table and you can use -l to disable, but I actually haven't verified that is working correctly.

I found the error and committed a change to sourceforge. I have updated srsieve2.7z over at sourceforge as well.

Happy5214 2021-02-02 08:59

The current SVN version fails to run on Kubuntu 20.04:

[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
[/code]

rogue 2021-02-02 13:21

[QUOTE=Happy5214;570707]The current SVN version fails to run on Kubuntu 20.04:

[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
[/code][/QUOTE]

It crashes on Windows as well, so it shouldn't be too hard to track down and fix.

rogue 2021-02-02 15:47

I found and fixed the problem. The changes are committed to sourceforge.

rogue 2021-02-02 19:24

There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.

rogue 2021-02-03 14:07

[QUOTE=rogue;570755]There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.[/QUOTE]

This is now fixed.

rogue 2021-02-03 15:30

BTW, now with this change the speed of srsieve2 (for CisOne logic) is within 5% of the speed of sr1sieve (with x86 asm) and about 10% faster than the speed of sr1sieve (with no x86 asm). By "within" I mean that sometimes it is faster and sometimes it is slower. The speed difference appears to be one of cache usage and CPU load on the machine overall. Note this was only tested with a single sequence so it is possible that other sequences will yield different results.

I will have to play around with unrolling some of the loops in srsieve2 to see if I can do better, but right now I'm pleased to see that it is performing so well considering it didn't look so good earlier this week.

My intention is to post a build after I track down the issue with the CisOne logic in srsieve2cl.

rogue 2021-02-04 01:40

Great news! I have tracked down and squashed the known bugs in srsieve2 and srsieve2cl. I have some benchmarks to share.

The CPU is an Intel i78-8550H at 2.6 GHz and the GPU is an NVIDIA Quadro P3200. I was running no other CPU/GPU intensive processes during this test. All runs yielded the same set of factors.

I sieved 37803*2^n-1 for n from 5e4 to 25e4 up to p=1e6. I then ran the file through srsieve2, srsieve2cl, and sr1sieve, taking the average of 5 runs. Here are the results:

[code]
srsieve2 -i b2_n.in -P1e10 504
srsieve2 -i b2_n.in -P1e10 -l 647

srsieve2cl -i b2_n.in -P1e10 355
srsieve2cl -i b2_n.in -P1e10 -l 353

srsieve2cl -i b2_n.in -P1e10 -g100 221
srsieve2cl -i b2_n.in -P1e10 -g100 -1 210

srsieve2cl -i b2_n.in -P1e10 -g1000 184
srsieve2cl -i b2_n.in -P1e10 -g1000 -l 183

sr1sieve -i b2_n.in -P1e10 -ffact.out (asm) 460
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (asm) 562

sr1sieve -i b2_n.in -P1e10 -ffact.out (no asm) 455
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (no asm) 549
[/code]
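
The ratios implied by the table can be worked out directly. This sketch uses sr1sieve (asm, with Legendre tables) as the baseline and the srsieve2cl rows without -l; the units are not stated in the post, but the ratios do not depend on them. The names are my own.

```python
# Timings copied from the benchmark table above (units not stated in the post;
# the ratios do not depend on them).
sr1sieve_asm = 460
srsieve2cl = {"default -g": 355, "-g100": 221, "-g1000": 184}

# Speedup of each srsieve2cl configuration relative to sr1sieve (asm).
speedups = {label: sr1sieve_asm / t for label, t in srsieve2cl.items()}
for label, s in speedups.items():
    print(f"srsieve2cl {label}: {s:.2f}x vs sr1sieve (asm)")
```

On this one sequence and this hardware, that gives about 1.3x with the default -g, about 2.1x with -g100 and 2.5x with -g1000.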

As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". This corresponds to -x from sr1sieve. The OpenCL code in srsieve2cl supports Legendre lookup tables, but you can see that it doesn't provide any benefit for this k.

srsieve2cl with -g1000 clearly beats everything else. With -g1000 it uses less than 500 MB of GPU memory (per Windows Task Manager).

It will be interesting to see this run on lower-end GPUs to see how they compare.

So with this report, mtsieve 2.1.6 is now released. Here are the changes:

[code]
framework:
Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot rely
on accurately determining that value.

srsieve2, srsieve2cl: version 1.5
Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1) for
a single CisOne sequence (sr1sieve).
Added OpenCL code for CisOne logic.
Added Legendre table lookups for CisOne logic.
[/code]

LaurV 2021-02-04 02:05

[QUOTE=rogue;570815]
[code]
srsieve2cl -i b2_n.in -P1e10 -g100 -1 210
[/code]As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". [/QUOTE]
And what does the "-1" mean? :razz:
OTOH, good job!

pepi37 2021-02-04 12:25

Does srsieve2cl with -g1000 kill sr1sieve in speed?

rogue 2021-02-04 13:23

[QUOTE=pepi37;570838]Does srsieve2cl with -g1000 kill sr1sieve in speed?[/QUOTE]

Based upon the single sequence I tested and the hardware specs I provided, srsieve2cl with -g1000 is more than twice as fast as sr1sieve. With -g100 it is slightly more than twice as fast as sr1sieve. With a higher value for -g, it could possibly be 3x faster, but that is on this hardware.

pepi37 2021-02-04 13:29

A single sequence is all I need 😊

pepi37 2021-02-04 20:29

[QUOTE]e:\MTSIEVE\216>srsieve2cl -P 2e15 -H -D 1 -d 1 -i 92.txt -g 120 -o 92.txt -f B -l
srsieve2cl v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving one sequence where abs(c) = 1 for p >= 1600012693787917
Split 1 base 10 sequence into 216 base 10^360 sequences.
709440 bytes used for congruence tables
CL_DEVICE_MAX_COMPUTE_UNITS = 22
CL_DEVICE_GLOBAL_MEM_SIZE = 2147483648
CL_DEVICE_LOCAL_MEM_SIZE = 49152
CL_KERNEL_WORK_GROUP_SIZE = 256
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE = 32
CL_KERNEL_LOCAL_MEM_SIZE = 1
CL_KERNEL_PRIVATE_MEM_SIZE = 14752
GPU global bytes allocated = 36837532
GPU private bytes allocated = 4805632
GPU primes per worker is 675840
Sieve started: 1600012693787917 < p < 2e15 with 93883 terms (1400033 < n < 2999998, k*10^n+c) (expecting 595 factors)
p=1600014279249383, 742.3K p/sec, no factors found, 0.0% done. ETC 2021-08-02 01:12[/QUOTE]


The CPU is an Intel 9600K (5% utilization).
The GPU is a 1660 Super (working only on this program).

I experimented with -g: if I go above 150, my GPU utilization spikes between 0 and 100%, but the speed is the same as with -g 120, around 742.3K p/sec.

Could you please explain the -W and -w parameters, since they are clearly not the same as in the rest of the mtsieve package?

The test was done on the sequence 92*10^n-1 (the sieve is for n from 1.4M to 3M) with 93883 candidates.

rogue 2021-02-04 22:10

[QUOTE=pepi37;570877]The CPU is an Intel 9600K (5% utilization).
The GPU is a 1660 Super (working only on this program).

I experimented with -g: if I go above 150, my GPU utilization spikes between 0 and 100%, but the speed is the same as with -g 120, around 742.3K p/sec.

Could you please explain the -W and -w parameters, since they are clearly not the same as in the rest of the mtsieve package?

The test was done on the sequence 92*10^n-1 (the sieve is for n from 1.4M to 3M) with 93883 candidates.[/QUOTE]

How does that compare to the speed of sr1sieve?

-W is used to specify the number of CPU only workers.
-w is the number of primes per chunk per worker.

-G is used to specify the number of GPU only workers.
-g is a multiplier for the CL_DEVICE_MAX_COMPUTE_UNITS * CL_KERNEL_WORK_GROUP_SIZE to compute the number of primes per chunk per worker.
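
As a worked example of the -g formula, the values in pepi37's output above (22 compute units, work group size 256, -g 120) reproduce the "GPU primes per worker" line. The function name is my own; the formula is just the product described above.

```python
def primes_per_chunk(g, max_compute_units, kernel_work_group_size):
    """Primes handed to one GPU worker per chunk, per the description above."""
    return g * max_compute_units * kernel_work_group_size


# Values from pepi37's srsieve2cl output: 22 compute units, work group size 256, -g 120.
chunk = primes_per_chunk(120, 22, 256)
print(chunk)  # 675840, matching "GPU primes per worker is 675840"

# The same device with the default -g of 10.
default_chunk = primes_per_chunk(10, 22, 256)
print(default_chunk)  # 56320
```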

For CPU-only exes, -W defaults to 1 and -w to 1e6.
For GPU exes, -W defaults to 0, -G defaults to 1, and -g defaults to 10.

For GPU-only exes, if p_min < a threshold determined at runtime, then a CPU worker is used even if -W is 0, but that CPU worker is only used until p_min > that threshold. IIRC, that threshold is min(1e6, k) for srsieve2cl. 1e6 is used because the factor density is fairly high for low p and I want to limit how much GPU memory is needed to pass factors back to the CPU. It is typically n_max or k_max for other GPU exes. -M is used to adjust the amount of memory needed for returning factors. If the default is not sufficient, you will be told at runtime to adjust it when too many factors are detected for the given -M.

You can use -W with a value > 0 for GPU exes, but that really depends upon the relative speed of your CPU to your GPU.
You cannot use -G with CPU-only exes.

I don't see a usage for -G > 1, but I suppose you can do that if -g isn't large enough to keep your GPU busy.

If -g is too large you could encounter screen lag.

I hope that answers your questions.

pepi37 2021-02-05 05:53

I can compare the speed of the sieves only by finish date.

rogue 2021-02-05 13:18

[QUOTE=pepi37;570901]I can compare the speed of the sieves only by finish date.[/QUOTE]

There is an ETA output by both programs and a factor removal rate. You just need to run both of them for 10 minutes, hit ^C, then compare the results to see which one sieved further during that time.

MisterBitcoin 2021-02-05 14:36

I encountered a bug while using srsieve2 on a large input file (around 100 MB). After two days of sieving it stopped at the given Pmax, but look at the screen output:


[CODE] p=19852801223, 178.6 p/sec, 4397895 factors found at 6.62 sec per factor (last
p=19852801223, 180.6 p/sec, 4397919 factors found at 6.63 sec per factor (last
p=19852801223, 176.3 p/sec, 4397936 factors found at 6.64 sec per factor (last
p=19852801223, 180.7 p/sec, 4397951 factors found at 6.65 sec per factor (last
517 min), 99.2% done. ETC 2021-02-05 12:14
2 workers didn't stop after 10 minutes

D:\sieve>[/CODE]


This also means it didn't save the file; the last checkpoint was around 4 hours earlier, which is okay.
However, I wonder why it crashed. Is there a function that times the app out when a worker hasn't reported its results back after n minutes?

rogue 2021-02-05 15:24

[QUOTE=MisterBitcoin;570916]I encountered a bug while using srsieve2 on a large input file (around 100 MB). After two days of sieving it stopped at the given Pmax, but look at the screen output:

[CODE] p=19852801223, 178.6 p/sec, 4397895 factors found at 6.62 sec per factor (last
p=19852801223, 180.6 p/sec, 4397919 factors found at 6.63 sec per factor (last
p=19852801223, 176.3 p/sec, 4397936 factors found at 6.64 sec per factor (last
p=19852801223, 180.7 p/sec, 4397951 factors found at 6.65 sec per factor (last
517 min), 99.2% done. ETC 2021-02-05 12:14
2 workers didn't stop after 10 minutes

D:\sieve>[/CODE]

This also means it didn't save the file; the last checkpoint was around 4 hours earlier, which is okay.
However, I wonder why it crashed. Is there a function that times the app out when a worker hasn't reported its results back after n minutes?[/QUOTE]

This occurs when the main thread suspects that one or more of the worker threads has become unresponsive, maybe stuck in a tight loop. In this case because p/sec is so low each worker thread needs more than 10 minutes to process a single chunk of primes. In this case the worker thread was likely still working. I can look into a change to not do that check under certain circumstances.

To get around the problem, I suggest that you use -w1e3 or -w1e4 (the default is -w1e6). This will give smaller chunks of work to each worker and thus they can process each chunk much faster. This will have a negligible effect on the overall rate since so little time is spent in the prime sieve.
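
A back-of-the-envelope sketch of why the smaller chunk size helps. It treats the roughly 180 p/sec from the log as a single worker's steady rate, which is a simplification (with two workers sharing that rate, each chunk would take even longer). The names and the 600-second constant are my own reading of the "10 minutes" in the message.

```python
def seconds_per_chunk(primes_per_chunk, primes_per_second):
    """Time for one worker to finish a chunk at a steady rate."""
    return primes_per_chunk / primes_per_second


RATE = 180.0         # roughly the p/sec shown in the log; treated as one worker's rate
CHECK_SECONDS = 600  # the "10 minutes" from the message

results = {}
for label, w in (("-w1e6", 1e6), ("-w1e4", 1e4), ("-w1e3", 1e3)):
    t = seconds_per_chunk(w, RATE)
    results[label] = t
    verdict = "exceeds" if t > CHECK_SECONDS else "fits within"
    print(f"{label}: {t:,.0f} s per chunk, {verdict} the 10-minute check")
```

At that rate a default-sized chunk needs over 90 minutes, while -w1e4 needs under a minute and -w1e3 only a few seconds.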

Happy5214 2021-02-08 09:41

I ran into an issue starting a new Riesel sieve with multiple workers:

[code]$ ./bin/srsieve2 -W 3 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors)
Sieving with generic logic for p >= 257
Split 1 base 2 sequence into 1 base 2^1 sequences.
Fatal Error: Invalid factor: 14549535*2^128595-1 mod 34747 = 22443
[/code]

[c]-W 2[/c] and [c]-W 4[/c] also failed with the same error at other small primes, but using only 1 worker seemed to get past that stage (I had another sieve running that I didn't want to interfere with, so I didn't complete this):

[code]$ ./bin/srsieve2 -W 1 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors)
Sieving with generic logic for p >= 257
Split 1 base 2 sequence into 1 base 2^1 sequences.
Sieving one sequence where abs(c) = 1 for p >= 15489191
Split 1 base 2 sequence into 171 base 2^180 sequences.
741796 bytes used for congruence tables
1617098 bytes used for Legendre tables
^CCTRL-C accepted. Threads will stop after sieving to 32456407
Sieve interrupted at p=32456407.
CPU time: 16.72 sec. (0.03 sieving) (0.97 cores)
43690 terms written to t17_b2.prp
Primes tested: 1000000. Factors found: 131311. Remaining terms: 43690. Time: 17.19 seconds.
[/code]
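
For anyone who wants to double-check a reported factor by hand, here is a minimal Python sketch of the divisibility test behind the "Invalid factor" message. `residue` and `is_factor` are hypothetical helpers, not mtsieve code; they rely only on Python's built-in three-argument `pow` for modular exponentiation.

```python
def residue(k, b, n, c, p):
    """Return (k*b^n + c) mod p without building the huge number."""
    return (k * pow(b, n, p) + c) % p


def is_factor(k, b, n, c, p):
    """True when p divides k*b^n + c."""
    return residue(k, b, n, c, p) == 0


# The term and prime from the error message above.
print(residue(14549535, 2, 128595, -1, 34747))
```

A result of 0 would mean the factor is genuine; any other value means the reported factor does not divide the term.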

rogue 2021-02-08 13:28

[QUOTE=Happy5214;571143]I ran into an issue starting a new Riesel sieve with multiple workers:

[code]$ ./bin/srsieve2 -W 3 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors)
Sieving with generic logic for p >= 257
Split 1 base 2 sequence into 1 base 2^1 sequences.
Fatal Error: Invalid factor: 14549535*2^128595-1 mod 34747 = 22443
[/code]

[c]-W 2[/c] and [c]-W 4[/c] also failed with the same error at other small primes, but using only 1 worker seemed to get past that stage (I had another sieve running that I didn't want to interfere with, so I didn't complete this).[/QUOTE]

Hmm. I'll take a look. This seems to have failed in the generic sieving logic, not the cisone logic.

rogue 2021-02-08 16:26

[QUOTE=rogue;571153]Hmm. I'll take a look. This seems to have failed in the generic sieving logic, not the cisone logic.[/QUOTE]

I see what I did wrong. It was something I introduced in the most recent release. I'll fix it in the next release. Since you are on Linux or OS X, I have committed the files so you can build and try again.

Happy5214 2021-02-09 03:51

[QUOTE=rogue;571167]I see what I did wrong. It was something I introduced in the most recent release. I'll fix it in the next release. Since you are on Linux or OS X, I have committed the files so you can build and try again.[/QUOTE]
It worked, though I actually didn't need it anymore. My workflow involves using srsieve2 to 1e9 and the faster sr1sieve or sr2sieve beyond that (this old box has no real GPU option), so I just burned the extra minute and ran it with one worker.

rogue 2021-02-09 22:55

I have released 2.2.0:

[code]
framework:
Updated OpenCL on Windows. See makefile for details.
Updated primesieve to 7.6.

psieve, psievecl: version 1.4
Some refactoring to support OpenCL worker.
First release of psievecl.
Verify factors from -I input file

srsieve2, srsieve2cl: version 1.5.1
Fixed bug that was introduced in the refactoring of 1.5 that impacts generic sieving
while using multiple threads.
Added -R to remove sequences. Use -Rk*b^n+c format to remove a single sequence or
use -R with a file that has multiple sequences. This is not tested yet.
[/code]

psievecl is about 20x faster than psieve. The main slowdown is factor validation, which is less noticeable as factors become more sparse.

One odd behavior I noticed is that srsieve2cl is slower than in the previous release, and I do not know why. Even after reverting the framework changes for this release it was slower. I thought it was the update to OpenCL or primesieve, but reverting to the older versions of those made no difference. I'm likely missing something, but I don't know what.

I'm hoping that someone is willing to give the -R option with srsieve2 a spin.

Happy5214 2021-02-10 02:51

[QUOTE=rogue;571238]I'm hoping that someone is willing to give the -R option with srsieve2 a spin.[/QUOTE]

No luck:

[code]$ ./srsieve2 -i b2_n.abcd -o b2_n.abcd -R "658687*2^n-1"
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
2018 terms for sequence 658687*2^n-1 have been removed
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 982453051
Fatal Error: Expected 62636 terms when building sequences, but counted only 60618
[/code]

Input file attached (with .txt extension added).

rogue 2021-02-10 04:16

[QUOTE=Happy5214;571242]No luck:

[code]$ ./srsieve2 -i b2_n.abcd -o b2_n.abcd -R "658687*2^n-1"
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
2018 terms for sequence 658687*2^n-1 have been removed
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 982453051
Fatal Error: Expected 62636 terms when building sequences, but counted only 60618
[/code]

Input file attached (with .txt extension added).[/QUOTE]

That will be easy to fix. When the sequence is removed the code isn't decrementing il_TermCount.
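
The numbers in the error line fit that explanation: 62636 - 2018 = 60618. A toy Python sketch of the bookkeeping follows. The class and method names are invented for illustration; only the counter name il_TermCount comes from the post above.

```python
class TermTable:
    """Toy model of a sieve's term bookkeeping (not mtsieve's real classes)."""

    def __init__(self, terms_by_sequence):
        # Map of sequence name -> set of remaining exponents n.
        self.terms = {seq: set(ns) for seq, ns in terms_by_sequence.items()}
        # Cached total, playing the role of il_TermCount.
        self.term_count = sum(len(ns) for ns in self.terms.values())

    def remove_sequence(self, seq):
        """Drop a whole sequence and keep the cached total in sync."""
        removed = len(self.terms.pop(seq, ()))
        self.term_count -= removed  # the step that was missing
        return removed

    def counted_terms(self):
        """Recount from scratch, as a sequence builder would."""
        return sum(len(ns) for ns in self.terms.values())


# 2018 + 60618 = 62636 terms in total before the removal.
table = TermTable({"658687*2^n-1": range(2018), "other": range(60618)})
table.remove_sequence("658687*2^n-1")
print(table.term_count, table.counted_terms())
```

Without the decrement, the cached total would stay at 62636 while the recount gives 60618, which is the mismatch reported in the fatal error.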

rogue 2021-02-23 17:15

I have released 2.2.1. Here are the changes:

[quote]
framework: no changes

gfndsieve, gfndsievecl: version 2.0
Moved GFN divisor testing to GFNDivisorTester class so that GFNDivisorApp is smaller
and so that future support of non-x86 is easier since GFNDivisorTester
calls a number of x86 asm methods directly.

For gfndsievecl do not report any terms with factors < 50. This reduces the size
needed for the buffer that is used to report factors.

Added -r and -R options to support functionality similar to ppsieve.
-r will not generate a bitmap for tracking terms. It will only generate an
output file of factors.
-R is used with -r. If a term has a factor below 32767 (the default value),
then the program will not output any factors for the term.
-r and -x are mutually exclusive with -r overriding -x.

Added various speed improvements.

srsieve2, srsieve2cl: version 1.5.2
Fixed issue with CisOne logic as it tries to rebuild sequences when there are
multiple sequences as that is not yet supported.
[/quote]

In the limited performance testing I have done, it appears that gfndsievecl with -r is about 5x faster than the OpenCL version of ppsieve. I think some of that is due to using a much higher value for -S in gfndsievecl than ppsievecl does, but that doesn't explain the entirety of the speed gain. The CPU-only version is about 3x slower than ppsieve, but that is due to a lot of fine-tuned assembler in the ppsieve CPU code. I don't expect anyone to use it for that, but at the same time gfndsieve should be about 50% faster for typical gfn divisor sieving.

YaoPlaysMC 2021-02-27 18:43

Can someone make a program for sieving sequences of the form k[SUB]1[/SUB]*b[SUB]1[/SUB]^n+k[SUB]2[/SUB]*b[SUB]2[/SUB]^n+c with variable n?

mathwiz 2021-08-12 19:53

There appears to be something broken with gfndsieve at head. This is from the latest svn build:

[CODE]./gfndsieve -P 1e11 -W 36 -n 10000 -N 100000 -k 1e5 -K 1e6 -o gfn.txt

gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: 3 < p < 1e11 with 40500450000 terms (100001 <= k <= 999999, 10000 <= n <= 100000, k*2^n+1) (expecting 38743756770 factors)
Fatal Error: Invalid factor: 100007*2^99457+1 mod 100271 = 13484[/CODE]
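
The term count in the startup banner can be reproduced with a little arithmetic, which is a handy sanity check on the -k/-K/-n/-N ranges. The sketch assumes that only odd k are counted, which is inferred from the reported total rather than from the source; `gfnd_term_count` is a hypothetical helper.

```python
def gfnd_term_count(kmin, kmax, nmin, nmax):
    """Number of (k, n) pairs with odd k in [kmin, kmax] and n in [nmin, nmax]."""
    first_odd = kmin if kmin % 2 else kmin + 1
    last_odd = kmax if kmax % 2 else kmax - 1
    k_count = max(0, (last_odd - first_odd) // 2 + 1)
    n_count = max(0, nmax - nmin + 1)
    return k_count * n_count


# -k 1e5 -K 1e6 -n 10000 -N 100000, as in the command above
print(gfnd_term_count(100000, 1000000, 10000, 100000))
```

That is 450000 odd k values times 90001 exponents, which matches the 40500450000 terms in the banner.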

rogue 2021-08-12 20:09

[QUOTE=mathwiz;585516]There appears to be something broken with gfndsieve at head. This is from the latest svn build:

[CODE]./gfndsieve -P 1e11 -W 36 -n 10000 -N 100000 -k 1e5 -K 1e6 -o gfn.txt

gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: 3 < p < 1e11 with 40500450000 terms (100001 <= k <= 999999, 10000 <= n <= 100000, k*2^n+1) (expecting 38743756770 factors)
Fatal Error: Invalid factor: 100007*2^99457+1 mod 100271 = 13484[/CODE][/QUOTE]

I get the same error. I'll take a look.

houding 2021-08-13 05:49

[QUOTE=rogue;585518]I get the same error. I'll take a look.[/QUOTE]


I had the same issue with a previous build.


You found the problem back then.


[url]https://www.mersenneforum.org/showthread.php?t=22890[/url]

rogue 2021-08-13 12:03

[QUOTE=houding;585541]I had the same issue with a previous build.


You found the problem back then.


[url]https://www.mersenneforum.org/showthread.php?t=22890[/url][/QUOTE]

It might or might not be the same cause. I haven't had a chance to look at it yet.

rogue 2021-08-13 15:03

The problem with gfndsieve is fixed on sourceforge.

Plutie 2021-08-13 17:16

Similar? issue to what I reported on page 46 - latest svn

[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9"
srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Creating CPU worker to use until p >= 1000000
GPU primes per worker is 25600
Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c)
Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE]

ET_ 2021-08-13 17:24

[QUOTE=Plutie;585584]Similar? issue to what I reported on page 46 - latest svn

[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9"
srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Creating CPU worker to use until p >= 1000000
GPU primes per worker is 25600
Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c)
Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE][/QUOTE]

Latest SVN source code is 2.2.2 (July 2nd 2021, r138)
The Windows EXE 7z file is 2.2.1
The last commit is r139

Plutie 2021-08-13 17:30

[QUOTE=ET_;585585]Latest SVN source code is 2.2.2 (July 2nd 2021, r138)
The Windows EXE 7z file is 2.2.1
The last commit is r139[/QUOTE]

Just redownloaded r139, same error is occurring.

rogue 2021-08-13 18:27

[QUOTE=Plutie;585584]Similar? issue to what I reported on page 46 - latest svn

[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9"
srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Creating CPU worker to use until p >= 1000000
GPU primes per worker is 25600
Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c)
Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE][/QUOTE]

srsieve2 does not work correctly when d > 1. I don't know how easy that will be to fix. I thought it was correct and working, but it clearly is not.

I do see a separate issue in factor validation when abs(c) > 1, so I will fix that.
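
Two observations on the error line, offered as a sketch rather than as a description of how srsieve2 works internally. First, 18446744073709551610 equals 2^64 - 6, which looks like a small negative residue stored in an unsigned 64-bit integer. Second, one way to test whether p divides (k*b^n+c)/d when p and d may share a factor is to reduce modulo p*d before dividing by d. `residue_with_divisor` is a hypothetical helper.

```python
def residue_with_divisor(k, b, n, c, d, p):
    """Return ((k*b^n + c) / d) mod p, assuming d divides k*b^n + c."""
    m = p * d
    # Reducing modulo p*d keeps enough information to divide by d exactly.
    total = (k * pow(b, n, m) + c) % m
    if total % d != 0:
        raise ValueError("d does not divide k*b^n+c")
    return (total // d) % p


# A residue of -6 wrapped into an unsigned 64-bit value.
print((-6) % 2**64)

# (88*10^100000 - 7)/9 mod 3
print(residue_with_divisor(88, 10, 100000, -7, 9, 3))
```

The second result is nonzero, which agrees with a digit-sum check: (88*10^n-7)/9 is a 9 followed by n sevens, so for n = 100000 the digit sum is 700009, which is not a multiple of 3.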

mathwiz 2021-08-13 18:30

[QUOTE=rogue;585589]This is a different issue. I have time to look into it today.[/QUOTE]

The latest build in SVN seems to be working for me.

However, does gfndsieve respect the -W flag? With -W 36 I still only see a single CPU utilized in "top".

rogue 2021-08-13 19:22

[QUOTE=mathwiz;585590]The latest build in SVN seems to be working for me.

However, does gfndsieve respect the -W flag? With -W 36 I still only see a single CPU utilized in "top".[/QUOTE]

For primes < 1e4 only a single thread is used to reduce contention when removing terms from the vector of remaining terms. But if the first chunk of work ends after 1e4 it will wait until that chunk of work is done. I could modify the code to "rebuild the workers" once it reaches 1e4 so that more threads can do the work.

I would not be surprised if you end up starving workers with -W36 because the main thread that doles out chunks of work will become a bottleneck. One way to offset this is to increase -w to give each worker more to work on.

To eliminate the bottleneck with the main worker would require a number of changes to the framework. I haven't thought about it a lot, so I don't know how easy or difficult that would be or how much it would impact overall performance.

mathwiz 2021-08-16 14:43

Unless I'm mistaken about how the ABCD format works, gfndsieve seems to be leaving a lot of terms with very small factors. For example, this is a snippet from gfndsieve output:

[CODE]ABCD $a*2^60000+1 [700001] // Sieved to 10000000000051
4
2
2
4
2
2
2
4
2
2
2
2
2
2
4
2
2
2
2
2
2
2
2
4
2
2
4
2
6[/CODE]

Feeding this to LLR gives:

[CODE]700001*2^60000+1 has a small factor : 3 !!
Starting Proth prime test of 700005*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 7
700005*2^60000+1 is not prime. Base-7 Proth RES64: A386651F7F998E5F. Time : 802.675 ms.
700007*2^60000+1 has a small factor : 3 !!
Starting Proth prime test of 700009*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 3
700009*2^60000+1 is not prime. Proth RES64: B72DF3B16C12AD73. Time : 744.449 ms.
700013*2^60000+1 has a small factor : 3 !!
Starting Proth prime test of 700015*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 3
700015*2^60000+1 is not prime. Proth RES64: E1F28AC9BAA1BB8B. Time : 739.817 ms.
Starting Proth prime test of 700017*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 5
700017*2^60000+1 is not prime. Base-5 Proth RES64: 43CEAF84FFC1C844. Time : 745.082 ms.
700019*2^60000+1 has a small factor : 3 !!
Starting Proth prime test of 700023*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 7
700023*2^60000+1 is not prime. Base-7 Proth RES64: 4B0BB226CE058F90. Time : 755.652 ms.
700025*2^60000+1 has a small factor : 3 !!
Starting Proth prime test of 700027*2^60000+1
Using all-complex AVX-512 FFT length 6K, a = 3
700027*2^60000+1 is not prime. Proth RES64: 10DA62863970FCB8. Time : 803.375 ms.
700029*2^60000+1 has a small factor : 5 !!
700031*2^60000+1 has a small factor : 3 !![/CODE]

Note all the "has a small factor : 3 !!" lines.
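
For readers unfamiliar with the format: as I understand it, the bracketed number in the ABCD header is the first value of $a and every following line is the increment to the next one. The Python sketch below decodes the first few lines of the snippet and trial-divides each term by a few tiny primes. The helper names are made up for illustration.

```python
def decode_abcd(first, deltas):
    """Turn an ABCD start value plus increments into the list of k values."""
    ks = [first]
    for delta in deltas:
        ks.append(ks[-1] + delta)
    return ks


def small_factor(k, n, primes=(3, 5, 7)):
    """Return the first prime in `primes` dividing k*2^n+1, or None."""
    for p in primes:
        if (k * pow(2, n, p) + 1) % p == 0:
            return p
    return None


# First value and the first twelve increments from the output above.
ks = decode_abcd(700001, [4, 2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2])
for k in ks:
    print(f"{k}*2^60000+1 -> small factor: {small_factor(k, 60000)}")
```

The decoded k values line up with the candidates LLR reports, so the file is being read correctly and the small factors really were left in by the sieve.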

rogue 2021-08-16 16:23

Is this the latest code in source forge? I'm guessing it is. I'll have to take a look at it.

mathwiz 2021-08-16 16:26

[QUOTE=rogue;585764]Is this the latest code in source forge? I'm guessing it is. I'll have to take a look at it.[/QUOTE]

Yep, latest from SVN. Sample command after building:

[CODE] ./gfndsieve -P 1e13 -W 36 -n 60000 -N 61000 -k 700e3 -K 800e3 -o gfnsmall.txt[/CODE]

Produces:

[CODE]ABCD $a*2^60000+1 [700001] // Sieved to 10000000000051
4
2
2
4
2
2
2
4
2
2
2
2
2
2
...[/CODE]

rogue 2021-08-16 19:03

I think I know which code change introduced this, so it shouldn't be too hard to fix. If you need working code, use revision 122.

mathwiz 2021-08-16 19:10

[QUOTE=rogue;585784]I think I know which code change introduced this, so it shouldn't be too hard to fix. If you need working code, use revision 122.[/QUOTE]

Thanks -- but that revision seems to have (build) issues of its own.

[CODE]g++ -Isieve -m64 -Wall -O3 -std=c++11 -lstdc++ -o gfndsieve core/App_cpu.o core/FactorApp_cpu.o core/AlgebraicFactorApp_cpu.o core/Clock_cpu.o core/Parser_cpu.o core/Worker_cpu.o core/HashTable_cpu.o core/main_cpu.o core/SharedMemoryItem_cpu.o sieve/Erat.o sieve/EratBig.o sieve/EratMedium.o sieve/EratSmall.o sieve/PreSieve.o sieve/CpuInfo.o sieve/MemoryPool.o sieve/PrimeGenerator.o sieve/PrimeSieve.o sieve/IteratorHelper.o sieve/LookupTables.o sieve/popcount.o sieve/nthPrime.o sieve/PrintPrimes.o sieve/ParallelSieve.o sieve/iterator.o sieve/api.o sieve/SievingPrimes.o x86_asm/fpu_mod_init_fini.o x86_asm/fpu_push_pop.o x86_asm/sse_mulmod.o x86_asm/fpu_mulmod.o x86_asm/fpu_powmod.o x86_asm/fpu_powmod_4b_1n_4p.o x86_asm/fpu_mulmod_iter.o x86_asm/fpu_mulmod_iter_4a.o x86_asm/fpu_mulmod_4a_4b_4p.o x86_asm/sse_mod_init_fini.o x86_asm/sse_powmod_4b_1n_4p.o x86_asm/sse_mulmod_4a_4b_4p.o x86_asm/avx_set_a.o x86_asm/avx_set_b.o x86_asm/avx_get.o x86_asm/avx_compute_reciprocal.o x86_asm/avx_compare.o x86_asm/avx_mulmod.o x86_asm/avx_powmod.o x86_asm/sse_powmod_4b_1n_4p_mulmod_1k.o x86_asm_ext/m320.o x86_asm_ext/m384.o x86_asm_ext/m448.o x86_asm_ext/m512.o x86_asm_ext/m576.o x86_asm_ext/m640.o x86_asm_ext/m704.o x86_asm_ext/m768.o x86_asm_ext/mulmod128.o x86_asm_ext/mulmod192.o x86_asm_ext/mulmod256.o x86_asm_ext/sqrmod128.o x86_asm_ext/sqrmod192.o x86_asm_ext/sqrmod256.o x86_asm_ext/redc.o gfn_divisor/GFNDivisorApp_cpu.o gfn_divisor/GFNDivisorWorker_cpu.o -lgmp -lpthread
/usr/bin/ld: gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::PostSieveHook()':
GFNDivisorApp.cpp:(.text+0x3b1): undefined reference to `GFNDivisorTester::TestRemainingTerms(unsigned long, unsigned long, unsigned long)'
/usr/bin/ld: gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::ValidateOptions()':
GFNDivisorApp.cpp:(.text+0x292c): undefined reference to `GFNDivisorTester::GFNDivisorTester(App*)'
/usr/bin/ld: gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::PreSieveHook()':
GFNDivisorApp.cpp:(.text+0x20d8): undefined reference to `GFNDivisorTester::StartedSieving()'
collect2: error: ld returned 1 exit status[/CODE]

I'm in no rush, so happy to wait for a fix at head.

rogue 2021-08-16 22:34

You would need all of the sources from that revision to build it.

On the positive side this issue is now fixed. I was over-thinking a speed up in the previous revision for small primes and it was just plain stupid. It was never going to work. It works now based upon the testing I have done.

mathwiz 2021-08-16 22:43

[QUOTE=rogue;585802]You would need all of the sources from that revision to build it.[/QUOTE]

I think r122 is just broken; a clean "svn co --revision=..." in a clean directory still produces the same build error. But r123 seems to fix the makefile, and that build appears to be working for me.

[QUOTE]On the positive side this issue is now fixed. I was over-thinking a speed up in the previous revision for small primes and it was just plain stupid. It was never going to work. It works now based upon the testing I have done.[/QUOTE]

Great news! :smile:

MisterBitcoin 2021-08-22 08:56

[CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 4>srsieve2.exe -i sr_1005.
abcd -P 20e9
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va
riable k and n
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 15000000000
Fatal Error: Expected 986923 terms when building sequences, but counted only 0
[/CODE]


I built the .abcd file with the srfile -a command. It looks like the sieve file is damaged; however, -G works fine, so I already have a prp file. BUT when I use -a on the prp file to get a fresh abcd file, I get the same message!



This might already be known, and I am using an older version >.> (shame on me lol)

rebirther 2021-08-22 09:11

[QUOTE=MisterBitcoin;586241][CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 4>srsieve2.exe -i sr_1005.
abcd -P 20e9
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va
riable k and n
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 15000000000
Fatal Error: Expected 986923 terms when building sequences, but counted only 0
[/CODE]I built the .abcd file with the srfile -a command. It looks like the sieve file is damaged; however, -G works fine, so I already have a prp file. BUT when I use -a on the prp file to get a fresh abcd file, I get the same message!



This might already be known, and I am using an older version >.> (shame on me lol)[/QUOTE]


How about
srfile_win64 -a your.prp


Edit:
I still have the abcd file for R1005

MisterBitcoin 2021-08-22 11:59

[QUOTE=rebirther;586242]How about
srfile_win64 -a your.prp


Edit:
I still have the abcd file for R1005[/QUOTE]


I tried that, but it got me the same result. However upgrading to the newest srsieve2 fixed it.

pepi37 2021-09-04 00:09

Using srsieve2 from the latest version, I got this error.
Factors are written to the file, but in the wrong format. When you then try to remove the factors you get the error "candidate is not divisible", and everything stops. Editing the factor file by hand is not a solution.



[QUOTE]91961131 | 57*20^448688+1
479259601 | 36*20^413328+1
91962193 | 110*20^454361+1
281926849 | 91962443 | 15*20^415116+1
135*20^481951+1
281927411 | 90*20^480795+1
185593399 | 79*20^458720+1[/QUOTE]

rogue 2021-09-04 00:19

[QUOTE=pepi37;587208]Using srsieve2 from the latest version, I got this error.
Factors are written to the file, but in the wrong format. When you then try to remove the factors you get the error "candidate is not divisible", and everything stops. Editing the factor file by hand is not a solution.[/QUOTE]

That is weird. I assume this is with multiple threads based upon the output. There should be a lock to ensure this doesn't happen if multiple threads are writing factors concurrently. I will verify that.
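
The interleaved lines in the report are what unsynchronized writers typically produce. As a generic illustration (Python, not the mtsieve C++ code, and all names are hypothetical), a lock held across the whole record keeps each "factor | term" line intact:

```python
import io
import threading


class FactorLog:
    """Writes 'p | term' lines; the lock makes each line atomic."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def report(self, p, term):
        # Hold the lock across both writes so another thread cannot
        # slip its own output between the factor and the term.
        with self._lock:
            self._stream.write(f"{p} | ")
            self._stream.write(f"{term}\n")


def worker(log, worker_id, count):
    """Report `count` made-up factors from one simulated worker thread."""
    for i in range(count):
        log.report(1000 * worker_id + i, f"{worker_id}*20^{i}+1")


stream = io.StringIO()
log = FactorLog(stream)
threads = [threading.Thread(target=worker, args=(log, w, 200)) for w in range(1, 6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = stream.getvalue().splitlines()
print(len(lines), "lines written")
```

The important design point is that the lock covers the complete record, not each individual write call.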

pepi37 2021-09-04 00:24

[QUOTE=rogue;587211]That is weird. I assume this is with multiple threads based upon the output. There should be a lock to ensure this doesn't happen if multiple threads are writing factors concurrently. I will verify that.[/QUOTE]


Yes, 5 threads in this case.

pepi37 2021-09-11 08:36

srsieve2cl ( latest build from package 1.5.2)

srsieve2cl -P 10000000000000 -g 24 -G4 -D1 -d1 -d2 -d3 -d0


I have 4 GPUs, but with this command only the last GPU is used (95% utilized).
How can I utilize all of them?

rogue 2021-09-11 12:29

[QUOTE=pepi37;587682]srsieve2cl ( latest build from package 1.5.2)

srsieve2cl -P 10000000000000 -g 24 -G4 -D1 -d1 -d2 -d3 -d0


I have 4 GPUs, but with this command only the last GPU is used (95% utilized).
How can I utilize all of them?[/QUOTE]

mtsieve isn't designed to use multiple GPUs. You would have to run one instance on each GPU. Use -h to list the platforms and devices then use -D and -d to specify the one you want to use. The default is platform 0 and device 0 on platform 0.
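
Since each instance drives a single device, one way to use four cards is to split the prime range into four slices and start one process per device. The Python sketch below only builds the command lines and does not run anything. It assumes that -p sets the lower prime bound (only -P, -D, -d and -i appear in this thread), and the platform number, file name and range are placeholders.

```python
def split_range(pmin, pmax, parts):
    """Split [pmin, pmax] into `parts` contiguous slices of near-equal width."""
    step = (pmax - pmin) // parts
    bounds = [pmin + i * step for i in range(parts)] + [pmax]
    return list(zip(bounds[:-1], bounds[1:]))


def gpu_commands(pmin, pmax, platform, devices, input_file):
    """One srsieve2cl command line per device, each on its own prime slice."""
    commands = []
    for device, (lo, hi) in zip(devices, split_range(pmin, pmax, len(devices))):
        commands.append([
            "srsieve2cl",
            "-i", input_file,   # placeholder input file
            "-p", str(lo),      # assumed flag for the lower prime bound
            "-P", str(hi),
            "-D", str(platform),
            "-d", str(device),
        ])
    return commands


for cmd in gpu_commands(10**12, 10**13, 1, [0, 1, 2, 3], "terms.abcd"):
    print(" ".join(cmd))
```

Each instance would presumably also need its own factor or output file so that the results can be merged afterwards; that part is left out of the sketch.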

pepi37 2021-09-11 13:22

[QUOTE=rogue;587687]mtsieve isn't designed to use multiple GPUs. You would have to run one instance on each GPU. Use -h to list the platforms and devices then use -D and -d to specify the one you want to use. The default is platform 0 and device 0 on platform 0.[/QUOTE]




Yes, later I managed that and got around 10M p/sec in total using 4 cards.
Thanks!
You have a PM about another problem with srsieve2.

pepi37 2021-10-09 22:05

[QUOTE]e:\PRIME\REPDIGIT-k92>srsieve2 -P 2000000000000000 -W 5 -w 1e7 -i 92.txt -O 92fact.txt -f B
srsieve2 v1.5.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving one sequence where abs(c) = 1 for p >= 1600000000000000
Split 1 base 10 sequence into 108 base 10^180 sequences.
673642 bytes used for congruence tables
598 bytes used for Legendre tables
Sieve started: 16e14 < p < 2e15 with 88073 terms (1500028 < n < 2999998, k*10^n[B][COLOR=Red]+[/COLOR][/B]c) (expecting 558 factors)
p=1600017808265749, 1.493M p/sec, no factors found, 0.0% done. ETC 2022-01-13 02:30[/QUOTE]
If I use srsieve2 to sieve k*10^n[COLOR=Red]-[/COLOR]1, why is k*10^n[B][COLOR=Red]+[/COLOR][/B]c reported?
Maybe it is just a simple error, or something deeper?

rogue 2021-10-09 22:09

[QUOTE=pepi37;590065]If I use srsieve2 to sieve k*10^n[COLOR=Red]-[/COLOR]1, why is k*10^n[B][COLOR=Red]+[/COLOR][/B]c reported?
Maybe it is just a simple error, or something deeper?[/QUOTE]

Missing a % in a format string. It should be "%+c" and is just "+c" in the code. I have fixed that and committed the change.

matzetoni 2021-10-11 17:48

[CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1)
Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE]
I got this error when using mtsieve_2.2.1
There is no error when using the same command with mtsieve_2.0.3

rogue 2021-10-11 19:12

[QUOTE=matzetoni;590175][CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1)
Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE]
I got this error when using mtsieve_2.2.1
There is no error when using the same command with mtsieve_2.0.3[/QUOTE]

I will look into this.

ET_ 2021-10-11 21:46

[QUOTE=matzetoni;590175][CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1)
Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE]
I got this error when using mtsieve_2.2.1
There is no error when using the same command with mtsieve_2.0.3[/QUOTE]

Are you working on Fermat factors research?

rogue 2021-10-22 16:54

I have posted mtsieve 2.2.2 over at sourceforge. It addresses the open issues and has these changes:

[code]
framework:
Added __attribute__ to method declarations that accept variable arguments.

srsieve2, srsieve2cl: version 1.5.3
Modified to not remove terms that are prime as that defeats the purpose of Sierpinski/Riesel searches.
Fixed bug where maxn for a sequence has a small factor, but it is not found.

gfndsieve, gfndsievecl: version 2.1
Fixed bug where code can find invalid factors.
[/code]

ET_ 2021-10-23 09:30

[QUOTE=rogue;591373]I have posted mtsieve 2.2.2 over at sourceforge. It addresses the open issues and has these changes:

[code]
framework:
Added __attribute__ to method declarations that accept variable arguments.

srsieve2, srsieve2cl: version 1.5.3
Modified to not remove terms that are prime as that defeats the purpose of Sierpinski/Riesel searches.
Fixed bug where maxn for a sequence has a small factor, but it is not found.

gfndsieve, gfndsievecl: version 2.1
Fixed bug where code can find invalid factors.
[/code][/QUOTE]

Can Linux users access the source code and recompile? :smile:

rogue 2021-10-23 14:06

[QUOTE=ET_;591434]Can Linux users access the source code and recompile? :smile:[/QUOTE]

All of the source is on sourceforge as well as a makefile that works on OS X and Windows. If the makefile doesn't work on Linux, I would not expect it to be difficult to get it to work.

ET_ 2021-10-23 16:13

[QUOTE=rogue;591442]All of the source is on sourceforge as well as a makefile that works on OS X and Windows. If the makefile doesn't work on Linux, I would not expect it to be difficult to get it to work.[/QUOTE]

Thank you, Mark. I will look for it more carefully.

ryanp 2021-10-25 15:56

Is there advice about how to best choose values for "-G", "-g" and "-W" for OpenCL based programs like [C]srsieve2cl[/C] on a given GPU?

On a Tesla A100, I couldn't get srsieve2cl to go much above 9 to 10M p/sec, after fiddling with values for a while. By comparison, a plain [C]./srsieve2 -W 48[/C] on a 72-core Xeon CPU gives me about 15M p/sec.

rogue 2021-10-25 16:24

[QUOTE=ryanp;591575]Is there advice about how to best choose values for "-G", "-g" and "-W" for OpenCL based programs like [C]srsieve2cl[/C] on a given GPU?

On a Tesla A100, I couldn't get srsieve2cl to go much above 9 to 10M p/sec, after fiddling with values for a while. By comparison, a plain [C]./srsieve2 -W 48[/C] on a 72-core Xeon CPU gives me about 15M p/sec.[/QUOTE]

I recommend bumping -g. You will have to play around to see where you start seeing diminishing returns.

I have noticed that when running many workers, the code that feeds the worker threads is not fast enough. In some cases it is better to have multiple instances of srsieve2 running. Addressing this would require significant changes to the framework.

