I have released 2.1.4. Here are the changes:
[code]
framework:
   Fixed an issue with creating GPU kernels on OS X.

srsieve2cl: new release
   Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster
   than srsieve2. On my GPU it is limited to about 5000 sequences due to
   GPU memory limitations. I do not know what the limits are for other GPUs.
   It will switch to the GPU at p>1e6.
[/code]
On an older GPU, srsieve2cl struggles with 1000 sequences, causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it. |
[QUOTE=rogue;568920]I have released 2.1.4. Here are the changes:
[code] framework: Fixed an issue with creating GPU kernels on OS X. srseive2cl: new release Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2, On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not know what the limits are for other GPUs. It will switch to the GPU at p>1e6. [/code] On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.[/QUOTE] How much VRAM is used for 5000 sequences and 80000? |
[QUOTE=rebirther;568925]How much VRAM is used for 5000 sequences and 80000?[/QUOTE]
3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager). I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB. |
[QUOTE=rogue;568920]I have released 2.1.4. Here are the changes:
[code] framework: Fixed an issue with creating GPU kernels on OS X. srseive2cl: new release Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2, On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not know what the limits are for other GPUs. It will switch to the GPU at p>1e6. [/code] On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.[/QUOTE] I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected? |
[QUOTE=Citrix;568959]I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?[/QUOTE]
I do not look at p/sec as it is calculated differently. I look at factors per second. It is far more accurate. Nevertheless srsieve2 and sr2sieve can be faster if your GPU isn't particularly fast. |
Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
[LIST][*]improved caching of primes[*]improved switch statement in EratSmall and EratMedium[*]improved cache size detection on Linux and on Apple Silicon CPUs (which could be useful for compiling this for ARM)[/LIST] |
[QUOTE=rogue;568928]3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).
I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.[/QUOTE]
I have now tried the cl version on an RX 5500 XT with 8 GB of VRAM but hit the limit: there was a driver timeout because too much VRAM was used, I think about 7.4 GB.
[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B]
2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds
The speed is awesome. I am still running this with srsieve2 on 16 cores to compare. It could be much better on faster cards with 16-24 GB of VRAM. |
[QUOTE=rebirther;569010]Tried now the cl version on a RTX 5500XT with 8GB RAM but hit the limit, there was a driver timeout because of too much RAM used, I think it was 7.4GB.
[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B] 2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds The speed is awesome, still running this on 16 cores srsieve2 to compare. Could be much better on faster cards with 16-24GB RAM.[/QUOTE]
Try using a lower value for -g (10 is the default). That should reduce some of the GPU memory usage. |
[QUOTE=Dylan14;569009]Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
[LIST][*]improved caching of primes[*]improved switch statement in EratSmall and EratMedium[*]cache size detection improved on Linux and with the Apple Silicon CPU's (which could be useful for compiling this for ARM)[/LIST][/QUOTE] That shouldn't be too hard to do. |
[QUOTE=rebirther;569010]Tried now the cl version on a RTX 5500XT with 8GB RAM but hit the limit, there was a driver timeout because of too much RAM used, I think it was 7.4GB.
[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB[/B] 2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds The speed is awesome, still running this on 16 cores srsieve2 to compare. Could be much better on faster cards with 16-24GB RAM.[/QUOTE]
Compared with a Ryzen 3950X using 16 cores:
[B]srsieve2 -n2501 -N10000 -P1e9 -W16 -spl_remain.txt -fB[/B]
2021-01-11 20:50:35: Sieve completed at p=1000000007. Primes tested 50847420. Found 92827983 factors. 10729517 terms remaining. Time 4990.80 seconds
The CPU reduces the sieve file a bit more than the GPU. |
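The two "Sieve completed" lines above can be cross-checked with a little arithmetic. This is only a rough sketch: the numbers are copied from the two runs, and it assumes that "factors found" plus "terms remaining" gives the number of terms each run started with, which the thread does not state explicitly.

```python
# Figures copied from the two "Sieve completed" lines in the posts above.
gpu = {"factors": 87459308, "remaining": 16098192, "seconds": 239.43}
cpu = {"factors": 92827983, "remaining": 10729517, "seconds": 4990.80}


def starting_terms(run):
    """Assumed relation: terms at start = factors found + terms remaining."""
    return run["factors"] + run["remaining"]


# Wall-clock ratio of the 16-core CPU run to the GPU run.
speedup = cpu["seconds"] / gpu["seconds"]

# Terms the GPU run left in the file that the CPU run removed.
extra_terms_left_by_gpu = gpu["remaining"] - cpu["remaining"]

print("starting terms (GPU run):", starting_terms(gpu))
print("starting terms (CPU run):", starting_terms(cpu))
print(f"CPU time / GPU time: {speedup:.1f}x")
print("terms left by GPU run but removed by CPU run:", extra_terms_left_by_gpu)
```

Under that assumption both runs started from the same number of terms, so the difference in remaining terms is entirely a difference in factors reported, not in input. |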
[B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 5000 -g 5 -spl_remain.txt -fB[/B]
I tried reducing the values, but it ends in a driver timeout every time. VRAM usage was 6.5-6.8 GB. The error message came from the AMD driver, according to the Windows log (amdkmdag). |
[QUOTE=rebirther;569021][B]srsieve2cl.exe -n2501 -N10000 -P1e9 -M 5000 -g 5 -spl_remain.txt -fB[/B]
I tried to reduce the values but ends up every time in a driver timout. The usage was from 6.5-6.8GB VRAM. The error message came from the AMD driver, windows log (amdkmdag).[/QUOTE] I'm not certain how to address that. How many sequences are in the file? Can you cut the number of sequences in half and see if that works? |
[QUOTE=rogue;569027]I'm not certain how to address that. How many sequences are in the file? Can you cut the number of sequences in half and see if that works?[/QUOTE]
13808 sequences |
[QUOTE]e:\PRIME\NEW-PROJECT1>twinsieve -P 2000000000000000 -W 5 -w 5e7 -i k_b2_n4194304.pfgw -o k_b2_n4194304.pfgw -O 4194304.txt
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Switching to ABC format since other formats are not supported when using -s
Sieve started: 1646125914229267 < p < 2e15 with 128499 terms (7 < k < 3999975, k*2^4194304) (expecting 710 factors)
p=1646139929503171, 46.01M p/sec, 23 factors found at 1328 sec per factor (last 128 min), 0.0% done. [COLOR=Red][B]ETC 2027-04-15[/B][/COLOR] 09:31

C:\Users\Desktop\NEW-PROJECT>twinsieve -P 1500000000000000 -W 3 -w 2e7 -i k_b2_n4194304.pfgw -o k_b2_n4194304.pfgw -O 4194304.txt
twinsieve v1.3, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Switching to ABC format since other formats are not supported when using -s
Sieve started: 1215757153905743 < p < 15e14 with 131939 terms (7 < k < 3999975, k*2^4194304) (expecting 793 factors)
p=1226583079600591, 30.11M p/sec, 32 factors found at 925 sec per factor (last 164 min), 3.8% done. ETC [B][COLOR=Red]2021-01-16 [/COLOR][/B]10:29[/QUOTE]
The predicted completion time is badly wrong, as you can see in this example. I also found out why. From time to time the sieve updates the current p (in this example p=1646139929503171), and then the prediction becomes "normal": in this case the sieve will be finished in 5-6 days, not 6 years. It is a cosmetic bug, but a rather strange one. On the other CPU, the prediction with the same sieve file is very accurate. |
[QUOTE=pepi37;569155]I found prediction time very wrong,as you can see in this example. I also found why is time so wrong. From time to time sieve increase current p ( in this example p=1646139929503171) and then prediction become "normal" : in this case sieve will be over in 5-6 days, not 6 years. It is cosmetic bug, but little strange one. On the other CPU prediction with same sieve file, is very accurate.[/QUOTE]
The first output shows 0.0% done. This implies that the sample size is too small to accurately compute the ETC. Was something hindering execution, such as another CPU-intensive process? The first snippet shows very little progress (after at least 128 minutes) compared to the second. |
I found, well, not a bug but a flaw in srsieve2. When srfile is used to remove sequences from a k's remain file, it writes a file that begins like this:
pmin=0
I tried starting a sieve and got this result:
[CODE]C:\Users\Sydekum\Documents\other stuff\cllr38iwin64_v2>srsieve2.exe -s R7_4800.out -n 4800 -N 5000 -P 500e6
srsieve2 v1.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Fatal Error: sequence must be in form k*b^n+c where you specify values for k, b and c[/CODE]
After removing the line it worked fine. |
[QUOTE=MisterBitcoin;569172]I found an, well not an bug but a flaw in srsieve2. When using srfile to remove sequences from an k´s remain file its writes a file that begins like that:
pmin=0 I tried starting a sieve and got this result: [CODE]C:\Users\Sydekum\Documents\other stuff\cllr38iwin64_v2>srsieve2.exe -s R7_4800.out -n 4800 -N 5000 -P 500e6 srsieve2 v1.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Fatal Error: sequence must be in form k*b^n+c where you specify values for k, b and c[/CODE] After removing the line it worked fine.[/QUOTE]
That is one of those oddities from srsieve/srfile. If you use a different output format from srsieve/srfile (such as ABCD or PFGW), that line isn't included. I have added these to FUTURE.txt:
[list][*]skipping the pmin line in srsieve2[*]adding logic to srsieve2 to support removal of sequences similar to what srfile does[/list] |
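Until srsieve2 skips that line itself, the manual workaround described above (deleting the pmin line) is easy to script. A minimal sketch, assuming the only thing to remove is a line beginning with "pmin="; the function name and the sample lines are made up for illustration and do not come from srfile or srsieve2.

```python
def strip_pmin_lines(lines):
    """Drop any line that starts with 'pmin=' and keep all other lines in order."""
    return [line for line in lines if not line.lstrip().startswith("pmin=")]


# Made-up sample content: only the first line mirrors what the thread reports.
sample = [
    "pmin=0\n",
    "placeholder sequence line 1\n",
    "placeholder sequence line 2\n",
]

cleaned = strip_pmin_lines(sample)
print("".join(cleaned), end="")
```

For a real file, the same function can be applied to the result of `open(path).readlines()` and the cleaned lines written to a new file, leaving the original untouched. |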
[QUOTE=rogue;569161]The first output shows 0.0% done. This implies that the sample size is too small to accurate compute the ETC.
Was something hindering execution such as another CPU intensive process? The first snippet shows very little progress (after at least 128 minutes) compared to the second.[/QUOTE]
No, there is no such process. This is an Intel CPU with 6 cores, 5 of which are used. The machine is dedicated to sieving; I only open Firefox on it from time to time. |
[QUOTE=pepi37;569213]No, there is no such process, this is Intel CPU with 6 cores, 5 are used. Dedicated machine only for sieving and from time to time open Firefox[/QUOTE]
All I can say is that the output is telling me that the CPUs are not being fully utilized by twinsieve, unless there is a bug causing the CPU to burn cycles without doing anything useful. |
[QUOTE=rogue;569220]All I can say is that the output is telling me that the CPUs are not being fully utilized by twinsieve, unless there is a bug causing the CPU to burn cycle without doing anything useful.[/QUOTE]
That is the reason why I say it is a cosmetic bug: as soon as p increases, the predicted time becomes accurate. |
This is most likely an issue with my syntax, but when I run srsieve2 with a sequence in the form (k*b^n+c)[B]/d[/B], the program does not seem to be recognizing the division.
[CODE]uwu@DESKTOP-7I8GNER:~/Math/mtsieve$ ./srsieve2 -W2 -o=45557 -s"(41*10^n+13)/9" -n100001 -N150000
srsieve2 v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic
Sieve started: 3 < p < 2^62 with 50000 terms (100001 < n < 150000, k*10^n+c)
Fatal Error: Invalid factor: 41*10^100001+13 mod 3 = 12[/CODE] |
[QUOTE=Plutie;569531]This is most likely an issue with my syntax, but when I run srsieve2 with a sequence in the form (k*b^n+c)[B]/d[/B], the program does not seem to be recognizing the division.
[CODE]uwu@DESKTOP-7I8GNER:~/Math/mtsieve$ ./srsieve2 -W2 -o=45557 -s"(41*10^n+13)/9" -n100001 -N150000 srsieve2 v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic Sieve started: 3 < p < 2^62 with 50000 terms (100001 < n < 150000, k*10^n+c) Fatal Error: Invalid factor: 41*10^100001+13 mod 3 = 12[/CODE][/QUOTE] There are some gaps in the factor validation for these. I'll take a look to see if I can fix them. |
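For sequences with a divisor, a candidate factor has to be checked against the quotient, not just the numerator. The sketch below is an independent illustration of that arithmetic, not the validation code in srsieve2: it assumes d divides k*b^n+c exactly, in which case p divides (k*b^n+c)/d precisely when p*d divides k*b^n+c. The function name is hypothetical.

```python
def divides_quotient(k, b, n, c, d, p):
    """Return True if p divides (k*b^n + c) / d.

    Assumes d divides k*b^n + c exactly, so that p | (N/d) is the same
    condition as (p*d) | N. Modular exponentiation keeps the numbers small.
    """
    m = p * d
    return (k * pow(b, n, m) + c) % m == 0


# Small case that can be checked by hand: (41*10^3 + 13) / 9 = 4557 = 3 * 7 * 7 * 31.
for p in (3, 5, 7, 31):
    print(p, divides_quotient(41, 10, 3, 13, 9, p))

# The numerator 41*10^n + 13 is divisible by 3 for every n, so testing the
# numerator alone says nothing about whether 3 divides the quotient.
print((41 * pow(10, 100001, 3) + 13) % 3 == 0)
print(divides_quotient(41, 10, 100001, 13, 9, 3))
print(divides_quotient(41, 10, 100002, 13, 9, 3))
```

For n=100001 the numerator is divisible by 9 but not by 27, so 3 is not a factor of (41*10^n+13)/9 at that n even though it divides the numerator. |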
I have released 2.1.5. Here are the changes:
[code]
framework:
   Added MpArith.h (non-vectorized) and changed class names in MpArithVector.h.
   Overloaded HashTable constructor as needed for srsieve2.

srsieve2, srsieve2cl: version 1.4
   Lots of refactoring to support special sieving logic for c=1 sequences.
   Implemented sr1sieve logic using Montgomery mulmod logic (CPU only).
   Change array of sequences to a linked list to avoid compiler warnings.
   Add support for pmin= line in input file (as generated by srsieve/srfile).
[/code]
This is a beta release for srsieve2 and srsieve2cl because I had to refactor a lot of code in order to implement the c=1 logic (sr1sieve) cleanly. As this is a beta, I am asking interested users to give it a spin. You should be able to start sieving a new sequence and it will switch to the c=1 logic automatically.

Right now the c=1 logic only works for a single sequence. If you have multiple sequences it will use the generic logic. Support for multiple sequences will come in the future, but that isn't next on my list.

The c=1 logic is about 15% slower than sr1sieve based upon the limited testing I have done. Most of that is due to having zero hand-tuned ASM in that logic. sr1sieve has a ton of ASM and I am rather loath to pull it into srsieve2.

On the plus side, I intend to focus next on fixing bugs (if any are reported) and implementing the OpenCL logic for a single c=1 sequence. It should be doable, but I don't know how fast it will be or if I will find other limits that prevent it from performing well.

I think that the issue reported by Plutie is fixed, but I have not tested it. |
Compiling the latest version of mtsieve (r92) fails at CisOneSequenceHelper.cpp:
[code]g++ -Isieve -m64 -Wall -O3 -std=c++11 -c -o sierpinski_riesel/CisOneSequenceHelper_cpu.o sierpinski_riesel/CisOneSequenceHelper.cpp
sierpinski_riesel/CisOneSequenceHelper.cpp:13:10: fatal error: HashTable.h: No such file or directory
   13 | #include "HashTable.h"
      |          ^~~~~~~~~~~~~
compilation terminated.
make: *** [makefile:131: sierpinski_riesel/CisOneSequenceHelper_cpu.o] Error 1
[/code]
This is fixed if line 13 is changed to
[code]#include "../core/HashTable.h"[/code] |
[QUOTE=Dylan14;570182]Compiling the latest version of mtsieve (r92) fails at CisOneSequenceHelper.cpp:
[code]g++ -Isieve -m64 -Wall -O3 -std=c++11 -c -o sierpinski_riesel/CisOneSequenceHelper_cpu.o sierpinski_riesel/CisOneSequenceHelper.cpp sierpinski_riesel/CisOneSequenceHelper.cpp:13:10: fatal error: HashTable.h: No such file or directory 13 | #include "HashTable.h" | ^~~~~~~~~~~~~ compilation terminated. make: *** [makefile:131: sierpinski_riesel/CisOneSequenceHelper_cpu.o] Error 1 [/code] This is fixed if line 13 is changed to [code]#include "../core/HashTable.h"[/code][/QUOTE] Thanks. I wonder why it compiles in Windows. In any case that #include is not needed. BTW, if anyone has ideas for optimizations for the new c=1 logic, I would appreciate if you posted them in the "mtsieve enhancements" thread. |
Srsieve2
[QUOTE]srsieve2 -P 11000000000000000 -W4 -w 1e7 -i t16_b155_k4.npg -o t16_b155_k4.npg -f B -O factgenefer.txt[/QUOTE]
The latest version crashes; version 1.3 works without problems. |
[QUOTE=pepi37;570300]Last version crash, version 1.3 works without problems[/QUOTE]
Can you post or e-mail me the input file? |
[QUOTE=rogue;570304]Can you post or e-mail me the input file?[/QUOTE]
This is part of the input file:
[QUOTE]10000000000000000:P:1:155:257
4 1174326
4 1174366
4 1174374
4 1174582
4 1174598
4 1174630
4 1174646
4 1174830
4 1174950
4 1174974
4 1174998
4 1175014
4 1175142
4 1175150
4 1175254
4 1175278
4 1175302
4 1175398
4 1175430
4 1175454
4 1175574
4 1175742
4 1175822[/QUOTE] |
Found the problem. It will be fixed in the next release.
|
Great news, thanks!
|
I posted 1.4.1 of srsieve2 at sourceforge in its own 7z file.
Upon some further testing, it is about 30% slower than sr1sieve (with x86 asm) and 10% slower than sr1sieve (with no x86 asm). I fully expect that srsieve2cl with c=1 support in the GPU will be much faster than sr1sieve even on modest GPUs, so I'm not too concerned about the poorer performance at this time. As much as I would love to stop supporting sr1sieve, I don't think that is going to happen anytime soon. |
[QUOTE=rogue;570376]I posted 1.4.1 of srsieve2 at sourceforge in its own 7z file.
Upon some further testing, it is about 30% slower than sr1sieve (with x86 asm) and 10% slower than sr1sieve (with no x86 asm). I fully expect that srsieve2cl with c=1 support in the GPU will be much faster than sr1sieve even on modest GPUs, so I'm not too concerned about the poorer performance at this time. As much as I would love to stop supporting sr1sieve, I don't think that is going to happen anytime soon.[/QUOTE] Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell srsieve2 with sr1sieve logic is spending around 30% of its time calculating legendre symbols. I get the following message if I try to turn it on "[COLOR="Red"]Ingoring [/COLOR]-L option since Legendre tables cannot be used" Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1"" This is using r95 of the code on Sourceforge. |
[QUOTE=henryzz;570419]Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell srsieve2 with sr1sieve logic is spending around 30% of its time calculating legendre symbols. I get the following message if I try to turn it on "[COLOR="Red"]Ingoring [/COLOR]-L option since Legendre tables cannot be used"
Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1"" This is using r95 of the code on Sourceforge.[/QUOTE] -L isn't supported (yet). By default it will create a Legendre table and you can use -l to disable, but I actually haven't verified that is working correctly. I found the error and committed a change to sourceforge. I have updated srsieve2.7z over at sourceforge as well. |
The current SVN version fails to run on Kubuntu 20.04:
[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
[/code] |
[QUOTE=Happy5214;570707]The current SVN version fails to run on Kubuntu 20.04:
[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1" srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic for p >= 3 Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors) Sieving one sequence where abs(c) = 1 for p >= 37803 Split 1 base 2 sequence into 94 base 2^180 sequences. malloc(): corrupted top size Aborted (core dumped) [/code][/QUOTE] It crashes on Windows as well, so it shouldn't be too hard to track down and fix. |
I found and fixed the problem. The changes are committed to sourceforge.
|
There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.
|
[QUOTE=rogue;570755]There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.[/QUOTE]
This is now fixed. |
BTW, now with this change the speed of srsieve2 (for CisOne logic) is within 5% of the speed of sr1sieve (with x86 asm) and about 10% faster than the speed of sr1sieve (with no x86 asm). By "within" I mean that sometimes it is faster and sometimes it is slower. The speed difference appears to be one of cache usage and CPU load on the machine overall. Note this was only tested with a single sequence so it is possible that other sequences will yield different results.
I will have to play around with unrolling some of the loops in srsieve2 to see if I can do better, but right now I'm pleased to see that it is performing so well, considering it didn't look so good earlier this week. My intention is to post a build after I track down the issue with the CisOne logic in srsieve2cl. |
Great news! I have tracked down and squashed the known bugs in srsieve2 and srsieve2cl. I have some benchmarks to share.
The CPU is an Intel i7-8550H at 2.6 GHz and the GPU is an NVIDIA Quadro P3200. I was running no other CPU/GPU intensive processes during this test. All runs yielded the same set of factors.

I sieved 37803*2^n-1 for n from 5e4 to 25e4 up to 1e6. I then ran the file through sr1sieve, srsieve2, and srsieve2cl, taking the average of 5 runs. Here are the results:
[code]
srsieve2 -i b2_n.in -P1e10                            504
srsieve2 -i b2_n.in -P1e10 -l                         647
srsieve2cl -i b2_n.in -P1e10                          355
srsieve2cl -i b2_n.in -P1e10 -l                       353
srsieve2cl -i b2_n.in -P1e10 -g100                    221
srsieve2cl -i b2_n.in -P1e10 -g100 -1                 210
srsieve2cl -i b2_n.in -P1e10 -g1000                   184
srsieve2cl -i b2_n.in -P1e10 -g1000 -l                183
sr1sieve -i b2_n.in -P1e10 -ffact.out (asm)           460
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (asm)        562
sr1sieve -i b2_n.in -P1e10 -ffact.out (no asm)        455
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (no asm)     549
[/code]
As a reminder, -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". This corresponds to -x from sr1sieve. The OpenCL code in srsieve2cl supports Legendre lookup tables, but you can see that it doesn't provide any benefit for this k.

It is clear that srsieve2cl with -g1000 beats out everything else. With -g1000 it uses less than 500 MB of GPU memory (per Windows Task Manager). It will be interesting to see this run on lower-end GPUs to see how they compare.

So with this report, mtsieve 2.1.6 is now released. Here are the changes:
[code]
framework:
   Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot
   rely on accurately determining that value.

srsieve2, srsieve2cl: version 1.5
   Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1)
   for a single CisOne sequence (sr1sieve).
   Added OpenCL code for CisOne logic.
   Added Legendre table lookups for CisOne logic.
[/code] |
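To put the benchmark figures on a common scale, they can be expressed relative to sr1sieve with asm. This is a rough sketch that assumes the numbers are elapsed times, so lower is better; the post does not state the unit. Only runs with the Legendre lookup tables enabled are included.

```python
# Timings copied from the benchmark table above (unit not stated in the post;
# assumed to be elapsed time, so a smaller number means a faster run).
timings = {
    "srsieve2": 504,
    "srsieve2cl": 355,
    "srsieve2cl -g100": 221,
    "srsieve2cl -g1000": 184,
    "sr1sieve (asm)": 460,
    "sr1sieve (no asm)": 455,
}

baseline = timings["sr1sieve (asm)"]

# Ratio > 1 means faster than sr1sieve (asm), ratio < 1 means slower.
ratios = {name: baseline / value for name, value in timings.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20}{timings[name]:>5}  {ratio:.2f}x")
```

On these figures srsieve2cl with -g1000 comes out at 2.5 times the speed of sr1sieve (asm), and with -g100 at slightly more than twice. |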
[QUOTE=rogue;570815]
[code] srsieve2cl -i b2_n.in -P1e10 -g100 -1 210 [/code]As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". [/QUOTE] And what does the "-1" mean? :razz: OTOH, good job! |
Does srsieve2cl with -g1000 kill sr1sieve in speed?
|
[QUOTE=pepi37;570838]Does srsieve2cl with -g1000 kill srsieve1 in speed?[/QUOTE]
Based upon the single sequence I tested, given the hardware specs I provided, srsieve2cl with -g1000 is more than twice as fast as sr1sieve. With -g100 it is slightly more than twice as fast as sr1sieve. With a higher value for -g, it could possibly be 3x faster, but that is on this hardware. |
A single sequence is all I need 😊
|
[QUOTE]e:\MTSIEVE\216>srsieve2cl -P 2e15 -H -D 1 -d 1 -i 92.txt -g 120 -o 92.txt -f B -l
srsieve2cl v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving one sequence where abs(c) = 1 for p >= 1600012693787917
Split 1 base 10 sequence into 216 base 10^360 sequences.
709440 bytes used for congruence tables
CL_DEVICE_MAX_COMPUTE_UNITS = 22
CL_DEVICE_GLOBAL_MEM_SIZE = 2147483648
CL_DEVICE_LOCAL_MEM_SIZE = 49152
CL_KERNEL_WORK_GROUP_SIZE = 256
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE = 32
CL_KERNEL_LOCAL_MEM_SIZE = 1
CL_KERNEL_PRIVATE_MEM_SIZE = 14752
GPU global bytes allocated = 36837532
GPU private bytes allocated = 4805632
GPU primes per worker is 675840
Sieve started: 1600012693787917 < p < 2e15 with 93883 terms (1400033 < n < 2999998, k*10^n+c) (expecting 595 factors)
p=1600014279249383, 742.3K p/sec, no factors found, 0.0% done. ETC 2021-08-02 01:12[/QUOTE]
The CPU is an Intel 9600K (5% utilization).
The GPU is a 1660 Super (working only on this program).
I experimented with -g: if I go above 150, my GPU utilization spikes between 0 and 100%, but the speed is the same as with -g 120, around 742.3K p/sec.
Could you please explain the -W and -w parameters, since they are clearly not the same as in the rest of the mtsieve package?
The test was done on the sequence 92*10^n-1 (the sieve is from n=1.4M to 3M) with 93883 candidates. |
[QUOTE=pepi37;570877]CPU is Intel 9600K ( 5% utilization)
GPU is 1660 Super ( working only with this program) I experiment with g , and if I go above 150 my gpu utilization spikes from 0 -100% but speed is same as I use g=120 so it is around 742.3K p/sec I will kindly ask to explain -W and -w parameters since they clearly are not same as in rest of mtsieve package. Test was done on sequence 92*10^n-1 from ( sieve is from 1M4 to 3M) with 93883 candidates[/QUOTE]
How does that compare to the speed of sr1sieve?

-W is used to specify the number of CPU-only workers. -w is the number of primes per chunk per worker.

-G is used to specify the number of GPU-only workers. -g is a multiplier for CL_DEVICE_MAX_COMPUTE_UNITS * CL_KERNEL_WORK_GROUP_SIZE, used to compute the number of primes per chunk per worker.

For CPU-only exes, -W defaults to 1 and -w to 1e6. For GPU exes, -W defaults to 0, -G defaults to 1, and -g defaults to 10.

For GPU-only exes, if p_min < a threshold determined at runtime, then a CPU worker is used even if -W is 0, but that CPU worker is only used until p_min > that threshold. IIRC, that threshold is min(1e6, k) for srsieve2cl. 1e6 is used because the factor density is fairly high for low p and I want to limit how much GPU memory is needed to pass factors back to the CPU. It is typically n_max or k_max for other GPU exes.

-M is used to adjust the amount of memory needed for returning factors. If the default is not sufficient, then you will be told at runtime to adjust it if it detects too many factors for the given -M.

You can use -W with a value > 0 for GPU exes, but that really depends upon the relative speed of your CPU to your GPU. You cannot use -G with CPU-only exes. I don't see a use for -G > 1, but I suppose you can do that if -g isn't large enough to keep your GPU busy. If -g is too large you could encounter screen lag.

I hope that answers your questions. |
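The -g multiplier described above can be checked against the log posted earlier for the 1660 Super, which reports 22 compute units, a work group size of 256, and "GPU primes per worker is 675840" with -g 120. A minimal sketch of that arithmetic; the function name is made up for illustration and is not part of mtsieve.

```python
def gpu_primes_per_worker(g, max_compute_units, work_group_size):
    """Primes per chunk per GPU worker: -g times compute units times work group size."""
    return g * max_compute_units * work_group_size


# Values from the srsieve2cl log in the post above:
# CL_DEVICE_MAX_COMPUTE_UNITS = 22, CL_KERNEL_WORK_GROUP_SIZE = 256, run with -g 120.
print(gpu_primes_per_worker(120, 22, 256))

# The same card with the default -g 10 and with -g 150.
print(gpu_primes_per_worker(10, 22, 256))
print(gpu_primes_per_worker(150, 22, 256))
```

The first value reproduces the 675840 shown in the log, which is consistent with the description of -g. |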
I can compare speed of sieves only by finish date
|
[QUOTE=pepi37;570901]I can compare speed of sieves only by finish date[/QUOTE]
Both programs output an ETA and a factor removal rate. You just need to run both of them for 10 minutes, hit ^C, then compare the results to see which one sieved further during that time. |
I encountered a bug while using srsieve2 on a large input file (around 100 MB). After two days of sieving it stopped at the given Pmax, but look at the screen output:
[CODE] p=19852801223, 178.6 p/sec, 4397895 factors found at 6.62 sec per factor (last
p=19852801223, 180.6 p/sec, 4397919 factors found at 6.63 sec per factor (last
p=19852801223, 176.3 p/sec, 4397936 factors found at 6.64 sec per factor (last
p=19852801223, 180.7 p/sec, 4397951 factors found at 6.65 sec per factor (last 517 min), 99.2% done. ETC 2021-02-05 12:14
2 workers didn't stop after 10 minutes
D:\sieve>[/CODE]
This also means it didn't save the file. The last checkpoint was around 4 hours earlier, which is okay. However, I wonder why it crashed. Is there a function that times out the app when a worker has not reported its results back after n minutes? |
[QUOTE=MisterBitcoin;570916]I encountered an bug while using srsieve2 on an large input file (around 100 MB). After two days of sieving it stopped at the given Pmax; but look at the screen output:
[CODE] p=19852801223, 178.6 p/sec, 4397895 factors found at 6.62 sec per factor (last p=19852801223, 180.6 p/sec, 4397919 factors found at 6.63 sec per factor (last p=19852801223, 176.3 p/sec, 4397936 factors found at 6.64 sec per factor (last p=19852801223, 180.7 p/sec, 4397951 factors found at 6.65 sec per factor (last 517 min), 99.2% done. ETC 2021-02-05 12:14 2 workers didn't stop after 10 minutes D:\sieve>[/CODE] Which also means he didnt saved the file, last checkpoint was around 4 hours ago which is okay. However i wonder why it did crash, is there a function that gives the app an timeout after a worker didnt reported the results back after reaching n-minutes?[/QUOTE]
This occurs when the main thread suspects that one or more of the worker threads has become unresponsive, maybe stuck in a tight loop. In this case, because p/sec is so low, each worker thread needs more than 10 minutes to process a single chunk of primes, so the worker thread was likely still working. I can look into a change to not do that check under certain circumstances.

To get around the problem, I suggest that you use -w1e3 or -w1e4 (the default is -w1e6). This will give smaller chunks of work to each worker, so they can process each chunk much faster. This will have a negligible effect on the overall rate since so little time is spent in the prime sieve. |
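The effect of -w on the 10-minute check can be estimated from the figures in this exchange. This is a back-of-the-envelope sketch only: it assumes the displayed p/sec (about 180) is the combined rate of both workers and that the rate is split evenly between them, neither of which is stated in the thread. The function name is hypothetical.

```python
def seconds_per_chunk(primes_per_chunk, total_primes_per_sec, workers):
    """Estimated time for one worker to finish one chunk of primes.

    Assumes the displayed p/sec is the combined rate of all workers and that
    each worker contributes an equal share of it.
    """
    per_worker_rate = total_primes_per_sec / workers
    return primes_per_chunk / per_worker_rate


TIMEOUT_SECONDS = 10 * 60  # the "didn't stop after 10 minutes" check

for chunk in (1e6, 1e4, 1e3):  # default -w1e6, then the suggested -w1e4 and -w1e3
    secs = seconds_per_chunk(chunk, 180, 2)
    verdict = "exceeds" if secs > TIMEOUT_SECONDS else "fits within"
    print(f"-w{chunk:.0e}: about {secs:.0f} s per chunk, {verdict} the 10 minute window")
```

Even if the displayed rate were per worker instead, the default chunk of 1e6 primes would still take well over 10 minutes at this speed, while 1e4 or 1e3 would not. |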
I ran into an issue starting a new Riesel sieve with multiple workers:
[code]$ ./bin/srsieve2 -W 3 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors)
Sieving with generic logic for p >= 257
Split 1 base 2 sequence into 1 base 2^1 sequences.
Fatal Error: Invalid factor: 14549535*2^128595-1 mod 34747 = 22443
[/code]
[c]-W 2[/c] and [c]-W 4[/c] also failed with the same error at other small primes, but using only 1 worker seemed to get past that stage (I had another sieve running that I didn't want to interfere with, so I didn't complete this):
[code]$ ./bin/srsieve2 -W 1 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors)
Sieving with generic logic for p >= 257
Split 1 base 2 sequence into 1 base 2^1 sequences.
Sieving one sequence where abs(c) = 1 for p >= 15489191
Split 1 base 2 sequence into 171 base 2^180 sequences.
741796 bytes used for congruence tables
1617098 bytes used for Legendre tables
^CCTRL-C accepted. Threads will stop after sieving to 32456407
Sieve interrupted at p=32456407.
CPU time: 16.72 sec. (0.03 sieving) (0.97 cores)
43690 terms written to t17_b2.prp
Primes tested: 1000000. Factors found: 131311. Remaining terms: 43690. Time: 17.19 seconds.
[/code] |
[QUOTE=Happy5214;571143]I ran into an issue starting a new Riesel sieve with multiple workers:
[code]$ ./bin/srsieve2 -W 3 -n 125e3 -N 300e3 -P 1e9 -o t17_b2.prp -f B -s "14549535*2^n-1" srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic for p >= 3 Sieve started: 3 < p < 1e9 with 175001 terms (125000 < n < 300000, k*2^n+c) (expecting 165723 factors) Sieving with generic logic for p >= 257 Split 1 base 2 sequence into 1 base 2^1 sequences. Fatal Error: Invalid factor: 14549535*2^128595-1 mod 34747 = 22443 [/code] [c]-W 2[/c] and [c]-W 4[/c] also failed with the same error at other small primes, but using only 1 worker seemed to get past that stage (I had another sieve running that I didn't want to interfere with, so I didn't complete this).][/QUOTE] Hmm. I'll take a look. This seems to have failed in the generic sieving logic, not the cisone logic. |
[QUOTE=rogue;571153]Hmm. I'll take a look. This seems to have failed in the generic sieving logic, not the cisone logic.[/QUOTE]
I see what I did wrong. It was something I introduced in the most recent release. I'll fix it in the next release. Since you are on Linux or OS X, I have committed the files so you can build and try again. |
[QUOTE=rogue;571167]I see what I did wrong It was something I introduced in the most recent release. I'll fix in the next release. Since you are on linux or OS X, I have committed the files so you can build and try again.[/QUOTE]
It worked, though I actually didn't need it anymore. My workflow involves using srsieve2 to 1e9 and the faster sr1sieve or sr2sieve beyond that (this old box has no real GPU option), so I just burned the extra minute and ran it with one worker. |
I have released 2.2.0:
[code] framework: Updated OpenCL on Windows. See makefile for details. Updated primesieve to 7.6. psieve, psievecl: version 1.4 Some refactoring to support OpenCL worker. First release of psievecl. Verify factors from -I input file srsieve2, srsieve2cl: version 1.5.1 Fixed bug that was introduced in the refactoring of 1.5 that impacts generic sieving while using multiple threads. Added -R to remove sequences. Use -Rk*b^n+c format to remove a single sequence or use -R with a file that has multiple sequences. This is not tested yet. [/code] psievecl is about 20x faster than psieve. The main slowdown is factor validation, which is less noticeable as factors become more sparse. One odd behavior I noticed is that srsieve2cl is slower than in the previous release, but I do not know why. It was slower even after reverting the framework changes for this release. I thought it was the update to OpenCL or primesieve, but reverting to the older versions of those made no difference. I'm likely missing something, but I don't know what. I'm hoping that someone is willing to give the -R option with srsieve2 a spin. |
[QUOTE=rogue;571238]I'm hoping that someone is willing to give the -R option with srsieve2 a spin.[/QUOTE]
No luck: [code]$ ./srsieve2 -i b2_n.abcd -o b2_n.abcd -R "658687*2^n-1" srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n 2018 terms for sequence 658687*2^n-1 have been removed Must use generic sieving logic because there is more than one sequence Sieving with generic logic for p >= 982453051 Fatal Error: Expected 62636 terms when building sequences, but counted only 60618 [/code] Input file attached (with .txt extension added). |
[QUOTE=Happy5214;571242]No luck:
[code]$ ./srsieve2 -i b2_n.abcd -o b2_n.abcd -R "658687*2^n-1" srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n 2018 terms for sequence 658687*2^n-1 have been removed Must use generic sieving logic because there is more than one sequence Sieving with generic logic for p >= 982453051 Fatal Error: Expected 62636 terms when building sequences, but counted only 60618 [/code] Input file attached (with .txt extension added).[/QUOTE] That will be easy to fix. When the sequence is removed the code isn't decrementing il_TermCount. |
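The numbers in the error fit that explanation: 62636 expected minus the 2018 removed terms is exactly the 60618 that were counted. Below is a minimal Python sketch of the bookkeeping, assuming a cached total that has to be kept in sync when a sequence is dropped. The class `SequenceSet` and its members are hypothetical; only the name il_TermCount comes from the post above.

```python
class SequenceSet:
    """Toy model: each sequence maps to the set of n values still remaining."""

    def __init__(self, terms_by_sequence):
        self.terms = {seq: set(ns) for seq, ns in terms_by_sequence.items()}
        # Cached total, playing the role of il_TermCount in the post above.
        self.term_count = sum(len(ns) for ns in self.terms.values())

    def remove_sequence(self, seq):
        """Drop a sequence and keep the cached total in sync."""
        removed = self.terms.pop(seq, set())
        self.term_count -= len(removed)  # the decrement described as missing
        return len(removed)

    def counted_terms(self):
        """Recount from scratch, as a consistency check."""
        return sum(len(ns) for ns in self.terms.values())
```

Without the decrement, `term_count` would stay at the old total while `counted_terms()` drops, which is the mismatch the fatal error describes.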
I have released 2.2.1. Here are the changes:
[quote] framework: no changes gfndsieve, gfndsievecl: version 2.0 Moved GFN divisor testing to GFNDivisorTester class so that GFNDivisorApp is smaller and so that future support of non-x86 is easier since GFNDivisorTester calls a number of x86 asm methods directly. For gfndsievecl do not report any terms with factors < 50. This reduces the size needed for the buffer that is used to report factors. Added -r and -R options to support functionality similar to ppsieve. -r will not generate a bitmap for tracking terms. It will only generate an output file of factors. -R is used with -r. If a term has a factor below 32767 (the default value), then the program will not output any factors for the term. -r and -x are mutually exclusive with -r overriding -x. Added various speed improvements. srsieve2, srsieve2cl: version 1.5.2 Fixed issue with CisOne logic as it tries to rebuild sequences when there are multiple sequences as that is not yet supported. [/quote] In the limited performance testing I have done, it appears that gfndsievecl with -r is about 5x faster than the OpenCL version of ppsieve. I think some of that is due to using a much higher value for -S in gfndsievecl than what ppsievecl does, but that doesn't explain the entirety of the speed gain. The CPU-only version is about 3x slower than ppsieve, but that is due to a lot of fine-tuned assembler in the ppsieve CPU code. I don't expect anyone to use it for that, but at the same time gfndsieve should be about 50% faster for typical GFN divisor sieving. |
Can someone make a program for sieving sequences of the form k[SUB]1[/SUB]*b[SUB]1[/SUB]^n+k[SUB]2[/SUB]*b[SUB]2[/SUB]^n+c with variable n?
|
There appears to be something broken with gfndsieve at head. This is from the latest svn build:
[CODE]./gfndsieve -P 1e11 -W 36 -n 10000 -N 100000 -k 1e5 -K 1e6 -o gfn.txt gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n Sieve started: 3 < p < 1e11 with 40500450000 terms (100001 <= k <= 999999, 10000 <= n <= 100000, k*2^n+1) (expecting 38743756770 factors) Fatal Error: Invalid factor: 100007*2^99457+1 mod 100271 = 13484[/CODE] |
[QUOTE=mathwiz;585516]There appears to be something broken with gfndsieve at head. This is from the latest svn build:
[CODE]./gfndsieve -P 1e11 -W 36 -n 10000 -N 100000 -k 1e5 -K 1e6 -o gfn.txt gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n Sieve started: 3 < p < 1e11 with 40500450000 terms (100001 <= k <= 999999, 10000 <= n <= 100000, k*2^n+1) (expecting 38743756770 factors) Fatal Error: Invalid factor: 100007*2^99457+1 mod 100271 = 13484[/CODE][/QUOTE] I get the same error. I'll take a look. |
[QUOTE=rogue;585518]I get the same error. I'll take a look.[/QUOTE]
I had the same issue with a previous build. You found the problem back then. [url]https://www.mersenneforum.org/showthread.php?t=22890[/url] |
[QUOTE=houding;585541]I had the same the same issue with a previous build.
You found the problem back then. [url]https://www.mersenneforum.org/showthread.php?t=22890[/url][/QUOTE] It might or might not be the same cause. I haven't had a chance to look at it yet. |
The problem with gfndsieve is fixed on SourceForge.
|
Similar? issue to what I reported on page 46 - latest svn
[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9" srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic for p >= 3 Creating CPU worker to use until p >= 1000000 GPU primes per worker is 25600 Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c) Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE] |
[QUOTE=Plutie;585584]Similar? issue to what I reported on page 46 - latest svn
[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9" srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic for p >= 3 Creating CPU worker to use until p >= 1000000 GPU primes per worker is 25600 Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c) Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE][/QUOTE] Latest SVN source code is 2.2.2 (July 2nd 2021, r138) The Windows EXE 7z file is 2.2.1 The last commit is r139 |
[QUOTE=ET_;585585]Latest SVN source code is 2.2.2 (July 2nd 2021, r138)
The Windows EXE 7z file is 2.2.1 The last commit is r139[/QUOTE] Just redownloaded r139, same error is occurring. |
[QUOTE=Plutie;585584]Similar? issue to what I reported on page 46 - latest svn
[CODE]/Math/mtsieve$ ./srsieve2cl -n 100000 -N 500000 -o out.txt -s "(88*10^n-7)/9" srsieve2cl v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic for p >= 3 Creating CPU worker to use until p >= 1000000 GPU primes per worker is 25600 Sieve started: 3 < p < 2^62 with 400001 terms (100000 < n < 500000, k*10^n+c) Fatal Error: Invalid factor: (88*10^100000-7)/9 mod 3 = 18446744073709551610[/CODE][/QUOTE] srsieve2 does not work correctly when d > 1. I don't know how easy that will be to fix. I thought it was correct and working but is clearly not. I do see a separate issue in factor validation when abs(c) > 1, so I will fix that. |
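One detail in the error output is worth noting: the reported residue 18446744073709551610 equals 2^64 - 6, which is how -6 reads when it is stored in an unsigned 64-bit integer. Whether the validation code actually arrives at the value this way is an assumption; the thread does not say. A small Python sketch of the arithmetic, with the hypothetical helper `as_uint64`:

```python
MASK64 = (1 << 64) - 1


def as_uint64(value: int) -> int:
    """Reinterpret a Python int as an unsigned 64-bit value (two's complement)."""
    return value & MASK64


if __name__ == "__main__":
    print(as_uint64(-6))  # 18446744073709551610
```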
[QUOTE=rogue;585589]This is a different issue. I have time to look into it today.[/QUOTE]
The latest build in SVN seems to be working for me. However, does gfndsieve respect the -W flag? With -W 36 I still only see a single CPU utilized in "top". |
[QUOTE=mathwiz;585590]The latest build in SVN seems to be working for me.
However, does gfndsieve respect the -W flag? With -W 36 I still only see a single CPU utilized in "top".[/QUOTE] For primes < 1e4 only a single thread is used to reduce contention when removing terms from the vector of remaining terms. But if the first chunk of work ends after 1e4 it will wait until that chunk of work is done. I could modify the code to "rebuild the workers" once it reaches 1e4 so that more threads can do the work. I would not be surprised if you end up starving workers with -W 36 because the main thread that doles out chunks of work will become a bottleneck. One way to offset this is to increase -w to give each worker more to work on. Eliminating the bottleneck in the main thread would require a number of changes to the framework. I haven't thought about it a lot so I don't know how easy or difficult that would be or how much it would impact overall performance. |
Unless I'm mistaken about how the ABCD format works, gfndsieve seems to be leaving a lot of terms with very small factors. For example, this is a snippet from gfndsieve output:
[CODE]ABCD $a*2^60000+1 [700001] // Sieved to 10000000000051 4 2 2 4 2 2 2 4 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 4 2 2 4 2 6[/CODE] Feeding this to LLR gives: [CODE]700001*2^60000+1 has a small factor : 3 !! Starting Proth prime test of 700005*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 7 700005*2^60000+1 is not prime. Base-7 Proth RES64: A386651F7F998E5F. Time : 802.675 ms. 700007*2^60000+1 has a small factor : 3 !! Starting Proth prime test of 700009*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 3 700009*2^60000+1 is not prime. Proth RES64: B72DF3B16C12AD73. Time : 744.449 ms. 700013*2^60000+1 has a small factor : 3 !! Starting Proth prime test of 700015*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 3 700015*2^60000+1 is not prime. Proth RES64: E1F28AC9BAA1BB8B. Time : 739.817 ms. Starting Proth prime test of 700017*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 5 700017*2^60000+1 is not prime. Base-5 Proth RES64: 43CEAF84FFC1C844. Time : 745.082 ms. 700019*2^60000+1 has a small factor : 3 !! Starting Proth prime test of 700023*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 7 700023*2^60000+1 is not prime. Base-7 Proth RES64: 4B0BB226CE058F90. Time : 755.652 ms. 700025*2^60000+1 has a small factor : 3 !! Starting Proth prime test of 700027*2^60000+1 Using all-complex AVX-512 FFT length 6K, a = 3 700027*2^60000+1 is not prime. Proth RES64: 10DA62863970FCB8. Time : 803.375 ms. 700029*2^60000+1 has a small factor : 5 !! 700031*2^60000+1 has a small factor : 3 !![/CODE] Note all the "has a small factor : 3 !!" lines. |
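The reading of the ABCD file used above can be reproduced in a few lines. This is a minimal Python sketch, assuming that the bracketed value in the header is the first k and that each following number is the increment to the next k, which matches the k values LLR prints. The function names are hypothetical.

```python
def decode_abcd(first_k, deltas):
    """Expand an ABCD delta list into the list of k values."""
    ks = [first_k]
    for d in deltas:
        ks.append(ks[-1] + d)
    return ks


def small_factor(k, n, limit=50):
    """Return the smallest odd p < limit dividing k*2^n+1, or None."""
    for p in range(3, limit, 2):
        if (k * pow(2, n, p) + 1) % p == 0:
            return p
    return None


if __name__ == "__main__":
    # Header value and the first deltas from the snippet above.
    for k in decode_abcd(700001, [4, 2, 2, 4]):
        print(k, small_factor(k, 60000))
```

For k = 700001 the check returns 3, and for k = 700029 it returns 5, in agreement with the "has a small factor" lines from LLR.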
Is this the latest code on SourceForge? I'm guessing it is. I'll have to take a look at it.
|
[QUOTE=rogue;585764]Is this the latest code in source forge? I'm guessing it is. I'll have to take a look at it.[/QUOTE]
Yep, latest from SVN. Sample command after building: [CODE] ./gfndsieve -P 1e13 -W 36 -n 60000 -N 61000 -k 700e3 -K 800e3 -o gfnsmall.txt[/CODE] Produces: [CODE]ABCD $a*2^60000+1 [700001] // Sieved to 10000000000051 4 2 2 4 2 2 2 4 2 2 2 2 2 2 ...[/CODE] |
I think I know which code change introduced this, so it shouldn't be too hard to fix. If you need working code, use revision 122.
|
[QUOTE=rogue;585784]I think I know which code change introduced this, so it shouldn't be too hard to fix. If you need working code, use revision 122.[/QUOTE]
Thanks -- but that revision seems to have (build) issues of its own. [CODE]g++ -Isieve -m64 -Wall -O3 -std=c++11 -lstdc++ -o gfndsieve core/App_cpu.o core/FactorApp_cpu.o core/AlgebraicFactorApp_cpu.o core/Clock_cpu.o core/Parser_cpu.o core/Worker_cpu.o core/HashTable_cpu.o core/main_cpu.o core/SharedMemoryItem_cpu.o sieve/Erat.o sieve/EratBig.o sieve/EratMedium.o sieve/EratSmall.o sieve/PreSieve.o sieve/CpuInfo.o sieve/MemoryPool.o sieve/PrimeGenerator.o sieve/PrimeSieve.o sieve/IteratorHelper.o sieve/LookupTables.o sieve/popcount.o sieve/nthPrime.o sieve/PrintPrimes.o sieve/ParallelSieve.o sieve/iterator.o sieve/api.o sieve/SievingPrimes.o x86_asm/fpu_mod_init_fini.o x86_asm/fpu_push_pop.o x86_asm/sse_mulmod.o x86_asm/fpu_mulmod.o x86_asm/fpu_powmod.o x86_asm/fpu_powmod_4b_1n_4p.o x86_asm/fpu_mulmod_iter.o x86_asm/fpu_mulmod_iter_4a.o x86_asm/fpu_mulmod_4a_4b_4p.o x86_asm/sse_mod_init_fini.o x86_asm/sse_powmod_4b_1n_4p.o x86_asm/sse_mulmod_4a_4b_4p.o x86_asm/avx_set_a.o x86_asm/avx_set_b.o x86_asm/avx_get.o x86_asm/avx_compute_reciprocal.o x86_asm/avx_compare.o x86_asm/avx_mulmod.o x86_asm/avx_powmod.o x86_asm/sse_powmod_4b_1n_4p_mulmod_1k.o x86_asm_ext/m320.o x86_asm_ext/m384.o x86_asm_ext/m448.o x86_asm_ext/m512.o x86_asm_ext/m576.o x86_asm_ext/m640.o x86_asm_ext/m704.o x86_asm_ext/m768.o x86_asm_ext/mulmod128.o x86_asm_ext/mulmod192.o x86_asm_ext/mulmod256.o x86_asm_ext/sqrmod128.o x86_asm_ext/sqrmod192.o x86_asm_ext/sqrmod256.o x86_asm_ext/redc.o gfn_divisor/GFNDivisorApp_cpu.o gfn_divisor/GFNDivisorWorker_cpu.o -lgmp -lpthread /usr/bin/ld: gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::PostSieveHook()': GFNDivisorApp.cpp:(.text+0x3b1): undefined reference to `GFNDivisorTester::TestRemainingTerms(unsigned long, unsigned long, unsigned long)' /usr/bin/ld: gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::ValidateOptions()': GFNDivisorApp.cpp:(.text+0x292c): undefined reference to `GFNDivisorTester::GFNDivisorTester(App*)' /usr/bin/ld: 
gfn_divisor/GFNDivisorApp_cpu.o: in function `GFNDivisorApp::PreSieveHook()': GFNDivisorApp.cpp:(.text+0x20d8): undefined reference to `GFNDivisorTester::StartedSieving()' collect2: error: ld returned 1 exit status[/CODE] I'm in no rush, so happy to wait for a fix at head. |
You would need all of the sources from that revision to build it.
On the positive side this issue is now fixed. I was over-thinking a speed up in the previous revision for small primes and it was just plain stupid. It was never going to work. It works now based upon the testing I have done. |
[QUOTE=rogue;585802]You would need all of the sources from that revision to build it.[/QUOTE]
I think r122 is just broken; a clean "svn co --revision=..." in a clean directory still produces the same build error. But r123 seems to fix the makefile, and that build appears to be working for me. [QUOTE]On the positive side this issue is now fixed. I was over-thinking a speed up in the previous revision for small primes and it was just plain stupid. It was never going to work. It works now based upon the testing I have done.[/QUOTE] Great news! :smile: |
[CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 4>srsieve2.exe -i sr_1005.abcd -P 20e9 srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Must use generic sieving logic because there is more than one sequence Sieving with generic logic for p >= 15000000000 Fatal Error: Expected 986923 terms when building sequences, but counted only 0 [/CODE] I built the .abcd file with the srfile -a command. It looks like the sieve file is damaged; however, -G works fine, so I already have a prp file. BUT when I run -a on the prp file to get a fresh abcd file, I get the same message! This might already be known, and I am using an older version >.> (shame on me lol) |
[QUOTE=MisterBitcoin;586241][CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 4>srsieve2.exe -i sr_1005.
abcd -P 20e9 srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va riable k and n Must use generic sieving logic because there is more than one sequence Sieving with generic logic for p >= 15000000000 Fatal Error: Expected 986923 terms when building sequences, but counted only 0 [/CODE]I build the .abcd file with srfiles -a command. It looks like the sieve file is damaged; however -G is working fine so i have a prp file already. BUT when i -a the prp file to get a fresh abcd file; i get the same message! Might already be known, and i am using an older version >.> (shame on me lol)[/QUOTE] How about srfile_win64 -a your.prp Edit: I still have the abcd file for R1005 |
[QUOTE=rebirther;586242]How about
srfile_win64 -a your.prp Edit: I still have the abcd file for R1005[/QUOTE] I tried that, but it got me the same result. However upgrading to the newest srsieve2 fixed it. |
Using the latest version of srsieve2 I got this error.
Factors are written to the file, but in the wrong format. When you then try to remove the factors you get the error "candidate is not divisible" and everything stops. Editing the factor file by hand is not a solution. [QUOTE]91961131 | 57*20^448688+1 479259601 | 36*20^413328+1 91962193 | 110*20^454361+1 281926849 | 91962443 | 15*20^415116+1 135*20^481951+1 281927411 | 90*20^480795+1 185593399 | 79*20^458720+1[/QUOTE] |
[QUOTE=pepi37;587208]Using srsieve2 from latest version I got this error.
Factors are written in file but in wrong format. When you try to remove factors then you get error: candidate is not divisible, and all stops. Editing file with factors is not solution.[/QUOTE] That is weird. I assume this is with multiple threads based upon the output. There should be a lock to ensure this doesn't happen if multiple threads are writing factors concurrently. I will verify that. |
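The kind of lock mentioned here can be sketched as follows. This is a minimal Python illustration, assuming each factor is written as one "p | term" line and that all worker threads share one output stream. The class `FactorLog` and the helpers are hypothetical and do not reflect the mtsieve sources.

```python
import io
import threading


class FactorLog:
    """Serialize factor lines so that each one is written whole."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def report(self, p, term):
        line = f"{p} | {term}\n"
        with self._lock:  # only one thread writes at a time
            self._stream.write(line)


def _report_all(log, chunk):
    """Worker body: report every (p, term) pair in this chunk."""
    for p, term in chunk:
        log.report(p, term)


def run_workers(log, factors, workers=5):
    """Split the factor list across threads that all report to one log."""
    chunks = [factors[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=_report_all, args=(log, c)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Building the full line first and writing it under the lock is what keeps two factors from being interleaved on the same line, as in the quoted output.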
[QUOTE=rogue;587211]That is weird. I assume this is with multiple threads based upon the output. There should be a lock to ensure this doesn't happen if multiple threads are writing factors concurrently. I will verify that.[/QUOTE]
yes 5 threads in this case |
srsieve2cl ( latest build from package 1.5.2)
srsieve2cl -P 10000000000000 -g 24 -G4 -D1 -d1 -d2 -d3 -d0 I have 4 GPUs, and with this command only the last GPU is 95% utilized. How can I utilize all of them? |
[QUOTE=pepi37;587682]srsieve2cl ( latest build from package 1.5.2)
srsieve2cl -P 10000000000000 -g 24 -G4 -D1 -d1 -d2 -d3 -d0 I have 4 GPU and using this command only last GPU is 95 % utilized How to utilize all?[/QUOTE] mtsieve isn't designed to use multiple GPUs. You would have to run one instance on each GPU. Use -h to list the platforms and devices then use -D and -d to specify the one you want to use. The default is platform 0 and device 0 on platform 0. |
[QUOTE=rogue;587687]mtsieve isn't designed to use multiple GPUs. You would have to run one instance on each GPU. Use -h to list the platforms and devices then use -D and -d to specify the one you want to use. The default is platform 0 and device 0 on platform 0.[/QUOTE]
Yes, I later managed that and got around 10M p/sec in total using 4 cards. Thanks! You have a PM about another problem with srsieve2. |
[QUOTE]e:\PRIME\REPDIGIT-k92>srsieve2 -P 2000000000000000 -W 5 -w 1e7 -i 92.txt -O 92fact.txt -f B
srsieve2 v1.5.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving one sequence where abs(c) = 1 for p >= 1600000000000000 Split 1 base 10 sequence into 108 base 10^180 sequences. 673642 bytes used for congruence tables 598 bytes used for Legendre tables Sieve started: 16e14 < p < 2e15 with 88073 terms (1500028 < n < 2999998, k*10^n[B][COLOR=Red]+[/COLOR][/B]c) (expecting 558 factors) p=1600017808265749, 1.493M p/sec, no factors found, 0.0% done. ETC 2022-01-13 02:30[/QUOTE] If I use srsieve2 to sieve k*10^n[COLOR=Red]-[/COLOR]1, why is k*10^n[B][COLOR=Red]+[/COLOR][/B]c reported? Maybe it is just a simple error, or is it something deeper? |
[QUOTE=pepi37;590065]If I use srsieve2 to sieve k*10^n[COLOR=Red]-[/COLOR]1 why is reported k*10^n[B][COLOR=Red]+[/COLOR][/B]c?
Maybe it is just simple error, or something deeper?[/QUOTE] Missing a % in a format string. It should be "%+c" and is just "+c" in the code. I have fixed it and committed the change. |
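The effect of the missing % can be illustrated with a signed format specifier. This is a minimal Python sketch, assuming the intent is to print the actual value of c with an explicit sign; it uses Python's `%+d`, which is not necessarily the exact specifier in the mtsieve sources, and the function names are hypothetical.

```python
def describe(base: int, c: int) -> str:
    """Format the sequence form with the real value of c and an explicit sign."""
    return "k*%d^n%+d" % (base, c)


def describe_buggy(base: int, c: int) -> str:
    """Without the %, the text '+c' is printed literally and c is ignored."""
    return "k*%d^n+c" % (base,)


if __name__ == "__main__":
    print(describe(10, -1))        # k*10^n-1
    print(describe_buggy(10, -1))  # k*10^n+c
```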
[CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1) Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE] I got this error when using mtsieve_2.2.1. There is no error when using the same command with mtsieve_2.0.3. |
[QUOTE=matzetoni;590175][CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1) Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE] I got this error when using mtsieve_2.2.1 There is no error when using the same command with mtsieve_2.0.3[/QUOTE] I will look into this. |
[QUOTE=matzetoni;590175][CODE]>> gfndsieve.exe -k5000000 -K6000000 -n16001 -N17000 -o"out_test.txt"
gfndsieve v2.0, a program to find factors of k*2^n+1 numbers for variable k and n Sieve started: 3 < p < 2^62 with 500000000 terms (5000001 <= k <= 5999999, 16001 <= n <= 17000, k*2^n+1) Fatal Error: Invalid factor: 5006169*2^16953+1 mod 5012429 = 2651427 [/CODE] I got this error when using mtsieve_2.2.1 There is no error when using the same command with mtsieve_2.0.3[/QUOTE] Are you working on Fermat factors research? |
I have posted mtsieve 2.2.2 over at sourceforge. It addresses the open issues and has these changes:
[code] framework: Added __attribute__ to method declarations that accept variable arguments. srsieve2, srsieve2cl: version 1.5.3 Modified to not remove terms that are prime as that defeats the purpose of Sierpinski/Riesel searches. Fixed bug where maxn for a sequence has a small factor, but it is not found. gfndsieve, gfndsievecl: version 2.1 Fixed bug where code can find invalid factors. [/code] |
[QUOTE=rogue;591373]I have posted mtsieve 2.2.2 over at sourceforge. It addresses the open issues and has these changes:
[code] framework: Added __attribute__ to method declarations that accept variable arguments. srsieve2, srsieve2cl: version 1.5.3 Modified to not remove terms that are prime as that defeats the purpose of Sierpinski/Riesel searches. Fixed bug where maxn for a sequence has a small factor, but it is not found. gnfdsieve, gfndsievecl: version 2.1 Fixed bug where code can find invalid factors. [/code][/QUOTE] Can Linux users access the source code and recompile? :smile: |
[QUOTE=ET_;591434]Can Linux users acess to the source code and recompile? :smile:[/QUOTE]
All of the source is on sourceforge as well as a makefile that works on OS X and Windows. If the makefile doesn't work on Linux, I would not expect it to be difficult to get it to work. |
[QUOTE=rogue;591442]All of the source is on sourceforge as well as a makefile that works on OS X and Windows. If the makefile doesn't work on Linux, I would not expect it to be difficult to get it to work.[/QUOTE]
Thank you, Mark. I will take a closer look. |
Is there any advice on how best to choose values for "-G", "-g" and "-W" for OpenCL-based programs like [C]srsieve2cl[/C] on a given GPU?
On a Tesla A100, I couldn't get srsieve2cl to go much above 9 to 10M p/sec, after fiddling with values for a while. By comparison, a plain [C]./srsieve2 -W 48[/C] on a 72-core Xeon CPU gives me about 15M p/sec. |
[QUOTE=ryanp;591575]Is there advice about how to best choose values for "-G", "-g" and "-W' for OpenCL based programs like [C]srsieve2cl[/C] on a given GPU?
On a Tesla A100, I couldn't get srsieve2cl to go much above 9 to 10M p/sec, after fiddling with values for a while. By comparison, a plain [C]./srsieve2 -W 48[/C] on a 72-core Xeon CPU gives me about 15M p/sec.[/QUOTE] I recommend bumping -g. You will have to play around to see where you start seeing diminishing returns. I have noticed that when running many workers, the code that feeds the worker threads is not fast enough. In some cases it is better to have multiple instances of srsieve2 running. Addressing this would require significant changes to the framework. |