mtsieve
I am pleased to announce a sieving framework that encompasses many of the sieving programs I have written over the years. I call this framework mtsieve, short for multi-threaded sieve. You can get more information about it [URL="mersenneforum.org/rogue/mtsieve.html"]here[/URL].
Bundled with that framework are working 64-bit Windows versions of afsieve, mfsieve, pixsieve, fbncsieve, gfndsieve, and xyyxsieve. cksieve is also included, but doesn't work yet; it will take me a few days to get that working correctly. A makefile is included, so these programs can be built on OS X and Linux. You are probably wondering why you should switch. The most important reason is that all of these support multi-threading (what I call workers) out of the box. The previous versions of most of these programs did not have any support for multiple cores. Also of note is that fbncsieve (with one thread) is about 20% faster than the previous version of fbncsieve thanks to an optimization I borrowed from gfndsieve. As always there might be some bugs, but I have done enough smoke testing for everything included (except cksieve) to feel confident that they are working. Over the coming weeks I will be refining the documentation, fixing cksieve, and addressing any issues that are reported. I expect a few issues, but not many. After that I will delve into porting the various OpenCL versions of my programs to this framework. I do not know how I will do that yet, but I'm thinking about it. |
The multi-threaded version of gfndsieve is not working correctly. It is possible the others are not as well. I need to do some more testing.
|
[QUOTE=rogue;480000]The multi-threaded version of gfndsieve is not working correctly. It is possible the others are not as well. I need to do some more testing.[/QUOTE]
It is possible the others are not as well. .... - your assumption is correct :) And you have at least two more bugs ( that I found) I will wait for fixed version. -W is option I am very interested to be in working state! Great collection of tools! Thanks for them! |
[QUOTE=pepi37;480018]It is possible the others are not as well. .... - your assumption is correct :)
And you have at least two more bugs ( that I found) I will wait for fixed version. -W is option I am very interested to be in working state! Great collection of tools! Thanks for them![/QUOTE] Please let me know the other bugs you found. |
The testing I have done so far has not revealed a bug in finding factors, but there does appear to be a bug in what is output to the console. I can also say that the multi-threaded gfndsieve and fbncsieve programs perform poorly. Due to high factor density there is a bottleneck when removing candidates from the pool. I will look to see if there is anything I can do to address that.
|
[QUOTE=rogue;480032]Please let me know the other bugs you found.[/QUOTE]
Output is always in pfgw format regardless of which switch is used:
-f --format=f format of output file (A=ABC, D=ABCD (default), N=NEWPGEN)
(To clarify: the extension is always pfgw, but the header does not match, so srfile cannot process the file.)
The -W option does not work.
The -P value is not working.
[QUOTE]C:\Users\Alpha-I7\Desktop\mtsieve>fbncsieve [COLOR=Lime][B]-P 1000000000000[/B][/COLOR] -k1e4 -K1e6 -s k*3^^11+1[B] [COLOR=magenta]--format=A[/COLOR][/B]
fbncsieve v1.3, a program to find factors of k*b^n+c numbers for fixed b,n, and c and variable k
Changing [B][COLOR=lime]p_max to 420890[/COLOR][/B]. All remaining terms will be prime.
Sieve started: 1 < p < 420890 with 990001 terms
Sieve completed at p=420899.
Processor time: 0.08 sec. (0.00 sieving) (0.35 cores)
[COLOR=Magenta][B]59545 terms written to k_b3_n11+1.pfgw[/B][/COLOR]
Primes tested: 35458. Factors found: 930456. Remaining terms: 59545. Time: 0.22 seconds.[/QUOTE] |
[QUOTE=pepi37;480260]Output is always in pfgw format regardless what switch is used
-f --format=f format of output file (A=ABC, D=ABCD (default), N=NEWPGEN) ( to clarify -extension is always pfgw, but header doesnot match , so srfile cannot process file) -W option doesnot work -P value is not working[/QUOTE] For fbncsieve, -P can be overridden by the program if it detects that you are sieving too deeply. In your example, it decides to sieve no deeper than sqrt(1000000*3^11+1). The message states that after sieving to that value, all remaining terms in the output file are prime. This is exactly the same behavior as newpgen. -W does work, but since the sieving in this example is not very deep, a second worker doesn't have any work, as all of the primes are tested by the first worker. Try again with a larger n that requires much deeper sieving and you will see that. -f and --format do work. .pfgw is just an extension. Although the programs format output as documented, they don't change the file extension. I made no guarantees that srfile will be able to process the output from fbncsieve. Is there some reason that you need srfile to read the output from fbncsieve? I have an update on the performance issue. This version of fbncsieve is still faster than the previous one that only supports a single thread, which means that fbncsieve is not impacted. I have not looked at the gfndsieve code yet, but I clearly did something really bad when I ported the code into the framework. |
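The cap described in the reply above can be checked by hand. The sketch below is only an illustration (the function name `sqrt_cap` is made up here, and this is not fbncsieve's own code): it computes the integer square root of the largest term, k_max*b^n+c, for the quoted command. The result, 420888, is slightly below the p_max of 420890 that the program printed, so fbncsieve presumably rounds its limit a little differently, but the principle is the same: once every prime up to the square root of a term has been tried, a surviving term has no nontrivial factor.

```python
from math import isqrt


def sqrt_cap(k_max: int, b: int, n: int, c: int) -> int:
    """Floor of sqrt(k_max*b^n + c): no prime above this can be the
    smallest factor of any composite term in the range."""
    return isqrt(k_max * b**n + c)


# Values taken from the quoted run: -K1e6 with the form k*3^11+1.
largest_term = 1_000_000 * 3**11 + 1
print(largest_term)                    # 177147000001
print(sqrt_cap(1_000_000, 3, 11, 1))   # 420888
```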
[QUOTE=rogue;480269]
I have an update on the performance issue. This version of fbncsieve is still faster than the previous one that only supports a single thread which means that fbncsieve is not impacted. I have not looked at the gfndsieve code yet, but I clearly did something really bad when I ported the code into the framework.[/QUOTE] can you send that new version? |
[QUOTE=pepi37;480270]can you send that new version?[/QUOTE]
The one bundled with the mtsieve download is the current version. There are no issues with it. As for gfndsieve, I think I found the cause of the slowdown. I am going to make some changes and see if they improve the performance. |
[QUOTE][B]-o [/B]--outputterms=o output file of remaining candidates
[COLOR=Red][B]-o [/B][/COLOR]--outputfactors=O output file with new factors[/QUOTE]
This second o (for output factors) should be [B]big O[/B]:
-i --inputterms=i input file of remaining candidates
-I --inputfactors=I input file with factors
-o --outputterms=o output file of remaining candidates
-o --outputfactors=O output file with new factors
This is cosmetic, not a true bug.
3 | 16127*10^1000000+1
3 | 16130*10^1000000+1
32463413 | 33589*10^1000000+1
15545081 | 20084*10^1000000+1
3 | 16133*10^1000000+1
3 | 16136*10^1000000+1
3 | 16139*10^1000000+1
3 | 16142*10^1000000+1
3 | 16145*10^1000000+1
3 | 16148*10^1000000+1
3 | 16151*10^1000000+1
3 | 16154*10^1000000+1
3 | 16157*10^1000000+1
3 | 16160*10^1000000+1
3 | 16163*10^1000000+1
3 | 16166*10^1000000+1
3 | 16169*10^1000000+1
3 | 16172*10^1000000+1
3 | 16175*10^1000000+1
3 | 16178*10^1000000+1
3 | 16181*10^1000000+1
3 | 16184*10^1000000+1
3 | 16187*10^1000000+1
3 | 16190*10^1000000+1
3 | 16193*10^1000000+1
15545417 | 40203*10^1000000+1
32471381 | 16246*10^1000000+1
3 | 16196*10^1000000+1
3 | 16199*10^1000000+1
I checked, and all factors are good, just not sorted. |
I'll fix the -o/-O issue.
The factors will not be sorted, especially if using multiple workers. In fact when using multiple workers the factor is not necessarily the smallest factor. Neither are important to the functionality of the program. |
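For anyone who wants to check or reorder a factor file themselves, here is a minimal Python sketch. It assumes each line has the shape "p | k*b^n+c" as in the listing earlier in the thread; the helper names (`parse_factor`, `divides`, `sort_by_k`) are invented for this example and are not part of mtsieve. Modular exponentiation keeps the check cheap even for n = 1000000.

```python
import re

# Assumed line shape, taken from the factor listing in this thread: "p | k*b^n+c"
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*(\d+)\^(\d+)([+-]\d+)$")


def parse_factor(line):
    """Split a 'p | k*b^n+c' line into the integers (p, k, b, n, c)."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised factor line: {line!r}")
    p, k, b, n, c = (int(g) for g in m.groups())
    return p, k, b, n, c


def divides(p, k, b, n, c):
    """True if p divides k*b^n+c, computed without building the huge number."""
    return (k * pow(b, n, p) + c) % p == 0


def sort_by_k(lines):
    """Return the factor lines ordered by k; the sieve itself does not need this."""
    return sorted(lines, key=lambda line: parse_factor(line)[1])


print(divides(*parse_factor("3 | 16127*10^1000000+1")))   # True
```

Sorting is purely for human readability; as noted above, the order has no effect on how the factors are applied.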
And I will public say: sorry for "bug " they are not bugs, more like my bad reading of tutorial!
It is really fast app! |
[QUOTE=pepi37;480313]And I will public say: sorry for "bug " they are not bugs, more like my bad reading of tutorial!
It is really fast app![/QUOTE] Thanks. There is a bug with fbncsieve and the -1 form. I know the cause. I'll let you know when an updated version has been posted. |
Mark, thank you for the continued efforts and time on your sieving programs! This is very much appreciated!
On my Windows 2012 R2 system with an Intel Xeon E5-2620, cksieve crashes directly after starting, which I think still matches your comment in the first post that there were issues with cksieve. Running (as an example):
cksieve -b 2 -p 2 -P 1000000 -n 100 -N 10000 -o ck_remain.out -O ck_factors.out
cksyeve v1.2, a program to find factors of (b^n+/-1)^2-2 numbers
Sieve started: 2 < p < 1000000 with 19802 terms
The Windows event log shows:
Faulting application name: cksieve.exe, version: 0.0.0.0, time stamp: 0x5a7f7e2c
Faulting module name: ntdll.dll, version: 6.3.9600.18821, time stamp: 0x59ba86db
Exception code: 0xc0000374
Fault offset: 0x00000000000f1c10
Faulting process id: 0x190c
Faulting application start time: 0x01d3aa65f8e8626d
Faulting application path: C:\mtsieve\cksieve.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Faulting package full name:
Faulting package-relative application ID:
Also, besides the o/O cosmetic points in the help section, the output of cksieve.exe -h has small typos:
cks[b]y[/b]eve v1.2, a program to find factors of (b^n+/-1)^2-2 numbers
-h --help prints this help[b])[/b] |
Thanks for your feedback. I'll fix that cosmetic issue.
I have not had the time to fix cksieve, but the gfndsieve performance issue has been resolved. I was trying to add a performance enhancement to fbncsieve, but it crashes on Windows and I haven't figured out why yet. That same change works in OS X, so it is either a compiler bug in mingw64 or something in the asm code. I'm hoping to post updated code this weekend. |
Good news. I found and fixed the bug with cksieve (stupid x86 asm). Here is a complete list of changes:
[code]
Add an internal flag that suspends all but one Worker when processing the first chunk of primes.
   This is used to improve performance when there is a high factor density for low primes.
   This will also suppress any on screen reporting or checkpointing until that chunk is processed.
Fix issue in computing CPU utilization.
Changed -c (chunksize) option to -w (worksize).
Change output to use shorter notation for min and max primes.
cksieve - Fixed.
gfndsieve - Enable the flag mentioned above.
fbncsieve - Enable the flag mentioned above.
fkbnsieve - Added, but not tested.
[/code]
Visit my page to get the link to d/l the latest source and Windows builds. |
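One way to picture the "first chunk on a single Worker" flag from the changelog above is the toy sieve below. It is a Python sketch written for this thread, not the framework's C++ code: the names `primes_below` and `sieve`, the chunk size, and the use of a thread pool with a lock around the candidate pool are all assumptions made for illustration. Small primes remove most candidates, so the first chunk is processed alone; the later, sparser chunks are then handed to several workers. (In Python the interpreter lock means this shows the ordering, not a real speedup.)

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def primes_below(limit):
    """Sieve of Eratosthenes returning all primes p < limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * limit
    flags[0] = flags[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if flags[i]]


def sieve(k_min, k_max, b, n, c, p_max, chunk_size=25, workers=4):
    """Return the k in [k_min, k_max] for which k*b^n+c has no prime factor below p_max.

    Assumes every term is larger than p_max, so no term is removed by itself.
    """
    remaining = set(range(k_min, k_max + 1))
    lock = threading.Lock()  # guards the shared candidate pool
    primes = primes_below(p_max)
    chunks = [primes[i:i + chunk_size] for i in range(0, len(primes), chunk_size)]

    def process(chunk):
        for p in chunk:
            bn = pow(b, n, p)
            hits = [k for k in range(k_min, k_max + 1) if (k * bn + c) % p == 0]
            if hits:
                with lock:
                    remaining.difference_update(hits)

    if chunks:
        process(chunks[0])  # dense first chunk: one worker, no contention on the pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process, chunks[1:]))  # sparser chunks run on all workers
    return remaining


print(len(sieve(10000, 10200, 3, 11, 1, 1000)))
```

The design point is that contention on the pool is worst exactly where factors are densest, so serialising that one chunk costs little and avoids most of the lock traffic.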
fbncsieve -p50000000000000 -P 100000000000000 -i 500.npg -fN -W4 -O fact.txt
fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 5e13 < p < 1e14 with 159895 terms p=50055611474419, 28.92M p/sec, [B]4 factors found at 15298 sec per factor[/B], 0.1% done. ETA 2018-02-22 16:51 Since program run[B] only 147 seconds[/B] I assume there will be 15.298 sec per factor not 15298 per factor , but this is cosmetic bug. +/- side works perfectly THANKS! |
[QUOTE=rogue;480599]Good news. I found and fixed the bug with cksieve (stupid x86 asm). Here is a complete list of changes:
[code]
Add an internal flag that suspends all but one Worker when processing the first chunk of primes.
   This is used to improve performance when there is a high factor density for low primes.
   This will also suppress any on screen reporting or checkpointing until that chunk is processed.
Fix issue in computing CPU utilization.
Changed -c (chunksize) option to -w (worksize).
Change output to use shorter notation for min and max primes.
cksieve - Fixed.
gfndsieve - Enable the flag mentioned above.
fbncsieve - Enable the flag mentioned above.
fkbnsieve - Added, but not tested.
[/code]
Visit my page to get the link to d/l the latest source and Windows builds.[/QUOTE] To be clear, does this mean the multithreaded version of gfndsieve is considered fully functional? Side question: can gfndsieve take a sieved file as input or not? I looked at the options and the readme and didn't see any such option, but I wanted to be sure. Thanks again for doing this. :smile: |
I still have problem if header of newpgen file is like this 29491439734612:M:0:2:16386
Then got message it is invalid header. Even on base 2 this program is faster then Newpgen. I can say it has [B]constant rate for every base [/B]( not just for base2 like Newpgen have) Little test I made sample file for base 500 Start point was 10000000000000 Newpgen done until 10016239604068 in 337 seconds and found 7 factors In same time ( number of worker 1) FBCNsieve found 50 factors and reach 10100000000000. In any way his program will give boost for searching type of variable K. Can you improve sr1sieve in similar way ( add MT option)? |
[QUOTE=pepi37;480601]fbncsieve -p50000000000000 -P 100000000000000 -i 500.npg -fN -W4 -O fact.txt
fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 5e13 < p < 1e14 with 159895 terms p=50055611474419, 28.92M p/sec, [B]4 factors found at 15298 sec per factor[/B], 0.1% done. ETA 2018-02-22 16:51 Since program run[B] only 147 seconds[/B] I assume there will be 15.298 sec per factor not 15298 per factor , but this is cosmetic bug. +/- side works perfectly[/QUOTE] I will look into this. Note that if you start from an input file that it will get the starting prime from that file. Using -p will override what it reads from that file. |
[QUOTE=wombatman;480603]To be clear, does this mean the multithreaded version of gfndsieve is considered fully functional? Side question, can gfndsieve take a sieved file as input or no? I look at the options and the readme and didn't see any such option, but I wanted to be sure. Thanks again for doing this. :smile:[/QUOTE]
gfndsieve is fully functional. The only files that gfndsieve can take as input files are files that were created by gfndsieve. Use the -i option to specify the input file instead of using the -k/-K/-n/-N options. With some manipulation it could read files created with the -abcd1 switch of fermfact. |
[QUOTE=pepi37;480618]I still have problem if header of newpgen file is like this 29491439734612:M:0:2:16386
Then got message it is invalid header. Even on base 2 this program is faster then Newpgen. I can say it has [B]constant rate for every base [/B]( not just for base2 like Newpgen have) Little test I made sample file for base 500 Start point was 10000000000000 Newpgen done until 10016239604068 in 337 seconds and found 7 factors In same time ( number of worker 1) FBCNsieve found 50 factors and reach 10100000000000. In any way his program will give boost for searching type of variable K. Can you improve sr1sieve in similar way ( add MT option)?[/QUOTE] How was the newpgen file created? Was it created by newpgen or fbncsieve? Is there a reason that you chose that format over the ABC or ABCD format? Getting sr1sieve into mtsieve is one of my goals, but it is behind the GPU options. |
[quote]Originally Posted by pepi37 View Post
fbncsieve -p50000000000000 -P 100000000000000 -i 500.npg -fN -W4 -O fact.txt fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 5e13 < p < 1e14 with 159895 terms p=50055611474419, 28.92M p/sec, 4 factors found at 15298 sec per factor, 0.1% done. ETA 2018-02-22 16:51 [/quote] Can you e-mail me your input file? |
[QUOTE=rogue;480630]How was the newpgen file created? Was it created by newpgen or fbncsieve? Is there a reason that you chose that format over the ABC or ABCD format?
Getting sr1sieve into mtsieve is one of my goals, but it is behind the GPU options.[/QUOTE] Newpgen file is created using merge function in that program (sieved across few machines) and then join into one. [QUOTE] Is there a reason that you choose that format over the ABC or ABCD format?[/QUOTE] Of course there is reason: and reason is called removing factors. I dont know why you now make some new formats when we have srfile that can make miracles , and make remove factors and convert from one to another formats. If you make some new formats, then please make utility to -allow removing factors -allow change from one format to another ( and also support to convert to npg format) How I will remove factors: if I have three computers and every computer make own range? |
[QUOTE=rogue;480640]Can you e-mail me your input file?[/QUOTE]
Here it is [URL]https://www.dropbox.com/s/hmmyd6lqrwgamf6/500.zip?dl=0[/URL] P.S sieve depth is lower, so header in file is not OK, but this is just for experiment so dont care about it |
Pepi, just as a heads-up. The linux version of sr1sieve is multithreaded. If you have Windows 10, you can compile the source within the Ubuntu shell and it will work fine.
|
[QUOTE=pepi37;480650]Newpgen file is created using merge function in that program (sieved across few machines) and then join into one.
Of course there is reason: and reason is called removing factors. I dont know why you now make some new formats when we have srfile that can make miracles , and make remove factors and convert from one to another formats. If you make some new formats, then please make utility to -allow removing factors -allow change from one format to another ( and also support to convert to npg format) How I will remove factors: if I have three computers and every computer make own range?[/QUOTE] Use the -I argument to pass a file of factors in the form "p | candidate" into fbncsieve. If you have multiple factor files, concatenate before using as input. fbncsieve can be used to convert input files from ABCD/ABC/NPG formats into ABCD/ABC/NPG format using the -f switch. What it doesn't do is "convert then exit immediately", but I could probably add a switch for that. fbncsieve does not output "invalid header", so I don't know what issue it had with your file. I will d/l that file later today and see what is tripping it up as I don't see anything obviously wrong with what you pasted. BTW, one of my long term goals is to remove the need for srfile. None of the programs in this framework rely on srfile for any processing. If they do, then please let me know what I can do to move you away from using it. |
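As a small illustration of the "concatenate before using as input" step, the Python sketch below merges several factor files into one file that could then be passed with -I. The function names and file names are hypothetical, and dropping blank and duplicate lines is only a precaution added here; the thread does not say whether fbncsieve needs that.

```python
from pathlib import Path


def merge_factor_lines(groups):
    """Concatenate groups of 'p | candidate' lines, skipping blanks and exact duplicates."""
    seen, merged = set(), []
    for lines in groups:
        for line in lines:
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


def merge_factor_files(input_paths, output_path):
    """Read each factor file, merge the lines, and write them to output_path."""
    groups = [Path(p).read_text().splitlines() for p in input_paths]
    merged = merge_factor_lines(groups)
    Path(output_path).write_text("\n".join(merged) + "\n")
    return len(merged)


# Hypothetical usage with three machines' factor files:
# merge_factor_files(["pc1_fact.txt", "pc2_fact.txt", "pc3_fact.txt"], "all_fact.txt")
```

The merged file would then be given to fbncsieve through -I together with the candidate file through -i, as described in the post above.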
[QUOTE=rogue;480653]Use the -I argument to pass a file of factors in the form "p | candidate" into fbncsieve. If you have multiple factor files, concatenate before using as input. fbncsieve can be used to convert input files from ABCD/ABC/NPG formats into ABCD/ABC/NPG format using the -f switch. What it doesn't do is "convert then exit immediately", but I could probably add a switch for that.
fbncsieve does not output "invalid header", so I don't know what issue it had with your file. I will d/l that file later today and see what is tripping it up as I don't see anything obviously wrong with what you pasted. BTW, one of my long term goals is to remove the need for srfile. None of the programs in this framework rely on srfile for any processing. If they do, then please let me know what I can do to move you away from using it.[/QUOTE] [B]What it doesn't do is "convert then exit immediately", but I could probably add a switch for that.[/B]- that will be great , so at the end I can get one or more npg files. Then I dont need srfile anymore ( agree with you) |
[QUOTE=pepi37;480659][B]What it doesn't do is "convert then exit immediately", but I could probably add a switch for that.[/B]- that will be great , so at the end I can get one or more npg files.
Then I dont need srfile anymore ( agree with you)[/QUOTE] How many terms are remaining in the output from fbncsieve that require you to split it? In the case of CRUS, you would be using that output as input to srbsieve. In other cases is it possible for you to set up a PRPNet server to hand out the work? If not, I can look into adding a switch (similar to what is in gfndsieve) for splitting the remaining terms from fbncsieve into multiple files. I recommend avoiding NPG file formats. The ABC format is similar, but the difference between the headers of the two formats makes the ABC format easier to comprehend. |
one more cosmetic bug
if you use the -h option, you get this text; look at the last line (in red)
[QUOTE]C:\Users\Alpha-I7\Desktop\mtsieve>gfndsieve -h
gfndsieve v1.3, a CPU program to find factors of k*2^n+1 numbers for variable k and n
-h --help prints this help
-p --pmin=P0 sieve start: P0 < p (default 1)
-P --pmax=P1 sieve end: p < P1 (default 2^62)
-w --worksize=w primes per chunk of work (default 1000000)
-W --workers=W start W workers (default 1)
-i --inputterms=i input file of remaining candidates
-I --inputfactors=I input file with factors
-o --outputterms=o output file of remaining candidates
-O --outputfactors=O output file with new factors
-k --kmin=k minimum k to search
-K --kmax=K maximum k to search
-n --nmin=N minimum n to search
-N --nmax=N maximum n to search
-T --nsperfile=T number of n per output file
[COLOR=Red][B]Fatal Error: kmin must be specified[/B][/COLOR][/QUOTE] |
[QUOTE=pepi37;480669]one more cosmetic bug
if you use option -h then you got text look at last line ( in the red)[/QUOTE] The question is whether or not it should exit immediately after printing the help. |
pepi37 reported a bug with the newpgen format output by fbncsieve. It is with the cryptic details buried in that pesky first line. I hope to fix it this weekend.
|
I found a bug with the cksieve executable: when the -n flag is supplied with 1 as its argument it gives a fatal error saying that 1 is out of range. The output is given below:
[CODE]C:\Users\Dylan_000\Desktop\mtsieve\mtsieve>cksieve -P1e9 -n1 -N50000 -b50
cksieve v1.2, a program to find factors of (b^n+/-1)^2-2 numbers
Fatal Error: cksieve: out of range argument -n 1[/CODE]
whereas if the -n flag is supplied with a 2 instead it works fine:
[CODE]C:\Users\Dylan_000\Desktop\mtsieve\mtsieve>cksieve -P1e9 -n2 -N50000 -b50
cksieve v1.2, a program to find factors of (b^n+/-1)^2-2 numbers
Sieve started: 1 < p < 1e9 with 99998 terms
<snip program output>[/CODE]
In cksieve v1.1.4 -n 1 works fine:
[CODE]cksieve -P1e9 -n1 -N50000 -b50
cksieve 1.1.4 -- A sieve for Carol (b^n-1)^2-2 and Kynea (b^n+1)^2-2 numbers.
Started with 100000 terms for (50^n+/-c)^2-2 from command line.
cksieve 1.1.4 started: 1 <= n <= 50000, 3 <= p <= 1000000000
<snip program output>[/CODE]
I can confirm that the multithreading works in cksieve: with one thread on an otherwise idle i5-5200U it takes 261.82 seconds to process the sieve in the second code snippet, whereas with 4 threads the time taken is 88.82 sec. |
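To put the two timings in that post side by side, a quick calculation (a Python sketch; `speedup` is just a helper name chosen here) gives the speedup and the per-thread efficiency:

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-thread wall time to multi-thread wall time."""
    return t_single / t_multi


s = speedup(261.82, 88.82)       # timings reported in the post above
print(f"speedup: {s:.2f}x")      # speedup: 2.95x
print(f"efficiency: {s / 4:.2f}")  # efficiency: 0.74 (speedup divided by 4 threads)
```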
update to the previous post
I have figured how to allow cksieve to take 1 as an argument for the -n flag without the error described in the previous post. This requires only one change to the CarolKyneaApp.cpp file. Instead of
[CODE]status = Parser::Parse(arg, 2, 1000000000, ii_MinN);[/CODE]replace it with [CODE]status = Parser::Parse(arg, 1, 1000000000, ii_MinN);[/CODE] within the switch (opt) block in CarolKyneaApp::ParseOption and then run make to regenerate the executables. With this change I ran the following input: [CODE]cksieve -P1e9 -n 1 -N50000 -b50[/CODE]which worked perfectly and I got a sieve file with 18924 terms in it, the same as I got with cksieve v1.1.4. |
That is the fix, but an additional change is needed. For small bases, n=1 with c=-1 yields a negative number, so those need to be excluded.
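The point about n=1 can be seen by evaluating the two forms directly. This is a small Python sketch, not cksieve's code; the function names are made up, and the `value > 1` filter is an assumption based on the later remark in this thread about removing candidates equal to unity, zero or negative unity.

```python
def carol(b: int, n: int) -> int:
    """Carol form: (b^n - 1)^2 - 2."""
    return (b**n - 1) ** 2 - 2


def kynea(b: int, n: int) -> int:
    """Kynea form: (b^n + 1)^2 - 2."""
    return (b**n + 1) ** 2 - 2


def worth_sieving(value: int) -> bool:
    """Assumed filter: skip terms that are negative, zero or one."""
    return value > 1


for b in (2, 3, 50):
    print(b, carol(b, 1), kynea(b, 1), worth_sieving(carol(b, 1)))
# 2 -1 7 False
# 3 2 14 True
# 50 2399 2599 True
```

So with n=1 only base 2 actually produces a negative Carol term, which is why simply lowering the parser's lower bound from 2 to 1 is not quite enough on its own.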
The next release will have GPU support for afsieve. My fingers are crossed that compiling and linking OpenCL with mingw works. It took me a while to figure out how to get mingw objects to link with the OpenCL library. It now links. The question is, does it run without crashing. |
I have released version 1.2 of the framework.
[code]
fkbnsieve is now working.
Modify cksieve to detect candidates that are prime and to log them.
Fixed an asm bug that at worst causes factors to be missed by fbncsieve and gfndsieve.
   It will not result in invalid factors and if it did, they would be caught at runtime
   due to built-in factor checking that relies on completely different code.
Added -A option to apply factors (or reformat candidate file) and exit immediately without sieving.
Added GPU classes. This adds the following command line options:
   -D - to select the GPU platform
   -d - to select the GPU device
   -G - to specify the number of GPU workers
   -g - to set multiple of workgroupsize which is used to compute the number of primes per GPU worker
Added GPU workers to afsieve.
[/code]
Visit my page to get the link to d/l the latest source and Windows builds. There are some things I need to do. First, I need to implement factor validation for the afsieve GPU worker. Second, the build on OS X is broken because getrusage doesn't have the same capabilities on OS X as it does on Linux. I have not tested the -A option, but if it doesn't work, it should be easy to fix. I need help from Linux developers to determine the correct settings for compiling and linking with OpenCL. Here are some notes on using the GPU:[list][*]When you use the GPU enabled sievers, it is strongly recommended that you play with the -g and -G options to determine the optimal settings for your hardware.[*]You can have a mix of CPU and GPU workers. The default right now is to always have 1 CPU worker even if you have GPU workers, but there is nothing to prevent you from using -W4 -G4 to create 8 workers, 4 for the CPU and 4 for the GPU. This is an incredibly cool and powerful feature that I have not supported in any of my previous sieving programs.[/list] |
Thanks for fixing the -n issue and the removal of candidates that result in unity, zero or negative unity in cksieve. However I have run into another bug - when I put this as input:
[CODE]cksieve -P150e9 -n 1 -N10000 -b214 -W 4[/CODE]it runs for a while until it terminates with the following message: [CODE]Fatal Error: 393216 is not a root (mod 77309411329)[/CODE]In cksieve v1.1.4 it didn't terminate when this happened, but it would give out a warning, like this: [CODE]WARNING: 393216 is not a root (mod 77309411329)[/CODE]and then continue onward to the desired sieve depth (in this case 150e9). |
[QUOTE=Dylan14;480889]Thanks for fixing the -n issue and the removal of candidates that result in unity, zero or negative unity in cksieve. However I have run into another bug - when I put this as input:
[CODE]cksieve -P150e9 -n 1 -N10000 -b214 -W 4[/CODE]it runs for a while until it terminates with the following message: [CODE]Fatal Error: 393216 is not a root (mod 77309411329)[/CODE]In cksieve v1.1.4 it didn't terminate when this happened, but it would give out a warning, like this: [CODE]WARNING: 393216 is not a root (mod 77309411329)[/CODE]and then continue onward to the desired sieve depth (in this case 150e9).[/QUOTE] I changed the behavior because this might be a bug, but I need to investigate. I'll look into restoring the old behavior. |
fbncsieve bug
If you continue a sieve from a file (sieve depth, for example, 50000000), you cannot use the switches -p 1000000000000 -P 2000000000000 because the program will still start from 50000000.
So if you have a few worker threads, you must change the header line in the file to get the right sieve range. |
[QUOTE=pepi37;480983]If you continue a sieve from a file (sieve depth, for example, 50000000), you cannot use the switches -p 1000000000000 -P 2000000000000 because the program will still start from 50000000.
So if you have few workers threads you must change header line in a file for right sieve range[/QUOTE] I think I know the cause. |
I have a machine with 2x [url=https://ark.intel.com/products/75789/Intel-Xeon-Processor-E5-2620-v2-15M-Cache-2_10-GHz]Intel Xeon E5-2620 v2[/url]. Which are 12 cores or 24 threads. When I tried to run cksieve version 1.2, I first received an error about a missing OpenCL.dll library.
After installing the [url=https://software.intel.com/en-us/articles/opencl-drivers]OpenCL Runtime[/url] version 16.1.2, it no longer crashes, but it reports that it is not able to find a suitable device.
[code]
C:>cksieve.exe
List of available platforms and devices
Platform 0 is a Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.2
No devices
Fatal Error: No devices were found that can run this code
[/code]
According to the release notes of the runtime, Xeon E5 CPUs are supported as long as they support SSE4.2 or above, which this particular CPU does. |
[QUOTE=BotXXX;481871]I have a machine with 2x [url=https://ark.intel.com/products/75789/Intel-Xeon-Processor-E5-2620-v2-15M-Cache-2_10-GHz]Intel Xeon E5-2620 v2[/url]. Which are 12 cores or 24 threads. When I tried to run cksieve version 1.2, I first received an error about a missing OpenCL.dll library.
After installing the [url=https://software.intel.com/en-us/articles/opencl-drivers]OpenCL Runtime[/url] version 16.1.2, it no longer crashes, but it notes it is not able to find a suitable device. [code] C:>cksieve.exe List of available platforms and devices Platform 0 is a Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.2 No devices Fatal Error: No devices were found that can run this code [/code] Reading the release notes of the runtime, the Xeon E5's are supported; as long as they support SSE4.2 or above. Which this particular cpu does.[/QUOTE] If you can build, then change this line in the makefile: ENABLE_GPU=yes to ENABLE_GPU=no I haven't tested a build with that set to no, but it might solve your problem. In any case that it requires a "device" at runtime is a bug. I have fixed it, but not posted an update. You can d/l the previous build from [URL="http://www.mersenneforum.org/rogue/mtsieve_1.1.7z"]here[/URL]. |
Thank you Mark for the 1.1 build .7z. That one works ok.
As an example:
[code]C:>cksieve -b 2 -p 2 -P 1000000 -n 100 -N 10000 -o ck_remain.out -O ck_factors.out
cksieve v1.2, a program to find factors of (b^n+/-1)^2-2 numbers
Sieve started: 2 < p < 1e6 with 19802 terms
Sieve completed at p=1000033.
Processor time: 0.36 sec. (0.02 sieving) (0.61 cores)
3750 terms written to ck_remain.out
Primes tested: 39222. Factors found: 16052. Remaining terms: 3750. Time: 0.59 seconds.
[/code] |
One nice feature of mtsieve is that when starting a new sieve you don't need to specify -p. In many cases you probably don't want to use -P either; you just wait for the removal rate to reach what you need before you start PRP testing. Also, many of the programs will generate an output file name if you don't specify one, and that file name will often include information that makes it unique based upon the inputs.
And of course you can use scientific notation for most inputs that are numeric. Who wants to type -P1000000000000 when -P1e12 doesn't require you to count zeros? |
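For scripts that wrap these programs, it can be handy to expand the same shorthand yourself. The sketch below is a Python illustration only (the name `parse_count` is invented, and it says nothing about how mtsieve parses its arguments); it uses `decimal.Decimal` so that large values stay exact, whereas going through a float can lose digits, for example with 1e23.

```python
from decimal import Decimal


def parse_count(text: str) -> int:
    """Expand strings such as '1e12' or '150e9' into an exact integer."""
    value = Decimal(text)
    if value != value.to_integral_value():
        raise ValueError(f"{text!r} is not a whole number")
    return int(value)


print(parse_count("1e12"))    # 1000000000000
print(parse_count("150e9"))   # 150000000000
print(parse_count("5e13"))    # 50000000000000
```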
I have released v1.3 of mtsieve. Here are the changes:
[code]
Ensure that "ENABLE_GPU=no" in makefile builds all programs without error.
cksieve no longer gives a fatal error if the computed root is not an actual root.
   This condition rarely happens, but is okay when it does.
Overriding -p from the command line should now work when starting with an input file.
Added GPU workers to xyyxsieve. When using GPU workers, an overflow with collecting factors
   can cause xyyxsieve to crash. If that happens override -S and/or -g or sieve more deeply
   with the CPU before adding GPU workers. This will be addressed in a future release.
Added GPU workers to pixsieve. It has not been tested yet.
[/code] |
[QUOTE=rogue;482100]I have released v1.3 of mtsieve. Here are the changes:
[code]
Ensure that "ENABLE_GPU=no" in makefile builds all programs without error.
cksieve no longer gives a fatal error if the computed root is not an actual root.
   This condition rarely happens, but is okay when it does.
Overriding -p from the command line should now work when starting with an input file.
Added GPU workers to xyyxsieve. When using GPU workers, an overflow with collecting factors
   can cause xyyxsieve to crash. If that happens override -S and/or -g or sieve more deeply
   with the CPU before adding GPU workers. This will be addressed in a future release.
Added GPU workers to pixsieve. It has not been tested yet.
[/code][/QUOTE] It appears that the download is corrupt: when I try to open the new version of mtsieve with 7zip, I am told that 7zip cannot open it as an archive. |
Okay. I'll take a look at it later tonight.
|
I loaded a new .7z file.
|
I have released v1.4 of mtsieve. Here are the changes:
[code]
Some common functionality for GPU sieving has been moved to Worker.cpp.
All GPU workers validate factors found by the GPU.
The xyyxsieve GPU sieving issue has been resolved.
The pixsieve GPU sieving code has been tested.
GPU sieving has been added to mfsieve. It has been tested.
GPU sieving has been added to gfndsieve. It has been tested.
Add kbbsieve, for the form k*b^b+/-1 for fixed k and variable b. It has been partially tested.
[/code] |
"GPU sieving has been added to gfndsieve. It has been tested."
Does not work for me: GPU load is zero :( All sieving is done on CPU cores. |
[QUOTE=pepi37;484899]GPU sieving has been added to gfndsieve. It has been tested
Doesnot work for me: GPU load is zero :( All sieving is done on CPU cores[/QUOTE] The GPU piece of gfndsieve is not very fast. I need to grab the code from ppsievecl to make it faster. I suggest that you increase -g and -G to see if that puts more workload on the GPU. |
I posted 1.5. The only change is some more testing for kbbsieve, where it was missing factors for odd k. I also implemented a slightly faster expmod for kbbsieve, so it should be about 10% to 20% faster.
I also updated the page to give simple instructions on how to build your own sieve based upon the framework. All of the code will eventually be put into sourceforge. |
1 Attachment(s)
I have updated the html documentation to reflect the changes to the mtsieve website, plus corrected some typos in the part where you explain how to create a new sieve. I have done this since the html documentation included in the download refers to (and links to) v1.2 still. It is attached below:
|
Thanks. I have loaded the updated page.
|
[QUOTE=rogue;484985]I posted 1.5. The only change is some more testing for kbbsieve, where it was missing factors for odd k. I also implemented a slightly faster expmod for kbbsieve, so it should be about 10% to 20% faster.
I also updated the page to give simple instructions on how to build your own sieve based upon the framework. All of the code will eventually be put into sourceforge.[/QUOTE] Thank you Mark :smile: |
To this point I have only been using extended FPU and SSE routines within the sieving code. Starting with code written by Ernst, I have written AVX routines that will improve performance on the CPU. I estimate between 30% and 50% faster sieving. Now that I understand AVX much better, AVX512 is a possibility, but I don't have access to a CPU with AVX512 support.
It will take time for me to integrate them into the various sieves, and some sieves will not be good candidates for the AVX routines. One example of this is cksieve. I will need to evaluate that separately. |
I have released mtsieve 1.6. Here are the changes:
[code]
Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k < n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.
Added AVX asm code for use by the Worker classes.
Added a mini-chunk mode that can be used when the worker classes handle primes in chunks, such as AVX mode, which is chunks of 16 primes.
gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynolds' version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version.
[/code] You can get the latest code with Windows builds [URL="http://mersenneforum.org/rogue/mtsieve.html"]here[/URL]. I expect the AVX routines to fail on non-Windows OSes. If they do then I know what I need to fix. It is a matter of finding the time. I've been doing some refactoring with the hope that using this framework becomes easier for others once that is done. If anyone is truly interested in helping me, I need help adding AVX support to the other sieves. If interested, please contact me via PM or e-mail. |
I need to give you my code for [TEX]b_1^{n_1}b_2^{n_2}+c[/TEX] at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.
It also needs to be converted to use SSE2/AVX where possible. |
[QUOTE=henryzz;490582]I need to give you my code for [TEX]b_1^{n_1}b_2^{n_2}+c[/TEX] at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.
It also needs to be converted to use SSE2/AVX where possible.[/QUOTE] Good to know that someone is trying to use my framework for their own benefit. You need to call CpuSupportsAvx() to determine if the CPU has AVX support. This is declared in Worker.h. If it returns false then you need to code with the FPU or SSE routines. I suggest that you take a look at avx-asm-x86.h as well as CullenWoodallWorker.cpp to help familiarize yourself with the AVX routines. One other caveat is that you cannot use the AVX code and have data of type double (double * is fine). This is because the AVX routines don't save the xmm registers upon entry and restore them upon exit. I'll address that limitation in an upcoming release. |
I have posted mtsieve 1.7 to my website. Here are the changes:
[code]
Added a timestamp to lines written to the log.
Changed usage of some registers in the AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines.
Added psieve for primorials. psieve supports AVX and is about 30% faster than fpsieve.
[/code] |
fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07
So simple math shows that the speed of this sieve is not 25350000 but much higher: 952380952. Or does 25.35M p/sec mean something different? |
[QUOTE=pepi37;492446]fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07 So simple math show that speed of this sieve is not 25350000 but much higher 952380952 Or 25.35M p/sec means something different?[/QUOTE] Primes vs range of p? |
[QUOTE=henryzz;492447]Primes vs range of p?[/QUOTE]
Sure sounds like it. ln(92471161196761) ~= 32, so there will be a factor of about 32 between the rate measured over the range of p and the rate measured in primes tested. But OP's calculation has a factor of 37 -- don't know how. (122230982980247 - 92230982980247) * 0.8% / 308 ~= 779M, not 952M. |
2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32. The -c and -C are also called kmin and kmax instead of cmin and cmax. |
[QUOTE=henryzz;492447]Primes vs range of p?[/QUOTE]
Correct. "range of p" is really primes between -p and -P. The software only tests primes in that range, so counting primes that were tested makes a lot more sense. |
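To make the arithmetic in this exchange concrete, here is a small Python sketch that redoes it with the numbers from the log line quoted above. It assumes the elapsed time is roughly 308 seconds (one factor at 308 sec per factor) and uses the prime number theorem estimate that primes near p have density about 1/ln(p). The function name is made up for illustration.

```python
import math


def estimated_primes_per_second(p_start, p_now, seconds):
    """Estimate primes tested per second from progress through a range of p.

    Uses the prime number theorem: near p, roughly 1 in ln(p) integers
    is prime. Hypothetical helper, not part of mtsieve.
    """
    range_rate = (p_now - p_start) / seconds
    return range_rate / math.log(p_now)


p_start = 92230982980247
p_now = 92471161196761
seconds = 308  # assumed elapsed time, taken from "308 sec per factor"

range_rate = (p_now - p_start) / seconds
prime_rate = estimated_primes_per_second(p_start, p_now, seconds)
print(f"range of p covered per second: {range_rate / 1e6:.1f}M")
print(f"estimated primes per second:   {prime_rate / 1e6:.1f}M")
```

The estimate lands within a few percent of the 25.35M p/sec that the program reported, which fits the explanation that the rate counts primes tested rather than the width of the range.
|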
[QUOTE=ATH;492453]2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32. The -c and -C are also called kmin and kmax instead of cmin and cmax.[/QUOTE] Thanks for the heads-up. I'll fix in the next release. |
I've posted 1.7.1 which fixes that bug and a couple in psieve.
|
I have posted 1.7.3 of the framework (1.7.2 was not posted). It fixes the following issues:
[code]
Fixed a memory exception that would crash any GPU workers immediately.
Do not output factor rate if no factors found.
Fixed another issue in non-AVX psieve code that causes it to crash.
Fixed issue with reading ABCD files as input lines with 1 character would be skipped.
Fixed a crash upon exit of fbncsieve.
Allow override with -p when starting fbncsieve and fkbnsieve from an input file.
[/code] |
If using gfndsieve, fkbnsieve, or fbncsieve, you should play around with larger values for the -w switch. Each inner loop has a single powmod followed by either addition or bit shifting. This means that each range of p is processed super quick. I have found that gfndsieve has a 20% performance boost by switching to -w7 (-w6 is the default), which means that each worker will process a chunk of 1e7 primes before getting the next chunk of primes. Eventually these sieves will get AVX support, but the framework doesn't have all of the routines I would need to do that. I do not know the speed-up that it will give, but I would hope that it is around 20%.
Sieves such as mfsieve, afsieve, and psieve are much slower as they have a mulmod for each term. For those sieves the default setting for -w is 1e5 or lower as the typical sieve has many thousands of terms. I haven't done a lot of testing with the other sieves to determine the optimal setting for -w, but it will likely be highly dependent upon how many terms are in the range you are testing. I recommend testing to p=1e6 with the default setting for -w, then testing with different values of -w in ranges above that to see where you get the most bang for the buck. I would like to know what you discover when you do that. |
[QUOTE=rogue;493136]I have posted 1.7.3 of the framework (1.7.2 was not posted). It fixes the following issues:
[code]
Fixed a memory exception that would crash any GPU workers immediately.
Do not output factor rate if no factors found.
Fixed another issue in non-AVX psieve code that causes it to crash.
Fixed issue with reading ABCD files as input lines with 1 character would be skipped.
Fixed a crash upon exit of fbncsieve.
Allow override with -p when starting fbncsieve and fkbnsieve from an input file.
[/code][/QUOTE] I experienced a (constant) crash upon exiting gfndsieve, or rather a core dump. I assume it's a problem involving reading past EOF. Maybe mtsieve version 1.7.3 will correct it as well. ([COLOR="Red"]YES, it works fine now[/COLOR]) |
[QUOTE=rogue;493141]If using gfndsieve, fkbnsieve, or fbncsieve, you should play around with larger values for the -w switch. Each inner loop has a single powmod followed by either addition or bit shifting. This means that each range of p is processed super quick. I have found that gfndsieve has a 20% performance boost by switching to -w7 (-w6 is the default), which means that each worker will process a chunk of 1e7 primes before getting the next chunk of primes. Eventually these sieves will get AVX support, but the framework doesn't have all of the routines I would need to do that. I do not know the speed-up that it will give, but I would hope that it is around 20%.
Sieves such as mfsieve, afsieve, and psieve are much slower as they have a mulmod for each term. For those sieves the default setting for -w is 1e5 or lower as the typical sieve has many thousands of terms. I haven't done a lot of testing with the other sieves to determine the optimal setting for -w, but it will likely be highly dependent upon how many terms are in the range you are testing. I recommend testing to p=1e6 with the default setting for -w, then testing with different values of -w in ranges above that to see where you get the most bang for the buck. I would like to know what you discover when you do that.[/QUOTE] Tried the -w switch with 1e7 but in a somewhat different environment: 4GB of RAM, 2 threads, 3.0 GHz, Intel G2030, in parallel with 2 threads already running other computationally heavy programs. Results:
[code]
./gfndsieve -k 1200 -K 10000 -P 100000000000000 -n 30000 -N39999
gfndsieve v1.4, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: k*2^n+1, 1201 < k < 9999, 30000 <= n <= 39999, 3 <= p < 1e14 with 44000000 terms
^Cp=256203221, 65.57K p/sec, 41448643 factors found at 786.9 f/sec, 0.0% done. ETC 2021-08-11 02:00
CTRL-C accepted. Please wait for threads to completed.
Sieve interrupted at p=275604547.
Processor time: 135.78 sec. (0.09 sieving) (0.54 cores)
2545082 terms written to gfnd.pfgw
Primes tested: 15000000. Factors found: 41454918. Remaining terms: 2545082. Time: 253.74 seconds.

luigi@luigi-Aspire-MC605:~/luigi/mtsieve_1.7.3$ ./gfndsieve -k 1200 -K 10000 -P 100000000000000 -n 30000 -N39999 -w 1e7
gfndsieve v1.4, a program to find factors of k*2^n+1 numbers for variable k and n
Sieve started: k*2^n+1, 1201 < k < 9999, 30000 <= n <= 39999, 3 <= p < 1e14 with 44000000 terms
^Cp=179424691, 0.000 p/sec, 41474702 factors found at 541.5 f/sec, 0.0% done. ETC 2024-01-19 00:28
CTRL-C accepted. Please wait for threads to completed.
Sieve interrupted at p=373587911.
Processor time: 178.15 sec. (0.17 sieving) (0.51 cores)
2505902 terms written to gfnd.pfgw
Primes tested: 20000000. Factors found: 41494098. Remaining terms: 2505902. Time: 348.17 seconds.
[/code]
Let me know if you want me to try with other parameters. |
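One rough way to compare the two runs above is to divide "Primes tested" by the wall-clock "Time" from each summary line. The Python sketch below does only that, using the figures from the posted output. It is a back-of-the-envelope comparison: both runs were short, started from p=3 with 44 million terms, and shared the machine with other jobs, so it says little about how -w behaves deeper in a sieve.

```python
def primes_per_second(primes_tested, wall_seconds):
    """Throughput computed from the summary line of a finished run."""
    return primes_tested / wall_seconds


# (primes tested, wall-clock seconds) copied from the two runs above
runs = {
    "default -w": (15_000_000, 253.74),
    "-w 1e7": (20_000_000, 348.17),
}

for label, (primes, seconds) in runs.items():
    print(f"{label:>10}: {primes_per_second(primes, seconds):,.0f} primes/sec")
```

On these numbers the two settings come out within a few percent of each other, so a longer run at a higher starting p would be needed before drawing any conclusion.
|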
1 Attachment(s)
I have updated the webpage for mtsieve to include documentation for gcwsieve and psieve, as well as provide a version history. It is attached below:
|
Bug in xyyxsieve
I found a bug when using the -s flag in xyyxsieve. If -s b is used, the sieve works fine:
[CODE]C:\Users\Dylan_000\Desktop\mtsieve\mtsieve>xyyxsieve -W 4 -P 1e9 -x 100e3 -X 100000 -y 2 -Y 100000 -s b
xyyxsieve v1.3, a program to find factors numbers of the form x^y+y^x
Quick elimination of terms info (in order of check):
100000 because the term is even
20000 because x and y have a common divisor
Sieve started: 3 < p < 1e9 with 79998 terms
p=143390813, 133.9K p/sec, 79896 factors found at 329.9 f/sec, 14.3% done. ETC 2018-08-07 18:35
CTRL-C accepted. Please wait for threads to completed.
Sieve interrupted at p=174025441.
Processor time: 246.58 sec. (0.33 sieving) (3.60 cores)
98 terms written to xyyx.pfgw
Primes tested: 9715520. Factors found: 79900. Remaining terms: 98. Time: 68.54 seconds.[/CODE]
However, if either + or - is used, the program shows that the sieve is starting and then it quits without giving any further output:
[CODE]C:\Users\Dylan_000\Desktop\mtsieve\mtsieve>xyyxsieve -W 4 -P 1e9 -x 100e3 -X 100000 -y 2 -Y 100000 -s +
xyyxsieve v1.3, a program to find factors numbers of the form x^y+y^x
Quick elimination of terms info (in order of check):
50000 because the term is even
10000 because x and y have a common divisor
Sieve started: 3 < p < 1e9 with 39999 terms[/CODE]
[CODE]C:\Users\Dylan_000\Desktop\mtsieve\mtsieve>xyyxsieve -W 4 -P 1e9 -x 100e3 -X 100000 -y 2 -Y 100000 -s -
xyyxsieve v1.3, a program to find factors numbers of the form x^y+y^x
Quick elimination of terms info (in order of check):
50000 because the term is even
10000 because x and y have a common divisor
Sieve started: 3 < p < 1e9 with 39999 terms[/CODE] |
Thanks Dylan. I have updated the page and I found and fixed the bug in xyyxsieve. You can now d/l version 1.7.4 of the framework. Here are the changes:
[code]
Modify pixsieve to report primes.
Modify pixsieve to output search string to console and log when sieve starts.
Fixed a crash in xyyxsieve when sieving only one sign.
Generate default filename for mfsieve if not specified on the command line.
Fix issue in psieve if it finds a factor for the last term of the input.
mfsieve supports AVX and is about 40% faster than previously.
[/code] |
[QUOTE=rogue;493645]Thanks Dylan. I have updated the page and I found and fixed the bug in xyyxsieve. You can now d/l version 1.7.4 of the framework. Here are the changes:
[code] Modify pixsieve to report primes. Modify pixsieve to output search string to console and log when sieve starts. Fixed a crash in xyyxsieve when sieving only one sign. Generate default filename for mfsieve if not specified on the command line. Fix issue in psieve if it finds a factor for the last term of the input. mfsieve supports AVX and is about 40% faster than previously. [/code][/QUOTE] Nice job :smile: |
I have posted 1.7.5 of the framework. It fixes the following issues:
[code]
Fixed a crash when reading multiple empty lines in a row from an input file.
Added -r option to fbncsieve to remove terms where k % base = 0.
Various updates for newpgen output from fbncsieve:
   use the .npg extension instead of .pfgw extension
   change third parameter of first line to 1 for srsieve/srfile compatibility
   change last parameter of first line to 1/2 since 1/2 is used for fixed newpgen sieves and 257/258 are used for fixed k sieves.
[/code] |
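A note for readers on the -r option listed above: when k is a multiple of the base b, k*b^n can be rewritten as (k/b)*b^(n+1), which is presumably why such terms can be dropped from a fixed-n sieve. That reasoning is my assumption and is not stated in the release notes. A minimal Python sketch of the filter itself, with a made-up helper name:

```python
def drop_multiples_of_base(ks, base):
    """Keep only the k values that are not divisible by the base.

    Hypothetical helper mirroring the rule "remove terms where
    k % base = 0"; it is not taken from the fbncsieve source.
    """
    return [k for k in ks if k % base != 0]


print(drop_multiples_of_base(range(1, 13), 3))
```
|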
[QUOTE=rogue;493923]I have posted 1.7.5 of the framework. It fixes the following issues:
[code] Fixed a crash when reading multiple empty lines in a row from an input file. Added -r option to fbncsieve to remove terms where k % base = 0. Various updates for newpgen output from fbncsieve: use the .npg extension instead of .pfgw extension change third parameter of first line to 1 for srsieve/srfile compatibility change last parameter of first line to 1/2 since 1/2 is used for fixed newpgen sieves and 257/258 are used for fixed k sieves. [/code][/QUOTE] Thanks! Great update! |
Hi folks.
I have a basic core2 machine at home, and an AVX-512 capable 4-core server via AWS. Now, I need a copy of [B]gfndsieve[/B] compiled for [B]Linux 64-bit[/B] with all the [B]AVX optimizations[/B] turned on and no OpenCL, but on the AWS system I only have gcc version 4.8 and can't compile my optimized copy. Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine? Thanks in advance. Luigi --- |
[QUOTE=ET_;496357]Hi folks.
I have a basic core2 machine at home, and an AVX-512 capable 4-core server via AWS. Now, I need a copy of [B]gfndsieve[/B] compiled for [B]Linux 64-bit[/B] with all the [B]AVX optimizations[/B] turned on and no OpenCL, but on the AWS system I only have gcc version 4.8 and can't compile my optimized copy. Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine?[/QUOTE] There is no AVX512 code (yet), and the choice between AVX and SSE/FPU is made at runtime based upon the capability of the CPU. To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile. Once you do that, what issues are you getting on that box with gcc? |
[QUOTE=rogue;496360]There is no AVX512 code (yet) and the decision to use AVX or SSE/FPU is decided a runtime based upon the capability of the CPU.
To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile. Once you do that, what issues are you getting on that box with gcc?[/QUOTE] I knew there were distinct code paths in the executable, but I thought one had to enable the relevant processor optimizations at compile time for the code to recognize them. In other words, if my code is compiled with -march=native, and I have an Intel G2030 processor (a crippled ivy-bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake architecture? If so, then I solved the issue. If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations. I'm a master at complicating my own life... :sad: |
[QUOTE=ET_;496363]I knew there were distinct code paths enabled on the executable, but I thought one had to enable the relative processor optimizations to have the code recognize it.
In other words, if my code is compiled with -march=native, and I have an Intel G2030 processor (a crippled ivy-bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake architecture? If so, then I solved the issue. If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations. I'm a master at complicating my own life... :sad:[/QUOTE] It should. It calls a gcc function called __builtin_cpu_supports() when deciding if it can use AVX code. I assume that function checks something specific to the computer upon which the code is executing. |
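For readers who want to see what "decided at runtime" looks like, here is a small Python sketch of the same idea on Linux: look at what the running CPU reports and pick a code path from that, regardless of where the program was built. This is only an analogy. According to the post above, mtsieve relies on gcc's __builtin_cpu_supports(); the function names below and the use of /proc/cpuinfo are assumptions made for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def choose_code_path(flags):
    """Pick the AVX path only when the running CPU reports AVX."""
    return "avx" if "avx" in flags else "sse/fpu"


try:
    with open("/proc/cpuinfo") as handle:
        text = handle.read()
except OSError:
    text = ""  # not Linux, or the file is unavailable

print(choose_code_path(cpu_flags(text)))
```

The point of the sketch is that the check runs on the machine executing the program, so the same binary can take different paths on a G2030 and on a Skylake Xeon.
|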
[QUOTE=rogue;496366]It should. It calls a gcc function called builtin_cpu_supports() when deciding if it can use AVX code. I assume that function checks something specific to the computer upon which the code is executing.[/QUOTE]
Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime? |
[QUOTE=ET_;496369]Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?[/QUOTE]
After 50G primes tested with the same executable, the AVX version on the Xeon is 35%-40% faster than the base version at the same clock. |
[QUOTE=ET_;496369]Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?[/QUOTE]
Not at this time, but it is something I have considered adding. |
I have posted mtsieve 1.8.0 at my website. Here are the changes:
[code]
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.
Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have a custom "start sieving" message so that each shows more detail specific to that sieve.
[/code] The default behavior of twinsieve is to sieve such that only potential twin primes are remaining, but there is a -i switch that allows one to sieve the +1 and -1 side independently. |
[QUOTE=rogue;496776]I have posted mtsieve 1.8.0 at my website. Here are the changes:
[code]
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.
Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have a custom "start sieving" message so that each shows more detail specific to that sieve.
[/code]The default behavior of twinsieve is to sieve such that only potential twin primes are remaining, but there is a -i switch that allows one to sieve the +1 and -1 side independently.[/QUOTE] In twinsieve you use the switch -i in two different places:
-i --inputterms=i input file of remaining candidates
-i --independent Sieve +1 and -1 independently
if (!ib_OnlyTwins && it_Format == FF_ABC) FatalError("Can only support ABC format if sieving +1 and -1 independently");
If I use --independent then there is always zero output, regardless of format ABC.
d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r
twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors)
If the -r switch stays on the command line then you get this; if you remove it, then all is OK. |
And lastly:
If the sieve goes past 54105949591 (or very close to this value) then there is no output and the program just terminates. If the sieve depth is lower than that value, the program gives output as it should. |
[QUOTE=pepi37;496791]In twinsieve you use switch -i on two different places
-i --inputterms=i input file of remaining candidates -i --independent Sieve +1 and -1 independently if (!ib_OnlyTwins && it_Format == FF_ABC) FatalError("Can only support ABC format if sieving +1 and -1 independently"); If i use --independent then is always zero output regardless format ABC d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors) If in command line stay switch -r then you got this , if you remove it, then all is ok[/QUOTE] I'll switch it to use a different character as -i is reserved for the underlying framework. |
[QUOTE=pepi37;496813]And last
If the sieve goes past 54105949591 (or very close to this value) then there is no output and the program just terminates. If the sieve depth is lower than that value, the program gives output as it should.[/QUOTE] What inputs did you use? Similar to fbncsieve, it will stop sieving at sqrt(k*b^n), since any term that survives sieving up to that point is prime. |
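The stopping rule mentioned in the reply rests on a standard fact: a composite number N always has a prime factor no larger than sqrt(N), so a term with no factor up to its square root is prime. Here is a toy Python illustration with tiny numbers of the form k*2^n+1. The helper name is made up and this is plain trial division, not how twinsieve or fbncsieve work internally.

```python
import math


def smallest_factor_up_to_sqrt(n):
    """Return the smallest divisor d of n with 2 <= d <= sqrt(n), or None.

    If None is returned for n >= 2, then n is prime.
    """
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d
    return None


for k in (3, 5, 7):
    term = k * 2**4 + 1
    factor = smallest_factor_up_to_sqrt(term)
    status = "prime" if factor is None else f"divisible by {factor}"
    print(f"{k}*2^4+1 = {term}: {status}")
```

Sieving with primes beyond the square root of the remaining terms cannot remove anything further, which is why stopping there loses nothing.
|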
[QUOTE=rogue;496822]What inputs did you use? Similar to fbncsieve, it will stop sieving at sqrt(k*b^n) as terms above that value are prime.[/QUOTE]
I tested the same with newpgen and it continues to sieve, so that is not the problem. |
[QUOTE=pepi37;496832]I test same with newpgen and he continue to sieve, so that i not problem[/QUOTE]
Please provide me with the command line options you are using and the input file. |
[QUOTE=rogue;496839]Please provide me the command line options you are using and an input files.[/QUOTE]
twinsieve -P600000000000 -i1.npg -o11.npg -Ofactor.txt -fN
Also look at the problem with the -r option, where you get an insane number of factors. |
[QUOTE=pepi37;496840]twinsieve -P600000000000 -i1.npg -o11.npg -Ofactor.txt -fN
Also look at problem with -r option when you got insane number of factors[/QUOTE] I need the file 1.npg. |
[QUOTE=rogue;496844]I need the file 1.npg.[/QUOTE]
Create it with
twinsieve -P100000000 -k2 -K100000 -b2 -n1778899 -fN -r
then rename the result to 1.npg :)
Just increase the first value to 100000000000 and you get all the problems you need: there is no output file (if you run the same test with NewPGen or TwinGenX then you still get output, with 67 candidates left). Your twinsieve will produce no output. Try this:
twinsieve -P100000000000 -k2 -K100000 -W4 -b2 -n1778899 -fN -r
P.S. I cannot see twinsieve in the makefile:
[I]PROGS=afsieve mfsieve gfndsieve fbncsieve fkbnsieve pixsieve xyyxsieve cksieve kbbsieve gcwsieve psieve[/I] |
Still no hope of having a working MMpsieve in the near future, is there? :smile:
|
[QUOTE=ET_;496903]Still no hope to have a MMpsieve working in the near future, isn't it? :smile:[/QUOTE]
Sorry, I've been busy, but now that you have nudged me, I can spend some time on it. I need to fix the issues with twinsieve first. |
[QUOTE=rogue;496904]Sorry, I've been busy, but now that you have nudged me, I can spend some time on it. I need to fix the issues with twinsieve first.[/QUOTE]
Sure, no hurry and maximum collaboration. Thanks! |