The Multi-threaded Sieve Framework
The Multi-threaded Sieve Framework (called mtsieve) is a framework that I
have been using for many of the sieving programs that I have written over
the years. Although those programs reference the framework, it was
never promoted for other people to take advantage of. The main idea of
this framework is to help optimize the search for factors of a collection of
numbers that follow a well-defined pattern. Some examples are:
- factorials - where each successive term can only be calculated by the
previous term
- number in the form k*b^n+c for a fixed k, b, and c and a range of n -
a Discrete Log can be used to find an n such that k*b^n+1 is divisible
by a given p
- numbers in the form k*b^n+c for a fixed b, n, and c and a range of k -
a simple function can compute k such that k*b^n+1 is divisible by a
given p
This framework can also be used to do other functions with primes, such as
determining if each prime is a Wieferich Prime.
You can d/l mtsieve sources and all sieving programs using that framework
from here.
Open and Portable
I cannot tell you how many times I wanted to search for primes on my Mac
only to find that before I could do that I would have to sieve the range
on a Windows OS. Two of the most notorious examples are newpgen and
fermfact. Although the newpgen source is available, that source is a
complete mess. I understand that it was great during its heyday
on 32-bit CPUs, there was nothing in it to take advantage of 64-bit
CPUs. On the other hand fermfact source is not available despite my
requests to the original author. This also has 32-bit limitations
and the command line arguments are confusing at best after one has used
other programs that share the same command line arguments. And even
though I am the author of MultiSieve, it was 32-bit and Windows GUI
only. mtsieve replaces fermfact in its entirely, but only covers
some of the functionality of newpgen and MultiSieve. If I get
requests, I will provide work on extending mtsieve to replace other
functions from newpgen and MultiSieve.
Since mtsieve is also 64-bit the memory limitations of the old 32-bit
programs are gone.
With mtsieve, we now have a framework that can built and run on Windows,
Linux, and Mac OS X. For Windows I build with mingw64 so that I can
use many of the standard libraries that many "old time programmers" are
familiar with rather than doing something the "Visual Studio" way.
For Mac and Linux, the software should build out of the box with the given
makefile.
What about the GPU?
I have already moved the GPU code into afsieve and others are to
follow. For sieves that using a discrete log, I will not create an
OpenCL version. There are two things about discrete logs that make
them unsuitable to the GPU. The first is memory. Each "thread"
requires a memory to build the hashtables, typically more than was a
single thread has access to in local memory. The second is that the
GPU is very well suited to work units that can execute in the same amount
of time. For a discrete log, each prime can take a wildly different
amount of time to find the solution and the GPU can only return when the
prime that needs the most time to find a solutions gets that solution and
while that is happening, the CPU is waiting for the results and can't do
anything.
What about ARM?
Some people have expressed an interest in support ARM CPUs. That
would require ARM specific assembly routines and I don't know ARM
assembly. Fortunately mtsieve does not rely on inline assembler, so
the routines that would need to be replaced are all found in the
x86_64_asm folder, none of which are overly lengthy. If someone
wants to work with me to extend mtsieve to run on ARM, I would greatly
appreciate it.
What about srsieve/srfile/sr1sieve/sr2sieve?
As the one who maintains these programs, they are definitely on my "to
do" list. I cannot guarantee that all features of those programs
will be kept when I port them to this framework.
What about <xxx>?
It depends upon what <xxx> is. If it is something I wrote,
I'll get around to it. If you are a developer, I'll help you use the
framework to create your application. If not, send an e-mail to me
at rogue_at_wi.rr.com (change _at_ to @) and we'll discuss it.
What about 32-bit?
All of the current programs using the framework create 64-bit builds.
Nobody has asked me for a 32-bit build in years, so I have no intention of
supporting one.
What the Framework Provides
The main goal of the framework is to abstract Windows, Linux, and OS X
functionality from the developer using the framework. For example,
if you are a Windows software developer, you don't need to know how to
create a mutex on Linux. Likewise if you are a Mac software
developer, you don't need to know how Windows threading works. As a
developer you only need to know the interface (the .h file), and not the
execution (the .cpp file). The sources are there so that you can see
how it is done on other platforms, but you won't need to understand the
details, you just have to call the correct functions.
The Source
Classes in core
The classes in this directory are the ones shared by all of the program
using the framework. These classes include:
- main.cpp - this class contains the main() function and the handler for
signals
- Clock.cpp - this class contains static functions for getting the
current clock and processor times, in microseconds
- Parser.cpp - this class contains functions for parsing command line
options. Many command line options support scientific notation,
i.e. 1e3 --> 1000, and 1e6 --> 1000000
- FactorStats.cpp - this class is used for capturing runtime information
so that factor rates and be shown while programs are running
- HashTable.cpp - this class provides a hash table, which is used
programs that need one
- SharedMemoryItem.cpp - this class provides a mutex for variables that
can be read and modified by multiple threads
- App.cpp - this is an abstract class that implements the core features
shared by all applications built using mtsieve. There is only one
instance of this class at any time when your program is running.
This thread will sleep whenever there are no WorkerThreads available.
- FactorApp.cpp - this is an abstract class that implements features
shared by applications that use mtsieve for factoring. It extends
App.cpp.
- WorkerThread.cpp - this is an abstract class that implements the core
functions for the process of each list of primes. There will be
one instance of this class for each worker thread when your program is
running.
Please refer to App.h and WorkerThread.h to see which method are
abstract. These are the methods that must be
written by any concrete class that extends App or WorkerThread.
Classes in opencl
The classes in this directory are the ones shared by all of the program
using the framework that rely on GPU. These classes include:
- Device.cpp - this class contains functions for identifying the
available GPU devices
- Kernel.cpp - this class contains functions for each kernel to be run
in the GPU
- KernelArgument.cpp - this class contains functions to define arguments
for each Kernel as well as allocating GPU memory for those arguments
- ErrorChecker.cpp - this class contains static functions to check the
status of each call to an OpenCL function
Please refer to App.h and WorkerThread.h to see which method are
abstract. These are the methods that must be
written by any concrete class that extends App or WorkerThread.
Classes in sieve
This is the sieveing source from primesieve
6.3, a library whose sole purpose is to generate a list of primes as
quickly as possible. Per the primsieve license, no changes should be
made to this code.
Runtime Options
mtsieve has a few command line options that are available to all programs
using the framework. These are:
- -h - to print help for the command line options. For GPU enabled
sievers this will list the available platforms and devices.
- -W - to specify the number of CPU worker threads. The default is
1. It cannot be set to 0 even if using GPU workers.
- -w - to specify how many primes each thread should work on at a time
before asking for more primes. The default is 1e6, but each
program in the framework can change the default. It will depends
upon how much time is spent in the thread for each chunk of primes
- -p - the minimum prime returned by the sieve
- -P - the maximum prime returned by the sieve. Most sieves have a
limit of 2^62. Some have a limit of 2^52. The limit is shown
at runtime.
- -A - to apply factors or to reformat a candidate file without sieving.
Programs with GPU support have some additional options available to them:
- -d - device to use on the platform
- -D - GPU platform to use
- -g - the number of blocks of primes per worker thread. This
number of primes per block dependent upon the GPU and the kernel.
The number of primes per worker will be output when the program starts.
- -G - the number of GPU worker threads. The default is 0.
Programs extending FactorApp (instead of App) have some additional
options available to them:
- -i - the name of an input file with input terms to factor. When
resuming sieving, the candidate list will be built from this file.
It will override any other options used when starting a new sieve.
- -I - the name of an input file with factors to apply to terms.
- -o - the name of an output file to write terms to. When the
program is terminated, any numbers that have not been factored are
written to this file.
- -O - the name of an input file with any new factors that have been
found. This file can be used as input to pfgw to verify factors
found by individual applications.
Each program using this framework, whether it extends App or FactorApp,
can add its own command line options, but it cannot replace any that are
built into the framework.
Software Using the mtsieve Framework
Here is a list of programs I've written the use this framework. All
are bundled with mtsieve and are available from the download link.
afsieve
This program searches for factors of Alternating
Factorials. This program supports these additional parameters:
- -n - the minimum n
- -N - the maximum n
- -s - the amount n is iterated by per call to the GPU
mfsieve
This program searches for factors of MultiFactorials.
This program supports these additional parameters:
- -n - the minimum n
- -N - the maximum n
- -m - multifactoral (ie x!m where m = 1 -> x!, m = 2 -> x!!, m =
3 -> x!!!, etc.)
- -s - the amount n is iterated by per call to the GPU
cksieve
This program searches for factors of Carol
/ Kynea numbers. These numbers a form of Near Square numbers with
the form (b^n-1)^2-2 and (b^n+1)^2-2. This program supports these
additional parameters:
- -b - the base to search
- -n - the minimum n
- -N - the maximum n
pixsieve
This program searches for factors of numbers that are a substring of a
long decimal string where each successive term adds on decimal digit to
the end of the previous decimal term. This program supports these
additional parameters:
- -l - the minimum length to search
- -L - the maximum length to search
- -s - the file which contains a decimal representation of a number (eg
pi, e, the Champernowne constant, etc.)
- -S - the starting point of the substring
- -N - the amount N is iterated per call to GPU (default 5000)
fbncsieve
This program searches for factors of numbers in the form k*b^n+1 and
k*b^n-1. It is specifically designed to replace similar
functionality in newpgen. This program supports these additional
parameters:
- -k - the minimum k to search
- -K - the maximum k to search
- -s - the sequence to find factors of. The sequence must be of
the form k*b^n+c where b, n and c take decimal values.
- -f - the format of the output file (A = ABC, D = ABCD, N = newpgen)
fkbnsieve
This program searches for factors of the form k*b^n+c for fixed k, b, and
n and variable c. This program supports these additional
parameters:
- -c - the minimum c to search
- -C - the maximum c to search
- -s - the sequence to find factors of. The sequence must be of
the form k*b^n+c where k, b and n take decimal values.
gfndsieve
This program searches for factors of numbers in the form k*2^n+1 for a
range of k and a range of k. It is specifically designed to replace
fermfact. The output from this sieve should be used with pfgw and
the -gxo switch to find GFN divisors. This program supports these
additional parameters:
- -k - the minimum k to search
- -K - the maximum k to search
- -n - the minimum n to search
- -N - the maximum n to search
- -T - the number of n per output file
kbbsieve
This program searches for factors of numbers of the form k*b^b+1 or
k*b^b-1 for fixed k and variable b. This program supports these
additional parameters:
- -k - the k value to search
- -b - the minimum b to search
- -B - the maximum b to search
xyyxsieve
This program searches for factors of x^y+y^x
and x^y-y^x numbers. This program supports these additional
parameters:
- -x - the minimum x
- -X - the maximum x
- -y - the minimum y
- -Y - the maximum y
- -s - the sign (+, - or b)
gcwsieve
This program searches for factors of Cullen and Woodall numbers.
This program supports these additional parameters:
- -b - the base to search
- -n - the minimum n to search
- -N - the maximum n to search
- -s - sign to sieve for (+ = Cullen, - = Woodall, b = both)
- -S - number of steps iterated per call to GPU
psieve
This program searches for factors of primorials.
This program supports these additional parameters:
- -n - the minimum n to search
- -N - the maximum n to search
twinsieve
This program searches for factors of twin numbers of the form k*b^n+1 and
k*b^n-1. This program supports these additional parameters:
- -k - the minimum k to search
- -K - the maximum k to search
- -b - the base to search
- -n - the n to search
- -s to sieve +1 and -1 sides independently
dmdsieve
This program searches for factors of number of the form
2*k*(2^p-1)+1. Numbers that are not removed from the sieve are
potential divisors of Double Mersenne numbers. This program supports
these additional parameters:
- -k - the minimum k to search
- -K - the maximum k to search
- -n - the n to search
How To Write Your Own Sieve Using the Framework
The best way to create your own sieve is to start with an existing sieve
that is close to what you need. This will save you a lot of time.
1) Copy the folder for a sieve similar to what you want.
2) Rename the folder to something meaningful (no spaces).
3) Inside that folder, rename the .h and .cpp files, but keep the
App.h, App.cpp, Worker.h, Worker.cpp portions of the name.
4) Using NotePad++ open those files.
5) Use the "find and replace all open files" to rename the class
names to the same as the file names.
6) Update the makefile and create a new object list for your sieve,
using the App and Worker names, but with the .o suffix instead of .cpp.
7) Update the makefile and add an entry for your program near the
end. This will tell make what objects are needed for the executable.
For example, let's say that you want a new sieve similar to
cksieve. Let's call it mcsieve, short for "my custom sieve".
1) Copy the carol_kynea folder and rename as my_custom
2) Rename CarolKyneaApp.h to MyCustomApp.h.
3) Rename CarolKyneaApp.cpp to MyCustomApp.cpp.
4) Rename CarolKyneaWorker.h to MyCustomWorker.h.
5) Rename CarolKyneaWorker.cpp to MyCustomWorker.cpp.
6) Edit those four files in NotePad++.
7) Use "find and replace in all files" to change "CarolKynea"
to "MyCustom" and save.
8) Add MC_OBJS to makefile setting to "my_custom/MyCustomApp.o
my_custom/MyCustomWorker.o"
9) Copy the entry for "cksieve" and name as "mcsieve".
10) For mcsieve, change CK_OBJS to MC_OBJS
11) Type "make mcsieve" from the command line and it should build
without errors.
Now for the fun part, writing your custom code.
MyApp constructor
Call SetBanner() to set the banner, which is printed, when the program
runs. Call SetLogFileNam() to set the name of the log file used by
this program. Set the variables specific to the program that can.
MyApp::Help()
Call FactorApp::ParentHelp() first, then print help for each of the program
specific command line options.
MyApp::AddCommandLineOptions(string &shortOpts, struct option
*longOpts)
Call FactorApp::ParentAddCommandLineOptions first, then append shortOpts as
necessary. For each short option, add a colon after the option to signify
that the option takes a parameter. Before returning call AppendLongOpt
to add long options that will be supported by the command line.
MyApp::ParseOption(int opt, char *arg, const char *source)
Call FactorApp::ParentParseOption() first and if it returns P_UNSUPPORTED,
then parse the argument for the commnad line option. For numeric
options, use Parser::Parse passing the argument along with the min and min
values for the option and the variable to hold the value that is parsed.
Return the status for parsing the option.
MyApp::ValidateOptions(void)
Use this method to apply additional validations to the input numbers, such
as verifying that one input is less than another input. It can also
set or adjust some command line options if they were not specified on the
command line. Call FatalError() if the value for a parameter is
invalid.
This method is responsible for creating the bit map representing candidates
used throughout the application.
Call FactorApp::ParentValidateOptions() before returning.
MyApp::CreateWorker(uint32_t id, bool gpuWorker)
This method will create an instance of the correct worker class and return
it to the caller. The gpuWorker flag indicates if this method should
create a GPU worker instead of a CPU worker. GPU worker can only be
true if ib_SupportsGPU was set to true in the constructor.
MyApp::ProcessInputTermsFile(bool haveBitMap)
This method is called twice. The first time it is called to determine
the min/max of the candidates in the source file. The second time it
will apply the candidates from the source file to the bit map created by
ValidateOptions().
MyApp::WriteOutputTermsFile(uint64_t largestPrime)
This method will create a file for the program that does the PRP
testing. This method must lock ip_FactorAppLock while accessing the
list of candidates then release ip_FactorAppLock before returning. The
largestPrime is typically written to the first line of the output file and
will be used as the starting prime if sieving is resumed from the file.
MyApp::ApplyFactor(const char *term)
This method is called at start up to apply factors from an input file.
The input term represents a candidate number, i.e. the actual string
representing the number tested by the PRP program. Parse this string
to find the candidate then remove that candidate from the from the
bitmap. Decrement il_TermCount for each candidate removed.
MyApp::ReportFactor()
Each sieving application will need a version of this method. In other
words each application will call this method with a different list of
parameters. This method will take the inputs and turn of the bit in
the bitmap. For each found factor, it must decrement il_TermCount and
increment il_FactorCount. This method must lock ip_FactorAppLock while
accessing the list of candidates then release ip_FactorAppLock before
returning.
This method should return a boolean indicating if the term was removed from
the bitmap.
MyWorker constructor
Allocate memory or resourced needed by this worker. This can also be
used to set instance variables from MyApp, such as the min/max for the
candidates being tested.
MyWorker::TestPrimeChunk(uint64_t &largestPrimeTested, uint64_t
&primesTested)
This is where the heavy lifting goes. The code in this method will
iterate thru the it_Primes vector to find factors for the candidates being
sieved. It must set largestPrimeTested and primesTested before
returning to the caller.
MyWorker::CleanUp(void)
This method will free memory and other resources allocated or created by the
constructor.
MyWorker::VerifyFactor()
Each sieving application will need a version of this method. In other
words each application will call this method with a different list of
parameters. If ReportFactor() returns true, then this function will
use a non-optimized method to verify that the factor truly devides the
candidates. It will call FatalError() if the prime does not divide the
candidate.
Version History
1.0.0 - Feburary 10, 2018
Initial Release
1.1.0 - Feburary 21, 2018
Add an internal flag that guarantee that suspends all but one Worker when
processing the first chunk of primes. This is used to improve performance
when there is a high factor density for low primes. This will also suppress
any on screen reporting or checkpointing until that chunk is processed.
Fix issue in computing CPU utilization.
Changed -c (chunksize) option to -w (worksize).
Change output to use shorter notation for min and max primes.
cksieve - Fixed.
gfndsieve - Enable the flag mentioned above.
fbncsieve - Enable the flag mentioned above.
fkbnsieve - Added, but not tested.
1.2.0 - Feburary 23, 2018
fkbnsieve is now working.
Modify cksieve to detect candidates that are prime and to log them.
Fixed an asm bug that at worst causes factors to be missed by fbncsieve and
gfndsieve. It will nor result in invalid factors and if it did, they would
be caught at runtime due to built-in factor checking that relies on
completely different code.
Added -A option to apply factors (or reformat candidate file) and exit
immdiately without sieving.
Added GPU classes. This adds the following command line options:
-D - to select the GPU platform
-d - to select the GPU device
-G - to specify the number of GPU workers
-g - to set multiple of workgroupsize which is used to compute the number
of primes per GPU worker
Added GPU workers to afsieve.
1.3.0 - March 11, 2018
Ensure that "ENABLE_GPU=no" in makefile builds all programs without error.
cksieve no longer gives a fatal error if the computed root is not an actual
root. This condition rarely happens, but is okay when it does.
Overriding -p from the command line should now work when starting with an
input file.
Added GPU workers to xyyxsieve. When using GPU workers, an overflow with
collecting factors can cause xyyxsieve to crash. If that happens override -S
and/or -g or sieve more deeply with the CPU before adding GPU workers. This
will be addressed in a future release.
Added GPU workers to pixsieve. It has not been tested yet.
1.4.0 - April 9, 2018
Some common functionality for GPU sieving has been moved to Worker.cpp.
All GPU workers validate factors found by the GPU.
The xyyxsieve GPU sieving issue has been resolved.
The pixsieve GPU sieving code has been tested.
GPU sieving has been added to mfsieve. It has been tested.
GPU sieving has been added to gfndsieve. It has been tested.
Add kbbsieve, for the form k*b^b+/-1 for fixed k and variable b. It has been
partially tested.
1.5.0 - April 10, 2018
kbbsieve is more fully tested. It now uses a powmod that is limited to 52
bits (switching from extended floating point to SSE), which should be at
least 10% faster.
1.6.0 - June 25, 2018
Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k is less than n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.
Added AVX asm code for use by the Worker classes.
Added a mini-chunk mode that can be used when the worker classes handles
primes in chunks, such as AVX mode, which is chunks of 16 primes.
gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff
Reynold's version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the
previous version.
1.7.0 - July 4, 2018
Added a timestamp to lines written to the log.
Changed usage of some registers in AVX code to avoid ymm0-ymm3 being passed
between calls to AVX routines.
Added psieve for primorials.
psieve supports AVX and is about 30% faster than fpsieve.
1.7.1 - July 25, 2018
Fixed output for number of terms remaaining to support values > 2^32.
Modified psieve to support an input file created by fpsieve.
Fixed a bug in the non-AVX primorial ASM code that causes it to crash.
1.7.2 - July 27, 2018
Not released - fixes absorbed into 1.7.3
Fixed issue with reading ABCD files as input lines with 1 character would be
ignored.
Fixed a crash upon exit of fbncsieve.
Allow override with -p when starting fbncsieve and fkbnsieve from an input
file.
1.7.3 - August 3, 2018
Fixed a memory exception that affects GPU workers for all sieves.
Re-enable AVX support with psieve as accidentally disabled in 1.7.1.
Do not output factor rate if no factors found.
Fixed another issue in non-AVX psieve code that causes it to crash.