Welcome to the Great Internet Mersenne Prime Search! (The not-PC-only version ;)

This ftp site contains Ernst Mayer's C source code for performing Lucas-Lehmer tests of prime-exponent Mersenne numbers. It also includes a simple Python script for assignment management via the GIMPS project's PrimeNet server, or you can manually manage your work, if you prefer. In short, everything you need to search for Mersenne primes on your Intel, AMD or non-x86-CPU-based computer!

Mlucas is an open-source program for primality testing of Mersenne numbers in search of a world-record prime. You may use it to test any suitable number as you wish, but it is preferable that you do so in a coordinated fashion, as part of the Great Internet Mersenne Prime Search (GIMPS). Note that on x86 processors Mlucas is not as efficient as the main GIMPS client, George Woltman's Prime95 program (a.k.a. mprime, for the Linux version), but that program is not 100% open-source. Prime95 is also only available for platforms based on the x86 processor architecture.

Page Index:

[Shameless beg-for-money section] Please consider donating to support ongoing Mlucas development. While this is a long-term labor of love, not money, donations to help defray expenses are greatly appreciated. Whatever you can spare to support this kind of open-source distributed-computing project is welcome! Note that Paypal charges [2.9% + $0.30] for each such donation, e.g. a $100 donation means $96.80 makes it to me. If you prefer to keep these fees from eating into your hard-earned money, you have two workaround options:

[1] Use the Paypal "Send Money" function with my e-mail address, and mark it as Friends and Family rather than a purchase. Note that such money-sending is only fee-less if you are in the US and you use a PayPal balance or a bank account to fund the transaction. "Bank account" appears to include debit cards linked to one, since my own PayPal account is linked to a unified bank/checking/debit-card money-market account and I never pay to send money to US relatives. But check your transaction-preview details before sending, to be sure!

[2] E-mail me for a snail-mail address to which you can send your old-fashioned personal check for only the cost of your time and postage.


20 Mar 2022: Patch Alert: A user assertion-exit report revealed a bug in the logic used to perform the Gerbicz error check in PRP tests. The bug only manifests for users doing PRP tests who have set CheckInterval = 1000, the minimum allowable value, in their mlucas.ini file, overriding the default values of 10000 (≤ 4 threads) and 100000 (> 4 threads). (The only good reason to use such a small value is on slow devices where 10000 iterations need substantially more than, say, 10 minutes; checkpoints more frequent than a minute or so will cost 1-2% in throughput.) The version number remains v20.1.1.

02 Dec 2021: Patch Alert: Due to a user report of bad behavior of a regular (non '-9') kill of his multithreaded run, signal handling has been changed to immediate-exit without a savefile write. I suggest users 'killall -9 Mlucas' any ongoing jobs at their earliest convenience and switch to the updated code; with that, a regular 'kill' should work. Also, workfile assignments are now echoed to the per-exponent log (.stat) file, not just to stderr (e.g. to the nohup.out file).

23 Nov 2021: Patch Alert:

06 Nov 2021: Patch Alert: There was a PRP-postprocessing bug leading to a sanity-check assertion-exit, and a p-1 premature-exit-at-end-of-stage-1 bug in the initial v20.1.1 release. (01 Nov, md5 = e3302de913e7bf65a83985d68a1193e1). Please make sure you have the current version if you do either of these kinds of GIMPS assignments.

01 Nov 2021: v20.1.1 Patch-release, containing two modest feature-adds and several bugfixes. Thanks to tdulcet and kriesel for much beating-on-the-code QA work, bug reports and stack traces:

Feature-add #1 is to support several new user options stored in a file mlucas.ini in the user's run directory. These are CheckInterval, LowMem and InterimGCD. See the help.txt file for details and restrictions.
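For concreteness, a minimal mlucas.ini exercising all three options might look like the sketch below. The values, and the simple name = value layout, are illustrative only; help.txt is the authoritative reference for the legal names, values and restrictions:

```
CheckInterval = 10000
LowMem = 1
InterimGCD = 0
```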

Feature-add #2 is to support p-1 "stage 2 continuation", meaning if one has run p-1 to the default stage 1 and 2 bounds B1 and B2 (or to stage bounds specified via the Pminus1= assignment format) and failed to find a factor, one can run one or more deeper stage 2 intervals at the same B1 in an attempt to find a factor. See the help.txt file for details and examples.
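For illustration, a workfile entry of the Pminus1= form with explicit bounds might look like the following sketch, where the assignment-ID field, the exponent and the B1/B2 values are all placeholders (see help.txt for the authoritative field layout):

```
Pminus1=<assignment-ID>,1,2,110763649,-1,800000,24000000
```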

Here are descriptions of the bugfixes:

31 Aug 2021: v20.1 released: This is an update-release of v20, but with enough changes as to warrant a minor-version number increment.

I urge users to delete (or rename) the mlucas.cfg file they are using for runs and run the self-tests using the v20.1 build to generate a fresh one, due to the v20 suboptimal-radix-set selection issue mentioned in the list below.

Changes include:

30 Jul 2021: v20.0 released: The major feature-add is support for p-1 factoring. Special thanks to Mihai Preda and George Woltman for the total-immersion course in state-of-the-art p-1 stage 2 optimizations, and Paul Underwood for remote access to his Odroid N2 with 4GB RAM, which allowed me to test my stage 2 implementation in an Arm-NEON-SIMD build context. Thanks also to Teal Dulcet for help with the make-Makefile scripting, and Ken Kriesel for trying the automake under WSL.

The key new runtime flag added in v20 is -maxalloc, which takes an argument representing the limit on the percentage of system free RAM used in p-1 stage 2. Note: free RAM, not total system RAM; for Linux this is based on the value of the sysinfo "freeram" field. The default is 90%; if you start seeing very small numbers in the 'top' Mem-usage summary (typically several lines above the big columnwise tasks-table), or said table starts showing the dreaded 'kswapd' entries near the top of the %CPU column, you need to lower the allocation. I've not found a reliable way to obtain free-RAM numbers for MacOS, so the default there is 50% of available RAM, based on the sysctl "hw.memsize" value.

Available memory determines the number of floating-point-residue-sized 'buffers' used in the p-1 stage 2 prime-pairing algorithm. The greater the number of buffers, the greater the fraction of stage 2 primes we can pair, which reduces the stage 2 work. At the current GIMPS testing wavefront of 6M FFT length, each stage 2 buffer needs about 50MB of RAM. Note that once one gets beyond 500-1000 memory buffers in stage 2, one gets rapidly diminishing returns, as the prime-pairing percentage approaches 100% − this plot illustrates for the Mlucas v20 pairing algorithm (blue circles represent actual pairing %; the black curve is a semi-decent-at-fewer-than-2000-buffers curve fit). (The code uses the actual data tables to make its determination, not the curve fit.)
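The back-of-envelope arithmetic behind the buffer count can be sketched as follows. This is just a rough model of the numbers quoted above (8-byte doubles, 6M-FFT residues, a percentage cap on free RAM), not Mlucas's actual allocation logic:

```python
def s2_buffers(free_ram_bytes, maxalloc_pct=90, fft_len_doubles=6 * 2**20):
    # One stage 2 buffer holds one floating-point residue: fft_len doubles
    # of 8 bytes each, i.e. 48 MiB at the 6M-FFT wavefront ("about 50MB").
    buf_bytes = 8 * fft_len_doubles
    return int(free_ram_bytes * maxalloc_pct / 100) // buf_bytes

# 16 GiB of free RAM at the default -maxalloc 90 allows ~300 buffers,
# comfortably into the diminishing-returns regime described above:
print(s2_buffers(16 * 2**30))  # prints 307
```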

Savefiles: During stage 1, the code uses the same redundant p[exp], q[exp] savefile pair as for LL/PRP tests, where [exp] stands for the exponent p of the Mersenne number M(p) being tested. On completion of stage 1 the p[exp] file is renamed p[exp].s1; this can later be used to run stage 2 continuations to deeper-than-default bounds, if the user wishes. (This is "advanced usage", not recommended − I will likely add a note to the thread linked at the bottom of this release summary with how-to details; note that a deeper stage 1 requires a rerun from scratch.) Stage 2 uses a single p[exp].s2 savefile; both s1 and s2 savefiles are roughly [exp] bits in size. During p-1 the logfile (p[exp].stat) output will also look very similar to that for LL/PRP tests, except each checkpoint-summary will include either 'S1' or 'S2', e.g.

[2021-07-28 13:22:38] M110763649 S2 at q = 3270330 [ 1.99% complete] clocks = 00:05:34.349 [ 33.3849 msec/iter] Res64: 99F98F04EFD4E49A. AvgMaxErr = 0.124210464. MaxErr = 0.187500000.
For both p-1 stages, 'iter' refers to FFT-based mod-M(p) multiplies (more precisely, "mod-squaring equivalents", each requiring precisely one forward and one inverse FFT). In stage 2 each such multiply processes between 1 and 2 stage 2 primes, the precise value depending on the number of memory buffers allocated; that value is the reciprocal of the ordinate in the above-linked plot.

Note that for a p-1 run done ahead of a scheduled PRP test of the same exponent, stage 1 − let's say it does s1 iterations − is not fused with the corresponding opening s1 iterations of the PRP test. This is a possible future enhancement.

Other stuff:

Note that the p-1 support in v20 requires installation of GMP, the GNU arbitrary precision library, whose greatest common divisor (GCD) function is used to extract any found factors from the M(p)-sized residue generated by the p-1 code. The Build section has notes on how to check whether GMP is already installed on one's target system, and how to install it if not.
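To see why a GCD is the extraction step, here is a toy-scale sketch of the p-1 idea in Python (Mlucas itself calls GMP's C GCD on an M(p)-sized residue; the tiny exponent and made-up smooth "stage 1" exponent E below are purely illustrative): after computing x = 3^E mod M(p), any prime factor q of M(p) for which q-1 divides E satisfies 3^E ≡ 1 (mod q), so q shows up in gcd(x-1, M(p)):

```python
from math import gcd

p = 29
Mp = 2**p - 1                              # M(29) = 536870911 = 233 * 1103 * 2089
E = 2**20 * 3**10 * 5 * 7 * 11 * 13 * 29   # a smooth "stage 1" exponent (illustrative)
x = pow(3, E, Mp)                          # the stage 1 residue, cheap at this toy size
g = gcd(x - 1, Mp)
# 233 - 1 = 2^3 * 29 and 2089 - 1 = 2^3 * 3^2 * 29 both divide E, so both of
# those factors are captured by the GCD; 1103 - 1 = 2 * 19 * 29 need not be,
# since 19 does not divide E.
print(g % 233 == 0, g % 2089 == 0)  # prints: True True
```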

Users of low-memory systems (< 4GB RAM) such as Raspberry Pi and Odroid-style micro-PCs will only get stage 1 of the p-1 algorithm, since running an effective stage 2 on an exponent requires a minimum of 30 floating-point-residue-sized 'buffers', which at the current GIMPS testing wavefront translates to around 1.5GB of free RAM. I did an Arm-Neon SIMD build of v20 on my sole remaining Samsung Galaxy S7 android phone, and quickly discovered that nearly all of its 4GB RAM was eaten up by various system tasks and unremovable apps. (Linux-running micros will be better in that regard, but still.)

Users who for any reason decide to continue with v19: please make sure you are using the patched version of the primenet.py script described at the top of the v19.1 section here.

Post feedback and any questions regarding v20.1.1 here.

Return to Page Index

Jump to Download & Build section

General Questions:

For general questions about the math behind the Lucas-Lehmer test, general primality testing or related topics (and also no small number of unrelated topics, such as this one following in the literary-humor footsteps of Ambrose Bierce), check out the Mersenne Forum.

For discussions specifically about Mlucas at the above venue, see the Mlucas subforum there.


The 'which' commands below show how to check whether the various needed pieces are already installed. I list only components which are not always included in a typical Linux distro or MacOS release:

Users need GCC ('which gcc') and/or the Clang/LLVM compiler ('which clang'; this is the native one for MacOS), the xz compression utilities ('which xz'), GNU make ('which make'), and − for those using some version of the primenet.py work-management script − python ('which python'). Nice-to-haves include wget and gdb. For example, under Ubuntu Linux one would use

sudo apt -y install build-essential xz-utils wget python
(build-essential is a meta-package that installs gcc/g++/make and a few other packages commonly used in a standard libc toolchain.)

Mlucas v20 and later also require GMP, but the install of that package is described separately, below.

Windows Users:

Note that as described in this article in The Hacker News, Win10 supports Linux Bash Shell and Ubuntu Binaries; that is referred to as Windows Subsystem for Linux (WSL). One important consideration applies to p-1 factoring using Mlucas v20 under the updated WSL2 − an Mlucas user who follows these things writes:
WSL 2 is supposed to be faster than WSL 1, at least for IO, but I suspect Mlucas might actually be faster with WSL 1, since there would be no overhead from the VM used with WSL 2 (do not quote me on that). With WSL 2, Mlucas would also not be able to access all the system's RAM and the RAM usage percentage would be incorrect, since it would only be for the VM. If someone is actually going to use Mlucas with the WSL, they may want to first test with both WSL 1 and 2.
Another user adds further notes re. WSL1-versus-2:
Manual WSL1 installation is straightforward, and possible on hardware that lacks virtualization features required for WSL2, allowing multithreaded Mlucas builds and runs on Windows 10 systems that can't run WSL2. Case in point, Xeon Phi 7210 & 7250, and some other processor models too. On Win 10, WSL1 install is simple: Manual method step 1, restart, step 6, and use as soon as it completes the distro download, bsdtar extract visible in Task Manager, and install, in my case Ubuntu 18.04 LTS.

DOWNLOAD AND BUILD THE CODE − Automated Linux-script Method

Thanks to fellow Mersenner/coders Teal Dulcet and Daniel Connelly, Linux users now have the option of using a very nice auto-install script which automates the source download, builds in fast parallel-make mode and does an auto-tuning stage to identify the run mode (number of Mlucas instances and core/thread combinations for each) which maximizes total throughput on the user's system. They have also put together a primenet.py script based on Loïc Le Loarer's Primenet-v5-API-capable one released with Mlucas v19, but with many enhancements, including support for other GIMPS clients − including GPU ones (GpuOwl and CUDALucas) − and more of a "Prime95/mprime feel" w.r.t. command-line options. (Note that their script supports the p-1 factoring and LL/PRP-test worktypes; it does not support trial factoring.) That install and work-management flow is described here. [Opens on separate page in same browser tab, unless user opens in new tab.]

Users switching from one version of the script to the other for any reason should delete the current primenet.py script and local.ini file from their run directory and re-register using the new script.

For the Do-It-Yourselfers, the build and tune flow is detailed below.

Installing GMP (GNU arbitrary precision library) and HWLOC (hardware-topology package)

Linux: If 'ls -l /usr/include/gmp*' comes up with one or more header files, it is likely that GMP is already installed on your system. If that comes up empty, try installing the development package via your package manager, e.g. 'sudo apt install libgmp-dev'. Should libgmp-dev trigger a "not found" package-manager error, you need to search for the package name: for Linux distros which use the apt package manager, 'apt search gmp'; under CentOS, 'yum search gmp'. Since Mlucas only supports 64-bit builds, the x86_64 version of GMP is what you want; just install anything that comes up in the search with a name like gmp.x86_64 or gmp-devel.x86_64 .


Here is the procedure − the needed downloads can all be to the ~/Downloads folder, no need to put them in any special location:

1. Download the .tar.xz of the current GMP from https://gmplib.org/ (in my case, that was gmp-6.2.1.tar.xz):

2. install gmp:

	tar xJf gmp*xz
	cd gmp-[your downloaded version]
	./configure
	make
	make check
	make install (re-run as 'sudo make install' if you receive a 'permission denied' message)

HWLOC: [NOTE: this package will only be needed as of v21, but is still useful for visualizing the hardware topology, especially for manycore and hybrid BIG+little processors.] In the context of 'apt', another useful associated tool is 'apt-file', which allows one to search for the package which provides a given file: for example, when first playing with the hwloc API, I needed the hwloc.h header, which was still missing after I installed the hwloc shared-object runtime libraries. I first used 'sudo apt-get install apt-file' to install apt-file, followed by 'apt-file update' to build the file cache needed by that for its search. Lastly, 'apt-file find hwloc.h' revealed that under Ubuntu one needs the libhwloc-dev package to get the needed header, thus 'sudo apt install -y libhwloc-dev'.

Installing hwloc on CentOS is even more fun: between CentOS 8.2 and 8.3 some clever soul decided to change the package-name capitalization from 'PowerTools' to 'powertools', so you first need to check your CentOS version, using 'lsb_release -a'. (If that gives 'lsb_release: command not found', you first need to - yep! - install yet another package, this one from the RedHat - sorry, I mean 'redhat' - side of the RedHat/CentOS alliance: 'sudo yum install redhat-lsb-core'). Once you know your CentOS version number:

More fun: On CentOS the resulting command-line tool was named not 'lstopo' but 'lstopo-no-graphics', so I added "alias lstopo='lstopo-no-graphics'" to my .bashrc file to uniformize command naming across my various Linux machines.

The command-line tool can be used to generate a nifty graphical representation of your hardware topology output to various image formats, e.g. 'lstopo my_machine.svg' produces output in scaled vector graphics format. Here are some examples of such graphical topology views:

  1. My old 4-core/4-thread (i.e. no hyperthreading) Intel Haswell CPU − in addition to the CPU and the 8GB RAM (under "Machine" at top left) we also see schematic blocks for five PCI-attached devices in the right half: An old nVidia GPU 'card0' which was removed long ago but is still mapped in the OS's device-management files, a pair of AMD Radeon VII GPUs 'card1' and 'card2' currently hosted by this system (each with 16GB onboard HBRAM), an SSD 'sda' and a USB WiFi stick 'enp7s0';
  2. A top-line 18c36t Intel Skylake-X CPU − Note that like most Intel manycore systems, the bandwidth-to-memory is alas woefully insufficient to keep all those lovely 512-bit-vector-SIMD-supporting cores fed with data;
  3. A 61c244t Intel Knights Corner coprocessor-CPU with 16GB onboard MCDRAM (of which 15GB are available for user jobs) − note the bizarre internal logical-core numbering. Said numbering maps to the proc/cpuinfo file, which gives no clue as to the topology, i.e. that logical cores (the 'P#' entries) 0, 241, 242 and 243 all map to the same physical processor core 0. Without hwloc enabled in one's Mlucas build, to bind an instance to all 4 threads (logical cores) on said physical processor core 0, one would need the highly nonintuitive './Mlucas -cpu 0,241:243', whereas as of v21, with hwloc one can simply use './Mlucas -core 0:0:4', where the index triplet encodes a range of physical cores to use and how many threads to run on each, lo:hi[:threads_per_core].
  4. A 68c272t Intel Knights Landing CPU − this CPU has 16GB of high-bandwidth onboard MCDRAM, which can at least somewhat keep up with all those 512-bit-vector-SIMD processing cores, which are more numerous than on the above Skylake-X manycore system, but underclocked at 1.5GHz for heat-dissipation reasons (KNL CPU dies are a massive full square inch, over 600 mm^2, at a 14nm process). I further installed 192GB of DIMM-based RAM, of which the OS reserves 4GB for swap, leaving 188GB for user-process usage. With both types of memory available, the OS has mapped the DIMM-RAM to NUMA node 0 and the MCDRAM to NUMA node 1. For processes which fit comfortably into the MCDRAM one can bind them, memorywise, to the MCDRAM by prepending 'numactl -m 1' to the process invocation (after first installing the numactl package). Processes with larger memory footprints will by default run out of the slower DIMM-RAM, but one can still hint to the OS to use the MCDRAM as a level 3 cache for such jobs by prepending 'numactl --preferred=1'.
    Unlike the above KNC CPU, the KNL follows the general Intel logical-core numbering convention, but to bind an instance to all 4 threads on physical core 0, one would use './Mlucas -cpu 0,68,136,204' or equivalently, './Mlucas -cpu 0:271:68', whereas with hwloc incorporated into a v21 build one will again be able to use just './Mlucas -core 0:0:4'.


To do primality tests, you'll need to build the latest Mlucas C source code release. First get the release tarball, available via wget as 'wget --backups=1 https://www.mersenneforum.org/mayer/src/C/mlucas_v20.1.1.txz', or via click-to-download, here: mlucas_v20.1.1.txz (20 Mar 2022, 1689292 bytes ~= 1.6 MB, md5 checksum = 888c95d3289d3b49a7c9cdba8aeab9b9). (The '--backups=1' option causes wget to append a '.1' to any existing tarfile of the same name, which can come in handy; 'wget -O' permits direct overwrite of an existing file, but requires the target filename to be explicitly listed by name.) Then use 'tar xJf mlucas_v20.1.1.txz' to one-step uncompress/unpack the archive.

That unpacking will create a directory next to the .txz file whose name is the same as the prefix of the compressed tar archive, mlucas_v20.1.1. Inside that directory you will see the following ascii-text files: a bash-script file makemake.sh, a Python script primenet.py and a help.txt file with descriptions of the various command-line options. You will also see a directory 'src' containing the various C source and header files. If you wish to view any of the sourcefiles in an editor, I recommend using a 4-column tab setting, since much of the C code and especially the inline assembly contained in various .h files is in multicolumn form and needs a 4-column tab setting to line up properly for viewing.

'bash makemake.sh' will create an 'obj' directory next to src, create a Makefile suitable for your CPU architecture inside obj, do a parallel compile dividing the task of compilation among as many threads as your CPU has cores and, if all goes well, create an executable inside obj titled simply 'Mlucas'.

If you want to build with the Clang/LLVM (or other GCC-compatible) compiler instead of the default GCC, run 'export CC=clang' before running the makemake.sh script. If you have already run the script and want to recompile Mlucas with a different compiler without rerunning the entire script, you must first run 'make clean' inside the obj directory and then invoke 'make' with the desired compiler's name:

cd mlucas_v20.1.1/obj/
make clean
CC=[compiler name] make -j "$(nproc)"
On my Intel Haswell quad running Ubuntu, the latest version of Clang for that was installed as clang-9, so after first building and testing with GCC, I made that replacement in the above CC= snip, then tested the resulting Clang binary for performance against the GCC one. (I found no appreciable difference on that machine.)

Build Errors: Should the above make-script indicate errors, the first thing to do is to search the build.log file generated in the 'obj' directory for the keyword 'error' − use a case-insensitive search (e.g. 'grep -i error build.log'), so that both 'error' and 'Error' get flagged. If you can't figure out the problem that way yourself, please e-mail me copies of the Makefile and build.log files inside 'obj'.

Issue: "clang: error: the clang compiler does not support '-march=native'": This affects Clang builds on the Arm platform, including Apple Silicon CPUs such as M1. The workaround is 'bash makemake.sh asimd', which will build an Mlucas_asimd-named binary in the obj_asimd directory. The binary can be renamed just 'Mlucas' post-build, if you prefer.

Issue: "cc: Command not found": Some users have reported hitting this make error under certain WSL installs. On both my Mac and various Linux boxes, 'which cc' points to /usr/bin/cc, and 'ls -l /usr/bin/cc' shows it to be an alias (soft-link) to gcc. But that is an assumption on the part of the makemake.sh script, which clearly does not hold on all Linuxes. On such systems, typing 'export CC=gcc' before retrying 'bash makemake.sh' should fix the problem.

Cross-compilation: The default mode for makemake.sh is to compile for whatever the highest vector-SIMD mode and other key architectural features are supported by the host machine. In general, cross-compiled binaries will not have quite the performance of natively compiled ones. Nevertheless, we anticipate that some users will want the convenience of building for multiple targets on a single host build platform. To that end, the script supports a single optional SIMD-mode targeting flag, which assumes the build host platform is capable of building said mode. The flag must be one of the supported SIMD-arithmetic types: [x86_64: avx512_skylake avx512_knl avx2 avx sse2]; [Armv8: asimd]. E.g. to build an avx2-SIMD binary on a Skylake-X host, 'bash makemake.sh avx2' will create an object-file directory obj_avx2 and a binary named Mlucas_avx2 within it.

Rebuilding an updated version of the source file archive: This may be needed when upgrading from a beta to a release version, or after downloading a patched version of a release, or for users who want to make local modifications to one or more source files. Generally, one should always start with 'make clean' in the obj-file directory (or 'rm -f obj/*.o' from the same top-level directory which contains makemake.sh), making sure one substitutes obj_[extension] for 'obj' in the case of a cross-compiled binary. Then 'cd ..' and re-run 'bash makemake.sh' on the updated source-file archive in /src. Gnu make is smart enough that after download of a new src-file archive version or user-modification of one or more source files, it will rebuild only the *.c files that have changed, but that can be dangerous as it does not account for any header-file changes that may have occurred. One of the nice things about parallel-make is that it's so speedy that full rebuilds are easy and fast. Incremental rebuilds of just one or more .c files followed by relinking to obtain a new executable should be undertaken only by users who really know what they are doing.


BASIC BUILD SELF-TEST; The -nthread and -cpu flags

Once you have successfully linked a binary, I suggest you first try a spot-check at some smallish FFT length, say
./Mlucas -fft 192 -iters 100 -radset 0
You will want to look through the resulting informational output for a line reading "INFO: System has [X] available processor cores.", which I have bolded below in the sample output from my Core2 MacBook. Here, the number reported refers to *logical* (virtual) processor cores. Intel and AMD users: if this number is double the number of physical system cores, your CPU supports hyperthreading. This is important in the "Performance Tune for Your Machine" section below.

This particular testcase should produce the following 100-iteration residues, with some platform-dependent variability in the roundoff errors and possibly the final residue shift count, which users need not concern themselves with:

INFO: System has 2 available processor cores.
100 iterations of M3888509 with FFT length 196608 = 192 K, final residue shift count = 744463
Res64: 71E61322CCFB396C. AvgMaxErr = 0.255430821. MaxErr = 0.312500000. Program: E19.1
Res mod 2^35 - 1 =          29259839105
Res mod 2^36 - 1 =          50741070790
[If the residues differ from these internally-pretabulated 100-iteration ones, the code will emit a visually-loud error message.]
If that works, try rerunning the same case, now with 2 threads rather than the default single-threaded:
./Mlucas -fft 192 -iters 100 -radset 0 -nthread 2
The -cpu flag and the differing Intel and AMD hyperthreaded-core-numbering conventions:

On non-hyperthreaded CPUs, this should nearly double the throughput (= half the runtime) versus the initial single-threaded (default) run. On hyperthreaded x86 processors, Intel users should see a nearly 2-fold speedup running this way, but AMD users won't. That's because '-nthread 2' really translates to 'run 2-threaded, with thread affinities set to logical CPU cores 0 and 1'. By 'logical cores' we mean the multiple (typically 2, but sometimes more) 'virtual cores' mapped to each physical CPU core in modern 'hyperthreaded' (Intel's term) CPU architectures. The Intel numbering system here is that on a system with n physical cores, physical CPU core 0 maps to logical cores 0 and n; physical CPU core 1 maps to logical cores 1 and n+1, etc. AMD uses a different logical-core numbering convention than Intel, whereby physical CPU core 0 maps to logical cores 0 and 1; physical CPU core 1 maps to logical cores 2 and 3, and so forth. The Intel-specificity of the Mlucas -nthread option is one reason it is deprecated (still supported but recommended-against) in v17 and beyond; another is that it does not permit setting the processor affinity of an Mlucas instance to a specific set of logical cores.
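The two numbering conventions can be summarized in a small sketch (2-way SMT assumed; the 'scheme' labels and function name are just illustrative tags for the two mappings described above, not an actual OS query):

```python
def logical_cores(phys_core, n_phys_cores, scheme="intel", smt=2):
    """Logical-core indices sharing one physical core, under the two
    numbering schemes described in the text."""
    if scheme == "intel":   # physical core c -> logical cores c, c + n, ...
        return [phys_core + i * n_phys_cores for i in range(smt)]
    else:                   # "amd": physical core c -> logical cores 2c, 2c + 1
        return [smt * phys_core + i for i in range(smt)]

# On a 4-core/8-thread CPU, the two hyperthreads of physical core 0:
print(logical_cores(0, 4, "intel"))  # prints [0, 4]
print(logical_cores(0, 4, "amd"))    # prints [0, 1]
```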

For these reasons Mlucas v17 introduced a new and much-more-flexible flag '-cpu', which accepts any mix of comma-separated individual core indices and core-index ranges of form low:high and low:high:stride, where if stride is omitted it defaults to 1, and if high is also omitted, it means "run 1 thread on logical core [low]". Thus for our Intel user, -nthread 2 is equivalent to -cpu 0:1, but now our user can run a second 2-threaded job using -cpu 2:3 and be reasonably sure that the two runs are not competing for the same CPU cores. Our AMD user will similarly see no runtime benefit from replacing -nthread 2 with -cpu 0:1 (since on AMD both have the same effect of overloading a single physical CPU core), but will find that -cpu 0,2 (or in colon-delimited syntax, 0:2:2, i.e. 'use cores 0 though 2 in increments of 2') gives the expected 2-threaded speedup.
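As a concreteness check, the -cpu syntax described above can be modeled in a few lines of Python − a sketch of the documented grammar, not Mlucas's actual parser:

```python
def parse_cpu(spec):
    """Expand a -cpu argument such as "0,2", "0:3" or "0:6:2" into the
    corresponding sorted list of logical-core indices."""
    cores = set()
    for part in spec.split(','):
        f = [int(x) for x in part.split(':')]
        if len(f) == 1:      # a single core index
            cores.add(f[0])
        else:                # low:high or low:high:stride (stride defaults to 1)
            stride = f[2] if len(f) == 3 else 1
            cores.update(range(f[0], f[1] + 1, stride))
    return sorted(cores)

print(parse_cpu("0:1"))        # Intel user's 2-thread run: [0, 1]
print(parse_cpu("0:2:2"))      # AMD user's 2-thread run:   [0, 2]
print(parse_cpu("0,241:243"))  # the KNC example: [0, 241, 242, 243]
```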


[Advanced users: For a complete list of Mlucas command line options, type 'Mlucas -h', and note the topical help-submenu options.]

After building the source code, the first thing that should be done is a set of self-tests to make sure the binary works properly on your system. During these self-tests, the code also collects various timing data which allow it to configure itself for optimal performance on your hardware. It does this by saving data about the optimal FFT radix combination at each FFT length tried in the self-test to a configuration file, named mlucas.cfg. Once this file has been generated, it will be read whenever the program is invoked to get the optimal-FFT data (specifically, the optimal set of radices into which to subdivide each FFT length) for the exponent currently being tested.

To perform the needed self-tests for a typical-user setup (which implies that you'll be doing either double-checking or first-time probable-prime (PRP) testing), first remove or rename any existing mlucas.cfg file from a previous code build/release in the run directory, then run the commands below; note you'll want to insert a specific set of -cpu options from the list below in place of the [-cpu flags] placeholder. This first basic test runs single-threaded self-tests and needs from a few minutes on fast Intel hardware to upwards of an hour on humbler CPUs such as my little Odroid:

./Mlucas -s m [-cpu flags] >& test.log
mv mlucas.cfg mlucas.cfg.1thr

The above 'Mlucas -s m' command tells the program to perform a series of self-tests for FFT lengths in the 'medium' range, which currently means FFT lengths from 1024K-7680K, covering Mersenne numbers with exponents from 20M − 143M. You should run the self-tests under unloaded or constant-load conditions before starting work on any real assignments, so as to get the most-reliable optimal-FFT data for your machine, and to be able to identify and work around any anomalous timing data. (See example below for illustration of that). This may take a while, especially in single-threaded mode; you can monitor progress of the process by opening the mlucas.cfg file in an editor and watching the various-FFT-length entries get added as each set of tests at a given FFT length completes. When done, please check the resulting test.log file for error messages. You may see a few messages of the form

***** Excessive level of roundoff error detected - this radix set will not be used. *****

but a whole lot of such, or residue-mismatch or other kinds of errors means that something has likely gone awry in your build. This can be something as mundane as the compiler using unsafe optimizations for one or more FFT-radix functions, or something more serious. In such cases, please contact me, the program author, and attach zipped copies of your build.log and test.log, along with information about your compiler version and compute platform (CPU and OS).

Setting Up for Running on Multicore Systems:

Much testing experience shows that on systems with 4 or more physical cores, the optimal tradeoff between run-management effort, RAM usage (especially in the presence of p-1 work) and total system throughput is achieved by assigning 2 or 4 physical cores (and possibly their logical counterparts, if hyperthreaded) per Mlucas instance. Further, even if your CPU can run more than 4-threaded, you don't want to run small FFT lengths (< 1M) with > 4 threads, as the thread overhead will kill parallel performance there. Here are my suggestions for what to enter in the [-cpu flags] field for several common hardware types, and how to create a simple bash runscript which on boot-up can be invoked just once as 'bash jobs.sh', if your goal is to maximize total throughput of your system.

Note that in the primenet.py arguments in the runscript examples, the '-T 150' arguments represent a preferred worktype of "First time PRP tests", which is the default handed out by the server to decently fast user machines. See the Get exponents from PrimeNet section below for details regarding the possible choices here. The '-maxalloc' argument provided for each Mlucas instance limits the maximum amount of memory used by stage 2 of p-1 factoring. The default is 90% of free RAM, but for multiple instances one wants to allow for the fact that 2 or more instances may be doing simultaneous stage 2 work, and cut the limit somewhat. The best values depend on the user's work mix; the values below are best guesses. Lastly, the copy-into-multiple-run-directories commands and sample bash scripts assume the user has registered their CPU using the primenet.py script, thus creating a local.ini file as detailed in Step 3 below − don't do the actual setup or run the bash script until this has been done and you have decided on a preferred worktype for yourself.
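To make the jobs.sh idea concrete, here is a minimal sketch assuming a hypothetical 8-core machine and two run directories named run0 and run1. The directory names, core ranges and -maxalloc percentage are illustrative placeholders, not recommendations − tailor them to your own machine and work mix:

```shell
#!/bin/bash
# Hypothetical jobs.sh sketch: an 8-core box split into two 4-core
# Mlucas instances, each with its own primenet.py manager.
# run0/run1, the core ranges and the -maxalloc value are assumptions.
for i in 0 1; do
    (
        cd run$i || exit 1
        lo=$((4*i)); hi=$((4*i + 3))   # physical cores for this instance
        nohup ./Mlucas -cpu ${lo}:${hi} -maxalloc 40 >> nohup.out 2>&1 &
        nohup python primenet.py -d -T 150 >> nohup.out 2>&1 &
    )
done
```

Invoking 'bash jobs.sh' once from the directory one level above run0 and run1 then launches all four processes in the background.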

If for some reason you want to generate optimal-FFT-params data for a single FFT length not covered by the standard self-tests, you can do so using the following command template:

./Mlucas -fft [n] -iters [100|1000|10000] [-cpu [args]]
First specify the FFT length, in units of Kdoubles − the supported lengths are of the form [8,9,10,11,12,13,14,15]*2^k doubles, with k some integer ≥ 10 (equivalently, [8,...,15]*2^j Kdoubles, with j ≥ 0). Then replace the [-cpu [args]] placeholder in the command template above with the desired cores-to-use specifiers: nothing for a 1-threaded self-test, -cpu [args] for multithreaded. Lastly, for a single-FFT-length self-test you must explicitly specify the iteration count via '-iters [100|1000|10000]' − 100 is OK for 1-thread tests, but I suggest using 1000 for thread counts between 4 and 15, and 10000 for ≥ 16 threads, in order to reduce the thread-and-data-tables-initialization overhead to a reasonable level.

Each single-length self-test should add 1 line to your mlucas.cfg file. The cfg-file lines appended by such single-length self-tests will have some additional residue data following the "radices = " listing, which you can ignore, since Mlucas stops parsing of these lines after reading in the radices.
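To see which FFT lengths the multiplicative form above actually admits in a given range, here is a small hypothetical Python helper (not part of Mlucas) that enumerates lengths of the form m*2^j Kdoubles, m in 8..15. Whether every length of this form has radix-set support in a particular Mlucas version may still vary:

```python
# Hypothetical helper (not part of Mlucas): enumerate FFT lengths, in
# Kdoubles, of the form m*2^j with m in 8..15, within a given range.
def supported_fft_lengths(lo, hi):
    lengths = set()
    for m in range(8, 16):
        j = 0
        while m << j <= hi:       # m*2^j, doubling until past the range
            if m << j >= lo:
                lengths.add(m << j)
            j += 1
    return sorted(lengths)

# The 'medium' self-test range from the main text, 1024K-7680K:
print(supported_fft_lengths(1024, 7680))
```

The printed list reproduces the familiar self-test progression (1024, 1152, 1280, ..., 7680).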

Format of the mlucas.cfg file:

If you are running multiple copies of Mlucas, a copy of the mlucas.cfg file should be placed into each working directory, along with a worktodo.ini file containing assignments from the PrimeNet server which will be done by a copy of the Mlucas executable run from that working directory. Note that the program can run without the .cfg file, but with a proper configuration file (in particular one which was run under unloaded or constant-load conditions) it will run optimally at each runlength.

What is contained in the configuration file? Well, let's let one speak for itself. The following mlucas.cfg file was generated on a 2.8 GHz AMD Opteron running RedHat 64-bit Linux. Comment lines begin with a # to set them off from the actual optimal-FFT-radix data:

	# mlucas.cfg file
	# Insert comments as desired in lines beginning with a # or // symbol, as long as such commenting occurs below line 1, which is reserved.
	# First non-comment line contains program version used to generate this mlucas.cfg file;
	# Remaining non-comment lines contain data about the optimal FFT parameters at each runlength on the host platform.
	# Each line below contains an FFT length in units of Kdoubles (i.e. the number of 8-byte floats used to store the
	# LL test residues for the exponent being tested), the best timing achieved at that FFT length on the host platform
	# and the range of per-iteration worst-case roundoff errors encountered (these should not exceed 0.35 or so), and the
	# optimal set of complex-FFT radices (whose product divided by 512 equals the FFT length in Kdoubles) yielding that timing.
	2048  sec/iter =    0.134  ROE[min,max] = [0.281250000, 0.343750000]  radices =  32 32 32 32  0  0  0  0  0  0  [Any text offset from the list-ending 0 by whitespace is ignored]
	2304  sec/iter =    0.148  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36  8 16 16 16  0  0  0  0  0
	2560  sec/iter =    0.166  ROE[min,max] = [0.281250000, 0.312500000]  radices =  40  8 16 16 16  0  0  0  0  0
	2816  sec/iter =    0.188  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44  8 16 16 16  0  0  0  0  0
	3072  sec/iter =    0.222  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
	3584  sec/iter =    0.264  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
	4096  sec/iter =    0.300  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
Note that as of Jun 2014 the per-iteration timing data written to mlucas.cfg file have been changed from seconds to milliseconds, but that change in scaling is immaterial with respect to the notes below.

You are free to modify or append data to the right of the # signs in the .cfg file and to add or delete comment lines beginning with a # as desired. For instance, one useful thing is to add information about the specific build and platform at the top of the file. Any text to the right of the 0-terminated radices list for each FFT length is similarly ignored, whether it is preceded by a # or // or not. (But there must be a whitespace separator between the list-ending 0 and any following text).
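As a sanity check on the line format just described, the following hypothetical Python snippet (not part of Mlucas) parses one cfg data line and verifies the stated invariant that the product of the radices, divided by 512, equals the FFT length in Kdoubles:

```python
# Hypothetical parser sketch (not part of Mlucas) for one mlucas.cfg
# data line of the form shown in the sample file above.
import re
from functools import reduce

def parse_cfg_line(line):
    m = re.match(r'\s*(\d+)\s+sec/iter\s*=\s*([\d.]+).*?radices\s*=\s*([\d\s]+)', line)
    if not m:
        return None
    fft_k = int(m.group(1))          # FFT length in Kdoubles
    timing = float(m.group(2))       # best per-iteration timing
    radices = []                     # radix list is 0-terminated;
    for tok in m.group(3).split():   # ignore everything after the 0
        if tok == '0':
            break
        radices.append(int(tok))
    # invariant: product of complex-FFT radices / 512 == length in Kdoubles
    assert reduce(lambda a, b: a * b, radices) // 512 == fft_k
    return fft_k, timing, radices

line = "2048  sec/iter =    0.134  ROE[min,max] = [0.281250000, 0.343750000]  radices =  32 32 32 32  0  0  0  0  0  0"
print(parse_cfg_line(line))
```

For the 2048K sample line, 32*32*32*32 = 2^20, and 2^20/512 = 2048, so the assertion holds.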

One important thing to look for in a .cfg file generated on your local system is non-monotone timing entries in the sec/iter (seconds per iteration at the particular FFT length) data. For instance, consider the following snippet from an example mlucas.cfg file:

	1536  sec/iter =    0.225
	1664  sec/iter =    0.244
	1792  sec/iter =    0.253
	1920  sec/iter =    0.299
	2048  sec/iter =    0.284

We see that the per-iteration time for runlength 1920K is actually greater than that for the next-larger vector length that follows it. If you encounter such occurrences in the mlucas.cfg file generated by the self-test run on your system, don't worry about it − when parsing the cfg file the program always "looks one FFT length beyond" the default one for the exponent in question. If the timing for the next-larger-available runlength is less than that for the default FFT length, the program will use the larger runlength. The only genuinely problematic case with this scheme is if both the default and next-larger FFT lengths are slower than an even larger runlength further down in the file, but this scenario is exceedingly rare. (If you do encounter it, please notify the author and in the meantime just let the run proceed).
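The "looks one FFT length beyond" rule described above can be sketched as follows, assuming the timings have been read into a dict keyed by FFT length − a simplified re-implementation for illustration, not Mlucas's actual code:

```python
# Simplified sketch of the "look one FFT length beyond" rule: given the
# default FFT length for an exponent, use the next-larger length in the
# cfg data if its per-iteration timing is smaller.
def choose_fft_length(timings, default_fft):
    # timings: {fft_length_in_Kdoubles: sec_per_iter}, as per mlucas.cfg
    larger = sorted(k for k in timings if k > default_fft)
    if larger and timings[larger[0]] < timings[default_fft]:
        return larger[0]
    return default_fft

# The non-monotone snippet from the main text:
timings = {1536: 0.225, 1664: 0.244, 1792: 0.253, 1920: 0.299, 2048: 0.284}
print(choose_fft_length(timings, 1920))
```

With the snippet's timings, a default length of 1920K is overridden by the faster 2048K, while a default of 1792K is kept.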

Aside: This type of thing most often occurs for FFT lengths with non-power-of-2 leading radices (which are algorithmically less efficient than power-of-2 radices) just slightly less than a power-of-2 FFT length (e.g. 2048K = 2^21), and for FFT lengths involving a radix which is an odd prime greater than 7. It can also happen if for some reason the compiler does a relatively poorer job of optimization on a particular FFT radix, or if some FFT-radix combinations happen to give better or worse memory-access and cache behavior on the system in question. Such nonmonotonicities have gotten rarer with each recent Mlucas release, especially at larger (say, > 1024K) FFT lengths, but they do still crop up now and again.


Assuming your self-tests ran successfully, reserve a range of exponents from the GIMPS PrimeNet server.

By far the easiest way to do this − and also to submit results as they become available − is to use the Python script named primenet.py for automated PrimeNet assignments management. This is to be found in the top-level directory of the unzipped Mlucas source-archive, next to the makemake.sh script. The first thing to do is to invoke the help menu to see the various options and their abbreviated names (e.g. '--username' can be abbreviated as '-u'):

python primenet.py -h
Next, register your computer under your Primenet username with the server, now using the abbreviated forms of various double-minus-prefixed options listed in the above help output − note that <> and [] mean required and optional arguments, respectively:
python primenet.py [-d] -r -u <uid> [-T <worktype>] [-H <computer name>] [--frequency <CPU freq (MHz)>] [-m <total mem (MiB)>]
The '-d' flag enables helpful debug diagnostics, and '-r' is short for '--register'. The registration step creates a local.ini file; since the PrimeNet server will preclude the system from getting some worktypes if the user sets too low a value for the CPU frequency or total memory, I suggest users manually edit the local.ini file after registration to fix any incorrect values for their machine. E.g. on my Intel Core i3 NUC mini, I set 'frequency = 2000' and 'memory = 8192' to reflect the 2 GHz CPU frequency and the 8 GB of RAM.
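Such an edit can also be scripted. The sketch below uses Python's configparser on an inlined sample; the 'frequency' and 'memory' keys come from the text above, but the [primenet] section name is an assumption − open your own registration-created local.ini and use whatever section header your script version actually wrote:

```python
# Hypothetical sketch: bump the frequency/memory values in a
# registration-created local.ini. The [primenet] section name is an
# assumption; the key names match the example in the main text.
import configparser, io

sample = """[primenet]
frequency = 1000
memory = 1024
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)          # in practice: cfg.read('local.ini')
cfg['primenet']['frequency'] = '2000'   # 2 GHz CPU
cfg['primenet']['memory'] = '8192'      # 8 GB RAM

buf = io.StringIO()              # in practice: open('local.ini', 'w')
cfg.write(buf)
print(buf.getvalue())
```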

p-1 Assignments: Note that although Mlucas v20.1.1 supports p-1 testing in the form of either Pfactor or Pminus1 assignments in the worktodo.ini file, in terms of the primenet.py script p-1 is only supported implicitly, i.e. as a Pminus1 assignment split from an LL|PRP assignment needing p-1 done: Mlucas does this splitting automatically when it encounters such an LL|PRP assignment with p-1 remaining to be done in the workfile. Users who want to do standalone p-1 testing must either fetch such assignments from the server via the Manual Tests form and paste them into the worktodo.ini file, or use the Dulcet/Connelly primenet script linked above, which supports standalone p-1 assignment fetching.

After you create however many run-subdirectories you want to run jobs from (say, one per 2 or 4 physical CPU cores of your system, as in the above 'Performance-tune for your machine' section), you need to copy the mlucas.cfg file resulting from the post-build self-tests into each, and also place a copy of primenet.py and the registration-created local.ini file into each rundir − the Performance-tune section already described this, except for the local.ini copies. Then, assuming you created a job-management jobs.sh bash-script as described, one level above your various run directories, 'bash jobs.sh' will launch all the needed Mlucas and primenet.py instances. In the primenet.py invocations in my jobs.sh examples, -d enables some useful debug diagnostics and -T 150 reflects my preferred worktype. The 'nohup' prefix for the Mlucas and primenet.py invocations is short for "no hangup", and means the process can be safely invoked on a remote machine and continue after the user logs out of said machine. It also causes all terminal output by the process in question to be diverted to the 'nohup.out' file in the run directory, which is useful for tracking down problems.

As of this writing, the available worktype arguments for the -T flag are as follows:

Code  Mnemonic          Description
101   DoubleCheck       Double-checking LL tests
150   SmallestAvailPRP  First-time PRP tests (Gerbicz)
151   DoubleCheckPRP    Double-check PRP tests (Gerbicz)
152   WorldRecordPRP    World-record-sized numbers to PRP test (Gerbicz)
153   100MdigitPRP      100M-digit numbers to PRP test (Gerbicz)

Note that as of 8 Apr 2021, the server no longer hands out first-time LL assignments; any requests for such will be converted to LL double-checking assignments. The change is due to the fact that probable-prime (PRP) testing, which supports both the Gerbicz error-checking mechanism and a quickly checkable single-run certificate of correctness (support for the latter will appear in an Mlucas release later this year), has proven itself superior to LL-testing in terms of generating reliable results. Any M(p) flagged as probable primes via PRP test will still be verified using multiple LL tests and various independent software clients, since PRP tests are, as the name implies, only "this number is very likely to be prime" in nature when they yield such a result. (They are, on the other hand, rigorous proofs of compositeness for composite M(p).)

If the -T argument is omitted, the script will use the default worktype, which is DoubleCheck (numeric value 101). You must be connected to the internet when you launch the script; once it has done its initial work-fetching, you can be offline most of the time. The script will simply periodically check whether there are any new results in the run directory in which it was launched; if yes *and* it is able to connect to the PrimeNet server, it will submit the new results (usually just one, unless you are offline nearly all the time) and fetch new work; otherwise it will sleep and retry later. The default is to check for 'results to submit/work to get?' every 6 hours; you may override this via the -t option, followed by your desired time interval in seconds. '-t 0' means run a single-shot get-work-to-do and quit, if for some reason you prefer to periodically run the script manually yourself.

If the script runs successfully you should see a worktodo.ini file (if none existed already, the script creates it; otherwise it appends new work to the existing version of the file) with at least 2 LL|PRP-test assignments in it. The script will also periodically check the results.txt file in each run-subdirectory in which it is invoked. Whenever one or more new results are found and a connection to the internet is active during one of these periodic checks, the result is automatically submitted to the PrimeNet server, and the worktodo.ini file in the run directory 'topped up' to make sure it has at least 2 valid entries, the first of which will correspond to the currently ongoing job. Thus, the first time you use it, you just need to run the py-script in each local run directory to grab work to be done, then invoke the Mlucas binary to start the Lucas-Lehmer testing.

Offline Testing:

Users who wish to eschew the automated script can continue to use the PrimeNet manual testing webforms at mersenne.org as described further down on this page, but folks running multiple copies of the program will find the .py-script greatly simplifies things. See the Get exponents from PrimeNet section for the simple instructions. Here's the procedure (for less-experienced users, I suggest toggling between the PrimeNet links and my explanatory comments):

The various PrimeNet work assignments supported by Mlucas are of several different forms (master reference available here). [aid,] refers to a 32-hexit assignment ID, with the [] indicating that it is optional − it may be either omitted or have "n/a," (case-insensitive) put in its place. A 32-hexit string of zeros can also be used, though this will not be written to any JSON-formatted results.txt file entry, since the PrimeNet v5 API rejects an all-zeros key with an ERROR_INVALID_ASSIGNMENT_KEY. For the JSON result formatting, anything but a 32-hexit-and-nonzero value will be ignored. Here are the assignment types supported by Mlucas:

Examples of typical assignments returned by the server follow:

M48295213 has not been previously LL-tested (otherwise the assignment would begin with "DoubleCheck=" instead of "Test="). The long hexadecimal string is a unique assignment ID generated by the PrimeNet v5 server as an anti-poaching measure. The ",69" indicates that M48295213 has been trial-factored to depth 2^69 with no factors found. The 0 following the 69 indicates that p-1 factoring still needs to be done, so Mlucas splits the original assignment into a pair, the Pminus1= member of which specifies p-1 factoring to stage 1 and 2 bounds 400000 and 12000000, respectively. If the p-1 run fails to turn up a factor, the program proceeds to the LL test; should p-1 find a factor, both assignments are deleted.
M22831811 has already had a first-time LL test performed, been trial-factored to a depth of 2^66, and has had p-1 factoring attempted with no small factors found, so perform a second LL test of M22831811 in order to validate the result of the initial test. (Or refute it − in the case of mismatching residues between the first-time test and the double-check, a triple-check assignment would be generated by the server, whose format would however still read "DoubleCheck=".)



This is a probable-prime-test (PRP=) assignment, meaning the Gerbicz check will be done at regular intervals throughout. The 4 integers following the hexadecimal assignment ID define the number to be tested as 1*2^98024161-1, i.e. the Mersenne number M98024161. This number has not been previously tested (otherwise the assignment would have 2 additional trailing arguments). The ",77" indicates that M98024161 has been trial-factored to depth 2^77, and the trailing 2 indicates that p-1 factoring has not been tried, i.e. p-1 would save 2 full-length primality tests (a first-time test and a double-check) were it to find a small factor. Mlucas splits the original assignment into a pair, the Pminus1= member of which specifies p-1 factoring to stage 1 and 2 bounds 800000 and 29000000, respectively. If the p-1 run fails to turn up a factor, the program proceeds to the PRP test; should p-1 find a factor, both assignments are deleted.

This is a PRP-double-check assignment, or PRP-DC for short. M80340397 has been trial-factored to depth 2^75, has had sufficient p-1 testing done (the 0 following the 75), and has already had a first-time PRP test performed, using base 3 and returning residue type 1.

If you are using the PrimeNet manual testing pages rather than the primenet.py script, copy the Test=... or DoubleCheck=... lines returned by the server into the worktodo.ini file, which must be in the same directory as the Mlucas executable (or contain a symbolic link to it) and the mlucas.cfg file. If this file does not yet exist, create it. If this file already has some existing entries, append any new ones below them.

Note that Mlucas makes no distinction between first-time LL tests and double-checks − this distinction is only important to the PrimeNet server.

Most exponents handed out by the PrimeNet server have already been trial-factored to the recommended depth (i.e. will be of the 'Test' or 'DoubleCheck' assignment type), so in most cases, no additional trial-factoring effort is necessary. If you have exponents that require additional trial factoring, you'll want to either return those assignments or, if you have a fast GPU installed on your system, download the appropriate GPU client from the GPU72 project to do the trial factoring.

Lastly, if you wish to Lucas-Lehmer test some non-server-assigned prime exponent, you can simply enter the raw exponent on a line by itself in the worktodo.ini file. To instead run a PRP test, with its superior hardware-error detection capability, create an assignment of the form

PRP=00000000000000000000000000000000,1,2,[your exponent here],-1,75,0

Then save the file and invoke Mlucas with your preferred -cpu arguments.
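A few lines of Python can stamp out such entries − a hypothetical convenience helper following the template above (the trailing ,75,0 trial-factoring/p-1 fields are copied verbatim from that template; adjust them if your exponent's factoring status differs):

```python
# Hypothetical helper: build a manual PRP-test worktodo.ini entry of the
# form shown above, with the 32-hexit all-zeros assignment ID. The
# trailing 75,0 fields are copied verbatim from the template in the text.
def prp_entry(exponent):
    return "PRP={},1,2,{},-1,75,0".format("0" * 32, exponent)

print(prp_entry(86748829))
```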

The program will run silently in background, leaving you free to do other things or to log out (If running on a remote machine, make sure to prefix the './Mlucas' with 'nohup'). Every 10000 iterations (or every 100000 if > 4 threads are used), the program writes a timing to the "p{exponent}.stat" file (which is automatically created for each exponent), and writes the current residue and all other data it needs to pick up at this point (in case of a crash or powerdown) to a pair of restart files, named "p{exponent}" and "q{exponent}." (The second is a backup, in the rare event the first is corrupt.) When the exponent finishes, the program writes the least significant 64 bits of the final residue (in hexadecimal form, just like Prime95) to the .stat and results.txt (master output) file. Any round-off or FFT convolution error warnings are written as they are detected both to the status and to the output file, thus preserving a record of them when the Lucas-Lehmer test of the current exponent is completed.

p-1 savefiles: During stage 1, the code uses the same redundant p[exp], q[exp] savefile pair as for LL/PRP tests, where [exp] stands for the exponent p of the Mersenne number M(p) being tested. On completion of stage 1, the p[exp] file is renamed p[exp].s1; this can later be used to run stage 2 continuations to deeper-than-default bounds, if the user wishes. (This is "advanced usage", not recommended except for power users who know what they are doing; note that a deeper stage 1 requires a rerun from scratch.) Stage 2 uses a single p[exp].s2 savefile; both s1 and s2 savefiles are roughly [exp] bits in size. During p-1, the logfile (p[exp].stat) output will also look very similar to that for LL/PRP tests, except that each checkpoint summary will include either 'S1' or 'S2', e.g.

[2021-07-28 13:22:38] M110763649 S2 at q = 3270330 [ 1.99% complete] clocks = 00:05:34.349 [ 33.3849 msec/iter] Res64: 99F98F04EFD4E49A. AvgMaxErr = 0.124210464. MaxErr = 0.187500000.
For both p-1 stages, 'iter' refers to FFT-based mod-M(p) multiplies (more precisely, "mod-squaring equivalents" in requiring precisely one forward and one inverse FFT); in stage 2 each such multiply processes between 1 and 2 stage 2 primes, the precise value depending on the number of memory buffers allocated.

For LL and PRP tests, the program also saves a persistent p-savefile every 10M iterations, with extensions .10M, .20M, ..., reflecting which iteration the file contains restart data for. This allows for a partial rerun − even parallel reruns of the 10M-iteration subintervals, if desired − in case the final result proves suspect.
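The naming scheme just described can be sketched as follows − a hypothetical helper (not part of Mlucas) listing which persistent-savefile names should exist after a given number of completed iterations:

```python
# Hypothetical sketch of the persistent-savefile naming scheme: one
# p-savefile copy per 10M iterations, with extension reflecting the
# iteration count whose restart data the file contains.
def persistent_savefiles(exponent, iters_done):
    return ["p{}.{}M".format(exponent, 10 * (i + 1))
            for i in range(iters_done // 10_000_000)]

print(persistent_savefiles(86748829, 25_000_000))
```

E.g. an LL test of M86748829 that has completed 25M iterations should have left behind the .10M and .20M copies.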

ADDING NEW EXPONENTS TO THE WORKTODO.INI FILE: You may add or modify ALL BUT THE FIRST ENTRY (i.e. the current assignment, which may span 2 lines in the case of a Pminus1 assignment split off from an LL|PRP one) in the worktodo.ini file while the program is running.


For users who prefer not to use the automated Python assignments-management script: to report results (either after finishing a range, or as they come in), log in to your PrimeNet account and then proceed to the Manual Test Results Check In page. Paste the results you wish to report − that is, one or more lines of the results.txt file (any results added since your last checkin) − into the large entry window on that page. Note that as of v19, the results-line format has changed to so-called JSON (JavaScript Object Notation) format, so here are sample results-entry formats, depending on version:

v18 and earlier:
M86748829 is not prime. Res64: F28B3E531E99C315. Program: E18.0. Final residue shift count = 14725081

v19 writes a similar human-readable line as above to the p[exponent].stat file, but the output to the results.txt file is in JSON format, e.g.:
{"status":"C", "exponent":81253819, "worktype":"PRP-3", "res64":"CE9AB357C704369B", "residue-type":1, "fft-length":4718592, "shift-count":66554884, "error-code":"00000000", "program":{"name":"Mlucas", "version":"19.0"}, "timestamp":"2019-12-01 03:42:39 UTC", "aid":"0CF485CD87F1BAC48B6B05D3A2094579"}
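Since the v19+ results line is plain JSON, it can be machine-parsed before a manual submission; a minimal Python sketch using the sample line above:

```python
# Minimal sketch: parse a v19+ JSON results line and pull out the fields
# one typically eyeballs before a manual submission. The sample line is
# the one shown in the main text; status "C" denotes a composite result.
import json

line = '{"status":"C", "exponent":81253819, "worktype":"PRP-3", "res64":"CE9AB357C704369B", "residue-type":1, "fft-length":4718592, "shift-count":66554884, "error-code":"00000000", "program":{"name":"Mlucas", "version":"19.0"}, "timestamp":"2019-12-01 03:42:39 UTC", "aid":"0CF485CD87F1BAC48B6B05D3A2094579"}'
r = json.loads(line)
print(r["exponent"], r["status"], r["res64"])
```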

If for some reason you need more time than the 180-day default to complete a particular assignment, go to the Manual Test Time Extension page, check the about-to-expire exponents, then go to the bottom of the page and click on "Extend checked exponents 60 days".

Savefile format and creation

Mlucas creates savefiles (a.k.a. "checkpoint" files) at regular intervals to permit safe restart in case a run is interrupted for any reason, such as the host machine going down or a code crash.

o Bytewise storage: residues from LL or PRP tests or p-1 factoring work are stored in minimum-size and endian-independent form.

o LL, PRP-test and p-1 stage 1 residues are stored in redundant savefile pairs. For work on the Mersenne number M[exponent] = 2^[exponent] − 1, these files are named p[exponent] and q[exponent]. At the conclusion of a p-1 factoring run, the primary p[exponent] savefile is renamed p[exponent].s1, which permits later stage 2 continuations to deeper-than-default bounds, as described in the p-1 savefiles notes above.


You can track your overall progress (for both automated and manual testing work) at the PrimeNet server's producer page. Note that this does not include pre-v5-server manual test results. (That includes most of my GIMPS work, in case you were feeling personally slighted ;).


Happy hunting − perhaps you will be the person to discover the next Mersenne prime!

GNU Free Documentation License

Copyright (C) 2021 Ernst W. Mayer. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

README.html − Last Revised: 04 Apr 2022
Send mail to ewmayer@aol.com