2019-07-15, 15:05 #1 kriesel "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 32·719 Posts
new participant reference
This is intended as a reference thread. Do not post here. Post comments at https://www.mersenneforum.org/showthread.php?t=23383 instead. Posts placed here may be moved or deleted without warning or recourse.

This post
New participant guidance https://www.mersenneforum.org/showpo...64&postcount=2
Background https://www.mersenneforum.org/showpo...65&postcount=3
How much work is it to do x https://www.mersenneforum.org/showpo...45&postcount=4
GIMPS glossary https://www.mersenneforum.org/showpost.php?p=533167&postcount=5
Older reference thread https://www.mersenneforum.org/showpost.php?p=533285&postcount=6
Best practices https://www.mersenneforum.org/showpo...18&postcount=7
OS fundamentals for GIMPS GPU application use https://www.mersenneforum.org/showpo...19&postcount=8
Why no one should run LL if they can run PRP with proof generation instead https://www.mersenneforum.org/showpo...06&postcount=9
Nick's Basic Number Theory Series https://www.mersenneforum.org/showpo...5&postcount=10
tbd if any

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2022-02-17 at 12:58 Reason: added Nick's Basic Number Theory Series links post

How much work is it to do x

Effort can be computed at https://www.mersenne.ca/credit.php for TF, P-1 factoring, or LL testing. Effort is expressed in GHz-days: the amount of work one core of a Core 2 CPU running at 1 GHz could do in a day. Estimated performance of a given GPU is available at https://www.mersenne.ca/mfaktc.php for TF and https://www.mersenne.ca/cudalucas.php for other work types. Gpuowl performance with a recent 6.11-x or 7.x version is considerably better than indicated there.

Trial factoring (TF) a Mersenne number with exponent 100M from bit level 73 to bit level 76 is 133.9 GHz-days.
Doubling the exponent roughly halves the effort for the same bit levels. Each bit level takes twice as much effort as the one before it.
Note that a TF GHz-day is not comparable to a GHz-day of other computation types, since GPUs are MUCH faster at TF. A GPU's TF throughput in GHz-days/day can be 11 times its throughput on other work types, ranging up to 40 times or more depending on GPU model and computation parameters.
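The two rules of thumb above can be turned into a rough scaling sketch anchored on the single 133.9 GHz-day figure. It assumes effort is proportional to 1/p and to 2^bit_to - 2^bit_from; the function name and reference constants are mine, and mersenne.ca's calculator remains the authoritative number.

```python
# Reference point from the text: p = 100M, bit level 73 -> 76 costs 133.9 GHz-days.
REF_P = 100_000_000
REF_FROM, REF_TO = 73, 76
REF_GHZ_DAYS = 133.9


def tf_ghz_days(p, bit_from, bit_to):
    """Rough TF effort in GHz-days, scaled from the reference point (assumed model).

    Effort is taken as proportional to 1/p (doubling p halves the work) and to
    2**bit_to - 2**bit_from (each bit level costs twice the one before it).
    """
    if bit_to <= bit_from:
        raise ValueError("bit_to must exceed bit_from")
    bit_span = (2 ** bit_to - 2 ** bit_from) / (2 ** REF_TO - 2 ** REF_FROM)
    return REF_GHZ_DAYS * (REF_P / p) * bit_span


for p, lo, hi in [(100_000_000, 73, 76), (200_000_000, 73, 76), (100_000_000, 76, 77)]:
    print(f"p={p:,} {lo}->{hi}: ~{tf_ghz_days(p, lo, hi):.1f} GHz-days")
```

Under this model, 73 to 76 splits 1:2:4 across its three bit levels, so going one more level (76 to 77) costs 8/7 of the whole 73 to 76 run.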

P-1 factoring a Mersenne number with exponent ~100M to PrimeNet bounds B1=1040000, B2=28080000 is 13.90 GHz-days. This scales similarly to PRP or LL testing, ~p^2.1.

The GCD phase of a P-1 or P+1 run takes O(p (log p)^2 log log p) time, and is strongly dependent on CPU core speed, since known GIMPS implementations use single-threaded gmplib for the GCD. For p~110M on a Xeon Phi 7210, GCD time is ~5.7 minutes. Over the range of p from that relevant to DC up to 1G, run time scales as ~p^1.14. In most applications the GCD runs sequentially, stalling the other CPU cores of a worker, or a GPU, for its duration; in some versions of Gpuowl it runs in parallel with the next P-1 stage, or with the next assignment if a valid one exists in the worktodo file.
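A quick way to extrapolate GCD time from the one measured point is the empirical ~p^1.14 rule; for comparison the sketch also evaluates the shape of the O(p (log p)^2 log log p) bound, where only ratios are meaningful because constants are omitted. The 5.7-minute figure is specific to a Xeon Phi 7210, so other CPUs will differ; the function names are hypothetical.

```python
import math

REF_P = 110_000_000   # exponent of the measurement above
REF_MINUTES = 5.7     # observed GCD time on a Xeon Phi 7210 (one CPU-specific data point)
EMPIRICAL_EXP = 1.14  # observed scaling from the DC range up to ~1G


def gcd_minutes(p):
    """Extrapolate GCD time from the single measured point using ~p^1.14."""
    return REF_MINUTES * (p / REF_P) ** EMPIRICAL_EXP


def asymptotic_ratio(p, ref_p=REF_P):
    """Ratio of p (log p)^2 log log p at p versus ref_p; constants cancel."""
    def f(x):
        return x * math.log(x) ** 2 * math.log(math.log(x))
    return f(p) / f(ref_p)


for p in (60_000_000, 220_000_000, 1_000_000_000):
    print(f"p={p:>13,}: ~{gcd_minutes(p):5.1f} min empirical, "
          f"~{REF_MINUTES * asymptotic_ratio(p):5.1f} min from the O() shape")
```

The two estimates stay within roughly 10% of each other up to 1G, which is one way to see why a single empirical exponent works over this range.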

Server confirmation of a reported factor for TF or P-1 is a trivially fast computation.

LL testing a Mersenne number with exponent ~100M is 381.39 GHz-days. For ~110M it is ~482 GHz-days, or about a day on a Radeon VII GPU with a relatively recent version of gpuowl. (But do PRP with GEC and proof generation instead, for greater reliability and efficiency.)
Effort per iteration scales as p log p log log p; with ~p iterations per test, that is about p^2.1 per test.
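The per-test scaling can be checked numerically with a smooth model: about p iterations, each costing p log p log log p, anchored on the 381.39 GHz-day figure. This is a sketch under that assumption; it gives ~465 GHz-days at 110M rather than the ~482 quoted, presumably because real run time and credit depend on the discrete FFT length chosen, which the model ignores. Function names are mine.

```python
import math

REF_P = 100_000_000      # exponent of the 381.39 GHz-day LL figure above
REF_GHZ_DAYS = 381.39


def relative_test_cost(p):
    """~p iterations, each costing ~p log p log log p (constants omitted)."""
    return p * p * math.log(p) * math.log(math.log(p))


def scaled_ghz_days(p):
    """Scale the reference LL/PRP cost to exponent p with the smooth model."""
    return REF_GHZ_DAYS * relative_test_cost(p) / relative_test_cost(REF_P)


def effective_exponent(p1, p2):
    """Exponent k such that cost(p2) / cost(p1) == (p2 / p1) ** k."""
    return math.log(relative_test_cost(p2) / relative_test_cost(p1)) / math.log(p2 / p1)


for p in (110_000_000, 200_000_000, 1_000_000_000):
    print(f"p={p:>13,}: ~{scaled_ghz_days(p):,.0f} GHz-days "
          f"(p^2.1 rule: ~{REF_GHZ_DAYS * (p / REF_P) ** 2.1:,.0f})")
print(f"effective exponent 100M->1G: {effective_exponent(REF_P, 1_000_000_000):.3f}")
```

Over 100M to 1G the model's effective exponent comes out close to 2.07, consistent with the "about p^2.1" rule.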

LL double checking ("LLDC") and the occasional triple check, quadruple check, etc. each take the same effort per attempt as a first test of the same exponent. Therefore, first testing using LL should cease as soon as possible. Using PRP with proof generation instead is more than twice as efficient, given LL's higher real-world error rate, its very high verification cost, and the extreme delay before verification occurs. (Eight years is not unusual.)

A PRP test of a Mersenne number is basically the same effort as an LL test: about a day for ~110M in gpuowl on a Radeon VII, or 11 weeks or more on a Core 2 Duo.

Gerbicz error check (GEC) overhead, as a fraction of a PRP test, is inversely proportional to block size: typically ~0.2% of a PRP test at block size 1000. Overhead × block size ≈ constant.
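A toy illustration of how the Gerbicz check works, assuming Python big integers in place of the FFT arithmetic real programs use: a base-3 PRP test of M_p = 2^p - 1 that multiplies every block-th residue into a running product d and, every block² iterations, verifies d == 3 · d_prev^(2^block). The function names, the `corrupt_at` fault-injection parameter, and the 2/B accounting in `gec_overhead` (one extra multiply per block plus B squarings per check) are my assumptions; that accounting happens to reproduce the ~0.2% at block size 1000 quoted above.

```python
def gec_overhead(block_size):
    """Assumed GEC cost as a fraction of the PRP squarings: about 2/B."""
    return 2 / block_size


def prp_with_gec(p, block=4, base=3, corrupt_at=None):
    """Toy Fermat PRP test of M_p = 2^p - 1 with a Gerbicz error check.

    corrupt_at: optional iteration at which to inject a fault, to show the
    check catching it. Returns (probable_prime, checks_passed).
    """
    m = (1 << p) - 1
    x = base % m      # after i squarings, x = base^(2^i) mod m
    d = x             # running product: base * x_B * x_2B * ...
    checks = 0
    for i in range(1, p + 1):
        x = x * x % m
        if i == corrupt_at:
            x = (x + 1) % m                     # simulated hardware error
        if i % block == 0:
            prev_d = d
            d = d * x % m
            if i % (block * block) == 0:
                # Gerbicz check: d must equal base * prev_d^(2^block) mod m
                t = prev_d
                for _ in range(block):
                    t = t * t % m
                if base * t % m != d:
                    raise RuntimeError(f"Gerbicz check failed at iteration {i}")
                checks += 1
    # If M_p is prime, base^(M_p - 1) = 1, i.e. base^(2^p) = base^2 (mod M_p).
    return x == base * base % m, checks


for p in (11, 13, 61):
    print(p, prp_with_gec(p, block=2 if p < 20 else 4))
```

A real implementation would roll back to the last verified checkpoint instead of raising, and would use a block size near 1000, where the 2/B overhead is small.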

Jacobi symbol check overhead, as a fraction of an LL test, depends on how often the check is run; it is typically ~0.3% of an LL test.
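A toy sketch of an LL test with a periodic Jacobi check, again with Python integers standing in for FFT arithmetic. It assumes the check is that (s - 2 | M_p) = -1 for every residue after the first iteration with seed 4. That invariant follows from s_n - 2 = (s_{n-1} - 2)(s_{n-1} + 2), where s_{n-1} + 2 = s_{n-2}^2 is a square, together with (12 | M_p) = -1. I believe this is the check GIMPS programs use, but `lucas_lehmer` and its `check_every` parameter are hypothetical. A random error flips the symbol only about half the time, which is one reason this check is weaker than GEC.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the standard reciprocity algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # remove factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_lehmer(p, check_every=1):
    """Toy LL test of M_p (odd prime p) with a periodic Jacobi check on s - 2."""
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):      # p - 2 iterations: s_1 .. s_{p-2}
        s = (s * s - 2) % m
        if i % check_every == 0 and jacobi(s - 2, m) != -1:
            raise RuntimeError(f"Jacobi check failed at iteration {i}")
    return s == 0


for p in (11, 13, 23, 31, 61):
    print(p, lucas_lehmer(p))
```

In real use the check would run far less often than every iteration (the text's ~0.3% overhead implies this), with rollback to a saved residue when it fails.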

PRP DC (without proof and verification as below) is the same effort as a first PRP test for the same exponent. Upgrade to proof generation capability as soon as possible.

PRP proof generation and verification
Total effort, assuming a single verification on a system separate from the PRP tester/proof-generator system and server, is, for a 100M exponent, approximately:
Code:
A) power= 8,  3.2 GB temporary disk space needed, proof file size 113MB, 413K squarings = 0.41% of a full DC, default
B) power= 9,  6.4 GB temporary disk space needed, proof file size 125MB, 239K squarings = 0.24% of a full DC
C) power=10, 12.8 GB temporary disk space needed, proof file size 138MB, 182K squarings = 0.18% of a full DC.
Proof generation as a fraction of a PRP, for a 100M exponent:
Code:
A) power= 8,  3.2 GB temporary disk space needed, proof file size 113MB, computation ~0.02% of a full DC, default
B) power= 9,  6.4 GB temporary disk space needed, proof file size 125MB, computation ~0.04% of a full DC
C) power=10, 12.8 GB temporary disk space needed, proof file size 138MB, computation ~0.08% of a full DC.
In practice it takes somewhat longer: gpuowl proof generation at power 8 takes about 0.07% of elapsed time, which includes SHA3 hashes, disk reads, and miscellaneous other small activities. Temporary space increases roughly in proportion to exponent, so power 10 at 1G would need around 130 GB per working instance!
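The temporary-space and proof-size columns in the tables above can be reproduced by a simple model. This is my assumption about where the numbers come from, not something stated by the program authors: each residue is p bits (p/8 bytes), proof generation keeps 2^power interim residues on disk, and the proof file holds power + 1 residues. The function name is hypothetical.

```python
def proof_storage(p, power):
    """Estimate (temporary disk bytes, proof file bytes) for exponent p.

    Assumed model: a residue is p bits; 2**power residues are kept on disk
    during the PRP; the proof file holds power + 1 residues.
    """
    residue_bytes = p / 8
    temp_disk = (2 ** power) * residue_bytes
    proof_file = (power + 1) * residue_bytes
    return temp_disk, proof_file


for p, power in [(100_000_000, 8), (100_000_000, 9), (100_000_000, 10), (1_000_000_000, 10)]:
    temp, proof = proof_storage(p, power)
    print(f"p={p:>13,} power={power:2}: temp {temp / 1e9:6.1f} GB, proof {proof / 1e6:7.1f} MB")
```

For p = 100M this gives 3.2, 6.4, and 12.8 GB and 112.5, 125, and 137.5 MB, matching the tables after rounding; for p = 1G at power 10 it gives 128 GB of temporary space.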

Prime95 reserves the disk space required for proof generation at the beginning of a test, holds it for the duration, and releases it on completion. "As exponents increase, squarings, disk space, and proof size increase roughly linearly." https://www.mersenneforum.org/showpo...1&postcount=75

For Gpuowl at proof power 9, peak system RAM during proof generation was observed in Task Manager as ~0.25 GB. Proof generation took only about a minute at the end of a PRP computation for p~104M, occupying one CPU core. RAM use increased as it built level 1 and then successively higher levels of the proof, peaking at ~0.25 GB during the level 9 build step.

Server computation related to a PRP proof is a small fraction of the total verification effort: 1414 squarings, ~14 ppm of a PRP test, for p~100M at power 8; 1577 squarings, ~16 ppm, at power 9. It's unclear how that varies with exponent. https://www.mersenneforum.org/showpo...&postcount=189
Note that the server CPU is SSE2 hardware and its code is based on gwnum routines, so it can automatically handle exponents only up to ~595.8M. Higher exponents require manual intervention by George.

PRP Proof Verification as a fraction of a PRP or PRPDC, for a hypothetical 100M exponent:
Code:
A) power= 8, proof file size 113MB, topk = ceiling(p/2^8)*2^8   = 100000000; topk/2^8  = 390625 squarings = 0.39% of a full DC
B) power= 9, proof file size 125MB, topk = ceiling(p/2^9)*2^9   = 100000256; topk/2^9  = 195313 squarings ~ 0.195% of a full DC
C) power=10, proof file size 138MB, topk = ceiling(p/2^10)*2^10 = 100000768; topk/2^10 =  97657 squarings = 0.098% of a full DC
Power 7 would be 0.78% https://www.mersenneforum.org/showpo...5&postcount=46
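A small sketch of the verification-cost formula in the table, using integer ceiling division to avoid floating-point rounding at large p; the function name is mine. It reproduces the 390,625 / 195,313 / 97,657 squarings above and the 0.78% figure for power 7.

```python
def verify_squarings(p, power):
    """Squarings a verifier performs: topk / 2**power, topk = ceil(p / 2**power) * 2**power."""
    step = 2 ** power
    topk = -(-p // step) * step    # p rounded up to a multiple of 2**power
    return topk // step


for power in (7, 8, 9, 10):
    n = verify_squarings(100_000_000, power)
    print(f"power={power:2}: {n:,} squarings = {100 * n / 100_000_000:.3f}% of a full DC")
```

Each step up in power roughly halves verification work, while (per the earlier tables) doubling temporary disk space and adding one residue to the proof file.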

Overall, LL vs. PRP compared:
LL + DC + occasional TC, QC, etc.: ~2.04 tests at ~100M exponent, ~2.5 tests at 100M digits, to get a matched pair of res64s, which are presumed to verify those two runs. (There are some bugs that cause erroneous residues that are far from random.)
PRP with GEC & proof generation & cert: ~1.01 test equivalent, to get a proven correct result.
PRP's error detection is far superior, and overall project efficiency is more than double that of LL (increasingly so at larger exponents).
That's why first-time LL assignments are no longer issued by the PrimeNet server.
LL reliability has historically declined as exponents increase: longer run times create more chances for computing errors that may escape detection.
That strengthens the case against LL, which inherently has inferior error detection and recovery, as exponent and run time increase.
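Under a simple assumed model (independent runs, each wrong with probability e, wrong residues never matching anything), runs continue until two correct ones exist, so the expected number of LL runs for a matching pair is the negative-binomial mean 2/(1 - e). The 2% rate is the typical LL error rate cited later in this thread; the model and function name are my assumptions, and the model ignores the non-random-residue bugs mentioned above.

```python
def expected_ll_tests(error_rate):
    """Expected LL runs to obtain two matching residues (assumed model): 2 / (1 - e)."""
    if not 0 <= error_rate < 1:
        raise ValueError("error_rate must be in [0, 1)")
    return 2 / (1 - error_rate)


ll = expected_ll_tests(0.02)   # the "typical 2%" LL error rate
prp = 1.01                     # PRP + GEC + proof + cert, from the text
print(f"LL ~{ll:.2f} tests vs PRP ~{prp:.2f}: ratio ~{ll / prp:.2f}")
```

With e = 2% this gives about 2.04 runs, matching the figure above; the model would need a 20% per-run error rate to reach the 2.5 quoted for 100M-digit exponents. Against ~1.01 for PRP with proof, the ratio is just over 2.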

For first tests, run PRP with GEC and proof generation whenever possible. Run LL, with its weaker error detection and lower efficiency, only if PRP is not possible.
The preceding assumes implementations of equal efficiency on equal or equivalent hardware. If comparing recent gpuowl to CUDALucas or clLucas, add about another factor of 2 disadvantage for LL, and note that neither of them includes the Jacobi symbol check. Just don't LL!

Finding the next Mersenne prime, compared to finding the current largest: R D Silverman lays it out at https://www.mersenneforum.org/showpo...58&postcount=8 as approximately 8 times as much effort, based on conjectures about the expected distribution. GIMPS has had a very lucky run over the past several years, in which Mersenne primes have been more closely spaced than expected on average.

If the remaining number of Mersenne primes with exponent p < 10^9 fits conjectures, there are 6 left to find. A rough estimate of the time to complete the search of p < 10^9 is 150 years. If the discoveries are equally spaced in time, that's 25 years apart, far longer than GIMPS' previous experience of ~17 discoveries in 25 years, ~0.68 per year.

(Particular thanks go to Preda and Prime95 who helped me understand the proof and verification resource usage)

2019-12-20, 16:55 #6 kriesel
Older reference thread
There is a reference thread from 2003 by PrimeMonster at https://www.mersenneforum.org/showthread.php?t=1534
Last fiddled with by kriesel on 2019-12-20 at 16:58

2022-01-02, 14:32 #8 kriesel
OS fundamentals for GIMPS GPU application use
Modern OSes offer both GUIs and command-line interaction. Learn the fundamentals of both before attempting to run GIMPS GPU applications. GIMPS GPU applications are not GUI applications; they are console text applications, so they can be invoked from Windows batch files or Linux shell scripts. Launching them from the GUI side might work, but if it doesn't (which is almost certain on the first tries), you probably will not get a chance to see what went wrong. Run them from the command line, in a session that stays open long enough to show and allow capture of the output, including error messages from the program or OS.

A new user will not succeed in running GIMPS GPU applications without adequately understanding most of the following, for the OS in use:
How to launch a command-line session that remains open for review until the user intentionally closes it.
How to obtain help, in general or for one command or command option, from the command line.
Proper syntax of essential commands and some of their options.
How to obtain a directory listing.
How to display and modify file and directory ownership and permissions, and which are needed.
How to create, move, or delete a file or directory.
How to change the current directory.
How to edit an existing text file and save it under the same or a different filename.
How to launch a program from the command line.
How to redirect program input and output.
The difference between stdout and stderr.
How to download and install a program.
How and whether to obtain, install, and use a tee program.
The difference between append and overwrite.
How to find, obtain, and install third-party utilities.
How to obtain and install OS updates.
How to obtain and install graphics card, OpenCL, or CUDA drivers.
How to check that a graphics card, OpenCL, or CUDA driver is installed and functional.
Basic troubleshooting techniques.
How to create, use, and modify batch files on Windows or shell scripts on Linux.
How to check and document OS and version, application and version, program inputs and outputs, etc.
How to write a complete, accurate, useful, actionable, specific trouble report when first asking others for help; how to THOROUGHLY and PRECISELY yet concisely document EXACTLY what was attempted and what the resulting responses were.
The difference between environment strings, the command-line environment, the command prompt, and the various commands that can be used at the command prompt. (Even some so-called experts who advise others online in commercial blogs get some of those distinctions wrong.)
What all the preceding mean. (Are there more?)

Some possible resources for learning these things can be found online, such as
Linux: https://ubuntu.com/tutorials/command...ers#1-overview
Windows: https://docs.microsoft.com/en-us/win...ndows-commands
There are also "help" on Windows and "man" on Linux.
Last fiddled with by kriesel on 2022-01-02 at 19:20
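Shell redirection (`>` to overwrite, `>>` to append, `2>&1` to merge stderr into stdout) and a tee program are the usual way to capture a console program's output. As one portable alternative, here is a minimal Python sketch that launches a program, shows its output live, and saves stdout and stderr together to a log, appending or overwriting as chosen. The function name is hypothetical, and it is not a feature of any GIMPS application.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path, append=True):
    """Run a console program, echo its output live, and copy it to a log file.

    stderr is merged into stdout so error messages land in the log too.
    append=True adds to an existing log (like `>>`); False overwrites (like `>`).
    Returns the program's exit code.
    """
    mode = "a" if append else "w"
    with open(log_path, mode, encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # show it on screen...
            log.write(line)         # ...and keep a copy
            log.flush()             # so the log survives a crash or Ctrl+C
        return proc.wait()
```

Usage would look like `run_and_tee(["path/to/program", "arg1"], "run.log")`, where the program path and arguments are placeholders for whatever application and options you actually run. A nonzero return value is worth noting in any trouble report, along with the log.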

2022-02-10, 18:04 #9 kriesel
Why no one should run LL if they can run PRP with proof generation instead
Run time, reliability, efficiency.
All new first primality tests should be PRP type, with proof generation whenever possible. PRP/GEC/proof/Cert is much more reliable, and effectively twice as fast, since it essentially eliminates the typical 2% LL error rate that leads to occasional triple or further checks, and it reduces verification to typically under 1% of what an LL DC requires (>99% reduction!). If the choice is between Gpuowl PRP and CUDALucas LL, the savings are even more dramatic, nearly 4:1.
PRP with proof generation also gives a much faster indication of system reliability (typically hours) than LL does. (Typically years, 8.2 in that example, for an LL DC to occur; examples over 9 years also exist, including:
https://www.mersenne.org/report_expo...6440291&full=1
https://www.mersenne.org/report_expo...7890291&full=1
https://www.mersenne.org/report_expo...5182317&full=1
https://www.mersenne.org/report_expo...4926731&full=1
https://www.mersenne.org/report_expo...7082673&full=1
https://www.mersenne.org/report_expo...5916561&full=1
https://www.mersenne.org/report_expo...7224663&full=1
https://www.mersenne.org/report_expo...7357359&full=1
https://www.mersenne.org/report_expo...7458873&full=1
https://www.mersenne.org/report_expo...7939197&full=1
etc. Ten years: https://www.mersenne.org/report_expo...2197123&full=1)
Since the reliability advantage of PRP/GEC over LL also applies to double checking, most LL first tests will never have an LLDC run. They will instead be verified composite, more reliably and more quickly, by a first-time PRP/GEC/proof and cert. So in the usual >999,999 ppm case of a composite Mersenne number, an LL first primality test is a complete waste of time and computing resources.
(An LL DC of an LL first test is also a small-percentage waste of time, and is much less reliable than PRP/GEC, as are LL TC, LL QC, etc., compared to PRP/GEC/proof/cert at reasonably close to optimal proof powers.)
Mprime/prime95 (~v30.3 or higher) and GpuOwl (~v6.11-316 and later) are available in proof-capable forms. Mlucas proof file generation was planned for V20.x, but I think it slipped past v20.1 to V21, because the P-1 implementation in V20 took a lot of effort, followed by some bug finding and fixing fun. The current latest available Mlucas is V20.1.1 (2021-12-02 tarball, plus a separate 2022-02-08 patch).
(Lucas-Lehmer tests will still be used if/when PRP returns a probable-prime indication for a suspected new Mersenne prime discovery, to confirm it by multiple independent tests on separate hardware and software, by separate participants. The person reporting a PRP probable-prime indication from a first test would still be regarded as the discoverer.)
In occasional sampling ~2021-01-24 of over 130 primality testing assignments in 101-102M, 38% were LL and 62% PRP. That LL frequency was much too high.
As of 2021-01-27 1540 UTC, of 273 active primality testing assignments in 101-102M, 57 were LL, 21%; better.
For 101M-103M as of 2021-03-28 ~2010 UTC, 23 LL of 132, 17%.
On 2021-04-21 ~1500 UTC, for 102M-104M, the first 1000 contained only 41 LL, 4.1%.
On 2021-04-30 ~1800 UTC, for 102M-104M, the first 1000 contained only 11 LL, 1.1%.
Let's keep moving away from inefficient LL first tests, which are very slowly double-checked, toward complete conversion to the far more reliable and efficient PRP/GEC/proof/Cert, which also provides very rapid verification.
A check of "recently cleared" on 2021-07-19 showed, for primality tests of exponents >100M, 12 LL (mostly with elapsed times over a year) vs. 924 PRP, ~1.3% LL. Drive the LL percentage toward 0%, and PRP with proof generation toward 100%.
As of 2021-08-28, https://www.mersenne.org/assignments...chk=1&excert=1 shows only PRP in 1000 queued tests.
It's unclear how many will produce a proof file.
GPUs that can run either CUDALucas or Gpuowl should run PRP/GEC/proof primality tests on Gpuowl, not LL, and not CUDALucas. (That would include all Google Colab GPU models; the Google Drive free space is sufficient.)

Exceptional cases
If attempting primality testing above exponent 1G, there is currently nowhere to submit proof files or primality test results, and no software to certify them above 1.17G. Proof files should be preserved where they are generated and held until that changes. PRP DC with GEC is likely to work. LL is likely to take dozens of attempts, or paired attempts with frequent manual comparison of interim residues, to obtain matching final residues.
If running on hardware with very little storage space, run PRP with proof at a low proof power. (Mprime/prime95 and numerous versions of gpuowl can do that. Eventually Mlucas will also have proof generation capability.)
If running on GPUs or CPUs so old that PRP is not available for them, consider replacing them with newer, more energy-efficient hardware.
If you are part of the small select team confirming that a PRP-P result is in fact a new Mersenne prime discovery, then LL is necessary to prove primality, and should be performed on the fastest available reliable hardware, in differing software, by multiple individuals. For large exponents, comparing interim residues will likely be useful to indicate reliable progress or detect computation errors along the way.
Last fiddled with by kriesel on 2022-03-07 at 19:59 Reason: correction re status of Mlucas proof gen

