mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Blogorrhea > kriesel

Reply
 
Thread Tools
Old 2023-03-14, 15:38   #1
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·227 Posts
Default GP_LLT

GP_LLT, run time scaling and other testing

GP_LLT is a Windows-only GUI application offering TF and LL implementation only, for Mersenne exponents ranging 67 to 4294967291.
For background, see https://mersenneforum.org/showthread.php?t=28365. I suggest reading that entire thread first, then continuing here.

I recently installed GP_LLT v2.2 and its related software on an i5-1035G1 64 GiB dual-channel Windows 10 Home laptop and tried it out. The system was prepped with a new boot drive, an assortment of useful tools collected and stored on a USB stick, and run with networking disabled.
My focus was on the LL test capability. I did not test it for TF. See https://mersenneforum.org/showpost.p...6&postcount=35, for a report its TF is "more than a thousand times slower" than state of the art CPU TF (which is in turn, slower and less efficient than GPU-based TF).

Things I wish were different in GP_LLT https://mersenneforum.org/showpost.p...20&postcount=2
Observations https://mersenneforum.org/showpost.p...22&postcount=3
Conclusions https://mersenneforum.org/showpost.p...70&postcount=4


Top of reference tree: https://mersenneforum.org/showthread.php?t=24607

Last fiddled with by kriesel on 2023-03-15 at 23:17
kriesel is offline   Reply With Quote
Old 2023-03-14, 20:50   #2
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×17×227 Posts
Default Things I wish were different in GP_LLT

Things I wish were different in GP_LLT:
  1. Source code availability. In the absence of other thorough documentation, source code is very useful. It is also essential for any collaborative effort to improve performance or features.
  2. Plain text documentation availability.
  3. Exponent min and max are enforced on the fly as the user enters an exponent in the program's GUI.
    An exponent character string representing value < 67 will be replaced with 67; one > 4294967291 will be replaced by 4294967291, overwriting whatever the user has entered up to that moment.
    The practical effect is one must delete some unwanted characters present from the previous exponent value used, type a sufficiently large part of the intended exponent, without exceeding the maximum, delete remaining unwanted characters, then finish typing the intended exponent.
  4. Up and down arrows provided to increment or decrement exponent are not a productive interface for covering the very broad range 67 to 4294967291.
  5. There is no implementation of a worktodo input file in evidence. Nor command line parameters. Input must be interactive through the Windows GUI as above.
  6. The program blocks LL testing or benchmarking of exponents with known primality status, known LL results or known factors. (At least below 100M+ it does. A 100Mdigit exponent 332,192,831 with certified PRP result could be run.)
    This blocked 4 of my 7 preselected and pretested run time scaling exponents, which were the prime numbers just above 10^n for n=5 to 9 and minimal 100Mdigit and minimal 1Gdigit without known factor.
    It would be good if there was a GUI button allowing override of that blocking, for benchmarking purposes.
  7. There is no display in the GUI of the average time per iteration or time for the previous iteration, or total elapsed time of the computation for the exponent.
  8. Documentation is lacking. It ran on an AVX512 laptop, but there's no indication what its minimum requirements are. (AVX512? AVX2? AVX? SSE2? X86_64? I haven't the time currently to experiment with it on other systems to sleuth that out. It seems a simple matter for the author to remember after programming in assembly, or go look at the assembly source, but he was unwilling to provide the information, for some unknown reason.)
  9. There is no plain text work input file, log file or results file output. Any would be useful.
  10. There are interim save and journal files, in binary form.
    <exponent>.rdl consists of 8-byte pairs of res64 and timing. Timing is in resolution units of 100 nanoseconds, for the iteration.
    <exponent>.snp format is unclear.
    Providing clear file format documentation would be useful.
  11. The Proof viewer program will display residues and iteration timings even while the .rdl is shown to contain zero bytes in Explorer and confirmed empty by opening in FrHed (a free hex editor). Presumably it is drawing this data from the .snp file.
    For large exponents, the .rdl file where res64/iteration timing pairs get stored, remains empty for a very long time. (See also the paragraph at https://mersenneforum.org/showpost.p...7&postcount=39 in which the software author states pairs are accumulated and held in the .snp file, then written to the .rdl in blocks of 32Ki.)
    I think clicking suspend causes contents there, or writing a .snp file.
    But suspend and other manual actions throw off elapsed wall clock time as benchmark timing.
    A flush to disk every n minutes (n settable by the user) would be welcome.
  12. The .snp file once written is large enough to hold a single packed binary full residue. There is only one per exponent. Storing only a single copy at the last completed iteration is very dangerous. All progress can be lost due to various errors.
    Most GIMPS primality testing software automatically stores at least two separate save files, in case there's a write failure for any reason.
    One could write a batch script to make copies every n hours under different file types eg <exponent>.snp.bk1 etc. It's better if the software implements that, and recursive recovery attempts.
  13. GP_LLT uses all 8 hyperthreads on the i5-1035G1, and makes the GUI very unresponsive most of the elapsed time, while running large exponents. There are brief periods when the mouse cursor is responsive, perhaps while the .snp file is being updated between iterations or other events occur. This is quite different from most GIMPS applications, which either run at lower priority, or use other measures, to keep the impact on normal system use minimal and/or controllable by configuration available to the end user.
  14. It performs only TF factoring or LL, not PRP or P-1 factoring. (I did not test the TF.)
  15. It probably lacks the Jacobi check and other long known checks applicable to LL, used in prime95 and other software (roundoff error checking, sum of inputs not matching sum of outputs). Its author puts a lot of faith in ECC ram, which most of the GIMPS hardware fleet will lack, and mentions no other measures such as roundoff error checking, Jacobi symbol check, sum of inputs vs. sum of outputs, etc. GP_LLT does use a CRC32 check on file contents (.snp file contents I think).
  16. The absence of PRP means the best available error checking, GEC, necessarily is absent. Therefore the odds of correct results from its very long runs are poor. I would like to see GP_LLT add PRP, GEC, and proof generation, if/after the severe run time scaling issue is addressed.

Top of reference tree: https://mersenneforum.org/showthread.php?t=24607

Last fiddled with by kriesel on 2023-03-15 at 22:44
kriesel is offline   Reply With Quote
Old 2023-03-14, 21:08   #3
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

170468 Posts
Default Observations

Observations:
  1. At exponents below ~1M, run time is about linear with exponent, because it only implements one fixed 64K word fft, and that width is more than enough up to ~1.2M exponent. This makes it markedly slower at 100K exponent, even while using as many threads as possible, than a simple single-threaded perl PRP or LL test program, on the same hardware.
  2. Task manager indicates saturated use of one hyperthread at 1M exponent, and 8 threads at 10M or larger test exponents used. This is consistent with the likely number of 64k ffts being employed and an assumption that the 64k fft is itself single threaded.
  3. The program's accompanying database has some unexpected gaps. In one case it allowed LL test of a modest exponent (100019) with a known mere 21 bit factor, despite the code intended to block testing of exponents with known factors.
  4. The GP_LLT program does not compute all the iterations. It skips five at the beginning.
    Since its minimum exponent is 67, and the seed value four, it can preload the fixed precomputed 61 bit result of iteration 5 in a single 64 bit word and begin from there. (See also https://www.mersenneforum.org/showpo...41&postcount=3 from a few years ago.)
  5. The GP_LLT program seems inconsistent in iteration count. Its GUI seems to indicate it will compute p-1 iterations.
    It took a while for me to realize this was an error of interpretation; it displays the number of the current iteration, and includes the current iteration in the subtotal yet to be completed; the sum of those two is p-1. It stops at p-2, correctly. It avoids the first five as noted above.
  6. The proof viewer program seems to misrepresent iteration timings as xx,xxx.x msec, (milliseconds, 10-3 seconds), while the timing actually corresponds to xx.xxx.x microseconds, 10-6 second units. The SI abbreviation for milliseconds is msec; for microseconds a Greek mu is needed. See https://www.nist.gov/pml/owm/metric-si-prefixes When limited to ASCII character sets, the mu is often approximated as u, which does not conflict with any of the other SI prefixes listed there.
  7. Because of the known-status blocking implemented by GP_LLT, I ran GP_LLT iteration timing on the nearest allowed exponents instead, and scaled the run time according to exponent where needed. The difference between p-1, p-2, and p-7 is insignificant in the range p=105 and up. (None of the test exponents used were close enough to a multiple of 65536 for the minor exponent differences to affect the number of 64Ki fft superwords width in GP_LLT's operation.)
  8. The run time of GP_LLT is far longer than commonly used GIMPS software, on the same hardware. Orders of magnitude longer. I found no scenario in which GP_LLT ought be used for GIMPS goals.
    GP_LLT run time per primality test at exponents of current interest, p>100M, scales as O(p~3); depending on how many values are used or excluded from a graph fit, 2.88 to 2.97. Commonly used GIMPS applications scale as O(p~2.1), which is an approximation to p2 log p log log p.
    GP_LLT run time at modest exponents is hundreds of times longer than for prime95 on the same hardware, at best, (146.7-fold at 1M; 1765.-fold at 100M; 14643.-fold at 1G) and is thousands to tens of thousands of times longer at exponents of current and future prime-seeking interest. (Hundreds of thousands of times slower than gpuowl on a good GPU.)
    At OBD, individual GP_LLT iterations are estimated to take 6+ hours each on the i5-1035G1, corresponding to a primality testing time of 2.3+ MILLION years. A factor of 10, 100 or 1000 faster hardware is not sufficient to overcome that. A prudent user would use faster hardware with faster software also.
    It's even slower using all 8 hyperthreads, above ~150M exponent, than my simple primitive single threaded perl scripts (which are also too slow to be of any practical prime hunting use).
  9. Memory usage is about linear with exponent. No paging was observed.
    M99999847 469M ram occupied, ~4.918 times exponent size in packed binary.
    M1000000007 4632M ram occupied, ~4.857 times exponent size in packed binary.
    M3321928171 15660436K = 15293M ram occupied; ~4.827 times exponent size in packed binary, about equal to one packed binary and one at 17.5 bits/64bit word fft layout.
    This means p ~ 2^32 would require more than ~20 GiB of installed ram to avoid thrashing. The test system has 64 GiB installed. Ram is not limiting; run time is.
  10. On very short runs, e.g. 100019 (~27 minutes on i5-1035G1), it produced the correct final residue. Runs of days, weeks, or months are common in the more efficient software. GP_LLT reliability on longer runs is undetermined.
  11. ECC ram is not sufficient to forestall some run time errors. See for example, at https://mersenneforum.org/showpost.p...&postcount=864, the reproducible difficulties (on two different ECC ram equipped systems) with M333043493 in prime95 v30.8b15 at 18M fft length, finally resolved by forcing 20M fft length.
  12. .SNP files occupy at least p/8 bytes plus some small additional amount, as little as 200 bytes, increasing I think with number of iterations that have been run.
    There is apparently no provision for backup whole residues implemented in the program, to retreat to if an error occurs.
  13. In https://mersenneforum.org/showpost.p...7&postcount=30 I computed some lower bounds for GP_LLT iteration time for a ~1G exponent. Comparing actual run time shows it over 50 times slower than that bound:
    Code:
    Exponent  Low bound sec/iter  Measured sec/iter  ratio actual/bound
    1000000007       34.1                2010.             58.9
  14. In a run on OBD exponent 3321928171, taking more than 3 days, GPT_LL produced an empty .rdl file and no .snp file at all, during the run, so the program calculated timing, and interim res64s could not be viewed with a hex editor during a continuing run. Iteration timing was estimated by elapsed time and display of iteration count. Presumably if the OS or program had crashed, all progress would have been lost. After suspending the run, and then selecting exit the program, a .snp file was created. This claimed 14 iterations, but only residues for #6 thru 13 (8 total) were displayed by the proof viewer. The iteration numbers could be selected for copy/paste in the proof viewer display, but not the interim residue values or timing values. Average of the recorded iteration times was 22,165. seconds per iteration. That extrapolates to a total run time of about 2.3 million years.
  15. Responsiveness of the test system's GUI is substantially affected by the exponent being run. 10M or below was normal responsiveness. 100M was laggy; 332M very laggy, 1G worse; 3.32G extremely difficult and slow to accomplish anything, basically useless for anything else. At 3.32G it was not possible to launch Task Manager.
  16. The ETA computation/display in the GP_LLT program becomes quite capricious at very large exponent. I saw fluctuations of thousands of years from time to time, including times to completion slightly earlier in the day than when viewed, while billions of iterations remained to be performed, at several hours each. Perhaps the elapsed time computation is overflowing.
  17. A small exponent I ran to completion (100,019) produced a result matching the output of my LL perl script.

Top of reference tree: https://mersenneforum.org/showthread.php?t=24607
Attached Files
File Type: pdf multiapplication primaliy test run time scaling.pdf (27.5 KB, 19 views)
File Type: pdf ram usage vs exponent.pdf (17.1 KB, 21 views)

Last fiddled with by kriesel on 2023-03-15 at 23:13
kriesel is offline   Reply With Quote
Old 2023-03-15, 21:33   #4
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×17×227 Posts
Default Conclusions

Conclusions:

GP_LLT v2.2 is unsuitable for production GIMPS use.
(As are my quickly created perl scripts for LL and PRP.)
If the GP_LLT author would like to try again, that would be good.
A major redesign and rewrite would be needed.


Top of reference tree: https://mersenneforum.org/showthread.php?t=24607

Last fiddled with by kriesel on 2023-03-15 at 21:36
kriesel is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 13:53.


Mon Jun 5 13:53:52 UTC 2023 up 291 days, 11:22, 0 users, load averages: 1.18, 0.99, 1.04

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔