2021-06-18, 16:49 | #12 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×3×37 Posts |
Use as hardware reliability test
There's a pretty good video on this at https://www.youtube.com/watch?v=n0U7fPKRlVs.
That video contains what appear to me to be some inaccuracies. Hyperthreading should not usually be used in primality testing; prime95 / mprime performance is usually better without employing the additional threads. Prime95 reports CPU type, number of cores, whether hyperthreading is available, instruction sets supported, cache sizes, etc. under Options, CPU... Prime95 will not test the reliability of your GPU, IGP, PCIe slots, etc. Consider Gpuowl, CUDALucas -memtest, mfakto or mfaktc selftest, or other GPU GIMPS applications for that, along with actual hardware test software.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-11-10 at 18:14
2022-08-13, 20:25 | #13 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×3×37 Posts |
Memory use during PRP
P-1, P+1, or ECM factoring may benefit from considerable allowed RAM. Primality testing via PRP or LL DC has much less need of RAM. For exponents up to ~120M, a few hundred MB per worker is enough. Even dual workers, each at a 500M+ exponent, do not use 3 GiB of RAM on systems with several times that or more installed.
From the small charted set of data, it looks like ~2.6 bytes times the sum of the exponents being primality tested at the moment on the system is a usable rough estimate of required RAM, in the absence of any of the memory-hungry factoring algorithms. I don't think the Windows version matters; data were collected from Vista through Windows 11. The prime95 version probably does not matter much either.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-08-13 at 20:47
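As a minimal sketch of that rough estimate (my own illustration; the 2.6 bytes/exponent figure is the empirical value charted above, and the exponent values are hypothetical wavefront assignments, not from the post):

Code:
#include <stdio.h>

/* Rough RAM estimate for concurrent PRP/LL primality tests:
   ~2.6 bytes per exponent, summed over all exponents being tested at once. */
int main (void)
{
	/* hypothetical: two workers, each on a ~110M first-test exponent */
	unsigned long exponents[] = { 110000000UL, 110000000UL };
	int n = 2;
	double total_bytes = 0.0;

	for (int i = 0; i < n; i++)
		total_bytes += 2.6 * (double) exponents[i];

	/* prints roughly 546 MiB for this example, consistent with
	   "a few hundred MB per worker" for wavefront exponents */
	printf ("Estimated RAM: %.0f MiB\n", total_bytes / (1024.0 * 1024.0));
	return 0;
}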
2022-09-08, 05:46 | #14 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}·3·37 Posts |
Interpreting the error-counts 32-bit word
(draft)
I haven't verified that the output form in results.txt or results.json.txt matches the internal storage form, but it seems likely. From prime95 v30.8b15 source module commonb.c, starting at line 6116: Code:
/* Increment the error counter.  The error counter is one 32-bit field containing 5 values.  Prior to version 29.3, this was */
/* a one-bit flag if this is a continuation from a save file that did not track error counts, a 7-bit count of errors that were */
/* reproducible, a 8-bit count of ILLEGAL SUMOUTs or zeroed FFT data or corrupt units_bit, a 8-bit count of convolution errors */
/* above 0.4, and a 8-bit count of SUMOUTs not close enough to SUMINPs. */
/* NOTE: The server considers an LL run clean if the error code is XXaaYY00 and XX = YY and aa is ignored.  That is, repeatable */
/* round off errors and all ILLEGAL SUMOUTS are ignored. */
/* In version 29.3, a.k.a. Wf in result lines, the 32-bit field changed.  See comments in the code below. */

void inc_error_count (
	int	type,
	unsigned long *error_count)
{
	unsigned long addin, orin, maxval;

	addin = orin = 0;
	if (type == 0) addin = 1, maxval = 0xF;				// SUMINP != SUMOUT
	else if (type == 4) addin = 1 << 4, maxval = 0x0F << 4;		// Jacobi error check
	else if (type == 1) addin = 1 << 8, maxval = 0x3F << 8;	// Roundoff > 0.4
	else if (type == 5) orin = 1 << 14;				// Zeroed FFT data
	else if (type == 6) orin = 1 << 15;				// Units bit, counter, or other value corrupted
	else if (type == 2) addin = 1 << 16, maxval = 0xF << 16;	// ILLEGAL SUMOUT
	else if (type == 7) addin = 1 << 20, maxval = 0xF << 20;	// High reliability (Gerbicz or dblchk) PRP error
	else if (type == 3) addin = 1 << 24, maxval = 0x3F << 24;	// Repeatable error

	if (addin && (*error_count & maxval) != maxval) *error_count += addin;
	*error_count |= orin;
}
Reading the 32-bit error count as 8 hex digits, numbered 1 (most significant) through 8 (least significant):

Hex digit 1, masked by 0xC: the two most significant bits, unassigned.
Hex digits 1-2, masked by 0x3F: repeatable error count field, max value 63 decimal.
Hex digit 3: GEC (Gerbicz or double-check PRP) error field, max value 15.
Hex digit 4: ILLEGAL SUMOUT field, max value 15.
Hex digit 5, bit 0x4: FFT data zeroed error bit field, 0 or 1.
Hex digit 5, bit 0x8: corrupted data (units bit, counter, or other value) indicator bit field, 0 or 1.
Hex digits 5-6, masked by 0x3F: roundoff error > 0.4 field, max value 63 decimal.
Hex digit 7: Jacobi symbol error check field, max value 15.
Hex digit 8: SUMINP != SUMOUT errors field, max value 15.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-09-09 at 01:20
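For reference, here is a minimal decoder sketch of my own (not part of the prime95 source; the function name and sample value are hypothetical) that unpacks a v29.3-or-later error count word into the fields listed above:

Code:
#include <stdio.h>

/* Hypothetical helper: unpack a post-v29.3 prime95 error count word
   using the bit positions from inc_error_count() above. */
void decode_error_count (unsigned long ec)
{
	printf ("SUMINP != SUMOUT errors:     %lu\n", ec & 0xF);
	printf ("Jacobi check errors:         %lu\n", (ec >> 4) & 0xF);
	printf ("Roundoff > 0.4 errors:       %lu\n", (ec >> 8) & 0x3F);
	printf ("Zeroed FFT data flag:        %lu\n", (ec >> 14) & 1);
	printf ("Corrupt units bit/counter:   %lu\n", (ec >> 15) & 1);
	printf ("ILLEGAL SUMOUT errors:       %lu\n", (ec >> 16) & 0xF);
	printf ("Gerbicz/dblchk PRP errors:   %lu\n", (ec >> 20) & 0xF);
	printf ("Repeatable errors:           %lu\n", (ec >> 24) & 0x3F);
}

int main (void)
{
	/* illustrative value: 1 Gerbicz/dblchk error and 2 SUMINP != SUMOUT errors */
	decode_error_count (0x00100002UL);
	return 0;
}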
2022-09-25, 17:08 | #15 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×3×37 Posts |
P-1 performance
Mprime / prime95 v30.8 introduces an enhancement in P-1 performance, using polynomials to achieve almost 100% pairing of primes in stage 2. This allows cost-effective factoring to higher stage 2 bounds, a higher probability of finding a factor, and therefore more primality tests saved.
Use v30.8 or later for P-1 factoring, and allow adequate memory to enable the gains. A few GB is better than only running stage 1. The savings are approximately logarithmic in allowed memory, so 16 GiB is good, 32 is better, and more is better yet, until paging/swapping sets in, which can cut performance drastically and turn the expected gain into a large loss. Prime95's GUI limits allowed stage 2 ram to 90% of installed physical system ram. That limit can be overridden by editing local.txt's Memory= line with a text editor, then restarting prime95.

Use adequate memory and bounds the first time P-1 is run on an exponent. "Optimizing" by running P-1 to low bounds first, selected to maximize factors found per unit of initial computing effort, is actually a DE-optimization for the project. Always avoid inadequate-bounds factoring attempts.

On the CPUs I've benchmarked, at the wavefront of DC or first-time testing, fewer cores per worker typically produces the highest aggregate throughput, but the difference is slight. The response of prime95 v30.8 P-1 to a lot of allowed ram is larger. So run two workers on a single-CPU-package system; they will spend about the same amount of time in stages 1 and 2, and will alternate using large quantities of memory for stage 2, fully employing available memory and maximizing the expected net savings of computing time. See the second attachment.

That attachment is a work in progress, but by comparing the first-try and retry curves for similar exponents (current first-primality-test wavefront) it already shows clearly that the expected time saved is much larger for a first try, even at considerably less allowed ram than for a retry with nearly 64 GiB of ram. It also indicates there is not much difference in expected time saved versus retried exponent for the same allowed ram, from near the current DC wavefront (66M) to the first-test wavefront (110M).

Multi-socket systems (dual-Xeon, quad-Xeon, etc.) may have nonuniform memory access (NUMA). Specifying a large amount of allowed ram that causes significant traffic over the NUMA interconnect (QPI, UPI, etc.) may be slower than using a lesser amount of ram all attached to one processor socket. Performance may be better on a dual-Xeon system running 4 workers, with ~45% of system ram allowed per worker, so that each worker's ram can all be on the near side of the NUMA interconnect. There was a noticeable dip in performance on a dual-Xeon system with 2 workers when a single worker used more than half the total system ram in stage 2, which would require some of it to be accessed across the NUMA boundary. See the first attachment.

See also https://www.mersenneforum.org/showpo...&postcount=724 and https://www.mersenneforum.org/showpo...&postcount=727

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-11-14 at 13:29 Reason: updated second attachment
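For illustration only, the local.txt override described above might look like the following for a hypothetical machine with 64 GiB installed. The value is in MB, as prime95 expects for its memory settings; the figure shown is illustrative (roughly 93% of 64 GiB), not a recommendation:

Code:
Memory=60000
Restart prime95 after editing so the new limit takes effect, as noted above.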