mersenneforum.org  

2021-06-18, 16:49   #12
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶×3×37 Posts
Use as hardware reliability test

There's a pretty good video on this at https://www.youtube.com/watch?v=n0U7fPKRlVs.
It does contain what strike me as some inaccuracies.

Hyperthreading should not usually be used in primality testing; performance is usually better without employing the additional threads in prime95 / mprime.

Prime95 reports the CPU type, number of cores, whether hyperthreading is available, supported instructions, cache sizes, etc. See Options, CPU...

Prime95 will not test the reliability of your GPU, IGP, PCIe slots, etc. For that, consider Gpuowl, CUDALucas -memtest, the mfakto or mfaktc selftest, or other GPU GIMPS applications, as well as actual hardware test software.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-11-10 at 18:14
2022-08-13, 20:25   #13
kriesel
Memory use during PRP

P-1, P+1, or ECM factoring may benefit from a considerable RAM allowance. Primality testing via PRP or LLDC has less need of RAM. For exponents up to ~120M, a few hundred MB per worker is enough. Even two workers, each on a 500M+ exponent, use less than 3 GiB of RAM on systems with several times that or more installed.
From the small charted data set, it looks like ~2.6 bytes times the sum of the exponents being primality tested on the system at the moment is a usable rough estimate of required RAM, in the absence of any of the memory-hungry factoring algorithms.
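As a quick illustration of that rule of thumb, here is a minimal C sketch. The function name and the constant's name are made up for this example; the 2.6 bytes per unit of exponent is just the rough empirical figure above, and the estimate does not apply while P-1, P+1, or ECM stage 2 is running.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough empirical figure from the charted data: ~2.6 bytes of RAM per unit
   of exponent, summed over all exponents being primality tested at once.
   An approximation only; not valid for memory-hungry factoring work. */
#define BYTES_PER_EXPONENT_UNIT 2.6

/* Hypothetical helper: estimated RAM in bytes for n concurrent PRP/LL tests. */
double estimate_prp_ram_bytes(const uint64_t *exponents, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)exponents[i];
    return BYTES_PER_EXPONENT_UNIT * sum;
}
```

For two workers on 500M exponents this gives about 2.6e9 bytes, which agrees with the observation that such a pair stays under 3 GiB.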
I don't think Windows version matters. Data were collected from Vista to Windows 11. Prime95 version probably does not matter much either.


Attached Files
File Type: pdf prime95 PRP memory usage.pdf (17.9 KB, 26 views)

Last fiddled with by kriesel on 2022-08-13 at 20:47
2022-09-08, 05:46   #14
kriesel
Interpreting the error-counts 32-bit word

(draft)


I haven't verified that the output form in results.txt or results.json.txt matches the internal storage form, but it seems likely.

From prime95 v30.8b15 source module commonb.c, starting at line 6116:
Code:
/* Increment the error counter.  The error counter is one 32-bit field containing 5 values.  Prior to version 29.3, this was */
/* a one-bit flag if this is a continuation from a save file that did not track error counts, a 7-bit count of errors that were */
/* reproducible, a 8-bit count of ILLEGAL SUMOUTs or zeroed FFT data or corrupt units_bit, a 8-bit count of convolution errors */
/* above 0.4, and a 8-bit count of SUMOUTs not close enough to SUMINPs. */
/* NOTE:  The server considers an LL run clean if the error code is XXaaYY00 and XX = YY and aa is ignored.  That is, repeatable */
/* round off errors and all ILLEGAL SUMOUTS are ignored. */
/* In version 29.3, a.k.a. Wf in result lines, the 32-bit field changed.  See comments in the code below. */

void inc_error_count (
    int    type,
    unsigned long *error_count)
{
    unsigned long addin, orin, maxval;

    addin = orin = 0;
    if (type == 0) addin = 1, maxval = 0xF;                // SUMINP != SUMOUT
    else if (type == 4) addin = 1 << 4, maxval = 0x0F << 4;        // Jacobi error check
    else if (type == 1) addin = 1 << 8, maxval = 0x3F << 8;        // Roundoff > 0.4
    else if (type == 5) orin = 1 << 14;                // Zeroed FFT data
    else if (type == 6) orin = 1 << 15;                // Units bit, counter, or other value corrupted
    else if (type == 2) addin = 1 << 16, maxval = 0xF << 16;    // ILLEGAL SUMOUT
    else if (type == 7) addin = 1 << 20, maxval = 0xF << 20;    // High reliability (Gerbicz or dblchk) PRP error
    else if (type == 3) addin = 1 << 24, maxval = 0x3F << 24;    // Repeatable error

    if (addin && (*error_count & maxval) != maxval) *error_count += addin;
    *error_count |= orin;
}
So, if the 32-bit error count word were 0x12345678, I think that would mean the following:
1 & 0xC = 0: the most significant two bits are unassigned
0x12 & 0x3F = 0x12 is the repeatable error count field (18 here), max value 63 base 10;
3 is the GEC error field, max value fifteen
4 is the Illegal Sumout field, max value fifteen
5 & 4 is the FFT data zeroed error bit field, 0 or 1 (set here)
5 & 8 is the corrupted data indicator bit field, 0 or 1 (clear here)
0x56 & 0x3F = 0x16 is the roundoff error > 0.4 field (22 here), max value 63 base 10;
7 is the Jacobi symbol error check field, max value fifteen
8 is the suminp!=sumout errors field, max value fifteen
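
The field layout read off the shifts and masks in inc_error_count() above can be turned into a small decoder. This is only a sketch: the struct and function names are made up for illustration and are not from the prime95 source, and it assumes the word was produced by v29.3 or later and that the reported form matches the internal storage form (which, as noted, I haven't verified).

```c
#include <stdint.h>

/* Illustrative names only; the bit positions follow the shifts and masks
   in the quoted inc_error_count() (prime95 v29.3 and later layout). */
typedef struct {
    unsigned repeatable;      /* bits 24-29, count saturates at 63 */
    unsigned gec_or_dblchk;   /* bits 20-23, count saturates at 15 */
    unsigned illegal_sumout;  /* bits 16-19, count saturates at 15 */
    unsigned corrupt_value;   /* bit 15, flag: units bit, counter, or other value corrupted */
    unsigned zeroed_fft;      /* bit 14, flag: zeroed FFT data */
    unsigned roundoff;        /* bits 8-13, count saturates at 63 */
    unsigned jacobi;          /* bits 4-7, count saturates at 15 */
    unsigned suminp_sumout;   /* bits 0-3, count saturates at 15 */
} error_fields;

/* Split a 32-bit error count word into its fields. Bits 30-31 are unassigned. */
error_fields decode_error_word(uint32_t w)
{
    error_fields f;
    f.repeatable     = (w >> 24) & 0x3F;
    f.gec_or_dblchk  = (w >> 20) & 0xF;
    f.illegal_sumout = (w >> 16) & 0xF;
    f.corrupt_value  = (w >> 15) & 0x1;
    f.zeroed_fft     = (w >> 14) & 0x1;
    f.roundoff       = (w >> 8)  & 0x3F;
    f.jacobi         = (w >> 4)  & 0xF;
    f.suminp_sumout  = w & 0xF;
    return f;
}
```

Calling decode_error_word(0x12345678) gives 18 repeatable errors, 3 GEC errors, 4 illegal sumouts, the zeroed-FFT flag set, the corrupted-value flag clear, 22 roundoff errors, 7 Jacobi errors, and 8 suminp!=sumout errors.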



Last fiddled with by kriesel on 2022-09-09 at 01:20
2022-09-25, 17:08   #15
kriesel
P-1 performance

Mprime / prime95 V30.8 introduces an enhancement in P-1 performance, using polynomials to achieve almost 100% pairing of primes in stage 2. This makes it cost-effective to factor to higher stage 2 bounds, which raises the probability of finding a factor and saves more primality tests.
Use V30.8 or later for P-1 factoring, and allow adequate memory to enable the gains. A few GB is better than only running stage 1. The savings are approximately logarithmic in allowed memory, so 16 GiB is good, 32 is better, and more is better yet, up to the point where paging/swapping sets in, which can cut performance drastically and turn the expected gain into a large loss. Prime95's GUI limits allowed stage 2 RAM to 90% of installed physical system RAM. That limit can be overridden by editing the Memory= line in local.txt with a text editor, then restarting prime95.

Use adequate memory and bounds the first time P-1 is run on an exponent. "Optimizing" by running P-1 to low bounds first, selected to maximize factors found per unit of initial computing effort, is actually a DE-optimization for the project. Always avoid factoring attempts with inadequate bounds.

Typically, on CPUs I've benchmarked, at the wavefront of DC or first-time testing, fewer cores per worker produce the highest aggregate throughput figures, but the difference is slight. The response of prime95 v30.8 P-1 to a lot of allowed RAM is larger. So run two workers on a single-CPU-package system; they will spend about the same amount of time in stage 1 as in stage 2, and will alternate in using large quantities of memory for stage 2, fully employing available memory and maximizing expected net savings of computing time. See second attachment. That attachment is a work in progress, but comparing the first-try and retry curves for similar exponents (current first-primality-test wavefront) already shows clearly that the expected time saved is much larger for a first try, even at considerably less allowed RAM, than for a retry with nearly 64 GiB of RAM. It also indicates there is not much difference in expected time saved versus retried exponent for the same allowable RAM, from near the current DC wavefront (66M) to the first-test wavefront (110M).

Multi-socket systems (Dual-Xeon, Quad-Xeon, etc) may have nonuniform memory access (NUMA). Specifying a large amount of allowed ram that will cause significant traffic over the NUMA interconnect (QPI, UPI, etc) may be slower than using a lesser amount of ram all connected to one processor socket. Performance may be better on a dual-Xeon system by running 4 workers, with ~45% of system ram allowed per worker, so that it could all be on the near side of the NUMA interconnect. There was a noticeable dip in performance on a dual-Xeon system with 2 workers when using more than half the total system ram in stage 2 in a single worker, which would require some of it to be accessed across the NUMA boundary. See first attachment.
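
The sizing arithmetic for the multi-socket case can be sketched in C as follows. The function names are hypothetical, and the sketch assumes installed RAM is split evenly across sockets; whether the OS actually places a worker's stage 2 memory on the near side of the interconnect is a separate question it does not address.

```c
#include <stdint.h>

/* Hypothetical helper: stage 2 RAM allowance per worker in MiB, given total
   installed RAM and a percentage (e.g. 45 for the ~45% suggested above). */
uint64_t stage2_ram_per_worker_mib(uint64_t total_ram_mib, unsigned percent)
{
    return total_ram_mib * percent / 100;
}

/* Returns 1 if the allowance fits in the RAM attached to a single socket,
   assuming RAM is divided evenly among the sockets; otherwise 0. */
int fits_in_local_node(uint64_t allowance_mib, uint64_t total_ram_mib,
                       unsigned sockets)
{
    if (sockets == 0)
        return 0;
    return allowance_mib <= total_ram_mib / sockets;
}
```

On a dual-socket system with 128 GiB (131072 MiB), a 45% allowance is 58982 MiB and fits within one socket's 65536 MiB, while a 60% allowance (78643 MiB) would have to reach across the NUMA boundary.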

See also https://www.mersenneforum.org/showpo...&postcount=724 and https://www.mersenneforum.org/showpo...&postcount=727



Last fiddled with by kriesel on 2022-11-14 at 13:29 Reason: updated second attachment
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
