mersenneforum.org  

2023-05-30, 10:00   #1
S485122
"Jacob"
Sep 2006
Brussels, Belgium

7·281 Posts
Error rates and ECC

I just downloaded all LL tests returned in the past year, between 2022-05-30 and 2023-05-29.

Some LL results are unverified or the Mersenne number has been factored; there are also a few duplicate results (same user, final residue and shift: slandrum returned a whole lot on 2022-12-18, and Ryan Popper on 2023-01-06).

Of the remaining 78,165 LL results returned between 2022-05-30 and 2023-05-29, 506 are bad: about 0.65% (excluding exponents below 60M or above 120M doesn't change this significantly).
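
A minimal Python sketch of the filtering and arithmetic described above, assuming each downloaded result has been parsed into a record with user, exponent, final residue, shift count and a status. The field names and the "verified"/"bad"/"unverified" labels are hypothetical, not the actual mersenne.org report columns. Unverified and factored results are dropped, repeated (user, exponent, residue, shift) rows are counted once, and the bad fraction is taken over what remains.

```python
from typing import Iterable, NamedTuple, Tuple


class LLResult(NamedTuple):
    # Hypothetical record layout; the real report columns may differ.
    user: str
    exponent: int
    residue: str      # final 64-bit residue as hex
    shift: int        # shift count used for the run
    status: str       # "verified", "bad" or "unverified" (assumed labels)
    factored: bool    # a factor of 2^p - 1 has since been found


def bad_rate(results: Iterable[LLResult]) -> Tuple[int, int, float]:
    """Return (kept, bad, bad fraction) after dropping unusable and duplicate rows."""
    seen = set()
    kept = bad = 0
    for r in results:
        if r.factored or r.status == "unverified":
            continue  # no way to tell (yet) whether this result was right
        key = (r.user, r.exponent, r.residue, r.shift)
        if key in seen:
            continue  # same user, residue and shift reported twice
        seen.add(key)
        kept += 1
        if r.status == "bad":
            bad += 1
    return kept, bad, (bad / kept if kept else 0.0)


# With the counts quoted above:
print(f"{506 / 78165:.2%}")  # 0.65%
```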

Which brings me to the subject of ECC. IMHO ECC memory is not necessary for CPU primality testing (at least with DDR3 and DDR4; there is not enough data about DDR5 or DDR6). DDR2 was not reliable, but with DDR3 and DDR4, machines can run LLs year after year without an error.
2023-05-30, 10:37   #2
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1111110000100₂ Posts

Offered as a possible counterexample:
https://www.mersenne.org/report_LL/?..._id=martinette
i5-1035G1 64GiB DDR4 laptop
31 verified LL, 5 bad; 5/36 = 13.9% bad
Note that the actual track record of the 64 GiB incarnation is worse than computed above, as the verified results include some from the original, more reliable 16 GiB configuration. The 64 GiB is two new SODIMMs obtained from ATech.

Its still-16-GiB twin (the laptop I'm typing on) is 47-0:
https://www.mersenne.org/report_LL/?...comp_id=martin

Last fiddled with by kriesel on 2023-05-30 at 10:39
2023-05-30, 10:47   #3
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

1AC8₁₆ Posts

Quote:
Originally Posted by S485122
IMHO ECC memory is not necessary for CPU primality testing ...
Agreed. It never was necessary, even for known bad memory. The user just gets more and more frustrated with continual bad results.

But, IMO, for preserving sanity, ECC is fabulous. No* wasted computing cycles or electrons.

*Technically it should be "fewer", but I've never personally seen any bad results when using ECC.
2023-05-30, 11:18   #4
Jurzal
Jan 2023
Riga, Latvia

3·23 Posts

It is not just about ECC; the CPU can also throw up errors even when the memory works just fine.
In the overclocking community we use P95 to stress-test CPU overclocks; especially with the AMD Zen PBO2 algorithm and undervolting, ensuring stable operation is critical.

A PC can work fine with a heavy undervolt until you run a 24h stress test: after 1-2 hours, with raised temperatures, errors start to creep into the CPU calculations because of the undervolt. CPU silicon degradation also matters; older overclocked CPUs need reduced overclocks with age, and higher voltages just to keep the same performance.

All my errors in LL and PRP were caused by too aggressive an undervolt and overclock on my 5900X, even with tuned RAM. ECC can't match tuned regular RAM performance on Zen 3: reliability yes, performance no.

EDIT: Don't get me wrong, ECC is great for prime search reliability. But it is not a silver bullet; other components affect it too.

Last fiddled with by Jurzal on 2023-05-30 at 11:33
2023-05-30, 11:36   #5
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

17×97 Posts

Quote:
Originally Posted by S485122
Of the remaining 78165 LL results returned between 2022-05-30 and 2023-05-29 506 are bad : about 0,65% (excluding exponents below 60M or above 120M doesn't change this significantly.)

Which brings me on the subject of ECC. IMHO ECC memory is not necessary for CPU primality testing
And even using LL with ECC+-Jacobi test does not help you to catch (all) FFT errors.
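
To illustrate why the Jacobi check cannot catch every error, here is a toy Python sketch using plain big-integer arithmetic (not prime95's FFT code). For an error-free LL run modulo M_p = 2^p - 1, the Jacobi symbol (s - 2 | M_p) is -1 after every iteration (apart from the rare case where s - 2 shares a factor with a composite M_p). A residue damaged by an error has roughly even odds of still showing -1, and since y = x^2 - 2 gives y + 2 = x^2, the symbol stops changing one iteration after an error, so checking more often does not help much. The `flip_at`/`flip_bit` fault-injection parameters are hypothetical, added only for this demonstration.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_with_jacobi(p: int, flip_at=None, flip_bit: int = 0):
    """LL test of 2^p - 1 (p an odd prime); returns (looks_prime, jacobi_failed).

    flip_at/flip_bit are a hypothetical fault-injection hook: one bit of the
    residue is flipped after iteration flip_at to mimic a hardware error.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):
        s = (s * s - 2) % m
        if i == flip_at:
            s = (s ^ (1 << flip_bit)) % m
    # An error-free run ends with jacobi(s - 2, m) == -1 (certainly so when
    # m is prime); a damaged run shows -1 anyway roughly half the time.
    return s == 0, jacobi(s - 2, m) != -1


# Inject each possible single-bit error at iteration 50 of the M127 test.
caught = sum(ll_with_jacobi(127, flip_at=50, flip_bit=b)[1] for b in range(127))
print(f"Jacobi check caught {caught} of 127 injected errors")
```

The injected errors the check misses still produce a wrong final residue, which is consistent with the point above that the Jacobi test does not catch all errors.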
2023-05-31, 23:42   #6
slandrum
 
Jan 2021
California

2·7·41 Posts

Quote:
Originally Posted by S485122
I ... then there are a few duplicate results (same user, final residue and shift, slandrum returned a whole lot on 2022-12-18 ...
Those were an error on my part: I moved a bunch of files to a new location while updating software and scripts, and neglected to copy the files recording which results had already been sent, so a bunch of results got sent again.
2023-06-07, 16:20   #7
Xyzzy
Aug 2002

2·43·101 Posts

Is this interesting and/or useful?

https://pdos.csail.mit.edu/papers/so...opson-meng.pdf

2023-06-13, 03:11   #8
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1111110000100₂ Posts

Definitely interesting. Now if we could find a full software implementation for Linux or Windows...

https://www.academia.edu/12046032/A_...puting_Systems

EDAC (Linux) provides access to hardware error-detection counts (RAM, cache, PCI): https://buttersideup.com/mediawiki/index.php/Main_Page
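
On Linux, when an EDAC driver for the memory controller is loaded, the corrected and uncorrected error counts are exposed through sysfs as /sys/devices/system/edac/mc/mcN/ce_count and ue_count. A small Python sketch that reads them (the function name is hypothetical; on machines without EDAC support it simply returns an empty dict):

```python
from pathlib import Path
from typing import Dict, Tuple

EDAC_MC = Path("/sys/devices/system/edac/mc")


def edac_error_counts(root: Path = EDAC_MC) -> Dict[str, Tuple[int, int]]:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) counts."""
    counts: Dict[str, Tuple[int, int]] = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or not Linux
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # controller without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in edac_error_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Comparing the counts before and after a long LL run would show whether ECC had to step in during that run.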

This one needs a bit of hardware support too: https://www.researchgate.net/publica...n_for_Memories
2023-06-24, 23:45   #9
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·2,017 Posts

Quote:
Originally Posted by S485122
Of the remaining 78165 LL results returned between 2022-05-30 and 2023-05-29 506 are bad : about 0,65% (excluding exponents below 60M or above 120M doesn't change this significantly.)
I suspect that if you break it down, for example into separate error rates for the 57-60M and 114-120M intervals, you'll find a lower error rate at the lower exponents, and a higher error rate with few samples at the higher ones. Run time at 120M will be ~4.3 times as long as at 60M, so on the same hardware the final LL residue error rate could easily be ~4 times as high. The server stopped issuing LL first-test assignments some time ago.
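
A rough Python sketch of that scaling argument, under two stated assumptions: an LL test takes p iterations, each an FFT multiplication costing on the order of p log p, and hardware errors arrive at a constant average rate (a Poisson process), so the chance of at least one error in a run of length T is 1 - exp(-rate * T). The idealised cost ratio comes out a little under the ~4.3 quoted above, which seems plausible since real FFT lengths come in discrete steps and larger ones fit caches less well. The 1% starting error rate in the example is purely illustrative.

```python
import math


def relative_runtime(p_small: float, p_big: float) -> float:
    """Idealised LL cost ratio: p iterations, each an O(p log p) FFT multiply."""
    return (p_big / p_small) ** 2 * math.log(p_big) / math.log(p_small)


def scaled_error_probability(p_error: float, runtime_ratio: float) -> float:
    """Per-test error probability after scaling run time by runtime_ratio.

    Assumes errors are a Poisson process with a fixed rate on the same
    hardware: P(error) = 1 - exp(-rate * T), so
    1 - P(new) = (1 - P(old)) ** runtime_ratio.
    """
    return 1 - (1 - p_error) ** runtime_ratio


ratio = relative_runtime(60e6, 120e6)
print(f"120M vs 60M run time: ~{ratio:.2f}x")
# Illustrative only: a machine with a 1% error rate per test at 60M.
print(f"error rate at 120M: {scaled_error_probability(0.01, ratio):.2%}")
```

For small error rates the result is close to the old rate times the run-time ratio, which is where a "~4 times as high" estimate comes from.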
2023-06-26, 01:18   #10
cxc
"Catherine"
Mar 2023
Melbourne

2×3×11 Posts

I looked at the user list to see if there were any patterns in the last year, but unfortunately the LL-DC user report doesn't allow a breakdown of results by exponent range. I tried aggregating the data a little:
Code:
                   All   1–9 attempts   10–99   100–999   1000+ attempts
------------------------------------------------------------------------
Contributors:     2009           1282     651        69       7*
No failures:      1517           1078     433         6       0*
At least 1:        492            204     218        63       7*
Only 1 failure:    286            143     129        14       0
More than 1:       206             61      89        49       7*
2–9 failures:      186             61      85        38       2
     2 fails:       92             44      29        19       0
     3 fails:       41              7      31         3       0
     4 fails:       18              5      11         2       0
     5 fails:       10              3       3         4       0
     6 fails:        9              2       4         2       1
     7 fails:        9              0       4         5       0
     8 fails:        5              0       2         2       1
     9 fails:        2              0       1         1       0
10+ failures:       20            N/A       4        11       5*
100+ failures:       5            N/A     N/A         3       2*
1000+ failures:      0            N/A     N/A       N/A       0
* Includes unidentified -Anonymous- combined as a single contributor
Two prolific contributors in the 1000+ attempts column have fewer than 10 failures; even those with more failures are still running at ~98% success minimum (the six non-anonymous contributors are at 98%, 98%, 98.6%, 99.4%, 99.6%, and 99.6%) which is solid, given the fleet of machines that have to be coordinated to achieve this. In each of the bands with fewer attempts a majority of contributors have two or fewer failures; overall 5.7% of users have more than two failures (and when picking through the data, there seem to be a few clear cases where machines are no longer being maintained and/or have h/w faults that make it difficult to run the LL test anyway).
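
A minimal Python sketch of how per-contributor (attempts, failures) pairs could be bucketed into a table like the one above; how the pairs are extracted from the LL-DC user report is not shown, and the band and row labels simply mirror the table. The last line reproduces the 5.7% figure from the table's own totals: contributors with more than two failures are the "More than 1" count (206) minus the "2 fails" count (92).

```python
from collections import Counter
from typing import Iterable, Tuple


def attempt_band(attempts: int) -> str:
    """Column of the table an attempt count falls into."""
    if attempts < 10:
        return "1-9"
    if attempts < 100:
        return "10-99"
    if attempts < 1000:
        return "100-999"
    return "1000+"


def failure_row(failures: int) -> str:
    """Coarse row of the table a failure count falls into."""
    if failures == 0:
        return "no failures"
    if failures == 1:
        return "only 1 failure"
    if failures < 10:
        return "2-9 failures"
    return "10+ failures"


def summarize(users: Iterable[Tuple[int, int]]) -> Counter:
    """Count contributors per (row, band); users holds (attempts, failures) pairs."""
    table: Counter = Counter()
    for attempts, failures in users:
        band = attempt_band(attempts)
        table["contributors", band] += 1
        table[failure_row(failures), band] += 1
    return table


print(f"{(206 - 92) / 2009:.1%}")  # 5.7%
```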
2023-06-26, 12:59   #11
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·2,017 Posts

Getting the current or recent error rate lower is definitely good. But achieving a low current test error rate does not mean the mismatch rate on DC would be that low. It's common that a DC run today will be for an exponent whose first test was run several years ago, on older, slower hardware, before the Jacobi check was added to prime95, so probably with at least double the error rate per test, and likely considerably more.
Mismatch rate is approximately first-test error rate plus DC-test error rate.
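
The sum is a first-order approximation. A short Python sketch, assuming the two runs fail independently and that two wrong 64-bit residues never happen to agree: the exact mismatch probability is 1 - (1 - e1)(1 - e2) = e1 + e2 - e1*e2, and the product term is negligible at error rates of a few percent. The rates in the example are hypothetical.

```python
def mismatch_rate(e_first: float, e_dc: float) -> float:
    """Chance that a double check disagrees with the first test.

    Assumes independent failures, and that two wrong 64-bit residues
    never coincide, so a mismatch occurs unless both runs are correct.
    """
    return 1 - (1 - e_first) * (1 - e_dc)


# Hypothetical rates: 3% for an old first test, 1% for today's DC.
exact = mismatch_rate(0.03, 0.01)
approx = 0.03 + 0.01
print(f"exact {exact:.4f}, sum {approx:.4f}")
```
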
The top producers of DC output show a wide variation of success rate among individuals.
Mismatch rate = 1 - successes / attempts. Summing the successes and attempts of the top N for N=1 to 60 yielded mismatch rates ranging from 2.05% at N=1 to a peak of 4.08% at N=20.
One could carry that out for the top 500 too, I suppose, although from N=20 to 60 it looks like it stabilizes around 3.8%.
That would be the rate per exponent; the mean error rate per test (old and new) would be ~half that, 1.9%. (A little less than half, I think, because some exponents get three or four tests before a match is achieved.)
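
The pooled figure for the top N producers can be sketched in Python as below, assuming the (successes, attempts) pairs are already sorted from the top producer down; the function names and the two-row example are hypothetical, not the data in the linked attachments. Halving the pooled mismatch rate gives the rough per-test figure, slightly overstated for the reason given in the parenthesis above.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def pooled_mismatch_rates(rows: Sequence[Tuple[int, int]]) -> List[float]:
    """1 - sum(successes) / sum(attempts) over the top N, for N = 1..len(rows)."""
    successes = accumulate(s for s, _ in rows)
    attempts = accumulate(a for _, a in rows)
    return [1 - s / a for s, a in zip(successes, attempts)]


def per_test_error(mismatch: float) -> float:
    """Mismatch ~ e_first + e_dc, so the mean per-test rate is about half of it."""
    return mismatch / 2


rates = pooled_mismatch_rates([(980, 1000), (450, 470)])  # hypothetical rows
print([f"{r:.2%}" for r in rates])
print(f"{per_test_error(0.038):.1%}")  # 1.9%
```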
See second and third attachments of https://www.mersenneforum.org/showpo...91&postcount=6 for top 60 data and analysis.

Last fiddled with by kriesel on 2023-06-26 at 13:04
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.