mersenneforum.org > News The Next Big Development for GIMPS

2020-06-18, 21:03   #34
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·7·53 Posts

Quote:
 Originally Posted by ewmayer These are Prime95/mprime checkpoint files? Because that's way bigger than needed for that exponent, even with 2 full-length PRP residues - 1 for the PRP tests, 1 for the Gerbicz-check residue - taken into account. A 91M expo yields a residue of ceiling(exponent/8) ~ 11.4Mbytes, a minimal-length checkpoint file will only be of that size plus a few more bytes for metadata.
I vaguely recall George describing conditions under which prime95 saves more than one residue per file. I happen to have some recent .bu files handy, and see ~1x, 2x, and 3x the size you estimate for the same exponent. The files below are for PRP; the 1x files I have are 91M LL files. Not so surprisingly, a 2x file compresses easily (in IZArc) to only a few percent larger than a 1x file.
The .7z below is the compressed .bu4 file.

Code:
06/18/2020  03:34 PM        35,639,672 p95038813
06/18/2020  03:52 PM        12,043,108 p95038813.7z
06/18/2020  03:04 PM        35,639,672 p95038813.bu
06/18/2020  02:34 PM        35,639,672 p95038813.bu2
06/18/2020  12:21 PM        23,759,816 p95038813.bu3
06/18/2020  06:17 AM        23,759,816 p95038813.bu4
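A quick way to check the size arithmetic against the listing above, assuming (as in the quoted post) that each residue is stored as ceiling(p/8) packed bytes and that whatever is left over is header/metadata. The function names are illustrative only, not taken from any GIMPS client.

```python
def residue_bytes(p: int) -> int:
    """Bytes for one residue mod 2^p - 1 packed 8 bits per byte: ceiling(p/8)."""
    return (p + 7) // 8


def residue_count(p: int, file_size: int) -> tuple[int, int]:
    """Estimate how many whole residues a file holds and how many bytes are left over.

    Assumption (not confirmed by any file-format spec): file = k packed residues + small header.
    """
    r = residue_bytes(p)
    k = file_size // r
    return k, file_size - k * r


p = 95038813
# Sizes copied from the directory listing above.
listing = {
    "p95038813.bu": 35_639_672,
    "p95038813.bu4": 23_759_816,
    "p95038813.7z": 12_043_108,
}
for name, size in listing.items():
    k, extra = residue_count(p, size)
    print(f"{name}: {k} residue(s) + {extra:,} bytes ({size / residue_bytes(p):.4f}x)")
```

For p = 95038813 one residue is 11,879,852 bytes, so .bu comes out as 3 residues + 116 bytes and .bu4 as 2 residues + 112 bytes, while the .7z of .bu4 is only about 1.4% larger than a single residue, which fits the idea that the file holds repeated copies of the same residue.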

2020-06-18, 23:27   #35
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×3×7×53 Posts

Quote:
 Originally Posted by Prime95 Double-checking has always lagged first-time testing and the lag gets worse every year. Imagine if 90% of the first time tests did not need a DC? Double-checking could close the gap within a few years.
Some scenarios:
1. Business as usual. More first tests are done than DCs while the proof development continues, gets tested, and is rolled out (months). The backlog builds. All of those first tests will need DC: the LL first tests are not subject to the proof, most PRP first tests won't have the proof done either, and any gpuowl users who do attempt proofs may be scattered in their choices of power and topk.
2. Some level of deemphasizing first tests and increasing the DC rate, starting soon. It might consist of "whatever makes the most sense" becoming equivalent to LL DC, if that's not already in place. Fewer first tests for a while; maybe even catch up some on DC. Eventually deemphasize LL relative to PRP even more than now, maybe even ending LL first-time assignments entirely.
3. Pick a proof power and topk set soon and ask gpuowl PRP testers to start using them routinely, and to save the proof files as much as practical. Combine with #2 above. Development of verification continues in parallel; verification of the early proofs occurs later.
4. Eventually there's a go-live of PRP proof verification, with a backlog to catch up on for PRP gpuowl users, provided that what we end up with is compatible with earlier proof runs. Prime95/mprime and Mlucas get adapted too.
5. Further out there is a mass conversion of clients, motivated by a lack of availability of first test assignments without PRP and proof generation capability.
A complicating factor is COVID-19 leaving some systems inaccessible to their users for application updating, administration, or ordinary operation, and that's likely to go on for another year.

Re credit, I suggest that proof generation and verification by users count for some moderate multiple of what the same number of hours would traditionally earn for LL or PRP, to encourage adoption. Remember that some hardware can't currently do PRP but can do LL, TF or P-1, and TF/other is already a large ratio for GPUs.

A DC backlog measure versus DC wavefront or year is posted at https://www.mersenneforum.org/showpo...4&postcount=15
Substantial adoption of PRP with proof, plus timely verification, would not only help cut the backlog; it could also reduce the workload for the strategic double- and triple-check effort, and offer quicker feedback about client reliability in a time frame where reliability issues can still be addressed, rather than letting bad runs accumulate in the database to be found bad several years later.

Last fiddled with by kriesel on 2020-06-18 at 23:33

2020-06-19, 11:57   #36
ATH
Einyen

Dec 2003
Denmark

2²×3×5×7² Posts

Quote:
 Originally Posted by ewmayer Anyhow, try your compression magic on either an Mlucas or gpuowl savefile, you'll see the expected "maximal entropy, effectively no redundancy" result.
Yeah, you are correct: the smaller files do not compress. It was probably because the big file had the same residue several times.

2020-06-19, 12:06   #37
axn

Jun 2003

4698₁₀ Posts

Quote:
 Originally Posted by ewmayer These are Prime95/mprime checkpoint files? Because that's way bigger than needed for that exponent, even with 2 full-length PRP residues - 1 for the PRP tests, 1 for the Gerbicz-check residue - taken into account.
You need 3 in the worst case -- current iteration, GEC base, and GEC cumulative product. At least, that's what I did for the cudaWagstaff stuff.

2020-06-19, 18:46   #38
ewmayer
2ω=0

Sep 2002
República de California

11518₁₀ Posts

Quote:
 Originally Posted by ATH Yeah you are correct, the smaller files do not compress. It was probably because the big file had the same residue several times.
IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there.

Quote:
 Originally Posted by axn You need 3 in the worst case -- current iteration, GEC base, and GEC cumulative product. At least, that's what I did for the cudaWagstaff stuff.
I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?

2020-06-19, 19:58   #39
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

2·3·233 Posts

Quote:
 Originally Posted by ewmayer IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there. I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?
If you want a flexible check, where the interval L used for the check is not fixed, then you need to save the base at which you restarted the check with a new L. You can restart at every error-checked residue, because restarting at t means only that for the new residue sequence
r(m+t) = base^(2^m) mod N holds with base = r(t) = 3^(2^t) mod N.
The only change is that at the error check you need to multiply by base, and here base is in general a big number, not the small base 3. The overhead of this is very small: just one mulmod per error check.
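A minimal sketch of the restart idea above, using Python's big integers on a toy modulus (N = 2^127 - 1 rather than a GIMPS-size exponent). The product d is seeded with the segment base, as in the seed-3 approach quoted above, and a single Gerbicz-style check is done at the end of each segment; real clients check periodically inside one long run. `checked_segment` and its `flip_at` error-injection hook are hypothetical names for illustration.

```python
def square_n(x: int, k: int, N: int) -> int:
    """Apply k modular squarings: returns x^(2^k) mod N."""
    for _ in range(k):
        x = x * x % N
    return x


def checked_segment(base: int, N: int, L: int, n_blocks: int, flip_at=None):
    """Run n_blocks * L PRP squarings from `base`, then do one Gerbicz-style check.

    d starts at the segment base (3 for a fresh test, r(t) after a restart) and is
    multiplied by the residue at the end of each block of L squarings, so after
    n blocks d_n = prod_{j=0..n} r(jL). The check is d_n == base * d_{n-1}^(2^L) mod N.
    `flip_at` (hypothetical test hook) flips the low bit of the residue at that
    iteration to simulate a hardware error.
    """
    x = d = d_prev = base
    it = 0
    for _ in range(n_blocks):
        for _ in range(L):
            x = x * x % N
            it += 1
            if it == flip_at:
                x = (x ^ 1) % N
        d_prev, d = d, d * x % N
    # The one extra mulmod by a (possibly big) base that a restart costs.
    ok = d == base * square_n(d_prev, L, N) % N
    return x, ok


N = (1 << 127) - 1                                    # M127: tiny, runs instantly
x50, ok1 = checked_segment(3, N, L=10, n_blocks=5)    # iterations 1..50, seed 3
# Restart the check at t = 50 with a different L; the new base is the big residue r(50).
x78, ok2 = checked_segment(x50, N, L=7, n_blocks=4)   # iterations 51..78
print(ok1, ok2, x78 == pow(3, 1 << 78, N))
_, ok_bad = checked_segment(3, N, L=10, n_blocks=5, flip_at=23)
print(ok_bad)
```

The second segment restarts with a different L and the big base r(50); its check still passes and the final residue equals 3^(2^78) mod N, while the run with a bit flipped at iteration 23 fails its check.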

2020-06-19, 20:07   #40
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1101111111101₂ Posts

Quote:
 Originally Posted by ewmayer IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there.
The files are "fat free".

After a GEC check, both matching GEC values are written to the save file. Why? If we wrote only one value and bit rot set in after the GEC check but before the save file was written, the save file would be corrupt. Prime95 goes to great lengths to make sure there are always two GEC values, so that corruption is near impossible.
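A sketch of the "write both matching GEC values" idea, under an assumed record layout (two packed residues back to back). This is not prime95's actual save-file format, and the function names are hypothetical. Both values are written as they stand in memory, without re-comparing them, so rot in either copy between the check and the write shows up as a mismatch when the file is read back.

```python
def pack(r: int, p: int) -> bytes:
    """Pack a residue mod 2^p - 1 into ceiling(p/8) little-endian bytes."""
    return r.to_bytes((p + 7) // 8, "little")


def save_gec_pair(d: int, check: int, p: int) -> bytes:
    """Hypothetical layout: both GEC values, written as they currently are in RAM."""
    return pack(d, p) + pack(check, p)


def load_gec_pair(blob: bytes, p: int) -> int:
    """Read both copies back; any difference means one rotted after the check."""
    n = (p + 7) // 8
    a = int.from_bytes(blob[:n], "little")
    b = int.from_bytes(blob[n:2 * n], "little")
    if a != b:
        raise ValueError("GEC copies differ: fall back to an older save file")
    return a


p = 127
v = 123456789
print(load_gec_pair(save_gec_pair(v, v, p), p))     # copies agree
rotted = save_gec_pair(v, v ^ (1 << 40), p)         # one copy rotted in RAM before the write
try:
    load_gec_pair(rotted, p)
except ValueError as err:
    print("rejected:", err)
```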

2020-06-19, 20:56   #41
ewmayer
2ω=0

Sep 2002
República de California

1151810 Posts

Quote:
 Originally Posted by Prime95 The files are "fat free". After a GEC check, both matching GEC values are written to the save file. Why? If we only write one value and bit rot sets in after the GEC check and before the save file is written then the save file is corrupt. Prime95 goes to great lengths to make sure there are always two GEC values so that corruption is near impossible.
I supplement the GEC residue written to the savefile with the same kind of auxiliary checksum I use for the PRP-test residue. In my case, for more or less historical reasons, that is the triplet of Selfridge-Hurwitz residues: R mod (2^64, 2^35-1, 2^36-1). The first is just the GIMPS Res64 and is all but useless, but the other 2 combine to give a check strength of better than 1 in 2^70. If the same set of checksums, computed on the fly from the residue read from the savefile, mismatches the reference ones, we try the redundant secondary savefile. If that also mismatches, we can try the last-good-GEC savefile, written every 1M iterations. If that also mismatches and our iteration count is > 10M, we can try the last every-10M-iteration persistent savefile.

Per your recommendation, I also take great care to verify the integrity of the RAM-stored GEC residue used by the running program.
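A sketch of the checksum-and-fallback scheme described above: compute the Selfridge-Hurwitz triplet R mod (2^64, 2^35-1, 2^36-1) for the residue read back from disk, and walk a list of savefiles in priority order until one matches its stored checksums. The `Savefile` record and the list contents are hypothetical stand-ins, not Mlucas's actual file handling.

```python
from dataclasses import dataclass
from math import gcd
from typing import Optional, Sequence

M35 = (1 << 35) - 1
M36 = (1 << 36) - 1
MASK64 = (1 << 64) - 1


def sh_residues(r: int) -> tuple[int, int, int]:
    """Selfridge-Hurwitz triplet: r mod 2^64 (Res64), r mod 2^35-1, r mod 2^36-1."""
    return r & MASK64, r % M35, r % M36


# gcd(2^35-1, 2^36-1) = 2^gcd(35,36) - 1 = 1, so the last two terms act like
# one check modulo their product, which is roughly 2^71.
assert gcd(M35, M36) == 1


@dataclass
class Savefile:
    name: str                      # hypothetical label, e.g. "primary"
    residue: int                   # residue as read back from disk
    stored: tuple[int, int, int]   # checksums written alongside it


def first_intact(candidates: Sequence[Savefile]) -> Optional[Savefile]:
    """Try savefiles in priority order; return the first whose checksums still match."""
    for sf in candidates:
        if sh_residues(sf.residue) == sf.stored:
            return sf
    return None


r = pow(3, 1 << 1000, (1 << 127) - 1)     # stand-in residue
good = sh_residues(r)
chain = [
    Savefile("primary", r ^ (1 << 80), good),   # corrupted on disk
    Savefile("secondary", r, good),
    Savefile("last-good-GEC", r, good),
]
print(first_intact(chain).name)
```

Flipping bit 80 of the residue leaves the Res64 term unchanged (the flip is above bit 63) but changes the mod 2^35-1 term, so the corrupted primary file is rejected and the secondary one is used.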

Last fiddled with by ewmayer on 2020-06-19 at 20:57

2020-06-20, 00:02   #42
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

10544₈ Posts

Quote:
 Originally Posted by ewmayer I supplement the GEC residue written to the savefile with the same kind of auxiliary checksum I use for the PRP-test residue. In my case, for more or less historical reasons, that is the triplet of Selfridge-Hurwitz residues: R mod(2^64,2^35-1,2^36-1). The first is just the GIMPS Res64 and is all but useless, but the other 2 combine to give a greater than 1 in 2^70 check strength. If the same set of checksums computed on-the-fly from the residue read from the savefile mismatch the reference ones, we try the redundant secondary savefile. If that also mismatches, we can try the last-good-GEC savefile, written every 1M iterations. If that also mismatches, and our iteration count is > 10M, we can try the last every-10M-iter persistent savefile. Per your recommendation, I also take great care to verify the integrity of the RAM-stored GEC residue used by the running program.
What does the errored bit pattern look like?
How often are 1M iterations lost as a result?
How often are up to 10M iterations lost as a result?

Last fiddled with by kriesel on 2020-06-20 at 00:02

2020-06-20, 00:15   #43
ewmayer
2ω=0

Sep 2002
República de California

2×13×443 Posts

Quote:
 Originally Posted by kriesel What does the errored bit pattern look like? How often are 1M iterations lost as a result? How often are up to 10M iterations lost as a result?
o Don't know;
o Only ever happened on my notoriously flaky Haswell CPU, perhaps once per 100M iterations on average (the maximum was 4 GEC failures on a ~104M exponent; George did the PRP-DC using his code, and we matched);
o Never happened to me yet, over at least 50 PRP tests on Haswell, NUC and multiple Android broke-o-phones.

2020-06-20, 03:01   #44
axn

Jun 2003

2×3⁴×29 Posts

Quote:
 Originally Posted by ewmayer I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?
In which case, you'd need to save the current iteration, the last verified GEC check (for rolling back), and the current GEC cumulative product. Still 3 needed.

I'm wondering now how you're managing with just two?
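One way to lay out the three residues listed above, sketched under assumptions: the names are hypothetical, and rollback here re-seeds the GEC product at the last verified residue, i.e. it combines this list with the restart-at-a-verified-residue idea from R. Gerbicz's post rather than describing any particular client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrpCheckpoint:
    """Hypothetical PRP + GEC checkpoint holding three full-size residues."""
    iteration: int        # iterations completed so far
    x: int                # residue 1: current PRP residue
    d: int                # residue 2: current GEC cumulative product
    good_iteration: int   # iteration of the last residue that passed a GEC check
    good_x: int           # residue 3: that verified residue, kept for rollback


def rollback(c: PrpCheckpoint) -> PrpCheckpoint:
    """After a failed GEC check, resume from the last verified residue.

    The check restarts there with base = good_x, so the product is re-seeded
    with good_x (assumption: flexible restart, not a fixed seed of 3).
    """
    return PrpCheckpoint(
        iteration=c.good_iteration,
        x=c.good_x,
        d=c.good_x,
        good_iteration=c.good_iteration,
        good_x=c.good_x,
    )


bad = PrpCheckpoint(iteration=1_500_000, x=111, d=222, good_iteration=1_000_000, good_x=333)
print(rollback(bad))
```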

All times are UTC.