mersenneforum.org  

Old 2021-10-24, 23:43   #34
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6,659 Posts
Default Optimal prp proof power versus exponent

For prime95 / mprime at least, assuming enough disk space is available, the overall-compute-optimal proof powers (minimizing total proof, server, and cert effort) are thought to be as follows. This is based on information provided by George Woltman for some v30.x version of mprime/prime95, perhaps v30.3:
Quote:
Originally Posted by Prime95
1.7M - 6.7M = 7
6.7M - 26.6M = 8
26.6M - 106.5M = 9
106.5M - 414.2M = 10
414.2M+ = 11
and
Quote:
Originally Posted by Prime95
...it looks like the next transition will be near 1600M.
So power 10 is the best choice for first test wavefront through 100Mdigit and somewhat higher.
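
The quoted table can be turned into a small lookup. This is only a sketch: the thresholds are copied from the quotes above, the 1600M entry is the approximate "next transition", and the function name `optimal_proof_power` is made up for illustration, not something from prime95.

```python
from bisect import bisect_right

# Lower bounds at which each proof power becomes the optimum, from the quoted
# table. The last entry (1600M) is the approximate "next transition" only.
_THRESHOLDS = [1.7e6, 6.7e6, 26.6e6, 106.5e6, 414.2e6, 1600e6]
_FIRST_POWER = 7  # power that applies from 1.7M upward


def optimal_proof_power(exponent):
    """Return the table's proof power for a Mersenne exponent (hypothetical helper)."""
    if exponent < _THRESHOLDS[0]:
        raise ValueError("exponent is below the range covered by the quoted table")
    # Count how many thresholds are <= exponent; each one passed adds 1 to the power.
    return _FIRST_POWER + bisect_right(_THRESHOLDS, exponent) - 1


if __name__ == "__main__":
    for p in (2_000_000, 77_232_917, 332_192_831, 500_000_000):
        print(p, optimal_proof_power(p))
```

The 100Mdigit exponent 332192831 lands in the 106.5M - 414.2M row, which is where the power 10 statement above comes from.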

And
Quote:
Originally Posted by kriesel
Looks like about every 2 bits on fft length is +1 on proof power. So (extrapolating in Mlucas fft lengths) that would imply power 12 would be sufficient to ~6.2G, ~1.87 Gdigit, not something of concern for decades or centuries.
(See table at end of Mlucas source code file get_fft_radices.c)
Extrapolating higher, power 13 would be optimal up to 414.2/106.5 * 6.2 G ~ 24 G, well past the maximum fft length 512 Mi of Mlucas v20.x, which will support up to ~8.9 G exponent.
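
The extrapolation arithmetic above can be checked in a few lines. A sketch only: it assumes the ratio between the last two quoted transition points continues to hold, and uses digits ~ p * log10(2) for the size of 2^p - 1.

```python
import math

ratio = 414.2 / 106.5                       # ratio of successive quoted transition points
p12_limit = 6.2e9                           # quoted upper limit for power 12
p13_limit = ratio * p12_limit               # extrapolated upper limit for power 13
digits_at_p12 = p12_limit * math.log10(2)   # decimal digits of 2^p - 1 is about p*log10(2)

print(f"ratio           ~ {ratio:.2f}")
print(f"power 13 limit  ~ {p13_limit / 1e9:.1f} G")
print(f"digits at 6.2 G ~ {digits_at_p12 / 1e9:.2f} Gdigit")
```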

And extrapolating lower than prime95's commonb.c source code provides (power 5), as needed:
420K - 1.7M power 6
105K - 420K power 5
~26K - 105K power 4
~6.5K - 26K power 3
~1.6K - 6.5K power 2
~400 - 1.6K power 1

The crossover exponent values are somewhat dependent on program efficiency, so somewhat subject to change among mprime/prime95 versions, and across applications (gpuowl; eventually Mlucas).
Recent attempts to re-derive the proof power transition points for mprime/prime95 from program runs, source code examination, and cost function analysis have not duplicated the above, giving different results instead.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-05-06 at 16:10 Reason: added dependency on program efficiency & rederivation
Old 2021-11-13, 00:26   #35
kriesel
 
Default Requirements for comparability of interim residues

To compare parallel long runs in progress, it is useful, even necessary, for the runs to produce and preserve comparable interim results, since differences may reveal errors early. That matters most for longer runs and for code with relatively weak error detection and correction. Standalone P-1 factoring typically has less error detection than primality testing, for three reasons. First, less emphasis was placed on it, because its run time is shorter than a primality test of the same exponent. Second, the number theory offers fewer opportunities for error detection in standalone P-1 factoring. (The Jacobi symbol check is more costly in P-1 stage 1 because, unlike for LL, we don't know the correct Jacobi symbol value in advance; we must compute it, and also compute the check on the interim results.) Third, the impact of an error is lower in P-1 stage 2, where it affects only a small fraction of the stage rather than the whole stage, or the whole primality test as can occur in LL. However, in P-1 factoring for F33, OBD, or higher Mersennes, P-1 run times become months or years on typical fast hardware, which increases the importance of error detection.

In the context of GIMPS Mersenne prime searching, P-1 interim res64 values match between multiple runs on the same number only if all of the following hold. These conditions are more complex than those required for LL or PRP interim res64 matching:

In P-1 stage 1:
  1. same exponent and type, e.g. 3321928171 Mersenne for an OBD attempt;
  2. same seed (the initial value that gets powered, usually 3, at the beginning of stage 1);
  3. precisely the same stage 1 prime-powers product, which requires precisely the same formula for computing it (typically B1-powersmooth of all primes < B1, * 2 * exponent);
  4. same modpow algorithm, such as all left-to-right (prime95/mprime does this for initial stage 1, but does right-to-left for stage1-extension);
  5. same interim iteration (squarings) count, and therefore compatible logging intervals
For P-1 stage 2, which depends on stage 1 final residue:
  1. same exponent and type;
  2. matching stage 1 final full size residue results as start point files;
  3. same software (e.g. Mlucas, and perhaps version restrictions);
  4. exactly the same stage 2 buffer count, which means explicitly specifying the buffer count on whichever system has more memory than the other;
  5. res64 stage 2 output implemented (some software does not implement s2 res64 output, e.g. gpuowl)
  6. same iteration number (q in Mlucas parlance, or perhaps in other software s2 primes coverage), and therefore compatible logging intervals
  7. same prime-pairings algorithm (More or less implied by same software, unless there are differences between versions, as has occurred with prime95 IIRC, and may occur in the future with Mlucas)
  8. equivalence of subtle implementation details, including those that may affect total number of modmuls etc. required for the same gross parameter set (exponent, type, B1, B2)
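
To illustrate stage 1 conditions 3 and 4, here is a toy sketch on M67 with B1 = 3000 and seed 3. The helper names are made up, the powersmooth formula is one common choice (primes <= B1 here), and none of this is taken from prime95, gpuowl or Mlucas source. It builds the stage 1 exponent and runs the same powering left-to-right and right-to-left. The final values agree, but the running values after a given number of squarings come from different partial exponents, so interim residues from the two methods are not comparable in general.

```python
from math import gcd


def primes_upto(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def stage1_exponent(p, b1):
    """2 * p * product of the largest prime powers <= b1 (assumed formula)."""
    e = 2 * p
    for q in primes_upto(b1):
        qk = q
        while qk * q <= b1:
            qk *= q
        e *= qk
    return e


def left_to_right(seed, e, n):
    """Yield the running value after each squaring, scanning e's bits from the top."""
    x = seed % n
    for bit in bin(e)[3:]:          # skip '0b' and the leading 1 bit
        x = x * x % n
        if bit == "1":
            x = x * seed % n
        yield x


def right_to_left(seed, e, n):
    """Yield the running result after each squaring, scanning e's bits from the bottom."""
    result, base = 1, seed % n
    while e:
        if e & 1:
            result = result * base % n
        base = base * base % n
        e >>= 1
        yield result


p, b1, seed = 67, 3000, 3
n = (1 << p) - 1
e = stage1_exponent(p, b1)
ltr = list(left_to_right(seed, e, n))
rtl = list(right_to_left(seed, e, n))
mask64 = (1 << 64) - 1
for k in (10, 100, 1000):
    print(k, format(ltr[k - 1] & mask64, "016x"), format(rtl[k - 1] & mask64, "016x"))
print("final values equal:", ltr[-1] == rtl[-1])
print("gcd(final - 1, n) =", gcd(ltr[-1] - 1, n))
```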

For P+1, ECM: no idea. P+1 random seed, ECM random curve parameter, would seem to make comparing runs more difficult and less necessary.


For TF: interim residues are not output, so there's nothing to check.


By comparison, for LL, it's simpler:
  1. LL seed value 4,
  2. same exponent, & type, e.g. 3321928171, Mersenne
  3. same iteration number, with proper allowance for prime95/mprime's high-by-2 loop counter
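
The three LL conditions can be seen in a toy sketch. It assumes the usual s -> s^2 - 2 mod 2^p - 1 recurrence, counts iterations from 1, and takes res64 as the low 64 bits; the function name is made up, and a program with a differently offset loop counter would label the same values with different iteration numbers.

```python
def ll_residues(p, seed=4):
    """Yield (iteration, res64) for the Lucas-Lehmer test of 2^p - 1 (toy sketch)."""
    m = (1 << p) - 1
    s = seed % m
    for i in range(1, p - 1):       # p - 2 iterations in total
        s = (s * s - 2) % m
        yield i, s & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    for i, res64 in ll_residues(127):
        if i % 25 == 0:
            print(f"iteration {i:3d}  res64 {res64:016x}")
```

Two runs agree on an interim res64 only when seed, exponent and iteration number all line up; the final iteration gives 0 when 2^p - 1 is prime.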

And for PRP (& PRPDC), it's a little more complicated again. With GEC, there is extremely reliable checking in progress, typically with automatic rollback and retry from the last saved confirmed-good state. The ability to compare and check any PRP iteration number's interim residue is likely to be needed by developers during debugging. This requires:
  1. same PRP seed value, typically 3 (except for PRP-1 type 0),
  2. same exponent, & type, e.g. 3321928171, Mersenne
  3. same iteration number, with proper allowance for prime95/mprime's possibly different loop counter
  4. same PRP type 1-5 is not always strictly required for interim residue matching, although it is for final residues (except that type 1 and 5 are equivalent in the absence of factors)
  5. same approach on producing the increasing powers from 1 to ~exponent, e.g. left to right or right to left and nuances relating to optimization.
  6. If using GEC, which requires a straight squaring sequence, corresponding to type 3, adjustments to power will be made at the end to convert to other PRP types.
  7. same approach on reporting the interim residues. (When using GEC, are the residues recomputed for reporting to correspond to the nominal type, or are they left as in the type 3 sequence.)
  8. Some very limited testing indicates gpuowl v5 (type 4), gpuowl v6.11-380 (type 1), and Mlucas v20.1.1 (type 1) have compatible interim 64-bit residues for the same exponent and iteration number while prime95 v30.7 is an outlier.
Compatible logging intervals are simple to achieve in LL or PRP, typically with [1,2,5]*10^n multipliers on iteration count for log intervals. It is not so simple in P-1, where the units (modmuls-in-algorithm-I, or specific delta s2-primes progress, or whatever) differ, possibly in such a way that the least common multiple is substantial compared to the total run duration, or where the number of modmuls differs slightly between implementations.
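
For the LL/PRP case, a sketch of why [1,2,5]*10^n intervals are easy to reconcile: two runs both report at every multiple of the least common multiple of their log intervals. The function names are made up for illustration.

```python
from math import lcm  # available in Python 3.9+


def standard_log_intervals(max_power):
    """Intervals of the form m * 10**n for m in (1, 2, 5) and 0 <= n <= max_power."""
    return sorted(m * 10 ** n for n in range(max_power + 1) for m in (1, 2, 5))


def shared_checkpoints(interval_a, interval_b, last_iteration):
    """Iteration numbers at which two runs with different log intervals both report."""
    step = lcm(interval_a, interval_b)
    return list(range(step, last_iteration + 1, step))


if __name__ == "__main__":
    print(standard_log_intervals(2))
    print(shared_checkpoints(20_000, 50_000, 300_000))
```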


(Perhaps more to come. Including corrections.)


A special thanks here to Ernst Mayer for considerable explanation by PM regarding P-1.
Constructive comments especially by software authors are invited.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-04-15 at 19:10 Reason: minor formatting
Old 2021-12-07, 15:54   #36
kriesel
 
Default Exponent limits

Exponent limits of software and servers are tabulated by GIMPS computation type and application or server, in the attachments. Maximum exponent empirically confirmed is reliably 100% of nominal for TF, but varies from 20% to ~101% for other computation types.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf exponent limits.pdf (33.3 KB, 43 views)

Last fiddled with by kriesel on 2022-04-25 at 07:30 Reason: status update
Old 2022-06-26, 17:11   #37
kriesel
 
Default Gerbicz Error Check block size

(draft; add links to development posts)


Note, some of this was originally posted at https://mersenneforum.org/showpost.p...postcount=2773 or elsewhere.

The exponent is necessarily prime, or there is no point to the PRP test. So for any GEC block size from 2 up to exponent - 1, the exponent is never a whole number of blocks: GECblocksize > mod (exponent, GECblocksize) > 0.
Quote:
Originally Posted by R. Gerbicz
About the 0.2% cost of the strong error check overhead in time:
You simply can't do it better than 2/sqrt(p) (where the whole is 1), if you want to see at least one strong error check.
So you can achieve the 0.2% (total) overhead in time for p>1e6.
What would/could happen with much larger p, and with even better error rate (better than 3%):
With L=H=sqrtint(p)/10 we would see at least 100 error checks
and the overhead would be only 20/sqrt(p), and this one can be arbitrarily small, if p is "large". But this is still not a recommended setup, because we don't know what would be the future memory's error rate, and what would be the used algorithm/method on integer multiplication.
So if we use block size ~sqrt(p)/10 we get ~100 error checks along the way, and overhead ~20/sqrt(p). For p ~ 110M that would be block size ~1049, overhead ~0.0019 ~0.2%.
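
A sketch of that estimate, generalizing the two quoted cases (L=H=sqrt(p) and L=H=sqrt(p)/10) to block size ~sqrt(p)/k, ~k^2 checks, overhead ~2k/sqrt(p). The generalization to arbitrary k and the function name are my assumption, not something stated in the quote.

```python
import math


def gec_estimate(p, k=10):
    """Return (block size, number of checks, overhead fraction) for block ~ sqrt(p)/k."""
    root = math.sqrt(p)
    block = round(root / k)
    checks = round(p / (block * block))   # one check per block*block iterations
    overhead = 2 * k / root
    return block, checks, overhead


if __name__ == "__main__":
    block, checks, overhead = gec_estimate(110_000_000)
    print(f"block ~{block}, checks ~{checks}, overhead ~{overhead:.4f}")
```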


Gpuowl
Gpuowl allows as few as two blocks per GEC check, as at startup; l is not required to equal h.
GEC block size is kept constant through the run on an exponent, and is often highly composite. The entire set of gpuowl PRP iterations is guarded by GEC, by computing additional iterations past the exponent to complete the last GEC block. (IIRC Preda has explained this before.)
For example, 77232917 / 1000 = 77232.917, so 77233 blocks of 1000 (77233000 iterations) would be used.
The overhead of GEC is small, but it is large enough that a larger-than-default block size, even though it means computing a few more total iterations, can actually be more efficient if reliability is high. (See end of this reference post.)
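
The rounding in the 77232917 example, as a sketch of the arithmetic only. The function name is made up. Note that the gpuowl v6.11-380 logs later in this post show the final check landing further out than this (217200 for exponent 216091 with block size 200), so the actual program evidently adds more than just the remainder of the last block.

```python
def gec_block_rounding(exponent, block_size):
    """Return (blocks, total iterations, extra iterations) when rounding up to whole blocks."""
    blocks = -(-exponent // block_size)       # ceiling division
    total = blocks * block_size
    return blocks, total, total - exponent


if __name__ == "__main__":
    print(gec_block_rounding(77_232_917, 1000))
```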
From gpuowl help.txt:
Code:
-block <value>     : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
10,000 = 10^4 = 2^4 * 5^4, implying legal block sizes of 2, 4, 5, 8, 10, 16, 20, 25, 40, 50, 80, 100, 125, 200, 250, 400, 500, 625, 1000, 1250, 2000, 2500, 5000, and 10,000, but not 1, which may be meaningless.
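
The list can be regenerated by brute force. This sketch checks only the "must divide 10'000" rule from the help text; it says nothing about which of these sizes a given gpuowl build actually accepts at run time.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# Block size 1 is dropped, matching the list in the text.
block_sizes = [d for d in divisors(10_000) if d > 1]

print(len(block_sizes), block_sizes)
```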

Code:
2022-06-26 12:52:53 gpuowl v6.11-380-g79ea0cc
2022-06-26 12:52:53 config: -user kriesel -cpu asr2/radeonvii3-w2 -d 3 -use NO_ASM -maxAlloc 14000 -cleanup -block 1 -proof 10
2022-06-26 12:52:53 device 3, unique id ''
2022-06-26 12:52:53 asr2/radeonvii3-w2 216091 FFT: 128K 256:1:256 (1.65 bpw)
2022-06-26 12:52:53 asr2/radeonvii3-w2 Expected maximum carry32: 00000
2022-06-26 12:52:53 asr2/radeonvii3-w2 using long carry kernels
2022-06-26 12:52:53 asr2/radeonvii3-w2 OpenCL args "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x8.d305cfef9df78p-5 -DIWEIGHT_STEP_MINUS_1=-0xd.d574837e195a8p-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-06-26 12:52:57 asr2/radeonvii3-w2 OpenCL compilation in 3.85 s
 Assertion failed: blockSize >= 2, file Gpu.cpp, line 502
Block size 2 and 4 also failed with the same error message, while 5 ran.

These have estimated GEC overhead (at p ~ 10^8 IIUC) as follows:
Code:
blksz l  % overhead
1        200.
2        100.
4         50.
5         40.
8         25.
10        20.
16        12.5 
20        10.
25         8.
40         5.
50         4.  
80         2.5
100        2.
125        1.6
200        1.
250        0.8
400        0.5
500        0.4
625        0.32
1000       0.2
1250       0.16
2000       0.1
2500       0.08
5000       0.04
10000      0.02
Block size is determined at the start, stored in the save file, and used unchanged throughout the gpuowl run of the exponent. In the absence of detected errors, saving 0.3% on the whole run by using block size 1000 instead of 400 more than pays for the possible additional ~1000 - 400 = 600 iterations at the end: 113M * 0.3% = 339,000 iterations saved. Likewise, for block size 10000 vs. 1000, 113M * (0.2% - 0.02%) = 203,400 iterations saved vs. up to 9000 more iterations past the exponent.
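
The tradeoff arithmetic as a sketch. It assumes the overhead model that the table's entries appear to follow, overhead ~ 2/blocksize (i.e. 200/blocksize percent), and uses the difference in block size as the worst-case extra iterations at the end, as in the text. Function names are made up.

```python
def gec_overhead_fraction(block_size):
    """Assumed overhead model matching the table: 200/blocksize percent."""
    return 2.0 / block_size


def block_size_tradeoff(p, small, large):
    """Return (iterations saved by the larger block, worst-case extra iterations at the end)."""
    saved = p * (gec_overhead_fraction(small) - gec_overhead_fraction(large))
    return round(saved), large - small


if __name__ == "__main__":
    print(block_size_tradeoff(113_000_000, 400, 1000))
    print(block_size_tradeoff(113_000_000, 1000, 10_000))
```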
In case of an error, a number of iterations corresponding to the log interval are repeated at least once:

Code:
2022-06-25 06:29:33 roa/radeonvii 852348659 OK 575720000  67.55%; 9737 us/it; ETA 31d 04:11; 2f21cdfbb27f9808 (check 11.47s) 49 errors
2022-06-25 06:32:59 roa/radeonvii 852348659 EE 575740000  67.55%; 9726 us/it; ETA 31d 03:19; c9292397e8e476d9 (check 11.19s) 49 errors
2022-06-25 06:33:12 roa/radeonvii 852348659 OK 575720000 loaded: blockSize 1000, 2f21cdfbb27f9808
2022-06-25 06:36:38 roa/radeonvii 852348659 OK 575740000  67.55%; 9727 us/it; ETA 31d 03:25; 09a617fede5d61be (check 11.47s) 50 errors
2022-06-25 06:40:04 roa/radeonvii 852348659 OK 575760000  67.55%; 9723 us/it; ETA 31d 03:01; e724dd1feb882e6e (check 11.46s) 50 errors
On reliable fast hardware, for normal exponents, large block sizes are efficient. For large exponents or less reliable or slower hardware, with long run times, lesser block sizes and frequent log output may be more efficient.

Using too high a block size for the exponent produces few GEC checks and apparently causes proof generation to fail:
Code:
2022-06-26 13:38:17 gpuowl v6.11-380-g79ea0cc
2022-06-26 13:38:17 config: -user kriesel -cpu asr2/radeonvii3-w2 -d 3 -use NO_ASM -maxAlloc 14000 -cleanup -block 2000 -proof 10 -log 20000
2022-06-26 13:38:17 device 3, unique id ''
2022-06-26 13:38:17 asr2/radeonvii3-w2 216091 FFT: 128K 256:1:256 (1.65 bpw)
2022-06-26 13:38:17 asr2/radeonvii3-w2 Expected maximum carry32: 00000
2022-06-26 13:38:17 asr2/radeonvii3-w2 using long carry kernels
2022-06-26 13:38:17 asr2/radeonvii3-w2 OpenCL args "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x8.d305cfef9df78p-5 -DIWEIGHT_STEP_MINUS_1=-0xd.d574837e195a8p-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-06-26 13:38:21 asr2/radeonvii3-w2 OpenCL compilation in 3.89 s
2022-06-26 13:38:21 asr2/radeonvii3-w2 216091 OK        0 loaded: blockSize 2000, 0000000000000003
2022-06-26 13:38:21 asr2/radeonvii3-w2 validating proof residues for power 10
2022-06-26 13:38:21 asr2/radeonvii3-w2 Proof using power 10
2022-06-26 13:38:22 asr2/radeonvii3-w2 216091 OK     4000   1.83%;   97 us/it; ETA 0d 00:00; 63c70be5ec859db2 (check 0.15s)
2022-06-26 13:38:23 asr2/radeonvii3-w2 216091 OK    20000   9.17%;  105 us/it; ETA 0d 00:00; a51215284f0c1608 (check 0.15s)
2022-06-26 13:38:26 asr2/radeonvii3-w2 216091 OK    40000  18.35%;  126 us/it; ETA 0d 00:00; 9f69d4e111046456 (check 0.16s)
2022-06-26 13:38:29 asr2/radeonvii3-w2 216091 OK    60000  27.52%;  121 us/it; ETA 0d 00:00; 5614e1c31fc5b23a (check 0.15s)
2022-06-26 13:38:31 asr2/radeonvii3-w2 216091 OK    80000  36.70%;  107 us/it; ETA 0d 00:00; ba8e91ee8a118ae6 (check 0.15s)
2022-06-26 13:38:33 asr2/radeonvii3-w2 216091 OK   100000  45.87%;   98 us/it; ETA 0d 00:00; 0b51858a4a12ef62 (check 0.15s)
2022-06-26 13:38:35 asr2/radeonvii3-w2 216091 OK   120000  55.05%;   99 us/it; ETA 0d 00:00; 5098a06175e33b5e (check 0.15s)
2022-06-26 13:38:37 asr2/radeonvii3-w2 216091 OK   140000  64.22%;  100 us/it; ETA 0d 00:00; 67c044a637589727 (check 0.15s)
2022-06-26 13:38:39 asr2/radeonvii3-w2 216091 OK   160000  73.39%;   98 us/it; ETA 0d 00:00; f933c2a2638b1b9a (check 0.15s)
2022-06-26 13:38:42 asr2/radeonvii3-w2 216091 OK   180000  82.57%;  100 us/it; ETA 0d 00:00; 387e35fbbe410479 (check 0.15s)
2022-06-26 13:38:44 asr2/radeonvii3-w2 216091 OK   200000  91.74%;   99 us/it; ETA 0d 00:00; 7e58fc60c8cd8180 (check 0.15s)
2022-06-26 13:38:45 asr2/radeonvii3-w2 PP   216091 / 216091, 0000000000000001
Assertion failed: k > 0 && k <= topK, file ProofSet.h, line 281
Using a very low block size produces high overhead. Its magnitude may depend on the performance ratio between CPU and GPU, or on other factors; the following run shows >100% overhead on a slow CPU / fast GPU combination, higher than the ~40% expected by extrapolation:
Code:
2022-06-26 13:03:13 gpuowl v6.11-380-g79ea0cc
2022-06-26 13:03:13 config: -user kriesel -cpu asr2/radeonvii3-w2 -d 3 -use NO_ASM -maxAlloc 14000 -cleanup -block 5 -proof 10 -log 20000
2022-06-26 13:03:13 device 3, unique id ''
2022-06-26 13:03:13 asr2/radeonvii3-w2 216091 FFT: 128K 256:1:256 (1.65 bpw)
2022-06-26 13:03:13 asr2/radeonvii3-w2 Expected maximum carry32: 00000
2022-06-26 13:03:13 asr2/radeonvii3-w2 using long carry kernels
2022-06-26 13:03:13 asr2/radeonvii3-w2 OpenCL args "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x8.d305cfef9df78p-5 -DIWEIGHT_STEP_MINUS_1=-0xd.d574837e195a8p-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-06-26 13:03:17 asr2/radeonvii3-w2 OpenCL compilation in 3.84 s
2022-06-26 13:03:17 asr2/radeonvii3-w2 216091 OK        0 loaded: blockSize 5, 0000000000000003
2022-06-26 13:03:17 asr2/radeonvii3-w2 validating proof residues for power 10
2022-06-26 13:03:17 asr2/radeonvii3-w2 Proof using power 10
2022-06-26 13:03:17 asr2/radeonvii3-w2 216091 OK       10   0.00%;  199 us/it; ETA 0d 00:01; 9da4a09b9023d001 (check 0.01s)
2022-06-26 13:03:22 asr2/radeonvii3-w2 216091 OK    20000   9.21%;  205 us/it; ETA 0d 00:01; a51215284f0c1608 (check 0.01s)
2022-06-26 13:03:25 asr2/radeonvii3-w2 216091 OK    40000  18.43%;  195 us/it; ETA 0d 00:01; 9f69d4e111046456 (check 0.01s)
2022-06-26 13:03:30 asr2/radeonvii3-w2 216091 OK    60000  27.64%;  206 us/it; ETA 0d 00:01; 5614e1c31fc5b23a (check 0.01s)
2022-06-26 13:03:34 asr2/radeonvii3-w2 216091 OK    80000  36.85%;  205 us/it; ETA 0d 00:00; ba8e91ee8a118ae6 (check 0.01s)
2022-06-26 13:03:39 asr2/radeonvii3-w2 216091 OK   100000  46.06%;  249 us/it; ETA 0d 00:00; 0b51858a4a12ef62 (check 0.02s)
2022-06-26 13:03:46 asr2/radeonvii3-w2 216091 OK   120000  55.28%;  365 us/it; ETA 0d 00:01; 5098a06175e33b5e (check 0.02s)
2022-06-26 13:03:52 asr2/radeonvii3-w2 216091 OK   140000  64.49%;  290 us/it; ETA 0d 00:00; 67c044a637589727 (check 0.01s)
2022-06-26 13:03:56 asr2/radeonvii3-w2 216091 OK   160000  73.70%;  208 us/it; ETA 0d 00:00; f933c2a2638b1b9a (check 0.01s)
2022-06-26 13:04:00 asr2/radeonvii3-w2 216091 OK   180000  82.91%;  205 us/it; ETA 0d 00:00; 387e35fbbe410479 (check 0.01s)
2022-06-26 13:04:04 asr2/radeonvii3-w2 216091 OK   200000  92.13%;  208 us/it; ETA 0d 00:00; 7e58fc60c8cd8180 (check 0.01s)
2022-06-26 13:04:08 asr2/radeonvii3-w2 PP   216091 / 216091, 0000000000000001
2022-06-26 13:04:08 asr2/radeonvii3-w2 216091 OK   217090 100.00%;  204 us/it; ETA 0d 00:00; 28d01226b9466041 (check 0.01s)
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 1, hash 501ced7b2faed88c
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 2, hash c6210f5fce31fa40
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 3, hash b64b7be25a5b3f92
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 4, hash f127ebdb12e4fbbc
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 5, hash fe7a3beaf20377e2
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 6, hash fe7ba158a7c29784
2022-06-26 13:04:08 asr2/radeonvii3-w2 proof: building level 7, hash cd55f7ef1e061d66
2022-06-26 13:04:09 asr2/radeonvii3-w2 proof: building level 8, hash 03a04903f6dcfbbf
2022-06-26 13:04:10 asr2/radeonvii3-w2 proof: building level 9, hash 0667a5e37d43bb0a
2022-06-26 13:04:13 asr2/radeonvii3-w2 proof: building level 10, hash a1a0233eb7bda91a
2022-06-26 13:04:18 asr2/radeonvii3-w2 PRP-Proof 'proof\216091-10.proof' generated
2022-06-26 13:04:18 asr2/radeonvii3-w2 Proof: cleaning up temporary storage
2022-06-26 13:04:19 asr2/radeonvii3-w2 {"status":"P", "exponent":"216091", "worktype":"PRP-3", "res64":"0000000000000001", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"131072", "proof":{"version":"1", "power":"10", "hashsize":"64", "md5":"d46aad26ddc6adf4bec2a4b4051b7688"}, "program":{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"asr2/radeonvii3-w2", "timestamp":"2022-06-26 18:04:19 UTC"}
Using a midrange block size for a small exponent produces acceptable overhead and successful proof generation:
Code:
2022-06-26 13:40:42 gpuowl v6.11-380-g79ea0cc
2022-06-26 13:40:42 config: -user kriesel -cpu asr2/radeonvii3-w2 -d 3 -use NO_ASM -maxAlloc 14000 -cleanup -block 200 -proof 10 -log 20000
2022-06-26 13:40:42 device 3, unique id ''
2022-06-26 13:40:42 asr2/radeonvii3-w2 216091 FFT: 128K 256:1:256 (1.65 bpw)
2022-06-26 13:40:42 asr2/radeonvii3-w2 Expected maximum carry32: 00000
2022-06-26 13:40:42 asr2/radeonvii3-w2 using long carry kernels
2022-06-26 13:40:42 asr2/radeonvii3-w2 OpenCL args "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x8.d305cfef9df78p-5 -DIWEIGHT_STEP_MINUS_1=-0xd.d574837e195a8p-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-06-26 13:40:46 asr2/radeonvii3-w2 OpenCL compilation in 3.84 s
2022-06-26 13:40:46 asr2/radeonvii3-w2 216091 OK        0 loaded: blockSize 200, 0000000000000003
2022-06-26 13:40:46 asr2/radeonvii3-w2 validating proof residues for power 10
2022-06-26 13:40:46 asr2/radeonvii3-w2 Proof using power 10
2022-06-26 13:40:46 asr2/radeonvii3-w2 216091 OK      400   0.18%;  113 us/it; ETA 0d 00:00; 99a8c14b82765acf (check 0.04s)
2022-06-26 13:40:49 asr2/radeonvii3-w2 216091 OK    20000   9.21%;  138 us/it; ETA 0d 00:00; a51215284f0c1608 (check 0.03s)
2022-06-26 13:40:52 asr2/radeonvii3-w2 216091 OK    40000  18.42%;  133 us/it; ETA 0d 00:00; 9f69d4e111046456 (check 0.02s)
2022-06-26 13:40:54 asr2/radeonvii3-w2 216091 OK    60000  27.62%;  110 us/it; ETA 0d 00:00; 5614e1c31fc5b23a (check 0.03s)
2022-06-26 13:40:56 asr2/radeonvii3-w2 216091 OK    80000  36.83%;  108 us/it; ETA 0d 00:00; ba8e91ee8a118ae6 (check 0.02s)
2022-06-26 13:40:58 asr2/radeonvii3-w2 216091 OK   100000  46.04%;  112 us/it; ETA 0d 00:00; 0b51858a4a12ef62 (check 0.02s)
2022-06-26 13:41:00 asr2/radeonvii3-w2 216091 OK   120000  55.25%;  107 us/it; ETA 0d 00:00; 5098a06175e33b5e (check 0.02s)
2022-06-26 13:41:03 asr2/radeonvii3-w2 216091 OK   140000  64.46%;  105 us/it; ETA 0d 00:00; 67c044a637589727 (check 0.02s)
2022-06-26 13:41:05 asr2/radeonvii3-w2 216091 OK   160000  73.66%;  106 us/it; ETA 0d 00:00; f933c2a2638b1b9a (check 0.02s)
2022-06-26 13:41:07 asr2/radeonvii3-w2 216091 OK   180000  82.87%;  102 us/it; ETA 0d 00:00; 387e35fbbe410479 (check 0.02s)
2022-06-26 13:41:09 asr2/radeonvii3-w2 216091 OK   200000  92.08%;  109 us/it; ETA 0d 00:00; 7e58fc60c8cd8180 (check 0.04s)
2022-06-26 13:41:11 asr2/radeonvii3-w2 PP   216091 / 216091, 0000000000000001
2022-06-26 13:41:11 asr2/radeonvii3-w2 216091 OK   217200 100.00%;  107 us/it; ETA 0d 00:00; 146b40888c95c7e9 (check 0.02s)
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 1, hash 501ced7b2faed88c
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 2, hash c6210f5fce31fa40
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 3, hash b64b7be25a5b3f92
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 4, hash f127ebdb12e4fbbc
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 5, hash fe7a3beaf20377e2
2022-06-26 13:41:11 asr2/radeonvii3-w2 proof: building level 6, hash fe7ba158a7c29784
2022-06-26 13:41:12 asr2/radeonvii3-w2 proof: building level 7, hash cd55f7ef1e061d66
2022-06-26 13:41:12 asr2/radeonvii3-w2 proof: building level 8, hash 03a04903f6dcfbbf
2022-06-26 13:41:13 asr2/radeonvii3-w2 proof: building level 9, hash 0667a5e37d43bb0a
2022-06-26 13:41:16 asr2/radeonvii3-w2 proof: building level 10, hash a1a0233eb7bda91a
2022-06-26 13:41:21 asr2/radeonvii3-w2 PRP-Proof 'proof\216091-10.proof' generated
2022-06-26 13:41:21 asr2/radeonvii3-w2 Proof: cleaning up temporary storage
2022-06-26 13:41:22 asr2/radeonvii3-w2 {"status":"P", "exponent":"216091", "worktype":"PRP-3", "res64":"0000000000000001", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"131072", "proof":{"version":"1", "power":"10", "hashsize":"64", "md5":"d46aad26ddc6adf4bec2a4b4051b7688"}, "program":{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"asr2/radeonvii3-w2", "timestamp":"2022-06-26 18:41:22 UTC"}
Mprime/prime95

GEC was introduced to prime95 in v29.4. Prime95 handles the end of the run differently from gpuowl: it leaves unguarded the last few iterations between the last whole block and the exponent. It also dynamically varies the GEC block size up or down based on the observed error rate, and IIRC downward for the last remaining iterations. So the number of unguarded iterations at the end is only dozens, with a low expected rate of error.

From prime95's undoc.txt:
Code:
When doing highly-reliable error checking, the interval between compares can be 
controlled with these two settings in prime.txt:
    PRPGerbiczCompareInterval=n        (default is 1000000)
Reducing the interval will reduce how far the program "rolls back" when an error
is detected.  It will also increase the overhead associated with error-checking.
NOTE: For technical reasons, PRPGerbiczCompareInterval is rounded to the nearest perfect square.
ALSO NOTE:  The program automatically adjusts the Gerbicz interval downward when an error is
detected.  This will reduce the amount of time "lost" rolling back to the last verified good
iteration after an error is detected.  Over time, the Gerbicz interval will be adjusted back
upward after many successful compares.
Mlucas
From a recent version's help.txt:
Code:
The ITERS_BETWEEN_CHECKPOINTS value can be customized by adding a "CheckInterval = [value]"
line to one's mlucas.ini file, but note that there are constraints on the value related to
the Gerbicz-checking done for PRP tests. Specifically, the CheckInterval value must be a
multiple of 1000 and must divide 1 million. Violation of these constraints will trigger an
assertion-exit if a PRP-test is attempted.
I think that means the save file interval doubles as the GEC block size. That would allow GEC block sizes of 1000, 2000, 4000, 5000, 8000, 10000, 20000, 25000, 40000, 50000, 100000, 125000, 200000, 250000, 500000, and 1000000. (Not yet confirmed by test.)
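
The values allowed by the two constraints in the quoted help text can be enumerated directly. This sketch checks only those two stated constraints (multiple of 1000, divides 1 million); whether Mlucas imposes anything further is not something it can tell.

```python
def legal_check_intervals(multiple_of=1000, must_divide=1_000_000):
    """Values that are a multiple of `multiple_of` and divide `must_divide`."""
    return [v for v in range(multiple_of, must_divide + 1, multiple_of)
            if must_divide % v == 0]


if __name__ == "__main__":
    print(legal_check_intervals())
```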


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-08-07 at 16:20
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
