2019-05-22, 19:50 | #1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×19×107 Posts |
Statistics and statistical properties
This is a reference thread regarding statistics of GIMPS results and the expected and observed statistical properties of the various computation results. Please use the reference discussion thread for comments https://www.mersenneforum.org/showthread.php?t=23383
Table of Contents
See also https://www.mersenneforum.org/showpo...40&postcount=4 regarding error rates of primality tests. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-11-28 at 16:29 |
2019-05-22, 19:50 | #2 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6099_{10} Posts |
Statistical properties of categories of GIMPS results and interim results
Statistical properties of categories of GIMPS results and interim results
(originally posted at https://www.mersenneforum.org/showpo...32&postcount=1) I've been thinking lately, about what the expected probability distributions are for various categories of outputs, in GIMPS computing runs, how the distributions may be affected by various types of errors, and whether or when it might be feasible to detect the difference. All the following is in regard to p>>64, such as at the current GIMPS wavefronts of primality testing and double checking, and up to p<2^{32}. In perl, the table for gpu applications of believed-bad res64 values is currently %badresidues=( 'cllucas', '0x0000000000000002, 0xffffffff80000000', 'cudalucas', '0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffd', 'cudapm1', '0x0000000000000000, 0x0000000000000001, 0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff, '. '0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff, 0xfffffffbfffdfffe, 0xfffffffbfffdffff, '. '0xfffffffbfffffffe, 0xfffffffbffffffff, 0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff', 'gpuowl', '0x0000000000000000', 'mfaktc', '', 'mfakto', '' ); For cpu-based primality testers, interim res64 values 0x00, 0x02, and 0xfffffffffffffffd are regarded as bad LL values, and 0x00 is regarded as a bad PRP res64 value. However, in https://www.mersenneforum.org/showth...?t=5862&page=2 we see that res64 values 0x000000000000 or 0xffffffffffffffff occur sometimes legitimately, as in the last iteration before the LL test identifies a Mersenne prime. a) LL final residues: 0 if Mp is a prime and the computation went correctly. Certain error conditions generate spurious 0x02 or 0x0 or 0xffffffff80000000 or 0xfffffffffffffffd sequences or final res64. Some of these are application specific. Composite mersenne numbers' final LL full res are I think expected to be uniformly distributed in 0<res<Mp, so res64 are expected to be (almost) uniformly distributed 0<res64<ffffffffffffffff. (res64 would have a slightly lowered probability of ffffffffffffffff because res64(Mp)=ffffffffffffffff is excluded.) Which would mean final res64 values for composite mersenne numbers are likely to be, but not guaranteed to be, unique over p<2^{32}. There are 203,280,221 primes below 4294967296. The res64 field has 18,446,744,073,709,551,616 possible values. If all but the last are found and are unique, the last one has a 1.1 x 10^{-11} chance of coinciding with one of them. I estimate there is a 0.11% chance of any of the res64 values of composite Mersennes in that range coinciding. There are 50,847,534 primes below 10^{9}. I estimate there is a 0.00006 chance of any of the res64 values of composite Mersennes below p=10^{9} coinciding. So any observed coincidences, especially multiple coincidences, should be looked at closely as clues of possible hardware or software or reporting flaws. The historical rate of LL final residue error per LL test is ~1% with the Jacobi check, ~2% without, at the wavefront. b) LL interim residues: for a given seed value, there's initially a lockstep sequence, until the modulo Mp begins to make the values different for each p. For seed 4, with sufficiently high p (near 2^{32}), there is an iteration that would correctly produce res64 0x0000000000000002. Which coincides with an error condition check value. At some point after the mod Mp starts having effect, the interim res64 values start to look pseudorandom to us mere mortals. Are they, in the absence of various error conditions we already have checks for, and other than the "penultimate" values associated with Mersenne primes, equally likely, over a large number of exponents and iterations, 0<=res64<=ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect. I vaguely recall some carry operations are deferred in fft multiplies. Which should not affect final residues, but means we must be careful about interim residues used for validity tests or screen output.) c) PRP3 Situations similar to the preceding arise for PRP3, although it depends on base, residue type, etc. There's less incentive for any statistical evaluation there since the Gerbicz error check is so good. Interim PRP res64 can range 0<=res64<0xffffffffffffffff (<=ffffffffffffffff with deferred carry) with uniform distribution (again, except for the pesky Mp res64 exclusion and deferred carry effect) A rough estimation of the residual PRP3 result error rate is the following. From 2019-08-12 to 2021-08-12, above exponent 77M, there are 3 confirmed erroneous PRP results, and 123,519 verified PRP results. In 50M-77M for the same period, there are 751 verified PRP results and 0 erroneous. Overall, that's an error rate of approximately 3/(123522 + 751 ) ~24. ppm in practice (with high uncertainty due to the very small number of errors). d) P-1 res64 Like the primality tests, P-1 factoring such as in CUDAPm1 produces periodically res64 outputs, both in stage 1 and in stage 2. Certain error condition values have been found to apply. Some error conditions have been observed to cause arbitrary res64 values to repeat at every output to console in stage 2. On one gpu model and P-1 run, I observed a sort of cyclic pattern of several res64 values which was mostly composed of hex f and consisted only of the 5 hex digits 7bdef (the only hex chars with at most one zero bit; 0111, 1011, 1101, 1110, 1111). In the absence of errors, is the res64 in P-1 stage 2 expected to be a uniform distribution over 0<=res64<ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect) Similarly, for stage 1? I think they are, because they are from computations mod Mp. Res64 values 0x00, 0x01, 0x02, and 0xffffffffffffffff appear to be rare but legal values, providing they do not monotonously repeat. As noted above, CUDAPm1 has res64 output values in both stage 1 and stage 2. Per Mihai, gpuowl is different and does not have meaningful res64 values during stage 2. It does output res64 values during stage 1. e) Factor found The distribution of factors found is nonuniform. k=1 is most common. The probability of prime f=2kp+1 being a prime factor of Mp is 1/f, which for given p is nearly proportional to 1/k. There's also the f= 1 or 7 mod 8 requirement, which derives from k=0 or -p mod 4, per the Wagstaff paper Divisors of Mersenne Numbers. f) No factor found The probability distribution of remainders in trial factoring where no factor is found is uniform, but over a different range 1 to f-1 for each candidate factor f. Since candidate factors are of form f=2kp+1, they are different from one exponent p to the next, so not reusable. If a remainder is detected >= f, that would be a sign of error probably in the mod function. Questions: 1) What errors do you see in the above? 2) What potentially useful statistical behavior is omitted? Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-08-25 at 18:17 Reason: added comments on tf candidate factors & remainders |
2019-05-22, 20:00 | #3 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×19×107 Posts |
Statistics of Mersenne number factors found
A table of numbers of exponents with 0 to 7 factors under 2^65 is available at https://www.mersenneforum.org/showpo...1&postcount=46
Fully factored exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=62 And the ratio rises as they are screened for p > increasing thresholds. Semiprime exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...5&postcount=69 And the ratio rises to 2:1 as they are screened for p > increasing thresholds. Exponents p = 3 mod 4 have slightly more factors found, than p= 1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=77 Also f=1 mod 8 has fewer factors found than f=7 mod 8, by about 12%. https://www.mersenneforum.org/showpo...6&postcount=80 Combining consideration of f = 1 or 7 mod 8 and 1 or 5 mod 6, f = 1, 7, 17, or 23 mod 24. "for exponents p=3 (mod 4), factors that are f=23 (mod 24) are found about one-third more frequently than the other kinds." https://www.mersenneforum.org/showpo...7&postcount=85 Known factors and other info can be downloaded from links at https://www.mersenneforum.org/showpo...1&postcount=11 for p<10^{9}; for above, at https://www.mersenneforum.org/showpo...54&postcount=9 Wagstaff paper: Divisors of Mersenne Numbers, MATHEMATICS OF COMPUTATION VOLUME 40, NUMBER 161 JANUARY 1983. PAGES 385-397 http://www.ams.org/journals/mcom/198...-0679454-X.pdf Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-12-14 at 15:50 |
2019-05-22, 20:03 | #4 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13723_{8} Posts |
Statistics of Mersenne Prime Exponents
Begin with the content of https://primes.utm.edu/notes/faq/NextMersenne.html
We expect slightly less than 6 Mersenne primes in a 10:1 interval on exponent. M82589933 is part of an unusual run of (at least) 13 in the 10^{7} to 10^{8} interval. "For comparison: 1 to 10: 4 10 to 100: 6 100 to 1000: 4 1000 to 10000: 8 10000 to 100000: 6 100000 to 1000000: 5 1000000 to 10000000: 5" https://www.mersenneforum.org/showpo...70&postcount=5 (The next decade after the above, 10^{7} to 10^{8 }is not completely double checked yet. In 10^{8} to 10^{9} there are relatively few tested so far, with a large amount of further computation to do.) p for Mp is 1 mod 8 at twice the frequency of 3, 5, or 7 mod 8 up to Mp50. https://www.mersenneforum.org/showpo...2&postcount=53 The Wagstaff conjecture predicts about 57 Mp below p=10^{9}. https://www.mersenneforum.org/showpo...7&postcount=55 Of the known Mersenne primes, many more are p=1 mod 4 (31) than p=3 mod 4 (19). https://www.mersenneforum.org/showpo...8&postcount=59 For #51, p= 82589933 = 1 mod 4, so 31 (60.78%) vs 19 (37.25%) 3 mod 4 and 1 (1.96%) of 2 mod 4. 31/19 ~ 1.6316. At exponents > 1000, it's ~2:1. Mersenne primes are twice as likely to be p=1 mod 4 as 3 mod 4 conjectured, with some empirical tabular support. Within p=1 (mod 4), Mersenne primes are twice as likely to have p=1 (mod 8) rather than p=5 (mod 8) https://www.mersenneforum.org/showpo...5&postcount=71 (added following 2021-08-29) There's naturally a somewhat inverse relationship between a) which p mod whatever have more found factors, and b) which p mod whatever have more found Mersenne primes. I think I may have seen it posted elsewhere, but can't find it now, that exponent p mod 24 shows considerable variation in incidence of known Mersenne primes. Code:
P = 1 mod 24 8 15.69% P = 2 mod 24 1 1.96% P = 3 mod 24 1 1.96% P = 5 mod 24 9 17.65% P = 7 mod 24 8 15.69% P = 11 mod 24 3 5.88% P = 13 mod 24 3 5.88% P = 17 mod 24 11 21.57% P = 19 mod 24 5 9.80% P = 23 mod 24 2 3.92% total 51 Code:
p count 1 0 2 1 3 1 5 1 7 4 11 1 13 2 17 3 19 1 23 0 Code:
p count p n 1 8 5 8 7 4 11 2 13 1 17 8 19 4 23 2 If we break it down further to p mod 120, we get too few known primes per bin to show much. Code:
p mod 120 of the 51 known Mersenne primes p count 1 3 2 1 3 1 5 1 7 3 13 1 17 4 19 1 29 1 31 1 37 1 (15 excluding 2,3,5 in p mod 120 from 0 to 40) 41 2 43 1 49 2 53 3 61 1 67 1 71 1 77 1 79 1 (13 in p mod 120 from 40 to 80) 83 1 89 3 91 2 97 3 101 3 103 3 107 2 113 2 119 1 (20 in p mod 120 from 80 to 120) Last fiddled with by kriesel on 2021-12-13 at 13:05 Reason: minor edits |
2019-07-08, 01:50 | #5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·19·107 Posts |
Probability estimates for trial factors in same bit level or class
The odds of finding a factor in a bit level of trial factoring are ln(bitlevel/(bitlevel-1)), or approx. 1/(bitlevel - 1/2) (better than 18 ppm at bitlevel 70, 15 ppm at bitlevel 76, 12 ppm at 86). These estimates are probabilities for completing the entire bit level, not quitting after the first class containing a factor.
What are the odds of finding more than one trial factor in a bit level? per https://mersenneforum.org/showpost.p...postcount=1147 no factors: (bitlevel-1)/bitlevel one factor: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1))) Two factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{2} / 2 N factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{n} / n! The probability of finding n factors declines quickly with increasing n. In the same class of the same bit level? Two factors/bit-level case This case appears likely to occur many thousands of times in p<10^{9}. If two factors or more are found in the same bit level, it doesn't matter which class the first one occupies. Assuming the class location of the next factor found is independent, there's a one in c chance it coincides with the class of the first factor, where c is the ACTIVE number of classes. For MORE_CLASSES, the 2x2x3x5x7x11 = 4620 classes are winnowed down to 960 by whole-class presieving for very small primes, so c=960; for LESS_CLASSES, the 420 classes are winnowed down to 96, so c=96. LESS_CLASSES: 96/420 = 1/2 x 2/3 x 4/5 x 6/7 = 48 / 210 = ~.2286 MORE_CLASSES: 960/4620= 1/2 x 2/3 x 4/5 x 6/7 x 10/11 = 480 / 2310 =~ .2078 (9.1% smaller) hypothetical EXTRA_CLASSES: 11520 / 60060 = 1/2 x 2/3 x 4/5 x 6/7 x 10/11 x 12/13 = 5760 / 30030 = ~.1918 (7.7% smaller) The chance of two factors both in the same class is then (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{2} / 2 / c. For 75 bits, 74/75 * (ln(75/74))^{2}/2/c = 8.89E-5/c; c=96 for less-classes so 9.26E-7, or c=960 for more-classes so 9.26E-8 (using Fortran style exponential notation). If two factors are found in the same bit level, the chance of them landing in the same class is 1/c, and the chance of different classes 1-1/c. (1/c +(1-1/c) = 1, sum of probabilities = 1, check.) Three factors/bit-level case This case appears likely to occur hundreds of times in p<10^{9}. If three factors are found in the same bit level, the chance of none of them coinciding in the same class is p0= (c-1)/c * (c-2)/c = (c^{2}-3c+2)/c^{2} = 1- (3c-2)/c^{2} The chance of any two of them landing in the same class is the second one matches the first but the third does not, or the first two do not match but the third matches one of them, p2 = 1/c (c-1)/c +(c-1)/c 2/c = (c-1)/c^{2} + (2c-2)/c^{2} = 3(c-1)/c^{2} while the chance of all 3 landing in the same class is 1/c^{2}, for a combined chance of p3 = (3c-2)/c^{2}. (p0 +p2 +p3 = 1- (3c-2)/c^{2} + 3(c-1)/c^{2} + 1/c^{2} = 1, check.) For less-classes the chance of 2 or more landing in the same class is ( 3 x 96 - 2 ) / 96^{2} =~ .031033 while for more-classes it's (3 x 960 - 2 ) / 960^{2} =~ .0031228 chance of coincident classes. The probability of 3 factors is (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{3} / 3! For bitlevel 75, p=74/75 * (ln(75/74))^3 / 3! = 3.978E-7, so per factoring attempt, less-classes yields a chance of coincident factors by 3 found of 1.233E-8 while more-classes yields 1.241E-9. The combinations of two or three factors per bit level coinciding in a class are then, for 75 bits, Less-classes: 9.26E-7 + 1.233E-8 = 9.38E-7 More-classes: 9.26E-8 + 1.241E-9 = 9.38E-8 The odds of four or more are not zero, but they fall to the right of the rounding of the above figures. Note, for three or more factors per bit level, these are not binomial distributions (since probabilities of matching are not constant with repeated trials) Four factors/bit-level case This case appears likely to occur few times in p < 10^{9}. If four factors are found in the same bit level, the chance of each falling in a separate class is the chance of 3 successive misses. The chance of any two of them landing in the same class is, the second matches the first but the third and fourth do not, otherwise the third matches the first or second but the others do not, or otherwise the fourth matches one of the preceding 3 but the first three did not match each other. The chance of 3 coinciding is 4-fold; 123, 124, 134, or 234; 1/c^{2} * (c-1)/c + (1/c)(c-1)/c * 1/c + (c-1)/c(1/c)^{2} + (c-1)/c(1/c)^{2}= 4(c-1)/c^{3} The chance of all four coinciding is 1/c^{3}. no coincidences: p0= (c-1)/c (c-2)/c (c-3)/c = (c-1)(c-2)(c-3)/c^{3} = 1 - (6 c^{2} -11c +6)/c^{3}) coincidence of just 2: p2= 1/c (c-1)(c-2)/c^{2} + (c-1)(c-2)/c^{2} * 2 / c + 3/c *(c-1)(c-2)/c^{2} = 6(c-1)(c-2)/c^{3} coincidence of 3 in a class: p3= 4(c-1)/c^{3} coincidence of 4 in the same class: p4= 1/c^{3} The probability sums for the four-factors cases listed above don't add up to probability one, so something's not quite right yet. P0+p2+p3+p4 = 1 -3(c-1)/c^{3} instead of one. There's also a slight possibility of a double coincidence of two, which could occur one of three ways. A) First and second match, third misses first/second but third and fourth match. p22a=1/c * (c-1)/c * 1/c for this subcase. B) First and third match, second and fourth match, first and second do not match. p22b=p22a. C) First and fourth match, second and third match, first and second do not match. p22c=p22a. Total, p22=3(c-1)/c^{3} In total, chances of 2 or more coinciding: 6(c-1)(c-2)/c^{3} + 4(c-1)/c^{3} + 1/c^{3} +3(c-1)/c^{3} = ( 6(c-1)(c-2) + 4(c-1) + 1 +3(c-1)) / c^{3} = ( 6c^{2} -18c +12 +7c -7 +1 ) / c^{3} = (6 c^{2} -11c +6) / c^{3} none: c=96, p0=0.938687; c=960, p= 0.9937619 2 coincide: c=96, p2=0.060560; c=960, p2= 0.0062305 3: c=96, p3=4.295e-4; c=960, p3= 4.34E-6 4: c=96, p4=1.13e-6; c=960, p4= 1.13E-9 dual-2: c=96, p22=0.000322; c=960, p22=3.25e-6 2 to 4: c=96, pn=0.061313; c=960, pn= 0.0062381 c=96, p0+pn= 1, check; c=960, p0+pn= 1, check. More than four factors/bit-level cases These cases are very unlikely to occur in p<10^{9} and so are not calculated here. They also rapidly become more complex, as illustrated by the numbers of coincidence subcases possible versus number of factors found rises. 5: 2; 3; 4; 5; 22; 23 6: 2; 3; 4; 5; 6; 22; 23; 24; 33; 222 7: 2; 3; 4; 5; 6; 7; 22; 23; 24; 25; 33; 34; 222; 223 8: 2; 3; 4; 5; 6; 7; 8; 22; 23; 24; 25; 26; 33; 34; 35; 44; 222; 223; 224; 233; 2222 9: 2; 3; 4; 5; 6; 7; 8; 9; 22; 23; 24; 25; 26; 27; 33; 34; 35; 36; 44; 45; 222; 223; 224; 225; 233; 234; 333; 2222; 2223 10: 2; 3; 4; 5; 6; 7; 8; 9; 10; 22; 23; 24; 25; 26; 27; 28; 33; 34; 35; 36; 37; 44; 45; 46; 55; 222; 223; 224; 225; 226; 233; 234; 235; 244; 333; 334; 2222; 2223; 2224; 2233; 22222 Summary Odds are estimated for a variety of bit levels and number of factors per level or class in the attachment. Odds are sufficient that in p<10^{9}, up to 2 factors per class seems likely to occur sometime (dozens to thousands of occurrences depending on factoring depth and frequency of use of less-classes), but 5 or more factors per class occurring sometime is unlikely; 3 or more is unlikely from 65 bits upward. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-12-13 at 13:10 |
2020-09-17, 17:51 | #6 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·19·107 Posts |
Benford's Law
Benford's Law https://en.wikipedia.org/wiki/Benford's_law comes from observations that many cases of observed or measured sets of numbers have a nonuniform distribution of leading digits. That is, 1 is the most common leading digit, at 30.1% rather than 11.1%,and 9 the least, at 4.6% rather than 11.1%, for base ten numbers. It also applies to second and third digits, and to expression in other number bases. Numbers following Benford's law have digit probabilities based on the logarithm of the numbers. It requires that the numbers are distributed over several orders of magnitude, which at 2 to 82589933, Mersenne prime exponents certainly are, as well as the Mersenne primes' number of decimal digits or digits in other number bases. Log_{10}(82589933/2)=7.616; log_{10}(24862048/1) =7.396.
Benford's law is one of the tests used to detect fabricated numbers, such as in tax or accounting or research fraud. The numbers that people think are random when faking values generally are not random, generally avoiding round numbers, repeated digits, ascending or descending series, palindromes, etc. and may even reveal individuals' distinct tendencies. (I have no reason to suspect fakery in the Mersenne primes that have been repeatedly checked, so expected them to exhibit usual probability behavior including probable deviations from expected statistical values, with significant deviation expected due to low number of samples available.) Considering the 51 known Mersenne primes, in base ten, their exponents follow Benford's law quite closely, with occurrence within 10% of expected, usually within +-1. See also https://mersenneforum.org/showpost.p...9&postcount=47 But not so for the number of digits in the decimal expression of the Mersenne primes. https://www.mersenne.org/primes/ In decimal, the leading digit 4 is noticeably underrepresented, 5 is absent, and 6 is substantially overrepresented. The absence of a leading 5 seems to line up with some of the larger ratio gaps between successive exponents. Based on the Lenstra & Pomerance conjecture we expect on average a ratio of ~1.47576 on exponent. (https://primes.utm.edu/notes/faq/NextMersenne.html) Mersenne primes with exponents 127 and 521 (exponent ratio 4.102) have 39 and 157 decimal digits respectively; 1279 and 2203 (exponent ratio 1.722) have 386 and 664; 11213 and 19937 (exponent ratio 1.778) have 3376 and 6002 digits; 132049 and 216091 (exponent ratio 1.636) have 39751 and 65050; 1398269 and 2976221 (exponent ratio 2.129) have 420921 and 895932; 13466917 and 20996011 (exponent ratio 1.559) have 4053946 and 6320430. ALL the ranges where we could look for a known Mersenne prime with a leading 5 in the number of decimal digits have greater than expected average gaps present, and no Mersenne prime, except for the first, 13 and 17, exponent ratio 1.308, having 4 and 6 digits respectively. Average observed ratio for gaps excluding 5 as a leading digit of the number of digits is 2.033, about 38% larger than the expected average ratio. So let's convert that list of decimal number of decimal digits into hexadecimal. And octal. Following are for each known Mersenne prime, the sequence number, exponent p, decimal number of decimal digits, of the Mersenne prime, and hexadecimal and octal representations of number of decimal digits; Code:
# p digits (hex) (octal) 1 2 1 0x1 o1 2 3 1 0x1 o1 3 5 2 0x2 o2 4 7 3 0x3 o3 5 13 4 0x4 o4 6 17 6 0x6 o6 7 19 6 0x6 o6 8 31 10 0xA o12 9 61 19 0x13 o23 10 89 27 0x1B o33 11 107 33 0x21 o41 12 127 39 0x27 o47 13 521 157 0x9D o235 14 607 183 0xB7 o267 15 1279 386 0x182 o602 16 2203 664 0x298 o1230 17 2281 687 0x2AF o1257 18 3217 969 0x3C9 o1711 19 4253 1281 0x501 o2401 20 4423 1332 0x534 o2464 21 9689 2917 0xB65 o5545 22 9941 2993 0xBB1 o5661 23 11213 3376 0xD30 o6460 24 19937 6002 0x1772 o13562 25 21701 6533 0x1985 o14605 26 23209 6987 0x1B4B o15513 27 44497 13395 0x3453 o32123 28 86243 25962 0x656A o62552 29 110503 33265 0x81F1 o100761 30 132049 39751 0x9B47 o115507 31 216091 65050 0xFE1A o177032 32 756839 227832 0x379F8 o674770 33 859433 258716 0x3F29C o771234 34 1257787 378632 0x5C708 o1343410 35 1398269 420921 0x66C39 o1466071 36 2976221 895932 0xDABBC o3325674 37 3021377 909526 0xDE0D6 o3360326 38 6972593 2098960 0x200710 o10003420 39 13466917 4053946 0x3DDBBA o17355672 40 20996011 6320430 0x60712E o30070456 41 24036583 7235733 0x6E6895 o33464225 42 25964951 7816230 0x774426 o35642046 43 30402457 9152052 0x8BA634 o42723064 44 32582657 9808358 0x95A9E6 o45324746 45 37156667 11185272 0xAAAC78 o52526170 46 42643801 12837064 0xC3E0C8 o60760310 47 43112609 12978189 0xC6080D o61404015 48 57885161 17425170 0x109E312 o102361422 49* 74207281 22338618 0x154DC3A o125156072 50* 77232917 23249425 0x162C211 o130541021 51* 82589933 24862048 0x17B5D60 o136656540 The substantial over-representation of leading digit 6 in bases 8, 10 and 16 seems odd. If there's some reason for that, other than statistics in low population sizes, please comment in a discussion thread. Maybe it is surprising, maybe it isn't. Consider that if there's one digit that is the most over-represented, in one base, by pure randomness, its chance of being the most over-represented in a second base also seems to be 1/base. That's only mildly low for 10 or 16, not very low. The usual test criterion, for rejecting that the difference between observed and expected, is less than 5% probability from chance, is not an issue in any of the Pearson chi-squared cases tabulated, although one is borderline. The 3 measures of x^{2}, m and d gave widely different results for the same case. An interesting paper on Benford's law and Mersenne numbers is here. It advises against using powers of two as the number base. Oops. Note to self: pursue further; perhaps try to calculate probability distribution for the various numbers of the different digits. It's not as "simple" as a single binomial distribution. Purpose is to put some numbers to how likely or unlikely the extent of digit-6 over-representation is in either base 10 or 16. Or some study of number theory and statistics. There's a discussion thread I began, in which jwaltos makes some recommendations. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-12-28 at 04:21 Reason: minor edits for style |
2021-11-28, 16:18 | #7 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×19×107 Posts |
Discovery dates
Discovery dates can be found at https://www.mersenne.org/primes/. Other than patterns such as varying computer usage & shutdown at U of Central Missouri and elsewhere, relating to academic year schedules, cooling load reduction in local summer by GIMPS participants with system fleets, or in the early computer age, multiple discoveries in the same computer run on the same day, this may be largely an exercise in numerology. A linear fit through month-of-year of discovery is almost level.
Of the 41 known Mersenne primes with discovery date known to the day, none fall in July and only one each in March and April. January contained the most with 8, followed closely by September with 7. Average 41/12 ~ 3.4167. December is among the months below the average, with 3. The number of coincident day of year for known Mersenne primes with known discovery dates, after excluding the few that were found the same day as another with the same computer runs, closely matches what we would expect based on uniform distribution probability used in a variant of the birthday problem. Day of year values are 4 7 10 25 27 28 30 40 49 50 63 98 131 136 152 153 155 162 177 236 244 247 250 251 262 268 281 303 307 318 321 341 349 360. That's 34 distinct values, in 41 data points. There are 7 occurrences of coincident day-of-year. All are pairs; no triplets or higher. Three of these are because the discoveries were part of the same computer runs (M521 and M607; M2203 and M2281; M4253 and M4423) The other four are: 136 (M9941 and M24036583; 1963-05-16 and 2004-05-15 respectively) 236 (M2976221 and M43112609; 1997-08-24 and 2008-08-23 respectively) 247 (M1257787 and M32582657; 1996-09-03 and 2006-09-04 respectively) 318 (M1398269 and M13466917; 1996-11-13 and 2001-11-14 respectively) Oddly, there's a leap year in each pair. Leap years occupy ~ 366 days / (3*365 + 366) ~ 0.2505 of the days. The chance of no leap year in a pair is 0.7495^2 = 0.5617. So the chance of a leap year or two in a pair is ~1-0.5617 = 0.4383, the chance of all 4 pairs having at least one ~0.4383^4 ~0.037. (About 1 in 27.) Removing the 3 coinciding from same computer run, there's a population of 38 for the birthday problem. https://en.wikipedia.org/wiki/Birthday_problem Probability of none of those coinciding is 365! / 365^38 / (365-38)! 365! / 365^38 = 1077 993899 058890 252173 100706 114541 143916 512152 957947 286609 973670 019169 779122 584416 099590 377943 631310 997513 116110 699265 105698 186698 385792 917984 305346 986847 512851 707730 010490 884345 828687 060311 348620 648301 007240 161633 613317 283801 958782 397371 881938 704070 024717 151944 821228 740463 295190 564065 350434 708894 402531 157116 022806 846589 206539 086012 029042 586988 719041 809142 247854 748662 282749 727053 971216 563774 994858 534976 722754 486856 121229 875073 511249 498817 402667 349255 610337 045985 624777 215765 056746 744016 761503 389810 832557 755142 695389 766234 257843 417672 476804 083131 579968 477472 992761 604106 384139 340942 445146 623284 991258 875418 717299 355093 070233 864700 102342 807155 822354 628676 216382 611005 120745 566266 (682 digits) (365-38)! = 7930 380485 625411 633272 752415 295302 944886 366881 107611 733184 778480 029858 446765 003671 107763 861600 076825 701830 492192 305853 377965 780965 780340 295589 285869 163190 000632 259847 105736 589079 366709 441983 671826 381676 908043 733517 173277 654192 469974 631710 000299 280179 293148 780381 827478 828043 017396 020782 746400 173696 621067 589383 048659 562023 245288 789026 103352 264098 180763 809219 461866 535802 948393 706482 413289 714758 369362 634127 590333 890813 586362 980348 068695 605432 887817 041395 440138 493890 651585 513618 213068 893413 473645 332365 404125 778003 725586 312053 833477 790233 420298 884836 001508 422974 271903 924732 400848 063020 055739 872208 486400 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 (682 digits) ratio is ~1077993899 / 7930380485 ~ 0.13593. So probability of some coinciding is 1-0.13593 ~ 0.86407. Expected number of discovery dates that coincide with some other while determined independently is 38 - 38 * (364/365)^37 = 3.668, which is well matched by the 4 we find. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-12-13 at 12:58 |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Statistical properties of categories of GIMPS results and interim results | kriesel | Probability & Probabilistic Number Theory | 1 | 2019-05-22 22:59 |
Statistical help | henryzz | Homework Help | 5 | 2013-03-25 11:26 |
Statistical odds for prime in Wagstaff vs Mersenne | diep | Math | 27 | 2010-01-13 20:18 |
Cpu Properties | mack | Information & Answers | 2 | 2009-09-07 05:48 |
Statistical Investigation of Large Digit Mersenne Primes . . . (please don't bite me) | 9021951 | Math | 29 | 2005-03-15 16:26 |