2019-05-22, 19:50 | #1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×23×67 Posts |
Statistics and statistical properties
This is a reference thread regarding statistics of GIMPS results and the expected and observed statistical properties of the various computation results. Please use the reference discussion thread for comments https://www.mersenneforum.org/showthread.php?t=23383
Table of Contents 1. This post 2. Statistical properties of categories of GIMPS results and interim results https://www.mersenneforum.org/showpo...98&postcount=2 3. Statistics of Mersenne number factors found https://www.mersenneforum.org/showpo...99&postcount=3 4. Statistics of Mersenne Prime Exponents https://www.mersenneforum.org/showpo...00&postcount=4 5. Probability estimates for trial factors in same bit level or class https://www.mersenneforum.org/showpo...82&postcount=5 6. Benford's Law https://www.mersenneforum.org/showpo...35&postcount=6 etc tbd See also https://www.mersenneforum.org/showpo...40&postcount=4 regarding error rates of primality tests. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-09-17 at 17:53 |
2019-05-22, 19:50 | #2 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001000001111_{2} Posts |
Statistical properties of categories of GIMPS results and interim results
Statistical properties of categories of GIMPS results and interim results
(originally posted at https://www.mersenneforum.org/showpo...32&postcount=1) I've been thinking lately, about what the expected probability distributions are for various categories of outputs, in GIMPS computing runs, how the distributions may be affected by various types of errors, and whether or when it might be feasible to detect the difference. All the following is in regard to p>>64, such as at the current GIMPS wavefronts of primality testing and double checking, and up to p<2^{32}. In perl, the table for gpu applications of believed-bad res64 values is currently %badresidues=( 'cllucas', '0x0000000000000002, 0xffffffff80000000', 'cudalucas', '0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffd', 'cudapm1', '0x0000000000000000, 0x0000000000000001, 0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff, '. '0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff, 0xfffffffbfffdfffe, 0xfffffffbfffdffff, '. '0xfffffffbfffffffe, 0xfffffffbffffffff, 0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff', 'gpuowl', '0x0000000000000000', 'mfaktc', '', 'mfakto', '' ); For cpu-based primality testers, interim res64 values 0x00, 0x02, and 0xfffffffffffffffd are regarded as bad LL values, and 0x00 is regarded as a bad PRP res64 value. However, in https://www.mersenneforum.org/showth...?t=5862&page=2 we see that res64 values 0x000000000000 or 0xffffffffffffffff occur sometimes legitimately, as in the last iteration before the LL test identifies a Mersenne prime. a) LL final residues: 0 if Mp is a prime and the computation went correctly. Certain error conditions generate spurious 0x02 or 0x0 or 0xffffffff80000000 or 0xfffffffffffffffd sequences or final res64. Some of these are application specific. Composite mersenne numbers' final LL full res are I think expected to be uniformly distributed in 0<res<Mp, so res64 are expected to be (almost) uniformly distributed 0<res64<ffffffffffffffff. (res64 would have a slightly lowered probability of ffffffffffffffff because res64(Mp)=ffffffffffffffff is excluded.) Which would mean final res64 values for composite mersenne numbers are likely to be, but not guaranteed to be, unique over p<2^{32}. There are 203,280,221 primes below 4294967296. The res64 field has 18,446,744,073,709,551,616 possible values. If all but the last are found and are unique, the last one has a 1.1 x 10^{-11} chance of coinciding with one of them. I estimate there is a 0.11% chance of any of the res64 values of composite Mersennes in that range coinciding. There are 50,847,534 primes below 10^{9}. I estimate there is a 0.00006 chance of any of the res64 values of composite Mersennes below p=10^{9} coinciding. So any observed coincidences, especially multiple coincidences, should be looked at closely as clues of possible hardware or software or reporting flaws b) LL interim residues: for a given seed value, there's initially a lockstep sequence, until the modulo Mp begins to make the values different for each p. For seed 4, with sufficiently high p (near 2^{32}), there is an iteration that would correctly produce res64 0x0000000000000002. Which coincides with an error condition check value. At some point after the mod Mp starts having effect, the interim res64 values start to look pseudorandom to us mere mortals. Are they, in the absence of various error conditions we already have checks for, and other than the "penultimate" values associated with Mersenne primes, equally likely, over a large number of exponents and iterations, 0<=res64<=ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect. I vaguely recall some carry operations are deferred in fft multiplies. Which should not affect final residues, but means we must be careful about interim residues used for validity tests or screen output.) c) PRP3 Situations similar to the preceding arise for PRP3, although it depends on base, residue type, etc. There's less incentive for any statistical evaluation there since the Gerbicz error check is so good. Interim PRP res64 can range 0<=res64<0xffffffffffffffff (<=ffffffffffffffff with deferred carry) with uniform distribution (again, except for the pesky Mp res64 exclusion and deferred carry effect) d) P-1 res64 Like the primality tests, P-1 factoring such as in CUDAPm1 produces periodically res64 outputs, both in stage 1 and in stage 2. Certain error condition values have been found to apply. Some error conditions have been observed to cause arbitrary res64 values to repeat at every output to console in stage 2. On one gpu model and P-1 run, I observed a sort of cyclic pattern of several res64 values which was mostly composed of hex f and consisted only of the 5 hex digits 7bdef (the only hex chars with at most one zero bit; 0111, 1011, 1101, 1110, 1111). In the absence of errors, is the res64 in P-1 stage 2 expected to be a uniform distribution over 0<=res64<ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect) Similarly, for stage 1? I think they are, because they are from computations mod Mp. Res64 values 0x00, 0x01, 0x02, and 0xffffffffffffffff appear to be rare but legal values, providing they do not monotonously repeat. As noted above, CUDAPm1 has res64 output values in both stage 1 and stage 2. Per Mihai, gpuowl is different and does not have meaningful res64 values during stage 2. It does output res64 values during stage 1. e) Factor found The distribution of factors found is nonuniform. k=1 is most common. The probability of prime f=2kp+1 being a prime factor of Mp is 1/f, which for given p is nearly proportional to 1/k. There's also the f= 1 or 7 mod 8 requirement, which derives from k=0 or -p mod 4, per the Wagstaff paper Divisors of Mersenne Numbers. f) No factor found The probability distribution of remainders in trial factoring where no factor is found is uniform, but over a different range 1 to f-1 for each candidate factor f. Questions: 1) What errors do you see in the above? 2) What potentially useful statistical behavior is omitted? Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-06-14 at 19:49 |
2019-05-22, 20:00 | #3 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×23×67 Posts |
Statistics of Mersenne number factors found
A table of numbers of exponents with 0 to 7 factors under 2^65 is available at https://www.mersenneforum.org/showpo...1&postcount=46
Fully factored exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=62 And the ratio rises as they are screened for p > increasing thresholds. Semiprime exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...5&postcount=69 And the ratio rises to 2:1 as they are screened for p > increasing thresholds. Exponents p = 3 mod 4 have slightly more factors found, than p= 1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=77 Also f=1 mod 8 has fewer factors found than f=7 mod 8, by about 12%. https://www.mersenneforum.org/showpo...6&postcount=80 Combining consideration of f = 1 or 7 mod 8 and 1 or 5 mod 6, f = 1, 7, 17, or 23 mod 24. "for exponents p=3 (mod 4), factors that are f=23 (mod 24) are found about one-third more frequently than the other kinds." https://www.mersenneforum.org/showpo...7&postcount=85 Known factors and other info can be downloaded from links at https://www.mersenneforum.org/showpo...1&postcount=11 for p<10^{9}; for above, at https://www.mersenneforum.org/showpo...54&postcount=9 Wagstaff paper: Divisors of Mersnne Numbers, MATHEMATICS OF COMPUTATION VOLUME 40, NUMBER 161 JANUARY 1983. PAGES 385-397 http://www.ams.org/journals/mcom/198...-0679454-X.pdf Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2019-11-19 at 06:35 |
2019-05-22, 20:03 | #4 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×23×67 Posts |
Statistics of Mersenne Prime Exponents
Begin with the content of https://primes.utm.edu/notes/faq/NextMersenne.html
We expect slightly less than 6 Mersenne primes in a 10:1 interval on exponent. M82589933 is part of an unusual run of (at least) 13 in the 10^{7} to 10^{8} interval. "For comparison: 1 to 10: 4 10 to 100: 6 100 to 1000: 4 1000 to 10000: 8 10000 to 100000: 6 100000 to 1000000: 5 1000000 to 10000000: 5" https://www.mersenneforum.org/showpo...70&postcount=5 (The next decade after the above, 10^{7} to 10^{8 }is not completely tested first time yet. 10^{8} to 10^{9} 0 so far, with a great deal of further computation to do.) p for Mp is 1 mod 8 at twice the frequency of 3, 5, or 7 mod 8 up to Mp50. https://www.mersenneforum.org/showpo...2&postcount=53 Wagstaff conjecture predicts about 57 Mp below p=10^9. https://www.mersenneforum.org/showpo...7&postcount=55 Of the known Mersenne primes, many more are p=1 mod 4 (31) than p=3 mod 4 (19). (Mp51 had not been announced yet) https://www.mersenneforum.org/showpo...8&postcount=59 For #51, p= 82589933 = 1 mod 4, so 32 vs 19 (1.68:1) Mersenne primes are twice as likely to be p=1 mod 4 as 3 mod 4 conjectured, with some empirical tabular support. Within p=1 (mod 4), Mersenne primes are twice as likely to have p=1 (mod 8) rather than p=5 (mod 8) https://www.mersenneforum.org/showpo...5&postcount=71 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-09-17 at 22:52 |
2019-07-08, 01:50 | #5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·23·67 Posts |
Probability estimates for trial factors in same bit level or class
The odds of finding a factor in a bit level of trial factoring are ln(bitlevel/(bitlevel-1)), or approx. 1/(bitlevel - 1/2) (better than 18 ppm at bitlevel 70, 15 ppm at bitlevel 76, 12 ppm at 86). These estimates are probabilities for completing the entire bit level, not quitting after the first class containing a factor.
What are the odds of finding more than one trial factor in a bit level? per https://mersenneforum.org/showpost.p...postcount=1147 no factors: (bitlevel-1)/bitlevel one factor: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1))) Two factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{2} / 2 N factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{n} / n! The probability of finding n factors declines quickly with increasing n. In the same class of the same bit level? Two factors/bit-level case This case appears likely to occur many thousands of times in p<10^{9}. If two factors or more are found in the same bit level, it doesn't matter which class the first one occupies. Assuming the class location of the next factor found is independent, there's a one in c chance it coincides with the class of the first factor, where c is the ACTIVE number of classes. For MORE_CLASSES, the 2x2x3x5x7x11 = 4620 classes are winnowed down to 960 by whole-class presieving for very small primes, so c=960; for LESS_CLASSES, the 420 classes are winnowed down to 96, so c=96. LESS_CLASSES: 96/420 = 1/2 x 2/3 x 4/5 x 6/7 = 48 / 210 = ~.2286 MORE_CLASSES: 960/4620= 1/2 x 2/3 x 4/5 x 6/7 x 10/11 = 480 / 2310 =~ .2078 (9.1% smaller) hypothetical EXTRA_CLASSES: 11520 / 60060 = 1/2 x 2/3 x 4/5 x 6/7 x 10/11 x 12/13 = 5760 / 30030 = ~.1918 (7.7% smaller) The chance of two factors both in the same class is then (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{2} / 2 / c. For 75 bits, 74/75 * (ln(75/74))^{2}/2/c = 8.89E-5/c; c=96 for less-classes so 9.26E-7, or c=960 for more-classes so 9.26E-8 (using Fortran style exponential notation). If two factors are found in the same bit level, the chance of them landing in the same class is 1/c, and the chance of different classes 1-1/c. (1/c +(1-1/c) = 1, sum of probabilities = 1, check.) Three factors/bit-level case This case appears likely to occur hundreds of times in p<10^{9}. If three factors are found in the same bit level, the chance of none of them coinciding in the same class is p0= (c-1)/c * (c-2)/c = (c^{2}-3c+2)/c^{2} = 1- (3c-2)/c^{2} The chance of any two of them landing in the same class is the second one matches the first but the third does not, or the first two do not match but the third matches one of them, 1/c (c-1)/c +(c-1)/c 2/c = (c-1)/c^{2} + (2c-2)/c^{2} = 3(c-1)/c^{2} while the chance of all 3 landing in the same class is 1/c^{2}, for a combined chance of (3c-2)/c^{2}. (p0 +p2 +p3 = 1- (3c-2)/c^{2} + 3(c-1)/c^{2} + 1/c^{2} = 1, check.) For less-classes the chance of 2 or more landing in the same class is ( 3 x 96 - 2 ) / 96^{2} =~ .031033 while for more-classes it's (3 x 960 - 2 ) / 960^{2} =~ .0031228 chance of coincident classes. The probability of 3 factors is (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))^{3} / 3! For bitlevel 75, p=74/75 * (ln(75/74))^3 / 3! = 3.978E-7, so per factoring attempt, less-classes yields a chance of coincident factors by 3 found of 1.233E-8 while more-classes yields 1.241E-9. The combinations of two or three factors per bit level coinciding in a class are then, for 75 bits, Less-classes: 9.26E-7 + 1.233E-8 = 9.38E-7 More-classes: 9.26E-8 + 1.241E-9 = 9.38E-8 The odds of four or more are not zero, but they fall to the right of the rounding of the above figures. Note, for three or more factors per bit level, these are not binomial distributions (since probabilities of matching are not constant with repeated trials) Four factors/bit-level case This case appears likely to occur few times in p<10^{9}. If four factors are found in the same bit level, the chance of each falling in a separate class is the chance of 3 successive misses. The chance of any two of them landing in the same class is, the second matches the first but the third and fourth do not, otherwise the third matches the first or second but the others do not, or otherwise the fourth matches one of the preceding 3 but the first three did not match each other. The chance of 3 coinciding is 4-fold; 123, 124, 134, or 234; 1/c^{2} * (c-1)/c + (1/c)(c-1)/c * 1/c + (c-1)/c(1/c)^{2} + (c-1)/c(1/c)^{2}= 4(c-1)/c^{3} The chance of all four coinciding is 1/c^{3}. no coincidences: p0= (c-1)/c (c-2)/c (c-3)/c = (c-1)(c-2)(c-3)/c^{3} = 1 - (6 c^{2} -11c +6)/c^{3}) coincidence of just 2: p2= 1/c (c-1)(c-2)/c^{2} + (c-1)(c-2)/c^{2} * 2 / c + 3/c *(c-1)(c-2)/c^{2} = 6(c-1)(c-2)/c^{3} coincidence of 3 in a class: p3= 4(c-1)/c^{3} coincidence of 4 in the same class: p4= 1/c^{3} The probability sums for the four-factors cases listed above don't add up to probability one, so something's not quite right yet. P0+p2+p3+p4 = 1 -3(c-1)/c^{3} instead of one. There's also a slight possibility of a double coincidence of two, which could occur one of three ways. A) First and second match, third misses first/second but third and fourth match. p22a=1/c * (c-1)/c * 1/c for this subcase. B) First and third match, second and fourth match, first and second do not match. p22b=p22a. C) First and fourth match, second and third match, first and second do not match. p22c=p22a. Total, p22=3(c-1)/c^{3} In total, chances of 2 or more coinciding: 6(c-1)(c-2)/c^{3} + 4(c-1)/c^{3} + 1/c^{3} +3(c-1)/c^{3} = ( 6(c-1)(c-2) + 4(c-1) + 1 +3(c-1)) / c^{3} = ( 6c^{2} -18c +12 +7c -7 +1 ) / c^{3} = (6 c^{2} -11c +6) / c^{3} none: c=96, p0=0.938687; c=960, p= 0.9937619 2 coincide: c=96, p2=0.060560; c=960, p2= 0.0062305 3: c=96, p3=4.295e-4; c=960, p3= 4.34E-6 4: c=96, p4=1.13e-6; c=960, p4= 1.13E-9 dual-2: c=96, p22=0.000322; c=960, p22=3.25e-6 2 to 4: c=96, pn=0.061313; c=960, pn= 0.0062381 c=96, p0+pn= 1, check; c=960, p0+pn= 1, check. More than four factors/bit-level cases These cases are very unlikely to occur in p<10^{9} and so are not calculated here. They also rapidly become more complex, as illustrated by the numbers of coincidence subcases possible versus number of factors found rises. 5: 2; 3; 4; 5; 22; 23 6: 2; 3; 4; 5; 6; 22; 23; 24; 33; 222 7: 2; 3; 4; 5; 6; 7; 22; 23; 24; 25; 33; 34; 222; 223 8: 2; 3; 4; 5; 6; 7; 8; 22; 23; 24; 25; 26; 33; 34; 35; 44; 222; 223; 224; 233; 2222 9: 2; 3; 4; 5; 6; 7; 8; 9; 22; 23; 24; 25; 26; 27; 33; 34; 35; 36; 44; 45; 222; 223; 224; 225; 233; 234; 333; 2222; 2223 10: 2; 3; 4; 5; 6; 7; 8; 9; 10; 22; 23; 24; 25; 26; 27; 28; 33; 34; 35; 36; 37; 44; 45; 46; 55; 222; 223; 224; 225; 226; 233; 234; 235; 244; 333; 334; 2222; 2223; 2224; 2233; 22222 Summary Odds are estimated for a variety of bit levels and number of factors per level or class in the attachment. Odds are sufficient that in p<10^{9}, up to 2 factors per class seems likely to occur sometime (dozens to thousands of occurrences depending on factoring depth and frequency of use of less-classes), but 5 or more factors per class occurring sometime is unlikely; 3 or more is unlikely from 65 bits upward. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-10-18 at 13:29 |
2020-09-17, 17:51 | #6 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×23×67 Posts |
Benford's Law
Benford's Law https://en.wikipedia.org/wiki/Benford's_law comes from observations that many cases of observed or measured sets of numbers have a nonuniform distribution of leading digits. That is, 1 is the most common leading digit, at 30.1% rather than 11.1%,and 9 the least, at 4.6% rather than 11.1%, for base ten numbers. It also applies to second and third digits, and to expression in other number bases. Numbers following Benford's law have digit probabilities based on the logarithm of the numbers. It requires that the numbers are distributed over several orders of magnitude, which at 2 to 82589933, Mersenne prime exponents certainly are, as well as the Mersenne primes' number of decimal digits or digits in other number bases. Log10(82589933/2)=7.616; log10(24862048/1) =7.396.
Benford's law is one of the tests used to detect fabricated numbers, such as in tax or accounting or research fraud. The numbers that people think are random when faking values generally are not random, generally avoiding round numbers, repeated digits, ascending or descending series, palindromes, etc. and may even reveal individuals' distinct tendencies. (I have no reason to suspect fakery in the Mersenne primes that have been repeatedly checked, so expected them to exhibit usual probability behavior including probable deviations from expected statistical values, with significant deviation expected due to low number of samples available.) Considering the 51 known Mersenne primes, in base ten, their exponents follow Benford's law quite closely, with occurrence within 10% of expected, usually within +-1. But not so for the number of digits in the decimal expression of the Mersenne primes. https://www.mersenne.org/primes/ In decimal, the leading digit 4 is noticeably underrepresented, 5 is absent, and 6 is substantially overrepresented. The absence of a leading 5 seems to line up with some of the larger ratio gaps between successive exponents. Based on the Lenstra & Pomerance conjecture we expect on average a ratio of ~1.47576 on exponent. (https://primes.utm.edu/notes/faq/NextMersenne.html) Mersenne primes with exponents 127 and 521 (exponent ratio 4.102) have 39 and 157 decimal digits respectively; 1279 and 2203 (exponent ratio 1.722) have 386 and 664; 11213 and 19937 (exponent ratio 1.778) have 3376 and 6002 digits; 132049 and 216091 (exponent ratio 1.636) have 39751 and 65050; 1398269 and 2976221 (exponent ratio 2.129) have 420921 and 895932; 13466917 and 20996011 (exponent ratio 1.559) have 4053946 and 6320430. ALL the ranges where we could look for a known Mersenne prime with a leading 5 in the number of decimal digits have greater than expected average gaps present, and no Mersenne prime, except for the first, 13 and 17, exponent ratio 1.308, having 4 and 6 digits respectively. Average observed ratio for gaps excluding 5 as a leading digit of the number of digits is 2.033, about 38% larger than the expected average ratio. So let's convert that list of decimal number of decimal digits into hexadecimal. Ahd octal. Following are for each known Mersenne prime, the sequence number, exponent p, decimal number of decimal digits, of the Mersenne prime, and hexadecimal and octal representations of number of decimal digits; Code:
# p digits (hex) (octal) 1 2 1 0x1 o1 2 3 1 0x1 o1 3 5 2 0x2 o2 4 7 3 0x3 o3 5 13 4 0x4 o4 6 17 6 0x6 o6 7 19 6 0x6 o6 8 31 10 0xA o12 9 61 19 0x13 o23 10 89 27 0x1B o33 11 107 33 0x21 o41 12 127 39 0x27 o47 13 521 157 0x9D o235 14 607 183 0xB7 o267 15 1279 386 0x182 o602 16 2203 664 0x298 o1230 17 2281 687 0x2AF o1257 18 3217 969 0x3C9 o1711 19 4253 1281 0x501 o2401 20 4423 1332 0x534 o2464 21 9689 2917 0xB65 o5545 22 9941 2993 0xBB1 o5661 23 11213 3376 0xD30 o6460 24 19937 6002 0x1772 o13562 25 21701 6533 0x1985 o14605 26 23209 6987 0x1B4B o15513 27 44497 13395 0x3453 o32123 28 86243 25962 0x656A o62552 29 110503 33265 0x81F1 o100761 30 132049 39751 0x9B47 o115507 31 216091 65050 0xFE1A o177032 32 756839 227832 0x379F8 o674770 33 859433 258716 0x3F29C o771234 34 1257787 378632 0x5C708 o1343410 35 1398269 420921 0x66C39 o1466071 36 2976221 895932 0xDABBC o3325674 37 3021377 909526 0xDE0D6 o3360326 38 6972593 2098960 0x200710 o10003420 39 13466917 4053946 0x3DDBBA o17355672 40 20996011 6320430 0x60712E o30070456 41 24036583 7235733 0x6E6895 o33464225 42 25964951 7816230 0x774426 o35642046 43 30402457 9152052 0x8BA634 o42723064 44 32582657 9808358 0x95A9E6 o45324746 45 37156667 11185272 0xAAAC78 o52526170 46 42643801 12837064 0xC3E0C8 o60760310 47 43112609 12978189 0xC6080D o61404015 48* 57885161 17425170 0x109E312 o102361422 49* 74207281 22338618 0x154DC3A o125156072 50* 77232917 23249425 0x162C211 o130541021 51* 82589933 24862048 0x17B5D60 o136656540 The substantial over-representation of leading digit 6 in bases 8, 10 and 16 seems odd. If there's some reason for that, other than statistics in low population sizes, please comment in a discussion thread. Maybe it is surprising, maybe it isn't. Consider that if there's one digit that is the most over-represented, in one base, by pure randomness, its chance of being the most over-represented in a second base also seems to be 1/base. That's only mildly low for 10 or 16, not very low. The usual test criterion, for rejecting that the difference between observed and expected, is less than 5% probability from chance, is not an issue in any of the Pearson chi-squared cases tabulated, although one is borderline. The 3 measures of x^{2}, m and d gave widely different results for the same case. Note to self: pursue further; perhaps try to calculate probability distribution for the various numbers of the different digits. It's not as "simple" as a single binomial distribution. Purpose is to put some numbers to how likely or unlikely the extent of digit-6 over-representation is in either base 10 or 16. Or some study of number theory and statistics. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-09-25 at 21:48 Reason: fixed DOF effect on calculation of Pearson's chi-squared tests, added octal |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Statistical properties of categories of GIMPS results and interim results | kriesel | Probability & Probabilistic Number Theory | 1 | 2019-05-22 22:59 |
Statistical help | henryzz | Homework Help | 5 | 2013-03-25 11:26 |
Statistical odds for prime in Wagstaff vs Mersenne | diep | Math | 27 | 2010-01-13 20:18 |
Cpu Properties | mack | Information & Answers | 2 | 2009-09-07 05:48 |
Statistical Investigation of Large Digit Mersenne Primes . . . (please don't bite me) | 9021951 | Math | 29 | 2005-03-15 16:26 |