mersenneforum.org Statistics and statistical properties
 Register FAQ Search Today's Posts Mark Forums Read

 2019-05-22, 19:50 #1 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 137078 Posts Statistics and statistical properties This is a reference thread regarding statistics of GIMPS results and the expected and observed statistical properties of the various computation results. Please use the reference discussion thread for comments https://www.mersenneforum.org/showthread.php?t=23383 Table of Contents This post Statistical properties of categories of GIMPS results and interim results https://www.mersenneforum.org/showpo...98&postcount=2 Statistics of Mersenne number factors found https://www.mersenneforum.org/showpo...99&postcount=3 Statistics of Mersenne Prime Exponents https://www.mersenneforum.org/showpo...00&postcount=4 Probability estimates for trial factors in same bit level or class https://www.mersenneforum.org/showpo...82&postcount=5 Benford's Law https://www.mersenneforum.org/showpo...35&postcount=6 Discovery dates https://www.mersenneforum.org/showpo...85&postcount=7 etc tbd See also https://www.mersenneforum.org/showpo...40&postcount=4 regarding error rates of primality tests. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-11-28 at 16:29
 2019-05-22, 19:50 #2 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 608710 Posts Statistical properties of categories of GIMPS results and interim results Statistical properties of categories of GIMPS results and interim results (originally posted at https://www.mersenneforum.org/showpo...32&postcount=1) I've been thinking lately, about what the expected probability distributions are for various categories of outputs, in GIMPS computing runs, how the distributions may be affected by various types of errors, and whether or when it might be feasible to detect the difference. All the following is in regard to p>>64, such as at the current GIMPS wavefronts of primality testing and double checking, and up to p<232. In perl, the table for gpu applications of believed-bad res64 values is currently %badresidues=( 'cllucas', '0x0000000000000002, 0xffffffff80000000', 'cudalucas', '0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffd', 'cudapm1', '0x0000000000000000, 0x0000000000000001, 0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff, '. '0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff, 0xfffffffbfffdfffe, 0xfffffffbfffdffff, '. '0xfffffffbfffffffe, 0xfffffffbffffffff, 0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff', 'gpuowl', '0x0000000000000000', 'mfaktc', '', 'mfakto', '' ); For cpu-based primality testers, interim res64 values 0x00, 0x02, and 0xfffffffffffffffd are regarded as bad LL values, and 0x00 is regarded as a bad PRP res64 value. However, in https://www.mersenneforum.org/showth...?t=5862&page=2 we see that res64 values 0x000000000000 or 0xffffffffffffffff occur sometimes legitimately, as in the last iteration before the LL test identifies a Mersenne prime. a) LL final residues: 0 if Mp is a prime and the computation went correctly. Certain error conditions generate spurious 0x02 or 0x0 or 0xffffffff80000000 or 0xfffffffffffffffd sequences or final res64. Some of these are application specific. Composite mersenne numbers' final LL full res are I think expected to be uniformly distributed in 0= f, that would be a sign of error probably in the mod function. Questions: 1) What errors do you see in the above? 2) What potentially useful statistical behavior is omitted? Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-08-25 at 18:17 Reason: added comments on tf candidate factors & remainders
 2019-05-22, 20:00 #3 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 137078 Posts Statistics of Mersenne number factors found A table of numbers of exponents with 0 to 7 factors under 2^65 is available at https://www.mersenneforum.org/showpo...1&postcount=46 Fully factored exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=62 And the ratio rises as they are screened for p > increasing thresholds. Semiprime exponents show higher prevalence of p=3 mod 4 than p=1 mod 4. https://www.mersenneforum.org/showpo...5&postcount=69 And the ratio rises to 2:1 as they are screened for p > increasing thresholds. Exponents p = 3 mod 4 have slightly more factors found, than p= 1 mod 4. https://www.mersenneforum.org/showpo...4&postcount=77 Also f=1 mod 8 has fewer factors found than f=7 mod 8, by about 12%. https://www.mersenneforum.org/showpo...6&postcount=80 Combining consideration of f = 1 or 7 mod 8 and 1 or 5 mod 6, f = 1, 7, 17, or 23 mod 24. "for exponents p=3 (mod 4), factors that are f=23 (mod 24) are found about one-third more frequently than the other kinds." https://www.mersenneforum.org/showpo...7&postcount=85 Known factors and other info can be downloaded from links at https://www.mersenneforum.org/showpo...1&postcount=11 for p<109; for above, at https://www.mersenneforum.org/showpo...54&postcount=9 Wagstaff paper: Divisors of Mersenne Numbers, MATHEMATICS OF COMPUTATION VOLUME 40, NUMBER 161 JANUARY 1983. PAGES 385-397 http://www.ams.org/journals/mcom/198...-0679454-X.pdf Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-12-14 at 15:50
 2019-05-22, 20:03 #4 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 3·2,029 Posts Statistics of Mersenne Prime Exponents Begin with the content of https://primes.utm.edu/notes/faq/NextMersenne.html We expect slightly less than 6 Mersenne primes in a 10:1 interval on exponent. M82589933 is part of an unusual run of (at least) 13 in the 107 to 108 interval. "For comparison: 1 to 10: 4 10 to 100: 6 100 to 1000: 4 1000 to 10000: 8 10000 to 100000: 6 100000 to 1000000: 5 1000000 to 10000000: 5" https://www.mersenneforum.org/showpo...70&postcount=5 (The next decade after the above, 107 to 108 is not completely double checked yet. In 108 to 109 there are relatively few tested so far, with a large amount of further computation to do.) p for Mp is 1 mod 8 at twice the frequency of 3, 5, or 7 mod 8 up to Mp50. https://www.mersenneforum.org/showpo...2&postcount=53 The Wagstaff conjecture predicts about 57 Mp below p=109. https://www.mersenneforum.org/showpo...7&postcount=55 Of the known Mersenne primes, many more are p=1 mod 4 (31) than p=3 mod 4 (19). https://www.mersenneforum.org/showpo...8&postcount=59 For #51, p= 82589933 = 1 mod 4, so 31 (60.78%) vs 19 (37.25%) 3 mod 4 and 1 (1.96%) of 2 mod 4. 31/19 ~ 1.6316. At exponents > 1000, it's ~2:1. Mersenne primes are twice as likely to be p=1 mod 4 as 3 mod 4 conjectured, with some empirical tabular support. Within p=1 (mod 4), Mersenne primes are twice as likely to have p=1 (mod 8) rather than p=5 (mod 8) https://www.mersenneforum.org/showpo...5&postcount=71 (added following 2021-08-29) There's naturally a somewhat inverse relationship between a) which p mod whatever have more found factors, and b) which p mod whatever have more found Mersenne primes. I think I may have seen it posted elsewhere, but can't find it now, that exponent p mod 24 shows considerable variation in incidence of known Mersenne primes. Code: P = 1 mod 24 8 15.69% P = 2 mod 24 1 1.96% P = 3 mod 24 1 1.96% P = 5 mod 24 9 17.65% P = 7 mod 24 8 15.69% P = 11 mod 24 3 5.88% P = 13 mod 24 3 5.88% P = 17 mod 24 11 21.57% P = 19 mod 24 5 9.80% P = 23 mod 24 2 3.92% total 51 Below p=1000, p mod 24 counts: Code: p count 1 0 2 1 3 1 5 1 7 4 11 1 13 2 17 3 19 1 23 0 Above p=1000, for known Mersenne primes, p mod 24 is observed with frequency Code: p count p n 1 8 5 8 7 4 11 2 13 1 17 8 19 4 23 2 which happens to be a table of counts consisting only of powers of two. If we break it down further to p mod 120, we get too few known primes per bin to show much. Code: p mod 120 of the 51 known Mersenne primes p count 1 3 2 1 3 1 5 1 7 3 13 1 17 4 19 1 29 1 31 1 37 1 (15 excluding 2,3,5 in p mod 120 from 0 to 40) 41 2 43 1 49 2 53 3 61 1 67 1 71 1 77 1 79 1 (13 in p mod 120 from 40 to 80) 83 1 89 3 91 2 97 3 101 3 103 3 107 2 113 2 119 1 (20 in p mod 120 from 80 to 120) Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-12-13 at 13:05 Reason: minor edits
2019-07-08, 01:50   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

137078 Posts
Probability estimates for trial factors in same bit level or class

The odds of finding a factor in a bit level of trial factoring are ln(bitlevel/(bitlevel-1)), or approx. 1/(bitlevel - 1/2) (better than 18 ppm at bitlevel 70, 15 ppm at bitlevel 76, 12 ppm at 86). These estimates are probabilities for completing the entire bit level, not quitting after the first class containing a factor.

What are the odds of finding more than one trial factor in a bit level?
per https://mersenneforum.org/showpost.p...postcount=1147
no factors: (bitlevel-1)/bitlevel
one factor: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))
Two factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))2 / 2
N factors: (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))n / n!
The probability of finding n factors declines quickly with increasing n.

In the same class of the same bit level?

Two factors/bit-level case
This case appears likely to occur many thousands of times in p<109.
If two factors or more are found in the same bit level, it doesn't matter which class the first one occupies.
Assuming the class location of the next factor found is independent, there's a one in c chance it coincides with the class of the first factor, where c is the ACTIVE number of classes. For MORE_CLASSES, the 2x2x3x5x7x11 = 4620 classes are winnowed down to 960 by whole-class presieving for very small primes, so c=960; for LESS_CLASSES, the 420 classes are winnowed down to 96, so c=96.
LESS_CLASSES: 96/420 = 1/2 x 2/3 x 4/5 x 6/7 = 48 / 210 = ~.2286
MORE_CLASSES: 960/4620= 1/2 x 2/3 x 4/5 x 6/7 x 10/11 = 480 / 2310 =~ .2078 (9.1% smaller)
hypothetical EXTRA_CLASSES: 11520 / 60060 = 1/2 x 2/3 x 4/5 x 6/7 x 10/11 x 12/13 = 5760 / 30030 = ~.1918 (7.7% smaller)

The chance of two factors both in the same class is then (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))2 / 2 / c.
For 75 bits, 74/75 * (ln(75/74))2/2/c = 8.89E-5/c; c=96 for less-classes so 9.26E-7, or c=960 for more-classes so 9.26E-8 (using Fortran style exponential notation).

If two factors are found in the same bit level, the chance of them landing in the same class is 1/c, and the chance of different classes 1-1/c. (1/c +(1-1/c) = 1, sum of probabilities = 1, check.)

Three factors/bit-level case
This case appears likely to occur hundreds of times in p<109.
If three factors are found in the same bit level, the chance of none of them coinciding in the same class is p0= (c-1)/c * (c-2)/c = (c2-3c+2)/c2 = 1- (3c-2)/c2
The chance of any two of them landing in the same class is the second one matches the first but the third does not, or the first two do not match but the third matches one of them,
p2 = 1/c (c-1)/c +(c-1)/c 2/c = (c-1)/c2 + (2c-2)/c2 = 3(c-1)/c2
while the chance of all 3 landing in the same class is 1/c2, for a combined chance of p3 = (3c-2)/c2.
(p0 +p2 +p3 = 1- (3c-2)/c2 + 3(c-1)/c2 + 1/c2 = 1, check.)

For less-classes the chance of 2 or more landing in the same class is ( 3 x 96 - 2 ) / 962 =~ .031033
while for more-classes it's (3 x 960 - 2 ) / 9602 =~ .0031228 chance of coincident classes.
The probability of 3 factors is (bitlevel-1)/bitlevel * (ln(bitlevel/(bitlevel-1)))3 / 3!
For bitlevel 75, p=74/75 * (ln(75/74))^3 / 3! = 3.978E-7, so per factoring attempt,
less-classes yields a chance of coincident factors by 3 found of 1.233E-8 while more-classes yields 1.241E-9.

The combinations of two or three factors per bit level coinciding in a class are then, for 75 bits,
Less-classes: 9.26E-7 + 1.233E-8 = 9.38E-7
More-classes: 9.26E-8 + 1.241E-9 = 9.38E-8
The odds of four or more are not zero, but they fall to the right of the rounding of the above figures.

Note, for three or more factors per bit level, these are not binomial distributions (since probabilities of matching are not constant with repeated trials)

Four factors/bit-level case
This case appears likely to occur few times in p < 109.
If four factors are found in the same bit level, the chance of each falling in a separate class is the chance of 3 successive misses. The chance of any two of them landing in the same class is, the second matches the first but the third and fourth do not, otherwise the third matches the first or second but the others do not, or otherwise the fourth matches one of the preceding 3 but the first three did not match each other.
The chance of 3 coinciding is 4-fold; 123, 124, 134, or 234; 1/c2 * (c-1)/c + (1/c)(c-1)/c * 1/c + (c-1)/c(1/c)2 + (c-1)/c(1/c)2= 4(c-1)/c3
The chance of all four coinciding is 1/c3.
no coincidences: p0= (c-1)/c (c-2)/c (c-3)/c = (c-1)(c-2)(c-3)/c3 = 1 - (6 c2 -11c +6)/c3)
coincidence of just 2: p2= 1/c (c-1)(c-2)/c2 + (c-1)(c-2)/c2 * 2 / c + 3/c *(c-1)(c-2)/c2 = 6(c-1)(c-2)/c3
coincidence of 3 in a class: p3= 4(c-1)/c3
coincidence of 4 in the same class: p4= 1/c3
The probability sums for the four-factors cases listed above don't add up to probability one, so something's not quite right yet. P0+p2+p3+p4 = 1 -3(c-1)/c3 instead of one.
There's also a slight possibility of a double coincidence of two, which could occur one of three ways.
A) First and second match, third misses first/second but third and fourth match. p22a=1/c * (c-1)/c * 1/c for this subcase.
B) First and third match, second and fourth match, first and second do not match. p22b=p22a.
C) First and fourth match, second and third match, first and second do not match. p22c=p22a.
Total, p22=3(c-1)/c3

In total, chances of 2 or more coinciding: 6(c-1)(c-2)/c3 + 4(c-1)/c3 + 1/c3 +3(c-1)/c3 = ( 6(c-1)(c-2) + 4(c-1) + 1 +3(c-1)) / c3
= ( 6c2 -18c +12 +7c -7 +1 ) / c3 = (6 c2 -11c +6) / c3

none: c=96, p0=0.938687; c=960, p= 0.9937619
2 coincide: c=96, p2=0.060560; c=960, p2= 0.0062305
3: c=96, p3=4.295e-4; c=960, p3= 4.34E-6
4: c=96, p4=1.13e-6; c=960, p4= 1.13E-9
dual-2: c=96, p22=0.000322; c=960, p22=3.25e-6

2 to 4: c=96, pn=0.061313; c=960, pn= 0.0062381
c=96, p0+pn= 1, check;
c=960, p0+pn= 1, check.

More than four factors/bit-level cases
These cases are very unlikely to occur in p<109 and so are not calculated here.
They also rapidly become more complex, as illustrated by the numbers of coincidence subcases possible versus number of factors found rises.
5: 2; 3; 4; 5; 22; 23
6: 2; 3; 4; 5; 6; 22; 23; 24; 33; 222
7: 2; 3; 4; 5; 6; 7; 22; 23; 24; 25; 33; 34; 222; 223
8: 2; 3; 4; 5; 6; 7; 8; 22; 23; 24; 25; 26; 33; 34; 35; 44; 222; 223; 224; 233; 2222
9: 2; 3; 4; 5; 6; 7; 8; 9; 22; 23; 24; 25; 26; 27; 33; 34; 35; 36; 44; 45; 222; 223; 224; 225; 233; 234; 333; 2222; 2223
10: 2; 3; 4; 5; 6; 7; 8; 9; 10; 22; 23; 24; 25; 26; 27; 28; 33; 34; 35; 36; 37; 44; 45; 46; 55; 222; 223; 224; 225; 226; 233; 234; 235; 244; 333; 334; 2222; 2223; 2224; 2233; 22222

Summary
Odds are estimated for a variety of bit levels and number of factors per level or class in the attachment. Odds are sufficient that in p<109, up to 2 factors per class seems likely to occur sometime (dozens to thousands of occurrences depending on factoring depth and frequency of use of less-classes), but 5 or more factors per class occurring sometime is unlikely; 3 or more is unlikely from 65 bits upward.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 tf odds.pdf (14.1 KB, 267 views) tf odds3.pdf (16.9 KB, 221 views)

Last fiddled with by kriesel on 2021-12-13 at 13:10

2020-09-17, 17:51   #6
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×2,029 Posts
Benford's Law

Benford's Law https://en.wikipedia.org/wiki/Benford's_law comes from observations that many cases of observed or measured sets of numbers have a nonuniform distribution of leading digits. That is, 1 is the most common leading digit, at 30.1% rather than 11.1%,and 9 the least, at 4.6% rather than 11.1%, for base ten numbers. It also applies to second and third digits, and to expression in other number bases. Numbers following Benford's law have digit probabilities based on the logarithm of the numbers. It requires that the numbers are distributed over several orders of magnitude, which at 2 to 82589933, Mersenne prime exponents certainly are, as well as the Mersenne primes' number of decimal digits or digits in other number bases. Log10(82589933/2)=7.616; log10(24862048/1) =7.396.

Benford's law is one of the tests used to detect fabricated numbers, such as in tax or accounting or research fraud. The numbers that people think are random when faking values generally are not random, generally avoiding round numbers, repeated digits, ascending or descending series, palindromes, etc. and may even reveal individuals' distinct tendencies. (I have no reason to suspect fakery in the Mersenne primes that have been repeatedly checked, so expected them to exhibit usual probability behavior including probable deviations from expected statistical values, with significant deviation expected due to low number of samples available.)

Considering the 51 known Mersenne primes, in base ten, their exponents follow Benford's law quite closely, with occurrence within 10% of expected, usually within +-1. See also https://mersenneforum.org/showpost.p...9&postcount=47
But not so for the number of digits in the decimal expression of the Mersenne primes.
https://www.mersenne.org/primes/
In decimal, the leading digit 4 is noticeably underrepresented, 5 is absent, and 6 is substantially overrepresented. The absence of a leading 5 seems to line up with some of the larger ratio gaps between successive exponents.
Based on the Lenstra & Pomerance conjecture we expect on average a ratio of ~1.47576 on exponent. (https://primes.utm.edu/notes/faq/NextMersenne.html)
Mersenne primes with exponents 127 and 521 (exponent ratio 4.102) have 39 and 157 decimal digits respectively; 1279 and 2203 (exponent ratio 1.722) have 386 and 664; 11213 and 19937 (exponent ratio 1.778) have 3376 and 6002 digits;
132049 and 216091 (exponent ratio 1.636) have 39751 and 65050;
1398269 and 2976221 (exponent ratio 2.129) have 420921 and 895932;
13466917 and 20996011 (exponent ratio 1.559) have 4053946 and 6320430.
ALL the ranges where we could look for a known Mersenne prime with a leading 5 in the number of decimal digits have greater than expected average gaps present, and no Mersenne prime, except for the first, 13 and 17, exponent ratio 1.308, having 4 and 6 digits respectively. Average observed ratio for gaps excluding 5 as a leading digit of the number of digits is 2.033, about 38% larger than the expected average ratio.

So let's convert that list of decimal number of decimal digits into hexadecimal. And octal.
Following are for each known Mersenne prime, the sequence number, exponent p, decimal number of decimal digits, of the Mersenne prime, and hexadecimal and octal representations of number of decimal digits;
Code:
 #         p   digits (hex)     (octal)
1         2        1 0x1       o1
2         3        1 0x1       o1
3         5        2 0x2       o2
4         7        3 0x3       o3
5        13        4 0x4       o4
6        17        6 0x6       o6
7        19        6 0x6       o6
8        31       10 0xA       o12
9        61       19 0x13      o23
10        89       27 0x1B      o33
11       107       33 0x21      o41
12       127       39 0x27      o47
13       521      157 0x9D      o235
14       607      183 0xB7      o267
15      1279      386 0x182     o602
16      2203      664 0x298     o1230
17      2281      687 0x2AF     o1257
18      3217      969 0x3C9     o1711
19      4253     1281 0x501     o2401
20      4423     1332 0x534     o2464
21      9689     2917 0xB65     o5545
22      9941     2993 0xBB1     o5661
23     11213     3376 0xD30     o6460
24     19937     6002 0x1772    o13562
25     21701     6533 0x1985    o14605
26     23209     6987 0x1B4B    o15513
27     44497    13395 0x3453    o32123
28     86243    25962 0x656A    o62552
29    110503    33265 0x81F1    o100761
30    132049    39751 0x9B47    o115507
31    216091    65050 0xFE1A    o177032
32    756839   227832 0x379F8   o674770
33    859433   258716 0x3F29C   o771234
34   1257787   378632 0x5C708   o1343410
35   1398269   420921 0x66C39   o1466071
36   2976221   895932 0xDABBC   o3325674
37   3021377   909526 0xDE0D6   o3360326
38   6972593  2098960 0x200710  o10003420
39  13466917  4053946 0x3DDBBA  o17355672
40  20996011  6320430 0x60712E  o30070456
41  24036583  7235733 0x6E6895  o33464225
42  25964951  7816230 0x774426  o35642046
43  30402457  9152052 0x8BA634  o42723064
44  32582657  9808358 0x95A9E6  o45324746
45  37156667 11185272 0xAAAC78  o52526170
46  42643801 12837064 0xC3E0C8  o60760310
47  43112609 12978189 0xC6080D  o61404015
48  57885161 17425170 0x109E312 o102361422
49* 74207281 22338618 0x154DC3A o125156072
50* 77232917 23249425 0x162C211 o130541021
51* 82589933 24862048 0x17B5D60 o136656540
Again, leading digit 6 is over-represented in the hexadecimal, at more than double the expected count. Despite ones being favored by the available range of data. Excluding +-1 variations from expected frequency, 4 7 and E are underrepresented; E is absent. Similarly, in octal, leading digit 6 is more than twice as frequent as expected.
The substantial over-representation of leading digit 6 in bases 8, 10 and 16 seems odd. If there's some reason for that, other than statistics in low population sizes, please comment in a discussion thread.
Maybe it is surprising, maybe it isn't. Consider that if there's one digit that is the most over-represented, in one base, by pure randomness, its chance of being the most over-represented in a second base also seems to be 1/base. That's only mildly low for 10 or 16, not very low.
The usual test criterion, for rejecting that the difference between observed and expected, is less than 5% probability from chance, is not an issue in any of the Pearson chi-squared cases tabulated, although one is borderline. The 3 measures of x2, m and d gave widely different results for the same case.

An interesting paper on Benford's law and Mersenne numbers is here. It advises against using powers of two as the number base. Oops.

Note to self: pursue further; perhaps try to calculate probability distribution for the various numbers of the different digits. It's not as "simple" as a single binomial distribution. Purpose is to put some numbers to how likely or unlikely the extent of digit-6 over-representation is in either base 10 or 16. Or some study of number theory and statistics.
There's a discussion thread I began, in which jwaltos makes some recommendations.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 benfords law.pdf (33.0 KB, 223 views)

Last fiddled with by kriesel on 2021-12-28 at 04:21 Reason: minor edits for style

2021-11-28, 16:18   #7
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×2,029 Posts
Discovery dates

Discovery dates can be found at https://www.mersenne.org/primes/. Other than patterns such as varying computer usage & shutdown at U of Central Missouri and elsewhere, relating to academic year schedules, cooling load reduction in local summer by GIMPS participants with system fleets, or in the early computer age, multiple discoveries in the same computer run on the same day, this may be largely an exercise in numerology. A linear fit through month-of-year of discovery is almost level.

Of the 41 known Mersenne primes with discovery date known to the day, none fall in July and only one each in March and April. January contained the most with 8, followed closely by September with 7. Average 41/12 ~ 3.4167. December is among the months below the average, with 3.

The number of coincident day of year for known Mersenne primes with known discovery dates, after excluding the few that were found the same day as another with the same computer runs, closely matches what we would expect based on uniform distribution probability used in a variant of the birthday problem.

Day of year values are 4 7 10 25 27 28 30 40 49 50 63 98 131 136 152 153 155 162 177 236 244 247 250 251 262 268 281 303 307 318 321 341 349 360. That's 34 distinct values, in 41 data points.
There are 7 occurrences of coincident day-of-year. All are pairs; no triplets or higher. Three of these are because the discoveries were part of the same computer runs (M521 and M607; M2203 and M2281; M4253 and M4423)
The other four are:
136 (M9941 and M24036583; 1963-05-16 and 2004-05-15 respectively)
236 (M2976221 and M43112609; 1997-08-24 and 2008-08-23 respectively)
247 (M1257787 and M32582657; 1996-09-03 and 2006-09-04 respectively)
318 (M1398269 and M13466917; 1996-11-13 and 2001-11-14 respectively)
Oddly, there's a leap year in each pair. Leap years occupy ~ 366 days / (3*365 + 366) ~ 0.2505 of the days.
The chance of no leap year in a pair is 0.7495^2 = 0.5617.
So the chance of a leap year or two in a pair is ~1-0.5617 = 0.4383, the chance of all 4 pairs having at least one ~0.4383^4 ~0.037. (About 1 in 27.)

Removing the 3 coinciding from same computer run, there's a population of 38 for the birthday problem. https://en.wikipedia.org/wiki/Birthday_problem
Probability of none of those coinciding is 365! / 365^38 / (365-38)!

365! / 365^38 = 1077 993899 058890 252173 100706 114541 143916 512152 957947 286609 973670 019169 779122 584416 099590 377943 631310 997513 116110 699265 105698 186698 385792 917984 305346 986847 512851 707730 010490 884345 828687 060311 348620 648301 007240 161633 613317 283801 958782 397371 881938 704070 024717 151944 821228 740463 295190 564065 350434 708894 402531 157116 022806 846589 206539 086012 029042 586988 719041 809142 247854 748662 282749 727053 971216 563774 994858 534976 722754 486856 121229 875073 511249 498817 402667 349255 610337 045985 624777 215765 056746 744016 761503 389810 832557 755142 695389 766234 257843 417672 476804 083131 579968 477472 992761 604106 384139 340942 445146 623284 991258 875418 717299 355093 070233 864700 102342 807155 822354 628676 216382 611005 120745 566266 (682 digits)

(365-38)! = 7930 380485 625411 633272 752415 295302 944886 366881 107611 733184 778480 029858 446765 003671 107763 861600 076825 701830 492192 305853 377965 780965 780340 295589 285869 163190 000632 259847 105736 589079 366709 441983 671826 381676 908043 733517 173277 654192 469974 631710 000299 280179 293148 780381 827478 828043 017396 020782 746400 173696 621067 589383 048659 562023 245288 789026 103352 264098 180763 809219 461866 535802 948393 706482 413289 714758 369362 634127 590333 890813 586362 980348 068695 605432 887817 041395 440138 493890 651585 513618 213068 893413 473645 332365 404125 778003 725586 312053 833477 790233 420298 884836 001508 422974 271903 924732 400848 063020 055739 872208 486400 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 (682 digits)

ratio is ~1077993899 / 7930380485 ~ 0.13593. So probability of some coinciding is 1-0.13593 ~ 0.86407.

Expected number of discovery dates that coincide with some other while determined independently is
38 - 38 * (364/365)^37 = 3.668, which is well matched by the 4 we find.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 mersenne discovery dates.pdf (37.4 KB, 18 views)

Last fiddled with by kriesel on 2021-12-13 at 12:58

 Similar Threads Thread Thread Starter Forum Replies Last Post kriesel Probability & Probabilistic Number Theory 1 2019-05-22 22:59 henryzz Homework Help 5 2013-03-25 11:26 diep Math 27 2010-01-13 20:18 mack Information & Answers 2 2009-09-07 05:48 9021951 Math 29 2005-03-15 16:26

All times are UTC. The time now is 11:12.

Sun Jan 16 11:12:46 UTC 2022 up 177 days, 5:41, 0 users, load averages: 0.78, 0.92, 0.88