mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Probability & Probabilistic Number Theory (https://www.mersenneforum.org/forumdisplay.php?f=115)
-   -   Estimating the number of primes in a partially-factored number (https://www.mersenneforum.org/showthread.php?t=19568)

 CRGreathouse 2014-08-01 19:09

Estimating the number of primes in a partially-factored number

I'm interested in using information about a number to estimate the likelihood of it having a certain number of prime factors.

If we had no information about the number except its size, there are two obvious approaches: the Erdős-Kac theorem plus continuity correction, or Landau's extension to the Prime Number Theorem:
[TEX]\pi_k(n)\sim\frac{n}{\log n}\frac{(\log\log n)^{k-1}}{(k-1)!}[/TEX]

Hildebrand, Tenenbaum, and various collaborators have versions of this formula which works over a larger range, though all results are stated asymptotically so it's hard to tell over what ranges each would be best suited.

It's not obvious to me when I should use each of the various estimates. (I also wonder if there is an integral formulation of Landau's theorem, just like Li(x) is a better approximation to pi(x) than x/log x.)

Things get even more interesting if you add in partial factorization. First, it essentially takes care of the difference between distinct and indistinct factorizations, since beyond a fairly small point you don't see repeated prime factors of that size 'in the wild'. Second, it should be able to use the factorization information to refine the above estimates. Essentially you'd expect about log log x primes out of the first x, and you should be able to subtract this delta from the Erdős-Kac estimate directly. (It's not so clear how to modify Landau's formula.)

I ask these questions because in the range of numbers of interest (from those we can factor fairly easily to those we can check for probable primality fairly easily) the numbers are small enough that constant factors and o(1)'s matter.

And (pipe dream?) if the calculations are fairly routine we might ask Syd (or, for that matter, Alpertron) to include them in factordb (resp., his ECM page).

 wblipp 2014-08-02 20:53

How about building on the Dickman-de Bruijn function by defining a series

[TEX]\rho_k(n)[/TEX] is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n).

[TEX]\rho_0(n)[/TEX] is the standard Dickman-de Bruijn function.

The higher k's can (I think) be defined easily in terms of an integral of (k-1) functions.

I'd hope that this heuristic gives a probability density for the number of large primes, with "large" defined by the choice of n. You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.

 R.D. Silverman 2014-08-02 21:48

[QUOTE=wblipp;379565]How about building on the Dickman-de Bruijn function by defining a series

[TEX]\rho_k(n)[/TEX] is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n).

[TEX]\rho_0(n)[/TEX] is the standard Dickman-de Bruijn function.

The higher k's can (I think) be defined easily in terms of an integral of (k-1) functions.

I'd hope that this heuristic gives a probability density for the number of large primes, with "large" defined by the choice of n. You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.[/QUOTE]

Wrong approach. What needs to be computed is the [i]conditional[/i]
probability that a large integer N has k prime factors [b]given[/b] that
it has no factors less than (say) N^1/a, for given a.

 wblipp 2014-08-03 04:06

[QUOTE=R.D. Silverman;379573]Wrong approach. What needs to be computed is the [i]conditional[/i] probability that a large integer N has k prime factors [b]given[/b] that it has no factors less than (say) N^1/a, for given a.[/QUOTE]

That's a good approach for the scenarios where you can calculate that distribution - you can start directly with the residual composite. The only way I know to calculate that distribution is to use the inclusion-exclusion and normalization approach described in your paper with Wagstaff. That works for cases where the total number of factors is a handful. But some cases of interest will have "a" of hundreds or thousands.

For these scenarios, I think you would be better off to start with the entire original number and the distibutions I described, then use Bayes to account for the known factoring efforts.

But it doesn't really matter what I think or you think - this is a question of comparing heuristics that can studied empirically. Perhaps we can get the OP to propose some "interesting" cases.

 CRGreathouse 2014-08-03 05:42

[QUOTE=wblipp;379565]How about building on the Dickman-de Bruijn function by defining a series

[TEX]\rho_k(n)[/TEX] is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n).[/QUOTE]

Hmm. I think I could do this, but it would be a real bear to compute. I wrote code to compute, or at least closely estimate, that function a few years back and it wasn't easy. It seems like this version would suffer from all the problems of the original, but worse.

But at the ranges that I've tested it the Dickman function already does a fairly poor job; not sure how these would fare, better or worse.

[QUOTE=wblipp;379565]You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.[/QUOTE]

Sounds reasonable.

 CRGreathouse 2014-08-03 06:12

[QUOTE=wblipp;379594]But it doesn't really matter what I think or you think - this is a question of comparing heuristics that can studied empirically. Perhaps we can get the OP to propose some "interesting" cases.[/QUOTE]

Sure, let's see. MM127 is an obvious candidate for a big number. For small... well, say [url=http://factordb.com/index.php?id=1000000000044818224]200^199 + 199^200[/url] in Paul's honour.

I think the amount of work on MM127 is well-known. It doesn't look like XYYX has worked on the other one, so suppose not much work has been done on it.

 CRGreathouse 2014-08-04 14:54

Using [TEX]\omega(n)[/TEX] (to avoid the extra variability from the small primes that [TEX]\Omega[/TEX] brings) and searching an interval around 10[SUP]20[/SUP] I find

1 distinct prime factor: 4290
2 distinct prime factors: 21379
3 distinct prime factors: 44810
4 distinct prime factors: 54544
5 distinct prime factors: 42306
6 distinct prime factors: 22179
7 distinct prime factors: 8090
8 distinct prime factors: 2022
9 distinct prime factors: 331
10 distinct prime factors: 49
11 distinct prime factors: 1

which compares to the (naive) Landau predictions of

1 distinct prime factor: 4342
2 distinct prime factors: 16632
3 distinct prime factors: 31849
4 distinct prime factors: 40658
5 distinct prime factors: 38928
6 distinct prime factors: 29817
7 distinct prime factors: 19032
8 distinct prime factors: 10412
9 distinct prime factors: 4984
10 distinct prime factors: 2121
11 distinct prime factors: 812

and the Erdős-Kac predictions

1 distinct prime factor: 29250
2 distinct prime factors: 51228
3 distinct prime factors: 70509
4 distinct prime factors: 76319
5 distinct prime factors: 64973
6 distinct prime factors: 43492
7 distinct prime factors: 22872
8 distinct prime factors: 9438
9 distinct prime factors: 3051
10 distinct prime factors: 772
11 distinct prime factors: 152

This supports the intuition that Landau is better for small numbers of prime factors and Erdős-Kac better for large. In this case the crossover is surprisingly large (8 prime factors) but neither estimate is particularly accurate.

 R.D. Silverman 2014-08-04 18:26

[QUOTE=CRGreathouse;379664]Using [TEX]\omega(n)[/TEX] (to avoid the extra variability from the small primes that [TEX]\Omega[/TEX] brings) and searching an interval around 10[SUP]20[/SUP] I find

This supports the intuition that Landau is better for small numbers of prime factors and Erdős-Kac better for large. In this case the crossover is surprisingly large (8 prime factors) but neither estimate is particularly accurate.[/QUOTE]

I would not expect it to be for composites that are only 20 digits.

 CRGreathouse 2014-08-04 18:33

[QUOTE=R.D. Silverman;379677]I would not expect it to be for composites that are only 20 digits.[/QUOTE]

Me either -- but it's hard to factor big enough numbers to get useful information. Even if I used 100-digit numbers, at significantly greater effort, that would only increase the log log by about 1.6.

But it would still be useful to have good estimates for these ranges. Any ideas of what techniques could be used?

 R.D. Silverman 2014-08-04 21:47

[QUOTE=CRGreathouse;379678]Me either -- but it's hard to factor big enough numbers to get useful information. Even if I used 100-digit numbers, at significantly greater effort, that would only increase the log log by about 1.6.

But it would still be useful to have good estimates for these ranges. Any ideas of what techniques could be used?[/QUOTE]

I would try to get a 2nd term (i.e. main term plus error term with effective constant) for the
Erdos-Kac thm.

 CRGreathouse 2014-08-05 00:05

[QUOTE=R.D. Silverman;379699]I would try to get a 2nd term (i.e. main term plus error term with effective constant) for the
Erdos-Kac thm.[/QUOTE]

Interesting.

I haven't worked through the proof yet. Do you recommend the original or a modern version (Granville-Soundarajan, etc.)?

 R.D. Silverman 2014-08-05 00:55

[QUOTE=CRGreathouse;379711]Interesting.

I haven't worked through the proof yet. Do you recommend the original or a modern version (Granville-Soundarajan, etc.)?[/QUOTE]

I can't give a recommendation, I did not even know about the latter
result.

 CRGreathouse 2014-08-05 14:10

[QUOTE=R.D. Silverman;379715]I can't give a recommendation, I did not even know about the latter
result.[/QUOTE]

I read it over last night -- Granville is a great expositor (in addition to being a first-rate mathematician). It doesn't look like their method easily extends to a correction term, since they're not finding the terms directly but rather the moments and then deriving the desired conclusion from a 'magical' theorem in statistics that says that if all the moments match the normal distribution it is normal. So it appears that 1 + o(1) is all you get.

But it struck me while reading the theorem that even more important than the correction term in variance is the correction term in mean, and it is purely elementary to improve the error term in the [i]average[/i] order of [TEX]\omega[/TEX] from O(1) to M + O(1/log x), where the error term comes both from the fraction of numbers below x which are prime and from Mertens' theorem. I know this isn't the same as the normal order, and I doubt something so sharp could be proved at that level, but this should improve the practical performance.

 CRGreathouse 2014-08-12 15:12

[QUOTE=R.D. Silverman;379699]I would try to get a 2nd term (i.e. main term plus error term with effective constant) for the
Erdos-Kac thm.[/QUOTE]

Look what I found:

Persi Diaconis, Frederick Mosteller, Hironari Onishi, Second-order terms for the variances and covariances of the number of prime factors—Including the square free case, [i]Journal of Number Theory[/i] [b]9[/b]:2 (May 1977), pp. 187-202.

 R.D. Silverman 2014-08-13 13:05

[QUOTE=CRGreathouse;380240]Look what I found:

Persi Diaconis, Frederick Mosteller, Hironari Onishi, Second-order terms for the variances and covariances of the number of prime factors—Including the square free case, [i]Journal of Number Theory[/i] [b]9[/b]:2 (May 1977), pp. 187-202.[/QUOTE]

Excellent.

The authors surprise me. None of them, AFAIK, is a number theorist.

OTOH, I got to take statistics from Mosteller. He was a terrific
lecturer.

 CRGreathouse 2014-08-13 18:46

[QUOTE=R.D. Silverman;380284]The authors surprise me. None of them, AFAIK, is a number theorist.[/QUOTE]

The only one I know is Diaconis who is a probabilist.

 All times are UTC. The time now is 09:08.