mersenneforum.org Estimating the number of primes in a partially-factored number
 Register FAQ Search Today's Posts Mark Forums Read

 2014-08-01, 19:09 #1 CRGreathouse     Aug 2006 597910 Posts Estimating the number of primes in a partially-factored number I'm interested in using information about a number to estimate the likelihood of it having a certain number of prime factors. If we had no information about the number except its size, there are two obvious approaches: the Erdős-Kac theorem plus continuity correction, or Landau's extension to the Prime Number Theorem: $\pi_k(n)\sim\frac{n}{\log n}\frac{(\log\log n)^{k-1}}{(k-1)!}$ Hildebrand, Tenenbaum, and various collaborators have versions of this formula which works over a larger range, though all results are stated asymptotically so it's hard to tell over what ranges each would be best suited. It's not obvious to me when I should use each of the various estimates. (I also wonder if there is an integral formulation of Landau's theorem, just like Li(x) is a better approximation to pi(x) than x/log x.) Things get even more interesting if you add in partial factorization. First, it essentially takes care of the difference between distinct and indistinct factorizations, since beyond a fairly small point you don't see repeated prime factors of that size 'in the wild'. Second, it should be able to use the factorization information to refine the above estimates. Essentially you'd expect about log log x primes out of the first x, and you should be able to subtract this delta from the Erdős-Kac estimate directly. (It's not so clear how to modify Landau's formula.) I ask these questions because in the range of numbers of interest (from those we can factor fairly easily to those we can check for probable primality fairly easily) the numbers are small enough that constant factors and o(1)'s matter. And (pipe dream?) if the calculations are fairly routine we might ask Syd (or, for that matter, Alpertron) to include them in factordb (resp., his ECM page).
 2014-08-02, 20:53 #2 wblipp     "William" May 2003 New Haven 26·37 Posts How about building on the Dickman-de Bruijn function by defining a series $\rho_k(n)$ is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n). $\rho_0(n)$ is the standard Dickman-de Bruijn function. The higher k's can (I think) be defined easily in terms of an integral of (k-1) functions. I'd hope that this heuristic gives a probability density for the number of large primes, with "large" defined by the choice of n. You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.
2014-08-02, 21:48   #3
R.D. Silverman

Nov 2003

11101001001002 Posts

Quote:
 Originally Posted by wblipp How about building on the Dickman-de Bruijn function by defining a series $\rho_k(n)$ is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n). $\rho_0(n)$ is the standard Dickman-de Bruijn function. The higher k's can (I think) be defined easily in terms of an integral of (k-1) functions. I'd hope that this heuristic gives a probability density for the number of large primes, with "large" defined by the choice of n. You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.
Wrong approach. What needs to be computed is the conditional
probability that a large integer N has k prime factors given that
it has no factors less than (say) N^1/a, for given a.

2014-08-03, 04:06   #4
wblipp

"William"
May 2003
New Haven

45008 Posts

Quote:
 Originally Posted by R.D. Silverman Wrong approach. What needs to be computed is the conditional probability that a large integer N has k prime factors given that it has no factors less than (say) N^1/a, for given a.
That's a good approach for the scenarios where you can calculate that distribution - you can start directly with the residual composite. The only way I know to calculate that distribution is to use the inclusion-exclusion and normalization approach described in your paper with Wagstaff. That works for cases where the total number of factors is a handful. But some cases of interest will have "a" of hundreds or thousands.

For these scenarios, I think you would be better off to start with the entire original number and the distibutions I described, then use Bayes to account for the known factoring efforts.

But it doesn't really matter what I think or you think - this is a question of comparing heuristics that can studied empirically. Perhaps we can get the OP to propose some "interesting" cases.

2014-08-03, 05:42   #5
CRGreathouse

Aug 2006

3×1,993 Posts

Quote:
 Originally Posted by wblipp How about building on the Dickman-de Bruijn function by defining a series $\rho_k(n)$ is the asymptotic probability that x has exactly k prime divisors greater than x^(1/n).
Hmm. I think I could do this, but it would be a real bear to compute. I wrote code to compute, or at least closely estimate, that function a few years back and it wasn't easy. It seems like this version would suffer from all the problems of the original, but worse.

But at the ranges that I've tested it the Dickman function already does a fairly poor job; not sure how these would fare, better or worse.

Quote:
 Originally Posted by wblipp You could use the knowledge of prior factoring effort to choose interesting definitions of "large," and apply Bayes' theorem to modify the density to account for the prior effort.
Sounds reasonable.

2014-08-03, 06:12   #6
CRGreathouse

Aug 2006

3×1,993 Posts

Quote:
 Originally Posted by wblipp But it doesn't really matter what I think or you think - this is a question of comparing heuristics that can studied empirically. Perhaps we can get the OP to propose some "interesting" cases.
Sure, let's see. MM127 is an obvious candidate for a big number. For small... well, say 200^199 + 199^200 in Paul's honour.

I think the amount of work on MM127 is well-known. It doesn't look like XYYX has worked on the other one, so suppose not much work has been done on it.

Last fiddled with by CRGreathouse on 2014-08-03 at 16:43

 2014-08-04, 14:54 #7 CRGreathouse     Aug 2006 3×1,993 Posts Using $\omega(n)$ (to avoid the extra variability from the small primes that $\Omega$ brings) and searching an interval around 1020 I find 1 distinct prime factor: 4290 2 distinct prime factors: 21379 3 distinct prime factors: 44810 4 distinct prime factors: 54544 5 distinct prime factors: 42306 6 distinct prime factors: 22179 7 distinct prime factors: 8090 8 distinct prime factors: 2022 9 distinct prime factors: 331 10 distinct prime factors: 49 11 distinct prime factors: 1 which compares to the (naive) Landau predictions of 1 distinct prime factor: 4342 2 distinct prime factors: 16632 3 distinct prime factors: 31849 4 distinct prime factors: 40658 5 distinct prime factors: 38928 6 distinct prime factors: 29817 7 distinct prime factors: 19032 8 distinct prime factors: 10412 9 distinct prime factors: 4984 10 distinct prime factors: 2121 11 distinct prime factors: 812 and the Erdős-Kac predictions 1 distinct prime factor: 29250 2 distinct prime factors: 51228 3 distinct prime factors: 70509 4 distinct prime factors: 76319 5 distinct prime factors: 64973 6 distinct prime factors: 43492 7 distinct prime factors: 22872 8 distinct prime factors: 9438 9 distinct prime factors: 3051 10 distinct prime factors: 772 11 distinct prime factors: 152 This supports the intuition that Landau is better for small numbers of prime factors and Erdős-Kac better for large. In this case the crossover is surprisingly large (8 prime factors) but neither estimate is particularly accurate.
2014-08-04, 18:26   #8
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by CRGreathouse Using $\omega(n)$ (to avoid the extra variability from the small primes that $\Omega$ brings) and searching an interval around 1020 I find This supports the intuition that Landau is better for small numbers of prime factors and Erdős-Kac better for large. In this case the crossover is surprisingly large (8 prime factors) but neither estimate is particularly accurate.
I would not expect it to be for composites that are only 20 digits.

2014-08-04, 18:33   #9
CRGreathouse

Aug 2006

3×1,993 Posts

Quote:
 Originally Posted by R.D. Silverman I would not expect it to be for composites that are only 20 digits.
Me either -- but it's hard to factor big enough numbers to get useful information. Even if I used 100-digit numbers, at significantly greater effort, that would only increase the log log by about 1.6.

But it would still be useful to have good estimates for these ranges. Any ideas of what techniques could be used?

2014-08-04, 21:47   #10
R.D. Silverman

Nov 2003

11101001001002 Posts

Quote:
 Originally Posted by CRGreathouse Me either -- but it's hard to factor big enough numbers to get useful information. Even if I used 100-digit numbers, at significantly greater effort, that would only increase the log log by about 1.6. But it would still be useful to have good estimates for these ranges. Any ideas of what techniques could be used?
I would try to get a 2nd term (i.e. main term plus error term with effective constant) for the
Erdos-Kac thm.

2014-08-05, 00:05   #11
CRGreathouse

Aug 2006

175B16 Posts

Quote:
 Originally Posted by R.D. Silverman I would try to get a 2nd term (i.e. main term plus error term with effective constant) for the Erdos-Kac thm.
Interesting.

I haven't worked through the proof yet. Do you recommend the original or a modern version (Granville-Soundarajan, etc.)?

 Similar Threads Thread Thread Starter Forum Replies Last Post alpertron Data 566 2021-05-20 13:50 c10ck3r Data 49 2017-12-10 19:39 Trejack FactorDB 7 2016-05-14 05:38 henryzz Math 7 2012-05-23 01:13 petrw1 Factoring 5 2010-11-03 11:31

All times are UTC. The time now is 05:51.

Fri Oct 22 05:51:26 UTC 2021 up 91 days, 20 mins, 1 user, load averages: 1.19, 1.10, 1.13