20180304, 17:08  #1 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}·101 Posts 
Requestion for fast chebyshev theta function
I am looking for ideas of making a fast implementation of the Chebyshev theta function. Ideally a method substantially better than the straightforward sumallthelogs.
We can compute the Chebyshev psi function easily given a theta function so no need to worry about that. The theta function is the sum of the logs of the primes up to n. Using log(ab) = log(a)+log(b) it can be seen this is also the log of the primorial. The simple way to do this is to loop through primes, summing up log(p). Assuming finite precision, we can use a sparse table to accelerate it a bit. I'm looking for some ideas on significant improvements. Without tables, using with ~18 digit precision (long double on x8664), it takes 3.7s for 2^32 and 1 minute for 2^36. Going all the way to log(p#), even though we have fast methods for primorials and only have one log to do, seems like a bad idea. p# grows far too quickly. However the idea of batching some of these seems like it could help a little. Given p1 and p2 we can add log(p1)+log(p2) or log(p1*p2). The latter is likely faster. Precision is a consideration. I am initially doing this using a native type (64, 80, or 128bit float). We need to be careful to not lose too much precision in the sum. This also puts a limit on the batching idea, both in the input size and the impact on output precision from the log. Depending on the float type this also limits the function input to ~ 2^53 or 2^64 depending on the float type. I'm doing this in C, and currently it's ~200x faster than Pari/GP forprime(p=2,n,s+=log(p)) even with sub18digit precision. But of course Pari/GP's log() is quite different than libm's, so this isn't a really fair comparison. I was hoping there would be a great solution akin to prime counting / summing, but that works on completely multiplicative functions, which log(n) is not. 
20180304, 21:54  #2  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}·101 Posts 
Quote:
Pari is about 10x slower looping through the primes  some of this is just the obvious C vs. GP. primesieve is probably even faster. The real kicker is logl or logq (quadmath) vs. Pari's log. If I wanted more precision I'd have to use MPFR or my own GMP/mpf version (which is slower than MPFR), and I bet most of the difference would disappear. Regardless, I can microoptimize loops and so on, but if there is a better *algorithm* it could easily win. Similar to how even a soso Lehmer's prime count method can run much faster than counting primes with the excellent primesieve. The latter is seriously optimized for its task, but there are better ways to count primes than generating them. 

20180305, 00:27  #3 
Aug 2006
5,987 Posts 
I don't know of a clever algorithm. Batched logs (ideally with some guard bits) are the best I could come up with, too.

20180305, 17:07  #4 
Feb 2017
Nowhere
3·1,993 Posts 
Offhand, I can't think of anything faster than "batched logs and guard bits." As you say, precision is key. Alas, Chebyshev psi seems to be the easier one to work with.
There are any number of tantalizing results: It is of course well known that either Chebyshev theta (sum of log p) or Chebyshev psi (sum of log(p^e)) is asymptotic to the upper bound. There is an "explicit formula" relating psi to the zeroes of the Riemann function. There is also a RHdependent error term which could conceivably help. Of course, you would have to formulate the difference between Chebyshev psi and Chebyshev theta suitably. 
20180305, 18:30  #5 
Dec 2008
you know...around...
2^{3}×101 Posts 
I think I came up with something original on this issue in 2011, but got stuck somewhere inbetween: See posts 1  4 in
http://www.mersenneforum.org/showthread.php?t=15107 
20180306, 14:22  #6 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
3^{2}×101 Posts 
One thing I found useful with precision was using Kahan summation. While it slows it down about 20%, it adds at least 3 correct digits at 2^32. With Pari or MPFR it's cheaper and easier to just add more bits.
I tested using MPFR and it is a massive slowdown, and the times are quite close to Pari/GP. There is little time impact of adding some more digits. I did not do a timing comparison with quadmath, but it's quite noticible slower than gcc/Intel long doubles. 
20180306, 17:04  #7  
Jun 2003
2^{2}×7×193 Posts 
Quote:
Code:
s=1.0; forprime(p=2,2^26, s*=p); log(s) Code:
s=0.0; forprime(p=2,2^26, s+=log(p)); s Last fiddled with by axn on 20180306 at 17:10 Reason: code correction 

20180306, 20:47  #8  
"Robert Gerbicz"
Oct 2005
Hungary
3047_{8} Posts 
Quote:
Even without tables, there is a way to gain a speedup: use the central binomial coefficient: binomial(2*n,n) is divisible by all primes between (n,2*n), and has got exponent<=1 for all primes in (sqrt(2*n),n], and we know that this exponent is constant in "long" intervals, say it is 1 in (n/2,2*n/3]. For p<=sqrt(2*n) compute the exponent for each prime and finally use a good Stirling approx. to get log(binomial(2*n,n))=log((2*n)!)2*log(n!). [Paul Erdos used exactly this idea to prove the Chebyshev's theorem] 

20180308, 13:33  #9  
Feb 2017
Nowhere
1011101011011_{2} Posts 
Quote:
In general, for the prime p, the exponent of the ppower factor of the binomial coefficient binomial(N, m) is the number of carries in the basep addition m + (N  m) = N. For binomial(2*n, n) this is the number of carries in the basep addition n + n = 2*n. This result makes it easy to confirm the statements about exponents. It also makes the following an easy exercise: The binomial coefficients binomial(N, m), m = 0 to N, are all odd if and only if N is one less than a power of two. 

20180331, 14:59  #10 
Dec 2008
you know...around...
1100101000_{2} Posts 
I'm kind of intrigued by this ansatz with the binomial numbers.
I see why we get the prime numbers in intervals of (n/(2k1), n/2k] for k>=1 in the factorization of , but is it possible to "fill the gaps"? For instance, with , we have most of the primes except those in intervals (n/6k, n/(6k+1)], and the prime factors in intervals (n/(6k1), n/6k] have exponent 2. With , all intervals are covered, but the exponent in (n/k, n/(k+1)] is equal to the "hamming weight", i.e. the number of 1's in the binary representation of k. Any further ideas there? And what about x(x) ~ log(x)*[Li(x)(x)] + c  [Li(y)(y)]? With an approximation of Li(y)(y) ~ (1+), an error bound of O() seems possible. I still have no clue how useful this might be. Not only for this particular thread, but in general, and for large n. I mean, for small x, say <=10^16, some already known values for (x) can be used to determine the summation term, and then interpolate accordingly. Precomputed terms could go with a spacing of 10^10 or even 10^11 to get an error of ± 1 in x(x). But I'll wait for answers before I work out any details there. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
The FibonacciChebyshev connection  Dr Sardonicus  Number Theory Discussion Group  6  20220115 12:13 
Do normal adults give themselves an allowance? (...to fast or not to fast  there is no question!)  jasong  jasong  35  20161211 00:57 
Fast Approximate Ceiling Function  R.D. Silverman  Programming  27  20101119 17:50 
Chebyshev's Estimates  brownkenny  Math  2  20090122 17:21 
Birkhoff and Hall's theta function  Dougy  Math  2  20090105 05:09 