20170912, 22:50  #1 
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}·227 Posts 
Generating a uniform random nbit semiprime
I have a routine that generates a random nbit semiprime and would like to improve its efficiency. This is the "unrestricted" semiprime in that I am not just looking for an even bit split, so different than RSA key generation or examples for hard factoring. There are many papers about generation methods for those.
The input is a number of bits n, and the output is an nbit semiprime. The probability of a given result should be equal to that of all other nbit semiprimes. Hence with an input of 6, the values 33,34,35,38,39,46,49,51,55,57,58,62 should be returned with equal likelihood. See CRGreathouse's 2010 post for one possible implementation. For very small bits, e.g. 7 and under, we can just uniformly select from a list. Perfect distribution, super fast. For larger inputs, we can randomly select an nbit number, do an is_semiprime() test, repeating until that is true. This is reasonably fast at small sizes, e.g. average 1 microsecond for 20bit, 3uS for 32bit, 10uS for 48bit, 40uS for 60bit. Your implementation will differ of course. But at some point this really starts slowing down, and basically for >64bit this is not a good method. Charles' idea is clever, weighting the distributions. At a high level this gives a good approximation to the distribution. However it skews heavily toward primes with a large following gap. So much so that 3 occurs as often as 2, and 7 more than 5. I did basically a hack on this method, generating a (mostly) proper curve using a table for small primes then use Charles' method if we selected a larger one. This seems to work reasonably well and gives results much closer to the rejection sampling method. Any ideas on better ways? 
20170913, 01:13  #2 
"Rashid Naimi"
Oct 2015
Remote to Here/There
2^{3}·11·23 Posts 
A bit over my head.
With that disclaimer and for what it's worth. Not sure how is_semiprime() works. But with the right trialbydivision algorithm up to the cuberoot of n (which is significantly less taxing than trialbydivision up to the squareroot of) n you will be left by a set of primes and semiprimes. 
20170913, 02:53  #3  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}×227 Posts 
There are various ways to implement it. With mine, 95% of all inputs require little more than a primality test, which is often trivially decided. Then partial factoring quickly finds most solutions. Charles has an example in Pari: http://www.mersenneforum.org/showpos...postcount=2074 that is similar. We both found these tests work well up to a certain point after which it is faster to do more sophisticated factoring. Yes we stop if the input is tiny and we pass the cube root of n.
Quote:


20170913, 04:16  #4 
"Rashid Naimi"
Oct 2015
Remote to Here/There
2^{3}·11·23 Posts 
Again keeping my disclaimer in mind, i will offer some more simpleton comments which is all I can do.
The following example can be stretched in different ranges and is just a concept. Suppose you are aiming to produce the set of all the primes and semiprimes between 100 to 200. You could do trial by division up to n^(1/4). This would filter out all the factors of 2 and 3 for this example range. Then you could perform gcd operations between the remaining items which will distinguish all the primes from 5 to 37. The remaining primes would be to large to form a semi prime in the range (unless multiplied by 2 or 3). Last fiddled with by a1call on 20170913 at 04:38 
20170913, 05:12  #5  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}·227 Posts 
Quote:
My problem is, for example, generating 1000 random 128bit semiprimes, each with approximately uniform probability. I don't want to generate all the semiprimes in the range, even if that were at all feasible. There are something like 8 * 10^36 of them. Even with more tame sizes, the point isn't to generate all of them. It is to quickly generate a single random semiprime that has the right probability. Generating them with a somewhat incorrect probability is pretty easy (e.g. generate a uniform random prime p between 2 and 2^(n1), then generate another uniform random prime q in the range such that p*q is an nbit number). 

20170913, 13:07  #6  
Feb 2017
Nowhere
2^{3}·569 Posts 
Quote:
[2^(n1), 2^n  1]. A uniform distribution in an interval usually means, the probability of landing in a subinterval is proportional to its length. But the numbers from the set you're interested in aren't uniformly distributed in the interval. So a process which does well at selecting numbers which are uniformly distributed in the interval, may do less well at selecting members of the subset of semiprimes with equal probability. As to randomness in your subset, theoretical perfection would seem to require knowing in advance all the numbers in the set (here, the nbit semiprimes), then having a process to select one of them randomly, say by enumerating them 1 to K, and randomly selecting an integer from 1 to K. Obviously, knowing the complete set of semiprimes in the interval is impracticable when the bit size is large (as is, therefore, being able to know how "random" your choice of semiprime might be.) So it seems to me that some sort of "weighted" approach  as already mentioned  might be a practical compromise, to get reasonably close to what you want with a reasonable amout of computing power. There is an asymptotic estimate due to Landau for the number of semiprimes less than x, x*(log(log(x))/log(x) but this estimate is not all that great. I'm sure there are estimates of the distribution in terms of the size of the smaller of the two prime factors. I'm equally sure that, if I tried to reproduce such an estimate "on the fly" I would screw it up royally. But such an estimate seems to me to be a good way to go. Use the distribution in terms of the size of the smaller factor to create a "stratified random sample," and then cross your fingers... 

20170913, 15:28  #7  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}×227 Posts 
I think you've covered most of what I've got.
Two ways that generate the right distribution are:
Selecting uniform random primes in a range can be done similarly, and I have done each in that context. For primes the second is somewhat reasonable given extremely optimized prime count and nth prime routines, and for semiprimes Charles and I use this in a hardcoded fashion for tiny inputs. With primes, the former has some optimizations that cut out large groups of composites from the selection. I don't think those work nearly as well with semiprimes. The rejection sampling method is practical to 64bits, and feasible probably up into 128 or so bits. Depends on ones available factoring algorithms and performance tolerance. It'd be nice to have something better, but this works. Quote:


20170913, 20:34  #8 
Aug 2006
3^{2}·5·7·19 Posts 
If you use 'my' method, be sure to replace the hacky nextprime() bit with the proper random prime generation which should make the overall method nearly uniform. (PARI/GP now has this bit built in as randomprime.)

20170913, 22:49  #9  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2^{2}×227 Posts 
Quote:
For 100k 26bit random semiprimes, using the first code (rejection sampling) and looking at the histogram of smallest factor: Code:
17026 2 11633 3 7168 5 5300 7 3447 11 2956 13 2296 17 2098 19 1810 23 1368 29 Code:
15553 3 14520 2 8509 7 7661 5 4050 13 2964 23 2651 19 2649 11 2100 31 1540 17 

20170914, 08:32  #10 
Jun 2003
1369_{16} Posts 
Here's an engineer's solution. Build a table of frequency of bbit semiprimes where the smaller factor is p. But rather than do it for each prime < sqrt(2^b), we do it in batches. Since the smaller primes have more contribution, we can do it at invidividual prime levels for small primes (say < 128). After that we do it at at batches of, say, 0.5 bits, like 2^72^7.5, 2^7.52^8, etc... upto 2^(b/2).
For individual primes, we can estimate the # of semiprimes as Pi( 2^b/p )  Pi( 2^(b1)/p ). For a batch of primes between p&q, we can calculate it as if there are n primes (n=Pi(q)Pi(p)) each of size Mean(p,q) [i dont know which would give better result, AM,GM or HM]. Once we have the frequency counts, we can generate a random number in [0,1) and use it to select prime/batch (in case of batch, select a random prime within the batch). There will be lot of approximations & estimations, but just might work. For a given b, there will be some initialization which can be amortized over multiple calls. Last fiddled with by axn on 20170914 at 08:35 
20170914, 15:48  #11 
"Antonio Key"
Sep 2011
UK
3^{2}·59 Posts 
Don't know if it helps, but I have used the attached perl code which seems to work ok. It uses your random_prime.
Prime distribution, routine executed until one of the counted prime factors = 100k (26bit semiprimes): Code:
Prime Count 2 98900 3 98719 5 99101 7 99327 11 99254 13 98703 17 98651 19 99279 23 99474 29 99749 31 98857 37 99465 41 99004 43 99085 47 98477 53 100000 59 98947 61 98380 67 98631 71 98491 73 99117 79 99165 83 98799 89 98832 97 99597 101 98376 103 99035 107 98870 109 98463 113 98726 127 98853 131 98661 137 98324 139 98539 149 98855 151 99396 157 99167 163 98829 167 98532 173 99200 179 99195 181 99165 191 98809 193 99163 197 99418 199 99166 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Semiprime and nalmost prime candidate for the k's with algebra for the Sierpinski/Riesel problem  sweety439  sweety439  11  20200923 01:42 
random comments, random questions and thread titles made for Google  jasong  Lounge  46  20170509 12:32 
Factored my first SemiPrime Some Questions  tal  Msieve  31  20110522 18:17 
Generating Random Primes  davar55  Math  14  20110220 16:06 
About random number (random seed) in Msieve  Greenk12  Factoring  1  20081115 13:56 