mersenneforum.org Basic optimisation question
 Register FAQ Search Today's Posts Mark Forums Read

 2008-03-19, 13:51 #1 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 13·491 Posts Basic optimisation question What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M' ? I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date) or M memory (flag each integer as not-picked-yet). This seems to be a fairly key component of all the puzzles
2008-03-19, 14:18   #2
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

22·5·307 Posts

Quote:
 Originally Posted by fivemack What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M' ? I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date) or M memory (flag each integer as not-picked-yet). This seems to be a fairly key component of all the puzzles
Isn't this kind of like the card dealing algorithm.

1. Select a card at random from the pack.
2. Remove that card from the pack.
3. Repeat 1 & 2 until you have the required number of cards.

If you use something like a linked list you can remove each element in constant time, but finding the selected the element is linear time.

If you don't mind breaking your "flag each integer as not-picked-yet" criteria, you can use a bitmap for the flags if memory is a concern, but may require multiple tries of random numbers until you 'hit' an unselected element.

2008-03-19, 16:13   #3
bsquared

"Ben"
Feb 2007

1101011011012 Posts

Quote:
 Originally Posted by fivemack I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date)
Does this have to be N^2?

pick all the elements (O(N))
sort (N log N)
remove X duplicates (O(N))
N = X
repeat until X = 0

X is probably much less than N every iteration, so I think the overall complexity is closer to N log N than N^2

2008-03-19, 16:22   #4
R.D. Silverman

Nov 2003

22·5·373 Posts

Quote:
 Originally Posted by fivemack What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M' ? I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date) or M memory (flag each integer as not-picked-yet). This seems to be a fairly key component of all the puzzles
Checking that an element has not yet been selected is not 'uniformly
at random' because your event space is not constant.

What you seem to be looking for is a shuffling algorithm. Knuth Vol I
has examples.

2008-04-06, 07:24   #5
robo_mojo

Mar 2008

25 Posts

Quote:
 Originally Posted by R.D. Silverman What you seem to be looking for is a shuffling algorithm.
Shuffling the set only makes sense if M is small. If M is 1,000,000,000, for example, that approach wouldn't be practical, especially if N is very small.

2008-04-06, 07:51   #6
drew

Jun 2005

38210 Posts

Quote:
 Originally Posted by fivemack What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M' ? I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date) or M memory (flag each integer as not-picked-yet). This seems to be a fairly key component of all the puzzles
I don't believe shuffling algorithms are appropriate, as most assume M=N and operate in an M-sized memory space. And I don't think you care about the order of the N-sized set, right?

I think the 'pick and check' approach you described is appropriate, but the 'check' step doesn't need to have N complexity...it can be constant complexity.

A hash table would be my choice for N much smaller than M. You can create a hash function ( mod N, for instance) that is roughly N-sized, your collisions would be fairly infrequent. Your worst case performance would still be N^2, but it's extremely unlikely. Typically, it will have order 'N' complexity. By choosing random integers, you've implicitly optimized against any clustering problems.

A common coding technique is to use a bit-mask as an efficient modulus operator if your table length is a power of 2. For instance, if your hash table has 64 elements, and your selection is 'n', then your key = n & 0x3f. For your table length, pick the next power of 2 greater than 2N, then you can use open addressing in an array to effectively handle collisions.

Drew

Last fiddled with by drew on 2008-04-06 at 08:17

 2008-04-08, 13:50 #7 m_f_h     Feb 2007 24·33 Posts I think the appropriate algorithm depends on the relative size of M/N. "sparse" case: If M is very large (i.e. N/M small), it is best to use random values add them to a set (implemented as sorted list ignoring duplicates) until the set has desired cardinality. "dense" case: If N is very large (comparable to M) then eventually you will need O(M) memory (not bytes or worse int's, but at least bits) anyway. Here I'd suggest to chose random values in 1..M to set (or delete, if N>M/2) bits in a bitmap, if {1..M } is too large to shuffle and take the first N values. I wonder whether there coud be an approach using a random generator directly creating M-bit numbers (and doing something efficient to get such a random number having N bits), maybe "or"-ing if there are not enough and some intelligent thing to clear a few bits if there are too many bits set) Last fiddled with by m_f_h on 2008-04-08 at 14:01

 Similar Threads Thread Thread Starter Forum Replies Last Post storflyt32 YAFU 2 2015-06-29 23:25 wong8888 Information & Answers 5 2015-03-22 12:15 iconized Prime Sierpinski Project 2 2012-02-03 00:01 fivemack Math 0 2008-06-16 10:57 drake2 Math 1 2006-01-12 07:40

All times are UTC. The time now is 11:54.

Tue May 11 11:54:38 UTC 2021 up 33 days, 6:35, 1 user, load averages: 2.36, 2.32, 2.28