mersenneforum.org Matching problem
 Register FAQ Search Today's Posts Mark Forums Read

 2021-06-02, 12:44 #1 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 61·97 Posts Matching problem Suppose you have n letters and n addressed envelopes that you need to match. You can find matches by pairing all letters with envelopes and a friend will tell you how many matches there are(not which). What is the expected number of times you need to pair the letters with envelopes to get all matching? What is the maximum number of pairings you could need to find all matches? What would your strategy be to minimize the above numbers? Do they have the same solutions? For easier discussion let's use n=10 as an example although a general formula would be great.
 2021-06-02, 16:38 #3 Viliam Furik     "Viliam Furík" Jul 2018 Martin, Slovakia 23·5·17 Posts I will demonstrate what I think is the worst-case scenario for n=5, using a bit of strategy in choosing the switches, such that I will try to not place the same letter twice to where I know it doesn't belong. ([x] denotes that letter is for sure in its correct envelope) Begin Step 1 -> 2, 4, 5, 1, 3 Friend -> 0 (call 1) Step 2 -> 2, 3, 5, 1, 4 (3-4 switch) Friend -> 0 (call 2) Step 2 -> 5, 3, 2, 1, 4 (2-5 switch) Friend -> 0 (call 3) Step 2 -> 5, 1, 2, 3, 4 (1-3 switch) Friend -> 0 (call 4) Step 2 -> 5, 1, 4, 3, 2 (2-4 switch) Friend -> 0 (call 5) Step 2 -> 3, 1, 4, 5, 2 (3-5 switch) Friend -> 0 (call 6) Step 2 -> 3, 1, 4, 2, 5 (2-5 switch) Friend -> 1 (call 7) Step 2 -> 3, 5, 4, 2, 1 (1-5 switch) Friend -> 0 (call 8) Step 2 -> 3, 1, 4, 2, [5] (rollback) Step 2 -> 3, 4, 1, 2, [5] (1-4 switch) Friend -> 1 (call 9) Step 2 -> 1, 4, 3, 2, [5] (1-3 switch) Friend -> 3 (call 10) Step 2 -> [1], 2, [3], 4, [5] (2-4 switch) At this point, you don't even need to call your friend, because you are certain you got it correct - If your correctness number is n-2, you know you just have to switch the two remaining letters you didn't yet mark as correct. That is 10 calls for n=5 in the worst case. If someone finds a mistake in my example, please, point it out.
2021-06-02, 19:50   #4
Walter

"Walter S. Gisler"
Sep 2020
Switzerland

B16 Posts

My approach would be to formulate the problem as a 0-1 linear program.

We will have variables $x_{ij} \in \{0,1\}$ which are 1 if letter $i$ matches with envelope $j$ and 0, otherwise.

Clearly, every assignment of letters to envelopes would fulfill the following constraints:

$\sum_{i\in I} x_{ij} = 1 \: \forall j \in J$
$\sum_{j\in J} x_{ij} = 1 \: \forall i \in I$

If we solve this problem we will get a unique assignment and we can receive the number of correct guesses, which we can denote as $c$. With that, we can add one new constraint every time this hint is given. Assuming our previous solution is given as a set of $ij$ pairs, denoted as $S$, this additional constraint can be expressed as follows:

$\sum_{ij \in S} x_{ij} == c$

This constraint essentially says that the correct solution includes $c$ of the matchings in $S$.

I believe this is the strategy that uses the given information the most efficient way possible and minimizes the number of attempts needed to uncover all of the correct pairs. Unlike using only the information from the last (or maybe last two "rounds") it combines all knowledge it retrieves in this process.

It is pretty hard to calculate the expected number of "rounds", but a simulation can help to quantify it. I have implemented this quickly in Python to get an idea. I have added a sample run to illustrate how this method choses the pairs in each round. It is clearly not what a human would do or in any way similar to the previous answer, but seems to be quite efficient.

I'm attaching a graph showing the frequencies (+cumulative frequencies) of the problem being "solved" after a given number of rounds. It can be seen that, if we have 10 letters and 10 envelopes, the expected number of iterations seems to be somewhere around 10. In my experiment (2000 random experiments), the highest number of tries I needed was 15 (just once), but the most common number seems to be 10 (473 times out of 2000 experiments).

I am also attaching the Python code (simulate.txt - seems like I can't upload .py files) in case anyone wants to experiment with it and adjust it to chose a different strategy.

Out of curiosity, I have also done a quick experiment and have adjusted the number of letter/ envelope pairs to 20. Obviously, there are a lot more possible assignments. Nevertheless, the expected numbers seems to be somewhere around 27, which is still quite decent.
Attached Thumbnails

Attached Files
 sample_output_n=10.txt (1.4 KB, 38 views) simulate.txt (3.3 KB, 40 views)

 2021-06-02, 20:37 #5 Viliam Furik     "Viliam Furík" Jul 2018 Martin, Slovakia 23·5·17 Posts I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?
 2021-06-02, 20:50 #6 Walter   "Walter S. Gisler" Sep 2020 Switzerland 11 Posts I like your approach, Viliam. I would be really curious to see how it performs in a Monte Carlo simulation, but it isn't as easy to implement as my method. You can even improve the performance a bit if you keep track of pairings that are wrong. For example, in step 2c, you learn about two numbers that are wrong and it would never make sense to switch either of them back again. In the example that you provided, you are doing one move that could be avoided: You already know (from the very first step) that 4 can't be in the second position, but you are doing a swap later in the process that brings it back into that position. By the way, after 8 steps, the solution is actually clear if you combine all of the knowledge you have gained in the process. To show this, I have created a 5x5 grid representing all possible pairs of letters/ envelopes. Whenever your "friend" tells you that a solution has 0 correct choices, you cross out a possible solution. After the first 8 steps, you got the 0-correct scenario 7 times, so the grid is quite full already (see attached screenshot). Based on this, you can already deduce the correct solution for letter 4 and 5 and don't need to continue trying with those. Furthermore, you have had one solution that gave you 1 correct position: 3, 1, 4, 2, 5 Since from all the 0-correct scenarios you know that the solution will be x, x, x, 4, 5, you can now deduce that 3, 1, 4, 2 is also incorrect. That crosses out one more square and tells you that the solution will be x, x, 3, 4, 5. Furthermore, for position 1, the candidates are either 1 or 4, but 4 is already taken. From that, the whole solution is know. Nevertheless, a pretty good strategy, but you can combine it with additional logic reasoning to reduce the number of steps. Attached Thumbnails
2021-06-02, 21:14   #7
uau

Jan 2017

23×3×5 Posts

Quote:
 Originally Posted by Viliam Furik I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?
I'm not familiar with docplex which seems to be what the code uses to generate the specific guesses, but it seems like it probably just chooses some arbitrary permutation which has the given number of matches with every previous try - if your first try got 4 matches and second 5, then on third try it'll try some permutation that has 4 choices in common with the first and 5 with second.

2021-06-02, 21:24   #8
Walter

"Walter S. Gisler"
Sep 2020
Switzerland

10112 Posts

Quote:
 Originally Posted by Viliam Furik I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?
Viliam, not sure if you are familiar with linear programming/ constraint programming, but it might be a bit confusing at first and requires a slightly different way of thinking. This is certainly not the easiest example to get started with.

The whole idea behind my approach is to describe the correct solution in terms of the knowledge we gain in every step. The knowledge you gain in one single step isn't valuable by itself, but the knowledge you gain in multiple steps combined, tells you a lot about the solution. Think of it as ruling out solutions, rather than moving towards a better solution at every step. At the beginning, the possible solution space is $n!$. With every iteration and every constraint that we add, we cut off some of the solutions and therefore reduce the solution space, until we are left with only one single unique solution.

Let's look at a much easier example with n=3; let's assume the correct solution is [1, 2, 3]

If our first guess is [3,1,2], the number of correct guesses is 0.

Based on this, we can not only rule out [3,1,2], but also [3,2,1], [1,3,2] and [2,3,1] (because each of them has one position in common with a solution we know is completely wrong), and that's exactly what adding that constraint would do.

About choosing the next solution: as I said, all that the constraints do is to reduce the solution space from $n!$ to a smaller number of solutions. So after adding a constraint, we just let the MIP solver chose one of the remaining solutions in the solution space at random. The whole idea isn't necessarily to chose a better solution in every iteration, so "predict the next solution" isn't quite the right wording.

A common problem that is solved with a binary integer program is Sudoku. It is certainly much easier to understand than this. If you are curious about the method I am using here, this would probably be a good starting point: https://www.mathworks.com/help/optim...lem-based.html

2021-06-02, 21:26   #9
Walter

"Walter S. Gisler"
Sep 2020
Switzerland

11 Posts

Quote:
 Originally Posted by uau I'm not familiar with docplex which seems to be what the code uses to generate the specific guesses, but it seems like it probably just chooses some arbitrary permutation which has the given number of matches with every previous try - if your first try got 4 matches and second 5, then on third try it'll try some permutation that has 4 choices in common with the first and 5 with second.
Yes, you are completely right. And I realize this is a much more intuitive explanation than my attempt

Think of docplex as a black box. There are plenty of other libraries I could have used: CBC, Gurobi, OR-Tools, one of many SAT solvers etc.

 2021-06-02, 21:32 #10 uau   Jan 2017 23·3·5 Posts Trying something that could be the correct answer given all previous ones (which I believe is what Walter's code does) is not always the optimal answer, either for average or worst-case behavior (at least worst case for remaining tries in a particular situation, I don't show this to necessarily affect the global worst case): Suppose that n = 1000000. Your first random guess gets 999998 matches right (quite lucky). Now, if you keep trying guesses that could be the correct answer, they can only swap one pair from your initial guess. Since you can only change two places at once, you'd expect to need around 146000 tries to even try changing one of the incorrect positions (you'd try changing twice that number of places, and 50% chance that both the incorrect positions are outside that set). Some kind of bisection strategy should be able to do much better than that. Last fiddled with by uau on 2021-06-02 at 21:35
2021-06-02, 22:38   #11
Walter

"Walter S. Gisler"
Sep 2020
Switzerland

138 Posts

Quote:
 Originally Posted by uau Trying something that could be the correct answer given all previous ones (which I believe is what Walter's code does) is not always the optimal answer, either for average or worst-case behavior (at least worst case for remaining tries in a particular situation, I don't show this to necessarily affect the global worst case): Suppose that n = 1000000. Your first random guess gets 999998 matches right (quite lucky). Now, if you keep trying guesses that could be the correct answer, they can only swap one pair from your initial guess. Since you can only change two places at once, you'd expect to need around 146000 tries to even try changing one of the incorrect positions (you'd try changing twice that number of places, and 50% chance that both the incorrect positions are outside that set). Some kind of bisection strategy should be able to do much better than that.
Interesting case! Would definitely be curious to see thoughts on how to handle this kind of situation better. I am not so sure. isn't 9999998 correct matches just as bad as only 2 correct matches in terms of information value?

I did a few additional tests. Unfortunately it isn't practical to try my method with such a high n because the matrix gets too big, but if I use n = 100 and force exactly this kind of situation, I am on average only using about 30 tries. My feeling is telling me that that's pretty decent compared to what you would expect as an average runtime.

So, with n = 1M, it is true that my approach would probably use over 100k tries in this worst case scenario (it needs to "touch" each wrong position at least once). However, what would the average case scenario look like? My gut feeling is telling me it would be somewhere in that range too. But might completely off on that estimate. It is pretty hard to imagine this game with such a lot of letters and envelopes...

 Similar Threads Thread Thread Starter Forum Replies Last Post PhilF Data 35 2019-05-20 02:36 sdbardwick Lounge 1 2015-02-03 15:03 miket Math 5 2014-08-12 00:41 Dubslow PrimeNet 8 2012-04-27 18:19 patrik Data 1 2011-09-24 23:44

All times are UTC. The time now is 05:12.

Wed Oct 20 05:12:53 UTC 2021 up 88 days, 23:41, 0 users, load averages: 1.12, 1.26, 1.23