TYI 30 asked Elchanan Mossel’s Amazing Dice Paradox (that I heard from Yuval Peres yesterday)

You throw a die until you get 6. What is the expected number of throws (including the throw giving 6)** conditioned** on the event that all throws gave even numbers?

Most people answered 3.

Is it the right answer?

### No!

Please use now the comments thread to offer your answers, explanations, insights, intuition, thoughts and after-thoughts. I am especially eager to hear your take, James Martin! For a nice explanation by Paul Cuffis, see this comment by Yuval.

Comments on the English dilemma between “a die” or “a dice” are also welcome.

(Let me also draw your attention to TYI 29 about exciting models of random trees.)

Advertisements

Pingback: TYI 30: Expected number of Dice throws | Combinatorics and more

I just assumed it would be . Obviously more to it”

Ah! The last probability is 1/6!

Very interesting I have a hard time finding a flaw in the simple calculation –

1) From bayes formula P(6|even)=1/3.

2) The number of throws needed follows a Geometric distribution with $p=1/3$, and therefore the mean is 1/p=3.

The conditioning is more subtle. It is NOT that all throws are even, but that all throws until the first 6 are even.

I realize that I didn’t give enough context in my comment.

I’m referring to a modified version of the experiment in which the thrower continues to throw the die even after the first 6 occurs.

Ok now it makes perfect, I mis-understood the question. Thanks!

An intuition for whats wrong with the naive thinking.

Suppose you throw a die with with 1,000 sides instead of 6, until you get a 6 conditioned that all throws gave 2, 4 or 6 (let’s refer to this as a “valid sequence”). According to the naive thought, the answer should still be 3.

Recall that the probabilities in the conditioned probability space are proportional to the probabilities in the original space.

In the original space, the probability for a valid sequence of length 1 is 1/1,000, and the probability for a valid sequence of length at least 2 is at most 6/1,000,000 (because the first throw needs to be 2 or 4, and the second throw needs to be 2, 4 or 6).

So in the conditioned space, the probability for a length 1 sequence is overwhelming.

By this logic, the answer to “what is the probability that you get a 6 from one throw of a normal 6-sided dice conditioned on that all throws gave 6?” is still 1/6! But it’s 1.

As I wrote, the probabilities in the conditional space are *proportional* to the probabilities in the original space, not equal.

So as a result, in the conditional space, the probability for a sequence of length 1 is at least 1,000/6 times the probability for a sequence of length at least 2.

@Ido

So what is the probability that you get a 6 from one throw of a normal 6-sided dice conditioned on that all throws gave 6?

I would totally make this mistake in a paper!

Do you think the referee would catch it?

I had to simulate it to believe it, and indeed 1.5 is the average.

Below was my approach to the problem, before seeing Paul Cuff’s vastly superior solution.

Let’s think of a fair die as being a uniform distribution over {0,1}x{1,2,3}. If my i^th draw is (a_i, b_i), then the number shown on the die is 2b_i-a_i. Notice that a_i and b_i are independent, and that 2b_i-a_i is indeed uniformly distributed on {1,2,3,4,5,6}.

Now, my sequence of die rolls can be thought of as two independent random variables, where one is an i.i.d. uniform sequence (a_i)_i from {0,1} and the other is an i.i.d. uniform sequence (b_i)_i from {1,2,3}.

Let’s define two more auxiliary random variables: T=min{i: a_i=1} and S=min{i: b_i = 3}. We want to condition on the event that “6”=(0,3) is rolled strictly before time T. Equivalently, we want to condition on the event that S<T. In the event that S<T, the first time a "6" is rolled is exactly S.

So, T and S are independent exponential random variables of expectations 2 and 3, respectively, and we just want to compute E[S | S1, it’s not bad with pen and paper to compute the value E[S | S<t] explicitly, and the conditional expectation turns out to be <2.

Since S and T are independent, the value E[S | S<T] is a weighted average of the above, and so is itself <2. (The series defining E[S | S1, since it’s at least 1 state-by-state (within the conditioning event) and strictly higher with positive probability. Therefore, 0 < E[S | S<T] < 1. Then, the multiple-choice format comes to the rescue, delivering an answer of 3/2.

Whoops, I deleted in my penultimate paragraph. It should read:

“… and we just want to compute E[S | S1, it’s not bad with pen and paper…”

“… and we just want to compute E[S | S<T ]. Given a number t strictly greater than 1, it’s not bad with pen and paper…”

I get 3/2, but instead of the calculations I went through, let me try to give an intuition. Say we just know we rolled the die i times and the first 6 was the last time. Then the probability that all the previous rolls were even is 2/5^(i-1), dropping faster than 1/2^i, which is the likelihood that all rolls were even. So shorter sequences are favored by the conditioning event, making the expectation smaller. As the number of sides increases then, the effect will be diminished, ((n/2-1)/n vs 1/2) and the intuitive answer becomes close to the actual expectation.

3/2 The explanation at the link.

https://drive.google.com/file/d/0BxUf40PyJyBpNFBja1JzanJtUDg/view?usp=sharing

Before conditioning, the probability that we play for rounds decays like (we need roll 2 or 4 every round except the last). Conditioning multiplies by a constant but doesn’t change the exponential decay, so the resulting distribution is the geometric random variable with failure probability , success probability and expected time to success .

Ah well, my take was the same as Paul Cuff’s, it turns out 🙂 I wonder if that is also what Elchanan had in mind when posing it. Anyway, very nice question! (and lovely picture of Elchanan 🙂 )

As for the English dilemma, there is the well-known saying “Never say die!”….

Let p be the probability that a sequence terminates with a 6 before an uneven number occurs (the conditioned space).

Then p = 1/6 * (2/6)*p -> p = 1/4 (either it hits 6 on the first throw, or it has hits 2/4 and you face the same probability again).

So, what is the probability, given the condition, that we hit six on the first throw?

Without the condition, that would be 1/6, but the condition says only 1/4 of the alternatives are possible, thus the probability ends at 2/3.

The expected value of a geometric distribution then gives us E=3/2.

Should say: p = 1/6 + (2/6)*p → p = 1/4

For me, die is the only possible singular of dice, even the f it sometimes sounds awkward. “One dice” just sounds ignorant.

Probability of getting 1/6 on first throw with conditions is 1/3/1/2 or 2/3. Note that any sequence ending with 1/6 can be reversed to begin with 1/6 so E(x)=1/(2/3)=1.5

Where does “1/3/1/2” come from? The answer is correct, I just don’t follow the method.

What would your method give you for an 8-sided die, rolled until getting an 8, conditioning on even results only?

Here’s some empirical support for 3/2:

def throws():

“Return a random list of throws, ending when you get 6.”

die = random.choice((1, 2, 3, 4, 5, 6))

return [die] if (die == 6) else [die] + throws()

mean(len(seq) for seq in (throws() for i in range(100000))

if all(d % 2 == 0 for d in seq))

1.497997

I am totally confused by this.. isn’t a simple intuitive way to think about this be: if the average of hitting 6 on a dice is 3 throws, then hitting a 6 on a dice where you can only hit evens (i imagine a weighted die) be 1.5 because of the halved possibilities? Am I missing something?

The average time it takes to roll a six is 6 throws, not 3. A good way to see this is to imagine a six-million-coin-toss experiment and divide it into runs that end with a six (with an unfinished run at the end that we may ignore). Since there are about a million 6’s in tbe experiment, there are about a million runs, and since tbe total length of the runs is six million, the runs have an average length of 6.

Here’s how I got intuition about this: imagine that we’re throwing darts uniformly at an n*n board. What’s the expected number of throws until we hit the top left corner, conditioned on always hitting the topmost row until that time? In this case, it feels obvious that the conditioning on always hitting the topmost row is so “stressful,” that the conditioned process wants to “get that part over with as quickly as possible,” and just hit the top left corner so it can get on with hitting the rest of the board. Therefore the expected number of throws will be less than n.

Die vs dice is just your standard-issue conflict between the rules of English as they are and as they would be were we designing them from scratch—like “always put the period inside the quote.”

Here is a nice variant:

You toss a die 10,000 times. What is the expected number of tosses until reaching six (the toss giving six is counted) conditioned on the event that six is reached and all tosses

afterreaching six are even.You may consider infinite number of tosses if you fill comfortable with infinite probability spaces.

Very cute.

The infinite case isn’t well defined, I think, since the event that all (infinitely many) tosses after the first six are all even will have probability zero. There is also no convergent limit for the answer as 10000 -> infinity.

However, if you change the condition to the symmetric “six is reached, and all tosses *before* the *last* six are even”, then the conditional expectation / distribution will converge meaningfully as the number of tosses is increased, even though the limit event will still have probability zero.

Another nice variant is to compute just the probability that the first toss gives 6.

“You toss a die until you get 6. What is the probability that you got 6 in the first toss conditioned on the event that all tosses gave even numbers?”

There is a story about a a mathematician who tried to explain intuitively to her husband that the expected number of tosses until reaching “six” is six, and the explanation worked so well that he was convinced that the expected number of tosses until reaching “five” is five.

J. Michael Steele calls this “first step analysis”. Let c be the probability that 6 appears before 2 and 4, given that 6 appears before 1,3, and 5. Then c=5!/(6!/4)=2/3 just by counting the ways to order 1,…,6. Now the expected number, E, satisfies E = 1(c) + (1+E)(1-c) since either we immediately get a 6, or else we expect 1+what we originally expected. Solving for E gives E=1/c=3/2.

This is fascinating example with so many lessons to learn 🙂 Although many people before me posted right solutions to the problem, what I was missing is a ‘rigorous’ wrong solution that arrives to the wrong answer 3 (and its correction). I think that a wrong solution, with the error in it pointed out, is just as instructive as a completely independent solution that arrives to the right answer.

The lesson to be learned is that conditional expectations should be treated with care 🙂

I compiled a small note about this (http://bit.ly/2xUj9bX), so I won’t have to explain this to students after they get this example on conditional expectations. Thanks for the post!

Pingback: Exploring Elchanan Moddel’s fantastic probability problem with kids | Mike's Math Page

Pingback: Exploring Elchanan Mossel’s fantastic probability problem with kids | Mike's Math Page

I wrote a computer to check that the expected value is 1.5 and the program itself lends itself to an easy proof. The program generates random die throws keeps counts of 2’s and 4’s until one sees a six. If a 1, 3 or 5 shows up you can throw away that run and it resets the counter to zero.

Looks at the sequence of die throws generated by the program. Look at each 6. If the previous throw had value 1, 3, 5 or 6, the length of the sequence generating the 6 is 1. So there is a ⅔ probability of the sequence generating 6 in one step. In general the probability of that sequence having length i is (⅓)^(i-1) ⅔. Summing that series gives the expectation 3/2.

Just for fun, here is a wrong “intuitive” way to get to 3/2 🙂

If we had a fair die with only even faces (2,4,6) on it, the expected time to reach 6 when rolling it would have obviously been 3. But our conditioned die isn’t like that – on every toss, the apriori die only has a probability of 1/2 to behave as an “even die”, and probability 1/2 to get an odd result and have the trial thrown out. So, because the trial must not be thrown out before reaching 6, we should reach it more quickly than the “even die” does. Since we have a probability of 1/2 of terminating instead of rolling the even die, we should reach it 2 times more quickly, i.e. on average in 3/2 tosses.

The last line in the above “computation” doesn’t really have any basis, though. In fact, for a 2N-sided die, the same argument as above would tell us that the average number of tosses to reach 2N conditioned on all tosses being even should be a N/2, whereas it is actually 2N/(N+1). These two values agree (miraculously) only for N=3.

Many thanks, everybody, for the interesting comments!

I found 1.5 to be the limit, contrary to my intuition. I computed it this way, in J

roll=: [: }. (],?@[)^:(<:@[ ~: {:@])^:_~

filter=: #@[ = #@(#~ 2&|)

(+/%#)@(#~ *) ((filter*#)@roll)"0 (1e6# 6)

Here’s another way to think about it (but not all that different deep down). Suppose you have a die with just the numbers 2,4,6 on it. You repeatedly roll the die until you get a six. Obviously the expected number of rolls you need is three. But if you change the rules so that before each throw you toss a coin and if it comes up tails the game is aborted, then things change. Initially, the probabilities of the various outcomes were 1/3, 2/9, 4/27, 8/81, …, but afterwards they change to 1/6, 1/18, 1/54, …, which is proportional to the probabilities you get from a geometric distribution with parameter 2/3. And this is equivalent to the question asked. Leaving aside what the result of the calculation is, this makes it obvious that the expectation is going to be different, and indeed that it will go down. But in fact it also makes it clear that in going from the “naive” wrong answer to the correct answer, one is dividing by 2, where is the parameter in the geometric distribution, so goes up from 1/3 to 2/3 and the expectation goes down from 3 to 3/2.

I should say that I fell right into the trap, and am writing this only after learning the right answer and reading some of the comments above. (But in my defence, I didn’t have time to think about the question for long, and I thought it was likely that I was falling into a trap, since otherwise the question wouldn’t have been asked in the first place.)

If one conditions on the event that all the throws gives an even number, then we can safely ignor odd values and the problem is equivalent to considering a dice with only values 1,2,3 and looking for the expected time until the first 3 shows off, that is the expected length of an initial sequence avoiding a value.

That is exactly the very plausible argument that turns out to be wrong and that makes this paradox such a good one.

I “guessed” 1.5, but I think the correct answer as far as English is standardly spoken is actually 3. The reason is not die vs. dice, but instead the word “until.” That implies a stopping time. The J. Michael Steele version is clearly not a stopping time and so is well posed and the calculations make sense and are beautiful and intuitive. So the only sensible version as far as making it a stopping time is: “throw a dice if isn’t even discard it. Keep throwing until a 6 is achieved. You will be waiting 3 tosses.” This uses the word “until” as a stopping time. Can someone describe the problem as given in a similar fashion? It would have to read something like:

“Consider sequences which are even before the first 6 is rolled, and any number 1-6 afterwards. Each such sequence is given equal probability. How long do you wait UNTIL a 6 is rolled?”

This would then be using the word until as a stopping time and hence not be confusing given standard English usage!

If I’m not mistaken, there are complexity classes which involve conditioning on the path. They also get some “curious” results and can compute stuff really fast!

In my opinion, this “paradox” mainly stems from the phrasing.

It becomes much less “paradoxical” when rephrased to:

What is the expected length $\ell$ of a sequence of dice throws that is constructed under the following rules:

When a 1,3 or 5 is rolled, the sequence is reset to the empty sequence.

When a 2 or 4 is rolled, the sequence is appended by said number.

When a 6 is rolled the sequence is appended by 6 and the construction is finished.

In this case, Lance Fortnow’s explanation is excellent and the expected length $E(\ell)$ is given by

$$\sum_{\ell} \ell \cdot 2/3 \ cdot (1/3)^{\ell -1}$$ which is $1,5$.

I didn’t give any interpretation of my calculation of the probability in my earlier comment The calculation can be found here http://bit.ly/2gPSw10. I will provide that now.

The key to understanding this paradox is to realize that the objects of interest are the sequences of trials and not the individual trials. At each trial a decision is made to discard the sequence if the outcome is odd, terminate the experiment if the outcome is a six, or continue the experiment if the outcome is a 2 or a 4. These decisions are based on the entire history up to that point not just the trial under consideration.

Continue reading. https://drive.google.com/file/d/0BxUf40PyJyBpYWhuMUo2bVR0NlU/view?usp=sharing

Under the same hypotheses (all evens), the probability that the sequence has length 1, i.e. is a single 6, is 2/3. Weird at first since in the larger even space the probability of a length one sequence is 1/6.

Let X be the time to get the first 6, and A be the event that all the throws till the first 6 are even. For any n \ge 0,

P(X=n | A)

= P( [X=n] \cap A) / P(A)

= (2/6)^{n-1} (1/6) / P(A)

= c (1/3)^{n-1} , for some constant c.

Therefore, the conditional distribution of X given A is Geometric (2/3), and hence E(X|A) = 3/2.

The following is an easy way to see why the naive intuition fails. Letting A be the event that all throws till the first 6 are even, observe that

P( A | the first throw is 2) = P(A).

This becomes more obvious if A is interpreted as “the first 6 comes before the first odd number”. In other words, A is independent of the event that the first throw is 2, and therefore conditioned on A, the probability that the first throw is 2 is 1/6. Hence, conditioned on A, in each trial, 2, 4 and 6 are not equally likely. In fact, 6 is more likely than 2 and 4.

So what is the probability that we get a 6 from one throw of a normal 6-sided dice conditioned on the event that all throws gave 6?

1.5. It isn’t very different from the expected number of throws until you hit a 3 using a three sided die.

Pingback: Random Storm Thoughts | A bunch of data

Maybe it could be useful if somebody can easily provide us with a random sequence of 200 tosses of a fair dice.

@GilKalai: a bunch of radioactive dice tosses: https://www.random.org/integers/?num=200&min=1&max=6&col=8&base=10&format=html&rnd=new (new sequence every reload)

Let b be a dice outcome in {1,3,5,6}. By symmetry, conditioned on having a sequence of dices of only 2’s and 4’s before reaching the first b, the expected sequence length up to and including the first b is the same for any b in {1,3,5,6}.

Now consider the expected time it takes for a random sequence of dices to first reach one of the values in {1,3,5,6}. This value is clearly 3/2 since its the expected value of a geometric random variable with probability of success 4/6. Furthermore, this value is the average of the previous expectations, which finishes the proof.

When we throw a infinite sided die until you get 6, the expected number of throws (including the throw giving 6) conditioned on the event that all throws gave even numbers is 2 !

Pingback: A Probability Puzzle That You’ll Get Wrong | Math with Bad Drawings

What a lot of mental exercise for the simplest of probability problems!

It doesn’t matter what came up in the previous throws. That’s right – nothing that happened in the past has any influence on the result of the next roll. This assumes, of course, that the die is fair and rolling surface is even.

Every roll carries a 1/6 probability of turning up any number. That’s all there is to it.

This example was presented in my graduate-level statistics class 40 years ago. The answer hasn’t changed, but the number of wrongs answers seems to have grown, though by what factor I couldn’t guess.

Pingback: Monty Hall (1921-2017) and His Problem | A bunch of data

The problem with the intuition is that we think sh are changing the probability space whereas we are not, we are just now rejecting more sequences (e.g. we now treat 16 like we did 22 before).

well the answer seemed to be 3 to me as well but it’s not I don’t know how and why but I simulated 1000 rolls in java and I found out this from 2 simulations :

no. of rolls to get to 6 with the given conditions frequency (i) frequency(ii)

=1 494 504

=2 252 252

=3 136 105

=4 55 66

=5 26 42

=6 17 12

>6 20 19

From this data, it is promising that it is somewhere between 1-2 (1.5 is the correct answer)

Pingback: Links 23 March 2018 – Power Word: Real Name