Here is a problem I heard from Zachary Chase and Yuval Peres. Bob chooses a sequence of zeroes and ones of length N. The bits are presented to Alice one by one. Alice’s task is to choose, at a certain time of her choice, some number k (smaller than the number of unseen bits) and to predict the fraction of ones in the next k bits. Alice wins if her prediction deviates by at most 0.01 from the correct ratio.
Test your intuition 52: Assuming that N is sufficiently large, is it possible for Alice to achieve this task with probability greater than 0.99?
Clarification: we assume Bob’s sequence is fixed, and the randomness is over a probabilistic strategy chosen by Alice.
Just to be sure: we assume Bob’s sequence is fixed, and the randomness is over a probabilistic strategy chosen by Alice?
Yes, exactly
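To pin the protocol down, here is a minimal sketch in Python (the interface and names are mine, not part of the original problem): Bob fixes a bit string of length N, Alice watches the bits one at a time, and at a moment of her choosing she commits to a window length k and a predicted fraction of ones; she wins if the prediction is within 0.01 of the true fraction in the next k bits.

```python
def play(bits, alice_strategy, tolerance=0.01):
    """Run one round of the game on Bob's fixed bit string.

    `alice_strategy(seen, remaining)` is a hypothetical interface: given the bits
    revealed so far and the number of still-unseen bits, it returns None to keep
    watching, or a pair (k, prediction) with 1 <= k <= remaining to commit.
    """
    n = len(bits)
    for t in range(n):
        decision = alice_strategy(bits[:t], n - t)
        if decision is not None:
            k, prediction = decision
            true_fraction = sum(bits[t:t + k]) / k
            return abs(prediction - true_fraction) <= tolerance
        # otherwise Alice observes bits[t] and the game continues
    return False  # Alice never committed, so she loses by default
```

Alice’s success probability against a fixed sequence is then taken over the randomness inside `alice_strategy`.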
In the special case where the sequence is determined by flips of a fair coin, how large would K have to be, for a guess of “(K/2)/K” to be within 0.01 of the correct answer, 99% of the time?
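A rough answer, assuming the bits are i.i.d. fair coin flips: the fraction S_K/K has standard deviation 1/(2*sqrt(K)), so the two-sided normal tail falls below 1% once 0.02*sqrt(K) exceeds about 2.576, i.e. K of roughly 16,600. A quick numerical sketch of this estimate:

```python
from math import erf, sqrt

def fail_prob_normal(K, eps=0.01):
    """Normal approximation to P(|S_K/K - 1/2| > eps) for K fair coin flips."""
    z = 2 * eps * sqrt(K)          # eps divided by the standard deviation 1/(2*sqrt(K))
    return 1 - erf(z / sqrt(2))    # two-sided tail beyond z standard deviations

K = 1
while fail_prob_normal(K) >= 0.01:
    K += 1
print(K)  # about 16,588 under this approximation, i.e. roughly (2.576 * 50) ** 2
```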
Just an attempt. Say N = 2^n. Alice first randomly picks a “scale” 1 \le j \le n-1 and divides the interval into blocks of size 2^j. Then she picks r uniformly at random between 1 and 2^{n-j} - 1, computes the bit average in the r-th block, and gives it as her prediction for the (r+1)-th block. I cannot prove this works for all sequences, but the obvious counter-examples, e.g. those that defeat any fixed scale j, do not seem to work here.
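A minimal sketch of this strategy in Python (my own names; it assumes the sequence length is exactly 2^n):

```python
import random

def random_scale_strategy(bits):
    """Sketch of the random-scale idea: pick a dyadic scale j at random, pick a
    random block of size 2**j, and predict that the next block of the same size
    has the same fraction of ones as the block just observed.

    Returns (bits_to_watch, k, prediction): Alice watches the first
    `bits_to_watch` bits, then commits to window length k with this prediction.
    """
    N = len(bits)
    n = N.bit_length() - 1                 # assumes N == 2**n
    j = random.randint(1, n - 1)           # random scale (needs n >= 2)
    block = 2 ** j
    num_blocks = N // block
    r = random.randint(1, num_blocks - 1)  # observe block r, predict block r + 1
    observed = bits[(r - 1) * block : r * block]
    prediction = sum(observed) / block
    return r * block, block, prediction
```

Note that j and r are drawn before any bits are revealed, so the strategy only ever uses the bit average of a block it has fully observed.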
I think it is possible, but I don’t have a complete proof. By a minimax-style argument, the failure probability of Alice’s best randomized algorithm on a worst-case sequence equals the failure probability of the best deterministic algorithm tailored to the worst-case distribution of inputs.
I couldn’t think of a hard input distribution as N increases, and I don’t think one exists. If the distribution allows you to learn significant information about future values from past values, this should give Alice an advantage that lets her win for large enough N. If no advantage can be gained, then the sequence is completely memoryless; in this case the algorithm could set K=N and guess the expected number of 1s over the distribution, which should work for sufficiently large N using the CLT for Markov chains. I think there is an edge case here where you could learn small amounts that aren’t enough to win, but intuitively I think a CLT would still apply in those cases.
This doesn’t translate well into an explicit algorithm, and I’m definitely interested to see what the answer is.
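The minimax step alluded to above is, I believe, just Yao’s principle; von Neumann’s minimax theorem applies because both Bob’s sequences and Alice’s deterministic strategies form finite sets. In this setting it would read:

```latex
\min_{\mathcal{A}\ \text{randomized}} \ \max_{x \in \{0,1\}^N} \ \Pr[\mathcal{A}\ \text{fails on}\ x]
\;=\;
\max_{D\ \text{on}\ \{0,1\}^N} \ \min_{A\ \text{deterministic}} \ \Pr_{x \sim D}[A\ \text{fails on}\ x].
```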
What I said above about the CLT for Markov chains was nonsense; that would require stationary transition probabilities, which obviously aren’t present here. However, I still can’t think of a hard distribution.
Quick attempt:
Let X,Y be random variables with Var(X),Var(Y) <= v.
Then, Var((X+Y)/2) + 1/4 * Var(X-Y) = 1/2 * Var(X) + 1/2 * Var(Y) <= v.
In particular, at least one of the two tasks, guessing the average of X and Y or guessing their difference (and thus also guessing Y given X), must be easier than guessing either X or Y by itself.
By repeating the above observation (or a slight generalization of it for a weighted average) we get a weaker claim than desired:
If Bob is drawing his input out of any fixed distribution D, then there must exist a pair of disjoint sets of indices (not necessarily consecutive) A_D, B_D such that if you tell me the values of the drawn instance on A_D, then I can guess the density on B_D really well.
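For completeness, the variance identity in the comment above follows by expanding both terms and noting that the covariance contributions cancel:

```latex
\operatorname{Var}\!\left(\tfrac{X+Y}{2}\right) + \tfrac{1}{4}\operatorname{Var}(X-Y)
= \tfrac{\operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)}{4}
+ \tfrac{\operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X,Y)}{4}
= \tfrac{\operatorname{Var}(X) + \operatorname{Var}(Y)}{2} \le v.
```

In particular, at least one of the two terms on the left is at most v/2, which is the halving used in the repeated argument.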
Does Alice know the value of N?
Yes.