There are two players, the **row player** and the **column player**, and two possible zero-sum games, described in the picture below. In each game the row player has two strategies, T (top) and B (bottom), and the column player has two strategies, L (left) and R (right). The entries in each game are the amounts that the column player pays the row player. So if the game on the left is played, the row player chooses T, and the column player chooses L, then the column player pays 1 to the row player.

## The repeated game scenario

One of these two games is chosen at random, each with probability 1/2, and neither player knows which one was chosen. The chosen game is then repeated infinitely many times. The players are not told the payoff of each round! But after each round they do learn how the other player played. Each player wants to guarantee the best possible average payoff as the number of repetitions tends to infinity: the row player wants the average payment to be as high as possible, and the column player wants it to be as low as possible.

We first ask what payoff each player can guarantee. If the row player plays T and B with equal probabilities in every round, he guarantees that his expected average gain is at least 1/4. The column player can likewise play L and R with probability 1/2 each and guarantee that the expected payment per round is at most 1/4. So 1/4 is the value of the game.
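As a quick sanity check of the 1/4 figure, here is a minimal sketch that averages the two games and computes the value of the resulting 2×2 zero-sum game. The payoff matrices below are an assumption: the picture is not reproduced here, and the text only states that (T, L) pays 1 in the left game. The matrices were chosen to match that entry and the stated value of 1/4; the helper names `average_game` and `value_2x2` are purely illustrative.

```python
from fractions import Fraction

# ASSUMED payoff matrices (rows: T, B; columns: L, R).
# Only the entry (T, L) = 1 of the left game is stated in the text;
# the remaining entries are a guess consistent with the value 1/4.
GAME_LEFT = [[1, 0], [0, 0]]
GAME_RIGHT = [[0, 0], [0, 1]]


def average_game(g1, g2):
    """Expected one-round payoffs when each game has probability 1/2."""
    return [[Fraction(g1[i][j] + g2[i][j], 2) for j in range(2)]
            for i in range(2)]


def value_2x2(m):
    """Value of a 2x2 zero-sum game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for row
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for column
    if maximin == minimax:
        return maximin  # saddle point in pure strategies
    # No saddle point: standard mixed-strategy formula for 2x2 games.
    return Fraction(a * d - b * c) / Fraction(a + d - b - c)


avg = average_game(GAME_LEFT, GAME_RIGHT)
print(value_2x2(avg))

# Payoff of the row player's 1/2-1/2 mix against each pure column strategy.
print([(avg[0][j] + avg[1][j]) / 2 for j in range(2)])
```

Under these assumed matrices, neither player knows the game, so each round is effectively a round of the averaged game, which is why its value is the relevant quantity before the twist.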

## Enter one-sided information

**Now comes the twist:** Suppose that the row player is told which game is played.

### TYI: How much on average can the row player guarantee?

In the picture above you see Bob Aumann discussing this game during a celebration for his former student Professor Shmuel Zamir.

**Happy birthday, Shmuel!**

I suppose that they know what strategy the other person chose in the previous round, right? Otherwise they couldn’t do much.

Yes yes!!!

Seems to me like still only 1/4. It is enough for the column player to mirror the distribution of the row player’s choices up to that point (i.e. P(R_n)=p=P(T) and P(L_n)=1-p=P(B)), which makes the row player’s best choice p=1/2.

There’s no reason for Row’s p to be fixed during the game, which I think your optimization assumes. By playing T and B in blocks of varying (increasing) lengths row can manipulate column to play (in the first game) L with higher probability on the turns in which row will play T than on those he plays B. I haven’t worked out if it’s possible to choose the lengths such that row’s gains are consistently above 0.25 but I think it is; it’s definitely possible as the limsup, e.g. by doing

100 B, 100 T, 100 B, 100 T, 200 B, 200 T, 400 B, 400 T…

So that during every batch of T’s starting from the second, the probability of column choosing L would range from 2/3 down to 2/4, averaging to 2*ln(4/3) ≈ 0.575. Thus by the end of every T block, the total gain would be half that, 0.288, since half the time was spent in T blocks; but at the end of every B block it would be a third, 0.192, since only a third of the time was in T blocks.
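The arithmetic in this comment can be checked with a short sketch. It assumes the first game pays 1 only at (T, L) (an assumption, since the picture is not reproduced here), that the column player mirrors as proposed earlier in the thread by playing L with probability equal to the fraction of B's seen so far, and that blocks alternate B then T with lengths 100, 100, 200, 400, and so on. Payoffs are computed in expectation rather than sampled; the function names and the 1/2 convention for the very first round are illustrative choices.

```python
import math


def block_schedule(num_doublings):
    """Blocks as (move, length): 100 B, 100 T, then 100 B, 100 T, 200 B, 200 T, ..."""
    blocks = [("B", 100), ("T", 100)]
    length = 100
    for _ in range(num_doublings):
        blocks += [("B", length), ("T", length)]
        length *= 2
    return blocks


def expected_averages(blocks):
    """Per block: (move, average P(L) inside the block, running average payoff)."""
    count_b = 0   # number of B's the row player has played so far
    rounds = 0    # total rounds played so far
    total = 0.0   # expected cumulative payoff to the row player
    out = []
    for move, length in blocks:
        p_left_sum = 0.0
        for _ in range(length):
            # ASSUMED mirroring rule: P(L) = fraction of B's so far
            # (1/2 before any round has been played).
            p_left = count_b / rounds if rounds else 0.5
            p_left_sum += p_left
            if move == "T":
                total += p_left  # assumed game: payoff 1 only at (T, L)
            else:
                count_b += 1
            rounds += 1
        out.append((move, p_left_sum / length, total / rounds))
    return out


results = expected_averages(block_schedule(10))
print("target 2*ln(4/3):", 2 * math.log(4 / 3))
for move, p_left_avg, running in results[-2:]:
    print(move, round(p_left_avg, 4), round(running, 4))
```

With ten doublings, the running average at the end of the last T block comes out close to 0.288 and at the end of the last B block close to 0.192, in line with the estimate above. This only illustrates the limsup claim against this particular mirroring rule; it says nothing about what the row player can guarantee against an optimal column player.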

Just to clarify: both players have perfect knowledge of the full setup: that each sees the opponent’s move from the previous round, that the row player knows which game is played, that only one game is played (infinitely many times), etc.?

Thanks for the question: yes, yes, yes, yes!