## The Complexity of Zero-Sum Stochastic Games with Perfect Information

Is there a polynomial time algorithm for chess? Well, if we consider the complexity of chess in terms of the board size then it is fair to think that the answer is “no”. But if we wish to consider the complexity in terms of the number of all possible positions then it is easy to go backward over all positions and determine what is the outcome of the game when we start with each given position.

Now, **what about backgammon? **This question represents one of the most fundamental open problem in algorithmic game theory. The difference between Backammon and chess is the element of luck; at each position your possible moves are determined by a roll of two dice. This element of luck increases the computational skill needed for playing backgammon compared to chess.

Indeed, when the panel members about the future of algorithmic game theory, in a discussion at ICS 2011, were asked about the main open problems in the field, Peter Bro Miltersen proposed this problem. It was recently proved that computing a Nash equilibrium of matrix games is computationally hard, PPAD-hard! (Readers, please give me good links with an explanation.) Constantinos Daskalakis, Paul Goldberg, and Christos Papadimitriou proved the first hardness result for 4-players games and Xi Chen and Xiaotie Deng proved that computing Nash equilibriu is PPAD hard already for the case of 2-players.

It looks that the complexity of stochastic zero sum games is a major remaining challenge. The first paper to single this out as a problem in that we do not know to be in P was by Anne Condon in 1988. This will be the main topic of this post.

This post is also meant to offer background for two talks in next week IPAM workshop “Efficiency of the Simplex Method: Quo vadis Hirsch conjecture?” It is related to a few other talks in that workshop. Uri Zwick’s excellent lecture at the Israeli theory day last year at the Open University is a great introduction to the topic as well. (The lecture itself starts in minute 7:13.) The terminology used by Uri is slightly different, He refers to backgammon as a two and a half player game where the half player, nature, rolls the dice.

## Shapley’s stochastic games.

The setting is very simple:

We have a graph of positions including an initial position.

At each position we have a zero sum game described by a payoff matrix.

For each pair of strategies in this game we have a “discounted probability distribution” on the positions.

A “discounted probability distribution” is like a probability distribution but the sum of probabilities is strictly less than 1. We can simply regard the difference to 1 as the probability that the game ends.

Once you are at some position, the two players choose strategies for the game at the position, the pair of strategies detetmines the amount player I has to pay player II, and then the game move to a different position or ends according to the (discounted) probability distribution associated to the pair of strategies.

Shapley proved (and this is not so difficult based on standard minmax theorems) that this game (with mixed strategies) has a value. And that there are stationary strategies for the player attaining this value. A stationary strategy simply means that the a player move depends only on the position and not on the history.

**Remark:** One of the major open problems in game theory was to extend Shapley’s result to stochastic games where there is no discounting. (Of course, this requires a careful formulation; in particular, values and equilibria are approximate notions.) This was proved by Jean Francois Mertens and Abraham Neyman in the 80s. Extending this result to many-player stochastic games, namely showing that such games always have an approximate Nash equilibrium is one of the major open problems in game theory. Nicolas Vieille proved that this is the case for 2-person games. Yesterday, (in a dinner celebrating the 1-day serial workshop “Action now” as well as Elon Lindenstrauss’s Fields medal) Merale Neyman told me that when you consider continuous time games, the existence of Nash equilibrium can be proved.

## Games with perfect information

The game has perfect information if in each position only one player makes a move. So all matrix games have either one row or one column. In this case, it can be shown that the value is attained by pure strategies. Namely, at each position the player choses one strategy.

Problem:Is there a polynomial time algorithm to find the optimal strategy (and thus the value) of a stochastic zero sum game with perfect information?

Walter Ludwig showed that RANDOM FACET is subexponential for solving soochastic games with perfect information. (Ludwig’s proof applied to games with outdegree 2 and the result ws extended to the general case by Sergei Vorobyov, Henrik Björklund, Sven Sandberg, and Sergei Vorobyov.) Nir Halman showed that the question can be regarded as an abstract linear programs (in the sense of Sharir and Welzl).

If in every position there are only two strategies (and it looks like we can always somehow reduce to this case) then the underlying abstract “polytope” is a cube. A polynomial time algorithm for reaching the sink for a unique sink acyclic orientation of the discrete cube will give a positive answer to the problem. (These games are also sometimes called “turn-based games”.)

## Markov decision processes

If there is only one player then the game is called a Markov Decision process (MDP). Markov decision processes can be solved using linear programming. Is there a strongly polynomial algorithm?

Yinyu Ye’s recent result (that will be presented at IPAM) asserts that the answer is yes if we assume that the probability that the game ends is at least some constant c. (The polynomial depends on c.) Thomas Dueholm Hansen, Peter Bro Miltersen and Uri Zwick extended Ye’s result to 2-player perfect information stochastic games with a constant discount factor and this result was presented by Thomas at ICS2011.

The recent examples by Oliver Friedmann, Thomas Dueholm Hansen, and Uri Zwick showing that RANDOM EDGE and RANDOM FACET are subexponential (described in this post) were based on playing such games.

## Acyclic unique sink orientations on cubes

Consider the graph of a n-dimensional polytope P. An orientation of the edges is called **unique sink **if on every face of the polytope there is only one sink. Acyclic unique sink orientations of graphs of polytopes are also called abstract objective functions. (We mentioned them in this post about abstract linear objective functions as well as this one about telling simple polytopes from their graphs.) We can consider more abstract settings (similar to those we consider in polymath3) but it is fruitful also to go the other direction and consider the special case that P is the n dimensional cube. Acyclic unique sink orientations of the cube are very interesting objects and finding a path following algorithm to reach the sink is an important algorithmic question. It is closely related to the goal of finding a polynomial version of the simplex algorithm.

Pingback: Remote Blogging: Efficiency of the Simplex Method: Quo vadis Hirsch conjecture? | Combinatorics and more

Pingback: IPAM remote blogging: The Many Facets of Linear Programming | Combinatorics and more

What about the case of one of Shapley’s games without perfect information? If each player has many strategies, then this is PPAD-hard by the result you mentioned as there is a matrix game at each node, but what if each player has only 2 strategies? Is this obviously PPAD-hard also?

This is a good question. It may be the case (but it is unknown) that the zero sum game is in P even when there is no complete information. I dont know what is the situation for non zero sum game.

Can you clarify why backgammon can be modelled using this model of games, since backgammon does not end with positive probability at each state?

Hmm, good point. I cannot answer right away why we can regard the game as a game with discount. I will think about it. A little point about backgammon is that we associate throwing the dice to the move before and not after.

To Dave:

Essentially all natural variants of 2-player perfect-information stochastic games (discounts vs. no discounts, payoffs given at the end of the game vs. rewards accumulated during the game, postive stopping probability vs. zero stopping probability, win/lose vs. rational payoffs) are equivalent in the sense that efficient algorithms for one would lead to efficient algorithms for the others. They are also equivalent in stronger senses. We discuss it in this paper:

Daniel Andersson and Peter Bro Miltersen: The complexity of solving stochastic games on graphs, ISAAC 2009.

I should emphasize that the situation is very different in the imperfect information case.

Pingback: Sightseeing | A Decade Under The Influence

I all the time emailed this web site post page to all my associates, for the reason that

if like to read it then my friends will too.