## July 19, 2012

### Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

#### Posted by Mike Shulman

A note to those arriving from the article in the Chronicle of Higher Education: my opinion may not have been accurately represented in that article. Please read my whole post and judge for yourself.

Usually I blog about things because I am excited about them, but today is a little different. Not that I’m not excited about this post, but recently I’ve been too busy with other things to do much blogging. However, what excitement can’t do, annoyance can, and I’m getting tired of seeing this (amazing, surprising, insightful, groundbreaking) paper subject to hype and (what seems to me to be) misinterpretation.

• William Press and Freeman Dyson, Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent, PNAS open access, March 2012.

If you don’t know what the prisoner’s dilemma is, Wikipedia has a fairly good article. The important things to take away are that

1. At first, it seems like defection is the only rational strategy, but
2. When the game is played over and over again, there can be an advantage to cooperation.

The first point is easy to see: defection is a dominant strategy, and mutual defection is the only Nash equilibrium.

The second point can be made both experimentally and theoretically. On the experimental side, in computer tournaments of iterated prisoner’s dilemma, the strategy TIT FOR TAT (or a small variation thereof) almost invariably accumulates the highest total scores. TIT FOR TAT is a very simple strategy, which cooperates on the first round, and thereafter simply copies what its opponent did on the previous round — thereby rewarding cooperation with more cooperation and punishing defection with defection. If we add an “evolutionary” aspect to the tournament, with higher-scoring strategies reproducing more, then over time TIT FOR TAT and its variations come to dominate the landscape.

On the theoretical side, Robert Axelrod (who also performed the first computer tournaments, in the 1980s) proved various theorems. For instance, assuming the expected number of iterations is sufficiently large, TIT FOR TAT is collectively stable, meaning that no single mutant strategy in a population of TIT FOR TATs can average a better score than the wild types surrounding it. TIT FOR TAT is not unique in this way: for instance, the strategy ALWAYS DEFECT is also collectively stable. But TIT FOR TAT is furthermore collectively stable with respect to invasion by small groups of mutants, whereas a small group of TIT FOR TATs (who can cooperate with each other) can successfully invade a population of ALWAYS DEFECT. Axelrod wrote a classic book called The Evolution of Cooperation about the iterated prisoner’s dilemma and TIT FOR TAT, which is still a good read.

TIT FOR TAT does have a few drawbacks. For instance, imagine a “sneaky” strategy that tries defecting once or twice to see if its opponent is gullible, but otherwise plays TIT FOR TAT. An ordinary TIT FOR TAT playing against this strategy will end up in a cycle of mutual defection. The same thing happens in a “noisy” environment where occasional “random” defections happen even against the will of the players. In this sort of situation, slight modifications of TIT FOR TAT can do a bit better; one called GENEROUS TIT FOR TAT is like TIT FOR TAT but forgives its opponents’ defection about 1/10 of the time.

The iterated prisoner’s dilemma is of course a very simplistic model, but this makes it easy to analyze, and the impressive thing is that conclusions of this sort seem applicable to many situations in “real life”. Animals of various sorts have been observed seemingly “playing tit for tat”, as have corporations and nation-states (to say nothing of undergraduates in economics experiments). It has been widely hypothesized that this sort of mechanism underlies how cooperation between animals and human beings evolved in the first place.

Enter Press and Dyson. Here’s some of the hype I’ve collected on the Internet.

Robert Axelrod’s 1980 tournaments of iterated prisoner’s dilemma strategies have been condensed into the slogan, Don’t be too clever, don’t be unfair. Press and Dyson have shown that cleverness and unfairness triumph after all.

Press and Dyson have discovered strategies that trump Tit-for-Tat…

although “kind, non-envious, retaliatory, forgiving” are still effective strategies, the optimal strategy is one that removes agency from its opponent, requiring the opponent to cooperate with the whims of its ruler.

The new paper is undoubtedly a breakthrough, but as far as I can tell, it doesn’t justify these claims.

So what do Press and Dyson actually do in this 5-page paper? They start by supposing that both players are playing a probabilistic strategy which depends only on the previous round. (I’ll come back to that assumption later on.) Thus each player’s strategy is determined by four probabilities: the probabilities of cooperation given that the previous round’s outcome was CC (both cooperate), CD ($Y$ exploited $X$), DC ($X$ exploited $Y$), or DD (mutual defection). Putting together these probabilities for both players, we obtain a probability distribution for the outcome of each round, conditional on the outcome of the previous round — in other words, a Markov chain.

The stationary distribution $\mathbf{v}$ of this Markov chain is the probability distribution of outcomes for any given round (in the long run), and hence governs the long-run expected payoffs of each player by the dot product formulas

$s_X = \mathbf{v} \cdot \mathbf{S}_X$ $s_Y = \mathbf{v} \cdot \mathbf{S}_Y$

where $\mathbf{S}_X$ and $\mathbf{S}_Y$ are the vectors of payoffs for player $X$ and $Y$ respectively in the four possible outcomes. If we order the outcomes as (CC,CD,DC,DD), then with the common choice of payoffs (3-3 for mutual cooperation, 1-1 for mutual defection, 5-0 for exploitation) we have

$\mathbf{S}_X = (3,0,5,1)$ $\mathbf{S}_Y = (3,5,0,1).$

Now using some undergraduate linear algebra, Press and Dyson show that the dot product of the stationary distribution $\mathbf{v}$ with any vector $\mathbf{f}$ can be expressed as a $4\times 4$ determinant in which one column is $\mathbf{f}$, one column is entirely under the control of player $X$ (that is, it involves only the four probabilities that describe $X$’s strategy), and another column is entirely under the control of player $Y$.

This means that either $X$ or $Y$ can (potentially) independently force the dot product of $\mathbf{v}$ with some other chosen vector $\mathbf{f}$ to be zero, by choosing their strategy so as to make the column they control be proportional to $\mathbf{f}$. In particular, taking $\mathbf{f}= \alpha \mathbf{S}_X + \beta \mathbf{S}_Y + \gamma \mathbf{1}$, either player can (potentially) force a given linear relation to hold between the long-run scores of both players:

$\alpha s_X + \beta s_Y + \gamma = 0.$

I say “potentially” because it might turn out that for a given $\alpha,\beta,\gamma$ there is no choice of probabilities (which, of course, have to be between $0$ and $1$) that make the columns proportional. For instance, if $X$ chooses $\beta=0$, thereby trying to force her own score to be some fixed number, it turns out there are (unsurprisingly) no such solutions. But if $X$ chooses $\alpha=0$, there are solutions: $X$ can force $Y$’s score to be some fixed number (within reasonable bounds — between $1$ and $3$ for the standard payoffs) independently of whatever strategy $Y$ might choose. (Obviously $X$ cannot force $Y$’s score to be less than $1$, since $Y$ can guarantee himself an average payoff of at least $1$ by defecting all the time. Similarly, $Y$ can guarantee himself an average payoff of at most $3$ by cooperating all the time.)

To describe one such strategy, pick two small positive numbers $\varepsilon_1$ and $\varepsilon_2$. The strategy is (assuming I did the algebra correctly):

• If both players cooperated last round, cooperate with probability $1-\varepsilon_1$.
• If I exploited my opponent last round, cooperate with probability $\frac{1}{2}(\varepsilon_1 + 3\varepsilon_2)$.
• If my opponent exploited me last round, cooperate with probability $1 - 2 \varepsilon_1 - \varepsilon_2$.
• If both players defected last round, cooperate with probability $\varepsilon_2$.

The other linear relation that Press and Dyson consider is $(s_X - 1) = \chi(s_Y - 1)$: forcing a given ratio to hold between the two player’s scores (or more precisely, the scores minus the all-defection payoff). In this case there are solutions for any $\chi \ge 1$: player $X$ can “extort” a greater share of the payoffs independently of whatever strategy $Y$ might choose. One such strategy is (again, assuming I did the algebra correctly):

• If both players cooperated last round, cooperate with probability $1-\frac{2\chi-2}{4\chi + 1}$.
• If I exploited my opponent last round, cooperate with probability $\frac{\chi+4}{4\chi+1}$.
• If my opponent exploited me last round, defect.
• If both players defected last round, defect.

Of course, by defecting all the time, player $Y$ can guarantee $s_X = s_Y = 1$, which satisfies $(s_X - 1) = \chi(s_Y - 1)$ for any $\chi$. However, given that $X$ is playing a “extortionate” strategy of this sort, one can show that $Y$’s own score will be maximized if he cooperates all the time — in which case $X$ does in fact get a higher average payoff than $Y$. The limiting scores as $\chi\to\infty$, for the standard payoffs, are $s_Y \to 1$ (removing the incentive for $Y$ to cooperate at all) and $s_X \to 4 \frac{1}{3}$.

I find this paper simply astounding. I think it’s surprising in the extreme that such a simple analysis of the iterated prisoner’s dilemma, leading to such interesting conclusions, had not yet been done after so many decades of study. And the wider implications for theories of cooperation are also fascinating.

In their discussion, Press and Dyson focus on the results in terms of “evolutionary” players versus those with a “theory of mind”. In their parlance, an “evolutionary” player is one who attempts to maximize his own score using some optimization scheme. This is the sort of player against which you want to play an extortionate strategy, since his optimization will then lead him to cooperate, giving you an unequal share of the payoffs.

On the other hand, playing an extortionate strategy is not itself an “evolutionary” strategy — its attraction is based on the assumption that your opponent will change his strategy in response to it, learning to cooperate and putting the two of you into your preferred sweet spot in the long run. Press and Dyson say that a player has a “theory of mind” if she can adjust her strategy with the intent of making her opponent adjust his strategy in response. If you try playing an extortionate strategy against a player with a theory of mind, he may realize what you are up to and decide to defect all the time — thereby lowering his own scores in the short run, but hoping to convince you that your extortionate strategy is not working and that you should switch to a more equitable one. Thus, Press and Dyson say that the iterated prisoner’s dilemma between two players with a theory of mind becomes an ultimatum game.

An alternative way of slicing the pie which doesn’t bring in theories of mind is to talk about short-memory and long-memory strategies. Zero-determinant strategies are obviously very short-memory strategies, like TIT FOR TAT: they only make decisions based on the previous round. The behaviour of a player with a “theory of mind”, on the other hand, instead of being described as “switching strategies”, could be considered to be a single very long-memory strategy, which happens to mimic particular short-memory strategies over short time spans.

(In an appendix, Press and Dyson prove that although zero-determinant strategies are derived under the assumption that both players have a memory of only one round, in fact they still behave the same way as long as the other player has a memory of only finitely many rounds. (Or more precisely, I guess, a memory which is short relative to the total number of rounds being played.) The idea is simple: if $Y$ is playing a strategy with a memory of $N$ rounds, while $X$ is playing a strategy with a memory of only 1 round, then by averaging over the resulting probability distribution of all sequences of $N$ outcomes, we can produce an alternative strategy for $Y$, with a memory of only 1 round, whose long-run average scores against $X$’s strategy are identical.)

So what about the hype? Do the “zero-determinant strategies” of Press and Dyson overturn the dominance of TIT FOR TAT, showing that selfishness wins out after all? I think the answer is: not remotely! Zero-determinant strategies thrive in a very different environment than TIT FOR TAT.

Remember that TIT FOR TAT’s experimental and theoretical dominance is relative to a large population of many players, each with their own strategy. This is crucial to TIT FOR TAT’s success, which has the interesting attribute that it never “wins” a single match! In other words, when you play TIT FOR TAT against some other strategy repeatedly for some large number of rounds, TIT FOR TAT will never come out “ahead” in that its score is greater than that of the other strategy. (In fact, TIT FOR TAT is the limiting case $\chi = 1$ of the Press-Dyson extortionate strategies: it forces the two players’ long-run scores to be equal. In a finite game, of course, edge effects can make this inexact.)

The way that TIT FOR TAT thrives in Axelrod’s evolutionary (or, perhaps better, “ecological”) environment is by playing in a large number of high-scoring games. Namely, against any other strategy which is “nice” (which Axelrod defined to mean “never being the first to defect”), TIT FOR TAT accumulates high scores from constant mutual cooperation — as does the other player.

By contrast, non-nice strategies may “win” individual matches (scoring higher than their opponent does in that match), but those matches tend to have low total scores. (And against TIT FOR TAT, non-nice strategies generally tie at very low scores such as mutual defection.) Extortionate strategies (which must be non-nice to achieve $\chi\gt 1$) are no exception.

Where extortionate strategies do succeed is in the situation that Press and Dyson describe: when you are playing against a single fixed opponent only (and, preferably, an evolutionary one without a theory of mind). But even in this situation, extortionate strategies do not “defeat” TIT FOR TAT when played against it. TIT FOR TAT has no theory of mind, but neither is it evolutionary. Against an extortionate player, TIT FOR TAT’s simple rule leads to a cycle of mutual defection — exactly the response of the player with a theory of mind to an unfair ultimatum! And if the extortionate player reacts to the refusal of the ultimatum by switching to a fairer strategy, then TIT FOR TAT is forgiveness itself.

The only way that a zero-determinant strategy can “trump” TIT FOR TAT is that it can successfully take advantage of some opponents which TIT FOR TAT cannot. Of course, even ALWAYS DEFECT has this property: it can take advantage of ALWAYS COOPERATE, which TIT FOR TAT cannot. But extortionate zero-determinant strategies are cleverer than ALWAYS DEFECT, and can exploit a wider range of opponents (in particular, “evolutionary” ones). Thus if the goal is to “win” matches in this sense, rather than to accumulate points, then these extortionate strategies can be successful.

To test these conclusions, the authors of the following paper ran a computer tournament including both TIT FOR TAT and some zero-determinant strategies:

• Alexander Stewart and Joshua Plotkin, Extortion and cooperation in the Prisoner’s Dilemma, PNAS (not open access, sadly).

I don’t know the names of all the 20 or so strategies that played, but in addition to TIT FOR TAT they included ALWAYS DEFECT, ALWAYS COOPERATE, GENEROUS TIT FOR TAT, TIT FOR TWO TATS (defect only if your opponent’s previous two moves were defections), and RANDOM (flip a coin), and probably some “sneakier” ones. The zero-determinant strategy EXTORT-2 which enforces $(s_X - 1) = 2 (s_Y - 1)$ came in second to last in the number of total points, but second to the top in the number of matches “won” — in both cases playing second fiddle to ALWAYS DEFECT. (The paper doesn’t give a lot of details, but I suspect that most of these “victories” were rather pyrrhic. For instance, if ALWAYS DEFECT plays TIT FOR TAT, then on the first round ALWAYS DEFECT will score 5 versus TIT FOR TAT’s 0, and then on all subsequent rounds they will both score 1. Technically, this is a “victory” for ALWAYS DEFECT.)

Stewart and Plotkin also considered another very interesting zero-determinant strategy, which Press and Dyson did not mention. This one tries to enforce the relationship $(s_X - 3) = \chi (s_Y - 3)$, where $3$ is the “mutual cooperation” reward score. One such strategy is (assuming I did the algebra right):

• If both players cooperated last round, cooperate.
• If I exploited my opponent last round, cooperate.
• If my opponent exploited me last round, cooperate with probability $\frac{\chi-1}{3\chi+2}$.
• If both players defected last round, cooperate with probability $\frac{2(\chi-1)}{3\chi+2}$.

Note that $\chi=1$ reproduces TIT FOR TAT again. For $\chi \gt 1$, the relationship $(s_X - 3) = \chi (s_Y - 3)$ looks like it is saying that if your score is more than $3$, mine is twice as much more than $3$. But actually, it’s impossible for both players’ long-run average scores to be strictly greater than $3$, so it would be more accurate to write this as $(3 - s_X) = \chi (3 - s_Y)$ — meaning that if your score is less than $3$, then mine is twice as much less!

Stewart and Plotkin chose $\chi=2$ and called the resulting strategy ZDGTFT-2, because it’s very similar to GENEROUS TIT FOR TAT (GTFT). (Actually, I can’t be sure whether their ZDGTFT-2 is this particular strategy or another one which enforces the same linear relationship; they don’t describe theirs explicitly.) The difference between ZDGTFT-2 and GTFT is that with ZDGTFT-2, the probability of forgiving the other player’s defection when I also defected last round is twice as large as it is when I cooperated last round.

In Stewart and Plotkin’s tournament, the overall score of ZDGTFT-2 was very slightly better than that of GTFT (by about one percentage point) — while GTFT did somewhat better than ordinary TIT FOR TAT (by about five percentage points). So can zero-determinant strategies beat TIT FOR TAT at its own game? One tournament isn’t conclusive, but it suggests that if so, their advantage is probably miniscule.

In conclusion, I think the unexpected existence of zero-determinant strategies is a discovery of the first order, with many fascinating implications that we can see already, and others that I’m sure are yet to come. It opens up entirely new ways to analyze evolutionary, social, and international relationships, potentially greatly expanding the (already surprisingly wide) applicability of the iterated prisoner’s dilemma. But as far as I can tell, the only way that extortionate strategies can “beat” TIT FOR TAT is if we change the rules, so that we are no longer playing a game that TIT FOR TAT ever claimed to be able to win.

Posted at July 19, 2012 8:34 PM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2542

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Thanks for this summary.

I am not by far an expert on this matter, most I know of it is via a friend’s Master Thesis many years back – which was a thesis in environmental sociology. Now reading your account here, reminds me of some questions I had and have.

What I find striking about the discussion about the prisonor dilemma is how on a very crude simplistic mathematical model a heap of sociological interpretation is stacked. In effect, to rules for just bit-flips one assigns words designed to describe behaviour of multicellular organisms in a complex environment.

My first question is: is there actual interest in the prisoner dilemma in pure mathematics, or is the interest it generates all qua its interpretation as a model in sociology or ecology?

You mention above

and so I am guessing that the answer to this question is “the second of the above”, but let me know. (I could probably find out by reading more of the sources, but allow me to ask anyway.)

My second question is: as a model in sociology and ecology, how useful has it been when compared with experiment?

There are plenty of examples of mathematically precise but hopelessly overly simplistic assumptions about behavious of humans – say in finance. Is the “prisoner dilemma” an exception?

In this respect finally a comment: you end by saying:

the only way that extortionate strategies can “beat” TIT FOR TAT is if we change the rules

This leads me back to my question: are we intending to do math here, or model building in sociology? In the first case, changing the rules in the middle of the game is an error that invalidates the whole undertaking. But in the second case uncertainty in the rules along which to play is a hallmark of the whole subject. It’s what makes sociology (or whatever: ecology, finance) a non-exact science. If we regard the prisoner dilemma as actually being a model for some kind of complex systems, then we should be prepared to change its rules and be in fact centrally intersted in how the system changes as the rules are being changed.

Sorry if all these questions and comments maybe are tangential to the central point of your post. But it’s what came to my mind when reading this.

Posted by: Urs Schreiber on July 21, 2012 10:00 AM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

is there actual interest in the prisoner dilemma in pure mathematics, or is the interest it generates all qua its interpretation as a model in sociology or ecology? You mention above “some undergraduate linear algebra” and so I am guessing that the answer to this question is “the second of the above”…

This question is stacked with objectionable hypotheses!

Firstly, I am skeptical that there is a meaningful notion of “interesting in pure mathematics” which is totally unrelated to any application of mathematics. We certainly find things interesting in pure mathematics that don’t themselves have a direct application, but this is because of our interest in other things, which is because of our interest in other things, which … was at least originally because of our interest in some application. Because game theory is explicitly an abstraction of some activity that people and organisms engage in in the real world, it’s difficult for me to imagine what it would even mean for something in game theory to be “interesting” in “pure mathematics” without any reference to its meaning as a theory of games.

Secondly, I object very strongly to the implication that no mathematics that is understandable by an undergraduate can be interesting!!!

as a model in sociology and ecology, how useful has it been when compared with experiment?

Well, you should not expect it to make quantitative predictions. Nobody does. The predictions it makes are qualitative, that strategies like TIT FOR TAT should be observed under certain environmental conditions. And in fact they are. I highly recommend Axelrod’s book The evolution of cooperation, which contains some discussion of tit-for-tat behavior in animals and international relations. One of his most striking examples is the live and let live behavior during WWI trench warfare, which was often characterized by quite explicit tit-for-tat-like systems.

If we regard the prisoner dilemma as actually being a model for some kind of complex systems, then we should be prepared to change its rules and be in fact centrally intersted in how the system changes as the rules are being changed.

I said in the post that the new strategies “open up entirely new ways to analyze evolutionary, social, and international relationships”. That seems to be exactly what you’re saying: when there are different rules, then we need new methods and theorems. My point was that the new strategies, and their effectiveness under new sets of rules, do not invalidate the conclusions of the old theorems, which still apply under their own original sets of rules.

Posted by: Mike Shulman on July 23, 2012 7:58 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

You could, of course, come up with a version of the iterative prisoner’s dilemma in which the players are allowed to change strategies, and it’s built into the rules ahead of time under what circumstances they would change strategies, and how it is determined what strategy they would switch to, and different players would have different strategy changing frequencies/criteria and different methods of determining what strategy they would switch to. You could consider all possible combinations of initial strategy, strategy changing frequency/criteria, and method of selecting new strategy. This would be expanded version what was considered in the original paper. I then thought you could expand this further and allow the option for players to die and reproduce, based on their success rate, where the offspring inherit the strategies of the parents, and then see over successive generations, what percentage of players are employing which strategies.

Jeffery Winkler

Posted by: Jeffery Winkler on July 31, 2012 8:35 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

The Stewart/Plotkin paper may not be open access, but Bill Press has a copy on his website.

Posted by: Robin Houston on July 22, 2012 12:46 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Is the existence of these strategies something specific to prisoner’s dilemma, or would any matrix game admit a similar set of strategies?

Posted by: matt on July 22, 2012 6:08 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Yes, any matrix game (potentially) admits zero-determinant strategies. Nothing is specific to the prisoner’s dilemma until you start putting in numbers to see what strategies are feasible (i.e. give you probabilities between 0 and 1). I am not aware of any work yet on zero-determinant strategies for other matrix games.

Posted by: Mike Shulman on July 23, 2012 8:00 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Thanks for this great post, Mike. It’s nice and timely for me, because I’ve been thinking a little about these new results as well and you’ve summarised it all very neatly.

If anyone is interested to see what it’s like to play against these ZD strategies, I made a wee interactive gadget at http://s3.boskent.com/prisoners-dilemma/fixed.html

(I also wrote a bit on my blog about it, but there isn’t anything very meaty there for a crowd like this one.)

Posted by: Robin Houston on July 23, 2012 11:55 PM | Permalink | Reply to this
Read the post What are the implications of the recently discovered zero-determinant strategies for the prisoner's dilemma?
Weblog: Quora
Excerpt: Mike Shulman gives a thorough introduction to the topic at [1]. Though the discovery has apparently been hyped a bit too far, it opens up the study of a whole class of strategies that may be in use 'out there', in finance, geo-politics and animal behav...
Tracked: July 29, 2012 11:01 PM

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Thanks alot, it’s really a great post and the paper also seems great.

As you say it is astonishing that such “long term” strategies have not been developed earlier. I guess that will lead to a flurry of activity. (Actually I am a bit suspicious, I wonder if it will not come out that after all such ideas had already been proposed.)

Thinking about it I would believe there is a much more general framework where those metagames can be modeled and analyzed. Actually I think I already read the term “metagame” used technically, or game on games, or something.

Perhaps this would connect to statistics where you estimate time series, try to see what strategy your opponent is on etc. That sound very attractive and if anybody has comments on this I would enjoy reading them.

Further idea: do such games necessarily lead to stable strategies as a function of (discrete -or continuous) time? Or can we cycle through players picking various “local” strategies, defecting more sometimes, then cooperating more, then defecting again.

Thanks again for the adrenaline-infused post.

Posted by: plm on August 8, 2012 4:49 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

I think the story goes back at least to David Blackwell and 1956. It is what is called in game theory Blackwell’s approachability theorem that can be used to determine which (alpha,beta,gamma) values are possible. The theorem will give a complete (easy to calculate) characterization of this set (and in fact the theorem itself is more general and has a rigorous proof, i.e., no limitations on the strategies used).

Posted by: Csaba Szepesvari on August 19, 2012 5:27 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

The post you linked to seems to be about zero-sum games, which the Prisoner’s Dilemma is not. Am I misreading it, or is there a version of the theorem which applies to PD, or a clever trick of translation?

Posted by: Mike Shulman on August 23, 2012 7:14 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

I have written up the reduction as a pdf. Apologies for not pasting it here; I don’t know how to use latex in responses.
- Csaba

Posted by: Csaba Szepesvari on October 20, 2012 5:54 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Looks plausible to me, though without knowing anything about approachability theory I can’t tell for sure. Thanks for mentioning this!

Posted by: Mike Shulman on October 26, 2012 1:52 AM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

There’s a problem that I haven’t seen pointed out anywhere, although I’ve been looking at this for a couple of days. It’s of the division-by-zero sort.

Specifically, the paper turns on a formula for $$\alpha s_x + \beta s_Y + \gamma$$ as a ratio of determinants, and then focuses entirely on making the numerator be 0. The problem is that in some nontrivial circumstances, the denominator can also be zero. Referring to the Dyson and Press paper, $D(p,q,1)$ in equation [6] is zero if either player plays TITFORTAT, for example, as the determinant in question will have a column of zeros. I haven’t done the calculation, but obviously if both players set ZD strategies to extort a 3-to-1 payout, something must go wrong. Might there not be some other troublesome strategies?

Posted by: Kevin O'Bryant on August 23, 2012 2:57 AM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

There’s another problem which I think might be the same problem. They note that $Mv = v$ for the stationary vector of the Markov chain, or any multiple of it. But then they use the converse, that since $Ms=s$, $s$ must be a multiple of the stationary vector. It’s not always true that a Markov chain has a single stationary vector.

Posted by: Kevin O'Bryant on August 23, 2012 6:24 AM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Hi Kevin,

to turn on mathematics, you need to select a text filter than includes the word(s) ‘MathML’. Unfortunately the system won’t remember who you are, so you have to do it every time.

Posted by: David Roberts on August 23, 2012 8:13 AM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

That’s a good point! I think the problem arises slightly earlier than that, though: their formula for $\alpha s_X + \beta s_Y + \gamma$ is only valid if its denominator is nonzero. Their equation [2] doesn’t imply, as they claim, that every row of $Adj(\mathbf{M}')$ is proportional to the desired stationary vector $\mathbf{v}$; it only implies that every nonzero row of $Adj(\mathbf{M}')$ is proportional to $\mathbf{v}$. It might turn out that the particular row of $Adj(\mathbf{M}')$ they choose is zero, and hence tells us nothing about $\mathbf{v}$.

On the other hand, I don’t quite agree with your examples. X’s row in the determinant becomes zero if $p_1=p_2=1$ and $p_3=p_4=0$, which I believe is not TIT FOR TAT but rather “always play the same move that you played last time.” Same for Y’s row. That’s what happens if X tries to set her own score, equations [10]. And I don’t think Y playing this strategy can mess up any ZD strategy X may be attempting either, since from X’s point of view, this strategy of Y is equivalent either to ALWAYS COOPERATE or to ALWAYS DEFECT (depending on which state Y started in).

I also don’t think there is a problem a priori with two extortionate strategies playing against each other. The extortionate strategy forces a particular ratio to hold between $s_X-1$ and $s_Y-1$, not between $s_X$ and $s_Y$ themselves; thus if we are trying to force two incompatible ratios to hold, we can always do it with $s_X=s_Y=1$, i.e. mutual defection.

However, if both players play TIT FOR TAT, then the first column of the determinant is zero. More generally, this happens if after mutual cooperation, both players always cooperate, and after all other outcomes, someone always defects.

This leads to your second comment, which sounds to me like the “other side” of the same problem: actually actually, equation [2] only implies that every nonzero row of $Adj(\mathbf{M}')$ is proportional to some stationary vector. Presumably which stationary vector you end up in depends on the initial conditions; so even if the row that Press-Dyson pick is nonzero, whether or not the ZD strategy works might (at least a priori) depend on the players initial moves, which are not specified by the probabilities $p_i,q_i$.

And we know very well that, for instance, it’s crucial to TIT FOR TAT’s success that it cooperates on the first move; otherwise two TIT FOR TATs playing each other would get locked into a cycle of mutual defection. Here we certainly seem to have two different stationary vectors — neither of which is nonzeroly proportional to the row of $Adj(\mathbf{M}')$ picked by Press-Dyson, since as above, that is zero — but both of these vectors still do exhibit the ratio of scores that TIT FOR TAT claims to be forcing, when regarded as a ZD strategy.

So are there any “actual failures” – purported ZD strategies that can fail to achieve their aims, perhaps only under some set of initial moves? Another way to make that determinant zero would be to make a pair of columns or rows equal. By sign considerations, this would have to involve some probabilities being one or zero.

For instance, to make the last two rows equal, $-1+q_2=q_4$ implies $q_2=1$, $q_4=0$, whence $p_3 q_2 = p_4 q_4$ implies $p_3=0$, and then $p_3=p_4$ implies $p_4=0$. So Y always cooperates if he was just exploited, while X always defects if she just exploited Y, while both always defect after a mutual defection. This obviously also has at least two stationary vectors, now with different payoff ratios — but is it compatible with any ZD strategies? I don’t know.

Similarly, to make the second and third columns equal, we have $p_1=q_1$, $p_2=1$, $q_3=0$, $p_3=0$, $q_2=1$, and $p_4=q_4$. Now we have at least two stationary vectors “X continually exploits Y” and “Y continually exploits X”, again with different payoff ratios — but is it compatible with any ZD strategies?

Press and Dyson make a throwaway remark on page three that in an extortionate strategy, “The value of $\phi$ is irrelevant, except that singular cases (where strategies result in infinitely long ‘duels’) are more likely at its extreme values.” I wonder if this issue is related to what they had in mind. Certainly all the above examples seem to be “singular cases” in that they lie on the boundaries of various polytopes.

Posted by: Mike Shulman on August 23, 2012 7:06 PM | Permalink | Reply to this

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

I was just pointed to this blog post today by a friend. Thanks for the detailed analysis. Obviously the point about winning individual matches vs achieving a high score is relevant and well made. However, I have a few quick comments to make:

1. You seem to be arguing quite strongly that ZD strategies do not do better that TFT (in terms of score rather than number of wins) in an evolutionary/’ecological’ situation. This is clearly not necessarily true: Stewart and Plotkin show one particular ZD strategy outscoring both TFT and GTFT in exactly such a tournament. Obviously you know this, but you could have stated it more explicitly: some ZD strategies do better than TFT even relative to large numbers of players.

2. Related to the point above: there is no single extortionate ZD strategy, this is a continuous class of strategies. Some members of that class may do very well in scenarios like Axelrod’s tournament, others may do badly.

3. You suggest that an extortionate ZD strategy playing against TFT would get locked into a cycle of mutual defection. This is not true, because depending on which ZD strategy, P(C|DD)>0 is allowed and would break the cycle.

4. It appears to me that an extortionate ZD strategy would be collectively stable in the same way as TFT is.

5. In terms of actual evolutionary behaviour, I wonder whether players with “evolutionary” (in the Press and Dyson sense) strategies might not be a better model in some circumstances than treating each player as having a single fixed strategy (TFT, GTFT, ALLD etc.)?

### Re: Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma

Welcome to the Cafe, Sesh! I’ll respond to each of your comments in turn.

1. I mentioned this in my penultimate paragraph. It’s unclear to me whether the advantage of ZDGTFT in the Stewart-Plotkin tournament is statistically significant; I think more experimentation needs to be done before reaching any definite conclusions about whether ZDGTFT is actually better than GTFT. (Probably people are doing — or have already done — it. I’m not plugged into the relevant community.)

However, perhaps you are right that in a few places I said “zero-determinant strategy” where I really intended to refer only to the extortionate ones. I think those are the ones that are relevant to the overblown hype of the form “selfishness wins after all!” — ZDGTFT is just a slightly cleverer way for altruism to beat selfishness.

2. What you say is true: Press and Dyson describe a 2-parameter family of extortionate strategies. But I didn’t ignore this entirely; in the post I merely simplified it to a one-parameter family by choosing the maximum possible value for their parameter $\phi$.

3. Press and Dyson’s extortionate strategies all have P(C|DD)=0; see their equation [12] (P(C|DD) is their $p_4$). If you or someone else has found a different sort of extortionate strategy that has P(C|DD)>0, I’d love to see it!

4. Yes, an extortionate ZD strategy would be collectively stable, but its mode of collective stability would be more like that of ALWAYS DEFECT than like that of TIT FOR TAT, since when extortionate strategies play against each other they end up constantly defecting.

5. It’s an interesting question; I think the answer probably depends on the timescale and mechanism of the evolution we’re interested in modeling. This gets into the difference between “short-run” and “long-run” strategies too.

Posted by: Mike Shulman on September 21, 2012 11:45 PM | Permalink | Reply to this

Post a New Comment