26 November, 2008

Is there a rational strategy in Finite Iterated Prisoners Dilemma?

(UPDATE: I think there’s a mistake in the argument here – see Bob Stalnaker’s comment 11 below.)

Row and Column are going to play 100 rounds of Prisoners Dilemma. At each round they can either play Co-op or Defect, with standard rules. (So the payoffs are symmetric, and on each round Defect dominates Co-op for each player, but both playing Co-op is Pareto superior to both playing Defect.) The following is true at the start of the game.

  1. Each player is rational.
  2. No player loses a belief that they have if they receive evidence that is consistent with that belief.
  3. For any r, if it is true that if a player were to play Co-op on round r, the other player would play Defect on every subsequent round, then it is also true that if the first player were to play Defect on round r, then the other player would still play Defect on every subsequent round.
  4. The first three premises are matters of common belief.

Call a strategy that a player can play consistently with those four assumptions an approved strategy. (Note that one of the assumptions is that the player is rational, so these will all be rational strategies.) Assume for reductio that there are approved strategies S1 and S2 such that if Column plays S2, then Row can play S1, and this will involve sometimes playing Co-op. I will try to derive a contradiction from that assumption.

Let r be the largest number such that, for some approved strategies S1 and S2, if Column plays S2 and Row plays S1, then Row plays Co-op on round r. I will now argue that it is irrational for Row to play Co-op on round r, contradicting the assumption that S1 is an approved strategy.

Since both players are playing approved strategies, they are both acting consistently with the initial assumptions. So by premise 2, the initial assumptions still hold, and this is a matter of common belief. So it is still a matter of common belief that each player is playing an approved strategy.

If Row plays Co-op on round r, she is still, by hypothesis, playing an approved strategy, so by another application of premise 2, Column would react by sticking to her own approved strategy. Since r is the last round on which any player playing an approved strategy against an approved strategy co-operates, and Column is playing an approved strategy, Row believes that if she were to play Co-op, Column would play Defect on every subsequent round. By premise 3 (or, more precisely, by her belief that premise 3 still holds), Row can infer that Column will also play Defect on every subsequent round if she plays Defect on this round.

Putting these two facts together, Row believes prior to playing this round that whatever she were to do, Column would react by playing Defect on every subsequent round. If that’s the case, then she would get a higher return by playing Defect this round, since the only reason ever to play Co-op is that it affects play in later rounds. But here it will have no such effect. So it is uniquely rational for Row to play Defect on this round. But this contradicts our assumption that S1, according to which Row plays Co-op on round r, is a rational strategy.
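The last step is just dominance once later play is held fixed. Here is a minimal sketch of it, assuming the conventional textbook payoffs T=5 > R=3 > P=1 > S=0 (the post only says "standard rules") and an arbitrary ten remaining rounds; the names `PAYOFF` and `value_from_round_r` are illustrative, not from the post. Because Row's payoff from the later rounds is the same whichever move she makes now, it cancels out, and Defect wins on round r whatever Column does on that round.

```python
# Hypothetical per-round payoffs to Row, keyed by (Row's move, Column's move).
# Conventional values T=5 > R=3 > P=1 > S=0, chosen only for illustration.
PAYOFF = {
    ("C", "C"): 3,  # R: both co-operate
    ("C", "D"): 0,  # S: Row co-operates, Column defects
    ("D", "C"): 5,  # T: Row defects, Column co-operates
    ("D", "D"): 1,  # P: both defect
}


def value_from_round_r(row_move: str, column_move: str, continuation: int) -> int:
    """Row's payoff from round r to the end of the game.

    `continuation` is Row's payoff from the rounds after r. Row believes Column
    will Defect on every later round whatever she does now, so the same
    continuation value applies to both of her options.
    """
    return PAYOFF[(row_move, column_move)] + continuation


# Mutual defection on, say, 10 later rounds.
continuation = 10 * PAYOFF[("D", "D")]
for column_move in ("C", "D"):
    coop = value_from_round_r("C", column_move, continuation)
    defect = value_from_round_r("D", column_move, continuation)
    # Defect beats Co-op whatever Column does on round r.
    print(f"Column plays {column_move}: Co-op -> {coop}, Defect -> {defect}")
```

Changing the number of later rounds shifts both totals equally, which is the formal version of "it will have no such effect".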

If our assumption is true, then there can be no approved strategy that ever co-operates before observing the other player co-operate. If there were such a strategy, call it S3, then we can imagine a game where both players play S3. By hypothesis there is a round r where the player playing S3 co-operates before the other player co-operates. So if both players play S3, which is approved, then they will both play Defect up to round r, then play Co-op on that round. But that’s to say that they will play Co-op while (a) playing an approved strategy and (b) believing that the other player will play an approved strategy. And this contradicts our earlier result.

This does not mean that a rational player can never co-operate, but it does mean that they can never co-operate while the initial assumptions are in place. A rational player might, for instance, co-operate on seeing that her co-player is playing tit-for-tat, and hence that the initial assumptions are not operational.

Nor does it mean, as I think some theorists have been too quick to conclude, that playing Defect all the time is an approved, or even a rational, strategy. Assume that there are approved strategies, and that (as we’ve shown so far) they all involve playing Defect on the first round. Now the familiar objections to backward induction reasoning, tracing back at least to Philip Pettit and Robert Sugden’s “The Backward Induction Paradox”, become salient objections.

If Row holds all the initial assumptions, she may also believe that if she were to play Co-op on the first round, then Column would infer that she is an irrational agent, and that as such she’ll play Tit-for-Tat. (This isn’t built into the original assumptions, but it is consistent with them.) And if Row believes that is how Column would react, then Row is rational to play Co-op, or at least more rational on this occasion than playing Defect. Indeed, even if Row thinks there is a small chance that if she plays Co-op, Column will conclude that she is irrationally playing Tit-for-Tat, then the expected return of playing Co-op will be higher, and hence it will be rational. I conclude that, given any kind of plausible assumptions Row might have about Column’s beliefs, playing Co-op on the first round is rational.
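To see how small "a small chance" can be, here is a toy calculation. Everything in it is an assumption made for illustration, not part of the argument: conventional payoffs T=5, R=3, P=1, S=0; Column's approved strategy defects on round 1; and, with probability q, Column reacts to Row's opening Co-op by taking her to be a Tit-for-Tat player and co-operating on rounds 2 to 99 before defecting on round 100, while Row co-operates until the last round. The names `total`, `punished`, `rewarded` and `expected_coop_first` are hypothetical.

```python
ROUNDS = 100

# Hypothetical per-round payoffs to Row, keyed by (Row's move, Column's move),
# using the conventional values T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def total(row_moves, column_moves):
    """Row's total payoff over the game from two equal-length move sequences."""
    return sum(PAYOFF[(r, c)] for r, c in zip(row_moves, column_moves))


# Row defects on round 1, and both players defect throughout.
defect_first = total(["D"] * ROUNDS, ["D"] * ROUNDS)

# Row co-operates on round 1 while Column, playing an approved strategy, defects.
# With probability 1 - q, Column keeps defecting, and so does Row.
punished = total(["C"] + ["D"] * (ROUNDS - 1), ["D"] * ROUNDS)

# With probability q (hypothetical reaction): Column co-operates on rounds 2 to 99
# and defects on round 100; Row co-operates until the last round, then defects.
rewarded = total(["C"] * (ROUNDS - 1) + ["D"], ["D"] + ["C"] * (ROUNDS - 2) + ["D"])


def expected_coop_first(q):
    """Row's expected total from co-operating on round 1, given probability q of the rewarded path."""
    return q * rewarded + (1 - q) * punished


break_even = (defect_first - punished) / (rewarded - punished)
print(defect_first, punished, rewarded)  # 100 99 295
print(f"Co-op on round 1 pays more whenever q > {break_even:.4f}")  # q > 0.0051
```

On these made-up numbers, Row only needs to give roughly a half-percent chance to Column's reacting that way. Other payoffs move the threshold, but the comparison has the same shape: a one-round loss against a long run of mutual co-operation.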

In their paper, Pettit and Sugden try to make two arguments. The first I’ve very quickly sketched here: namely, that the assumption that always Defect is uniquely rational leads to contradiction given minimal assumptions about Row’s beliefs about how Column would react. The second, if I’m reading them correctly, is that rational players may play some strategy other than always Defect. The argument for the second conclusion involves rejecting premise 2 of my model. They rely on cases where players react to rational strategies by inferring the other player is irrational, or believes they are irrational, or believes they believe that they are irrational, etc. Such cases are not altogether implausible, but it is interesting to think about what happens without allowing for such a possibility.

And I conclude that, given my initial assumptions, there is no approved strategy. And I’m tempted to think that’s because there is no rational strategy to follow. Just as in Death in Damascus, whatever strategy a player follows, they have reason to believe, while playing it, that it is irrational. This is a somewhat depressing conclusion, I think, but causal decision theory sometimes doesn’t give us straightforward advice, and I suspect finite iterated Prisoners Dilemma, at least given assumptions like my premise 2, is a case where causal decision theory doesn’t give us any advice at all.

Posted by Brian Weatherson at 1:06 pm


Asymmetric Death in Damascus

Here is the ‘Death in Damascus’ case from Allan Gibbard and William Harper’s classic paper on causal decision theory.

Consider the story of the man who met Death in Damascus. Death looked surprised, but then recovered his ghastly composure and said, ‘I am coming for you tomorrow’. The terrified man that night bought a camel and rode to Aleppo. The next day, Death knocked on the door of the room where he was hiding, and said ‘I have come for you’.

‘But I thought you would be looking for me in Damascus’, said the man.

‘Not at all’, said Death ‘that is why I was surprised to see you yesterday. I knew that today I was to find you in Aleppo’.

Now suppose the man knows the following. Death works from an appointment book which states time and place; a person dies if and only if the book correctly states in what city he will be at the stated time. The book is made up weeks in advance on the basis of highly reliable predictions. An appointment on the next day has been inscribed for him. Suppose, on this basis, the man would take his being in Damascus the next day as strong evidence that his appointment with Death is in Damascus, and would take his being in Aleppo the next day as strong evidence that his appointment is in Aleppo…

If… he decides to go to Aleppo, he then has strong grounds for expecting that Aleppo is where Death already expects him to be, and hence it is rational for him to prefer staying in Damascus. Similarly, deciding to stay in Damascus would give him strong grounds for thinking that he ought to go to Aleppo.

Causal decision theorists often say that in these cases, there is no rational thing to do. Whatever the man does, he will (when he does it) have really good evidence that he would have been better off if he had done something else. Evidential decision theorists often say that this is a terrible consequence of causal decision theory, but it seems plausible enough to me. It’s bad to make choices that bring about your untimely death, or that you have reason to believe will bring about your untimely death, and that’s what the man here does. So far I’m a happy causal decision theorist.

But let’s change the original case a little. (The changes are similar to the changes in Andy Egan’s various counterexamples to causal decision theory.) The man wants to avoid death, and he believes that Death will predict where he will go tomorrow, and go there tomorrow, and that he’ll die iff he is where Death is. But he has other preferences too. Let’s say that his live options are to spend the next 24 hours somewhat enjoyably in Las Vegas, or exceedingly unpleasantly in Death Valley. Then you might think he’s got a reason to go to Vegas; he’ll die either way, but it will be a better end in Vegas than in Death Valley.

Let’s make this a little more precise with some demons and boxes. There is a demon who is, as usual, very good at predicting what you’ll do. The demon has put two boxes, A and B, on the table in front of you, and has put money in them by the following rules.

What should you do? Three possible answers come to mind.

Answer 1: If you take box A, you’ll probably get $100. If you take box B, you’ll probably get $700. You prefer $700 to $100, so you should take box B.

Verdict: WRONG! This is exactly the reasoning that leads to taking one box in Newcomb’s problem, and one-boxing is wrong. (If you don’t agree, then you’re not going to be in the target audience for this post, I’m afraid.)
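For concreteness, here is the evidential expected-value calculation that Answer 1 gestures at. It is a minimal sketch under stated assumptions. The demon's reliability figures are hypothetical (the post says only that the demon is "very good at predicting"). The $100 and $700 come from Answer 1, and the $800 in box A when B was predicted is inferred from the $750/$800 figures in the first worry below. The amount in box B when A was predicted is not recoverable here, so `b_if_a_predicted` is a hypothetical parameter defaulting to a deliberately pessimistic $0. The verdict is about the reasoning, not the arithmetic; the sketch only shows where that reasoning leads.

```python
def evidential_value(choice: str, reliability: float, b_if_a_predicted: float = 0) -> float:
    """Evidential expected return of taking `choice` ('A' or 'B').

    Payoffs are keyed by (box you take, box the demon predicted).
    b_if_a_predicted is a hypothetical placeholder, not the post's figure.
    """
    payoff = {
        ("A", "A"): 100,
        ("A", "B"): 800,
        ("B", "A"): b_if_a_predicted,
        ("B", "B"): 700,
    }
    other = "B" if choice == "A" else "A"
    # Evidential reasoning treats your choice as evidence about the prediction:
    # with probability `reliability` the demon predicted exactly what you chose.
    return reliability * payoff[(choice, choice)] + (1 - reliability) * payoff[(choice, other)]


for reliability in (0.9, 0.99):
    a = evidential_value("A", reliability)
    b = evidential_value("B", reliability)
    print(f"reliability {reliability}: EV(A) = {a:.0f}, EV(B) = {b:.0f}")
```

Even with nothing assumed in box B when A is predicted, the evidential calculation favours B by a wide margin, which is why Answer 1 lands on B.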

Answer 2: There’s nothing you can rationally do. If you choose A, you would have been better off choosing B, and you’ll know this. If you choose B, you would have been better off choosing A, and you’ll know this. If you walk away, or mentally flip a coin, you’ll get nothing, which seems terrible.

Verdict: I think correct, but three worries.

First, the argument that the mixed strategy is irrational goes by a little quickly. If you are sure you are going to play a mixed strategy, then you couldn’t do any better than by playing it, so it isn’t obviously irrational. So perhaps what’s really true is that if you know that you aren’t going to play a mixed strategy, then playing a mixed strategy would have a lower payoff than playing some pure strategy. For instance, if you are playing B, then if you had played the mixed strategy (Choose B with probability 0.5, Choose A with probability 0.5), your expected return would have been $750, which is less than the $800 that you would have got if you’d chosen A. And this generalises: whichever pure strategy you are playing, and whichever mixed strategy you consider as an alternative, there is a pure strategy you could have chosen that would have done better than the mixed one. So for anyone who’s not playing a mixed strategy, it would be irrational to play a mixed strategy. And I suspect that condition covers all readers.
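The $750 figure, and the generalisation, can be checked directly. While you are actually playing a pure strategy, the box contents are fixed, so a mixture is just a weighted average of the two boxes and is strictly beaten by the better box whenever the contents differ. A minimal sketch: the function name `mixed_value` is illustrative, and the $1000 in box B when A was predicted is a hypothetical placeholder (the post says only that B would have been much better there).

```python
def mixed_value(prob_a: float, in_a: float, in_b: float) -> float:
    """Expected return of taking box A with probability prob_a, given fixed box contents."""
    return prob_a * in_a + (1 - prob_a) * in_b


# Contents if the demon predicted B (from the post): $800 in A, $700 in B.
print(mixed_value(0.5, 800, 700))  # 750.0

genuine_mixtures = [i / 100 for i in range(1, 100)]

# Predicted B: every genuine mixture loses to pure A.
assert all(mixed_value(p, 800, 700) < 800 for p in genuine_mixtures)

# Predicted A: $100 in A and a hypothetical $1000 in B; every mixture loses to pure B.
assert all(mixed_value(p, 100, 1000) < 1000 for p in genuine_mixtures)
```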

Second, this case seems like a pretty strong argument against Richard Jeffrey’s preferred view of using evidential decision theory, but restricting attention to the ratifiable strategies. Only mixed strategies are ratifiable in this puzzle, but mixed strategies seem absolutely crazy here. So don’t restrict yourself to ratifiable strategies.
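Here is a rough sketch of the ratifiability test as applied to this puzzle, treating an option as ratifiable if, once you suppose the demon predicted it, no alternative does better. Two assumptions are labelled in the code. First, the amount in box B when A was predicted is a hypothetical $1000. Second, a predicted mixture leaves both boxes empty, which is my reading of "mentally flip a coin, you’ll get nothing". Options are represented as the probability of taking box A, and all names are illustrative.

```python
# Box contents by prediction: (amount in box A, amount in box B).
B_IF_A_PREDICTED = 1000  # hypothetical placeholder, not recoverable from the post

CONTENTS = {
    "A": (100, B_IF_A_PREDICTED),
    "B": (800, 700),
    "mixed": (0, 0),  # assumption: a predicted coin flip leaves both boxes empty
}


def predicted(prob_a: float) -> str:
    """The demon is assumed to predict perfectly, including predicting a mixture."""
    if prob_a == 1.0:
        return "A"
    if prob_a == 0.0:
        return "B"
    return "mixed"


def expected_return(prob_a: float, prediction: str) -> float:
    a, b = CONTENTS[prediction]
    return prob_a * a + (1 - prob_a) * b


def ratifiable(prob_a: float, alternatives: list) -> bool:
    """Supposing this option is what you choose (and so what was predicted),
    does any alternative do strictly better?"""
    prediction = predicted(prob_a)
    mine = expected_return(prob_a, prediction)
    return all(expected_return(alt, prediction) <= mine for alt in alternatives)


options = [1.0, 0.0, 0.5]  # pure A, pure B, a 50/50 mixture
for option in options:
    print(option, ratifiable(option, options))
```

On these assumptions pure A and pure B both fail the test and only the mixture passes, which is the pattern the paragraph above objects to.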

Third, it seems odd to give up on the puzzle like this. Here’s one way to express our dissatisfaction with answer two. The puzzle is quite asymmetric; box B is quite different to box A in terms of its outcome profile. But our answer is symmetric; either pure strategy is irrational from the perspective of someone who is planning to play it. Perhaps we can put that dissatisfaction to work.

Answer 3: If you choose A, you could have done much much better choosing B. If you choose B, you could have done a little better choosing A. So B doesn’t look as bad as A by this measure. So you should choose B.

Verdict: Tempting, but ultimately I think inconsistent.
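The measure Answer 3 relies on is a kind of regret: how much better the other box would have been, given that the demon predicted your actual choice. A minimal sketch, using the post's $100, $700 and $800 and a hypothetical $1000 for box B when A was predicted (the post says only "much much better"). It makes the comparison explicit; it does not vindicate it.

```python
# Box contents by prediction: (amount in box A, amount in box B).
B_IF_A_PREDICTED = 1000  # hypothetical placeholder
CONTENTS = {"A": (100, B_IF_A_PREDICTED), "B": (800, 700)}


def regret(choice: str) -> int:
    """How much better the other box would have been, assuming the demon predicted `choice`."""
    a, b = CONTENTS[choice]
    mine, other = (a, b) if choice == "A" else (b, a)
    return max(other - mine, 0)


print(regret("A"), regret("B"))  # 900 100
```

Answer 3 amounts to picking `min(["A", "B"], key=regret)`, i.e. B. Note that B's regret is positive, which is exactly the feature the argument below turns on.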

I think the intuitions that Andy pumps with his examples are really driven by something like this reasoning. But I don’t think the reasoning really works. Here is a less charitable, but I think more revealing, way of putting the reasoning.

Choosing A is really irrational. Choosing B is only a bit irrational. Since as rational agents we want to minimise irrationality, we should choose B, since that is minimally irrational.

But it should be clear why that can’t work. If choosing B is what rational agents do, i.e. is rational, then one of the premises of our reasoning is mistaken: B is not a little bit irrational; rather, it is not irrational at all. If choosing B is irrational, as the premises state, then we can’t conclude that it is rational.

The only alternative is to deny that B is even a little irrational. But that seems quite odd, since choosing B involves doing something that you know, when you do it, is less rewarding than something else you could just as easily have done.

So I conclude Answer 2 is correct. Either choice is less than fully rational. There isn’t anything that we can, simply and without qualification, say that you should do. This is a problem for those who think decision theory should aim for completeness, but cases like this suggest that this was an implausible aim.

Posted by Brian Weatherson at 12:01 am


9 November, 2008

Decision Theory Notes

I’ve been continuing to write my decision theory notes for the decision theory course I’ve been doing this term. The course hasn’t ended up quite the way I expected. Because I wanted to go into more detail on some of the fundamentals, and because I wanted to avoid heavy lifting in the mathematics, I’ve skipped anything to do with infinities. So that meant cutting countable additivity, the two-envelope problem, the St Petersburg Paradox, etc. But the flip side of that is that there’s more than I originally intended on the nature of utility, and the most recent additions have been going quite slowly through the foundations of game theory.

One thing that I suspect is not news to many readers of this site, but which was very striking to me, was how much orthodox game theory resembles evidential decision theory. (Many people will be familiar with this because it is a point that Robert Stalnaker has made well. But writing the notes really drove home for me how true it is.)

It is really hard to offer a motivation for exclusively playing equilibrium strategies in one-shot zero-sum games that doesn’t look like a motivation for taking one box in Newcomb’s problem. Since we causal decision theorists think taking one box is irrational, this makes me very suspicious of the normative significance of equilibrium strategies in one-shot games. Of course, most real-world games are not one-shot, and I can understand playing a mixed strategy in a repeated zero-sum game. But it is much harder to see why we should play a mixed strategy in a one-shot game.

Posted by Brian Weatherson at 1:00 pm


CEU Summer University

The CEU Summer course in philosophy this year will be on conditionals. The faculty will include Barry Loewer, Jason Stanley, Dorothy Edgington, Alan Hajek, Angelika Kratzer and Robert Stalnaker. More information, including details on how to apply, is here. The application deadline is February 16, and it looks like it will be a great learning experience, so I encourage anyone interested to look it up!

Posted by Brian Weatherson at 11:02 am


7 November, 2008

Rutgers Announcements

I’ll repeat these announcements closer to the date, but I wanted to give everyone a heads up about two very big events we have planned at Rutgers for the spring.

Derek Parfit will be giving a series of three lectures on (among other things) naturalism. The talks are scheduled to be on March 26, April 6 and April 22, though those dates are subject to change.

And Rutgers will be hosting a forum on Darwinism featuring Philip Kitcher and Jerry Fodor. Again the date is subject to change, but it is currently scheduled for February 18.

Whether or not those dates change, I’ll repeat these announcements closer to the time, but I hope to see many people from the New York area at these events!

Posted by Brian Weatherson at 6:03 pm


Compass Notes

Three announcements related to Philosophy Compass.

Posted by Brian Weatherson at 5:57 pm


6 November, 2008

Links

Hopefully now that the election is over there will be more philosophising. In the meantime, three links.

UPDATE: OBAMA IS THE PRESIDENT!

Posted by Brian Weatherson at 10:11 am
