Tuesday, July 15, 2014

The statements of the prisoner

In the previous pieces I described a theory about intelligence: the statements theory. Its starting point is that behaviour is based on a collection of statements. What is the use of a theory about the stacking of statements?

In the following I describe the prisoner’s dilemma. This is a situation from game theory. It’s a two-player game in which the participants have the option to cooperate or to defect. The best results are gained when both players cooperate. Experiments show that in most cases players don’t act that way. Game theory, taking the payoff matrix as the description of the game, does not explain why players take the less satisfying option. My description of the game from the viewpoint of the statements theory makes it plausible why players come to play less and less cooperatively.

When played as a game, the prisoner’s dilemma can be played multiple times in succession. See also: Wikipedia, Prisoner’s dilemma.



The payoff matrix

Both players must make their selection simultaneously: cooperative (C) or non-cooperative (NC). The matrix above shows the payoffs of the game.

When both play C, they each get a reward of 3. When both play NC, they each get a reward of 1. When one plays C and the other plays NC, the cooperative player gets nothing and the “traitor” gets 5.
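As a small sketch, the payoffs described above can be written down as a lookup table. The name PAYOFF and the tuple order (player 1, player 2) are my own choices for illustration, not something taken from the original matrix picture.

```python
# Payoffs as (player 1, player 2), keyed by (player 1's move, player 2's move).
PAYOFF = {
    ("C", "C"): (3, 3),     # both cooperate
    ("C", "NC"): (0, 5),    # player 1 cooperates, player 2 is the "traitor"
    ("NC", "C"): (5, 0),    # player 1 is the "traitor"
    ("NC", "NC"): (1, 1),   # both defect
}

for moves, (p1, p2) in PAYOFF.items():
    print(moves, "-> me", p1, "he", p2, "joint", p1 + p2)

# The combination with the highest joint payoff.
best = max(PAYOFF, key=lambda moves: sum(PAYOFF[moves]))
print(best)
```

The joint payoff is 6 for mutual cooperation, 5 when exactly one player defects and 2 when both defect, which is why cooperating on both sides gives the best combined result.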

When the game is iterated (played multiple times), the highest payoffs are earned when both players cooperate: both get 3 per game. Nevertheless, players seldom cooperate in experiments. Furthermore, it has been shown that players get less cooperative as they keep playing. You would expect subjects to gain insight into the game and see that cooperation is the better strategy. Why subjects fail to do so is not quite clear.

Davis (1970) writes that it is surprising to see that players tend to cooperate less and less during a run of games.
        Morton D. Davis (1970) Game Theory, A Nontechnical Introduction

Let’s take a look at the game from the players’ point of view.
Player 1 plays based on his experiences. Every completed game is an experience, a statement. The total of statements guides his selections. Player 2 plays tit for tat, a successful strategy: he cooperates in the first game, and in each following game he selects the option his opponent took in the game before. For instance, when player 1 selects C (cooperate), player 2 will select C in the next game. When player 1 selects NC (non-cooperative), player 2 will select NC in the next game. Remember that they have to make their selections simultaneously.
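The tit for tat rule can be sketched in a few lines. The function name tit_for_tat and the example sequence of player 1's moves are assumptions made for illustration only.

```python
def tit_for_tat(opponent_moves):
    """Cooperate in the first game, then repeat the opponent's previous move."""
    return "C" if not opponent_moves else opponent_moves[-1]

# A hypothetical sequence of selections by player 1.
player1 = ["C", "NC", "NC", "C"]

# Player 2 only sees the games already completed, because both
# players select simultaneously.
player2 = [tit_for_tat(player1[:i]) for i in range(len(player1))]
print(player2)  # ['C', 'C', 'NC', 'NC']
```

Player 2 is always one game behind: his selection in a game mirrors what player 1 did in the game before.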


The first four games

Game 1. I am player 1 and I select C. My opponent, player 2, selects C as well. Of course I have no influence on the choice of my opponent. All I see is the result, which is that cooperation pays me 3 and pays him 3: (C, me 3, he 3)

Game 2. I want to try NC. My opponent plays the way I did the game before, but that’s one thing I don’t know. So he plays C now. The result is that my NC-selection pays me 5 and pays him 0: (NC, me 5, he 0).

Game 3. My experience so far is that NC is more beneficial than C, so I select NC again. Player 2 selects NC, which is what I did the game before (but I don’t notice that).
The result is (NC, me 1, he 1).

Game 4. The last 3 games resulted in an average payoff of 3 when I played C and 3 = (5 + 1) / 2 when I played NC. I still have no preference and decide to play C one more time. Player 2 plays NC. Result: (C, me 0, he 5).

The funny thing is that we now have all four possible outcomes, as stated in the matrix above. My (player 1’s) information is complete, so you’d expect me to have enough statements to play rationally and follow the best strategy.
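The four games above can be replayed with a short sketch. It rests on two assumptions that I read from the walk-through rather than from a stated rule: player 1 first tries each option once (C, then NC), and on a tie between the averages he selects C. All names in the code are illustrative.

```python
from fractions import Fraction

PAYOFF = {("C", "C"): (3, 3), ("C", "NC"): (0, 5),
          ("NC", "C"): (5, 0), ("NC", "NC"): (1, 1)}

def average(payoffs):
    """Exact average of a non-empty list of payoffs."""
    return Fraction(sum(payoffs), len(payoffs))

def experience_choice(seen):
    """Select the move with the higher average payoff so far.
    Assumptions: an untried move is tried first; a tie goes to C."""
    for move in ("C", "NC"):
        if not seen[move]:
            return move
    return "C" if average(seen["C"]) >= average(seen["NC"]) else "NC"

seen = {"C": [], "NC": []}   # player 1's statements: payoffs per move
previous = None              # player 1's move in the game before
log = []
for game in range(1, 5):
    mine = experience_choice(seen)
    his = previous or "C"    # tit for tat: C first, then my previous move
    me, he = PAYOFF[(mine, his)]
    seen[mine].append(me)
    previous = mine
    log.append((mine, me, he))
    print(f"Game {game}: ({mine}, me {me}, he {he})")
```

Under these assumptions the sketch prints the same four results as the walk-through, ending with an average of 1½ for C and 3 for NC.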


The successive games in iterated playing

Game 5. According to my experience, C pays me 1½ on average = (3 + 0) / 2. NC pays me 3 on average = (5 + 1) / 2. So I select NC. And I will persist for 14 games, up to and including game 18!

My earnings are low: from game 6 onwards every NC pays me only 1, so my average NC-payoff descends with each game while I keep selecting NC. But only after game 18 has the average NC-payoff sunk to 1½.
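The arithmetic behind game 18 can be checked with exact fractions. The sketch assumes what follows from the tit for tat rule: game 5 pays me 5, because player 2 repeats my C from game 4, and every later NC pays me 1. The variable names are my own.

```python
from fractions import Fraction

c_avg = Fraction(3 + 0, 2)       # C was played in games 1 and 4
nc_total, nc_games = 5 + 1, 2    # NC was played in games 2 and 3

game = 4
while Fraction(nc_total, nc_games) > c_avg:
    game += 1
    # Tit for tat repeats my previous move: C in game 4, NC afterwards.
    nc_total += 5 if game == 5 else 1
    nc_games += 1

print(game, Fraction(nc_total, nc_games))  # 18 3/2
```

After game 18 the NC total is 24 over 16 games, which is exactly 1½, the same as the C average.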

Game 19. I play C now and I instantly get punished for that, because player 2 plays NC. My average C-payoff is lowered to 1. The NC-payoff is still 1½, and in the long run it will descend further if I stick to NC. However, every NC game pays me at least 1, so the average NC-payoff will never drop to the average C-payoff of 1. So I will never select C again.

The runs of consecutive games in which player 1 plays NC get longer and longer, until the last one never ends. What we see is that, using these strategies, the players get less cooperative, even though this is not expected based on the payoff matrix.
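To see the runs of NC grow, the whole series can be simulated. This is a sketch under the same assumptions as the walk-through (player 1 tries each option once, ties go to C, player 2 plays tit for tat); the function name cooperative_games and the choice of 1000 games are arbitrary.

```python
# Payoff to player 1 only, keyed by (my move, his move).
PAYOFF_ME = {("C", "C"): 3, ("C", "NC"): 0, ("NC", "C"): 5, ("NC", "NC"): 1}

def cooperative_games(n_games):
    """Return the game numbers in which player 1 selects C."""
    total = {"C": 0, "NC": 0}
    count = {"C": 0, "NC": 0}
    previous, result = "C", []   # tit for tat opens with C
    for game in range(1, n_games + 1):
        if count["C"] == 0:
            mine = "C"           # assumption: try C first
        elif count["NC"] == 0:
            mine = "NC"          # assumption: then try NC
        else:
            # Compare the two averages by cross-multiplication; a tie goes to C.
            c_at_least = total["C"] * count["NC"] >= total["NC"] * count["C"]
            mine = "C" if c_at_least else "NC"
        his = previous           # player 2 repeats my previous move
        total[mine] += PAYOFF_ME[(mine, his)]
        count[mine] += 1
        previous = mine
        if mine == "C":
            result.append(game)
    return result

print(cooperative_games(1000))  # [1, 4, 19]
```

Player 1 cooperates only in games 1, 4 and 19. The NC runs in between last 2 games, then 14 games, and the third run does not end within the simulated series.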

What if player 2 plays based on his experiences too? Then the simulation shows they will meet each other playing cooperatively after a while, which happens in game 11. However, this is a delicate balance. After 10 more games C as well as NC pay off 1½. What to do now? Play C on a tie (select C if C >= NC) or play NC on a tie (select C only if C > NC)? Both players are in two minds. If player 2 were to start cooperating one game later than player 1 (game 12 instead of 11), they would never again meet each other in cooperation, which means they would stick to playing NC. So with this based-on-experience strategy on both sides, too, there is a tendency towards non-cooperative playing.

  
Conclusion

Based on the payoff matrix, game theory can derive playing strategies. This approach does not explain the phenomenon of more and more non-cooperative behaviour.

The approach of stacking statements does explain why subjects do not follow optimal strategies, in this case why they cooperate less than you’d logically expect.

