
Memory-two zero-determinant strategies in repeated games.

Masahiko Ueda

Abstract

Repeated games have provided an explanation of how mutual cooperation can be achieved even when defection is more favourable in the one-shot Prisoner's Dilemma game. The recently discovered zero-determinant (ZD) strategies have been substantially investigated in evolutionary game theory. The original memory-one ZD strategies unilaterally enforce linear relationships between the average pay-offs of players. Here, we extend the concept of ZD strategies to memory-two strategies in repeated games. Memory-two ZD strategies unilaterally enforce linear relationships between correlation functions of the pay-offs at the present round and the pay-offs at the previous round. Examples of memory-two ZD strategies in the repeated Prisoner's Dilemma game are provided, some of which generalize the tit-for-tat strategy to the memory-two case. The extension of ZD strategies to the memory-n case with n ≥ 2 is also straightforward.
© 2021 The Authors.


Keywords:  Repeated games; memory-n strategies; zero-determinant strategies

Year:  2021        PMID: 34084544      PMCID: PMC8150048          DOI: 10.1098/rsos.202186

Source DB:  PubMed          Journal:  R Soc Open Sci        ISSN: 2054-5703            Impact factor:   2.963


Introduction

Repeated games offer a framework explaining forward-looking behaviours and reciprocity of rational agents [1,2]. Since it was pointed out that game theory of rational agents can be applied to the evolutionary behaviour of populations of biological systems [3], evolutionary game theory has investigated the conditions under which mutualism is maintained in conflicts [4-11]. In the repeated Prisoner's Dilemma game, it was found that, although there are many equilibria, none of them is evolutionarily stable owing to neutral drift [12]. Because the rationality of each biological individual is bounded, evolutionary game theory has mainly focused on the evolutionary stability of strategies whose memory length is one. Nevertheless, the memory-one class contains several useful strategies in the Prisoner's Dilemma game, such as the grim trigger strategy [13], the tit-for-tat (TFT) strategy [14-17] and the win-stay lose-shift strategy [5], which can form cooperative Nash equilibria. In 2012, Press and Dyson discovered a novel class of memory-one strategies called zero-determinant (ZD) strategies [18]. A ZD strategy unilaterally enforces a linear relationship between the average pay-offs of the players. ZD strategies in the Prisoner's Dilemma game contain the equalizer strategy, which unilaterally sets the average pay-off of the opponent, and the extortionate strategy, by which the player can obtain a greater average pay-off than the opponent. After their work, the evolutionary stability of ZD strategies in the Prisoner's Dilemma game was investigated by several authors [19-24]. Furthermore, the concept of ZD strategies has been extended to multi-player multi-action games [25-29]. Linear algebraic properties of ZD strategies in general multi-player multi-action games with many ZD players were also investigated in [30], which found that possible ZD strategies are constrained by the consistency of the linear pay-off relationships.
Another extension is ZD strategies in repeated games with imperfect monitoring [30-32], where possible ZD strategies are more restricted than those in the perfect-monitoring case. Furthermore, ZD strategies were also extended to repeated games with a discounting factor [28,33-35] and to asymmetric games [36]. The performance of ZD strategies such as the extortionate strategy and the generous ZD strategy in the Prisoner's Dilemma game has also been investigated in human experiments [37-39]. Moreover, the behaviour of the extortionate strategy in structured populations was found to be quite different from that in well-mixed populations [40,41]. Although ZD strategies are not necessarily rational, they contain the TFT strategy in the Prisoner's Dilemma game [18], which returns the opponent's previous action, and accordingly ZD strategies form a significant class of memory-one strategies. Recently, properties of longer-memory strategies have been investigated in the context of repeated games with implementation errors [42-45]. In general, longer memory enables more complicated behaviour [46]. In particular, it has been shown that, in the Prisoner's Dilemma game, a memory-two strategy called tit-for-tat-anti-tit-for-tat (TFT-ATFT) is successful under implementation errors [42]. Although TFT-ATFT normally behaves as TFT, it switches to ATFT when it recognizes an error, and then returns to TFT when mutual cooperation is achieved or when the opponent unilaterally defects twice. In [45], a successful and easily interpretable strategy in the memory-three class has also been proposed. Recall that the original memory-one TFT strategy, which is also successful but is not robust against errors, is a special case of memory-one ZD strategies. Therefore, discussing longer-memory strategies in the context of ZD strategies would be useful. However, the concept of ZD strategies has not been extended to longer-memory strategies.
In this paper, we extend the concept of ZD strategies to memory-two strategies in repeated games. Memory-two ZD strategies unilaterally enforce linear relationships between correlation functions of the pay-offs at the present round and the pay-offs at the previous round. We provide examples of memory-two ZD strategies in the repeated Prisoner's Dilemma game. In particular, one of the examples can be regarded as an extension of the TFT strategy to the memory-two case. The paper is organized as follows. In §2, we introduce a model of repeated games with memory-two strategies. In §3, we extend ZD strategies to the memory-two strategy class. In §4, we provide examples of memory-two ZD strategies in the repeated Prisoner's Dilemma game. In §5, the extension of ZD strategies to the memory-n case with n ≥ 2 is discussed. Section 6 is devoted to concluding remarks.

Model

We consider an N-player repeated game. The action of player a ∈ {1,…, N} is written as σ_a ∈ {1,…, M}. We collectively denote the state of all players by σ := (σ_1, …, σ_N). We consider the situation that the length of memory of the strategies of all players is at most two. (We will see in §5 that this assumption can be weakened to memory-n with n ≥ 2.) The strategy of player a is described as the conditional probability T_a(σ_a | σ′, σ″) of taking action σ_a when the states at the last round and the second-to-last round are σ′ and σ″, respectively. Let s_a(σ) be the pay-off of player a when the state is σ. The time evolution of this system is described by the Markov chain

   P_{t+1}(σ, σ′) = Σ_{σ″} T(σ | σ′, σ″) P_t(σ′, σ″),   (2.1)

where P_t(σ, σ′) is the joint distribution of the present state σ and the last state σ′ at time t, and we have defined the transition probability

   T(σ | σ′, σ″) := ∏_{a=1}^{N} T_a(σ_a | σ′, σ″).   (2.2)

The initial condition is described as the joint distribution P_1(σ, σ′) of the states at t = 1 and t = 0. We consider the situation that the discounting factor is δ = 1 [1].
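To make the dynamics concrete, the model above can be sketched in code. The following is our own illustrative sketch, not code from the paper; the names `MemoryTwoPlayer` and `play` are assumptions, and each strategy is supplied as a probability table over (last state, second-to-last state):

```python
import random

# Minimal sketch of the memory-two repeated game: each player's strategy is a
# table mapping (last state, second-to-last state) -> action probabilities.
class MemoryTwoPlayer:
    def __init__(self, table, n_actions=2):
        self.table = table            # {(last, second_last): [p(action 1), ..., p(action M)]}
        self.n_actions = n_actions

    def act(self, last, second_last):
        probs = self.table[(last, second_last)]
        return random.choices(range(1, self.n_actions + 1), weights=probs)[0]

def play(players, rounds, init_states):
    """Iterate the Markov chain (2.1): each new state is drawn from the
    product of the players' conditional action distributions."""
    second_last, last = init_states   # states at t = 0 and t = 1
    history = [second_last, last]
    for _ in range(rounds):
        state = tuple(p.act(last, second_last) for p in players)
        history.append(state)
        second_last, last = last, state
    return history
```

For instance, two players whose tables put probability one on action 1 in every context generate the mutual-cooperation path (1, 1), (1, 1), … regardless of the initial states.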

Memory-two zero-determinant strategies

We consider the situation that the Markov chain (equation (2.1)) has a stationary probability distribution P(st)(σ, σ′):

   P(st)(σ, σ′) = Σ_{σ″} T(σ | σ′, σ″) P(st)(σ′, σ″).   (3.1)

By taking the summation of both sides with respect to {σ_b} (b ≠ a) for an arbitrary a, we obtain

   Σ_{{σ_b} (b≠a)} P(st)(σ, σ′) = Σ_{σ″} T_a(σ_a | σ′, σ″) P(st)(σ′, σ″).   (3.2)

Furthermore, by taking the summation of both sides with respect to σ′, we obtain

   Σ_{{σ_b} (b≠a), σ′} P(st)(σ, σ′) = Σ_{σ′, σ″} T_a(σ_a | σ′, σ″) P(st)(σ′, σ″).   (3.3)

Because the left-hand side is the stationary marginal distribution of σ_a, it can also be written as Σ_{σ′, σ″} δ_{σ_a, σ′_a} P(st)(σ′, σ″). Therefore, the quantity

   T̂_a(σ_a | σ′, σ″) := T_a(σ_a | σ′, σ″) − δ_{σ_a, σ′_a}   (3.4)

is mean-zero with respect to the stationary distribution P(st):

   Σ_{σ′, σ″} T̂_a(σ_a | σ′, σ″) P(st)(σ′, σ″) = 0   (3.5)

for arbitrary σ_a. This is the extension of Akin's lemma [25,28,30,47,48] to the memory-two case. (We remark that the subtracted term δ_{σ_a, σ′_a} is regarded as the memory-one strategy 'Repeat', which repeats the action at the previous round.) We call T̂_a a Press-Dyson (PD) matrix. It should be noted that T̂_a is controlled only by player a. When player a chooses her strategies so that her PD matrices satisfy

   Σ_{σ_a=1}^{M} c_{σ_a} T̂_a(σ_a | σ′, σ″) = Σ_{b=0}^{N} Σ_{c=0}^{N} α_{bc} s_b(σ′) s_c(σ″)   (3.6)

with some coefficients {c_{σ_a}} and {α_{bc}}, where we have introduced s_0(σ) := 1, we obtain

   0 = Σ_{b=0}^{N} Σ_{c=0}^{N} α_{bc} 〈s_b s′_c〉(st),   (3.7)

where 〈···〉(st) represents the average with respect to the stationary distribution P(st) and s′_c denotes the pay-off at the previous round. This is the extension of the concept of ZD strategies to the memory-two case. We remark that the original (memory-one) ZD strategies unilaterally enforce linear relationships between the average pay-offs of players in the stationary state. Here, memory-two ZD strategies unilaterally enforce linear relationships between correlation functions of the pay-offs at the present round and the pay-offs at the previous round in the stationary state. (It should be noted that the quantity 〈s_b s′_c〉(st) does not depend on t in the stationary state.) We note that, because the number of components of a PD matrix is M^{2N} and the number of pay-off tensors s ⊗ s in equation (3.6) is (N + 1)², the space of memory-two ZD strategies is small even for the Prisoner's Dilemma game (N = 2 and M = 2), and most memory-two strategies are not memory-two ZD strategies. In addition, although we choose s ⊗ s as a basis on the right-hand side of equation (3.6), this choice is not necessary and we can choose other functions [49].
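The mean-zero property of the PD matrix can be checked numerically. The sketch below is our own code, not the paper's, and the indexing conventions are assumptions: for two players with two actions and random memory-two strategies, it computes the stationary distribution of the pair chain and verifies that T_a(σ_a | σ′, σ″) − δ_{σ_a, σ′_a} averages to zero.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
states = list(itertools.product([1, 2], repeat=2))   # sigma = (sigma_1, sigma_2)
ctxs = list(itertools.product(states, repeat=2))     # (last state, second-to-last state)
idx = {c: i for i, c in enumerate(ctxs)}

# Random memory-two strategies: probability of action 1 in each context
T1 = {c: rng.uniform() for c in ctxs}
T2 = {c: rng.uniform() for c in ctxs}
prob = lambda tbl, a, c: tbl[c] if a == 1 else 1.0 - tbl[c]

# Column-stochastic transition matrix on pairs: (last, second) -> (new, last)
W = np.zeros((len(ctxs), len(ctxs)))
for c in ctxs:
    for new in states:
        W[idx[(new, c[0])], idx[c]] += prob(T1, new[0], c) * prob(T2, new[1], c)

# Stationary pair distribution by power iteration
P = np.full(len(ctxs), 1.0 / len(ctxs))
for _ in range(50_000):
    P = W @ P

# Akin's lemma for player 1: the PD matrix averages to zero for each action
residuals = [abs(sum((prob(T1, a, c) - (c[0][0] == a)) * P[idx[c]] for c in ctxs))
             for a in (1, 2)]
print(residuals)
```

Both residuals come out numerically zero for any choice of the two strategy tables, reflecting that the lemma holds for every opponent.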
We remark that, when we take the summation of both sides of equation (3.1) with respect to σ′, we obtain

   Σ_{σ′} P(st)(σ, σ′) = Σ_{σ′, σ″} T(σ | σ′, σ″) P(st)(σ′, σ″).

This uniquely determines the stationary distribution Σ_{σ′} P(st)(σ, σ′) of a single state σ. We also remark that, because of the normalization condition Σ_{σ_a=1}^{M} T_a(σ_a | σ′, σ″) = 1 of the conditional probability, PD matrices satisfy

   Σ_{σ_a=1}^{M} T̂_a(σ_a | σ′, σ″) = 0

for arbitrary σ′ and σ″.

Examples: repeated Prisoner’s Dilemma

Here, we consider the two-player two-action Prisoner's Dilemma game [18]. The actions of the two players are 1 (cooperation) or 2 (defection). The pay-offs of the two players are given by s_1(1, 1) = s_2(1, 1) = R, s_1(2, 2) = s_2(2, 2) = P, s_1(1, 2) = s_2(2, 1) = S and s_1(2, 1) = s_2(1, 2) = T, with T > R > P > S. We provide four examples of memory-two ZD strategies.

Example 1: relating correlation function with average pay-offs

We consider the situation that player 1 takes the following memory-two strategy: (We have assumed that T − P ≥ P − S. For the case T − P < P − S, a slight modification is needed.) Then, we find that her PD matrix is which means and that this strategy is a memory-two ZD strategy which unilaterally enforces Therefore, the correlation function is related to the average pay-offs 〈s1〉(st) and 〈s2〉(st) by this ZD strategy. We provide numerical results for this linear relationship. We set the parameters (R, S, T, P) = (3, 0, 5, 1). The strategy of player 2 is set to and we change q in the range [0, 1]. (Note that the strategy of player 2 is essentially memory-one.) The actions of both players at t = 0 and t = 1 are sampled from the uniform distribution. In figure 1, we display the result of numerical simulation of one sample, where the average is calculated as a time average up to t = 100 000. We can see that the linear relation (4.4) indeed holds for all q.
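The time-averaging protocol just described can be sketched as follows. This is our own illustrative code, not the paper's: the strategy table (4.1) and player 2's q-parametrized strategy are not reproduced here, so the tables passed to the hypothetical `estimate_averages` are placeholders supplied by the caller, given as cooperation probabilities per (last state, second-to-last state); only the pay-off values (R, S, T, P) = (3, 0, 5, 1) are taken from the text.

```python
import random

R, S, T, P = 3, 0, 5, 1
PAYOFF = {(1, 1): (R, R), (1, 2): (S, T), (2, 1): (T, S), (2, 2): (P, P)}

def estimate_averages(strat1, strat2, rounds=100_000, seed=1):
    """Estimate time-averaged pay-offs and a present/previous-round pay-off
    correlation along one sample path, as in the simulations of figures 1-5."""
    rng = random.Random(seed)
    act = lambda tbl, last, second: 1 if rng.random() < tbl[(last, second)] else 2
    # actions at t = 0 and t = 1 are sampled uniformly, as in the text
    second = (rng.choice((1, 2)), rng.choice((1, 2)))
    last = (rng.choice((1, 2)), rng.choice((1, 2)))
    sums = {"s1": 0.0, "s2": 0.0, "s1s2p": 0.0}
    for _ in range(rounds):
        state = (act(strat1, last, second), act(strat2, last, second))
        s1, s2 = PAYOFF[state]
        s2p = PAYOFF[last][1]          # opponent's pay-off at the previous round
        sums["s1"] += s1
        sums["s2"] += s2
        sums["s1s2p"] += s1 * s2p      # one of the correlation functions
        second, last = last, state
    return {k: v / rounds for k, v in sums.items()}
```

For two always-cooperating tables, for example, the routine returns 〈s1〉 = 〈s2〉 = R.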
Figure 1

Time-averaged pay-offs of two players and correlation functions with t = 100 000 for various q when the strategy of player 1 is given by equation (4.1). The red line corresponds to the right-hand side of equation (4.4).


Example 2: extended tit-for-tat strategy

Here, we introduce a memory-two ZD strategy which can be called the extended tit-for-tat (ETFT) strategy. We consider the situation that player 1 takes the following memory-two strategy: Note that this strategy does not depend on the concrete values of the pay-offs (R, S, T, P). Then, we find that her PD matrix is and it satisfies Therefore, it is a memory-two ZD strategy, which enforces This linear relationship can be seen as a fairness condition between the two players. Because the original TFT strategy enforces 〈s1〉(st) = 〈s2〉(st) [18], equation (4.6) can be regarded as an extension of the TFT strategy. We provide numerical results for the ETFT strategy. The parameters and the strategy of player 2 are set to the same values as those in the previous subsection. In figure 2, we display the result of numerical simulation of one sample. We can check that the linear relationship (4.9) holds for all q.
Figure 2

Time-averaged pay-offs of two players and correlation functions with t = 100 000 for various q when the strategy of player 1 is given by equation (4.6). The red line corresponds to the right-hand side of equation (4.9).

We can understand the behaviour of the ETFT player as follows. When the previous state is (1, 1) or (2, 2), ETFT behaves as TFT. When the previous state and the second-to-last state are ((1, 2), (1, 1)) or ((2, 1), (1, 1)), ETFT regards one of the players as having mistaken his/her action, and returns action 1 (cooperation) or 2 (defection) at random. Similarly, when the previous state and the second-to-last state are ((1, 2), (2, 2)) or ((2, 1), (2, 2)), ETFT also regards one of the players as having mistaken his/her action, and returns action 1 or 2 at random. When the previous state and the second-to-last state are ((1, 2), (1, 2)), ETFT ceases to cooperate and returns action 2. When the previous state and the second-to-last state are ((2, 1), (2, 1)), ETFT continues to exploit and returns action 2. Finally, when the previous state and the second-to-last state are ((1, 2), (2, 1)) or ((2, 1), (1, 2)), ETFT generously cooperates. Although this strategy is different from TFT-ATFT [42]: which is deterministic, ETFT may be successful because it has several properties in common with TFT-ATFT. Furthermore, because ETFT is stochastic, it may be robust against implementation errors. The evolutionary stability of ETFT must be investigated in the future. It should be noted that a slightly modified version of ETFT: is also a memory-two ZD strategy. We call this strategy the type-2 extended tit-for-tat (ETFT-2) strategy. The PD matrix of ETFT-2 is described as which enforces the linear relationship That is, the sign of the last term is different from that for ETFT. In figure 3, we display the result of numerical simulation of one sample, where the parameters are set to the same values as before. We can check that equation (4.14) holds for all q.
Although ETFT-2 is similar to ETFT, it will be exploited by the all-D strategy (which always defects), because T_1(1 | (1, 2), (1, 2)) = 1. Therefore, ETFT-2 is expected to be less successful than ETFT.
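As a concrete summary of the rules above, the ETFT response can be written as a single cooperation-probability function. This is our own reconstruction from the verbal description in the text, reading 'at random' as probability 1/2; states are written (player 1's action, player 2's action). A short simulation against the all-D strategy then illustrates the point just made: ETFT, whose entry for ((1, 2), (1, 2)) is defection, is drawn into mutual defection rather than being exploited.

```python
import random

def etft_coop_prob(last, second_last):
    """Cooperation probability of ETFT (player 1), reconstructed from the
    verbal description; 'at random' is read as probability 1/2."""
    if last in ((1, 1), (2, 2)):         # behave as TFT: repeat the opponent's action
        return 1.0 if last[1] == 1 else 0.0
    if second_last in ((1, 1), (2, 2)):  # a deviation right after agreement: randomize
        return 0.5
    if last == second_last:              # ((1,2),(1,2)) or ((2,1),(2,1)): defect
        return 0.0
    return 1.0                           # ((1,2),(2,1)) or ((2,1),(1,2)): cooperate

# ETFT (player 1) against all-D (player 2), pay-offs (R, S, T, P) = (3, 0, 5, 1)
rng = random.Random(0)
pay2 = {(1, 1): 3, (1, 2): 5, (2, 1): 0, (2, 2): 1}   # player 2's pay-off per state
second, last = (rng.choice((1, 2)), 2), (rng.choice((1, 2)), 2)
total2 = 0
rounds = 10_000
for _ in range(rounds):
    a1 = 1 if rng.random() < etft_coop_prob(last, second) else 2
    state = (a1, 2)                      # player 2 always defects
    total2 += pay2[state]
    second, last = last, state
avg2 = total2 / rounds
print(avg2)
```

The play is quickly absorbed into mutual defection, so the all-D player's time-averaged pay-off stays essentially at P = 1 instead of the temptation pay-off T.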
Figure 3

Time-averaged pay-offs of two players and correlation functions with t = 100 000 for various q when the strategy of player 1 is given by equation (4.12). The red line corresponds to the right-hand side of equation (4.14).


Example 3: fickle tit-for-tat strategy

Here, we introduce another memory-two ZD strategy, which can be called the fickle tit-for-tat (FTFT) strategy. In this subsection, we assume 2R > T + S, which corresponds to the condition that mutual cooperation is more favourable than the period-two sequence (1, 2) → (2, 1) → (1, 2) → ··· [18]. We consider the situation that player 1 takes the following memory-two strategy: Then, we find that her PD matrix is and it satisfies Therefore, it is also a memory-two ZD strategy, which enforces This linear relationship can be regarded as another type of fairness condition between the two players. One can compare the strategy matrix of FTFT (equation (4.15)) with that of TFT (equation (4.10)). FTFT can take a different action from TFT with non-zero probability when the previous state is (1, 2) or (2, 1). We provide numerical results for the FTFT strategy. The parameters and the strategy of player 2 are set to the same values as those in the previous subsections. In figure 4, we display the result of numerical simulation of one sample. We confirm that the linear relationship (4.18) holds for all q.
Figure 4

Time-averaged pay-offs of two players and correlation functions with t = 100 000 for various q when the strategy of player 1 is given by equation (4.15). The red line corresponds to the right-hand side of equation (4.18).


Example 4: extended zero-sum strategy

Here, we assume that 2R > T + S and 2P < T + S. In the memory-one class, there exists the following memory-one ZD strategy: where we have introduced Because this strategy satisfies it unilaterally enforces Because this relationship means that the sum of the average pay-offs of the two players is fixed, this ZD strategy can be called a zero-sum strategy (ZSS). As an extension of the ZSS, we can consider the following memory-two strategy: Then, we find that a PD matrix of the player can be rewritten as Therefore, this strategy is a memory-two ZD strategy enforcing Because this strategy can be regarded as an extension of the ZSS, we call it the extended zero-sum strategy (EZSS). In figure 5, we display the result of numerical simulation of one sample, where the parameters are set to the same values as before. We can check that equation (4.25) indeed holds for all q.
Figure 5

Time-averaged pay-offs of two players and correlation functions with t = 100 000 for various q when the strategy of player 1 is given by equation (4.23). The red line corresponds to the right-hand side of equation (4.25).


Remark

As in the memory-one case [18], possible memory-two ZD strategies are restricted by the sign of each component of the matrix T̂_a, because T_a(σ_a | σ′, σ″) = T̂_a(σ_a | σ′, σ″) + δ_{σ_a, σ′_a} is a probability and must satisfy 0 ≤ T_a(σ_a | σ′, σ″) ≤ 1 for all σ_a, σ′ and σ″. For example, a memory-two ZD strategy of player 1 satisfying does not exist.

Extension to memory-n case

In this section, we discuss the extension of ZD strategies to the memory-n (n ≥ 2) case. By using the technique of this paper, the extension of Akin's lemma (equation (3.5)) to the longer-memory case is straightforward, and therefore so is the extension of the concept of ZD strategies. For the memory-n case, the time evolution is described by the Markov chain

   P_{t+1}(σ, σ^{(1)}, …, σ^{(n−1)}) = Σ_{σ^{(n)}} T(σ | σ^{(1)}, …, σ^{(n)}) P_t(σ^{(1)}, …, σ^{(n)}),

with the transition probability

   T(σ | σ^{(1)}, …, σ^{(n)}) := ∏_{a=1}^{N} T_a(σ_a | σ^{(1)}, …, σ^{(n)}),

where σ^{(m)} denotes the state m rounds before. Then, by taking the summation of both sides of the stationary condition with respect to {σ_b} (b ≠ a) and σ^{(1)}, …, σ^{(n−1)}, we obtain the extended Akin's lemma:

   Σ_{σ^{(1)}, …, σ^{(n)}} T̂_a(σ_a | σ^{(1)}, …, σ^{(n)}) P(st)(σ^{(1)}, …, σ^{(n)}) = 0,

with

   T̂_a(σ_a | σ^{(1)}, …, σ^{(n)}) := T_a(σ_a | σ^{(1)}, …, σ^{(n)}) − δ_{σ_a, σ^{(1)}_a},

which can be called a PD tensor. When player a chooses her strategy so that her PD tensors satisfy

   Σ_{σ_a=1}^{M} c_{σ_a} T̂_a(σ_a | σ^{(1)}, …, σ^{(n)}) = Σ_{b_1=0}^{N} ··· Σ_{b_n=0}^{N} α_{b_1···b_n} s_{b_1}(σ^{(1)}) ··· s_{b_n}(σ^{(n)})

with some coefficients {c_{σ_a}} and {α_{b_1···b_n}}, she unilaterally enforces a linear relationship

   0 = Σ_{b_1=0}^{N} ··· Σ_{b_n=0}^{N} α_{b_1···b_n} 〈s_{b_1} s^{(1)}_{b_2} ··· s^{(n−1)}_{b_n}〉(st).

This is the memory-n ZD strategy. In other words, memory-n ZD strategies unilaterally enforce linear relationships between correlation functions of the pay-offs over a timespan of n rounds. Constructing useful examples of memory-n ZD strategies with n ≥ 2 in the Prisoner's Dilemma game is a subject of future work.
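The extended lemma can again be checked numerically. The sketch below is our own code with assumed indexing conventions, not the paper's: it repeats the computation of §3 for n = 3, building the chain on triples of states for two players with two actions and random memory-three strategies, and verifies that the PD tensor of player 1 averages to zero under the stationary distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
states = list(itertools.product([1, 2], repeat=2))
ctxs = list(itertools.product(states, repeat=3))   # (last, 2nd-to-last, 3rd-to-last)
idx = {c: i for i, c in enumerate(ctxs)}

# Random memory-three strategies: probability of action 1 in each context
T1 = {c: rng.uniform() for c in ctxs}
T2 = {c: rng.uniform() for c in ctxs}
prob = lambda tbl, a, c: tbl[c] if a == 1 else 1.0 - tbl[c]

# Chain on triples of states: (s1, s2, s3) -> (new, s1, s2)
W = np.zeros((len(ctxs), len(ctxs)))
for c in ctxs:
    for new in states:
        W[idx[(new, c[0], c[1])], idx[c]] += prob(T1, new[0], c) * prob(T2, new[1], c)

# Stationary distribution over triples by power iteration
P = np.full(len(ctxs), 1.0 / len(ctxs))
for _ in range(100_000):
    P = W @ P

# Extended Akin's lemma: <T_1(a | ctx) - delta(a, own last action)> = 0
residuals = [abs(sum((prob(T1, a, c) - (c[0][0] == a)) * P[idx[c]] for c in ctxs))
             for a in (1, 2)]
print(residuals)
```

As in the memory-two case, both residuals vanish numerically for arbitrary strategy tables of the two players.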

Concluding remarks

In this paper, we extended the concept of ZD strategies in repeated games to memory-two strategies. Memory-two ZD strategies unilaterally enforce linear relationships between correlation functions of the pay-offs at the present round and the pay-offs at the previous round. We provided examples of memory-two ZD strategies in the Prisoner's Dilemma game, some of which can be regarded as variants of the TFT strategy. We also showed that the extension of ZD strategies to the memory-n (n ≥ 2) case is straightforward. Before ending this paper, we make two remarks. First, in our numerical simulations, we investigated only simple situations corresponding to well-mixed populations without evolutionary dynamics. It is known that evolutionary behaviour can change drastically when populations are structured [50-52]. Therefore, investigating the performance of our variants of the TFT strategy in evolutionary game theory, in both well-mixed and structured populations, is an important future problem. The second remark is related to the length of memory. Recently, it has been found that long memory can promote cooperation in the Prisoner's Dilemma game [53,54]. Our variants of the TFT strategy may promote cooperation because they are constructed on the basis of TFT. Furthermore, as discussed in §4, ETFT has several properties in common with TFT-ATFT [42], which is successful under implementation errors. Investigating which extension of TFT is the most successful is a significant problem. Additionally, whether or not TFT-ATFT is a memory-two ZD strategy should be studied.
