Literature DB >> 25843220

Evolutionary performance of zero-determinant strategies in multiplayer games.

Christian Hilbe¹, Bin Wu², Arne Traulsen², Martin A Nowak³.

Abstract

Repetition is one of the key mechanisms to maintain cooperation. In long-term relationships, in which individuals can react to their peers׳ past actions, evolution can promote cooperative strategies that would not be stable in one-shot encounters. The iterated prisoner׳s dilemma illustrates the power of repetition. Many of the key strategies for this game, such as ALLD, ALLC, Tit-for-Tat, or generous Tit-for-Tat, share a common property: players using these strategies enforce a linear relationship between their own payoff and their co-player׳s payoff. Such strategies have been termed zero-determinant (ZD). Recently, it was shown that ZD strategies also exist for multiplayer social dilemmas, and here we explore their evolutionary performance. For small group sizes, ZD strategies play a similar role as for the repeated prisoner׳s dilemma: extortionate ZD strategies are critical for the emergence of cooperation, whereas generous ZD strategies are important to maintain cooperation. In large groups, however, generous strategies tend to become unstable and selfish behaviors gain the upper hand. Our results suggest that repeated interactions alone are not sufficient to maintain large-scale cooperation. Instead, large groups require further mechanisms to sustain cooperation, such as the formation of alliances or institutions, or additional pairwise interactions between group members.

Entities: Chemical Disease Gene Species

Keywords: Cooperation; Evolutionary game theory; Public goods game; Repeated games; Zero-determinant strategies

Mesh：

Year: 2015 PMID： 25843220 PMCID： PMC4425415 DOI： 10.1016/j.jtbi.2015.03.032

Source DB: PubMed Journal: J Theor Biol ISSN： 0022-5193 Impact factor: 2.691

Introduction

One of the major questions in evolutionary biology is why individuals cooperate with each other. Why are some individuals willing to pay a cost (thereby decreasing their own fitness) in order to help someone else? During the last decades, researchers have proposed several mechanisms that are able to explain why cooperation is abundant in nature (Nowak, 2006; Sigmund, 2010). One such mechanism is repetition: if I help you today, you may help me tomorrow (Trivers, 1971). Among humans, this logic of reciprocal giving has been documented in numerous behavioral experiments (e.g., Wedekind and Milinski, 1996; Keser and van Winden, 2000; Fischbacher et al., 2001; Dreber et al., 2008; Grujic et al., 2014). Moreover, it has also been suggested that direct reciprocity is at work in several other species, including vampire bats (Wilkinson, 1984), sticklebacks (Milinski, 1987), blue jays (Stephens et al., 2002), and zebra finches (St. Pierre et al., 2009). From a theoretical viewpoint, these observations lead to the question under which circumstances direct reciprocity evolves, and which strategies can be used to sustain mutual cooperation. The main model to explore these questions is the iterated prisoner׳s dilemma, a stylized game in which two individuals repeatedly decide whether they cooperate or defect (Rapoport and Chammah, 1965; Doebeli and Hauert, 2005). The payoffs of the game are chosen such that mutual cooperation is preferred over mutual defection, but each individual is tempted to defect at the expense of the co-player. Theoretical studies have highlighted several successful strategies for this game (Axelrod and Hamilton, 1981; Molander, 1985; Kraines and Kraines, 1989; Nowak and Sigmund, 1992, 1993b). Evolution often occurs in dynamical cycles (Boyd and Lorberbaum, 1987; Nowak and Sigmund, 1993a; van Veelen et al., 2012): unconditional defectors (ALLD) can be invaded by reciprocal strategies like Tit-for-Tat (TFT), which in turn often catalyze the evolution of more cooperative strategies like generous Tit-for-Tat (gTFT) and unconditional cooperators (ALLC). Once ALLC is common, ALLD can reinvade, thereby closing the evolutionary cycle (Nowak and Sigmund, 1989; Imhof et al., 2005; Imhof and Nowak, 2010). The above mentioned strategies for the iterated prisoner׳s dilemma share an interesting mathematical property: they enforce a linear relationship between the players׳ payoffs in an infinitely repeated game (Press and Dyson, 2012). For example, when player 1 adopts the strategy Tit-for-Tat, the players׳ payoffs π will satisfy the equation , irrespective of player 2׳s strategy. Similarly, when player 1 adopts ALLD, payoffs will satisfy (where c and b denote the cost and the benefit of cooperation, respectively; this version of the prisoner׳s dilemma is sometimes called the donation game, see e.g. Sigmund, 2010). Finally, when player 1 applies gTFT, the enforced payoff relation becomes . Strategies that enforce such linear relationships between payoffs have been called zero-determinant strategies, or ZD strategies (this name is motivated by the fact that these strategies let certain determinants vanish, see Press and Dyson, 2012). After Press and Dyson׳s discovery, several studies have explored how ZD strategies for the repeated prisoner׳s dilemma fare in an evolutionary context (Akin, 2013; Stewart and Plotkin, 2012, 2013; Hilbe et al., 2013a,b; Adami and Hintze, 2013; Szolnoki and Perc, 2014a,b; Chen and Zinger, 2014), and in behavioral experiments (Hilbe et al., 2014a). Zero-determinant strategies are not confined to pairwise games; they also exist in the iterated public goods game (Pan et al., 2014), and in fact in any repeated social dilemma, with an arbitrary number of involved players (Hilbe et al., 2014b). In this way, it has become possible to identify the multiplayer-game analogues of the above mentioned strategies. For example, the multiplayer-version of TFT in a repeated public goods game is proportional Tit-for-Tat (pTFT): if j of the other group members cooperated in the previous round, then a pTFT-player cooperates with probability in the next round, with n being the size of the group. Herein, we will explore the role of these recently discovered multiplayer ZD strategies for the evolution of cooperation. We consider two evolutionary scenarios. First, we consider a conventional setup, in which the members of a well-mixed population are engaged in a series of repeated public goods games, and where successful strategies reproduce more often. In line with previous studies (Boyd and Richerson, 1988; Hauert and Schuster, 1997; Grujic et al., 2012), our simulations confirm that the prospects of cooperation depend on the size of the group. Small groups promote generous ZD strategies that allow for high levels of cooperation, whereas larger groups favor the emergence of selfish ZD strategies such as ALLD. For our second evolutionary scenario, we consider a player with a fixed ZD strategy whose co-players are allowed to adapt their strategies over time. Similar to the case of the repeated prisoner׳s dilemma (Press and Dyson, 2012; Chen and Zinger, 2014), the resulting group dynamics then depends on the applied ZD strategy of the focal player. But also here, the possibilities of a single player to generate a positive group dynamics diminishes with group size, irrespective of the strategy applied by the focal player. Taken together, these results suggest that larger groups make it more difficult to sustain cooperation. In the discussion, we will thus argue that there are three potential mechanisms that can help individuals solving their multiplayer social dilemmas: they can either provide additional incentives on a pairwise basis (Rand et al., 2009; Rockenbach and Milinski, 2006); they can coordinate their actions and form alliances (Hilbe et al., 2014b); or they can implement central institutions which enforce mutual cooperation (Ostrom, 1990; Sigmund et al., 2010; Sasaki et al., 2012; Cressman et al., 2012; Traulsen et al., 2012; Zhang and Li, 2013; Schoenmakers et al., 2014).

Model

Iterated multiplayer dilemmas and memory-one strategies

In the following, we consider a group of n individuals, which is engaged in a repeated multiplayer dilemma. In each round of the game, players can decide whether to cooperate (C) or to defect (D). The payoffs in a given round depend on the player׳s own decision, and on the number of cooperators among the remaining group members. That is, in a round in which j of the other group members cooperate, the focal player receives a for cooperation, and b for defection (see also Table 1). We suppose that the multiplayer game takes the form of a social dilemma, such that payoffs satisfy the following three conditions (see also Kerr et al., 2004): (a) individuals prefer their co-players to be cooperative, and for all j; (b) within a mixed group, defectors outperform cooperators, for all j; (c) mutual cooperation is favored over mutual defection, . Several well-known examples of multiplayer games satisfy these criteria, including the public goods game (see e.g. Ledyard, 1995), the volunteer׳s dilemma (Diekmann, 1985; Archetti, 2009), or the collective-risk dilemma (Milinski et al., 2008; Santos and Pacheco, 2011; Abou Chakra and Traulsen, 2014).

Table 1

Payoff table for multiplayer games with n group members (see also Gokhale and Traulsen, 2010; van Veelen and Nowak, 2012; Gokhale and Traulsen, 2014; Peña et al., 2014; Du et al., 2014). The payoff of a player depends on the player׳s own action, and on the number of cooperating co-players. As an example of a multiplayer dilemma, we will discuss linear public good games. There, cooperators contribute an amount to a common pool. Total contributions to the common pool are multiplied by a factor r with , and evenly shared among all group members. Thus, the payoff of a cooperator is , whereas the payoff of a defector is .

Number of cooperating co-players	n−1	n−2	…	0
Payoff for cooperation	an−1	an−2	…	a0
Payoff for defection	bn−1	bn−2	…	b0

We assume that the multiplayer game is repeated, such that the group members face the same dilemma situation over multiple rounds. Herein, we will focus on infinitely repeated games, but the theory of ZD strategies can also be developed for games with finitely many rounds, or when future payoffs are discounted (Hilbe et al., 2014a, 2015). In repeated games, players can react on their co-players׳ previous behavior. In the simplest case, players only consider the outcome of the last round, that is, they apply a so-called memory-one strategy. Memory-one strategies consist of two parts: a rule that tells the player what to do in the first round, and a rule for what to do in all subsequent rounds, depending on the previous round׳s outcome. In infinitely repeated games, the first-round play can typically be neglected (see also Appendix A.1). In that case, memory-one strategies can be written as a vector . The entries correspond to the player׳s cooperation probability in the next round, given that the player used in the previous round, and that j of the other group members cooperated. Using this notation, the strategy ALLD can be written as (); the strategy ALLC takes the form (); and the strategy proportional Tit-for-Tat is given by pTFT=(). When all players in a group apply memory-one strategies, one can directly calculate the resulting payoffs for each group member, using a Markov chain approach (Nowak and Sigmund, 1993b; Hauert and Schuster, 1997). A detailed description is given in Appendix A.1. However, it is worth noting that the computation of payoffs is numerically expensive, because one needs to calculate the entries of a transition matrix (and the left eigenvector thereof). The exponential increase in computation time for large groups makes it difficult to attain evolutionary results beyond a certain group size (for example, in Hauert and Schuster, 1997, the maximum group size considered is n=5).

Zero-determinant strategies

Only recently, Press and Dyson (2012) have described a particular subclass of memory-one strategies for the repeated prisoner׳s dilemma. With these so-called ZD strategies, a player can enforce a linear relationship between her own payoff and the co-player׳s payoff. Such strategies do also exist in multiplayer social dilemmas (Hilbe et al., 2014b): a memory-one strategy is called a ZD strategy if there are constants l, s, and such that the entries of can be written asBy adopting such a strategy, player i can enforce the payoff relationshipwhere π is the payoff of player i, and is the average payoff of i׳s co-players (Press and Dyson, 2012; Hilbe et al., 2014b). We call s the slope of the ZD strategy, as it controls how the co-players׳ payoffs π− change with the focal player׳s payoff π. Moreover, we call l the baseline payoff: when all players adopt the same ZD strategy, then , and Eq. (2) implies that each player obtains the payoff . The parameter ϕ in the definition of ZD strategies does not have a direct impact on the enforced payoff relationship (Eq. (2)). However, the value of ϕ determines how fast payoffs converge over the course of the game (Hilbe et al., 2014a). Thus, we call ϕ the convergence factor. It is instructive to consider a few examples of ZD strategies for the public goods game. One example is the strategy proportional Tit-for-Tat with cooperation probabilities . The strategy pTFT results from the definition of ZD strategies (1) by setting s=1 and . Since s=1, it follows from Eq. (2) that a player using pTFT enforces the fair relationship , that is, a pTFT player ensures that he always gets exactly the average payoff of the group. In a similar way, many well-known memory-one strategies can be represented as ZD strategies, including ALLD, ALLC, extortionate strategies (EXT), and generous strategies (GEN), as shown in Table 2 and Fig. 1.

Table 2

Examples of ZD strategies for the repeated public goods game. The three strategies ALLD, pTFT, and ALLC can be written as ZD strategies as specified in the table. Moreover, one can define two important sub-classes of ZD strategies. Extortionate strategies (EXT) choose the lowest possible baseline payoff, l=0, and a positive slope value, . In this way, extortionate players ensure that their payoff is always above average, (Hilbe et al., 2014b). Generous ZD strategies (GEN), on the other hand, choose the highest possible baseline payoff , and a positive slope value . As a consequence, generous players ensure that they never outperform their co-players, . An illustration of these strategies is given in Fig. 1.

Strategy	Baseline payoff l	Slope s	Convergence factor ϕ
ALLD	0	(n−1)r−n(n−1)r	(n−1)rcn(r−1)
EXT	0	s>0	ϕ
pTFT	rc−c2	1	1c
GEN	rc− c	s>0	ϕ
ALLC	rc− c	(n−1)r−n(n−1)r	(n−1)rcn(r−1)

Fig. 1

Illustration of ZD strategies for the repeated public goods game. In each panel, the focal player applies a fixed ZD strategy (ALLD, an extortionate strategy EXT, pTFT, a generous strategy GEN, or ALLC). The other group members are not restricted to any particular strategy. The x-axis depicts the resulting payoff π for the focal player, and the y-axis shows the corresponding average payoff of the other group members. The grey-shaded area depicts the space of all possible payoff combinations for the repeated public goods game, and the black dashed line shows the payoff combinations where the focal player yields exactly the average payoff of the other group members. The colored lines give the possible payoffs according to Eq. (2). For each ZD strategy, the parameter s corresponds to the slope of the colored line. Moreover, when , the parameter l corresponds to the intersection of the colored line with the dashed diagonal. For extortionate strategies, the line intersects the diagonal at l=0, and it has a positive slope (for the graph we use s=0.8, implying that on average, the co-players only get 80% of the extortioner׳s payoff). For generous strategies the line intersects the diagonal at the social optimum, , and it has a positive slope . Because the colored payoff lines for ALLD and EXT are below the diagonal, this shows that defectors and extortioners earn more than average. On the other hand, GEN and ALLC yield a payoff below average. The payoff of pTFT always matches the average payoff of the group.

Compared to groups with memory-one players, the calculation of payoffs becomes considerably more simple when all players adopt ZD strategies. To see this, suppose each of the n group members applies some ZD strategy with parameters l, s and ϕ. As a result, each player enforces a linear payoff relationship as in Eq. (2). Overall, this leads to n linear equations in the n unknown payoffs π. This system of equations can be solved explicitly (for details, see Appendix A.2); the payoffs of the players are given bywhereThis formula allows a fast calculation of payoffs even in large groups. The representation of ZD strategies in Eq. (1) has one apparent disadvantage. Because the definition requires three free parameters l, s, and ϕ, it may be difficult to decide whether or not a given memory-one strategy can be written as a ZD strategy. To solve this difficulty, one can derive an alternative representation of ZD strategies. In Appendix A.3, we show that a memory-one strategy is a ZD strategy for the public goods game if and only if the entries satisfyFor general group sizes n, these conditions define the 3-dimensional subspace of ZD strategies, within the 2n-dimensional space of memory-one strategies. When we consider a public goods game between three players only, we can illustrate the resulting space of ZD strategies. To this end, let us assume that players only use reactive strategies (i.e., their cooperation probabilities only depend on the actions of the co-players, but not on their own action, such that for all j). For groups of three players, reactive strategies thus take the form , where p is the probability to cooperate when i of the co-players cooperated in the previous round. Since , the space of all reactive strategies takes the form of a cube (Fig. 2). The conditions for ZD strategies (5) simplify to the condition , which is a two-dimensional plane in the cube of reactive strategies. This plane has the four corners ALLD, pTFT, ALLC, and the anti-reciprocal strategy . Extortionate strategies are on the edge between pTFT and ALLD; in particular, they all have (extortioners never cooperate after mutual defection). In contrast, generous strategies are on the edge between pTFT and ALLC; in particular, they must have (generous players always cooperate after mutual cooperation).

Fig. 2

ZD strategies in the space of reactive strategies for a repeated public goods game between three players. A memory-one strategy is called reactive, if it only depends on the co-players׳ behavior, such that . The space of reactive strategies is given by the cube with . The set of reactive ZD strategies is a plane connecting the points , , and the anti-reciprocal strategy . Extortioners are on the edge with (extortioners never cooperate after mutual defection), whereas generous strategies are on the edge with (they always cooperate after mutual cooperation).

Evolution of zero-determinant strategies

In the following, we want to explore the role of these various ZD strategies in evolutionary processes. To get an intuitive understanding of the possible transitions, let us first focus on a restricted strategy set. Specifically, we consider the strategies ALLD, ALLC, and pTFT; moreover, we include a particular instance of an extortionate strategy (for which we set the slope s=0.8, as depicted in Fig. 1B), and a particular instance of a generous strategy (also having a slope s=0.8, as depicted Fig. 1D). Using other instances of extortionate or generous strategies would leave the main conclusions unchanged, as described in more detail below. For the evolutionary dynamics, we consider a population with N individuals. Let N, N, N, N, and N denote the number of unconditional defectors, extortioners, pTFT players, generous players, and unconditional cooperators, respectively, such that . In each time step, groups of size are randomly formed (by sampling group members from the population without replacement). Given the composition of the group, we can calculate the payoff of each player using the payoff formula (3). By summing up over all possible group compositions, this yields the expected payoff for each strategy in the population. To model the spread of successful strategies, we consider a pairwise comparison process (Blume, 1993; Szabó and Tőke, 1998; Traulsen et al., 2006; Hilbe et al., 2013a; Stewart and Plotkin, 2013). In each time step, some randomly chosen player is given the chance to imitate the strategy of some other randomly chosen group member. If the focal player׳s expected payoff is , and the role model׳s payoff is , then the focal player adopts the role model׳s strategy with probabilityThe parameter denotes the strength of selection. In the limit , selection is neutral and the imitation probability simplifies to . In the limit of strong selection () the role model is imitated only if its strategy is sufficiently beneficial. In addition to these imitation events, we assume that subjects sometimes explore new strategies: in each time step, a randomly chosen player may switch to another strategy with probability (with all other strategies having the same chance to be chosen). Overall, these assumptions lead to a stochastic selection-mutation process, in which successful strategies have a higher chance to be adopted (Nowak et al., 2004; Imhof and Nowak, 2006; Antal et al., 2009). To explore the role of different strategies for the evolutionary dynamics, we have run simulations for different subsets of ZD strategies, and for two different group sizes (as shown in Fig. 3). When groups are small and the population consists only of defectors and generous players, cooperation cannot emerge when initially rare (Fig. 3A). Instead, the emergence of cooperation is dependent on additional strategies that are able to invade ALLD. For example, extortionate strategies can serve as a catalyst for cooperation: extortioners are able to subvert defectors, and once the fraction of extortioners has surpassed a certain threshold, generous ZD strategies can invade and fixate in the population (Fig. 3B). A similar effect can be observed by adding pTFT to the population, which also promotes the evolution of generosity (Fig. 3C). Compared to pTFT, generous ZD strategies have the advantage that they are less prone to errors, as they are more likely to accept a co-player׳s accidental defection. Adding unconditional cooperators, however, can destabilize populations of generous players (Fig. 3D). ALLC players are able to subvert a generous population by neutral drift, which in turn allows for the re-invasion of defectors. As in the case of the iterated prisoner׳s dilemma, the dynamics of the repeated public goods game may result in cycles between cooperation and defection.

Fig. 3

Evolution of different ZD strategies for two possible group sizes (n=4 or n=8) in a pairwise comparison process. Each panel shows the outcome of a representative simulation, starting from a homogeneous population of defectors. For the upper panels with small group sizes, we observe the following dynamics: (A) when players are only allowed to choose between ALLD and the generous strategy, then generous players cannot invade. (B) Adding the extortionate strategy allows generous players to take over the population. (C) Similarly, also pTFT can act as a catalyst for the emergence of generous strategies. (D) When all five strategies are present, cycles between cooperation and defection occur. (E)–(H) The generous strategy ceases to be stable when the groups become too large. In that case, only defectors and extortioners can spread in the population. Parameters: for the social dilemma, we have used a public goods game with c=1 and r=2; evolutionary parameters were set to and , and population size N=100. For the players׳ strategies we used the parameters in Table 2; for EXT and GEN we have set s=0.8 and .

Larger group sizes further impede the evolution of cooperation: when the group size is above a certain threshold, evolution either settles at a population of defectors, or at a population of extortioners (the lower panels in Fig. 3 depict the case n=8). This effect of group size is also illustrated in Fig. 4, which shows the average abundance of each of the five considered ZD strategies as a function of group size n. Whereas generous strategies are most abundant when , more selfish strategies succeed in large groups.

Fig. 4

The effect of group size on the evolution of ZD strategies. We have simulated the evolutionary process when all five ZD strategies are present in the population, and calculated the average abundance of each strategy over 107 time steps. Generous players are most abundant for small population sizes, whereas larger populations favor the evolution of ALLD and extortionate strategies. Parameters are the same as in Figure 3.

To obtain an analytical understanding for these results, let us calculate under which conditions a mutant ZD strategy can invade into a population of defectors. If the mutant applies a ZD strategy with parameters and , we can use the payoff equation (3) to calculate the mutant׳s payoff in a group of defectorsBecause baseline payoffs satisfy , and because slopes fulfill (Hilbe et al., 2014b), it follows that , i.e., no single mutant has a selective advantage in an ALLD population. In particular, for generous mutants (with and ) we get , and hence they are disfavored when rare. However, two strategy classes are able to invade ALLD by neutral drift: when the mutant either applies an extortionate strategy (with ), or pTFT (with ), then . These calculations confirm that both pTFT and extortionate strategies can act as a catalyst for cooperation, as they are able to subvert a population of defectors irrespective of the size of the group. Similarly, we can also explore the stability of a population of generous players. As expected, ALLC mutants are always able to invade by neutral drift (again irrespective of group size). Moreover, using Eq. (3), it follows that the payoff of a single defector exceeds the residents׳ payoff ifwhere is the slope of the generous strategy. Thus, any given generous strategy can be invaded by ALLD, provided that the group size n is sufficiently large. Equivalently, to be stable against defectors, a generous strategy must not be too generous, . In particular, it follows that the set of stable generous strategies shrinks with the size of the group. Taken together, these results suggest that it becomes increasingly difficult to achieve cooperation in large groups.

Evolution in the space of memory-one strategies

By focusing on the five ZD strategies above, we have gained insights into the possible transitions from defection to cooperation; moreover, it has allowed us to show how overly altruistic strategies (such as ALLC) and large group sizes can lead to the downfall of cooperation. However, the focus on these five particular strategies also comes with a risk. We may have neglected other important strategies, which may have a critical effect on the evolutionary outcomes. In order to assess how general the above results are, let us explore in the following how the dynamics of repeated social dilemmas change when we allow for all possible memory-one strategies. Specifically, we apply the adaptive dynamics approach introduced by Imhof and Nowak (2010); that is, we adapt the previously used evolutionary process as follows. Again, we consider a population of size N that is engaged in a repeated public goods game, starting with a homogeneous population of defectors. When a mutation occurs, the mutant strategy is not restricted to a particular subset of ZD strategies; instead, mutants may adopt any memory-one strategy (i.e., a mutant׳s memory-one strategy is created by drawing random numbers uniformly from the unit interval [0,1]). We assume that mutations are sufficiently rare, such that the mutant strategy either fixates, or goes extinct, before the next mutation occurs (this process may take a long time,see Fudenberg and Imhof, 2006; Wu et al., 2012). As a consequence, the dynamics results in a sequence of strategies (, , , …), where the strategy is the strategy applied by the resident after t mutation events. Given this strategy sequence, we can calculate the sequence of resident payoffs (π0, π1, π2,…), using the payoff algorithm described in Appendix A.1. By analyzing these two sequences for different parameter values n, we can analyze the impact of group size on the evolution of strategies, and on the resulting average payoffs. As shown in Fig. 5A, larger group sizes lead, on average, to lower population payoffs. This is not only in line with our previous results depicted in Fig. 4; it also confirms the results of Boyd (1989), showing that large groups are more likely to end up in selfish states. However, it is worth noting that the previous results were based on the comparison of the ALLD strategy with a handful of other, more cooperative strategies (in Boyd, 1989, defectors were matched against threshold variants of Tit-for-Tat,which only cooperate if at least k of the other players cooperated in the previous round). Fig. 5A shows that this conclusion also holds in the larger (and more general) strategy space of memory-one strategies: larger group sizes impede the evolution of cooperation (which is in line with the simulations presented in Hauert and Schuster, 1997).

Fig. 5

The effect of group size on average payoffs and on the relative abundance of different ZD strategies and win-stay lose-shift. Both graphs show the result of the evolutionary imitation process, in which mutant strategies either reach fixation or go extinct before the next mutant arises. The process was run for 107 mutant strategies. (A) Whereas average payoffs are close to the optimum for small group sizes, payoffs drop in large groups. (B) A strategy is favored by selection if its relative abundance is above the dashed line. For small group sizes, selection favors cooperative strategies like WSLS and GEN. As groups become larger, EXT and ALLD take over and cooperation breaks down. Parameters: efficiency r=2.4, cost of cooperation c=1, population size N=100, strength of selection .

To gain further insights into what drives this downfall of cooperation, we have also explored which strategies were used by the residents over the course of the evolutionary process. To this end, we have applied the method introduced by Hilbe et al. (2013a): to measure the relative importance of a given strategy , we have recorded how often the evolutionary process visits the neighborhood of (as the strategy׳s neighborhood, we have taken the 1% of memory-one strategies that are closest to ). Using this method, we call being favored by selection if the evolutionary process spends more than 1% of the time in this neighborhood (i.e., if the process spends more time in the neighborhood than expected under neutrality). Let us first apply this method to the five ZD strategies considered before. As shown in Fig. 5B, our results reflect the qualitative findings in the previous section. Only in small groups, the generous strategy is favored by selection; as the group size increases, ALLD and the extortionate strategy become increasingly successful. For comparison, we have also explored the evolutionary success of the traditional champion in repeated games, win-stay lose-shift (WSLS, see Nowak and Sigmund, 1993b). WSLS only cooperates if all group members have used the same action in the previous round, i.e., , and otherwise (in Hauert and Schuster, 1997 this strategy is called Pavlov, and in Pinheiro et al., 2014 it is called an All-or-None strategy). In Hilbe et al. (2014b) it is shown that WSLS is a Nash equilibrium if , which is satisfied for the parameters used for the simulations. Indeed, Fig. 5B confirms that WSLS is favored by selection for all considered group sizes, but its relative importance decreases with n. For , the process spends more than 20% of the time in the neighborhood of WSLS, whereas for n=7 the neighborhood is only visited 12% of the time. These results indicate that although WSLS is able to sustain cooperation even in larger groups, evolutionary processes tend to favor ALLD and extortionate strategies instead, which is in line with the downfall of average payoffs as the group size increases.

Performance of ZD strategies against adapting opponents

In the previous two sections, we have considered a traditional setup to study evolutionary processes. We have assumed that all players come from the same population, and they all are equally likely to change their strategies over time. However, for the iterated prisoner׳s dilemma it has been suggested that extortioners, and more generally ZD strategies with a positive slope, are particularly successful when they are stubborn (Press and Dyson, 2012; Hilbe et al., 2013a; Chen and Zinger, 2014): they should refrain from switching to other strategies that may be more profitable in the short run, in order to gain a long-run advantage. When a player with a fixed strategy is paired with adapting co-players, the nature of the interaction changes. Instead of a symmetric and simultaneous game, the interaction now takes the form of an asymmetric and sequential game (Bergstrom and Lachmann, 2003; Damore and Gore, 2011): by choosing a fixed strategy, the stubborn player moves first, whereas the adapting players have a chance to evolve, and to move towards a best reply over time. To investigate such a setup in the context of multiplayer dilemmas, let us modify the evolutionary process as follows: instead of considering a large population of players, let us consider a fixed group of size n that is engaged in a sequence of repeated public goods game. One of the players, called the focal player, is assumed to take a fixed ZD strategy. The other group members are allowed to change their strategies from one repeated game to the next. Specifically, we assume that in each time step, one of the adapting players is chosen at random. This player is then given the chance to experiment with a different strategy. When the payoff of the old strategy is , whereas the new strategy yields , we assume that the player switches to the new strategy with probability ρ as specified in Eq. (6). Overall, these assumptions result in an evolutionary process in which one player sticks to his strategy, whereas the other players can change to better strategies, given the current composition of the group. In Fig. 6, we show the outcome of such an evolutionary process under the assumption that all players are restricted to the five ZD strategies used before. Independent of the fixed strategy of the focal player, all simulations have in common that the focal player׳s payoff decreases with group size. Nevertheless, the strategy of the focal player still has a considerable impact on the resulting group dynamics. For small group sizes, the simulations confirm that focal players with a higher slope value s tend to gain higher payoffs (see Fig. 6, upper panels). The co-players of a focal ALLD or ALLC player often adapt towards selfish strategies, whereas the co-players of a focal EXT, pTFT, or GEN player tend to adopt cooperative strategies (as depicted in Fig. 6, lower panels). Only as the group size becomes large, this positive effect of the focal player׳s strategy on the group dynamics disappears. For example, in groups of size n=8, the strategy distribution of the remaining group members is largely independent of the fixed strategy of the focal individual. The only exception occurs when the focal player is unconditionally altruistic (in which the remaining group members favor ALLD, independent of the group size). These simulations confirm that stubborn players are most successful when they apply ZD strategies with a high slope value (the most successful strategy in Fig. 6 is pTFT, which is also the strategy that has the maximum value for s). Higher slope values correspond to players that are more conditionally cooperative. Thus, when players aim to have a positive impact on the group dynamics, they need to apply reciprocal strategies.

Fig. 6

Asymmetric evolution for a focal player using a fixed ZD strategy, whereas the other group members may switch between different ZD strategies. We consider five different cases, depending on whether the focal player applies (A) ALLD, (B) an extortionate strategy, (C) pTFT, (D) a generous strategy, and (E) ALLC. The upper panel shows the resulting average payoff for the focal player, whereas the lower panels give the strategy abundance of the other group members, averaged over a simulation with 105 adaptation events. Independent of the focal player׳s strategy, payoffs decrease with group size. However, focal players with a conditionally cooperative strategy (as in B–D) tend to have higher payoffs than unconditional focal players (as in A and E). The remaining parameters are the same as in Fig. 3: for the public goods game we have used c=1 and r=2; evolutionary parameters were set to (strong selection) and (rare mutations).

Discussion

Repeated interactions provide an important explanation for the evolution of cooperation: individuals cooperate because they can expect to be rewarded in future (Trivers, 1971; Axelrod and Hamilton, 1981; Doebeli and Hauert, 2005; Nowak, 2006; Sigmund, 2010). The framework of repeated games does not necessarily require sophisticated mental capacities. Several experiments suggest that various animal species are able to use reciprocal strategies (Wilkinson, 1984; Milinski, 1987; Stephens et al., 2002; St. Pierre et al., 2009), and also theory suggests that full cooperation can already be achieved using simple strategies that only refer to the outcome of the last round. Much research in the past has been devoted to explore conditionally cooperative strategies in pairwise interactions. There has been considerably less effort to understand the evolution of reciprocity in larger groups (some exceptions include Boyd, 1989; Hauert and Schuster, 1997; Kurokawa and Ihara, 2009; Grujic et al., 2012; Van Segbroeck et al., 2012). This is surprising, because using the theory of ZD strategies, most of the successful strategies for the repeated prisoner׳s dilemma can be naturally generalized to other social dilemmas, with an arbitrary number of players (Hilbe et al., 2014b; Pan et al., 2014). In the public goods game, for example, the set of ZD strategies includes ALLD and ALLC, but also reciprocal strategies like proportional Tit-for-Tat (pTFT), extortionate and generous strategies. Herein, we have explored how these strategies fare from an evolutionary perspective. Our simulations suggest that the evolutionary success of ZD strategies critically depends on the size of the group. In smaller groups, the dynamics of strategies is comparable to the dynamics in the prisoner׳s dilemma (Nowak and Sigmund, 1992; Imhof and Nowak, 2010; Hilbe et al., 2013a): selfish populations can be invaded by extortioners or pTFT, which in turn can give rise to the evolution of generous ZD strategies. Generous strategies, however, can be subverted by unconditional cooperators, which can lead back to populations of defectors. These evolutionary cycles collapse when groups become too large. In large groups, evolution favors selfish strategies instead, resulting in a sharp decrease in population payoffs. To obtain these results, we have sometimes restricted the strategy space, by focusing on players using ZD strategies only. This focus has allowed us to calculate payoffs efficiently. In general, the time to compute payoffs in multiplayer games increases exponentially in the size of the group (which makes it unfeasible to simulate games with more than 5–10 players). But for ZD strategies, payoffs can be computed directly, using the formula in Eq. (3). The focus on ZD strategies, however, may come at the risk of neglecting other important strategies, such as win-stay lose-shift (WSLS). Nevertheless, our main qualitative results remain unchanged even when we consider the more general space of memory-one strategies (as shown in Fig. 5). Overall, we have observed that repeated interactions can only help sustaining cooperation when groups are sufficiently small. The downfall of cooperation in large groups can be prevented if large-scale endeavors have an efficiency advantage: Pinheiro et al. (2014) observe that WSLS remains successful if is kept constant (and therefore r needs to increase as n becomes large). However, for many examples (such as the management of common resources) such an efficiency advantage seems unfeasible. For such cases, our results suggest that repeated interactions alone are no longer able to sustain cooperation. Yet, human societies are remarkably successful in maintaining cooperative norms even in groups of considerable size (Fehr and Fischbacher, 2003), suggesting that large-scale cooperation is based on additional mechanisms. Three mechanisms seem to be especially relevant: individuals can maintain cooperation if there are additional pairwise incentives to cooperate (Rand et al., 2009; Rockenbach and Milinski, 2006); they can increase their strategic power by coordinating their actions and by forming alliances (Hilbe et al., 2014b); or they can implement central institutions that enforce mutual cooperation (Sigmund et al., 2010, 2011; Sasaki et al., 2012; Hilbe et al., 2014b; Schoenmakers et al., 2014). Interestingly, each of these additional mechanisms is costly, and thus requires an evolutionary explanation on its own. In particular, these additional mechanisms are only likely to evolve when other, more efficient ways to establish cooperation fail. Herein, we have shown such a failure to establish cooperation when repeated interactions take place in large groups.

56 in total

Evolutionary performance of zero-determinant strategies in multiplayer games.

Introduction

Model

Iterated multiplayer dilemmas and memory-one strategies

Zero-determinant strategies

Evolution of zero-determinant strategies

Evolution in the space of memory-one strategies

Performance of ZD strategies against adapting opponents

Discussion

1. Evolutionary cycles of cooperation and defection.

2. From extortion to generosity, evolution in the Iterated Prisoner's Dilemma.

3. Evolution of extortion in structured populations.

4. The evolution of cooperation.

5. Long-term social bonds promote cooperation in the iterated Prisoner's Dilemma.

6. Cooperation and control in multiplayer social dilemmas.

7. Evolution of all-or-none strategies in repeated public goods dilemmas.

8. A comparative analysis of spatial Prisoner's Dilemma experiments: conditional cooperation and payoff irrelevance.

9. Extortion subdues human players but is finally punished in the prisoner's dilemma.

10. Defection and extortion as unexpected catalysts of unconditional cooperation in structured populations.

1. Evolutionary consequences of behavioral diversity.

2. Autocratic strategies for iterated games with arbitrary action spaces.

3. Zero-Determinant Strategies in Iterated Public Goods Game.

4. Comparing reactive and memory-one strategies of direct reciprocity.

5. Small groups and long memories promote cooperation.

6. Asymmetric Power Boosts Extortion in an Economic Experiment.

7. Updating Darwin: Information and entropy drive the evolution of life.

8. Towards an Iterated Game Model with Multiple Adversaries in Smart-World Systems.

9. Evolution of Conformity in Social Dilemmas.

10. Humans choose representatives who enforce cooperation in social dilemmas through extortion.