Yohsuke Murase, Seung Ki Baek.
Abstract
Direct reciprocity is one of the key mechanisms accounting for cooperation in our social life. According to recent understanding, most classical strategies for direct reciprocity fall into one of two classes, 'partners' or 'rivals'. A 'partner' is a generous strategy achieving mutual cooperation, and a 'rival' never lets the co-player become better off. They have different working conditions: for example, partners perform well in a large population, whereas rivals perform well in head-to-head matches. By means of exhaustive enumeration, we demonstrate the existence of strategies that act as both partners and rivals. Among them, we focus on a human-interpretable strategy, named 'CAPRI' after its five characteristic ingredients, i.e., cooperate, accept, punish, recover, and defect otherwise. Our evolutionary simulation shows excellent performance of CAPRI in a broad range of environmental conditions.
Year: 2020 PMID: 33037241 PMCID: PMC7547665 DOI: 10.1038/s41598-020-73855-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Description of well-known strategies in the iterated PD game. Whenever possible, each strategy is represented as a tuple of five probabilities, i.e., (p_0, p_R, p_S, p_T, p_P), where p_0 is the probability of cooperating in the first round, and p_X (X = R, S, T, or P) is the probability of cooperating after obtaining payoff X in the previous round (see Eq. 1). Here, a zero-determinant (ZD) strategy has two parameters: one is positive, and the other lies in the unit interval [9,13,19].
| Strategy | Description |
|---|---|
| AllC | (1, 1, 1, 1, 1) |
| AllD | (0, 0, 0, 0, 0) |
| Tit-for-tat (TFT) | (1, 1, 0, 1, 0) |
| Generous TFT | (1, 1, |
| Tit-for-two-tats (TF2T) | Defect if the co-player defected in the previous two rounds. |
| Win-Stay-Lose-Shift (WSLS) | (1, 1, 0, 0, 1) |
| generous ZD | |
| extortionate ZD | |
| Trigger | Defect if defection has ever been observed. |
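The tuple notation in the table can be read as a small program. The following is a minimal sketch, not the authors' code: it assumes the tuple order is (first round, after R, after S, after T, after P), which is consistent with the TFT and WSLS rows, and it uses hypothetical payoff values T = 5, R = 3, P = 1, S = 0 that do not come from the source. The names `play`, `PAYOFF`, and `STRATEGIES` are introduced here for illustration only.

```python
import random

# Hypothetical payoff values (an assumption, not from the source): T > R > P > S.
PAYOFF = {"R": 3.0, "S": 0.0, "T": 5.0, "P": 1.0}

# Payoff label a player receives for (own action, co-player's action).
OUTCOME = {("c", "c"): "R", ("c", "d"): "S", ("d", "c"): "T", ("d", "d"): "P"}

# Position of each payoff label inside the five-tuple (index 0 is the first round).
INDEX = {"R": 1, "S": 2, "T": 3, "P": 4}

# Memory-one rows copied from the table.
STRATEGIES = {
    "AllC": (1, 1, 1, 1, 1),
    "AllD": (0, 0, 0, 0, 0),
    "TFT": (1, 1, 0, 1, 0),
    "WSLS": (1, 1, 0, 0, 1),
}


def play(strategy_a, strategy_b, rounds=100, rng=None):
    """Return the average per-round payoffs (a, b) of two memory-one strategies."""
    rng = rng or random.Random(0)
    last_a = last_b = None  # payoff label each player received in the previous round
    total_a = total_b = 0.0
    for _ in range(rounds):
        # Look up the cooperation probability for the first round or the last outcome.
        p_a = strategy_a[0] if last_a is None else strategy_a[INDEX[last_a]]
        p_b = strategy_b[0] if last_b is None else strategy_b[INDEX[last_b]]
        act_a = "c" if rng.random() < p_a else "d"
        act_b = "c" if rng.random() < p_b else "d"
        last_a = OUTCOME[(act_a, act_b)]
        last_b = OUTCOME[(act_b, act_a)]
        total_a += PAYOFF[last_a]
        total_b += PAYOFF[last_b]
    return total_a / rounds, total_b / rounds


# Example: TFT is exploited once by AllD, then both defect for the remaining rounds.
tft_avg, alld_avg = play(STRATEGIES["TFT"], STRATEGIES["AllD"], rounds=10)
print(tft_avg, alld_avg)
```

Strategies such as TF2T and Trigger need more than one round of memory, so they do not fit this five-tuple representation and are left out of the sketch.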
Figure 1. (Left) A schematic diagram of the strategy space. Strategies that tend to cooperate (defect) are shown on the left (right). The blue ellipse represents a set of efficient strategies, which are cooperative enough to sustain mutual cooperation, and its subset of partner strategies is denoted by the dashed blue curve. On the other hand, the red ellipse represents a set of defensible strategies, which often defect to defend themselves from malicious co-players. In general, their intersection is small. When , for instance, the sizes of efficient and defensible sets are 7639 and 2144, respectively, whereas the intersection contains only eight strategies. (Right) The diamond depicts the region of possible average payoffs for Alice and Bob. The blue triangle shows the feasibility region when Alice uses a defensible strategy. If Alice and Bob both use the same strategy satisfying efficiency, they will reach (R, R) (the blue dot).
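Whether a strategy reaches (R, R) against itself can be probed numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it assumes each intended action is flipped with a small error rate `e`, treats the game between two memory-one strategies as a four-state Markov chain, and uses hypothetical payoffs T = 5, R = 3, P = 1, S = 0. The helper names (`noisy`, `transition_matrix`, `long_run_payoffs`) are introduced here. Strategies are five-tuples ordered as (first round, after R, after S, after T, after P); the first entry is ignored because the long-run average does not depend on it when e > 0.

```python
# Hypothetical payoff values (an assumption, not from the source).
R, S, T, P = 3.0, 0.0, 5.0, 1.0

TFT = (1, 1, 0, 1, 0)
WSLS = (1, 1, 0, 0, 1)


def noisy(p, e):
    """Probability of actually cooperating when actions are flipped with rate e."""
    return (1 - e) * p + e * (1 - p)


def transition_matrix(strat_a, strat_b, e):
    """Rows and columns are states (Alice, Bob) in the order cc, cd, dc, dd."""
    resp_a, resp_b = strat_a[1:], strat_b[1:]  # drop the first-round entry
    # Alice's payoff label in states cc, cd, dc, dd is R, S, T, P (indices 0..3);
    # Bob sees the mirrored state, so cd and dc swap for him.
    bob_view = [0, 2, 1, 3]
    matrix = []
    for state in range(4):
        pa = noisy(resp_a[state], e)
        pb = noisy(resp_b[bob_view[state]], e)
        matrix.append([pa * pb, pa * (1 - pb), (1 - pa) * pb, (1 - pa) * (1 - pb)])
    return matrix


def long_run_payoffs(strat_a, strat_b, e=0.01, steps=20000):
    """Average payoffs (Alice, Bob) from the stationary distribution (power iteration)."""
    matrix = transition_matrix(strat_a, strat_b, e)
    dist = [0.25] * 4
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    alice = sum(d * v for d, v in zip(dist, (R, S, T, P)))
    bob = sum(d * v for d, v in zip(dist, (R, T, S, P)))
    return alice, bob


print("WSLS vs WSLS:", long_run_payoffs(WSLS, WSLS))
print("TFT vs TFT:", long_run_payoffs(TFT, TFT))
```

Under these assumptions, WSLS against itself stays close to R, whereas TFT against itself spends equal time in all four states once errors occur, so its average payoff falls well below R.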
Recovery paths to mutual cooperation for the memory-three successful strategies. Only the five most common patterns are shown in this table.
| Action sequence | # of strategies |
|---|---|
The upper and lower rows represent the sequences of actions taken by Alice and Bob, respectively, when Bob defected from mutual cooperation by error. The right column shows the number of strategies having each pattern, as well as its fraction with respect to the total number of successful strategies.
Action table of CAPRI.
The superscript on the upper left corner of each element indicates which rule is involved.
Figure 2. (a) Automaton representation of CAPRI. Its prescribed actions are denoted by the node colours (blue for c and red for d). The labels on the edges indicate the players’ actions. The transition caused by erroneous defection at mutual cooperation (‘0’) is depicted by an orange dashed arrow. (b) Distribution of payoffs when Alice’s strategy is CAPRI (left) or TFT-ATFT (right), whereas Bob adopts one of the probabilistic memory-three strategies, chosen uniformly at random. The elementary payoffs are .
Figure 3. (a) Abundance of memory-one partners, rivals, and the other strategies. We consider a simplified version of the PD game, parametrized by b, the benefit to the co-player when a player cooperates at a unit cost. In terms of the elementary payoffs, this corresponds to R = b - 1, S = -1, T = b, and P = 0. The Moran process is simulated in a population of size N, with the product of the selection strength and N fixed at 10. Three parameters (benefit-to-cost ratio b, population size N, and error rate e) are varied one by one[3]. Their default values are unless otherwise stated. We also show the simulation results with (b) TFT-ATFT, (c) CAPRI, and (d) both TFT-ATFT and CAPRI, introduced with probability . Results are averaged over 10 Monte Carlo runs.
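The simplified game and the role of selection strength can be illustrated with a short calculation. The sketch below is not the simulation used for the figure. It builds the payoffs of a game with benefit b and unit cost, then evaluates a textbook fixation-probability formula for a single mutant in a Moran process with fitness exp(sigma * payoff); that fitness form, the symbol `sigma`, and the function names are assumptions made here. The example values b = 3 and N = 50 are also hypothetical, and one-shot payoffs stand in for long-term payoffs, which is only exact for unconditional strategies such as AllC and AllD without errors.

```python
import math


def donation_payoffs(b, c=1.0):
    """Elementary payoffs (R, S, T, P) when cooperation costs c and gives the co-player b."""
    return b - c, -c, b, 0.0


def fixation_probability(a_mm, a_mr, a_rm, a_rr, n, sigma):
    """Fixation probability of one mutant in a resident population of size n.

    a_xy is the payoff of an x-player against a y-player (m = mutant, r = resident).
    Fitness is assumed to be exp(sigma * payoff), and self-interaction is excluded.
    """
    total = 1.0
    log_product = 0.0
    for j in range(1, n):  # j = current number of mutants
        pi_m = ((j - 1) * a_mm + (n - j) * a_mr) / (n - 1)
        pi_r = (j * a_rm + (n - j - 1) * a_rr) / (n - 1)
        # Accumulate the log of the running product of resident/mutant fitness ratios.
        log_product -= sigma * (pi_m - pi_r)
        total += math.exp(log_product)
    return 1.0 / total


# Hypothetical example values: b = 3, N = 50, with N * sigma = 10 as in the caption.
R, S, T, P = donation_payoffs(3.0)
N = 50
sigma = 10 / N
alld_into_allc = fixation_probability(P, T, S, R, N, sigma)
allc_into_alld = fixation_probability(R, S, T, P, N, sigma)
print(alld_into_allc, allc_into_alld, 1 / N)
```

Comparing each value with the neutral baseline 1/N shows the direction of selection: a value above 1/N means the mutant is favoured, and a value below 1/N means it is disfavoured.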