Minjae Kim, Jung-Kyoo Choi, Seung Ki Baek.
Abstract
Evolutionary game theory assumes that players replicate a highly scored player's strategy through genetic inheritance. However, when learning occurs culturally, it is often difficult to recognize someone's strategy just by observing the behaviour. In this work, we consider players with memory-one stochastic strategies in the iterated Prisoner's Dilemma, with an assumption that they cannot directly access each other's strategy but only observe the actual moves for a certain number of rounds. Based on the observation, the observer has to infer the resident strategy in a Bayesian way and choose his or her own strategy accordingly. By examining the best-response relations, we argue that players can escape from full defection into a cooperative equilibrium supported by Win-Stay-Lose-Shift in a self-confirming manner, provided that the cost of cooperation is low and the observational learning supplies sufficiently large uncertainty.
Keywords: Bayesian inference; Win-Stay-Lose-Shift; evolution of cooperation; observational learning; reciprocity
Year: 2021 PMID: 34187189 PMCID: PMC8242928 DOI: 10.1098/rspb.2021.1021
Source DB: PubMed Journal: Proc Biol Sci ISSN: 0962-8452 Impact factor: 5.349
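The Bayesian inference step described in the abstract can be illustrated with a minimal sketch. It is restricted to the 16 pure memory-one strategies and makes several assumptions that are not stated in the text above: strategy d_k is read as a four-digit binary number giving the prescribed move after CC, CD, DC and DD (1 = cooperate), each prescribed move is flipped with probability `eps`, and the observer records hypothetical `(previous state, move)` pairs. The sketch conditions only on the moves and ignores whatever information is carried by how often each state occurs, so it is a simplification rather than the authors' procedure.

```python
from typing import List, Optional, Sequence, Tuple

# States are written from the observed player's point of view:
# (own previous move, co-player's previous move).
STATES = ("CC", "CD", "DC", "DD")


def prescribed_move(k: int, state: str) -> str:
    """Move prescribed by pure strategy d_k after `state`.

    Assumption: the binary digits of k are ordered CC, CD, DC, DD and
    a digit of 1 means cooperation, so d_9 = 1001 is Win-Stay-Lose-Shift.
    """
    bit = (k >> (3 - STATES.index(state))) & 1
    return "C" if bit else "D"


def posterior(
    observations: Sequence[Tuple[str, str]],
    eps: float,
    prior: Optional[Sequence[float]] = None,
) -> List[float]:
    """Posterior over d_0..d_15 given observed (state, move) pairs."""
    if prior is None:
        prior = [1.0 / 16] * 16  # flat prior (an assumption)
    weights = []
    for k in range(16):
        likelihood = 1.0
        for state, move in observations:
            # A move agrees with the prescription unless an error occurred.
            agrees = prescribed_move(k, state) == move
            likelihood *= (1.0 - eps) if agrees else eps
        weights.append(prior[k] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]
```

For instance, if every observation is defection after mutual defection, `posterior([("DD", "D")] * 20, eps=0.01)` spreads its mass almost evenly over the strategies whose last digit is 0 and cannot tell them apart, which is the kind of uncertainty the abstract refers to.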
Table 1. Best response among M1 pure strategies. Against each strategy in the first column, we obtain the best response (the second column), and the resulting average payoff (equation (2.5)) earned by the best response is given as a power series of ε in the third column. In the second column, we have placed a dagger next to a strategy when it is the best response to itself.
| opponent strategy | best response | payoff of the best response to the opponent strategy | Misc. |
|---|---|---|---|
| | | (1 − … | AllD |
| | | 1/2 − (1/4 + … | |
| | | (1 − … | |
| | | 1/2 − … | |
| | | 1/3 + (2/9 − … | |
| | | 1 − (2 + … | |
| | | 1 − 3(1 + … | |
| | | 1 − (2 + … | |
| | | | GT1 |
| | | | WSLS |
| | | (1 − … | TFT |
| | | 1/2 + … | |
| | | 1 − (1 + … | |
| | | 1 − 2(1 + … | |
| | | 1 − (1 + … | AllC |
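A best-response search of the kind summarized in this table can be sketched numerically. The code below is an illustration under stated assumptions, not the derivation behind equation (2.5): the stage game is taken to be a donation game with benefit 1 and cost `c`, strategy d_k is read as binary digits ordered CC, CD, DC, DD (1 = cooperate), each prescribed move is flipped with probability `eps`, and the average payoff is evaluated from the stationary distribution of the resulting four-state Markov chain. The function names are hypothetical.

```python
from typing import List

SWAP = (0, 2, 1, 3)  # the co-player sees CD as DC and vice versa


def coop_prob(k: int, state: int, eps: float) -> float:
    """Probability that d_k cooperates after state index 0..3 (CC, CD, DC, DD)."""
    bit = (k >> (3 - state)) & 1
    return 1.0 - eps if bit else eps


def stationary(k_self: int, k_opp: int, eps: float) -> List[float]:
    """Stationary distribution over (CC, CD, DC, DD) from the focal player's view."""
    P = []
    for s in range(4):
        p = coop_prob(k_self, s, eps)
        q = coop_prob(k_opp, SWAP[s], eps)
        P.append([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    # Repeated squaring; rows are renormalized to keep rounding drift in check.
    for _ in range(40):
        P = [[sum(P[i][m] * P[m][j] for m in range(4)) for j in range(4)]
             for i in range(4)]
        P = [[x / sum(row) for x in row] for row in P]
    return P[0]


def payoff(k_self: int, k_opp: int, c: float, eps: float) -> float:
    """Long-run average payoff of d_{k_self} against d_{k_opp} (assumed donation game)."""
    v = stationary(k_self, k_opp, eps)
    return v[0] * (1.0 - c) + v[1] * (-c) + v[2] * 1.0


def best_response(k_opp: int, c: float, eps: float) -> int:
    """Index of the pure memory-one strategy with the highest payoff against d_{k_opp}."""
    return max(range(16), key=lambda k: payoff(k, k_opp, c, eps))
```

Under these assumptions, `best_response(9, c=0.2, eps=0.01)` can be used to check whether Win-Stay-Lose-Shift (d_9) is a best response to itself at low cost, and raising `c` shows how the answer changes.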
Figure 1. Graphical representation of best-response relations in table 1. A best-response relation between two strategies is drawn as an arrow. The blue node (Win-Stay-Lose-Shift) marks an efficient Nash equilibrium (NE) with 1 − v ∼ O(ε), whereas the red nodes (Always-Defect and M1 Grim Trigger) mark inefficient ones, as shown in table 2. (Online version in colour.)
Table 2. Stationary probability distribution, where we have retained only the leading-order term in the ε-expansion for each v. When we describe a strategy in binary, the boldface digits are the ones that are frequently observed with v ∼ O(1) and thus readily identifiable as long as M ≫ 1. In this table, the eight strategies in Category I have three or four such digits, so if the population is using one of these strategies, Alice can tell which one is being played after M (≫ 1) observations. As for Category II, the member strategies d1 and d7 would be indistinguishable if M ≪ ε⁻¹ because they differ at their non-boldface digits. Still, Alice can find the best response d0 which is common to both of them (table 1). In Category III, each member strategy has just one boldface digit, so the strategies as well as the best responses can be identified only if M ≫ ε⁻¹.
| category | strategy | ||||
|---|---|---|---|---|---|
| I | |||||
| II | |||||
| III | 1 | ||||
| 2 | 1 | ||||
| 1 | |||||
| 1 | 2 | ||||
| 1 | |||||
| 1 |
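The role of the two regimes M ≫ 1 and M ≪ ε⁻¹ in the caption of this table can be made explicit with a rough estimate. As a simplifying assumption that is not part of the original text, treat the M observed rounds as independent draws from the stationary distribution (successive rounds are in fact correlated). The probability that a state with stationary probability v never shows up is then approximately:

```latex
\Pr(\text{state never observed in } M \text{ rounds})
  \approx (1 - v)^{M} \approx e^{-Mv},
\qquad
e^{-Mv} \approx
\begin{cases}
  0, & v \sim O(1),\ M \gg 1, \\
  1 - O(M\varepsilon) \approx 1, & v \sim O(\varepsilon),\ M \ll \varepsilon^{-1}.
\end{cases}
```

Under this approximation, a digit attached to a state with v ∼ O(1) is almost surely seen, whereas a digit attached to a state with v ∼ O(ε) is almost surely missed unless M ≫ ε⁻¹.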
Figure 2. Best-looking responses to maximize the expected payoff under uncertainty in observation, when 1 ≪ M ≪ ε⁻¹. Compared with figure 1, the first difference is that Alice uses equation (2.7) against d0, d6 and d8. In addition, she will use equation (2.8) against d9, d14 and d15. (Online version in colour.)
Figure 3. Effect of the prior on the observer’s choice. A point in the triangle represents three fractions, which sum up to one, and its distance to an edge is proportional to the fraction of the strategy at the opposite vertex [21]. (a) When the observer sees almost nothing but defection, the prior takes the form of (f0, f6, f8), for which we can find the strategy that gives the best expected payoff as written in each region. When c is low, d9 (WSLS) gives the highest expected payoff over most of the prior space. (b) Even when the cost increases to c = 0.9, the observer should choose WSLS if the prior contains a sufficiently high fraction of d6. (c) If the observer sees cooperation almost all the time, the prior can be expressed as (f9, f14, f15). If c is low, WSLS can be the observer’s choice when f9 is high enough. (d) The region of WSLS disappears as c exceeds 1/2, and the only possible choice is between d1 and d0 (AllD).
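The triangle representation described in this caption can be sketched as a mapping from three fractions to a point in an equilateral triangle. The vertex placement below (A at the origin, B at (1, 0), C at the top) and the function names are assumptions chosen for illustration; the original figure may orient the vertices differently.

```python
import math
from typing import Tuple

HEIGHT = math.sqrt(3) / 2  # height of a unit equilateral triangle


def ternary_to_xy(f_a: float, f_b: float, f_c: float) -> Tuple[float, float]:
    """Map fractions (f_a, f_b, f_c) to a point in the triangle.

    Assumed vertices: A = (0, 0), B = (1, 0), C = (1/2, sqrt(3)/2).
    The fractions are normalized so that they sum to one.
    """
    total = f_a + f_b + f_c
    f_b, f_c = f_b / total, f_c / total
    return f_b + 0.5 * f_c, HEIGHT * f_c


def distance_to_edge_opposite_a(x: float, y: float) -> float:
    """Distance from (x, y) to edge BC, the edge opposite vertex A."""
    # Edge BC lies on the line sqrt(3) * x + y - sqrt(3) = 0.
    return abs(math.sqrt(3) * x + y - math.sqrt(3)) / 2
```

With this placement, the height `y` of a point is its distance to edge AB and equals `HEIGHT * f_c`, and `distance_to_edge_opposite_a` returns `HEIGHT * f_a`, so each distance is proportional to the fraction at the opposite vertex, as the caption states.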