Neural network models of learning and categorization in multigame experiments.

Davide Marchiori, Massimo Warglien.

Abstract

Previous research has shown that regret-driven neural networks predict behavior in repeated completely mixed games remarkably well, substantially matching the performance of the most accurate established models of learning. This result prompts the question of what the added value of modeling learning through neural networks is. We submit that this modeling approach allows for models that are able to distinguish among, and respond differently to, different payoff structures. Moreover, the process of categorizing a game is implicitly carried out by these models, without the need for any external, explicit theory of similarity between games. To validate our claims, we designed and ran two multigame experiments in which subjects faced, in random sequence, different instances of two completely mixed 2 × 2 games. We then tested two regret-driven neural network models on our experimental data, and compared their performance with that of other established models of learning and with Nash equilibrium.


Keywords:  categorization; cross-game learning; learning; mixed strategy equilibrium; neural networks; regret; repeated games

Year:  2011        PMID: 22207832      PMCID: PMC3246315          DOI: 10.3389/fnins.2011.00139

Source DB:  PubMed          Journal:  Front Neurosci        ISSN: 1662-453X            Impact factor:   4.677


Introduction

In everyday life, interactive as well as individual decision problems very rarely repeat themselves identically over time; rather, the experience on which most human learning is based comes from the continuous encounter of different instances of different decision tasks. The current paper proposes an experimental study in which subjects faced different instances of two interactive decision problems (games), taking a step forward in the realism of the strategic situations simulated in the lab. Specifically, subjects played in sequence different completely mixed games, each obtained by multiplying the payoffs of one of two archetypal games by a randomly drawn constant. In each sequence, the perturbed-payoff games of the two types were randomly shuffled. Thus, at each trial, subjects’ task was twofold: recognize the type of the current game and act in accordance with this categorization.
In spite of its evident economic relevance, the topic of human interactive learning in mutating strategic settings has so far received little attention, from both an experimental and a modeling perspective. One important stream of literature on this topic includes studies in which the experimental design is recognizably divided into two parts, with the repeated play of a stage game followed by the repeated play of another one. The main goal of these studies is to assess the effects of learning spillovers (or transfer) from the first to the second part of the experiment (as in Kagel, 1995; Knez and Camerer, 2000; Devetag, 2005), also conditional on different environmental and framing conditions (as in Cooper and Kagel, 2003, 2008). In a different experimental paradigm, Rankin et al. (2000) propose a design in which players faced sequences of similar but not identical stag-hunt games, with the goal of evaluating the basins of attraction of the risk- and payoff-dominant strategies in the game space. Our experimental design differs from those illustrated above in two key features. First, subjects played different instances of two different games; second, the instances of the two games occurred in random order, without inducing any evident partition in the experiment structure. At the beginning of our experiments, subjects were only told that they would face a sequence of interactive decision problems.
From the modeling perspective, a similarity-based decision process was first formalized in “Case-Based Decision Theory” (Gilboa and Schmeidler, 1995), according to which decisions are made based on the consequences of actions taken in similar past situations. The case-based approach was first applied to game theory with the “fictitious play by cases” model proposed by LiCalzi (1995). This model addresses the situation in which players sequentially play different games, and play in the current game is only affected by experience with past similar games. In this vein, Sgroi and Zizzo (2007, 2009) explore neural networks’ capability of learning game-playing rules and of generalizing them to never previously encountered games. The authors show that back-propagation neural networks can learn to play Nash pure strategies, and use these skills when facing new games with a success rate close to that observed in experiments with human subjects.
Marchiori and Warglien (2008) have shown that, in repeatedly played completely mixed games, reinforcement learning models have limited predictive power, and that the best predictors, i.e., a fictitious play model and a neural network fed back by a measure of regret, have substantially the same accuracy. The current paper extends this research and shows that the added value of modeling learning by means of neural networks is the capacity to capture subjects’ sensitivity to dynamic changes in the payoff structure. Specifically, we introduce a variant of the zero-parameter Perceptron-Based (PB0) model, which we call SOFTMAX-PB0, test these two neural network models on the data from our multigame experiments, and compare their performance with that of other established learning models and Nash equilibrium.

The Multigame Experiments

The current paper proposes two multigame experiments whose goal is to improve our understanding of the processes of categorization in games. Eight groups of eight subjects each participated in the experiments, and each group played a different sequence of 120 games (see Table A3 in Appendix). Within each group, half of the subjects were assigned the role of row player and the others that of column player; at each round, subjects assigned to different roles were randomly and anonymously paired. At the end of each round, subjects were provided with feedback about their own and their opponents’ actions and payoffs.
Table A3

Two of the game sequences played in Experiments 1 and 2. For each game (1–120), the table lists the payoffs of the four strategy profiles (U, L), (U, R), (D, L), and (D, R) in Sequence 1 of Experiment 1 and Sequence 1 of Experiment 2.
The experimental design is summarized in Table 1.
Table 1

The two pairs of completely mixed games.

Experiment 1: Game A and Game B.

Game A                                   Game B
              Player 2                                 Player 2
              L         R                              L         R
Player 1  U   17, 5     16, 6            Player 1  U   5, 17     2, 20
          D   8, 14     17, 5                      D   4, 18     11, 11
Nash Eq.: P(U) = 0.9, P(L) = 0.1         Nash Eq.: P(U) = 0.7, P(L) = 0.9

Experiment 2: Game A and Game C.

Game A                                   Game C
              Player 2                                 Player 2
              L         R                              L         R
Player 1  U   17, 5     16, 6            Player 1  U   17, 5     15, 7
          D   8, 14     17, 5                      D   15, 7     18, 4
Nash Eq.: P(U) = 0.9, P(L) = 0.1         Nash Eq.: P(U) = 0.6, P(L) = 0.6

Experiment 1

Four groups of subjects played four game sequences built starting from two 2 × 2 constant-sum games (henceforth game A and game B; see Table 1). Game A and B payoffs were chosen in such a way that the equilibrium probabilities for one player were not very different [respectively, P(U) = 0.9 and 0.7], whereas the other player was supposed to reverse his/her strategy [respectively, P(L) = 0.1 and 0.9]. Moreover, to obtain a balanced experimental design, the payoffs in each cell of the two games were chosen to sum to the same constant. To build each sequence, 60 “type A” games were obtained by multiplying game A’s payoffs by 60 randomly drawn constants (normally distributed with mean 10 and SD 4). The same procedure was used to obtain 60 “type B” games. Type A and B games were then shuffled in such a way that each block of 10 trials contained five type A and five type B games in random order. Thus, in each block of 10 trials, subjects faced the same number of type A and type B games.
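To make the construction concrete, the following Python sketch generates one such sequence under the stated parameters. The integer rounding of the perturbed payoffs and the treatment of the (rare) negative draws from N(10, 4) are our assumptions; these details are not specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Archetypal games from Table 1: payoffs[row action][column action] = (row, col).
game_A = np.array([[[17, 5], [16, 6]],
                   [[8, 14], [17, 5]]])
game_B = np.array([[[5, 17], [2, 20]],
                   [[4, 18], [11, 11]]])

def perturbed_instances(game, n=60):
    # Multiply all payoffs of the archetypal game by constants drawn from N(10, 4).
    constants = rng.normal(loc=10, scale=4, size=n)
    return [np.rint(c * game).astype(int) for c in constants]

type_A = perturbed_instances(game_A)
type_B = perturbed_instances(game_B)

# Shuffle so that each block of 10 trials contains five games of each type.
sequence = []
for _ in range(12):                      # 12 blocks of 10 trials = 120 games
    block = ([("A", type_A.pop()) for _ in range(5)] +
             [("B", type_B.pop()) for _ in range(5)])
    for i in rng.permutation(10):
        sequence.append(block[i])

assert len(sequence) == 120
```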

Participants

Thirty-two students from the faculties of Economics, Law, and Sociology of the University of Trento (Italy) participated in Experiment 1. Subjects were paid based on their cumulated payoff in 12 randomly selected trials plus a show-up fee (see Experimental Instructions in Appendix).

Results

Figure 1 reports the relative frequency of U and L choices in blocks of 10 trials, separately for type A and B games.
Figure 1

Observed proportions of U and L choices in blocks of 10 trials, separately for type A and B games.

Observed behavior in type A games is not well approximated by Nash equilibrium. Row players play the Nash mixture in the first two blocks [for which P(U) = 0.89], but the proportion of U choices eventually converges to 0.74. As for the column players, play starts close to random behavior in the first block and converges to 0.33, higher than the 0.1 predicted by Nash’s theory. The predictive power of Nash equilibrium in type B games is also rather poor. In equilibrium, row players are supposed to choose action U with probability 0.7, whereas observed play converges to a relative frequency of 0.9. Column players are predicted to choose action L 90% of the time, but the observed proportion converges, from the third block on, to about 0.4.

Experiment 2

Experiment 2 was identical to the previous one, except that games A and C were used to build the four sequences (see Table 1). Game C was chosen in such a way that the equilibrium probabilities were, for both players, close to equal chance; thus, no reversal of choice strategies was implied. Also in this case, the payoffs in each cell of games A and C sum to the same constant. Thirty-two students from the faculties of Economics, Law, and Sociology of the University of Trento (Italy) participated in Experiment 2. Subjects were paid based on their cumulated payoff in 12 randomly selected trials plus a show-up fee (see Experimental Instructions in Appendix). Figure 2 illustrates the results from Experiment 2. The relative frequency of U choices in type A games is systematically higher than that predicted by Nash’s theory, similarly to what happened in Experiment 1. It is interesting to note that, in type C games, the empirical behavior of both row and column players eventually converges to Nash play [P(U) = P(L) = 0.6], confirming that Nash equilibrium is a good predictor (at least in the long run) when predicted choice probabilities are close to 0.5 (Erev and Roth, 1998; Erev and Haruvy, in preparation).
Figure 2

Observed proportions of U and L choices in blocks of 10 trials, separately for type A and C games.

Cross-game learning

The question of how play in type A games is affected by the simultaneous play of games of a different kind can be answered by comparing choice frequencies in type A games across the two experiments. To this end, we ran a two-way, repeated measures analysis (results are summarized in Tables A1 and A2 in Appendix), in which we tested the effects of the variables Experiment (i.e., the experimental condition) and Time, and of their interaction, on choice frequencies for both row and column players. The variable Experiment has no significant effect, implying that no cross-game learning is taking place. We conclude that, when games of just two types are present, subjects are able to recognize the two strategic situations and act without confounding them.
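For illustration, an analysis of this kind can be run along the following lines on synthetic data; the pingouin package’s mixed_anova routine is one possible tool, and the column names and data layout are our assumptions, not the authors’ specification.

```python
import numpy as np
import pandas as pd
import pingouin as pg  # assumed helper library; any mixed-ANOVA routine would do

rng = np.random.default_rng(0)

# Synthetic stand-in data: one choice frequency per group and per block of 10 trials.
rows = []
for experiment in (1, 2):
    for group in range(4):                   # four groups per experiment
        for block in range(1, 7):            # six blocks of 10 trials
            rows.append({"group": f"e{experiment}g{group}",
                         "experiment": experiment,
                         "block": block,
                         "prop_U": rng.uniform(0.7, 0.9)})
df = pd.DataFrame(rows)

# Experiment is a between-groups factor; Time (block) is the repeated measure.
aov = pg.mixed_anova(data=df, dv="prop_U", within="block",
                     between="experiment", subject="group")
print(aov[["Source", "F", "p-unc"]])
```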

The Model

Since McCulloch and Pitts (1943) introduced the first neuronal model, artificial neural networks have usually been intended as mathematical devices for solving problems of classification and statistical pattern recognition (see, for example, Hertz et al., 1991; Bishop, 1995). For this reason, neural network-based learning models are the most natural candidates for predicting data from our multigame experiments, wherein a categorization task is implicit.

We present here a variant of the PB0 model proposed in Marchiori and Warglien (2008), which we call SOFTMAX-PB0. This model is a simple perceptron, i.e., a one-layer feed-forward neural network (Rosenblatt, 1958; Hopfield, 1987); its input units (labeled with in) are as many as the game payoffs, whereas its output units (labeled with out) are as many as the actions available to a player. Different from the PB0 model, in SOFTMAX-PB0 the activation states of output units are determined via the softmax rule

$$a_{out} = \frac{\exp\left(\sum_{in} w_{in,out}\, a_{in}\right)}{\sum_{out'} \exp\left(\sum_{in} w_{in,out'}\, a_{in}\right)}, \qquad (1)$$

and can thus be readily interpreted as choice probabilities. The term $w_{in,out}$ in (1) is the weight of the connection from input unit in to output unit out. Compared to the use of the tanh activation function, calculating activation states via the softmax rule avoids the premature saturation of output units; in general, it results in a better fit of the data and has important theoretical implications. Adaptive learning from time step t − 1 to time step t occurs through modifications of the connection weights:

$$w_{in,out}(t) = w_{in,out}(t-1) + \Delta w_{in,out}(t), \qquad (2)$$

with

$$\Delta w_{in,out}(t) = \lambda(t)\; r(t-1)\, \big[\mathit{targ}_{out}(t-1) - a_{out}(t-1)\big]\, a_{in}(t-1). \qquad (3)$$

In the current model, the parameter λ that appears in (3) is replaced by a deterministic function, whose value at time step t is defined as the ratio between the experienced cumulated regret and the maximum cumulated regret. It is worth noting that SOFTMAX-PB0 is non-parametric, as the softmax activation function (1) also introduces no free parameters. In (3), $\mathit{targ}_{out}$ encodes the ex-post best response to the other players’ actions: it is equal to one if action out was the best response, and zero otherwise. Finally, the regret term $r$ is simply defined as the difference between the maximum obtainable payoff given the other players’ actions and the payoff actually received.

In the SOFTMAX-PB0 and PB0 models, behavior is the result of adjustments in the direction of the ex-post best response (an ex-post rationalizing process), and these adjustments are proportional to a measure of regret, consistently with findings in the neuroscientific field (Coricelli et al., 2005; Daw et al., 2006). The SOFTMAX-PB0 model, like the PB0 one, presents some architectural analogies with established models of learning in games, but it also has some peculiar features that differentiate it from its competitors, as illustrated in Figure 3. Established learning models have two main cyclic component processes: (1) behavior is generated by some stochastic choice rule that maps propensities into probabilities of play; (2) learning employs feedback to modify propensities, which in turn affect subsequent choices.
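A minimal sketch of the SOFTMAX-PB0 update, as reconstructed from (1)–(3), is given below. The encoding of a 2 × 2 game as eight input activations (one per payoff), the payoff scaling, and the bookkeeping of the maximum cumulated regret are our assumptions; the published model may differ in such details.

```python
import numpy as np

class SoftmaxPB0:
    def __init__(self, n_in=8, n_out=2):
        self.w = np.zeros((n_in, n_out))   # connection weights w_{in,out}
        self.cum_regret = 0.0              # experienced cumulated regret
        self.max_cum_regret = 0.0          # maximum cumulated regret

    def choice_probs(self, x):
        # Softmax rule (1): output activations are choice probabilities.
        z = x @ self.w
        e = np.exp(z - z.max())            # numerically stabilized softmax
        return e / e.sum()

    def update(self, x, probs, payoff, payoffs_given_others):
        # Regret: maximum obtainable payoff given the others' actions
        # minus the payoff actually received.
        regret = max(payoffs_given_others) - payoff
        self.cum_regret += regret
        # Assumption: the maximum regret at each step is that of the
        # ex-post worst response.
        self.max_cum_regret += max(payoffs_given_others) - min(payoffs_given_others)
        lam = self.cum_regret / self.max_cum_regret if self.max_cum_regret else 0.0
        # targ in (3): 1 for the ex-post best response, 0 otherwise.
        targ = np.zeros_like(probs)
        targ[int(np.argmax(payoffs_given_others))] = 1.0
        # Delta rule (3), modulated by regret and the deterministic lambda(t).
        self.w += lam * regret * np.outer(x, targ - probs)

# One illustrative round for the row player of game A (payoffs scaled to [0, 1]).
net = SoftmaxPB0()
x = np.array([17, 5, 16, 6, 8, 14, 17, 5]) / 20.0
p = net.choice_probs(x)
action = np.random.default_rng(0).choice(2, p=p)
# Suppose the column player chose L: the row player's payoffs given L are (17, 8).
net.update(x, p, payoff=[17, 8][action], payoffs_given_others=[17, 8])
```

Because the payoffs themselves enter the input layer, the same weights produce different choice probabilities for different games, which is the discrimination property discussed below.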
Figure 3

Adapted from Marchiori and Warglien (2008).

The (SOFTMAX-)PB0 model’s architecture is only partially similar to that of the other learning models. What distinguishes our models is the direct dependence of choice behavior upon game payoffs (represented in the “input layer”). Whereas in a typical economic learning model choice is a function of propensities only, here it is a function of both propensities and the payoffs of the game. This architecture provides the (SOFTMAX-)PB0 model with a peculiar capability to discriminate among different games. Conventional learning models in economics are designed for repeated games. There is learning, but no discrimination or generalization: the simulated agent is unable to discriminate between different games at any given moment; if abruptly given a different game, it would respond in the same way, or simply throw away what it had previously learned. Discrimination, on the other hand, is something perceptrons do very well, and since the output is also directly affected by the perceived inputs (the activation states of input units), a network, besides learning, will respond differently to different games.

The sampling paradigm for modeling learning

Particularly relevant to the current analysis are the two contributions by Erev (2011) and by Gonzalez and Dutt (2011), in which the INERTIA, SAMPLING AND WEIGHTING (I-SAW) and INSTANCE-BASED LEARNING (IBL) models are proposed. According to these models, agents make their decisions based on samples from their past experience. These models have been shown to capture important regularities of human behavior in decisions from experience (Erev et al., 2010; Gonzalez et al., 2011). The most obvious way of modifying these models to produce conditional behavior is to consider agents that draw from the subset of past experiences that are relevant to the current decision task, as sketched below. However, such an implementation would imply an exogenous intervention for the classification of the situation at hand, requiring an explicit theory of what is similar/relevant to what. On the other hand, the sampling-based modeling approach easily accounts for learning spillover effects (Marchiori et al., unpublished). The classification operated by the (SOFTMAX-)PB0 model is instead endogenous: agents just observe inputs and respond to them without any external intervention, and the entire process of classification is implicit in the structure of the model itself.
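The following toy sketch (not the published I-SAW/IBL specification) illustrates the point: restricting sampling to relevant past experiences requires an externally supplied similarity predicate, i.e., an explicit theory of similarity. Both the sampler and the example predicate are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_choice(history, current_game, similar, k=5, n_actions=2):
    # history: list of (game_payoffs, action, obtained_payoff) triples.
    # Exogenous classification step: keep only experiences judged similar
    # to the current game by the externally supplied predicate.
    pool = [h for h in history if similar(h[0], current_game)]
    if not pool:
        return int(rng.integers(n_actions))   # no relevant experience: random choice
    draws = [pool[i] for i in rng.integers(len(pool), size=k)]
    # Choose the action with the highest mean payoff in the sample.
    means = [np.mean([pay for g, a, pay in draws if a == act] or [0.0])
             for act in range(n_actions)]
    return int(np.argmax(means))

# One possible (assumed) similarity predicate: near-proportional payoff
# structures, which would group together perturbed instances of one archetype.
def similar(g1, g2):
    g1 = np.asarray(g1, dtype=float).ravel()
    g2 = np.asarray(g2, dtype=float).ravel()
    return np.allclose(g1 / g1.sum(), g2 / g2.sum(), atol=1e-2)
```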

Materials and Methods

Predicted choice frequencies were obtained by averaging results over 150 simulations; for parametric models, this procedure was repeated for each parameter configuration. Table 2 describes the portions of the parameter spaces investigated.
Table 2

Explored portions of parameter spaces and the parameter configurations yielding the lowest average MSD in the two experiments.

Model    Portions of parameter spaces considered               Best fit parameters
NFP      λ in [1.5, 4.0] by 0.25    w in [0.1, 0.9] by 0.1     λ = 4.0, w = 0.7
NRL      λ in [3.0, 7.0] by 0.5     w in [0.10, 0.90] by 0.05  λ = 5.5, w = 0.50
REL      λ in [2.2, 3.4] by 0.1     N(1) in [27, 34] by 1      λ = 2.7, N(1) = 31
RL       λ in [6.0, 10.0] by 0.5    w in [0.10, 0.90] by 0.05  λ = 10.0, w = 0.50
SFP      λ in [10.0, 14.0] by 0.5   w in [0.05, 0.90] by 0.05  λ = 13.0, w = 0.75
stEWA    λ in [1, 9] by 0.1         –                          λ = 5.8
We tested the models’ predictive power by considering estimated choice frequencies corresponding to the parameter configurations that minimized the mean square deviation (henceforth MSD; Friedman, 1983; Selten, 1998) in our two experiments. Considering average MSD scores in the two experiments does not directly penalize the number of free parameters of a model; therefore, in this analysis, parametric models are advantaged over the non-parametric PB0 and SOFTMAX-PB0 ones. In our comparative analysis, we considered the following learning models: normalized fictitious play (NFP; Erev et al., 2007); normalized reinforcement learning (NRL; Erev et al., 2007); Erev and Roth’s reinforcement learning (REL; Erev and Roth, 1998); reinforcement learning (RL; Erev et al., 2007); stochastic fictitious play (SFP; Erev et al., 2007); and self-tuning experience weighted attraction (stEWA; Ho et al., 2007). Section “Competitor Models and Investigated Portions of Parameter Spaces” in the Appendix briefly reviews these models.
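The following sketch shows the evaluation criterion described above: predicted frequencies are averaged over 150 simulations per parameter configuration, and the configuration minimizing the mean MSD across the two experiments is retained. The simulate_model callable and the two-parameter grid layout are hypothetical stand-ins for any of the parametric models in Table 2.

```python
import numpy as np
from itertools import product

def msd(predicted, observed):
    # Mean squared deviation between predicted and observed choice frequencies.
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return float(np.mean((predicted - observed) ** 2))

def best_fit(simulate_model, lambdas, ws, observed_by_experiment, n_sims=150):
    # Average predicted frequencies over n_sims runs per configuration, score
    # each configuration by the mean MSD across experiments, keep the minimizer.
    best, best_score = None, np.inf
    for lam, w in product(lambdas, ws):
        score = np.mean([
            msd(np.mean([simulate_model(lam, w, exp) for _ in range(n_sims)],
                        axis=0),
                observed)
            for exp, observed in observed_by_experiment.items()
        ])
        if score < best_score:
            best, best_score = (lam, w), score
    return best, best_score
```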

Simulation Results and Discussion

Although simple perceptrons suffer severe theoretical limitations in the discrimination tasks they can carry out (Minsky and Papert, 1969; Hertz et al., 1991), our simulation results show that they are nonetheless able to discriminate between two different strategic situations and predict well choice behavior observed in our multigame experiments. Simulation results are collected in Figure 4 and, more in detail, in Tables 3 and 4.
Figure 4

Predicted and observed choice frequencies in Experiment 1 (top panels) and 2 (lower panels).

Table 3

Predicted and observed choice frequencies in Experiment 1.

Model         MSD            Type A games, blocks 1–6             Type B games, blocks 1–6
Empirical      –      P(U)   0.89 0.89 0.84 0.81 0.76 0.74        0.16 0.32 0.41 0.39 0.42 0.38
                      P(L)   0.53 0.40 0.26 0.26 0.26 0.26        0.74 0.78 0.85 0.86 0.91 0.91
Nash          0.053   P(U)   0.90 0.90 0.90 0.90 0.90 0.90        0.70 0.70 0.70 0.70 0.70 0.70
                      P(L)   0.10 0.10 0.10 0.10 0.10 0.10        0.90 0.90 0.90 0.90 0.90 0.90
NFP           0.081   P(U)   0.55 0.65 0.57 0.56 0.58 0.51        0.62 0.51 0.63 0.60 0.56 0.67
                      P(L)   0.59 0.50 0.60 0.54 0.64 0.55        0.64 0.65 0.60 0.70 0.57 0.62
NRL           0.094   P(U)   0.59 0.40 0.68 0.65 0.64 0.64        0.58 0.43 0.68 0.66 0.64 0.64
                      P(L)   0.65 0.73 0.54 0.53 0.54 0.57        0.67 0.70 0.53 0.53 0.54 0.55
PB0           0.024   P(U)   0.78 0.96 0.97 0.96 0.95 0.95        0.49 0.35 0.26 0.28 0.16 0.25
                      P(L)   0.77 0.68 0.47 0.38 0.35 0.33        0.71 0.80 0.85 0.86 0.93 0.87
REL           0.076   P(U)   0.50 0.49 0.51 0.51 0.51 0.49        0.51 0.48 0.51 0.50 0.49 0.49
                      P(L)   0.50 0.50 0.50 0.49 0.50 0.49        0.50 0.50 0.51 0.50 0.49 0.49
RL            0.097   P(U)   0.60 0.37 0.64 0.63 0.63 0.64        0.57 0.40 0.64 0.63 0.63 0.64
                      P(L)   0.60 0.70 0.47 0.49 0.51 0.57        0.62 0.67 0.47 0.49 0.51 0.55
SFP           0.094   P(U)   0.57 0.53 0.67 0.59 0.51 0.50        0.63 0.50 0.48 0.42 0.60 0.60
                      P(L)   0.54 0.50 0.51 0.32 0.52 0.41        0.58 0.69 0.65 0.79 0.68 0.69
SOFTMAX-PB0   0.018   P(U)   0.84 0.97 0.95 0.90 0.90 0.92        0.46 0.28 0.25 0.31 0.31 0.31
                      P(L)   0.76 0.56 0.32 0.39 0.51 0.46        0.72 0.82 0.87 0.83 0.89 0.82
stEWA         0.086   P(U)   0.62 0.64 0.64 0.64 0.64 0.64        0.67 0.64 0.64 0.64 0.64 0.64
                      P(L)   0.72 0.75 0.75 0.75 0.75 0.75        0.74 0.75 0.75 0.75 0.75 0.75

The second column from the left reports the MSD score associated with each model. For parametric models, predicted frequencies have been obtained with the parameter configurations reported in the fourth column of Table 2.

Table 4

Predicted and observed choice frequencies in Experiment 2.

Model         MSD            Type A games, blocks 1–6             Type C games, blocks 1–6
Empirical      –      P(U)   0.88 0.88 0.86 0.92 0.88 0.86        0.42 0.60 0.60 0.68 0.56 0.63
                      P(L)   0.51 0.43 0.44 0.43 0.33 0.42        0.78 0.75 0.82 0.70 0.63 0.60
Nash          0.034   P(U)   0.90 0.90 0.90 0.90 0.90 0.90        0.60 0.60 0.60 0.60 0.60 0.60
                      P(L)   0.10 0.10 0.10 0.10 0.10 0.10        0.60 0.60 0.60 0.60 0.60 0.60
NFP           0.036   P(U)   0.62 0.69 0.59 0.66 0.61 0.65        0.62 0.55 0.64 0.64 0.64 0.57
                      P(L)   0.50 0.50 0.57 0.50 0.56 0.51        0.57 0.51 0.46 0.54 0.50 0.52
NRL           0.047   P(U)   0.72 0.92 0.87 0.58 0.56 0.70        0.76 0.85 0.86 0.61 0.55 0.70
                      P(L)   0.52 0.49 0.45 0.43 0.49 0.49        0.56 0.57 0.60 0.46 0.49 0.49
PB0           0.029   P(U)   0.75 0.85 0.89 0.88 0.95 0.93        0.78 0.73 0.56 0.50 0.53 0.46
                      P(L)   0.57 0.55 0.55 0.51 0.61 0.70        0.54 0.45 0.52 0.56 0.57 0.57
REL           0.055   P(U)   0.51 0.49 0.51 0.51 0.50 0.51        0.50 0.51 0.52 0.50 0.51 0.51
                      P(L)   0.50 0.50 0.49 0.50 0.51 0.49        0.50 0.51 0.50 0.50 0.49 0.49
RL            0.053   P(U)   0.71 0.93 0.88 0.56 0.52 0.68        0.75 0.87 0.87 0.59 0.50 0.68
                      P(L)   0.53 0.49 0.47 0.41 0.52 0.52        0.55 0.59 0.59 0.44 0.52 0.52
SFP           0.036   P(U)   0.62 0.71 0.59 0.61 0.60 0.70        0.59 0.50 0.59 0.60 0.63 0.40
                      P(L)   0.51 0.48 0.55 0.41 0.46 0.50        0.58 0.51 0.47 0.56 0.56 0.60
SOFTMAX-PB0   0.028   P(U)   0.77 0.83 0.87 0.85 0.93 0.90        0.80 0.68 0.54 0.56 0.59 0.48
                      P(L)   0.56 0.61 0.55 0.49 0.60 0.70        0.52 0.46 0.53 0.61 0.56 0.58
stEWA         0.058   P(U)   0.54 0.54 0.54 0.54 0.54 0.54        0.54 0.54 0.54 0.54 0.54 0.54
                      P(L)   0.61 0.63 0.64 0.64 0.64 0.64        0.61 0.63 0.64 0.64 0.64 0.64

The second column from the left reports the MSD score associated with each model. For parametric models, predicted frequencies have been obtained with the parameter configurations reported in the fourth column of Table 2.

Established learning models are not able to discriminate between the two different game structures, providing the same “average” behavior for both types of games (see Tables 3 and 4), and are always outperformed by Nash equilibrium. On the contrary, the SOFTMAX-PB0 and PB0 models are able to replicate subjects’ conditional behavior, due to the direct dependence of their response on game payoffs, remarkably outperforming Nash equilibrium and all the other models of learning considered in this analysis. Comparison of the performance of the PB0 and SOFTMAX-PB0 models shows how the introduction of the softmax rule for calculating output units’ activations improves the fit of the data.

Cross-game learning

As reported at the end of Section “The Multigame Experiments,” our experimental data do not provide evidence of cross-game learning. In this regard, simulation results show a partial qualitative parallelism between the (SOFTMAX-)PB0 model’s predictions and observed behavior. For example, for the row player, the (SOFTMAX-)PB0 model provides very similar trajectories in the two experiments. However, for the column player’s predicted behavior, the (SOFTMAX-)PB0 model produces very different trajectories in the two experiments. This might imply that the (SOFTMAX-)PB0’s structure is not complex enough to completely avoid spillover effects across games, although this aspect deserves a more systematic investigation. On the other hand, it is not difficult to imagine situations in which learning spillovers do take place, and there this feature of the (SOFTMAX-)PB0 model would turn out to be advantageous.

Conclusion

The present paper has presented an experimental design in which subjects faced a sequence of different interactive decision problems, taking a step forward in the realism of the situations simulated in the lab. The problems in the sequences were different instances of two 2 × 2 completely mixed games. Thus, at each trial, subjects’ task was twofold: recognize the type of the current decision problem, and then act according to this categorization. Our experimental results show that subjects are able to recognize the two different game structures in each sequence and play according to this classification. Moreover, our experimental data do not provide evidence of cross-game learning, as there are no significant differences in the play of type A games across the two experiments.

Our experiments were designed with the precise goal of testing the discrimination capability of the PB0 and SOFTMAX-PB0 neural network models against that of other established models of learning proposed in the psychology and economics literature. Simulation results show that traditional “attraction and stochastic choice rule” learning models are not able to discriminate between the different strategic situations, providing a poor “average” behavior, and are always outperformed by Nash equilibrium. On the contrary, the (SOFTMAX-)PB0 model is able to replicate subjects’ conditional behavior, due to the direct dependence of its response on game payoffs, and performs better than the standard equilibrium theory. This latter fact is particularly remarkable: in our experiments, the two classes of games were built based on their Nash equilibria, so that the classification was induced by the different equilibrium predictions. Our neural network models of adaptive learning, by contrast, were able to classify the different game structures without any external and predetermined partition of the game space.

We are well aware of the need for a more systematic and comprehensive analysis of categorization in games. Further experimental research could focus, for example, on sequences with more than two types of games, or on the effects of different degrees of payoff perturbation on learning spillovers.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Table A1

Two-way, repeated measures ANOVA (row players).

              df    Sum Sq    Mean Sq    F value    Pr(>F)
Experiment     1    0.03      0.03       0.90       0.38
Residuals      6    0.23      0.04

We tested the model Proportion (…).

Table A2

Two-way, repeated measures ANOVA (column players).

              df    Sum Sq    Mean Sq    F value    Pr(>F)
Experiment     1    0.11      0.11       1.07       0.34
Residuals      6    0.61      0.10

We tested the model Proportion (…).

References (8 in total)

1.  The perceptron: a probabilistic model for information storage and organization in the brain.

Authors:  F. Rosenblatt
Journal:  Psychol Rev       Date:  1958-11       Impact factor: 8.934

2.  Regret and its avoidance: a neuroimaging study of choice behavior.

Authors:  Giorgio Coricelli; Hugo D Critchley; Mateus Joffily; John P O'Doherty; Angela Sirigu; Raymond J Dolan
Journal:  Nat Neurosci       Date:  2005-08-07       Impact factor: 24.884

3.  Predicting human interactive learning by regret-driven neural networks.

Authors:  Davide Marchiori; Massimo Warglien
Journal:  Science       Date:  2008-02-22       Impact factor: 47.728

4.  A logical calculus of the ideas immanent in nervous activity. 1943.

Authors:  W S McCulloch; W Pitts
Journal:  Bull Math Biol       Date:  1990       Impact factor: 1.758

5.  Cortical substrates for exploratory decisions in humans.

Authors:  Nathaniel D Daw; John P O'Doherty; Peter Dayan; Ben Seymour; Raymond J Dolan
Journal:  Nature       Date:  2006-06-15       Impact factor: 49.962

Review 6.  Instance-based learning: integrating sampling and repeated decisions from experience.

Authors:  Cleotilde Gonzalez; Varun Dutt
Journal:  Psychol Rev       Date:  2011-10       Impact factor: 8.934

7.  Increasing Cooperation in Prisoner's Dilemmas by Establishing a Precedent of Efficiency in Coordination Games.

Authors:  Marc Knez; Colin Camerer
Journal:  Organ Behav Hum Decis Process       Date:  2000-07

8.  Learning algorithms and probability distributions in feed-forward and feed-back networks.

Authors:  J J Hopfield
Journal:  Proc Natl Acad Sci U S A       Date:  1987-12       Impact factor: 11.205

