Laurent Dollé, Ricardo Chavarriaga, Agnès Guillot, Mehdi Khamassi.
Abstract
We present a computational model of spatial navigation comprising different learning mechanisms in mammals, i.e., associative, cognitive mapping and parallel systems. This model is able to reproduce a large number of experimental results in different variants of the Morris water maze task, including standard associative phenomena (spatial generalization gradient and blocking), as well as navigation based on cognitive mapping. Furthermore, we show that competitive and cooperative patterns between different navigation strategies in the model make it possible to explain previously apparently contradictory results supporting either associative or cognitive mechanisms for spatial learning. The key computational mechanism to reconcile experimental results showing different influences of distal and proximal cues on behavior, different learning times, and different abilities of individuals to alternatively perform spatial and response strategies, relies on the dynamic coordination of navigation strategies, whose performance is evaluated online with a common currency through a modular approach. We provide a set of concrete experimental predictions to further test the computational model. Overall, this computational work sheds new light on inter-individual differences in navigation learning, and provides a formal and mechanistic approach to test various theories of spatial cognition in mammals.
Year: 2018 PMID: 29630600 PMCID: PMC5908205 DOI: 10.1371/journal.pcbi.1006092
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Model overview.
The proposed computational model is composed of four main modules. The Direction module uses model-free (MF) reinforcement learning to associate visual information encoded by Cue Cells (CC) with propositions of oriented movements encoded by Action Cells; the resulting proposed orientation is sent to the gating network. The Planning module is a model-based (MB) system which builds, by Hebbian learning, a topological Planning Graph (PG) with Place Cells, and proposes to the gating network an orientation of movement reflecting the shortest path to the reward. The Exploration module proposes random orientations of movement to the gating network. The selection between the outputs of the three modules is learned by a separate associative module through model-free reinforcement learning. The inputs of the Direction and Planning modules (CC and PG) are linked to the units in the gating network. The gating values g_k (k = E, D or P, corresponding to the Exploration, Direction and Planning modules) are weighted sums of the input values r_j (j = CC or PG) with weights w_kj. At each simulated timestep, one of the modules is selected according to a winner-take-all scheme.
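The gating mechanism described in this caption can be sketched in a few lines of code. This is a minimal illustrative sketch, not the authors' implementation: the population size, the Q-learning-style weight update, and all names (`select_module`, `td_update`, `N_INPUTS`, the learning parameters) are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 8                                   # illustrative cue/place cell population size
MODULES = ["Exploration", "Direction", "Planning"]

# Gating weights w[k, j]: one row per module k, one column per input cell j.
w = rng.normal(0.0, 0.1, size=(len(MODULES), N_INPUTS))

def select_module(r, w):
    """Winner-take-all over the gating values g_k = sum_j w[k, j] * r[j]."""
    g = w @ r
    return int(np.argmax(g)), g

def td_update(w, r, k, reward, g, alpha=0.1, gamma=0.9, g_next_max=0.0):
    """Model-free (Q-learning-like) update crediting only the selected module."""
    delta = reward + gamma * g_next_max - g[k]  # reward prediction error
    w[k] += alpha * delta * r
    return w

r = rng.random(N_INPUTS)                       # current input activities (CC or PG units)
k, g = select_module(r, w)                     # one module wins and drives movement
w = td_update(w, r, k, reward=1.0, g=g)        # e.g., the platform was reached
```

A positive prediction error increases the winning module's gating value for the current inputs, so modules that reliably lead to the platform are selected more often, which is the "common currency" evaluation mentioned in the abstract.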
Fig 2. (a-c) Experiment I: Original Morris water maze task by Morris et al. 1982 [34]. a) Simulated environment: gray disks represent schematized distal cues, the dotted circle represents the platform. b) Original results plotted as a learning curve of escape latency versus trials. c) Simulated results: Direction vs Planning. (d-f) Experiment II: Delayed Matching Task by Steele and Morris 1999 [35]. d) Simulated environment: black crosses represent starting locations, dotted circles represent the possible platform locations. The distal cues around the water maze are not shown, for clarity. e) Original results plotted as a learning curve of escape latency versus days. f) Simulated results: Direction and Planning versus Direction only. D: Direction; P: Planning. *** corresponds to significance level P < 0.001.
Fig 3. (a-c) Experiment III. a) Environment. b) Original results plotted as a learning curve of escape latency versus trials. c) Simulated results of the Planning + Direction group vs the Direction only group. (d-f) Experiment IV. d) Simulated environment: gray disks represent distal cues, dotted circles represent the platform. e) Original results plotted as a learning curve of escape latency versus trials. f) Simulated results of the Planning + Direction group vs the Direction only group.
Fig 4. (a-c) Experiment V. a) Environment: gray disks represent distal cues, dotted circles represent the platform, the gray cone represents the proximal cue. b) Original results. c) Simulated results: time spent near the proximal cue (B) and the distal cue (F) by the full model combining Direction and Planning strategies. (d-f) Experiment VI. d) Environment: gray disks represent possible locations of the platform, the red dot and cone indicate cues. e) Original results: escape latencies and time spent near the previous platform location during the test trial. f) Simulated results: time spent near the previous platform location in the Test trial by the full model combining Direction and Planning strategies. The horizontal dashed line represents chance level. *** and * correspond respectively to significance levels P < 0.001 and P < 0.05.
Fig 5. Experiment V: Detailed simulation results.
a) Simulation results with either the Direction (D) strategy only (left) or the Planning (P) strategy only (right). b) Strategy selection rate across sessions in the full model with all strategies (Direction, Planning, Exploration). c) Original experimental results with regards to the evolution of escape latencies during Stages 1 and 2 (adapted from [37]). d) Simulation results showing the evolution of escape latencies during Stages 1 and 2 in the full model, the model with D only and the model with P only. e) Occupancy rate in the F octant during the extinction trials in the three versions of the model.
Fig 6. Experiment VI: Detailed simulation results.
a) Original experimental results of [38] showing escape latencies during the three sessions of Stage 2. (b-g) Simulation results. b) Time spent in the quadrant containing the previous platform location when the model-based Planning strategy in the model is replaced by a model-free Locale (L) strategy. c) Time spent in the quadrant containing the previous platform location with the model containing only the Direction (D) strategy and the Exploration strategy. d) Escape latencies of the full model (DP). e) Escape latencies of the DL model. f) Escape latencies of the D model. g) Selection rate of each strategy in each condition of the task for the full model (DP).
Table 1. Summary table of the different models that could reproduce each experiment.
D: Direction strategy alone (combines different Direction modules in the case of multiple cues); P: Planning strategy alone; DP: combined Direction and Planning strategies; DL: combined Direction and Locale strategies.
| Exp | Reference | Main phenomenon | D | P | DP | DL |
|---|---|---|---|---|---|---|
| I | Morris et al., 1982 [34] | Reference memory in the hidden water maze | No | Yes | Yes | Yes |
| II | Steele et al., 1999 [35] | Delayed matching to place—Reaching a moving hidden goal | No | Yes | Yes | No |
| III | Devan & White 1999 | Competition between cue-guided and place-based strategies | No | No | Yes | Yes |
| IV | Pearce et al. 1998 | Gradual competition between distal and proximal cues | No | No | Yes | No |
| V | Rodrigo et al. 2006 [37] | Generalization gradient | No | No | Yes | |
| VI | Roberts et al. 1999 [38] | Blocking | No | No | Yes | No |
Table 2. Summary table of the main predictions raised by simulations of the DP model.
Same abbreviations as in Table 1.
| Exp | Model | Prediction |
|---|---|---|
| I | DP | If the experiment is prolonged, hippocampus-lesioned animals should eventually reach the platform with performance that is not statistically different from that of control animals. |
| I | DP | Striatum-lesioned animals should not be impaired in this task. |
| II | DP | Hippocampus-lesioned animals should be slower at learning the platform location and should hence not display much within-session reduction in escape latency. |
| II | DP | Inactivation of the striatum during late sessions should not affect performance. |
| III | DP | Striatum-lesioned animals should produce an intermediate performance between control and hippocampus-lesioned animals: no impairment in the hidden cue case; lower performance than controls in the visible case. |
| III | DP | Transient inactivations of the striatum when the intra-maze cue is hidden should barely affect performance. |
| III | DP | Transient inactivations of the hippocampus during the competition trial should reduce the time spent at the previous location of the platform during the first sessions, and barely affect performance during subsequent sessions. |
| III | DP | The behavior of both cue-responders and place-responders should reflect a dominance of each individual’s preferred strategy, but in neither group should this behavior result from the complete absence of the other strategy. |
| IV | DP | Striatum-lesioned animals should show: a spared fast adaptation between the first and the fourth trial of each session; larger escape latencies than controls and hippocampus-lesioned animals at the first trial of each session; no progressive improvement of performance across sessions seen in the two other groups, and which could be the signature of a slow model-free learning process. |
| V | DP | During gradient trials, animals should rapidly reach octant B with the Direction strategy, while the Planning and Exploration strategies would contribute later, the former attracting the simulated agents towards octant F. The generalization gradient should thus not result from a complete loss of the associative strength of proximal cue B during learning. |
| VI | DP | Escape latencies for the Diff conditions in the first session of Stage 2 should not be as high as those in the first session of Stage 1, thanks to the cooperation of the Planning and Exploration strategies which should enable some generalization between these two situations with a different proximal cue. |
| VI | DP | Hippocampus-lesioned animals should show larger escape latencies than controls; a quicker learning across sessions during Stage 1 because they should not be polluted by the presence of an inefficient spatial strategy anchored on distal cues; and a blocking effect in the Session-Same condition in addition to the Trial-Same condition. |
| VI | DP | In all conditions, including Trial-Same, the animals should be able to quickly re-use distal cues after reaching the goal quadrant in the absence of the proximal cue, when searching for the absent platform. |
Table 3. Summary table of the main predictions raised by simulations of the DL model.
Same abbreviations as in Table 1.
| Exp | Model | Prediction |
|---|---|---|
| I | DL | The performance of Hippocampus-lesioned animals should never reach that of the control animals even after a large number of simulated trials. |
| I | DL | The performance of Striatum-lesioned animals should also be impaired (but less) compared to control animals. |
| II | DL | Both control and striatum-lesioned animals should be slow at learning the platform location and should hence not display much within-session reduction in escape latency. |
| III | DL | The behavior of place-responders should rely on a cooperation between strategies, while the involvement of the spatial strategy should be much weaker in cue-responders, resulting in frequent homogeneous trajectories only controlled by the Direction strategy. |
| VI | DL | There should be smaller escape latencies in the first session of Stage 2 compared to the first session of Stage 1, for the same reasons as the DP model. |