| Literature DB >> 35978035 |
Qiliang He1, Jancy Ling Liu2, Lou Eschapasse3, Elizabeth H Beveridge3, Thackery I Brown4.
Abstract
Reinforcement learning (RL) models have been influential in characterizing human learning and decision making, but few studies apply them to human spatial navigation, and even fewer systematically compare RL models under different navigation requirements. Because RL can quantitatively and continuously characterize one's learning strategies, as well as one's consistency in using such strategies, it provides a novel and important perspective for understanding the marked individual differences in human navigation and for disentangling navigation strategies from navigation performance. One hundred and fourteen participants completed wayfinding tasks in a virtual environment in which different phases manipulated navigation requirements. We compared the performance of five RL models (three model-free, one model-based, and one "hybrid") at fitting navigation behaviors in the different phases. Supporting implications from prior literature, the hybrid model provided the best fit regardless of navigation requirements, suggesting that the majority of participants rely on a blend of model-free (route-following) and model-based (cognitive mapping) learning in such navigation scenarios. Furthermore, consistent with a key prediction, the hybrid model revealed a correlation between the weight on model-based learning (i.e., navigation strategy) and the navigator's exploration vs. exploitation tendency (i.e., consistency in using such a navigation strategy), which was modulated by navigation task requirements. Together, we not only show how computational findings from RL align with the spatial navigation literature, but also reveal how the relationship between navigation strategy and a person's consistency in using that strategy changes as navigation requirements change.
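The hybrid model described above can be sketched as a weighted blend of model-based and model-free action values, passed through a softmax choice rule. This is a minimal illustrative sketch, not the authors' fitted implementation: the names `hybrid_action_values` and `softmax_policy`, the example values, and the exact parameterization of ω (weight on model-based learning) and θ (exploration–exploitation, here an inverse temperature) are assumptions for illustration.

```python
import numpy as np

def hybrid_action_values(q_mf, q_mb, omega):
    """Blend model-free and model-based action values.

    omega is the weight on model-based learning:
    omega = 1 -> pure cognitive mapping; omega = 0 -> pure route-following.
    """
    return omega * q_mb + (1.0 - omega) * q_mf

def softmax_policy(q, theta):
    """Choice probabilities under a softmax rule with inverse temperature theta.

    Higher theta -> more exploitation (more consistent strategy use);
    lower theta -> more exploration.
    """
    z = theta * (q - np.max(q))  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical example: two candidate moves from the current room
q_mf = np.array([0.2, 0.6])  # route-following (model-free) values, assumed
q_mb = np.array([0.8, 0.3])  # cognitive-map (model-based) values, assumed
probs = softmax_policy(hybrid_action_values(q_mf, q_mb, omega=0.7), theta=3.0)
```

With ω = 0.7 the model-based values dominate, so the move favored by the cognitive map receives the higher choice probability.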
Year: 2022 PMID: 35978035 PMCID: PMC9385652 DOI: 10.1038/s41598-022-18245-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1 Experimental materials. (A) Layout of the environment. S indicates the fixed starting location in the Fixed phase; G1–G3 indicate the goal object locations. (B, C) Actual views of landmark objects and rooms from the participants' perspective. Note that participants were brought back to the same starting location after finding a goal object during the Fixed phase.
Figure 3Model comparison in the Fixed (A) and the Random (B) phases. BIC Bayesian Information Criterion. n.s. not significant. ***p < 0.001.
Figure 4Model comparison in the Fixed (A) and the Random (B) phases. BIC Bayesian Information Criterion. **p < 0.01, ***p < 0.001.
Correlation matrix between navigation strategy (ω) and exploration–exploitation (θ) in the Fixed and Random phases.
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1. Fixed ω | – | | | |
| 2. Random ω | 0.53*** | – | | |
| 3. Fixed θ | 0.25** | 0.31*** | – | |
| 4. Random θ | −0.27** | −0.35*** | −0.20* | – |
*p < 0.05, **p < 0.01, ***p < 0.001.
Figure 2 Model-free (A) and model-based (B) reinforcement learning models. The numbers in the figure are state values, showing how a navigator decides to move along the route in a given state/room. (A) Model-free valuation based on the TD algorithm. After finding an object, this algorithm updates values only along the traversed path. (B) Model-based valuations derived from dynamic programming. The model-based algorithm assumes a perfect cognitive map, and the values in the entire environment are precomputed (see Model-based reinforcement learning). S starting location; G1 goal object #1.
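The two valuation schemes in Figure 2 can be sketched as follows. This is an illustrative sketch, not the paper's fitted models: the corridor layout, the function names, and the parameters `alpha`/`gamma` are assumptions. The model-free sketch updates only states on the traversed path (as in panel A), while the model-based sketch uses dynamic programming over a complete transition map to precompute values for every state (as in panel B).

```python
def td_update(v, path, reward, alpha=0.5, gamma=0.9):
    """Model-free TD(0)-style backup along a traversed path of states.

    Only states actually visited get their values updated, so value
    information propagates slowly, one episode at a time.
    """
    v = v.copy()
    for i, s in enumerate(path):
        # Terminal state of the path receives the reward; earlier states
        # bootstrap from the (current) value of the next state on the path.
        target = reward if i == len(path) - 1 else gamma * v[path[i + 1]]
        v[s] += alpha * (target - v[s])
    return v

def value_iteration(transitions, goal, gamma=0.9, tol=1e-6):
    """Model-based valuation via dynamic programming.

    Assumes a perfect cognitive map (the full transition structure) and
    precomputes a value for every state in the environment.
    """
    v = {s: 0.0 for s in transitions}
    v[goal] = 1.0
    while True:
        delta = 0.0
        for s, neighbors in transitions.items():
            if s == goal or not neighbors:
                continue
            new = gamma * max(v[n] for n in neighbors)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v

# Hypothetical corridor: S -> A -> B -> G1
transitions = {"S": ["A"], "A": ["S", "B"], "B": ["A", "G1"], "G1": []}
v_mb = value_iteration(transitions, goal="G1")            # every state valued
v_mf = td_update({s: 0.0 for s in transitions}, ["S", "A", "B"], reward=1.0)
```

After one traversal, the model-free values have changed only at the state adjacent to the goal, whereas the model-based values already form a smooth gradient toward G1 across the whole corridor.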