Stochastic reinforcement benefits skill acquisition.

Eran Dayan, Bruno B Averbeck, Barry J Richmond, Leonardo G Cohen.

Abstract

Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic nature. Here we trained subjects on a visuomotor learning task, comparing reinforcement schedules with higher, lower, or no stochasticity. Training under higher levels of stochastic reinforcement benefited skill acquisition, enhancing both online gains and long-term retention. These findings indicate that the enhancing effects of reinforcement on skill acquisition depend on reinforcement schedules.

Year: 2014 PMID: 24532838 PMCID: PMC3929848 DOI: 10.1101/lm.032417.113

Source DB: PubMed Journal: Learn Mem ISSN: 1072-0502 Impact factor: 2.460

Learning new skills is driven by reinforcement, which can be either extrinsic, as in the form of monetary rewards (Wachter et al. 2009; Abe et al. 2011), or intrinsic (Shohamy 2011), as in a sense of fulfillment and pride. Normative models of valuation (Bell et al. 1988) view humans as reward-maximizing entities. Consequently, learning novel skills with reinforcement schedules where successful performance is continuously reinforced should maximally facilitate learning. Indeed, previous studies of reward-based motor skill acquisition utilized reinforcement schedules of this sort (Wachter et al. 2009; Abe et al. 2011). Yet, in ecologically valid settings complex skills are often acquired when mapping between actions and rewarding outcomes is not continuous and fixed but, rather, variable and unknown. This results in reinforcement schedules of a stochastic nature, and a state commonly referred to as uncertainty (Platt and Huettel 2008; Bach and Dolan 2012).

For procedural skills, performance improvements can occur not only during training (“online”), but also between training sessions in periods where there is no active practice occurring (“offline”). These two forms of learning lead to formation of long-term memory (Doyon and Benali 2005; Dayan and Cohen 2011), and both appear to be affected by reinforcement (Wachter et al. 2009; Abe et al. 2011). Here, we studied how procedural learning would proceed under the incentive of stochastic reinforcement.

We trained four groups of subjects (n = 46) on a sequential visuomotor task (Reis et al. 2009; Schambra et al. 2011), manipulating reinforcement schedules between groups. The task required subjects to move a cursor back and forth between a “home” position and five individually colored numbered targets (four gates and one thick line) by modulating pinch force applied onto a force transducer (Fig. 1A). Successful trials were ones where the sequence of movements (1-home, 2-home, 3-home, 4-home, 5) was performed accurately within a fixed amount of time (8 sec). Performance-based auditory feedback (a “beep” sound) was given whenever a target gate was passed through successfully.

The experiment comprised three sessions (Fig. 1B). Following the first block of 20 trials, where baseline performance was assessed, reinforcement schedules were implemented for five blocks of training, where successful completion of each trial could result in visual reward feedback (“you win 0.6$”). Tests of skill levels were then administered with no reinforcement immediately after training as well as 24 h and 7 d post-training, to assess offline skill gains and long-term retention of skill, respectively.

Training was carried out under four different reinforcement schedules (Fig. 1C), manipulated in a between-subject design, varying reward probability to create four levels of stochastic reinforcement (with probability, P, at 0.25, 0.5, 0.75, 1). Under this experimental manipulation, uncertainty as to whether a successful trial would get rewarded is maximal at P = 0.5 and decreases with higher and lower probabilities (Schultz et al. 2008), since these probabilities are associated with higher certainty pertaining to possible chances of being rewarded or unrewarded (Fig. 1D).
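The relation between reward probability and outcome uncertainty described above can be illustrated with a short sketch. It assumes that uncertainty is summarized by the variance or the Shannon entropy of a binary (rewarded/unrewarded) outcome; the text does not state which measure underlies Fig. 1D, so both are shown. The function names, including the hypothetical `reward_delivered` schedule, are introduced here for illustration only and are not taken from the study's software.

```python
import math
import random


def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 reward that is delivered with probability p."""
    return p * (1.0 - p)


def bernoulli_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a 0/1 reward delivered with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def reward_delivered(trial_successful: bool, p: float, rng: random.Random) -> bool:
    """Hypothetical schedule: only successful trials can be rewarded, with probability p."""
    return trial_successful and rng.random() < p


# The four reward probabilities used in the study.
for p in (0.25, 0.5, 0.75, 1.0):
    print(f"P = {p:.2f}  variance = {bernoulli_variance(p):.4f}  "
          f"entropy = {bernoulli_entropy(p):.4f} bits")
```

Under either measure, uncertainty peaks at P = 0.5, is equal for P = 0.25 and P = 0.75, and vanishes at P = 1, which matches the grouping into high, low, and no stochasticity.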

Figure 1.

Task and design. (A) By varying the magnitude of pinch-force applied onto a force transducer, subjects moved a cursor back and forth via five numbered targets within a fixed period of time. (B) Experimental design. The experiment comprised three sessions, including a training session with reward feedback, followed by three tests of skill. (C) Reinforcement schedules. Four reinforcement schedules were tested, with reward feedback provided on 25%, 50%, 75%, or 100% of successful trials. (D) Reward uncertainty. Stochastic reinforcement was maximal and was associated with maximal levels of outcome uncertainty when reward probability was 0.5. With probabilities of 0.25 and 0.75, stochasticity and uncertainty were lower since the learning agents were operating with greater certainty pertaining to lower and higher chances of being rewarded, respectively.

Skill acquisition was quantified per block using a skill measure combining movement time and error rates to capture shifts in the task's speed–accuracy trade-off function along training (Reis et al. 2009; Schambra et al. 2011; see Supplemental Methods). The two groups that trained with lower levels of stochasticity (0.25 and 0.75) showed indistinguishable performance across all four testing sessions (F(3,66) = 2.541, ns), and therefore data from these two groups were collapsed into one group, henceforth the “low stochasticity” group (n = 24), and compared with a high stochasticity group (reward probability, P = 0.5, n = 11) and a fixed reward group (no stochasticity, reward probability, P = 1.0, n = 11).

A repeated measures analysis of variance (ANOVA) revealed a significant interaction between the level of stochasticity and testing session (F(6,129) = 5.36, P < 0.0001) (Fig. 2A). Specifically, all groups had comparable baseline performance levels prior to training (F(2,43) = 2.438, ns). After training, the three training groups showed significantly different performance (F(2,43) = 4.794, P < 0.02). Namely, the high stochasticity group showed significantly more skillful performance than the low stochasticity (P = 0.023, post hoc Fisher's least significant difference test) and the fixed reward groups (P = 0.004). One day post-training, the three groups maintained their levels of performance but skill levels did not differ (F(2,43) = 2.472, ns). One week after training, most subjects in the high stochasticity (7/11) and the fixed reward (9/11) groups showed additional gains compared to their performance right after training. In the low stochasticity group, on the other hand, more subjects (13/24) showed evidence of forgetting (i.e., worse performance at 1 wk, compared to immediately after training). Overall, 1 wk after training, performance of the three training groups differed (F(2,43) = 8.71, P < 0.001), whereby the high stochasticity group performed significantly better relative to the low stochasticity (P < 0.0001) and the fixed reward groups (P < 0.001).

To evaluate the observed differences further, we also assessed the degree of online within-session gains (Fig. 2B), defined as the difference in skill between test and baseline (ΔTest–Baseline). Online gains were significantly different among the groups (F(2,43) = 4.668, P < 0.02), with the group that trained under high stochasticity showing significantly larger gains relative to the two other groups (P < 0.04 and P < 0.005, when compared to the low stochasticity and fixed reward groups, respectively; see also Supplemental Results). Next we further assessed long-term retention of skill (Fig. 2C), defined as the difference between skill at 1 wk and performance immediately post-training (ΔRetest2–Test). This analysis again showed differences among the training groups (F(2,43) = 4.466, P < 0.02), with the group training under high stochasticity showing better retention relative to the low stochasticity (P < 0.005) or to the fixed reward groups (P < 0.05).
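The two difference scores defined above, and the degrees of freedom of the reported F tests, can be reproduced with a minimal sketch. It assumes a standard mixed design (group as between-subject factor, testing session as within-subject factor) with no missing data. The skill scores in the usage example are made-up placeholders, not data from the study, and the equal 12/12 split of the two low-stochasticity groups is an assumption (only their total of 24 is reported, and only the total affects the result).

```python
def mixed_anova_df(group_sizes, n_sessions):
    """Degrees of freedom for a group x session mixed ANOVA (no missing data).

    Returns ((df_group, df_error_between), (df_interaction, df_error_within)).
    """
    n_groups = len(group_sizes)
    n_subjects = sum(group_sizes)
    between = (n_groups - 1, n_subjects - n_groups)
    interaction = ((n_groups - 1) * (n_sessions - 1),
                   (n_subjects - n_groups) * (n_sessions - 1))
    return between, interaction


def online_gain(baseline_skill, test_skill):
    """Online within-session gain: skill at Test minus skill at Baseline."""
    return test_skill - baseline_skill


def long_term_retention(test_skill, retest2_skill):
    """Long-term retention: skill at 1 wk (Retest2) minus skill at Test."""
    return retest2_skill - test_skill


# Low stochasticity (n = 24), high stochasticity (n = 11), fixed reward (n = 11),
# each tested in four sessions (baseline, test, 24 h, 7 d).
print(mixed_anova_df([24, 11, 11], n_sessions=4))

# The two low-stochasticity groups compared before pooling (12/12 split assumed).
print(mixed_anova_df([12, 12], n_sessions=4))

# Placeholder skill scores for one hypothetical subject.
baseline, test, retest2 = 1.0, 2.5, 3.0
print(online_gain(baseline, test), long_term_retention(test, retest2))
```

With group sizes of 24, 11, and 11, the sketch yields the (2, 43) and (6, 129) degrees of freedom reported for the between-group and interaction tests, and (3, 66) for the comparison of the two low-stochasticity groups.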

Figure 2.

Training-related skill changes. (A) Changes of skill along training. Skill (a metric expressing shifts in the speed–accuracy trade-off function) at baseline, immediately after training (Test), 24 h later, and 7 d post-training. (B) Online learning. Online within-session gains were assessed by subtracting baseline skill scores from those measured immediately after training (test). (C) Long-term retention. To assess long-term retention, skill scores measured immediately after training were subtracted from scores measured 1 wk after training ended. Error bars depict SEM. (Ls) low stochasticity, (Hs) high stochasticity, (FR) fixed reward.

The results reported here show that, contrary to what might be assumed from normative models of valuation (Bell et al. 1988), humans who learn new motor skills do not maximally benefit from training schedules where successful performance is continuously reinforced. Rather, training with high levels of stochastic reinforcement benefits skill learning more strongly, enhancing online within-session skill gains and further resulting in a stronger long-lasting memory trace of the acquired skill.

Sensorimotor control is carried out in the face of various sources of sensory, motor, and outcome uncertainties. Progress has been made recently in understanding how the brain controls movement facing inherent sensory and motor noise (Orban and Wolpert 2011). Yet, less is known about the behavioral consequences of outcome uncertainty and the possible strategies utilized to compensate for it, owing in part to a lack of relevant empirical data (Bach and Dolan 2012).

Earlier work established that removal of various forms of augmented extrinsic feedback about task success results in improved retention of skills (Schmidt et al. 1989; Winstein 1991). Augmented information feedback refers to the extrinsic feedback provided to the learner to support learning (Swinnen 1996). In the current experimental paradigm, visual and auditory performance feedback were provided in real time to allow subjects to perform the task accurately and did not differ across all training groups. Reinforcement, on the other hand, was provided probabilistically at the end of each trial, as an incentive for successful performance, independently from augmented information feedback. Important differences exist with respect to intermittent delivery of feedback and reinforcement (Swinnen 1996). Whereas removal of augmented information feedback affects retention but not learning (Schmidt et al. 1989; Winstein 1991), the current results documented enhancing effects of stochastic reinforcement on both learning and retention, further demonstrating the difference between how reinforcement and feedback guide skill acquisition.

Reinforcement-mediated motor learning has been shown in a variety of paradigms (Fischer and Born 2009; Wachter et al. 2009; Abe et al. 2011; Huang et al. 2011; Izawa and Shadmehr 2011; Shmuelof et al. 2012). Although it is still unresolved what constitutes a reward signal for the motor system (Wolpert et al. 2011), an emerging framework suggests that an underlying objective of voluntary movement is to achieve more valuable states (Shadmehr and Krakauer 2008). Along these lines, motor learning can be conceptualized as a process of optimization, where motor commands that minimize costs and maximize reward are shaped (Shadmehr and Krakauer 2008). To account for the findings reported here, a mechanistic framework for reinforcement-mediated motor learning necessitates valuation systems that can integrate variables such as probability and magnitude to drive learning, of the sort suggested by models of reinforcement learning (Kaelbling et al. 1996; Sutton and Barto 1998).

A range of information processing systems, both artificial and organic, appear to benefit from stochastic biological noise (McDonnell and Ward 2011), possibly by improving the system's overall signal-to-noise ratio. The beneficial effects of stochastic reinforcement may also stem from the influence these schedules and the uncertainty associated with them have on the saliency and the amount of attention directed toward reward-predicting cues (Pearce and Hall 1980; Dayan et al. 2000; Esber and Haselgrove 2011). Thus, training with stochastic reinforcement may render learning agents more susceptible to gain from training by allocation of more attentional and cognitive control resources during learning. The stochastic reinforcement schedules used in our experimental design may induce what has been referred to as “expected uncertainty,” which results from a known unreliability of predictive relationships within a familiar environment (Yu and Dayan 2005). Previous modeling work linked expected uncertainty with faster learning (Yu and Dayan 2003), providing a possible mechanism for the within-session learning gains reported here.

Other earlier findings based on animal models established that intermittent reinforcement schedules applied during operant conditioning are associated with greater resistance to extinction, expressed in an increased response rate (e.g., lever presses) during this phase, but no effects on acquisition (Humphreys 1939; Jenkins and Stanley 1950). It was also shown that the increased response rate varies monotonically with the thinning of the reinforcement schedule (Gallistel and Gibbon 2000). The current results challenge these earlier accounts, showing effects during acquisition which do not vary monotonically with reward probability and are followed by a stronger memory trace. More generally, our results suggest that stochastic reinforcement also exerts effects on higher forms of learning such as the acquisition of complex novel visuomotor skills in humans, where successful task performance requires a skillful combination of speed and accuracy.
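The attentional account mentioned above (Pearce and Hall 1980) can be made concrete with a toy simulation. The sketch below is an illustrative assumption, not the model used or tested in this study: it applies a hybrid Pearce-Hall-style rule in which the associability of a cue tracks the recent absolute prediction error, and it treats every trial as successful so that reward depends only on the schedule's probability. The parameter values (`kappa`, `gamma`, the number of trials) and the function name are arbitrary choices made for illustration.

```python
import random


def mean_associability(p_reward, n_trials=2000, kappa=0.3, gamma=0.1, seed=0):
    """Average late-training associability under a Pearce-Hall-style rule.

    value: current reward prediction for the cue (starts at 0).
    alpha: associability ("attention" to the cue), a running average of |delta|.
    """
    rng = random.Random(seed)
    value, alpha = 0.0, 1.0
    late_alphas = []
    for trial in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        delta = reward - value                      # prediction error
        value += kappa * alpha * delta              # learning gated by associability
        alpha = gamma * abs(delta) + (1.0 - gamma) * alpha
        if trial >= n_trials // 2:                  # keep only the second half
            late_alphas.append(alpha)
    return sum(late_alphas) / len(late_alphas)


for p in (0.25, 0.5, 0.75, 1.0):
    print(f"P = {p:.2f}  mean late associability = {mean_associability(p):.3f}")
```

In this toy model, associability stays highest when P = 0.5, is intermediate for P = 0.25 and P = 0.75, and decays toward zero when reward is fully predictable (P = 1), which parallels the non-monotonic dependence on reward probability discussed above. It says nothing about skill performance itself.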

References (26 in total; first 10 listed)

Review 1. Reorganization and plasticity in the adult brain during learning of motor skills.

Authors: Julien Doyon; Habib Benali
Journal: Curr Opin Neurobiol Date: 2005-04 Impact factor: 6.627

Review 2. A computational neuroanatomy for motor control.

Authors: Reza Shadmehr; John W Krakauer
Journal: Exp Brain Res Date: 2008-02-05 Impact factor: 1.972

3. Partial reinforcement: a review and critique.

Authors: W O Jenkins; J C Stanley
Journal: Psychol Bull Date: 1950-05 Impact factor: 17.737

Review 4. Risky business: the neuroeconomics of decision making under uncertainty.

Authors: Michael L Platt; Scott A Huettel
Journal: Nat Neurosci Date: 2008-03-26 Impact factor: 24.884

5. Anticipated reward enhances offline learning during sleep.

Authors: Stefan Fischer; Jan Born
Journal: J Exp Psychol Learn Mem Cogn Date: 2009-11 Impact factor: 3.051

6. Rethinking motor learning and savings in adaptation paradigms: model-free memory for successful actions combines with internal models.

Authors: Vincent S Huang; Adrian Haith; Pietro Mazzoni; John W Krakauer
Journal: Neuron Date: 2011-05-26 Impact factor: 17.173

7. Reward improves long-term retention of a motor memory through induction of offline memory gains.

Authors: Mitsunari Abe; Heidi Schambra; Eric M Wassermann; Dave Luckenbaugh; Nicolas Schweighofer; Leonardo G Cohen
Journal: Curr Biol Date: 2011-03-17 Impact factor: 10.834

8. Uncertainty, neuromodulation, and attention.

Authors: Angela J Yu; Peter Dayan
Journal: Neuron Date: 2005-05-19 Impact factor: 17.173

9. Differential effect of reward and punishment on procedural learning.

Authors: Tobias Wächter; Ovidiu V Lungu; Tao Liu; Daniel T Willingham; James Ashe
Journal: J Neurosci Date: 2009-01-14 Impact factor: 6.167

10. Explicit neural signals reflecting reward uncertainty.

Authors: Wolfram Schultz; Kerstin Preuschoff; Colin Camerer; Ming Hsu; Christopher D Fiorillo; Philippe N Tobler; Peter Bossaerts
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2008-12-12 Impact factor: 6.237
