Henning Schroll1,2,3, Andreas Horn4, Joachim Runge5, Axel Lipp4, Gerd-Helge Schneider6, Joachim K Krauss5, Fred H Hamker7, Andrea A Kühn8.
Abstract
We set out to investigate whether beta oscillations in the human basal ganglia are modulated during reinforcement learning. Based on previous research, we assumed that beta activity might reflect either the magnitudes of the reinforcements individuals received (reinforcement hypothesis), their reinforcement prediction errors (dopamine hypothesis), or their tendencies to repeat versus adapt responses based upon reinforcements (status-quo hypothesis). We tested these hypotheses by recording local field potentials (LFPs) from the subthalamic nuclei of 19 Parkinson's disease patients engaged in a reinforcement-learning paradigm. We then correlated patients' reinforcement magnitudes, reinforcement prediction errors and response repetition tendencies with task-related power changes in their LFP oscillations. During feedback presentation, activity in the frequency range of 14 to 27 Hz (beta spectrum) correlated positively with reinforcement magnitudes. During responding, alpha and low beta activity (6 to 18 Hz) was negatively correlated with previous reinforcement magnitudes. Reinforcement prediction errors and response repetition tendencies did not correlate significantly with LFP oscillations. These results suggest that alpha and beta oscillations during reinforcement learning reflect patients' observed reinforcement magnitudes, rather than their reinforcement prediction errors or their tendencies to repeat versus adapt their responses, arguing against both an involvement of phasic dopamine and the applicability of the status-quo theory.
Year: 2018 PMID: 29872162 PMCID: PMC5988736 DOI: 10.1038/s41598-018-26887-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Reinforcement-based learning paradigm. (A) Trial setup. At the beginning of each trial, a red fixation square was presented to patients until the joystick had not been moved for at least 1,500 ms. This time period served as the baseline in all analyses. The fixation square then turned green, prompting patients to decide on a response and move the joystick accordingly. 500 ms after the decision, the red square was presented again until the joystick had not been moved for 1,500 ms (to keep the upcoming period of feedback presentation uncontaminated by post-movement artefacts). Afterwards, the feedback stimulus (a number between 0 and 10) was presented for 1,200 ms, followed by an inter-trial interval. (B) Feedback probability curves. Each movement direction was mapped onto a Gaussian feedback probability curve that defined the likelihood of different reinforcement magnitudes. Mappings between responses and probability curves remained constant for an average of 20 trials (SD: 3).
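To make the feedback schedule of panel B concrete, the following Python sketch simulates such a paradigm. It is only an illustration: the number of joystick directions, the means and width of the Gaussian curves and all function names are hypothetical; only the 0 to 10 feedback range and the block length of about 20 ± 3 trials come from the caption.

```python
import numpy as np

# Hypothetical settings: the caption does not report the number of joystick
# directions or the parameters of the Gaussian feedback curves.
N_DIRECTIONS = 4
CURVE_MEANS = (2.0, 4.0, 6.0, 8.0)  # one curve per direction (assumed)
CURVE_SD = 1.5                      # assumed width of each curve
MAGNITUDES = np.arange(0, 11)       # feedback is an integer from 0 to 10


def feedback_probabilities(mean, sd=CURVE_SD):
    """Discretise a Gaussian curve over the possible feedback values 0..10."""
    weights = np.exp(-0.5 * ((MAGNITUDES - mean) / sd) ** 2)
    return weights / weights.sum()


def block_lengths(n_trials, rng, mean_len=20.0, sd_len=3.0):
    """Split n_trials into blocks of roughly 20 +/- 3 trials with a fixed mapping."""
    lengths = []
    while sum(lengths) < n_trials:
        lengths.append(max(1, int(round(rng.normal(mean_len, sd_len)))))
    lengths[-1] -= sum(lengths) - n_trials  # trim the last block so totals match
    return lengths


def simulate_schedule(n_trials, seed=0):
    """Return, per trial, an array mapping each direction to a curve index."""
    rng = np.random.default_rng(seed)
    schedule = []
    for length in block_lengths(n_trials, rng):
        mapping = rng.permutation(N_DIRECTIONS)  # reshuffle at each block change
        schedule.extend([mapping] * length)
    return schedule


def draw_feedback(direction, mapping, rng):
    """Sample a reinforcement magnitude for the chosen joystick direction."""
    probs = feedback_probabilities(CURVE_MEANS[mapping[direction]])
    return int(rng.choice(MAGNITUDES, p=probs))


# Usage: feedback for choosing direction 0 on the first simulated trial.
rng = np.random.default_rng(1)
schedule = simulate_schedule(100)
print(draw_feedback(direction=0, mapping=schedule[0], rng=rng))
```

Because `rng.permutation` can occasionally reproduce the previous mapping, a real schedule would presumably reject such draws so that every block change yields a novel mapping.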
Figure 2. Behavioral results. (A) Learning progress. Average reinforcement magnitudes are shown across trials for constant response-reinforcement mappings. Trial #1 corresponds to the first trial after a novel response-reinforcement mapping became valid. (B) Response choices depend on reinforcements (Pearson’s r = 0.90, p < 0.0001): for different reinforcement magnitudes, the probabilities of repeating the previous trial’s response are shown. Feedback magnitudes below 2 and above 8 are included in the bins of 2 and 8, respectively. (C) Response latencies do not significantly depend on reinforcements (r = −0.14, p = 0.09). Response latencies are shown for trials following different reinforcement magnitudes. (D) Response durations do not significantly depend on reinforcements (r = −0.13, p = 0.22). Response durations are shown for trials following different reinforcement magnitudes. Error bars represent SEMs in all sub-plots, computed across patients.
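The analysis behind panel B can be sketched as follows. This is an illustration under assumptions: it pools the trials of a single patient, the function names are hypothetical, and correlating bin values with repeat probabilities via Pearson's r is a guess at how the reported r was obtained (the caption states only that SEMs were computed across patients).

```python
import numpy as np


def repeat_probability_by_feedback(responses, feedback, low=2, high=8):
    """P(repeat the previous response) as a function of the previous feedback.

    Feedback below `low` or above `high` is pooled into the edge bins, as in
    panel B. Bins without any trials return NaN.
    """
    responses = np.asarray(responses)
    feedback = np.clip(np.asarray(feedback), low, high)
    repeated = responses[1:] == responses[:-1]  # did trial t+1 repeat trial t?
    prev_feedback = feedback[:-1]               # feedback received on trial t
    bins = np.arange(low, high + 1)
    probs = np.array([
        repeated[prev_feedback == b].mean() if np.any(prev_feedback == b) else np.nan
        for b in bins
    ])
    return bins, probs


def bin_correlation(bins, probs):
    """Pearson r between bin value and repeat probability, ignoring empty bins."""
    ok = ~np.isnan(probs)
    return np.corrcoef(bins[ok], probs[ok])[0, 1]
```

For example, `repeat_probability_by_feedback([0, 0, 1, 1, 1, 2], [9, 1, 5, 5, 3, 4])` pools the feedback of 9 into bin 8 and the feedback of 1 into bin 2 before computing the per-bin repeat rates.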
Figure 3. Grand-average time-frequency plots show task-related changes in oscillatory power relative to the baseline interval. Black lines show the borders of significant clusters of power changes as determined with the method of Maris and Oostenveld[18]. (A) Response-locked data. Time point zero denotes response onset. (B) Feedback-locked data. Time point zero denotes feedback onset; feedback presentation terminated at 1.2 s.
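The captions do not specify how time-frequency power or the baseline correction was computed. As one common possibility, the sketch below uses complex Morlet wavelets (7 cycles, an assumed value) and expresses power as percentage change from the mean power of the baseline period described in Figure 1; both choices and the function names are assumptions, not the study's documented pipeline.

```python
import numpy as np


def morlet_power(signal, fs, freqs, n_cycles=7):
    """Time-frequency power via complex Morlet wavelet convolution.

    The signal must be longer than the longest wavelet (lowest frequency).
    Returns an array of shape (len(freqs), len(signal)).
    """
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)  # temporal SD of the Gaussian envelope
        t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalisation
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power


def percent_change_from_baseline(power, baseline_power):
    """Express power as percentage change from the mean baseline power per frequency.

    power: (n_freqs, n_times) for the analysis window;
    baseline_power: (n_freqs, n_baseline_times) from the pre-trial baseline.
    """
    baseline_mean = baseline_power.mean(axis=1, keepdims=True)
    return 100.0 * (power - baseline_mean) / baseline_mean
```

Averaging such percentage-change maps across trials and then across patients would yield grand averages of the kind shown in the figure; decibel or z-score normalisation are equally common alternatives.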
Figure 4. Reinforcement magnitudes modulate beta activity during feedback presentation. (A) Pearson’s correlation coefficients between reinforcement magnitudes and task-related changes in LFP power are shown across time-frequency space. Time point zero corresponds to feedback onset, time point 1.2 s to feedback offset. The frequency range spans 5 to 80 Hz. Correlations were tested for significance with the cluster-based approach described by Maris and Oostenveld[18]. The borders of a significant cluster are highlighted by a black line. (B) Average LFP power changes relative to baseline within the significant cluster of panel A are shown for different reinforcement magnitudes.
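For readers unfamiliar with the cluster-based approach of Maris and Oostenveld, here is a compact Python sketch applied to a correlation map. Several details are assumptions rather than the study's reported settings: trial-wise correlations within one recording, a cluster-forming threshold of |t| > 2, 4-connectivity in time-frequency space, the sum of t values as cluster statistic and label shuffling as the permutation scheme. Established toolboxes such as FieldTrip or MNE-Python provide tested implementations.

```python
from collections import deque

import numpy as np


def correlation_map(power, values):
    """Pearson r between per-trial `values` and power at every time-frequency point.

    power: array (n_trials, n_freqs, n_times); values: array (n_trials,).
    """
    x = values - values.mean()
    y = power - power.mean(axis=0)
    return np.tensordot(x, y, axes=(0, 0)) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))


def r_to_t(r, n):
    """Convert correlation coefficients to t statistics with n - 2 degrees of freedom."""
    return r * np.sqrt((n - 2) / np.clip(1 - r ** 2, 1e-12, None))


def find_clusters(mask):
    """Return lists of (row, col) points forming 4-connected clusters in a 2-D mask."""
    seen = np.zeros(mask.shape, dtype=bool)
    clusters = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        members, queue = [], deque([start])
        while queue:  # breadth-first flood fill
            i, j = queue.popleft()
            members.append((i, j))
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and not seen[ni, nj]):
                    seen[ni, nj] = True
                    queue.append((ni, nj))
        clusters.append(members)
    return clusters


def cluster_masses(t_map, t_crit):
    """Sum of t values in each supra-threshold cluster, positive and negative separately."""
    result = []
    for sign in (1, -1):
        for members in find_clusters(sign * t_map > t_crit):
            rows, cols = zip(*members)
            result.append((float(t_map[list(rows), list(cols)].sum()), members))
    return result


def cluster_permutation_test(power, values, t_crit=2.0, n_perm=1000, seed=0):
    """Cluster-based permutation test of a correlation map (Maris & Oostenveld style)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    observed = cluster_masses(r_to_t(correlation_map(power, values), n), t_crit)
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = rng.permutation(values)  # break the values-power pairing
        masses = cluster_masses(r_to_t(correlation_map(power, shuffled), n), t_crit)
        null_max[k] = max((abs(m) for m, _ in masses), default=0.0)
    # p-value: share of permutations whose largest cluster is at least as extreme
    return [(mass, (np.sum(null_max >= abs(mass)) + 1) / (n_perm + 1), members)
            for mass, members in observed]
```

Comparing each observed cluster against the distribution of the largest permuted cluster is what controls the family-wise error rate across the whole time-frequency map, instead of testing each point separately.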
Figure 5. Reinforcement prediction errors do not modulate STN oscillations. (A) Pearson’s correlation coefficients between reinforcement prediction errors and feedback-locked changes in LFP power are shown across time-frequency space. Time point zero corresponds to feedback onset, time point 1.2 s to feedback offset. (B) Correlations between reinforcement prediction errors and response-locked changes in LFP power are shown across time-frequency space. The frequency range spans 5 to 80 Hz. Correlations were tested for significance with the cluster-based approach described by Maris and Oostenveld[18].
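The captions do not state which learning model produced the reinforcement prediction errors. A common choice is a delta-rule (Rescorla-Wagner) learner, sketched below; the learning rate `alpha`, the initial expectation `v0` and the number of response options are hypothetical values. The resulting trial-wise errors would take the place of reinforcement magnitudes in the time-frequency correlation maps.

```python
import numpy as np


def prediction_errors(choices, feedback, n_options=4, alpha=0.3, v0=5.0):
    """Reinforcement prediction errors from a simple delta-rule (Rescorla-Wagner) learner.

    choices: index of the chosen response per trial; feedback: reinforcement magnitude.
    alpha (learning rate), v0 (initial expectation) and n_options are assumptions.
    """
    values = np.full(n_options, v0, dtype=float)
    pes = np.empty(len(choices))
    for t, (choice, reward) in enumerate(zip(choices, feedback)):
        pes[t] = reward - values[choice]  # outcome minus expected outcome
        values[choice] += alpha * pes[t]  # move the expectation towards the outcome
    return pes
```

With `alpha=0.5` and `v0=0.0`, two consecutive rewards of 10 for the same option give prediction errors of 10 and then 5, because the expectation has already moved halfway towards the outcome.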
Figure 6. Feedback- and response-locked LFPs do not differ between switch and repeat trials. (A) For feedback-locked LFPs, average time-frequency maps are shown for all trials in which patients changed versus repeated their previous trials’ responses, together with the difference map. Time point zero corresponds to feedback onset, time point 1.2 s to feedback offset. (B) For response-locked LFPs, average time-frequency maps are shown for all trials in which patients changed versus repeated their previous responses, together with the difference map. Time point zero corresponds to response onset. Differences were tested for significance with the cluster-based approach described by Maris and Oostenveld[18].
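The switch/repeat split can be sketched as below. The function name is hypothetical, and the caption does not say how feedback epochs are aligned with the following decision; this sketch simply labels each epoch by whether its own trial's response differed from the previous trial's.

```python
import numpy as np


def switch_repeat_maps(power, responses):
    """Average time-frequency maps for switch vs repeat trials and their difference.

    power: array (n_trials, n_freqs, n_times); responses: chosen direction per trial.
    A trial counts as a switch when its response differs from the previous trial's;
    the first trial has no predecessor and is dropped.
    """
    responses = np.asarray(responses)
    is_switch = responses[1:] != responses[:-1]
    trials = np.asarray(power)[1:]
    switch_map = trials[is_switch].mean(axis=0)
    repeat_map = trials[~is_switch].mean(axis=0)
    return switch_map, repeat_map, switch_map - repeat_map
```

The difference map would then be submitted to a cluster-based permutation test, with switch/repeat labels shuffled instead of reinforcement values.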
Figure 7. Reinforcement magnitudes of a given trial modulate alpha/low beta activity during the next trial’s joystick movement. (A) Pearson’s correlation coefficients between reinforcement magnitudes and task-related changes in LFP power are shown across time-frequency space. Time point zero corresponds to response onset; the average response duration is depicted by a vertical line at 669 ms. The frequency range spans 5 to 80 Hz. Correlations were tested for significance with the cluster-based approach described by Maris and Oostenveld[18]. The borders of a significant cluster are highlighted by a black line. (B) Average LFP power changes relative to baseline within the significant cluster of panel A are shown for different reinforcement magnitudes.
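The key detail in this figure is the one-trial lag: feedback from trial t is paired with movement-related power on trial t+1. A minimal sketch of that pairing follows; the function name and the array layout are assumptions.

```python
import numpy as np


def next_trial_correlation(feedback, response_power):
    """Correlate feedback on trial t with response-locked power on trial t+1.

    feedback: array (n_trials,); response_power: array (n_trials, n_freqs, n_times).
    Returns Pearson r at every time-frequency point.
    """
    x = np.asarray(feedback, dtype=float)[:-1]       # feedback received on trial t
    y = np.asarray(response_power, dtype=float)[1:]  # movement-related power on trial t+1
    x = x - x.mean()
    y = y - y.mean(axis=0)
    return np.tensordot(x, y, axes=(0, 0)) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
```

Dropping the last feedback value and the first power epoch keeps the two arrays aligned. A negative r, as in the significant cluster reported here, means that higher reinforcement on one trial goes with lower alpha/low beta power during the next movement.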
Overview of clinical data.
| Patient no. (sex) | Operating center | Age (years) | Onset age (years) | Preoperative medication | Postoperative medication | Preoperative UPDRS III motor score, OFF med. | Preoperative UPDRS III motor score, ON med. | Contact pairs excluded from LFP analysis | Stimulation settings |
|---|---|---|---|---|---|---|---|---|---|
| 1 (m) | H | 66 | 50 | l-dopa 600 mg/d, pramipexole 1.75 mg/d, rotigotine 4 mg/d, bornaprine 12 mg/d, opipramol 100 mg/d, rasagiline | rotigotine 4 mg/d, bornaprine 6 mg/d, rasagiline | 27 | 8 | none | 1–1.9 V; 11–2.2 V; 130 Hz; 60 µs |
| 2 (m) | H | 69 | 58 | l-dopa 550 mg/d, pramipexole 2.1 mg/d, amantadine 200 mg/d, domperidone 20 mg/d | pramipexole ret. 2.1 mg/d, domperidone 20 mg/d | 35 | 25 | none | 1–1.6 V; 9–3.0 V; 180 Hz; 60 µs |
| 3 (m) | H | 66 | 59 | l-dopa 850 mg/d, pramipexole 2.1 mg/d, clozapine 25 mg/d | l-dopa 625 mg/d | 39 | 30 | none | 2–3.6 V; 9–3.6 V; 130 Hz; 60 µs |
| 4 (m) | H | 37 | 28 | l-dopa 800 mg/d, apomorphine 124 mg/d, rasagiline | l-dopa 500 mg/d, rasagiline | 37 | 44 | none | 3–4.3 V; 11–2.3 V; 130 Hz; 60 µs |
| 5 (m) | H | 58 | 51 | pramipexole ret. 3.15 mg/d | pramipexole 1.575 mg/d | 24 | 8 | none | 2–2.3 V; 10-/11–2.6 V; 130 Hz; 60 µs |
| 6 (m) | B | 69 | 44 | l-dopa 150 mg/d, pramipexole ret. 4.2 mg/d, amantadine 400 mg/d | l-dopa 375 mg/d, amantadine 400 mg/d | 26 | 13 | none | 2+/3–3.7 V; 10+/11–3.0 V; 130 Hz; 60 µs |
| 7 (f) | B | 47 | 40 | l-dopa 600 mg/d, ropinirole 16 mg/d, piribedil 150 mg/d, tolcapone 300 mg/d, rasagiline | n.d. | n.d. | 18 | L12 | 2–2.0 V; 10–1.6 V; 130 Hz; 60 µs |
| 8 (m) | B | 50 | 43 | l-dopa 400 mg/d, pramipexole ret. 3.15 mg/d, rasagiline | pramipexole 2.1 mg/d, rasagiline | 27 | 7 | none | 1–2.1 V; 9–2.5 V; 140 Hz; 60 µs |
| 9 (f) | B | 58 | 50 | l-dopa 400 mg/d, pramipexole 2.8 mg/d | pramipexole ret. 0.52 mg/d | 34 | 24 | none | 2–1.0 V; 10–2.5 V; 130 Hz; 60 µs |
| 10 (m) | B | 54 | 47 | l-dopa 1200 mg/d | no l-dopa | 14 | 3 | L01, R01, R12, R23 | 2–4.5 V; 130 Hz; 60 µs |
| 11 (m) | B | 56 | 38 | l-dopa 400 mg/d, amantadine 450 mg/d, rasagiline | l-dopa 300 mg/d, amantadine 450 mg/d, rasagiline | 41 | 20 | none | 1–3.2 V; 9–2.0 V; 130 Hz; 60 µs |
| 12 (m) | H | 53 | 44 | l-dopa 500 mg | l-dopa 187.5 mg/d | 21 | 13 | L23 | 2–0.5 V; 10–1.1 V; 130 Hz; 60 µs |
| 13 (f) | H | 52 | 41 | l-dopa 900 mg/d, ropinirole 16 mg/d | l-dopa 400 mg/d | 26 | 18 | none | 2–1.2 V; 10–2.2 V; 130 Hz; 60 µs |
| 14 (f) | B | 66 | 55 | l-dopa 425 mg/d, pramipexole 4.2 mg/d, rasagiline | pramipexole ret. 3.15 mg/d | 34 | 11 | none | 2–1.4 V; 10–1.6 V; 130 Hz; 60 µs |
| 15 (m) | B | 62 | 53 | l-dopa 900 mg/d, tolcapone 600 mg/d, rotigotine 4 mg/d | l-dopa 500 mg/d | 31 | 14 | none | 0–2.0 V; 8–2.0 V; 130 Hz; 60 µs |
| 16 (m) | B | 72 | 57 | l-dopa 1050 mg/d, piribedil 150 mg/d | l-dopa 500 mg/d, piribedil 50 mg/d | 59 | 43 | R12, R23 | 2–2.6 V; 10–2.2 V; 130 Hz; 60 µs |
| 17 (m) | B | 68 | n.d. | l-dopa 700 mg/d, ropinirole 16 mg/d, amantadine 200 mg/d, rasagiline 1 mg/d | l-dopa 500 mg/d, amantadine 200 mg/d, rasagiline 1 mg/d | 37 | 27 | none | 1–2.8 V; 9–2.0 V; 130 Hz; 60 µs |
| 18 (m) | B | 64 | 55 | l-dopa 400 mg/d, pramipexole ret. 3.15 mg/d, amantadine 300 mg/d, rasagiline | l-dopa 300 mg/d, pramipexole ret. 0.52, amantadine 100 mg/d, rasagiline | 43 | 20 | none | 1–2.7 V; 9–3.4 V; 110 Hz; 60 µs |
| 19 (m) | B | 67 | 63 | l-dopa 1300 mg/d, rotigotine 6 mg/d | l-dopa 800 mg/d, rotigotine 6 mg/d | 26 | 17 | none | 1–2.7 V; 9–2.2 V; 130 Hz; 60 µs |
Figure 8. Spatially reconstructed positions of all contact pairs from which STN LFPs were recorded bipolarly. For orientation, the motor, associative and limbic parts of the STN, as defined in the atlas by Accola et al.[40], are shown in red, cyan and yellow, respectively.