Ashraf Mahmud1, Petio Petrov1, Guillem R Esber2, Mihaela D Iordanova3.
Abstract
Temporal-difference (TD) learning models afford the neuroscientist a theory-driven roadmap in the quest for the neural mechanisms of reinforcement learning. The application of these models to understanding the role of phasic midbrain dopaminergic responses in reward prediction learning constitutes one of the greatest success stories in behavioural and cognitive neuroscience. Critically, the classic learning paradigms associated with TD are poorly suited to cast light on its neural implementation, thus hampering progress. Here, we present a serial blocking paradigm in rodents that overcomes these limitations and allows for the simultaneous investigation of two cardinal TD tenets; namely, that learning depends on the computation of a prediction error, and that reinforcing value, whether intrinsic or acquired, propagates back to the onset of the earliest reliable predictor. The implications of this paradigm for the neural exploration of TD mechanisms are highlighted.
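The two TD tenets named in the abstract can be illustrated with a minimal sketch. It assumes a tabular TD(0) learner on a fixed two-cue chain (CS1, then CS2, then the outcome); it is not the authors' model, and the function name `td0_serial_chain` and all parameter values (`alpha`, `gamma`, `reward`, `n_trials`) are hypothetical choices made for illustration only.

```python
def td0_serial_chain(n_trials=200, alpha=0.1, gamma=0.9, reward=1.0):
    """Tabular TD(0) on a fixed serial chain: CS1 -> CS2 -> outcome.

    Illustrative assumptions (not taken from the paper): one state per cue,
    a single outcome of magnitude `reward` at the end of CS2, and a
    terminal value of 0 after the outcome.
    """
    values = {"CS1": 0.0, "CS2": 0.0}
    outcome_errors = []  # prediction error at outcome delivery, per trial

    for _ in range(n_trials):
        # CS1 -> CS2: no outcome yet, so the target is the discounted
        # value of the next cue (acquired value acting as the reinforcer).
        delta_cs1 = 0.0 + gamma * values["CS2"] - values["CS1"]
        values["CS1"] += alpha * delta_cs1

        # CS2 -> end of trial: the outcome arrives; terminal value is 0.
        delta_cs2 = reward - values["CS2"]
        values["CS2"] += alpha * delta_cs2
        outcome_errors.append(delta_cs2)

    return values, outcome_errors


if __name__ == "__main__":
    values, outcome_errors = td0_serial_chain()
    print("V(CS1) =", round(values["CS1"], 3))
    print("V(CS2) =", round(values["CS2"], 3))
    print("first outcome error =", outcome_errors[0])
    print("last outcome error  =", outcome_errors[-1])
```

In this sketch the prediction error at the outcome shrinks toward zero as CS2 comes to predict it, which is the condition under which a TD account expects little further learning about a newly added cue. Value also reaches CS1 only through CS2, so the earlier cue acquires value later and settles at the discounted value of the cue that follows it.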
Year: 2019 PMID: 30979910 PMCID: PMC6461709 DOI: 10.1038/s41598-019-42244-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Serial cue presentation provides an effective blocking examination. (A) Increase in freezing levels in Groups Block Serial and Block Simultaneous during Conditioning. Fear to the pre-trained cue in Group Simultaneous was greater than that of Group Serial. (B) Simultaneous compound conditioning in Groups Control Simultaneous and Block Simultaneous revealed an increase in freezing on Day 1 and a maintenance of this high level of fear on Day 2. (C) Serial compound conditioning in Groups Control Serial and Block Serial revealed a blocking effect from the second trial of Day 1, which was then maintained on Day 2 with Group Control showing consistently higher levels of freezing to the clicker compared to Group Block (F(1,22) = 11.8, CI [0.46, 1.85]). (D) Freezing to the clicker was lower in the blocking groups compared to the control groups and the effect of blocking was similar between the compound and serial groups. Freezing to the pre-trained light was greater in Group Block Simultaneous compared to Control Simultaneous, but the direction of this difference was reversed for the serial groups at least on the first trial (inset).
Figure 2. Second-order conditioning in a serial compound procedure. (A) Fear to CS1 increased faster in Group Serial compared to Groups Single and Compound on Day 1. On subsequent days of conditioning, fear in Group Serial remained higher compared to the other two groups individually (Day 2: Single F(1,26) = 14.14, CI [0.42, 1.82]; Compound F(1,26) = 16.62, CI [0.52, 1.91]; Day 3: Single F(1,26) = 16.28, CI [0.53, 1.99]; Compound F(1,26) = 29.98, CI [0.96, 2.39]), steady (Day 2: F(1,26) = 3.43, CI [−0.04, 0.80]; Day 3: F < 1, [−0.37, 0.44]) with no interactions (Day 2: Serial vs. Single F(1,26) = 1.16, CI [−0.64, 1.72]; Serial vs. Compound F(1,26) = 2.47, CI [−0.39, 1.97]; Day 3: Serial vs. Single F < 1, [−0.92, 1.40]; Serial vs. Compound F(1,26) = 2.50, CI [−0.37, 1.90]). (B) Fear to CS2 in Group Serial and the equivalent temporal interval post CS1 in Group Single increased on the first day but did not differ between the groups on any of the days (see main text for Day 1). There were no differences between the groups (Day 2: F(1,18) = 1.04, CI [−0.35, 1.04]; Day 3: F(1,18) = 1.67, CI [−0.25, 1.09]), no linear trend across trials (Day 2: F < 1, [−0.54, 0.40]; Day 3: F(1,18) = 2.50, CI [−0.77, 0.10]) and no interaction (Day 2: F(1,18) = 2.94, CI [−1.71, 0.16]; Day 3: F < 1, [−0.56, 1.18]). (C) Freezing to the cue during non-reinforced tests shows that fear to CS1 was greater in Group Serial compared to Groups Single and Compound, and freezing to CS2 was greater in Group Serial compared to Group Compound. (D) The overall levels of freezing to CS2 during conditioning were correlated with freezing to CS1 during test.