Cognitive and neural mechanisms underlying the mnemonic effect of songs after stroke.

Vera Leo¹, Aleksi J Sihvonen², Tanja Linnavalli¹, Mari Tervaniemi³, Matti Laine⁴, Seppo Soinila⁵, Teppo Särkämö⁶.

Abstract

Sung melody provides a mnemonic cue that can enhance the acquisition of novel verbal material in healthy subjects. Recent evidence suggests that also stroke patients, especially those with mild aphasia, can learn and recall novel narrative stories better when they are presented in sung than spoken format. Extending this finding, the present study explored the cognitive mechanisms underlying this effect by determining whether learning and recall of novel sung vs. spoken stories show a differential pattern of serial position effects (SPEs) and chunking effects in non-aphasic and aphasic stroke patients (N = 31) studied 6 months post-stroke. The structural neural correlates of these effects were also explored using voxel-based morphometry (VBM) and deterministic tractography (DT) analyses of structural MRI data. Non-aphasic patients showed more stable recall with reduced SPEs in the sung than spoken task, which was coupled with greater volume and integrity (indicated by fractional anisotropy, FA) of the left arcuate fasciculus. In contrast, compared to non-aphasic patients, the aphasic patients showed a larger recency effect (better recall of the last vs. middle part of the story) and enhanced chunking (larger units of correctly recalled consecutive items) in the sung than spoken task. In aphasics, the enhanced chunking and better recall on the middle verse in the sung vs. spoken task correlated also with better ability to perceive emotional prosody in speech. Neurally, the sung > spoken recency effect in aphasic patients was coupled with greater grey matter volume in a bilateral network of temporal, frontal, and parietal regions and also greater volume of the right inferior fronto-occipital fasciculus (IFOF). These results provide novel cognitive and neurobiological insight on how a repetitive sung melody can function as a verbal mnemonic aid after stroke.

Keywords: Aphasia; Chunking; Serial position effect; Singing; Stroke; Verbal memory

Year: 2019; PMID: 31419766; PMCID: PMC6706631; DOI: 10.1016/j.nicl.2019.101948

Source DB: PubMed; Journal: Neuroimage Clin; ISSN: 2213-1582; Impact factor: 4.881

Introduction

The role of music as a learning tool in enhancing verbal memory and recall has been a focus of scientific interest and debate for decades. Musical mnemonics refers to the idea that music, especially singing, may act as a mnemonic or learning aid, providing a structured temporal scaffolding framework that facilitates word learning (Ferreri and Verga, 2016). Experimental studies using songs have shown that hearing the melody of a well-known song can cue the retrieval of its lyrics (Rubin, 1977) and that lyrics are effectively paired with melody also when learning unfamiliar songs (Crowder et al., 1990; Samson and Zatorre, 1991). Also studies comparing memory for novel verbal material presented in sung or spoken formats have reported better learning and/or delayed recall of sung vs. spoken material (Calvert and Tart, 1993; Ludke et al., 2014; McElhinney and Annett, 1996; Rainey and Larsen, 2002; Tamminen et al., 2017; Wallace, 1994), also in the context of implicit (statistical) learning (Bosseler et al., 2016; Schön et al., 2008), although some studies have found no advantage when directly comparing the two presentation forms (Racette and Peretz, 2007; Thaut et al., 2005) or when controlling for differences in presentation rate (Kilgour et al., 2000). In addition to healthy subjects, the mnemonic effects of songs have been explored also in persons with memory impairment caused by a neurological illness. An advantage of sung material over spoken material has previously been observed on the learning and recall of word lists in multiple sclerosis (MS) patients (Thaut et al., 2014) as well as in the recall of unfamiliar song lyrics or texts in Alzheimer's disease (AD) patients (Moussard et al., 2014; Palisson et al., 2015; Simmons-Stern et al., 2010) and in amnesic patients (Haslam and Cook, 2002). 
Aphasic stroke patients have been observed to repeat the words of familiar songs (Straube et al., 2008) as well as complete the ends of phrases of familiar songs (Kasdan and Kiran, 2018) better when they are presented in sung than spoken format, whereas no advantage of sung over spoken presentation has previously been observed in the learning of unfamiliar lyrics (Hébert et al., 2003; Racette et al., 2006; Straube et al., 2008). Recently, using a larger sample of stroke patients (N = 31) studied longitudinally at acute and 6-month post-stroke stage, we found that stroke patients showed better learning and delayed recall of novel, narrative stories when presented in sung vs. spoken format at the 6-month stage but not at the acute stage (Leo et al., 2018). At 6 months, the relative benefit of the sung melody on story learning was seen especially in patients with mild aphasia, suggesting that aphasic patients can benefit from the sung melody as a mnemonic aid in recalling novel verbal material. At the neural level, further results from voxel-based morphometry (VBM) and deterministic tractography (DT) analyses of structural MRI data showed that in aphasic patients the sung > spoken learning effect correlated extensively with larger grey matter volume (GMV) in left frontal areas, right temporal and limbic areas, and bilateral parietal and striatal areas as well as with higher fractional anisotropy (FA) and volume in multiple frontotemporal white matter tracts, including the uncinate fasciculus (UF), the arcuate fasciculus (AF), and the inferior fronto-occipital fasciculus (IFOF) (Särkämö and Sihvonen, 2018). 
Overall, based on the previous studies on healthy subjects and clinical populations, the mnemonic effect of songs in facilitating verbal learning and memory appears to occur most often when the to-be-learned verbal material is linguistically connected (phrases or sentences) instead of isolated words and the sung melody is simple in its melodic and rhythmic structure, repetitive, and slower in tempo compared to speech. In general, the melodic and rhythmic patterns of music provide a rich structure that can potentially help in combining words and phrases, identifying line lengths and stress patterns, and adding emphasis and focus on the surface characteristics of the verbal material (e.g., phonemic structure, phrasing) (Wallace, 1994). However, few studies have explored how the mnemonic effect of singing unfolds structurally across the length of the to-be-recalled materials, and how it affects different learning- and memory-related processes. The serial position effect (SPE) is a classic learning-related phenomenon whereby information occurring early and late in a sequence is remembered better than information presented in the middle of the sequence (Crowder and Greene, 2000; Deese and Kaufman, 1957). Typically, in a learning task where the number of serially presented items exceeds working memory capacity, the SPE is reflected by a U-shaped curve with an increased likelihood to recall the first items (primacy effect, PE) and the last items (recency effect, RE) compared to the ones in the middle. The classic dual-storage memory model (Atkinson and Shiffrin, 1968) attributes the RE to immediate access from a short-term storage buffer and the PE to retrieval from a long-term memory storage through working memory. 
Although the neural basis of SPE is still rather poorly understood, there is some evidence from neuroimaging and clinical studies of verbal memory that the PE and RE are associated with different neural networks: the PE with dorsolateral prefrontal areas and the RE with inferior parietal, temporal, and hippocampal areas (Buchsbaum et al., 2011; Düzel et al., 1996; Innocenti et al., 2013; Spalletta et al., 2016; Staffaroni et al., 2017). Most of the evidence for SPEs comes from studies using unconnected items, such as lists of words or numbers, with aphasic patients showing impairment especially in the PE (Ivanova et al., 2018; Jefferies et al., 2008; Ostergaard and Meudell, 1984). However, the SPE has also been found for connected speech, both in healthy subjects (Freebody and Anderson, 1986; Newhouse and Holen, 1975) and in patients with brain injury (Hall and Bornstein, 1991) and aphasia (Brodsky et al., 2003). It is possible that the learning and recall of sung verbal material could show a different pattern of SPEs compared to spoken verbal material, but experimental evidence supporting this is still scarce and limited to healthy persons. Comparing the immediate recall of digit sequences presented in spoken and sung format to healthy subjects, Silverman (2007) found a significant interaction between serial position and condition, with less bowed U curve (indicating more stable recall performance across the list, especially for the middle items) in the sung condition. In another related study, Maylor (2002) found that the degree of familiarity of song lyrics was linked to a better recall across all verses of the song, although both PE and RE were also observed. Grouping individual items into larger units, a process known as chunking, is another well-known memory mechanism that can facilitate learning and optimize performance by decreasing memory load (Bor et al., 2003; Gobet et al., 2001). 
Chunking-based encoding of verbal material has been linked to a wide-scale bilateral network comprising dorsolateral prefrontal, inferior parietal, and posterior temporal regions (Bor et al., 2004). Recently, chunking training in working memory has been observed to enhance language processing in aphasia (Eom and Sung, 2016). Wallace (1994) has suggested that in a sung verbal learning task, the repetitive melodic and rhythmic structure can serve as an encoding and retrieval cue by chunking consecutive items into melodic phrases and assisting in positioning and sequencing textual units, thus decreasing the likelihood that units will be misplaced and disrupt memory for succeeding units. However, there is currently very little experimental evidence on chunking in the context of music or singing, with only two studies in healthy subjects reporting greater chunking of recalled material for words presented with background music vs. silence (Ferreri et al., 2015) and via singing vs. speaking (McElhinney and Annett, 1996). Extending our earlier findings (Leo et al., 2018), the aim of the present study was (i) to determine whether the previously observed mnemonic benefit in the learning and delayed recall of sung vs. spoken novel narrative stories 6 months post-stroke would be related to a difference in SPE and chunking effects between the tasks and between aphasic and non-aphasic patients, and (ii) to uncover the structural neural correlates of these effects. For this purpose, we analyzed recall performance for the first, middle, and last part of the stories as well as for the number of correctly recalled consecutive words (chunk length). Our hypothesis was that the sung melody would help in combining words and linking succeeding verses together in memory, which would result in longer chunks in the sung than spoken task. 
We also expected that in the sung task, recall would be more stable across the story (resulting in smaller SPE) or, alternatively, more enhanced for the end part of the story (larger RE). The latter effect could emerge if the repetition of the melody across verses plays a role in building the mnemonic effect, making the last part of the story more salient in short-term storage and therefore easier to recall in the sung task. Given the close coupling between prosody and music or singing (Brown, 2017; Hausen et al., 2013; Thompson et al., 2012), we also sought to explore whether the potential benefits in the sung vs. spoken task on SPE and chunking would be associated with the ability to perceive linguistic and emotional prosody. Finally, given our previous results (Leo et al., 2018) and the prior evidence for the relative preservation of vocal music (Sihvonen et al., 2017a) and the benefits of singing-based interventions in aphasia (Belin et al., 1996; Schlaug et al., 2008; van der Meulen et al., 2014; Zumbansen et al., 2014), we expected that the abovementioned SPE and chunking effects in the sung vs. spoken task would be evident particularly in aphasic patients.

Materials and methods

Subjects and study design

Subjects (N = 31) were right-handed stroke patients recruited during 2013–2016 from the Department of Clinical Neurosciences at the Turku University Hospital. All patients had an MRI-verified first-ever acute ischemic stroke or intracerebral hemorrhage in the left or right hemisphere, primarily in middle cerebral artery (MCA) territory, and at least minor cognitive impairment caused by the stroke. Patients with prior neurological or psychiatric disease, substance abuse, or significant hearing impairment were excluded. All participants underwent neuropsychological testing and an MRI session within 3 weeks after the stroke (acute stage) and at the 6-month post-stroke stage. The neuropsychological testing session lasted 2–3 h. The study was approved by the Ethics Committee of the Hospital District of Southwest Finland and was performed in conformance with the Declaration of Helsinki. All patients signed an informed consent form and received standard stroke treatment and rehabilitation.

Standard neuropsychological tests

Aphasia severity was assessed using the Aphasia Severity Rating Scale (ASRS) from the Boston Diagnostic Aphasia Examination (BDAE; Goodglass and Kaplan, 1983). The ASRS scoring was done clinically, mostly based on free conversational speech, drawing information also from performance on standard tests of verbal comprehension [shortened Token Test (De Renzi and Faglioni, 1978)] and production [shortened Boston Naming Test (Laine et al., 1993a), semantic and phonemic verbal fluency tasks (Lezak et al., 2012)]. Patients with an ASRS score of 4 or less were classified as aphasic (Goodglass and Kaplan, 1983). Verbal memory was evaluated with the Story Recall (SR) subtest (immediate and delayed verbal recall of a short story) from the Rivermead Behavioural Memory Test (RBMT; Wilson et al., 1985) and an Auditory Verbal Learning Task with words (AVLT; 10 orally presented words, three learning trials and delayed recall). Music perception was evaluated with a shortened version (Särkämö et al., 2009) of the Montreal Battery of Evaluation of Amusia (MBEA; Peretz et al., 2003) comprising the Scale and Rhythm subtests (discriminating piano melodies based on melodic pitch and rhythm changes). The role of music in life pre-stroke was also assessed with the Barcelona Music Reward Questionnaire (BMRQ; Mas-Herrero et al., 2013). Prosody perception was evaluated 6 months post-stroke with two tasks involving linguistic and emotional prosody. In the linguistic prosody task (from Hausen et al., 2013), the subjects heard 30 utterances, each produced with a prosodic stress pattern that denoted it either as a compound word [e.g., “näytä KISsankello” meaning “show the harebell (or literally ‘cat's-bell’ in English) flower”] or as a phrase composed of the same two words (e.g., “näytä KISsan KELlo” meaning “show the bell of the cat” in English). After each utterance, they saw two pictures on the screen depicting the compound and two-word phrase options and responded by selecting the picture that matched what they heard. 
In the emotional prosody task (adapted from Leinonen et al., 1997), the subjects heard 96 one-word utterances (female name “Saara”) produced with a prosodic pattern that expressed six different emotional states (neutral, afraid, sad, happy, angry, surprised) and they had to select which emotion the stimulus expressed.

Sung-spoken story recall task

Stimuli

The sung-spoken story recall task (SSSRT) was performed 6 months post-stroke. The task was developed to compare the learning and recall of novel verbal material (stories) presented in sung and spoken formats. The SSSRT comprised two short narrative stories (A & B) themed around an unexpected or ironic event in everyday life (A: forgetting a mobile phone at a restaurant and later discovering that someone had made expensive hotline calls with it; B: traveling to Spain on holiday and finding out that the luggage was lost at the airport and arrived on the last day of the holiday). The stories were 56 (A) and 55 (B) words long and were arranged in 5 verses (10–13 words per verse). The two stories were recorded by the same female voice in (i) spoken format (with natural prosody) and (ii) sung format. The sung versions had the same melody, which was composed to be simple, containing 6–7 different tones in a minor key in 4 bars, with a 4/4 meter and a tempo of 180 beats per minute (bpm) (see Fig. 1 for the notation of the melody). The same melody was repeated in all 5 verses of the song. The durations of the stories were 34 s (A) and 36 s (B) in the spoken versions and 53 s (A & B) in the sung versions. The full lyrics and audio examples of the spoken and sung stories are available as Supplementary Material.

Fig. 1

Melody used in the sung part of the sung-spoken story recall task (SSSRT).

Procedure

The spoken and sung versions of the SSSRT were presented to the patients with the verbal content of the stories counterbalanced. Thus, half of the patients heard story A spoken and story B sung and half heard story A sung and story B spoken. The spoken version was presented first, with three consecutive learning trials and a delayed recall trial 25 min later. Then, after a 15-min interval, the sung version was presented following the same protocol. We chose this fixed presentation order instead of a counterbalanced one to avoid the possibility that patients who performed the sung version first could then covertly use the melody (i.e., imagining or humming it in their mind) while performing the spoken version. This would have been possible since the A and B stories were designed to have a similar linguistic structure (in terms of line length and phrasing) so that both would work with the same melody. The stimuli were presented on a laptop computer with headphones, the volume being adjusted to a comfortable and clearly audible level. On each trial, the task of the patient was to try to recall as much of the story as he/she could. To make the recall situation as natural and comfortable as possible in the sung condition, the patient was given the option of recalling the story either by speaking or by singing. No patient chose singing, so all recall was done by speaking.

Data analysis

The scoring protocol for the SSSRT was similar to that of the RBMT-SR: 2 points for each correct word and 1 point for each partially correct or semantically similar word. For the SPE analyses, the percentages of correct responses for the three learning trials (T1/T2/T3) and the delayed recall (del) trial were calculated separately for the first verse (V1), the middle verse (V3), and the last verse (V5), as well as for the primacy effect (PE, V1 minus V3) and the recency effect (RE, V5 minus V3). For the chunking analyses, we first divided the item-level (individual words) scores across the stories into chunks (defined as the number of consecutive words that were recalled correctly). Based on this, we then calculated the average length of the chunks in the two tasks. Differences between the sung and spoken task performance were analyzed statistically using mixed-model analyses of variance (ANOVA) as well as independent-samples and paired t-tests. In the mixed-model ANOVAs, the Greenhouse-Geisser correction was used when appropriate. The level of statistical significance was set at p < .05. All statistical analyses were performed using IBM SPSS Statistics 24.
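The verse-level percentages, the PE and RE difference scores, and the average chunk length described above can be illustrated with a minimal Python sketch. It is not the authors' scoring script: the function names are hypothetical, and it assumes that only fully correct words (2 points) extend a chunk, since the text does not say how partially correct words (1 point) were treated in the chunking analysis.

```python
from itertools import groupby
from statistics import mean


def verse_percent(points, max_points_per_word=2):
    """Percentage of the maximum score obtained for one verse.

    `points` holds one score per word: 2 = correct, 1 = partially
    correct or semantically similar, 0 = not recalled.
    """
    return 100.0 * sum(points) / (max_points_per_word * len(points))


def serial_position_effects(v1, v3, v5):
    """Primacy (V1 minus V3) and recency (V5 minus V3) effects."""
    return {"PE": v1 - v3, "RE": v5 - v3}


def chunk_lengths(points, correct=2):
    """Lengths of runs of consecutive correctly recalled words.

    Assumption: only fully correct words (score == `correct`) count.
    """
    return [
        len(list(run))
        for is_correct, run in groupby(points, key=lambda p: p == correct)
        if is_correct
    ]


def mean_chunk_length(points):
    """Average chunk length; 0.0 when no word was fully correct."""
    lengths = chunk_lengths(points)
    return mean(lengths) if lengths else 0.0


# Hypothetical item-level scores for a 10-word verse.
verse = [2, 2, 1, 0, 2, 2, 2, 2, 0, 2]
print(verse_percent(verse))      # percentage of maximum points
print(chunk_lengths(verse))      # run lengths of correct words
print(mean_chunk_length(verse))  # average chunk length
```

In the study the chunk length was averaged over the whole story rather than a single verse; the same functions apply to a story-length score list.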

Structural magnetic resonance imaging

MRI data acquisition

Participants were scanned 6 months post-stroke using a standard 12-channel head matrix coil on a 3 T Siemens Magnetom Verio scanner (Siemens Medical Solutions, Erlangen, Germany) at the Medical Imaging Centre of Southwest Finland. T1-weighted high-resolution MPRAGE scans (flip angle = 9°, TR = 2300 ms, TE = 2.98 ms, voxel size = 1.0 × 1.0 × 1.0 mm) as well as diffusion MRI scans (TR = 11,700 ms, TE = 88 ms, acquisition matrix = 112 × 112, 66 axial slices, voxel size = 2.0 × 2.0 × 2.0 mm) with one non-diffusion-weighted volume and 64 diffusion-weighted volumes (b-value of 1000 s/mm²) were acquired.

Voxel-based morphometry

Voxel-based morphometry (VBM; Ashburner and Friston, 2000) analysis was carried out using Statistical Parametric Mapping software (SPM8, Wellcome Department of Cognitive Neurology, UCL) under MATLAB 8.4.0 (The MathWorks Inc., Natick, MA, USA, version R2014b). For an exact methodological description including detailed information on preprocessing, see our recently published articles (Sihvonen et al., 2016; Sihvonen et al., 2017b). The preprocessed and modulated grey matter (GM) images were entered into a second-level analysis using t-tests to assess the relationship between the behavioural performance and the GM volume (GMV) across the entire GM space within the aphasic group and in the aphasic vs. non-aphasic group. Age, gender, and total intracranial volume (TIV) were added as nuisance covariates (Barnes et al., 2010). All results were thresholded at a whole-brain uncorrected p < .001 threshold with a cluster extent of >100 contiguous voxels. To evaluate which GM correlates were facilitating the behavioural performance, partial correlations with two-tailed false discovery rate (FDR) corrected p-values controlling for age, sex and TIV were calculated for each significant cluster separately for aphasic and non-aphasic patients.
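The partial correlations mentioned above can be sketched by residualization: regress both the behavioural score and the cluster GMV on the nuisance covariates, then correlate the residuals. The Python below is an illustrative, dependency-free sketch under that assumption; it is not the SPM or SPSS procedure the authors ran, and all function names and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus intercept."""
    rows = [[1.0] + list(values) for values in zip(*covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, covariates):
    """Correlation of x and y after removing the linear effect of covariates."""
    return pearson_r(residuals(x, covariates), residuals(y, covariates))


# Toy data: x and y both follow z, but their remaining parts are opposite.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [0.0, 3.0, 2.0, 5.0, 4.0, 7.0]
print(pearson_r(x, y))                 # positive raw correlation
print(partial_correlation(x, y, [z]))  # negative once z is controlled
```

For the analysis described in the text, `covariates` would hold three lists (age, sex coded numerically, and TIV), and the resulting p-values would then be FDR-corrected.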

Deterministic tractography

To evaluate the relationship between the task performance and white matter (WM) pathways, the following four tracts were dissected in the left and right hemispheres using TrackVis (version 0.6.0.1, Build 2015.04.07) and included in the deterministic tractography (DT) analyses: arcuate fasciculus (AF), inferior fronto-occipital fasciculus (IFOF), inferior longitudinal fasciculus (ILF), and uncinate fasciculus (UF). These frontotemporal tracts were selected on the basis of previous DTI studies in healthy subjects and different clinical populations that link them directly to verbal learning or verbal memory performance (Chiou et al., 2016; López-Barroso et al., 2013; Mabbott et al., 2009; McDonald et al., 2008; Reggente et al., 2018). For complete methodological information including description on the included WM tracts, see our recently published article (Sihvonen et al., 2017c). After dissection, statistical information (tract volume and FA value) of each WM tract was collected using a MATLAB toolbox, “along-tract statistics” (Colby et al., 2012). Volume and FA values were then imported to IBM SPSS Statistics 24 and further analyzed to evaluate the relationship between the WM tract parameters and behavioural performance using two-tailed Pearson correlation analysis in the aphasic and non-aphasic groups. Standard FDR-correction was applied to control for multiple correlations.
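As an illustration of the correlation step, the sketch below computes a Pearson coefficient and applies a false discovery rate adjustment to a set of p-values. The text says only "standard FDR-correction"; the Benjamini-Hochberg procedure is assumed here as the most common choice, and the function names and example p-values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), and a
    running minimum from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical uncorrected p-values, e.g. one per tract parameter.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

With eight tracts (four per hemisphere) and two parameters each (volume and FA), the family of tests per behavioural measure would contain 16 p-values, although the text does not state how the families were defined.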

Results

Patient characteristics

Based on the BDAE-ASRS scores, 14 patients (45%) were classified as aphasic (all with left hemisphere lesions) and 17 as non-aphasic. In the aphasic group, the severity of language impairment was primarily mild (10 patients: BDAE-ASRS score 4, 4 patients: BDAE-ASRS score 3). As shown in Table 1, the aphasic and non-aphasic groups were comparable in all demographic and clinical characteristics, except in AVLT learning [t(29) = 2.9, p = .006] and delayed recall [t(29) = 2.1, p = .049] in which the aphasic patients scored lower, as expected.
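The group comparison for AVLT learning can be approximately re-derived from the summary statistics in Table 1. The sketch below assumes a pooled-variance (Student) independent-samples t-test, which is consistent with the reported 29 degrees of freedom; the function name is hypothetical, and because the tabulated means and SDs are rounded, the result can only approximate the published value.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t statistic from summary statistics.

    Returns (t, degrees_of_freedom), assuming equal group variances.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# AVLT learning score from Table 1: non-aphasic 20.0 (4.2), N = 17;
# aphasic 15.6 (4.0), N = 14.
t_value, df = pooled_t(20.0, 4.2, 17, 15.6, 4.0, 14)
print(round(t_value, 2), df)
```

The statistic comes out close to the reported t(29) = 2.9, with the small gap attributable to rounding in the table.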

Table 1

Characteristics of the patients.

	All patients (N = 31)	Aphasics (N = 14)	Non-aphasics (N = 17)	p value
Demographical variables
Age (years)	53.0 (14.3)	51.4 (17.7)	54.4 (11.3)	0.564 (t)
Gender (male/female)	19/12	9/5	10/7	0.756 (χ²)
Education (years)	14.5 (3.4)	15.1 (4.3)	14.1 (2.4)	0.428 (t)
Pre-stroke musical background
Formal music training (yes/no)	8/23	4/10	4/13	0.750 (χ²)
Active singing or playing (yes/no)	15/16	7/7	8/9	0.870 (χ²)
BMRQ score (max.100)	75.6 (12.6)	77.6 (11.5)	74.0 (13.7)	0.435 (t)
Pre-stroke leisure activities
Music listening frequencyᵃ	4.6 (1.0)	4.4 (1.3)	4.8 (0.8)	0.245 (t)
Radio listening frequencyᵃ	2.7 (1.6)	2.4 (1.3)	3.0 (1.8)	0.328 (t)
Reading frequencyᵃ	3.8 (1.7)	3.9 (1.4)	3.8 (2.0)	0.958 (t)
Clinical variables (acute post-stroke)
Lesion laterality (left/right)	20/11	14/0	6/11	0.001 (χ²)
Lesion size (cm³)	53.4 (54.5)	39.0 (50.4)	65.3 (56.2)	0.186 (t)
Stroke type (infarct/hemorrhage)	22/9	9/5	13/4	0.457 (χ²)
NIHSS score (max. 42)	4.7 (3.1)	3.9 (2.3)	5.5 (3.6)	0.158 (t)
BDAE Aphasia Severity Rating Scaleᵇ	4.3 (0.7)	3.7 (0.5)	4.8 (0.4)	< 0.001 (t)
MBEA Scale and Rhythm avg. (%)	72.9 (14.1)	73.9 (9.9)	72.1 (17.1)	0.720 (t)
AVLT Learning score (3 trials, max. 30)	18.0 (4.6)	15.6 (4.0)	20.0 (4.2)	0.006 (t)
AVLT Delayed recall score (max. 10)	4.5 (2.7)	3.4 (2.4)	5.4 (2.7)	0.049 (t)
RBMT Story recall immediate (max. 42)	13.9 (7.0)	12.0 (6.0)	15.5 (7.6)	0.174 (t)
RBMT Story recall delayed (max. 42)	11.1 (7.5)	8.6 (5.3)	13.1 (8.6)	0.104 (t)

Data are reported as mean (SD) unless otherwise stated. Abbreviations: t = independent-samples t-test, χ² = chi-square test, AVLT = Auditory Verbal Learning Task, BDAE = Boston Diagnostic Aphasia Examination, BMRQ = Barcelona Music Reward Questionnaire, MBEA = Montreal Battery of Evaluation of Amusia, NIHSS = National Institute of Health Stroke Scale, RBMT = Rivermead Behavioural Memory Test.

ᵃ Likert scale 1–7 (1 = not at all, 7 = daily).

ᵇ Likert scale 0–5 (0 = no usable speech or comprehension, 5 = minimal speech handicaps).

Serial position effects in the sung vs. spoken tasks

Verse-level performance of the patients in the SSSRT is shown in Table 2. Differences between SPEs in the sung and spoken tasks were analyzed using a mixed-model ANOVA with Task (spoken/sung), Trial (T1/T2/T3/del), and Verse (V1/V3/V5) as within-subject factors and Aphasia (non-aphasic/aphasic) as a between-subject factor. To focus here on SPE, we report only the main and interaction effects involving Verse. The mixed-model ANOVA yielded a significant Verse main effect [F(2, 56) = 17.2, p < .001], indicating better recall of V1 than V3 (primacy effect, PE) and of V5 than V3 (recency effect, RE). We also observed a Trial x Verse interaction [F(3.6, 99.5) = 4.4, p = .004], indicating more uniform recall of the verses in the delayed recall trial than in the learning trials, and a Task x Verse x Aphasia interaction [F(1.5, 40.9) = 3.7, p = .046] (see Fig. 2).

Table 2

Verse-level performance in the SSSRT.

Task	Trial	Verse	All patients (N = 31)	Aphasic (N = 14)	Non-aphasic (N = 17)
Spoken	1st learning trial (T1)	V1	57.7 (23.0)	54.8 (20.4)	60.2 (25.2)
		V3	35.2 (27.7)	39.3 (31.1)	31.8 (25.0)
		V5	44.4 (28.5)	38.3 (32.2)	49.5 (25.0)
	2nd learning trial (T2)	V1	67.6 (24.9)	64.7 (28.6)	70.0 (22.1)
		V3	45.7 (23.7)	45.4 (22.9)	45.8 (25.0)
		V5	61.9 (23.1)	58.8 (23.1)	64.4 (23.4)
	3rd learning trial (T3)	V1	72.3 (25.1)	65.4 (31.4)	78.1 (17.4)
		V3	58.1 (22.4)	56.3 (21.1)	59.5 (24.0)
		V5	62.3 (28.0)	58.1 (29.1)	65.8 (27.5)
	Delayed recall trial (del)	V1	56.5 (27.0)	58.3 (31.1)	54.9 (23.9)
		V3	54.2 (21.4)	55.6 (21.0)	53.0 (22.3)
		V5	53.2 (25.6)	47.4 (30.1)	58.0 (20.9)
	Average across trials	V1	63.5 (22.4)	60.8 (25.3)	65.8 (20.2)
		V3	48.3 (19.0)	49.2 (18.0)	47.5 (20.3)
		V5	55.5 (21.9)	50.6 (23.3)	59.4 (20.5)
	PE (V1 minus V3)		15.2 (20.3)	11.6 (14.8)	18.2 (23.9)
	RE (V5 minus V3)		7.2 (16.6)	1.5 (14.7)	11.9 (17.0)
Sung	1st learning trial (T1)	V1	49.3 (26.5)	51.4 (33.0)	47.8 (20.6)
		V3	34.9 (24.8)	33.7 (21.5)	35.9 (27.8)
		V5	50.3 (22.1)	51.0 (25.0)	49.7 (20.2)
	2nd learning trial (T2)	V1	69.8 (22.4)	76.9 (20.2)	64.0 (23.0)
		V3	60.9 (24.2)	60.9 (31.3)	60.8 (17.4)
		V5	65.2 (19.0)	67.2 (21.3)	63.6 (17.5)
	3rd learning trial (T3)	V1	71.4 (24.8)	73.5 (27.6)	69.5 (22.7)
		V3	66.6 (23.9)	64.5 (26.4)	68.3 (22.1)
		V5	68.3 (19.5)	73.1 (21.5)	64.2 (17.2)
	Delayed recall trial (del)	V1	62.5 (25.3)	64.5 (29.6)	60.9 (22.0)
		V3	62.5 (26.1)	54.6 (31.7)	69.0 (19.0)
		V5	63.9 (18.5)	66.6 (20.3)	61.8 (17.2)
	Average across trials	V1	63.9 (21.7)	66.6 (24.9)	61.6 (19.1)
		V3	56.7 (20.3)	53.5 (23.9)	59.5 (16.9)
		V5	62.2 (16.9)	64.4 (19.6)	60.2 (14.4)
	PE (V1 minus V3)		7.3 (19.7)	13.1 (15.6)	2.1 (21.9)
	RE (V5 minus V3)		5.5 (17.2)	11.0 (15.4)	0.8 (17.6)

Data are mean (SD). PE = primacy effect, RE = recency effect, V1 = 1st verse, V3 = 3rd (middle) verse, V5 = 5th (last) verse.

Fig. 2

Percentage of correct responses (mean ± SEM) in the first (V1), middle (V3) and last (V5) verses of the sung (white circles) and spoken (black squares) story recall tasks. Data are shown across all patients (left) and within the non-aphasic and aphasic groups (right). Grey asterisks denote significant Verse main effects in mixed-model ANOVAs in the sung and spoken tasks. Black asterisks denote significant differences between the sung and spoken tasks for individual verses in paired t-tests.

The three-way Task x Verse x Aphasia interaction was followed up by separate mixed-model ANOVAs performed for the non-aphasic and aphasic patients and for the spoken and sung tasks. In the non-aphasic patients, the Verse effect was significant in the spoken task [F(2, 32) = 7.9, p = .002], with post hoc tests showing better recall of V1 than V3 (PE; p = .006) and V5 than V3 (RE; p = .011), but not in the sung task (Fig. 2). In the aphasic patients, the Verse effect was significant in both the spoken task [F(2, 26) = 5.5, p = .010] and the sung task [F(2, 26) = 6.3, p = .006]. Post hoc tests showed that in the spoken task the aphasic patients recalled V1 better than V3 (PE; p = .012) and also V1 better than V5 (p = .014), whereas in the sung task they recalled V1 better than V3 (PE; p = .008) and V5 better than V3 (RE; p = .019) (Fig. 2). The differences in the SPE patterns of the tasks were further analyzed with (i) paired t-tests comparing the sung and spoken tasks for each verse (V1/V3/V5) and for the PE (V1 minus V3) and RE (V5 minus V3) within the non-aphasic and aphasic patient groups and (ii) independent-samples t-tests comparing the non-aphasic and aphasic patient groups on the difference between the tasks (sung minus spoken) for each verse (V1/V3/V5), PE, and RE. 
The paired t-tests showed that in the non-aphasic patients, recall was better in the sung than spoken task for V3 [t(15) = 2.7, p = .018] (Fig. 2) and both the PE and the RE were marginally smaller in the sung than spoken task [t(15) = 2.0, p = .063 and t(15) = 2.1, p = .055, respectively]. In contrast, the aphasic patients showed better recall of V5 in the sung than spoken task [t(13) = 3.3, p = .006] (Fig. 2). The independent-samples t-tests indicated that the recall of V5 was better in the sung than spoken task in the aphasic than in the non-aphasic patients [t(28) = 2.5, p = .019]. Similarly, the RE was larger in the sung than spoken task in the aphasic than in the non-aphasic patients [t(28) = 2.6, p = .014]. Together, this pattern of results suggests that in non-aphasic patients the sung presentation led to more stable recall across the story, reducing the SPE and facilitating the recall of the middle part compared to the spoken presentation. In contrast, in aphasic patients the sung presentation facilitated the recall of the last part of the story, resulting in larger RE, compared to the spoken presentation.

Chunking effects in the sung vs. spoken tasks

Chunk length averages of the patients in the SSSRT are presented in Table 3. Differences between chunk length in the sung and spoken tasks were analyzed using a mixed-model ANOVA with Task (spoken/sung) and Trial (T1/T2/T3/del) as within-subject factors and Aphasia (non-aphasic/aphasic) as a between-subject factor. This yielded a significant Trial main effect [F(1.8, 53.1) = 15.8, p < .001], indicating progressively increasing chunk length during the learning trials, as well as a significant Task x Aphasia interaction [F(1, 29) = 7.4, p = .011] (see Fig. 3).

Table 3

Average chunk length in the SSSRT.

Task	Trial	All patients (N = 31)	Aphasic (N = 14)	Non-aphasic (N = 17)
Spoken	1st learning trial (T1)	3.1 (1.5)	3.1 (1.6)	3.1 (1.4)
	2nd learning trial (T2)	3.7 (1.6)	3.5 (1.8)	3.8 (1.5)
	3rd learning trial (T3)	4.9 (3.0)	4.3 (2.1)	5.5 (3.5)
	Delayed recall trial (del)	3.8 (2.5)	3.3 (1.9)	4.2 (2.8)
	Average across trials	3.9 (1.9)	3.6 (1.6)	4.2 (2.1)
Sung	1st learning trial (T1)	2.8 (1.2)	3.1 (1.5)	2.5 (0.8)
	2nd learning trial (T2)	3.3 (1.6)	3.5 (2.1)	3.1 (0.9)
	3rd learning trial (T3)	5.6 (4.5)	6.5 (6.2)	4.7 (2.0)
	Delayed recall trial (del)	4.2 (2.7)	4.7 (3.6)	3.8 (1.5)
	Average across trials	4.0 (2.1)	4.4 (2.8)	3.6 (1.1)

Data are mean (SD).

Fig. 3

Average length of recalled chunks (mean ± SEM) in the spoken (black) and sung (white) tasks in all patients (left) and in non-aphasic and aphasic patients (right). Significant Task x Group interaction indicated with an asterisk.

The Task x Aphasia interaction was further analyzed with (i) paired t-tests comparing the chunk length between the sung and spoken tasks within the non-aphasic and aphasic groups and (ii) an independent-samples t-test comparing the non-aphasic and aphasic groups on the chunk length difference between the tasks (sung minus spoken). The paired t-tests showed an opposite, marginally significant pattern of recall effects in the two tasks, with the aphasic patients recalling longer chunks in the sung than spoken task [t(13) = 2.0, p = .073] and the non-aphasic patients recalling longer chunks in the spoken than sung task [t(15) = 1.8, p = .097]. The independent-samples t-test showed that the aphasic patients recalled longer chunks in the sung than spoken task compared to the non-aphasic patients [t(28) = 2.7, p = .013].
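
As a rough illustration of the chunk length measure, the sketch below treats a chunk as a maximal run of consecutive correctly recalled items and averages the run lengths within one recall attempt. The exact scoring rules of the SSSRT are not given in this section, so this definition, the function names, and the example sequence are assumptions.

```python
def chunk_lengths(recalled):
    """Return the lengths of runs of consecutive correctly recalled items.

    `recalled` is a sequence of booleans in story order
    (True = item recalled correctly). Hypothetical scoring format.
    """
    lengths, run = [], 0
    for ok in recalled:
        if ok:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a run that reaches the end of the story
        lengths.append(run)
    return lengths


def average_chunk_length(recalled):
    """Mean run length, or 0.0 when nothing was recalled."""
    lengths = chunk_lengths(recalled)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Invented example: runs of 2, 3, and 1 correctly recalled items.
trial = [True, True, False, True, True, True, False, False, True]
print(chunk_lengths(trial), average_chunk_length(trial))
```

Under this assumed definition, a sung minus spoken difference score per patient is simply the difference between the two tasks' average chunk lengths.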

Relationship between prosody perception and SSSRT serial position and chunking effects

In order to determine if the mnemonic benefit of sung compared to spoken presentation was associated with prosodic skills, we performed correlation analyses (Pearson, two-tailed) between the scores on the linguistic and emotional prosody perception tasks and performance in the spoken and sung tasks for each verse (V1, V3, V5), PE, RE, and chunk length in non-aphasic and aphasic patients. There were no significant correlations in non-aphasic patients. In aphasic patients (see Fig. 4), emotional prosody perception correlated significantly with the recall of V1 (r = 0.53, p = .05) and V5 (r = 0.65, p = .011) in the spoken task and of V1 (r = 0.77, p = .001), V3 (r = 0.76, p = .002), and V5 (r = 0.86, p < .001) in the sung task. Analyses of the difference score (sung minus spoken) further showed that the better recall of V3 in the sung than spoken task correlated with better emotional prosody perception (r = 0.58, p = .031). In aphasic patients, emotional prosody perception also correlated with chunk length in the sung task (r = 0.76, p = .001) and in the difference between the tasks (sung minus spoken, r = 0.79, p = .001). No other significant correlations were observed within the aphasic group.

Fig. 4

Scatter plots showing the correlation between the recall of V1 (A), V3 (B), and V5 (C) and average chunk length (D) in the sung (black) and spoken (white) tasks and in their difference (sung minus spoken, grey) and the performance in the emotional prosody perception task in aphasic patients (N = 14). Only significant correlations are shown with regression lines.

Neural correlates of the serial position and chunking effects in the sung vs. spoken tasks

The structural neural correlates of the different SPE patterns and chunking effects in the sung vs. spoken task in the aphasic and non-aphasic patients (see above) were analyzed by correlating them with grey matter volume (GMV) from voxel-based morphometry (VBM) data and with the volume and fractional anisotropy (FA) of white matter tracts from deterministic tractography (DT) data.

VBM results

In the aphasic patients, the larger RE in the sung vs. spoken task was associated with greater GMV in the left posterior temporal [superior temporal gyrus (STG), middle temporal gyrus (MTG)], parietal [postcentral gyrus (postCG), middle occipital gyrus (MOG)], and limbic [parahippocampal gyrus (PHG)] regions as well as in the right frontal [precentral gyrus (preCG)], posterior temporal [inferior temporal gyrus (ITG)], and parietal [inferior parietal lobule (IPL)] regions (Table 4 and Fig. 5A). Similarly, the left posterior temporal (MTG), frontal (preCG), and parietal (postCG, MOG) regions as well as the right posterior temporal (ITG) regions also showed greater GMV associated with the sung > spoken RE in aphasic compared to non-aphasic patients (Table 4 and Fig. 5B). No other significant effects were observed.

Table 4

Significant correlations between grey matter volume and SSSRT performance.

Patients/Contrast	Condition	Area	MNI coordinates	Cluster size	t-Value	Correlation
Aphasic	Sung > spoken RE	Left middle temporal gyrus (BA 21)	-66 -52 1	1008	15.2⁎⁎	r = 0.96, p < .001
		Left superior temporal gyrus (BA 22)	-69 -41 11	1008	15.2⁎⁎	r = 0.96, p < .001
		Left middle occipital gyrus (BA 19)	-45 -88 13	512	14.5⁎	r = 0.95, p < .001
		Left postcentral gyrus (BA 3)	-35 -21 44	849	9.4⁎⁎	r = 0.95, p < .001
		Left parahippocampal gyrus (BA 34)	-13 -19 -22	643	8.2⁎	r = 0.95, p < .001
		Right inferior temporal gyrus (BA 20, 37)	47 -79 -24	2596	8.5⁎⁎	r = 0.96, p < .001
		Right cerebellum	53 -65 -25	2596	8.5⁎⁎	r = 0.96, p < .001
		Right inferior parietal lobule (BA 40)	59 -31 34	1184	8.1⁎⁎	r = 0.93, p < .001
		Right postcentral gyrus (BA 2, 3)	57 -22 45	1184	8.1⁎⁎	r = 0.93, p < .001
		Right precentral gyrus (BA 4)	57 -14 36	480	7.3⁎	r = 0.95, p < .001
Aphasic > Non-aphasic	Sung > spoken RE	Left middle temporal gyrus (BA 37)	-46 -82 15	880	5.7⁎	n.s.
		Left middle occipital gyrus (BA 19)	-50 -75 0	880	5.7⁎	n.s.
		Left middle temporal gyrus (BA 21, 37, 39)	-61 -56 5	782	5.6⁎	n.s.
		Left precentral gyrus (BA 4)	-41 -23 57	1026	5.0⁎	r = 0.74, p = .003
		Left postcentral gyrus (BA 3)	-39 -26 57	1026	5.0⁎	r = 0.74, p = .003
		Right inferior temporal gyrus (BA 20, 37)	59 -56 -24	1874	6.0⁎⁎	r = 0.55, p = .041
		Right cerebellum	47 -70 -23	1874	6.0⁎⁎	r = 0.55, p = .041

All results are thresholded at a whole-brain uncorrected p < .001 threshold at the voxel level with a minimal cluster size set to 100 voxels. Correlations are partial correlations with 2-tailed p-value controlling for age, sex and TIV. BA = Brodmann area, RE = recency effect.

⁎ p < .05 FWE-corrected at the cluster level.

⁎⁎ p < .005 FWE-corrected at the cluster level.
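
The partial correlations in Table 4 control for age, sex, and TIV. A minimal sketch of how a partial correlation can be computed is given below, using the standard recursive formula that removes one covariate at a time. The function names and the toy vectors are assumptions for illustration; this is not the software pipeline used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y after controlling for each covariate.

    Applies r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))
    recursively, peeling off the last covariate at each step.
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y share variance only through the covariate z.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z plus a component orthogonal to z
y = [-4, 2, -2, 4]   # z plus a different orthogonal component
print(pearson(x, y), partial_corr(x, y, [z]))
```

In the toy data the raw correlation is clearly positive, while the partial correlation is essentially zero once z is controlled for, which is the kind of confound that covariates such as age, sex, and TIV are meant to rule out.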

Fig. 5

Voxel-based morphometry (VBM) results showing significant correlations between regional grey matter volume and the sung > spoken recency effect (RE) in the left and right hemispheres (A) within the aphasic group and (B) in a contrast between the aphasic and non-aphasic groups. CER = cerebellum, IPL = inferior parietal lobule, ITG = inferior temporal gyrus, L = left, MOG = middle occipital gyrus, MTG = middle temporal gyrus; PostCG = postcentral gyrus, PreCG = precentral gyrus, R = right, STG = superior temporal gyrus.


DT results

In the aphasic patients (Fig. 6A), there was a significant correlation between larger RE in the sung than spoken task and larger volume of the right inferior fronto-occipital fasciculus (IFOF; r = 0.67, p = .009). In contrast, the non-aphasic patients (Fig. 6B) showed a strong correlation between smaller PE in the sung than spoken task and larger volume (r = −0.69, p = .003) and FA (r = −0.63, p = .009) of the left arcuate fasciculus (AF, long segment). No other significant correlations were observed.

Fig. 6

Deterministic tractography (DT) results showing significant correlations between (A) the volume of the right inferior fronto-occipital fasciculus (IFOF) and the sung > spoken recency effect (RE) in aphasic patients and (B) the volume of the left arcuate fasciculus (AF, long segment) and the sung > spoken primacy effect (PE) in non-aphasic patients.

Discussion

In the present study, we set out to investigate in a sample of 31 stroke patients whether the mnemonic benefit of sung vs. spoken novel narrative stories after stroke (Leo et al., 2018) would be related to a difference in the memory processes involved, indicated by serial position and chunking effects, and whether these effects would differ in aphasic and non-aphasic patients. Moreover, using voxel-based morphometry (VBM) and deterministic tractography (DT) analyses, we sought to uncover the structural neural correlates of these effects. Our main finding was that non-aphasic and aphasic patients showed a different pattern of serial position effects (SPE) and chunking effects in the two tasks, with (i) more stable recall performance (no SPE) in the sung than spoken task in the non-aphasic patients and (ii) longer recalled chunks and better recall of the last verse (larger recency effect, RE) in the sung than spoken task in the aphasic patients. The latter effect was also coupled with grey matter volume (GMV) in mostly bilateral temporal, frontal, and parietal regions, as well as with the volume of the right inferior fronto-occipital fasciculus (IFOF).

In the non-aphasic patients, there was a classic SPE pattern, with both primacy effect (PE) and RE in the spoken task, but no discernible SPE in the sung task. Verse-level comparison between the tasks showed that non-aphasic patients recalled the middle verse (V3) better in the sung than spoken task, thereby making the recall performance more stable in the sung task. This result is in line with previous studies in healthy subjects, which have reported a smaller SPE (less bowed U curve) for the learning of digits when presented in sung than spoken format (Silverman, 2007) and a familiarity effect whereby familiar song lyrics are linked to better recall performance across all verses of a song (Maylor, 2002).
Interestingly, the better stability of recall in the sung task, which resulted in a smaller PE than in the spoken task, also correlated with the volume and integrity (indicated by fractional anisotropy) of the left arcuate fasciculus (AF) in non-aphasic patients. Forming the dorsal “perception-action” pathway, the left AF is thought to map sensory targets in posterior temporal areas to motor programs coded in Broca's area (Hickok and Poeppel, 2007) and it has also been found to be crucial for sustained rehearsal processes in verbal working memory (Buchsbaum et al., 2005). Consequently, the left AF is considered an important tract for verbal learning in children (Leroy et al., 2011; Su et al., 2018) and adults (López-Barroso et al., 2013; Thiebaut de Schotten et al., 2014) and it also forms a key part of the dual-stream pathway for singing production (Loui, 2015), its structure being malleable by singing training (Halwani et al., 2011). It is plausible that the slower presentation rate and the cues provided by the repetitive melodic structure in our sung task made it easier for covert rehearsal and less demanding for working memory, thereby recruiting the left dorsal pathway (AF) less than the spoken task. Aphasic patients showed a PE in both the spoken and the sung task but a RE only in the sung task. The RE was larger in the sung than spoken task in aphasic compared to non-aphasic patients, which was attributable primarily to aphasic patients' better recall of the last verse (V5) in the sung task. Aphasic patients also recalled longer chunks in the sung than spoken task compared to non-aphasic patients. 
This pattern of results is in line with the hypothesis that (i) the sung melody helps in combining words and linking succeeding verses together in memory, enabling chunking (Ferreri et al., 2015; McElhinney and Annett, 1996; Wallace, 1994) and (ii) the repetition of the same melody across the verses gradually builds the mnemonic effect in the sung task, making the last part of the story more salient and easily accessible for recall (enhancing the RE) once the melody is learned. Previous studies in non-fluent aphasic patients have shown that when recalling sequential verbal material, aphasic patients have a relatively normal RE but a clearly smaller PE, which seems to be related to their reduced verbal memory span and difficulties in covert rehearsal (Ivanova et al., 2018; Jefferies et al., 2008; Ostergaard and Meudell, 1984). Thus, it is possible that due to their verbal memory deficits, the aphasic patients in our study did not benefit from the sung melody as an aid in covert rehearsal in working memory as the non-aphasic patients (who had less severe memory deficits) apparently did, but showed more automatic effects driven more by stimulus-specific factors (i.e., the repetitive melody and rhythm of the song enable chunking and make the last verse most salient in short-term storage). It is possible that recalling the sung melody may also act as a contextual cue when attempting to retrieve the lyrics from memory. For example, Isarida and Isarida (2006) found that in the delayed recall of word lists, the RE was stronger when the same environmental cues (including also background music played during the task) were present in both the initial encoding and the delayed recall situation in healthy subjects.
In VBM, the sung > spoken RE was associated with greater GMV (i) in left posterior temporal (STG, MTG), parietal (postCG, MOG), and limbic (PHG) regions and in right frontal (preCG), posterior temporal (ITG), and parietal (IPL) regions within the aphasic group and (ii) in left posterior temporal (MTG), frontal (preCG), and parietal (postCG, MOG) and right posterior temporal (ITG) regions in the aphasic group compared to the non-aphasic group. This pattern was different from the one we previously reported for the general sung > spoken learning effect in aphasic patients, which comprised primarily left prefrontal areas [inferior frontal gyrus (IFG), middle frontal gyrus (MFG), anterior cingulate] and bilateral superior parietal and striatal areas (Särkämö and Sihvonen, 2018). The present results are well in line with previous neuroimaging studies of the RE in verbal memory in both healthy subjects and clinical groups with memory deficits, which have specifically implicated the IPL (Buchsbaum et al., 2011; Innocenti et al., 2013), MTG and ITG (Düzel et al., 1996; Spalletta et al., 2016; Staffaroni et al., 2017), and hippocampal area (Spalletta et al., 2016) in the RE. Overall, the results also conform well with neuroimaging studies of singing in which the STG and MTG have been linked to perceptual processing of lexical/phonological and melodic features and the preCG, postCG, and cerebellar regions to vocal-motor processing, in a largely bilateral fashion (Callan et al., 2006; Méndez Orellana et al., 2014; Özdemir et al., 2006; Salmi et al., 2017; Schön et al., 2010; Segado et al., 2018). Aside from production, listening to singing also activates bilateral frontotemporal areas as well as subcortical/limbic areas (e.g., hippocampus, striatum, orbitofrontal cortex) more extensively than listening to speech (Callan et al., 2006; Méndez Orellana et al., 2014; Schön et al., 2010).
The right IPL, in turn, has been identified as a crucial hub for higher-level analysis of melodic (Royal et al., 2016) and rhythmic (Konoike et al., 2012) structure of music. In neuroimaging studies of aphasic patients, recovery of speech has been linked to the functioning of both left and right hemisphere language networks (Forkel et al., 2014; Saur et al., 2006) and singing-based rehabilitation, for example using melodic intonation therapy (MIT), has been found to increase functional activation in frontotemporal auditory-motor and language areas during speech/singing production, either in the left or right hemisphere or bilaterally (Belin et al., 1996; Breier et al., 2010; Jungblut et al., 2014; Laine et al., 1993b; Schlaug et al., 2008). Using DTI, structural neuroplasticity changes induced by MIT have thus far been reported only in right frontotemporal tracts, including the AF and the uncinate fasciculus (UF) (Wan et al., 2014; Zipse et al., 2012). In the present study, we found that the sung > spoken RE in aphasic patients correlated with larger volume in the right IFOF. Part of the ventral processing stream, the IFOF has been increasingly recognized to play a role in language processing (Dick and Tremblay, 2012) and cognition (Cremers et al., 2016), including working memory and learning (Chiou et al., 2016; Krogsrud et al., 2018), as well as in music perception (Sihvonen et al., 2017c). In addition to occipital lobe areas, the posterior termination branches of the IFOF extend also to the posterior inferior temporal and parietal areas (Duffau et al., 2013; Sarubbo et al., 2013), close to the ITG and IPL clusters that were linked to the RE in our aphasic patients and in previous imaging studies (see above). 
As the IFOF is thought to play a role in multimodal integration and semantic processing (Sarubbo et al., 2013), it is plausible that in our sung task this tract carries the melodic and rhythm information to prefrontal areas (IFG, MFG) where this is integrated with the verbal content of the story and processed in working memory. In addition to music, the right ventral stream has recently been implicated in the perception of prosody (Frühholz et al., 2015; Sammler et al., 2015). Interestingly, the correlation analyses showed that in aphasic patients, performance in the emotional (but not linguistic) prosody perception task was linked to better recall of the first (V1) and last (V5) verses in the spoken task and more strongly with better recall of the first (V1), middle (V3), and last (V5) verses in the sung task, with the sung > spoken difference and emotional prosody correlation being significant only in V3. In aphasics, emotional prosody correlated also with chunk length in the sung task and in the sung > spoken task difference. Given that emotional prosody perception occurs in a bilateral but right-dominant network (Wildgruber et al., 2006), which is often at least partly preserved in aphasia (Barrett et al., 1999; Ross et al., 1997), this result suggests that the mnemonic benefit of songs on verbal learning in aphasia may be associated also with their emotion-expressing function and the additional emotional cues provided by songs for recall.

In summary, the results of the present study demonstrate that stroke patients benefit from sung repetitive melody as a memory aid in the learning and recall of novel verbal material. However, the cognitive and neural mechanisms underlying the mnemonic effect of songs differ in non-aphasic and aphasic patients, likely owing to differences in lesion patterns and the severity of memory deficits.
It seems that in non-aphasic patients, the cues provided by the musical structure facilitate covert rehearsal of the material in verbal working memory, mediated by the left dorsal pathway (AF), resulting in more even recall performance across the verses of the story. Aphasic patients, in turn, seem to benefit from the repetitive melody and rhythm of singing as a means of chunking the words and making the last verse of the story most salient in memory for recall, mediated by bilateral frontal, temporal, and parietal areas as well as the right ventral pathway (IFOF) in particular. One limitation of the present study was the relatively small size of the subgroups (14 aphasic, 17 non-aphasic). In the future, it would be interesting to explore the benefits and mechanisms of musical mnemonics in a larger sample of aphasic patients, including patients with different subtypes and severity levels of aphasia.
