Insight into the neurophysiological processes of melodically intoned language with functional MRI.

Carolina P Méndez Orellana¹, Mieke E van de Sandt-Koenderman², Emi Saliasi³, Ineke van der Meulen², Simone Klip⁴, Aad van der Lugt⁴, Marion Smits⁴.

Abstract

BACKGROUND: Melodic Intonation Therapy (MIT) uses the melodic elements of speech to improve language production in severe nonfluent aphasia. A crucial element of MIT is the melodically intoned auditory input: the patient listens to the therapist singing a target utterance. Such input of melodically intoned language facilitates production, whereas auditory input of spoken language does not.
METHODS: Using a sparse sampling fMRI sequence, we examined the differential auditory processing of spoken and melodically intoned language. Nineteen right-handed healthy volunteers performed an auditory lexical decision task in an event related design consisting of spoken and melodically intoned meaningful and meaningless items. The control conditions consisted of neutral utterances, either melodically intoned or spoken.
RESULTS: Irrespective of whether the items were normally spoken or melodically intoned, meaningful items showed greater activation in the supramarginal gyrus and inferior parietal lobule, predominantly in the left hemisphere. Melodically intoned language activated both temporal lobes rather symmetrically, as well as the right frontal lobe cortices, indicating that these regions are engaged in the acoustic complexity of melodically intoned stimuli. Compared to spoken language, melodically intoned language activated sensorimotor regions and articulatory language networks in the left hemisphere, but only when meaningful language was used.
DISCUSSION: Our results suggest that the facilitatory effect of MIT may, in part, depend on an auditory input which combines melody and meaning.
CONCLUSION: Combined melody and meaning provide a sound basis for the further investigation of melodic language processing in aphasic patients, and eventually the neurophysiological processes underlying MIT.

Keywords: Aphasia; auditory perception; fMRI; language; language therapy; singing

Year: 2014 PMID: 25328839 PMCID: PMC4107379 DOI: 10.1002/brb3.245

Source DB: PubMed Journal: Brain Behav Impact factor: 2.708

Introduction

Aphasia is a severe language disorder that affects language comprehension and production to different degrees, compromising both spoken and written modalities. The most common cause of aphasia is stroke, in which a neurovascular event damages the language areas localized in the left hemisphere. A common treatment to restore spoken language in severe nonfluent aphasic patients is Melodic Intonation Therapy (MIT) (Albert et al. 1973). This form of therapy has recently received much press attention after the successful recovery of U.S. congresswoman Gabrielle Giffords (Bambury 2011). In a stepwise procedure, MIT uses musical elements of speech such as melody and rhythm (Norton et al. 2009) to help the patient to initiate language production. In the first steps, the speech and language therapist (SLT) shows the patient how to produce a specific target utterance by “singing” the utterance, that is, accentuating its melody and rhythm. This is accompanied by tapping with the left hand. Such melodically intoned auditory input is thought to play a crucial role in facilitating language production, by priming the patient's inner rehearsal of the target utterance (Norton et al. 2009). MIT's critical elements, intonation and left-hand tapping, are both thought to be related to right hemisphere activation. Intonation targets the potential role of this hemisphere in processing spectral information, musical features, and prosody, while left-hand tapping engages the right hemisphere sensorimotor network that controls hand and mouth movements (Norton et al. 2009). Although it is not yet clear whether it is melody, rhythm, or their combination used in MIT that specifically aids speech production (van der Meulen et al. 2012; Stahl et al. 2013), the treatment has been associated with functional (Vines et al. 2011) and also structural changes in the right hemisphere (Schlaug et al. 2009).
The positive effect of this treatment, hypothetically aiding the reorganization of language representation in the damaged brain, has triggered interest in understanding how the musical elements used in MIT are processed in the brain. Neuroimaging studies investigating the differences between spoken and melodic language in healthy volunteers have thus far focused primarily on production (i.e., speaking and singing) (Riecker et al. 2000; Jeffries et al. 2003; Ozdemir et al. 2006; Gunji et al. 2007). Despite the methodological diversity of these studies, in general they report a lateralization effect for singing to the right, and speech to the left hemisphere. Thus, encouraging aphasic patients to use melody during their speech production may target areas in the undamaged right hemisphere, but the question remains what the role is of the melodically intoned auditory input, which is offered intensively during MIT and probably plays a crucial part in the initial facilitation of language production. From this point of view, that is, reception instead of production, Meyer et al. (2002) investigated the perceptual differences in processing spoken normal sentences, spoken delexicalized sentences, and prosodic speech (speech utterance reduced to speech melody). Melody (pitch variations in speech) is a component of prosody among several others such as rhythm and loudness (Nooteboom 1997). Their results suggest that right hemispheric activation observed while processing normal speech stimuli mainly comes from the underlying processing of prosody. Later studies have focused on the perception of spoken and sung language, and have shown differences in hemispheric lateralization (Callan et al. 2006; Schön et al. 2010). Speech prosody patterns are similar to the musical features in singing such as melody, rhythm, and loudness, but they exhibit differences regarding their acoustic features.
Callan et al. (2006) found right-lateralized activation of the anterior superior temporal gyrus (STG) for sung language, and a strongly left-lateralized activity pattern for spoken language. Schön et al. (2010) suggested that linguistic and musical processing have a different hemispheric specialization. Brain activation patterns for sung versus spoken words showed more extended activations in the right temporal lobe, whereas the processing of linguistic aspects in singing versus vocalization showed a predominance in the left temporal lobe. A recent study of Merrill et al. (2012) found that listening to song and speech activated the temporal lobe rather symmetrically. However, substantial nonoverlap was also found: activation in the inferior frontal gyrus (IFG) was left-lateralized for spoken words as well as for processing pitch in speech, while right-sided lateralization was found for pitch in song. The brain regions involved in the auditory perception of melodically intoned language – a simplified version of singing – have not, to our knowledge, been reported. No more than three to four tones are used to exaggerate speech prosody (Helm-Estabrooks et al. 1989; Sparks 2008). Melodically intoned language is a key feature in MIT and for a greater insight into its neurophysiological processes, this feature needs to be examined. The aim of this study is to investigate the differential perceptual processing of spoken and melodically intoned language using functional MRI. We furthermore assessed whether there was an effect of lexical-semantic content, since it is meaningful language that MIT uses to improve everyday communication in aphasic patients. A sparse temporal sampling design was employed for acquisition of the functional imaging data to ensure that scanner noise would not interfere with the auditory stimuli, thus being maximally sensitive to differences between the different types of language stimuli.

Methods

Participants

Twenty right-handed volunteers (median age: 23 years, range: 21–51 years, 15 females) with no neurological or psychiatric history participated in this study. None of the participants had any particular musical education. They did not use any prescription medication except oral contraception. Handedness was determined with the Edinburgh Handedness Inventory (Oldfield 1971), indicating 100% right-handedness in all participants. The study was approved by the institutional review board and all participants gave written informed consent prior to participation. Due to technical failure during data acquisition, one participant (female, aged 21 years) was excluded from the analysis.

Experimental stimuli and paradigm

The experiment consisted of two conditions: spoken and melodically intoned stimuli. Each condition contained three categories of 30 items each: (1) 30 meaningful items (17 real words and 13 short noun, prepositional or verb phrases); (2) 30 meaningless items without lexical-semantic information (17 pseudowords and 13 short phrases containing pseudowords); (3) 30 neutral utterances, consisting of a repetitive consonant-vowel combination (“Nana”) (Fig. 1; sample stimuli (in Dutch) can be provided upon request). Within and across both conditions, stimuli were matched across the three categories for the number of syllables (range: 2–6), for intonation and stress patterns (for spoken stimuli), melodic contour (for melodically intoned stimuli), semantic content, and syntactic structure of the phrases. We chose to use different words as spoken and melodically intoned stimuli to prevent our participants from becoming familiarized with the words, thus avoiding unwanted and unpredictable effects such as habituation, memory, and learning. Representative examples of the stimuli from both conditions are given in Figure 1, indicating the very minor differences in semantic content between stimuli of a given category such as “goede morgen” (good morning) in the spoken condition and “goede middag” (good afternoon) in the melodically intoned condition.

Figure 1

Stimulus examples (in Dutch) of the two experimental conditions. Spoken stimuli (left side of the figure): words are separated into syllables with a black dot. Syllables that are underlined are stressed. Melodically intoned stimuli (right side of the figure): musical notation of the stimulus. In each condition there are three types of stimuli: (1) meaningful, (2) meaningless, and (3) neutral utterances. Provided are examples of words with two and four syllables, and of short phrases of six syllables. Approximately ♩ = 120.

The items were selected by a clinical linguist specialized in MIT and were recorded by a female therapist. Spoken stimuli were recorded with a natural intonation and were not stressed rhythmically in order to keep them as natural as possible. Melodically intoned stimuli were recorded with the same prosodic patterns as those used in MIT. All recorded items had a maximum duration of 3 sec. Melodically intoned items were on average longer than the spoken items (2.24 sec vs. 1.23 sec, respectively; two-sample t-test, P < 0.0001). The experiment was conducted in an event-related design consisting of four experimental conditions and two control conditions. The stimuli in the experimental conditions consisted of 30 melodically intoned meaningful items (“melodic-sense”), 30 spoken meaningful items (“spoken-sense”), 30 melodically intoned meaningless items (“melodic-nonsense”), and 30 spoken meaningless items (“spoken-nonsense”). The two control conditions consisted of the neutral utterances, either melodically intoned (n = 30; “melodic-neutral”) or spoken (n = 30; “spoken-neutral”). The task was presented binaurally through an MR-compatible headphone system. Participants were required to press the button on the response pad, held in the left hand, upon hearing a meaningful item.
Stimuli were pseudo-randomized using the genetic algorithm toolbox Optimize Design 11 (Wager and Nichols 2003), implemented in Matlab version 6.5.1 (The Mathworks, Sherborn, MA), with optimization primarily for the contrast between melodically intoned and spoken language (which we will refer to as acoustic information), and secondarily for the contrast between meaningful and meaningless language (lexical-semantic information). The task was presented using Presentation v13.0 software (Neurobehavioral Systems Inc., Albany, CA) installed on a desktop PC dedicated to stimulus presentation. External triggering by the MR system ensured synchronization of the stimulus paradigm with the imaging data acquisition and precise recording of task performance and response times through a fiber-optic button response pad. Participants were familiarized with the task prior to scanning with a sample set of representative items. Behavioral data (responses and reaction times) were collected during scanning. Differences in performance between melodically intoned and spoken items were assessed with a two-sample t-test.

fMRI image analysis

Imaging acquisition and preprocessing

Scanning was performed on a 3T MR system (HD platform, GE Healthcare, Milwaukee, WI). An 8-channel head coil was used for reception of the signal. For anatomical reference, a high-resolution three-dimensional (3D) Inversion Recovery (IR) Fast Spoiled Gradient Echo (FSPGR) T1-weighted sequence was used, with the following pulse sequence parameters: repetition time (TR)/echo time (TE)/inversion time (TI) 10.5/2.1/300 ms; flip angle 18°; acquisition matrix 416 × 256; field of view (FOV) 250 × 175 mm²; 172 slices with a slice thickness of 1.6 mm and 0.8 mm overlap; acquisition time 4:40 min. For functional imaging, a sparse temporal sampling design was employed, using a single-shot T2*-weighted gradient echo echo-planar imaging (EPI) sequence sensitive to blood oxygenation level dependent (BOLD) contrast (TE 30 ms; flip angle 75°; acquisition matrix 64 × 96; FOV 220 × 220 mm²; slice thickness 3.5 mm with no gap; 39 slices with full brain coverage). TR was 6000 ms and acquisition time 3000 ms, resulting in a 3000 ms silent gap which was used for presentation of the auditory stimulus. Total duration was 18:30 min. The functional imaging data acquisition included five dummy scans that were discarded from further analysis. Imaging analysis was performed using SPM8 (Statistical Parametric Mapping; Wellcome Trust Centre for Neuroimaging, London, UK). Images were manually reoriented to the anterior commissure and subsequently all T2*-weighted functional images were realigned to correct for the participant's motion during data acquisition and were coregistered with the individual's high-resolution T1-weighted anatomical image (Friston et al. 1995). The functional and anatomical images were normalized to the standard brain space defined by the Montreal Neurological Institute (MNI) as provided within SPM8, using affine and nonlinear registration.
This resulted in resampled voxel sizes of 3 × 3 × 3 mm³ for the functional and 1 × 1 × 1 mm³ for the anatomical images. The normalized functional images were smoothed with a 3D Gaussian Full Width Half Maximum (FWHM) filter of 6 × 6 × 6 mm³ to increase the signal-to-noise ratio, correct for interindividual anatomical variation and to normalize the data (Friston et al. 1999).
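The acquisition numbers reported above can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the original analysis: the function names are hypothetical, and it assumes that exactly one stimulus was presented in the silent gap of each retained volume, which the text implies but does not state outright. It also converts the 6 mm smoothing FWHM to the equivalent Gaussian standard deviation using the standard relation FWHM = 2·sqrt(2·ln 2)·sigma.

```python
import math

# Values taken from the text
TR_S = 6.0                  # repetition time in seconds
ACQ_S = 3.0                 # acquisition time per volume in seconds
TOTAL_S = 18 * 60 + 30      # total run duration (18:30 min) in seconds
N_DUMMY = 5                 # dummy scans discarded from analysis
N_CONDITIONS = 6            # 4 experimental + 2 control conditions
ITEMS_PER_CONDITION = 30
MAX_STIM_S = 3.0            # maximum stimulus duration in seconds


def sparse_sampling_summary():
    """Hypothetical helper: derive volume and stimulus counts from the timing."""
    silent_gap = TR_S - ACQ_S
    n_volumes = int(TOTAL_S // TR_S)
    return {
        "silent_gap_s": silent_gap,
        "volumes": n_volumes,
        "usable_volumes": n_volumes - N_DUMMY,
        "stimuli": N_CONDITIONS * ITEMS_PER_CONDITION,
        # Assumption: each stimulus must fit entirely inside the silent gap
        "stimuli_fit_in_gap": MAX_STIM_S <= silent_gap,
    }


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


summary = sparse_sampling_summary()
print(summary["volumes"], summary["usable_volumes"], summary["stimuli"])  # 185 180 180
print(round(fwhm_to_sigma(6.0), 3))  # sigma in mm for the 6 mm kernel
```

Under these assumptions the 180 retained volumes match the 180 stimuli (6 conditions of 30 items), which is consistent with one stimulus per silent gap.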

Statistical analysis of fMRI data

All fMRI data were analyzed within the context of the General Linear Model (GLM), by modeling the experimental conditions convolved with the hemodynamic response function (HRF), corrected for temporal autocorrelation and filtered with a high-pass filter with a 128 sec cutoff. The neutral conditions were not modeled and served as an implicit baseline. To account for the sparse sampling acquisition, we defined the micro time resolution and onset based on the time bin that corresponded to the middle of the actual acquisition time (1500 ms). Motion parameters were included in the model as regressors of no interest to reduce the potential confounding effects due to motion. Because of the significantly longer duration of the melodically intoned versus the spoken stimuli, stimulus duration was modeled as an additional regressor of no interest to account for confounding stimulus duration effects. The individual t-contrast images for spoken-sense, spoken-nonsense, melodic-sense, and melodic-nonsense were used to perform a full-factorial ANOVA group analysis (n = 19 participants). The two within-subject factors, prosody and lexical-semantic information (equal variance, levels not independent), were entered in this analysis. Main effects as well as the interaction between these factors were investigated. The following contrasts were created to evaluate the main effects of lexical-semantic information: sense > nonsense and nonsense > sense; and of acoustic information: spoken > melodic and melodic > spoken. Interaction effects for acoustic information with lexical-semantic information were explored with the following contrasts: spoken-sense versus spoken-nonsense, melodic-sense versus melodic-nonsense, spoken-sense versus melodic-sense, and spoken-nonsense versus melodic-nonsense. The threshold for significance was set at P < 0.05 family-wise error (FWE) corrected for multiple comparisons.
Anatomical labeling of significantly activated clusters was performed using the Automated Anatomical Labeling map (Tzourio-Mazoyer et al. 2002) software extension to SPM8, using the extended local maxima labeling option. Figures were created with the SPM render function.
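The contrasts listed above can be written as weight vectors over the four experimental cells. The following is a minimal sketch under stated assumptions, not the authors' SPM8 batch: the cell order, the helper names, and the ±1 coding are illustrative choices, and the cell means at the end are made-up numbers used only to show how a contrast is applied.

```python
# Assumed cell order (hypothetical; SPM8's internal ordering may differ)
CONDITIONS = ["spoken-sense", "spoken-nonsense", "melodic-sense", "melodic-nonsense"]


def cell_code(name):
    """Return (acoustic, lexical) codes: melodic/sense = +1, spoken/nonsense = -1."""
    acoustic, lexical = name.split("-")
    return (1 if acoustic == "melodic" else -1,
            1 if lexical == "sense" else -1)


codes = [cell_code(c) for c in CONDITIONS]

# Main effects, as directional contrasts
sense_gt_nonsense = [lex for _, lex in codes]          # [1, -1, 1, -1]
melodic_gt_spoken = [aco for aco, _ in codes]          # [-1, -1, 1, 1]

# Interaction: (melodic-sense - melodic-nonsense) - (spoken-sense - spoken-nonsense)
interaction = [aco * lex for aco, lex in codes]        # [-1, 1, 1, -1]

# Pairwise comparisons named in the text
simple_effects = {
    "spoken-sense > spoken-nonsense":     [1, -1, 0, 0],
    "melodic-sense > melodic-nonsense":   [0, 0, 1, -1],
    "melodic-sense > spoken-sense":       [-1, 0, 1, 0],
    "melodic-nonsense > spoken-nonsense": [0, -1, 0, 1],
}


def apply_contrast(weights, cell_means):
    """Weighted sum of cell means for one contrast."""
    return sum(w * m for w, m in zip(weights, cell_means))


# Made-up cell means, for illustration only
example_means = [1.0, 0.5, 2.0, 0.5]
print(apply_contrast(interaction, example_means))  # (2.0 - 0.5) - (1.0 - 0.5) = 1.0
```

Each vector sums to zero, so a contrast is insensitive to activation shared by all four cells, and the two main-effect vectors are orthogonal to each other and to the interaction.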

Results

Task performance

Participants performed well in both conditions with an average accuracy of 96% (SD: 3%). Performance was equally high in both conditions (P = 0.486).

fMRI activation results

Lexical-semantic information: main effect and interactions

We found a main effect for the lexical-semantic information factor (F(1,72) = 26.27, P < 0.05, FWE corrected). Post hoc analysis revealed no increased activation for the meaningless items compared to meaningful items (nonsense > sense). For the meaningful items compared to meaningless items (sense > nonsense), increased activation was seen left-lateralized in the supramarginal gyrus (SMG) and inferior parietal lobule (IPL). Increased bilateral activation was seen in the rolandic operculum, insula, supplementary motor area, and cingulate motor area. Right-sided activation was observed in the pre- and postcentral gyrus at the level of the hand motor area, presumably due to the button presses (Fig. 2A; Table 1).
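The degrees of freedom of the reported F statistic can be reconciled with the design described in the Methods. The sketch below is a hedged illustration: it assumes that the group model estimates one mean per cell of the 2 × 2 design and contains no separate subject regressors, which is an inference from the reported value rather than something the text states. The function name is hypothetical.

```python
def error_df(n_participants, n_cells):
    """Error degrees of freedom when one mean is estimated per design cell.

    Assumption: one contrast image per participant per cell, and no
    additional regressors (e.g., subject effects) in the group model.
    """
    n_images = n_participants * n_cells
    return n_images - n_cells


# 19 participants x 4 cells (spoken/melodic x sense/nonsense)
print(error_df(19, 4))  # 72, matching the denominator of F(1,72)
```

The numerator degree of freedom is 1 because the main effect compares two levels of a single factor.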

Figure 2

Three-dimensional brain rendering with superposition of the activation maps displayed at P < 0.05 (FWE corrected), k ≥ 10, for the following contrasts: (A) sense > nonsense stimuli, (B) spoken-sense > spoken-nonsense stimuli, (C) melodic-sense > melodic-nonsense stimuli, (D) melodic > spoken stimuli, (E) melodic-sense > spoken-sense stimuli.

Table 1

Anatomical location	Side	Cluster size	MNI x	MNI y	MNI z	T-value
Inferior parietal lobule (50%)	L	259	−54	−31	40	8.08
Supramarginal gyrus (40%)	L
Rolandic operculum/insula (100%)	L	24	−48	−1	4	5.87
Rolandic operculum/insula (100%)	R	34	48	5	4	6.27
Supplementary motor area (70%)	L/R	512	6	−4	52	10.00
Middle cingulate gyrus (50%)	L/R
Pre- and postcentral gyrus (82%)	R	645	36	−22	49	15.57
Supramarginal gyrus (5%)	R
Inferior parietal lobule (4%)	R
Thalamus (50%)	R	51	15	−22	4	6.51
Cerebellum (100%)	L	23	−18	−61	−23	5.74