
Longitudinal change in neural response to vocal emotion in adolescence.

Michele Morningstar1,2,3, Whitney I Mattson2, Eric E Nelson2,3.   

Abstract

Adolescence is associated with maturation of function within neural networks supporting the processing of social information. Previous longitudinal studies have established developmental influences on youth's neural response to facial displays of emotion. Given the increasing recognition of the importance of non-facial cues to social communication, we build on existing work by examining longitudinal change in neural response to vocal expressions of emotion in 8- to 19-year-old youth. Participants completed a vocal emotion recognition task at two timepoints (1 year apart) while undergoing functional magnetic resonance imaging. The right inferior frontal gyrus, right dorsal striatum and right precentral gyrus showed decreases in activation to emotional voices across timepoints, which may reflect focalization of response in these areas. Activation in the dorsomedial prefrontal cortex was positively associated with age but was stable across timepoints. In addition, the slope of change across visits varied as a function of participants' age in the right temporo-parietal junction (TPJ): this pattern of activation across timepoints and age may reflect ongoing specialization of function across childhood and adolescence. Decreased activation in the striatum and TPJ across timepoints was associated with better emotion recognition accuracy. Findings suggest that specialization of function in social cognitive networks may support the growth of vocal emotion recognition skills across adolescence.
© The Author(s) 2022. Published by Oxford University Press.


Keywords:  adolescence; development; emotion recognition; fMRI; social cognition; vocal emotion


Year:  2022        PMID: 35323933      PMCID: PMC9527472          DOI: 10.1093/scan/nsac021

Source DB:  PubMed          Journal:  Soc Cogn Affect Neurosci        ISSN: 1749-5016            Impact factor:   4.235


Adolescence is characterized by marked changes in hormones, social behaviour, and emotional and cognitive abilities (Steinberg and Morris, 2001; Crone and Dahl, 2012; Nelson ). At a neurobiological level, the adolescent brain continues to undergo extensive maturation in structure and function throughout the teenage years (Giedd ; Paus, 2005; Blakemore, 2012). In particular, regions involved in processing social information—including areas like the medial prefrontal cortex (mPFC), superior temporal sulcus (STS), temporo-parietal junction (TPJ) and temporal pole (Carrington and Bailey, 2009; Mills )—become increasingly specialized with age and ongoing experience with social stimuli (Johnson ). According to the Interactive Specialization model of neurodevelopment (Johnson, 2000), maturation of function in these areas of the brain may be indexed by increased efficiency of processing, more focalized activation to a narrower range of relevant stimuli and greater connectivity within the social brain network. Such patterns of change in engagement and efficiency of function are thought to support improved social cognitive abilities in adolescence, such as face processing, biological motion detection, and mentalizing or emotion recognition (ER) (Blakemore, 2008; Johnson ; Burnett ; Kilford ). Most work probing developmental influences on neural engagement during ER tasks has examined youth’s response to emotional faces (see review by Leppänen and Nelson, 2006). However, other non-verbal cues—such as a speaker’s tone of voice, beyond the content of their speech—also convey important social and affective information that must be decoded in social contexts (Banse and Scherer, 1996; Mitchell and Ross, 2013). There is substantial behavioural evidence that the capacity to recognize emotional intent in others’ voices (vocal ER) is undergoing active maturation well into the teenage years (Chronaki ; Grosbras ; see review by Morningstar ). 
Like with other social cognitive functions, ongoing development in vocal ER in adolescence is likely bolstered by learning-related processes. To the extent that experience shapes learning, the increased exposure to affective cues in social interactions with peers—which become highly salient and central to teenagers during adolescence (Spear, 2000)—may scaffold teenagers’ vocal ER skills over time. In addition, the specialization of relevant neural networks may support increased vocal ER with age. However, little is known about the developmental changes in youth’s neural processing of vocal affect. Models of neural response to vocal emotion in the adult brain (Schirmer and Kotz, 2006) outline three stages of processing in a fronto-operculo-temporal network (Wildgruber ; Iredale ; Kotz ). According to these models, acoustic information is first extracted in the primary auditory cortex within Heschl’s gyrus. Representation of meaningful suprasegmental features of the auditory stream (e.g. stress and tone) is then processed within a wider band of the STS and temporal lobe more broadly (Belin ; Wildgruber ; Yovel and Belin, 2013). Evaluation of prosody and its emotional content then largely implicates the bilateral inferior frontal cortex (Ethofer , 2009; Alba-Ferrara ; Fruhholz ). In addition to these auditory processing areas, other regions involved in social information processing (e.g. mPFC and TPJ) likely play a role in decoding emotional intent in others’ voices. Indeed, the mentalizing network has been implicated in drawing inferences about others’ emotions in multi-modal stimuli (Hooker ; Zaki , 2012). Previous work by our group noted age-related changes in the involvement of such social cognitive regions in a vocal ER task (Morningstar ). 
In a sample of 8- to 19-year-olds, increased age was associated with (i) greater engagement of frontal areas involved in linguistic and emotional processing and (ii) greater connectivity between frontal areas and the right TPJ, when hearing vocal stimuli. These neural patterns were associated with greater vocal ER accuracy, suggesting that emerging fronto-temporo-parietal networks are relevant to growth in this social cognitive skill in adolescence. However, this work was cross-sectional, which limits inferences about neural mechanisms of behavioural change over time.

Goals and hypotheses

The current study builds on this work by examining youth’s behavioural and neural responses to vocal emotional prosody in a longitudinal framework. Youth aged 8 to 19 years (sample from Morningstar ) completed a vocal ER task while undergoing functional magnetic resonance imaging (fMRI) at two timepoints, 1 year apart. Effects of both time (visit 1 vs visit 2; within-subject variable) and age (in years; between-subject variable) on neural response to vocal emotion were investigated simultaneously in a mixed-effect model. The current study’s accelerated longitudinal design affords the opportunity to examine within-subject change across time alongside between-subject change across the age range in youth’s blood-oxygen-level-dependent signal during the vocal ER task. Effects of time and age may denote different developmental patterns (Figure 1). For instance, a particular region of the brain could show decreased activation to vocal emotion across timepoints (which could be attributed to practice effects and/or increased efficiency of processing with maturation across a 1 year time span) and decreased activation across age (which could indicate that the task elicits less response in that region in older youth compared to younger participants, perhaps as a function of increased experience with vocal emotion in social situations for adolescents compared to children; Figure 1, Panel A). In contrast, a region could show stability in activation across timepoints (i.e. no effect of time) but increased activation in older than younger participants (i.e. effect of age), suggesting that response in this region may not change appreciably across a 1 year interval but may increase over a longer time frame from middle childhood to late adolescence (Figure 1, Panel B). Still another possibility is that the amount of stability in a region’s activation across timepoints differs as a function of a participant’s age (i.e. interaction of time × age).
Although effects of time and age are likely interdependent, each of these patterns implies a different mechanism for developmental change in neural processing.
Fig. 1.

Hypothetical patterns of development across age and/or time.

Beyond the addition of a second timepoint to investigate responses to vocal emotion longitudinally, we build upon our previous work with this sample by isolating effects related to emotional information in the voice (i.e. contrasting each emotional category to neutral, rather than to baseline—which could index activation to any auditory stimulus; see the ‘Neural activation to emotional voices’ section). Based on prior research on the neural mechanisms underlying social cognition and vocal ER (Kilford ; Morningstar ), we hypothesized that developmental change in neural response to vocal emotion (whether over time and/or over age) would be most evident in neural regions involved in later stages of auditory emotion processing involved in interpretation (e.g. inferior frontal gyrus) and mentalizing/social cognition more broadly (e.g. mPFC, TPJ and temporal pole). Given the paucity of prior research on vocal emotion processing in the developing brain, we conducted exploratory whole-brain analyses of the effect of time and age on neural response to vocal emotion. We predicted that change in neural response may be related to ER accuracy, but did not make an a priori hypothesis about the direction of such an effect.

Methods

Participants

Time 1

Forty-one youth (26 female) ages 8–19 years (M = 14.00, s.d. = 3.38) were recruited via email advertisements circulated to employees of a large children’s hospital (USA). Exclusion criteria included devices or conditions contraindicated for MRI (e.g. braces) or developmental disorders (e.g. autism or Turner’s syndrome). Participants’ self-report of race indicated that 68% of the sample was White, 17% was Black or African American and 15% was multi-racial or of other races. The sample’s age followed a sufficiently normal distribution (mean and median age = 14.00; 61% of participants within 1 s.d. of the mean; skewness = −0.17; kurtosis = −1.37). One participant did not complete the task in the scanner at Time 1 (see below). Thus, 40 youth were included in analyses at Time 1.

Time 2

Thirty-four youth (23 female) participated in the second visit for this study, ∼1 year later (with 10–16 months between visits; M = 11.90, s.d. = 1.86). Attrition was due to participants moving out of state (n = 2), having braces or permanent metal retainers (n = 4) or declining to participate (n = 1). The sex distribution in the sample did not differ across timepoints, χ2(1, N = 75) = 0.15, P = 0.70. Technical errors during data collection resulted in loss of data for one participant. Thus, 33 youth were included in analyses at Time 2. Written parental consent and written participant assent or consent were obtained before participation. All procedures were approved by the hospital Institutional Review Board.

Procedure

Youth first received instructions and a practice version of the task (using exaggerated vocalizations as stimuli) in a mock scanner. Following this, participants completed a forced-choice vocal ER task while undergoing fMRI. Participants heard 75 recordings of vocal expressions produced by three 13-year-old actors (two females; Morningstar ). Each actor spoke the same five sentences (e.g. ‘I didn’t know about it’ and ‘Why did you do that?’) in each of five emotional tones of voice: anger, fear, happiness, sadness and a neutral expression. Recordings retained for the current study were chosen based on listeners’ ratings of their recognizability and authenticity (see Morningstar ). Participants were asked to identify the intended emotion from five labels (anger, fear, happiness, sadness and neutral) using hand-held response boxes inside the scanner. Each trial of the task was composed of stimulus presentation (M duration = 1.34 s, range 0.89–2.03 s), followed by a 5 s response period. Trials were presented in an event-related design with a jittered inter-trial interval of 1–8 s (mean 4.5 s). Auditory stimuli were delivered to participants via pneumatic earbuds embedded in noise-cancelling padding. A monitor at the head of the magnet bore was visible to participants via a head-coil mirror: a fixation cross was shown on the screen during the inter-trial interval and auditory stimulus delivery, and a pictogram of response labels was shown during the response period. Trials were separated into three 6 min runs (25 trials per run). Each run contained an equal number of recordings for each emotion and sentence.

Image acquisition and processing

MRI data were acquired on a Siemens 3 Tesla scanner, using a standard 64-channel head-coil array. Due to equipment upgrade during data collection, Time 1 data from five participants were acquired on a different Siemens 3T scanner with a 32-channel head coil. The imaging protocol included three-plane localizer scout images, an isotropic 3D T1-weighted anatomical scan covering the whole brain (MPRAGE) and echo planar imaging (EPI) acquisitions. Imaging parameters for the MPRAGE were 1 mm voxel dimensions, 176 sagittal slices, repetition time (TR) = 2300 ms, echo time (TE) = 2.98 ms and field of view (FOV) = 248 mm. Imaging parameters for the EPI data were TR = 1500 ms, TE = 30 ms and FOV = 240 mm. EPI images were preprocessed and analysed in Analysis of Functional NeuroImages (AFNI), version 18.1.05 (Cox, 1996). Functional images were aligned to the first volume, oriented to the anterior commissure/posterior commissure line and co-registered to the T1 anatomical image. Images then underwent non-linear warping to the Talairach template and were spatially smoothed with a Gaussian filter (Full Width at Half Maximum, 6 mm kernel). Voxel-wise signal was scaled within-subject to a mean value of 100. Volumes in which >10% of voxels were signal outliers (above 200) or contained movement >1 mm from their subsequent volume were censored in first-level analyses. This procedure resulted in an average of 5.8% of volumes being censored across the whole sample.
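The volume-censoring rule can be made concrete with a short sketch. The pipeline itself used AFNI; the Python function below is purely illustrative (the function name and array layout are assumptions), flagging volumes by the same two criteria under the simplifying assumption of a (time × voxel) signal matrix already scaled to a mean of 100 and motion summarized by the three translation parameters only.

```python
import numpy as np

def censor_flags(scaled_data, motion, outlier_frac=0.10, move_thresh=1.0):
    """Flag volumes for censoring (True = drop from first-level analysis).

    scaled_data: (T, V) array of voxel signals scaled to a mean of 100.
    motion: (T, 3) array of head positions in mm (translations only, for brevity).
    A volume is censored if >10% of its voxels exceed 200 (signal outliers)
    or if it moves >1 mm relative to the subsequent volume.
    """
    # Fraction of outlier voxels (signal above 200) per volume
    outliers = (scaled_data > 200).mean(axis=1) > outlier_frac
    # Euclidean displacement between each volume and the next
    step = np.linalg.norm(np.diff(motion, axis=0), axis=1)
    moved = np.concatenate([step > move_thresh, [False]])
    return outliers | moved
```

In practice AFNI builds an equivalent censor file from its own outlier counts and all six affine motion parameters; this sketch only mirrors the thresholds stated above.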

Analysis

Task performance (accuracy)

Although emotional displays can also be characterized dimensionally, we conceptualized participants’ ER accuracy as the extent to which they selected the label that corresponded to the categorical emotion ‘type’ speakers had intended to convey. Accuracy was computed using the sensitivity index (Pr; Corwin, 1994), a measure of accuracy based on signal detection statistics (e.g. Pollak ). Youth’s hit rates (HR; correct responses) and false alarms (FA; incorrect responses) were combined into a single estimate of sensitivity (HR − FA) for each emotion in the task. Similar to dʹ [i.e. z(HR) − z(FA)], Pr is more appropriate for tasks in which participants’ accuracy is low (Snodgrass and Corwin, 1988), as is often the case with vocal ER tasks utilizing teenage voices as stimuli (e.g. Morningstar , 2019). Linear mixed-effects models are better suited to longitudinal data than repeated-measures analysis of variance, as they do not delete participants lost to attrition list-wise and produce more accurate estimates of time-related effects for observations nested within people (Mirman, 2014). Using lmerTest (Kuznetsova ) in R (R Core Team, 2017), we fit a linear mixed-effects model predicting Pr based on the fixed effects of Time (within-subjects variable, 2 levels: Time 1 visit vs Time 2 visit), Age (age at Time 1 visit in years; between-subjects continuous variable, mean-centred), Emotion type (within-subjects variable, 5 levels: anger, fear, happiness, sadness and neutral [sum-coded with neutral as the reference category]) and all higher-order interaction terms (Time × Age, Time × Emotion, Age × Emotion, Time × Age × Emotion). A fixed effect of Sex (between-subjects variable, 2 levels: male vs female) was included as a control variable, without interaction terms. We included a random intercept for each participant, as well as a by-participant random slope of Time (to allow change across visits to vary by participant).
The equation was as follows: Pr ∼ 1 + Time × Emotion × Age + Sex + (1 + Time | participant). Estimated P-values for all fixed effects were obtained from likelihood ratio tests of the full model with the effect in question against the model without it (Winter, 2013).
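As a concrete illustration of the accuracy metric: the sensitivity index for one emotion is simply the hit rate minus the false-alarm rate for that emotion's label. The analyses themselves were run in R; the sketch below is a minimal Python translation, with the function name and trial encoding chosen for illustration rather than taken from the original code.

```python
import numpy as np

def sensitivity_index(intended, chosen, emotion):
    """Pr for one emotion: hit rate minus false-alarm rate (HR - FA).

    intended / chosen: equal-length sequences of emotion labels per trial.
    Hits = trials of `emotion` labelled correctly;
    false alarms = trials of other emotions mislabelled as `emotion`.
    """
    intended = np.asarray(intended)
    chosen = np.asarray(chosen)
    target = intended == emotion
    hit_rate = np.mean(chosen[target] == emotion)     # HR on target trials
    fa_rate = np.mean(chosen[~target] == emotion)     # FA on non-target trials
    return hit_rate - fa_rate
```

For example, labelling 3 of 4 anger trials correctly (HR = 0.75) while calling 1 of 4 neutral trials "anger" (FA = 0.25) yields Pr = 0.50. Pr near 0 indicates chance-level discrimination of that emotion from the rest.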

Neural activation to emotional voices

At the subject level, regressors were included for the presentation of each type of emotional vocal stimuli (i.e. voices expressing anger, fear, happiness or sadness; from stimulus onset to offset) contrasted to the neutral vocal stimuli (i.e. voices conveying a neutral expression). This model enabled us to examine both emotion-general responses (average response to all emotions vs neutral) and emotion-specific responses (e.g. happiness vs neutral, anger vs neutral, etc.). In addition to these four regressors, nuisance regressors for motion (six affine directions and their first-order derivatives) and scanner drift (within each run) were included at the subject level. At the group level, individual contrast images produced for each participant were fit to a linear mixed-effects model (3dLME in AFNI; Chen ). The model tested effects of Time (within-subjects variable, 2 levels: Time 1 vs Time 2; with a random slope), Age (age at Time 1 visit in years; between-subjects continuous variable, mean-centred) and Emotion type (within-subjects variable, four levels: anger vs neutral, fear vs neutral, happiness vs neutral and sadness vs neutral) on neural activation to emotional vs neutral voices, with a random intercept. All higher-order interactions were included in the model (Time × Age, Time × Emotion, Emotion × Age, Time × Age × Emotion). Sex (between-subjects variable, two levels: male vs female) was included as a control variable. Within this model, F-statistics were computed for the main effects of Time, Age and Emotion, as well as two-way interactions between these factors. Correction for multiple comparisons was achieved by combining a conservative cluster-forming threshold (P < 0.001) with a cluster-size correction.
This cluster-size threshold was generated with AFNI’s updated spatial autocorrelation function (3dClustSim with the ACF option; Cox ), which addresses the correction issue raised in Eklund  and applies Monte Carlo simulations with study-specific smoothing estimates, two-sided thresholding and first-nearest-neighbour clustering. Clusters larger than the calculated threshold of 26 contiguous voxels and comprising >25% grey matter (based on the Talairach atlas in AFNI) are reported below. Mean activation in voxels within resulting clusters was extracted for analyses relating neural activation to ER accuracy.
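The final extent cut can be illustrated with a small sketch. Assume the Monte Carlo step (3dClustSim) has already yielded the 26-voxel threshold; the function below (a plain-Python stand-in, not AFNI code) then applies a two-sided voxel threshold and keeps clusters that reach that extent under first-nearest-neighbour (face-touching) connectivity.

```python
import numpy as np
from collections import deque

def surviving_clusters(stat_map, stat_thresh, k_min=26):
    """Two-sided voxel threshold, then keep clusters with extent >= k_min voxels.

    Uses first-nearest-neighbour (face-touching) connectivity, as in AFNI's NN1.
    The extent threshold (here 26) would come from Monte Carlo simulation;
    this sketch only applies the cut.
    """
    mask = np.abs(stat_map) > stat_thresh          # two-sided thresholding
    keep = np.zeros(mask.shape, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]  # 6 face neighbours
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        # breadth-first flood fill to collect one connected cluster
        cluster, queue = [], deque([start])
        seen[start] = True
        while queue:
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if all(0 <= c < s for c, s in zip(nb, mask.shape)) \
                        and mask[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        if len(cluster) >= k_min:
            for vox in cluster:
                keep[vox] = True
    return keep
```

A 3 × 3 × 3 block of supra-threshold voxels (27 voxels) would survive the 26-voxel cut, while an isolated run of 5 voxels would be discarded.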

Relationship between neural activation and ER accuracy

We tested the association between accuracy in the ER task (Pr) and neural response in clusters for which activation varied as a function of time and/or age. Change in neural activation in each cluster was computed as the difference between Time 1 and Time 2 for the contrast representing all emotional vs neutral voices. An overall accuracy value for each timepoint was computed by taking the average of Pr across all emotion types at that timepoint. For each cluster, we fit a general linear model of the association between change in neural activation and Pr at Time 2, controlling for Pr at Time 1 (Rausch ).
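The per-cluster model is an ordinary regression of Time 2 accuracy on the change in activation, with Time 1 accuracy as a covariate. A minimal sketch follows (illustrative function name; ordinary least squares via NumPy rather than the original software):

```python
import numpy as np

def change_model(pr_t2, pr_t1, delta_activation):
    """OLS fit of Pr at Time 2 on change in cluster activation,
    controlling for Pr at Time 1.

    Returns fitted coefficients [intercept, b_delta, b_pr_t1]:
    b_delta is the effect of activation change on later accuracy.
    """
    X = np.column_stack([np.ones(len(pr_t2)), delta_activation, pr_t1])
    beta, *_ = np.linalg.lstsq(X, np.asarray(pr_t2), rcond=None)
    return beta
```

Controlling for Time 1 accuracy means b_delta estimates the association between activation change and accuracy gains, rather than accuracy level per se.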

Results

Change in ER accuracy

Average accuracy (Pr) was 0.26 (s.d. = 0.09) at Time 1, and 0.29 (s.d. = 0.12) at Time 2 (see Table 1 for emotion-specific estimates of accuracy at both timepoints).
Table 1.

Pr for each emotion type at Time 1 and Time 2

Emotion      Time 1 (M, s.d.)    Time 2 (M, s.d.)
Anger        0.49 (0.20)         0.52 (0.24)
Fear         0.12 (0.12)         0.13 (0.11)
Happiness    0.12 (0.14)         0.19 (0.19)
Sadness      0.32 (0.19)         0.28 (0.19)
Neutral      0.24 (0.18)         0.35 (0.23)
Average      0.26 (0.09)         0.29 (0.12)

Note: Pr = sensitivity index (Corwin, 1994) of accuracy in the vocal ER task. M = mean, s.d. = standard deviation.

The linear mixed-effects model revealed a significant effect of Emotion type on accuracy (χ2(16) = 209.02, P < 0.001): parameter estimates suggested that anger and sadness were better recognized than other emotions on average, whereas fear and happiness were recognized less accurately than were other emotions (consistent with typical accuracy rates for emotional prosody; Johnstone and Scherer, 2000). There was also an effect of Sex (χ2(1) = 6.12, P = 0.01): female participants were more accurate than male participants. Lastly, there was a trend-level effect of Age on accuracy (χ2(10) = 17.92, P = 0.056), whereby older youth were marginally more accurate than younger youth. Other effects were not significant (all Ps > 0.11). The data are illustrated in Figure 2. Parameter estimates, their standard errors and corresponding t- and P-values are included in Table 2.
Fig. 2.

Effect of Emotion and marginal effect of Age on Pr.

Table 2.

Parameter estimates for linear mixed-effects model of the effect of Time, Emotion and Age on Pr

Parameter                         Estimate   SE      t        P           OR [95% CI]
Intercept                         0.255      0.030   8.416    <0.001***   0.26 [0.21, 0.32]
Time                              0.026      0.021   1.254    0.219       0.10 [0.03, 0.17]
Emotion Anger                     0.246      0.052   4.755    <0.001***   0.26 [0.19, 0.32]
Emotion Fear                      −0.118     0.052   −2.276   0.024*      −0.12 [−0.19, −0.05]
Emotion Happiness                 −0.169     0.052   −3.254   0.001***    −0.12 [−0.19, −0.05]
Emotion Sadness                   0.134      0.052   2.592    0.010**     0.08 [0.01, 0.15]
Age                               0.004      0.009   0.505    0.617       0.02 [0.01, 0.04]
Sex                               −0.067     0.027   −2.459   0.018*      −0.07 [−0.12, −0.02]
Time × Emotion Anger              −0.010     0.034   −0.305   0.761       −0.08 [−0.19, 0.02]
Time × Emotion Fear               −0.020     0.034   −0.583   0.560       −0.09 [−0.20, 0.01]
Time × Emotion Happiness          0.030      0.034   0.882    0.379       −0.04 [−0.15, 0.06]
Time × Emotion Sadness            −0.074     0.034   −2.195   0.029*      −0.15 [−0.25, −0.05]
Time × Age                        0.004      0.006   0.684    0.499       −0.01 [−0.03, 0.01]
Age × Emotion Anger               0.003      0.016   0.173    0.863       −0.01 [−0.03, 0.01]
Age × Emotion Fear                −0.014     0.016   −0.903   0.367       −0.03 [−0.05, −0.01]
Age × Emotion Happiness           −0.010     0.016   −0.649   0.517       −0.02 [−0.04, <0.01]
Age × Emotion Sadness             −0.009     0.016   −0.565   0.572       −0.02 [−0.04, <0.01]
Time × Age × Emotion Anger        −0.002     0.010   −0.147   0.883       0.01 [−0.02, 0.04]
Time × Age × Emotion Fear         0.001      0.010   0.063    0.950       0.02 [−0.02, 0.05]
Time × Age × Emotion Happiness    0.007      0.010   0.705    0.482       0.02 [−0.01, 0.05]
Time × Age × Emotion Sadness      0.008      0.010   0.792    0.429       0.02 [−0.01, 0.05]

Note: Pr = sensitivity index (Corwin, 1994) of accuracy in the vocal ER task. Details of the model are provided in the text. Estimates are derived from the model fit with REML (restricted maximum likelihood); t-tests and associated P-values are derived using Satterthwaite’s approximation method (as recommended by Luke, 2017). Age represents participants’ chronological age (in years) at Time 1. Emotion type is sum-coded, with neutral as the reference category. Parameters referring to a specific emotion are thus representing estimates for that emotion compared to the grand average of all emotions. SE = standard error. OR = odds ratio, with 95% confidence interval (CI).

*P < 0.05. **P < 0.01. ***P < 0.001.


Change in neural activation to emotional voices

Results of the linear mixed-effects model are represented in Table 3 and Figures 3–5. There was an effect of Time in three clusters: the right dorsal striatum (R-DS), right inferior frontal gyrus (R-IFG) and right precentral gyrus (R-PcG) showed reduced response to vocal emotions from Time 1 to Time 2 (Figure 3). There was also an effect of Age on activation in the left dorsomedial frontal gyrus at the midline (dmPFC): older youth had a greater response to emotional voices in this region than did younger participants (Figure 4). In addition, there was an interaction of Time and Age on activation in two clusters in the right TPJ (R-TPJ): younger youth showed increased response to vocal emotion in these clusters between Time 1 and Time 2, but older participants showed decreased response across timepoints (Figure 5).
Table 3.

Effect of Time, Emotion and Age on activation to emotional voices

Effect       Structure                              F       k     x     y     z    Brodmann area
Time         R dorsal striatum (R-DS)               19.58   30    29    −14   1    n/a
             R inferior frontal gyrus (R-IFG)       15.68   27    56    19    14   44
             R precentral gyrus (R-PcG)             22.13   38    49    −11   46   4
Emotion      L postcentral gyrus                    10.47   151   −39   −29   54   1
             L superior and medial frontal gyrus    11.03   128   −4    11    51   6
             L middle frontal gyrus                 8.50    72    −39   16    31   8
             R superior temporal gyrus              10.80   58    59    −4    1    22
             L superior temporal gyrus              7.98    58    −51   −11   4    41
             R postcentral gyrus                    7.64    46    36    −26   49   4
             L cuneus                               9.69    42    −26   −74   16   19
             L lingual gyrus                        8.15    38    −9    −84   −6   18
             L inferior frontal gyrus               7.10    26    −49   26    11   45
Age          L medial frontal gyrus (dmPFC)         18.84   42    −6    51    39   9
Sex          R cingulate gyrus                      22.74   33    16    16    39   8
             R middle frontal gyrus                 22.10   27    41    −4    44   6
Time × Age   R supramarginal gyrus (R-TPJ)          29.67   41    56    −46   36   39
             R supramarginal gyrus (R-TPJ)          38.87   38    61    −51   21   39

Note: Clusters listed here represent areas in which there was a main effect of Time, Emotion or Age at Time 1, or an interaction of Time × Age, on activation to emotional voices (happiness, anger, fear and sadness) vs neutral voices. Sex is included in the model as a covariate: effects of Sex revealed that males showed less response to emotional voices than to neutral voices in the right cingulate gyrus and middle frontal gyrus—but that females did not (details available from the first author). An interaction of Time × Age was also found in the cerebellum. Clusters were formed using 3dClustSim at P < 0.001 (corrected, with a cluster-size threshold of 26 voxels). R = right, L = left. k = cluster size in voxels. xyz coordinates represent each cluster’s peak, in Talairach–Tournoux space.

Fig. 3.

Effect of Time on activation to emotional voices.

Fig. 4.

Effect of Age (at Time 1) on activation to emotional voices.

Fig. 5.

Interaction of Time and Age (at Time 1) on activation to emotional voices.

Lastly, activation in several frontal and temporal regions—including, notably, the bilateral superior temporal gyri (presumed to represent the temporal voice areas, based on the location of these clusters of interest)—also varied by Emotion type (Table 3). On average, happy voices were found to elicit greater activation than emotionally neutral voices across frontal and temporal regions, whereas sad/fearful/angry voices elicited less activation than neutral voices (see the Appendix for details). Effects of Emotion type did not vary by participant age or across timepoints.

Relationship between change in neural activation to emotional voices and ER accuracy

To probe the functional consequences of changes in neural activation to emotional voices across time, we examined the association between accuracy in the ER task (Pr) and neural response in the R-DS, R-IFG, R-PcG, dmPFC and R-TPJ (i.e. clusters for which activation varied as a function of time and/or age). Results of this analysis are provided in Table 4. In all models, accuracy at Time 1 positively predicted accuracy at Time 2. In addition, changes in both the R-DS and the R-TPJ predicted accuracy at Time 2. For both clusters, a decrease in activation in these regions from Time 1 to Time 2 was associated with greater accuracy at Time 2.
Table 4.

Relationship between change in neural activation and Pr at Time 2

Cluster of interest   Effect                         df      F       P         η2
R-DS                  Change in neural activation    1, 27   4.618   0.041*    0.146
                      Time 1 Pr                      1, 27   9.829   0.004**   0.267
R-IFG                 Change in neural activation    1, 27   0.591   0.449     0.021
                      Time 1 Pr                      1, 27   7.247   0.012*    0.212
R-PcG                 Change in neural activation    1, 27   0.265   0.611     0.010
                      Time 1 Pr                      1, 27   8.673   0.007**   0.243
dmPFC                 Change in neural activation    1, 27   0.218   0.644     0.008
                      Time 1 Pr                      1, 27   7.721   0.010**   0.222
R-TPJ                 Change in neural activation    1, 27   5.235   0.030*    0.162
                      Time 1 Pr                      1, 27   5.805   0.023*    0.177

Note: Pr = sensitivity index (Corwin, 1994) of accuracy in the vocal ER task. R = right. DS = dorsal striatum, IFG = inferior frontal gyrus, PcG = precentral gyrus, dmPFC = dorsomedial prefrontal gyrus, TPJ = temporo-parietal junction. Models (described in the text) are testing the association between change in neural activation within the cluster of interest and Pr at Time 2, controlling for Pr at Time 1. df = degrees of freedom, η2 = partial eta squared.

*P < 0.05. **P < 0.01.


Discussion

The current study is a longitudinal examination of children and adolescents’ neural and behavioural responses to vocal emotional information. We investigated effects of time (within-subject change between visit 1 and visit 2) and age (between-subject differences in activation by chronological age at scan) on 8- to 19-year-olds’ neural processing of, and capacity to identify the intended emotion in, vocal expressions produced by other teenagers at two timepoints, 1 year apart. Reduced activation to vocal emotion across timepoints was noted in the R-IFG —a region involved in mentalizing tasks and the identification of emotional meaning in paralinguistic information—and the R-DS. Moreover, there was an effect of age on dmPFC activation, whereby older youth engaged this region more in response to emotional voices than younger participants did. Lastly, younger participants showed increased right TPJ activation across time, but older participants showed the opposite pattern (interaction of time × age). Decreased striatal and TPJ responses to vocal emotion across timepoints were also related to increased vocal ER accuracy, when controlling for baseline performance at Time 1. Although extension of this work in a larger sample and across more timepoints is needed, these findings are preliminary evidence that changes in engagement and efficiency of brain regions involved in mentalizing may support increased social cognitive capacity in interpreting vocal affect in adolescence. As in prior investigations of change in ER skills over a 1 year period (e.g. Overgaauw ; Taylor ), we did not find a significant change in task accuracy across timepoints. A marginal effect of age on performance suggested that older youth were somewhat better at the task than were younger participants, which is consistent with reports of age-related change in vocal ER skills throughout adolescence (e.g. Grosbras ; Morningstar ; Amorim ). 
It is possible that incremental growth in vocal ER ability occurs on a longer time scale than a 1-year interval and instead matures at a more protracted rate across childhood and adolescence; however, larger samples are needed to confirm this finding. In contrast, neural engagement with emotional voices did change over the 1-year period. Specifically, the right DS and IFG showed a decrease in activation across timepoints. Our findings complement prior reports of the IFG showing age-related decreases in activation during a mentalizing/facial ER task in 12- to 19-year-olds (Gunther Moor; Overgaauw). Similarly, children aged 9–14 years showed more (left) IFG activation than did adults in a mentalizing task (Wang). Beyond its involvement in mentalizing and theory-of-mind tasks (e.g. Shamay-Tsoory; Dricu and Frühholz, 2016), the IFG is also heavily involved in processing emotional intent in non-verbal stimuli (Nakamura; Sergerie; Frühholz and Grandjean, 2013; Mitchell and Phillips, 2015), multi-modal social stimuli (e.g. self vs other faces and voices; Kaplan) and paralinguistic information in the human voice (Wildgruber; Kotz). A similar response pattern was noted in the right DS. The DS has previously been shown to be relevant to processing social reward (Guyer) and to coding the value of different outcomes in learning contexts (Delgado), processes that could plausibly be involved in the current task of decoding intent in social cues. However, the striatum (amongst other subcortical structures) has also been implicated in the second stage of vocal emotion processing, in which information about salience, affect and context is integrated with basic features of the auditory signal (e.g. Schirmer and Kotz, 2006; Abrams). Increased efficiency of processing may account for this pattern of change in the IFG and DS: for instance, an increasingly focalized response in these regions could yield lower estimates of activation across these clusters over time.
This pattern of change across visits, in the context of stability across age, may be related to learning-related processes (e.g. ‘practice’ effects) that manifest in a similar way in children and adolescents. Stimuli that have been encountered before—either because they were seen in visit 1 or because youth are gaining experience with emotional expressions more broadly—may require less engagement from regions involved in later stages of auditory processing and the interpretation of social information. In conjunction with this potential specialization of function, it is also possible that lower estimates of activation in a region over time are due to spatial localization of cortical response (Johnson)—such that estimates of average activation in a cluster of interest may be ‘diluted’ if the response occurs within a narrower set of voxels in that region. Such focalization effects likely play an important role in the developmental trajectory of vocal ER skills; indeed, decreased DS activation over time predicted increased ER accuracy at Time 2 (even when controlling for baseline differences in accuracy at Time 1).

In comparison, we found that activation in the dmPFC was positively associated with age but stable across timepoints. This pattern is suggestive of maturational changes that occur across a longer timescale in childhood and adolescence. As age is a proxy for a vast array of developmental influences, increased dmPFC activation with age may reflect processes that undergo dramatic change in adolescence—including experience with peers (Nelson), social cognitive abilities that rely on the mPFC and other areas of the ‘social brain’ (Kilford) or other aspects of social information processing that are refined during the teenage years (Nelson). An increase in dmPFC response to vocal emotion is broadly consistent with cross-sectional age-related changes previously reported in this sample (Morningstar).
Extending our previous work that found age-related increases in response to vocal stimuli in frontal regions involved in linguistic processing and emotion categorization, the current analyses locate changes in response to emotional vocal information to a more rostral area of the dmPFC that is implicated in mentalizing (Frith and Frith, 2006). Prior work has suggested that dmPFC activation in a variety of social cognitive tasks is higher in children (Wang; Kobayashi) and adolescents (Blakemore; Burnett; Sebastian) than in adults. However, in a longitudinal examination of 12- to 19-year-olds’ interpretation of emotional eyes (Overgaauw), the same region of the dmPFC showed a curvilinear pattern of change across age, with the lowest activation noted in mid-adolescents (∼15–16 years old)—and stability over time (i.e. no effect of time, as in the current investigation). Explanations for these divergent findings can only be speculative. It is possible that maturational influences on dmPFC engagement in social cognitive tasks vary as a function of stimulus. Indeed, many of the studies that report decreases in dmPFC activation across age groups (e.g. Wang; Kobayashi; Burnett) utilize tasks that tap into higher-order mentalizing skills (e.g. false belief tasks or detection of irony). In contrast, this developmental pattern is less consistent in studies that assess ER or the passive viewing of affective facial displays (e.g. Pfeifer; Overgaauw). Moreover, given the greater computational requirements of decoding vocal emotion—such as the need to track temporal information across a sentence (Liebenthal) and increased task difficulty compared to facial ER (Scherer, 2003)—the developmental trajectory of dmPFC response to emotional voices may differ from that for other forms of social stimuli. This is an empirical question that future research comparing neural response to various non-verbal modalities in a longitudinal framework could help answer.
Further, change in activation within the right TPJ was dependent on both time and age. The slope of within-person change across visits varied as a function of participants’ age, such that TPJ response to emotional voices increased across time for younger participants but decreased for older youth. The TPJ is a key region for mentalizing ability (Saxe and Kanwisher, 2003; Saxe and Wexler, 2005) and the decoding of complex social cues (Blakemore, 2008; Redcay, 2008), such as affective tones of voice. The pattern of response in this region is consistent with predictions from the Interactive Specialization model of functional brain development (Johnson, 2000), which predicts that initial stages of development are characterized by broad and diffuse patterns of activation to stimuli—but that experience and learning lead to narrowing and focalization of response over time. Such a pattern may be exemplified by the TPJ response to emotional voices. Although this region may initially show increased engagement in parsing this type of socio-emotional information in early adolescence, increased efficiency of processing (or narrowing of response localization) over time may result in less engagement—and potentially stronger integration into a distributed network of regions that conjointly process the information—in older youth. Indeed, decreased activation in the right TPJ across time was associated with better ER performance. These results suggest that reduced activation across time could be representative of a specialization process, which may support increased social cognitive capacity in interpreting vocal affect in adolescence. Lastly, it is noteworthy that findings of changing response across age or timepoints were primarily found in regions typically associated with social cognition, rather than auditory processing.
Although the IFG is arguably involved in parsing both socio-emotional and linguistic information (Frühholz and Grandjean, 2013; Kotz; Mitchell and Phillips, 2015; Dricu and Frühholz, 2016; Kirby and Robinson, 2017), the TPJ and dmPFC are considered part of the default mode and mentalizing networks and contribute to the processing of social stimuli (e.g. Li; Meyer, 2019). As such, our findings may reflect maturational patterns in the broader default mode network. In contrast, although emotion-specific responses were noted in these regions, there were no developmental influences on activation in the auditory cortex or superior temporal gyrus—areas that are theorized to support early processing of vocal stimuli (e.g. Schirmer and Kotz, 2006). [Consistent with previous work on other non-facial non-verbal cues (e.g. Peelen; Ross), we found emotion-specific modulation of neural response in voice-sensitive regions of the brain but no evidence of an age × emotion interaction.] Our focus on responses to emotional information in the voice (e.g. contrasting emotional voices to a baseline of neutral voices) may have highlighted developmental trends in ‘social brain’ areas over auditory processing regions. Nonetheless, these findings suggest that the development of the capacity to decode emotion in others’ affective prosody may rely primarily on the maturation and fine-tuning of social cognitive, rather than early auditory processing, networks.

Strengths and limitations

The current study provides novel information about developmental influences on neural and behavioural responses to vocal emotion, an understudied aspect of social cognition. The coupling of fMRI and assessments of ER performance in a longitudinal design enables the investigation of shifts in both functional activation and associated behaviour (Schriber and Guyer, 2016). Our findings highlight potential neural mechanisms of specialization through which growth in social cognitive performance may be facilitated in adolescence. Continued efforts to delineate these developmental processes will be valuable for understanding the possible social impact of deviations from these patterns in youth who show differential responses to vocal affect (e.g. children with autism spectrum disorder; Abrams, 2019). However, limitations must be noted. First, the use of an intensive methodology like fMRI by necessity limited our sample size, which is modest for an investigation of age-related change across time. Although larger longitudinal data sets are now increasingly available through consortium efforts, these typically do not include tasks that probe non-facial processing. Our findings provide preliminary evidence that important developmental changes in response to non-facial non-verbal cues may be occurring during childhood and adolescence; as such, our results encourage the inclusion of tasks probing these social cognitive capacities (such as vocal ER tasks) in large-scale neuroimaging studies on socio-emotional development. Second, the inclusion of additional timepoints would permit the evaluation of non-linear patterns of change over time and allow investigation of the mechanisms behind within-person shifts in the directionality of neural response across development (e.g. in the TPJ). Third, the current study cannot disentangle effects of age from those of pubertal maturation. Variations in adrenarcheal or gonadal hormones are known to influence neural response to social stimuli (e.g. Forbes; Moore; Klapwijk). Although age and pubertal status are highly correlated, assessing the relationship between pubertal development and the neural processing of vocal emotion would add to our understanding of normative development in adolescence.

It should also be noted that our analyses use neutral prosody as a comparison point for canonical emotional prosody. There is evidence that neutral emotional expressions are not necessarily equivalent to ‘null’ stimuli and may be perceived as negative (E. Lee) rather than devoid of emotion. However, using an implicit baseline (e.g. the fixation cross or non-stimulus periods) also poses interpretative challenges; therefore, it was necessary to select one expression type as the reference level in our analyses. Given precedent for this approach (e.g. Kotz; meta-analysis by Schirmer, 2018), we selected neutral as the comparison point. Accuracy for neutral expressions was closest to the ‘grand mean’ accuracy across emotion types; moreover, response times for neutral expressions were similar to those for other expressions (see Supplemental Materials for more details). These patterns suggest that the recognition of neutral voices was of a similar ‘difficulty level’ to other emotion types and an adequate reference point in our analyses. However, to the extent that neutral may be more ‘ambiguous’ than other emotion types (and its classification more challenging), future research would benefit from contrasting neutral with other low-intensity expressions to track developmental changes in the perceptual and neural differentiation of ambiguous non-verbal cues with age (Tahmasebi; Lee). Including a non-emotional auditory control condition in the task would also help interpret age-related changes in neural response to neutral prosodic stimuli (see Supplemental Material).

Conclusions

The current study examined changes in neural and behavioural responses to vocal emotion in adolescents. Several regions of the mentalizing network (inferior frontal gyrus, dorsomedial prefrontal cortex and TPJ) showed a change in activation over time and/or age. Notably, change in the right TPJ across timepoints varied as a function of participants’ age, suggesting an ‘inverted U’-like pattern of response across development. Reduced activation in the TPJ and in the DS over time was associated with better performance on the vocal ER task. Our results suggest that specialization in the engagement of social cognitive networks supports the growth of vocal ER skills across adolescence. This social cognitive capacity is crucial to our understanding of the world around us, especially in modern contexts in which face-to-face interaction is more limited or in which facial communication is impaired (e.g. due to face coverings). Deepening knowledge about typical developmental trajectories of neural and behavioural responses to non-verbal social information will be essential to understanding deviations from these norms in neurodevelopmental disorders.
References (10 of 82 shown)

1. Johnson MH (2000). Functional brain development in infants: elements of an interactive specialization framework. Child Dev.
2. Schirmer A, Kotz SA (2006). Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn Sci.
3. Kaplan JT, Aziz-Zadeh L, Uddin LQ, Iacoboni M (2008). The self across the senses: an fMRI study of self-face and self-voice recognition. Soc Cogn Affect Neurosci.
4. Gunther Moor B, Op de Macks ZA, Güroglu B, Rombouts SARB, Van der Molen MW, Crone EA (2011). Neurodevelopmental changes of reading the mind in the eyes. Soc Cogn Affect Neurosci.
5. Sebastian CL, Fontaine NMG, Bird G, Blakemore S-J, De Brito SA, McCrory EJP, Viding E (2011). Neural processing associated with cognitive and affective Theory of Mind in adolescents and adults. Soc Cogn Affect Neurosci.
6. Crone EA, Dahl RE (2012). Understanding adolescence as a period of social-affective engagement and goal flexibility. Nat Rev Neurosci.
7. Abrams DA, Lynch CJ, Cheng KM, Phillips J, Supekar K, Ryali S, Uddin LQ, Menon V (2013). Underconnectivity between voice-selective cortex and reward circuitry in children with autism. Proc Natl Acad Sci U S A.
8. Shamay-Tsoory SG, Aharon-Peretz J, Perry D (2008). Two systems for empathy: a double dissociation between emotional and cognitive empathy in inferior frontal gyrus versus ventromedial prefrontal lesions. Brain.
9. Taylor SJ, Barker LA, Heavey L, McHale S (2015). The longitudinal development of social and executive functions in late adolescence and early adulthood. Front Behav Neurosci.
10. Yovel G, Belin P (2013). A unified coding strategy for processing faces and voices. Trends Cogn Sci.
