Mariko Uchida-Ota1, Takeshi Arimitsu2, Daisuke Tsuzuki3, Ippeita Dan4, Kazushige Ikeda2, Takao Takahashi2, Yasuyo Minagawa5. 1. Center for Advanced Research on Logic and Sensibility, Keio University, Tokyo, Japan; Center for Research in International Education, Tokyo Gakugei University, Tokyo, Japan. 2. Department of Pediatrics, Keio University School of Medicine, Tokyo, Japan. 3. Department of Language Sciences, Tokyo Metropolitan University, Tokyo, Japan. 4. Faculty of Science and Engineering, Chuo University, Tokyo, Japan. 5. Center for Advanced Research on Logic and Sensibility, Keio University, Tokyo, Japan; Department of Psychology, Faculty of Letters, Keio University, Kanagawa, Japan. Electronic address: myasuyo@bea.hi-ho.ne.jp.
Abstract
Language development and the capacity for communication in infants are predominantly supported by their mothers, beginning when infants are still in utero. Although a mother's speech should thus have a significant impact on her neonate's brain, neurocognitive evidence for this hypothesis remains elusive. The present study examined 37 neonates using near-infrared spectroscopy and observed the interactions between multiple cortical regions while neonates heard speech spoken by their mothers or by strangers. We analyzed the functional connectivity between regions whose response-activation patterns differed between the two types of speakers. We found that when hearing their mothers' speech, functional connectivity was enhanced in both the neonatal left and right frontotemporal networks. On the left it was enhanced between the inferior/middle frontal gyrus and the temporal cortex, while on the right it was enhanced between the frontal pole and temporal cortex. In particular, the frontal pole was more strongly connected to the left supramarginal area when hearing speech from mothers. These enhanced frontotemporal networks connect areas that are associated with language (left) and voice processing (right) at later stages of development. We suggest that these roles are initially fostered by maternal speech.
Language acquisition in human infants shows remarkable development in the first year of life. Evidence from developmental psychology indicates that early language and communicative development is chiefly supported by primary caretakers, who in many cases are the mothers. Examples include phoneme perception in infants adjusted to maternal articulation (Cristià, 2011), advanced prosodic perception of bilingual newborns whose mothers spoke two languages during pregnancy (Abboub et al., 2016), advantages in facial-emotional recognition (Montague and Walker-Andrews, 2002), and word learning (Barker and Newman, 2004). One-month-old infants (Mehler et al., 1978) and neonates (DeCasper and Fifer, 1980) can discriminate their mothers’ voices from an unfamiliar female voice. Additionally, the fetus is constantly exposed to its mother’s speech; vocal sounds and vibrations are conducted through the intrauterine environment to stimulate developing auditory neural pathways (May et al., 2011), enabling the fetus to specifically respond to its mother’s speech. In fact, in preterm newborns at a gestational age of 25–32 weeks, the bilateral auditory cortex in the temporal lobes becomes thicker after exposure to the mother’s speech sounds and heartbeat than after exposure to environmental noise during the first month after birth (Webb et al., 2015). This suggests that the auditory cortex is more adaptive to maternal sounds than to environmental sounds. Therefore, recognition of the mother’s speech is facilitated, and it becomes established as the most familiar source of vocal stimulation for the neonate.
Several brain regions apart from the auditory cortex have been reported as neuronal substrates underlying the mother’s special role in infant auditory recognition and language development. 
Compared with an unfamiliar woman’s voice, maternal speech elicits specific event-related potentials (ERPs) in the parietal and frontal areas, as well as the bilateral temporal areas, in neonates and infants (deRegnier et al., 2000; Siddappa et al., 2004; Therien et al., 2004; Purhonen et al., 2005; Beauchemin et al., 2010). A functional magnetic resonance imaging (fMRI) study in 2-month-old infants also reported significant responses to maternal speech in the medial prefrontal cortex (mPFC), orbitofrontal cortex (OFC), amygdala, and left temporal region (Dehaene-Lambertz et al., 2010). Significantly greater activation in mPFC was also shown in 7–9-month-old infants when they heard their mothers produce infant-directed speech (Naoi et al., 2012) and in 6-month-old infants who heard their own names spoken by their mothers (Imafuku et al., 2014). The N400 component, which reflects semantic priming, is observed in the parietal area of 9-month-olds exclusively when word stimuli are spoken by a maternal voice (Parise and Csibra, 2012). Abrams et al. (2016) reported that when 10-year-old children heard their mothers’ speech, the strength of functional connectivity between the temporal region, which acts as a voice-processing circuit (Beauchemin et al., 2010; Grossmann et al., 2010), and the OFC and nucleus accumbens, which act as a reward circuit (Haber and Knutson, 2010), was correlated with the children’s scores of social communication skill. Thus, maternal speech plays indispensable roles in facilitating language acquisition and social communication skills in infants, and this facilitation is based on activity in several brain regions, including the temporal, frontal, and parietal cortices, which are assumed to interact with each other.
Despite these studies, little is actually known about whether or how these regions interact when neonates hear their mothers’ speech. 
From infancy, the left and right temporal cortices play different roles; the left temporal region strongly responds to phonologically different sounds (e.g., Peña et al., 2003; Sato et al., 2012; Arimitsu et al., 2011) and the right temporal region strongly responds to prosodic aspects of speech (e.g., Homae et al., 2006; Grossmann et al., 2010; Arimitsu et al., 2011; for review, see Minagawa-Kawai et al., 2011a). Because neonates are exposed to their mothers’ speech beginning in utero, both temporal regions might process acoustic features of a mother’s speech differently from those produced by another person. Temporal region activation and connectivity with the frontal and parietal regions should differ depending on the familiarity of speech because these brain regions integrate different perceptual information and contribute to higher-level speech processing. Perani et al. (2011) reported that structural connectivity between the temporal and prefrontal regions (a known language-related neural substrate) was detected in neonates by tracking fibers using diffusion tensor imaging (DTI). The authors also reported that the functional connectivity between these regions while hearing normal speech is not fully mature in neonates. However, the vocal stimulus they used in their study was not the maternal voice.
Consequently, the first aim of the present study was to use functional near-infrared spectroscopy (fNIRS) to examine with high spatial accuracy the cortical regions in neonates that respond to maternal speech. The second aim was to characterize any changes in functional connectivity that might be induced by the maternal voice. We compared cortical responses to maternal speech with those in response to speech from an unknown female. We hypothesized that maternal speech would activate a stronger cortical network between the temporal and frontal regions of the neonatal brain. 
Our reasoning for this hypothesis is that the maternal speech is continually presented to the fetus in utero and the familiar phonetic and prosodic features of this speech might more readily trigger higher-level speech processing in language areas of the brain.
Methods
Participants
The participants were 37 term neonates (20 females and 17 males) with normal hearing, which was assessed using auditory brainstem responses or other clinical tests. Their mean age was 4.5 ± 1.4 days (range: 2–7; 20 participants were 4 days old) and their mean gestational age was 39.3 ± 1.2 weeks (range: 37–41). Mean birth weight was 3097 ± 267 g (range: 2628–3676 g). All of the mothers were monolingual native Japanese speakers. Written informed consent was obtained from parents before participation. The study was approved by the ethics committee of Keio University Hospital (No. 20090189).
Stimuli and procedures
The experiments were performed in a testing room at Keio University Hospital. The experiment had two conditions based on differing levels of stimulus familiarity: the auditory stimuli spoken by the neonate’s mother were familiar to the neonate, while those spoken by another participant’s mother (a stranger) were unfamiliar. Stimuli were sampled at 44 kHz (16 bit) by a digital voice recorder and used as natural speech stimuli without any low-pass filtering. All speech stimuli comprised 18 short sentences from a Japanese original script that had rich intonations characteristic of infant-directed speech (Cooper and Aslin, 1990). When these stimuli were recorded, the speakers were asked to speak clearly with a high overall pitch, wide pitch excursions, slow tempo, and exaggerated emphatic stress. Different stranger-voice stimuli were used across participants to ensure that any observed effects were related to their unfamiliarity and not to any specific acoustic characteristics. For example, the mother’s voice for baby-A was used as the stranger’s voice for baby-B and the mother’s voice for baby-B was used as the stranger’s voice for baby-C. Acoustic parameters for each stimulus (utterance duration, intensity, and fundamental frequency) did not differ significantly across speech conditions (see Table S1). Stimuli were presented to neonates via two speakers positioned 45 cm from their heads. Stimuli were presented in a block design such that each trial comprised a silent period (10 s) followed by a stimulation period (15 s). Thus, each trial lasted 25 s. For each stimulation period, 4 or 5 sentences (4.5 sentences on average) were presented with a pause between the sentences. 
The average duration of a single sentence was 2285 ms (SD: 753 ms).
During an experimental session, we randomly presented trials from the two speech conditions (mother and stranger) and terminated the experiment when at least five trials of each condition succeeded without gross movement of the neonate’s head or body (see section 2.4 for details on judging body movement artefacts). We recorded changes in regional cerebral hemoglobin (Hb) concentration using a NIRS system (ETG-4000, Hitachi Medical Corporation, Tokyo, Japan) while each neonate lay in a supine position and was exposed to the speech stimuli. Light beams of 695- and 830-nm wavelengths were emitted from each probe with a maximum intensity of 1.5 mW. The transmitted light was sampled at 10 Hz by the detecting probes. For the participants in the latter half of this study (n = 18), we were able to record NIRS simultaneously with electroencephalography (EEG), electrooculography (EOG), electrocardiography (ECG), and respiratory chest movements using a digital polygraph system (Polymate AP1132; TEAC, Tokyo, Japan), because a modified version of the ethics permission covering co-registration measurement had been obtained. EEGs were recorded from the Fz and Pz points of the international 10/20 sensor placement system, and EEG and EOG measurements were used to score sleep states. Sleep states were determined according to the criteria put forth by Anders, Emde, & Parmelle (1971) and Scholle and Schäfer (1999). The ECG and respiratory chest-movement measurements were used for a different purpose in a separate study (Uchida et al., 2017).
Hb signals were measured at 46 positions on the frontal and temporal regions of the scalp (Fig. 1; see section 2.3 for details regarding the method for mapping these channels). The emitting and detecting probes were separated by 2 cm and arranged in a 3 × 3 or 3 × 5 square lattice. 
The measurement positions were defined as the midpoint between the emitting and detecting probes. The 3 × 3 holders for the left and right temporal regions were placed so that the midpoint between the positions for measurement channels (Chs) 11 and 12 (or between Chs 23 and 24) corresponded to the T3 (or T4) position in the 10/20 system. The lowest probe row was nearly aligned with the horizontal reference curve (F7-T3-T5 or F8-T4-T6). The 3 × 5 holder for the frontal region was set so that the midpoint between Chs 26 and 27 was placed at Fpz, and the lowest probe row was aligned with the horizontal reference curve (F7-Fp1-Fpz-Fp2-F8). The middle column was aligned along the sagittal reference curve.
Fig. 1
Channel locations for Hb signals obtained using NIRS on a size-modified infant brain.
Estimation of macroanatomical locations
To determine the underlying cortical structures that corresponded to the measurement Chs on the scalp, we used a modified version of the virtual registration method (Tsuzuki et al., 2007; Okamoto and Dan, 2005). This uses MRI template data from a single 12-month-old infant with macroanatomical segmentation and detailed landmarking of scalp structures (Matsui et al., 2014). Specifically, we linearly reduced the size of the infant template based on the head circumference (Fpz-T3-POz-T4-Fpz) of a 12-month-old infant (44.2 cm) and a neonate template that we generated as an average from all participants in this study (34.4 cm). Subsequently, we arranged virtual holders that were the same size as the real probe holders (2 cm inter-optode distances) and allocated them along the references of the 10/20 system on the head surface of this minified infant template, which reproduced the real holder allocation. The given Chs on the head surface were then projected onto the cortical surface of the infant template as shown in Fig. 1. Finally, macro-anatomy of the lateral cortical surface was estimated primarily using the infant template with subsidiary reference to automatic anatomical labeling (AAL, Tzourio-Mazoyer et al., 2002; Matsui et al., 2014). Regions of interest (ROIs) were determined based on the macroanatomical estimation.
These methods were validated by Tsuzuki et al. (2017), who quantified individual and developmental variations in cortical structure among infants ranging in age from birth to 2 years. Specifically, they examined individual variability in the distribution of each macroanatomical landmark position that was projected on the lateral cortical template of a 12-month-old infant (Matsui et al., 2014), which was identical to the template used in the present study. They found that individual variability was smaller than the pitch of the 10/10 system landmarks. 
They concluded that the 10/10 system (and the 10/20 system) can serve as a robust predictor of macroanatomy estimated from the scalp of infants ranging in age from birth to 2 years. Therefore, linearly reducing the size of the 12-month-old brain in applying the virtual registration method is appropriate for neonate macroanatomical estimation.
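The linear size reduction described above can be illustrated with a short sketch. It assumes that the reduction is an isotropic scaling about a fixed origin, with the scale factor given by the ratio of the two head circumferences quoted in the text; the function name `shrink_template` and the choice of origin are hypothetical and are not taken from the original method.

```python
import numpy as np

# Head circumferences (Fpz-T3-POz-T4-Fpz) quoted in the text, in cm.
CIRC_12_MONTH_CM = 44.2  # 12-month-old infant template
CIRC_NEONATE_CM = 34.4   # average over the neonates in this study


def shrink_template(points_mm, origin_mm=(0.0, 0.0, 0.0)):
    """Scale template coordinates isotropically about an origin.

    Assumption: a uniform scale factor equal to the circumference ratio;
    the actual registration pipeline may differ in detail.
    """
    scale = CIRC_NEONATE_CM / CIRC_12_MONTH_CM
    points = np.asarray(points_mm, dtype=float)
    origin = np.asarray(origin_mm, dtype=float)
    return origin + scale * (points - origin), scale


# Example with two made-up scalp points (mm) on the 12-month-old template.
scaled, scale = shrink_template([[80.0, 0.0, 0.0], [0.0, 60.0, 40.0]])
```

Under this assumption the scale factor is about 0.78, so every distance on the head surface of the template shrinks by roughly 22% while the probe holders keep their real 2 cm spacing.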
Signal preprocessing
Signal preprocessing, the subsequent averaging analysis, and the phase-locking analysis were performed in MATLAB (MathWorks Inc., Natick, MA). In particular, signal preprocessing was performed using the platform for optical topography analysis tools (POTATo, version 3.7.2 beta; Hitachi, Ltd, Tokyo, Japan) running on MATLAB. Data from the NIRS system were transformed into changes in oxygenated (oxy-) and deoxygenated (deoxy-) Hb molar concentration (unit: mM·mm). In this transformation, based on the modified Beer–Lambert law (e.g. Maki et al., 1995), the optical path length (L) and absorption coefficients against oxy- and deoxy-Hb (εoxy, εdeoxy) were assumed to be constant. Specifically, the product of L and the differential path length factor was set to 1 because measured L was not available. εoxy and εdeoxy for the 695 nm wavelength were respectively set to 0.415 and 1.990 mM−1 cm−1, and those for the 830 nm wavelength were respectively set to 1.013 and 0.778 mM−1 cm−1. When the contact between the probes and the scalp was insufficient, the oxy-Hb signals of the neighboring measurement channels constantly varied between low and extremely high values. Therefore, we investigated the distribution of the variation (standard deviation: SD) in the oxy-Hb signal within the first 10–25 s of measurement among all channels, and measurement channels were excluded from the following analyses when the SD was above the 95th percentile (> 0.2 mM·mm) or below the 5th percentile (< 0.001 mM·mm) of the distribution. The time-continuous Hb signals were band-pass filtered between 0.04 and 0.20 Hz using a zero-phase digital filter. We set the lower limit of the band to 0.04 Hz, because a narrower band was preferable for our subsequent phase-locking analysis. We set the higher limit of the band to 0.20 Hz to detect fast hemodynamic responses with a 2–3 s peak latency. We expected that the hemodynamic response curve would return to baseline within a few seconds after stimulus offset. 
However, it frequently returned to baseline only a few seconds before the following stimulus onset. Therefore, we used 0.1 s before stimulus onset as the baseline point for evaluating the relative change in Hb signals in response to the stimulus. We segmented the Hb signals into 25 s blocks, which included a pre-stimulation silent time-point 0.1 s before the following stimulus onset, a 15-s stimulus, and a 9.9-s post-stimulation silent period. We visually confirmed any unusually large oxy-Hb signal amplitudes (> 0.3 mM·mm) that occurred when the infants moved their heads slightly during the measurement. We observed an absolute maximum oxy-Hb peak above 0.3 mM·mm in 15.1% of all experimental blocks. These blocks were deemed error blocks, contaminated with body movement artefacts, and discarded. Moreover, data from participants for whom we could not obtain at least four blocks in more than two thirds of the channels were discarded for each condition. In the resulting data shown in Table S2, the mean number of available blocks per Ch across participants did not differ between stimulus conditions (mother’s voice: 6.43 ± 1.75; stranger’s voice: 5.97 ± 1.37; t58 = 1.17, p = 0.25), nor did the mean number of available channels per block (mother’s voice: 42.06 ± 3.21; stranger’s voice: 42.27 ± 2.75; t60 = −0.28, p = 0.78). We analyzed data from at least 25 participants for each condition and for each Ch.
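As an illustration of two of the preprocessing steps above, the following sketch inverts the modified Beer–Lambert law with the extinction coefficients quoted in the text and then cuts the signal into baseline-corrected 25-s blocks, discarding blocks that exceed the 0.3 mM·mm criterion. It is not the POTATo implementation: the function names are hypothetical, the input is assumed to be absorbance changes at the two wavelengths with the path-length product set to 1, and the threshold is assumed to apply to the baseline-corrected oxy-Hb signal.

```python
import numpy as np

# Extinction coefficients from the text (mM^-1 cm^-1).
# Rows: 695 nm, 830 nm. Columns: oxy-Hb, deoxy-Hb.
EXTINCTION = np.array([[0.415, 1.990],
                       [1.013, 0.778]])


def absorbance_to_hb(delta_a):
    """Solve delta_a = EXTINCTION @ [oxy, deoxy] for every sample.

    delta_a: array of shape (n_samples, 2), absorbance changes at
    695 and 830 nm. The path-length product is assumed to be 1.
    """
    delta_a = np.asarray(delta_a, dtype=float)
    return np.linalg.solve(EXTINCTION, delta_a.T).T


def epoch_and_reject(oxy, onsets, fs=10.0, threshold=0.3):
    """Cut 25-s blocks that start 0.1 s before each stimulus onset.

    Each block is referenced to its first sample (the pre-stimulus
    baseline point) and dropped if |oxy| exceeds the threshold.
    onsets are sample indices of stimulus onset (hypothetical input).
    """
    n_block = int(round(25.0 * fs))
    pre = int(round(0.1 * fs))
    kept = []
    for onset in onsets:
        start = onset - pre
        if start < 0 or start + n_block > len(oxy):
            continue  # block not fully inside the recording
        block = oxy[start:start + n_block] - oxy[start]
        if np.max(np.abs(block)) <= threshold:
            kept.append(block)
    return np.array(kept)
```

At the 10 Hz sampling rate each block holds 250 samples: one baseline sample, 150 stimulus samples, and 99 post-stimulus samples, which matches the segmentation described above.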
Averaging analysis
First, we averaged the block data for oxy- and deoxy-Hb signals during stimulus exposure for each participant, channel, and stimulus condition. We considered both oxy- and deoxy-Hb as variables for the following statistical tests, because hemodynamic physiology is complex and it was unclear which measure best represents the neural correlates of particular cognitive function, particularly in young infants, as described in 2.6. Then, we performed two-tailed Wilcoxon’s rank sum tests (α = 0.05) within each stimulus condition of mother’s speech and strangers’ speech to identify the channels in which the mean change in Hb 3.0–14.9 s after stimulus onset was significantly different from those during the 0.1-s pre-stimulation period across participants. Moreover, averaged data across participants underwent two-tailed Wilcoxon’s rank sum testing (α = 0.05) to identify channels in which changes in the Hb signal during the ‘mother’ condition differed significantly from those during the ‘stranger’ condition. In addition, to investigate the hemispheric lateralization of regions with stronger responsiveness to mother’s voice than stranger’s voice, a two-way rank-based robust analysis of variance (ANOVA) test (Hettmansperger and McKean, 2011; Hocking, 1985) was conducted to determine the main effects and interactions between the hemispheric factor (left channels versus right channels) and the voice stimulus factor (mother’s voice versus stranger’s voice). To take multiple comparisons among all channels into account, we used false discovery rate (FDR) correction (q = 0.05) (Benjamini and Hochberg, 1995). The effect size was calculated using the following equation: $r = Z/\sqrt{N}$, where Z and N represent the z-score and sample size, respectively (Field, 2005).
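The multiple-comparison step and the effect size can be sketched as follows. The code implements the standard Benjamini–Hochberg step-up procedure and assumes the common convention that the effect size is the z-score divided by the square root of the sample size; the function names are hypothetical and the p-values in the example are made up for illustration.

```python
import numpy as np


def effect_size_r(z, n):
    """Effect size r = Z / sqrt(N) (assumed convention)."""
    return z / np.sqrt(n)


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= q * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject


# Made-up p-values for four channels.
mask = benjamini_hochberg([0.039, 0.001, 0.5, 0.008], q=0.05)
```

In this toy example only the channels with p = 0.001 and p = 0.008 survive correction, because 0.039 exceeds its rank-specific threshold of 0.0375.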
Phase-locking analysis
We chose channels in which changes in the Hb signal were significantly different between voice conditions as seed ROIs and examined the functional connectivity between each seed and all other channels. We used a phase-based method to investigate the functional connectivity (Lachaux et al., 1999; Tass et al., 1998) and focused on phase-locking (phase synchronization) between the two Hb signals. The reasons for using this phase-based method instead of more general amplitude-based methods such as the general linear model (GLM, e.g., Perani et al., 2011) or dynamic causal modeling (Tak et al., 2015) relate to Hb data characteristics that are unique to neonates. The amplitude-based method requires a hemodynamic response function (HRF) model that is based on physiological neural mechanisms. However, it is difficult to define a good HRF model in infants because the hemodynamic response, which involves the vasculature, is variable and neurovascular coupling is immature (Gervain et al., 2011; Arimitsu et al., 2018; Gemignani et al., 2018). In the present study, we could not define an appropriate HRF model for neonatal Hb signals due to variability among different cortical areas and among stimulus conditions (see the Results section). Therefore, we selected a phase-based method requiring no prior knowledge of the shape of the expected hemodynamic response.
The phase-based method has several steps. First, the instantaneous phases, φX(t,i) and φY(t,i), were extracted from the Hilbert transformation of the Hb signals for channel X and channel Y, respectively, for time t in the i-th block. X corresponds to each channel of the seed ROIs, and Y corresponds to each channel other than X. To calculate the phase difference between channels X and Y, we used the equation: θ(t,i) = φX(t,i) - φY(t,i) (see Fig. 2A and B). We took θ modulo 2π to detect the preferred values of θ, irrespective of noise-induced phase slips (Tass et al., 1998). 
Next, for each Ch-pair of X and Y, we used a statistical test based on surrogate data to judge whether θ remained stable (phase-locking) during 3.0–14.9 s after stimulus onset. A large amount of θ data for every Ch-pair was needed in this test to obtain the appropriate distribution of θ. However, the number of samples of θ from the stimulus periods of each participant was not sufficient, because the sampling rate of the Hb signal was low (10 Hz) and the stimulus period was short (15 s). Therefore, we collected θ data from all participants for each stimulus condition. Fig. 2C shows an example of the distribution of the actual samples of θ between X = Ch 40 and Y = Ch 5 across all participants in the mother’s speech condition. The surrogate data of the same Ch-pair of X and Y were produced by applying the iterated amplitude-adjusted Fourier transform method (Schreiber and Schmitz, 1996) to the Hb signal of channel Y (Fig. 2D). This method enabled us to randomly change θ between X and Y (Fig. 2E). The surrogate distribution of θ was also obtained by collecting data from all participants (Fig. 2F). We calculated ρ as an index based on Shannon entropy (Tass et al., 1998) to test the null hypothesis that samples of θ for the Ch-pair were drawn from a uniform distribution (i.e., the distribution of non-phase-locking data). ρ is defined as: $\rho = (S_{\max} - S)/S_{\max}$ with $S_{\max} = \ln M$, where S is the Shannon entropy and M is the number of bins in the distribution of θ. The optimal number of M was given by $M = \exp[0.626 + 0.4\ln(M_s - 1)]$, where Ms denotes the number of samples (Otnes and Enochson, 1972), and M was set to 112. The Shannon entropy is $S = -\sum_{m=1}^{M} p(m)\ln p(m)$, where p(m) is the probability of the m-th bin. ρ = 0 corresponds to a uniform distribution (no phase-locking) and ρ = 1 corresponds to perfect phase-locking across participants. We selected Ch-pairs with higher ρ than the significance level, which corresponded to 99% of the surrogate distribution of ρ given by 200 surrogate data sets. These selected Ch-pairs were interpreted as phase-locking pairs.
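The first steps of the phase-based method can be sketched in a few lines. The code below extracts instantaneous phases with an FFT-based analytic signal (a standard way to compute the Hilbert transform), takes the phase difference modulo 2π, and computes an entropy-based index in the form given by Tass et al. (1998) as we understand it, with ρ equal to the maximum entropy minus the observed entropy, divided by the maximum entropy. The function names are hypothetical, the bin-count rule is our reading of the cited formula, and the surrogate test with iterated amplitude-adjusted Fourier transform data is deliberately omitted.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal; its angle is the instantaneous phase."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def phase_difference(x, y):
    """theta = phi_X - phi_Y, wrapped into [0, 2*pi)."""
    theta = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return np.mod(theta, 2.0 * np.pi)


def optimal_bins(n_samples):
    """Bin count M = exp[0.626 + 0.4 ln(Ms - 1)] (assumed form)."""
    return int(np.exp(0.626 + 0.4 * np.log(n_samples - 1)))


def entropy_index(theta, n_bins):
    """rho = (S_max - S) / S_max with S_max = ln(n_bins)."""
    counts, _ = np.histogram(theta, bins=n_bins, range=(0.0, 2.0 * np.pi))
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    s = -np.sum(p * np.log(p))
    s_max = np.log(n_bins)
    return (s_max - s) / s_max


# Synthetic example: two 0.1 Hz sinusoids sampled at 10 Hz with a fixed lag.
t = np.arange(1000) / 10.0
x = np.sin(2 * np.pi * 0.1 * t)
y = np.sin(2 * np.pi * 0.1 * t - np.pi / 4)
rho_locked = entropy_index(phase_difference(x, y), n_bins=20)

# Uniformly distributed phase differences give an index near zero.
rng = np.random.default_rng(0)
rho_uniform = entropy_index(rng.uniform(0.0, 2.0 * np.pi, 20000), n_bins=20)
```

With these synthetic inputs the fixed-lag pair yields an index close to 1 and the uniform sample an index close to 0, which are the two limiting cases described in the text. In a real analysis the observed ρ would be compared against the distribution of ρ from surrogate data rather than against these idealized limits.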
Fig. 2
Significant synchronization between two channels. (A) Examples of Hb-signal waves for channel X (Ch. 40; solid line) and channel Y (Ch. 5; dashed line) in the i-th block of a participant. (B) Phase difference θ(t,i) between X and Y. (C) The actual distribution of θ for all blocks in the ‘mother’ condition for all participants. (D) An example of surrogate data produced by the phase-randomized Hb-signal wave of Y. (E) Phase difference θ(t,i) between X and surrogate Y. (F) The surrogate distribution of θ for all participants is similar to a uniform distribution.
Next, we sought Ch-pairs where the phase-locking level varied based on the influence of the voice stimulus. First, we calculated the phase-locking value (PLV; Lachaux et al., 1999) of the Ch-pairs. The PLV between channels X and Y at time t is given as the length of the mean vector of θ across N blocks: $\mathrm{PLV}(t) = \frac{1}{N}\left|\sum_{i=1}^{N} e^{j\theta(t,i)}\right|$, where j denotes the imaginary unit that is used to represent θ as a vector on the unit circle that is defined in the complex plane. PLV reflects the inter-block variability of θ at t; it is close to 1 if the phase difference varies little (phase-locking) across blocks, and is close to 0 if there is no phase-locking. We calculated PLV using at least four good blocks of data (N ≥ 4) for each participant. For each Ch-pair, we obtained good PLV data from at least 25 neonates. We performed two-tailed Wilcoxon’s rank sum tests (α = 0.05) on individual data to reveal Ch-pairs where PLV between about 3.0 and 14.9 s differed significantly from that for the 0.1-s silent pre-stimulation period. Moreover, individual PLV data underwent two-tailed Wilcoxon’s rank sum tests (α = 0.05) to identify Ch-pairs in which changes in PLV were significantly different between stimulus conditions. We have reported preliminary results for the same participant dataset using a different method of analysis (cross correlation), but at that time we failed to reveal a clear difference between the conditions (Uchida et al., 2015).
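The phase-locking value can be computed directly from the phase differences. The sketch below follows the definition of Lachaux et al. (1999), in which PLV at each time point is the length of the mean unit vector exp(jθ) across blocks; the function name and the array layout (blocks by time points) are assumptions made for illustration.

```python
import numpy as np


def phase_locking_value(theta):
    """PLV(t) = |mean over blocks of exp(j * theta(t, i))|.

    theta: array of shape (n_blocks, n_times), phase differences in
    radians between two channels. Returns an array of length n_times.
    """
    theta = np.asarray(theta, dtype=float)
    return np.abs(np.mean(np.exp(1j * theta), axis=0))


# Four blocks with the same phase difference at every time point: PLV = 1.
locked = np.full((4, 5), 0.8)
plv_locked = phase_locking_value(locked)

# Phase differences spread evenly around the circle across blocks: PLV = 0.
spread = np.tile(np.array([[0.0], [np.pi / 2], [np.pi], [3 * np.pi / 2]]), (1, 5))
plv_spread = phase_locking_value(spread)
```

The two examples mark the limiting cases: identical phase differences across blocks give a PLV of 1, and phase differences that cancel on the unit circle give a PLV of 0. With only four blocks, chance-level PLV is well above 0, which is one reason a comparison against a baseline period or surrogate data is needed.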
Results
All participants were asleep during measurements. Their sleep state was judged as ‘active sleep’ because we observed frequent motor activity of limbs and rapid eye movements. Furthermore, EEG and EOG recordings collected from 18 participants showed EEG patterns that were mainly composed of low-voltage irregular and mixed patterns (Anders et al., 1971; Scholle and Schäfer, 1999) and rapid eye movements.
Oxy- and deoxy-Hb values changed significantly across broad areas of the frontal cortex, both sides of temporal cortex, and parts of the motor and somatosensory cortices during the ‘mother’ condition (p < 0.01, r > 0.5; Fig. 3A and Table S3). In contrast, the ‘stranger’ condition yielded significant changes in oxy-Hb only in the right temporal cortex (p < 0.005, r > 0.5; Fig. 3B and Table S3). As shown in Fig. 3C and Table S3, several channels differed significantly between the two stimulus conditions. Compared with the ‘stranger’ condition, the mother’s speech produced greater changes in oxy-Hb in left and central frontal pole (FP; Chs 29 and 40) and right middle temporal gyrus (MTG; Ch 23) (p < 0.001, r > 0.6). Additionally, the mother’s speech also resulted in strong and significant changes in deoxy-Hb in the left inferior/middle frontal gyrus (IFG/MFG; Ch 38) and left precentral/superior temporal gyrus (PrCG/STG; Ch 6) (p < 0.005, r > 0.5). Here, we labeled Chs 38 and 6 as IFG/MFG and PrCG/STG, respectively. We note that the anatomical estimation of Ch 38 by Matsui et al. (2014) included a greater proportion of MFG than IFG, as shown in Table S3. This is because the definition of the IFG-MFG border is intricate and differs depending on the anatomical atlas, e.g., the AAL (Tzourio-Mazoyer et al., 2002), Brodmann (Lancaster et al., 2000), or Matsui et al. (2014). The latter atlas chiefly relies on AAL definitions and includes less IFG in this area than do other atlases.
Fig. 3
Cortical areas related to hearing familiar (mother) and unfamiliar (stranger) voices. (A) Channels with significant decreases in oxy-Hb (solid magenta) and increases in deoxy-Hb (dotted cyan) in response to a mother’s voice (compared with the prestimulus period). (B) Channels with significant increases in Hb signals in response to a stranger’s voice (compared with the prestimulus period). (C) Channels with significant differences between conditions. (D) Grand-averaged time courses of oxy-Hb (magenta) and deoxy-Hb (cyan) to a mother’s voice (solid line) and a stranger’s voice (dotted line) for each channel showing significant differences in panel C. Error-ranges between the two thin lines indicate the 95% confidence intervals. Exposure to the vocal stimulus was from time 0 to 15 s. FP, frontal pole; IFG/MFG, inferior/middle frontal gyrus; PrCG, precentral gyrus; STG, superior temporal gyrus; MTG, middle temporal gyrus.
We also found brain regions sensitive to the ‘stranger’ speech. Stronger changes in oxy-Hb occurred in the right superior temporal gyrus (STG; Ch 18) when hearing a stranger’s speech than when hearing a mother’s speech (p = 0.0001, r = 0.76). Grand averaged time courses of the Hb signals for these six channels are shown in Fig. 3D (for all channels see Figure S1). Averaged oxy-Hb and deoxy-Hb tended to decrease and increase, respectively, in distributed cortical areas when hearing the mother’s speech. Conversely, averaged oxy-Hb and deoxy-Hb tended to increase and decrease, respectively, when hearing a stranger’s speech. These amplitude changes in Hb signals were not linearly related to the participants’ ages in days (Table S4).
We also investigated hemispheric lateralization of each region showing significant differences between the two stimulus conditions (Chs 40, 29, 38, 6, 18, and 23; see Fig. 3C). 
Apart from Ch 40, which was on the midline and had no contralateral channel, the remaining five channels and their contralateral channels were entered into a two-way rank-based robust ANOVA with hemisphere and voice stimulus as factors for each channel (region labels of the five channels: FP, IFG/MFG, PrCG/STG, STG, and MTG). The results revealed significant main effects of the voice stimulus in all channels, and a significant interaction between the two factors was found only in the FP (Ch 29 vs 33) (F = 11.227, p < 0.002, > 0.4). Specifically, the decrease in FP oxy-Hb was significantly greater in the left hemisphere (Ch 29) than in the right hemisphere (Ch 33) during exposure to the mother’s voice.
We again focused on the six channels (Chs 40, 29, 38, 6, 18, and 23) that showed significant differences between the two voice stimulus conditions and investigated the strength of their connectivity to other channels. Using these channels as seed ROIs (Chs 29 and 40 as ROI-1 in oxy-Hb, Ch 38 as ROI-2 in deoxy-Hb, Ch 6 as ROI-3 in deoxy-Hb, Ch 18 as ROI-4 in oxy-Hb, and Ch 23 as ROI-5 in oxy-Hb; Fig. 3C), we calculated PLV between each seed and the other channels to determine functional connectivity. Fig. 4 (see also Table S5) shows the connections for which PLV was significantly higher (significant connectivity) when hearing vocal stimuli than when not hearing anything in the prestimulus period (uncorrected p < 0.05, > 0.5). Overall, mother and stranger speech elicited ipsilateral and contralateral functional connections. However, the mother’s speech elicited broader connections than the stranger’s speech. The central FP (Ch 40, ROI-1) formed a significant connection with the bilateral temporal areas during the ‘mother’ condition, but did not form any significant connections during the ‘stranger’ condition (Fig. 4A, row 1). Connections were formed between bilateral temporal areas and the left IFG/MFG (Ch 38, ROI-2) or left PrCG/STG (Ch 6, ROI-3) during the ‘mother’ condition (Fig. 4A, rows 2 and 3).
In particular, as shown in Fig. 4B, the left IFG/MFG, which covers the anterior language area, had many strong connections with temporal areas including the STG and SMG (posterior language area), while this was not the case in the ‘stranger’ condition. The right STG (Ch 18, ROI-4), which exhibited significantly stronger changes in oxy-Hb during the ‘stranger’ condition (Fig. 3D; Table S3), did not form any connection with other areas during this condition. However, it did form connections with frontal areas during the ‘mother’ condition (Fig. 4C). The right MTG (Ch 23, ROI-5), which exhibited a significant increase in oxy-Hb during the ‘stranger’ condition and a significant decrease during the ‘mother’ condition (Fig. 3D, row 6), formed broad connections with frontal areas during both stimulus conditions (Fig. 4A, row 5). However, connections extended around the lateral fissure, including the right temporal pole, only in the ‘mother’ condition. Additionally, the central FP and left SMG were more strongly connected during the ‘mother’ condition than during the ‘stranger’ condition (Fig. 5; uncorrected p = 0.013, = 0.31). No PLVs were correlated with infant age in days (Kendall’s correlation analysis, p > 0.05, see Table S5).
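As a rough illustration of what a PLV between a seed channel and another channel measures, the following minimal Python sketch extracts instantaneous phases from two signals and computes the length of the mean phase-difference vector. It is not the authors' pipeline: the function names `analytic_signal` and `plv`, the synthetic sinusoids, and the naive DFT are assumptions made for illustration, and the sketch omits band-pass filtering, block averaging, and the statistical comparison against the prestimulus period.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real series via a naive O(n^2) DFT.

    Illustrative only; a real analysis would use an FFT-based Hilbert transform.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negatives.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def plv(x, y):
    """Phase-locking value: |mean(exp(i * (phase_x - phase_y)))|, between 0 and 1."""
    zx, zy = analytic_signal(x), analytic_signal(y)
    unit_vectors = [
        cmath.exp(1j * (cmath.phase(a) - cmath.phase(b))) for a, b in zip(zx, zy)
    ]
    return abs(sum(unit_vectors) / len(unit_vectors))


# Hypothetical signals: a and b share a frequency with a fixed lag; c does not.
n = 64
a = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
b = [math.sin(2 * math.pi * 4 * t / n + 0.7) for t in range(n)]
c = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]

locked = plv(a, b)    # constant phase lag, so PLV is close to 1
unlocked = plv(a, c)  # drifting phase lag, so PLV is close to 0
print(locked, unlocked)
```

A constant phase lag yields a high PLV regardless of the size of the lag, which is why the measure captures synchronization rather than amplitude similarity.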
Fig. 4
Spatial functional connectivity maps. (A) Overhead view of significant functional connectivity for seed ROIs in the FP, left IFG/MFG, left PrCG/STG, right STG, and right MTG while hearing a mother’s voice (left column) and a stranger’s voice (right column). Location of each seed ROI is shown by a black dot and an arrow, and the significant connections (p < 0.05) are shown by colored lines (oxy-Hb, magenta; deoxy-Hb, light blue). Connectivity strength is indicated by line thickness. (B) Connectivity with the left IFG/MFG viewed from the left side. (C) Connectivity with the right STG viewed from the front and diagonal right side. SMG, supramarginal gyrus. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 5
Stronger connectivity in response to the mother’s voice than to a stranger’s voice. (A) View from the left side of the connectivity showing significant difference between the ‘mother’ and ‘stranger’ conditions. (B) PLV time courses between the central FP and left SMG while hearing the mother’s voice (thick line) or a stranger’s voice (dotted thin line). * A significant difference between vocal stimulus conditions was detected during this period.
Discussion
In this study, maternal speech elicited significant hemodynamic changes in broad areas of the neonatal brain, particularly the frontal and temporal areas. Further functional connectivity analysis revealed that the frontal area synchronized with the bilateral temporal cortices when hearing maternal speech, particularly with the left temporal cortex.
Maternal speech enhances the left-side frontotemporal network
Left-lateralized responses in the temporal region have frequently been observed in infants for familiar spoken language (e.g., normal vs. reversed speech of a native language, Dehaene-Lambertz et al., 2002; native vs. foreign language, Minagawa-Kawai et al., 2011b; dialect differences, Cristià et al., 2014) as well as in neonates (e.g., normal vs. backward speech, Peña et al., 2003; native vs. foreign languages, Sato et al., 2012; unfamiliar spoken language vs. whistling language, Molavi et al., 2014). Dehaene-Lambertz et al. (2010) reported a significant response to maternal speech not only in the left temporal region of 2-month-olds, but also in the mPFC, OFC, and amygdala. Significantly greater activation in the mPFC was also observed when 7–9-month-old infants heard their mothers’ infant-directed speech (Naoi et al., 2012) and when 6-month-old infants heard their own names spoken by their mothers (Imafuku et al., 2014). The present study is consistent with these previous findings in revealing stronger brain responses in the superior temporal, prefrontal, and precentral regions in response to maternal speech than to non-maternal speech. In particular, compared with baseline, the ‘mother’ condition evoked significant activity in many left channels, most notably in the FP, which showed a significant left dominance relative to the contralateral channel. In contrast, the ‘stranger’ condition evoked no significant activity in the left hemisphere relative to baseline. Thus, language processing appears to be more specialized or facilitated when speech is familiar.
This interpretation is supported by previous behavioral and EEG studies reporting the advantage of maternal stimuli for language acquisition in infants (Barker and Newman, 2004; Cristià, 2011; Parise and Csibra, 2012).
Functional connectivity originating in the left prefrontal area (IFG/MFG) spread to broader temporal areas (including the left STG, MTG, and SMG) in the ‘mother’ condition than in the ‘stranger’ condition (Fig. 4B). This leftward connectivity might partially correspond to either of the previously described neural language networks: the dorsal pathway connecting the inferior frontal gyrus via the arcuate fasciculus to the temporal cortex (detected by DTI in infants; Dubois et al., 2009; Leroy et al., 2011) or the ventral pathway connecting the ventral IFG via the extreme capsule to the temporal cortex (detected by DTI in newborns; Perani et al., 2011). However, other than in a resting-state fNIRS study (Homae et al., 2011) in 3-month-olds after having heard native speech spoken by a stranger, no frontotemporal functional connectivity has been reported in the literature. Thus, to our knowledge, the current results are the first report of left frontotemporal functional connectivity in neonates hearing speech. Moreover, we have demonstrated that long-range functional connectivity between the left temporal area (SMG) and the central FP is stronger in response to a mother’s speech than to a stranger’s speech (Fig. 5). However, such a long-range connection was barely detectable in previous studies of newborns (Zhang et al., 2007; Perani et al., 2011). This suggests that the neonatal FP might be indirectly connected to the left temporal region.
It seems that exposure to maternal speech in utero might have shaped this frontotemporal network. This idea is supported by the study mentioned above in which one-month exposure to maternal speech was reported to thicken the auditory cortex of infants born extremely prematurely (Webb et al., 2015).
Furthermore, the strength of hemodynamic activity and functional connectivity induced by maternal speech in these left regions did not correlate with age (days since birth), as shown in Table S4. Taken together, our results provide some evidence that neonatal left frontotemporal connectivity is enhanced by maternal speech during the fetal period.
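The age check referred to above relies on a rank correlation. As a minimal sketch of how Kendall's tau-b between age in days and a connectivity or amplitude measure could be computed, the following Python function counts concordant and discordant pairs. The function name `kendall_tau_b` and the example values are hypothetical and are not taken from Table S4 or S5; the sketch returns only the coefficient and does not compute the p value.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D are concordant and discordant pair counts; Tx and Ty count pairs
    tied only in x or only in y. Pairs tied in both are ignored.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: ages in days and one PLV per neonate (illustrative values only).
age_days = [2, 3, 3, 4, 5, 6, 7]
plv_values = [0.41, 0.55, 0.38, 0.47, 0.36, 0.52, 0.44]
tau = kendall_tau_b(age_days, plv_values)
print(tau)
```

A tau near zero, as in a non-significant result, indicates that older neonates do not systematically show higher or lower values; tau-b is used here because ages in days are likely to contain ties.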
Maternal speech enhances the right frontotemporal network
The right temporal lobe is known to respond dominantly to prosodic differences in 3-month-old infants (Homae et al., 2006) and neonates (Arimitsu et al., 2011). It also responds to changes in voice type (male ⬌ female) in preterm infants at 28–32 weeks of gestational age (Mahmoudzadeh et al., 2013). In the present study, neonates exhibited significant activity in the right temporal area in response to both stranger and mother speech. This is presumably because both types of speech have acoustic properties of the human voice that are rich in prosodic information. However, the right temporal region (STG) was functionally connected to the FP, including the mPFC and OFC, only in the ‘mother’ condition (Fig. 4C). Because the right STG is engaged in voice identification, this network specific to the maternal voice may reflect processing of familiar voices in relation to attention, emotion, or other cognitive functions governed by the frontal area (see below for details). We assume that this right frontotemporal network is organized through daily exposure to maternal speech during pregnancy. This idea is supported by the non-significant correlation between the strength of connectivity and neonate age (days).
In adults, the right temporal region, including the STG, MTG, and temporal pole (TP), is associated with the discrimination of emotional prosody (Zatorre et al., 1992; Sander et al., 2005), discrimination of voices and speaker identification (Belin and Zatorre, 2003; Kriegstein and Giraud, 2004; Nakamura et al., 2010), and recall of social information regarding personal interactions or emotional episodes (McCarthy and Warrington, 1992; Markowitsch, 1995; Olson et al., 2007). The right STG relays information to the TP, which connects to the amygdala and FP (OFC) and is thus involved in emotional processing (Kondo et al., 2003; Liu et al., 2013).
In neonates, the cingulate gyrus, which connects to the FP and is involved in emotional processing, exhibits small-world network properties in the right hemisphere (Ratnarajah et al., 2013). Although the present fNIRS study cannot directly visualize these connections deep in the brain because of its spatial limitations, the FP-right STG connectivity found exclusively in the ‘mother’ condition may be involved in this small-world network.
Neural substrates that influence maternal speech-induced infant behavior
A mother’s speech influences her infant’s behavior. Previous behavioral studies that measured non-nutritive sucking in 1-month-old infants (Mehler et al., 1978) and neonates (DeCasper and Fifer, 1980) indicated that infants increase their sucking behavior in response to the mother’s speech to a greater extent than they do in response to the speech of an unknown female. These results were interpreted to suggest that the infant prefers their mother’s speech. However, Moon et al. (2015) did not find a sucking response associated with the mother’s speech for neonates and concluded that neonates are not sufficiently motivated by their mothers’ speech to alter sucking behavior. Taking these behavioral studies into account, in addition to the voice recognition processing in the neonatal brain, we need to consider the motor and motivation processing needed to generate a behavior. We previously used indices of EEG and respiratory rate (the number of breaths taken per minute) to investigate cortical activity associated with the respiratory behavior of neonates when they hear their mother’s speech (Uchida et al., 2017). Several types of changes in respiratory rate were promoted by the mother’s speech, and the amplitude of the EEG delta rhythms in the frontal cortex (Fz) was simultaneously increased. This suggests that the respiratory response to the maternal speech may be associated with the frontal cortex, and this may play a role in the motor and motivation processing that drives respiratory behavior from the neonatal period. The central FP (Ch 40) in the present study was near Fz and functionally connected to the bilateral temporal cortex, including the STG (ROI-3 and -4 in Fig. 4A) and left SMG (Fig. 5A). Similar connectivity that was strengthened by hearing a mother’s speech was reported in children with high social communication skills (Abrams et al., 2016). 
The authors suggested that the STG is a voice-recognition processing region and that the frontal region is involved in reward and affective processing (motivation processing) of familiar sounds. While our results obtained from the frontal cortex do not necessarily indicate such motivation processing, future work examining the relationship between FP-STG functional connectivity and respiratory behavior while hearing the maternal voice could reveal such a motivation system in neonates.
Physiology of hemodynamic response to mother’s voice
The physiological mechanisms underlying the various hemodynamic response patterns observed in infants are a controversial issue in infant fNIRS and fMRI studies (e.g., Lloyd-Fox et al., 2010; Arimitsu et al., 2015; Issard and Gervain, 2018). Typically, increased oxy-Hb with a slight decrease in deoxy-Hb (i.e., a positive blood-oxygen-level dependent (BOLD) response) is observed, as represented by an HRF model (Peña et al., 2003; Arichi et al., 2010; Liao et al., 2010; Taga et al., 2011). In contrast, a negative BOLD response, characterized by decreased oxy-Hb and increased deoxy-Hb, has also been reported in infants (Yamada et al., 1997, 2000; Meek et al., 1998; Morita et al., 2000; Muramoto et al., 2002; Kusaka et al., 2004). Our results showed the typical HRF pattern for the ‘stranger’ condition and an atypical reversed HRF for the ‘mother’ condition. It has been suggested that the factors triggering this variety of hemodynamic response patterns involve the difference between awake and sleep states (Meek et al., 1998; Taga et al., 2003, 2011; Kotilahti et al., 2005) and age (in days from birth; i.e., the amount of time that neonates are exposed to the mother’s voice outside of the womb). However, none of these factors explain the present data. The neonates were in the same sleep state (active sleep) across the stimulus conditions, but showed differential HRFs depending on the vocal stimuli. We also confirmed no correlation between age (in days) and hemodynamic changes for our participants 2 to 7 days after birth (Table S4). Systemic fluctuations evoked by task-related body movements and psychophysiological changes often contaminate fNIRS signals (Yamada et al., 2012) and may cause atypical types of HRF. The effect of systemic fluctuations should be examined in future studies. However, the systemic components may have had little effect on the results of the present study, because we discarded blocks containing head and body movement (see Section 2.4).
Task-evoked changes in skin blood flow due to systemic effects can also mask fNIRS signals. However, the contribution of the deep layer including cortical tissues to fNIRS signals in infants during language tasks was 72%, greater than that of the shallow layer including skin tissue (Funane et al., 2016). Therefore, it is likely that the fNIRS signals in the present study mainly reflected cortical hemodynamics rather than systemic hemodynamics.
In general, a typical (adult-like) HRF pattern is thought to mainly reflect increased hyperemia for the delivery of oxygenated blood to the activated brain region, which outstrips the increase in oxygen consumption caused by neural activity (Attwell and Iadecola, 2002; Heeger and Ress, 2002; Sirotin et al., 2009). However, Yamamoto and Kato (2002) used a language task and reported that the reverse HRF (increased deoxy-Hb) reflected capillary hemodynamics associated with higher oxygen consumption, while the typical HRF reflected hemodynamic changes in the large veins. In addition, neonatal responses often show the reverse pattern, associated with the immaturity of hyperemia in neonatal cortical pial arteries (Kozberg et al., 2013), or with a higher oxygen demand for rapid synaptogenesis (Morita et al., 2000; Muramoto et al., 2002). Therefore, these atypical patterns can be regarded as signals reflecting brain activations. A difference in the processing load might contribute to the differential HRF patterns. Specifically, cortical oxygen consumption could increase because of the diverse network synchronization that is evoked by familiar and eventful stimuli like a mother’s speech. In the present study, the stranger’s speech evoked a positive BOLD response and limited or no network synchronization. The right STG (ROI-4), which exhibited a significantly positive BOLD response to the stranger’s speech, was not connected with any other region.
Conversely, the mother’s speech evoked a negative BOLD response and increased PLVs in more diverse regions in the frontal, left, and right temporal areas when compared with the stranger’s speech. ROI-4, which showed a non-significant weak negative BOLD response to the mother’s speech, connected with a broad area including the frontal cortex (Fig. 4C). Such broad synchronization while hearing the mother’s speech as a familiar stimulus could promote increased oxygen consumption and lead to a negative BOLD signal. Conversely, oxygen consumption could be overwhelmed by hyperemia at the capillary or precapillary arteriole level, without recruiting pial arteries. This would result in positive BOLD responses when hearing a stranger’s speech. The mother’s voice likely elicits a unique HRF pattern as it induces a strong processing load for familiar stimuli.
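To make the distinction between the typical and reversed response patterns concrete, the following minimal Python sketch labels a block-averaged time course by the sign of the mean oxy-Hb and deoxy-Hb changes relative to the prestimulus baseline. The function name `response_pattern`, the sample values, and the sign-only rule are assumptions made for illustration; the study itself tested these changes statistically rather than by sign alone.

```python
from statistics import mean


def response_pattern(oxy, deoxy, n_baseline):
    """Label a response by the sign of mean Hb change from the prestimulus baseline.

    The first n_baseline samples are treated as the prestimulus period and the
    remaining samples as the stimulation period.
    """
    d_oxy = mean(oxy[n_baseline:]) - mean(oxy[:n_baseline])
    d_deoxy = mean(deoxy[n_baseline:]) - mean(deoxy[:n_baseline])
    if d_oxy > 0 and d_deoxy < 0:
        return "typical"   # oxy-Hb up, deoxy-Hb down (positive BOLD-like)
    if d_oxy < 0 and d_deoxy > 0:
        return "inverted"  # oxy-Hb down, deoxy-Hb up (negative BOLD-like)
    return "mixed"


# Hypothetical time courses in arbitrary units: 3 baseline samples, then 4 stimulus samples.
stranger_like = response_pattern(
    [0.0, 0.01, -0.01, 0.10, 0.20, 0.25, 0.20],
    [0.0, 0.00, 0.01, -0.03, -0.05, -0.06, -0.05],
    n_baseline=3,
)
mother_like = response_pattern(
    [0.0, 0.01, -0.01, -0.10, -0.18, -0.20, -0.15],
    [0.0, 0.00, 0.01, 0.04, 0.06, 0.07, 0.05],
    n_baseline=3,
)
print(stranger_like, mother_like)
```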
Conclusion
The present fNIRS study revealed an early cortical network enhanced by maternal speech in neonates using a phase synchronization method that was applicable to block-design data. The functional connectivity between the frontal pole and the temporal cortex was strengthened more when neonates heard their mother’s speech than when they heard a stranger’s speech. Frontotemporal connectivity was discussed in relation to the facilitation of the language network in the left hemisphere and in relation to voice identification for the right hemisphere. These results suggest that maternal speech fosters functional frontotemporal circuitry at the beginning of life, and probably contributes to the formation of higher cognitive networks such as language.
Funding
This work was supported by the Global COE (Center of Excellence) program of Keio University and the Japan Society for the Promotion of Science (JSPS) Kakenhi (Grant Nos. 15H01691, 19H05594, 24791123, 24591609, and 15K09725).
Declaration of Competing Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Authors: K Nakamura; R Kawashima; M Sugiura; T Kato; A Nakamura; K Hatano; S Nagumo; K Kubota; H Fukuda; K Ito; S Kojima Journal: Neuropsychologia Date: 2001 Impact factor: 3.139
Authors: Judit Gervain; Jacques Mehler; Janet F Werker; Charles A Nelson; Gergely Csibra; Sarah Lloyd-Fox; Mohinish Shukla; Richard N Aslin Journal: Dev Cogn Neurosci Date: 2010-08-04 Impact factor: 6.464
Authors: Daniel A Abrams; Tianwen Chen; Paola Odriozola; Katherine M Cheng; Amanda E Baker; Aarthi Padmanabhan; Srikanth Ryali; John Kochalka; Carl Feinstein; Vinod Menon Journal: Proc Natl Acad Sci U S A Date: 2016-05-16 Impact factor: 11.205