The integration of audio-tactile information is modulated by multimodal social interaction with physical contact in infancy.

Yukari Tanaka¹, Yasuhiro Kanakogi², Masahiro Kawasaki³, Masako Myowa⁴.

Abstract

Interaction between caregivers and infants is multimodal in nature. To react interactively and smoothly to such multimodal signals, infants must integrate all these signals. However, few empirical infant studies have investigated how multimodal social interaction with physical contact facilitates multimodal integration, especially regarding audio-tactile (A-T) information. By using electroencephalogram (EEG) and event-related potentials (ERPs), the present study investigated how neural processing involved in A-T integration is modulated by tactile interaction. Seven- to 8-month-old infants heard one pseudoword whilst being tickled (multimodal 'A-T' condition) and another whilst not being tickled (unimodal 'A' condition). Thereafter, their EEG was measured during the perception of the same words. Compared to the A condition, the A-T condition resulted in enhanced ERPs and higher beta-band activity within the left temporal regions, indicating neural processing of A-T integration. Additionally, theta-band activity within the middle frontal region was enhanced, which may reflect enhanced attention to social information. Furthermore, differential ERPs correlated with the degree of engagement in the tickling interaction. We provide neural evidence that the integration of A-T information in infants' brains is facilitated through tactile interaction with others. Such plastic changes in neural processing may promote harmonious social interaction and effective learning in infancy.

Keywords: Electroencephalogram (EEG); Infants; Multisensory integration; Touch interaction

Year: 2017 PMID: 29253738 PMCID: PMC6969118 DOI: 10.1016/j.dcn.2017.12.001

Source DB: PubMed Journal: Dev Cogn Neurosci ISSN: 1878-9293 Impact factor: 6.464

Introduction

Infants learn social behaviors through interaction with others. Such interaction involves sensory information, which is multimodal in nature. Infants may simultaneously receive visual (smiles, and eye contact), auditory (infant-directed speech) and tactile (gentle touches) information (Sullivan and Horowitz, 1983; Nishimura, Kanakogi, & Myowa-Yamakoshi, 2016). To react interactively and easily to such multimodal input, infants have to integrate all these signals. The mechanisms by which infants integrate audio − visual (i.e., A-V) (Bahrick, Netto, & Hernandez-Reif, 1998; Lewkowicz and Ghazanfar, 2009, Lewkowicz, 2010) and visual − tactile (i.e., V-T) information (Zmyj, Jank, Schütz-Bosbach, & Daum, 2011; Bremner, Holmes, & Spence, 2008) are increasingly understood. However, relatively little is known about the developmental mechanism involved in the integration of A-T information, and its function. The integration of A-T information should particularly be understood during social interactions, given the role of tactile and speech signals in the context of affective bonds between caregivers and infants. Coupled A-T cues help to regulate infants’ emotional state and attention, which encourages harmonious interaction between mothers and infants (Jahromi, Putnam, & Stifter, 2004). Young infants are also sensitive to such A-T stimulation in natural communicative situations; 4 − 6-month-old infants often laugh in response to A-T tickling stimulation (Sroufe and Wunsch, 1972, Sroufe and Waters, 1976). During tickling interactions, caregivers often say “tickle” using infant-directed speech, or they show their hands to the infants (Fogel, Nelson-Goens, Hsu, & Shapiro, 2000; Messinger, Dickson, & Fogel, 2001; Negayama and Yamaguchi, 2005). 
These multimodal signals facilitate the integration of arbitrary multimodal information (Slater, Quinn, Brown, & Hayes, 1999; Hernandez-Reif and Bahrick, 2001), emphasizing significant features within the environment (Gogate, Bahrick, & Watson, 2000; Gogate, Walker-Andrews, & Bahrick, 2001). Thus, infants may integrate auditory and tactile information through social interactions. Yet, it remains unclear how A-T information is integrated in infants’ brains through the experience of tactile interaction. Only 1 electroencephalogram (EEG) study has investigated whether young children integrate A-T information (i.e., pure tone and vibration) (Russo, Foxe, Brandwein, Altschuler, Gomes et al., 2010). The study showed stronger event-related potentials (ERPs) around 100 − 200 msec at temporal and central sites when children perceived multimodal A-T stimuli, as compared to unimodal stimuli. However, the previous study did not focus on the effects obtained in the context of social interaction. If infants integrate A-T information in a social situation, their neural processing involved in A-T information would be modulated. ERPs can describe the time course of neural processing in infants’ brains, which reflects stimulus processing at different functional stages during integration of A-V (Kushnerenko, Teinonen, Vikein, & Csibra, 2008; Grossman, Striano, & Friederici, 2006) and V-T (Rigato, Begum Ali, van Velzen & Bremner, 2014) information. Furthermore, the activity of specific frequency ranges, such as beta (about 15 − 20 Hz) and gamma (above 40 Hz) bands, is related to the integration of multimodal information (Asano, Imai, Kita, Kitajo, Okada, et al., 2015; Schneider, Lorenz, Senkowski, & Engel, 2011). Thus, by using EEG and ERPs, the dynamic neural processing involved in A-T integration modulated by social interaction can be assessed. As mentioned above, tickling interactions facilitate investigation of A-T integration. 
In typical tickling interactions between adults and infants, there are synchronized multimodal cues that encourage infants to integrate A-T information. Our pilot study showed that, during natural mother − infant tickling interactions, infants show anticipatory coordinated behaviors, depending on the A-T cues provided by their mothers. Initially, mothers often spoke to and simultaneously tickled the infants, who laughed reactively; after several interactions, mothers spoke before they tickled the infants, who exhibited anticipatory body movement prior to tickling (see Supplementary Information). To reveal the plastic changes facilitating A-T integration, we focused on the perception of auditory information modulated by the experience of multimodal tickling interaction. The omission paradigm allows assessment of whether unimodal information processing is modulated by multimodal experiences, by evaluating how multimodal stimuli are associated in the brain (den Ouden, Friston, Daw, McIntosh, & Stephan, 2009; Emberson, Richards, & Aslin 2015). It involves (i) simultaneous presentation of 2 or more stimuli from different modalities, to allow infants to associate them, before (ii) recording the neural responses to perception of only 1 of these stimuli (when they are no longer paired). The present study investigated how neural processing of A-T integration is modulated by multimodal social interaction involving physical contact during infancy. We focused on 7 − 8-month-old infants, as their brains have shown evidence of integration of multimodal information (Kushnerenko et al., 2008, Grossmann et al., 2006, Rigato et al., 2014). We used the omission paradigm in 2 phases: the exposure and the test phases. During the exposure phase, infants heard one pseudoword while being tickled (multimodal ‘A-T’ condition) and another while not being tickled (unimodal ‘A’ condition). 
In the test phase, we used EEG to measure the infants’ brain activity when they heard the same pseudowords in the absence of tickling. We compared the ERPs and oscillatory responses between these conditions. We considered 2 hypotheses. First, we predicted that A-T information is integrated through the tickling interaction, which will be reflected as stronger ERPs in the early period (before 200 msec after stimulus onset) and higher beta- or gamma-band activity at temporal and central sites for the A-T compared to the A condition (Russo et al., 2010). Second, we predicted that, as a result of integrating A-T information, expectation-related somatosensory responses will be elicited for the A-T condition compared to the A condition. The neural response to an omitted stimulus is measured using a negative component, the N250 (occurring 250–450 msec from stimulus onset) (Garrido et al., 2009), as reported in somatosensory systems (Kekoni, Hämäläinen, Saarinen, Gröhn, Reinikainen, Lehtokoski et al., 1997; Akatsuka, Wasaka, Nakata, Inui, Hoshiyama, & Kakigi, 2005). Oscillatory responses in the theta-range in infancy reflect expectation of upcoming stimuli (Stroganova, Orekhova, & Posikera, 1998; Orekhova, Stroganova, & Posikera, 1999). A stronger N250-like response and higher theta activation should be obtained when somatosensory systems respond to omitted, but expected, stimuli as a result of A-T integration. We also investigated whether ERP responses are related to infants' behavior during the tickling interaction, to confirm that multimodal interaction affects their brain responses.

Materials and methods

Participants

Data from a total of 28 infants (14 boys, M = 236.58 days, SD = 19.67, range = 210–264 days) were included in the study. An additional 10 infants (4 boys) participated in the experiment, but the relevant data were excluded for the following reasons: fussiness in the exposure phase (n = 6); not completing the entire test session (n = 1), and excessive noise within their EEG data (n = 3). All participants were neurologically typical, full-term (between 37 and 42 weeks of gestation) Japanese infants. Parents of infants gave informed consent and the study protocol was approved by the Ethics Committee of the Web for Integrated Studies of the Human Mind, Japan (WISH, Japan).

Stimuli for the test phase

We used 2 pseudowords (/topi-topi/ and /beke-beke/) as the stimuli for the test phase. The words consisted of the repetition of 2 moras, because the Japanese word typically used during a tickling interaction is /kocho-kocho/, which also involves the repetition of 2 moras. The stimuli used were recordings of the voice of a female experimenter who tickled the infants during the exposure phase. She did not know the purpose of the present study, and spoke each target word repeatedly in an infant-directed speech manner. Words were recorded at a 22.05-kHz sampling rate (in 16-bit monaural format) using a digital recorder in a soundproof chamber. After recording, another experimenter chose 2 different types of prosody per word, which were considered to reflect the most natural prosody. We prepared 2 different prosodic types in order to maintain the infants’ attention during the test phase. The auditory stimuli presented to each infant therefore consisted of a total of 4 stimuli (2 words with 2 natural prosodic patterns). The auditory stimuli were controlled for the following parameters: the average fundamental frequency (F0), pitch maximum (F-Max), frequency range (F-range), and duration (Supplementary Information Table S1). The intensity of the auditory stimuli was adjusted across stimuli by equalizing the root mean square power of all sound files. These stimuli were presented to participants at around 50.15 dB sound pressure level (SPL).
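The intensity equalization described above can be illustrated with a minimal sketch. It assumes each sound file is available as a list of floating-point samples and scales every sound to a shared root mean square (RMS) level. The function names, the target level of 0.1, and the toy sine-wave "recordings" are hypothetical and are not taken from the study.

```python
import math

def rms(samples):
    """Root mean square of a sequence of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def equalize_rms(sounds, target_rms=0.1):
    """Scale each sound so that its RMS equals target_rms (assumed value)."""
    equalized = []
    for samples in sounds:
        gain = target_rms / rms(samples)  # assumes the sound is not silent
        equalized.append([x * gain for x in samples])
    return equalized

# Two toy "recordings" at 22.05 kHz with different loudness (illustrative only).
quiet = [0.01 * math.sin(2 * math.pi * 220 * n / 22050) for n in range(2205)]
loud = [0.50 * math.sin(2 * math.pi * 220 * n / 22050) for n in range(2205)]
balanced = equalize_rms([quiet, loud])
```

After scaling, both toy sounds share the same RMS level, which is the property the stimulus preparation relies on; the absolute presentation level (about 50 dB SPL) is set separately at playback.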

Procedure

The experiment had 2 phases: an exposure phase (during which infants and an experimenter interacted), followed by a test phase (during which infants only heard words via a speaker) (Table 1). Before the exposure phase, an EEG cap was placed on the infants’ heads, in order to shorten the time interval between these 2 phases (the mean time interval was 2 min). In our pilot test, we tried to record the infants’ EEG during both the exposure and test phases to analyze the relationship between them. However, infants moved largely in the exposure phase, since they were highly interested in a dynamic social interaction. If we restrained infants’ body movement, the interaction became unnatural. Therefore, we set an exposure phase separate from the test phase, and we measured the EEG in only the test phase.

Table 1

Protocol of the experiment. In the exposure phase, each block was presented alternately. The words assigned to each condition and the order of the presentation were counterbalanced among infants.

	Exposure phase (Live interaction)	Test phase (Through speaker)
A-T condition (Multimodal)	One word with tickling (e.g., topi-topi) × 5 times (1 block)	The same words as exposure phase were alternately presented, without tickling (e.g., topi–topi or beke-beke)
A condition (Unimodal)	The other word without tickling (e.g., beke-beke) × 5 times (1 block)

Exposure phase

The exposure phase took place with infants seated on their caregiver’s lap in a quiet room. Prior to the experiment, the experimenter played with the infants for a few minutes to build a rapport with them. Once the experiment commenced, the experimenter—sitting face-to-face with the infant—interacted with them in the following 2 manners: (1) tickling block: simultaneously tickling the infants’ torsos whilst the target word was emitted (i.e., the multimodal ‘A-T’ condition); (2) speech block: no physical interaction whilst the target word was emitted (i.e., the unimodal ‘A’ condition) (see Fig. 1). Each interaction was repeated 5 times as 1 block. The blocks in each condition were presented alternately. The experimenter smiled, made eye contact, and used infant-directed speech to the infant in both blocks. The experimenter was trained, during our pilot study, to speak each word with the same pitch, duration, and tempo. The exposure phase finished after infants either (i) completed 60 events per condition (24 blocks in total), or (ii) became inattentive to the experimenter, as indicated by their showing fussiness or becoming fidgety. The presentation of 2 conditions was counter-balanced across infants in this phase. The combination of words assigned to each condition was also counter-balanced across participants. Half of the participants heard the word “beke-beke” in the A-T condition and “topi-topi” in A condition, and vice versa. At least 50 events per condition were required for the final analysis. When infants became bored before completing 50 trials, the data of that infant were not used for further analysis (n = 6). The phase lasted until infants cried, or 60 trials per condition had been completed. Infants heard each word 53.93 times on average (SD = 4.38) per condition, which is virtually the same as reported by Seidl et al. (2015). The exposure phase lasted approximately 4 min (from 3.2 to 5.4 min). 
We also recorded infant’s behavior in order to analyze their motor responses to tickling (see below section on Coding of Infants’ Engagement in Tickling Interaction).

Fig. 1

Infant and experimenter in the exposure phase. (a) Tickling block and (b) speech block.

Test phase

After the exposure phase, infants and caregivers entered the shielded room. Infants sat on the caregiver’s lap in front of a 22-inch CRT monitor (RDT223BK, Mitsubishi Corporation, Tokyo, Japan), with a speaker (301 V, BOSE, Framingham, MA) located behind the monitor. Caregivers were instructed not to speak to infants during the EEG recording. The recording started once infants sat still. The experimental procedure is shown in Fig. 2. Following a simple attention grabbing animation (1000 ms) and subsequent blank screen (1000 − 1100 ms), an animal illustration irrelevant to the experiment was presented on the screen (2000 ms), prior to the presentation of a pseudoword (1000 ms). The inter-trial interval was 1000 − 1500 ms. The words used in the A-T and A conditions were presented alternately in order to prevent repetition suppression (Grill-Spector, Henson, & Martin, 2006). To avoid associative learning between specific illustrations and the pseudowords used in the test phase, we controlled the frequency of the presentation of specific illustrations and words by constructing blocks. 1 block consisted of 8 trials, and the combination of 4 words and 8 illustrations was counterbalanced across the blocks. We conducted 2 tests using a different order of blocks, and the tests were counterbalanced across infants. When the infants’ attention deviated during EEG recording, the experiment was paused whilst some attractive 30 s movies (irrelevant to the experiment) were presented to recapture the infants’ attention towards the monitor. The recording was restarted once the infants looked at the screen again. Completion occurred when infants became bored or had completed a total of 160 trials. The recording lasted about 10 min in total.

Fig. 2

Experimental procedure of EEG recording during the test phase. We presented an animation movie as an attention-getter (1000 ms), followed by an illustration irrelevant to the sound (1000 − 1100 ms) to grasp and keep infants’ attention. Then, the word in either condition was presented (1000 ms). Inter-trial intervals were 1000 − 1500 ms. We analyzed EEG data during the presentation of the sound only, with the 200-ms recording prior to sound presentation used as the baseline period.

EEG data acquisition and processing

EEG data were recorded with a 64-channel Geodesic Sensor Net and analyzed using Net Station software (EGI, Eugene, OR) sampled at 1000 Hz. Impedance was measured prior to EEG recording and kept below 50 kΩ. All recordings were initially referenced to the vertex and later re-referenced to the average of all channels. We also recorded infants’ behavior during EEG acquisition using 2 video cameras (HDR XR502 V, SONY, Tokyo, Japan; C615 HD webcam; Logitech, Newark, CA). During recording, a third experimenter checked the infants’ body movement online. She checked whether infants heard stimuli while keeping still (coded as ‘0’), they moved their body a little (coded as ‘1’), or markedly (coded as ‘2’) per each trial. These data were used for detecting motion artifacts.

ERP analysis

EEG data were digitally filtered off-line using a 0.3–30 Hz band-pass filter. Based upon prior infant research investigating EEG components in the perception of speech words (e.g., Renate and Debra, 2007; Kooijman, Hagoort, & Cutler, 2009), the data were segmented into a 1000-ms epoch that was time-locked to the onset of the auditory stimulus (target), preceded by a 200-ms pre-stimulus baseline period (so a total of 1200 ms). Artifacts were screened with the following automatic detection methods: eye blinks (140-μV threshold in the frontal region within 80 ms post stimulus presentation), eye movement (55-μV threshold), and excessive noise (i.e., channels with amplitudes exceeding 200 μV were excluded). We also inspected all EEG data visually, and marked bad channels. Segments with 10 or more bad channels were excluded. Additionally, upon visual screening analysis of the video recording data, segments containing marked body movements (i.e., coded as “2”) were also excluded from averaging, as were those segments that were likely to be due to motion artifacts. We used, on average, 32.89 trials for the A-T condition (range: 21–56), and 33.00 trials for the A condition (range: 20–56) per infant. The EEG data of infants were excluded from further analysis when fewer than 20 trials were left per condition (n = 2), or when the acquisition rate was less than 40% per condition (n = 1). The averages of amplitudes were calculated separately for each condition (the A-T condition and the A condition). To determine the target regions and time period, we referred to previous research in adults (Schneider et al., 2011; Tanaka, Fukushima, Okanoya, & Myowa-Yamakoshi, 2014). The infants’ EEGs, however, can differ from adults’ EEGs in terms of latency and spatial distribution (e.g., Wunderlich, Cone-Wesson, & Shepherd, 2006). This was the first study to examine auditory processing associated with tactile stimulation in infancy. 
Thus, we visually inspected the EEG data to determine the appropriate regions. In adults, tactile priming stimuli affect subsequent auditory processing (Schneider et al., 2011, Tanaka et al., 2014). Previous studies have found that ERPs in the middle frontal to central, and temporal regions are modulated by the congruency of prior tactile stimuli and subsequent auditory stimuli (e.g., vocal sounds). Thus, we focused on each of the 9 frontal to parietal regions, which included 3–5 channels, by visual inspection of the ERP wave form at each electrode, to improve the signal-to-noise ratio (for the electrode sites analyzed in this study, see Supplementary Information Fig. S1). We determined the following 3 time periods, on the basis of previous studies (Schneider et al., 2011, Tanaka et al., 2014) and our preliminary analysis. Previous studies in adults found significant effects of tactile priming in the early auditory N1 (60 − 80 ms from stimulus onset), and P2 (120 − 200 ms from stimulus onset) peak in the temporal and frontal regions, and negative peak in the 200 − 400 ms period within the central regions. We calculated peak latency in those time windows. Our ERP data showed a negative peak around 120 ms in the central regions, and 190 ms in the temporal regions, as well as a negative peak around 500 ms in the central regions. N1 and N250 latency was delayed in infancy as compared to that in adults (Wunderlich, Cone-Wesson, & Shepherd, 2006). For preliminary analysis, we conducted t-tests comparing 2 conditions (A-T vs. A) at each time-point, in order to describe the time range during which ERP differences were observed between the conditions for each area. To avoid the detection of spurious differences among conditions, we considered a time range of 78 consecutive time-points (78 ms) of p-values (p < 0.05 indicated a significant effect) (Guthrie and Buchwald, 1991). 
We found a significant difference, from 72 to 201 ms and from 680 to 893 ms in the left temporal region, and from 401 to 658 ms in the middle central region, as well as from 737 to 899 ms in the right central region. On the basis of these analyses, we determined the early period as N1 (50–200 ms), middle period as N250-like (400–600 ms after stimulus onset), and the late period as the late long wave (LLW, 700–900 ms). The mean amplitude in each period was computed for each condition. These mean amplitudes were analyzed using repeated-measures ANOVAs with sensory modality (2: A-T/A) and electrode region (9: left frontal/middle frontal/right frontal/left central/middle central/right central/left temporal/middle parietal/right temporal) as within-subjects factors. Bonferroni correction was applied for post-hoc analysis.
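The consecutive-time-point criterion used in the preliminary analysis can be sketched as follows. This is a minimal illustration, assuming the point-by-point p-values have already been computed and that one sample corresponds to 1 ms (1000 Hz sampling). The function name and the toy p-value series are hypothetical.

```python
def significant_runs(p_values, alpha=0.05, min_length=78):
    """Return (start, end) index pairs, inclusive, of every run of at least
    min_length consecutive p-values below alpha."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i  # a new run of significant points begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(p_values) - start >= min_length:
        runs.append((start, len(p_values) - 1))
    return runs

# Toy p-value series: 1000 samples, i.e. 1000 ms at 1000 Hz.
p = [0.5] * 1000
for t in range(72, 202):   # 130 consecutive significant points: kept
    p[t] = 0.01
for t in range(300, 340):  # 40 consecutive significant points: discarded
    p[t] = 0.01
runs = significant_runs(p)
```

Only the long run survives the criterion; shorter stretches of nominally significant points are treated as potentially spurious and ignored.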

Time-Frequency analysis

In order to evaluate cortical oscillatory activation, time-frequency amplitudes were calculated with wavelet transformation for the pre-processed segmented EEG data. The amplitude for each time-point under each condition was the arctangent of the result of the convolution of the original EEG signal s(t) with a complex Morlet wavelet function w(t, f):

$$ w(t, f) = \sqrt{f}\, \exp\left(-\frac{t^{2}}{2\sigma^{2}}\right) \exp(i 2\pi f t) $$

where σ is the standard deviation of the Gaussian window (the number of cycles = 7), with f ranging from 2 to 20 Hz in 0.5-Hz steps (Tallon-Baudry et al., 1996, Kawasaki et al., 2010). The event-related amplitudes were corrected with the averages (μAMP) and the standard deviations (σAMP) of the amplitudes during the inter-trial interval (baseline) with the formula:

$$ AMP'(t, f) = \frac{AMP(t, f) - \mu_{AMP}}{\sigma_{AMP}} $$

where AMP’(t, f) and AMP(t, f) are the corrected and the uncorrected amplitudes, respectively. We compared the amplitudes of each time-point and each frequency between the A-T and A conditions by means of the Wilcoxon sign tests with the multiple comparison correction (Bonferroni correction for the number of electrodes). The data for 2 subjects were excluded from the time-frequency analyses due to large artifacts. These individuals’ data, however, met the criteria for the ERP analysis (i.e., at least 20 trials were left per condition, and the acquisition rate was more than 40% per condition). We also visually assessed the ERP waves, but did not find any issues, such as marked electrical noise. Thus, the data of these 2 subjects were included in the ERP analysis. In order to localize the generator of the scalp EEG oscillations in greater detail, we applied standardized low resolution EEG tomography (sLORETA) (Pascual-Marqui et al., 2002). The sLORETA images had a 5-mm spatial resolution, and the statistical contrast maps between conditions were calculated. The peak Talairach Atlas coordinates were identified for the specific frequency band (theta and beta bands in this study).
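The wavelet step and the baseline correction can be illustrated with a small standard-library sketch. It is not the authors' pipeline, and several choices are assumptions made only for the illustration: the wavelet width is set to σ = 7 / (2πf), the wavelet is scaled by √f and truncated at ±3σ, the amplitude is taken as the modulus of the complex convolution (a common convention), the baseline uses the population standard deviation, and the toy signal is sampled at 250 Hz to keep the loop fast. All function names are hypothetical.

```python
import cmath
import math
import statistics

def morlet(f, sfreq, n_cycles=7):
    """Complex Morlet wavelet at frequency f (Hz), sampled at sfreq (Hz).
    sigma = n_cycles / (2 * pi * f) is an assumed convention."""
    sigma = n_cycles / (2 * math.pi * f)
    half = int(round(3 * sigma * sfreq))  # truncate at +/- 3 sigma
    return [
        math.sqrt(f)
        * math.exp(-((k / sfreq) ** 2) / (2 * sigma ** 2))
        * cmath.exp(2j * math.pi * f * k / sfreq)
        for k in range(-half, half + 1)
    ]

def wavelet_amplitude(signal, f, sfreq):
    """Modulus of the convolution of signal with the wavelet (same length)."""
    w = morlet(f, sfreq)
    half = len(w) // 2
    out = []
    for n in range(len(signal)):
        acc = 0j
        for k, wk in enumerate(w):
            idx = n - (k - half)  # convolution: sum over s[n - lag] * w[lag]
            if 0 <= idx < len(signal):
                acc += signal[idx] * wk
        out.append(abs(acc))
    return out

def baseline_correct(amp, baseline):
    """z-score amplitudes against the baseline-interval amplitudes."""
    mu = statistics.mean(baseline)
    sd = statistics.pstdev(baseline)  # population SD (assumption)
    return [(a - mu) / sd for a in amp]

# Toy data: a 2-s, 6-Hz sine at 250 Hz (the study sampled at 1000 Hz).
sfreq = 250
signal = [math.sin(2 * math.pi * 6 * n / sfreq) for n in range(2 * sfreq)]
theta = wavelet_amplitude(signal, 6.0, sfreq)   # wavelet matched to the signal
beta = wavelet_amplitude(signal, 15.0, sfreq)   # wavelet far from the signal
z = baseline_correct([1.0, 2.0, 3.0], [1.0, 3.0])
```

In this toy case the 6-Hz wavelet responds strongly to the 6-Hz signal while the 15-Hz wavelet barely responds, which is the frequency selectivity that the time-frequency maps depend on.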

Coding of infants’ engagement in tickling interaction

To examine the relationship between infants’ brain activity and their engagement in the tickling interaction, we measured their behavior during the exposure phase. Since the present study was designed to examine the effect of multimodal interaction on subsequent auditory stimulus processing, it is possible that the degree of engagement in tickling interaction affected infants’ brain activity. Tickling leads to an involuntary stereotyped motor reaction, such as laughter and straining of the body (Provine, 2004). Thus, these behavioral indices are considered to be suitable for assessing the degree of engagement of infants in the tickling interaction. One experimenter coded each infant’s behavior individually for each block off-line, whilst another experimenter coded 25% of the data. The following parameters were scored on a scale of 1–5: (1) the degree of emotional display (how strongly infants showed positive emotional expression), (2) the degree of body movement (how much infants moved their body), and (3) the degree of attention (how attentive infants were to the experimenter). Coding schemas are shown in Supplementary Information Table S2. The inter-coder reliability was sufficiently high (Cronbach alpha coefficients: α = 0.94 for tickling emotional display, α = 0.76 for speech emotional display, α = 0.75 for tickling body movement, α = 0.70 for speech body movement, α = 0.93 for tickling attention, α = 0.87 for speech attention, respectively). The scores of the tickling blocks were significantly higher than those of the speech blocks for (1) the degree of emotional display (tickling blocks: M = 3.59, SD = 0.46, speech blocks: M = 3.19, SD = 0.66, t(27) = 2.71, p = 0.01, Cohen's d = 0.51) and (2) the degree of body movement (tickling blocks: M = 3.47, SD = 0.99, speech blocks: M = 2.06, SD = 1.01, t(27) = 4.72, p < 0.001, Cohen's d = 0.89). 
In contrast, the score was not significantly different between blocks for (3) the degree of attention (tickling blocks: M = 4.02, SD = 0.60, speech blocks: M = 4.18, SD = 0.60, t(27) = −1.46, p = 0.16, n.s., Cohen's d = −0.28). We therefore successfully ensured that infants were engaged in the tickling interaction, and that their attention level did not differ between blocks. We next conducted correlation analysis among these scores, and found a significant positive correlation between emotional display and body movement (r = 0.82, p < 0.001). We then chose emotional display and attention scores as a behavioral index, and excluded body movement. This is because the score for body movement was difficult to interpret, and included both positive (laughed at tickling) and negative (tried to avoid tickling) emotional expressions. The mean total engagement scores during the tickling block were subtracted from those of the speech block for each infant, and these were labeled as (1) engagement score of emotional display, and (2) engagement score of attention, respectively.
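The effect sizes reported for the block comparisons are numerically consistent with the paired-samples convention d = t / √n, with n = 28 infants. The sketch below only checks that arithmetic against the reported values; it is an observation about the numbers, not a statement of how the authors computed Cohen's d, and the function name is hypothetical.

```python
import math

def cohens_d_paired(t_value, n):
    """Paired-samples effect size, d_z = t / sqrt(n)."""
    return t_value / math.sqrt(n)

n_infants = 28  # t(27) implies 28 paired observations

# (t statistic, reported Cohen's d) for each behavioral score.
reported = {
    "emotional display": (2.71, 0.51),
    "body movement": (4.72, 0.89),
    "attention": (-1.46, -0.28),
}

computed = {
    label: round(cohens_d_paired(t_value, n_infants), 2)
    for label, (t_value, _) in reported.items()
}
for label, (_, d_reported) in reported.items():
    print(f"{label}: computed d_z = {computed[label]:.2f}, reported d = {d_reported:.2f}")
```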

Results

ERP waveform results

To examine the relationship among condition and regions, we conducted ANOVAs with condition and regions for each time period (Fig. 3). We found a significant interaction with condition (A-T and A) in the early and late time periods. In the early time period (N1: 50–200 ms after stimulus onset), we found a significant interaction between condition and region (F (8,216) = 3.25, p = 0.001, η = 0.11). Post-hoc analysis revealed that ERPs in the left temporal region show greater positive activity in the A-T condition (M = −0.38 μV, SD = 3.17) than in the A condition (M = −2.33 μV, SD = 2.52; t (242) = 3.60, p < 0.001).

Fig. 3

Grand-averaged event-related potential (ERP) waveforms at 9 regionally grouped electrode sites for point-by-point comparison between the A-T and A conditions. Solid lines show ERPs in the A-T condition whereas dashed lines show ERPs in the A condition. The period and regions with significant differences between conditions are highlighted. LF: left frontal, MF: middle frontal, RF: right frontal, LC: left central, MC: middle central, RC: right central, LT: left temporal, MP: middle parietal, RP: right parietal.

In the middle time period (N250-like: 400–650 ms after stimulus onset), we did not find significant interaction between condition and region (F (8,216) = 1.65, p = 0.11, η = 0.06), or a main effect of condition (F (1,216) = 0.633, p = 0.433, η = 0.02). In the late time period (LLW: 700–900 ms after stimulus onset), we found a significant interaction between condition and region (F (8,216) = 2.21, p = 0.03, η = 0.08). Post-hoc analysis revealed that only ERPs in the left temporal region showed greater positive activity in the A-T condition (M = 0.62 μV, SD = 5.07) than in the A condition (M = −1.80 μV, SD = 4.92; t (242) = 2.83, p = 0.005). Taken together, we found significant differences in ERP responses between conditions only in the left temporal region, in several time periods. In particular, in the early time period (N1), greater positive ERP amplitudes were elicited in the left temporal region when infants heard words that had been spoken accompanied with touch, than when they heard words spoken without touch in the exposure phase. In the late time period (LLW), we again found a greater positive mean amplitude in the left temporal region in the A-T condition than in the A condition.

Oscillatory response

Fig. 4 shows the time-frequency p-values for the differences in amplitudes between conditions at the representative electrodes within the frontal, central, temporal, and parietal areas. Significantly large differences were sustainably observed in the theta range (c. 5 − 7 Hz) at the midline frontal and central electrodes after the onset of the stimulus presentation. In contrast, the differences in the beta range (about 15 − 18 Hz) were transient in the left temporal electrodes for 701 − 800 ms from the onset of the stimulus presentations. The topographic maps of the p-values for the differences in amplitudes between conditions are shown in Fig. 5 (theta peak frequency: 6 − 7 Hz, peak time window: 501 − 600 ms; beta peak frequency: 15 − 16 Hz, peak time window: 701 − 800 ms). The Fz and T7 electrodes showed the largest significances in the theta and beta bands, respectively. Both the frontal theta and temporal beta amplitudes under the A-T condition were significantly larger than those under the A condition.

Fig. 4

Statistical p-values for different time-frequency amplitudes at 9 regionally grouped electrode sites between the A-T and A conditions. White vertical lines show the onset of the stimulus presentations.

Fig. 5

Theta (top; 6–7 Hz, 501–600 ms) and beta (bottom; 15–16 Hz, 701–800 ms) topographic maps of statistical p-values. The maps represent differences in the amplitudes between the A-T and A conditions (left), of subject-averaged amplitudes under the A conditions (middle), and of subject-averaged amplitudes under the A-T conditions (right).

In the source estimation, the sLORETA analyses were based on the statistical estimates which were shown in Fig. 5 left. The sLORETA showed that the theta and beta peaked sources were localized in the right middle frontal gyrus (peak Talairach Atlas coordinates; x = 50, y = 15, z = 45; Brodmann Area 9) and the left superior temporal gyrus (peak Talairach Atlas coordinates; x = −60, y = −15, z = 10; Brodmann Area 22), respectively.

Relationship between ERPs and infants’ engagement in tickling interaction

We conducted correlation analyses between ERPs and engagement scores in the exposure phase using Pearson’s coefficient (r). We used differential ERP amplitudes between conditions (A-T vs. A [mV]) as a measure of the effect of tickling. Since we conducted a correlation analysis for each time period, the p values were adjusted with Bonferroni corrections for region and period (corrected significance threshold: p = 0.002). The results for brain regions are shown in Supplementary Information Table S3. We found a significant negative correlation between the engagement score of emotional display and the differential ERP in the N250 time period in the middle central region (r(27) = −0.65, p = 0.001; Fig. 6). The differential ERP in the N250 time period in the right central region was also negatively correlated with this score (r(27) = −0.52, p = 0.005), but this correlation did not reach significance after correction. ERPs in this period had a negative peak in the central regions (Fig. 3); the negative correlation therefore indicates that infants who laughed more often showed stronger negative-going ERPs in the A-T condition than in the A condition. On the other hand, we found no significant correlations in the left temporal region (N1 period: r(27) = 0.32, p = 0.10; LLW period: r(27) = 0.27, p = 0.16). We did not find any significant correlations between attention and differential ERPs in target regions (−0.24 < rs < 0.18, ps > 0.22).

Fig. 6

Correlation between infants’ engagement score during tickling interaction and their event-related potential (ERP) response in the middle central region. The X-axis shows the differential engagement score of emotional display between blocks (tickling vs. speech) and the Y-axis shows differential ERP responses (A-T vs. A) [mV] in the N250-like time period.
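As an illustration of the correlation analysis described in this section, the following is a minimal sketch rather than the authors' actual pipeline. It computes a differential ERP (A-T minus A) per infant, correlates it with an engagement score using Pearson's r, and compares against a Bonferroni-corrected threshold. All data are synthetic. The sample size of 29 is inferred from the reported degrees of freedom (r(27)), and the number of comparisons (9 regions × 3 time periods) is an assumption chosen because it yields a threshold close to the reported 0.002; the source does not state the exact count. The function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Synthetic data only: these values are NOT taken from the study.
random.seed(0)
n_infants = 29  # assumption: inferred from df = 27 (df = n - 2)
engagement = [random.gauss(0, 1) for _ in range(n_infants)]
# Mean ERP amplitude in one time window, per infant and condition.
erp_at = [-2.0 - 0.8 * e + random.gauss(0, 1) for e in engagement]
erp_a = [-2.0 + random.gauss(0, 1) for _ in range(n_infants)]
diff_erp = [at - a for at, a in zip(erp_at, erp_a)]  # A-T minus A

r = pearson_r(engagement, diff_erp)
t = t_from_r(r, n_infants)
# Assumption: 9 regions x 3 time periods = 27 comparisons.
alpha_corrected = bonferroni_alpha(0.05, 9 * 3)

print(f"r({n_infants - 2}) = {r:.2f}, t = {t:.2f}")
print(f"Bonferroni-corrected threshold = {alpha_corrected:.4f}")
```

The p-value for each correlation would then be obtained from the t distribution with n − 2 degrees of freedom and compared against the corrected threshold.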

Discussion

The present study investigated how neural processing is modulated by multimodal social interaction involving physical contact in infancy. After familiarization with 2 different kinds of words, 1 of which was heard accompanied by social touch and the other without it, infants’ EEGs were obtained. In the left temporal regions, ERPs in the early and late periods, and beta band responses in the late period, were more pronounced in the A-T condition than in the A condition. Beta peaked sources were localized in the left superior temporal gyrus. In frontal and central regions, theta band responses in the middle period were stronger in the A-T condition than in the A condition. Theta peaked sources were localized in the right middle frontal gyrus. Finally, the more engaged infants were in the tickling interaction, the larger the differential ERPs were in the central regions, but these correlations were not significant in the temporal regions.

Differences involving the ERPs within the left temporal regions may reflect the process of integration of A-T information. In particular, ERPs in the early period within this region may be responsible for the integration of A-T information at the perceptual level. Such an interpretation is consistent with a previous finding in which stronger ERPs from before 200 ms were obtained when children perceived multimodal A-T stimuli compared to unimodal stimuli, an effect that was also dominant in the left hemisphere (Russo et al., 2010). Similarly, neurophysiological studies in macaques (Kayser, Petkov, Augath, & Logothetis, 2005) and human adults (Gobbelé, Schürmann, Forss, Juottonen, Buchner et al., 2003; Schürmann, Caetano, Hlushchuk, Jousmäki, & Hari, 2006) also suggest that the left primary auditory cortex is activated during the integration of A-T information. The auditory N1 response is considered to represent activity in the primary auditory cortex in adults (Hari, Hämäläinen, Ilmoniemi, & Kaukoranta, 1984).
Recent neurophysiological research in infancy also found that A-V multimodal information affects the primary sensory cortex (Watanabe, Homae, & Nakano, 2013). Taken together with these previous studies, the modulated N1 response in the present study might reflect activation of the primary sensory cortex when hearing words that had been associated with tactile cues.

In the LLW periods, we also found stronger ERPs and beta-band activity in the A-T condition than in the A condition. ERPs in this time period are considered to play a role in the integration of multimodal meaningful and semantic inputs (Mills, Prat, Zangl, Stager, Neville et al., 2004). This interpretation is also supported by the results of the beta band activity analysis. The oscillatory activation of the beta band is involved in the multimodal semantic process in the temporal and parietal cortices (Weiss and Mueller, 2012; Asano et al., 2015). We also found that beta peak sources were localized in the left superior temporal gyrus. Previous research using source estimation of gamma band activity in adults has found that the left superior temporal gyrus is a core region for integrating the tactile sensation and sounds of an object (Schneider et al., 2011). In adults, multimodal input modulated unimodal information processing (Thelen et al., 2015). It was also found that brain areas involved in multimodal integration are influenced by unimodal input. For example, simultaneous visual imagery and auditory stimulation resulted in an illusory translocation of auditory stimuli that was associated with activity in the left superior temporal sulcus, a key region for multimodal integration (Driver and Noesselt, 2008; Dahl et al., 2009; Beauchamp, Nath, & Pasalar, 2010; Nath and Beauchamp, 2012). Our results suggest that infants' brains integrated A-T sensory information and encoded the multimodal semantic relationship between sounds and tactile cues.
To our knowledge, this is the first report of neurophysiological evidence that A-T information is integrated in infants’ brains through brief tickling interactions with others.

In contrast to integration, differences involving theta activity in the frontal region may reflect the process of social learning, rather than somatosensory processing. Frontal theta activity is considered to reflect the motivation for learning (Begus, Southgate, & Gliga, 2015), or enhanced attention to social stimuli (Zhang, Koerner, Miller, Grice-Patil, Svec et al., 2010; Begus, Gliga, & Southgate, 2016). In the present study, the tactile cue was tickling, which induced a positive emotional reaction. The frontal theta activity during this time period might reflect the motivation to obtain a social reward (i.e., tickling) rather than the expectation of a pure “tactile stimulation”. We do not know whether participants learnt A-T words more successfully than A words, as we did not conduct a memory test with them. However, a recent behavioral study found that 4-month-old infants discriminated target words after they heard them in conjunction with physical contact from a social partner (Seidl, Tincoff, Baker, & Cristia, 2015). Based on these previous findings, it is likely that physical contact during the exposure phase became a social reward for the infants, which facilitated infants’ internal attention to A-T words, as compared to A words, and that infants’ brains successfully encoded these A-T words.

We found a relationship between ERPs and individual differences involving the engagement in tickling interaction. The greater the engagement by infants within the tickling interaction (i.e., more laughter), the larger the differential ERPs obtained in the central regions during the N250-like period. This significant correlation was found in the central region but not in the temporal regions.
This difference may reflect various functions involving the integration of A-T information. A previous fMRI study in adults investigated the neural mechanism specific to tickle-related laughter (Wattendorf et al., 2013). That study found that participants who often laughed during tickling showed higher BOLD activation in the sensorimotor regions, bilateral operculum, thalamus, and periaqueductal gray matter. This suggests that several neural networks (sensorimotor to limbic areas) are involved in tickle-related laughter. We therefore tentatively assume that physical contact in a social situation in the exposure phase enhanced subsequent activation of both left temporal and limbic areas, which may facilitate infants’ integration of A-T information, and also promote their attention to social reward (i.e., tickling).

During the fetal period, the tactile sensory system develops earlier than other sensory systems (Moore and Persaud, 2008), and the fetus is therefore potentially sensitive to A-T information. For example, fetuses show increased motor and heart rate responses to A-T stimulation as compared with when stimulation occurs in just 1 sensory modality in isolation (Kisilevsky and Muir, 1991). However, such a response might occur because of the immaturity of somatosensory cortical processing before somatosensory pruning (Shibata, Fuchino, Naoi, Kohno, Kawai et al., 2012; see also Marshall and Meltzoff, 2015). On the other hand, the results of this study suggest that infants’ brains do integrate A-T information through interaction with others during the first year of life. In human adults, the neural processing involved in A-T integration is also modulated by the experience of multimodal cues provided to infants during daily interactions (Tanaka, Fukushima, Okanoya, & Myowa-Yamakoshi, 2014).
During natural interaction, infants and caregivers influence each other mutually; caregivers provide multimodal cues to infants, and infants’ responses strengthen the interactive involvement of their caregivers (Fukuyama, Qin, Kanakogi, Nagai, Asada et al., 2014). Such two-way relationships may facilitate A-T integration for both caregivers and infants, and this could form the neural basis for the integration of A-T information in infancy.

Furthermore, to integrate multimodal social information, postnatal experience might be important. Although a fetus is sensitive to A-T (i.e., auditory-pitch and tactile vibration) stimulation, such intermodal interaction can occur automatically, regardless of prior experience, when the frequency rate is shared across the auditory and tactile modalities (Butler, Foxe, Fiebelkorn, Mercier, & Molholm, 2012). On the other hand, the integration of social information is modulated by perceptual experiences, which affects memory and learning (Kriegstein and Giraud, 2006). Previous behavioral studies in infancy also found that multimodal cues provided by mothers facilitate novel word learning in infancy (Gogate, Bolzani, & Betancourt, 2006). Based on these previous findings, the multimodal input in social situations might facilitate binding of multiple kinds of information in infants’ brains, which might contribute to effective learning, such as recognition of novel information and language development. Thus, we also speculate that if non-social touch were used during the exposure phase in our experimental paradigm, infants’ brain activity would not change so dramatically and plastically.

A limitation of the present study is that it remains unclear how visual information contributes to the integration of A-T information within a social interaction. Infants always perceive visual information, such as their partners’ face and body, during a dyadic interaction.
Such visual cues are important for infants to respond interactively and predictively to their social partners (Iverson, Capirci, Longobardi, & Caselli, 1999; Striano and Stahl, 2005). We did not present visual cues (i.e., the face and body of the experimenter) during the test phase, so that we could focus on the integration of A-T information while eliminating the confounding effect of visual information. Infants, however, might have preferred to reference the experimenter’s face during the perception of word stimuli during the test phase. It is possible that the effect of the integration of A-T information was weakened because there were no visual cues during the test phase. Future studies should investigate how visual cues affect the integration of multimodal A-T information in a social context in infancy.

Conclusions

We investigated how the neural processing of A-T information integration is modulated during multimodal social interaction involving physical contact in infancy. We found that sounds associated with tactile interaction led to integration of A-T information in the left temporal areas, and activation of the middle frontal region. We have thus provided neural evidence that A-T information is integrated in infants’ brains by the experience of a brief tickling interaction with others. Our findings also suggest that multimodal interactions between caregivers and infants in a natural context might contribute to binding multiple forms of information in infants’ brains, which may facilitate effective social learning in infancy.

Conflict of interest

None.

53 in total

1. A study of multimodal motherese: the role of temporal synchrony between verbal labels and gestures.

Authors: L J Gogate; L E Bahrick; J D Watson
Journal: Child Dev Date: 2000 Jul-Aug

2. Mismatch responses related to temporal discrimination of somatosensory stimulation.

Authors: Kosuke Akatsuka; Toshiaki Wasaka; Hiroki Nakata; Koji Inui; Minoru Hoshiyama; Ryusuke Kakigi
Journal: Clin Neurophysiol Date: 2005-08 Impact factor: 3.708

3. Maturation of the cortical auditory evoked potential in infants and young children.

Authors: Julia Louise Wunderlich; Barbara Katherine Cone-Wesson; Robert Shepherd
Journal: Hear Res Date: 2006-02-03 Impact factor: 3.208

4. Gamma-band activity as a signature for cross-modal priming of auditory object recognition by active haptic exploration.

Authors: Till R Schneider; Simone Lorenz; Daniel Senkowski; Andreas K Engel
Journal: J Neurosci Date: 2011-02-16 Impact factor: 6.167

5. A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion.

Authors: Audrey R Nath; Michael S Beauchamp
Journal: Neuroimage Date: 2011-07-20 Impact factor: 6.556

6. Sensitivity to triadic attention in early infancy.

Authors: Tricia Striano; Daniel Stahl
Journal: Dev Sci Date: 2005-07

7. Neural coding of formant-exaggerated speech in the infant brain.

Authors: Yang Zhang; Tess Koerner; Sharon Miller; Zach Grice-Patil; Adam Svec; David Akbari; Liz Tusler; Edward Carney
Journal: Dev Sci Date: 2010-11-23

8. fMRI-Guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect.

Authors: Michael S Beauchamp; Audrey R Nath; Siavash Pasalar
Journal: J Neurosci Date: 2010-02-17 Impact factor: 6.167

9. Implicit multisensory associations influence voice recognition.

Authors: Katharina von Kriegstein; Anne-Lise Giraud
Journal: PLoS Biol Date: 2006-10 Impact factor: 8.029

10. Sound symbolism scaffolds language development in preverbal infants.

Authors: Michiko Asano; Mutsumi Imai; Sotaro Kita; Keiichi Kitajo; Hiroyuki Okada; Guillaume Thierry
Journal: Cortex Date: 2014-09-16 Impact factor: 4.027