Mingdi Xu, Fumitaka Homae, Ryu-Ichiro Hashimoto, Hiroko Hagiwara.
Abstract
Self-recognition, being indispensable for successful social communication, has become a major focus in current social neuroscience. The physical aspects of the self are most typically manifested in the face and voice. Compared with the wealth of studies on self-face recognition, self-voice recognition (SVR) has not gained much attention. Converging evidence has suggested that the fundamental frequency (F0) and formant structures serve as the key acoustic cues for other-voice recognition (OVR). However, little is known about which, and how, acoustic cues are utilized for SVR as opposed to OVR. To address this question, we independently manipulated the F0 and formant information of recorded voices and investigated their contributions to SVR and OVR. Japanese participants were presented with recorded vocal stimuli and were asked to identify the speaker: either themselves or one of their peers. Six groups of 5 peers of the same sex participated in the study. Under conditions where the formant information was fully preserved and where only the frequencies lower than the third formant (F3) were retained, accuracies of SVR deteriorated significantly with the modulation of the F0, and the results were comparable for OVR. By contrast, under a condition where only the frequencies higher than F3 were retained, the accuracy of SVR was significantly higher than that of OVR throughout the range of F0 modulations, and the F0 scarcely affected the accuracies of SVR and OVR. Our results indicate that while both F0 and formant information are involved in SVR, as well as in OVR, the advantage of SVR is manifested only when major formant information for speech intelligibility is absent. These findings imply the robustness of self-voice representation, possibly by virtue of auditory familiarity and other factors such as its association with motor/articulatory representation.
Keywords: formant; fundamental frequency; self recognition; speech perception; voice recognition
Year: 2013 PMID: 24133475 PMCID: PMC3795466 DOI: 10.3389/fpsyg.2013.00735
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Manipulations of the frequency band of speech sounds. The voice sample is derived from a 20-year-old male participant reading “narau” (learn). (A) The original voice (NORMAL). (B) The LOW condition, where the voice was low-pass filtered at the cut-off frequency of the mean of F2 and F3. (C) The HIGH condition, where the voice was high-pass filtered at the cut-off frequency of the mean of F2 and F3. The upper panels display the waveforms and the lower panels display the spectrograms. The preceding and ending unvoiced parts (about 200 ms) are not shown in the figure.
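The LOW/HIGH split described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes an idealized brick-wall filter applied in the frequency domain; the source does not state the filter type or software used, and the formant values (1500 Hz and 2500 Hz), the sampling rate, and the two test tones are hypothetical values chosen only so that the arithmetic is easy to follow.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def split_bands(signal, sample_rate, f2_hz, f3_hz):
    """Split a signal at the mean of F2 and F3 with a brick-wall filter.

    Assumption: an ideal filter; the original study may have used a
    different filter design.
    """
    cutoff = (f2_hz + f3_hz) / 2
    n = len(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(dft(signal)):
        # Bins k and n - k represent the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq < cutoff:
            low_spec.append(coeff)
            high_spec.append(0)
        else:
            low_spec.append(0)
            high_spec.append(coeff)
    return cutoff, idft(low_spec), idft(high_spec)


# Hypothetical test signal: a 500 Hz tone plus a weaker 3000 Hz tone.
SAMPLE_RATE = 8000
N = 80
low_tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
high_tone = [0.5 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE) for t in range(N)]
mixture = [a + b for a, b in zip(low_tone, high_tone)]

cutoff, low, high = split_bands(mixture, SAMPLE_RATE, f2_hz=1500, f3_hz=2500)
print(f"cut-off frequency: {cutoff:.0f} Hz")
```

With these hypothetical inputs the LOW output contains only the 500 Hz tone and the HIGH output only the 3000 Hz tone, mirroring panels (B) and (C).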
Figure 2. Grand-averaged accuracies of SVR and OVR under the 15 conditions. The accuracies for SVR are represented by solid lines and those for OVR are represented by broken lines. The green lines: NORMAL. The blue lines: LOW. The red lines: HIGH. The error bars represent the standard error of the mean among participants.
Three-way repeated measures ANOVA results.
| Factor | p | Post hoc comparison | Post hoc p |
| --- | --- | --- | --- |
| Identity | 0.14 | | |
| F0* | <0.001 | “−4” < “0” | 0.01 |
| | | “+4” < “0” | 0.002 |
| | | “+4” < “+2” | 0.02 |
| Frequency Band* | <0.001 | NORMAL > LOW | <0.001 |
| | | NORMAL > HIGH | <0.001 |
| Identity × F0 | 0.77 | | |
| Identity × Frequency Band | 0.14 | | |
| F0 × Frequency Band* | <0.001 | | |
| Identity × F0 × Frequency Band | 0.70 | | |
Within-subject factors: Identity (SVR, OVR), F0 (−4, −2, 0, +2, +4 semitones) and Frequency Band (NORMAL, LOW, HIGH). Asterisks indicate a significant main effect or interaction.
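As a rough illustration of the stimulus design, the sketch below enumerates the 15 stimulus conditions (5 F0 shifts × 3 frequency bands) and converts a semitone shift into a frequency ratio. It assumes the standard equal-tempered definition of a semitone (a ratio of 2^(1/12)); the 120 Hz base F0 and all function names are hypothetical and are not taken from the study.

```python
from itertools import product

F0_SHIFTS = (-4, -2, 0, 2, 4)          # semitones, as listed in the table note
BANDS = ("NORMAL", "LOW", "HIGH")      # frequency-band conditions


def semitone_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2 ** (semitones / 12)


def shifted_f0(base_f0_hz, semitones):
    """F0 after shifting a base F0 by the given number of semitones."""
    return base_f0_hz * semitone_ratio(semitones)


# Every combination of frequency band and F0 shift: 3 x 5 = 15 conditions.
conditions = list(product(BANDS, F0_SHIFTS))
print(f"{len(conditions)} stimulus conditions")

# Hypothetical base F0 of 120 Hz, used only for illustration.
for shift in F0_SHIFTS:
    print(f"{shift:+d} semitones -> {shifted_f0(120.0, shift):.1f} Hz")
```

Because the scale is logarithmic, shifts of +4 and −4 semitones are not symmetric in hertz: the upward shift changes the F0 by more hertz than the downward one.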
Two-way repeated measures ANOVA results in NORMAL, LOW, and HIGH, respectively.
| Frequency Band | Factor | p | Post hoc comparison | Post hoc p |
| --- | --- | --- | --- | --- |
| NORMAL | Identity | 0.70 | | |
| | F0* | 0.001 | “−4” < “0” | 0.04 |
| | | | “+4” < “0” | 0.003 |
| | | | “+4” < “+2” | 0.01 |
| | Identity × F0 | 0.48 | | |
| LOW | Identity | 0.39 | | |
| | F0* | <0.001 | “−4” < “0” | 0.002 |
| | | | “−2” < “0” | 0.005 |
| | | | “+4” < “0” | <0.001 |
| | | | “+2” < “0” | 0.001 |
| | | | “+4” < “+2” | 0.02 |
| | Identity × F0 | 0.88 | | |
| HIGH | Identity* | 0.04 | SVR > OVR | 0.04 |
| | F0 | 0.59 | | |
| | Identity × F0 | 0.73 | | |
Within-subject factors: Identity (SVR, OVR) and F0 (−4, −2, 0, +2, +4 semitones). Asterisks indicate a significant main effect or interaction.
Figure 3. The mean accuracies of SVR and OVR under NORMAL, LOW, and HIGH. These mean accuracies were calculated by collapsing the accuracies of the 5 conditions (−4, −2, 0, +2, +4 semitones) of F0 modulation. The accuracies for SVR are represented by black bars, and those for OVR are represented by white bars. The error bars represent the standard error of the mean among participants. The asterisks indicate the levels of significance in the statistical analyses (*p < 0.05, ***p < 0.001). Notably, the Identity effect (SVR > OVR) is significant only in HIGH.
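The collapsing step and the error bars described in the Figure 3 caption could be computed along the following lines. This is a sketch under stated assumptions: the accuracy values are invented purely for illustration and are not data from the study, the function names are hypothetical, and the standard error is taken as the sample standard deviation divided by the square root of the number of participants.

```python
from math import sqrt
from statistics import mean, stdev


def collapse_over_f0(accuracy_by_shift):
    """Average one participant's accuracies across the five F0 shifts."""
    return mean(accuracy_by_shift.values())


def mean_and_sem(values):
    """Group mean and standard error of the mean across participants."""
    return mean(values), stdev(values) / sqrt(len(values))


# Invented accuracies for three hypothetical participants in one band
# condition (keys are F0 shifts in semitones). NOT data from the study.
participants = [
    {-4: 0.5, -2: 0.6, 0: 0.7, 2: 0.6, 4: 0.5},
    {-4: 0.6, -2: 0.7, 0: 0.8, 2: 0.7, 4: 0.6},
    {-4: 0.7, -2: 0.8, 0: 0.9, 2: 0.8, 4: 0.7},
]

collapsed = [collapse_over_f0(p) for p in participants]
group_mean, group_sem = mean_and_sem(collapsed)
print(f"mean accuracy = {group_mean:.3f}, SEM = {group_sem:.3f}")
```

Collapsing within each participant first keeps the participant as the unit of analysis, so the standard error reflects variability among participants rather than among trials or conditions.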