Maureen Fontaine, Scott A. Love, Marianne Latinus.
Abstract
The ability to recognize an individual from their voice is widespread and has a long evolutionary history. Yet the perceptual representation of familiar voices remains ill-defined. In two experiments, we explored the neuropsychological processes involved in the perception of voice identity. Specifically, we tested the hypothesis that familiar voices (trained-to-familiar voices in Experiment 1 and famous voices in Experiment 2) are represented as a whole complex pattern that is well approximated by the average of multiple utterances produced by a single speaker. In Experiment 1, participants learned three voices over several sessions and then performed a three-alternative forced-choice identification task on original voice samples and on several "speaker averages," created by morphing across varying numbers of different vowels (e.g., [a] and [i]) produced by the same speaker. In Experiment 2, the same participants performed the same task on voice samples produced by famous speakers. The two experiments showed that for famous voices, but not for trained-to-familiar voices, identification performance increased and response times decreased as a function of the number of utterances in the averages. This study sheds light on the perceptual representation of familiar voices and demonstrates the power of averaging in recognizing them. The speaker average captures the unique characteristics of a speaker and thus retains the information essential for recognition; it acts as a prototype of the speaker.
Keywords: average; familiarity; identity; prototypes; recognition; speech; voice; vowels
Year: 2017; PMID: 28769836; PMCID: PMC5509798; DOI: 10.3389/fpsyg.2017.01180
Source DB: PubMed; Journal: Front Psychol; ISSN: 1664-1078
Acoustic measurements for trained-to-familiar voices and famous voices.
| Trained-to-familiar voices (female speakers) | | | | Famous voices (male speakers) | | | |
|---|---|---|---|---|---|---|---|
| *Speaker 1* | | | | *Speaker 1* | | | |
| Original Vowel | 220 | 1,132 | 17 | Original Vowel | 141 | 1,048 | 15 |
| Morphs - 2 | 222 | 1,107 | 21 | Morphs - 2 | 108 | 1,026 | 17 |
| Morphs - 3 | 225 | 1,024 | 20 | Morphs - 3 | 126 | 1,039 | 16 |
| Morphs - 4 | 223 | 1,092 | 21 | Morphs - 4 | 122 | 1,002 | 16 |
| Morphs - 5 | 223 | 1,139 | 22 | Morphs - 5 | 119 | 1,023 | 18 |
| Morphs - 6 | 223 | 1,107 | 22 | | | | |
| *Speaker 2* | | | | *Speaker 2* | | | |
| Original Vowel | 243 | 1,088 | 27 | Original Vowel | 141 | 1,048 | 15 |
| Morphs - 2 | 251 | 1,216 | 36 | Morphs - 2 | 108 | 1,026 | 17 |
| Morphs - 3 | 233 | 948 | 26 | Morphs - 3 | 126 | 1,039 | 16 |
| Morphs - 4 | 243 | 1,116 | 34 | Morphs - 4 | 122 | 1,002 | 16 |
| Morphs - 5 | 242 | 1,156 | 33 | Morphs - 5 | 119 | 1,023 | 18 |
| Morphs - 6 | 242 | 1,135 | 34 | | | | |
| *Speaker 3* | | | | *Speaker 3* | | | |
| Original Vowel | 214 | 1,089 | 14 | Original Vowel | 144 | 1,067 | 18 |
| Morphs - 2 | 223 | 1,126 | 19 | Morphs - 2 | 114 | 1,080 | 20 |
| Morphs - 3 | 207 | 1,094 | 19 | Morphs - 3 | 146 | 1,028 | 17 |
| Morphs - 4 | 214 | 1,110 | 20 | Morphs - 4 | 138 | 1,072 | 20 |
| Morphs - 5 | 216 | 1,126 | 20 | Morphs - 5 | 132 | 1,053 | 20 |
| Morphs - 6 | 214 | 1,113 | 20 | | | | |
Note that the famous voices were those of male speakers, whereas the trained-to-familiar voices were those of female speakers.
Recognition of trained-to-familiar and famous voices.
| Utterances per average | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| **Trained-to-familiar voices** | | | | | | |
| t(12) | 10.19 | 2.88 | 4.85 | 3.83 | 4.58 | 2.97 |
| p | <0.0001 | 0.014 | 0.004 | 0.002 | 0.0006 | 0.012 |
| Cohen's dz | 2.8266 | 0.7996 | 1.3440 | 1.0609 | 1.2691 | 0.8224 |
| **Famous voices** | | | | | | |
| t(12) | 7.52 | 5.2 | 14.8 | 8.65 | 8.62 | |
| p | <0.0001 | 0.0002 | <0.0001 | <0.0001 | <0.0001 | |
| Cohen's dz | 2.0858 | 1.442 | 4.1036 | 2.3984 | 2.3916 | |
One-sample t-test against chance level.
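The effect sizes in the recognition table follow directly from the t statistics: for a one-sample test, Cohen's dz is the mean difference from chance divided by the standard deviation of those differences, which equals t/√n. The reported values are consistent with n = 13 participants (df = 12); for example, 10.19/√13 ≈ 2.826. The Python sketch below is an illustrative reimplementation using only the standard library, not the authors' analysis code. The function names are hypothetical, chance is assumed to be 1/3 for the three-alternative task, and the two-tailed p-value is approximated by Simpson's-rule integration of the Student t density rather than taken from a statistics package.

```python
import math
from statistics import mean, stdev

# Assumed chance level for a three-alternative forced-choice (3AFC) task.
CHANCE_3AFC = 1 / 3


def one_sample_t(scores, mu0):
    """One-sample t-test of per-participant scores against mu0.

    Returns (t, df, dz), where dz is Cohen's dz = mean(diff) / sd(diff).
    """
    n = len(scores)
    diffs = [s - mu0 for s in scores]
    m = mean(diffs)
    sd = stdev(diffs)  # sample SD with an n - 1 denominator
    t = m / (sd / math.sqrt(n))
    return t, n - 1, m / sd


def dz_from_t(t, df):
    """Recover Cohen's dz from a one-sample t and its df (n = df + 1)."""
    return t / math.sqrt(df + 1)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_area)


# Check against two reported trained-to-familiar values (levels 1 and 2).
print(round(dz_from_t(10.19, 12), 3))  # 2.826
print(round(two_tailed_p(2.88, 12), 3))  # 0.014
```

With raw per-participant accuracies, `one_sample_t(scores, CHANCE_3AFC)` would give t, df and dz in one call; `dz_from_t` is only a consistency check on already-reported statistics.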
Figure 1. Performance in the recognition of trained-to-familiar voices. Percent correct (A) and response times (B; ms) are plotted as a function of the level of averageness (i.e., the number of utterances per voice average). Gray dots represent individual participants' data; black squares represent the mean across listeners. In (A), the dotted line indicates chance level. Black lines show the linear regression built from the mean slope and intercept obtained by fitting a linear regression to each participant's data. The slope in (A) was significantly negative, indicating that performance worsened as the number of utterances per average increased.
Figure 2. Performance in the recognition of famous voices. Percent correct (A) and response times (B; ms) are plotted as a function of the level of averageness (i.e., the number of utterances per voice average). Gray dots represent individual participants' data; black squares represent the mean across listeners. In (A), the dotted line indicates chance level. Black lines show the linear regression built from the mean slope and intercept obtained by fitting a linear regression to each participant's data. For famous voices, performance increased and response times decreased significantly as the number of utterances per average increased.
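Both figure captions describe the same two-step procedure: a straight line is fitted to each participant's scores as a function of averageness level, and the plotted line uses the mean slope and mean intercept across participants. A minimal Python sketch of that procedure follows, assuming each participant contributes a list of levels and a matching list of scores; the function names and the toy data are hypothetical and not taken from the study. The captions do not say how slope significance was assessed, so the sketch stops at returning the per-participant slopes, which could then be tested against zero (for example with a one-sample t-test).

```python
from statistics import mean


def ols(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def group_line(per_subject):
    """Fit each participant separately, then average slopes and intercepts.

    per_subject: list of (levels, scores) pairs, one per participant.
    Returns (mean_slope, mean_intercept, per-participant slopes).
    """
    fits = [ols(xs, ys) for xs, ys in per_subject]
    slopes = [s for s, _ in fits]
    intercepts = [b for _, b in fits]
    return mean(slopes), mean(intercepts), slopes


# Hypothetical percent-correct data for two listeners at five averageness levels.
levels = [1, 2, 3, 4, 5]
data = [(levels, [40, 50, 60, 70, 80]), (levels, [50, 55, 60, 65, 70])]
print(group_line(data))  # (7.5, 37.5, [10.0, 5.0])
```

Because every listener is tested at the same levels, averaging the per-participant lines gives the same line as fitting the mean scores directly; the per-participant fits matter mainly because their slopes supply the between-participant variability needed for a significance test.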