Adriel John Orena, Asia Sotera Mader, Janet F Werker.
Abstract
Young infants are attuned to the indexical properties of speech: they can recognize highly familiar voices and distinguish them from unfamiliar voices. Less is known about how and when infants start to recognize unfamiliar voices and to map them to faces. This skill is particularly challenging when portions of the speaker's face are occluded, as is the case with masking. Here, we examined voice-face recognition abilities in infants 12 and 24 months of age. Using the online Lookit platform, children saw and heard four different speakers produce words with sonorous phonemes (high talker information) and words with phonemes that are less sonorous (low talker information). Infants aged 24 months, but not 12 months, were able to learn to link the voices to partially occluded faces of unfamiliar speakers, and only when the words were produced with high talker information. These results reveal that 24-month-old infants can encode and retrieve indexical properties of an unfamiliar speaker's voice, and they can access this information even when visual access to the speaker's mouth is blocked.
Keywords: indexical information; infancy; online study; speaker perception; voice recognition
Year: 2022 PMID: 35558718 PMCID: PMC9088808 DOI: 10.3389/fpsyg.2022.874411
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Summary of notes for trials in the task familiarization and test phases. Each trial was 14 seconds long. See Figure 1 for time course of audio.
| Task familiarization phase | |
|---|---|
| Trials | Notes |
| A1, A2 | There was one trial of each type (same animal, different animal). In the first 2 seconds, two animal cartoons appeared silently side-by-side. At the 2-second mark, one animal made a sound and wiggled. At the 9-second mark, either the same or a different animal made a sound and wiggled. No ferns descended. |
| A3, A4 | There was one trial of each type. In the first 2 seconds, two animal cartoons appeared silently side-by-side. At the 2-second mark, one animal made a sound and wiggled. At the 9-second mark, ferns descended; then, either the same or a different animal made a sound and wiggled. |
| A5, A6 | There was one trial of each type. In the first 2 seconds, two human cartoons appeared silently side-by-side. At the 2-second mark, one human made a sound and wiggled. At the 9-second mark, ferns descended; then, either the same or a different human made a sound and wiggled. |

| Test phase | |
|---|---|
| Trials | Notes |
| B1-B4, B6-B9, B11-B14, B16-B19 | Each set of four trials had two Same Speaker trials and two Different Speaker trials. For each trial (see Figure 1): in the first 2 seconds, two human cartoons appeared silently side-by-side; at the 2-second mark, one human made a sound; at the 9-second mark, ferns descended, and then either the same or a different human made a sound. |
| B5, B10, B15, B20 | Across the four animal trials, there were two Same trials and two Different trials. For each trial: in the first 2 seconds, two animal cartoons appeared silently side-by-side; at the 2-second mark, one animal made a sound; at the 9-second mark, ferns descended, and then either the same or a different animal made a sound. |
Figure 1. Visual schematic of the two types of trials in the test phase of the experiment: the Same Speaker trial and the Different Speaker trial. The highlighted region represents the window of analysis (WoA), which was 2 s long.
Figure 2. Twelve-month-old infants' looking data during the critical WoA in the test phase, separated by trial type (same speaker vs. different speaker) and talker information (high talker vs. low talker). The looking data for animal trials are also represented on this graph. A value above 0.5 represents proportionally longer looking time to the target voice. The dotted line at 0.5 marks equal proportion of looking to both faces on the screen. Error bars represent standard error.
Figure 3. Twenty-four-month-old infants' looking data during the critical WoA in the test phase, separated by trial type (same speaker vs. different speaker) and talker information (high talker vs. low talker). The looking data for animal trials are also represented on this graph. A value above 0.5 represents proportionally longer looking time to the target voice. The dotted line at 0.5 marks equal proportion of looking to both faces on the screen. Error bars represent standard error.