Yuna Jhang, Beau Franklin, Heather L. Ramsdell-Hudock, D. Kimbrough Oller.
Abstract
Seeking roots of language, we probed infant facial expressions and vocalizations. Both have roles in language, but the voice plays an especially flexible role, expressing a variety of functions and affect conditions with the same vocal categories: a word can be produced with many different affective flavors. This requirement of language is seen in very early infant vocalizations. We examined the extent to which affect is transmitted by early vocal categories termed "protophones" (squeals, vowel-like sounds, and growls) and by their co-occurring facial expressions, and similarly the extent to which vocal type is transmitted by the voice and co-occurring facial expressions. Our coder agreement data suggest infant affect during protophones was most reliably transmitted by the face (judged in video-only), while vocal type was transmitted most reliably by the voice (judged in audio-only). Voice alone transmitted negative affect more reliably than neutral or positive affect, suggesting infant protophones may be used especially to call for attention when the infant is in distress. By contrast, the face alone provided no significant information about protophone categories. Indeed, coders in the video-only condition could scarcely recognize the difference between silence and voice when coding protophones. The results suggest that partial decoupling of communicative roles for face and voice occurs even in the first months of life. Affect in infancy appears to be transmitted in a way that flexibly interweaves audio and video aspects, as in mature language.
Keywords: communication; facial affect; infant vocalization; multimodal communication; vocal affect
Year: 2017 PMID: 29423398 PMCID: PMC5798486 DOI: 10.3389/fcomm.2017.00010
Source DB: PubMed Journal: Front Commun (Lausanne) ISSN: 2297-900X
FIGURE 1. The data indicate that audio-only (AU) (blue bar in the right-hand cluster) was about as effective in transmitting negativity as video-only (VID) (red bar in the right-hand cluster), but that AU was considerably less effective in transmitting positivity or neutrality than the other conditions (blue bars in the middle and left-hand clusters). VID (red bars in the middle and left-hand clusters) was significantly more reliable than AU for positivity and neutrality. Only for negativity did the audio-video (AV) condition (yellow bar, right-hand cluster) yield the highest agreement. 95% confidence intervals are included. Kappas at the top of the figure are means.
Nine recordings from nine infants spanning the first year were drawn from a larger study (Oller et al., 2013).
(A) = number of protophones; (B) = number of cries and laughs. The affect analyses used the same utterance set across all conditions (AU, VID, and AV); the vocal type analyses are reported separately per condition.

| Infant | Age (months, weeks) | Affect (A) | Affect (B) | Vocal type, AU (A) | AU (B) | VID (A) | VID (B) | AV (A) | AV (B) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3, 1 | 175 | 7 | 186 | 4 | 133 | 18 | 184 | 12 |
| 2 | 3, 3 | 99 | 22 | 81 | 21 | 40 | 1 | 75 | 22 |
| 3 | 4, 1 | 115 | 20 | 114 | 10 | 116 | 7 | 106 | 15 |
| 4 | 7, 0 | 135 | 0 | 206 | 4 | 124 | 1 | 209 | 4 |
| 5 | 7, 1 | 103 | 6 | 122 | 10 | 89 | 4 | 111 | 31 |
| 6 | 10, 1 | 139 | 0 | 179 | 2 | 141 | 0 | 186 | 0 |
| 7 | 10, 2 | 66 | 10 | 58 | 14 | 61 | 10 | 61 | 14 |
| 8 | 11, 1 | 111 | 1 | 156 | 12 | 82 | 4 | 152 | 8 |
| 9 | 11, 2 | 76 | 29 | 135 | 4 | 92 | 0 | 130 | 5 |
| Sum | | 1,019 | 95 | 1,237 | 81 | 878 | 45 | 1,214 | 111 |
| Mean utterances/session | | 113.2 | 10.6 | 137.4 | 9.0 | 97.6 | 5.0 | 134.9 | 12.3 |
| SD | | 37.1 | 10.7 | 49.3 | 6.2 | 33.9 | 5.9 | 51.6 | 9.7 |
Over a thousand utterances were coded for affect by seven coders in three conditions (AU, VID, and AV), and for vocal type in the same three conditions by two coders. As in the prior study, the number of cry and laugh utterances was, according to the coding, low (<10%) compared to the number of protophones.
Proportions of cases where facial and vocal affect were judged discordantly in AU and VID, for each affect type.
(A) Denominator = number of utterances judged to have the specified affect in AU; numerator = number of those where the VID judgment was discordant with the AU judgment. (B) Denominator = number of utterances judged to have the specified affect in VID; numerator = number of those where the AU judgment was discordant with the VID judgment.

| Affect | (A) VID discordant with AU | (B) AU discordant with VID |
|---|---|---|
| Positive | 0.23, 61/262 | 0.37, 136/369 |
| Neutral | 0.24, 104/428 | 0.23, 99/441 |
| Negative | 0.37, 99/267 | 0.13, 24/182 |
(A) Cases where affect was judged in AU as positive, neutral, or negative by a majority (at least four) of the seven coders (the denominator in each of the six cells), but was not judged concordantly in VID by at least three of the coders (the numerator in each of the six cells). (B) Cases where affect was judged in VID as positive, neutral, or negative by a majority, but was not judged concordantly in AU by at least three of the coders. Each cell represents a proportion. In all six cells our expectation was violated: discordant judgments of affect from VID and AU were not rare (always > 0.10), and overall accounted for about a quarter of judgments. The sum of the denominators does not reach the total N of 1,019 because in some cases (<10%) the seven coders produced no majority for any affect type (e.g., three positive, two negative, and two neutral).
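The "about a quarter" summary can be checked directly from the numerators and denominators in the table; the following is a minimal arithmetic sketch (the variable names are ours):

```python
# (numerator, denominator) per affect type, from columns (A) and (B) of the table
au_ref = {"positive": (61, 262), "neutral": (104, 428), "negative": (99, 267)}
vid_ref = {"positive": (136, 369), "neutral": (99, 441), "negative": (24, 182)}

for label, col in (("(A) AU-referenced", au_ref), ("(B) VID-referenced", vid_ref)):
    num = sum(n for n, _ in col.values())
    den = sum(d for _, d in col.values())
    # overall discordance rate, pooling the three affect types
    print(f"{label}: {num}/{den} = {num / den:.2f}")
```

The pooled rates come out to 0.28 (264/957) and 0.26 (259/992), i.e., roughly a quarter of majority judgments in each direction were discordant across modalities.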
For cases where AU and VID judgments of affect were non-concordant, the table presents the proportions where (A) AV judgments agreed with AU judgments or (B) with VID judgments for each of the three affect types.
| Affect | (A) AV agrees with AU when VID is discordant with AU | (B) AV agrees with VID when AU is discordant with VID |
|---|---|---|
| Positivity | 0.05, 3/61 | 0.76, 103/136 |
| Neutrality | 0.10, 10/104 | 0.69, 68/99 |
| Negativity | 0.16, 16/99 | 0.75, 18/24 |
The proportions of AV judgments agreeing with AU were significantly lower (p < 0.001, by chi-square tests) than those agreeing with VID for all affect types, suggesting that when vocal and facial affect judgments conflict, the AV judgments tend strongly to agree with VID. As in other cases, the data suggest that facial expression tends to predominate in affect judgments.
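The significance claim can be sanity-checked from the cell counts in the table. The paper does not specify the exact contingency construction, so the sketch below makes an assumption: each affect type is cast as a 2x2 table of (AV agrees, AV disagrees) counts for the AU-referenced versus VID-referenced discordant cases, and the Pearson chi-square statistic (no continuity correction) is compared against the df = 1 critical value for alpha = 0.001:

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a 2x2 contingency table (no correction)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    # sum of (observed - expected)^2 / expected over all four cells
    return sum(
        (table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )

CRIT_P001_DF1 = 10.828  # chi-square critical value, df = 1, alpha = 0.001

# rows: AU-referenced (A), VID-referenced (B); cols: AV agrees, AV disagrees
tables = {
    "positivity": [[3, 58], [103, 33]],
    "neutrality": [[10, 94], [68, 31]],
    "negativity": [[16, 83], [18, 6]],
}
for affect, t in tables.items():
    stat = pearson_chi2(t)
    print(f"{affect}: chi2 = {stat:.1f}, p < 0.001: {stat > CRIT_P001_DF1}")
```

Under this construction, all three statistics come out well above the critical value, consistent with the reported p < 0.001 for every affect type.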
FIGURE 2. Chance level for these correlations is 0; thus video-only (VID) for squeals, vocants, and growls was not above chance level. The very low correlations suggest that VID provided little if any significant information about protophone type. On the other hand, audio-only (AU) and audio-video (AV) both yielded significant information on all three vocal types. 95% confidence intervals are included. Kappas at the top of the figure are means.
(A) Data for coder 1 and (B) data for coder 2.
| | | Silent | Not silent |
|---|---|---|---|
| (A) Coder 1 | Silent | 24 | 91 |
| | Not silent | 79 | 1,050 |
| (B) Coder 2 | Silent | 29 | 87 |
| | Not silent | 81 | 1,143 |
When coders used VID to try to detect silent periods (as distinct from protophones), the task was very difficult. False positives (upper right cells for both coders) and misses (lower left cells) substantially outnumbered hits (true positives, upper left cells). Still, observed hits and correct rejections (true negatives, lower right cells) were higher than expected by chance. Kappa after correcting for chance was slight: 0.15 for coder 1, with 95% CI [0.07, 0.22] and 0.18 for coder 2, with 95% CI [0.11, 0.27].
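The chance correction can be reproduced from the confusion matrices above. This is a minimal sketch of Cohen's kappa computed from the reported cell counts alone (the function name `cohens_kappa` is ours):

```python
def cohens_kappa(matrix):
    """Chance-corrected agreement (Cohen's kappa) for a 2x2 confusion matrix."""
    n = sum(sum(row) for row in matrix)
    po = (matrix[0][0] + matrix[1][1]) / n  # observed agreement (diagonal)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # expected chance agreement from the marginal totals
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / n**2
    return (po - pe) / (1 - pe)

# cell counts from the table: rows = judged silent / not silent in VID,
# columns = silent / not silent
coder1 = [[24, 91], [79, 1050]]
coder2 = [[29, 87], [81, 1143]]
print(f"coder 1 kappa: {cohens_kappa(coder1):.3f}")
print(f"coder 2 kappa: {cohens_kappa(coder2):.3f}")
```

These counts yield roughly 0.146 and 0.188, agreeing with the reported kappas of 0.15 and 0.18 to within about 0.01 (any residual difference presumably reflects rounding).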