Hyunjoo Yoo, Eugene H Buder, Dale D Bowman, Gavin M Bidelman, D Kimbrough Oller.
Abstract
Prior research has not evaluated acoustic features contributing to perception of human infant vocal distress or lack thereof on a continuum. The present research evaluates perception of infant vocalizations along a continuum ranging from the most prototypical intensely distressful cry sounds ("wails") to the most prototypical of infant sounds that typically express no distress (non-distress "vocants"). Wails are deemed little if at all related to speech while vocants are taken to be clear precursors to speech. We selected prototypical exemplars of utterances representing the whole continuum from 0 and 1 month-olds. In this initial study of the continuum, our goals are to determine (1) listener agreement on level of vocal distress across the continuum, (2) acoustic parameters predicting ratings of distress, (3) the extent to which individual listeners maintain or change their acoustic criteria for distress judgments across the study, (4) the extent to which different listeners use similar or different acoustic criteria to make judgments, and (5) the role of short-term experience among the listeners in judgments of infant vocalization distress. 
Results indicated that (1) both inter-rater and intra-rater listener agreement on degree of vocal distress was high, (2) the best predictors of vocal distress were number of vibratory regimes within utterances, utterance duration, spectral ratio (spectral concentration) in vibratory regimes within utterances, and mean pitch, (3) individual listeners significantly modified their acoustic criteria for distress judgments across the 10 trial blocks, (4) different listeners, while showing overall similarities in ratings of the 42 stimuli, also showed significant differences in acoustic criteria used in assigning the ratings of vocal distress, and (5) listeners who were both experienced and inexperienced in infant vocalizations coding showed high agreement in rating level of distress, but differed in the extent to which they relied on the different acoustic cues in making the ratings. The study provides clearer characterization of vocal distress expression in infants based on acoustic parameters and a new perspective on active adult perception of infant vocalizations. The results also highlight the importance of vibratory regime segmentation and analysis in acoustically based research on infant vocalizations and their perception.
Keywords: acoustic analysis; active perception; adult perception; babbling; cry; distress sounds; fuss; infant vocalizations
Year: 2019 PMID: 31191389 PMCID: PMC6548812 DOI: 10.3389/fpsyg.2019.01154
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
FIGURE 1. Number of mean ratings across the ten trials within all listeners across the entire 100-point scale in intervals of size 5. The figure illustrates that the entire rating scale was used by the listeners, that is, that ratings occurred within all the intervals of possible ratings. To understand the figure, note that each of 39 listeners produced 42 mean ratings over 10 trials on each of the 42 stimuli. Thus the figure represents 42 × 39 = 1638 mean ratings organized in 20 intervals. For example, the interval from 0 to 5 accounted for 94 mean ratings. The interval with the largest number of ratings (129) was 11 to 15, and the intervals with the smallest number of mean ratings (57) were tied at intervals 56 to 60, 61 to 65, and 96 to 100.
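The binning described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration: the function names and the exact treatment of interval boundaries (0 to 5 inclusive in the first interval, then half-open steps of 5) are assumptions, since the caption does not state how fractional mean ratings on a boundary were assigned.

```python
import math

N_LISTENERS = 39
N_STIMULI = 42
N_INTERVALS = 20  # 100-point scale divided into intervals of size 5


def interval_index(mean_rating: float) -> int:
    """Return the 0-based index of the interval that holds a mean rating.

    Assumed convention (not stated in the source): index 0 covers 0 to 5
    inclusive, index 1 covers ratings above 5 up to 10, and so on, so that
    index 19 covers ratings above 95 up to 100.
    """
    if not 0 <= mean_rating <= 100:
        raise ValueError("rating must lie on the 0 to 100 scale")
    if mean_rating <= 5:
        return 0
    return math.ceil(mean_rating / 5) - 1


def histogram(mean_ratings):
    """Count how many mean ratings fall into each of the 20 intervals."""
    counts = [0] * N_INTERVALS
    for rating in mean_ratings:
        counts[interval_index(rating)] += 1
    return counts


# One mean rating per listener per stimulus, as stated in the caption.
total_mean_ratings = N_STIMULI * N_LISTENERS
```

Applied to the full set of listener-by-stimulus means, `histogram` would return 20 counts summing to `total_mean_ratings`, which is the quantity the figure plots.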
Mean distress ratings of the 39 listeners on the 42 stimuli.
| No. | Mean distress ratings (0 to 100) | Duration (ms) | Average pitch (Hz) | Spectral ratio | Number of regimes |
|---|---|---|---|---|---|
| 1 | 3.93 | 759 | 318 | 19.8 | 1 |
| 2 | 11.30 | 591 | 397 | 21.2 | 1 |
| 3 | 11.30 | 701 | 332.3 | 34.8 | 1 |
| 4 | 13.83 | 679 | 357.4 | 23 | 1 |
| 5 | 14.19 | 706 | 396.4 | 19.7 | 1 |
| 6 | 14.62 | 570 | 408.9 | 26.2 | 1 |
| 7 | 15.05 | 536 | 416.1 | 14.2 | 1 |
| 8 | 16.23 | 547 | 430.1 | 6.3 | 2 |
| 9 | 16.79 | 452 | 420.6 | 8.8 | 1 |
| 10 | 20.86 | 762 | 372.8 | 2.3 | 1 |
| 11 | 22.15 | 544 | 363.9 | 12.2 | 2 |
| 12 | 26.11 | 73 | 298.2 | 10.7 | 2 |
| 13 | 26.79 | 996 | 396.3 | 13.8 | 1 |
| 14 | 27.10 | 860 | 459.5 | 21.7 | 1 |
| 15 | 31.03 | 633 | 437.6 | 5 | 4 |
| 16 | 31.37 | 740 | 436.5 | 16 | 3 |
| 17 | 33.89 | 781 | 372.6 | 10.5 | 2 |
| 18 | 36.48 | 615 | 382.7 | 1 | 2 |
| 19 | 36.71 | 650 | 474.5 | 12.3 | 1 |
| 20 | 38.99 | 1079 | 366.5 | 6.6 | 2 |
| 21 | 39.81 | 1085 | 479.8 | 18.4 | 1 |
| 22 | 40.12 | 1964 | 494.6 | 23.7 | 1 |
| 23 | 40.77 | 836 | 396.4 | 6.7 | 1 |
| 24 | 41.39 | 1512 | 500.8 | 12.9 | 1 |
| 25 | 44.85 | 1096 | 316.3 | 4 | 2 |
| 26 | 46.84 | 707 | 509.2 | 12 | 3 |
| 27 | 51.89 | 855 | 385.4 | −1.7 | 1 |
| 28 | 58.34 | 1401 | 390.9 | −3.1 | 3 |
| 29 | 68.08 | 1976 | 428.5 | −2 | 1 |
| 30 | 70.84 | 853 | 442.7 | 0.2 | 3 |
| 31 | 71.22 | 1252 | 435.9 | 9.9 | 3 |
| 32 | 76.26 | 1206 | 432.3 | −12.3 | 3 |
| 33 | 77.97 | 1281 | 383.8 | 2 | 3 |
| 34 | 78.77 | 815 | 451.4 | −1.4 | 3 |
| 35 | 79.42 | 1215 | 386.1 | −8.1 | 3 |
| 36 | 79.51 | 1712 | 441.8 | −9.6 | 3 |
| 37 | 79.93 | 1597 | 424.2 | 0.2 | 3 |
| 38 | 80.31 | 988 | 505.5 | 9.5 | 3 |
| 39 | 83.44 | 1743 | 373.7 | −4.9 | 3 |
| 40 | 83.97 | 2000 | 384.9 | −15.1 | 5 |
| 41 | 90.70 | 1361 | 378.3 | −3.1 | 4 |
| 42 | 91.29 | 1891 | 423.6 | −8.9 | 5 |
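The direction of the relationship between each acoustic measure and the mean distress rating can be checked directly from the table above. The following is a minimal sketch, not the authors' analysis code: the values are transcribed from the table, and the variable names and the hand-written `pearson_r` helper are illustrative assumptions.

```python
import math

# Values transcribed from the table above, in stimulus order 1 to 42.
ratings = [3.93, 11.30, 11.30, 13.83, 14.19, 14.62, 15.05, 16.23, 16.79,
           20.86, 22.15, 26.11, 26.79, 27.10, 31.03, 31.37, 33.89, 36.48,
           36.71, 38.99, 39.81, 40.12, 40.77, 41.39, 44.85, 46.84, 51.89,
           58.34, 68.08, 70.84, 71.22, 76.26, 77.97, 78.77, 79.42, 79.51,
           79.93, 80.31, 83.44, 83.97, 90.70, 91.29]
duration_ms = [759, 591, 701, 679, 706, 570, 536, 547, 452, 762, 544, 73,
               996, 860, 633, 740, 781, 615, 650, 1079, 1085, 1964, 836,
               1512, 1096, 707, 855, 1401, 1976, 853, 1252, 1206, 1281,
               815, 1215, 1712, 1597, 988, 1743, 2000, 1361, 1891]
pitch_hz = [318, 397, 332.3, 357.4, 396.4, 408.9, 416.1, 430.1, 420.6,
            372.8, 363.9, 298.2, 396.3, 459.5, 437.6, 436.5, 372.6, 382.7,
            474.5, 366.5, 479.8, 494.6, 396.4, 500.8, 316.3, 509.2, 385.4,
            390.9, 428.5, 442.7, 435.9, 432.3, 383.8, 451.4, 386.1, 441.8,
            424.2, 505.5, 373.7, 384.9, 378.3, 423.6]
spectral_ratio = [19.8, 21.2, 34.8, 23, 19.7, 26.2, 14.2, 6.3, 8.8, 2.3,
                  12.2, 10.7, 13.8, 21.7, 5, 16, 10.5, 1, 12.3, 6.6, 18.4,
                  23.7, 6.7, 12.9, 4, 12, -1.7, -3.1, -2, 0.2, 9.9, -12.3,
                  2, -1.4, -8.1, -9.6, 0.2, 9.5, -4.9, -15.1, -3.1, -8.9]
n_regimes = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 4, 3, 2, 2, 1, 2, 1,
             1, 1, 1, 2, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 4, 5]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


correlations = {
    "duration": pearson_r(duration_ms, ratings),
    "average pitch": pearson_r(pitch_hz, ratings),
    "spectral ratio": pearson_r(spectral_ratio, ratings),
    "number of regimes": pearson_r(n_regimes, ratings),
}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

Because the table lists only four of the acoustic parameters, this sketch covers only those four; duration and number of regimes should come out positively related to rated distress and spectral ratio negatively related.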
FIGURE 2. Pearson correlations between each of the nine acoustic parameters selected for the full model and the mean perception ratings of distress level. Average Pitch (f0 mean) represents mean fundamental frequency within each utterance. Max pitch represents the maximum f0 within each utterance. Max amplitude (peak of the root-mean-square amplitude) represents maximum amplitude (in volts) across each utterance. Spectral ratio represents the ratio for each utterance of spectral energy below 2 kHz to the energy above 2 kHz in the regime segment with the minimum ratio. Spectral mean represents the maximum mean of spectral concentration (long-term spectral average) in kHz across the regime segments in each utterance. Spectral dispersion (SD) represents the maximum standard deviation of spectral concentration in kHz across the regime segments in each utterance. Periodicity represents the minimum cepstral peak prominence in dB across the regime segments in each utterance, a measure of periodicity. Number of regimes represents the number of regime segments within each utterance. Error bars = ± 1 SEM.
Standardized coefficients and relative contribution to the full model of each acoustic predictor of distress ratings (∗p < 0.05, ∗∗p < 0.001).
| Predictors | Standardized (β) coefficient | p |
|---|---|---|
| Intercept | 47.0 | <0.00001 |
| Duration (ms) | 8.0 | |
| Average pitch (f0) | 6.2 | 0.13 |
| Max pitch (f0) | −1.8 | 0.67 |
| Max amplitude (RMS) | 2.4 | 0.28 |
| Spectral ratio | −6.9 | 0.10 |
| Spectral mean | 2.3 | 0.65 |
| Spectral dispersion (SD) | 2.4 | 0.59 |
| Periodicity | 1.6 | 0.64 |
| Number of regimes | 9.1 | |
Parameters selected in the backward selection method model for predicting rated level of distress based on acoustic parameters.
| Parameter | Mean (SD) | β | p |
|---|---|---|---|
| Intercept | NA | −19.0 | 0.22 |
| Duration (ms) | 1030.6 (450.9) | 0.02 | <0.0001 |
| Average pitch (f0) | 479.8 (66) | 0.09 | <0.025 |
| Spectral ratio | 1.77 (1.3) | −0.92 | <0.001 |
| Number of regimes | 2.4 (1.4) | 7.0 | <0.003 |
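To make the selected model concrete, the coefficients in the table above can be combined into a linear prediction of rated distress. This is a hedged sketch: it assumes the β column holds unstandardized coefficients on the raw units shown (ms, Hz, spectral ratio, regime count), it uses the rounded values as printed, and the function name `predict_distress` is hypothetical.

```python
# Coefficients as printed in the table above (rounded in the source).
INTERCEPT = -19.0
B_DURATION_MS = 0.02
B_AVERAGE_PITCH_HZ = 0.09
B_SPECTRAL_RATIO = -0.92
B_NUMBER_OF_REGIMES = 7.0


def predict_distress(duration_ms, average_pitch_hz, spectral_ratio, n_regimes):
    """Linear prediction of the 0 to 100 distress rating.

    Assumes the tabled betas are unstandardized; predictions are not
    clipped to the rating scale.
    """
    return (INTERCEPT
            + B_DURATION_MS * duration_ms
            + B_AVERAGE_PITCH_HZ * average_pitch_hz
            + B_SPECTRAL_RATIO * spectral_ratio
            + B_NUMBER_OF_REGIMES * n_regimes)


# Acoustic values for stimulus 1 and stimulus 42 from the stimulus table.
low = predict_distress(759, 318, 19.8, 1)
high = predict_distress(1891, 423.6, -8.9, 5)
print(f"stimulus 1: {low:.1f}, stimulus 42: {high:.1f}")
```

With these rounded coefficients the two predictions come to about 13.6 and 100.1, against observed mean ratings of 3.93 and 91.29, so the sketch reproduces the ordering of the two extremes but not their exact values.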
Intra-rater differences across 10 trials.
| Acoustic parameter | Number of listeners out of 39 with significant trends of variation across 10 trials | Chi-square test for intra-rater variation across 10 trials: chi-square (p) | Chi-square test for intra-rater variation across 10 trials: effect size (w) |
|---|---|---|---|
| Duration | 5 | 1.47 (0.23) | 0.19 |
| Average pitch (f0) | 7 | 3.22 (0.07) | 0.29 |
| Max pitch (f0) | 7 | 3.22 (0.07) | 0.29 |
| Max amplitude (RMS) | 8 | 4.22 (0.04) | 0.33 |
| Spectral ratio | 6 | 2.30 (0.13) | 0.24 |
| Spectral mean | 6 | 2.30 (0.13) | 0.24 |
| Spectral dispersion (SD) | 4 | 4.11 (0.04) | 0.32 |
| Periodicity | 9 | 5.28 (0.02) | 0.37 |
| Number of regimes | 10 | 6.40 (0.01) | 0.41 |
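The effect sizes in the table above are consistent with Cohen's w computed as the square root of chi-square divided by sample size. The sketch below assumes N = 39 (the number of listeners), which the source does not state explicitly for this test; under that assumption the computed values match the reported ones to two decimals.

```python
import math

N_LISTENERS = 39  # assumed sample size for the intra-rater chi-square tests

# (parameter, chi-square, reported effect size w) from the table above.
intra_rater_rows = [
    ("Duration", 1.47, 0.19),
    ("Average pitch (f0)", 3.22, 0.29),
    ("Max pitch (f0)", 3.22, 0.29),
    ("Max amplitude (RMS)", 4.22, 0.33),
    ("Spectral ratio", 2.30, 0.24),
    ("Spectral mean", 2.30, 0.24),
    ("Spectral dispersion (SD)", 4.11, 0.32),
    ("Periodicity", 5.28, 0.37),
    ("Number of regimes", 6.40, 0.41),
]


def cohens_w(chi_square: float, n: int) -> float:
    """Cohen's w effect size for a chi-square statistic: sqrt(chi2 / N)."""
    return math.sqrt(chi_square / n)


for name, chi_square, reported_w in intra_rater_rows:
    computed = cohens_w(chi_square, N_LISTENERS)
    matches = round(computed, 2) == reported_w
    print(f"{name}: computed w = {computed:.3f}, reported {reported_w}, match = {matches}")
```

By Cohen's conventional benchmarks (0.1 small, 0.3 medium, 0.5 large), the largest intra-rater effects here, for periodicity and number of regimes, fall in the medium range.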
Inter-rater differences across 39 listeners.
| Acoustic parameter | Proportion of trials failing to reject the null hypothesis in the permutation test | Chi-square test for inter-rater variation: chi-square (p) | Chi-square test for inter-rater variation: effect size (w) | N |
|---|---|---|---|---|
| Duration | 0.98 | 117.54^ | 0.11 | 9752 |
| Average pitch (f0) | 0.98 | 89.33^ | 0.10 | 9757 |
| Max pitch (f0) | 0.92 | 79.01a | 0.09 | 9747 |
| Max amplitude (RMS) | 0.97 | 74.22^ | 0.09 | 9778 |
| Spectral ratio | 0.80 | 997.68a | 0.32 | 9761 |
| Spectral mean | 0.94 | 9.14 (0.003) | 0.03 | 9756 |
| Spectral dispersion (SD) | 0.77 | 1345.15a | 0.37 | 9770 |
| Periodicity | 0.92 | 57.50a | 0.08 | 9747 |
| Number of regimes | 0.90 | 201.39a | 0.14 | 9734 |
FIGURE 3. Pearson correlations between each of nine acoustic parameters and the mean ratings of distress level by experienced and inexperienced listeners. Spectral Ratio, Spectral Mean, and Spectral Dispersion (SD) were significantly different across groups (see Appendix 4 to view statistics on all 9 parameters). Error bars = ± 1 SEM. ∗p < 0.05, ∗∗p < 0.001.