Sven Grawunder, Natalie Uomini, Liran Samuni, Tatiana Bortolato, Cédric Girard-Buttoz, Roman M Wittig, Catherine Crockford.
Abstract
The origins of human speech are obscure; it is still unclear what aspects are unique to our species or shared with our evolutionary cousins, in part due to a lack of a common framework for comparison. We asked what chimpanzee and human vocal production acoustics have in common. We examined visible supra-laryngeal articulators of four major chimpanzee vocalizations (hoos, grunts, barks, screams) and their associated acoustic structures, using techniques from human phonetic and animal communication analysis. Data were collected from wild adult chimpanzees, Taï National Park, Ivory Coast. Both discriminant and principal component classification procedures revealed classification of call types. Discriminating acoustic features include voice quality and formant structure, mirroring phonetic features in human speech. Chimpanzee lip and jaw articulation variables also offered similar discrimination of call types. Formant maps distinguished call types with different vowel-like sounds. Comparing our results with published primate data, humans show less F1-F2 correlation and further expansion of the vowel space, particularly for [i] sounds. Unlike recent studies suggesting monkeys achieve human vowel space, we conclude from our results that supra-laryngeal articulatory capacities show moderate evolutionary change, with vowel space expansion continuing through hominoid evolution. Studies on more primate species will be required to substantiate this. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
Keywords: chimpanzees; evolution of language; formants; hominoid; primate; speech
Year: 2021 PMID: 34775819 PMCID: PMC8591386 DOI: 10.1098/rstb.2020.0455
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. (a) Spectrograms of the four major chimpanzee vocalization types included. (b) Articulatory parameters visualizing the categorical coding scheme, with visual examples for each cell (lip protrusion and lip rounding: all categories represented; jaw position: 2 of 4 categories shown (fully closed (nasal emission), close (limited opening), mid, open (wide open)), shown in electronic supplementary material, figure S3; see electronic supplementary material, table S2 for category definitions). Asterisk, not expected to occur/be feasible in the chimpanzee repertoire. Empty squares are expected to occur but were not represented in our sample. Photo credits: Liran Samuni, Cat Hobaiter.
Number of breath units (BU) per call type for acoustic and articulatory data.
| data subset | call type | number of BU | number of chimpanzees (>10 yr old) |
|---|---|---|---|
| BUs with acoustic measures, including only two non-adjacent BUs per call type per call bout | hoo | 140 | 18 |
| | grunt | 113 | 20 |
| | bark | 121 | 24 |
| | scream | 31 | 12 |
| BUs with acoustic measures | hoo | 344 | 18 |
| | grunt | 230 | 20 |
| | bark | 169 | 24 |
| | scream | 73 | 12 |
| BUs with articulatory measures | hoo | 282 | 7 |
| | grunt | 67 | 10 |
| | bark | 67 | 10 |
| | scream | 55 | 7 |
Principal component classification of four chimpanzee call types using acoustic variables, showing the principal component loadings and the proportion of variance explained by each principal component.
| acoustic variable | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 |
|---|---|---|---|---|---|---|---|---|
| centre of gravity | −0.323 | 0.388 | −0.453 | 0.160 | −0.391 | 0.236 | 0.534 | 0.139 |
| harmonics to noise ratio | −0.106 | −0.677 | 0.268 | −0.054 | 0.012 | 0.030 | 0.673 | −0.027 |
| intensity slope | −0.123 | −0.085 | 0.187 | 0.961 | 0.033 | −0.074 | −0.099 | 0.056 |
| F0 sd | −0.307 | −0.139 | −0.583 | 0.021 | 0.662 | −0.324 | 0.048 | −0.025 |
| duration | −0.235 | −0.532 | −0.352 | −0.006 | −0.245 | 0.519 | −0.455 | −0.026 |
| F1 | −0.536 | 0.209 | 0.280 | −0.056 | 0.085 | 0.139 | −0.028 | −0.748 |
| F2 | −0.491 | 0.139 | 0.378 | −0.161 | 0.300 | 0.267 | −0.092 | 0.632 |
| F0 | −0.435 | −0.127 | 0.048 | −0.139 | −0.501 | −0.687 | −0.184 | 0.127 |
| SD | 1.60 | 1.25 | 1.04 | 0.98 | 0.82 | 0.78 | 0.632 | 0.41 |
| proportion of variance | 0.32 | 0.19 | 0.14 | 0.12 | 0.085 | 0.076 | 0.051 | 0.02 |
| cumulative proportion | 0.31 | 0.51 | 0.64 | 0.77 | 0.85 | 0.93 | 0.98 | 1.00 |
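The last three rows of the table are linked: the variance of each principal component is the square of its standard deviation, and the proportion of variance is that value divided by the sum over all components. A minimal sketch of this arithmetic follows, using the SD values from the table. The function and variable names are illustrative and not taken from the source, and because the published SDs are rounded, the recomputed proportions can differ from the tabulated ones in the last decimal place.

```python
# Standard deviations of PC1..PC8, copied from the "SD" row of the table.
pc_sd = [1.60, 1.25, 1.04, 0.98, 0.82, 0.78, 0.632, 0.41]


def variance_proportions(sds):
    """Return each component's share of total variance (variance = SD squared)."""
    variances = [sd ** 2 for sd in sds]
    total = sum(variances)
    return [v / total for v in variances]


def cumulative(values):
    """Return the running sum of a list of values."""
    running = 0.0
    out = []
    for v in values:
        running += v
        out.append(running)
    return out


proportions = variance_proportions(pc_sd)
cumulative_proportions = cumulative(proportions)

for i, (p, c) in enumerate(zip(proportions, cumulative_proportions), start=1):
    print(f"PC{i}: proportion={p:.3f} cumulative={c:.3f}")
```

The squared SDs sum to roughly 8, the number of acoustic variables, which is what one would expect if the variables were standardized before the analysis. That is an inference from the numbers, not something stated in this excerpt.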
Figure 2. Principal component analysis clustering four chimpanzee call types. (a) The distribution of the maximum variation between call types across the first three principal components. (b) PC1 loads principally on fundamental frequency and formants, PC2 on HNR and duration, PC3 on COG and F0 sd. (c) The distribution of the maximum variation between call types across the first three discriminant functions in the linear discriminant analysis including acoustic variables (LDA). (d) LDA: discriminant function loadings for functions 1–3.
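The description of the loadings in panel (b) can be checked against the principal component loadings table by ranking the acoustic variables by absolute loading within each component. The sketch below does this for PC1 to PC3 using the tabulated values; the helper name `top_variables` is hypothetical and is only one simple way to read a loadings table.

```python
# Acoustic variables in the row order of the loadings table.
variables = [
    "centre of gravity", "harmonics to noise ratio", "intensity slope",
    "F0 sd", "duration", "F1", "F2", "F0",
]

# Loadings for the first three components, copied from the table.
loadings = {
    "PC1": [-0.323, -0.106, -0.123, -0.307, -0.235, -0.536, -0.491, -0.435],
    "PC2": [0.388, -0.677, -0.085, -0.139, -0.532, 0.209, 0.139, -0.127],
    "PC3": [-0.453, 0.268, 0.187, -0.583, -0.352, 0.280, 0.378, 0.048],
}


def top_variables(pc, n):
    """Return the n variables with the largest absolute loading on a component."""
    ranked = sorted(
        zip(variables, loadings[pc]),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]


print("PC1:", top_variables("PC1", 3))
print("PC2:", top_variables("PC2", 2))
print("PC3:", top_variables("PC3", 2))
```

The sign of a loading is ignored here because the direction of a principal component is arbitrary; only the magnitude indicates how strongly a variable contributes.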
Permuted discriminant analysis of four chimpanzee call types using (a) acoustic variables with all data and (b) visually defined jaw and lip articulatory variables.
| measure | acoustic | articulatory |
|---|---|---|
| no. correct cross classified | 428.98 | 256.75 |
| no. expected correct cross classified (cc) | 213.74 | 129.37 |
| % correct cc | 61.72 | 70.54 |
| % expected correct cc | 30.75 | 35.54 |
| p-value | 0.001 | 0.001 |
| no. randomized cases/DFA | 719 | 398 |
| no. cases selected to construct discriminant functions | 24 | 34 |
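The percentages in this table can be reproduced from the counts if the denominator is taken to be the number of randomized cases minus the number of cases used to construct the discriminant functions (719 − 24 = 695 for acoustic, 398 − 34 = 364 for articulatory). This denominator is an inference from the reported numbers, not something stated in this excerpt, and the names in the sketch below are illustrative.

```python
# Values copied from the permuted discriminant analysis table.
acoustic = {"correct": 428.98, "expected": 213.74, "randomized": 719, "selected": 24}
articulatory = {"correct": 256.75, "expected": 129.37, "randomized": 398, "selected": 34}


def percent_cross_classified(count, randomized, selected):
    """Percentage of cross-classified cases, assuming the cases used to build
    the discriminant functions are excluded from the denominator."""
    return 100.0 * count / (randomized - selected)


def summarize(row):
    """Return (% correct, % expected correct, ratio of observed to expected)."""
    observed = percent_cross_classified(row["correct"], row["randomized"], row["selected"])
    expected = percent_cross_classified(row["expected"], row["randomized"], row["selected"])
    return observed, expected, observed / expected


acoustic_summary = summarize(acoustic)
articulatory_summary = summarize(articulatory)

for label, (obs, exp, ratio) in [("acoustic", acoustic_summary),
                                 ("articulatory", articulatory_summary)]:
    print(f"{label}: {obs:.2f}% correct vs {exp:.2f}% expected ({ratio:.2f}x chance)")
```

Under this assumption both variable sets classify call types at about twice the rate expected by chance.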
Figure 3. (a) Formant plot of current dataset (14 female/10 male chimpanzees) with superimposed human vowels (ellipses) (taken from [40]). Although there is more correlation between F1 and F2 than in humans, chimpanzee vocalizations similar to human [u], [ε] and [a] are emitted with F1 and F2 values commensurate with similar-sounding human vowels. (b) Formant plot comparing our chimpanzee vocalizations with other primate species, with data drawn from other studies: 15 baboons [4] and one rhesus macaque [6]. Interpret with caution due to expected species differences in vocal tract length and the selection of only some calls per repertoire.