Sasha K Sturdy, David R R Smith, David N George.
Abstract
The perceived pitch of human voices is highly correlated with the fundamental frequency (f0) of the laryngeal source, which is determined largely by the length and mass of the vocal folds. The vocal folds are larger in adult males than in adult females, and men's voices consequently have a lower pitch than women's. The length of the supralaryngeal vocal tract (vocal-tract length; VTL) affects the resonant frequencies (formants) of speech which characterize the timbre of the voice. Men's longer vocal tracts produce lower frequency, and less dispersed, formants than women's shorter vocal tracts. Pitch and timbre combine to influence the perception of speaker characteristics such as size and age. Together, they can be used to categorize speaker sex with almost perfect accuracy. While it is known that domestic dogs can match a voice to a person of the same sex, there has been no investigation into whether dogs are sensitive to the correlation between pitch and timbre. We recorded a female voice giving three commands ('Sit', 'Lay down', 'Come here'), and manipulated the recordings to lower the fundamental frequency (thus lowering pitch), increase simulated VTL (hence affecting timbre), or both (synthesized adult male voice). Dogs responded to the original adult female and synthesized adult male voices equivalently. Their tendency to obey the commands was, however, reduced when either pitch or timbre was manipulated alone. These results suggest that dogs are sensitive to both the pitch and timbre of human voices, and that they learn about the natural covariation of these perceptual attributes.
Keywords: Dog; Glottal-pulse rate; Pitch; Speech; Timbre; Vocal-tract length
Year: 2021 PMID: 34714438 PMCID: PMC9107418 DOI: 10.1007/s10071-021-01567-4
Source DB: PubMed Journal: Anim Cogn ISSN: 1435-9448 Impact factor: 2.899
Fig. 1 Schematic of the manipulations of the recordings. The original recording of an adult female voice (top-left) was adjusted by either simulating a reduction in the glottal-pulse rate (GPR; top-right) alone, simulating an increase in the vocal-tract length (VTL; bottom-left) alone, or generating a synthesized adult male voice by performing both manipulations together (bottom-right). The dashed circle indicates the original position in GPR-VTL space of the adult female voice
Information about the dogs that participated in the experiment
| Dog | Breed | Age (years) | Sex |
|---|---|---|---|
| Annie | Border Terrier | 4 | Female |
| Aukan | German Shepherd | 9 | Male |
| Balu | Lurcher | 15 | Male |
| Cooper | Labrador Retriever | 7 | Male |
| Hattie | Border Terrier | 15 | Female |
| Honey | Romanian Shepherd | 3 | Female |
| Iggy | Labrador Retriever | 2 | Male |
| Merckx | Belgian Malinois | 1 | Male |
| Puck | Belgian Malinois | 2 | Male |
| Skye | Border Collie | 9 | Female |
Fig. 2 Spectrograms of recordings of the four versions of the ‘Come here’ command: the original recording (top-left), with simulated GPR reduced (top-right), with simulated VTL increased (bottom-left), and the synthesized male voice with both simulated GPR reduced and VTL increased (bottom-right). Darker greys correspond to higher energy values. The concentration of energy (darker greys) at certain frequencies marks the formants of speech; notice how they drop in frequency when the VTL is increased (within-column change), denoting a change from a woman’s to a man’s VTL. The vertical striations mark the vocal fold vibrations; notice how the striations move apart when the GPR is reduced (within-row change), denoting a change from a woman’s to a man’s GPR. The text above each spectrogram shows the approximate distribution of speech sounds over time transcribed using the International Phonetic Alphabet (International Phonetic Association 1999; ‘Come here’ → /kʌm/ /hɪə/)
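The two effects described in the caption above can be illustrated with a minimal sketch. It assumes the textbook idealization of the vocal tract as a uniform tube closed at one end, which is not the authors' synthesis method. The speed of sound, the function names, and the example VTL and f0 values in the usage note are illustrative assumptions, not figures from the study.

```python
# Idealized quarter-wavelength resonator: F_k = (2k - 1) * c / (4 * L).
# This is a textbook approximation, NOT the manipulation used in the study.

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def tube_formants(vtl_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube of length vtl_cm, closed at one end."""
    return [
        (2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * vtl_cm)
        for k in range(1, n_formants + 1)
    ]


def formant_dispersion(formants):
    """Average spacing (Hz) between adjacent formants."""
    return (formants[-1] - formants[0]) / (len(formants) - 1)


def glottal_period_ms(f0_hz):
    """Time between glottal pulses (ms); sets the striation spacing."""
    return 1000.0 / f0_hz
```

With hypothetical tract lengths of 14 cm and 17.5 cm, `tube_formants` gives lower and more closely spaced resonances for the longer tube, matching the downward shift of the formant bands within a column. Halving a hypothetical f0 from 200 Hz to 100 Hz doubles `glottal_period_ms`, matching the wider striation spacing within a row.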
Fig. 3 Mean proportion of correct responses that the dogs made to the commands. The three columns show the responses to each of the three commands (left: ‘Come here’; centre: ‘Lay down’; right: ‘Sit’). The top row shows data from the first block of 60 trials, and the bottom row shows data from the second block of 60 trials. Error bars indicate one standard error of the mean and the circles represent individual subjects’ data
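As a rough illustration of the summary statistics named in the caption above, the sketch below computes a mean proportion and one standard error of the mean across subjects. It assumes the conventional between-subject estimate (sample standard deviation divided by the square root of the number of subjects); the record does not state which estimator the authors used, and the function name and input values are hypothetical.

```python
import math
import statistics


def mean_and_sem(proportions):
    """Return (mean, standard error of the mean) for per-subject proportions.

    Assumes the conventional estimator: sample SD (n - 1 denominator)
    divided by sqrt(n). Requires at least two subjects.
    """
    n = len(proportions)
    mean = statistics.fmean(proportions)
    sem = statistics.stdev(proportions) / math.sqrt(n)
    return mean, sem


# Hypothetical per-dog proportions of correct responses (not study data).
example = [0.2, 0.4, 0.6, 0.8]
example_mean, example_sem = mean_and_sem(example)
```

Each value in the input would correspond to one circle in a panel, and the returned pair to the bar height and the half-length of its error bar.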