| Literature DB >> 35142977 |
Anna Gábor1,2, Noémi Kaszás3, Tamás Faragó3, Paula Pérez Fraga4,3, Melinda Lovas3, Attila Andics4,3.
Abstract
Speech carries identity-diagnostic acoustic cues that help individuals recognize each other during vocal-social interactions. In humans, fundamental frequency, formant dispersion and harmonics-to-noise ratio serve as characteristics along which speakers can be reliably separated. The ability to infer a speaker's identity is also adaptive for members of other species (such as companion animals) for whom humans (as owners) are relevant. The acoustic bases of speaker recognition in non-humans are unknown. Here, we tested whether dogs can recognize their owner's voice and whether they rely on the same acoustic parameters for such recognition as humans use to discriminate speakers. Stimuli were pre-recorded sentences spoken by the owner and control persons, played through loudspeakers placed behind two non-transparent screens (each screen hiding a person). We investigated the association between the acoustic distance of speakers (examined along several dimensions relevant in intraspecific voice identification) and dogs' behavior. Dogs chose their owner's voice more often than that of control persons, suggesting that they can identify it. Choosing success and time spent looking in the direction of the owner's voice were positively associated, showing that looking time indexes the ease of the choice. Acoustic distance between speakers in mean fundamental frequency and jitter was positively associated with looking time, indicating that the shorter the acoustic distance between speakers with regard to these parameters, the harder the decision. Dogs therefore use these cues to discriminate their owner's voice from unfamiliar voices. These findings reveal that dogs use some, but probably not all, of the acoustic parameters that humans use to identify speakers. Although dogs can detect fine changes in speech, their perceptual system may not be fully attuned to identity-diagnostic cues in the human voice.
Keywords: Acoustics; Dog; Interspecific voice discrimination; Speaker-sensitivity
Year: 2022 PMID: 35142977 PMCID: PMC9334438 DOI: 10.1007/s10071-022-01601-z
Source DB: PubMed Journal: Anim Cogn ISSN: 1435-9448 Impact factor: 2.899
Fig. 1 Illustration of the experimental setting. A, B: doors; C, D: location of the owner, Experimenter 1 and the loudspeakers; E: plastic wall; F: starting point, which was the location of the dog and Experimenter 2 at the beginning of each trial. This figure was prepared using SweetHome3D software developed by eTeks (http://www.sweethome3d.com/)
Acoustic parameters
| Variable | Description |
|---|---|
| Measures of fundamental frequency | |
| | Mean fundamental frequency |
| | Range of fundamental frequency |
| | Slope of fundamental frequency |
| | Standard deviation of fundamental frequency |
| | Maximum fundamental frequency |
| | Minimum fundamental frequency |
| | Relative position of minimum fundamental frequency |
| | Relative position of maximum fundamental frequency |
| | End fundamental frequency |
| | Start fundamental frequency |
| Measures of noisiness | |
| | Jitter: periodicity of vocal fold vibration |
| | Number of voice cycles |
| | Mean number of voice cycles |
| | Wiener Entropy: uniformity of the spectrum |
| | Harmonics-to-noise ratio: the degree of acoustic periodicity |
| | Standard deviation of the HNR |
| | Maximum HNR |
| Measures of spectral energy | |
| | Formant dispersion: average frequency difference between the first five consecutive formants |
| CG | Center of gravity: average frequency in the spectrum |
| Dev Freq | Deviation frequency: standard deviation of the center of gravity in the spectrum |
| Energy Diff | Energy difference between 0–2000 and 2000–6000 Hz bands |
| Sk | Skewness of the spectrum |
| Kr | Kurtosis of the spectrum |
| cmoment | Non-normalized skewness of the spectrum |
| BEn | Band Energy: density of the spectrum between 2000 and 4000 Hz |
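The record does not include the authors' acoustic-analysis pipeline, so as a purely illustrative sketch, two of the tabulated measures, mean fundamental frequency and local jitter, can be computed from a sequence of glottal cycle periods (`mean_f0` and `local_jitter` are hypothetical helper names, not from the paper):

```python
def mean_f0(periods_s):
    """Mean fundamental frequency (Hz) from glottal cycle periods in seconds:
    the average of the per-cycle instantaneous frequencies 1/T."""
    return sum(1.0 / p for p in periods_s) / len(periods_s)

def local_jitter(periods_s):
    """Local jitter: mean absolute difference between consecutive cycle
    periods, divided by the mean period (a cycle-to-cycle perturbation
    measure; 0 for perfectly periodic voicing)."""
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_s) / len(periods_s))

# A perfectly periodic 200 Hz voice has zero jitter:
print(mean_f0([0.005, 0.005, 0.005]))      # 200.0
print(local_jitter([0.005, 0.005, 0.005])) # 0.0
```

A real analysis would first extract cycle periods from the waveform (e.g. with dedicated phonetics software); the sketch only shows what the two statistics summarize.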
Experimental protocol
| Phase | No. of trials | Stimulus | Stimulus type | Speakers |
|---|---|---|---|---|
| 1. Training | 2 | Naming and calling the dog | Live | Owner |
| | 2 | Neutral speech | Live | Owner |
| | Max 6 | Neutral speech | Live | Owner vs Experimenter 1 |
| 2. Test | 10 | Neutral speech | Playback | Owner vs Control persons |
| 3. Olfaction control | 2 | Neutral speech | Playback | Owner vs Control persons |
This table shows the number of trials, stimuli, stimulus types and speakers involved in the different phases of the experiment
Fig. 2 Proportion of owner and control voice choices. The figure shows the proportion of owner (green) and control voice (orange) choices per trial (A) and depending on the owner’s hiding side (B), the last speaker within trials (C), and the speakers’ gender match (D). Error bars represent SEM. Test: N = 28, olfaction control: N = 23
Dogs’ choosing success during the test and the olfaction control phases
| Phase | Dependent Variable | Proportion of correct choices | Odds ratio | Estimate | SEM | z | p |
|---|---|---|---|---|---|---|---|
| Test | Choosing success | 0.82 | 4.957 | 1.601 | 0.197 | 8.110 | < 0.001 |
| Olfaction control | | 0.87 | 7.385 × 10¹² | 29.630 | 12.498 | 2.371 | 0.018 |
The table shows the results of the intercept-only binomial GzLMMs. SEM: standard error of the mean. Test: N = 28, olfaction control: N = 23
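The odds ratios in the table above are the exponentiated log-odds estimates of the binomial models, so they can be checked directly from the tabulated values (up to rounding); `odds_ratio` is an illustrative helper name:

```python
import math

def odds_ratio(log_odds):
    """Back-transform a binomial (logit-link) model estimate,
    expressed in log-odds, to an odds ratio."""
    return math.exp(log_odds)

# Test phase: estimate 1.601 on the log-odds scale
print(round(odds_ratio(1.601), 2))  # 4.96, matching the reported 4.957
```

The same transform applied to the olfaction-control estimate (29.630) yields the very large reported odds ratio of about 7.385 × 10¹².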
Acoustic and design parameter effects on dogs’ choosing success
| Dependent variable | Fixed effects | Estimate | SEM | z | p |
|---|---|---|---|---|---|
| Choosing success | Owner’s side: left | 0.688 | 0.330 | 2.086 | 0.037 |
| | Last speaker: owner | 0.630 | 0.330 | 1.912 | 0.056 |
The table shows the results of the binomial GzLMM on choosing success within the test trials. SEM: standard error of the mean. N = 28
Relation between behavioral variables
| Dependent Variable | Fixed effects | Estimate | SEM | z | p |
|---|---|---|---|---|---|
| Choosing success | Looking time | 0.031 | 0.009 | 3.648 | < 0.001 |
| | Choosing latency | − 0.130 | 0.069 | − 1.809 | 0.059 |
Effects of choosing latency and looking time on dogs’ choosing success, revealed by a binomial GzLMM with choosing success as the dependent variable. SEM: standard error of the mean. N = 28
Fig. 3 Positive association between choosing success and looking time. X-axis shows the proportion of time dogs spent looking toward the screen corresponding to their owner’s voice during stimulus presentations. Y-axis represents the proportion of correct (owner’s voice) choices. Each dot represents the results of one trial, so each trial of every dog tested is displayed. N = 28
Effect of speakers’ acoustic distance on looking time
| Dependent variable | Fixed effects | Estimate | SE | t | df | p |
|---|---|---|---|---|---|---|
| Looking time | Gender match | − 1.107 | 3.227 | − 0.343 | 260.814 | 0.732 |
| | | 5.613 | 2.775 | 2.023 | 207.948 | 0.044 |
| | | 4.010 | 1.729 | 2.319 | 258.586 | 0.021 |
| | | − 8.956 | 3.885 | − 2.305 | 135.823 | 0.023 |
Results of the GLMM investigating the effect of speakers’ acoustic distance on looking time. f: fundamental frequency; ppj: jitter. N = 28
Fig. 4 Effect of speakers’ acoustic distance on looking time. Association between looking time and jitter (ppj) speaker distance (left) and gender match by fundamental frequency (f) mean speaker distance (right). Ppj and f mean distances are represented by z scores. Each dot represents the results of one trial, so each trial of every dog tested is displayed. N = 28
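The record states only that the per-trial speaker distances were expressed as z scores. As a minimal sketch, assuming "acoustic distance" means the absolute per-trial owner-versus-control difference in one parameter, standardised across trials (an assumption, not the authors' documented formula, and `z_scored_distances` is a hypothetical name):

```python
from statistics import mean, pstdev

def z_scored_distances(owner_vals, control_vals):
    """Per-trial absolute owner-vs-control differences in one acoustic
    parameter (e.g. mean f0 in Hz), standardised to z scores across
    trials so that different parameters are on a comparable scale."""
    d = [abs(o - c) for o, c in zip(owner_vals, control_vals)]
    m, s = mean(d), pstdev(d)
    return [(x - m) / s for x in d]

# Three trials of mean-f0 values for owner vs control speakers
z = z_scored_distances([200.0, 210.0, 190.0], [180.0, 230.0, 200.0])
print([round(x, 3) for x in z])  # approximately [0.707, 0.707, -1.414]
```

By construction the standardised distances have mean 0 and standard deviation 1, which matches the z-score axes shown in Fig. 4.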