Marina Scheumann, Anna S Hasting, Sonja A Kotz, Elke Zimmermann.
Abstract
Voice-induced cross-taxa emotional recognition is the ability to understand the emotional state of another species based on its voice. In the past, induced affective states, experience-dependent higher cognitive processes, or cross-taxa universal acoustic coding and processing mechanisms have been proposed to underlie this ability in humans. The present study sets out to distinguish the influence of familiarity and phylogeny on voice-induced cross-taxa emotional perception in humans. For the first time, two perspectives are taken into account: the self-perspective (i.e. the emotional valence induced in the listener) and the others-perspective (i.e. correct recognition of the emotional valence of the recording context). Twenty-eight male participants listened to 192 vocalizations of four species (human infant, dog, chimpanzee and tree shrew). Stimuli were recorded in either an agonistic (negative emotional valence) or an affiliative (positive emotional valence) context. Participants rated the emotional valence of the stimuli from both the self- and the others-perspective using a 5-point version of the Self-Assessment Manikin (SAM). Familiarity was assessed by subjective ratings, objective labelling of the stimuli, and the time participants had spent interacting with each species. Participants reliably recognized the emotional valence of human voices, whereas the results for animal voices were mixed. Correct classification of animal voices depended on the listener's familiarity with the species and on the call type/recording context, whereas induced emotional states and phylogeny had less influence. Our results provide first evidence that explicit voice-induced cross-taxa emotional recognition in humans is shaped more by experience-dependent cognitive mechanisms than by induced affective states or cross-taxa universal acoustic coding and processing mechanisms.
Year: 2014; PMID: 24621604; PMCID: PMC3951321; DOI: 10.1371/journal.pone.0091192
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Description of playback categories.
| Context | Playback category | No. of senders | Recording conditions | Detailed context description (calls were recorded during…) |
|---|---|---|---|---|
| Agonistic | Human infant | 5 | MS at PH: SK6 + MPR | situations in which the mother denied the infant something (e.g., a toy, food, or access). |
| Agonistic | Dog | 7 | MS at PH: SK6 + MPR | aggressive conflicts between dogs (another dog approached the sender, the sender vocalized, and the other dog went away or changed its behaviour; or the sender defended a toy) or a stranger situation (a strange person approached a leashed dog, which barked and showed its teeth). Eight of the calls occurred during human-dog and 16 during dog-dog interactions. |
| Agonistic | Chimpanzee | 8 | MS at ZS: SK6 + MPR | aggressive interactions between chimpanzees, which chased each other in the enclosure and had physical conflicts. |
| Agonistic | Tree shrew | 6 | SS and WK at IZ: SME64 + PA + LP | encounters in which the sender produced the calls while chasing the other animal away. |
| Affiliative | Human infant | 5 | MS at PH: SK6 + MPR | tickling or play with 1½-year-old infants. |
| Affiliative | Dog | 8 | MS at PH: SK6 + MPR | play interactions: either dogs played with each other (chasing each other, waiting for the other to follow, or barking in front of another dog to initiate play), or a human held a toy (e.g., a ball or stick) in front of the dog. Eight of the calls occurred during human-dog and 16 during dog-dog interactions. |
| Affiliative | Chimpanzee | 6 | MS at ZS: SK6 + MPR; BF at ZH and SP: SMKH 816 + Nagra IV-SJ | tickling sessions of five chimpanzee infants and one adult chimpanzee. |
| Affiliative | Tree shrew | 6 | SS and WK at IZ: SME64 + PA + LP | male-female interactions in which the male produced the calls to attract an oestrous female. |
Information on the number of senders, recording conditions (recorder, place of recording and equipment) and detailed context description for each playback category. All stimuli were videotaped except the affiliative chimpanzee stimuli obtained by Birgit Fördereuther, which were recorded during tickling sessions. For the tree shrew calls, which were obtained by Simone Schehka and colleagues, we relied on their video analyses for context classification [57], [58]. For our own recordings, we synchronized the audio and video recordings and assigned each vocalization to a detailed context. Each detailed context was then assigned to one of two superordinate context categories: affiliative or agonistic.
Abbreviations: recorder: MS = M. Scheumann, SS = S. Schehka, WK = W. Konerding, BF = B. Förderreuther; place of recording: PH = private households, ZS = Leintal Zoo Schwaigern, IZ = Institute of Zoology, University of Veterinary Medicine Hannover, ZH = Hannover Zoo, SP = Schwaben Park; recording equipment: SK6 = Sennheiser microphone K 6, MPR = Marantz pocket recorder PMD 660, SME64 = Sennheiser microphone ME64, PA = pre-amplifier (Avid Technology, Öhringen, Germany, M-Audio DMP3), LP = laptop (Toshiba, Irvine, CA, USA, Satellite A10-s100) equipped with an A/D converter (National Instruments, Austin, TX, USA, DAQCard-6062E) and the software NIDisk (Engineering Design, Belmont, MA, USA), SMKH 816 = Sennheiser microphone MKH 816.
Figure 1. Sonograms of example playback stimuli for the eight playback categories.
Acoustic characterization of the playback categories.
| Context | Parameter | Human infant | Dog | Chimpanzee | Tree shrew |
|---|---|---|---|---|---|
| Agonistic | STIM DUR | 0.72±0.18 | 0.85±0.15 | 0.75±0.18 | 0.77±0.13 |
| Agonistic | CALL DUR | 87.54±18.82 | 72.01±20.10 | 100±0 | 59.98±5.15 |
| Agonistic | No. CALL | 1.79±1.32 | 2.58±1.18 | 1±0 | 6.38±1.21 |
| Agonistic | PEAK | 1241.89±738.20 | 773.23±282.51 | 1815.80±444.92 | 5279.18±1164.82 |
| Agonistic | MEAN f0 | 510.73±145.93 | 358.35±224.62 | 1197.87±158.67 | 2341.90±282.48 |
| Agonistic | SD f0 | 81.11±86.05 | 69.48±68.51 | 73.76±49.76 | 194.36±100.96 |
| Agonistic | %VOI | 76.97±25.67 | 42.32±32.38 | 92.83±9.19 | 45.35±11.47 |
| Affiliative | STIM DUR | 0.76±0.13 | 0.72±0.16 | 0.82±0.14 | 0.69±0.16 |
| Affiliative | CALL DUR | 88.64±14.28 | 55.34±12.66 | 64.47±10.14 | 54.94±14.51 |
| Affiliative | No. CALL | 2.21±1.41 | 2.29±0.69 | 4.38±1.81 | 5.04±2.10 |
| Affiliative | PEAK | 896.97±731.50 | 312.65±426.50 | 799.56±359.07 | 438.09±1287.29 |
| Affiliative | MEAN f0 | 479.66±323.51 | 493.47±216.22 | - | - |
| Affiliative | SD f0 | 110.28±177.85 | 68.89±49.61 | - | - |
| Affiliative | %VOI | 78.86±19.12 | 39.86±24.35 | 0±0 | 0±0 |
Mean ± standard deviation of the following acoustic parameters: STIM DUR = duration of the stimulus (s); CALL DUR = percentage of the stimulus duration occupied by calls (%); No. CALL = number of calls in the stimulus; PEAK = peak frequency (Hz); MEAN f0 = mean fundamental frequency (f0, Hz); SD f0 = standard deviation of f0 (Hz); %VOI = percentage of voiced frames in the stimulus (%). Dashes mark categories without voiced frames (%VOI = 0), for which no f0 values are reported.
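Three of the parameters defined above (CALL DUR, No. CALL and %VOI) are simple counts or ratios once a stimulus has been segmented into calls and its analysis frames have been classified as voiced or unvoiced. The following Python sketch shows that arithmetic for a single stimulus. It is not the authors' analysis pipeline, which is not described here. The input layout (a list of `(onset, offset)` call boundaries in seconds and a per-frame boolean voicing flag from some pitch tracker) and the names `summarize_stimulus` and `StimulusSummary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StimulusSummary:
    stim_dur: float      # STIM DUR: stimulus duration (s)
    call_dur_pct: float  # CALL DUR: % of the stimulus occupied by calls
    n_calls: int         # No. CALL: number of calls in the stimulus
    voiced_pct: float    # %VOI: % of analysis frames flagged as voiced


def summarize_stimulus(stim_dur, call_intervals, voiced_frames):
    """Hypothetical helper deriving the ratio-type parameters of one stimulus.

    stim_dur       -- total stimulus duration in seconds
    call_intervals -- non-overlapping (onset, offset) call boundaries in seconds
                      (assumed to come from manual or automatic call annotation)
    voiced_frames  -- one boolean per analysis frame, True if the frame is voiced
                      (assumed output of some pitch tracker)
    """
    if stim_dur <= 0:
        raise ValueError("stimulus duration must be positive")
    if not voiced_frames:
        raise ValueError("at least one analysis frame is required")
    # Total time covered by calls; silent gaps between calls are excluded.
    total_call_time = sum(offset - onset for onset, offset in call_intervals)
    return StimulusSummary(
        stim_dur=stim_dur,
        call_dur_pct=100.0 * total_call_time / stim_dur,
        n_calls=len(call_intervals),
        voiced_pct=100.0 * sum(voiced_frames) / len(voiced_frames),
    )


# Example: two 0.2 s calls in a 0.8 s stimulus, half of the frames voiced.
summary = summarize_stimulus(0.8, [(0.0, 0.2), (0.3, 0.5)], [True, True, False, False])
print(summary)  # call_dur_pct = 50.0, voiced_pct = 50.0
```

In this sketch, a stimulus consisting of one call that spans its full duration yields CALL DUR = 100 and No. CALL = 1. That matches the pattern reported for the agonistic chimpanzee category. A stimulus with no voiced frames yields %VOI = 0, for which f0 is undefined.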
Figure 2. Familiarity ratings for the playback categories.
Mean and standard deviation of the (A) species recognition index and the (B) assumed familiarity index for each playback category.
Figure 3. Valence ratings for the playback categories.
Mean and standard deviation of the valence index for the playback categories from the (A) self-perspective and the (B) others-perspective. Asterisks indicate significance in a one-sample t-test: ***p ≤ 0.001, **p ≤ 0.01, *p ≤ 0.05.
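Figure 3 marks valence indices that differ significantly in a one-sample t-test. The caption does not state the reference value or how the valence index is computed. The following minimal Python sketch therefore treats the reference value as a parameter, `mu0`. It computes the t statistic and degrees of freedom for a sample of valence indices, for instance one value per participant (an assumed data layout). The function name `one_sample_t` and the default `mu0 = 0` are hypothetical. A value of 0 would represent a neutral midpoint only if the index is centred on the neutral SAM rating.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of mean(values) against mu0.

    values -- e.g. one valence index per participant (assumed input layout)
    mu0    -- hypothetical reference value; the caption does not state it
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    if sd == 0:
        raise ValueError("zero variance: t is undefined")
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=0.0)
print(f"t({df}) = {t:.3f}")  # prints t(4) = 4.243
```

To obtain a p-value, t is compared with a t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_1samp(values, popmean=mu0)` returns the same statistic together with a two-sided p-value.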