| Literature DB >> 31405018 |
Stavros A Ntalampiras, Luca Andrea Ludovico, Giorgio Presti, Emanuela Prato-Previde, Monica Battini, Simona Cannas, Clara Palestrini, Silvana Mattiello.
Abstract
Cats employ vocalizations for communicating information; thus their sounds can carry a wide range of meanings. Concerning vocalization, an aspect of increasing relevance directly connected with the welfare of such animals is its emotional interpretation and the recognition of the production context. To this end, this work presents a proof of concept facilitating the automatic analysis of cat vocalizations based on signal processing and pattern recognition techniques, aimed at demonstrating whether the emission context can be identified from meowing vocalizations, even if recorded in sub-optimal conditions. We rely on a dataset including vocalizations of Maine Coon and European Shorthair breeds emitted in three different contexts: waiting for food, isolation in an unfamiliar environment, and brushing. Towards capturing the emission context, we extract two sets of acoustic parameters, i.e., mel-frequency cepstral coefficients and temporal modulation features. Subsequently, these are modeled using a classification scheme based on a directed acyclic graph dividing the problem space. The experiments we conducted demonstrate the superiority of such a scheme over a series of generative and discriminative classification solutions. These results open up new perspectives for deepening our knowledge of acoustic communication between humans and cats and, in general, between humans and animals.
Entities:
Year: 2019 PMID: 31405018 PMCID: PMC6719916 DOI: 10.3390/ani9080543
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 2.752
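The abstract names two acoustic feature sets: mel-frequency cepstral coefficients and temporal modulation features. Below is a minimal sketch of extracting both from a recorded meow, assuming the librosa library; the file path, sampling rate, and every parameter value (number of coefficients, mel bands, and the modulation formulation itself) are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=13, n_mels=40):
    # Load the vocalization at an assumed sampling rate.
    y, _ = librosa.load(path, sr=sr)

    # Feature set 1: mel-frequency cepstral coefficients (one vector per frame).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Feature set 2: a simple temporal modulation representation, here taken as
    # the magnitude spectrum of each mel band's energy trajectory over time.
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    modulation = np.abs(np.fft.rfft(mel, axis=1))

    return mfcc, modulation
```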
Figure 1. Time-frequency spectrograms of meows coming from the three considered classes.
Figure 2. Details of the QCY Q26 Pro Mini Wireless Bluetooth Music Headset. The small hole in the middle of the right view is the microphone.
Figure 3. Two cats fitted with a Bluetooth microphone placed on the collar and pointing upwards.
Table 1. Dataset composition (MC, Maine Coon; ES, European Shorthair; IM/IF, intact males/females; NM/NF, neutered males/females).
| | Food (93) | | Isolation (220) | | Brushing (135) | |
|---|---|---|---|---|---|---|
| | MC (40) | ES (53) | MC (91) | ES (129) | MC (65) | ES (70) |
| IM (20) | - | 5 | 10 | - | 5 | - |
| NM (79) | 14 | 8 | 17 | 15 | 21 | 4 |
| IF (70) | 22 | - | 28 | - | 20 | - |
| NF (279) | 4 | 40 | 36 | 114 | 19 | 66 |
Figure 4. Representative Mel-scaled spectrogram and temporal modulation representations corresponding to all three classes (waiting for food, isolation in an unfamiliar environment, and brushing).
Figure 5. The DAG-HMM addressing the three-class problem of classifying cat vocalizations. At each level, the classes remaining to be tested are listed beside each node. Within each node, an HMM-based sound classifier activates the path of maximum log-likelihood.
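Figure 5 describes a DAG whose nodes host HMM-based classifiers, each activating the maximum log-likelihood path until a single class remains. A minimal sketch of such an elimination scheme follows, assuming the hmmlearn library and Gaussian-emission HMMs; the number of states, the node ordering, and the training setup are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from hmmlearn import hmm

CLASSES = ["food", "isolation", "brushing"]

def train_models(train_data, n_states=4):
    """train_data maps a class name to a list of (frames, dims) feature arrays."""
    models = {}
    for label, sequences in train_data.items():
        X = np.vstack(sequences)               # stack all training sequences
        lengths = [len(s) for s in sequences]  # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[label] = m
    return models

def dag_classify(models, features):
    """Walk the DAG: each node scores two candidate classes and eliminates the
    less likely one, so one class survives after len(CLASSES) - 1 nodes."""
    remaining = list(CLASSES)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if models[a].score(features) >= models[b].score(features):
            remaining.remove(b)
        else:
            remaining.remove(a)
    return remaining[0]
```

At each node the candidate with the lower log-likelihood is dropped, mirroring the figure's description of activating the maximum-likelihood path while shrinking the remaining problem space level by level.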
Table 2. The recognition rates achieved by each classification approach. The highest one is shown in bold.
| Classification Approach | Recognition Rate (%) |
|---|---|
| Directed acyclic graph of Hidden Markov Models (DAG-HMM) | |
| Class-specific Hidden Markov Models | 80.95 |
| Universal Hidden Markov Models | 76.19 |
| Support vector machine | 78.51 |
| Echo state network | 68.9 |
Table 3. The confusion matrix (in %) representing the classification results achieved by the DAG-HMM.
| Presented \ Responded | Food | Isolation | Brushing |
|---|---|---|---|
| Food | | - | - |
| Isolation | - | | 7.41 |
| Brushing | 4.76 | - | |
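For context on how a table like this is produced, below is a sketch of computing a row-normalized confusion matrix (in %) from reference and predicted labels, assuming scikit-learn; the tooling is an assumption of this note, not something stated in the record.

```python
from sklearn.metrics import confusion_matrix

LABELS = ["food", "isolation", "brushing"]

def percent_confusion(y_true, y_pred):
    # normalize="true" divides each row by the number of presented samples,
    # so entry (i, j) is the percentage of class i classified as class j.
    return 100.0 * confusion_matrix(y_true, y_pred, labels=LABELS, normalize="true")
```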