Chuan-Yu Chang, Sweta Bhattacharya, P M Durai Raj Vincent, Kuruva Lakshmanna, Kathiravan Srinivasan.
Abstract
A cry is an infant's loud, high-pitched verbal communication. A neonatal cry is characterized by a very high fundamental frequency and resonance frequency, with sudden variations. Moreover, even a single, very short utterance contains both voiced and unvoiced features. Infants communicate with their caretakers mostly through cries, and caretakers sometimes find it difficult to understand the reason behind a newborn's cry. This research therefore proposes a method for classifying newborn cries into three groups: hunger, sleep, and discomfort. For each crying frame, twelve time- and frequency-domain features are extracted through acoustic feature engineering, and variable selection using random forests picks the most discriminative among them. An extreme gradient boosting-powered grouped-support-vector network is then deployed for neonate cry classification. The empirical results show that the proposed method can effectively classify neonate cries into the three groups. The best experimental results showed a mean accuracy of around 91% for most scenarios, which demonstrates the potential of the proposed extreme gradient boosting-powered grouped-support-vector network for neonate cry classification. The proposed method also has a fast recognition time of 27 seconds when identifying these emotional cries.
Year: 2021 PMID: 34804460 PMCID: PMC8601804 DOI: 10.1155/2021/7517313
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. Waveform and spectrogram of an infant cry.
Figure 2. Waveform and spectrogram of sleep cry.
Figure 3. Waveform and spectrogram of hunger cry.
Figure 4. Waveform and spectrogram of discomfort cry.
Consolidated review of techniques and challenges in infant cry classification.
| Ref. No. | Dataset | Methodology | Research challenges |
|---|---|---|---|
| [ | DA-IICT infant cry and baby Chillanto | Convolutional restricted Boltzmann machine (ConvRBM) model | (i) Model was not implemented in any real-time environment |
| [ | Self-recorded datasets, baby Chillanto, National Taiwan University Hospital, Dunstan baby, iCOPE, University of Milano Bicocca | MFCC, KNN, SVM, GMM, CNN and RNN | (i) Scalability of the dataset |
| [ | Sainte-Justine Hospital (Montreal, Canada), Al-Sahel Hospital (Lebanon), Al-Raee Hospital (Lebanon) | DFNN | (i) Exclusion of various topologies and transfer functions |
| [ | Audio recordings from free sound, BigSoundBank, sound archive, ZapSplat, SoundBible and sound jay | CNN | (i) Model was not evaluated against any other models |
| [ | Self-recorded audio recordings | SVC and RBF | (i) Exclusion of more features or categories in the dataset |
| [ | Donate-a-cry corpus | SVM, random forest, logistic regression, KNN | (i) Exclusion of a more extensive dataset for enhanced justification of the model |
| [ | Datasets from various online resources with infant cry clips | Random forest classification model | (i) Use of smart phones other than Motorola G5/ |
| [ | Donate-a-cry corpus | KNN and SVM | (i) Exclusion of an extensive dataset with more infant cry categories |
Figure 5. Flow diagram of the proposed system.
Twelve extracted features of acoustic feature engineering.
| Time domain | Frequency domain |
|---|---|
| Magnitude | Pitch |
| Average | Bandwidth |
| Variance | Peak |
| Zero crossing rate | Valley |
| | Formant |
| | LPCC |
| | MFCC |
| | ΔMFCC |
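The four time-domain features in the table can be illustrated with a minimal sketch. The record does not give the paper's exact formulas, so the definitions below are assumptions: "magnitude" is taken as the mean absolute amplitude, variance is the population variance, and the zero crossing rate is the fraction of consecutive sample pairs whose sign differs. The function name `time_domain_features` and the toy sine frame are hypothetical.

```python
import math


def time_domain_features(frame):
    """Compute four time-domain descriptors for one audio frame.

    `frame` is a sequence of float samples. The definitions are common
    textbook choices (assumed here), not necessarily those of the paper.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame needs at least two samples")

    # Assumed definition: mean absolute amplitude of the frame.
    magnitude = sum(abs(x) for x in frame) / n
    average = sum(frame) / n
    # Population variance around the frame average.
    variance = sum((x - average) ** 2 for x in frame) / n
    # Count sign changes between consecutive samples (zero counts as positive).
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return {
        "magnitude": magnitude,
        "average": average,
        "variance": variance,
        "zero_crossing_rate": crossings / (n - 1),
    }


# Toy input (hypothetical): 20 ms of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(160)]
for name, value in time_domain_features(frame).items():
    print(f"{name}: {value:.4f}")
```

In a full pipeline a function like this would presumably run once per crying frame, with the eight frequency-domain features computed separately.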
Figure 6. Infant cry classification-grouped-support-vector network.
Figure 7. Infant cry acquisition setup.
Dataset for training and testing.
| | Training data | Testing data | Total |
|---|---|---|---|
| Hunger | 186 cries | 186 cries | 372 cries |
| Sleep | 129 cries | 129 cries | 258 cries |
| Discomfort | 186 cries | 186 cries | 372 cries |
Classification accuracy based on the extracted 12 features.
| Actual classification | System: Hunger | System: Sleep | System: Discomfort | Accuracy |
|---|---|---|---|---|
| Hunger | 168 | 3 | 15 | 0.9032 |
| Sleep | 7 | 113 | 9 | 0.8759 |
| Discomfort | 7 | 1 | 178 | 0.9569 |
| Mean | | | | 0.9120 |
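The accuracy column can be reproduced from the counts with a short sketch. It assumes that each row holds one actual class, which fits the data because the row totals (186, 129, 186) match the testing-data counts. It also assumes that the reported figures are truncated to four decimals rather than rounded, since 113/129 ≈ 0.87597 appears as 0.8759. The helper names are illustrative.

```python
import math

LABELS = ["Hunger", "Sleep", "Discomfort"]

# Rows = actual class, columns = system output (12-feature table).
CONFUSION = [
    [168, 3, 15],
    [7, 113, 9],
    [7, 1, 178],
]


def per_class_accuracy(matrix):
    """Correctly classified cries of each class divided by that class's total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def truncate4(value):
    """Cut a value to four decimals without rounding (assumed table convention)."""
    return math.floor(value * 10_000) / 10_000


accuracies = per_class_accuracy(CONFUSION)
mean_accuracy = sum(accuracies) / len(accuracies)

for label, acc in zip(LABELS, accuracies):
    print(f"{label:<10} {truncate4(acc):.4f}")
print(f"{'Mean':<10} {truncate4(mean_accuracy):.4f}")
```

Under these assumptions the per-class figure is what is usually called recall, and the reported mean is the unweighted average of the three per-class values.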
Figure 8. Receiver operating characteristic curve of the proposed model with 12 features.
Classification accuracy deploying the selected 5 features.
| Actual classification | System: Hunger | System: Sleep | System: Discomfort | Accuracy |
|---|---|---|---|---|
| Hunger | 178 | 2 | 6 | 0.9569 |
| Sleep | 4 | 120 | 5 | 0.9302 |
| Discomfort | 5 | 1 | 180 | 0.9677 |
| Mean | | | | 0.9516 |
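To see how much feature selection helps, the two confusion matrices (all 12 features versus the selected 5) can be compared directly. The sketch below is illustrative only. It assumes rows are actual classes. Alongside the unweighted mean reported in the tables it computes an overall accuracy (all correct cries divided by all test cries), which the source does not report.

```python
LABELS = ["Hunger", "Sleep", "Discomfort"]

# Rows = actual class, columns = system output, copied from the two tables.
ALL_12_FEATURES = [[168, 3, 15], [7, 113, 9], [7, 1, 178]]
SELECTED_5_FEATURES = [[178, 2, 6], [4, 120, 5], [5, 1, 180]]


def per_class_accuracy(matrix):
    """Diagonal count of each row divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def macro_accuracy(matrix):
    """Unweighted mean of the per-class accuracies (the tables' 'Mean')."""
    scores = per_class_accuracy(matrix)
    return sum(scores) / len(scores)


def overall_accuracy(matrix):
    """All correctly classified cries divided by all test cries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


before = per_class_accuracy(ALL_12_FEATURES)
after = per_class_accuracy(SELECTED_5_FEATURES)
gains = [a - b for b, a in zip(before, after)]

for label, gain in zip(LABELS, gains):
    print(f"{label:<10} gain: {gain:+.4f}")
print(f"macro:   {macro_accuracy(ALL_12_FEATURES):.4f} -> {macro_accuracy(SELECTED_5_FEATURES):.4f}")
print(f"overall: {overall_accuracy(ALL_12_FEATURES):.4f} -> {overall_accuracy(SELECTED_5_FEATURES):.4f}")
```

Because the sleep class has fewer test cries (129) than the other two (186 each), the overall accuracy weights it less than the unweighted mean does, so the two summaries differ slightly.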
Figure 9. Classification accuracy with and without feature selection.
Figure 10. Receiver operating characteristic curve of the proposed model with 5 features.
Categorization accuracy for different genders.
| | Correct | Incorrect | Total | Accuracy |
|---|---|---|---|---|
| Male | 181 | 12 | 193 | 0.9378 |
| Female | 218 | 11 | 229 | 0.9519 |
Comparison with other models.
| No. | Reference | Method | Number of features | Validation | Dataset | Mean accuracy Hunger (%) | Mean accuracy Sleep (%) | Mean accuracy Discomfort (%) |
|---|---|---|---|---|---|---|---|---|
| 1. | Hariharan et al. [ | Extreme learning machine (ELM) kernel classifier | 12 | 10-fold cross-validation | Baby Chillanto database, Mexican infants | 90.23 | — | 81.98 |
| 2. | Liu et al. [ | Compressed | 1 (BFCC) | — | Neonatal intensive care unit (NICU) of a hospital (anonymous) | 68.42 | 68.42 | 70.64 |
| | | | 1 (LPC) | — | | 46.67 | 57.89 | 57.89 |
| | | | 1 (LPCC) | — | | 48.89 | 47.37 | 62.67 |
| | | | 1 (MFCC) | — | | 53.33 | 68.42 | 71.05 |
| 3. | Saraswathy et al. [ | Probabilistic neural network | 17 | 10-fold cross-validation | Baby Chillanto database, Mexican infants | — | — | 90.79 |
| | | General regression neural network | 17 | 10-fold cross-validation | Baby Chillanto database, Mexican infants | — | — | 78.71 |
| 4. | Orlandi et al. [ | Logistic regression | 22 | 10-fold cross-validation | Infant cry dataset, S. Giovanni di Dio Hospital, Firenze, Italy | — | — | 80.505 |
| | | Random forest | 22 | 10-fold cross-validation | Infant cry dataset, S. Giovanni di Dio Hospital, Firenze, Italy | — | — | 86.702 |
| 5. | Alaie et al. [ | Maximum a posteriori probability or Bayesian adaptation | 2 | Stratified k-fold cross-validation | Infant cry database, neonatology departments of several hospitals in Canada and Lebanon | — | — | 65.22 |
| | | Boosting mixture learning (BML) adaptation method for refining the mean and variance vectors | 2 | Stratified k-fold cross-validation | Infant cry database, neonatology departments of several hospitals in Canada and Lebanon | — | — | 67.68 |
| | | Coupling old and boosting mixture learning adaptation estimates over the mean and variance vectors | 2 | Stratified k-fold cross-validation | Infant cry database, neonatology departments of several hospitals in Canada and Lebanon | — | — | 68.18 |
| | | Boosting mixture learning adaptation method for refining only the mean vectors | 2 | Stratified k-fold cross-validation | Infant cry database, neonatology departments of several hospitals in Canada and Lebanon | — | — | 69.59 |
| 6. | Jun et al. [ | End-to-end deep | — | — | Real-world data collected using a sensor device | — | — | 97 |
| 7. | Parga et al. [ | Cry-translation algorithm | 10 | — | ChatterBaby dataset | 44 | — | 90.7 |
| 8. | Chang et al. [ | DAG-SVM method | 15 | k-fold cross-validation | Infant cry dataset, National Taiwan University Hospital Yunlin Branch, Taiwan | 86.36 | 76.81 | 95.45 |
| 9. | Proposed | Grouped-support-vector network | 12 | 10-fold cross-validation | Infant cry dataset, National Taiwan University Hospital Yunlin Branch, Taiwan | 90.32 | 87.59 | 95.69 |