| Literature DB >> 35270967 |
Sandie Cabon, Bertille Met-Montot, Fabienne Porée, Olivier Rosec, Antoine Simon, Guy Carrault.
Abstract
Cry analysis is an important tool to evaluate the development of preterm infants. However, the context of Neonatal Intensive Care Units is challenging, since a wide variety of sounds can occur (e.g., alarms and adult voices). In this paper, a method to extract cries is proposed. It is based on an initial segmentation between silence and sound events, followed by feature extraction on the resulting audio segments and a cry/non-cry classification. A database of 198 cry events coming from 21 newborns and 439 non-cry events was created. Then, a set of features, including Mel-Frequency Cepstral Coefficients and reduced by principal component analysis, was computed to describe each audio segment. For the first time in cry analysis, noise was handled using harmonic plus noise analysis. Several machine learning models were compared; the K-Nearest Neighbours approach showed the best results, with a precision of 92.9%. To test the approach in a monitoring application, 412 h of recordings were automatically processed. The cries automatically selected were replayed, and a precision of 92.2% was obtained. The impact of errors on the fundamental frequency characterisation was also studied. Results show that, despite a difficult context, automatic cry extraction for non-invasive monitoring of the vocal development of preterm infants is achievable.
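The first stage of the pipeline, segmentation between silence and sound events, can be sketched with a simple short-time-energy threshold. This is an illustrative stand-in, not the paper's implementation: the frame length, hop size, and threshold below are assumptions.

```python
import numpy as np

def segment_sound_events(signal, frame_len=1024, hop=512, threshold_db=-40.0):
    """Split a mono signal into sound events by thresholding short-time energy.

    Frames whose RMS energy (in dB relative to the signal peak) exceeds
    `threshold_db` are kept; contiguous runs of kept frames form one event.
    Returns a list of (start_sample, end_sample) tuples.
    """
    peak = np.max(np.abs(signal)) + 1e-12
    events, start = [], None
    for i in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[i:i + frame_len]
        rms_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) / peak + 1e-12)
        if rms_db > threshold_db:
            if start is None:
                start = i
        elif start is not None:
            events.append((start, i + frame_len))
            start = None
    if start is not None:
        events.append((start, len(signal)))
    return events

# Toy example: half a second of silence, a one-second 440 Hz burst, silence.
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2),
                      0.5 * np.sin(2 * np.pi * 440 * t),
                      np.zeros(sr // 2)])
print(segment_sound_events(sig))
```

Each returned span would then be passed to the feature-extraction and classification stages described below.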
Keywords: NICU; audio processing; classification; continuous monitoring; harmonic plus noise analysis; neuro-behavioral development; preterms; real context; spontaneous cry extraction
Year: 2022 PMID: 35270967 PMCID: PMC8915127 DOI: 10.3390/s22051823
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of the cry extraction process.
List of the computed features for classification.
| Type of Feature | Estimation Method | Number of Instances |
|---|---|---|
| Fundamental frequency | HNM | 1 |
| Number of harmonics | HNM | 1 |
| Harmonic amplitudes | HNM | 18 |
| Harmonic phases | HNM | 14 |
| Gain | HNM | 1 |
| Filter coefficients | HNM | 20 |
| Mel-Frequency Cepstral Coefficients | HNM | 16 |
| Zero crossing rate | ZCR | 1 |
| Duration | Duration | 1 |
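Two of the simpler descriptors in the table, the zero crossing rate and the fundamental frequency, can be sketched as follows. The autocorrelation-based f0 estimator is a hypothetical stand-in for the harmonic plus noise (HNM) analysis actually used; the 150–750 Hz search band matches the fixed band mentioned in the Figure 8 caption.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs whose sign differs.
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def estimate_f0_autocorr(frame, sr, fmin=150.0, fmax=750.0):
    """Rough fundamental-frequency estimate by autocorrelation peak picking,
    restricted to the [fmin, fmax] band (here 150-750 Hz)."""
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

# Synthetic 400 Hz frame standing in for a voiced cry segment.
sr = 16000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 400 * t)
print(round(estimate_f0_autocorr(frame, sr)))  # ≈ 400
print(zero_crossing_rate(frame))               # ≈ 2 * 400 / 16000 = 0.05
```

In the paper, all 73 features listed above (plus segment duration) describe each audio segment before dimensionality reduction.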
Figure 2. Overview of the evaluation of the approach.
Figure 3. Visualisation of the dataset using the first two principal components.
Figure 4. Heatmap reporting the values of the coefficients applied to project the original feature set on 41 principal components.
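The projection onto 41 principal components (Figure 4) can be sketched with an SVD-based PCA. The matrix shape is grounded in the record itself, 637 segments (198 cry + 439 non-cry) by 73 features (the instance counts in the table above), but the random values are a placeholder for the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder feature matrix: 637 segments x 73 features.
X = rng.normal(size=(637, 73))

# PCA via SVD of the centred matrix; rows of Vt are principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:41].T        # projection on the first 41 components
print(scores.shape)            # (637, 41)
```

The first two columns of `scores` would give a 2-D visualisation analogous to Figure 3.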
Parameter testing summary. Final selected parameters for accuracy are underlined, whereas final selected parameters for precision are marked in bold.
| Method | Parameters |
|---|---|
| KNN | Number of neighbors ∈ [1, 3, …]; distance: Manhattan or … |
| LDA | Solver ∈ [singular value decomposition, …] |
| LR | Cut-off ∈ […] |
| RF | Number of trees ∈ [5, 10, …]; quality split criterion: … |
| MLP | Number of hidden layers ∈ […]; number of perceptrons per layer ∈ [1, 2, 5, 10, …]; activation function ∈ [identity, logistic sigmoid, hyperbolic tan, …] |
| SVM linear | No additional parameter |
| SVM polynomial | Degree ∈ […] |
| SVM Gaussian | Margin ∈ [0.01, 0.1, …]; gamma ∈ […] |
Figure 5. Performance (in %) of cry selection on the test set for each machine learning approach when maximising accuracy in the learning phase.
Figure 6. Performance (in %) of cry selection on the test set for each machine learning approach when maximising precision in the learning phase.
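The best-performing classifier was K-Nearest Neighbours. A minimal sketch of KNN with Manhattan distance (one of the distances listed in the parameter table) and the precision metric reported in the abstract is shown below; the toy two-cluster data is a hypothetical stand-in for the real cry/non-cry feature vectors.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """KNN with Manhattan (L1) distance; labels: 1 = cry, 0 = non-cry."""
    preds = []
    for x in X_test:
        d = np.sum(np.abs(X_train - x), axis=1)   # L1 distance to each train point
        nearest = y_train[np.argsort(d)[:k]]
        preds.append(int(nearest.sum() * 2 > k))  # majority vote
    return np.array(preds)

def precision(y_true, y_pred):
    # Fraction of predicted cries that are true cries: TP / (TP + FP).
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fp) if tp + fp else 0.0

# Toy separable data standing in for the extracted feature vectors.
rng = np.random.default_rng(1)
X_tr = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])
y_tr = np.array([0] * 50 + [1] * 50)
X_te = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(5, 1, (20, 5))])
y_te = np.array([0] * 20 + [1] * 20)
print(precision(y_te, knn_predict(X_tr, y_tr, X_te)))
```

Precision is the natural metric here, since false alarms (non-cries labelled as cries) would contaminate the downstream fundamental-frequency characterisation.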
Figure 7. Cry extraction results on the deployment database: proportions, by category, of segments automatically labelled as cry by our model.
Figure 8. Three examples of cry characterisation using the estimates. In each case, the estimate obtained with either the fixed 150–750 Hz band (in orange) or a manually selected band (in green), with smoothing (in yellow), is superimposed on the spectrogram of the cry.
Figure 9. Six examples of cries overlapping with other sounds: two alarms (in blue), two adult voices (in red) and two background noises (in orange). Estimates of the fundamental frequency (in yellow) are superimposed on the spectrograms of the cries.
Figure 10. An example of a segment containing several sounds, with the fundamental-frequency estimates (in yellow).
Figure 11. Re-application of the cry extraction method (segmentation and classification) to a segment of long duration. The red line indicates the segment that was automatically discarded by the KNN model.
Figure 12. An example of a vocalisation, with the estimates (in yellow).
Figure 13. Three examples of misclassified events.