Giovanni Costantini 1, Valerio Cesarini 1, Carlo Robotti 2,3, Marco Benazzo 2,3, Filomena Pietrantonio 4, Stefano Di Girolamo 5, Antonio Pisani 6,7, Pietro Canzi 2, Simone Mauramati 2, Giulia Bertino 2, Irene Cassaniti 8, Fausto Baldanti 3,8, Giovanni Saggio 1.
Abstract
Alongside the nasal swab tests currently in use, management of the COVID-19 pandemic would benefit noticeably from low-cost tests available anytime, anywhere, at large scale, and with real-time answers. A novel approach for COVID-19 assessment is adopted here, discriminating negative subjects from positive or recovered subjects. The scope is to identify potential discriminating features, highlight mid- and short-term effects of COVID-19 on the voice, and compare two custom algorithms. A pool of 310 subjects took part in the study; recordings were collected in a low-noise, controlled setting employing three different vocal tasks. Binary classifications followed, using two different custom algorithms. The first coupled boosting and bagging, with an AdaBoost classifier using Random Forest learners; a feature selection process was employed during training, identifying a subset of features acting as clinically relevant biomarkers. The second was centered on two custom CNN architectures applied to mel-spectrograms, with a custom knowledge-based data augmentation. Performances, evaluated on an independent test set, were comparable: AdaBoost and CNN differentiated COVID-19 positive from negative subjects with accuracies of 100% and 95%, respectively, and recovered from negative individuals with accuracies of 86.1% and 75%, respectively. This study highlights the possibility of identifying COVID-19 positive subjects, foreseeing a tool for on-site screening, while also considering recovered subjects and the effects of COVID-19 on the voice. The two proposed novel architectures allow for the identification of biomarkers and demonstrate the ongoing relevance of traditional ML versus deep learning in speech analysis.
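The first pipeline couples boosting with bagged learners (an AdaBoost classifier over Random Forest base learners). As a minimal illustration of the boosting half only, here is a pure-NumPy sketch of discrete AdaBoost (SAMME-style) that uses single-threshold stumps in place of the paper's Random Forest learners; function names and data are hypothetical:

```python
import numpy as np

def train_adaboost_stumps(X, y, n_rounds=20):
    """Discrete AdaBoost with depth-1 threshold stumps.

    Simplified stand-in for the paper's AdaBoost-over-Random-Forest
    ensemble: each round fits the best single-feature threshold on the
    current sample weights, then upweights misclassified samples.
    Labels y must be in {-1, +1}.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # uniform sample weights
    ensemble = []                      # (feature, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        w *= np.exp(-alpha * y * pred)          # boost the mistakes
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict_adaboost(ensemble, X):
    """Weighted vote of all stumps; returns labels in {-1, +1}."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * pol * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(score)
```

In the full pipeline each stump would be replaced by a Random Forest, and the features fed to `X` would be the acoustic features retained by the selection step.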
Keywords: 1E, vowel /e/ vocal task; 2S, sentence vocal task; 3C, cough vocal task; Adaboost; CFS, Correlation-based Feature Selection; CNN, Convolutional Neural Network; COVID-19; Classification; DL, Deep Learning; H, healthy control subjects; MFCC, Mel-frequency Cepstral Coefficients; ML, Machine Learning; NS, Nasal Swab; P, positive subjects; PCR, Polymerase Chain Reaction-based molecular swabs; PvsH, Positive versus Healthy subjects comparison; R, recovered subjects; RF, Random Forest; ROC, Receiver Operating Characteristic curve; ReLU, Rectified Linear Unit; RvsH, Recovered versus Healthy subjects comparison; SVM, Support Vector Machine; Speech processing
Year: 2022 PMID: 35915642 PMCID: PMC9328841 DOI: 10.1016/j.knosys.2022.109539
Source DB: PubMed Journal: Knowl Based Syst ISSN: 0950-7051 Impact factor: 8.139
Inclusion and exclusion criteria.
| Inclusion criteria | P | R | H | Exclusion criteria | P | R | H |
|---|---|---|---|---|---|---|---|
| 18–80 yo age range | ✔ | ✔ | ✔ | Drugs acting on CNS | ✔ | ✔ | ✔ |
| European ethnicity | ✔ | ✔ | ✔ | Head/neck cancer | ✔ | ✔ | ✔ |
| Italian native speaker | ✔ | ✔ | ✔ | Lung cancer | ✔ | ✔ | ✔ |
| Positive NS (< 10 days) | NA | ✔ | NA | Chemoradiation therapy | ✔ | ✔ | ✔ |
| Two consecutive negative NS | NA | ✔ | NA | C-PAP Therapy | ✔ | ✔ | ✔ |
| LUS | NA | ✔ | NA | Tracheal intubation | ✔ | ✔ | ✔ |
| Negative SS test (< 20 days) | NA | NA | ✔ | Tracheostomy | ✔ | ✔ | ✔ |
Abbreviations: NS: SARS-CoV-2 nasal swab for RNA detection; LUS: lung ultrasound score; SS: SARS-CoV-2 serum sample for IgM and IgG quantification; CNS: central nervous system; C-PAP: continuous positive airway pressure; NA: not applicable.
Fig. 1Flowchart describing the complete pipeline of the Machine Learning approach based on the Adaboost classifier (exemplified for the PvsH comparison).
Fig. 2Visualization of the data augmentation techniques applied to mel-frequency spectrograms.
Top left: original sample spectrogram; top right: pink noise addition; bottom left: time masking; bottom right: frequency masking.
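The three augmentation operations of Fig. 2 can be sketched in NumPy. This is a simplified, hypothetical implementation: the 1/f noise shaping, mask widths, and SNR are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_pink_noise(spec_db, snr_db=30.0):
    """Add 1/f ("pink") noise to a dB-scaled mel-spectrogram.

    Noise power is shaped as 1/f along the mel (row) axis, then
    scaled so the signal-to-noise ratio matches snr_db.
    """
    n_mels, n_frames = spec_db.shape
    f = np.arange(1, n_mels + 1, dtype=float)
    noise = rng.standard_normal((n_mels, n_frames)) / np.sqrt(f)[:, None]
    noise *= np.sqrt(spec_db.var() / (noise.var() * 10 ** (snr_db / 10)))
    return spec_db + noise

def time_mask(spec, width=8):
    """Zero a random contiguous block of time frames (SpecAugment-style)."""
    out = spec.copy()
    t0 = rng.integers(0, max(1, spec.shape[1] - width))
    out[:, t0:t0 + width] = 0.0
    return out

def freq_mask(spec, width=6):
    """Zero a random contiguous block of mel bands (SpecAugment-style)."""
    out = spec.copy()
    f0 = rng.integers(0, max(1, spec.shape[0] - width))
    out[f0:f0 + width, :] = 0.0
    return out
```

Each transform returns a new array of the same shape, so augmented copies can simply be appended to the training set.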
Fig. 3CNN1 architecture.
A “Conv Block” is described in the upper box and comprises a convolutional layer followed by a batch normalization layer and a ReLU (Rectified Linear Unit) activation function. The number after “Conv Block” indicates the number of parallel convolutional filters/neurons in the layer. Max Pool: max pooling layer; FC: fully connected layer; the number in round brackets indicates the number of neurons.
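A single “Conv Block” forward pass (convolution, then batch normalization, then ReLU) can be illustrated in plain NumPy. This is a minimal inference-time sketch with 'valid' padding and per-map statistics, assumed for brevity; it is not the exact layer configuration of CNN1:

```python
import numpy as np

def conv_block_forward(x, kernels, gamma, beta, eps=1e-5):
    """One "Conv Block": 2-D convolution -> batch norm -> ReLU.

    x:       (H, W) single-channel input, e.g. a mel-spectrogram
    kernels: (n_filters, kh, kw) convolution weights
    gamma, beta: per-filter batch-norm scale and shift
    Returns (n_filters, H-kh+1, W-kw+1) activations.
    """
    n_f, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((n_f, H - kh + 1, W - kw + 1))
    for f in range(n_f):                     # naive 'valid' convolution
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[f, i, j] = (x[i:i + kh, j:j + kw] * kernels[f]).sum()
    # batch normalization (statistics taken per activation map here)
    mu = out.mean(axis=(1, 2), keepdims=True)
    var = out.var(axis=(1, 2), keepdims=True)
    out = gamma[:, None, None] * (out - mu) / np.sqrt(var + eps) \
        + beta[:, None, None]
    return np.maximum(out, 0.0)              # ReLU
```

Stacking such blocks, interleaved with max pooling and followed by fully connected layers, yields the overall structure of Fig. 3.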
Fig. 4CNN2 architecture (used only for the 3C — cough vocal task).
A “Conv Block” is described in the upper box and comprises a convolutional layer followed by a batch normalization layer and a ReLU (Rectified Linear Unit) activation function. The number after “Conv Block” indicates the number of parallel convolutional filters/neurons in the layer. Max Pool: max pooling layer; FC: fully connected layer; the number in round brackets indicates the number of neurons.
Confusion matrices for the PvsH comparison over the two classification approaches (Adaboost and CNN).
| #Inst | Real class | Adaboost 1E | Adaboost 2S | Adaboost 3C | Adaboost Final | CNN 1E | CNN 2S | CNN 3C | CNN Final |
|---|---|---|---|---|---|---|---|---|---|
| 1 | H | – | – | X | ok | – | – | – | ok |
| 2 | H | – | – | X | ok | – | – | – | ok |
| 3 | H | – | – | – | ok | – | – | – | ok |
| 4 | H | – | – | – | ok | – | – | – | ok |
| 5 | H | X | – | – | ok | X | X | – | X |
| 6 | H | – | – | – | ok | – | – | – | ok |
| 7 | H | – | – | – | ok | – | – | X | ok |
| 8 | H | – | – | – | ok | – | X | – | ok |
| 9 | H | – | – | – | ok | – | – | – | ok |
| 10 | H | – | X | – | ok | – | – | – | ok |
| 11 | P | – | – | – | ok | – | – | X | ok |
| 12 | P | – | X | – | ok | – | X | – | ok |
| 13 | P | – | – | X | ok | – | – | – | ok |
| 14 | P | – | – | – | ok | – | – | – | ok |
| 15 | P | – | – | X | ok | X | – | – | ok |
| 16 | P | – | X | – | ok | – | – | – | ok |
| 17 | P | – | – | X | ok | – | – | – | ok |
| 18 | P | – | X | – | ok | – | – | – | ok |
| 19 | P | – | – | – | ok | – | – | – | ok |
| 20 | P | – | – | – | ok | – | – | X | ok |
| Accuracy (%) | | 95 | 80 | 75 | 100 | 90 | 85 | 85 | 95 |
Abbreviations: #Inst: Number of test instance; H: Healthy group; P: Positive group; 1E: Sustained vowel /e/ vocal task sub-classifier; 2S: Sentence vocal task sub-classifier; 3C: Cough vocal task sub-classifier; CNN: Convolutional Neural Network approach; -: No error in sub-classifier; X: Classification error; Final: Final classification output obtained by means of majority voting of the three (1E, 2S, 3C) sub-classifiers; ok: No final classification error.
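The “Final” column is obtained by majority voting over the three sub-classifiers (1E, 2S, 3C), so a single sub-classifier error is outvoted. A trivial sketch (labels and function name are illustrative):

```python
def majority_vote(pred_1e, pred_2s, pred_3c):
    """Fuse the three per-task sub-classifier labels into one
    subject-level label by simple majority voting, as in the
    "Final" column of the confusion-matrix tables."""
    votes = [pred_1e, pred_2s, pred_3c]
    return max(set(votes), key=votes.count)
```

With three binary votes a tie is impossible, so the fused label is always well defined.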
Confusion matrices for the RvsH comparison over the two classification approaches (Adaboost and CNN).
| #Inst | Real class | Adaboost 1E | Adaboost 2S | Adaboost 3C | Adaboost Final | CNN 1E | CNN 2S | CNN 3C | CNN Final |
|---|---|---|---|---|---|---|---|---|---|
| 1 | H | X | – | X | X | X | – | X | X |
| 2 | H | X | – | – | ok | X | – | – | ok |
| 3 | H | X | – | – | ok | – | – | X | ok |
| 4 | H | X | – | X | X | X | – | X | X |
| 5 | H | X | – | – | ok | X | – | X | X |
| 6 | H | X | – | – | ok | X | – | – | ok |
| 7 | H | – | – | – | ok | – | X | X | X |
| 8 | H | – | – | X | ok | X | – | X | X |
| 9 | H | – | – | – | ok | X | – | X | X |
| 10 | H | – | – | – | ok | – | – | – | ok |
| 11 | H | – | – | – | ok | X | – | – | ok |
| 12 | H | – | – | – | ok | – | – | – | ok |
| 13 | H | – | X | X | X | X | – | X | X |
| 14 | H | – | – | X | ok | – | – | X | ok |
| 15 | H | X | X | – | X | X | X | X | X |
| 16 | H | X | X | – | X | X | – | – | ok |
| 17 | H | X | – | – | ok | – | – | – | ok |
| 18 | H | X | – | – | ok | X | X | – | X |
| 19 | R | – | – | X | ok | – | – | – | ok |
| 20 | R | – | – | – | ok | – | – | – | ok |
| 21 | R | – | – | – | ok | – | – | X | ok |
| 22 | R | X | – | – | ok | – | – | – | ok |
| 23 | R | – | – | – | ok | – | – | – | ok |
| 24 | R | X | – | – | ok | – | – | – | ok |
| 25 | R | – | – | – | ok | – | – | – | ok |
| 26 | R | – | X | – | ok | – | – | – | ok |
| 27 | R | – | – | – | ok | – | – | – | ok |
| 28 | R | – | – | X | ok | – | – | – | ok |
| 29 | R | – | – | – | ok | – | – | – | ok |
| 30 | R | – | – | X | ok | – | – | – | ok |
| 31 | R | – | – | – | ok | – | – | – | ok |
| 32 | R | – | – | – | ok | – | – | – | ok |
| 33 | R | – | – | – | ok | X | – | – | ok |
| 34 | R | – | – | X | ok | – | – | – | ok |
| 35 | R | – | – | X | ok | – | – | – | ok |
| 36 | R | – | – | – | ok | – | – | – | ok |
| Accuracy (%) | | 66.7 | 88.9 | 72.2 | 86.1 | 63.9 | 91.7 | 69.4 | 75.0 |
Abbreviations: #Inst: Number of test instance; H: Healthy group; R: Recovered group; 1E: Sustained vowel /e/ vocal task sub-classifier; 2S: Sentence vocal task sub-classifier; 3C: Cough vocal task sub-classifier; CNN: Convolutional Neural Network approach; -: No error in sub-classifier; X: Classification error; Final: Final classification output obtained by means of majority voting of the three (1E, 2S, 3C) sub-classifiers; ok: No final classification error.
Most relevant feature domains as retained after the wrapper-based ranking step in the Adaboost-based ML pipeline.
| Comparison | 1E | 2S | 3C |
|---|---|---|---|
| PvsH | Voicing Probability | RASTA-PLP | RASTA-PLP |
| | RASTA-PLP | Spectral Loudness | Spectral Variation |
| | MFCC | MFCC | Spectral Loudness |
| RvsH | Spectral Variation | MFCC | RASTA-PLP |
| | Energy | RASTA-PLP | MFCC |
| | RASTA-PLP | Energy | Spectral Variation |
Abbreviations: PvsH: Positive versus Healthy; RvsH: Recovered versus Healthy; 1E: Sustained vowel /e/ vocal task sub-classifier; 2S: Sentence vocal task sub-classifier; 3C: Cough vocal task sub-classifier; RASTA-PLP: Features related to RASTA (Relative Spectral) Coefficients applied to the PLP domain (Perceptual Linear Predictive); Spectral Variation: Umbrella term for features related to variations in the spectrum, such as: slope, kurtosis, skewness, flux; MFCC: Mel-frequency Cepstral Coefficients.
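The CFS step retains features that correlate with the class label while remaining weakly inter-correlated. A simplified greedy proxy for that idea, not the exact CFS merit function or the wrapper-based SVM ranking of the paper, can be sketched as:

```python
import numpy as np

def cfs_like_select(X, y, k=3, redundancy_cap=0.9):
    """Greedy, simplified stand-in for Correlation-based Feature
    Selection (CFS): rank features by |Pearson r| with the class
    label, then keep the top-ranked ones while skipping any feature
    whose |r| with an already-kept feature exceeds redundancy_cap
    (i.e. keep relevant but non-redundant features)."""
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    order = np.argsort(-relevance)          # most relevant first
    kept = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) <= redundancy_cap
               for i in kept):
            kept.append(j)
        if len(kept) == k:
            break
    return kept
```

In the actual pipeline the surviving features (RASTA-PLP, MFCC, spectral and energy descriptors) are what the table above summarizes per vocal task.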
Fig. 5ROC curves.
Above: ROC curve for the PvsH (Positive versus Healthy) comparison. Below: ROC curve for the RvsH (Recovered versus Healthy) comparison. Red line refers to the 1E — vowel /e/ vocal task sub-classifier; blue line refers to the 2S — sentence vocal task sub-classifier; green line refers to the 3C — cough vocal task sub-classifier. Axes span from 0 to 1. AUC (Area Under the Curve) values are reported in the manuscript.
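AUC values for curves like those in Fig. 5 can be computed directly from classifier scores via the rank (Mann-Whitney) statistic; a small NumPy sketch with a hypothetical helper name and synthetic scores:

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly
    chosen positive instance scores higher than a randomly chosen
    negative one (ties count as 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 1.0 corresponds to perfect separation of the two classes; 0.5 corresponds to chance-level ranking.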
Fig. 6Radar plot for the PvsH-3C sub-classifier.
PvsH: Positive versus Healthy; 3C: cough vocal task. The radar plot was built on the top 20 features (as ranked by the wrapper-based linear SVM ranker), averaged over all subjects and normalized by the H class. The blue unit circle (colored area) represents the H class; the red curve represents the P class.
Literature review.
| Study | Input signals | Recording specifications | COVID-19 screening and validation characteristics | No. of positive (P) subjects | Classes considered | Algorithm(s) | Validation method | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Ours | Vowel, speech, cough | Unique device (lossless) | PCR, serology | 70 (310 total) | P, H, R | Adaboost, CNN (custom) | Independent test set | 100%; 86.1% (RvsH) |
| Laguarta et al. | Cough | Crowdsourced | None (self-reported) | 2660 (5320 total) | P, H | CNN (ResNet50) | Independent test set | 97% |
| Imran et al. | Cough | Unspecified | Unspecified | 70 (543 total) | P, H, pertussis, bronchitis | SVM | Cross-validation | 92% |
| Pinkas et al. | Vowel, speech, cough | Multiple devices (lossy) | PCR | 29 (88 total) | P, H | RNN + SVM | Independent test set | 79% (F1 score) |
| Shimon et al. | Vowel, cough | Multiple devices (lossy) | PCR | 69 (199 total) | P, H | SVM, RF | Independent test set | 80% (mean) |
| Suppakitjanusant et al. | Vowel, speech, cough | Unique device | Unspecified | None (76 recovered, 116 total) | R, H | CNN (transfer learning) | Cross-validation | 74% (mean, R vs H) |
| Despotovic et al. | Vowel, speech, breath, cough | Crowdsourced | None (self-reported) | 84 (1103 total) | P, H | Adaboost, Multilayer Perceptron, CNN | Cross-validation | 88% |
| Muguli et al. | Vowel, speech, breath, cough | Crowdsourced | None (self-reported) | 60 (990 total) | P, H | Various | Various | 73% (baseline) |
Abbreviations: PCR: Polymerase Chain Reaction-based molecular swab; P: COVID-19 Positive subjects; H: Healthy subjects; R: Recovered subjects; CNN: Convolutional Neural Network; SVM: Support Vector Machine; RNN: Recurrent Neural Network.
“Lossless” refers to raw, unprocessed, and uncompressed sound data, while “lossy” implies that compression and/or artifacts are present. “Accuracy” refers to the highest reported classification accuracy for the binary Positive vs Healthy classification, except when otherwise specified. Note that the algorithms used in each study are heavily condensed in the table. For studies that did not report a single final accuracy, the mean accuracy has been reported and labeled as such.