| Literature DB >> 35047869 |
Yi Chang, Xin Jing, Zhao Ren, Björn W. Schuller.
Abstract
Since the COronaVIrus Disease 2019 (COVID-19) outbreak, developing a digital diagnostic tool to detect COVID-19 from respiratory sounds with computer audition has become an essential topic due to its advantages of being swift, low-cost, and eco-friendly. However, prior studies mainly focused on small-scale COVID-19 datasets. To build a robust model, the large-scale multi-sound FluSense dataset is utilised in this study to help detect COVID-19 from cough sounds. Due to the gap between FluSense and the COVID-19-related datasets consisting of cough sounds only, a transfer learning framework (namely CovNet) is proposed and applied rather than simply augmenting the training data with FluSense. CovNet contains (i) a parameter transferring strategy and (ii) an embedding incorporation strategy. Specifically, to validate CovNet's effectiveness, it is used to transfer knowledge from FluSense to COUGHVID, a large-scale cough sound database of COVID-19 negative and COVID-19 positive individuals. The model trained on FluSense and COUGHVID is further applied under CovNet to another two small-scale cough datasets for COVID-19 detection: the COVID-19 cough sub-challenge (CCS) database of the INTERSPEECH Computational Paralinguistics challengE (ComParE) and the DiCOVA Track-1 database. By training four simple convolutional neural networks (CNNs) in the transfer learning framework, our approach achieves an absolute improvement of 3.57% in the area under the receiver operating characteristic curve (ROC AUC) over the DiCOVA Track-1 validation baseline and an absolute improvement of 1.73% in unweighted average recall (UAR) over the ComParE CCS test baseline.
Keywords: COUGHVID; COVID-19; FluSense; cough; transfer learning
Year: 2022 PMID: 35047869 PMCID: PMC8761863 DOI: 10.3389/fdgth.2021.799067
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. The workflow of this study. CovNet is the proposed transfer learning framework, which includes transferring parameters and incorporating embeddings. CovNet is first applied with FluSense as the source data and COUGHVID as the target data. Afterwards, to further validate the effectiveness of CovNet, the CovNet-based pre-trained COUGHVID models are applied to two smaller datasets: the Computational Paralinguistics challengE (ComParE) 2021 COVID-19 cough sub-challenge (CCS) dataset and the DiCOVA 2021 Track-1 dataset.
Figure 2. The proposed transfer learning framework: CovNet. (A) Parameters of the first n convolutional layers/blocks (convs1) of the current COUGHVID model are frozen and initialised by the corresponding first n convolutional layers/blocks (convs0) of the pre-trained FluSense model. (B) Embeddings are extracted after the n-th convs0 of the pre-trained FluSense model. The extracted embeddings are concatenated with, or added to, the current embeddings generated after the n-th convs1 of the COUGHVID model.
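The two CovNet strategies in Figure 2 can be sketched with plain NumPy. The layer names, tensor shapes, and embedding dimensions below are illustrative assumptions, not the paper's actual identifiers or sizes:

```python
import numpy as np

# (A) Parameter transfer: initialise the first n conv layers of the target
# (COUGHVID) model from the pre-trained source (FluSense) model and freeze them.
# Layer names and weight shapes here are hypothetical, for illustration only.
src_state = {f"conv{i}.weight": np.full((3, 3), float(i)) for i in range(1, 4)}
tgt_state = {f"conv{i}.weight": np.zeros((3, 3)) for i in range(1, 4)}
n = 2
frozen = set()
for i in range(1, n + 1):
    key = f"conv{i}.weight"
    tgt_state[key] = src_state[key].copy()  # initialise from the source model
    frozen.add(key)                         # exclude from gradient updates

# (B) Embedding incorporation: combine the source and target embeddings
# produced after the n-th conv layer/block, by concatenation or addition.
rng = np.random.default_rng(0)
e_src = rng.normal(size=(8, 64))  # batch of 8 source embeddings (64-dim)
e_tgt = rng.normal(size=(8, 64))  # batch of 8 target embeddings
e_cat = np.concatenate([e_src, e_tgt], axis=1)  # "Embeddings Cat": 128-dim
e_add = e_src + e_tgt                           # "Embeddings Add": stays 64-dim

print(sorted(frozen), e_cat.shape, e_add.shape)
```

Note the trade-off the two variants embody: concatenation preserves both embeddings but doubles the input width of the following layer, while addition keeps the original width at the cost of mixing the two representations.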
Figure 3. Models' architectures: (A) convolutional neural network-4 (CNN-4), (B) VGG-7, (C) residual network-6 (ResNet-6), (D) MobileNet-6. “conv” stands for a convolutional layer, and “block” indicates a convolutional block. The number before “conv” is the kernel size; the number after “conv” is the number of output channels. The number after “FC” is the number of input neurons.
Data distribution of the FluSense data.
| Class | Original | Train | Test | ∑ |
|---|---|---|---|---|
| Breathe | 167 | 238 | 58 | 297 |
| Cough | 2,486 | 6,148 | 1,537 | 7,685 |
| Gasp | 337 | 315 | 79 | 394 |
| Other | 3,863 | 15,059 | 3,765 | 18,824 |
| Silence | 832 | 1,116 | 279 | 1,395 |
| Sneeze | 611 | 540 | 135 | 675 |
| Sniffle | 589 | 604 | 151 | 755 |
| Speech | 2,615 | 16,614 | 4,154 | 20,768 |
| Throat clearing | 102 | 118 | 29 | 147 |
| ∑ | 11,602 | 40,752 | 10,188 | 50,940 |
The “Original” column indicates the number of audio samples, whereas the “Train,” “Test,” and “∑” columns show the number of segments after pre-processing, with a unified length of 1 s.
Data distribution of the COUGHVID data.
| Class | Train | Test | ∑ |
|---|---|---|---|
| Negative | 5,660 | 1,415 | 7,075 |
| Positive | 559 | 140 | 699 |
| ∑ | 6,219 | 1,555 | 7,774 |
Data distribution of the Computational Paralinguistics challengE (ComParE) COVID-19 cough sub-challenge (CCS) data.
| Class | Train | Dev | Test | ∑ |
|---|---|---|---|---|
| Negative | 215 | 183 | 169 | 567 |
| Positive | 71 | 48 | 39 | 158 |
| ∑ | 286 | 231 | 208 | 725 |
DiCOVA Track-1 data distribution of each fold of cross-validation.
| Class | Train | Validation | ∑ |
|---|---|---|---|
| Negative | 772 | 193 | 965 |
| Positive | 50 | 25 | 75 |
| ∑ | 822 | 218 | 1,040 |
Models' performances [AUC/UAR %] on FluSense and COUGHVID test datasets.
| Strategy | Method | Layers | CNN-4 | VGG-7 | ResNet-6 | MobileNet-6 |
|---|---|---|---|---|---|---|
| Single Learning | FluSense | — | 93.55/65.27 | 93.91/64.76 | 93.23/63.86 | 91.26/58.24 |
| | COUGHVID | — | 66.14/59.43 | 68.86/60.43 | 65.15/56.42 | 64.17/54.83 |
| Transfer Learning | Parameters | FC | 58.59/53.68 | 61.35/57.50 | 54.68/54.14 | 56.91/53.93 |
| | | conv/block 3 & FC | 68.04/57.04 | 67.01/57.97 | 64.97/57.15 | 67.88/ |
| | | conv/block 2-3 & FC | 69.05/ | 64.92/ | | |
| | | conv/block 1-3 & FC | 66.23/56.31 | 65.21/55.64 | | |
| | Embeddings Cat | conv/block 3 | 64.32/ | | | |
| | | conv/block 2 | 67.30/57.81 | 66.17/55.59 | 65.58/52.30 | |
| | | conv/block 1 | 65.15/59.30 | 65.35/ | 58.67/51.92 | 66.37/53.77 |
| | Embeddings Add | conv/block 3 | 64.27/ | 66.08/ | | |
| | | conv/block 2 | 66.39/58.82 | 64.55/57.27 | 64.37/57.19 | |
| | | conv/block 1 | 65.91/57.17 | 63.85/58.97 | 64.17/56.60 | |
Single learning indicates training from scratch; transfer learning includes “Parameters” (transferring parameters), “Embeddings Cat,” and “Embeddings Add” (incorporating embeddings). The models' performances with transfer learning are based on the COUGHVID dataset. For “Parameters,” the “Layers” column indicates the layers that are randomly initialised and trainable during the training procedure, while the remaining layers are frozen and initialised by the pre-trained FluSense models; for “Embeddings Cat” and “Embeddings Add,” the “Layers” column lists the convolutional layer/block (conv/block) after which the embedding incorporation happens. For convenience, the best test AUC and test UAR of every model under the three transfer learning strategies are shown in bold face.
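The two metrics reported throughout these tables can be computed without any ML framework; a minimal NumPy sketch of UAR (macro-averaged recall) and rank-based ROC AUC, on made-up labels and scores:

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

def roc_auc(y_true, scores):
    """ROC AUC as the rank statistic P(score_pos > score_neg),
    counting ties as 0.5 (equivalent to the Mann-Whitney U statistic)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Tiny illustrative example (not data from the paper).
y = np.array([0, 0, 0, 1, 1])
pred = np.array([0, 0, 1, 1, 1])
s = np.array([0.10, 0.40, 0.35, 0.80, 0.70])
print(uar(y, pred))   # (2/3 + 2/2) / 2 ≈ 0.833
print(roc_auc(y, s))  # 1.0: every positive outscores every negative
```

UAR is the natural choice on these heavily imbalanced cough datasets (e.g. 965 negatives vs. 75 positives in DiCOVA), since it weights the minority COVID-19-positive class equally with the majority class.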
Models' performances [%]: validation AUC on the DiCOVA Track-1 dataset and test UAR on the ComParE dataset, with single learning (training from scratch) and the proposed transfer learning strategies.
| Strategy | Method | Dataset | CNN-4 | VGG-7 | ResNet-6 | MobileNet-6 | |
|---|---|---|---|---|---|---|---|
| Single Learning | – | ComParE | 64.70 | 63.35 | 61.78 | 57.38 | 63.80 |
| | | DiCOVA | 68.81 | 68.76 | 62.53 | 64.88 | 64.27 |
| Transfer Learning | Parameters | ComParE | – | 61.24 | 60.01 | 57.22 | |
| | | DiCOVA | – | 66.39 | 63.29 | | |
| | Embeddings | ComParE | – | 60.67 | 58.49 | 63.37 | |
| | | DiCOVA | – | | | 66.47 | |
Pre-trained COUGHVID models and their corresponding transfer learning settings are chosen based on the best performance on the COUGHVID test set reported above.
Figure 4. Confusion matrices for the best performance on the ComParE CCS test set and the DiCOVA validation set. For the DiCOVA dataset, since its test dataset is not accessible, the numbers are averaged over the five cross-validation folds.
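The fold averaging used for the DiCOVA confusion matrix amounts to an element-wise mean over the five per-fold matrices; the counts below are made up for illustration:

```python
import numpy as np

# Hypothetical 2x2 confusion matrices (rows: true negative/positive,
# columns: predicted negative/positive), one per cross-validation fold.
fold_cms = [np.array([[180, 13], [10, 15]]) + i for i in range(5)]

avg_cm = np.mean(fold_cms, axis=0)  # element-wise mean over the 5 folds
print(avg_cm)
```

One consequence of this averaging is that the reported cell values need not be integers, unlike the single-split ComParE confusion matrix.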