Saul Calderon-Ramirez, Shengxiang Yang, Armaghan Moemeni, David Elizondo, Simon Colreavy-Donnelly, Luis Fernando Chavarría-Estrada, Miguel A. Molina-Cabello.
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is identifying virus carriers as early and as quickly as possible, in a cheap and efficient manner. Applying deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where only small labelled datasets are available. Moreover, in such a context the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations of the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, the tested datasets include a new dataset composed of chest X-ray images of Costa Rican adult patients.
Keywords: COVID-19; Computer aided diagnosis; Coronavirus; Data imbalance; Semi-supervised learning
Year: 2021 PMID: 34276263 PMCID: PMC8276579 DOI: 10.1016/j.asoc.2021.107692
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725
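The abstract describes re-weighting each observation in the loss, with the weight for an unlabelled observation chosen from its MixMatch pseudo-label. The following is only a minimal sketch of that idea in plain Python. The inverse-frequency weighting scheme and the function names (`class_weights`, `weighted_cross_entropy`, `weighted_consistency`) are assumptions made for illustration, not the authors' implementation; the squared-error unsupervised term follows the usual MixMatch formulation.

```python
import math

def class_weights(label_counts):
    """Inverse-frequency class weights, normalised to sum to 1 (assumed scheme)."""
    inverse = [1.0 / count for count in label_counts]
    total = sum(inverse)
    return [value / total for value in inverse]

def weighted_cross_entropy(probs, target, weights):
    """Supervised term: cross-entropy scaled by the weight of the true class."""
    return -weights[target] * math.log(probs[target])

def weighted_consistency(probs, pseudo_label, weights):
    """Unsupervised term: squared error against the pseudo-label, scaled by
    the weight of the class that the pseudo-label favours."""
    favoured = max(range(len(pseudo_label)), key=pseudo_label.__getitem__)
    squared_error = sum((p - q) ** 2 for p, q in zip(probs, pseudo_label))
    return weights[favoured] * squared_error

# Hypothetical labelled set: 8 negative and 2 positive observations.
w = class_weights([8, 2])
labelled_loss = weighted_cross_entropy([0.5, 0.5], target=1, weights=w)
unlabelled_loss = weighted_consistency([0.6, 0.4], [0.1, 0.9], weights=w)
```

With these hypothetical counts the under-represented class receives four times the weight of the majority class, so errors on it contribute more to both loss terms.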
Accuracy results with the COVID-19-negative cases taken from the Costa Rican dataset (higher is better). LB stands for label balancing: the usual weight correction for the supervised model, and the proposed PBC for the MixMatch model. A total of 10, 15 and 20 labelled observations were tested, under two data imbalance settings: 70%/30% and 80%/20%. The sample mean and the sample standard deviation are reported.
| SSDL | COVID-19− | COVID-19+ | LB | Mean (10 labels) | Std (10 labels) | Mean (15 labels) | Std (15 labels) | Mean (20 labels) | Std (20 labels) |
|---|---|---|---|---|---|---|---|---|---|
| No | 50% | 50% | NA | 0.871 | 0.039 | 0.912 | 0.049 | 0.951 | 0.025 |
| No | 70% | 30% | Yes | 0.877 | 0.040 | 0.900 | 0.053 | 0.931 | 0.034 |
| No | 70% | 30% | No | 0.877 | 0.040 | 0.924 | 0.056 | 0.931 | 0.044 |
| No | 80% | 20% | Yes | 0.876 | 0.060 | 0.903 | 0.058 | 0.922 | 0.037 |
| No | 80% | 20% | No | 0.876 | 0.079 | 0.907 | 0.072 | 0.938 | 0.035 |
| Yes | 50% | 50% | NA | 0.941 | 0.035 | 0.955 | 0.025 | 0.957 | 0.030 |
| Yes | 70% | 30% | Yes | 0.955 | 0.027 | 0.947 | 0.035 | 0.950 | 0.029 |
| Yes | 70% | 30% | No | 0.907 | 0.042 | 0.900 | 0.049 | 0.914 | 0.028 |
| Yes | 80% | 20% | Yes | 0.957 | 0.025 | 0.964 | 0.021 | 0.960 | 0.020 |
| Yes | 80% | 20% | No | 0.922 | 0.031 | 0.926 | 0.047 | 0.919 | 0.033 |
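Each accuracy entry is reported as a sample mean with a sample standard deviation over repeated runs. As a small illustration of how such a pair can be computed, the sketch below uses Python's `statistics` module. The run accuracies are hypothetical values invented for the example, not results from the paper, and the number of runs is likewise an assumption.

```python
import statistics

# Hypothetical accuracies from 10 independent runs (not values from the paper).
run_accuracies = [0.95, 0.97, 0.93, 0.96, 0.98, 0.94, 0.96, 0.99, 0.92, 0.97]

mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)  # sample std: divides by n - 1

print(f"{mean_acc:.3f} +/- {std_acc:.3f}")
```

`statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would give the population version, which is slightly smaller for the same data.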
Accuracy results with the COVID-19-negative cases gathered from the Chinese paediatric repository available in [28]. LB stands for label balancing: the usual weight correction for the supervised model, and the proposed PBC for the MixMatch model.
| SSDL | COVID-19− | COVID-19+ | LB | Mean (10 labels) | Std (10 labels) | Mean (15 labels) | Std (15 labels) | Mean (20 labels) | Std (20 labels) |
|---|---|---|---|---|---|---|---|---|---|
| No | 50% | 50% | NA | 0.882 | 0.077 | 0.868 | 0.080 | 0.925 | 0.039 |
| No | 70% | 30% | Yes | 0.812 | 0.050 | 0.815 | 0.089 | 0.883 | 0.048 |
| No | 70% | 30% | No | 0.823 | 0.048 | 0.815 | 0.087 | 0.868 | 0.064 |
| No | 80% | 20% | Yes | 0.857 | 0.107 | 0.898 | 0.052 | 0.930 | 0.053 |
| No | 80% | 20% | No | 0.823 | 0.125 | 0.872 | 0.066 | 0.930 | 0.037 |
| Yes | 50% | 50% | NA | 0.945 | 0.036 | 0.950 | 0.026 | 0.963 | 0.028 |
| Yes | 70% | 30% | Yes | 0.925 | 0.042 | 0.930 | 0.053 | 0.943 | 0.034 |
| Yes | 70% | 30% | No | 0.902 | 0.058 | 0.898 | 0.091 | 0.915 | 0.044 |
| Yes | 80% | 20% | Yes | 0.947 | 0.037 | 0.957 | 0.022 | 0.962 | 0.028 |
| Yes | 80% | 20% | No | 0.847 | 0.122 | 0.857 | 0.141 | 0.895 | 0.042 |
Accuracy results with the COVID-19-negative cases gathered from the Indiana dataset [37]. LB stands for the label balancing usage (PBC in the case of SSDL).
| SSDL | COVID-19− | COVID-19+ | LB | Mean (10 labels) | Std (10 labels) | Mean (15 labels) | Std (15 labels) | Mean (20 labels) | Std (20 labels) |
|---|---|---|---|---|---|---|---|---|---|
| No | 50% | 50% | NA | 0.845 | 0.044 | 0.853 | 0.053 | 0.879 | 0.038 |
| No | 70% | 30% | Yes | 0.834 | 0.042 | 0.839 | 0.053 | 0.874 | 0.046 |
| No | 70% | 30% | No | 0.845 | 0.058 | 0.860 | 0.050 | 0.869 | 0.061 |
| No | 80% | 20% | Yes | 0.845 | 0.048 | 0.829 | 0.053 | 0.856 | 0.042 |
| No | 80% | 20% | No | 0.840 | 0.041 | 0.827 | 0.045 | 0.853 | 0.066 |
| Yes | 50% | 50% | NA | 0.905 | 0.047 | 0.918 | 0.038 | 0.908 | 0.029 |
| Yes | 70% | 30% | Yes | 0.882 | 0.067 | 0.902 | 0.046 | 0.902 | 0.042 |
| Yes | 70% | 30% | No | 0.837 | 0.078 | 0.819 | 0.109 | 0.834 | 0.037 |
| Yes | 80% | 20% | Yes | 0.860 | 0.076 | 0.889 | 0.056 | 0.885 | 0.035 |
| Yes | 80% | 20% | No | 0.803 | 0.062 | 0.747 | 0.095 | 0.795 | 0.078 |
Accuracy results with the COVID-19-negative cases gathered from the ChestX-ray8 repository available in [41]. LB stands for the label balancing usage (PBC in the case of SSDL).
| SSDL | COVID-19− | COVID-19+ | LB | Mean (10 labels) | Std (10 labels) | Mean (15 labels) | Std (15 labels) | Mean (20 labels) | Std (20 labels) |
|---|---|---|---|---|---|---|---|---|---|
| No | 50% | 50% | NA | 0.756 | 0.062 | 0.727 | 0.062 | 0.756 | 0.050 |
| No | 70% | 30% | Yes | 0.732 | 0.039 | 0.723 | 0.043 | 0.752 | 0.038 |
| No | 70% | 30% | No | 0.739 | 0.051 | 0.744 | 0.053 | 0.773 | 0.049 |
| No | 80% | 20% | Yes | 0.729 | 0.051 | 0.721 | 0.054 | 0.768 | 0.047 |
| No | 80% | 20% | No | 0.735 | 0.052 | 0.739 | 0.070 | 0.777 | 0.050 |
| Yes | 50% | 50% | NA | 0.803 | 0.059 | 0.814 | 0.052 | 0.840 | 0.038 |
| Yes | 70% | 30% | Yes | 0.816 | 0.048 | 0.815 | 0.038 | 0.839 | 0.049 |
| Yes | 70% | 30% | No | 0.782 | 0.054 | 0.760 | 0.068 | 0.782 | 0.051 |
| Yes | 80% | 20% | Yes | 0.798 | 0.050 | 0.818 | 0.044 | 0.824 | 0.039 |
| Yes | 80% | 20% | No | 0.735 | 0.056 | 0.740 | 0.075 | 0.752 | 0.048 |
Accuracy gain comparison between using no SSDL (No MM) and MixMatch with the proposed loss balancing correction (MMPBC), and between MixMatch with no balancing correction (MM) and MMPBC. The accuracy gain is evaluated for each tested number of labelled observations (10, 15 and 20). Italic entries correspond to gains that are not statistically significant according to a Wilcoxon test.
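| Dataset | COVID-19− | COVID-19+ | Comparison | Acc. gain (10 labels) | Acc. gain (15 labels) | Acc. gain (20 labels) |
|---|---|---|---|---|---|---|
| Costa Rican dataset | 70% | 30% | MM | | | |
| Costa Rican dataset | 70% | 30% | MM | | | |
| Costa Rican dataset | 80% | 20% | MM | | | |
| Costa Rican dataset | 80% | 20% | MM | | | |
| Chinese paediatric dataset | 70% | 30% | MM | | | |
| Chinese paediatric dataset | 70% | 30% | MM | | | |
| Chinese paediatric dataset | 80% | 20% | MM | | | |
| Chinese paediatric dataset | 80% | 20% | MM | | | |
| ChestX-ray8 dataset | 70% | 30% | MM | | | |
| ChestX-ray8 dataset | 70% | 30% | MM | | | |
| ChestX-ray8 dataset | 80% | 20% | MM | | | |
| ChestX-ray8 dataset | 80% | 20% | MM | | | |
| Indiana dataset | 70% | 30% | MM | | | |
| Indiana dataset | 70% | 30% | MM | | | |
| Indiana dataset | 80% | 20% | MM | | | |
| Indiana dataset | 80% | 20% | MM | | | |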
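To illustrate what an accuracy gain between two configurations looks like, the sketch below subtracts mean accuracies taken from the Costa Rican accuracy table earlier in this section (80%/20% setting, SSDL rows with and without label balancing). It assumes that the three mean columns of that table correspond to 10, 15 and 20 labelled observations and that the gain is a plain difference of means; the paper's own gain definition and values are not reproduced here and may differ.

```python
# Mean accuracies copied from the Costa Rican table, 80%/20% imbalance setting.
# Assumption: the three columns correspond to 10, 15 and 20 labelled observations.
labels = [10, 15, 20]
mixmatch_pbc = [0.957, 0.964, 0.960]    # SSDL = Yes, LB = Yes
mixmatch_plain = [0.922, 0.926, 0.919]  # SSDL = Yes, LB = No

def accuracy_gain(balanced, baseline):
    """Absolute gain in mean accuracy, rounded to three decimals."""
    return [round(b - a, 3) for b, a in zip(balanced, baseline)]

gains = accuracy_gain(mixmatch_pbc, mixmatch_plain)
for n_labels, gain in zip(labels, gains):
    print(f"{n_labels} labels: gain of {gain:.3f}")
```

A difference of means says nothing about significance on its own; that is what the Wilcoxon test mentioned in the caption addresses.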
Results for the binary classification setting with the extended test dataset (400 test images). LB stands for the label balancing usage (PBC in the case of SSDL). PBC results whose gains over the non-balanced SSDL implementation are not statistically significant are written in italic.
| COVID-19− | COVID-19+ | SSDL | LB | Mean (10 labels) | Std (10 labels) | Mean (15 labels) | Std (15 labels) | Mean (20 labels) | Std (20 labels) |
|---|---|---|---|---|---|---|---|---|---|
| 50% | 50% | No | No | 0.671 | 0.032 | 0.696 | 0.037 | 0.722 | 0.035 |
| 70% | 30% | Yes | Yes | 0.773 | 0.055 | 0.769 | 0.305 | 0.761 | 0.025 |
| 70% | 30% | Yes | No | 0.651 | 0.065 | 0.678 | 0.075 | 0.652 | 0.09 |
| 70% | 30% | No | Yes | 0.631 | 0.072 | 0.622 | 0.066 | 0.659 | 0.05 |
| 80% | 20% | Yes | Yes | 0.785 | 0.045 | 0.78 | 0.035 | 0.772 | 0.029 |
| 80% | 20% | Yes | No | 0.7 | 0.062 | 0.667 | 0.058 | 0.695 | 0.053 |
| 80% | 20% | No | Yes | 0.67 | 0.036 | 0.642 | 0.042 | 0.726 | 0.01 |
Multi-class classification accuracy using the RSNA dataset (standard-sized test dataset). LB stands for the label balancing usage (PBC in the case of SSDL). PBC results whose gains over the non-balanced SSDL implementation are not statistically significant are written in italic.
| COVID-19/Pneumonia/Normal | SSDL | LB | Mean (30 labels) | Std (30 labels) | Mean (50 labels) | Std (50 labels) | Mean (70 labels) | Std (70 labels) | Mean (90 labels) | Std (90 labels) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10%/45%/45% | No | No | 0.561 | 0.052 | 0.575 | 0.062 | 0.588 | 0.056 | 0.603 | 0.039 |
| 10%/45%/45% | Yes | No | 0.587 | 0.048 | 0.565 | 0.045 | 0.595 | 0.035 | 0.571 | 0.056 |
| 10%/45%/45% | Yes | Yes | 0.736 | 0.054 | 0.748 | 0.043 | 0.757 | 0.045 | 0.744 | 0.027 |
| 20%/40%/40% | No | No | 0.607 | 0.054 | 0.636 | 0.042 | 0.667 | 0.036 | 0.711 | 0.036 |
| 20%/40%/40% | Yes | No | 0.67 | 0.063 | 0.691 | 0.043 | 0.695 | 0.048 | 0.7 | 0.044 |
| 20%/40%/40% | Yes | Yes | 0.733 | 0.052 | 0.747 | 0.048 | 0.758 | 0.052 | 0.752 | 0.024 |
| 30%/35%/35% | No | No | 0.656 | 0.048 | 0.666 | 0.057 | 0.698 | 0.041 | 0.707 | 0.039 |
| 30%/35%/35% | Yes | No | 0.727 | 0.654 | 0.766 | 0.047 | 0.76 | 0.032 | 0.738 | 0.04 |
| 30%/35%/35% | Yes | Yes | | | | | | | | |
Multi-class classification accuracy using the BMIVC dataset (extended-sized test dataset with 300 test observations). LB stands for the label balancing usage (PBC in the case of SSDL). PBC results whose gains over the non-balanced SSDL implementation are not statistically significant are written in italic.
| COVID-19/Pneumonia/Normal | SSDL | LB | Mean | Std | Mean | Std | Mean | Std |
|---|---|---|---|---|---|---|---|---|
| 10%/45%/45% | No | No | 0.55 | 0.041 | 0.58 | 0.04 | 0.61 | 0.035 |
| 10%/45%/45% | Yes | No | 0.513 | 0.016 | 0.511 | 0.011 | 0.521 | 0.02 |
| 10%/45%/45% | Yes | Yes | 0.618 | 0.047 | 0.647 | 0.027 | 0.67 | 0.027 |
| 20%/40%/40% | No | No | 0.597 | 0.049 | 0.627 | 0.037 | 0.661 | 0.031 |
| 20%/40%/40% | Yes | No | 0.5805 | 0.045 | 0.585 | 0.036 | 0.573 | 0.029 |
| 20%/40%/40% | Yes | Yes | 0.652 | 0.036 | 0.677 | 0.03 | 0.686 | 0.035 |
| 30%/35%/35% | No | No | 0.615 | 0.036 | 0.649 | 0.012 | 0.68 | 0.24 |
| 30%/35%/35% | Yes | No | 0.671 | 0.037 | 0.675 | 0.025 | 0.695 | 0.021 |
| 30%/35%/35% | Yes | Yes | | | | | | |
Averaged and truncated confusion matrices for multi-class classification using the standard-sized test dataset, for 10 runs, using labels. From left to right: 10/45/45, 20/40/40 and 30/35/35 percent of the labels for COVID-19, pneumonia and normal diagnostics, respectively. From top to bottom: the supervised model, the SSDL model with no PBC, and the SSDL model with PBC.
Averaged and truncated confusion matrices for multi-class classification using the Valencian-Cohen dataset with 300 test images (extended-sized test dataset), for 10 runs, using the 40/40/20 percent imbalance setting (for SSDL). From left to right, using , and labels respectively. From top to bottom: the supervised model (with completely balanced labels), the SSDL model with no PBC, and the SSDL model with PBC.
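The captions above refer to confusion matrices that are averaged over runs and then truncated. A minimal sketch of that aggregation follows. It assumes that "truncated" means dropping the fractional part of each averaged cell; the two 3x3 matrices are hypothetical and serve only to show the mechanics.

```python
import math

def average_confusion(matrices, truncate=True):
    """Cell-wise mean of square confusion matrices, optionally truncated
    towards zero (an assumed reading of 'truncated')."""
    n_runs = len(matrices)
    size = len(matrices[0])
    averaged = [
        [sum(m[i][j] for m in matrices) / n_runs for j in range(size)]
        for i in range(size)
    ]
    if truncate:
        averaged = [[math.trunc(value) for value in row] for row in averaged]
    return averaged

# Hypothetical confusion matrices from two runs
# (rows: true class, columns: predicted class).
runs = [
    [[18, 1, 1], [2, 15, 3], [0, 4, 16]],
    [[17, 2, 1], [1, 17, 2], [1, 3, 16]],
]
summary = average_confusion(runs)
```

Truncation can make row totals fall slightly below the number of test images per class, which is worth keeping in mind when reading such matrices.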
Fig. 1. Training and validation curves for the SSDL model with PBC, the SSDL model with no label balancing, and the supervised model, from top to bottom. The blue dashed line corresponds to the training loss and the red continuous line to the validation loss.
Mean accuracy/F1-score/precision/recall for the Costa Rican dataset, for oversampling (OS), the original MixMatch architecture (No) and the proposed PBC imbalance correction method. LB stands for the label balancing usage (PBC in the case of SSDL).
| COVID-19− | COVID-19+ | LB | Acc./F1/prec./recall (10 labels) | Acc./F1/prec./recall (15 labels) | Acc./F1/prec./recall (20 labels) |
|---|---|---|---|---|---|
| 80% | 20% | PBC | 0.943/0.905/0.937/0.879 | 0.943/0.894/0.944/0.853 | 0.936/0.913/0.951/0.883 |
| 80% | 20% | OS | 0.877/0.718/1/0.562 | 0.881/0.727/1/0.572 | 0.882/0.698/1/0.5372 |
| 80% | 20% | No | 0.891/0.726/1/0.573 | 0.877/0.718/1/0.56 | 0.874/0.696/1/0.535 |
| 70% | 30% | PBC | 0.941/0.889/0.931/0.853 | 0.946/0.907/0.948/0.872 | 0.953/0.918/0.965/0.875 |
| 70% | 30% | OS | 0.91/0.793/0.982/0.671 | 0.903/0.828/1/0.707 | 0.906/0.798/0.996/0.669 |
| 70% | 30% | No | 0.91/0.789/0.982/0.664 | 0.903/0.778/0.996/0.64 | 0.905/0.818/1/0.696 |
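The four metrics in this table all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1, precision, recall) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, f1, precision, recall

# Hypothetical counts: no false positives, but many missed positives.
accuracy, f1, precision, recall = binary_metrics(tp=9, fp=0, tn=20, fn=7)
print(f"{accuracy:.3f}/{f1:.3f}/{precision:.3f}/{recall:.3f}")
```

The example mirrors the pattern in the OS and non-balanced rows, where a precision of 1 coexists with a low recall: the model rarely raises a false alarm but misses many positive cases, which pulls the F1-score down. Because the table reports means over runs, the mean F1-score need not equal the F1-score computed from the mean precision and mean recall.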
Fig. 2. Receiver operating characteristic (ROC) curves for binary classification (regular-sized test dataset), for the semi-supervised and supervised models with 10, 15 and 20 labelled observations (from top to bottom), for the 20/80 percent (left column) and 30/70 percent (right column) imbalance settings (COVID and non-COVID classes). The yellow ‘x’ line corresponds to the ROC curve of the SSDL model with PBC, the red dashed line to the SSDL model with no imbalance correction, and the blue continuous curve to the supervised model. As usual, the x-axis corresponds to the false positive ratio, and the y-axis to the true positive ratio.
Fig. 3. Sample of receiver operating characteristic (ROC) curves for binary classification using the Valencia dataset (400 test images), for the semi-supervised and supervised models with 10, 15 and 20 labelled observations (from top to bottom), for the 20/80 percent imbalance setting (COVID and non-COVID classes). The yellow ‘x’ line corresponds to the ROC curve of the SSDL model with PBC, the red dashed line to the SSDL model with no imbalance correction, and the blue continuous curve to the supervised model. As usual, the x-axis corresponds to the false positive ratio, and the y-axis to the true positive ratio.
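A ROC curve such as those described in the two captions above is traced by sweeping the decision threshold over the model's scores and recording the false positive and true positive ratios at each step. The following is a minimal pure-Python sketch of that construction; the scores and labels are hypothetical, and the code assumes all scores are distinct (ties would need to be grouped).

```python
def roc_points(scores, labels):
    """Return (false positive ratio, true positive ratio) pairs obtained by
    lowering the decision threshold past one sample at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical scores for the positive class and their true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 0, 1, 1, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

A curve closer to the top-left corner, and hence a larger area, indicates better separation of the two classes across thresholds.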
Fig. 4. From top to bottom: two sample heatmaps for correct predictions using the Indiana dataset and two samples from the ChestX-ray8 dataset, respectively. From left to right: the original image, the heatmap of the MixMatch model trained with the proposed PBC, and the output of the supervised model.