Saul Calderon-Ramirez1,2, Shengxiang Yang1, David Elizondo1, Armaghan Moemeni3.
Abstract
In the context of the global coronavirus pandemic, different deep learning solutions for infected-subject detection using chest X-ray images have been proposed. However, deep learning models usually need large labelled datasets to be effective. Semi-supervised deep learning is an attractive alternative, where unlabelled data is leveraged to improve the overall model's accuracy. However, in real-world usage settings, the unlabelled dataset might present a different distribution than the labelled dataset (e.g. the labelled dataset was sampled from a target clinic and the unlabelled dataset from a source clinic), resulting in a distribution mismatch between the unlabelled and labelled datasets. In this work, we assess the impact of the distribution mismatch between the labelled and unlabelled datasets for a semi-supervised model trained with chest X-ray images for COVID-19 detection. Under strong distribution mismatch conditions, we found an accuracy hit of almost 30%, suggesting that the unlabelled dataset distribution has a strong influence on the behaviour of the model. Therefore, we propose a straightforward approach to diminish the impact of such distribution mismatch. Our proposed method uses a density approximation of the feature space, built upon the target dataset, to filter out the observations in the source unlabelled dataset that might harm the accuracy of the semi-supervised model. It assumes that a small labelled target dataset is available together with a larger source unlabelled dataset. Our proposed method does not require any model training; it is simple and computationally cheap. We compare it against two popular state-of-the-art out-of-distribution data detectors, which are also cheap and simple to implement. In our tests, our method yielded accuracy gains of up to 32% when compared to these state-of-the-art methods.
The good results yielded by our method lead us to argue in favour of a more data-centric approach to improving model accuracy. Furthermore, the developed method can be used to measure data effectiveness for semi-supervised deep learning model training.
Keywords: Chest X-ray; Computer aided diagnosis; Covid-19; Distribution mismatch; MixMatch; Out of distribution detection; Semi-supervised deep learning
Year: 2022 PMID: 35573166 PMCID: PMC9085448 DOI: 10.1016/j.asoc.2022.108983
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 8.263
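The abstract describes the proposed method as a density approximation of the labelled target dataset's feature space, used to filter out potentially harmful observations from the source unlabelled dataset. The following is a minimal sketch of such a filter, assuming feature vectors have already been extracted (e.g. from a pretrained CNN) and using simple per-dimension histograms as the density model; the exact density estimator and filtering rule used in the paper may differ.

```python
import numpy as np

def histogram_density_scores(feats_labelled, feats_unlabelled, bins=15):
    """Score unlabelled observations by how well they fit per-dimension
    histograms of the labelled feature space (higher = more in-distribution).
    Hypothetical sketch of a feature-histogram OOD filter."""
    n_dims = feats_labelled.shape[1]
    scores = np.zeros(len(feats_unlabelled))
    for d in range(n_dims):
        hist, edges = np.histogram(feats_labelled[:, d], bins=bins, density=True)
        # Map each unlabelled value to its histogram bin, clamping out-of-range
        # values to the edge bins, and accumulate the log-density.
        idx = np.clip(np.digitize(feats_unlabelled[:, d], edges) - 1, 0, bins - 1)
        scores += np.log(hist[idx] + 1e-12)
    return scores

def filter_unlabelled(feats_labelled, feats_unlabelled, discard_frac=0.35):
    """Keep only the unlabelled observations whose density score is above the
    discard_frac quantile (i.e. discard the least in-distribution fraction)."""
    scores = histogram_density_scores(feats_labelled, feats_unlabelled)
    return scores >= np.quantile(scores, discard_frac)
```

Because no model training is involved, the cost is a single pass over precomputed features, consistent with the abstract's claim that the method is simple and computationally cheap.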
SSDL error rates (lower is better) reported in the literature for state-of-the-art methods on the SVHN dataset, using the numbers of labels most frequently reported in the literature.
| Model | Category | |||
|---|---|---|---|---|
| Supervised only | Supervised | – | ||
| Pi Model (Pi-M) | 6.83 ± 0.66 | 4.82 ± 0.17 | – | |
| Temporal Ensemble Model (TEM) | 5.12 ± 0.13 | 4.42 ± 0.16 | – | |
| Virtual Adversarial Training with Entropy Minimization (VATM+EM) | – | 3.86 ± 0.22 | – | |
| Virtual Adversarial Training Model (VATM) | – | 5.42 ± 0.22 | – | |
| Mean Teacher Model (MTM) | 4.18 ± 0.5 | 3.95 ± 0.19 | – | |
| Self Supervised network Model (SESEMI) | 6.5 ± 0.28 | 5.59 ± 0.12 | – | |
| Mutual Exclusivity-Transformation Model (METM) | 9.62 ± 1.37 | 4.52 ± 0.4 | 3.66 ± 0.14 | |
| Walker Model (WaM) | 6.25 ± 0.32 | 5.14 ± 0.17 | 4.6 ± 0.21 | |
| Transductive Model (TransM) | Consistency based SSDL | 4.32 ± 0.3 | 3.8 ± 0.27 | 3.35 ± 0.27 |
| Transductive Model with Mean Teacher (TransM+MTM) | 4.09 ± 0.42 | 3.09 ± 0.27 | 3.35 ± 0.27 | |
| Memory based Model (MeM) | – | 4.21 ± 0.12 | – | |
| MixMatch | – | 3.5 ± 0.28 | – | |
| ReMixMatch | Consistency and Pseudo-label based SSDL | – | 2.65 ± 0.08 | – |
| FixMatch using Random Augmentation | – | 2.28 ± 0.11 | – | |
| FixMatch using CTA Augmentation | – | 2.36 ± 0.19 | – | |
| Tri-Net | – | 3.71 ± 0.14 | – | |
| Speed as a supervisor for SSDL (SaaSM) | Pseudo-label based SSDL | – | 3.82 ± 0.09 | – |
| Tri-Net with the Pi-M | – | 3.45 ± 0.1 | – | |
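Several of the methods above, including MixMatch (the SSDL method used throughout this work), build a guessed pseudo-label for each unlabelled image and then mix labelled and unlabelled observations. Two core MixMatch operations, temperature sharpening of the averaged pseudo-label and MixUp interpolation, can be sketched as follows; this is a simplified illustration, not the authors' implementation.

```python
import numpy as np

def sharpen(p, T=0.5):
    """Sharpen a guessed label distribution by temperature T (< 1 sharpens)."""
    pt = np.power(p, 1.0 / T)
    return pt / pt.sum(axis=-1, keepdims=True)

def mixup(x1, y1, x2, y2, alpha=0.75):
    """MixUp two (input, label) pairs; MixMatch keeps lambda >= 0.5 so the
    mixed example stays closer to the first pair."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

The full MixMatch objective then combines a supervised cross-entropy term on mixed labelled data with a consistency term on mixed unlabelled data.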
State-of-the-art SSDL methods robust to distribution mismatch. Unseen classes are the most frequently tested cause of distribution mismatch. Our proposed method tests the covariate and prior probability shift causes of distribution mismatch, and implements a feature-space-based method for scoring unlabelled data.
| Method name | IID violation cause | Thresholding | OOD data filtering approach |
|---|---|---|---|
| RealMix | Unseen classes | Hard | Output based |
| UASD | Unseen classes | Hard | Output based |
| DS3L | Unseen classes | Soft | Optimization based |
| R-SSL | Unseen classes | Soft | Optimization based |
OOD test benchmarks for different techniques. Datasets marked with * were randomly split in half: one half was used as in-distribution labelled training data and the other half as OOD unlabelled data. The table reveals how arbitrarily different testbeds have been used for benchmarking OOD detection algorithms, using the unseen-classes cause of IID assumption violation. IOD-OOD dataset pairs are indicated by paired numbers in the table.
| Method name | IOD data | OOD data | Category |
|---|---|---|---|
| Max. value of Softmax layer | CIFAR-10 (1) | SUN | |
| | CIFAR-100 (2) | Gaussian | |
| | MNIST (3) | Omniglot (3) | |
| | | notMNIST (3) | |
| | | Uniform noise (3) | |
| Inhibited Softmax | CIFAR-10 (1) | SVHN (1) | |
| | MNIST (2) | LFW-A (1) | |
| | | notMNIST (2) | |
| | | Omniglot (2) | |
| ODIN | CIFAR-10 (1) | TinyImageNet | Output |
| | CIFAR-100 (2) | LSUN | |
| | | iSUN | |
| | | Uniform | |
| | | Gaussian | |
| Epistemic Uncertainty Estimation | CIFAR* (1) | CIFAR* (1) | |
| | FashionMNIST* (2) | FashionMNIST* | |
| | SVHN* (3) | SVHN* (3) | |
| | MNIST* (4) | MNIST* (4) | |
| Mahalanobis Latent Distance | CIFAR-10 (1) | SVHN | |
| | CIFAR-100 (2) | CIFAR-10 (3) | |
| | SVHN (3) | TinyImageNet | |
| | | LSUN | |
| Deterministic Uncertainty quantification | CIFAR-10 | SVHN | Feature |
| Deep Residual Flow | CIFAR-10 (1) | CIFAR-10 (3) | |
| | CIFAR-100 (2) | TinyImageNet | |
| | SVHN (3) | LSUN | |
| | | SVHN | |
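The Mahalanobis latent distance listed above scores an observation by its distance to the labelled data's feature distribution, and a Mahalanobis-based filter is also one of the baselines compared in this work. Below is a minimal sketch over precomputed feature vectors, using a single Gaussian fit; the detector cited in the literature fits class-conditional Gaussians, so treat this as a simplified variant.

```python
import numpy as np

def mahalanobis_scores(feats_labelled, feats_unlabelled):
    """Score unlabelled observations by negative squared Mahalanobis distance
    to the labelled feature distribution (higher = more in-distribution)."""
    mu = feats_labelled.mean(axis=0)
    cov = np.cov(feats_labelled, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize for numerical invertibility
    inv_cov = np.linalg.inv(cov)
    diff = feats_unlabelled - mu
    # Per-row quadratic form diff @ inv_cov @ diff^T
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return -d2
```

Like the feature-histogram filter, this needs no extra model training: only a mean and covariance estimated from the labelled features.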
COVID-19 observation sources description used in this work.
| Dataset | CR | Chinese | ChestX-ray8 | Indiana |
|---|---|---|---|---|
| No. of patients | 105 | 5856 | 65240 | 4000 |
| Patient’s age range (years) | 7–86 | children | 0–94 | adults |
| No. of obs. | 105 | 5236 | 224316 | 8121 |
| Hospital/clinic | Clinica Chavarria | No info. | Stanford Hospital | Indiana Network for Patient Care |
| Im. resolution | 1907 × 1791 | 1300 × 600 | 1024 × 1024 | 1400 × 1400 |
| Reference | ||||
Fig. 1 Row 1, column 1: a COVID-19 observation from [11]; row 1, column 2: a COVID-19 observation from the Chinese dataset [13]; row 2, column 1: a ChestX-ray8 COVID-19 image [14]; row 2, column 2: an Indiana dataset COVID-19 sample image [15]. The bottom image is a sample from the Costa Rica dataset [22]. As can be seen, images from the Costa Rica dataset include a black frame.
TB-1.1 results: Accuracy of an Alexnet model trained with MixMatch with different datasets. The ChestX-ray8, Costa Rican and Chinese unlabelled datasets include only COVID-19 observations.
| Dataset | ||
|---|---|---|
| Supervised | ||
| Indiana (with COVID-19) | | |
| China | ||
| Costa Rica | ||
| ChestX-ray8 | ||
| ChestX-ray8 65% - Costa Rica 35% | ||
| ChestX-ray8 35% - Costa Rica 65% | ||
| China 65% - Costa Rica 35% | ||
| China 35% - Costa Rica 65% | ||
| Indiana 65% - Costa Rica 35% | ||
| Indiana 35% - Costa Rica 65% |
Fig. 2 Summary of the proposed unlabelled data scoring methods for SSDL.
TB-1.1 results: Accuracy of a Densenet model trained with MixMatch with different datasets. The ChestX-ray8, Costa Rican and Chinese unlabelled datasets include only COVID-19 observations. No fine-tuned feature extractor was used.
| Dataset | ||
|---|---|---|
| Supervised | ||
| Indiana (with COVID-19) | | |
| China | ||
| Costa Rica | ||
| ChestX-ray8 | ||
| ChestX-ray8 65% - Costa Rica 35% | ||
| ChestX-ray8 35% - Costa Rica 65% | ||
| China 65% - Costa Rica 35% | ||
| China 35% - Costa Rica 65% | ||
| Indiana 65% - Costa Rica 35% | ||
| Indiana 35% - Costa Rica 65% |
TB-1.1 results: Accuracy of a Densenet model trained with MixMatch with different datasets. The ChestX-ray8, Costa Rican and Chinese unlabelled datasets include only COVID-19 observations. The fine-tuned feature extractor was used.
| Dataset | ||
|---|---|---|
| Supervised | ||
| Indiana (with COVID-19) | | |
| China | ||
| Costa Rica | ||
| ChestX-ray8 | ||
| ChestX-ray8 65% - Costa Rica 35% | ||
| ChestX-ray8 35% - Costa Rica 65% | ||
| China 65% - Costa Rica 35% | ||
| China 35% - Costa Rica 65% | ||
| Indiana 65% - Costa Rica 35% | ||
| Indiana 35% - Costa Rica 65% |
TB-1.2 test results: Pearson coefficient between the accuracy and the calculated divergences.
| SSDL model | | Pearson coefficient |
|---|---|---|
| Alexnet | 20 | −0.798 |
| Alexnet | 40 | −0.75 |
| Densenet | 20 | −0.665 |
| Densenet | 40 | −0.662 |
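The negative Pearson coefficients above indicate that accuracy falls as the divergence between the labelled and unlabelled feature distributions grows. The coefficient itself is the standard normalized covariance between the two quantities:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float(xm @ ym / np.sqrt((xm @ xm) * (ym @ ym)))
```

With illustrative accuracy/divergence pairs (not the paper's actual values), e.g. `pearson([0.90, 0.80, 0.65], [0.1, 0.4, 0.9])`, the result is strongly negative, matching the trend reported in the table.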
Accuracy of an Alexnet model trained with MixMatch on the filtered datasets, using the harm coefficient with the two proposed feature-density-based methods: the feature histogram (FH) filter and the Mahalanobis-based filter. The percentage of discarded observations equals the proportion of Costa Rican observations.
| Dataset | Acc. FD | Acc. Maha. | Acc. FD | Acc. Maha. |
|---|---|---|---|---|
| ChestX-ray8 35% - Costa Rica 65% | ||||
| ChestX-ray8 65% - Costa Rica 35% | ||||
| China 35% - Costa Rica 65% | ||||
| China 65% - Costa Rica 35% | ||||
| Indiana 35% - Costa Rica 65% | ||||
| Indiana 65% - Costa Rica 35% | ||||
Accuracy of a Densenet model trained with MixMatch on the filtered datasets, using the harm coefficient with the two proposed feature-density-based methods: the feature histogram (FH) filter and the Mahalanobis-based filter. The percentage of discarded observations equals the proportion of Costa Rican observations.
| Dataset | Acc. FD | Acc. Maha. | Acc. FD | Acc. Maha. |
|---|---|---|---|---|
| ChestX-ray8 35% - Costa Rica 65% | ||||
| ChestX-ray8 65% - Costa Rica 35% | ||||
| China 35% - Costa Rica 65% | ||||
| China 65% - Costa Rica 35% | ||||
| Indiana 35% - Costa Rica 65% | ||||
| Indiana 65% - Costa Rica 35% | ||||
Average and standard deviation of the execution time, in seconds, of the different harmful unlabelled data filtering techniques tested in this work. Execution time was measured over 10 random data batches.
| Harmful data filter | Time (s) |
|---|---|
| Mahalanobis | |
| Feature Histograms | |
| Softmax | |
| Monte Carlo Dropout |
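The Softmax and Monte Carlo Dropout (MCD) filters timed above are output-based: they score each unlabelled observation from the model's predictions rather than from its feature space. A minimal sketch of both scores, assuming the logits (and, for MCD, a stack of T stochastic softmax outputs from dropout-enabled forward passes) have already been computed:

```python
import numpy as np

def max_softmax_scores(logits):
    """Maximum softmax probability per observation (higher = more in-distribution)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    p = e / e.sum(axis=1, keepdims=True)
    return p.max(axis=1)

def mcd_scores(stochastic_probs):
    """Monte Carlo Dropout score: negative mean predictive variance across
    T stochastic passes, shape (T, n_obs, n_classes); higher = more certain."""
    return -stochastic_probs.var(axis=0).mean(axis=1)
```

Both require forward passes through the trained model, which is why they can be slower than the feature-density filters, as the timing table suggests.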
TB-1.2 results: Cosine DeDiM distance between the labelled and unlabelled datasets, computed using 10 different batches of 80 observations. An Alexnet feature extractor was used to keep the computing cost low.
| Dataset | |
|---|---|
| China | |
| Costa Rica | |
| ChestX-ray8 | |
| ChestX-ray8 65% - Costa Rica 35% | |
| ChestX-ray8 35% - Costa Rica 65% | |
| China 65% - Costa Rica 35% | |
| China 35% - Costa Rica 65% | |
| Indiana 65% - Costa Rica 35% | |
| Indiana 35% - Costa Rica 65% |
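The DeDiM distance tabulated above summarizes how far apart two datasets lie in feature space. A heavily simplified, hypothetical sketch of a cosine dissimilarity between dataset embeddings is shown below; the actual DeDiM measure defined in the paper aggregates over batches and may differ in detail.

```python
import numpy as np

def cosine_dataset_distance(feats_a, feats_b):
    """Cosine dissimilarity between the mean feature embeddings of two
    datasets: 0 = same direction, up to 2 = opposite directions."""
    ca, cb = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cos = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb) + 1e-12)
    return float(1.0 - cos)
```

In the batched setting of the table, this distance would be computed per batch of 80 observations and then averaged over the 10 batches.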
Accuracy of an Alexnet model trained with MixMatch on the filtered datasets, using the harm coefficient with the two output-based methods: Monte Carlo Dropout (MCD) and Softmax. The percentage of discarded observations equals the proportion of Costa Rican observations.
| Dataset | Acc. Softmax | Acc. MCD | Acc. Softmax | Acc. MCD |
|---|---|---|---|---|
| ChestX-ray8 35% - Costa Rica 65% | ||||
| ChestX-ray8 65% - Costa Rica 35% | ||||
| China 35% - Costa Rica 65% | ||||
| China 65% - Costa Rica 35% | ||||
| Indiana 35% - Costa Rica 65% | ||||
| Indiana 65% - Costa Rica 35% | ||||
Accuracy of a Densenet model trained with MixMatch on the filtered datasets, using the harm coefficient with the two output-based methods: Monte Carlo Dropout (MCD) and Softmax. The percentage of discarded observations equals the proportion of Costa Rican observations.
| Dataset | Acc. Softmax | Acc. MCD | Acc. Softmax | Acc. MCD |
|---|---|---|---|---|
| ChestX-ray8 35% - Costa Rica 65% | ||||
| ChestX-ray8 65% - Costa Rica 35% | ||||
| China 35% - Costa Rica 65% | ||||
| China 65% - Costa Rica 35% | ||||
| Indiana 35% - Costa Rica 65% | ||||
| Indiana 65% - Costa Rica 35% | ||||