Anusua Trivedi, Caleb Robinson, Marian Blazes, Anthony Ortiz, Jocelyn Desbiens, Sunil Gupta, Rahul Dodhia, Pavan K Bhatraju, W Conrad Liles, Jayashree Kalpathy-Cramer, Aaron Y Lee, Juan M Lavista Ferres.
Abstract
In response to the COVID-19 global pandemic, recent research has proposed creating deep learning-based models that use chest radiographs (CXRs) in a variety of clinical tasks to help manage the crisis. However, existing datasets of CXRs from COVID-19+ patients are relatively small, and researchers often pool CXR data from multiple sources, for example, from different x-ray machines, patient populations, and clinical scenarios. Deep learning models trained on such datasets have been shown to overfit to erroneous features instead of learning pulmonary characteristics, a phenomenon known as shortcut learning. We propose adding feature disentanglement to the training process. This technique forces the models to identify pulmonary features from the images and penalizes them for learning features that can discriminate between the original datasets that the images come from. We find that models trained in this way indeed have better generalization performance on unseen data; in the best case we found that it improved AUC by 0.13 on held-out data. We further find that this outperforms masking out non-lung parts of the CXRs and performing histogram equalization, both of which are recently proposed methods for removing biases in CXR datasets.
Year: 2022 PMID: 36201483 PMCID: PMC9536609 DOI: 10.1371/journal.pone.0274098
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
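Regarding the two baselines mentioned in the abstract (lung masking and histogram equalization): a minimal Python sketch of that style of preprocessing, assuming an 8-bit grayscale CXR and a precomputed binary lung mask from some segmentation step (the paper's exact pipeline is not given in this record), might look like:

```python
# Hypothetical sketch of the masking + histogram-equalization preprocessing
# the abstract compares against; the authors' exact pipeline may differ.
import cv2
import numpy as np

def mask_and_equalize(cxr: np.ndarray, lung_mask: np.ndarray) -> np.ndarray:
    """cxr: 8-bit grayscale CXR; lung_mask: same-shape binary array of {0, 1}."""
    equalized = cv2.equalizeHist(cxr)               # spread intensities over the full 0-255 range
    return equalized * lung_mask.astype(cxr.dtype)  # zero out non-lung pixels
```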
Dataset overview.
Counts of disease label type per dataset. The COVIDx dataset is made up of 5 sub-datasets and the CC-CCII dataset is used as a held-out test set.
| Dataset | Normal | Pneumonia | COVID-19+ |
|---|---|---|---|
| COVIDx (all) | 8,851 | 6,040 | 421 |
| Cohen et al. | – | 28 | 289 |
| Figure1 | – | – | 35 |
| ActualMed | – | – | 51 |
| SIRM | – | – | 46 |
| RSNA | 8,851 | 6,012 | – |
| CC-CCII | 11,604 | 18,236 | 1,690 |
Fig 1. Overview of the feature disentanglement modeling approach.
We propose to learn a model that simultaneously predicts the class label and the domain label for a given CXR image. The parameters of the model are updated to extract representations that contain information about the class label but not about the domain label.
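This record does not include the authors' implementation; one standard way to realize "predict the class label but not the domain label" is an adversarial domain head behind a gradient reversal layer (DANN-style). The following PyTorch sketch is a hypothetical rendering of that idea, with the backbone, head sizes, and loss weighting all assumed:

```python
# Hypothetical DANN-style feature disentanglement sketch (PyTorch); the
# paper's exact architecture and loss formulation may differ.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on the
    backward pass, so features that help the domain head are penalized."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DisentangledClassifier(nn.Module):
    def __init__(self, backbone, feat_dim, n_classes, n_domains, lam=1.0):
        super().__init__()
        self.backbone = backbone                 # e.g. a CNN trunk emitting feat_dim features
        self.task_head = nn.Linear(feat_dim, n_classes)
        self.domain_head = nn.Linear(feat_dim, n_domains)
        self.lam = lam

    def forward(self, x):
        z = self.backbone(x)                     # shared representation z'
        task_logits = self.task_head(z)          # disease-label prediction
        domain_logits = self.domain_head(GradientReversal.apply(z, self.lam))
        return task_logits, domain_logits

def train_step(model, optimizer, x, y_class, y_domain, ce=nn.CrossEntropyLoss()):
    """Minimize both heads' losses; the reversed gradient drives z' toward
    domain invariance while task information is preserved."""
    optimizer.zero_grad()
    task_logits, domain_logits = model(x)
    loss = ce(task_logits, y_class) + ce(domain_logits, y_domain)
    loss.backward()
    optimizer.step()
    return loss.item()
```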
Results showing how well logistic regression classifiers can identify which sub-dataset a CXR is from within the COVIDx dataset, and how well classifiers can identify which dataset a “COVID-19+” CXR is from across both the COVIDx and CC-CCII datasets.
We report AUC values as averages of the one-vs-all binary AUCs between all classes, and accuracy (ACC) as the average accuracy over all classes. We observe that the representations generated by the classifiers, even from masked/equalized inputs, contain enough information to accurately identify the sources of the imagery in both cases.
| Representation | Dim. | COVIDx datasets | | All COVID-19+ samples | |
|---|---|---|---|---|---|
| | | AUC | ACC | AUC | ACC |
| Unmasked inputs | | | | | |
| Pixel intensity histogram | 256 | 0.75 ± 0.03 | 0.36 ± 0.04 | 0.83 ± 0.03 | 0.47 ± 0.07 |
| Torchxrayvision embedding | 1024 | 0.93 ± 0.03 | 0.54 ± 0.09 | 0.93 ± 0.03 | 0.59 ± 0.07 |
| ImageNet embedding | 1024 | 0.95 ± 0.02 | 0.53 ± 0.07 | 0.97 ± 0.01 | 0.66 ± 0.09 |
| COVID-Net embedding | 2048 | 0.89 ± 0.05 | 0.39 ± 0.05 | 0.91 ± 0.03 | 0.45 ± 0.04 |
| Masked/equalized inputs | | | | | |
| Pixel intensity histogram | 256 | 0.57 ± 0.06 | 0.27 ± 0.06 | 0.75 ± 0.02 | 0.25 ± 0.04 |
| Torchxrayvision embedding | 1024 | 0.76 ± 0.05 | 0.35 ± 0.07 | 0.88 ± 0.02 | 0.42 ± 0.08 |
| ImageNet embedding | 1024 | 0.84 ± 0.02 | 0.34 ± 0.04 | 0.91 ± 0.02 | 0.50 ± 0.10 |
| COVID-Net embedding | 2048 | 0.71 ± 0.04 | 0.26 ± 0.04 | 0.85 ± 0.04 | 0.35 ± 0.05 |
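As a concrete (hypothetical) rendering of the probe described above, the source-identification experiment could be run with scikit-learn on precomputed embeddings; `probe_domain_info` and its inputs are illustrative names, and the paper's exact train/test protocol is not given in this record:

```python
# Hypothetical logistic-regression probe: how much dataset-of-origin
# information do fixed embeddings carry? Names and splits are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def probe_domain_info(embeddings, domain_labels, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, domain_labels, test_size=0.3,
        random_state=seed, stratify=domain_labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # AUC: average of the one-vs-rest binary AUCs over all domain classes.
    auc = roc_auc_score(y_te, clf.predict_proba(X_te),
                        multi_class="ovr", average="macro")
    # ACC: average of the per-class accuracies over all domain classes.
    preds = clf.predict(X_te)
    acc = np.mean([np.mean(preds[y_te == c] == c) for c in np.unique(y_te)])
    return auc, acc
```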
Results showing within-dataset class performance, within-dataset domain performance, and out-of-sample class performance from training models with the COVIDx dataset.
The task performance (task AUC and task accuracy) shows how well classifiers are able to distinguish between “Normal”, “Pneumonia”, and “COVID-19+” disease labels, while the domain performance (domain AUC) shows how well classifiers are able to distinguish which sub-dataset an image belongs to. We report AUC values as averages of the one-vs-all binary AUCs between all classes, and accuracy (ACC) as the average accuracy over all classes. In all cases class performance (both within-dataset and out-of-sample) is reported from the classifier trained on samples within-dataset, while domain performance is reported from an additional classifier trained to predict domain labels on top of the learned representations, z′, as a measure of how much domain information the representation contains. We observe that using feature disentanglement decreases within-dataset domain performance as expected, and increases out-of-sample class performance—i.e. improves generalization performance.
| | COVIDx | | CC-CCII | |
|---|---|---|---|---|
| Torchxrayvision Embeddings | Task AUC | Domain AUC | Task AUC | Task ACC |
| Unmasked | 0.97 ± 0.01 | 0.94 ± 0.02 | 0.55 ± 0.03 | 0.34 ± 0.04 |
| Masked/equalized | 0.92 ± 0.01 | 0.85 ± 0.03 | 0.65 ± 0.02 | 0.42 ± 0.02 |
| Unmasked + Disentanglement | 0.90 ± 0.03 | 0.56 ± 0.07 | 0.68 ± 0.04 | 0.49 ± 0.02 |
| Masked/equalized + Disentanglement | 0.87 ± 0.02 | 0.53 ± 0.06 | 0.71 ± 0.03 | 0.47 ± 0.02 |
| ImageNet Embeddings | Task AUC | Domain AUC | Task AUC | Task ACC |
| Unmasked | 0.96 ± 0.01 | 0.97 ± 0.02 | 0.64 ± 0.02 | 0.37 ± 0.01 |
| Masked/equalized | 0.94 ± 0.01 | 0.88 ± 0.03 | 0.67 ± 0.03 | 0.41 ± 0.03 |
| Unmasked + Disentanglement | 0.85 ± 0.03 | 0.57 ± 0.07 | 0.73 ± 0.03 | 0.43 ± 0.02 |
| Masked/equalized + Disentanglement | 0.85 ± 0.03 | 0.57 ± 0.04 | 0.73 ± 0.03 | 0.46 ± 0.02 |
| COVID-Net Embeddings | Task AUC | Domain AUC | Task AUC | Task ACC |
| Unmasked | 0.96 ± 0.01 | 0.93 ± 0.02 | 0.59 ± 0.02 | 0.38 ± 0.01 |
| Masked/equalized | 0.89 ± 0.02 | 0.80 ± 0.04 | 0.63 ± 0.02 | 0.44 ± 0.02 |
| Unmasked + Disentanglement | 0.92 ± 0.02 | 0.53 ± 0.10 | 0.67 ± 0.03 | 0.41 ± 0.02 |
| Masked/equalized + Disentanglement | 0.83 ± 0.02 | 0.52 ± 0.07 | 0.62 ± 0.01 | 0.45 ± 0.00 |
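The "additional classifier trained to predict domain labels on top of the learned representations z′" from the caption above can be approximated by chaining the two sketches given earlier: freeze the trained backbone, extract z′, and refit the logistic-regression probe. The loader's (image, class label, domain label) batch format is an assumption:

```python
# Hypothetical glue code: extract frozen z' features, then reuse the
# probe_domain_info sketch above to measure residual domain information.
import torch

@torch.no_grad()
def extract_features(model, loader, device="cpu"):
    model.eval()
    feats, domains = [], []
    for x, _, d in loader:  # assumed (image, class label, domain label) batches
        feats.append(model.backbone(x.to(device)).cpu())
        domains.append(d)
    return torch.cat(feats).numpy(), torch.cat(domains).numpy()

# domain_auc, domain_acc = probe_domain_info(*extract_features(model, loader))
```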
Fig 2. UMAP projections of features learned by models trained with and without feature disentanglement on unmasked imagery.
Each point represents a CXR from the COVIDx dataset. The top row colors points by their domain label (which sub-dataset of the COVIDx dataset they belong to), while the bottom row colors points by their disease label. We observe that without feature disentanglement the learned representations easily separate the datasets, despite not being trained for this task, whereas with feature disentanglement they do not clearly separate the datasets.
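A projection along the lines of Fig 2 could be produced with the umap-learn package; this sketch is hypothetical, the paper's UMAP settings are not given in this record, and labels are assumed to be integer-encoded:

```python
# Hypothetical sketch of the Fig 2 visualization: 2-D UMAP of learned
# features, colored by domain label and by disease label.
import matplotlib.pyplot as plt
import umap  # from the umap-learn package

def plot_feature_projection(features, domain_labels, disease_labels):
    xy = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
    fig, (ax_dom, ax_cls) = plt.subplots(1, 2, figsize=(10, 4))
    ax_dom.scatter(xy[:, 0], xy[:, 1], c=domain_labels, s=3, cmap="tab10")
    ax_dom.set_title("Colored by domain (sub-dataset)")
    ax_cls.scatter(xy[:, 0], xy[:, 1], c=disease_labels, s=3, cmap="tab10")
    ax_cls.set_title("Colored by disease label")
    return fig
```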