| Literature DB >> 34052882 |
Daphna Keidar1, Daniel Yaron2, Elisha Goldstein3, Yair Shachar4, Ayelet Blass2, Leonid Charbinsky5, Israel Aharony5, Liza Lifshitz5, Dimitri Lumelsky5, Ziv Neeman5, Matti Mizrachi6,7, Majd Hajouj6,7, Nethanel Eizenbach6,7, Eyal Sela6,7, Chedva S Weiss8, Philip Levin8, Ofer Benjaminov8, Gil N Bachar9,10, Shlomit Tamir9,10, Yael Rapson9,10, Dror Suhami9,10, Eli Atar9,10, Amiel A Dror6,7, Naama R Bogot8, Ahuva Grubstein9,10, Nogah Shabshin5, Yishai M Elyada11, Yonina C Eldar12.
Abstract
OBJECTIVES: In the midst of the coronavirus disease 2019 (COVID-19) outbreak, chest X-ray (CXR) imaging is playing an important role in the diagnosis and monitoring of patients with COVID-19. We propose a deep learning model for the detection of COVID-19 from CXRs, as well as a tool for retrieving similar patients according to the model's results on their CXRs. For training and evaluating our model, we collected CXRs from inpatients hospitalized in four different hospitals.
Keywords: COVID-19; Machine learning; Radiography; Thoracic; X-rays
Year: 2021 PMID: 34052882 PMCID: PMC8164481 DOI: 10.1007/s00330-021-08050-1
Source DB: PubMed Journal: Eur Radiol ISSN: 0938-7994 Impact factor: 7.034
Fig. 1 Full pipeline workflow overview. Each image first undergoes preprocessing, consisting of augmentation, a set of visual transformations (transformations shown: (a) original image, (b) brighten, (c) horizontal flip, (d) 7-degree rotation, (e) CLAHE transformation, (f) scale); normalization, which sets a standard image size and intensity scale; and segmentation, which emphasizes the area of the lungs and is combined with the image. The entire image set is then fed into a neural network, which produces a classification outcome for each image as positive or negative for coronavirus disease 2019 (COVID-19). In addition, embedded features are extracted from the last layer of the network and are used to find images with characteristics similar to a given image, as learned by the network
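The augmentation and normalization steps of the pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the brightness factor, the toy 4×4 image, and the zero-mean/unit-variance normalization are assumptions for demonstration (the paper additionally applies CLAHE, rotation, scaling, and lung segmentation, which are omitted here).

```python
import numpy as np

def brighten(img: np.ndarray, factor: float = 1.2) -> np.ndarray:
    """Scale pixel intensities, clipping to the valid [0, 255] range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror the image left-right (one of the augmentations in Fig. 1)."""
    return img[:, ::-1]

def normalize(img: np.ndarray) -> np.ndarray:
    """Map pixel values to zero mean and unit variance before the network."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)

# Toy 4x4 stand-in for a CXR, just to exercise the pipeline
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
aug = horizontal_flip(brighten(img))
net_input = normalize(aug)
```

In practice each augmentation would be applied stochastically per training epoch, so the network sees a different variant of each CXR on each pass.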
Demographic statistics on patients and chest images in this study
| Label | No. of patients | No. of images | Sex (men/women/unknown) | Age (years mean ± std) |
|---|---|---|---|---|
| COVID-19 positive | 360 | 1191 | 199 (55%)/132 (36%)/29 (9%) | 60 ± 18 |
| COVID-19 negative | 1024 | 1135 | 353 (34%)/323 (32%)/348 (34%) | 65 ± 19 |
Comparison of accuracy, sensitivity, and specificity of various deep networks trained and tested on the same test set
| Training model | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| ResNet34 | 86.8 (305 of 351) | 83.8 (151 of 180) | 90.0 (154 of 171) |
| **ResNet50** | **90.0 (316 of 351)** | **90.5 (163 of 180)** | **89.4 (153 of 171)** |
| ResNet50 - No preprocessing | 85.1 (298 of 350) | 82.1 (147 of 179) | 88.3 (151 of 171) |
| ResNet152 | 87.1 (306 of 351) | 83.3 (150 of 180) | 91.2 (156 of 171) |
| CheXpert | 80.6 (283 of 351) | 81.1 (146 of 180) | 80.1 (137 of 171) |
| VGG16 | 85.1 (299 of 351) | 81.6 (147 of 180) | 88.8 (152 of 171) |
*The model with the best accuracy and sensitivity is shown in bold
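The three metrics in the table follow directly from the confusion-matrix counts. A minimal sketch, using the ResNet50 row's counts (163 of 180 positives and 153 of 171 negatives correctly classified) to recover the reported percentages:

```python
def metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, sensitivity (TPR), and specificity (TNR) from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of positives caught
    specificity = tn / (tn + fp)   # fraction of negatives caught
    return accuracy, sensitivity, specificity

# ResNet50 row: 163 of 180 positives, 153 of 171 negatives correct
acc, sens, spec = metrics(tp=163, tn=153, fp=171 - 163 + 10, fn=180 - 163)
acc, sens, spec = metrics(tp=163, tn=153, fp=171 - 153, fn=180 - 163)
# acc -> 316/351 ≈ 0.900, sens -> 163/180 ≈ 0.905, spec -> 153/171 ≈ 0.894
```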
Fig. 2 Performance of the model. a Confusion matrix of the classification. True positive rate (TPR) at the bottom right corner, true negative rate (TNR) at the top left corner, false positive rate (FPR) at the top right corner, and false negative rate (FNR) at the bottom left corner. b Receiver operating characteristic (ROC) curve. The curve shows the relation between TPR and FPR as the threshold separating positive from negative classification is varied. The performance of the model is measured by the area under the curve (AUC). Ideally, the curve should cover as much area as possible, reaching toward the upper left corner (AUC score of 1), which minimizes the FPR while maximizing the TPR. The AUC is 0.95. c Precision-recall curve, showing the relation between precision and recall. Precision and recall are affected by the different classes of the data, and thus can vary in score when the data is imbalanced (e.g., more observations of one class than the other). We would like the AUC to be as large as possible, reaching toward the upper right corner, which maximizes both precision and recall. d Classification score histogram, with ground truth (GT) labels in colors. Every image is scored on a scale between 0 and 1 with a threshold of 0.5, shown as a dashed line, such that all images with a higher score are classified as positive for COVID-19 and images below as negative. Negatively labeled images that received a score above 0.5 are therefore incorrectly classified, and vice versa for positively labeled images. The closer an image's score is to one of the edges (0 or 1), the stronger the confidence in its classification. The accumulation of two distinct colors at the edges points to good separation of many observations with strong confidence in the classification
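The ROC AUC in panel b can be computed without any plotting, via the rank-sum (Mann-Whitney) identity: AUC equals the probability that a randomly chosen positive image outscores a randomly chosen negative one. A minimal sketch on made-up scores and labels (the data below is illustrative, not from the study):

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via the Mann-Whitney identity: the probability that a random
    positive example scores higher than a random negative one (ties half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive outscores negative
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Illustrative scores/labels only; one positive (0.4) scores below one negative (0.6)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(scores, labels)   # 8 of 9 positive-negative pairs ordered correctly
```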
Fig. 3 t-Distributed stochastic neighbor embedding (t-SNE). A high-dimensional feature vector was extracted for each image from the last layer before the network output and reduced to 2 dimensions. Each point on the graph represents the features of an image after dimension reduction and arrangement in space. The points were then colored according to the ground truth (GT) of their images, revealing two main clusters. Each cluster is mostly one color, which shows a strong association between the features extracted from the decision layer, used to arrange the points in space, and the GT of the images, represented by the colors
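The embed-then-reduce workflow behind Fig. 3 can be sketched with synthetic features. The paper uses t-SNE (e.g., scikit-learn's `TSNE`); as a dependency-free stand-in, the sketch below projects onto the top two principal components via SVD, which illustrates the same idea of mapping penultimate-layer features to 2-D points whose clusters align with the labels. The feature dimensionality (512) and the two-class offset are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical penultimate-layer features: 100 images x 512 dims,
# with the two classes offset so their clusters are separable
feats = np.vstack([rng.normal(0.0, 1.0, (50, 512)),
                   rng.normal(3.0, 1.0, (50, 512))])
labels = np.array([0] * 50 + [1] * 50)

# Centre the features and project onto the top-2 principal directions.
# The paper uses t-SNE for this reduction; PCA is a simple stand-in.
centred = feats - feats.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
coords2d = centred @ vt[:2].T   # shape (100, 2): one plottable point per image
```

Scattering `coords2d` colored by `labels` would reproduce the two-cluster picture the caption describes.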
Fig. 4 Classification score as a function of time. The first image of each patient was acquired on the day of first admission; we denote that time value as day 0. For patients scanned more than once, each later image was assigned a time value equal to the number of days since the first image was acquired, representing the time elapsed since first admission and ordered on the x-axis. The y-axis shows the classification score of each image between 0 (= negative for COVID-19) and 1 (= positive for COVID-19), such that a score closer to either edge indicates greater confidence in the network’s classification. a The classification score with respect to change in time: the more days elapsed since first admission, the more confident the classification. b Mean classification score over all images with the same day value
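Panel b's per-day averaging reduces to grouping scores by their day value and taking the mean of each group. A minimal sketch; the `(day, score)` records below are illustrative, not study data:

```python
from collections import defaultdict

def mean_score_per_day(records):
    """records: iterable of (days_since_first_image, classification_score).
    Returns {day: mean score of all images acquired on that day value}."""
    buckets = defaultdict(list)
    for day, score in records:
        buckets[day].append(score)
    return {day: sum(s) / len(s) for day, s in sorted(buckets.items())}

# Illustrative records for two hypothetical patients
records = [(0, 0.6), (0, 0.7), (3, 0.8), (3, 0.9), (7, 0.95)]
means = mean_score_per_day(records)   # {0: 0.65, 3: 0.85, 7: 0.95}
```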
Fig. 5 Three images labeled by a radiologist as hard to diagnose. Despite this, the model was able to classify them correctly. Each image is scored with a classification score on a scale between 0 and 1 with a threshold of 0.5, such that all images with a score above the threshold are labeled as positive for COVID-19 and images below as negative. The ground truth (GT) label of each image is also shown
Fig. 6 In each panel, the left image is a CXR from the test set, and the two on the right are the two images closest to it in the training set, given the image embeddings from the network’s last layer. a All three images are COVID-19 negative. The distances from the middle and rightmost images to the left one are 0.54 and 0.56, respectively. b All three images are COVID-19 positive. The distances from the middle and rightmost images to the left one are 0.51 and 0.55, respectively. The overall mean distance between training and test images is 3.9 ± 2.5 (mean ± std). The mean distance between positive training and positive test images is 1.4 ± 1.9, between negative training and negative test images 2.2 ± 1.3, and between images from different classes 5.8 ± 1.9. Images from different classes are thus further apart, but whether a small distance truly corresponds to similar lung findings still requires verification
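The similar-image retrieval of Fig. 6 amounts to a nearest-neighbor search over the last-layer embeddings. A minimal sketch assuming Euclidean distance (the caption reports distances but does not specify the metric) and tiny 2-D toy embeddings in place of the real feature vectors:

```python
import numpy as np

def nearest_neighbors(query, train_embeddings, k=2):
    """Return the indices and distances of the k training embeddings
    closest (Euclidean) to the query embedding."""
    dists = np.linalg.norm(train_embeddings - query, axis=1)
    order = np.argsort(dists)
    return order[:k], dists[order[:k]]

# Toy 2-D embeddings standing in for the network's last-layer features
train = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [5.0, 5.0]])
query = np.array([0.9, 0.1])
idx, d = nearest_neighbors(query, train, k=2)   # closest two training images
```

With real embeddings, `idx` would index the training CXRs shown on the right of each panel, retrieved for a test CXR `query`.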