Kai Packhäuser, Sebastian Gündel, Nicolas Münster, Christopher Syben, Vincent Christlein, Andreas Maier.
Abstract
With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets have become a key factor in enabling reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names, before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten and more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Furthermore, we achieve an AUC of up to 0.9870 and a precision@1 of up to 0.9444 when evaluating our trained networks on external datasets such as CheXpert and the COVID-19 Image Data Collection. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Therefore, such data may be vulnerable to potential attacks by deep learning-based re-identification algorithms.
Year: 2022 PMID: 36050406 PMCID: PMC9434540 DOI: 10.1038/s41598-022-19045-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1 General problem scenario: Comparing a given chest radiograph to publicly available dataset images by means of DL techniques would either result in discrete labels indicating whether or not the dataset images belong to the same patient as the given radiograph (verification scenario) or yield a ranked list of the most similar radiographs related to the given scan (retrieval scenario). Images belonging to the same patient are highlighted with the same color. The given radiograph is marked with an asterisk. The shown cases would enable a potential attacker to link sensitive patient-related information contained in the dataset to the image of interest.
Overview of the obtained verification results for our experiments using varying training set sizes at different learning rates. Moreover, different data handling techniques were used (FTS: fixed training set, RNP: randomized negative pairs). For each experiment, the training sets were balanced with respect to the number of positive and negative image pairs. In this table, we present the AUC (together with the lower and upper bounds of the 95% confidence intervals from 10,000 bootstrap runs), the accuracy, the specificity, the recall, the precision, and the F1-score. Bold text emphasizes the overall highest AUC value.
| Data handling | N_s | AUC + 95% CI | Accuracy (%) | Specificity (%) | Recall (%) | Precision (%) | F1-score |
|---|---|---|---|---|---|---|---|
| FTS | 100,000 | 0.7797 | | | | | |
| FTS | 200,000 | 0.8750 | | | | | |
| FTS | 400,000 | 0.8684 | | | | | |
| FTS | 800,000 | 0.9536 | | | | | |
| RNP | 800,000 | 0.9542 | | | | | |
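The confidence intervals reported above come from 10,000 bootstrap runs over the evaluation pairs. A minimal sketch of how such a percentile-bootstrap AUC interval can be computed (the function names are ours, not from the paper; the AUC uses the rank-sum formulation):

```python
import numpy as np

def auc_score(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUC, resampling pairs with replacement."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), len(labels))
        if labels[idx].min() == labels[idx].max():
            continue  # resample contained only one class; AUC undefined, skip
        aucs.append(auc_score(labels[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_score(labels, scores), lo, hi
```

Note that this simple ranking does not average tied scores; it is a sketch for the bootstrap procedure, not a drop-in replacement for a library AUC implementation.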
Figure 2 ROC curves for different training set sizes at a fixed learning rate. During training, the fixed training set (FTS) data handling technique was employed.
Figure 3 Confusion matrix corresponding to the best experiment shown in Table 1 (last row), giving clear insights into the performance of our trained model.
Figure 4 TPR for image pairs with (a) age differences, (b) changes in the disease pattern, and (c) changes in the projection view. The absolute numbers of true positives and overall positives are given for each bin. Note that the number of image pairs with age differences of more than 12 years is comparatively small, which is why the corresponding TPRs are neglected in this figure.
Figure 5 Exemplary image pairs classified by our best-performing verification model. Each column represents one image pair. The first four columns (a–d) show true positive classifications. The last two columns (e) and (f) depict a false positive and a false negative classification, respectively.
Figure 6 Grad-CAM visualizations for the first convolutional layer of the ResNet-50 incorporated in our SNN. Each column represents one image pair. The first four columns (a–d) show true positive classifications. The last two columns (e) and (f) depict a false positive and a false negative classification, respectively. The shown images illustrate that the anatomical structures of, e.g., the breast (cf. (a,b,e)), the lungs (cf. (a,b,c,e)), and the heart (cf. (a,b)) have a high impact on the final model prediction. Furthermore, it can be seen that our network focuses on the collarbones (cf. (a,c,d,f)) and the ribs (cf. (b,c,e)). The upper images of (a) and (f) also highlight that our network pays attention to the contour of the diaphragm.
Figure 7 Grad-CAM visualizations for an intermediate convolutional layer of the ResNet-50 incorporated in our SNN. Each column represents one image pair. The first four columns (a–d) show true positive classifications. The last two columns (e) and (f) depict a false positive and a false negative classification, respectively. The obtained attention maps clearly illustrate that the selected network layer focuses on the ribs and the outline of the thorax.
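The attention maps in Figures 6 and 7 follow the standard Grad-CAM recipe: gradient-weighted sums of a convolutional layer's activation maps. A minimal numpy sketch of that core computation, assuming the activations and gradients for the chosen layer have already been extracted from the network (function name and shapes are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the target score w.r.t. those activations (C, H, W)."""
    # Channel importance weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))            # shape (C,)
    # Weighted sum of activation maps, then ReLU to keep positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the resulting low-resolution map is upsampled to the input image size and overlaid on the radiograph, as in the figures above.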
Comparison of the verification performance on two different subsets of the ChestX-ray14 dataset that either contain foreign material or do not (first two rows). Furthermore, we show the verification results for the CheXpert dataset and the COVID-19 Image Data Collection (last two rows). We present the AUC (together with the lower and upper bounds of the 95% confidence intervals from 10,000 bootstrap runs), the accuracy, the specificity, the recall, the precision, and the F1-score.
| Dataset | Subset | AUC + 95% CI | Accuracy (%) | Specificity (%) | Recall (%) | Precision (%) | F1-score |
|---|---|---|---|---|---|---|---|
| ChestX-ray14 | w/ foreign material | 0.9795 | | | | | |
| ChestX-ray14 | w/o foreign material | 0.9862 | | | | | |
| CheXpert | – | 0.9429 | | | | | |
| COVID-19 | – | 0.9127 | | | | | |
Overview of the obtained results for our image retrieval experiments. In this table, we report the mAP@R, the R-Precision, and the Precision@1. The first 4 rows show the results on the ChestX-ray14 dataset for different image resolutions used for evaluation. The fifth row shows the outcomes on the CheXpert dataset. The last row indicates the results on the COVID-19 Image Data Collection. Bold text represents the overall highest performance metrics.
| Dataset | Input dimensions | mAP@R | R-Precision | Precision@1 |
|---|---|---|---|---|
| ChestX-ray14 | 1024 | 0.9748 | | 0.9963 |
| ChestX-ray14 | 800 | 0.9709 | 0.9726 | 0.9958 |
| ChestX-ray14 | 512 | 0.9572 | 0.9601 | 0.9945 |
| ChestX-ray14 | 224 | 0.7730 | 0.7979 | 0.9756 |
| CheXpert | Original | 0.8001 | 0.8148 | 0.9444 |
| COVID-19 | Original | 0.8569 | 0.8707 | 0.8821 |
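The retrieval metrics above are computed per query from the ranked list of gallery images and then averaged over all queries. A sketch assuming the standard definitions (with R the number of relevant gallery images for a query: Precision@1 checks the nearest neighbour, R-Precision is the fraction of relevant images among the top R, and mAP@R averages the precision at each relevant hit within the top R); the function name is ours:

```python
import numpy as np

def retrieval_metrics(relevance):
    """Metrics for one query, given a ranked 0/1 relevance list
    (most similar gallery image first)."""
    rel = np.asarray(relevance, dtype=float)
    R = int(rel.sum())                       # number of relevant gallery images
    top = rel[:R]
    # Precision@1: is the single nearest neighbour a match?
    p_at_1 = float(rel[0])
    # R-Precision: fraction of the top-R results that are relevant.
    r_prec = top.sum() / R
    # mAP@R: precision at each rank i <= R, averaged over the relevant hits.
    precisions = np.cumsum(top) / np.arange(1, R + 1)
    map_at_r = (precisions * top).sum() / R
    return map_at_r, r_prec, p_at_1
```

Averaging `map_at_r` over all query images yields the mAP@R values reported in the tables; mAP@R rewards rankings that place all of a patient's other scans at the very top, not merely somewhere in the list.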
Comparison of the re-identification performance on two different subsets of the ChestX-ray14 dataset that either contain foreign material or do not. We report the mAP@R, the R-Precision, and the Precision@1.
| Subset | mAP@R | R-Precision | Precision@1 |
|---|---|---|---|
| w/ foreign material | 0.9925 | 0.9925 | > 0.9999 |
| w/o foreign material | > 0.9999 | > 0.9999 | > 0.9999 |
Figure 8 SNN architecture used for patient verification on the ChestX-ray14 [16] dataset. The feature extraction blocks (light blue) share the same set of network parameters and produce the feature representations (yellow). After merging (orange) and an additional FC and sigmoid layer (light blue), the network yields the final output score (green). For our patient re-identification experiments, we used the same architecture but removed all layers from the merging layer onwards.
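The data flow of Figure 8 can be sketched in a few lines. This is a heavily simplified stand-in, not the paper's implementation: a single linear layer replaces the ResNet-50 extractor, and an absolute-difference merge is assumed since the exact merge operation is not specified in this excerpt; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SiameseVerifier:
    """Minimal stand-in for the SNN in Figure 8: a shared (weight-tied)
    feature extractor, a merge step, and an FC + sigmoid head."""

    def __init__(self, in_dim, feat_dim):
        self.W_feat = rng.standard_normal((feat_dim, in_dim)) * 0.1  # shared weights
        self.w_fc = rng.standard_normal(feat_dim) * 0.1
        self.b_fc = 0.0

    def embed(self, x):
        # Stand-in for the ResNet-50 backbone: one linear layer + ReLU.
        return np.maximum(self.W_feat @ x, 0.0)

    def verify(self, x1, x2):
        f1, f2 = self.embed(x1), self.embed(x2)   # tied weights: same parameters
        merged = np.abs(f1 - f2)                  # assumed merge: absolute difference
        return sigmoid(self.w_fc @ merged + self.b_fc)  # same-patient score

    def retrieve_features(self, x):
        # Re-identification mode: drop everything from the merge onwards
        # and use the embedding directly for nearest-neighbour search.
        return self.embed(x)
```

With an absolute-difference merge, feeding the same image twice yields a zero merged vector, so the untrained head outputs exactly sigmoid(bias); the retrieval mode mirrors the paper's choice of reusing the trained extractor without the verification head.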