| Literature DB >> 32352039 |
Shih-Cheng Huang, Tanay Kothari, Imon Banerjee, Chris Chute, Robyn L Ball, Norah Borus, Andrew Huang, Bhavik N Patel, Pranav Rajpurkar, Jeremy Irvin, Jared Dunnmon, Joseph Bledsoe, Katie Shpanskaya, Abhay Dhaliwal, Roham Zamanian, Andrew Y Ng, Matthew P Lungren.
Abstract
Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomography pulmonary angiography (CTPA) is the gold standard for diagnosis. Prompt diagnosis and immediate treatment are critical to avoid high morbidity and mortality rates, yet PE remains among the diagnoses most frequently missed or delayed. In this study, we developed a deep learning model, PENet, to automatically detect PE on volumetric CTPA scans as an end-to-end solution for this purpose. PENet is a 77-layer 3D convolutional neural network (CNN) pretrained on the Kinetics-600 dataset and fine-tuned on a retrospective CTPA dataset collected from a single academic institution. PENet's performance in detecting PE was evaluated on data from two different institutions: a hold-out dataset from the same institution as the training data, and a second dataset collected from an external institution to evaluate model generalizability to an unrelated population. PENet achieved an AUROC of 0.84 [0.82–0.87] for detecting PE on the hold-out internal test set and 0.85 [0.81–0.88] on the external dataset. PENet also outperformed current state-of-the-art 3D CNN models. These results represent a successful application of an end-to-end 3D CNN model to the complex task of PE diagnosis without requiring computationally intensive and time-consuming preprocessing, and demonstrate sustained performance on data from an external institution. Our model could be applied as a triage tool to automatically identify clinically important PEs, allowing prioritization for diagnostic radiology interpretation and improved care pathways via more efficient diagnosis.
Keywords: Cardiovascular diseases; Radiography
Year: 2020 PMID: 32352039 PMCID: PMC7181770 DOI: 10.1038/s41746-020-0266-y
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Model performance.
| | Internal dataset: Stanford | Internal dataset: Stanford (real prevalence) | External dataset: Intermountain | External dataset: Intermountain (real prevalence) |
|---|---|---|---|---|
| Accuracy | 0.77 [0.76–0.78] | 0.81 [0.80–0.82] | 0.78 [0.77–0.78] | 0.80 [0.79–0.81] |
| AUROC | 0.84 [0.82–0.87] | 0.84 [0.79–0.90] | 0.85 [0.81–0.88] | 0.85 [0.80–0.90] |
| Specificity | 0.82 [0.81–0.83] | 0.82 [0.82–0.83] | 0.80 [0.79–0.81] | 0.81 [0.80–0.82] |
| Sensitivity | 0.73 [0.72–0.74] | 0.75 [0.73–0.77] | 0.75 [0.74–0.76] | 0.75 [0.73–0.77] |
| PPV/precision | 0.81 [0.80–0.81] | 0.47 [0.45–0.48] | 0.77 [0.76–0.78] | 0.44 [0.43–0.46] |
| NPV | 0.75 [0.74–0.76] | 0.94 [0.94–0.95] | 0.78 [0.77–0.79] | 0.94 [0.94–0.95] |
Model performance on the internal test set (Stanford) and external test set (Intermountain) with 95% confidence intervals, using a probability threshold of 0.55 that jointly maximizes sensitivity and specificity on the Stanford validation set. Bootstrapping was used to simulate the real-world prevalence of PE (between 14 and 22%).
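The drop in PPV at real-world prevalence follows directly from Bayes' rule: with sensitivity and specificity held fixed, predictive values depend on prevalence. A minimal illustrative sketch (not the authors' code), using the reported operating point (sensitivity ≈ 0.75, specificity ≈ 0.82) and an assumed 18% prevalence within the stated 14–22% range, approximately reproduces the prevalence-adjusted columns of the table:

```python
def ppv(sens, spec, prev):
    # P(disease | positive test) via Bayes' rule
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    # P(no disease | negative test)
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Reported operating point, assumed 18% real-world prevalence
print(round(ppv(0.75, 0.82, 0.18), 2))  # ~0.48, close to the table's 0.47
print(round(npv(0.75, 0.82, 0.18), 2))  # ~0.94, matching the table
```

This is why the NPV rises to 0.94 at real prevalence: negatives dominate the population, so a negative test is very likely correct even at moderate sensitivity.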
Fig. 1 PENet performance on independent test datasets.
Receiver operating characteristic curve (ROC) with bootstrap confidence intervals on Stanford internal test set (a) and Intermountain external test set (b).
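Bootstrap confidence intervals such as those in Fig. 1 are typically obtained by resampling the test set with replacement and recomputing AUROC on each resample. A hypothetical pure-Python sketch of this procedure (the authors' exact resampling scheme is not specified in this record):

```python
import random

def auroc(labels, scores):
    """AUROC as the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case is scored above a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:        # a resample must contain both classes
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

The double loop in `auroc` is O(pos x neg) and fine for test sets of a few hundred studies; a rank-based formulation would scale better.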
Comparison with state-of-the-art 3D CNN models.
| Metric (AUROC) | Internal dataset: Stanford | External dataset: Intermountain |
|---|---|---|
| PENet—24 slices kinetics pretrained | 0.84 [0.82–0.87] | 0.85 [0.81–0.88] |
| PENet no pretraining | 0.69 [0.65–0.74] | 0.62 [0.57–0.88] |
| ResNet3D-50 kinetics pretrained | 0.78 [0.74–0.81] | 0.77 [0.74–0.80] |
| ResNeXt3D-101 kinetics pretrained | 0.80 [0.77–0.82] | 0.83 [0.81–0.85] |
| DenseNet3D-121 kinetics pretrained | 0.69 [0.64–0.73] | 0.67 [0.63–0.71] |
AUROC on the internal test set (Stanford) and external test set (Intermountain) with 95% confidence intervals: ResNet3D[47], ResNeXt3D[45], and DenseNet3D[46] were pretrained on Kinetics-600 and fine-tuned on the internal dataset using the same training hyperparameters as PENet. PENet outperforms each of these models on both the internal and external test sets.
Fig. 2 (Sensitivity vs. specificity plot): Sensitivity and specificity across operating points (probability thresholds) in increments of 0.05 on the Stanford internal test set (a) and the Intermountain external test set (b).
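Curves like those in Fig. 2 amount to sweeping the probability threshold and recomputing sensitivity and specificity at each operating point. An illustrative sketch with made-up predictions (not the study data):

```python
def sens_spec_at(labels, probs, threshold):
    """Sensitivity and specificity when predicting PE for prob >= threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]      # hypothetical model outputs
operating_points = []
for i in range(1, 20):                       # thresholds 0.05, 0.10, ..., 0.95
    t = i * 0.05
    operating_points.append((t,) + sens_spec_at(labels, probs, t))
```

Raising the threshold trades sensitivity for specificity; the 0.55 threshold used in the performance table is the point on this sweep that balances the two on the validation set.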
Fig. 3 (Class activation maps): Class activation map (CAM) representations of true-positive (Stanford (a) and Intermountain (b)), false-positive (Stanford (c) and Intermountain (d)), and false-negative (Stanford (e) and Intermountain (f)) samples: axial contrast-enhanced CT pulmonary angiogram (left) and the CAM inferred by the model overlaid on the scan (right). a (Stanford test set, true positive): (left) demonstrates a non-occlusive filling defect in a left lower pulmonary artery segmental branch that is correctly localized by the model, as seen in the CAM overlay (right). b (Intermountain test set, true positive): (left) demonstrates a non-occlusive filling defect in the left main pulmonary artery that is correctly localized by the model, as seen in the CAM overlay (right). c (Stanford test set, false positive): (left) demonstrates a large left hilar node adjacent to the pulmonary artery that is incorrectly labeled as PE by the model, as seen in the CAM overlay (right). d (Intermountain test set, false positive): (left) demonstrates an enlarged, unopacified left lower lobe pulmonary vein invaded by tumor that is incorrectly labeled as PE by the model, as seen in the CAM overlay (right). e (Stanford test set, false negative): (left) pulmonary embolism in a right middle lobe segmental branch that is missed by the model, as seen in the CAM overlay (right). f (Intermountain test set, false negative): (left) pulmonary embolism in a left upper lobe segmental branch that is missed by the model, as seen in the CAM overlay (right).
Data characteristics of the internal (SMC) and external (Intermountain) dataset.
| | Overall | Train | Validation | Test | External test |
|---|---|---|---|---|---|
| Number of studies | 1797 | 1461 | 167 | 169 | 200 |
| Median age (IQR) | 66.14 (53.24–82.40) | 66.13 (53.14–82.95) | 64.10 (50.88–78.38) | 67.24 (56.62–82.76) | 55.3 (42.0–69.5) |
| Number of patients (Female %) | 1773 (57.07%) | 1414 (56.64%) | 162 (67.36%) | 163 (52.08%) | 198 (58.5%) |
| Median number of slices (IQR) | 386 (134) | 385 (136) | 388 (132) | 388 (139) | 324 |
| Number of positive PE | 655 | 488 | 82 | 85 | 94 |
| Number of negative PE | 1142 | 973 | 85 | 84 | 106 |
The internal SMC dataset was divided into training, validation, and test sets. The training set was used to optimize model parameters, and the validation set was used to select the best model and operating points. The hold-out test set was used to evaluate the model's performance. The external Intermountain dataset was used solely for evaluation.
Fig. 4 PENet architecture used in this study.
PENet is built from four architectural units: the PENet unit, the Squeeze-and-Excitation block, the PENet bottleneck, and the PENet encoder. Each building block in the network is color-coded.
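The Squeeze-and-Excitation block named above recalibrates channels: global average pooling squeezes each channel's 3D feature map to a scalar, a small two-layer bottleneck with a sigmoid produces per-channel gates, and the input is rescaled by those gates. A minimal NumPy sketch of the mechanism (random weights and an assumed reduction ratio for illustration, not the trained PENet parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block_3d(x, w1, w2):
    """Squeeze-and-Excitation on a (C, D, H, W) feature map.
    w1: (C, C//r) and w2: (C//r, C) are the excitation weights."""
    squeeze = x.mean(axis=(1, 2, 3))         # (C,) global average pool
    hidden = np.maximum(squeeze @ w1, 0.0)   # ReLU bottleneck
    scale = sigmoid(hidden @ w2)             # (C,) per-channel gates in (0, 1)
    return x * scale[:, None, None, None]    # rescale each channel

rng = np.random.default_rng(0)
c, r = 8, 4                                  # channels and reduction ratio (assumed)
x = rng.standard_normal((c, 4, 4, 4))
y = se_block_3d(x, rng.standard_normal((c, c // r)), rng.standard_normal((c // r, c)))
```

Because the gates lie in (0, 1), the block can only attenuate channels, letting the network emphasize the feature maps most informative for PE.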
Input slice number experimentation.
| Metric (AUROC) | Internal dataset: Stanford | External dataset: Intermountain |
|---|---|---|
| PENet—1 slice | 0.48 [0.45–0.51] | 0.51 [0.47–0.54] |
| PENet—6 slices | 0.57 [0.53–0.60] | 0.58 [0.55–0.59] |
| PENet—12 slices | 0.74 [0.70–0.77] | 0.69 [0.67–0.72] |
| PENet—24 slices | 0.84 [0.82–0.87] | 0.85 [0.81–0.88] |
| PENet—48 slices | 0.80 [0.77–0.83] | 0.83 [0.76–0.86] |
AUROC on the internal test set (Stanford) and external test set (Intermountain) with 95% confidence intervals: too few input slices do not provide enough structural information for learning, while too many input slices make pulmonary embolism harder to detect.
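Since the best-performing configuration sees 24 slices at a time while a full CTPA study has several hundred slices (median 386 internally), the model must be applied to slice windows of the volume. One common scheme, sketched below, is overlapping sliding-window inference with a maximum over window probabilities as the study-level score (a plausible aggregation; the authors' exact pipeline may differ):

```python
def sliding_windows(num_slices, window=24, stride=12):
    """Start indices of overlapping slice windows covering the volume."""
    if num_slices <= window:
        return [0]
    starts = list(range(0, num_slices - window + 1, stride))
    if starts[-1] != num_slices - window:    # ensure the final slices are covered
        starts.append(num_slices - window)
    return starts

def study_probability(window_probs):
    """Aggregate window-level PE probabilities to a study-level score."""
    return max(window_probs)

starts = sliding_windows(386)                # median slice count in the dataset
```

Taking the maximum reflects the clinical task: a study is positive if any window contains an embolism, even a small segmental one.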