Jakob Weiss, Jana Taron, Zexi Jin, Thomas Mayrhofer, Hugo J W L Aerts, Michael T Lu, Udo Hoffmann.
Abstract
Deep learning convolutional neural networks (CNNs) can predict mortality from chest radiographs, yet it is unknown whether radiologists can perform the same task. Here, we investigate whether radiologists can visually assess the image gestalt of a chest radiograph (defined as the deviation from an unremarkable chest radiograph associated with the likelihood of 6-year mortality) to predict 6-year mortality. The assessment was validated in an independent testing dataset and compared to the performance of a CNN developed for mortality prediction. Results are reported for the testing dataset only (n = 100; age 62.5 ± 5.2 years; male 55%; event rate 50%). The probability of 6-year mortality based on image gestalt had high accuracy (AUC 0.68, 95% CI 0.58-0.78), similar to that of the CNN (AUC 0.67, 95% CI 0.57-0.77; p = 0.90). Patients with high/very high image gestalt ratings were significantly more likely to die than those rated very low (p ≤ 0.04). Assignment to risk categories was not explained by patient characteristics, traditional risk factors, or imaging findings (p ≥ 0.2). In conclusion, radiologists' assessment of image gestalt on chest radiographs yields high prognostic accuracy for the probability of mortality, similar to that of a specifically trained CNN. Further studies are warranted to confirm this concept and to determine potential clinical benefits.
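The abstract compares two discrimination estimates (radiologist gestalt, AUC 0.68; CNN, AUC 0.67), each with a 95% confidence interval. As a minimal illustrative sketch only, not the authors' analysis code and with hypothetical function names, an AUC and a percentile-bootstrap CI can be computed from binary outcomes and risk scores like so:

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the AUC: 2.5th/97.5th percentiles
    of AUCs computed over resampled cases."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    n, aucs = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample drew only one class; AUC undefined
        aucs.append(auc_mann_whitney(y_true[idx], scores[idx]))
    return np.percentile(aucs, [2.5, 97.5])
```

The paper's intervals and the p value for the AUC difference were likely obtained with standard ROC methods (e.g. DeLong's test); the bootstrap above is one common alternative for interval estimation.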
Year: 2021 PMID: 34599265 PMCID: PMC8486799 DOI: 10.1038/s41598-021-99107-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1 Overview of the study design. DL deep learning, NLST National Lung Screening Trial, PLCO Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.
Patient characteristics in the training, tuning and testing dataset.
| Variables | Training | Tuning | Testing | p value |
|---|---|---|---|---|
| Participants | 100 | 100 | 100 | 1.00 |
| White | 96 (96%) | 97 (97%) | 86 (86%) | 0.008 |
| Black | 3 (3%) | 3 (3%) | 12 (12%) | 0.01 |
| Other | 1 (1%) | – | 2 (2%) | 0.8 |
| Male sex | 58 (58%) | 61 (61%) | 55 (55%) | 0.7 |
| Age | 62.6 ± 5.6 | 63.6 ± 5.7 | 62.5 ± 5.2 | 0.3 |
| Obesity (BMI ≥ 30 kg/m2) | 20 (20%) | 30 (30%) | 25 (25%) | 0.3 |
| Current smoker | 52 (52%) | 49 (49%) | 55 (55%) | 0.3 |
| Former smoker | 48 (48%) | 51 (51%) | 45 (45%) | 0.7 |
| Diabetes | 9 (9%) | 14 (14%) | 14 (14%) | 0.5 |
| Hypertension | 42 (42%) | 42 (42%) | 34 (34%) | 0.4 |
| Stroke | 9 (9%) | 5 (5%) | 6 (6%) | 0.5 |
| Myocardial infarction | 7 (7%) | 14 (14%) | 18 (18%) | 0.06 |
| Cancer | 4 (4%) | 4 (4%) | 5 (5%) | 1.00 |
| Median follow-up (years) | 5.9 (4.1–6.4) | 5.9 (3.5–6.5) | 5.9 (3.4–6.4) | 0.9 |
Baseline characteristics and risk factors of the participants included for training, tuning and testing.
IQR interquartile range, BMI body mass index, calculated as weight in kilograms divided by the square of height in meters.
Figure 2 Representative cases from the training dataset: in the training dataset, image findings (diagnostic and subclinical) as well as image gestalt (defined as the degree of deviation from an unremarkable chest radiograph associated with the likelihood of 6-year mortality, rated on a binary scale) were assessed. In row (A), both participants presented with emphysema, indicating that a single major diagnostic finding is not a reliable predictor of outcome. In row (B), 13 findings were reported in the left image and 4 in the right image, an example that the sum of findings per subject was not associated with mortality. In row (C), the surgical clips in the left image suggest an elevated probability of dying; however, radiologists rated the image gestalt as "absent" on the left and "present" on the right.
Figure 3 Gestalt ratings for 6-year mortality by radiologists (A) and the deep learning network (B) in the testing dataset, as well as the areas under the curve for discriminatory ability (C). HR hazard ratio, CI confidence interval.
Figure 4 Confusion matrix of the risk ratings for 6-year mortality between radiologists and the deep learning convolutional neural network. Dark blue: agreement between radiologists and the deep learning convolutional neural network; light blue: agreement with deviation by one category. DL CNN deep learning convolutional neural network.
Figure 5 Representative images of participants correctly classified as low (A) and high (B) risk of dying by radiologists and the deep learning convolutional neural network.