| Literature DB >> 33034642 |
Joy T Wu, Ken C L Wong, Yaniv Gur, Nadeem Ansari, Alexandros Karargyris, Arjun Sharma, Michael Morris, Babak Saboury, Hassan Ahmad, Orest Boyko, Ali Syed, Ashutosh Jadhav, Hongzhi Wang, Anup Pillai, Satyananda Kashyap, Mehdi Moradi, Tanveer Syeda-Mahmood.
Abstract
Importance: Chest radiography is the most common diagnostic imaging examination performed in emergency departments (EDs). Augmenting clinicians with automated preliminary read assistants could help expedite their workflows, improve accuracy, and reduce the cost of care.

Objective: To assess the performance of artificial intelligence (AI) algorithms in realistic radiology workflows by performing an objective comparative evaluation of the preliminary reads of anteroposterior (AP) frontal chest radiographs performed by an AI algorithm and radiology residents.

Design, Setting, and Participants: This diagnostic study included a set of 72 findings assembled by clinical experts to constitute a full-fledged preliminary read of AP frontal chest radiographs. A novel deep learning architecture was designed for an AI algorithm to estimate the findings per image. The AI algorithm was trained using a multihospital training data set of 342 126 frontal chest radiographs captured in ED and urgent care settings. The training data were labeled from their associated reports. The image-based F1 score was used to choose the operating point on the receiver operating characteristic (ROC) curve so as to minimize the number of missed findings and overcalls per image read. The performance of the model was compared with that of 5 radiology residents recruited from multiple institutions in the US in an objective study in which a separate data set of 1998 AP frontal chest radiographs was drawn from a hospital source representative of realistic preliminary reads in inpatient and ED settings. A triple consensus with adjudication process was used to derive the ground truth labels for the study data set. The performance of the AI algorithm and radiology residents was assessed by comparing their reads with ground truth findings. All studies were conducted through a web-based clinical study application system. The triple consensus data set was collected between February and October 2018. The comparison study was performed between January and October 2019. Data were analyzed from October 2019 to February 2020. After the first round of reviews, further analysis of the data was performed from March to July 2020.

Main Outcomes and Measures: The learning performance of the AI algorithm was judged using the conventional ROC curve and the area under the curve (AUC) during training and field testing on the study data set. For the AI algorithm and radiology residents, individual finding label performance was measured using the conventional measures of label-based sensitivity, specificity, and positive predictive value (PPV). In addition, agreement with the ground truth on the assignment of findings to images was measured using the pooled κ statistic. Preliminary read performance was recorded for the AI algorithm and radiology residents using new measures of mean image-based sensitivity, specificity, and PPV designed to record the fraction of misses and overcalls on a per-image basis. The 1-sided analysis of variance test was used to compare the means of each group (AI algorithm vs radiology residents) using the F distribution, with the null hypothesis that the groups have similar means.
Year: 2020 PMID: 33034642 PMCID: PMC7547369 DOI: 10.1001/jamanetworkopen.2020.22779
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
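The abstract's operating-point selection can be illustrated with a short sketch. The paper optimizes an image-based F1 score; the per-label variant below is a simplification, and all names and data here are illustrative stand-ins rather than the authors' code.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def f1_optimal_threshold(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Pick the probability threshold that maximizes F1 for one finding."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision and recall have one more entry than thresholds; drop the last.
    f1 = 2 * precision[:-1] * recall[:-1] / np.clip(
        precision[:-1] + recall[:-1], 1e-12, None)
    return float(thresholds[np.argmax(f1)])

# Synthetic example: derive a threshold on training-style data, then use it
# to binarize test-time probabilities into present/absent finding calls.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(0.6 * y_true + rng.normal(0.3, 0.2, size=1000), 0.0, 1.0)
thr = f1_optimal_threshold(y_true, y_prob)
y_pred = (y_prob >= thr).astype(int)
```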
Core Finding Labels Derived From the Chest Radiograph Lexicon
| Type | Label | Samples in modeling data set, No. | AUC of AI algorithm |
|---|---|---|---|
| Anatomical | Not otherwise specified opacity (eg, pleural or parenchymal opacity) | 81 013 | 0.736 |
| Anatomical | Linear or patchy atelectasis | 79 218 | 0.776 |
| Anatomical | Pleural effusion or thickening | 76 954 | 0.887 |
| Anatomical | No anomalies | 55 894 | 0.847 |
| Anatomical | Enlarged cardiac silhouette | 49 444 | 0.846 |
| Anatomical | Pulmonary edema or hazy opacity | 40 208 | 0.861 |
| Anatomical | Consolidation | 29 986 | 0.79 |
| Anatomical | Not otherwise specified calcification | 14 333 | 0.82 |
| Anatomical | Pneumothorax | 11 686 | 0.877 |
| Anatomical | Lobar or segmental collapse | 10 868 | 0.814 |
| Anatomical | Fracture | 9738 | 0.758 |
| Anatomical | Mass or nodule (not otherwise specified) | 8588 | 0.742 |
| Anatomical | Hyperaeration | 8197 | 0.905 |
| Anatomical | Degenerative changes | 7747 | 0.83 |
| Anatomical | Vascular calcification | 4481 | 0.873 |
| Anatomical | Tortuous aorta | 3947 | 0.814 |
| Anatomical | Multiple masses or nodules | 3453 | 0.754 |
| Anatomical | Vascular redistribution | 3436 | 0.705 |
| Anatomical | Enlarged hilum | 3106 | 0.734 |
| Anatomical | Scoliosis | 2968 | 0.815 |
| Anatomical | Bone lesion | 2879 | 0.762 |
| Anatomical | Hernia | 2792 | 0.828 |
| Anatomical | Postsurgical changes | 2526 | 0.834 |
| Anatomical | Mediastinal displacement | 1868 | 0.907 |
| Anatomical | Increased reticular markings or ILD pattern | 1828 | 0.891 |
| Anatomical | Old fractures | 1760 | 0.762 |
| Anatomical | Subcutaneous air | 1664 | 0.913 |
| Anatomical | Elevated hemidiaphragm | 1439 | 0.775 |
| Anatomical | Superior mediastinal mass or enlargement | 1345 | 0.709 |
| Anatomical | Subdiaphragmatic air | 1258 | 0.75 |
| Anatomical | Pneumomediastinum | 915 | 0.807 |
| Anatomical | Cyst or bullae | 778 | 0.76 |
| Anatomical | Hydropneumothorax | 630 | 0.935 |
| Anatomical | Spinal degenerative changes | 454 | 0.818 |
| Anatomical | Calcified nodule | 439 | 0.736 |
| Anatomical | Lymph node calcification | 346 | 0.603 |
| Anatomical | Bullet or foreign bodies | 339 | 0.715 |
| Anatomical | Other soft tissue abnormalities | 334 | 0.652 |
| Anatomical | Diffuse osseous irregularity | 322 | 0.89 |
| Anatomical | Dislocation | 180 | 0.728 |
| Anatomical | Dilated bowel | 92 | 0.805 |
| Anatomical | Osteotomy changes | 76 | 0.942 |
| Anatomical | New fractures | 70 | 0.696 |
| Anatomical | Shoulder osteoarthritis | 70 | 0.698 |
| Anatomical | Elevated humeral head | 69 | 0.731 |
| Anatomical | Azygous fissure (benign) | 47 | 0.652 |
| Anatomical | Contrast in the GI or GU tract | 17 | 0.724 |
| Device | Other internal postsurgical material | 26 191 | 0.831 |
| Device | Sternotomy wires | 12 262 | 0.972 |
| Device | Cardiac pacer and wires | 12 109 | 0.985 |
| Device | Musculoskeletal or spinal hardware | 5481 | 0.848 |
| Technical | Low lung volumes | 25 546 | 0.877 |
| Technical | Rotated | 3809 | 0.803 |
| Technical | Lungs otherwise not fully included | 1440 | 0.717 |
| Technical | Lungs obscured by overlying object or structure | 653 | 0.68 |
| Technical | Apical lordotic | 620 | 0.716 |
| Technical | Apical kyphotic | 566 | 0.872 |
| Technical | Nondiagnostic radiograph | 316 | 0.858 |
| Technical | Limited by motion | 290 | 0.628 |
| Technical | Limited by exposure or penetration | 187 | 0.834 |
| Technical | Apices not included | 175 | 0.822 |
| Technical | Costophrenic angle not included | 62 | 0.807 |
| Tubes and lines | Central intravascular lines | 57 868 | 0.891 |
| Tubes and lines | Tubes in the airway | 32 718 | 0.96 |
| Tubes and lines | Enteric tubes | 27 998 | 0.939 |
| Tubes and lines | Incorrect placement | 11 619 | 0.827 |
| Tubes and lines | Central intravascular lines: incorrectly positioned | 4434 | 0.769 |
| Tubes and lines | Enteric tubes: incorrectly positioned | 4372 | 0.931 |
| Tubes and lines | Coiled, kinked, or fractured | 4325 | 0.857 |
| Tubes and lines | Tubes in the airway: incorrectly positioned | 1962 | 0.919 |
Abbreviations: AI, artificial intelligence; AUC, area under the curve; GI, gastrointestinal; GU, genitourinary; ILD, interstitial lung disease.
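A minimal sketch of how per-label AUCs like those in the table could be computed for a multilabel classifier, assuming binary ground-truth and probability matrices of shape (n_images, n_labels); the names and synthetic data are placeholders, not the paper's artifacts.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_label_auc(Y_true: np.ndarray, Y_prob: np.ndarray) -> np.ndarray:
    """AUC for each finding label; NaN where only one class is present."""
    aucs = np.full(Y_true.shape[1], np.nan)
    for j in range(Y_true.shape[1]):
        if np.unique(Y_true[:, j]).size == 2:  # AUC needs both classes
            aucs[j] = roc_auc_score(Y_true[:, j], Y_prob[:, j])
    return aucs

# Synthetic example with 72 finding labels, mirroring the table's layout.
rng = np.random.default_rng(1)
Y_true = rng.integers(0, 2, size=(500, 72))
Y_prob = np.clip(0.5 * Y_true + rng.normal(0.3, 0.25, size=(500, 72)), 0.0, 1.0)
print(per_label_auc(Y_true, Y_prob).round(3))
```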
Figure 1. Sampling of Data Distributions for Artificial Intelligence Algorithm Training and Evaluation
Two images were excluded from the comparison study data set owing to missing radiology resident annotations. The prevalence distributions of the training and study data sets differ owing to differences in the sampling process.
Figure 2. Deep Learning Network Architecture for Anteroposterior Chest Radiographs
Figure 3. Receiver Operating Characteristic Curves of Artificial Intelligence Algorithm on Study Data Set and Relative Performance
The findings shown are among the most prevalent in the modeling data set. The light blue square indicates the mean sensitivity and 1 − specificity of the radiology residents on the comparison study data set; the dark blue circle, the operating point of the artificial intelligence algorithm based on the F1 score–based threshold derived from the training data.
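A minimal sketch of the kind of plot the caption describes, overlaying the F1-threshold operating point and a residents' mean point on one finding's ROC curve; the data, threshold, and residents' coordinates are synthetic stand-ins, not study results.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Synthetic stand-ins for one finding's study-set labels and AI probabilities.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=500)
y_prob = np.clip(0.55 * y_true + rng.normal(0.3, 0.2, size=500), 0.0, 1.0)
thr = 0.5  # stand-in for the F1-derived training threshold

fpr, tpr, _ = roc_curve(y_true, y_prob)
sens = ((y_prob >= thr) & (y_true == 1)).sum() / max((y_true == 1).sum(), 1)
spec = ((y_prob < thr) & (y_true == 0)).sum() / max((y_true == 0).sum(), 1)

plt.plot(fpr, tpr, label="AI algorithm ROC (study set)")
plt.plot(1 - spec, sens, "o", label="AI operating point (F1 threshold)")
plt.plot(0.05, 0.72, "s", label="Residents' mean point (illustrative)")
plt.xlabel("1 - Specificity")
plt.ylabel("Sensitivity")
plt.legend()
plt.show()
```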
Preliminary Read Performance Differences Between Radiology Residents and AI Algorithm
| Method | Images, No. | Findings, No. | Image-based PPV, mean (95% CI) | Image-based sensitivity, mean (95% CI) | Image-based specificity, mean (95% CI) |
|---|---|---|---|---|---|
| All radiology residents | 1998 | 72 | 0.682 (0.670-0.694) | 0.720 (0.709-0.732) | 0.973 (0.971-0.974) |
| AI algorithm | 1998 | 72 | 0.730 (0.718-0.742) | 0.716 (0.704-0.729) | 0.980 (0.979-0.981) |
| AI algorithm vs radiology residents, P value | NA | NA | .001 | .66 | <.001 |
Abbreviations: AI, artificial intelligence; NA, not applicable; PPV, positive predictive value.
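A minimal sketch, under assumed data shapes, of the image-based measures and the 1-sided ANOVA comparison reported in the table; `gt`, `ai`, and `res` are hypothetical binary read matrices (1998 images × 72 findings), and the per-image formulas are one plausible reading of the paper's description of misses and overcalls.

```python
import numpy as np
from scipy.stats import f_oneway

def image_based(pred: np.ndarray, gt: np.ndarray):
    """Per-image sensitivity, PPV, and specificity over the finding labels."""
    tp = (pred & gt).sum(axis=1)
    sens = tp / np.maximum(gt.sum(axis=1), 1)      # misses pull this down
    ppv = tp / np.maximum(pred.sum(axis=1), 1)     # overcalls pull this down
    tn = ((1 - pred) & (1 - gt)).sum(axis=1)
    spec = tn / np.maximum((1 - gt).sum(axis=1), 1)
    return sens, ppv, spec

# Hypothetical binary read matrices: 1998 images x 72 findings.
rng = np.random.default_rng(3)
gt = rng.integers(0, 2, size=(1998, 72))
ai = rng.integers(0, 2, size=(1998, 72))
res = rng.integers(0, 2, size=(1998, 72))

ai_sens, ai_ppv, ai_spec = image_based(ai, gt)
res_sens, res_ppv, res_spec = image_based(res, gt)
# One-way ANOVA on per-image sensitivities; the F statistic's upper tail
# gives the (1-sided) P value, as in the comparison reported above.
f_stat, p_value = f_oneway(ai_sens, res_sens)
print(ai_sens.mean().round(3), res_sens.mean().round(3), round(p_value, 3))
```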