Seok Won Chung, Seung Seog Han, Ji Whan Lee, Kyung-Soo Oh, Na Ra Kim, Jong Pil Yoon, Joon Yub Kim, Sung Hoon Moon, Jieun Kwon, Hyo-Jin Lee, Young-Min Noh, Youngjun Kim.
Abstract
Background and purpose - We aimed to evaluate the ability of artificial intelligence (a deep learning algorithm) to detect and classify proximal humerus fractures using plain anteroposterior shoulder radiographs.

Patients and methods - 1,891 images (1 image per person) of normal shoulders (n = 515) and 4 proximal humerus fracture types (greater tuberosity, 346; surgical neck, 514; 3-part, 269; 4-part, 247) classified by 3 specialists were evaluated. We trained a deep convolutional neural network (CNN) after augmentation of a training dataset. The ability of the CNN, as measured by top-1 accuracy, area under receiver operating characteristics curve (AUC), sensitivity/specificity, and Youden index, in comparison with humans (28 general physicians, 11 general orthopedists, and 19 orthopedists specialized in the shoulder) to detect and classify proximal humerus fractures was evaluated.

Results - The CNN showed a high performance of 96% top-1 accuracy, 1.00 AUC, 0.99/0.97 sensitivity/specificity, and 0.97 Youden index for distinguishing normal shoulders from proximal humerus fractures. In addition, the CNN showed promising results with 65-86% top-1 accuracy, 0.90-0.98 AUC, 0.88/0.83-0.97/0.94 sensitivity/specificity, and 0.71-0.90 Youden index for classifying fracture type. When compared with the human groups, the CNN showed superior performance to that of general physicians and orthopedists, similar performance to orthopedists specialized in the shoulder, and the superior performance of the CNN was more marked in complex 3- and 4-part fractures.

Interpretation - The use of artificial intelligence can accurately detect and classify proximal humerus fractures on plain shoulder AP radiographs. Further studies are necessary to determine the feasibility of applying artificial intelligence in the clinic and whether its use could improve care and outcomes compared with current orthopedic assessments.
Year: 2018 PMID: 29577791 PMCID: PMC6066766 DOI: 10.1080/17453674.2018.1453714
Source DB: PubMed Journal: Acta Orthop ISSN: 1745-3674 Impact factor: 3.717
Figure 1. Each shoulder anteroposterior radiograph was manually cropped into a square in which the humeral head and neck are centered such that they comprise approximately 50% of the square image as illustrated above. Images were then resized to 256 × 256 pixels. Examples of normal and each fracture type: (A) normal, (B) greater tuberosity fracture, (C) surgical neck fracture, (D) 3-part fracture, and (E) 4-part fracture.
Mispredicted cases in the convolutional neural network model. Values are n (%); each column gives the type that cases in the row were mispredicted as.
| Dataset | Normal | Greater tuberosity fracture | Surgical neck fracture | Three-part fracture | Four-part fracture |
|---|---|---|---|---|---|
| Normal (n = 1,500) | – | 47 (3) | 19 (1) | 1 (0) | 0 (0) |
| Greater tuberosity fracture (n = 990) | 37 (4) | – | 30 (3) | 68 (7) | 5 (1) |
| Surgical neck fracture (n = 1,500) | 16 (1) | 19 (1) | – | 115 (8) | 148 (10) |
| Three-part fracture (n = 750) | 0 (0) | 39 (5) | 135 (18) | – | 88 (12) |
| Four-part fracture (n = 690) | 2 (0) | 1 (0) | 98 (14) | 70 (10) | – |
n = 1,500: 50 in each partition × three repetitions × 10 partitions
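The misprediction counts can be turned into a per-class top-1 accuracy with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code. It assumes that each row of the table lists every misclassified case for that class, so that the remainder of n was predicted correctly. The dictionary name `MISPREDICTED` and the helper `top1_accuracy` are illustrative names introduced here.

```python
# Misprediction counts transcribed from the table of mispredicted cases.
# Each entry: true class -> (n evaluated, counts per wrongly predicted class).
# Assumption: every miss for a class appears in its row.
MISPREDICTED = {
    "Normal": (1500, {"Greater tuberosity": 47, "Surgical neck": 19,
                      "Three-part": 1, "Four-part": 0}),
    "Greater tuberosity": (990, {"Normal": 37, "Surgical neck": 30,
                                 "Three-part": 68, "Four-part": 5}),
    "Surgical neck": (1500, {"Normal": 16, "Greater tuberosity": 19,
                             "Three-part": 115, "Four-part": 148}),
    "Three-part": (750, {"Normal": 0, "Greater tuberosity": 39,
                         "Surgical neck": 135, "Four-part": 88}),
    "Four-part": (690, {"Normal": 2, "Greater tuberosity": 1,
                        "Surgical neck": 98, "Three-part": 70}),
}


def top1_accuracy(n, errors):
    """Fraction of the n cases that were not mispredicted."""
    return 1 - sum(errors.values()) / n


per_class_accuracy = {
    label: top1_accuracy(n, errors)
    for label, (n, errors) in MISPREDICTED.items()
}

for label, acc in per_class_accuracy.items():
    print(f"{label}: {acc:.1%}")
```

Under that assumption, the values for the fracture classes fall within the 65-86% top-1 accuracy range reported in the abstract, with three-part fractures at the low end and greater tuberosity fractures at the high end.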
Diagnostic accuracy for differentiating proximal humerus fractures from normal shoulders among the CNN and human groups. Values are mean (CI)
| | CNN | General physician | General orthopedist | Orthopedists specialized in shoulder | p-value |
|---|---|---|---|---|---|
| Top-1 accuracy (%) | 96 (94–97) | 85 (80–90) | 93 (90–96) | 93 (87–99) | < 0.001 |
| Sensitivity | 0.99 (0.99–1.00) | 0.82 (0.78–0.87) | 0.93 (0.89–0.97) | 0.96 (0.95–0.98) | < 0.001 |
| Specificity | 0.97 (0.97–0.98) | 0.94 (0.93–0.96) | 0.97 (0.96–0.98) | 0.98 (0.96–1.00) | 0.002 |
| Youden index | 0.97 (0.96–0.97) | 0.77 (0.72–0.82) | 0.90 (0.87–0.94) | 0.94 (0.92–0.96) | < 0.001 |
CNN, convolutional neural network
Youden index was calculated as [sensitivity + specificity – 1].
Statistically significant in a comparison of CNN and each human group (results from a Bonferroni post hoc analysis)
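The Youden index formula in the footnote can be sketched in a few lines. The function names below are illustrative, and the confusion-matrix counts passed to `from_counts` are hypothetical values chosen only to show the calculation; they are not data from the study. Because the table reports rounded means, recomputing the index from the tabulated sensitivity and specificity may differ from the tabulated Youden index by about 0.01.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def from_counts(tp, fn, tn, fp):
    """Return (sensitivity, specificity, J) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity, youden_index(sensitivity, specificity)


# Rounded means from the general orthopedist column of the table.
print(round(youden_index(0.93, 0.97), 2))

# Hypothetical counts, for illustration only.
sens, spec, j = from_counts(tp=99, fn=1, tn=97, fp=3)
print(round(sens, 2), round(spec, 2), round(j, 2))
```

A value of 1 corresponds to perfect sensitivity and specificity, and a value of 0 to a test that performs no better than chance.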
Figure 3. The diagnostic performance between the CNN and each human group was compared using the receiver operating characteristics curves of the CNN and the sensitivity–specificity distribution of each human group to differentiate normal shoulders from proximal humerus fractures (A) and to classify each fracture type: (B) greater tuberosity fracture, (C) surgical neck fracture, (D) 3-part fracture, and (E) 4-part fracture.
CNN = convolutional neural network; AUC = area under curve of the receiver operating characteristics curve.
The representative receiver operating characteristics curve of the CNN was selected as the curve with the closest AUC value to the average AUC.
The dots on the plots represent the sensitivity and specificity of each group (yellow, general physicians; green, general orthopedists; red, orthopedists specialized in the shoulder). All AUCs for the normal shoulder and each fracture type were over 90%. The CNN performed better than at least the general physicians (yellow dots) or the general orthopedists (green dots), most of whose sensitivity/specificity points lay below the receiver operating characteristic curve of the CNN.
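The comparison described in this caption, a human group's sensitivity/specificity point set against the CNN's receiver operating characteristic curve, can be illustrated as follows. This is a sketch under stated assumptions and not the authors' analysis. The ROC points in `ROC_FPR` and `ROC_TPR` are hypothetical, the AUC is estimated with the trapezoidal rule, and the curve is linearly interpolated between points. Only the example operating point (sensitivity 0.82, specificity 0.94) is taken from the general physician column of the table above.

```python
from bisect import bisect_right


def auc_trapezoid(fpr, tpr):
    """Trapezoidal area under a ROC curve; fpr must be sorted ascending."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
        for i in range(len(fpr) - 1)
    )


def roc_sensitivity_at(fpr, tpr, x):
    """Linearly interpolated sensitivity of the curve at false-positive rate x.

    Assumes fpr starts at 0.0 and ends at 1.0, and 0 <= x <= 1.
    """
    i = bisect_right(fpr, x) - 1
    if i >= len(fpr) - 1:
        return tpr[-1]
    x0, x1 = fpr[i], fpr[i + 1]
    y0, y1 = tpr[i], tpr[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def lies_below_curve(fpr, tpr, sensitivity, specificity):
    """True if the point (1 - specificity, sensitivity) is under the curve."""
    return sensitivity < roc_sensitivity_at(fpr, tpr, 1 - specificity)


# Hypothetical ROC points, for illustration only (not the study's curve).
ROC_FPR = [0.0, 0.02, 0.10, 1.0]
ROC_TPR = [0.0, 0.90, 0.98, 1.0]

print(f"AUC = {auc_trapezoid(ROC_FPR, ROC_TPR):.4f}")
print(lies_below_curve(ROC_FPR, ROC_TPR, sensitivity=0.82, specificity=0.94))
```

A point below the curve means the classifier can reach a higher sensitivity at the same specificity, which is the sense in which the caption describes the CNN as superior to a human group.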