Yutoku Yamada1,2, Satoshi Maki1, Shunji Kishida2, Haruki Nagai2, Junnosuke Arima1,3, Nanako Yamakawa2, Yasushi Iijima2, Yuki Shiko4, Yohei Kawasaki4, Toshiaki Kotani2, Yasuhiro Shiga1, Kazuhide Inage1, Sumihisa Orita1,5, Yawara Eguchi1, Hiroshi Takahashi6, Takeshi Yamashita3, Shohei Minami2, Seiji Ohtori1.
Abstract
Background and purpose - Deep-learning approaches based on convolutional neural networks (CNNs) are gaining interest in the medical imaging field. We evaluated the diagnostic performance of a CNN in discriminating femoral neck fractures, trochanteric fractures, and non-fracture using anteroposterior (AP) and lateral hip radiographs.
Patients and methods - 1,703 plain hip AP radiographs and 1,220 plain hip lateral radiographs were included in the total dataset. 150 images each of the AP and lateral views were separated out and the remainder of the dataset was used for training. The CNN made the diagnosis based on: (1) AP radiographs alone, (2) lateral radiographs alone, or (3) both AP and lateral radiographs combined. The diagnostic performance of the CNN was measured by accuracy, recall, precision, and F1 score. We further compared the CNN's performance with that of orthopedic surgeons.
Results - The average accuracy, recall, precision, and F1 score of the CNN based on both AP and lateral radiographs were 0.98, 0.98, 0.98, and 0.98, respectively. The accuracy of the CNN was comparable to, or statistically significantly better than, that of the orthopedic surgeons regardless of the radiographic view used. In the CNN model, the accuracy of the diagnosis based on both views was significantly better than that based on the lateral view alone and tended to be better than that based on the AP view alone.
Interpretation - The CNN exhibited performance comparable or superior to that of orthopedic surgeons in discriminating femoral neck fractures, trochanteric fractures, and non-fracture using both AP and lateral hip radiographs.
Year: 2020 PMID: 32783544 PMCID: PMC8023868 DOI: 10.1080/17453674.2020.1803664
Source DB: PubMed Journal: Acta Orthop ISSN: 1745-3674 Impact factor: 3.717
Figure 1. Image preprocessing for the convolutional neural network model training and validation. We cropped images to a minimum region containing the femoral head and the greater and lesser trochanters in both the AP (A) and lateral (B) hip radiographs. On the AP radiographs, the fractured hip (left white box) was cropped, and the hip contralateral to the fracture (right white box) was cropped as the non-fractured hip. AP = anteroposterior.
Baseline patient characteristics
| Factor | Femoral neck fracture | Trochanteric fracture | Non-fracture |
|---|---|---|---|
| Age, mean (SD) | 81.3 (11.4) | 85.2 (10.0) | 68.8 (16.2) |
| Sex (M/F), n | 136/433 | 105/361 | 81/153 |
Accuracy, p-value of the accuracy compared with the CNN, average recall, precision, and F1 score of the diagnostic performance of the CNN and the 4 orthopedic surgeons based on both the anteroposterior and the lateral radiographs
| CNN/surgeon | Accuracy (CI) | p-value | Average recall | Average precision | Average F1 score |
|---|---|---|---|---|---|
| CNN | 0.98 (0.96–1.00) | – | 0.98 | 0.98 | 0.98 |
| Board certified | |||||
| 1 | 0.92 (0.88–0.96) | 0.01 | 0.92 | 0.92 | 0.92 |
| 2 | 0.95 (0.91–0.98) | 0.1 | 0.95 | 0.95 | 0.95 |
| Resident | |||||
| 1 | 0.87 (0.82–0.93) | 0.0006 | 0.87 | 0.89 | 0.88 |
| 2 | 0.78 (0.71–0.85) | < 0.0001 | 0.78 | 0.82 | 0.80 |
p-value = p-value of the accuracy compared with the CNN;
CI = 95% confidence interval;
CNN = convolutional neural network.
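The source does not state how the confidence intervals were computed. As a minimal sketch, a normal-approximation (Wald) interval on 150 test cases appears to reproduce the CNN row of the table above. The count of 147 correct diagnoses is an assumption inferred from 0.98 × 150, and the function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (accuracy, lower, upper) for a normal-approximation 95% interval.

    The bounds are clipped to [0, 1] because the Wald interval can overshoot
    when the proportion is close to 0 or 1.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Assumption: 147 of 150 test cases correct (inferred from accuracy 0.98).
acc, lower, upper = wald_ci(correct=147, total=150)
print(f"{acc:.2f} ({lower:.2f}-{upper:.2f})")  # prints 0.98 (0.96-1.00)
```

For proportions this close to 1, a Wilson or exact (Clopper-Pearson) interval is often preferred, since the unclipped Wald upper bound exceeds 1 here.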
Diagnostic performance of the CNN and the 4 orthopedic surgeons based on both the anteroposterior and the lateral radiographs
| CNN/surgeon | Femoral neck fracture: Recall | Femoral neck fracture: Precision | Femoral neck fracture: F1 score | Trochanteric fracture: Recall | Trochanteric fracture: Precision | Trochanteric fracture: F1 score | Non-fracture: Recall | Non-fracture: Precision | Non-fracture: F1 score |
|---|---|---|---|---|---|---|---|---|---|
| CNN | 1.00 | 0.98 | 0.99 | 0.94 | 1.00 | 0.97 | 1.00 | 0.96 | 0.98 |
| Board certified | |||||||||
| 1 | 0.96 | 0.89 | 0.92 | 0.94 | 0.92 | 0.93 | 0.86 | 0.96 | 0.91 |
| 2 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.92 | 0.92 | 0.92 |
| Resident | |||||||||
| 1 | 0.96 | 0.81 | 0.88 | 1.00 | 0.86 | 0.93 | 0.66 | 1.00 | 0.80 |
| 2 | 0.90 | 0.71 | 0.80 | 0.96 | 0.77 | 0.86 | 0.48 | 0.96 | 0.64 |
CNN = convolutional neural network.
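Per-class recall, precision, and F1 score such as those in the table above can be derived from a 3×3 confusion matrix. The sketch below uses a hypothetical matrix, not taken from the source: it assumes 50 test cases per class and was chosen only because it is consistent with the CNN row. The names `LABELS`, `CONFUSION`, and `per_class_metrics` are illustrative.

```python
LABELS = ["femoral neck", "trochanteric", "non-fracture"]

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# Assumption: 50 cases per class; not reported in the source.
CONFUSION = [
    [50, 0, 0],   # true femoral neck fractures
    [1, 47, 2],   # true trochanteric fractures
    [0, 0, 50],   # true non-fractures
]


def per_class_metrics(matrix):
    """Return {class index: (recall, precision, f1)} for a square confusion matrix."""
    size = len(matrix)
    metrics = {}
    for i in range(size):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                            # row sum: cases truly in class i
        predicted = sum(matrix[r][i] for r in range(size)) # column sum: cases predicted as i
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[i] = (recall, precision, f1)
    return metrics


results = per_class_metrics(CONFUSION)
for i, label in enumerate(LABELS):
    recall, precision, f1 = results[i]
    print(f"{label}: recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")

# Accuracy is the diagonal sum over all cases; the macro average is the
# unweighted mean of the per-class values.
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / total
macro_f1 = sum(m[2] for m in results.values()) / len(results)
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

With this assumed matrix, the rounded per-class values and the 0.98 accuracy and average F1 score match the CNN figures reported above; other matrices could be consistent with them as well.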
Figure 2. Comparison of the accuracy between the AP, lateral, and both views of the CNN and the 4 orthopedic surgeons. In the CNN model, the accuracy of the diagnosis based on both views was statistically better than the AP view alone and the lateral view alone. The accuracy of diagnosis based on the AP view alone was statistically better than the lateral view alone. The same trend was also seen with the board-certified orthopedic surgeons. AP = anteroposterior; CNN = convolutional neural network.
Interrater reliability of the orthopedic surgeons, presented as Cohen’s kappa
| Surgeon | Board certified 1 | Board certified 2 | Resident 1 | Resident 2 |
|---|---|---|---|---|
| Board certified | ||||
| 1 | – | 0.85 | 0.78 | 0.73 |
| 2 | 0.85 | – | 0.76 | 0.78 |
| Resident | ||||
| 1 | 0.78 | 0.76 | – | 0.66 |
| 2 | 0.73 | 0.78 | 0.66 | – |
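Cohen's kappa, used in the table above, corrects the observed agreement between two raters for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below is a minimal illustration; the function name `cohen_kappa` and the six toy diagnoses are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use a single label
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels): the raters disagree on one of six cases.
surgeon_a = ["neck", "neck", "troch", "troch", "none", "none"]
surgeon_b = ["neck", "neck", "troch", "none", "none", "none"]
print(round(cohen_kappa(surgeon_a, surgeon_b), 2))  # prints 0.75
```

In the toy example the observed agreement is 5/6 and the chance agreement is 12/36, which gives a kappa of 0.75.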
Figure 3. Representative radiographs of hip fractures. The AP (A) and lateral (B) radiographs of a trochanteric fracture, which the CNN misdiagnosed as a non-fracture but all the orthopedic surgeons diagnosed correctly. The AP (C) and lateral (D) radiographs of a neck fracture, which 3 of the 4 orthopedic surgeons misdiagnosed as a non-fracture but the CNN diagnosed correctly. The AP (E) and lateral (F) radiographs of a trochanteric fracture, which 3 of the 4 orthopedic surgeons misdiagnosed as a non-fracture or a neck fracture but the CNN diagnosed correctly. AP = anteroposterior; CNN = convolutional neural network.